Celebrity and ordinary users: A comparative study of microblog user behaviors on Sina Weibo
Xingjun LIU1,2,3 (corresponding author), Weijun WANG2,3 & Jinke WU1
1School of Information Management, Hubei University of Economics, Wuhan 430205, China
2School of Information Management, Central China Normal University, Wuhan 430079, China
3Key Laboratory of Adolescent Cyberpsychology and Behavior of CCNU affiliated with the Ministry of Education, Wuhan 430079, China
2015, 8(2):83-95, Received: Jun. 8, 2015 Revised: Jun. 22, 2015 Accepted: Jun. 24, 2015
X.J. LIU (lxj@hbue.edu.cn, corresponding author) designed the study, collected data and drafted the manuscript. W.J. Wang (wangwj@mail.ccnu.edu.cn) revised the manuscript and helped with the data analysis and the discussion of the findings. J.K. Wu (whwjk@163.com) was in charge of the English translation and editing of the manuscript.

Abstract
Purpose: This study aims to investigate and compare celebrity and ordinary users' behaviors on Sina Weibo.

Design/methodology/approach: Data were collected from 12,555 ordinary users and 2,467 celebrity users on Sina Weibo. Correlation and regression analyses were performed on users' number of followings, number of followers and number of posts.

Findings: The results revealed significant differences between celebrity and ordinary users' behaviors on Sina Weibo. We found correlations among ordinary users' number of followings, number of followers and number of posts, but for celebrity users, only the number of followings and the number of posts are related to each other. For both ordinary and celebrity users, the number of followings significantly affects how many posts they publish.

Research limitations: We only carried out our investigation on Sina Weibo and the findings need to be further verified on other microblogging platforms.

Practical implications: This research is useful for microblogging service providers to understand different types of users and promote the continuous use of their services.

Originality/value: This research delivers valuable insights into the characteristics of different types of microbloggers and the ways to increase user stickiness.
Keywords

1 Introduction
With the rapid development of information technology, the Internet has gradually moved from the Web 1.0 era, the first stage of its evolution, into the Web 2.0 era, which emphasizes user-generated content and social communication. As a typical Web 2.0 application, the microblog has had a tremendous impact on traditional media such as newspapers and magazines. On microblogging platforms such as Twitter, users interact with one another by following others or being followed by others, forwarding messages and commenting on posts. In this interaction process, microblog messages are created, transmitted, shared and used. Many studies[1,2] have investigated the dynamic microblog information dissemination process. These multidisciplinary studies, drawing on knowledge from library and information science, media communication, management and economics, focused on technical aspects of using microblogs and possible applications. By contrast, studies concerned with microblog users’ behaviors have been scarce.
Microbloggers start to establish connections among themselves by “following” or “being followed” by other users[1]. Microblog users and their following relationships could form a social network, where messages are spread via the nodes (microbloggers) in the network[2]. At the same time, interactions such as forwarding messages and commenting on posts make it easier for more users to stay connected[3].
There are some influential users in the microblog network. Their opinions can exercise influence over other people’s adoption of products or services, and even affect how others view political events[4]. Consequently, studies of opinion leaders are important for understanding information flow and dissemination in microblog networks[5].
Researchers have proposed various techniques and algorithms to identify influential microblog users[6,7,8,9,10]. For instance, Tan et al.[9] clustered microblog users by generating user tags based on the frequency of occurrence of keywords in microblog posts and similarity calculations of user tags. Ma et al.[10] proposed a new method of user influence analysis based on individual attributes such as the number of followings and the number of followers and user behaviors such as publishing, forwarding and commenting on posts.
The above-mentioned studies focused on social network analysis and identification of influential users, and explored the topics of microblog posts. However, there are far more ordinary users than influential users and few studies have examined their microblogging behaviors. Thus, this study investigated and compared the microblogging behaviors of both celebrity and ordinary users to enhance our understanding of the behaviors of influential users and find the ways to encourage more users to continue using microblogs.
2 Data collection and pre-processing
2.1 Data collection
Sina Weibo, China’s first Twitter-like microblogging service provider, had more than 500 million registered users by the end of 2012[11]. Data were crawled and collected through Sina’s application programming interface (API) from May 20 to May 21, 2013, covering 2,467 celebrity users and 12,555 ordinary users. For each user, we recorded the name, the nickname, the location, gender, user identity, status of active user, the count of followings, the count of followers, the number of posts, etc.
We began our data crawling from a carefully selected user and proceeded using the breadth-first search method. For celebrity user data (dataset 1), we tried to find the users who had more than 100,000 followers. For ordinary user data (dataset 2), we selected users who had over 10 followers and followings.
Moreover, we also collected relationship data. For celebrity users (dataset 3), we first searched for the top 100 celebrity users and then extracted the users that they followed. For active users (dataset 4), we gathered the top 100 active users, and then searched for the users that they followed.
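The breadth-first crawl described in this section can be sketched as follows. This is a minimal illustration rather than the crawler used in the study: `get_followings` is a hypothetical callable standing in for an API request, `max_users` is an assumed stopping rule, and the toy graph is invented data.

```python
from collections import deque


def bfs_crawl(seed, get_followings, max_users):
    """Visit users breadth-first, starting from `seed`.

    `get_followings` is a hypothetical callable that returns the IDs a
    user follows; in a real crawl it would wrap an API request.
    """
    visited = [seed]          # users in the order they were discovered
    seen = {seed}             # fast membership test to avoid revisits
    queue = deque([seed])
    while queue and len(visited) < max_users:
        user = queue.popleft()
        for followed in get_followings(user):
            if followed not in seen:
                seen.add(followed)
                visited.append(followed)
                queue.append(followed)
                if len(visited) >= max_users:
                    break
    return visited


# Toy follow graph standing in for the API (illustrative data only).
toy_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": ["a"]}
order = bfs_crawl("a", lambda u: toy_graph.get(u, []), max_users=10)
first_three = bfs_crawl("a", lambda u: toy_graph.get(u, []), max_users=3)
```

Because the queue is first-in, first-out, all users one hop from the seed are discovered before any user two hops away, which is what distinguishes a breadth-first crawl from a depth-first one.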
2.2 Data pre-processing
For datasets 1 and 2, we used the mean of each variable (the number of followings, the number of followers, and the number of posts) as a threshold value and compared each user’s value with it. If the value was greater than the threshold, it was coded as 0; otherwise it was coded as 1. For datasets 3 and 4, we constructed a 100 × 100 matrix, in which 1 represented a following relationship and 0 its absence.
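The two pre-processing steps can be illustrated with a short sketch. The coding direction (0 above the mean, 1 otherwise) follows the description above. The orientation of the matrix (row follows column) and the `follows` dictionary are assumptions made for illustration, and the sample values are not study data.

```python
from statistics import mean


def binarize_by_mean(values):
    """Code each value against the sample mean, as described in the text:
    0 if the value exceeds the mean, 1 otherwise."""
    threshold = mean(values)
    return [0 if v > threshold else 1 for v in values]


def follow_matrix(users, follows):
    """Build an n x n 0/1 matrix; entry [i][j] is 1 when users[i] follows users[j].

    `follows` is a hypothetical dict mapping a user to the set of users
    they follow. The row-follows-column orientation is an assumption.
    """
    index = {u: i for i, u in enumerate(users)}
    matrix = [[0] * len(users) for _ in users]
    for u in users:
        for v in follows.get(u, ()):
            if v in index:    # ignore followed users outside the selected set
                matrix[index[u]][index[v]] = 1
    return matrix


posts = [12, 40, 3, 900, 45]      # illustrative counts; the mean is 200
codes = binarize_by_mean(posts)
m = follow_matrix(["a", "b", "c"], {"a": {"b", "c"}, "c": {"a"}})
```

With heavy-tailed counts such as these, a single very large value pulls the mean upward, so most users fall below the threshold.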
3 Results
3.1 User characteristics
3.1.1 Gender distribution
Gender distribution of the celebrity and ordinary users is illustrated in Fig. 1(a) and Fig. 1(b), respectively. Among the celebrity users, 74% were male, whereas among ordinary users males made up a smaller proportion (61%). Overall, there were far more male users than female users in the sample.
Fig. 1    Sex ratio of (a) celebrity users and (b) ordinary users.

3.1.2 Regional distribution
Table 1 shows where the celebrity and ordinary users were located. We collected a total of 2,467 celebrity users and 12,555 ordinary users, but 84 celebrity users and 664 ordinary users did not indicate their locations. Celebrity users were concentrated in Beijing (57.84%), Shanghai (7.86%), Guangdong Province (7.5%) and overseas (3.89%). The geographical distribution of ordinary users was more dispersed; the largest shares were from Guangdong Province (18.52%), Beijing (8.16%), Shanghai (6.5%) and Zhejiang Province (6.26%).
Table 1    Users’ geographical distribution

3.2 Microblog users’ behaviors
3.2.1 Following others
Figure 2 displays the distribution of users’ followings. The x-axis indicates the number of people a user follows, and the y-axis shows the number of users with that value. The shape of Fig. 2(a), the distribution of celebrity users’ followings, is roughly a power-law distribution. That means many elite Sina Weibo users follow a small number of users and few of them follow more than 2,000 people. Moreover, the distribution of ordinary users’ followings also obeys a power-law distribution (Fig. 2(b)): only a small portion of ordinary users follow more than 2,000 people.
Fig. 2    The distribution of (a) celebrity users’ followings and (b) ordinary users’ followings.

Overall, users’ followings fit a power-law distribution. However, according to the data collected in this study, we found a more typical power-law distribution of ordinary users’ followings.
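One simple way to inspect such a distribution is to tabulate how many users share each count and estimate the slope on log-log axes. The sketch below is an assumption about how this could be checked, not the procedure used in the study, and it runs on synthetic data. A straight-line fit on log-log axes is only a rough diagnostic of power-law behavior.

```python
from collections import Counter
from math import log


def degree_frequency(counts):
    """Map each observed count (e.g. number of followings) to how many users have it."""
    return sorted(Counter(counts).items())


def loglog_slope(freq):
    """Least-squares slope of log(frequency) against log(count).

    For a power law f(k) ~ k^(-alpha), the points fall near a line of
    slope -alpha. Counts of zero are skipped because log(0) is undefined.
    """
    pts = [(log(k), log(f)) for k, f in freq if k > 0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx


# Synthetic sample in which the number of users with k followings is 1000 / k^2.
sample = [k for k in (1, 2, 5, 10) for _ in range(1000 // k**2)]
freq = degree_frequency(sample)
alpha = -loglog_slope(freq)   # close to 2 for this synthetic sample
```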
3.2.2 Followed by others
The distribution of celebrity users’ followers is illustrated in Fig. 3(a). It exhibits a power-law property. By comparison, the distribution of ordinary users’ followers shows a more typical power-law property (Fig. 3(b)). This means that the majority of ordinary users had fewer than 2,000 followers. This result is consistent with previous studies[12].
Fig. 3    The distribution of (a) celebrity users’ followers and (b) ordinary users’ followers.

3.2.3 Posting microblogs
In this study, the number of posts refers to the total number of posts a user has published since registering on Sina Weibo. Figure 4(a) shows that the distribution of celebrity users’ posts follows a power-law distribution. Most post counts occur only once in the dataset, and a few occur two or three times. Figure 4(b) illustrates the distribution of posts of ordinary users. Compared with celebrity users, the distribution of ordinary users’ posts shows a more typical power-law distribution.
Fig. 4    The distribution of posts published by (a) celebrity users and (b) ordinary users.

3.3 Variables’ correlation analysis
3.3.1 Correlation analysis
Spearman correlation analysis is a rank-based method for measuring the statistical relationship between variables and testing its significance. The value of a correlation coefficient ranges between –1 and 1. In general, the greater the absolute value of a Spearman correlation coefficient, the stronger the monotonic relationship. According to Hair et al.[13], an absolute value between 0.8 and 1.0 is considered perfect correlation and that between 0.6 and 0.8 strong correlation; a value that falls within the range of 0.4 to 0.6 indicates moderate correlation and that within the range of 0.2 to 0.4 indicates weak correlation.
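To make the coefficient concrete, the following sketch computes Spearman's coefficient as the Pearson correlation of rank-transformed data and labels it with the bands quoted above. The handling of band boundaries and the label for values below 0.2 are assumptions, since the text does not specify them. The sample data are invented.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def strength(rho):
    """Label |rho| with the bands quoted in the text (boundary handling assumed)."""
    a = abs(rho)
    if a >= 0.8:
        return "perfect"
    if a >= 0.6:
        return "strong"
    if a >= 0.4:
        return "moderate"
    if a >= 0.2:
        return "weak"
    return "below the weak band"   # label chosen here; not named in the text


followings = [1, 2, 3, 4, 5]        # illustrative values only
posts = [1, 4, 9, 16, 100]          # increases with followings, but not linearly
rho = spearman(followings, posts)
```

In this example the ranks agree perfectly even though the raw values are far from a straight line, which is why a rank-based coefficient suits heavy-tailed count data.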
A closer look at the distributions of the number of followings, the number of followers and the number of posts in both the celebrity and the ordinary users’ datasets shows that they are not normally distributed (we have confirmed that they all follow a power-law distribution), so Spearman correlation analysis was used. Table 2 displays the results.
Table 2    Spearman correlation analysis of celebrity and ordinary users

Note: ** The result is statistically significant at the 0.01 significance level.
(i) Celebrity users. We drew the following conclusions for celebrity microbloggers.
• The correlation coefficient between the number of followings and the number of followers is –0.201 (significant at the 0.01 level), which means that there is a weak negative correlation between celebrity users’ number of followings and number of followers.
• The correlation coefficient between the number of followings and the number of posts is 0.46 (significant at the 0.01 level), which indicates a moderate correlation. This implies that the more people a celebrity user pays attention to, the more microblog posts he or she is likely to publish.
• Celebrity users’ number of followers is practically unrelated to the number of their posts: the correlation coefficient of the two variables is 0.071, which, although significant at the 0.01 level, falls below the 0.2 threshold for even a weak correlation.
(ii) Ordinary users. The correlation analysis results for ordinary users are summarized as follows.
• The correlation coefficient between the number of followings and the number of followers is 0.782 (significant at the 0.01 level), which means that there is a strong correlation between ordinary users’ number of followings and number of followers.
• We found a moderate correlation between ordinary users’ number of followings and number of posts as the correlation coefficient of the two variables is 0.524 (significant at the 0.01 level). This implies that the more people ordinary users select to follow, the more posts they generate.
• There is a strong correlation between ordinary users’ number of followers and number of posts, as the correlation coefficient of the two variables is 0.618 (significant at the 0.01 level). This suggests that the number of ordinary users’ posts tends to increase as their followers grow.
Compared with ordinary microblog users, celebrity users show much weaker correlations among their number of followings, number of followers and number of posts; only the number of followings and the number of posts are moderately correlated. There are two possible reasons. First, celebrity users usually have a lot of fans, and how many posts they would like to publish is not affected by how many fans they have. Second, celebrity users tend to write and post articles related to what they have seen, heard, and thought in their daily life and even famous quotations and proverbs, attracting their fans to pay attention to their blogs all the time. For ordinary microblog users, however, there is a significant correlation among their number of followings, number of followers and number of posts. This may be due to the characteristics of ordinary microblog users. For most ordinary users, the more fans they have, the more motivated they are to publish posts. Moreover, ordinary users are often more willing to pay attention to others such as friends, celebrities and organizations.
3.3.2 Partial correlation analysis
(i) Celebrity users. We performed partial correlation analysis of celebrity users’ number of followings and number of posts with the number of their followers as the controlling variable and conducted a two-sided test. According to the results in Table 3, there is a significant correlation between the two variables (correlation coefficient 0.311, p < 0.001). The reason why celebrities publish posts may be that they want to record their thoughts and communicate or share with others. In real life, many celebrities, especially stars in sport and entertainment, pay little attention to other microbloggers, i.e., they follow few people, but they publish a lot of posts. By contrast, celebrity users who pay much attention to other microbloggers are willing to post because they are active microbloggers.
Table 3    Partial correlation analysis of celebrity users

(ii) Ordinary users. A partial correlation analysis and a two-sided test were conducted with ordinary users’ number of followers, number of followings and number of posts each used in turn as the controlling variable. The results are displayed in Table 4.
Table 4    Partial correlation analysis of ordinary users

When the number of followers was set as the controlling variable, we found a positive correlation between ordinary users’ number of followings and number of posts (correlation coefficient 0.338, p < 0.001). This means that the ordinary users who follow more microbloggers are active users and are ready to publish posts.
When the number of posts was set as the controlling variable, there was no significant correlation found between ordinary users’ number of followings and number of followers. This is consistent with the previous research finding that the number of followings is related to a user’s interest in others, and the number of followers is correlated with others’ interests in this user[9].
No significant correlation was found between ordinary users’ number of followers and number of posts when the number of followings was set as the controlling variable. This is probably because publishing posts is a person’s own behavior, but the number of followers is associated with the behaviors of others. For ordinary users, their followers may be their friends and family members, who do not care whether they publish posts or not and how many posts have been published.
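The idea of controlling for a third variable can be illustrated with the standard first-order partial correlation formula. The text does not state how the partial coefficients were computed, so this is a generic sketch rather than the study's procedure, and the input values are illustrative and not taken from the tables.

```python
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z.

    Inputs are the three pairwise correlation coefficients.
    """
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Illustrative coefficients: x and y are each strongly related to z,
# so most of their raw correlation disappears once z is held fixed.
r_controlled = partial_corr(r_xy=0.5, r_xz=0.6, r_yz=0.8)
```

When both variables are unrelated to the controlling variable, the formula returns the raw correlation unchanged; a large drop indicates that the controlling variable accounts for much of the original association.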
3.3.3 Regression analysis
Regression analysis is a statistical process for estimating the relationships among variables, with the focus on the relationship between a dependent variable and one or more independent variables. In order to examine whether the number of followings and the number of followers affect the number of posts, we set the number of posts as the dependent variable Y, and the number of followings and the number of followers as the independent variables X1 and X2, respectively. The analysis was run in SPSS 16.0 using the “Enter” method.
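As a minimal sketch of this model, the function below fits Y = b0 + b1·X1 + b2·X2 by ordinary least squares with both predictors entered together. It is a generic illustration on invented data, not a reproduction of the SPSS analysis, and it returns unstandardized coefficients.

```python
def ols_two_predictors(x1, x2, y):
    """Fit y = b0 + b1*x1 + b2*x2 by ordinary least squares.

    Solves the 2x2 normal equations on mean-centred data and
    returns the tuple (b0, b1, b2).
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Invented data generated exactly as y = 5 + 2*x1 + 0.5*x2.
followings = [1, 2, 3, 4, 5]
followers = [2, 1, 4, 3, 6]
posts = [8.0, 9.5, 13.0, 14.5, 18.0]
b0, b1, b2 = ols_two_predictors(followings, followers, posts)
```

Entering both predictors at once means each coefficient describes the effect of one variable with the other held constant, which is why a predictor that correlates with the outcome on its own can still have a non-significant coefficient in the joint model.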
(i) Celebrity users. Tables 5 and 6 show the regression analysis results of celebrity users’ dataset.
Table 5    Model fitting test

Note: a refers to predictors: (constant), followings and followers.
Table 6    ANOVA analysis

Note: a refers to predictors: (constant), followings and followers.
The final results in Table 7 indicate a positive relationship between celebrity users’ number of followings and number of posts (β = 0.314, p < 0.001). Most celebrity users, especially star users, seldom follow others. However, celebrity users who follow more people tend to use microblog services more often and are more likely to publish posts frequently. The results also show that the number of celebrity users’ posts is not significantly related to the number of their followers.
Table 7    Regression analysis result (dependent variable: the number of posts)

(ii) Ordinary users. Tables 8 and 9 display the regression analysis results of ordinary users’ dataset.
Table 8    Model fitting test

Note: a refers to predictors: (constant), followings and followers.
Table 9    ANOVA analysis

Note: a refers to predictors: (constant), followings and followers.
The final results in Table 10 indicate a significant positive relationship between ordinary users’ number of followings and number of posts (β = 0.338, p < 0.001). This implies that the more microbloggers an ordinary user follows, the more frequently he or she uses microblogs, and the more likely he or she is to publish posts. As in the regression results for celebrity users, Table 10 also shows that ordinary users’ number of posts is not significantly related to their number of followers.
Table 10    Regression analysis result (dependent variable: the number of posts)

4 Conclusions
This study investigated and compared the information dissemination behaviors of celebrity and ordinary microbloggers on Sina Weibo. For celebrity users, the distributions of the number of followings, the number of followers and the number of posts each roughly follow a power law. Apart from the relationship between followings and posts, no notable correlation was found among celebrity users’ number of followings, number of followers and number of posts. Celebrity users usually have a lot of fans, but how many articles they post is not related to the number of their followers. For ordinary users, the distributions of the three variables each exhibit a more typical power-law property. The three variables are correlated with one another for ordinary users, but once the third variable is controlled for, the number of followers and the number of posts are not significantly correlated, and neither are the number of followings and the number of followers. For both celebrity and ordinary users, the number of posts is correlated with the number of followings.

References
1 Yin, D., Hong, L., & Davison, B.D. Structural link analysis and prediction in microblogs. In Proceedings of 20th ACM International Conference on Information and Knowledge Management (CIKM 2011). New York: ACM, 2011, 10: 24-28. Retrieved on June 5, 2015, from http://dl.acm.org/citation.cfm?id=2063743. DOI:10.1145/2063576.2063743.
2 Lerman, K., & Ghosh, R. Information contagion: An empirical study of the spread of news on Digg and Twitter social networks. Proceedings of the 4th International Conference on Weblogs and Social Media, 2010: 90-97. Retrieved on June 5, 2015, from http://arxiv.org/abs/1003.2664.
3 Peng, L. Strategy of using microblogging to spread ideas. Chinese Journalist (in Chinese), 2011, 2: 82-84. Retrieved on June 5, 2015, from http://d.g.wanfangdata.com.cn/Periodical_zhonggjz201102036.aspx. DOI:10.3969/j.issn.1003-1146.2011.02.036.
4 Ellison, N.B., Steinfield, C., & Lampe, C. The benefits of Facebook “friends”: Social capital and college students' use of online social network sites. Journal of Computer-Mediated Communication, 2007, 12(4): 1143-1168. Retrieved on June 5, 2015, from http://onlinelibrary.wiley.com/doi/10.1111/j.1083-6101.2007.00367.x/abstract;jsessionid=D8E4F63D4B7E806 FF6DBAD3ECB9F3197.f03t02. DOI:10.1111/j.1083-6101.2007.00367.x.
5 Cho, Y., Hwang, J., & Lee, D. Identification of effective opinion leaders in the diffusion of technological innovation: A social network approach. Technological Forecasting and Social Change, 2012, 79(1): 97-106. Retrieved on June 5, 2015, from http://www.sciencedirect.com/science/article/pii/S0040162511001272. DOI:10.1016/j.techfore.2011.06.003.
6 Sharma, P., Khurana, U., & Shneiderman, B., et al. Speeding up network layout and centrality measures for social computing goals. In Social Computing, Behavioral Cultural Modeling and Prediction. Berlin: Springer Heidelberg, 2011: 244-251. Retrieved on June 5, 2015, from http://link.springer.com/chapter/10.1007%2F978-3-642-19656-0_35. DOI:10.1007/978-3-642-19656-0_35.
7 Wang, Y., Zeng, J.P., & Zhou, B.H., et al. Online forum opinion leaders discovering method based on clustering analysis, Computer Engineering (in Chinese), 2011, 37(5): 44-46, 49. Retrieved on June 5, 2015, from http://d.g.wanfangdata.com.cn/Periodical_jsjgc201105015.aspx. DOI:10.3969/j.issn.1000-3428.2011.05.015.
8 Jiang, C.Q., Zhu, Y.S., & Ding, Y. On discovery of opinion leaders based on UGC. Journal of Intelligence (in Chinese), 2011, 30(10): 82-85. Retrieved on June 5, 2015, from http://d.g.wanfangdata.com.cn/Periodical_qbzz201110016.aspx. DOI:10.3969/j.issn.1002-1965.2011.10.016.
9 Tan, M.H., Jin, Y.S., & Qiu, Y.Q. A research of recommendation mechanism for micro-blog customer relationship based on content analysis. Library Tribune (in Chinese), 2013, 33(4):104-108. Retrieved on June 5, 2015, from http://d.g.wanfangdata.com.cn/Periodical_tsglt201304019.aspx. DOI:10.3969/j.issn.1002-1167.2013.04.019.
10 Ma, J., Zhou, G., & Xu, B., et al. Analysis of user influence in microblog based on individual attribute features. Application Research of Computers (in Chinese), 2013, 30(8): 2483-2487. Retrieved on June 5, 2015, from http://d.g.wanfangdata.com.cn/Periodical_jsjyyyj201308061.aspx. DOI:10.3969/j.issn.1001-3695.2013.08.061.
11 Sina Weibo boasts 500m users. Retrieved on June 5, 2015, from http://www.chinadaily.com.cn/business/2013-02/21/content_16243934.htm.
12 Zhao, W., Zhu, Q., & Wu, K., et al. Analysis of micro-blogging user character and motivation —Take micro-blogging of Hexun.com as an example. New Technology of Library and Information Service (in Chinese), 2011, 27(2): 69-75. Retrieved on June 5, 2015, from http://www.infotech.ac.cn/CN/abstract/abstract3350.shtml.
13 Hair, J.F., Black, W.C., & Babin, B.J., et al. Multivariate data analysis. 6th edition. New Jersey: Pearson Education, 2006.