Knowledge Discovery and Complex Network Dynamics in Social Media Space

Edward Yellakuor Baagyere, Zhen Qin, Xiong Hu, Qin Zhiguang

Abstract


Pattern discovery and correlation in text data have been an active research area in recent times. However, a composite model that captures patterns and correlations as a quantitative measure in social media space has yet to receive much research attention. This paper therefore analyzes social media data from Twitter about the 2014 FIFA World Cup both as lexical text and as a complex network system. Quantitatively, it is found that Twitter's 140-character upper bound does not have a negative impact on the formation of ideas. As lexical text, the following key statistics were confirmed: the distribution of words in the corpus obeys Zipf's law, 3-character words account for almost 22% of the corpus, and the distribution of the article "the" also follows a Zipf or power law. Moreover, the three most frequent terms related to the World Cup event (url, worldcup, rt) account for about 14.5% of the corpus.
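The lexical statistics described above can be illustrated with a minimal sketch. It is not the authors' code: the tokenizer, the function names, and the three sample tweets are assumptions made for illustration, and the slope estimate is a plain least-squares fit on log-log axes rather than whatever fitting procedure the paper used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_frequency(tokens):
    """Return (rank, word, frequency) triples, most frequent word first."""
    ranked = Counter(tokens).most_common()
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ranked, start=1)]


def length_share(tokens, length):
    """Fraction of all tokens that have exactly `length` characters."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if len(t) == length) / len(tokens)


def loglog_slope(table):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 is the classic Zipf signature; a toy sample like the
    one below is far too small to say anything meaningful about it.
    """
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(freq) for _, _, freq in table]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical sample tweets, already normalized (links replaced by "url").
sample = ["rt worldcup the goal url", "the worldcup is on url", "rt the fans url"]
tokens = [t for tweet in sample for t in tokenize(tweet)]
table = rank_frequency(tokens)
share_3 = length_share(tokens, 3)
slope = loglog_slope(table)
```

On a real corpus, `table` would feed a log-log plot of frequency against rank, and `share_3` corresponds to the kind of figure reported for 3-character words.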

In particular, the corpus is modeled as a network $G = (V, E)$, where $V$ is the set of vocabulary words in the corpus and $E$ is the set of bigrams (two-word phrases). An algorithm is developed and implemented in Python to obtain the bigrams from the corpus. Using concepts from graph theory, the bigram network is analyzed, and the results reveal compelling facts about text networks. Firstly, all the characteristics of complex networks known in the literature are observed in the bigram network. These include the degree distribution, which is observed to follow a power law with a degree exponent of 2.14. Secondly, the average path length between words is observed to be 4.78, which is within the "small world" category. Thirdly, other complex network characteristics, such as the eigenvector and betweenness centrality metrics, are observed within the bigram network, both having weak power-law distributions as observed in other complex networks in the literature.
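A minimal sketch of the bigram network construction follows. It is an illustration under stated assumptions, not the algorithm from the paper: words are split on whitespace, bigrams are taken within each tweet only, edges are treated as undirected and unweighted, and the average path length is computed by breadth-first search over reachable pairs. All function names and the sample tweets are hypothetical.

```python
from collections import Counter, defaultdict, deque


def bigrams(tokens):
    """Consecutive word pairs, e.g. [a, b, c] -> [(a, b), (b, c)]."""
    return list(zip(tokens, tokens[1:]))


def build_graph(tweets):
    """Adjacency sets for an undirected bigram graph (assumption: no edge weights)."""
    adj = defaultdict(set)
    for tweet in tweets:
        for a, b in bigrams(tweet.lower().split()):
            if a != b:  # skip self-loops from repeated words
                adj[a].add(b)
                adj[b].add(a)
    return dict(adj)


def degree_distribution(adj):
    """Map each degree k to the number of words with exactly k neighbours."""
    return Counter(len(neighbours) for neighbours in adj.values())


def average_path_length(adj):
    """Mean shortest-path length over all ordered pairs of connected words."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical sample: yields the chain worldcup - final - tonight - brazil.
tweets = ["worldcup final tonight", "tonight brazil"]
graph = build_graph(tweets)
degrees = degree_distribution(graph)
apl = average_path_length(graph)
```

For a corpus of realistic size, a graph library would be the more practical choice for centrality measures such as eigenvector and betweenness centrality; the sketch above only covers degree and path length.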

These findings call for studying the topological characteristics of text data and comparing their structural properties with known complex network metrics in the literature. The results will be of great importance in studying complex systems. The application areas of these findings are also numerous, ranging from information retrieval and data compression to information security.

To the best of our knowledge, this is the first work that studies the textual and topological structure of text from a social media platform as a complex network and analyzes important topological properties of complex networks on it.

Keywords: complex network, bigram, media space, Twitter, information science



ISSN (Paper) 2224-610X, ISSN (Online) 2225-0603


This journal follows the ISO 9001 management standard and is licensed under a Creative Commons Attribution 3.0 License.

Copyright © www.iiste.org