LINIS Identifies the Limitations of Clustering Online Texts
Sergey Koltsov, Deputy Director of the Laboratory for Internet Studies (LINIS), presented his project on the problems of topic modeling of online texts at the Web Science conference in Bloomington.
Web Science studies the vast information network of people, communities, organizations, applications, and policies that shape and are shaped by the Web. Computing, physical, and social sciences come together, complementing each other in understanding how the Web affects our interactions and behaviors.
Sergey Koltsov (LINIS) received notable feedback for his paper at the ACM Web Science Conference in Bloomington, USA. In it he discussed an unresolved methodological problem in clustering large collections of texts obtained online, in particular the instability of the topic modeling algorithm. Experiments at LINIS showed that different solutions produced by this algorithm are not just slightly different: they differ so dramatically that no conclusions about the topical composition of the collection can be drawn. LINIS is currently working on methods to stabilize topic modeling results.
The conference was held at Indiana University at the end of June and was co-sponsored by Google, Microsoft, Facebook and other businesses. This highly selective event featured 30 presentations, with an acceptance rate of less than one third. The best paper award went to a study that analyzed 2.3 million tweets about the Gezi Park protests in Turkey and found, among other things, that over time the discussion became more democratic and the ability to influence other users became more evenly distributed.