Explainable User Clustering in Short Text Streams.

Yukun Zhao, Shangsong Liang, Zhaochun Ren, Jun Ma, Emine Yilmaz, Maarten de Rijke

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

51 Scopus citations

Abstract

User clustering has been studied from different angles: behavior-based, to identify similar browsing or search patterns, and content-based, to identify shared interests. Once user clusters have been found, they can be used for recommendation and personalization. So far, content-based user clustering has mostly focused on static sets of relatively long documents. Given the dynamic nature of social media, there is a need to dynamically cluster users in the context of short text streams. User clustering in this setting is more challenging than in the case of long documents as it is difficult to capture the users' dynamic topic distributions in sparse data settings. To address this problem, we propose a dynamic user clustering topic model (or UCT for short). UCT adaptively tracks changes of each user's time-varying topic distribution based both on the short texts the user posts during a given time period and on the previously estimated distribution. To infer changes, we propose a Gibbs sampling algorithm where a set of word-pairs from each user is constructed for sampling. The clustering results are explainable and human-understandable, in contrast to many other clustering algorithms. For evaluation purposes, we work with a dataset consisting of users and tweets from each user. Experimental results demonstrate the effectiveness of our proposed clustering model compared to state-of-the-art baselines.
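As a rough illustration of two ingredients the abstract names (word-pairs built from each user's short texts, and a topic distribution that depends both on the current period and on the previous estimate), the sketch below may help. It is not the UCT model or its Gibbs sampler. The function names, the pseudo-count smoothing rule, and the `prior_weight` parameter are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def word_pairs(posts):
    """Count unordered word pairs co-occurring within each short text.

    Hypothetical helper: whitespace tokenization and de-duplication per
    post are simplifying assumptions, not the paper's preprocessing.
    """
    pairs = Counter()
    for post in posts:
        tokens = sorted(set(post.lower().split()))
        for a, b in combinations(tokens, 2):
            pairs[(a, b)] += 1
    return pairs


def update_topic_distribution(prev_theta, topic_counts, prior_weight=1.0):
    """Blend the previous estimate with topic counts from the current period.

    Assumed rule: the previous distribution acts as pseudo-counts scaled
    by prior_weight, then the result is normalized to sum to 1.
    """
    unnormalized = [
        count + prior_weight * prev
        for count, prev in zip(topic_counts, prev_theta)
    ]
    total = sum(unnormalized)
    return [value / total for value in unnormalized]


# Usage: one user's posts in a time period, and a two-topic update.
pairs = word_pairs(["deep learning", "deep learning rocks"])
theta = update_topic_distribution([0.5, 0.5], [3, 0], prior_weight=2.0)
```

With a larger `prior_weight`, the updated distribution stays closer to the previous estimate, which is one simple way to keep estimates stable when a user posts only a few short texts in a period.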
Original language: American English
Title of host publication: SIGIR '16 Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 155-164
Number of pages: 10
ISBN (Electronic): 9781450342902
DOIs
State: Published - 2016
Event: 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2016 - Pisa, Italy
Duration: Jul 17 2016 to Jul 21 2016

Publication series

Name: SIGIR 2016 - Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference: 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2016
Country/Territory: Italy
City: Pisa
Period: 07/17/16 to 07/21/16

Keywords

  • Short text processing
  • User clustering
  • User topic modeling

Fingerprint

  • Dynamic User Interests

    Liang, S. (CoI), Ren, Z. (CoI), Zhao, Y. (CoI), Yilmaz, E. (CoI), Kanoulas, E. (CoI), Ma, J. (CoI), De Rijke, M. (CoI) & Hobby, M. (CoI)

    08/1/15 to 07/1/19

    Project: Research
