Search Result Diversification in Short Text Streams

Shangsong Liang, Emine Yilmaz, Hong Shen, Maarten de Rijke, W. Bruce Croft

Research output: Contribution to journal (Article)

12 Scopus citations

Abstract

We consider the problem of search result diversification for streams of short texts. Diversifying search results in short text streams is more challenging than in the case of long documents, as it is difficult to capture the latent topics of short documents. To capture the changes of topics and the probabilities of documents for a given query at a specific time in a short text stream, we propose a dynamic Dirichlet multinomial mixture topic model, called D2M3, as well as a Gibbs sampling algorithm for the inference. We also propose a streaming diversification algorithm, SDA, that integrates the information captured by D2M3 with our proposed modified version of the PM-2 (Proportionality-based diversification Method -- second version) diversification algorithm. We conduct experiments on a Twitter dataset and find that SDA statistically significantly outperforms state-of-the-art non-streaming retrieval methods, plain streaming retrieval methods, as well as streaming diversification methods that use other dynamic topic models.
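The abstract refers to PM-2 but does not say how it is modified or how the D2M3 output is plugged in. As an illustration only, the sketch below implements the standard, unmodified PM-2 selection loop, not the authors' SDA. All names in it (`pm2`, `doc_topic`, `topic_weight`, `lam`) are hypothetical, and the toy probabilities are hand-made stand-ins for the per-document topic probabilities that a topic model such as D2M3 would supply.

```python
from typing import Dict, List


def pm2(doc_topic: Dict[str, List[float]],
        topic_weight: List[float],
        k: int,
        lam: float = 0.5) -> List[str]:
    """Standard PM-2 (assumed baseline, not the paper's modified version).

    doc_topic[d][i] plays the role of P(d | topic i); topic_weight[i] is the
    share of the ranking that topic i should receive.
    """
    n_topics = len(topic_weight)
    seats = [0.0] * n_topics          # portion of the ranking each topic already holds
    remaining = dict(doc_topic)       # candidates not yet selected
    ranking: List[str] = []

    while remaining and len(ranking) < k:
        # Sainte-Lague quotient: topics that hold few seats get priority.
        quot = [topic_weight[i] / (2.0 * seats[i] + 1.0) for i in range(n_topics)]
        best = max(range(n_topics), key=lambda i: quot[i])

        def score(d: str) -> float:
            p = remaining[d]
            others = sum(quot[i] * p[i] for i in range(n_topics) if i != best)
            return lam * quot[best] * p[best] + (1.0 - lam) * others

        chosen = max(remaining, key=score)
        p = remaining.pop(chosen)

        # The chosen document fills seats in proportion to its topic coverage.
        total = sum(p)
        if total > 0:
            for i in range(n_topics):
                seats[i] += p[i] / total
        ranking.append(chosen)

    return ranking


# Toy input: "a" and "b" cover topic 0 only, "c" covers topic 1 only.
toy_docs = {"a": [0.9, 0.0], "b": [0.8, 0.0], "c": [0.0, 0.7]}
top2 = pm2(toy_docs, topic_weight=[0.5, 0.5], k=2)
print(top2)  # ['a', 'c']
```

On this toy input a purely relevance-ordered list would start with "a" and "b", which cover the same topic. PM-2 places "c" second because topic 1 still holds no seats after "a" is selected.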
Original language: American English
Journal: ACM Transactions on Information Systems
Publication status: Published - 2017
