We propose an extension to language models for information retrieval. Typically, language models estimate the probability of a document generating the query, where the query is treated as a set of independent search terms. We extend this approach by considering the concepts implied by both the query and the words in the document. The model combines the probability of the document generating the concept embodied by the query with the traditional language-model probability of the document generating the query terms. We represent concepts in a word-embedding space, where the similarity between two vectors is estimated using a weighted cosine distance; the weighting significantly enhances the discrimination between vectors. We evaluate our model on benchmark datasets (TREC 6--8) and empirically demonstrate that it outperforms state-of-the-art baselines.
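The weighted cosine similarity mentioned above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the abstract does not specify the weighting scheme, so the per-dimension weight vector `w` here is a hypothetical placeholder.

```python
import numpy as np

def weighted_cosine(u, v, w):
    """Cosine similarity between embedding vectors u and v under
    per-dimension weights w. The weighting scheme is an assumption
    for illustration; the abstract does not define the weights."""
    wu, wv = w * u, w * v
    denom = np.linalg.norm(wu) * np.linalg.norm(wv)
    return float(np.dot(wu, wv) / denom) if denom else 0.0

# Hypothetical word-embedding vectors and weights, for illustration only.
u = np.array([1.0, 2.0, 0.5])
v = np.array([0.9, 1.8, 0.4])
w = np.array([2.0, 1.0, 0.5])  # emphasizes the first dimension
sim = weighted_cosine(u, v, w)
```

By scaling each dimension before computing the cosine, informative dimensions contribute more to the similarity score, which is one plausible way a weighting could sharpen the discrimination between vectors.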
Original language: American English
Journal: WWW '17 Companion Proceedings of the 26th International Conference on World Wide Web Companion
State: Published - 2017