Book chapter
A Latent Dirichlet Framework for Relevance Modeling
Information Retrieval Technology, pp.13-25
Lecture Notes in Computer Science, Springer Berlin Heidelberg
2009
DOI: 10.1007/978-3-642-04769-5_2
Abstract
Relevance-based language models operate by estimating the probabilities of observing words in documents relevant (or pseudo-relevant) to a topic. However, these models assume that if a document is relevant to a topic, then all tokens in the document are relevant to that topic. This could limit model robustness and effectiveness. In this study, we propose a Latent Dirichlet relevance model, which relaxes this assumption. Our approach derives from current research on Latent Dirichlet Allocation (LDA) topic models. LDA has been extensively explored, especially for discovering a set of topics from a corpus. LDA itself, however, has a limitation that is also addressed in our work. Topics generated by LDA from a corpus are synthetic, i.e., they do not necessarily correspond to topics identified by humans for the same corpus. In contrast, our model explicitly considers the relevance relationships between documents and given topics (queries). Thus, unlike standard LDA, our model is directly applicable to goals such as relevance feedback for query modification and text classification, where topics (classes and queries) are provided upfront. Although the focus of our paper is on improving relevance-based language models, in effect our approach bridges relevance-based language models and LDA, addressing limitations of both.
Details
- Title
- A Latent Dirichlet Framework for Relevance Modeling
- Creators
- Viet Ha-Thuc - Computer Science Department, The University of Iowa, Iowa City, USA
- Padmini Srinivasan - Computer Science Department, The University of Iowa, Iowa City, USA
- Resource Type
- Book chapter
- Publication Details
- Information Retrieval Technology, pp.13-25
- Publisher
- Springer Berlin Heidelberg; Berlin, Heidelberg
- Series
- Lecture Notes in Computer Science
- DOI
- 10.1007/978-3-642-04769-5_2
- eISSN
- 1611-3349
- ISSN
- 0302-9743
- Language
- English
- Date published
- 2009
- Academic Unit
- Nursing; Computer Science; Business Analytics
- Record Identifier
- 9984003009302771