Latent variable models for a probabilistic timeline browser

School of Science | Master's thesis
Date
2011
Major/Subject
Information Technology
Mcode
T-61
Language
en
Pages
vi + 52
Abstract
Probabilistic models have been extensively applied in Information Retrieval (IR) systems; they treat the process of document retrieval as probabilistic inference. Integrated with a relevance feedback mechanism, an IR system is able to infer both the search query and document relevance from the browsing pattern of a user. However, if no constraints are imposed on the query, the model overfits easily, resulting in poor predictive performance. In this thesis, several latent variable models with feature selection are proposed for a probabilistic proactive timeline browser. The proactive timeline browser is suitable for finding events from timelines, in particular from life logs and other timelines containing a familiar narrative. The proposed models are based on several classical variable selection methods for linear regression, including Gibbs Variable Selection and Stochastic Search Variable Selection. Feature selection helps the model avoid overfitting and hence achieve better predictive performance. The newly proposed models are more robust against noisy features than models without feature selection. The models proposed in this thesis are general enough to apply to a wide variety of IR problems.
Supervisor
Kaski, Samuel
Thesis advisor
Gönen, Mehmet
Keywords
information retrieval, feature selection, probabilistic models