Latent variable models for a probabilistic timeline browser
School of Science
Master's thesis
Authors
Date
2011
Department
Major/Subject
Information Technology
Mcode
T-61
Degree programme
Language
en
Pages
vi + 52
Series
Abstract
Probabilistic models have been extensively applied in Information Retrieval (IR) systems; they treat document retrieval as a process of probabilistic inference. Integrated with a relevance feedback mechanism, an IR system can infer both the search query and document relevance from a user's browsing pattern. However, if no constraints are imposed on the query, the model overfits easily, which results in poor predictive performance. In this thesis, several latent variable models with feature selection are proposed for a probabilistic proactive timeline browser. The proactive timeline browser is suitable for finding events in timelines, in particular in life logs and other timelines containing a familiar narrative. The proposed models are based on several classical variable selection methods for linear regression, including Gibbs Variable Selection and Stochastic Search Variable Selection. Feature selection helps the model avoid overfitting and hence achieve better predictive performance. The newly proposed models are more robust against noisy features than models without feature selection. The models proposed in this thesis are general enough to apply to a wide variety of IR problems.
Supervisor
Kaski, Samuel
Thesis advisor
Gönen, Mehmet
Keywords
information retrieval, feature selection, probabilistic models