Latent variable models for a probabilistic timeline browser
School of Science | Master's thesis
vi + 52
Abstract
Probabilistic models have been applied extensively in Information Retrieval (IR) systems, where document retrieval is treated as probabilistic inference. Combined with a relevance feedback mechanism, an IR system can infer both the search query and document relevance from a user's browsing pattern. However, if no constraints are imposed on the query, the model overfits easily and its predictive performance suffers. This thesis proposes several latent variable models with feature selection for a probabilistic proactive timeline browser. The proactive timeline browser is suited to finding events in timelines, in particular in life logs and other timelines that contain a familiar narrative. The proposed models build on several classical variable selection methods from linear regression, including Gibbs Variable Selection and Stochastic Search Variable Selection. Feature selection helps the models avoid overfitting and thereby achieve better predictive performance. Compared to models without feature selection, the proposed models are more robust to noisy features. They are also general enough to apply to a wide variety of IR problems.
Thesis advisor: Gönen, Mehmet
Keywords: information retrieval, feature selection, probabilistic models