Latent variable models for a probabilistic timeline browser

School of Science | Master's thesis

Date

2011

Major/Subject

Information technology (Informaatiotekniikka)

Mcode

T-61

Language

en

Pages

vi + 52

Abstract

Probabilistic models have been extensively applied in Information Retrieval (IR) systems; they treat the process of document retrieval as probabilistic inference. Integrated with a relevance feedback mechanism, an IR system can infer both the search query and document relevance from the browsing pattern of a user. However, if no constraints are imposed on the query, the model overfits easily, which results in poor predictive performance. In this thesis, several latent variable models with feature selection are proposed for a probabilistic proactive timeline browser. The proactive timeline browser is suitable for finding events from timelines, in particular from life logs and other timelines containing a familiar narrative. The proposed models are based on several classical variable selection methods in linear regression, including Gibbs Variable Selection and Stochastic Search Variable Selection. Feature selection helps the model avoid overfitting and hence achieve better predictive performance. The newly proposed models are more robust against noisy features than models without feature selection. The models proposed in this thesis are general enough to apply to a wide variety of IR problems.
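This record does not reproduce the thesis's own latent variable models. As a rough illustration of one method the abstract names, the sketch below runs Stochastic Search Variable Selection in its classical linear regression setting: each weight gets a two-component normal prior (a narrow "spike" and a wide "slab"), and a Gibbs sampler alternates between the weights and the binary inclusion indicators. The function name `ssvs`, the hyperparameters `tau`, `c` and `prior_incl`, the known noise variance and the synthetic data are all assumptions made for this example, not details taken from the thesis.

```python
import numpy as np


def normal_pdf(x, sd):
    """Density of a zero-mean normal with standard deviation sd."""
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def ssvs(X, y, n_iter=2000, burn_in=500, tau=0.05, c=100.0,
         prior_incl=0.5, sigma2=1.0, seed=0):
    """Illustrative SSVS Gibbs sampler (noise variance sigma2 assumed known).

    Returns posterior inclusion frequencies and posterior mean weights.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    gamma = np.ones(d, dtype=int)  # start with every feature included
    XtX = X.T @ X
    Xty = X.T @ y
    gamma_sum = np.zeros(d)
    beta_sum = np.zeros(d)
    for it in range(n_iter):
        # Sample weights given indicators: conjugate Gaussian update.
        prior_var = np.where(gamma == 1, (c * tau) ** 2, tau ** 2)
        precision = XtX / sigma2 + np.diag(1.0 / prior_var)
        cov = np.linalg.inv(precision)
        cov = 0.5 * (cov + cov.T)  # remove round-off asymmetry
        mean = cov @ Xty / sigma2
        beta = rng.multivariate_normal(mean, cov)
        # Sample indicators given weights: compare slab and spike densities.
        slab = prior_incl * normal_pdf(beta, c * tau)
        spike = (1.0 - prior_incl) * normal_pdf(beta, tau)
        gamma = (rng.random(d) < slab / (slab + spike)).astype(int)
        if it >= burn_in:
            gamma_sum += gamma
            beta_sum += beta
    kept = n_iter - burn_in
    return gamma_sum / kept, beta_sum / kept


# Synthetic data (assumed for the example): 3 relevant features, 7 noisy ones.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 10))
true_beta = np.zeros(10)
true_beta[:3] = [2.0, -1.5, 1.0]
y = X @ true_beta + data_rng.normal(size=200)

incl, beta_hat = ssvs(X, y)
print("inclusion frequencies:", np.round(incl, 2))
print("posterior mean weights:", np.round(beta_hat, 2))
```

In this toy setting the inclusion frequencies indicate which features the sampler treats as relevant; a model without the spike component would assign every noisy feature a freely estimated weight.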

Supervisor

Kaski, Samuel

Thesis advisor

Gönen, Mehmet

Keywords

information retrieval, feature selection, probabilistic models
