Machine learning in applied econometrics: Deriving personal income drivers with randomized decision forests

School of Business | Master's thesis
To ask about the availability of the thesis, email the Aalto University Learning Centre at oppimiskeskus@aalto.fi
Date
2016
Major/Subject
Kansantaloustiede
Economics
Language
en
Pages
66
Abstract
In this paper I explore a modern field of research in applied econometrics: machine learning and the estimation of synthetic treatment effects. Data generation is currently on an exponential growth path: smartphones, social media and networks of interconnected devices are generating information at an unprecedented pace. The size, structure and velocity of these information streams vary widely. The field of econometrics is also evolving: classic econometric models can lead to biased results with big data and will not scale up to modern data sets. I propose the well-performing Random Forests algorithm for use in econometrics. To adapt this method for causal analysis, I explore recent theory on causal decision trees. The proposed framework is then tested by estimating personal income drivers for the top 1% of the U.S. population. The data used is the American Community Survey 5-year sample, consisting of approximately 20 million rows. It appears that high income is in fact driven by four core factors: education, experience, working hours and gender. To rank these predictors, a synthetic treatment effect simulation is run. I find that investing in education beyond a master's degree has a significant positive effect on the likelihood of high income. Additionally, it appears that the negative gender income effect for females can be undone with a combination of work experience and exceptional work ethic.
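As a rough illustration of the workflow the abstract outlines, the sketch below fits a small randomized decision forest to made-up data and then runs one possible form of a synthetic treatment simulation: a single predictor is set to a counterfactual value for every row, and the average predicted probability of high income is compared with the baseline. Everything here is an assumption for illustration only: the data are simulated rather than taken from the American Community Survey, the feature coding, coefficients and function names (`make_data`, `fit_forest`, `synthetic_effect`) are hypothetical, and the simulation may differ from the procedure used in the thesis.

```python
import random

def make_data(n, rng):
    """Simulated rows [education, experience, hours, female]; NOT survey data."""
    rows, labels = [], []
    for _ in range(n):
        edu = rng.randint(0, 4)   # hypothetical coding: 0 = no degree ... 4 = doctorate
        exp = rng.randint(0, 8)   # work experience in 5-year bins
        hrs = rng.randint(0, 5)   # weekly working hours in bins
        fem = rng.randint(0, 1)   # 1 = female
        # Assumed data-generating process, chosen only to make the demo non-trivial.
        score = 0.8 * edu + 0.3 * exp + 0.4 * hrs - 0.7 * fem + rng.gauss(0, 1)
        rows.append([edu, exp, hrs, fem])
        labels.append(1 if score > 5.0 else 0)   # 1 = "high income"
    return rows, labels

def gini(labels):
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def grow(rows, labels, depth, n_feat, rng):
    """Grow one tree; each split considers a random subset of n_feat features."""
    if depth == 0 or len(set(labels)) == 1:
        return sum(labels) / len(labels)          # leaf: share of positives
    best = None
    for f in rng.sample(range(len(rows[0])), n_feat):
        # Dropping the largest value keeps both sides of every split non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:                              # sampled features were constant
        return sum(labels) / len(labels)
    _, f, t = best
    lo = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    hi = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow([r for r, _ in lo], [y for _, y in lo], depth - 1, n_feat, rng),
            grow([r for r, _ in hi], [y for _, y in hi], depth - 1, n_feat, rng))

def fit_forest(rows, labels, n_trees=30, depth=5, n_feat=2, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample with replacement
        forest.append(grow([rows[i] for i in idx], [labels[i] for i in idx],
                           depth, n_feat, rng))
    return forest

def predict(forest, row):
    """Average the leaf probabilities over all trees."""
    total = 0.0
    for node in forest:
        while isinstance(node, tuple):
            f, t, left, right = node
            node = left if row[f] <= t else right
        total += node
    return total / len(forest)

def synthetic_effect(forest, rows, feature, value):
    """Mean change in predicted probability when `feature` is set to `value` for all rows."""
    base = sum(predict(forest, r) for r in rows) / len(rows)
    treated = sum(predict(forest, r[:feature] + [value] + r[feature + 1:])
                  for r in rows) / len(rows)
    return treated - base

rng = random.Random(42)
rows, labels = make_data(1500, rng)
forest = fit_forest(rows, labels)
effect_edu = synthetic_effect(forest, rows, feature=0, value=4)   # everyone at top education level
effect_fem = synthetic_effect(forest, rows, feature=3, value=1)   # everyone coded as female
print("education effect:", round(effect_edu, 3))
print("gender effect:", round(effect_fem, 3))
```

Comparing such simulated effects across predictors is one way a ranking of income drivers could be produced. Note that a difference in forest predictions is a causal estimate only under strong assumptions, which is the gap the abstract says causal decision tree theory is meant to address.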
Keywords
econometrics, machine learning, decision trees, causality, big data, random forests, income, american community survey