Machine learning in applied econometrics: Deriving personal income drivers with randomized decision forests

dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en Ikonen, Henri 2016-06-14T06:01:03Z 2016-06-14T06:01:03Z 2016
dc.description.abstract In this paper I explore a modern field of research in applied econometrics: machine learning and the estimation of synthetic treatment effects. Data generation is currently on an exponential growth path: smartphones, social media and networks of interconnected devices are generating information at an unprecedented pace. The size, structure and velocity of these information streams vary greatly. The field of econometrics is also evolving: classic econometric models can lead to biased results with big data and will not scale up to modern data sets. I propose the well-performing Random Forests algorithm for use in econometrics. To adjust this method for causal analysis, recent theory on causal decision trees is explored. The proposed framework is then tested by estimating personal income drivers for the top 1% of the U.S. population. The data used is the American Community Survey 5-year sample, consisting of approximately 20 million rows. It appears that high income is in fact driven by four core factors: education, experience, working hours and gender. To rank these predictors, a synthetic treatment effect simulation is run. I find that investing in education after a master's degree has a significant positive effect on the likelihood of high income. Additionally, it appears that the negative gender income effect for females can be undone with a combination of work experience and exceptional work ethic. en
dc.format.extent 66
dc.language.iso en en
dc.title Machine learning in applied econometrics: Deriving personal income drivers with randomized decision forests en
dc.type G2 Pro gradu, diplomityö fi Kauppakorkeakoulu fi School of Business en
dc.contributor.department Taloustieteen laitos fi
dc.contributor.department Department of Economics en
dc.subject.keyword econometrics
dc.subject.keyword machine learning
dc.subject.keyword decision trees
dc.subject.keyword causality
dc.subject.keyword big data
dc.subject.keyword random forests
dc.subject.keyword income
dc.subject.keyword american community survey
dc.identifier.urn URN:NBN:fi:aalto-201609083405
dc.type.dcmitype text en
dc.programme.major Kansantaloustiede fi
dc.programme.major Economics en
dc.type.ontasot Pro gradu tutkielma fi
dc.type.ontasot Master's thesis en
dc.subject.helecon taloustieteet
dc.subject.helecon economic science
dc.subject.helecon ekonometria
dc.subject.helecon econometrics
dc.subject.helecon tietämyksenhallinta
dc.subject.helecon knowledge management
dc.subject.helecon oppiminen
dc.subject.helecon learning
dc.subject.helecon varallisuus
dc.subject.helecon wealth
dc.subject.helecon kehitys
dc.subject.helecon development
dc.subject.helecon Yhdysvallat
dc.subject.helecon United States
dc.ethesisid 14370 2016-04-15
dc.location P1 I
