Abstract:
Machine learning (ML) and deep learning methods have become commonplace and publicly available, while ML security to date struggles to keep pace with rising threats. One such threat is model extraction, in which adversaries reproduce a target model to near perfection. The attack is widely deployable, since the attacker only needs access to the model's predictions to carry it out. Stolen ML models could be used either for personal advantage, to abuse paid prediction services, or to create transferable adversarial examples that undermine the integrity of prediction services, i.e. their prediction quality. This is a significant threat in several application areas, such as autonomous driving, which rely heavily on computer vision via deep neural networks. In this thesis, we reproduce existing model extraction attacks and evaluate novel techniques for extracting deep neural network (DNN) classifiers. We introduce new synthetic query generation strategies and demonstrate their effectiveness in extracting models and in creating transferable targeted adversarial examples from the stolen DNNs.