Android Malware Detection

dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author Sayfullina, Luiza
dc.contributor.author Eirola, Emil
dc.contributor.author Komashinskiy, Dmitri
dc.contributor.author Palumbo, Paolo
dc.contributor.author Karhunen, Juha
dc.date.accessioned 2016-12-16T14:26:04Z
dc.date.issued 2017
dc.identifier.citation Sayfullina, L, Eirola, E, Komashinskiy, D, Palumbo, P & Karhunen, J 2017, Android Malware Detection: Building Useful Representations. in 2016 15th IEEE International Conference on Machine Learning and Applications, ICMLA 2016, Proceedings: Anaheim, California, USA, December 18-20, 2016. IEEE, pp. 201-206, IEEE International Conference on Machine Learning and Applications, Anaheim, United States, 18/12/2016. https://doi.org/10.1109/ICMLA.2016.0041 en
dc.identifier.other PURE UUID: 8536242a-592b-4e16-a3f0-0392edb10f09
dc.identifier.other PURE ITEMURL: https://research.aalto.fi/en/publications/android-malfare-detection(8536242a-592b-4e16-a3f0-0392edb10f09).html
dc.identifier.other PURE FILEURL: https://research.aalto.fi/files/9397294/ICMLA_2016_final.pdf
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/23809
dc.description.abstract The problem of proactively detecting Android malware has proven to be a challenging one. The challenges stem from a variety of issues, but recent literature has shown that the task is hard to solve with high accuracy when only a restricted set of features, such as permissions or a similar fixed set, is used. The opposite approach of including all available features is also problematic, as it causes the feature space to grow beyond a reasonable size. In this paper we focus on finding an efficient way to select a representative feature space that preserves its discriminative power on unseen data. We go beyond traditional approaches like Principal Component Analysis, which is too heavy for large-scale problems with millions of features. In particular, we show that many feature groups that can be extracted from Android application packages, such as features extracted from the manifest file or strings extracted from the Dalvik Executable (DEX), should be filtered and used in classification separately. Our proposed dimensionality reduction scheme is applied to each group separately and consists of raw string preprocessing, feature selection via log-odds, and finally random projections. With the size of the feature space growing exponentially as a function of the training set's size, our approach decreases the size of the feature space by several orders of magnitude, which in turn makes accurate classification possible in a real-world scenario. After reducing the dimensionality, we use the feature groups in a lightweight ensemble of logistic classifiers. We evaluated the proposed classification scheme on real malware data provided by the antivirus vendor and achieved a state-of-the-art 88.24% true positive rate and a reasonably low 0.04% false positive rate with a significantly compressed feature space on a balanced test set of 10,000 samples. en
dc.format.mimetype application/pdf
dc.language.iso en en
dc.relation.ispartof IEEE International Conference on Machine Learning and Applications en
dc.relation.ispartofseries Proc. of The IEEE 15th Int. Conf. on Machine Learning and Applications (ICMLA 2016) en
dc.rights openAccess en
dc.subject.other Artificial Intelligence en
dc.subject.other Computer Networks and Communications en
dc.subject.other Computer Science Applications en
dc.subject.other 113 Computer and information sciences en
dc.title Android Malware Detection en
dc.type A4 Article in conference proceedings en
dc.description.version Peer reviewed en
dc.contributor.department Department of Computer Science
dc.contributor.department Arcada University of Applied Sciences
dc.contributor.department F-Secure Corp.
dc.subject.keyword Android
dc.subject.keyword Dimensionality reduction
dc.subject.keyword Feature selection
dc.subject.keyword Logistic regression
dc.subject.keyword Malware classification
dc.subject.keyword Random projection
dc.subject.keyword Artificial Intelligence
dc.subject.keyword Computer Networks and Communications
dc.subject.keyword Computer Science Applications
dc.subject.keyword 113 Computer and information sciences
dc.identifier.urn URN:NBN:fi:aalto-201612165986
dc.identifier.doi 10.1109/ICMLA.2016.0041
dc.type.version acceptedVersion
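
The abstract describes a per-group reduction scheme: feature selection via log-odds followed by random projections. The sketch below illustrates that idea on toy data and is not the authors' implementation. The function names, the add-one smoothing, the threshold on the absolute log-odds difference, and the random sign projection matrix are all assumptions made for illustration; the record does not specify any of them. String preprocessing and the ensemble of logistic classifiers are omitted.

```python
import math
import random


def log_odds_scores(samples, labels, smoothing=1.0):
    """Score each binary feature by the difference between its log-odds of
    occurring in malware (label 1) and in clean (label 0) samples.
    Add-one style smoothing (an assumption) keeps probabilities off 0 and 1."""
    n_mal = sum(labels)
    n_clean = len(labels) - n_mal
    mal_counts, clean_counts = {}, {}
    for feats, y in zip(samples, labels):
        counts = mal_counts if y == 1 else clean_counts
        for f in feats:
            counts[f] = counts.get(f, 0) + 1
    scores = {}
    for f in set(mal_counts) | set(clean_counts):
        p_mal = (mal_counts.get(f, 0) + smoothing) / (n_mal + 2 * smoothing)
        p_clean = (clean_counts.get(f, 0) + smoothing) / (n_clean + 2 * smoothing)
        scores[f] = (math.log(p_mal / (1 - p_mal))
                     - math.log(p_clean / (1 - p_clean)))
    return scores


def select_features(scores, threshold):
    """Keep features whose absolute score reaches the (hypothetical) threshold."""
    return sorted(f for f, s in scores.items() if abs(s) >= threshold)


def random_projection(samples, vocab, dim, seed=0):
    """Project binary feature sets onto `dim` dimensions using a random
    +/- 1/sqrt(dim) matrix with one row per selected feature."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    rows = {f: [rng.choice((-scale, scale)) for _ in range(dim)] for f in vocab}
    projected = []
    for feats in samples:
        vec = [0.0] * dim
        for f in feats:
            row = rows.get(f)
            if row is not None:  # features dropped by selection are ignored
                for j in range(dim):
                    vec[j] += row[j]
        projected.append(vec)
    return projected


# Toy feature group (for example, manifest permissions); data is made up.
samples = [
    {"SEND_SMS", "a.b"},
    {"SEND_SMS", "INTERNET"},
    {"INTERNET", "x"},
    {"INTERNET"},
]
labels = [1, 1, 0, 0]

scores = log_odds_scores(samples, labels)
kept = select_features(scores, threshold=1.0)
reduced = random_projection(samples, kept, dim=3)
```

In the scheme the abstract outlines, a reduction of this kind would be applied to each feature group separately, and each reduced group would then feed its own logistic classifier in the ensemble.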


Files in this item

Files Size Format View

There are no open access files associated with this item.
