Improving Data Generalization with Variational Autoencoders for Network Traffic Anomaly Detection

dc.contributor | Aalto-yliopisto | fi
dc.contributor | Aalto University | en
dc.contributor.author | Monshizadeh, Mehrnoosh | en_US
dc.contributor.author | Khatri, Vikramajeet | en_US
dc.contributor.author | Gamdou, Marah | en_US
dc.contributor.author | Kantola, Raimo | en_US
dc.contributor.author | Yan, Zheng | en_US
dc.contributor.department | Department of Communications and Networking | en
dc.contributor.groupauthor | Network Security and Trust | en
dc.contributor.organization | Lucent | en_US
dc.contributor.organization | Université Paris-Saclay | en_US
dc.date.accessioned | 2021-04-28T06:28:22Z
dc.date.available | 2021-04-28T06:28:22Z
dc.date.issued | 2021 | en_US
dc.description | Publisher Copyright: CCBY Copyright: Copyright 2021 Elsevier B.V., All rights reserved.
dc.description.abstract | Deep generative models have become increasingly popular in domains such as image processing, yet they rarely appear in cybersecurity. While the main application of these models is dimensionality reduction, they have only marginally been used to overcome challenges such as data generalization and the overfitting issues inherited from feature selection methods. To address these challenges, we propose a combined architecture comprising a Conditional Variational AutoEncoder (CVAE) and a Random Forest (RF) classifier that automatically learns similarity among input features, models the data distribution in order to extract discriminative features from the original features, and finally classifies various types of attacks. The CVAE introduces the labels of traffic packets into the latent space in order to better learn the variations among input samples and distinguish the data characteristics of each class. This avoids confusion between classes while learning the whole data distribution. Compared, across various evaluation metrics, with feature selection mechanisms such as Support Vector Machine Online (SVMo), the proposed architecture demonstrates considerable improvement in performance. To verify the versatility of the proposed architecture, two publicly available datasets are used in the experiments. | en
dc.description.version | Peer reviewed | en
dc.format.extent | 2169-3536
dc.format.mimetype | application/pdf | en_US
dc.identifier.citation | Monshizadeh, M, Khatri, V, Gamdou, M, Kantola, R & Yan, Z 2021, 'Improving Data Generalization with Variational Autoencoders for Network Traffic Anomaly Detection', IEEE Access, vol. 9, 9399440, pp. 56893-56907. https://doi.org/10.1109/ACCESS.2021.3072126 | en
dc.identifier.doi | 10.1109/ACCESS.2021.3072126 | en_US
dc.identifier.issn | 2169-3536
dc.identifier.other | PURE UUID: 2ab0c112-006d-4d72-a8e7-277919bb74c6 | en_US
dc.identifier.other | PURE ITEMURL: https://research.aalto.fi/en/publications/2ab0c112-006d-4d72-a8e7-277919bb74c6 | en_US
dc.identifier.other | PURE LINK: http://www.scopus.com/inward/record.url?scp=85104182619&partnerID=8YFLogxK | en_US
dc.identifier.other | PURE FILEURL: https://research.aalto.fi/files/62232669/Monshizadeh_Improving_data_generalization_ieeeaccess.pdf | en_US
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/107098
dc.identifier.urn | URN:NBN:fi:aalto-202104286382
dc.language.iso | en | en
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
dc.relation.ispartofseries | IEEE Access | en
dc.relation.ispartofseries | Volume 9 | en
dc.rights | openAccess | en
dc.subject.keyword | Anomaly Detection | en_US
dc.subject.keyword | Anomaly detection | en_US
dc.subject.keyword | Classification algorithms | en_US
dc.subject.keyword | Data Mining | en_US
dc.subject.keyword | Feature extraction | en_US
dc.subject.keyword | Feature Selection | en_US
dc.subject.keyword | Machine Learning | en_US
dc.subject.keyword | Measurement | en_US
dc.subject.keyword | Random forests | en_US
dc.subject.keyword | Security | en_US
dc.subject.keyword | Telecommunication traffic | en_US
dc.subject.keyword | Vegetation | en_US
dc.title | Improving Data Generalization with Variational Autoencoders for Network Traffic Anomaly Detection | en
dc.type | A1 Original article in a scientific journal (A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä) | fi
dc.type.version | publishedVersion