All You Need Is "Love": Evading Hate Speech Detection

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Gröndahl, Tommi [en_US]
dc.contributor.author: Pajola, Luca [en_US]
dc.contributor.author: Juuti, Mika [en_US]
dc.contributor.author: Conti, Mauro [en_US]
dc.contributor.author: Asokan, N. [en_US]
dc.contributor.department: Department of Computer Science [en]
dc.contributor.department: School services,SCI [en]
dc.contributor.groupauthor: Adj. Prof Asokan N. group [en]
dc.contributor.groupauthor: Helsinki Institute for Information Technology (HIIT) [en]
dc.contributor.organization: Department of Computer Science [en_US]
dc.contributor.organization: University of Padua [en_US]
dc.date.accessioned: 2019-01-14T09:19:24Z
dc.date.available: 2019-01-14T09:19:24Z
dc.date.issued: 2018 [en_US]
dc.description: | openaire: EC/H2020/688061/EU//TagItSmart
dc.description.abstract: With the spread of social networks and their unfortunate use for hate speech, automatic detection of the latter has become a pressing problem. In this paper, we reproduce seven state-of-the-art hate speech detection models from prior work, and show that they perform well only when tested on the same type of data they were trained on. Based on these results, we argue that for successful hate speech detection, model architecture is less important than the type of data and labeling criteria. We further show that all proposed detection techniques are brittle against adversaries who can (automatically) insert typos, change word boundaries or add innocuous words to the original hate speech. A combination of these methods is also effective against Google Perspective - a cutting-edge solution from industry. Our experiments demonstrate that adversarial training does not completely mitigate the attacks, and using character-level features makes the models systematically more attack-resistant than using word-level features. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 10
dc.format.extent: 2-12
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Gröndahl, T, Pajola, L, Juuti, M, Conti, M & Asokan, N 2018, All You Need Is "Love": Evading Hate Speech Detection. In Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security. ACM, New York, pp. 2-12, ACM Workshop on Artificial Intelligence and Security, Toronto, Canada, 19/10/2018. https://doi.org/10.1145/3270101.3270103 [en]
dc.identifier.doi: 10.1145/3270101.3270103 [en_US]
dc.identifier.isbn: 978-1-4503-6004-3
dc.identifier.other: PURE UUID: 1dfcbc7a-fa08-4633-9ffd-fb1c30dd844e [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/1dfcbc7a-fa08-4633-9ffd-fb1c30dd844e [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/31027356/SCI_Gr_ndahl_Pajola_et.al._All_You_Need_is_Love.1808.09115_1.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/35918
dc.identifier.urn: URN:NBN:fi:aalto-201901141101
dc.language.iso: en [en]
dc.relation: info:eu-repo/grantAgreement/EC/H2020/688061/EU//TagItSmart [en_US]
dc.relation.ispartofseries: Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security [en]
dc.rights: openAccess [en]
dc.title: All You Need Is "Love": Evading Hate Speech Detection [en]
dc.type: Conference article in proceedings [fi]
dc.type.version: acceptedVersion