Detecting obfuscated scripts with machine learning techniques

Loading...
Thumbnail Image

URL

Journal Title

Journal ISSN

Volume Title

Perustieteiden korkeakoulu | Master's thesis

Department

Mcode

T3011

Language

en

Pages

50+5

Series

Abstract

Complex operating system administration tasks can be automated and simplified by using scripting languages. For the Windows operating system, one of the most commonly used scripting languages is PowerShell. The PowerShell scripting language provides vast functionality for the system administrators. At the same time, it leaves a large attack surface for adversaries to bypass the OS protections. Signature and supervised machine learning based intrusion detection systems (IDS) can be used for monitoring and detecting such malicious scripts. However, the detection can be evaded by obfuscating the scripts. As the next step in the defense, we can use obfuscation itself as a reliable sign of malicious code. This thesis investigates the methods of detecting obfuscated PowerShell scripts with machine learning (ML) techniques. We trained the logistic regression, random forest and gradient boosting models on a balanced dataset. To generate the dataset, unobfuscated scripts were taken from open-source projects and they were obfuscated by open-source obfuscators. We then selected the most important independent features for obfuscation detection. The ML methods were compared using their ROC curves and AUC values. The best method turns out to be the gradient boosting model, which has the AUC close to one for the used dataset. Moreover, the model can classify a script faster than in one millisecond. Thus, the model can replace existing approaches to obfuscation detection, and it can be used by antivirus vendors in the process of detecting malicious PowerShell scripts.

Description

Supervisor

Aura, Tuomas

Thesis advisor

Aura, Tuomas
Kraemer, Frank Alexander

Other note

Citation