KARTAL: Web Application Vulnerability Hunting Using Large Language Models

School of Science | Master's thesis

Date

2023-08-21

Major/Subject

Security and Cloud Computing

Mcode

SCI3113

Degree programme

Master’s Programme in Security and Cloud Computing (SECCLO)

Language

en

Pages

85+8

Abstract

Broken Access Control is the most serious web application security risk according to the Open Worldwide Application Security Project (OWASP). This category includes highly complex vulnerabilities such as Broken Object Level Authorization (BOLA) and Exposure of Sensitive Information. Finding such critical vulnerabilities in large software systems requires intelligent and automated tools. State-of-the-art (SOTA) research, including hybrid application security testing tools, algorithmic bruteforcers, and artificial intelligence, has shown great promise in detection. Nevertheless, there is a research gap in reliably identifying logical and context-dependent Broken Access Control vulnerabilities. We propose KARTAL, a novel method for web application vulnerability detection using a Large Language Model (LLM). It consists of three components: Fuzzer, Prompter, and Detector. The Fuzzer methodically collects application behaviour. The Prompter processes the data from the Fuzzer and formulates a prompt. The Detector uses an LLM that we have fine-tuned for detecting vulnerabilities. In the study, we investigate the performance, key factors, and limitations of the proposed method. We experiment with fine-tuning three types of decoder-only pre-trained transformers for detecting two sophisticated vulnerabilities. Our best model attained an accuracy of 87.19%, with an F1 score of 0.82. Using hardware acceleration on a consumer-grade laptop, our fastest model can make up to 539 predictions per second. The experiments on varying the training sample size demonstrated the strong learning capability of our model: every 400 samples added to training resulted in an average improvement of 19.58% in the Matthews correlation coefficient (MCC). Furthermore, the dynamic properties of KARTAL enable inference-time adaptation to the application domain, resulting in reduced false positives.
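
The abstract reports accuracy, F1, and MCC for a binary vulnerable/not-vulnerable decision. As a minimal sketch of how these three metrics relate to a confusion matrix, the following uses the standard textbook definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the thesis or its evaluation data.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, F1, and MCC from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0

    # F1 is the harmonic mean of precision and recall,
    # which simplifies to 2*TP / (2*TP + FP + FN).
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0

    # MCC uses all four cells, so it stays informative on imbalanced data.
    # By convention it is set to 0 when any marginal sum is zero.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0

    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical counts for illustration only (not results from the thesis).
example = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print(example)
```

MCC ranges from -1 to 1, whereas accuracy and F1 range from 0 to 1, which is one reason the three figures in the abstract are not directly comparable to one another.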

Description

Supervisor

Ylä-Jääski, Antti

Thesis advisor

Mörsky, Marika
Sung, Ki Won

Keywords

vulnerability detection, large language models, web applications, application security, AI, broken access control
