Domain adapting LLMs for cybersecurity awareness

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.advisor: Papadimitratos, Panagiotis
dc.contributor.advisor: Hussain, Ahmed
dc.contributor.author: Salahuddin, Salahuddin
dc.contributor.school: Perustieteiden korkeakoulu [fi]
dc.contributor.school: School of Science [en]
dc.contributor.supervisor: Hellas, Arto
dc.date.accessioned: 2025-10-20T17:01:38Z
dc.date.available: 2025-10-20T17:01:38Z
dc.date.issued: 2025-09-26
dc.description.abstract: While Large Language Models (LLMs) have shown exceptional performance on natural language tasks, they struggle with domain-specialized queries. This thesis investigates the effectiveness of Domain-Adaptive Continuous Pretraining (DAP) for enhancing the cybersecurity awareness of three open-source pretrained LLMs (Llama-3.1-8B, DeepSeek-Distill-Qwen-14B, and Llama-3.3-70B) on a relatively small domain-specific corpus (1M, 50M, 118.8M). The adapted models are evaluated against their base counterparts and the cybersecurity LLM baseline, Llama-Primus-Base (8B parameters, 2.77B tokens). Across three benchmarks (CTI-MCQ, CyberMetric, and SecEval), the DAP models outperformed base models and Llama-Primus-Base, with the 70B model demonstrating better results than the open-source baseline models. These results indicate that DAP can enhance LLMs' cybersecurity understanding with a small dataset and no Supervised Fine-Tuning (SFT) or Reinforcement Learning from Human Feedback (RLHF). [en]
dc.format.extent: 52
dc.format.mimetype: application/pdf [en]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/140103
dc.identifier.urn: URN:NBN:fi:aalto-202510208272
dc.language.iso: en [en]
dc.programme: Master's Programme in Security and Cloud Computing [en]
dc.programme.major: Security and Cloud Computing [en]
dc.subject.keyword: generative AI [en]
dc.subject.keyword: cybersecurity [en]
dc.subject.keyword: large language models [en]
dc.subject.keyword: domain adaptive continuous pretraining [en]
dc.subject.keyword: threat intelligence [en]
dc.subject.keyword: foundation models [en]
dc.title: Domain adapting LLMs for cybersecurity awareness [en]
dc.type: G2 Pro gradu, diplomityö [fi]
dc.type.ontasot: Master's thesis [en]
dc.type.ontasot: Diplomityö [fi]
local.aalto.electroniconly: yes
local.aalto.openaccess: no
