Towards Explainability for Language Models in Security Testing

dc.contributor.author: Hadfield, Cameron
dc.date.accessioned: 2026-05-07T18:49:43Z
dc.date.available: 2026-05-07T18:49:43Z
dc.date.issued: 2026-05-07
dc.date.submitted: 2026-05-05
dc.description.abstract: Modern generative Language Models (LMs) present as black boxes, demanding significant trust in their capabilities while making the reasoning behind their decisions difficult to understand. As LMs are increasingly used for code and test-case generation, testers must trust their outputs without knowing what drives them. To improve accuracy, modern LMs draw on supplementary documentation, supplied through Retrieval-Augmented Generation (RAG) or provided directly in their prompts, to enhance their background knowledge. When testers reuse LM-generated test cases for other purposes, such as fuzz testing, quality matters even more, since seed cases can significantly affect fuzzer coverage. We adapt existing methods to build an analysis pipeline that explains document retrieval when the LM relies on documentation to generate test cases, requiring only black-box access to the LMs under test. We use RFC 959 (the File Transfer Protocol, FTP) and two synthetic protocols to isolate the LM's reliance on data in its RAG system. Statistical analysis shows that the explanations our pipeline produces capture real phenomena rather than random noise. To aid integration with automated security testing, we present a formal definition of protocol communication; this formalism maps our pipeline's features to the protocol domain and lays a foundation for future work with fuzzers. The explanations our pipeline generates are plausible, with some unexpected outputs, suggesting that tuning is needed to improve them.
dc.identifier.uri: https://hdl.handle.net/10012/23262
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.subject: explainability
dc.subject: cybersecurity
dc.subject: embedded systems
dc.subject: large language models
dc.subject: security testing
dc.title: Towards Explainability for Language Models in Security Testing
dc.type: Master Thesis
uws-etd.degree: Master of Applied Science
uws-etd.degree.department: Electrical and Computer Engineering
uws-etd.degree.discipline: Electrical and Computer Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.embargo.terms: 1 year
uws.contributor.advisor: Fischmeister, Sebastian
uws.contributor.affiliation1: Faculty of Engineering
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text

Files

Original bundle

Name: Cameron_Hadfield.pdf
Size: 5.13 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission