Towards Explainability for Language Models in Security Testing
| dc.contributor.author | Hadfield, Cameron | |
| dc.date.accessioned | 2026-05-07T18:49:43Z | |
| dc.date.available | 2026-05-07T18:49:43Z | |
| dc.date.issued | 2026-05-07 | |
| dc.date.submitted | 2026-05-05 | |
| dc.description.abstract | Modern generative Language Models (LMs) present as black boxes, requiring significant trust in their capabilities and making it difficult to understand the reasoning behind their decisions. As these LMs are increasingly used for code and test-case generation, testers must trust them without knowing what drives the model's outputs. To improve accuracy, modern LMs rely on supplementary documentation, retrieved via Retrieval-Augmented Generation (RAG) or provided directly in their prompts, to enhance background knowledge. When testers use LM-generated test cases for other purposes, such as fuzz testing, they must place greater trust in their quality, as seed cases can significantly affect fuzzer coverage performance. We adapt existing methods to build an analysis pipeline that explains document retrieval when the LM relies on documentation to generate test cases. We achieve this with only black-box access to the LMs under test. We use RFC 959 (the File Transfer Protocol, FTP) and two synthetic protocols to isolate the LM's reliance on data in its RAG system. Statistical analysis shows that the explanations from our pipeline capture real phenomena rather than random data. To aid integration with automated security testing, we present a formal definition of protocol communication. This formalism helps map our pipeline's features to the protocol domain and lays a foundation for future work with fuzzers. The explanations our pipeline generates yield plausible results, with some unexpected outputs, suggesting the need for tuning to improve explanations. | |
| dc.identifier.uri | https://hdl.handle.net/10012/23262 | |
| dc.language.iso | en | |
| dc.pending | false | |
| dc.publisher | University of Waterloo | en |
| dc.subject | explainability | |
| dc.subject | cybersecurity | |
| dc.subject | embedded systems | |
| dc.subject | large language models | |
| dc.subject | security testing | |
| dc.title | Towards Explainability for Language Models in Security Testing | |
| dc.type | Master Thesis | |
| uws-etd.degree | Master of Applied Science | |
| uws-etd.degree.department | Electrical and Computer Engineering | |
| uws-etd.degree.discipline | Electrical and Computer Engineering | |
| uws-etd.degree.grantor | University of Waterloo | en |
| uws-etd.embargo.terms | 1 year | |
| uws.contributor.advisor | Fischmeister, Sebastian | |
| uws.contributor.affiliation1 | Faculty of Engineering | |
| uws.peerReviewStatus | Unreviewed | en |
| uws.published.city | Waterloo | en |
| uws.published.country | Canada | en |
| uws.published.province | Ontario | en |
| uws.scholarLevel | Graduate | en |
| uws.typeOfResource | Text | en |