Visual Segmentation with Explanations for Medical Decision Making: A CNN Architecture with a Transformer-Attention Extension

Khalaf, Mahmoud

dc.contributor.author	Khalaf, Mahmoud
dc.date.accessioned	2026-05-19T17:46:05Z
dc.date.available	2026-05-19T17:46:05Z
dc.date.issued	2026-05-19
dc.date.submitted	2026-05-14
dc.description.abstract	Medical image segmentation is essential for helping medical professionals locate anomalies in images. However, the lack of explainability in current segmentation frameworks limits clinical trust and adoption. This thesis presents two complementary frameworks for explainable medical image segmentation, demonstrating that accuracy and interpretability can be achieved together through architectural design. The first framework introduces an attention-based CNN architecture that generates model-specific visual explanations through spatial attention gates integrated directly into the network. The proposed model achieves a Dice score of 0.8621 on the Kvasir-SEG polyp segmentation dataset, outperforming all evaluated explainable models, while producing attention heatmaps that faithfully reflect the model's decision-making process. The second framework extends this with a dual-encoder architecture that combines a pretrained ResNet-34 CNN encoder with a pretrained Swin Transformer encoder, fused at multiple scales through a learned directional attention-gated mechanism. The dual encoder achieves a Dice score of 0.907 on Kvasir-SEG, outperforming all single-encoder baselines, while generating richer multi-scale visual explanations that reflect the complementary contributions of both encoders. Finally, this thesis outlines in detail a pathway toward a fully multimodal explainability system that integrates textual explanations, produced with SigLIP, Retrieval-Augmented Generation, and a Large Language Model, alongside the visual heatmaps. We also explore future directions in breast cancer and echocardiogram segmentation, specifically applications to ejection fraction computation, and discuss insights gained from the trial and error that led to the designs presented in this thesis.
dc.identifier.uri	https://hdl.handle.net/10012/23338
dc.language.iso	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.relation.uri	https://github.com/MaudDK/MedSeg-XAI-AGFusion
dc.relation.uri	https://datasets.simula.no/kvasir-seg/
dc.subject	Medical image segmentation
dc.subject	Explainable AI
dc.subject	Convolutional Neural Networks
dc.subject	Vision Transformers
dc.subject	Deep Learning
dc.title	Visual Segmentation with Explanations for Medical Decision Making: A CNN Architecture with a Transformer-Attention Extension
dc.type	Master Thesis
uws-etd.degree	Master of Mathematics
uws-etd.degree.department	David R. Cheriton School of Computer Science
uws-etd.degree.discipline	Computer Science
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	0
uws.contributor.advisor	Cohen, Robin
uws.contributor.advisor	Bentahar, Jamal
uws.contributor.affiliation1	Faculty of Mathematics
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en
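
The abstract reports results as Dice scores (0.8621 and 0.907 on Kvasir-SEG). For readers unfamiliar with the metric, the following is a minimal sketch of the standard Dice coefficient, 2|A ∩ B| / (|A| + |B|), for two binary masks. The function name `dice_score`, the nested-list mask format, and the smoothing term `eps` are illustrative assumptions; the record does not state how the thesis thresholds predictions or averages scores over images.

```python
from typing import Sequence


def dice_score(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    eps: float = 1e-7,
) -> float:
    """Dice coefficient between two binary masks given as nested 0/1 lists.

    `eps` is a hypothetical smoothing term (not taken from the thesis) that
    avoids division by zero when both masks are empty.
    """
    intersection = 0
    pred_sum = 0
    target_sum = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += p * t  # counts pixels set in both masks
            pred_sum += p          # foreground pixels in the prediction
            target_sum += t        # foreground pixels in the ground truth
    return (2.0 * intersection + eps) / (pred_sum + target_sum + eps)


# Toy example: two 2x3 masks that overlap on two of their three pixels each.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
score = dice_score(pred, target)
print(f"Dice: {score:.4f}")
```

A score of 1 means the predicted and reference masks coincide exactly, and a score near 0 means they barely overlap.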

Files

Original bundle

Name: Khalaf_Mahmoud.pdf
Size: 12.16 MB
Format: Adobe Portable Document Format

Collections

Theses
Computer Science