Visual Segmentation with Explanations for Medical Decision Making: A CNN Architecture with a Transformer-Attention Extension

dc.contributor.author: Khalaf, Mahmoud
dc.date.accessioned: 2026-05-19T17:46:05Z
dc.date.available: 2026-05-19T17:46:05Z
dc.date.issued: 2026-05-19
dc.date.submitted: 2026-05-14
dc.description.abstract: Medical image segmentation is essential for helping medical professionals locate anomalies in images. However, the lack of explainability in current segmentation frameworks limits clinical trust and adoption. This thesis presents two complementary frameworks for explainable medical image segmentation, demonstrating that accuracy and interpretability can be achieved together through architectural design. The first framework introduces an attention-based CNN architecture that generates model-specific visual explanations through spatial attention gates integrated directly into the network. The proposed model achieves a Dice score of 0.8621 on the Kvasir-SEG polyp segmentation dataset, outperforming all evaluated explainable models, while producing attention heatmaps that faithfully reflect the model's decision-making process. The second framework extends the first with a dual-encoder architecture that combines a pretrained ResNet-34 CNN encoder with a pretrained Swin Transformer encoder, fused at multiple scales through a learned directional attention-gated mechanism. The dual encoder achieves a Dice score of 0.907 on Kvasir-SEG, outperforming all single-encoder baselines, while generating richer multi-scale visual explanations that reflect the complementary contributions of both encoders. Finally, this thesis outlines in detail a pathway towards a fully multimodal explainability system that integrates textual explanations, produced through SigLIP, Retrieval-Augmented Generation, and a Large Language Model, alongside the visual heatmaps. It also explores future directions in breast cancer and echocardiogram segmentation, specifically applications to ejection fraction computation, and reports insights gained from the trial and error that led to the designs presented in the thesis.
dc.identifier.uri: https://hdl.handle.net/10012/23338
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo [en]
dc.relation.uri: https://github.com/MaudDK/MedSeg-XAI-AGFusion
dc.relation.uri: https://datasets.simula.no/kvasir-seg/
dc.subject: Medical image segmentation
dc.subject: Explainable AI
dc.subject: Convolutional Neural Networks
dc.subject: Vision Transformers
dc.subject: Deep Learning
dc.title: Visual Segmentation with Explanations for Medical Decision Making: A CNN Architecture with a Transformer-Attention Extension
dc.type: Master Thesis
uws-etd.degree: Master of Mathematics
uws-etd.degree.department: David R. Cheriton School of Computer Science
uws-etd.degree.discipline: Computer Science
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.embargo.terms: 0
uws.contributor.advisor: Cohen, Robin
uws.contributor.advisor: Bentahar, Jamal
uws.contributor.affiliation1: Faculty of Mathematics
uws.peerReviewStatus: Unreviewed [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]

Files

Original bundle

Name:
Khalaf_Mahmoud.pdf
Size:
12.16 MB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
6.4 KB
Format:
Item-specific license agreed upon to submission