The Interplay of Information Theory and Deep Learning: Frameworks to Improve Deep Learning Efficiency and Accuracy

dc.contributor.author: Mohajer Hamidi, Shayan
dc.date.accessioned: 2025-01-23T16:31:10Z
dc.date.available: 2025-01-23T16:31:10Z
dc.date.issued: 2025-01-23
dc.date.submitted: 2025-01-21
dc.description.abstract: The intersection of information theory (IT) and machine learning (ML) represents a promising, yet relatively under-explored, frontier with significant potential for innovation. Despite the clear benefits of combining these fields, progress has been limited by two main challenges: (i) the highly specialized nature of IT and ML, which creates a barrier to cross-disciplinary expertise, and (ii) the computational complexity involved in applying information-theoretic concepts to large-scale ML problems. This dissertation seeks to overcome these challenges and explore the rich possibilities at the intersection of IT and ML. By leveraging powerful tools and concepts from IT, we aim to uncover novel insights and develop innovative ML algorithms. Given that deep neural networks (DNNs) form the backbone of modern ML models, integrating IT principles into ML requires a focus on optimizing the training and performance of DNNs using information-theoretic frameworks. While DNNs have a broad range of applications, this thesis narrows its focus to two key areas, classification and generative DNNs, with the objective of harnessing IT principles to enhance the performance of these models.

• Classification DNNs. For classification DNNs, this dissertation targets improvements in three critical areas.

(i) Improving classification accuracy. The performance of classification DNNs is traditionally measured by classification accuracy, but we argue that conventional error metrics are insufficient for capturing a model's true performance. By introducing the concepts of conditional mutual information (CMI) and normalized conditional mutual information (NCMI), we propose a new metric for evaluating DNNs: CMI measures intra-class concentration, while the ratio of CMI to NCMI reflects inter-class separation. We then modify the standard loss function of the deep learning (DL) framework to minimize the standard cross-entropy function subject to an NCMI constraint, yielding CMI-constrained deep learning (CMIC-DL). Extensive experimental results show that DNNs trained within CMIC-DL achieve higher classification accuracy than state-of-the-art models trained within the standard DL framework or with other loss functions in the literature (a hedged sketch of such a constrained loss appears after the abstract).

(ii) Enhancing distributed learning accuracy. In the context of distributed learning, particularly federated learning (FL), we tackle the challenge of class imbalance using information-theoretic concepts to improve the accuracy of the shared global model. To this end, we introduce new information-theoretic quantities into FL and propose a modified loss function based on these principles. This leads to the development of a federated learning framework, Fed-IT, which enhances the classification accuracy of models trained in distributed environments (an illustrative round is sketched after the abstract).

(iii) Reducing model size and training/inference complexity. We introduce coded deep learning (CDL), a novel framework aimed at reducing the computational and storage complexity of classification DNNs. CDL achieves this by compressing model weights and activations through probabilistic quantization. Both forward and backward passes during training are performed using quantized weights and activations, significantly reducing floating-point operations and computational overhead. Furthermore, CDL imposes entropy constraints on weights and activations, ensuring compressibility at every stage of training, which also reduces communication costs in parallel computing environments.
This leads to models that are more efficient in both training and inference, with lower storage and computational requirements (the generic building blocks of such a quantizer are sketched after the abstract).

• Generative DNNs. For generative DNNs, this dissertation focuses on diffusion models and their application to solving inverse problems. Inverse problems are common in fields such as medical imaging, signal processing, and physics, where the goal is to recover an underlying cause from corrupted or incomplete observations. These problems are often ill-posed, admitting multiple possible solutions or exhibiting high sensitivity to small changes in the data. In this dissertation, we enhance the performance of diffusion models by incorporating probabilistic principles, making them more effective at capturing the posterior distribution of the underlying causes in inverse problems. This approach improves the models' ability to accurately reconstruct signals and yields more reliable solutions in challenging inverse-problem scenarios (one common guidance step is sketched after the abstract).

Overall, this dissertation demonstrates the powerful synergy between IT and ML, showcasing novel methods that improve the accuracy and efficiency of both classification and generative DNNs. By addressing key challenges in training and optimization, this work lays the foundation for future research at the intersection of these two fields.
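To make the CMIC-DL objective concrete, the following is a minimal PyTorch sketch of one way to penalize cross entropy with an NCMI-style ratio of intra-class concentration to inter-class separation. The estimators below and the multiplier `lam` are illustrative assumptions, not the dissertation's exact formulation.

```python
# Minimal sketch of a CMI-constrained loss in the spirit of CMIC-DL.
# The concentration/separation estimators are illustrative stand-ins.
import torch
import torch.nn.functional as F

def cmic_loss(logits, labels, lam=0.1, eps=1e-8):
    """Cross entropy plus a hedged NCMI-style penalty (assumed form)."""
    ce = F.cross_entropy(logits, labels)
    probs = F.softmax(logits, dim=1)              # P(Yhat | X) per sample

    # Class-conditional mean output distributions for labels in the batch.
    classes = labels.unique()
    means = torch.stack([probs[labels == c].mean(dim=0) for c in classes])

    # Intra-class concentration: average KL(P(Yhat|X) || class mean), a
    # batch estimate of the conditional mutual information I(X; Yhat | Y).
    cmi = 0.0
    for i, c in enumerate(classes):
        p = probs[labels == c]
        kl = (p * (torch.log(p + eps) - torch.log(means[i] + eps))).sum(dim=1)
        cmi = cmi + kl.mean()
    cmi = cmi / len(classes)

    # Inter-class separation: average pairwise distance between class means.
    n_pairs = max(len(classes) * (len(classes) - 1), 1)
    sep = torch.cdist(means, means, p=2).sum() / n_pairs

    return ce + lam * cmi / (sep + eps)
```

In this reading, the NCMI constraint is folded into the objective as a Lagrangian penalty, so standard SGD training applies unchanged.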
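The abstract does not spell out Fed-IT's information-theoretic quantities, so the sketch below shows only where a modified, imbalance-aware client loss would plug into a standard FedAvg round; `fed_it_loss` is a hypothetical placeholder, not the dissertation's actual loss.

```python
# Hypothetical sketch of one federated round with a modified client
# objective; `fed_it_loss` is a placeholder for an imbalance-aware,
# information-theoretic loss.  Assumes float-valued model parameters.
import copy
import torch

def federated_round(global_model, clients, fed_it_loss, lr=0.01, local_epochs=1):
    states, weights = [], []
    for loader in clients:                     # each client holds a DataLoader
        model = copy.deepcopy(global_model)
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(local_epochs):
            for x, y in loader:
                opt.zero_grad()
                fed_it_loss(model(x), y).backward()   # modified client loss
                opt.step()
        states.append(model.state_dict())
        weights.append(len(loader.dataset))    # weight clients by data size

    # FedAvg: data-size-weighted average of client parameters.
    total = sum(weights)
    avg = {k: sum(w / total * s[k] for w, s in zip(weights, states))
           for k in states[0]}
    global_model.load_state_dict(avg)
    return global_model
```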
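CDL's exact quantizer and entropy constraint are not detailed above; the sketch below shows the generic building blocks such a scheme rests on: unbiased probabilistic (stochastic) rounding with a straight-through gradient, and a batch estimate of the entropy of the quantization indices that an entropy penalty could act on. The step size and estimator details are assumptions.

```python
# Generic building blocks behind a CDL-style scheme; details here are
# illustrative, not CDL's exact design.
import torch

def stochastic_quantize(x, step=0.05):
    """Round x/step up or down at random, in proportion to the remainder,
    so the quantizer is unbiased: E[q * step] = x."""
    scaled = x / step
    low = torch.floor(scaled)
    prob_up = scaled - low                    # distance above the lower level
    q = low + (torch.rand_like(x) < prob_up).float()
    # Straight-through estimator: quantized forward pass, identity backward.
    return x + (q * step - x).detach()

def index_entropy(x, step=0.05):
    """Empirical entropy (bits) of the quantization indices; penalizing
    this keeps weights/activations compressible throughout training."""
    idx = torch.round(x.detach() / step).flatten()
    _, counts = idx.unique(return_counts=True)
    p = counts.float() / counts.sum()
    return -(p * torch.log2(p)).sum()
```

Because the forward and backward passes see only quantized values, the same machinery also shrinks what must be communicated between workers in parallel training, consistent with the abstract's claim.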
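For the generative part, the abstract does not name a specific algorithm; the sketch below shows one standard way a diffusion model can be steered toward the posterior of a linear inverse problem y = A x + noise, in the spirit of diffusion posterior sampling. The noise-prediction model `eps_model`, the 1-D schedule tensors `alpha` and `alpha_bar`, and the step size `zeta` are assumptions for illustration.

```python
# One standard guidance step for solving a linear inverse problem with a
# pretrained diffusion model (noise injection omitted for brevity).
import torch

def guided_reverse_step(x_t, t, y, A, eps_model, alpha, alpha_bar, zeta=1.0):
    """One reverse diffusion step nudged by a data-fidelity gradient."""
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_model(x_t, t)                       # predicted noise
    a, ab = alpha[t], alpha_bar[t]

    # Tweedie estimate of the clean signal from the noisy iterate.
    x0_hat = (x_t - (1 - ab).sqrt() * eps) / ab.sqrt()

    # Data-fidelity term ||y - A(x0_hat)||^2; its gradient w.r.t. x_t
    # points toward samples consistent with the measurements.
    residual = (y - A(x0_hat)).pow(2).sum()
    grad = torch.autograd.grad(residual, x_t)[0]

    # Plain DDPM mean update, followed by the posterior guidance nudge.
    mean = (x_t - (1 - a) / (1 - ab).sqrt() * eps) / a.sqrt()
    return (mean - zeta * grad).detach()
```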
dc.identifier.uri: https://hdl.handle.net/10012/21424
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.subject: Deep Learning
dc.subject: Machine Learning
dc.subject: Information theory
dc.title: The Interplay of Information Theory and Deep Learning: Frameworks to Improve Deep Learning Efficiency and Accuracy
dc.type: Doctoral Thesis
uws-etd.degree: Doctor of Philosophy
uws-etd.degree.department: Electrical and Computer Engineering
uws-etd.degree.discipline: Electrical and Computer Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.embargo.terms: 0
uws.contributor.advisor: Yang, En-Hui
uws.contributor.affiliation1: Faculty of Engineering
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text

Files

Original bundle

Name: Mohajer Hamidi_Shayan.pdf
Size: 6.96 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Description: Item-specific license agreed upon to submission