The Interplay of Information Theory and Deep Learning: Frameworks to Improve Deep Learning Efficiency and Accuracy

dc.contributor.author: Mohajer Hamidi, Shayan
dc.date.accessioned: 2025-01-23T16:31:10Z
dc.date.available: 2025-01-23T16:31:10Z
dc.date.issued: 2025-01-23
dc.date.submitted: 2025-01-21
dc.description.abstract: The intersection of information theory (IT) and machine learning (ML) represents a promising, yet relatively under-explored, frontier with significant potential for innovation. Despite the clear benefits of combining these fields, progress has been limited by two main challenges: (i) the highly specialized nature of IT and ML, which creates a barrier to cross-disciplinary expertise, and (ii) the computational complexity involved in applying information-theoretic concepts to large-scale ML problems. This dissertation seeks to overcome these challenges and explore the rich possibilities at the intersection of IT and ML. By leveraging powerful tools and concepts from IT, we aim to uncover novel insights and develop innovative ML algorithms. Given that deep neural networks (DNNs) form the backbone of modern ML models, integrating IT principles into ML requires a focus on optimizing the training and performance of DNNs using information-theoretic frameworks. While DNNs have a broad range of applications, this thesis narrows its focus to two key areas, classification and generative DNNs, with the objective of harnessing IT principles to enhance the performance of these models.

• Classification DNNs. For classification DNNs, this dissertation targets improvements in three critical areas.

(i) Improving classification accuracy. The performance of classification DNNs is traditionally measured by classification accuracy, but we argue that conventional error metrics are insufficient for capturing a model's true performance. By introducing the concepts of conditional mutual information (CMI) and normalized conditional mutual information (NCMI), we propose a new metric for evaluating DNNs: CMI measures intra-class concentration, while the ratio of CMI to NCMI reflects inter-class separation. We then modify the standard loss function of the deep learning (DL) framework to minimize the standard cross-entropy function subject to an NCMI constraint, yielding CMI-constrained deep learning (CMIC-DL). Extensive experimental results show that DNNs trained within CMIC-DL achieve higher classification accuracy than state-of-the-art models trained within the standard DL framework or with other loss functions in the literature (a hedged sketch of such a constrained loss appears after the abstract).

(ii) Enhancing distributed learning accuracy. In the context of distributed learning, particularly federated learning (FL), we tackle the challenge of class imbalance using information-theoretic concepts to improve the accuracy of the shared global model. To this end, we introduce new information-theoretic quantities into FL and propose a modified loss function based on these principles. This leads to the development of a federated learning framework, Fed-IT, which enhances the classification accuracy of models trained in distributed environments (an illustrative round is sketched after the abstract).

(iii) Reducing model size and training/inference complexity. We introduce coded deep learning (CDL), a novel framework aimed at reducing the computational and storage complexity of classification DNNs. CDL achieves this by compressing model weights and activations through probabilistic quantization. Both forward and backward passes during training are performed using quantized weights and activations, significantly reducing floating-point operations and computational overhead. Furthermore, CDL imposes entropy constraints on weights and activations, ensuring compressibility at every stage of training, which also reduces communication costs in parallel computing environments.
This leads to models that are more efficient in both training and inference, with lower storage and computational requirements (the generic building blocks of such a quantizer are sketched after the abstract).

• Generative DNNs. For generative DNNs, this dissertation focuses on diffusion models and their application to solving inverse problems. Inverse problems are common in fields such as medical imaging, signal processing, and physics, where the goal is to recover an underlying cause from corrupted or incomplete observations. These problems are often ill-posed, admitting multiple possible solutions or exhibiting high sensitivity to small changes in the data. In this dissertation, we enhance the performance of diffusion models by incorporating probabilistic principles, making them more effective at capturing the posterior distribution of the underlying causes in inverse problems. This approach improves the models' ability to accurately reconstruct signals and yields more reliable solutions in challenging inverse-problem scenarios (one common guidance step is sketched after the abstract).

Overall, this dissertation demonstrates the powerful synergy between IT and ML, showcasing novel methods that improve the accuracy and efficiency of both classification and generative DNNs. By addressing key challenges in training and optimization, this work lays the foundation for future research at the intersection of these two fields.
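To make the CMIC-DL objective concrete, the following is a minimal PyTorch sketch of one way to penalize cross entropy with an NCMI-style ratio of intra-class concentration to inter-class separation. The estimators below and the multiplier `lam` are illustrative assumptions, not the dissertation's exact formulation.

```python
# Minimal sketch of a CMI-constrained loss in the spirit of CMIC-DL.
# The concentration/separation estimators are illustrative stand-ins.
import torch
import torch.nn.functional as F

def cmic_loss(logits, labels, lam=0.1, eps=1e-8):
    """Cross entropy plus a hedged NCMI-style penalty (assumed form)."""
    ce = F.cross_entropy(logits, labels)
    probs = F.softmax(logits, dim=1)              # P(Yhat | X) per sample

    # Class-conditional mean output distributions for labels in the batch.
    classes = labels.unique()
    means = torch.stack([probs[labels == c].mean(dim=0) for c in classes])

    # Intra-class concentration: average KL(P(Yhat|X) || class mean), a
    # batch estimate of the conditional mutual information I(X; Yhat | Y).
    cmi = 0.0
    for i, c in enumerate(classes):
        p = probs[labels == c]
        kl = (p * (torch.log(p + eps) - torch.log(means[i] + eps))).sum(dim=1)
        cmi = cmi + kl.mean()
    cmi = cmi / len(classes)

    # Inter-class separation: average pairwise distance between class means.
    n_pairs = max(len(classes) * (len(classes) - 1), 1)
    sep = torch.cdist(means, means, p=2).sum() / n_pairs

    return ce + lam * cmi / (sep + eps)
```

In this reading, the NCMI constraint is folded into the objective as a Lagrangian penalty, so standard SGD training applies unchanged.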
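The abstract does not spell out Fed-IT's information-theoretic quantities, so the sketch below shows only where a modified, imbalance-aware client loss would plug into a standard FedAvg round; `fed_it_loss` is a hypothetical placeholder, not the dissertation's actual loss.

```python
# Hypothetical sketch of one federated round with a modified client
# objective; `fed_it_loss` is a placeholder for an imbalance-aware,
# information-theoretic loss.  Assumes float-valued model parameters.
import copy
import torch

def federated_round(global_model, clients, fed_it_loss, lr=0.01, local_epochs=1):
    states, weights = [], []
    for loader in clients:                     # each client holds a DataLoader
        model = copy.deepcopy(global_model)
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(local_epochs):
            for x, y in loader:
                opt.zero_grad()
                fed_it_loss(model(x), y).backward()   # modified client loss
                opt.step()
        states.append(model.state_dict())
        weights.append(len(loader.dataset))    # weight clients by data size

    # FedAvg: data-size-weighted average of client parameters.
    total = sum(weights)
    avg = {k: sum(w / total * s[k] for w, s in zip(weights, states))
           for k in states[0]}
    global_model.load_state_dict(avg)
    return global_model
```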
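CDL's exact quantizer and entropy constraint are not detailed above; the sketch below shows the generic building blocks such a scheme rests on: unbiased probabilistic (stochastic) rounding with a straight-through gradient, and a batch estimate of the entropy of the quantization indices that an entropy penalty could act on. The step size and estimator details are assumptions.

```python
# Generic building blocks behind a CDL-style scheme; details here are
# illustrative, not CDL's exact design.
import torch

def stochastic_quantize(x, step=0.05):
    """Round x/step up or down at random, in proportion to the remainder,
    so the quantizer is unbiased: E[q * step] = x."""
    scaled = x / step
    low = torch.floor(scaled)
    prob_up = scaled - low                    # distance above the lower level
    q = low + (torch.rand_like(x) < prob_up).float()
    # Straight-through estimator: quantized forward pass, identity backward.
    return x + (q * step - x).detach()

def index_entropy(x, step=0.05):
    """Empirical entropy (bits) of the quantization indices; penalizing
    this keeps weights/activations compressible throughout training."""
    idx = torch.round(x.detach() / step).flatten()
    _, counts = idx.unique(return_counts=True)
    p = counts.float() / counts.sum()
    return -(p * torch.log2(p)).sum()
```

Because the forward and backward passes see only quantized values, the same machinery also shrinks what must be communicated between workers in parallel training, consistent with the abstract's claim.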
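For the generative part, the abstract does not name a specific algorithm; the sketch below shows one standard way a diffusion model can be steered toward the posterior of a linear inverse problem y = A x + noise, in the spirit of diffusion posterior sampling. The noise-prediction model `eps_model`, the 1-D schedule tensors `alpha` and `alpha_bar`, and the step size `zeta` are assumptions for illustration.

```python
# One standard guidance step for solving a linear inverse problem with a
# pretrained diffusion model (noise injection omitted for brevity).
import torch

def guided_reverse_step(x_t, t, y, A, eps_model, alpha, alpha_bar, zeta=1.0):
    """One reverse diffusion step nudged by a data-fidelity gradient."""
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_model(x_t, t)                       # predicted noise
    a, ab = alpha[t], alpha_bar[t]

    # Tweedie estimate of the clean signal from the noisy iterate.
    x0_hat = (x_t - (1 - ab).sqrt() * eps) / ab.sqrt()

    # Data-fidelity term ||y - A(x0_hat)||^2; its gradient w.r.t. x_t
    # points toward samples consistent with the measurements.
    residual = (y - A(x0_hat)).pow(2).sum()
    grad = torch.autograd.grad(residual, x_t)[0]

    # Plain DDPM mean update, followed by the posterior guidance nudge.
    mean = (x_t - (1 - a) / (1 - ab).sqrt() * eps) / a.sqrt()
    return (mean - zeta * grad).detach()
```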
dc.identifier.uri: https://hdl.handle.net/10012/21424
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.subject: Deep Learning
dc.subject: Machine Learning
dc.subject: Information theory
dc.title: The Interplay of Information Theory and Deep Learning: Frameworks to Improve Deep Learning Efficiency and Accuracy
dc.type: Doctoral Thesis
uws-etd.degree: Doctor of Philosophy
uws-etd.degree.department: Electrical and Computer Engineering
uws-etd.degree.discipline: Electrical and Computer Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.embargo.terms: 0
uws.contributor.advisor: Yang, En-Hui
uws.contributor.affiliation1: Faculty of Engineering
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text

Files

Original bundle

Name: Mohajer Hamidi_Shayan.pdf
Size: 6.96 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Description: Item-specific license agreed upon to submission