The Interplay of Information Theory and Deep Learning: Frameworks to Improve Deep Learning Efficiency and Accuracy

Date

2025-01-23

Advisor

Yang, En-Hui

Publisher

University of Waterloo

Abstract

The intersection of information theory (IT) and machine learning (ML) represents a promising, yet relatively under-explored, frontier with significant potential for innovation. Despite the clear benefits of combining these fields, progress has been limited by two main challenges: (i) the highly specialized nature of IT and ML, which creates a barrier to cross-disciplinary expertise, and (ii) the computational complexity involved in applying information-theoretic concepts to large-scale ML problems. This dissertation seeks to overcome these challenges and explore the rich possibilities at the intersection of IT and ML. By leveraging powerful tools and concepts from IT, we aim to uncover novel insights and develop innovative ML algorithms. Given that deep neural networks (DNNs) form the backbone of modern ML models, the integration of IT principles into ML requires a focus on optimizing the training and performance of DNNs using information-theoretic frameworks. While DNNs have a broad range of applications, this thesis narrows its focus to two key areas: classification and generative DNNs. The objective is to harness IT principles to enhance the performance of these models.

• Classification DNNs. For classification DNNs, this dissertation targets improvements in three critical areas.

(i) Improving classification accuracy. The performance of classification DNNs is traditionally measured by classification accuracy, but we argue that conventional error metrics are insufficient for capturing a model's true performance. By introducing the concepts of conditional mutual information (CMI) and normalized conditional mutual information (NCMI), we propose a new metric for evaluating DNNs: CMI measures intra-class concentration, while the ratio of CMI to NCMI reflects inter-class separation. We then modify the standard loss function in the deep learning (DL) framework to minimize the standard cross-entropy function subject to an NCMI constraint, yielding CMI-constrained deep learning (CMIC-DL); a minimal sketch of such a penalized objective appears after this list. Extensive experimental results show that DNNs trained within CMIC-DL achieve higher classification accuracy than state-of-the-art models trained within the standard DL framework and with other loss functions from the literature.

(ii) Enhancing distributed learning accuracy. In the context of distributed learning, particularly federated learning (FL), we tackle the challenge of class imbalance, using information-theoretic concepts to improve the accuracy of the shared global model. To this end, we introduce new information-theoretic quantities into FL and propose a modified loss function based on these principles. This leads to the development of a federated learning framework, Fed-IT, which enhances the classification accuracy of models trained in distributed environments.

(iii) Reducing model size and training/inference complexity. We introduce coded deep learning (CDL), a novel framework aimed at reducing the computational and storage complexity of classification DNNs. CDL achieves this by compressing model weights and activations through probabilistic quantization; both the forward and backward passes during training are performed using quantized weights and activations, significantly reducing floating-point operations and computational overhead (see the quantization sketch after this list). Furthermore, CDL imposes entropy constraints on weights and activations, ensuring compressibility at every stage of training, which also reduces communication costs in parallel computing environments. The result is models that are more efficient in both training and inference, with lower storage and computational requirements.
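To make the CMIC-DL objective above concrete, here is a minimal PyTorch sketch of a penalized form of the constrained problem. The penalty weight `lam` and the batch estimators of intra-class concentration and inter-class separation are illustrative assumptions for this sketch, not the dissertation's actual estimators or hyperparameters.

```python
import torch
import torch.nn.functional as F

def cmic_loss(logits, labels, lam=0.1, eps=1e-8):
    """Cross entropy plus an illustrative NCMI-style penalty.

    Concentration and separation are crude batch estimates,
    standing in for the thesis's CMI/NCMI quantities.
    """
    ce = F.cross_entropy(logits, labels)
    probs = F.softmax(logits, dim=1)            # output distributions

    classes = labels.unique()
    # Class-conditional mean output distribution for each class.
    means = torch.stack([probs[labels == c].mean(dim=0) for c in classes])

    # Intra-class concentration: mean KL(sample || its class mean).
    concentration = 0.0
    for i, c in enumerate(classes):
        p = probs[labels == c]
        kl = (p * (torch.log(p + eps) - torch.log(means[i] + eps))).sum(dim=1)
        concentration = concentration + kl.mean()
    concentration = concentration / len(classes)

    # Inter-class separation: mean pairwise L1 gap between class means.
    gaps = (means.unsqueeze(0) - means.unsqueeze(1)).abs().sum(dim=2)
    n = len(classes)
    separation = gaps.sum() / max(n * (n - 1), 1)

    # Penalized objective: cross entropy plus an NCMI-like ratio.
    return ce + lam * concentration / (separation + eps)
```

In training, this would simply replace the usual criterion, e.g. `loss = cmic_loss(model(x), y)` followed by the standard backward pass.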
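Similarly, the probabilistic quantization at the heart of CDL can be sketched with stochastic rounding plus a straight-through backward pass, a common device for training through a quantizer. This mechanism is an assumption for illustration, and the sketch omits CDL's additional entropy constraint on the quantized symbols.

```python
import torch

class StochasticQuantize(torch.autograd.Function):
    """Stochastic rounding to a uniform grid, with a straight-through
    backward pass so training runs on quantized values (an illustrative
    stand-in for CDL's probabilistic quantizer)."""

    @staticmethod
    def forward(ctx, x, step=0.05):
        scaled = x / step
        floor = scaled.floor()
        # Round up with probability equal to the fractional part,
        # so the quantized value is unbiased in expectation.
        up = (torch.rand_like(x) < (scaled - floor)).float()
        return (floor + up) * step

    @staticmethod
    def backward(ctx, grad_out):
        # Straight-through: gradients pass unchanged through the quantizer.
        return grad_out, None

def quantize(x, step=0.05):
    return StochasticQuantize.apply(x, step)

# Usage inside a layer: quantize weights and activations before use, e.g.
#   w_q = quantize(layer.weight)
#   a_q = quantize(torch.relu(pre_activation))
```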
• Generative DNNs. For generative DNNs, this dissertation focuses on diffusion models and their application to solving inverse problems. Inverse problems are common in fields such as medical imaging, signal processing, and physics, where the goal is to recover an underlying cause from corrupted or incomplete observations. These problems are often ill-posed, admitting multiple possible solutions or exhibiting high sensitivity to small changes in the data. In this dissertation, we enhance the performance of diffusion models by incorporating probabilistic principles, making them more effective at capturing the posterior distribution of the underlying causes in inverse problems; a sketch of one such posterior-guided sampling step appears below. This approach improves the model's ability to accurately reconstruct signals and provides more reliable solutions in challenging inverse-problem scenarios.
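One widely used way to steer a pretrained diffusion model toward the posterior of an inverse problem is to nudge each denoising step with the gradient of a data-fidelity term, in the spirit of diffusion posterior sampling. The sketch below assumes that style; `denoiser`, `forward_op`, and the guidance strength `zeta` are hypothetical names, and the dissertation's actual guidance rule may differ.

```python
import torch

def guided_denoise_step(x_t, t, y, denoiser, forward_op, zeta=1.0):
    """One illustrative posterior-guided denoising step.

    x_t:        current noisy sample
    t:          diffusion timestep
    y:          corrupted observation, y = A(x) + noise
    denoiser:   pretrained network returning a denoised estimate of x
    forward_op: the measurement operator A (assumed differentiable)
    zeta:       guidance strength (hypothetical tuning knob)
    """
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = denoiser(x_t, t)                     # unconditional estimate
    residual = y - forward_op(x0_hat)             # data-fidelity residual
    fidelity = residual.pow(2).sum()
    grad = torch.autograd.grad(fidelity, x_t)[0]  # d fidelity / d x_t
    # Pull the unconditional estimate toward measurement consistency.
    return (x0_hat - zeta * grad).detach()
```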
Overall, this dissertation demonstrates the powerful synergy between IT and ML, showcasing novel methods that improve the accuracy and efficiency of both classification and generative DNNs. By addressing key challenges in training and optimization, this work lays the foundation for future research at the intersection of these two fields.

Keywords

Deep Learning, Machine Learning, Information Theory
