JPEG-Inspired Encoding for Deep Learning

Advisor

Yang, En-Hui

Publisher

University of Waterloo

Abstract

JPEG is the dominant standard for storing and transmitting digital images, while Deep Neural Networks (DNNs) have become the preeminent method for automated image understanding. This dissertation investigates how these two ubiquitous technologies can be synergistically integrated to enhance the performance of DNNs. JPEG was originally engineered for the Human Visual System (HVS), and its default parameters are not optimized for DNNs, which process visual information differently. This suboptimality, stemming from JPEG's default implementation, is not a fundamental limitation but rather an opportunity to adapt its core components, especially the non-linear quantization stage, to DNNs. This research addresses the suboptimality by first optimizing the trade-off between compression rate and classification accuracy and, second, by introducing a learnable, end-to-end differentiable JPEG layer whose quantization parameters are jointly trained with the underlying DNN. The same principle of a learnable, JPEG-inspired transformation extends beyond compression, offering a novel way to address challenges in related domains such as knowledge distillation (KD), where large "teacher" models often overfit the training set. This overfitting causes them to generate overconfident, near one-hot probability vectors that serve as poor supervisory signals for the student model, suggesting the need for novel approaches to information transfer.

This dissertation addresses these issues by systematically revisiting the relationship between JPEG encoding and deep learning. It charts a logical progression from adapting JPEG externally for DNNs, to integrating it internally as a learnable network component, and finally to repurposing its core principles to amplify knowledge transfer. This progressive framework is methodically developed and empirically substantiated through three interconnected contributions:

- Optimizing Compression for DNNs. To improve the interaction between standard JPEG and pre-trained DNNs, this work first reframes compression from a human-centric "rate-distortion" problem to a DNN-centric "rate-accuracy" one. This is achieved by introducing the Sensitivity Weighted Error (SWE), a novel distortion measure derived from a DNN's loss sensitivity to frequency-domain perturbations, where higher sensitivity in a frequency band indicates greater importance for the DNN's decision-making. The SWE guides the OptS algorithm to generate model-specific JPEG quantization tables (see the first sketch below). This approach produces fully compliant JPEGs optimized for DNN consumption, demonstrably improving the rate-accuracy trade-off by increasing accuracy by up to 2.12% at the same rate or enabling rate reductions of up to 67.84% with no loss of model accuracy.

- Integrating a Differentiable JPEG Layer into the DNN Architecture. Building on this, the next contribution integrates the codec into the network architecture itself via the JPEG-Inspired Deep Learning (JPEG-DL) framework, which introduces a novel, end-to-end differentiable JPEG layer. By replacing JPEG's standard hard quantization with a differentiable alternative, the layer's parameters are jointly optimized with the network's weights (see the second sketch below). This transforms the JPEG pipeline from a static pre-processor into a dynamic, learnable component, significantly improving model accuracy (by an average of 7% on fine-grained classification tasks with only 128 additional trainable parameters) and enhancing robustness against adversarial attacks.
- Amplifying Knowledge Transfer via JPEG-Inspired Perturbation. Finally, the differentiable layer is repurposed to address the "overconfident teacher" problem in KD by perturbing teacher inputs to force softer, more informative predictions. Crucially, this method requires no retraining or modification of the fixed teacher model, ensuring its practical utility with proprietary or deployed networks. The investigation begins with Coded Knowledge Distillation (CKD), a practical heuristic that uses adaptive JPEG compression to perturb teacher inputs and soften their overconfident predictions. While effective, this approach prompted a search for a more principled theoretical foundation, which led to Generalized Coded Knowledge Distillation (GCKD), a framework that establishes maximization of the teacher's Conditional Mutual Information (CMI) as the core objective. However, directly optimizing CMI on a per-input basis is computationally prohibitive. This efficiency challenge is resolved in the culminating synthesis, Differentiable JPEG-based Input Perturbation (DJIP). DJIP operationalizes the GCKD theory by deploying the trainable differentiable JPEG layer as a fast, learnable, amortized operator: instead of performing a slow, per-input optimization search, the layer is trained once to automatically generate CMI-maximizing perturbations, making the process highly efficient (see the third sketch below). This approach demonstrably generates richer supervisory signals, boosting student model accuracy by up to 4.11%.

In conclusion, this dissertation demonstrates that the relationship between JPEG and DNNs can be systematically revisited to create a powerful synergy. By progressing from adaptation to integration and synthesis, this work transforms the suboptimal default interaction of JPEG and DNNs into a versatile architectural tool. The research delivers a suite of methods that not only improve the performance of DNNs on compressed images but also offer a theoretically grounded solution to a key challenge in knowledge distillation. By showing that legacy codecs can be repurposed to enhance model accuracy, efficiency, and knowledge transfer, this work reframes the role of classical codecs, proposing JPEG-inspired encoding as a principled foundation for the integration of classical compression and deep learning.
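First sketch. The following is a minimal PyTorch illustration of the idea behind SWE-guided quantization: probe how sensitive a pre-trained model's loss is to perturbations in each DCT frequency band, then quantize insensitive bands more coarsely. The abstract does not give the exact SWE formula or the OptS algorithm, so the function names (estimate_swe_sensitivities, rescale_quant_table), the finite-difference probe, and the rescaling rule are illustrative assumptions, not the thesis method.

import torch
import torch.nn.functional as F

def estimate_swe_sensitivities(model, images, labels, dct_basis, eps=1e-2):
    # Hypothetical probe: measure how much the classification loss changes
    # when each of the 64 block-DCT frequency bands is slightly perturbed.
    # dct_basis is assumed to be a (64, 8, 8) tensor of 2-D DCT basis patterns.
    model.eval()
    with torch.no_grad():
        base_loss = F.cross_entropy(model(images), labels).item()
        sens = torch.zeros(64)
        h, w = images.shape[-2:]
        for k in range(64):
            pattern = dct_basis[k].repeat(h // 8, w // 8)  # tile over the image
            loss = F.cross_entropy(model(images + eps * pattern), labels).item()
            sens[k] = abs(loss - base_loss) / eps
    return sens

def rescale_quant_table(base_table, sens, strength=1.0):
    # Illustrative rule standing in for OptS: quantize coarsely where the DNN
    # is insensitive and finely where it is sensitive, keeping step sizes in
    # the 1..255 range required for a compliant JPEG quantization table.
    weights = sens / (sens.mean() + 1e-8)
    table = base_table.float().view(64) / (1.0 + strength * weights)
    return table.clamp(1, 255).round()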
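Second sketch. Below is a minimal sketch of what an end-to-end differentiable JPEG-style quantization step could look like in PyTorch. The abstract does not specify the soft quantizer used in JPEG-DL; a straight-through rounding estimator is used here as a stand-in, with one learnable step size per 8x8 DCT frequency band playing the role of a learnable quantization table.

import torch
import torch.nn as nn

class DifferentiableQuant(nn.Module):
    # Hypothetical differentiable quantization layer: hard rounding in the
    # forward pass, identity gradient in the backward pass, with learnable
    # per-frequency step sizes (64 parameters per quantization table).
    def __init__(self, init_table):
        super().__init__()
        self.log_steps = nn.Parameter(torch.log(init_table.float().view(64)))

    def forward(self, dct_coeffs):
        # dct_coeffs: (..., 64) block-DCT coefficients.
        steps = torch.exp(self.log_steps)                      # keep steps positive
        scaled = dct_coeffs / steps
        rounded = scaled + (scaled.round() - scaled).detach()  # straight-through
        return rounded * steps                                 # dequantized values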
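Third sketch. Finally, a minimal sketch of how a frozen teacher, a perturbation layer of the kind above, and a student could be combined in one distillation step. The CMI-maximizing training of the perturbation layer described for GCKD and DJIP is part of the thesis and is not reproduced here; this sketch only shows the surrounding distillation loss, with the temperature T, weight alpha, and function names as illustrative assumptions.

import torch
import torch.nn.functional as F

def distillation_step(student, teacher, jpeg_layer, x, y, T=4.0, alpha=0.5):
    # The frozen teacher sees the input through the (already trained)
    # differentiable JPEG-style layer, which softens its predictions.
    with torch.no_grad():
        teacher_logits = teacher(jpeg_layer(x))
    student_logits = student(x)
    # Standard distillation loss: KL divergence to the softened teacher
    # distribution plus the ordinary cross-entropy on the true labels.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    ce = F.cross_entropy(student_logits, y)
    return alpha * kd + (1 - alpha) * ce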
