Constraining Robust Information Quantities Improves Adversarial Robustness
Date
2024-12-11
Authors
Advisor
Yang, En-Hui
Publisher
University of Waterloo
Abstract
Deep neural networks (DNNs) are known to be vulnerable to imperceptible adversarial attacks, which raises concerns about their safety and reliability in real-world applications. In this thesis, we aim to boost the robustness of DNNs against white-box adversarial attacks by defining three information quantities: robust conditional mutual information (CMI), robust separation, and robust normalized CMI (NCMI), which serve as evaluation metrics of a DNN's robust performance. We then use these concepts to introduce a novel regularization method that simultaneously constrains intra-class concentration and increases inter-class separation among the output probability distributions of attacked data. Our experiments show that this method consistently enhances model robustness against C&W and AutoAttack on the CIFAR and Tiny-ImageNet datasets, both with and without additional synthetic data. Compared with several state-of-the-art adversarial training methods, our approach improves the robust accuracy of DNNs by up to 2.66% on the CIFAR datasets and 3.49% on Tiny-ImageNet against PGD attacks, and by 1.70% on CIFAR and 1.63% on Tiny-ImageNet against AutoAttack.
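
To make the regularization idea concrete, the following is a minimal sketch, not the thesis' exact formulation, of an NCMI-style penalty for adversarial training. It assumes PyTorch, softmax outputs on already-attacked inputs, a KL-based intra-class concentration term, and a cross-entropy-style inter-class separation term; the names `ncmi_style_regularizer`, `lambda_cmi`, and `beta_sep` are illustrative and do not come from the thesis.

import torch
import torch.nn.functional as F

def ncmi_style_regularizer(probs, labels, num_classes, eps=1e-12):
    """probs: (N, C) softmax outputs on attacked inputs; labels: (N,) class indices."""
    # Class centroids: mean output distribution per ground-truth class.
    centroids = torch.zeros(num_classes, probs.size(1), device=probs.device)
    counts = torch.zeros(num_classes, device=probs.device)
    centroids.index_add_(0, labels, probs)
    counts.index_add_(0, labels, torch.ones_like(labels, dtype=probs.dtype))
    present = counts > 0
    centroids[present] = centroids[present] / counts[present].unsqueeze(1)

    # Intra-class concentration (CMI-like term): average KL divergence from each
    # sample's output distribution to the centroid of its own class.
    ref = centroids[labels]
    concentration = (probs * (probs.add(eps).log() - ref.add(eps).log())).sum(dim=1).mean()

    # Inter-class separation: average pairwise cross-entropy-style distance
    # between the centroids of classes present in the batch.
    c = centroids[present]
    if c.size(0) > 1:
        pairwise = -(c.unsqueeze(1) * c.add(eps).log().unsqueeze(0)).sum(-1)
        off_diag = pairwise[~torch.eye(c.size(0), dtype=torch.bool, device=c.device)]
        separation = off_diag.mean()
    else:
        separation = torch.tensor(0.0, device=probs.device)
    return concentration, separation

def regularized_adv_loss(logits_adv, labels, num_classes,
                         lambda_cmi=1.0, beta_sep=1.0):
    """Adversarial training loss: cross-entropy on attacked inputs, plus a penalty
    on intra-class spread and a reward for inter-class separation."""
    probs = F.softmax(logits_adv, dim=1)
    concentration, separation = ncmi_style_regularizer(probs, labels, num_classes)
    return (F.cross_entropy(logits_adv, labels)
            + lambda_cmi * concentration
            - beta_sep * separation)

In use, `logits_adv` would be the model's outputs on examples produced by an inner attack loop (e.g., PGD), so the regularizer concentrates attacked outputs around their class centroid while pushing different class centroids apart, mirroring the intra-class concentration and inter-class separation constraints described in the abstract.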