Constraining Robust Information Quantities Improves Adversarial Robustness

Date

2024-12-11

Advisor

Yang, En-Hui

Publisher

University of Waterloo

Abstract

It is known that deep neural networks (DNNs) are vulnerable to imperceptible adversarial attacks, which raises concerns about their safety and reliability in real-world applications. In this thesis, we aim to boost the robustness of DNNs against white-box adversarial attacks by defining three information quantities: robust conditional mutual information (CMI), robust separation, and robust normalized CMI (NCMI), which serve as evaluation metrics for the robust performance of a DNN. We then use these concepts to introduce a novel regularization method that simultaneously enforces intra-class concentration and increases inter-class separation among the output probability distributions of attacked data. Experimental results demonstrate that our method consistently enhances model robustness against C&W and AutoAttack on the CIFAR and Tiny-ImageNet datasets, both with and without additional synthetic data. Compared with several state-of-the-art adversarial training methods, our approach improves the robust accuracy of DNNs by up to 2.66% on the CIFAR datasets and 3.49% on Tiny-ImageNet against PGD attacks, and by 1.70% on CIFAR and 1.63% on Tiny-ImageNet against AutoAttack.
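
The following is a minimal sketch of the kind of regularizer the abstract describes, not the thesis's exact formulation. The weighting coefficients lambda_c and lambda_s, the use of KL divergence for the intra-class concentration term, and the L1 distance between class-mean distributions for the separation term are all illustrative assumptions.

```python
# Illustrative sketch (assumptions noted above) of a regularizer that encourages
# intra-class concentration and inter-class separation among the output
# probability distributions of adversarially perturbed inputs.
import torch
import torch.nn.functional as F


def concentration_separation_terms(logits_adv, labels, num_classes, eps=1e-8):
    """Compute (concentration, separation) from adversarial-example logits.

    concentration: average KL divergence between each sample's output
        distribution and the mean output distribution of its class
        (smaller = outputs more concentrated within each class).
    separation: average pairwise L1 distance between class-mean output
        distributions (larger = classes better separated).
    """
    probs = F.softmax(logits_adv, dim=1)                        # (N, C)
    one_hot = F.one_hot(labels, num_classes).float()            # (N, C)
    counts = one_hot.sum(dim=0).clamp(min=1.0)                  # samples per class
    class_means = (one_hot.t() @ probs) / counts.unsqueeze(1)   # (C, C): mean dist per class

    # Intra-class concentration: KL(p_i || mean distribution of class y_i).
    mean_for_sample = class_means[labels]                       # (N, C)
    concentration = (probs * (probs.clamp(min=eps).log()
                              - mean_for_sample.clamp(min=eps).log())).sum(dim=1).mean()

    # Inter-class separation: average pairwise distance between class means.
    diffs = class_means.unsqueeze(0) - class_means.unsqueeze(1) # (C, C, C)
    separation = diffs.abs().sum(dim=2).mean()

    return concentration, separation


# Hypothetical use inside an adversarial-training step (lambda_c, lambda_s assumed):
# conc, sep = concentration_separation_terms(logits_adv, labels, num_classes)
# loss = F.cross_entropy(logits_adv, labels) + lambda_c * conc - lambda_s * sep
```

In this sketch the concentration term is penalized and the separation term is rewarded, so minimizing the combined loss pulls attacked samples of the same class toward a common output distribution while pushing the class-mean distributions apart.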
