Revisiting Benchmarks for Privacy-Preserving Image Classification
Date
2024-09-17
Advisor
Kamath, Gautam
Publisher
University of Waterloo
Abstract
Differential privacy (DP) is a standard method for protecting the privacy of individual data points. By bounding the influence of any single training example, DP limits a model's ability to memorize its training data, reducing the risk of data leakage. While DP has proven effective in machine learning (ML), there are growing concerns about common practices in differentially private machine learning (DP ML), particularly the reliance on non-private ML benchmarks to measure progress. Popular datasets such as CIFAR-10, though extensively used in non-private settings, may not capture the complexities of privacy-sensitive areas like medical data. Moreover, pre-training on publicly available datasets may not yield the same benefits when the private data differs significantly from, and is poorly represented in, the public domain. This thesis addresses these concerns by evaluating DP methods on a range of privacy-sensitive datasets and training scenarios. We focus on medical datasets, where privacy is crucial, and evaluate a comprehensive set of techniques. These techniques span a wide range of settings, including pre-training on public data, training without any public data, full-model and last-layer fine-tuning, and varying privacy levels.
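As a concrete illustration of one of the settings named above, the following is a minimal sketch of last-layer fine-tuning under DP-SGD. It assumes PyTorch with the Opacus library; the model (a public-data pre-trained ResNet-18), the random placeholder dataset, and all hyperparameters (e.g., target_epsilon=8.0) are illustrative assumptions, not the thesis's actual experimental setup.

# Illustrative sketch only: PyTorch + Opacus, ResNet-18, the placeholder
# data, and all hyperparameters are assumptions for exposition, not the
# thesis's actual configuration.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models
from opacus import PrivacyEngine
from opacus.validators import ModuleValidator

# Backbone pre-trained on public data (ImageNet here as a stand-in).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
# BatchNorm mixes statistics across samples and is incompatible with
# DP-SGD; the standard Opacus recipe replaces it with GroupNorm.
model = ModuleValidator.fix(model)

# Last-layer fine-tuning: freeze the backbone, then train a new head.
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)  # placeholder: 10 classes

# Placeholder private data standing in for a medical image dataset.
images = torch.randn(256, 3, 224, 224)
labels = torch.randint(0, 10, (256,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=32)

optimizer = optim.SGD(model.fc.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# DP-SGD: per-sample gradient clipping plus Gaussian noise, calibrated
# so that training satisfies (epsilon, delta)-DP at the chosen level.
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    target_epsilon=8.0,   # the "privacy level"; smaller means stronger privacy
    target_delta=1e-5,
    epochs=1,
    max_grad_norm=1.0,    # per-sample gradient clipping bound
)

model.train()
for x, y in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

In this sketch, full-model fine-tuning would correspond to leaving all parameters trainable instead of freezing the backbone, and varying target_epsilon corresponds to the different privacy levels the thesis evaluates.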
Keywords
differential privacy, machine learning, image classification, benchmarking, computer vision