Revisiting Benchmarks for Privacy-Preserving Image Classification
Date
2024-09-17
Advisor
Kamath, Gautam
Publisher
University of Waterloo
Abstract
Differential privacy (DP) is a standard method for protecting the privacy of individual data points. By bounding the influence of any single training example, DP limits a model's ability to memorize its training data, reducing the risk of data leakage. While DP has proven effective in machine learning (ML), there are growing concerns about common practices in differentially private machine learning (DP ML), particularly the reliance on non-private ML benchmarks to measure progress. Popular datasets such as CIFAR-10, though extensively used in non-private settings, may not capture the complexities of privacy-sensitive areas like medical data. Moreover, pre-training on publicly available datasets may not yield the same benefits when the private data differs significantly from, and is poorly represented in, the public domain. This thesis addresses these concerns by evaluating DP methods on a range of privacy-sensitive datasets and training scenarios. We focus on medical datasets, where privacy is crucial, and evaluate a comprehensive set of techniques. These techniques span a wide range of settings, including pre-training on public data, training without any public data, full-model and last-layer fine-tuning, and varying privacy levels.
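As a concrete illustration of one of the settings named above, the following is a minimal sketch of last-layer fine-tuning under DP-SGD. It assumes PyTorch with the Opacus library; the model (a public-data pre-trained ResNet-18), the random placeholder dataset, and all hyperparameters (e.g., target_epsilon=8.0) are illustrative assumptions, not the thesis's actual experimental setup.

# Illustrative sketch only: PyTorch + Opacus, ResNet-18, the placeholder
# data, and all hyperparameters are assumptions for exposition, not the
# thesis's actual configuration.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models
from opacus import PrivacyEngine
from opacus.validators import ModuleValidator

# Backbone pre-trained on public data (ImageNet here as a stand-in).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
# BatchNorm mixes statistics across samples and is incompatible with
# DP-SGD; the standard Opacus recipe replaces it with GroupNorm.
model = ModuleValidator.fix(model)

# Last-layer fine-tuning: freeze the backbone, then train a new head.
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)  # placeholder: 10 classes

# Placeholder private data standing in for a medical image dataset.
images = torch.randn(256, 3, 224, 224)
labels = torch.randint(0, 10, (256,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=32)

optimizer = optim.SGD(model.fc.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# DP-SGD: per-sample gradient clipping plus Gaussian noise, calibrated
# so that training satisfies (epsilon, delta)-DP at the chosen level.
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    target_epsilon=8.0,   # the "privacy level"; smaller means stronger privacy
    target_delta=1e-5,
    epochs=1,
    max_grad_norm=1.0,    # per-sample gradient clipping bound
)

model.train()
for x, y in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

In this sketch, full-model fine-tuning would correspond to leaving all parameters trainable instead of freezing the backbone, and varying target_epsilon corresponds to the different privacy levels the thesis evaluates.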
Keywords
differential privacy, machine learning, image classification, benchmarking, computer vision