Trustworthy Machine Learning with Data in the Wild

Date

2025-06-19

Advisor

Yu, Yaoliang
Sun, Sun

Publisher

University of Waterloo

Abstract

Recent advances in machine learning (ML) have been largely fueled by models trained on extensive internet-collected datasets. While this approach has yielded remarkable capabilities, it introduces critical vulnerabilities: training data can be untrustworthy, containing harmful content or being manipulated through data poisoning attacks. In such scenarios, model behavior can be maliciously altered, resulting in reduced test performance for classification models or the replication of copyrighted materials for generative models. This thesis examines the influence of untrusted training data on machine learning training dynamics from two crucial perspectives: the ML developer's, focusing on model integrity, and the data owner's, addressing privacy and copyright concerns. Specifically, this thesis analyzes the impact of data in the wild from both theoretical and empirical angles.

The first part formulates data poisoning attacks (specifically, accuracy degradation attacks) as bi-level optimization problems, also known as Stackelberg games, and provides a viable algorithm to poison modern machine learning models, particularly neural networks, which prove significantly more robust to such attacks than traditional linear models. The second part investigates this robustness gap and develops a principled theoretical framework for understanding the effectiveness boundaries of data poisoning attacks across various scenarios: given clean training data, a target model, and a malicious parameter objective, this theoretical tool determines the minimum amount of poisoned data required to reach the target parameters, thereby quantifying the fundamental limits of data poisoning attacks.

Building upon this understanding of data poisoning attacks in supervised settings (i.e., classification tasks), the thesis further examines their threats in two realistic machine learning pipelines. The third part presents the first comprehensive analysis of data poisoning attacks against pre-trained feature extractors, components frequently used for downstream ML tasks such as adapting large models to medical data. This analysis reveals that drastic domain shifts can significantly increase ML models' vulnerability to data poisoning attacks, necessitating more robust countermeasures. The final part examines the role of harmful data in generative models, focusing on advanced latent diffusion models for text-to-image generation. Copyright infringement concerns arise when such models produce outputs substantially similar to copyrighted training data. This part introduces a novel scenario termed "disguised copyright infringement", enabled by targeted data poisoning attacks, and provides a thorough description of potential attack vectors and corresponding defensive strategies.
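To make the bi-level (Stackelberg) formulation mentioned above concrete, a minimal sketch of the attacker's problem is given below; the notation (clean set $D_c$, poisoned set $D_p$, attacker objective evaluated on held-out data $D_{\mathrm{val}}$, training loss $\ell$) is illustrative and not necessarily the thesis's own:

\[
\max_{D_p}\ \mathcal{L}\big(\theta^\star(D_p);\, D_{\mathrm{val}}\big)
\quad \text{subject to} \quad
\theta^\star(D_p) \in \arg\min_{\theta}\ \sum_{(x,y)\in D_c \cup D_p} \ell\big(f_\theta(x),\, y\big).
\]

In this reading, the attacker (leader) selects the poisoned points and the learner (follower) responds by retraining on the union of clean and poisoned data; the minimum-poisoning question of the second part can then be phrased as the smallest $|D_p|$ for which the induced $\theta^\star(D_p)$ matches a prescribed target parameter vector.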

Keywords

trustworthy AI, machine learning, security and privacy, data poisoning attack, optimization
