Towards Honest, Practicable and Efficient Private Learning

dc.contributor.author: Mohapatra, Shubhankar
dc.date.accessioned: 2025-11-24T18:54:12Z
dc.date.available: 2025-11-24T18:54:12Z
dc.date.issued: 2025-11-24
dc.date.submitted: 2025-11-14
dc.description.abstract: Protecting our personal information is a major challenge in today's data-driven world. When scientists and companies analyze large datasets, they need a way to ensure that individual privacy is not compromised. This thesis focuses on Differential Privacy, a rigorous mathematical guarantee that places a strict, verifiable limit on how much personal information can leak, even against a worst-case adversary. Researchers have developed sophisticated algorithms that accomplish useful tasks, such as building machine learning models or generating realistic synthetic data, while maintaining Differential Privacy. Crucially, these operations must be conducted within a predetermined, strict limit, often referred to as the "privacy budget." This budget mathematically quantifies the total acceptable privacy loss for the entire process, enforcing a fundamental trade-off between data utility and individual protection. All routine procedures of the machine learning pipeline, including data cleaning, hyperparameter tuning, and model training, must be performed within this budget. Several tools can perform these tasks in isolation when the dataset is non-private. However, these tools do not translate easily to differential privacy and often do not account for cumulative privacy costs. In this thesis, we explore pragmatic problems that a data science practitioner may face when deploying a differentially private learning framework, from data collection to model training. In particular, we are interested in real-world data quality problems, such as missing data, inconsistent data, and incorrectly labeled data, as well as machine learning pipeline requirements, including hyperparameter tuning.
We envision a general-purpose private learning framework that accepts real data as input and supports learning tasks such as training a highly accurate private machine learning model or creating a synthetic version of the dataset, with end-to-end differential privacy guarantees. We hope this work will make differentially private learning more accessible to data science practitioners and easily deployable in day-to-day applications.
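The abstract's notions of a "privacy budget" and of spending it across pipeline stages can be illustrated with the classic Laplace mechanism. This is a minimal, generic sketch, not code from the thesis: the stage names and epsilon values are hypothetical, and the split relies on basic sequential composition (stage epsilons summing to the total budget).

```python
import math
import random
import statistics

def laplace_mechanism(true_count, epsilon, rng):
    # Laplace mechanism for a counting query (sensitivity 1):
    # adding Laplace(0, 1/epsilon) noise yields an epsilon-DP release.
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# Hypothetical split of a total budget across pipeline stages; under
# basic sequential composition the per-stage epsilons sum to the total.
total_epsilon = 1.0
stages = {"cleaning": 0.2, "tuning": 0.3, "training": 0.5}
assert abs(sum(stages.values()) - total_epsilon) < 1e-9

rng = random.Random(0)
releases = [laplace_mechanism(100, stages["training"], rng) for _ in range(10_000)]
# The noise is zero-mean, so averaged releases concentrate near the true count,
# while any single release hides an individual's contribution.
mean_release = statistics.mean(releases)
```

A smaller per-stage epsilon means stronger privacy but noisier answers, which is exactly the utility/privacy trade-off the abstract describes.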
dc.identifier.uri: https://hdl.handle.net/10012/22645
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.subject: Data Privacy
dc.subject: Databases
dc.subject: Machine Learning
dc.subject: Differential Privacy
dc.title: Towards Honest, Practicable and Efficient Private Learning
dc.type: Doctoral Thesis
uws-etd.degree: Doctor of Philosophy
uws-etd.degree.department: David R. Cheriton School of Computer Science
uws-etd.degree.discipline: Computer Science
uws-etd.degree.grantor: University of Waterloo
uws-etd.embargo.terms: 0
uws.contributor.advisor: He, Xi
uws.contributor.affiliation1: Faculty of Mathematics
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text

Files

Original bundle
Name: Mohapatra_Shubhankar.pdf
Size: 30.11 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 6.4 KB
Description: Item-specific license agreed upon at submission