Towards Honest, Practicable and Efficient Private Learning

dc.contributor.author: Mohapatra, Shubhankar
dc.date.accessioned: 2025-11-24T18:54:12Z
dc.date.available: 2025-11-24T18:54:12Z
dc.date.issued: 2025-11-24
dc.date.submitted: 2025-11-14
dc.description.abstract: Protecting our personal information is a major challenge in today's data-driven world. When scientists and companies analyze large datasets, they need a way to ensure that individual privacy is not compromised. This thesis focuses on Differential Privacy, a rigorous mathematical guarantee that places a strict, verifiable limit on how much personal information can leak, even against a worst-case adversary. Researchers have developed sophisticated algorithms that accomplish useful tasks, such as building machine learning models or generating realistic synthetic data, while maintaining Differential Privacy. Crucially, these operations must be conducted within a predetermined, strict limit, often referred to as the "privacy budget." This budget mathematically quantifies the total acceptable privacy loss for the entire process, enforcing a fundamental trade-off between data utility and individual protection. All routine procedures of the machine learning pipeline, including data cleaning, hyperparameter tuning, and model training, must be performed within this budget. Several tools can perform these tasks in isolation when the dataset is non-private. However, these tools do not translate easily to differential privacy and often do not account for cumulative privacy costs. In this thesis, we explore pragmatic problems that a data science practitioner may face when deploying a differentially private learning framework, from data collection to model training. In particular, we are interested in real-world data quality problems, such as missing data, inconsistent data, and incorrectly labeled data, as well as machine learning pipeline requirements, including hyperparameter tuning.
We envision a general-purpose private learning framework that accepts real data as input and supports learning tasks such as training a highly accurate private machine learning model or creating a synthetic version of the dataset, with end-to-end differential privacy guarantees. We hope this work will make differentially private learning more accessible to data science practitioners and easily deployable in day-to-day applications.
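The abstract's notions of a "privacy budget" and of spending it across pipeline stages can be illustrated with the classic Laplace mechanism. This is a minimal, generic sketch, not code from the thesis: the stage names and epsilon values are hypothetical, and the split relies on basic sequential composition (stage epsilons summing to the total budget).

```python
import math
import random
import statistics

def laplace_mechanism(true_count, epsilon, rng):
    # Laplace mechanism for a counting query (sensitivity 1):
    # adding Laplace(0, 1/epsilon) noise yields an epsilon-DP release.
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# Hypothetical split of a total budget across pipeline stages; under
# basic sequential composition the per-stage epsilons sum to the total.
total_epsilon = 1.0
stages = {"cleaning": 0.2, "tuning": 0.3, "training": 0.5}
assert abs(sum(stages.values()) - total_epsilon) < 1e-9

rng = random.Random(0)
releases = [laplace_mechanism(100, stages["training"], rng) for _ in range(10_000)]
# The noise is zero-mean, so averaged releases concentrate near the true count,
# while any single release hides an individual's contribution.
mean_release = statistics.mean(releases)
```

A smaller per-stage epsilon means stronger privacy but noisier answers, which is exactly the utility/privacy trade-off the abstract describes.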
dc.identifier.uri: https://hdl.handle.net/10012/22645
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.subject: Data Privacy
dc.subject: Databases
dc.subject: Machine Learning
dc.subject: Differential Privacy
dc.title: Towards Honest, Practicable and Efficient Private Learning
dc.type: Doctoral Thesis
uws-etd.degree: Doctor of Philosophy
uws-etd.degree.department: David R. Cheriton School of Computer Science
uws-etd.degree.discipline: Computer Science
uws-etd.degree.grantor: University of Waterloo
uws-etd.embargo.terms: 0
uws.contributor.advisor: He, Xi
uws.contributor.affiliation1: Faculty of Mathematics
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text

Files

Original bundle
Name: Mohapatra_Shubhankar.pdf
Size: 30.11 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 6.4 KB
Description: Item-specific license agreed upon at submission