Pluralistic Alignment in Recursive Retraining of Generative Models

Falahati, Ali

dc.contributor.author	Falahati, Ali
dc.date.accessioned	2026-05-21T17:36:22Z
dc.date.available	2026-05-21T17:36:22Z
dc.date.issued	2026-05-21
dc.date.submitted	2026-05-18
dc.description.abstract	Reinforcement Learning from Human Feedback (RLHF) is widely used to align generative models with human preferences. However, most work studies alignment as a one-time procedure applied to a fixed dataset. In practice, training data is dynamic. Over time, generative models begin to train on curated outputs produced by earlier generations, creating a feedback loop that leads to recursive retraining. In this setting, alignment is a dynamic process in which curation decisions compound over time and continually shape the support, diversity, and alignment profile of future models. This thesis develops a framework for studying how alignment evolves over time under recursive retraining, focusing on how heterogeneous preferences interact through Bradley-Terry style pairwise comparison mechanisms used in curation. The thesis studies two cases of recursive curation. We begin by revisiting prior work on single-preference curation, which shows that repeatedly optimizing for a fixed preference can lead to degradation in quality, loss of diversity, and collapse toward a narrow subset of outputs. These findings have raised concerns that recursive training loops inevitably reinforce a single dominant preference over time. Moving beyond this, we first study settings where multiple preferences jointly curate the data at each retraining step. Instead of reinforcing a single preference, the training data reflects a mixture of competing preferences. We show that in such settings, recursive retraining can maintain a range of desirable behaviors rather than collapsing, and the resulting models reflect a stable balance between different preferences. Second, the thesis analyzes recursive retraining with sequential curation by different stakeholders, a setting that reflects how alignment is applied in practice.
In real-world settings, model outputs are curated not by a single preference but in stages by different actors, such as model developers and end users, each with their own preferences. This raises a fundamental question: when different preferences curate sequentially over generations, how do the order and structure of curation shape the long-term behavior of the model? We show that the order in which preferences are applied plays a critical role. Recursive retraining can lead to consensus collapse, compromise on shared outcomes, or asymmetric influence where one stakeholder’s preferences dominate over time. These dynamics highlight that alignment is determined not only by which preferences are present but also by how they are introduced and reinforced across generations. Overall, we show that the long-term behavior of aligned generative models is not fixed, but depends on the structure of the retraining process. Alignment should therefore be understood as a mechanism design problem, where the way preferences are aggregated determines whether models collapse, compromise, or remain pluralistic.
dc.identifier.uri	https://hdl.handle.net/10012/23372
dc.language.iso	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.title	Pluralistic Alignment in Recursive Retraining of Generative Models
dc.type	Master Thesis
uws-etd.degree	Master of Mathematics
uws-etd.degree.department	David R. Cheriton School of Computer Science
uws-etd.degree.discipline	Computer Science
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	0
uws.contributor.advisor	Golab, Lukasz
uws.contributor.affiliation1	Faculty of Mathematics
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Name: Falahati_Ali.pdf
Size: 2.48 MB
Format: Adobe Portable Document Format

Collections

Theses
Computer Science