Pluralistic Alignment in Recursive Retraining of Generative Models

dc.contributor.author: Falahati, Ali
dc.date.accessioned: 2026-05-21T17:36:22Z
dc.date.available: 2026-05-21T17:36:22Z
dc.date.issued: 2026-05-21
dc.date.submitted: 2026-05-18
dc.description.abstract: Reinforcement Learning from Human Feedback (RLHF) is widely used to align generative models with human preferences. However, most work studies alignment as a one-time procedure applied to a fixed dataset. In practice, training data is dynamic. Over time, generative models begin to train on curated outputs produced by earlier generations, creating a feedback loop of recursive retraining. In this setting, alignment is a dynamic process in which curation decisions compound over time and continually shape the support, diversity, and alignment profile of future models. This thesis develops a framework for studying how alignment evolves under recursive retraining, focusing on how heterogeneous preferences interact through the Bradley-Terry-style pairwise comparison mechanisms used in curation. The thesis studies two cases of recursive curation. We begin by revisiting prior work on single-preference curation, which shows that repeatedly optimizing for a fixed preference can lead to degradation in quality, loss of diversity, and collapse toward a narrow subset of outputs. These findings have raised concerns that recursive training loops inevitably reinforce a single dominant preference over time. First, moving beyond the single-preference case, we study settings where multiple preferences jointly curate the data at each retraining step. Instead of reinforcing a single preference, the training data reflects a mixture of competing preferences. We show that in such settings, recursive retraining can maintain a range of desirable behaviors rather than collapsing, and the resulting models reflect a stable balance between different preferences. Second, the thesis analyzes recursive retraining with sequential curation by different stakeholders, a setting that reflects how alignment is applied in practice.
In real-world settings, model outputs are not curated by a single preference but in stages by different actors, such as model developers and end users, each with their own preferences. This raises a fundamental question: when different preferences curate sequentially over generations, how do the order and structure of curation shape the long-term behavior of the model? We show that the order in which preferences are applied plays a critical role. Recursive retraining can lead to consensus collapse, compromise on shared outcomes, or asymmetric influence in which one stakeholder’s preferences dominate over time. These dynamics highlight that alignment is determined not only by which preferences are present, but also by how they are introduced and reinforced across generations. Overall, we show that the long-term behavior of aligned generative models is not fixed, but depends on the structure of the retraining process. Alignment should therefore be understood as a mechanism design problem, where the way preferences are aggregated determines whether models collapse, compromise, or remain pluralistic.
dc.identifier.uri: https://hdl.handle.net/10012/23372
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo (en)
dc.title: Pluralistic Alignment in Recursive Retraining of Generative Models
dc.type: Master Thesis
uws-etd.degree: Master of Mathematics
uws-etd.degree.department: David R. Cheriton School of Computer Science
uws-etd.degree.discipline: Computer Science
uws-etd.degree.grantor: University of Waterloo (en)
uws-etd.embargo.terms: 0
uws.contributor.advisor: Golab, Lukasz
uws.contributor.affiliation1: Faculty of Mathematics
uws.peerReviewStatus: Unreviewed (en)
uws.published.city: Waterloo (en)
uws.published.country: Canada (en)
uws.published.province: Ontario (en)
uws.scholarLevel: Graduate (en)
uws.typeOfResource: Text (en)

Files

Original bundle

Name: Falahati_Ali.pdf
Size: 2.48 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission
Description: