UWSpace

UWSpace is the University of Waterloo’s institutional repository for the free, secure, and long-term home of research produced by faculty, students, and staff.

Depositing Theses/Dissertations or Research to UWSpace

Are you a Graduate Student depositing your thesis to UWSpace? See our Thesis Deposit Help and UWSpace Thesis FAQ pages to learn more.

Are you a Faculty or Staff member depositing research to UWSpace? See our Waterloo Research Deposit Help and Self-Archiving pages to learn more.

Photo by Waterloo staff

Recent Submissions

  • Item type: Item ,
    SQLyzr: A Comprehensive Benchmark and Framework for Evaluating Text-to-SQL Systems
    (University of Waterloo, 2026-04-23) Abedini, Sepideh
    Natural language–to–SQL (text-to-SQL) systems aim to enable users to interact with relational databases using natural language instead of SQL. Recent advances in large language models have significantly improved the performance of these systems, making them increasingly practical for real-world applications. With the rapid pace of progress and the growing adoption of text-to-SQL systems, robust benchmarking has become essential. However, existing benchmarks typically rely on a single correctness metric, lack alignment with real-world query usage patterns, and do not evaluate the scalability of generated queries, which limits their ability to provide realistic and practically meaningful evaluation. This thesis introduces SQLyzr, a comprehensive text-to-SQL benchmark and evaluation framework designed to address these limitations. SQLyzr incorporates a fine-grained taxonomy of SQL queries and reports evaluation results at the level of query categories and subcategories, enabling detailed insights into system performance across different query types. In addition, SQLyzr extends traditional evaluation by introducing complementary metrics that assess not only the correctness but also the efficiency and structural complexity of generated SQL queries. To better reflect real-world usage, SQLyzr aligns the distribution of query categories with empirical SQL workload distributions and supports dataset scaling to enable evaluation on larger databases. Building on these ideas, we also introduce a configurable text-to-SQL benchmarking framework that allows users to customize and extend benchmark components such as workloads, datasets, and evaluation metrics. The framework further provides novel features such as detailed error analysis for identifying incorrect queries with minor issues and workload augmentation for synthesizing additional question-SQL pairs that target weaknesses of a given text-to-SQL system. We use SQLyzr to evaluate two state-of-the-art text-to-SQL systems with similar overall correctness scores. Our results demonstrate that SQLyzr enables clearer comparison between systems and reveals deeper insights into their relative strengths and weaknesses.
  • Item type: Item ,
    State Complexity of Linear Relations and Linear Subsequences of Automatic Sequences
    (University of Waterloo, 2026-04-23) Moradi, Delaram
    In this thesis, we study the state complexity of specific formal languages; for example, we study the number of states required in the minimal automaton reading the representation of two integers $i, j$ in parallel and accepting them if and only $i+c = j$ for some constant integer $c \geq 1$. We also study the state complexity of linear subsequences of automatic sequences; for example, we study the number of states required in the minimal automaton generating the linear subsequence $(h(i+c))_{i \geq 0}$ for some automatic sequences $(h(i))_{i \geq 0}$ and some constant integer $c \geq 1$. Moreover, we study the runtime complexity of generating automata for specific formal languages and linear subsequences of automatic sequences using a reasonable interpretation of B\"uchi arithmetic; for example we study the runtime complexity of creating an automaton reading the representation of two integers $i, j$ in parallel and accepting them if and only if $ni=j$ for some constant $n \geq 2$. We also state some open problems. The above topics are studied both for automata with input in base-$k$ representation for some integer $k \geq 2$ and for automata with input in Fibonacci representation. Most results are for automata reading input in most-significant-digit-first format and some results are stated for automata reading input in least-significant-digit-first format.
  • Item type: Item ,
    Understanding the Impact of Training and Inference Components in Semantic Segmentation
    (University of Waterloo, 2026-04-23) Mahdizadeh Sani, Matina
    Semantic segmentation is one of the core tasks in computer vision, with applications ranging from autonomous driving to medical image analysis. Recent advances have produced increasingly powerful segmentation models based on both convolutional and transformer based architectures. Despite these advances, segmentation performance depends not only on model architecture, but also on a range of design choices made during training and inference. In practice, different components are often adopted from prior work without systematic analysis. As a result, the contribution of individual design choices to final segmentation performance is not understood sufficiently. This thesis presents a systematic experimental analysis of key components in modern semantic segmentation frameworks, focusing on loss functions, inference strategies, and matching techniques. Motivated by the observation that transformer based segmentation models often rely on inference strategies that differ from the classic approaches commonly used in CNN based architectures, this thesis first investigates the effect of transferring inference strategies across model families. Specifically, classic inference methods that are well established for CNNs are applied to a transformer based model, while inference strategies originally developed for transformers are evaluated within a CNN based framework. This analysis aims to determine whether inference strategies are inherently architecture specific or can be interchanged across model classes. Following the same cross architecture perspective, the thesis further explores matching strategies by replacing bipartite matching, commonly used in transformer based models, with fixed matching inspired by CNN based formulations, and also applying bipartite matching to a CNN based model. Building on these analysis, the thesis systematically studies the role of loss functions by evaluating a wide range of loss formulations both in isolation and in combination. This includes examining how different losses interact with inference strategies and how their effectiveness varies across architectures. These experiments disentangle the individual and combined effects of loss design choices on segmentation performance. The study evaluates both a CNN based and a transformer based model, using DeepLabV3+ and Mask2Former as representative frameworks, and conducts experiments on the Cityscapes and Pascal VOC 2012 datasets under controlled settings. The results show that inference strategies exhibit architecture dependent behavior in which classic inference generally performs better for CNN based models, while semantic inference is more effective for transformer based architectures. The experiments further demonstrate that fixed matching can consistently outperform bipartite matching. Additionally, the effect of class level supervision is found to be configuration dependent, providing benefits in some settings but offering limited or inconsistent gains in others. By examining inference, loss, and matching components through systematic cross-architecture transfer, this thesis clarifies which segmentation design choices are architecture-specific and which can be effectively borrowed across model families. The findings provide practical guidance for reusing well performing components from one architecture in another, enabling effective semantic segmentation systems.
  • Item type: Item ,
    On the Statistical Query Complexity of Learning Semiautomata: a Random Walk Approach
    (University of Waterloo, 2026-04-23) Giapitzakis Tzintanos, George
    Semiautomata form a rich class of sequence-processing algorithms with applications in natural language processing, robotics, computational biology, and data mining. We establish the first Statistical Query hardness result for semiautomata under the uniform distribution over input words and initial states. We show that Statistical Query hardness can be established when both the alphabet size and input length are polynomial in the number of states. Unlike the case of deterministic finite automata, where hardness typically arises through the hardness of the language they recognize (e.g., parity), our result is derived solely from the internal state-transition structure of semiautomata. Our analysis reduces the task of distinguishing the final states of two automata to studying the behavior of a random walk on the group S_N x S_N. By applying tools from Fourier analysis and the representation theory of the symmetric group, we obtain tight spectral gap bounds, demonstrating that after a polynomial number of steps in the number of states, distinct semiautomata become nearly uncorrelated, yielding the desired hardness result.
  • Item type: Item ,
    Celebrity Anorexics and Fasting Girls: Purveyors of the Victorian “Culture of Anorexia”
    (University of Waterloo, 2026-04-23) Solylo, Rachel
    This Master’s thesis explores the historical continuity between Victorian fasting girls and 20th and 21st century anorexic celebrities. Both of these groups contributed to the spread of the Victorian “culture of anorexia”. The three main pathways of parallelisms between Victorian fasting girls and anorexic celebrities are seen through their embodying of aesthetic perfection (i.e., most significantly in the extremes of the sick role and waifhood), self-mastery and/or spirituality, and their isolation from normative society. Another major aspect of this Master’s thesis is the exploration of “pro-ana” online communities, and how these online spaces disseminate the culture of anorexia through Victorian fasting girls and anorexic celebrities. Case studies and examples of Victorian fasting girls and anorexic celebrities will be analyzed. I have employed a text analysis when looking at different posts and interactions between users on “pro-ana” online communities, namely Eating Disorder Support Forum (EDSF). The first part of this thesis is concerned with the historical background of anorexia nervosa and anorexia mirabilis, as well as defining “pro-ana”. Next, Victorian fasting girls Sarah Jacob and Mollie Fancher are analyzed and their biographies are compared. The ways in which they portray the “sick role” and become arbiters of Victorian spirituality is studied. The methods in which they were stigmatized by medical professionals and othered by mainstream society are explored. The second part of this thesis looks at the case studies of Karen Carpenter and Kate Moss and their contributions as anorexic celebrities. I provide a disclaimer that Kate Moss has never been confirmed to have anorexia; rather she is accused of this in the media. I explore the ways they embody the “waif” role and how they have been constructed as masters of self-discipline through popular media discourse. The third part of the Master’s thesis builds on the case studies of Victorian fasting girls and anorexic celebrities, and looks at their influence in the spread of the Victorian “culture of anorexia” within “pro-ana” online communities. Search queries and anorexia-related search terms have been entered into the engine of EDSF. Smaller sample sizes from these results have been textually analyzed and their individual meanings are explored in the closing chapters of the thesis. This thesis contributes to the existing research on the Victorian history of anorexia nervosa as well as its historical relevance within the “pro-ana” online sphere.