Computer Science
Permanent URI for this collection: https://uwspace.uwaterloo.ca/handle/10012/9930
This is the collection for the University of Waterloo's Cheriton School of Computer Science.
Research outputs are organized by type (e.g., Master Thesis, Article, Conference Paper).
Waterloo faculty, students, and staff can contact us or visit the UWSpace guide to learn more about depositing their research.
Recent Submissions
Item: Efficient Algorithms for RDV graphs (University of Waterloo, 2025-05-22) Gokhale, Prashant Abhijit

In this thesis, we study the maximum matching and minimum dominating set problems in RDV graphs, i.e., graphs that are vertex-intersection graphs of downward paths in a rooted tree. Straightforward implementations of the standard algorithms for these problems require $O(n+m)$ time. We improve their efficiency by transforming the question about the neighborhood of a vertex $v$ into a type of range query amid a set of horizontal and vertical line segments. Our algorithms run in $O(n \log{n})$ time, presuming an $O(n)$-sized intersection representation of the graph is given. In addition, our techniques can also be used to obtain faster algorithms for maximum independent set and perfect $k$-clique packing in RDV graphs.
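To illustrate the kind of geometric transformation this abstract alludes to, here is a minimal Python sketch that answers queries of a vertical segment against horizontal segments via binary search on y-coordinates; it is a generic stand-in under assumed inputs, not the thesis's actual data structure.

    from bisect import bisect_left, bisect_right

    # Hypothetical sketch (not the thesis's implementation): a neighbourhood
    # query recast as a vertical segment probed against horizontal segments.
    def build(horizontals):
        """horizontals: iterable of (y, x1, x2); returns them sorted by y."""
        return sorted(horizontals)

    def query(sorted_h, x, y1, y2):
        """Report horizontal segments crossing the vertical segment {x} x [y1, y2]."""
        ys = [seg[0] for seg in sorted_h]            # could be precomputed once
        lo, hi = bisect_left(ys, y1), bisect_right(ys, y2)
        return [seg for seg in sorted_h[lo:hi] if seg[1] <= x <= seg[2]]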
Item: Puck Possession and Net Traffic Metrics in Ice Hockey (University of Waterloo, 2025-05-22) Pitassi, Miles

This thesis investigates two elements of hockey widely believed to be critical to success: puck possession and traffic (skaters who are in or near the triangular area formed between the puck and the posts during a shot attempt). Our analysis draws on puck and player tracking (PPT) data from the 2023–2024 and 2024–2025 NHL regular seasons. We determine average team puck possession percentage, defined as the average proportion of total game time that each team has possession of the puck, and find that this metric has only a moderate correlation with average goal differential (r=0.56). To further explore how different aspects of possession relate to team success, we compute additional metrics, including Average Offensive Zone Possession Time Differential (Avg. OZPTD), which captures the difference between the time a team spends with possession in the offensive zone and the time their opponents spend with possession in their offensive zone. We find a strong correlation (r=0.77) between Avg. OZPTD and average goal differential. Further analysis shows that Avg. OZPTD is stable across games, effectively distinguishes between teams, and, despite being correlated with existing metrics such as Shot Attempt Percentage (SAT%, also known as Corsi, the percentage of total shot attempts that each team takes), offers additional predictive value for goal differential. We also study the relationship between the number of skaters creating traffic during shot attempts and shot outcomes. Our findings show that increased traffic significantly increases the percentage of shot attempts that are blocked and reduces the chance of a shot attempt resulting in a shot on goal. Overall, we find that 29% of all shot attempts are blocked and that the highest goal rates occur from the center of the ice on short-to-mid-range attempts with no traffic present. For long-distance shot attempts that reach the goaltender, scoring probability increases with traffic. We also show that defensive skaters primarily reduce shot-on-goal rates but can inadvertently increase goal likelihood on mid-range shot attempts (23-45 feet), presumably due to screening their own goaltender. Together, the findings in this thesis offer valuable insights into how puck possession contributes to team success and how traffic influences shot outcomes. In addition to these empirical results, we contribute a suite of methodological techniques that can support future analysis of possession and traffic. We present a comprehensive pipeline for cleaning, filtering, and processing individual possession data sourced from the NHL's puck and player tracking system, which is an essential foundation for our findings and a resource for future research. We also describe how we assemble a set of shot events by aligning official NHL API shot data with shot attempts in the PPT data. This involves adapting the NHL's own inference algorithms to identify undetected shot attempts and applying custom techniques to improve timestamp accuracy.
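As a concrete illustration of the Avg. OZPTD metric defined above, a short Python sketch follows; the game-record layout and field names are hypothetical stand-ins, not the thesis's actual schema.

    from collections import defaultdict

    def avg_ozptd(games):
        """games: records like {"home": "TOR", "away": "BOS",
        "oz_poss_s": {"TOR": 312.4, "BOS": 287.9}}, giving each team's
        offensive-zone possession time in seconds for one game."""
        totals, counts = defaultdict(float), defaultdict(int)
        for g in games:
            h, a = g["home"], g["away"]
            diff = g["oz_poss_s"][h] - g["oz_poss_s"][a]
            totals[h] += diff; counts[h] += 1
            totals[a] -= diff; counts[a] += 1
        return {team: totals[team] / counts[team] for team in totals}

Correlating the per-team output with average goal differential (e.g., via statistics.correlation in Python 3.10+) reproduces the style of r-value analysis reported above.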
Item: What Slows Down FMware Development? An Empirical Study of Developer Challenges and Resolution Times (University of Waterloo, 2025-05-16) Wang, Zitao

Foundation Models (FMs), such as GPT-4, have revolutionized software engineering by enabling the development of FMware: applications and infrastructures built around these powerful models. Despite their transformative potential, FMware solutions face significant challenges in their development, deployment, and maintenance, particularly across cloud-based and on-premise platforms; this is because many of the goals, processes, tools, and technical assets of FMware development differ from those of traditional software systems. This study presents an empirical investigation of the current FMware ecosystem, focusing on three key questions: (1) what topics are most prevalent in issue discussions of FMware systems, (2) what specific challenges are commonly faced by FMware developers, and (3) what kinds of issues in FMware development have the longest resolution times? Our analysis uses data extracted from GitHub repositories of FMware systems as well as systems hosted on popular FMware platforms such as HuggingFace, GPTStore, Ora, and Poe. Our findings reveal a strong emphasis on education, content creation, and business strategy, alongside critical technical challenges such as memory errors, dependency management, and tokenizer configurations. We further identify bug reports and core functionality issues as the most common problem types on GitHub, and show that topics concerning code review, similarity search, and prompt template design take the longest to resolve. By uncovering insights into developer practices and pain points, this research highlights opportunities for improving FMware development tools, workflows, and community support. These insights contribute to a deeper understanding of the current FMware landscape and provide actionable recommendations for practitioners and researchers.

Item: Embedded System Anomaly Detection via Boot Power Trace Analysis (University of Waterloo, 2025-05-14) Qiao, Sky

Embedded systems play a crucial role in safety-critical domains, and it is essential to maintain their integrity. This thesis presents a robust framework for detecting hardware and firmware anomalies in embedded systems through boot-phase power consumption analysis. The proposed Sliding Window Anomaly Detection (SWAD) method establishes a nominal boot power profile and compares new boot traces against this baseline using sliding windows. By analyzing localized power dynamics, SWAD detects deviations caused by hardware or firmware modifications while accommodating natural variations in power behaviour. Experimental validation on single-board computers and flight controllers demonstrates the method's effectiveness in identifying diverse hardware and firmware attacks, achieving overall F1 scores of 98%, 96%, and 85% across the three systems used in the case studies. These results highlight the promising role of power side-channel analysis in enhancing security in complex embedded systems.
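A minimal sketch of the sliding-window comparison that SWAD-style detection performs, assuming a hypothetical window size, distance measure, and threshold; the thesis's actual profiling and calibration steps are not shown.

    import numpy as np

    def windows(trace, size, step):
        """Split a 1-D power trace into overlapping windows."""
        return [trace[i:i + size] for i in range(0, len(trace) - size + 1, step)]

    def swad_score(nominal, candidate, size=256, step=128):
        """Per-window RMS deviation of a candidate boot trace from the
        nominal profile (both assumed time-aligned and equal length)."""
        n_w, c_w = windows(nominal, size, step), windows(candidate, size, step)
        return np.array([np.sqrt(np.mean((c - n) ** 2)) for n, c in zip(n_w, c_w)])

    def is_anomalous(nominal, candidate, threshold):
        # Flag the boot if any localized window deviates beyond the threshold;
        # the threshold would be calibrated from natural boot-to-boot variation.
        return bool(np.any(swad_score(nominal, candidate) > threshold))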
Item: LiDAR-based 3D Perception from Multi-frame Point Clouds for Autonomous Driving (University of Waterloo, 2025-05-13) Huang, Chengjie

3D perception is a critical component of autonomous driving systems, where accurately detecting objects and understanding the surrounding environment is essential for safety. Recent advances in Light Detection and Ranging (LiDAR) technology and deep neural network architectures have enabled state-of-the-art (SOTA) methods to achieve high performance in 3D object detection and segmentation tasks. Many approaches leverage the sequential nature of LiDAR data by aggregating multiple consecutive scans to generate dense multi-frame point clouds. However, the challenges and applications of multi-frame point clouds have not been fully explored. This thesis makes three key contributions to advance the understanding and application of multi-frame point clouds in 3D perception tasks. First, we address the limitations of multi-frame point clouds in 3D object detection. Specifically, we observe that increasing the number of aggregated frames yields diminishing returns and even performance degradation, because objects respond differently to the number of aggregated frames. To overcome this trade-off, we propose an efficient adaptive method termed Variable Aggregation Detection (VADet). Instead of aggregating the entire scene using a fixed number of frames, VADet performs aggregation per object, with the number of frames determined by an object's observed properties, such as speed and point density. This adaptive approach reduces the inherent trade-offs of fixed aggregation, improving detection accuracy. Next, we tackle the challenge of applying multi-frame point clouds to 3D semantic segmentation. Point-wise prediction on dense multi-frame point clouds can be computationally expensive, especially for SOTA transformer-based architectures. To address this issue, we propose MFSeg, an efficient multi-frame 3D semantic segmentation framework. MFSeg aggregates point cloud sequences at the feature level and regularizes the feature extraction and aggregation process to reduce computational overhead without compromising accuracy. Additionally, by employing a lightweight MLP-based point decoder, MFSeg eliminates the need to upsample redundant points from past frames, further improving efficiency. Finally, we explore the use of multi-frame point clouds for cross-sensor domain adaptation. Based on the observation that multi-frame point clouds can weaken the distinct LiDAR scan patterns of stationary objects, we propose Stationary Object Aggregation Pseudo-labelling (SOAP) to generate high-quality pseudo-labels for 3D object detection in a target domain. In contrast to the current SOTA in-domain practice of aggregating a few input frames, SOAP utilizes entire sequences of point clouds to effectively reduce the sensor domain gap.
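A small Python sketch of per-object variable aggregation in the spirit of VADet; the frame-count rule below (thresholds on speed and point density) is an illustrative assumption, not the policy tuned in the thesis.

    import numpy as np

    def frames_for_object(speed_mps, point_density, max_frames=16):
        """Hypothetical rule: fast-moving objects smear when aggregated, so use
        fewer frames; sparse, slow objects benefit from more accumulation."""
        if speed_mps > 5.0:
            return 2
        if point_density < 50:      # points per cubic metre, illustrative unit
            return max_frames
        return 8

    def aggregate_object(points_per_frame, speed_mps, density):
        """points_per_frame: list of (N_i, 3) point arrays for one object,
        newest last; concatenates the most recent k frames."""
        k = frames_for_object(speed_mps, density, max_frames=len(points_per_frame))
        return np.concatenate(points_per_frame[-k:], axis=0)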
Item: Finding Behavioural Biometrics Scripts on the Web Using Dynamic Taint Analysis (University of Waterloo, 2025-05-13) Bara, Alexandru

In an era of escalating cyber threats, behavioural biometrics have emerged as a transformative security mechanism, leveraging user interaction patterns like keystrokes and mouse movements for continuous authentication on the web. However, detecting the scripts that implement these techniques at scale remains challenging due to obfuscation, dynamic execution, and overlap with analytics tools. This thesis addresses these challenges through three interconnected contributions: (1) enhancing FoxHound, a dynamic taint analysis tool, to achieve 97% effectiveness in tracking behavioural biometric data flows; (2) developing the first open-source checkout crawler to navigate e-commerce workflows with upwards of 78% accuracy; and (3) creating a machine learning classifier to distinguish behavioural biometric scripts from other tracking scripts. Large-scale analyses reveal that behavioural biometric scripts are deployed on 0.3% of top websites, with significantly higher adoption on sensitive pages (4.55% of banking logins). The work concludes with ethical recommendations to balance security benefits with privacy risks, advocating for transparency, deobfuscation, and regulatory oversight.

Item: Program Reduction: Versatility, Insights, and Efficacy (University of Waterloo, 2025-05-08) Zhang, Mengxiao

Given a program P and a property ψ it preserves, the goal of program reduction is to minimize this program while ensuring that the minimized version Pmin still preserves ψ. Program reduction is a widely used technique to facilitate compiler debugging, as most compiler bugs are triggered by program inputs, which often contain a significant amount of code unrelated to the bug. Although program reduction automates the elimination of such bug-irrelevant components from a program, it can take hours to complete due to its trial-and-error strategy. Furthermore, even if the reduction process produces a smaller program after a long wait, the resulting program is not guaranteed to be optimal, as program reduction is an NP-complete problem. As a result, compiler developers consistently seek program reduction approaches that offer faster reduction speeds or produce smaller outputs, and it is critical to design superior approaches to further explore the potential of this field. This thesis aims to enhance program reduction approaches in three ways. The first study enhances the versatility of program reduction. While existing techniques, such as C-Reduce and Perses, can effectively reduce a bug-triggering program as a whole, they fail to consider the varied degrees of relevance each remaining element has to the bug. To address this limitation, this study proposes PPR, a new program reduction technique designed to minimize a pair of programs w.r.t. certain properties. Given a seed program Ps and its variant Pv with differing properties (e.g., Pv crashes a compiler while Ps does not), PPR produces a pair of minimized programs that preserve these properties separately, with the reduced differences highlighting bug-related elements. Evaluation results demonstrate that PPR effectively reduces pairs of programs, producing outputs of comparable size to those generated by Perses and C-Reduce, and significantly outperforms the Delta Debugging algorithm in isolating bug-related differences. The second study concentrates on understanding ProbDD, a state-of-the-art program reduction approach. ProbDD, a variant of ddmin, employs a probabilistic model to estimate the likelihood of each element being relevant to ψ, achieving superior performance compared to the Delta Debugging algorithm. However, the theoretical probabilistic model underlying ProbDD is intricate and not fully understood. To address this, the study conducts the first in-depth theoretical analysis of ProbDD, comprehending and demystifying this state-of-the-art approach from multiple perspectives. Building upon these insights, the study further proposes CDD, a simplified counter-based model that retains the core strengths of ProbDD. The evaluation demonstrates that CDD can achieve the same performance as ProbDD, despite being much simpler. The third study integrates large language models (LLMs) to improve the efficacy of program reduction. Existing program reduction techniques typically fall into two categories: generic approaches applicable to a wide range of languages but lacking domain-specific knowledge, and language-specific approaches that exploit detailed language knowledge but fail to generalize across multiple languages. Effectively combining language generality with language-specific expertise in program reduction remains unexplored. To this end, this study proposes LPR, the first LLM-aided technique leveraging LLMs to perform language-specific program reduction for multiple languages. The key insight is to synergize the language generality of existing program reducers, such as Perses, with the language-specific semantics learned by LLMs. The evaluation shows that LPR surpasses Vulcan by producing 24.93%, 4.47%, and 11.71% smaller programs, while reducing the reduction time by 10.77%, 34.88%, and 36.96%, on benchmarks in C, Rust, and JavaScript, respectively. Collectively, these studies advance the field of program reduction by enhancing its versatility, insights, and efficacy. By making reduction more effective, faster, and easier to understand, this thesis facilitates efficient debugging and robust compiler development.
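Because all three studies build on the trial-and-error loop of ddmin-style reduction, a compact generic sketch may help; the preserves predicate (e.g., "still crashes the compiler") is supplied by the caller, and this is the textbook scheme in simplified form, not PPR, CDD, or LPR itself.

    def reduce(elements, preserves):
        """Greedy ddmin-style loop: repeatedly try deleting chunks of the
        program while the property of interest continues to hold."""
        chunk = max(1, len(elements) // 2)
        while chunk >= 1:
            i, shrunk = 0, False
            while i < len(elements):
                candidate = elements[:i] + elements[i + chunk:]
                if candidate and preserves(candidate):
                    elements, shrunk = candidate, True   # keep the smaller program
                else:
                    i += chunk                           # this chunk is needed
            chunk = chunk if shrunk else chunk // 2      # refine granularity
        return elements

In practice, preserves would serialize the candidate, invoke the compiler, and check for the original failure signature.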
Item: Paidian Playful Interaction in Non-game User Interfaces (University of Waterloo, 2025-05-07) Lakier, Matthew

I investigate "paidian" play in the context of non-game software user interfaces. Paidian play means activities that are open-ended, exploratory, and free-form. In an initial investigation, I characterize 16 types of playful experiences, 7 characteristics of play in software, and guidelines for the role of play in user interfaces, based on a qualitative analysis of a series of surveys and a brainstorming session with experts. As a case study of inspiring design with paidian play, a second investigation focuses on Easter eggs (hidden features in software applications), an interface feature associated with one of the playful experience types. I analyze source code repositories of open-source software containing Easter eggs, scrape online Easter egg databases, and interview developers of well-known software containing Easter eggs, to characterize 14 different Easter egg purposes. Results show that Easter eggs provide significant value to developers and users, for example, by enabling recruitment of new developers and teaching users transferable knowledge and skills. I also propose implications for how Easter eggs could be applied in new ways, such as providing educational value. In a third investigation, as a design case study for paidian play, I define and implement "digital knick-knacks" as a form of playful digital possession (e.g., a virtual pet). I deploy three exemplar designs in a diary study, in which participants customize and install a digital knick-knack on a personal device. The investigation reveals implications for how playful digital knick-knacks can bring joy and even support mental health. Taken together, the investigations show how user interfaces can be designed to provide social and emotional value to users through paidian play, including in workplace contexts.

Item: Optimizing ORAM Datastores for Scalability, Fault Tolerance, and Performance (University of Waterloo, 2025-05-01) Setayesh, Amin

Oblivious RAM (ORAM) mitigates access pattern attacks, in which adversaries infer sensitive data by observing access patterns; these attacks can compromise privacy even when data is encrypted. While ORAM ensures privacy by obfuscating these patterns, its adoption in cloud environments faces significant challenges related to scalability, fault tolerance, and performance. This thesis presents Treebeard, an ORAM-based datastore that addresses these challenges through a novel multi-layer architecture. Unlike traditional ORAM systems that rely on a centralized proxy to manage data access and security, this design separates responsibilities across specialized layers that are independently scalable. Each layer handles distinct functionality and efficiently batches and processes requests. Treebeard facilitates horizontal scaling and adds fault tolerance by eliminating single points of failure. Experiments show that Treebeard is scalable, highly performant, and fault-tolerant, outperforming existing ORAM systems in throughput while simultaneously addressing scalability and fault tolerance in its design.

Item: Latra: A Template-Based Language-Agnostic Transformation Framework for Program Reduction (University of Waterloo, 2025-04-29) Wang, Yiran

Program reducers are essential for debugging compilers and interpreters, yet existing tools face a fundamental trade-off. Language-specific reducers, such as C-Reduce and ddSMT, offer highly effective reductions but require substantial engineering effort for each target language. Conversely, language-agnostic reducers, like Vulcan, sacrifice effectiveness for broad applicability. To bridge this gap, we present Latra, a novel template-based framework that balances both aspects, enabling general, effective, targeted program reduction. Latra combines language-agnostic reduction with user-defined, language-specific transformations, facilitated through a user-friendly domain-specific language based on simple matching and rewriting templates. This minimizes the need for deep formal-grammar knowledge and empowers users to tailor reductions to specific languages with reduced implementation overhead. Evaluation shows that Latra significantly outperforms Vulcan: it reduces 33.77% more tokens in C and 9.17% more tokens in SMT-LIB, with 32.27% faster execution in SMT-LIB. Notably, Latra closely matches the effectiveness of the language-specific reducers C-Reduce and ddSMT (89 vs. 85 and 103 vs. 109 tokens) while requiring far less engineering effort (167 vs. 5,508 and 62 vs. 118 lines of code). We believe that Latra provides a practical and cost-efficient approach to program reduction, effectively balancing language-specific effectiveness with language-agnostic generality.
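To illustrate what matching-and-rewriting templates can look like, here is a small Python sketch of template-guided rewriting over program text; the template syntax and engine are hypothetical stand-ins, not Latra's actual DSL.

    import re

    # Hypothetical template rule: match a parenthesized addition and try
    # rewriting it to its left operand, a typical expression-level reduction.
    RULE = (re.compile(r"\((?P<lhs>[^()+]+)\+(?P<rhs>[^()+]+)\)"), r"\g<lhs>")

    def apply_rule(program, rule, preserves):
        """Try each match left to right; keep a rewrite only if the property
        of interest (e.g., the program still triggers the bug) holds.
        Assumes rewrites shrink the program, as reduction templates do."""
        pattern, replacement = rule
        pos = 0
        while (m := pattern.search(program, pos)):
            candidate = program[:m.start()] + m.expand(replacement) + program[m.end():]
            if preserves(candidate):
                program = candidate      # accept and rescan from here
            else:
                pos = m.start() + 1      # reject and move past this match
        return program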
Item: Characterizing Recheck Builds in Continuous Integration (University of Waterloo, 2025-04-29) Brus, Yelizaveta

Continuous Integration (CI) is a critical component of modern software development, automating the process of downloading, compiling, and testing patch sets. Unsurprisingly, it periodically fails due to non-deterministic (a.k.a. “flaky”) behaviour. Since a patch set may not be the cause of a flaky failure, developers may issue a “recheck” command to request that the CI re-execute a build on that patch set. While necessary in some cases, prior work shows that rechecks are often issued carelessly, leading to substantial waste: in the OpenStack community alone, an estimated 187.4 compute-years have been wasted on rechecks that resulted in repeated failures. As software development scales, optimizing CI efficiency by reducing wasteful rechecks is essential for improving resource utilization, reducing operational costs, and enhancing developer productivity. To mitigate unnecessary rechecks, I fit and analyze statistical models that discriminate between recheck requests where a failing outcome will (a) change to passing (i.e., successful rechecks) or (b) continue to fail (i.e., failed rechecks). My empirical study is based on 314,947 recheck requests collected from OpenStack over a 10-year period. I extract and analyze features related to bot behaviour, job outcomes, user activity, patch characteristics, and the timing of rechecks. Using logistic regression with restricted cubic splines, I model nonlinear relationships and evaluate predictive performance using AUROC, AUPRC, and the Brier score. My model achieves an AUROC of 0.736, outperforming baseline approaches by 23.6 percentage points (or 47.2%), while maintaining strong calibration with a Brier score of 0.191. The model also produces an AUPRC of 0.604, exceeding baseline performance by 73%. If the requests that my model identifies as failed rechecks were skipped, 86.49% of wasted recheck requests could have been avoided, saving substantial CI resources. My analysis reveals that the historical success rates of jobs, bots, and users are the strongest predictors of recheck outcomes. Feature importance analysis shows that the “jobs success ratio” and “bots success ratio” together account for 50% of the explanatory power, indicating that certain jobs and bots are more prone to triggering unnecessary rechecks. In contrast, static patch characteristics, such as the number of modified lines or affected files, contribute minimally to predictive performance. I also investigate the impact of different time windows for feature computation, finding that model performance remains stable regardless of the amount of historical data used: AUROC varies by only 0.59% between a one-day and an all-time window, indicating that both short-term and long-term data can be used effectively to predict recheck outcomes. Guided by the analysis, I suggest practical steps to improve CI efficiency. Since authors have little control over the past behaviour of their patch sets, the model's feedback may be disheartening for individual contributors who wish to avoid issuing useless rechecks. Instead of applying model insights at the patch-set level, my findings indicate that misbehaving bots could be throttled to limit excessive rechecks, unreliable jobs could have their voting power revoked, and automated feedback mechanisms could be introduced to discourage wasteful recheck requests. Additionally, organizations could integrate model-driven insights into CI dashboards, allowing teams to monitor inefficiencies and take corrective action in real time. By focusing on process-level improvements rather than individual developer actions, CI workflows can be optimized to reduce resource waste, accelerate build turnaround times, and enhance the overall developer experience.
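A brief sketch of this modelling setup using scikit-learn; B-splines stand in for the restricted cubic splines used in the study, and the features and labels below are synthetic placeholders for illustration only.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import SplineTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score, brier_score_loss

    # Illustrative features: job/bot historical success ratios, patch size.
    rng = np.random.default_rng(0)
    X = rng.random((1000, 3))                      # jobs_ratio, bots_ratio, lines
    y = (rng.random(1000) < X[:, 0]).astype(int)   # toy labels for the sketch

    model = make_pipeline(
        SplineTransformer(degree=3, n_knots=5),    # nonlinear spline basis
        LogisticRegression(max_iter=1000),
    )
    model.fit(X, y)
    p = model.predict_proba(X)[:, 1]
    print(roc_auc_score(y, p), brier_score_loss(y, p))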
Item: What Do You Mean? Using Large Language Models for Semantic Evaluation of NL2SQL Queries (University of Waterloo, 2025-04-23) Noor, Harrum

While significant research focuses on improving natural language to SQL (NL2SQL) translation, the evaluation of generated queries remains an understudied problem. The current metrics, Exact Match (EM) and Execution Accuracy (EA), fail to address critical nuances: EM prioritizes syntactic fidelity over semantic equivalence, penalizing valid query variations, while EA's reliance on execution results introduces false positives when incorrect SQL coincidentally returns correct output due to dataset characteristics. These limitations make existing frameworks inadequate for real-world deployment, where robustness and intent preservation are of key importance. The advent of large language models (LLMs) offers new opportunities to improve these evaluation methodologies. We propose a hybrid framework that integrates execution-based validation with LLM-driven semantic analysis to address these gaps. Our pipeline validates candidate SQL queries through EA and then applies Qwen 2.5 Coder, a 1.5B-parameter model optimized for code generation, to perform semantic equivalence checks. This two-stage process eliminates false positives by detecting logical inconsistencies (e.g., incorrect JOIN conditions or aliases) that EA overlooks. Crucially, Qwen 2.5 operates without database connectivity or schema linking, enabling schema-agnostic evaluation through cross-attention between natural language questions and SQL. Experiments on a dataset combining Spider and other sources demonstrate that pairing EA with Qwen 2.5's generative reasoning achieves 94% accuracy, and the framework identifies 100% of EA's false positives. Ablation studies reveal that encoder models such as Microsoft's CodeBERT perform best on simple SELECT-WHERE queries (94% F1), whereas Qwen 2.5 maintains 90% consistency across all complexity levels. This work establishes a new paradigm for NL2SQL evaluation.
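A condensed Python sketch of such a two-stage evaluation loop, using SQLite for the execution check and an abstract semantic_check callable standing in for the LLM judge; this mirrors the structure described above rather than the exact implementation.

    import sqlite3

    def execution_match(db_path, gold_sql, pred_sql):
        """Stage 1, Execution Accuracy: compare sorted result rows."""
        con = sqlite3.connect(db_path)
        try:
            gold = sorted(map(repr, con.execute(gold_sql).fetchall()))
            pred = sorted(map(repr, con.execute(pred_sql).fetchall()))
            return gold == pred
        except sqlite3.Error:
            return False
        finally:
            con.close()

    def evaluate(db_path, question, gold_sql, pred_sql, semantic_check):
        """Stage 2 runs only when execution results agree, to filter out
        coincidental matches (EA's false positives).
        semantic_check(question, gold_sql, pred_sql) -> bool is the LLM judge."""
        if not execution_match(db_path, gold_sql, pred_sql):
            return "incorrect"
        return "correct" if semantic_check(question, gold_sql, pred_sql) else "false positive"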
Item: NaviX: A Native Vector Index Design for Graph DBMSs With Robust Predicate-Agnostic Search Performance (University of Waterloo, 2025-04-17) Sehgal, Gaurav

There is an increasing demand for extending existing DBMSs with vector indices to become unified systems that can support modern predictive applications, which require joint querying of vector embeddings and structured properties and connections of objects. We present NaviX, a native vector index for graph DBMSs (GDBMSs) with two main design goals. First, we aim to implement a disk-based vector index that leverages the core storage and query processing capabilities of the underlying GDBMS. To this end, NaviX is a hierarchical navigable small world (HNSW) index, which is itself a graph-based structure. Second, we aim to evaluate predicate-agnostic vector search queries, where the k nearest neighbors (kNNs) of a query vector $v_Q$ are searched across an arbitrary subset $S$ of vectors specified by an ad-hoc selection subquery $Q_S$. We adopt a prefiltering-based approach that evaluates $Q_S$ first and passes the full information about $S$ to the kNN search operator. We study how to design a prefiltering-based search algorithm that is robust under different selectivities as well as correlations of $S$ with $v_Q$. We propose an adaptive algorithm that utilizes the local selectivity of each vector in the HNSW graph to pick a suitable heuristic at each iteration of the kNN search algorithm. We demonstrate NaviX's robustness and efficiency through extensive experiments against both prefiltering- and postfiltering-based baselines, including specialized vector databases as well as DBMSs.
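A simplified Python sketch of prefiltering-based kNN search over a proximity graph, where the selected subset S is known up front; the adaptive per-iteration heuristic selection NaviX describes is beyond this sketch's scope, and all names here are illustrative.

    import heapq
    import numpy as np

    def greedy_knn(graph, vectors, query, S, entry, k=10):
        """graph: adjacency dict {node: [neighbors]}; S: set of selected ids.
        Best-first search that only returns nodes in S but may traverse
        others to stay connected (a common prefiltering strategy)."""
        dist = lambda u: float(np.linalg.norm(vectors[u] - query))
        visited, frontier = {entry}, [(dist(entry), entry)]
        results = []                                  # max-heap via negation
        while frontier:
            d, u = heapq.heappop(frontier)
            if len(results) == k and d > -results[0][0]:
                break                                 # no closer results reachable
            if u in S:
                heapq.heappush(results, (-d, u))
                if len(results) > k:
                    heapq.heappop(results)
            for v in graph[u]:
                if v not in visited:
                    visited.add(v)
                    heapq.heappush(frontier, (dist(v), v))
        return sorted((-nd, u) for nd, u in results)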
Item: MIRAGE-ANNS: Mixed Approach Graph-based Indexing for Approximate Nearest Neighbour Search (University of Waterloo, 2025-04-14) Voruganti, Sairaj

Approximate nearest neighbor search (ANNS) on high-dimensional vectors is important for numerous applications, such as search engines, recommendation systems, and, more recently, large language models (LLMs), where Retrieval Augmented Generation (RAG) is used to add context to an LLM query. Graph-based indexes built on these vectors have been shown to perform best but have challenges. These indexes employ either refinement-based construction strategies, such as K-Graph and NSG, or increment-based strategies, such as HNSW. Refinement-based approaches have fast construction times but worse search performance, and they do not allow incremental inserts, requiring a full reconstruction each time new vectors are added to the index. Increment-based approaches have good search performance and allow incremental inserts, but suffer from slow construction. This work presents MIRAGE-ANNS (Mixed Incremental Refinement Approach Graph-based Exploration for Approximate Nearest Neighbor Search), which constructs the index as fast as refinement-based approaches while retaining search performance comparable to or better than increment-based ones, and also allows incremental inserts. We show that MIRAGE achieves state-of-the-art construction and query performance, outperforming existing methods by up to 2x query throughput on real-world datasets.

Item: A Modular Framework for the Plausible Depiction of Starburst Patterns (University of Waterloo, 2025-04-11) Sun, Yuxiang

When human viewers look at an intense light source, they can observe a set of dense spikes radiating from its center. The observed spike patterns result from a complex optical process known as the starburst phenomenon. These patterns have been frequently depicted in virtual applications (e.g., films and video games) to enhance the perception of brightness. From a broader scientific perspective, they also have relevant real-life implications, such as impairing driving safety; this may occur when an observed starburst pattern saturates a driver's field of view. Previous computer graphics works on the simulation of the starburst phenomenon have primarily relied on the assumption that the resulting patterns are formed by light being obscured and diffracted by particles in the observer's eyeball during the focusing process. However, the key role played by background luminance in the perception of starburst patterns has been consistently overlooked. By also taking into account this pivotal factor, as well as the different physiological characteristics of the human photoreceptor cells, we propose a modular framework capable of producing plausible visual depictions of starburst patterns for daytime and nighttime scenes. To enhance the physical correctness and resolution of the simulated patterns, the framework's formulation was guided by Rayleigh-Sommerfeld diffraction theory and implemented using the Chirp Z-Transform, respectively. We also introduce a biophysically-inspired algorithm to enable the seamless incorporation of the resulting patterns onto computer-rendered scenes. Besides examining theoretical and practical aspects associated with the use of the proposed framework in realistic image synthesis, we also derive biophysical insights that may contribute to the current knowledge about the inner workings of the starburst phenomenon.

Item: Bridging Technology and Therapy: Assessing the Quality and Analyzing the Impact of Human Editing on AI-Generated SOAP Notes in Pediatric Rehabilitation (University of Waterloo, 2025-04-07) Amenyo, Solomon

This thesis explores the integration of artificial intelligence (AI) into clinical documentation, focusing on evaluating AI-generated SOAP (Subjective, Objective, Assessment, and Plan) notes in pediatric rehabilitation settings. AI-powered tools, such as Microsoft's Copilot and the University of Waterloo-developed KAUWbot, offer potential efficiencies by automating aspects of clinical documentation. However, their quality, reliability, and applicability to clinical practice have remained largely unexamined. The research assesses and compares the quality of human-authored and AI-generated SOAP notes across five key criteria: clarity, completeness, conciseness, relevance, and organization. A dataset of 432 SOAP notes, divided into four pools, was evaluated using a custom rubric. The pools comprised human-authored notes, Copilot-generated notes edited by occupational therapists, unedited KAUWbot-generated notes, and KAUWbot-generated notes edited by occupational therapists. A rigorous anonymization process ensured evaluator impartiality. Findings indicate that AI-generated notes, particularly when edited by clinicians, achieve quality comparable or superior to human-authored notes. After editing, notes generated by KAUWbot, a model fine-tuned on pediatric occupational therapy data, exhibited notable improvements in relevance and organization. Statistical analyses demonstrated some differences among note pools, with edited AI-generated notes consistently receiving the highest ratings. This highlights the importance of human oversight in enhancing AI output and tailoring it to specific clinical needs. The research shows the potential of AI to augment clinical documentation processes, reduce clinician workload, and improve documentation quality. However, it also emphasizes the necessity of human-AI collaboration and robust training to mitigate limitations such as contextual inaccuracies and misclassifications. These findings provide a foundation for future research and practical recommendations for integrating AI into healthcare documentation workflows.
Item: Application Persistence, Performance, and Deployment in a UNIX-Compatible Single Level Store Operating System (University of Waterloo, 2025-04-01) Tsalapatis, Aimilios

This thesis presents the Aurora single level store, an OS design that uses continuous checkpointing for application persistence and deployment. Aurora provides submillisecond application checkpoint and restore operations to efficiently turn applications into on-disk images and back. Fast checkpointing/restore as an OS service also serves as a foundation for further research into open problems such as efficient persistence APIs for memory-mapped data and serverless computing. Aurora's single level store-based persistence has recently become practical because of advances in hardware and file system technology. Modern SSD storage devices have latencies as low as 10µs, allowing us to persist application checkpoints to the disk with minimal latency overhead. Modern CPUs also have IO throughput that rivals their memory bandwidth, making it possible to continuously checkpoint and forward in-memory application state to the disk. This thesis describes three systems that demonstrate the efficiency and flexibility of the single level store paradigm. We first present Aurora (SOSP 2021), an OS design capable of continuous application checkpointing at a fine enough granularity to provide transparent persistence. We follow up with MemSnap (ASPLOS 2024), an OS single level store API and associated virtual memory mechanism; MemSnap persists application data, e.g., database data, more efficiently than the file API. Finally, we present Metropolis, a serverless invoker that uses the single level store paradigm to create serverless function instances at submillisecond latency.

Item: Symbols, Dynamics, and Maps: A Neurosymbolic Approach to Spatial Cognition (University of Waterloo, 2025-03-12) Dumont, Nicole Sandra-Yaffa; Eliasmith, Chris; Orchard, Jeff

The discovery of various spatially sensitive neurons in the hippocampal formation, such as place, grid, and boundary cells, has provided valuable insights into the neural mechanisms underlying spatial representation and navigation. However, neural activity and connectivity data alone cannot fully reveal the brain's algorithms. Bridging this gap requires computational models that not only explain the low-level activity of spatially sensitive cells but also link it to higher-level symbolic representations manipulable within a cognitive framework: models capable of binding spatial representations to discrete abstractions, while also supporting hierarchical and probabilistic structures that enable reasoning and decision-making. The Semantic Pointer Architecture (SPA; Eliasmith, 2013), in combination with the Neural Engineering Framework (NEF; Eliasmith et al., 2003), provides a mathematical and computational framework to represent symbols and implement dynamical systems in spiking neural networks. Spatial Semantic Pointers (SSPs; Komer et al., 2019), an extension to the SPA, encode continuous variables, such as spatial locations, while supporting the binding of spatial information with other features, continuous or discrete, into compressed, multi-domain representations. This flexibility allows SSPs to model diverse cognitive processes, ranging from spatial memory to abstract reasoning, offering a unified theory for how continuous variables might be represented and manipulated in the brain. In this thesis, we leverage these tools to model key components of spatial cognition, including path integration, cognitive map creation, and reinforcement learning. Our contributions include the development of SSP-PI, an SSP-based path integration model that combines velocity-controlled oscillators with attractor dynamics to integrate continuous spatial variables. We also introduce SSP-SLAM, a biologically inspired spiking neural SLAM system capable of constructing semantic cognitive maps that bind and associate spatial and non-spatial features. Furthermore, we propose spiking RL models that demonstrate how SSP embeddings can effectively represent successor features, reward distributions, and stochastic policies. Finally, we use the SPA and SSPs to construct state embeddings for deep RL networks, demonstrating their utility in tasks requiring mixed semantic-spatial representations. Our findings underscore the potential of SSPs to act as a unifying framework for understanding spatial representation in the brain while advancing biologically inspired approaches to navigation and learning in artificial systems. This work bridges theoretical neuroscience and artificial intelligence, laying the groundwork for future explorations of shared principles across spatial and abstract cognition.
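To ground the idea of encoding continuous values with Spatial Semantic Pointers, here is a small NumPy sketch of fractional binding in the Fourier domain, in the spirit of Komer et al. (2019); the dimensionality and random base vector are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    D = 512  # SSP dimensionality (illustrative)

    def random_axis(d=D):
        """Unit-modulus rFFT coefficients defining one spatial axis."""
        phases = rng.uniform(-np.pi, np.pi, size=d // 2 + 1)
        phases[0] = 0.0          # DC bin must be real
        phases[-1] = 0.0         # Nyquist bin must be real (d even)
        return np.exp(1j * phases)

    def encode(axis_f, x, d=D):
        """Fractional binding: SSP(x) = inverse FFT of axis^x, a real vector."""
        return np.fft.irfft(axis_f ** x, n=d)

    def bind(a, b):
        """Circular convolution, the SPA's binding operation."""
        return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))

    X = random_axis()
    # Nearby positions yield similar vectors; distant ones are nearly orthogonal.
    print(np.dot(encode(X, 2.0), encode(X, 2.1)))   # relatively large
    print(np.dot(encode(X, 2.0), encode(X, 9.0)))   # near zero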
Item: Perspectives of Graph Diffusion: Computation, Local Partitioning, Statistical Recovery, and Applications (University of Waterloo, 2025-03-06) Yang, Shenghao; Fountoulakis, Kimon

Diffusion describes the process of mass moving from one region to another. In the context of graphs, the diffusing mass spreads from node to node along the edges of the graph. Broadly speaking, this includes a number of stochastic and deterministic processes such as random walks, heat diffusion, network flow, and electrical flow on graphs. Graph diffusion is a highly important primitive and has been shown to have a variety of surprising properties, both theoretically and practically. In this thesis, we present several new perspectives on graph diffusion, with an emphasis on how diffusion algorithms uncover the local clustering structure of the input data without necessarily exploring the entire graph. In the first two parts of the thesis, we introduce a new class of graph diffusion methods that are provably better at extracting the local clustering structure of a graph or a hypergraph. Here, diffusion is formulated as a pair of primal and dual convex optimization problems, based on the idea of spreading mass in the graph while minimizing a p-norm network flow cost. The primal solution of the diffusion problem provides an intuitive physical interpretation in which paint (i.e., mass) spills from the source nodes, spreads over the graph, and settles at a sink at each node where up to a certain amount of paint can accumulate. The dual solution embeds the nodes on the non-negative real line and is considered the output of diffusion. We show that the dual variables nicely encode the local clustering structure around a given set of seed nodes. In particular, assuming the existence of a cluster C of low conductance Φ(C), the sweep cut procedure on the dual variables returns a cluster whose conductance is not much larger than Φ(C). In the next two parts of the thesis, we introduce a weighted diffusion mechanism that allows any existing diffusion method to take into account additional node information, such as node attributes and labels. The method weights the edges of the graph based on the attributes or labels of each node. Depending on the nature and availability of the additional node information, two simple yet effective edge-weighting schemes are introduced and analyzed. Over contextual random graphs generated by a local variant of the stochastic block model with noisy node information, we show that, if the additional information contains enough signal about the ground-truth cluster, then employing existing diffusion algorithms on the weighted graph recovers the ground-truth cluster more accurately than diffusion on the original, unweighted graph. In particular, statistical recovery guarantees in terms of precision and F1 score are derived and compared. All of the results are supplemented with extensive experiments on both synthetic and real-world data to illustrate the technical results and the effectiveness of the new methods in practice. The code is open-source on GitHub.
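For reference, a minimal Python sketch of the sweep cut procedure mentioned above: nodes are sorted by their diffusion embedding and the prefix with the lowest conductance is returned. The diffusion solver that produces the embedding is assumed given.

    def sweep_cut(adj, embedding):
        """adj: dict node -> set of neighbours (undirected, unweighted).
        embedding: dict node -> dual value from a diffusion method.
        Returns the prefix set with the lowest conductance and its value."""
        order = sorted(embedding, key=embedding.get, reverse=True)
        total_vol = sum(len(adj[u]) for u in adj)
        in_set, vol, cut = set(), 0, 0
        best, best_phi = None, float("inf")
        for u in order[:-1]:                 # skip the full set
            in_set.add(u)
            vol += len(adj[u])
            # a neighbour already inside removes an edge from the cut,
            # a neighbour still outside adds one
            cut += sum(-1 if v in in_set else 1 for v in adj[u] if v != u)
            denom = min(vol, total_vol - vol)
            phi = cut / denom if denom > 0 else float("inf")
            if phi < best_phi:
                best, best_phi = set(in_set), phi
        return best, best_phi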
Item: Operating Systems are a Service (University of Waterloo, 2025-03-05) Hancock, Kenneth; Mashtizadeh, Ali

OS containers have set the standard for deploying applications in modern systems. OS containers are combined sandboxes and manifests of applications that isolate a running application and its dependencies from other applications running on top of the same kernel. Containers make it easy to provide multi-tenancy and control over the application, making them ideal for use within cloud architectures such as serverless. This thesis explores and develops novel systems to address three problems faced by containers and the services that use them. First, OS containers currently lack a fast checkpoint-restore mechanism. Second, container security is still inadequate because the underlying security mechanisms provide coarse-grained policies that are easily abused. Third, there is no benchmark for serverless clouds, one of the largest consumers of containers, and specifically for checkpoint-restore. This thesis outlines solutions to these problems. First, we present ObjSnap, a storage system designed and built for two modern single-level store systems, Aurora and MemSnap, which enables checkpoint-restore for container systems; ObjSnap is a transactional copy-on-write object store that can outperform other storage systems by up to 4×. Second, we introduce SlimSys, a framework that tackles security issues found within containers by binding policies to kernel resources. Lastly, we introduce Orcbench, the first benchmark for evaluating serverless orchestrators.