Computer Science
Permanent URI for this collection: https://uwspace.uwaterloo.ca/handle/10012/9930
This is the collection for the University of Waterloo's Cheriton School of Computer Science.
Research outputs are organized by type (e.g., Master's Thesis, Article, Conference Paper).
Waterloo faculty, students, and staff can contact us or visit the UWSpace guide to learn more about depositing their research.
Recent Submissions
Item The Role of Modularization in Minimizing Vulnerability Propagation and Enhancing SCA Precision (University of Waterloo, 2024-10-24) Abdollahpour, Mohammad Mahdi
In today's software development landscape, the use of third-party libraries is near-ubiquitous; leveraging third-party libraries can significantly accelerate development, allowing teams to implement complex functionality without reinventing the wheel. However, one significant cost of reusing code is security vulnerabilities. Vulnerabilities in third-party libraries have allowed attackers to breach databases, conduct identity theft, steal sensitive user data, and launch mass phishing campaigns. Notorious examples of library vulnerabilities from the past few years include Log4Shell, SolarWinds, event-stream, lodash, and Equifax. Existing software composition analysis (SCA) tools track the propagation of vulnerabilities from libraries through dependencies to downstream clients and alert those clients. Due to their design, many existing tools are highly imprecise: they create alerts for clients even when the flagged vulnerabilities are not exploitable. Library developers occasionally release new versions of their software with refactorings that improve modularity. In this work, we explore the impact of modularity improvements on vulnerability detection. In addition to generally improving the nonfunctional properties of the code, refactoring has several security-related beneficial side effects: (1) it improves the precision of existing (fast and stable) SCAs; and (2) it protects against vulnerabilities that are exploitable when the vulnerable code is merely present, even if not reachable, as in gadget chain attacks. Our primary contribution is thus to quantify, using a novel simulation-based counterfactual vulnerability analysis, two main ways that improved modularity can boost security. We propose a modularization method using a DAG partitioning algorithm, and statically measure properties of systems that we (synthetically) modularize. In our experiments, we find that modularization can improve the precision of SCA tools to 71%, up from 35%. Furthermore, migrating to modularized libraries results in 78% of clients no longer being vulnerable to attacks referencing inactive dependencies. We further verify that the results of our modularization reflect structures that are already implicit in the projects (but for which no modularity boundaries are enforced).

Item Query Complexity of Recursively Composed Functions (University of Waterloo, 2024-10-21) Al-Dhalaan, Bandar
In this work, we explore two well-studied notions of randomized query complexity: bounded-error randomized ($R(f)$) and zero-error randomized ($R_0(f)$). These have natural analogues in the classical model of computation: $R$ corresponds to BPP, or "Monte Carlo" algorithms, and $R_0$ to ZPP, or "Las Vegas" algorithms. For a query complexity measure $M$, one can define the composition limit of $M$ on $f$ by $M^*(f) = \lim_{k \to \infty} \sqrt[k]{M(f^k)}$. The composition limit is a useful way to understand the asymptotic complexity of a function with respect to a specific measure (e.g., if $M(f) = O(1)M(g)$, then $M^*(f) = M^*(g)$). We show that under the composition limit, Las Vegas algorithms can be reduced to Monte Carlo algorithms in the query complexity world. Specifically, $R_0^*(f) = \max(C^*(f), R^*(f))$ for all possibly-partial boolean functions $f$.
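(For context, a brief note on the composition notation used above; this convention is standard in the query-complexity literature and is assumed, not defined, in the abstract. $f^k$ denotes $f$ recursively composed with itself $k$ times:

    \[ f^1 = f, \qquad f^{k+1} = f \circ (f^k, \ldots, f^k), \]

so for $f : \{0,1\}^n \to \{0,1\}$, $f^k$ is a depth-$k$ tree of copies of $f$ over $n^k$ inputs, and the composition limit $M^*(f) = \lim_{k \to \infty} \sqrt[k]{M(f^k)}$ measures the per-level cost of $f$ under recursive composition.)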
This result has wide-reaching implications for open questions in the classical query complexity of boolean functions. For example, it implies that any bounded-error algorithm for recursive 3-majority can be converted into a zero-error algorithm at no additional cost (i.e., $R^*(\text{3-MAJ}) = R_0^*(\text{3-MAJ})$). Furthermore, we explore one possible generalization of the recursive 3-majority problem itself, by analyzing 3-majority as a special case of a combinatorial game we call Denial Nim.

Item Automated Generation of Dynamic Occlusion-Caused Collisions (University of Waterloo, 2024-10-17) Dykhne, Eli-Henry
Dynamic occlusions (occlusions caused by other moving road objects) pose some of the most difficult challenges for autonomous driving systems (ADSs). Validating the robustness of an ADS to safety-critical dynamic occlusions is difficult due to the rarity of such scenarios in recorded driving logs. We provide a novel typology of dynamic occlusion scenarios involving vehicles, as well as a framework for ADS safety validation in the presence of dynamic occlusions. Our framework allows for the generation of a diverse set of dynamic occlusion-caused collisions (OCCs) across a wide variety of intersections. We provide results demonstrating that our technique achieves higher generation efficiency and diversity of OCCs than prior works, while being applicable to a wide range of intersections. In this work, we present our generation technique and provide a detailed analysis of the variety and quality of our generated scenarios, as measured by typology coverage, scenario severity, and robustness to faster reaction times.

Item Evaluating Container-based and WebAssembly-based Serverless Platforms (University of Waterloo, 2024-10-04) Monum, Abdul
Serverless computing, often referred to as Function-as-a-Service (FaaS), allows developers to write scalable, event-driven applications while the cloud provider manages the burden of provisioning and maintaining compute resources. Serverless computing is enabled by virtualized sandboxes, such as containers or lightweight virtual machines, that form the execution units for FaaS applications. However, applications suffer from expensive startup latency (cold starts) due to the compulsory overhead of creating a sandbox and initializing the application code and its dependencies. FaaS platforms keep function executors warm in memory to avoid this latency, which incurs additional memory overhead on the system. Recently, WebAssembly (Wasm) has emerged as a promising alternative for FaaS applications: its lightweight sandboxing provides negligible startup delays and a reduced memory footprint. However, Wasm applications experience slower execution speeds compared to native execution. This thesis presents a performance evaluation of WebAssembly-based serverless computing in comparison with container-based serverless platforms, using analytical performance models and their experimental evaluation. The performance model for container-based serverless platforms is taken from existing literature and reflects the behavior of commercial platforms like AWS Lambda, IBM Cloud Functions, and Azure Functions. For WebAssembly-based serverless platforms, this thesis proposes a new performance model based on queueing systems. These models are verified experimentally using open-source platforms: Apache OpenWhisk for containers and Spin for WebAssembly. A suite of representative serverless applications is used to validate the models.
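(As a generic illustration of the cold-start trade-off such models capture, a minimal sketch in Python; this is not the thesis's queueing model, and the function and parameters below are hypothetical.)

    # Hypothetical illustration: expected response time of a FaaS invocation
    # that finds a warm executor with probability p_warm, and otherwise pays
    # the sandbox creation + initialization overhead (the "cold start").
    def expected_response_time(p_warm: float, t_exec: float, t_cold: float) -> float:
        """p_warm: probability a warm sandbox is available; t_exec: execution
        time; t_cold: sandbox cold-start overhead (all in seconds)."""
        return p_warm * t_exec + (1.0 - p_warm) * (t_cold + t_exec)

    # Containers: large t_cold, so platforms keep executors warm (memory cost).
    print(expected_response_time(p_warm=0.9, t_exec=0.05, t_cold=1.2))    # 0.17
    # Wasm: near-negligible t_cold, so few warm instances are needed.
    print(expected_response_time(p_warm=0.0, t_exec=0.08, t_cold=0.005))  # 0.085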
The comparison of the performance models with experimental results highlights the trade-offs between container-based and WebAssembly-based serverless platforms, providing insights into their respective efficiencies in handling serverless workloads.

Item Inherent Limitations of Dimensions for Characterizing Learnability (University of Waterloo, 2024-09-24) Lechner, Tosca Kerstin
The fundamental theorem of statistical learning establishes the equivalence of various notions of both agnostic and realizable Probably Approximately Correct (PAC) learnability for binary classification with the finiteness of the VC dimension. Motivated by these foundational results, this work explores whether similar characterizations of learnability and sample complexity can be defined through dimensions in other learning models. We introduce the concept of a scale-invariant dimension and demonstrate that for a range of learning tasks, including distribution learning, any such dimension fails to capture learnability. Additionally, we define Computable PAC (CPAC) learning and show that not all classes are properly computably PAC learnable, highlighting a significant limitation of classical PAC frameworks. Our analysis further reveals that, unlike in binary classification, realizable learning does not always imply agnostic learning in settings such as distribution learning and CPAC learning. Finally, we address learning under conditions of data corruption, whether from adversaries or self-manipulating agents, and show that the extent of prior knowledge about manipulation capabilities can significantly affect learnability. We address various ways of overcoming uncertainty about manipulation capabilities: learning the capabilities from distribution shifts, using further oracle access, or allowing abstentions. We also show that this uncertainty can sometimes be overcome without additional oracle access in the realizable case, while the agnostic case requires such resources. These findings further underscore that, in order to characterize agnostic learnability, it is not always sufficient to understand the realizable case.

Item BugLLM: Explainable Bug Localization through LLMs (University of Waterloo, 2024-09-24) Subramanian, Vikram Nachiappan
Bug localization is the process of identifying the files in a codebase that contain a bug, based on a bug report. This thesis presents BugLLM, a novel zero-shot bug localization method leveraging Large Language Models (LLMs) and semantic search techniques. BugLLM comprises two main phases: ingestion and inference. In the ingestion phase, the codebase is chunked using an Abstract Syntax Tree (AST) parser, embedded using OpenAI's Ada V2 model, and indexed in a Milvus vector database for efficient querying. In the inference phase, a query is built from the bug report using an LLM to filter out non-technical details. This refined query is then used to search the vector database, retrieving semantically similar code chunks. These chunks undergo further filtering using another LLM query to establish their relevance to the bug, ensuring only the most pertinent chunks are considered. Our method was evaluated on a dataset that includes bugs from six large Java projects. The evaluation metrics used include top-5 accuracy, where BugLLM achieved between 44.7% and 61.1%. BugLLM's performance was competitive, often surpassing traditional methods, and demonstrated efficiency with no training required.
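(A condensed sketch of the two-phase pipeline described above, assuming pymilvus's MilvusClient with Milvus Lite and OpenAI's embeddings API; the collection layout, file names, and query are hypothetical, and AST chunking and the LLM-based query refinement and relevance filtering are elided.)

    from openai import OpenAI
    from pymilvus import MilvusClient

    oai = OpenAI()                        # assumes OPENAI_API_KEY is set
    db = MilvusClient("bugllm_demo.db")   # hypothetical local Milvus Lite file

    def embed(texts):
        # "Ada V2" is OpenAI's text-embedding-ada-002 model (1536-dim vectors)
        resp = oai.embeddings.create(model="text-embedding-ada-002", input=texts)
        return [d.embedding for d in resp.data]

    # Ingestion (whole files stand in for AST-derived code chunks here)
    paths = ["src/UserService.java", "src/ProfileDao.java"]          # hypothetical
    sources = ["class UserService { /* ... */ }", "class ProfileDao { /* ... */ }"]
    db.create_collection("code_chunks", dimension=1536)
    db.insert(collection_name="code_chunks",
              data=[{"id": i, "vector": v, "path": p}
                    for i, (p, v) in enumerate(zip(paths, embed(sources)))])

    # Inference: search with the (LLM-refined) bug-report query
    refined_query = "NullPointerException when saving a user profile"  # hypothetical
    hits = db.search(collection_name="code_chunks", data=embed([refined_query]),
                     limit=5, output_fields=["path"])
    print(hits[0])  # top-5 candidate chunks for the bug report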
To further aid developers, BugLLM also generates explanations of why specific files are relevant to a bug. The motivation behind this is twofold: helping developers understand why a file is important to fixing a bug, and increasing transparency about how our tool works. Our methodology employs Chain-of-Thought prompting to generate detailed explanations from LLMs. These explanations are evaluated for technical accuracy, groundedness, and informativeness. We find that the explanations generated by BugLLM are largely accurate and grounded in the actual content and context of the code, with minimal hallucination. The explanations were also found to be informative, providing valuable insights to developers. The mean scores (out of 5) for technical accuracy, groundedness, and informativeness were 3.9, 4.5, and 4.3, respectively, across different prompting techniques.

Item Studying Practical Challenges of Automated Code Review Suggestions (University of Waterloo, 2024-09-24) Kazemi, Farshad
Code review is a critical step in software development, focusing on systematic source code inspection. It identifies potential defects and enhances code quality, maintainability, and knowledge sharing among developers. Despite its benefits, it is time-consuming and error-prone. Therefore, approaches such as Code Reviewer Recommendation (CRR) have been proposed to streamline the process. However, when deployed in real-world scenarios, they often fail to account for various complexities, making them impractical or even harmful. This thesis aims to identify and address challenges at various stages of the code review process: the validity of recommendations, the quality of the recommended reviewers, and the necessity and usefulness of CRR approaches given emerging alternative automation. We approach these challenges in three empirical studies, presented in three chapters of this thesis. First, we empirically explore the validity of recommended reviewers by measuring the rate of stale reviewers, i.e., those who no longer contribute to the project. We observe that stale recommendations account for a considerable portion of the suggestions provided by CRR approaches, accounting for up to 33.33% of recommendations, with a median share of 8.30%. Based on our analysis, we suggest separating reviewer contribution recency from the other factors used by the CRR objective function; a sketch of this recency-filter idea follows below. The proposed filter reduces the staleness of recommendations, i.e., the Staleness Reduction Ratio (SRR) improves by between 21.44% and 92.39%. While the first study assesses the validity of the recommendations, it does not measure their quality or potential unintended impacts. Therefore, we next probe the potential unintended consequences of assigning recommended reviewers. To this end, we study the impact of assigning recommended reviewers without considering the safety of the submitted changeset. We observe that existing approaches tend to improve one or two quantities of interest while degrading others. We devise an enhanced approach, Risk Aware Recommender (RAR), which increases project safety by predicting changeset bug proneness. Given the evolving landscape of automation in code review, our final study examines whether human reviewers, and hence recommendation tools, are still beneficial to the review process. To this end, we focus on the behaviour of Review Comment Generators (RCGs), models trained to automate code review tasks, as a potential way to replace humans in the code review process.
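(A minimal sketch of that recency-filter idea; the data layout, one-year cutoff, and scoring interface are hypothetical, and the thesis's actual filter design may differ.)

    from datetime import datetime, timedelta

    # Hypothetical recency filter: drop candidate reviewers whose latest project
    # contribution is older than a cutoff *before* ranking by the usual CRR
    # objective, instead of folding recency into one blended score.
    STALE_AFTER = timedelta(days=365)  # hypothetical staleness cutoff

    def recommend(candidates, crr_score, last_contribution, now=None, k=5):
        """candidates: reviewer ids; crr_score: function mapping a reviewer id
        to its CRR objective value; last_contribution: dict mapping a reviewer
        id to the datetime of that reviewer's latest activity."""
        now = now or datetime.now()
        active = [c for c in candidates
                  if now - last_contribution[c] <= STALE_AFTER]
        return sorted(active, key=crr_score, reverse=True)[:k]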
Our quantitative and qualitative study of RCG-generated interrogative comments shows that RCG-generated and human-submitted comments differ in mood, i.e., in whether the comment is declarative or interrogative. Our qualitative analysis of sampled comments demonstrates that RCG-generated interrogative comments suffer from limitations in the RCGs' capacity to communicate. Our observations show that neither task-specific RCGs nor LLM-based ones can fully replace humans in asking questions. Therefore, practitioners can still benefit from using code review tools. In conclusion, our findings highlight the need for further support of human participants in the code review process. Thus, we advocate for the improvement of code review tools and approaches, particularly code reviewer recommendation approaches. Furthermore, tool builders can use our observations and proposed methods to address two critical aspects of existing CRR approaches.

Item Impossibility of Two-Round MPC with the Black-Box Use of Additive Homomorphic Encryption (University of Waterloo, 2024-09-24) Ghadirli, Ali
Minimizing the number of rounds in Multiparty Computation (MPC) in the presence of an arbitrary number of semi-honest adversaries is a line of work that has recently received considerable attention. Garg et al. proved that two-round semi-honest MPC is impossible from black-box use of two-round oblivious transfer (OT). Before this work, Garg and Srinivasan and Benhamouda and Lin had shown constructions of two-round MPC with non-black-box use of the underlying two-round OT. Constructions that make black-box use of their cryptographic primitives have the advantage of being more efficient than non-black-box constructions, since they treat the underlying primitives as oracles, which simplifies protocol design and analysis. Reducing the number of rounds has the advantage that parties can send their first messages and go offline until all other parties have sent their second-round messages, at which point the output can be computed. Our main result in this thesis is an impossibility result: we show that two-round MPC based on black-box use of additive homomorphic encryption is impossible. This result is stronger than the previous result by Garg et al., mainly because OT can be constructed from additive homomorphic encryption.

Item Enhancing Spatial Query Efficiency Through Dead Space Indexing in Minimum Bounding Boxes (University of Waterloo, 2024-09-24) Chen, Ying
This thesis introduces the Grid Minimum Bounding Box (GMB) as an enhancement to the traditional Minimum Bounding Box (MBB), designed to mitigate the negative impact of dead space and improve query efficiency. The GMB achieves this by utilizing a low-cost grid bitmap to index dead space within the MBB, enabling the filtering of queries that intersect only dead space. This filtering reduces false positives in intersection tests and minimizes unnecessary disk I/O to leaf nodes. A key advantage of GMB is that it is developed as an augmentation technique, enabling seamless integration with any R-tree variant without altering the core indexing architecture. The effectiveness of this technique is validated through a comprehensive set of experiments on both real-world and synthetic datasets, demonstrating significant improvements in query performance across various datasets and query types.
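(One plausible reading of the grid-bitmap filter described above, as a sketch; the grid resolution and point-based marking are hypothetical, and the thesis's construction, compression, and maintenance details differ.)

    # Sketch of a dead-space filter on a minimum bounding box (MBB): an n x n
    # bitmap marks which grid cells actually contain data. A query rectangle
    # that overlaps the MBB but touches only dead (unmarked) cells can be
    # pruned without descending to the leaf node.
    class GridMBB:
        def __init__(self, xmin, ymin, xmax, ymax, n=8):
            self.box = (xmin, ymin, xmax, ymax)
            self.n = n
            self.bits = [[False] * n for _ in range(n)]  # True = live cell

        def _cell(self, x, y):
            xmin, ymin, xmax, ymax = self.box
            i = min(self.n - 1, int((x - xmin) / (xmax - xmin) * self.n))
            j = min(self.n - 1, int((y - ymin) / (ymax - ymin) * self.n))
            return i, j

        def mark(self, x, y):  # called for each indexed point/object
            i, j = self._cell(x, y)
            self.bits[i][j] = True

        def may_intersect(self, qxmin, qymin, qxmax, qymax):
            """False means the query overlaps only dead space: safe to prune."""
            xmin, ymin, xmax, ymax = self.box
            if qxmax < xmin or qxmin > xmax or qymax < ymin or qymin > ymax:
                return False  # no overlap with the MBB at all
            i0, j0 = self._cell(max(qxmin, xmin), max(qymin, ymin))
            i1, j1 = self._cell(min(qxmax, xmax), min(qymax, ymax))
            return any(self.bits[i][j]
                       for i in range(i0, i1 + 1) for j in range(j0, j1 + 1))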
The key contributions of this thesis include the development of a data-driven algorithm for constructing GMBs, the design of a grid bitmap compression technique, the implementation of an efficient maintenance system for dynamic GMBs, and the enhancement of search operations through GMB-based intersection tests. These contributions collectively establish GMB as a robust solution to the challenges presented by dead space within MBBs across different R-tree variants.

Item A Longitudinal Analysis of Replicas in the Wild Wild Android (University of Waterloo, 2024-09-24) Abbas Zaidi, Syeda Mashal
In this thesis, we report and study a phenomenon that contributes to Android API sprawl. We observe that OEM developers introduce private APIs that are composed by copy-paste-editing full or partial code from AOSP and other OEM APIs; we call such APIs Replicas. To quantify the prevalence of Replicas in the wildly fragmented Android ecosystem, we perform the first large-scale (security) measurement study aimed at detecting and evaluating Replicas across 342 ROMs, manufactured by 10 vendors and spanning 7 versions. Our study is motivated by the intuition that Replicas contribute to bloated custom Android codebases, add to the complexity of the Android access control mechanism and update process, and hence may lead to access control vulnerabilities. Our study is facilitated by RepFinder, a tool we develop. It infers the core functionality of an API and detects syntactically and semantically similar APIs using static program paths. RepFinder reveals that Replicas are commonly introduced by OEMs and, more importantly, that they unnecessarily introduce security enforcement anomalies. Specifically, RepFinder reports an average of 141 Replicas per studied ROM, accounting for 9% to 17% of custom APIs, of which 37% (on average) are identified as under-protected. Our study thus points to the urgent need to debloat Replicas.

Item On Classifying the Outcomes of Legal Motions (University of Waterloo, 2024-09-23) Cardoso, Oluwaseun
Conflict is inherent to the human condition, and socially acceptable methods of resolving conflict typically begin with dialogue, compromise, or negotiation. When these efforts fail, the legal process, often culminating in the courtroom, becomes the final recourse. Legal practitioners strive to position themselves advantageously by predicting the outcomes of legal disputes, increasingly relying on predictive tools to navigate the complexities of the courtroom. This thesis investigates the feasibility of predicting the outcomes of legal motion disputes using supervised machine learning methods. While previous research has predominantly utilized expertly hand-crafted features for judicial predictions, this study explores the use of written arguments, known as briefs, as the only basis for prediction. We trained 36 classifiers to predict the outcomes of legal motions and compared their performance to that of a baseline model. The best-performing classifier achieved an accuracy of 62% on the test dataset. However, statistical analysis reveals that the performance of the top 10 classifiers is not statistically different from that of the baseline model. These findings suggest that, among the top-performing classifiers, there is no conclusively dominant approach for predicting legal motion outcomes using briefs.
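(The abstract does not name the 36 classifiers; the following is a purely illustrative sketch of text-only outcome prediction from briefs, with toy data and an arbitrary model choice, neither taken from the thesis.)

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical example: predict granted/denied from brief text alone.
    briefs = ["Plaintiff moves to compel discovery because ...",
              "Defendant seeks summary judgment on the grounds that ..."]
    outcomes = ["granted", "denied"]  # toy labels

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=1),
                          LogisticRegression(max_iter=1000))
    model.fit(briefs, outcomes)
    print(model.predict(["Respondent moves to dismiss because ..."]))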
The thesis also offers theoretical considerations to explain these results.

Item Trustworthy Machine Learning with Deep Generative Models (University of Waterloo, 2024-09-23) Jiang, Dihong
The recent decade has witnessed remarkable progress in machine learning (ML) technologies across diverse domains, especially deep generative models (DGMs) developed on high-dimensional data. However, as ML systems become integral to critical and sensitive applications, ensuring their trustworthiness becomes paramount, beyond mere performance. Trustworthy machine learning encompasses a suite of principles and practices aimed at enhancing the reliability and safety of ML systems. This thesis investigates the intersection of trustworthiness and deep generative models, focusing on two pivotal aspects of trustworthiness: out-of-distribution (OOD) detection and privacy preservation. Generative models serve purposes beyond generating realistic samples. Likelihood-based DGMs, e.g., flow generative models, can additionally compute the likelihood of input data, which can be used as an unsupervised OOD detector by thresholding the likelihood values. However, they have been found to occasionally assign higher likelihoods to OOD data than to in-distribution (InD) data, raising concerns about their reliability and credibility in OOD detection. This thesis presents new insights demonstrating that flow generative models can reliably detect OOD data by leveraging their bijectivity. The proposed approach involves comparing two high-dimensional distributions in the latent space. To manage the high dimensionality, we extend a univariate statistical test (e.g., the Kolmogorov-Smirnov (KS) test) to higher dimensions using random projections; a sketch of this idea follows below. Our method demonstrates robustness across various datasets and data modalities. The second focus of this thesis is privacy preservation in DGMs. Generative models can also be seen as proxies for publishing training data, and there is growing interest in ensuring privacy preservation, beyond generation fidelity, in modern generative models. Differentially private generative models (DPGMs) offer a solution to protect individual data privacy in the training set of generative models. Existing methods either apply the workhorse algorithm of differentially private deep learning, i.e., differentially private stochastic gradient descent (DP-SGD), to DGMs, or use kernel methods by making the maximum mean discrepancy (MMD) objective differentially private. However, DP-SGD methods suffer from high training costs and poor scalability to stronger differential privacy (DP) guarantees, while kernel methods face the issue of mode collapse in generation. We propose novel methods to sidestep these challenges for each approach. To alleviate the training cost overhead and scalability issues of DP-SGD on small privacy budgets, we propose to train a flow generative model in a lower-dimensional latent space, which significantly reduces the model size and thereby avoids unnecessary computation in the full pixel space. This design is more resilient to larger noise perturbations and enables us to be the first among related works to present high-resolution image generations under DP constraints. On the other hand, to improve the model utility of MMD methods, we propose to make the MMD objective differentially private without truncating the reproducing kernel Hilbert space (RKHS).
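(A minimal sketch of the random-projection KS test for OOD detection described earlier in this abstract; the projection count, aggregation by maximum, and toy data are simplifying assumptions, not the thesis's exact procedure.)

    import numpy as np
    from scipy.stats import ks_2samp

    def projected_ks(z_ref, z_test, n_proj=100, seed=0):
        """Compare two sets of latent codes (rows) from a flow model by running
        a univariate two-sample KS test along random directions; return the
        largest KS statistic. Large values suggest z_test is OOD."""
        rng = np.random.default_rng(seed)
        d = z_ref.shape[1]
        stat = 0.0
        for _ in range(n_proj):
            v = rng.standard_normal(d)
            v /= np.linalg.norm(v)
            stat = max(stat, ks_2samp(z_ref @ v, z_test @ v).statistic)
        return stat

    # Toy usage: InD latents ~ N(0, I); a shifted batch scores much higher.
    rng = np.random.default_rng(1)
    z_ind = rng.standard_normal((500, 64))
    print(projected_ks(z_ind, rng.standard_normal((500, 64))))        # small
    print(projected_ks(z_ind, rng.standard_normal((500, 64)) + 1.0))  # large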
To make the MMD objective differentially private over the full RKHS, we extend Rényi differential privacy (RDP) from vectors to functions, and then propose several useful building blocks to facilitate its use in practice, e.g., a Gaussian mechanism, composition theorems, and a post-processing theorem. The proposed method consistently outperforms state-of-the-art methods by a large margin, especially under stronger DP guarantees. This thesis is expected to provide new insights into the application of flow generative models to OOD detection, highlight practical challenges in training generative models with DP-SGD on high-dimensional datasets, bridge the gap between RDP and functional mechanisms, and expand the family of DPGMs.

Item Enumerated Types in C∀ (University of Waterloo, 2024-09-23) Liang, Jiada
An enumeration is a type defining an (ordered) set of named constant values.

    enum Week { Mon, Tue, Wed, Thu, Fri, Sat, Sun };
    enum Math { PI = 3.14159, Tau = 6.28318, Phi = 1.61803 };
    enum RGB { Red = 100b, Green = 010b, Blue = 001b };

Its purpose is readability: replacing constant values in a program with symbolic names that are more meaningful to programmers in the context of the application. Thereafter, associating a name with a different value automatically distributes the rebinding, preventing errors. One of the key properties of an enumeration is the ability to enumerate (iterate) through the constants, and hence access their values, if present. C restricts an enumeration to the integral type signed int, while C++ extends enumerations to all integral types, meaning enumeration names must bind to integer constants. Other modern programming languages provide bindings to any type, along with additional features that extend enumeration capabilities for better software engineering practices. The C∀ (C-for-all) programming language is an evolutionary refinement of the C programming language. One of its distinctive features is a parametric-polymorphic generic type. However, legacy data types from C, such as enumerations, do not adapt well to the C∀ generic type system. This thesis extends the simple and unsafe enumeration type of the C programming language into a complex and safe enumeration type in the C∀ programming language, while maintaining backwards compatibility with C. The major contribution is an adaptation of enumerated types to the C∀ type system in a way that integrates naturally with generic types. This thesis also presents several smaller refinements to the C∀ overload resolution rules for enumerated types, each of which improves the intuitive nature of enumeration name resolution by the compiler. Finally, this work adds other useful features to enumerations that better support software engineering practices and simplify program development.

Item Black-Box Barriers Behind Registration-Based Encryption (University of Waterloo, 2024-09-19) Sarfaraz, Sara
Registration-Based Encryption (RBE) is a cryptographic primitive designed to achieve the functionality of Identity-Based Encryption (IBE) while avoiding the key-escrow problem. In an RBE system, participants generate their own secret and public keys and register their public keys with a transparent entity known as the Key Curator (KC), who possesses no secret information. The KC's role is limited to managing the registered public keys, effectively eliminating the central key-management issues inherent in IBE.
Early constructions of RBE relied on non-black-box techniques, incorporating advanced cryptographic primitives such as indistinguishability obfuscation or garbled circuits. However, non-black-box constructions often face practical inefficiencies. Recent works have shown that black-box constructions of RBE are achievable, though these constructions often involve a relaxed model where the Common Reference String (CRS) can grow with the number of registered users. This work investigates the minimal assumptions needed for black-box constructions of RBE. Specifically, we explore whether it is possible to construct RBE schemes using assumptions comparable to those used in public-key encryption, or algebraic assumptions that hold in the generic group model. We present the first black-box separation results for RBE that extend beyond the implications of the known relationship between RBE and public-key encryption. We demonstrate that neither trapdoor permutations nor the generic group model, including Shoup's model, is sufficient on its own to serve as a basis for constructing RBE schemes. Furthermore, we demonstrate that even a relaxed version of RBE, where all keys are registered and compressed simultaneously, cannot be constructed from these primitives in a black-box manner.

Item Tactile Narratives in Virtual Reality (University of Waterloo, 2024-09-19) Kunjam, Punit
This research explores how to design haptic feedback systems in virtual reality (VR) environments to support relational presence, a concept central to the design of Virtual Learning Environments. Unlike traditional presence, which focuses on simulation, interactivity, and user satisfaction, relational presence emphasizes engagement with values such as impression, witnessing, self-awareness, awareness of difference, interpretation, inquiry, and affective dissonance. The objective is to develop haptic designs that enhance these aspects of relational presence, facilitating users' engagement with challenging content that prompts thoughtful questions about responsibility and action. The key technical contributions of this project include the design and implementation of the haptic feedback system and VR user interface, the development of the Sankofa Interface, and enhancements to the Macaron haptic editor. These efforts were essential to aligning haptic feedback with emotional and narrative arcs, helping to refine the interaction design and ensure its effectiveness in promoting relational presence. The integration of advanced programming techniques and responsive haptic feedback synchronized with audio-visual elements contributed to the creation of a cohesive and relationally enhanced VR environment. Through these technical developments, this research identified mechanisms and design criteria that could help align haptic feedback with emotional and narrative arcs, potentially enhancing users' connection to the content and fostering a more reflective and empathetic engagement.
By focusing on creating interactions that resonate emotionally and cognitively with users, rather than just achieving representational fidelity, this research contributes design principles that enable haptic technology to foster a richer, more reflective user experience in educational and narrative-driven VR applications.

Item Waring Rank, Border Rank and Support Concentration of Partials (University of Waterloo, 2024-09-19) Sanyal, Abhiroop
In this thesis, we study the classical problem of decomposing a homogeneous polynomial into a sum of powers of linear forms with minimally many summands (this minimum is known as the Waring rank of the polynomial) and related problems. The case of quadratic polynomials and binary forms was studied by Sylvester in his seminal paper of 1851, and the case of generic polynomials was resolved more than a century later by Alexander and Hirschowitz in 1995. The problem is computationally NP-hard, and finding the Waring rank for several interesting classes of polynomials, for example the general n × n symbolic determinant/permanent, remains an open problem. An important parameter in the study of this problem is the dimension D of the vector space of partial derivatives of the given polynomial. It is known that if the Waring rank of a polynomial in n variables of degree d is s, then D is at most s(d+1). A longstanding conjecture states that, given D, the Waring rank is upper bounded by a polynomial in n, d, and D. To study this conjecture, we restrict to a very special class of polynomials with no redundant variables (also called concise polynomials), which we call 1-support concentrated polynomials, defined by the following property: given such a polynomial f, all its partial derivatives can be obtained as linear combinations of derivatives with respect to powers of a fixed set of n linearly independent linear forms. A crucial property of such f is that the dimension of the partial derivatives of f in any degree is at most n. We show that the converse is true: any concise polynomial for which the dimension of partial derivatives in any degree is at most n is also 1-support concentrated. We also generalize an example given by Stanley to give an explicit class of concise polynomials ST(n,d) in O(max{n^d, d^n}) variables of degree d that are 1-support concentrated. A polynomial is a direct sum if it can be written as a sum of two polynomials in distinct sets of variables, up to a linear change of variables. A polynomial f is a limit of direct sums if there is a sequence of polynomials, each a direct sum, converging to f. Necessary and sufficient conditions for a polynomial to be a direct sum or a limit of direct sums were extensively studied by Buczyńska et al. and Kleppe. We show that any concise 1-support concentrated polynomial in n variables of degree d > 2n is a limit of direct sums. We also show that ST(n,d) (which does not satisfy the previous degree hypothesis) is a limit of direct sums. The border rank of a homogeneous polynomial f is the minimal r such that there is a sequence of polynomials, each with Waring rank at most r, converging to f. The debordering question is as follows: given f with border Waring rank r, what is the best upper bound on the Waring rank of f in terms of n, r, and d? The best-known bound is due to Dutta et al. in 2024. In the context of this problem, it is interesting to find examples f for which the Waring rank of f is strictly greater than its border Waring rank.
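(A classical example of the gap mentioned above, standard in the literature and not specific to this thesis: the monomial $x^2y$ has border Waring rank 2 but Waring rank 3. Indeed,

    \[ x^2 y = \lim_{\epsilon \to 0} \frac{(x + \epsilon y)^3 - x^3}{3\epsilon}, \]

so $x^2y$ is a limit of sums of two cubes of linear forms, yet it cannot be written exactly as a sum of fewer than three cubes of linear forms.)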
We show that ST(3,4) and ST(2,d), for any d > 2, have this property of a Waring rank strictly larger than the border Waring rank.

Item Improving OLAP Workload Performance in Apache Ignite (University of Waterloo, 2024-09-19) Dodds, Mark
Apache Ignite is an open-source system that is used to process database workloads. It utilizes Apache Calcite as the underlying query engine to parse SQL queries into relational algebra operators that are passed to the Ignite execution engine. This thesis investigates the performance of online analytical processing (OLAP) workloads with varying memory and data distribution settings using Apache Ignite with Apache Calcite. From empirical studies, practical strategies to decrease response times for OLAP queries are designed and implemented in Apache Ignite. Through studies using the TPC-H benchmark, it is demonstrated that each strategy yields performance improvements across multiple queries, and that when all strategies are enabled, average performance generally improves for all queries in the workload.

Item Efficient Image-space Shape Splatting for Monte Carlo Rendering (University of Waterloo, 2024-09-18) Tong, Xiaochun
Photorealistic image synthesis through physically based rendering is essential to achieving high visual fidelity. Monte Carlo rendering provides a scalable solution for accurately simulating the complex interactions between light and geometry within a scene. Monte Carlo integration numerically integrates a function by taking multiple pointwise estimates of the function and computing their average. Therefore, a typical Monte Carlo rendering algorithm independently samples one light path or path tree at a time for each pixel in the image. One approach to significantly reduce computational costs and enhance efficiency is to reuse light paths across multiple pixels and thereby amortize sample generation. The current state of the art in path reuse employs a technique known as shift mapping to deterministically map light paths from one pixel to another at low cost, yet the total computation cost remains linear in the number of pixels processed by shift mapping. We propose a general framework that generalizes the reuse of light paths to multiple pixels arranged in arbitrary two-dimensional shapes, which we call image-space shape splatting. A shape here is a set of pixels, and the framework allows us to sample such shapes more efficiently than by evaluating each pixel individually through shift mapping. Our key insight is to design a fast biased estimator that sparsely evaluates the contribution of shifted paths at random pixels within the shape and interpolates the contribution to the other pixels. We then employ a debiasing estimator to eliminate the bias introduced by this approximation. Our method can be seamlessly integrated with many state-of-the-art rendering methods, improving their efficiency over traditional pointwise sampling.

Item Revisiting Benchmarks for Privacy-Preserving Image Classification (University of Waterloo, 2024-09-17) Mokhtari, Sabrina
Differential privacy (DP) is a standard method for preserving the privacy of individual data points. DP prevents models from memorizing training data, thus reducing the risk of data leakage. While DP has been effective in machine learning (ML), there are growing concerns about some common practices in differentially private machine learning (DP ML), particularly the reliance on non-private ML benchmarks to measure progress.
Popular datasets like CIFAR-10, while extensively used in non-private settings, may not accurately capture the complexities of privacy-sensitive areas like medical data. Additionally, pre-training on publicly available datasets may not yield the same benefits when the private data differs significantly and is not well represented in the public domain. This thesis addresses these concerns by evaluating DP methods on various privacy-sensitive datasets and training scenarios. We focus on medical datasets, where privacy is crucial, and study a thorough set of techniques. These techniques cover a wide range of settings, including those with public-data pre-training, cases without public data, full-layer and last-layer fine-tuning, and different privacy levels.

Item Robust Recursive Query Parallelization in Graph Database Management Systems (University of Waterloo, 2024-09-17) Chakraborty, Anurag
Recursive joins such as shortest path and variable-length path queries are a core feature of modern graph database management systems (GDBMSs). Since these queries tend to be computationally expensive and may suffer from high execution times, they require efficient parallel processing across multiple cores to achieve good performance. Existing work on parallel query processing includes the morsel-driven parallelism approach, which distributes a unit of work (denoted a "morsel") to threads for parallel execution. We revisit this technique in the context of parallelizing recursive joins in GDBMSs and discuss how the traditional approach to morsel-driven query execution is inadequate for recursive join queries. We show how this approach can be modified to better accommodate scalable parallelization of recursive joins. We further describe how this modified parallel query execution approach has been integrated into Kuzu, an embedded disk-based columnar GDBMS. Compared to vanilla morsel-driven parallelism, our modified parallel query execution approach can be orders of magnitude faster and scales well on multiple cores.
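(As a rough illustration of morsel-driven scheduling applied to a recursive join, here a level-synchronous BFS standing in for a variable-length path query; the morsel size, adjacency-list layout, and locking are hypothetical simplifications, not Kuzu's implementation.)

    import threading
    from queue import Queue, Empty

    MORSEL_SIZE = 1024  # hypothetical batch of frontier nodes handed to a thread

    def parallel_bfs(adj, sources, n_threads=4):
        """Each level's frontier is split into morsels that worker threads pull
        from a shared queue, mimicking morsel-driven scheduling."""
        visited = set(sources)
        lock = threading.Lock()
        frontier = list(sources)
        while frontier:
            tasks = Queue()
            for i in range(0, len(frontier), MORSEL_SIZE):
                tasks.put(frontier[i:i + MORSEL_SIZE])
            next_frontier = []

            def worker():
                while True:
                    try:
                        morsel = tasks.get_nowait()
                    except Empty:  # no morsels left at this level
                        return
                    local = []
                    for u in morsel:
                        for v in adj.get(u, []):
                            with lock:
                                if v not in visited:
                                    visited.add(v)
                                    local.append(v)
                    with lock:
                        next_frontier.extend(local)

            threads = [threading.Thread(target=worker) for _ in range(n_threads)]
            for t in threads: t.start()
            for t in threads: t.join()
            frontier = next_frontier
        return visited

    # Toy usage: all nodes reachable from node 0
    print(parallel_bfs({0: [1, 2], 1: [3], 2: [3], 3: []}, [0]))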