Computer Science
Permanent URI for this collection: https://uwspace.uwaterloo.ca/handle/10012/9930
This is the collection for the University of Waterloo's Cheriton School of Computer Science.
Research outputs are organized by type (e.g., Master Thesis, Article, Conference Paper).
Waterloo faculty, students, and staff can contact us or visit the UWSpace guide to learn more about depositing their research.
Browsing Computer Science by Author "Atlee, Joanne"
Now showing 1 - 7 of 7
Item: Classifying Code as Human Authored or GPT-4 Generated (University of Waterloo, 2024-05-09)
Authors: Idialu, Oseremen Joy; Nagappan, Meiyappan; Atlee, Joanne

Artificial intelligence (AI) assistants such as GitHub Copilot and ChatGPT, built on large language models like GPT-4, are revolutionizing how programming tasks are performed, raising questions about whether code is authored by generative AI models. Such questions are of particular interest to educators, who worry that these tools enable a new form of academic dishonesty in which students submit AI-generated code as their own work. Our research explores the viability of using code stylometry and machine learning to distinguish between GPT-4-generated and human-authored code, and attempts to explain the predictions. Our study comprises two analyses, each based on a different dataset: one sourced from CodeChef and the other from an introductory programming course. Both datasets encompass human-authored solutions alongside AI-authored solutions generated by GPT-4. The human-authored solutions date from before 2021, to ensure that they were not contaminated with contributions from an AI coding assistant. The first analysis serves to establish the potential of our approach, while the second extends our approach to actual programming assignments. In our first analysis, our classifier outperforms the baselines, achieving an F1-score and AUC-ROC score of 0.91. Even a variant of our classifier that excludes gameable features (features susceptible to manipulation, e.g., empty lines, whitespace) maintains good performance, achieving an F1-score and AUC-ROC score of 0.89. We also conducted an evaluation based on the difficulty level of the programming problems, revealing little to no difference across difficulty levels: the F1-score and AUC-ROC remained consistent at 0.89 for easy and medium problems, with a slight decrease to 0.87 for harder problems. These results highlight the promise of our approach regardless of the complexity of the programming tasks. In our second analysis, our classifier, trained and evaluated on programming assignments, achieved an F1-score of 0.69 and an AUC-ROC of 0.73. A subsequent evaluation trained and evaluated our classifier on assignments submitted in 2023, a period after the release of Copilot and ChatGPT; we identified 13 of 54 submissions as GPT-4 generated, with an accuracy rate of 73%. We believe educators should recognize and proactively address this emerging trend within academic settings.
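The pipeline this abstract describes (extract stylometric features, train a classifier, report F1 and AUC-ROC) can be sketched in a few lines. The features and model below are simplified, hypothetical stand-ins, not the thesis's actual feature set or classifier:

```python
# A minimal sketch of stylometry-based authorship classification.
import re
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

def stylometric_features(code: str) -> list:
    """Compute a few simple stylometric features of a source file."""
    lines = code.splitlines() or [""]
    return [
        len(lines),                                                  # program length
        sum(1 for l in lines if not l.strip()) / len(lines),         # blank-line ratio (a "gameable" feature)
        sum(len(l) - len(l.lstrip()) for l in lines) / len(lines),   # mean indentation depth
        sum(1 for l in lines if "#" in l) / len(lines),              # comment density
        len(re.findall(r"[A-Za-z_]\w*", code)) / max(len(code), 1),  # identifier-like token density
    ]

def train_and_score(solutions):
    """solutions: list of (source_code, label) pairs, label 1 = GPT-4 generated."""
    X = [stylometric_features(src) for src, _ in solutions]
    y = [label for _, label in solutions]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
    f1 = f1_score(y_te, clf.predict(X_te))
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    return clf, f1, auc
```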
Item: Detecting Feature-Interaction Hotspots in Automotive Software using Relational Algebra (University of Waterloo, 2018-05-14)
Authors: Muscedere, Bryan James; Atlee, Joanne

Modern software projects are programmed by multiple teams, consist of millions of lines of code, and are split into separate components that, during runtime, may not be contained in the same process. Due to these complexities, software defects are a common reality; defects cost the global economy over a trillion dollars each year. One area where developing safe software is crucial is the automotive domain. As the typical modern vehicle consists of over 100 million lines of code and is responsible for controlling vehicle motion through advanced driver-assistance systems (ADAS), there is a potential for these systems to malfunction in catastrophic ways. Due to this risk, automotive software needs to be inspected to verify that it is safe. The problem is that it can be difficult to carry out this inspection on code: manual analysis does not scale well, search tools like grep have no contextual awareness of code, and although code reviews can be effective, they cannot properly target the entire codebase. Furthermore, automotive systems comprise numerous communicating features that can interact in unexpected or undefined ways. This thesis addresses this problem through the development of a static-analysis methodology that detects custom interaction patterns, coined hotspots. We identify several classes of automotive hotspots that describe patterns in automotive software that could manifest as feature interactions. To detect these hotspots, the methodology employs a static, relational analysis toolchain that creates a queryable model from source code and enables engineer-defined queries to be run on the model, aiming to reveal potential hotspots in the underlying source code. The purpose of this methodology is not to detect bugs with surety, but to work towards an analysis methodology that can scale to automotive software systems. We test this hotspot-detection methodology through a case study conducted on the Autonomoose autonomous driving platform. In it, we generate a model of the entire Autonomoose codebase and run relational algebra queries on the model. Each script in the case study detects a type of hotspot we identify in this thesis. The results of each query are presented.
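As a rough, self-contained illustration of querying a factbase with relational algebra, the sketch below encodes hypothetical publish/ownership facts as Python relations and composes them to flag one kind of hotspot (two features writing the same output). All names are invented; the real toolchain extracts such facts from source code and runs engineer-defined queries at scale:

```python
# Hypothetical facts, as binary relations over names in the codebase.
publishes = {("planner", "/cmd_vel"), ("collision_avoidance", "/cmd_vel"),
             ("localizer", "/pose")}                       # (component, topic)
owned_by = {("planner", "PathPlanning"),
            ("collision_avoidance", "SafetyStop"),
            ("localizer", "Localization")}                 # (component, feature)

def inverse(r):
    return {(b, a) for (a, b) in r}

def compose(r, s):
    """Relational composition: {(a, c) | (a, b) in r and (b, c) in s}."""
    return {(a, c) for (a, b1) in r for (b2, c) in s if b1 == b2}

# Hotspot query: two distinct features whose components write the same topic.
feature_writes = compose(inverse(owned_by), publishes)     # (feature, topic)
hotspots = {(f1, f2, t1) for (f1, t1) in feature_writes
            for (f2, t2) in feature_writes if t1 == t2 and f1 < f2}
print(hotspots)  # {('PathPlanning', 'SafetyStop', '/cmd_vel')}
```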
Item: Improving the Precision of Analyses Queries in Factbase Models of Software Systems (University of Waterloo, 2024-05-31)
Authors: Ke, Xiang Yun (Fa Fa); Atlee, Joanne

Large software systems are developed by multiple teams of software engineers, each working on different components that are supposed to work together. Each component is responsible for a subset of system functionality, and the components communicate with each other to react to information received from hardware sensors and user inputs. It is often infeasible to perform manual code reviews on large software systems because the code base may be too large, the components may be written in different languages or language variants, or the concurrency of components can lead to a state explosion of the system's analysis space. To mitigate these challenges, we create a software model consisting of facts about the software system and perform analyses on the model. Analyses performed on such software models are not always sound and complete. One reason is that the order in which the code elements represented by the facts execute is unknown, leading to many false-positive results that refer to infeasible execution paths. Our work addresses this problem by extending a fact-based software model with control-flow-graph facts and associating existing facts with their corresponding control-flow blocks. The analyses are then revised to check that results correspond to feasible execution paths (in terms of control-flow-graph facts) before reporting results to the engineers. This extra execution-path check causes the revised analyses to exhibit significant performance overhead. To reduce the overall execution time of the analyses, we (1) stage analysis queries so that they work on a subset of the facts to generate partial results incrementally, and (2) employ an on-the-fly execution-path check that eliminates invalid sub-results within the analysis engine. Our work is evaluated with ten different analyses performed on six software systems that use the ROS (Robot Operating System) framework for cross-component communication. A detailed precision and performance evaluation was performed on Autonomoose and WISE-ADS, two ROS-based autonomous driving systems. In addition, this thesis adapts our approach to non-ROS systems (in which components communicate via function calls instead of passed messages), and we evaluate that work by analyzing a non-ROS software controller. The controller experiment is designed to test the scalability of our work when applied to large real-world applications.
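A minimal sketch of the execution-path check the abstract describes, assuming hypothetical control-flow-graph facts: a candidate analysis result is reported only if the control-flow graph actually contains a path connecting its code locations. This illustrates the idea, not the thesis's analysis engine:

```python
from collections import defaultdict, deque

# Hypothetical control-flow-graph facts: edges between basic blocks.
cfg_edges = [("entry", "b1"), ("b1", "b2"), ("b1", "b3"),
             ("b2", "exit"), ("b3", "exit")]
succ = defaultdict(list)
for src, dst in cfg_edges:
    succ[src].append(dst)

def reachable(frm, to):
    """BFS over the CFG facts: does any execution path lead from frm to to?"""
    seen, work = {frm}, deque([frm])
    while work:
        node = work.popleft()
        if node == to:
            return True
        for nxt in succ[node]:
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return False

# A candidate result (e.g., "a value written in b2 flows to a read in b3")
# is reported only if the write's block can actually reach the read's block.
candidate = ("b2", "b3")
print(reachable(*candidate))  # False: b2 cannot reach b3, so this result is filtered out
```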
Item: mel - Model Extraction Language and Interpreter (University of Waterloo, 2021-04-27)
Authors: Hackman, Robert; Atlee, Joanne

There is a large body of research on extracting models from code-related artifacts to enable model-based analyses of large software systems. However, engineers do not always have access to the entire code base of a system: some components may be procured from third-party suppliers based on a model specification, or their code may be generated automatically from models. Additionally, the development of software systems does not produce only source code as its output. Modern large software systems have various artifacts relevant to them, such as software models, build scripts, test scripts, version-control history, and more. To produce a more complete view of a modern software system, heterogeneous fact extraction of various artifacts is necessary, not just of source code. This thesis introduces mel, a model extraction language and interpreter for extracting "facts" from models represented in XMI; these facts can be combined with facts extracted from other system components to form a lightweight model of an entire software system. We provide preliminary evidence that mel is sufficient to specify fact extraction from models that have very different XMI representations. We also show that it can be easier to use mel to create a fact extractor for a particular model representation than to develop a specialized fact extractor for the model from scratch.

Item: Quantitative Analyses of Software Product Lines (University of Waterloo, 2022-01-18)
Authors: Olaechea Velazco, Rafael Ernesto; Atlee, Joanne; Czarnecki, Krzysztof

A software product line (SPL) is a family of related software systems that are jointly developed and reuse a set of shared assets. Each individual software system in an SPL is called a software product and includes a set of mandatory and optional features, which are independent units of functionality. Software-analysis techniques, such as model checking, analyze a model of a software system to determine whether the software system satisfies its requirements. Because many software-analysis techniques are computationally intensive, and the number of software products in an SPL grows exponentially with the number of features, it tends to be very time-consuming to analyze each product of an SPL individually. Family-based analyses have adapted standard software-analysis techniques (e.g., model checking, type checking) to simultaneously analyze all of the software products in an SPL, reusing partial analysis results between different software products to speed up the analysis. However, these family-based analyses verify only the functional requirements of an SPL, and we are interested in analyzing the quality of service that different software products in an SPL would exhibit. Quantitative analyses of a software-system model (e.g., of a weighted transition system) can estimate how long a system will take to reach its goal, how much energy a system will consume, and so on. Quantitative analyses are known to be computationally intensive. In this thesis, we investigate whether executing a family-based quantitative analysis on a model of an SPL is faster than individually analyzing every software product of the SPL.

First, we present a family-based trace-checking analysis that facilitates the reconfiguration of a dynamic software product line (DSPL), a type of SPL in which features can be activated or deactivated at runtime. We assessed whether executing the family-based trace-checking analysis is faster than executing the trace-checking analysis on every software product in three case studies. Our results indicate that the family-based trace-checking analysis, when combined with simple data abstraction over an SPL model's quality-attribute values to facilitate sharing of partial analysis results, is between 1.4 and 7.7 times faster than individually analyzing each software product. This suggests that abstraction over the quality-attribute values is key to making family-based trace-checking analysis efficient.

Second, we consider an SPL's maximum long-term average value of a quality attribute (e.g., because it represents the long-term rate of energy consumption of the system). Specifically, the maximum limit-average cost of a weighted transition system represents an upper bound on the long-term average value of a quality attribute over an infinite execution of the system. Because computing the maximum limit-average cost of a software system is computationally intensive, we developed a family-based analysis that simultaneously computes the maximum limit-average cost for each software product in an SPL. We assessed its performance compared to individually analyzing each software product in two case studies. Our results suggest that our family-based analysis will perform best in SPLs in which many products share the same set of strongly connected components.

Finally, because both of our family-based analyses require as input a timed (weighted) behaviour model of a software product line, we present a method to learn such a model. Specifically, the objective is to learn, for each transition t, a regression function that maps a software product to a real-valued weight that represents the duration of transition t's execution in that software product. We apply supervised-learning techniques, linear regression and regularized linear regression, to learn such functions. We assessed the accuracy of the learnt models against ground truth in two different SPLs and also compared the accuracy of our method against two state-of-the-art methods: Perfume and a Performance-Influence model. Our results indicate that the accuracy of our learnt models ranged from a mean error of 3.8% to a mean error of 193.0%. Our learnt models were most accurate for transitions whose execution times had low variance across repeated executions in the same software product, and for which there is a linear relationship between the transition's execution time and the presence of features in a software product.
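The learning step in the final contribution can be illustrated directly, since the abstract names the techniques (linear regression and regularized linear regression). Below is a minimal sketch with hypothetical features and timing data, mapping a product's 0/1 feature vector to a predicted weight for one transition:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical training data. Each row is one sampled software product,
# encoded as a 0/1 vector over features [Logging, Encryption, Compression].
products = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0],
                     [1, 1, 0], [0, 0, 1], [1, 0, 1]])
# Measured execution times (ms) of a single transition t in each product.
durations = np.array([10.1, 12.0, 17.9, 20.2, 10.4, 12.3])

# Regularized linear regression: product feature vector -> weight of t.
model = Ridge(alpha=1.0).fit(products, durations)
new_product = np.array([[0, 1, 1]])    # a product with Encryption + Compression
print(model.predict(new_product))      # estimated duration of t in that product
```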
Item: Refining, Implementing, and Evaluating the Extended Continuous Variable-Specific Resolutions of Feature Interactions (University of Waterloo, 2016-08-19)
Authors: Zhang, Chi; Atlee, Joanne

Systems that involve feature-oriented software development suffer from feature interactions, in which features affect one another's behaviour in surprising ways. As the number of features increases, the complexity of examining feature combinations and fixing undesired interactions increases exponentially, such that the workload of resolving interactions comes to dominate feature development. The Feature Interaction Problem results from aiming to resolve each feature interaction by providing an optimal resolution. Resolution strategies combat the Feature Interaction Problem by offering default strategies that resolve entire classes of interactions, thereby reducing the work of the developer who is charged with the task of resolving interactions. However, most such approaches employ coarse-grained resolution strategies (e.g., feature priority) or a centralized arbitrator. This thesis focuses on evaluating and refining a proposed architecture that resolves features' conflicting actions on a system's outputs. We extend this architecture, which is based on variable-specific resolution, to enable co-resolution of related outputs and to promote smooth, continuous resolutions over execution sequences. We implemented our approach within the PreScan simulator for advanced driver-assistance systems and performed a case study involving 15 automotive features that we implemented. We also devised and implemented three resolution strategies for the features' outputs. The results of the case study show that the approach produces smooth and continuous resolutions of interactions throughout interesting scenarios.
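To make the notion of a resolution strategy concrete, here is a small sketch of two hypothetical variable-specific resolutions for features that issue conflicting commands to a shared output; these are illustrative stand-ins, not the three strategies implemented in the thesis:

```python
# Two hypothetical strategies for resolving features' conflicting actions
# on a shared output variable (here, commanded acceleration in m/s^2).

def resolve_by_priority(actions):
    """actions: list of (feature_priority, value); the highest priority wins."""
    return max(actions, key=lambda a: a[0])[1]

def resolve_most_conservative(actions):
    """Choose the smallest commanded value: agrees with the features when they
    agree, and errs on the side of safety when they conflict."""
    return min(value for _, value in actions)

# Cruise control requests +0.5; collision avoidance requests -2.0.
requested = [(1, 0.5), (5, -2.0)]
print(resolve_by_priority(requested))        # -2.0 (collision avoidance outranks)
print(resolve_most_conservative(requested))  # -2.0 (most conservative command)
```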
Item: Variability-aware Neo4j for Analyzing a Graphical Model of a Software Product Line (University of Waterloo, 2023-08-21)
Authors: Chen, Xiang; Atlee, Joanne

A software product line (SPL) eases the development of families of related products by managing and integrating a collection of mandatory and optional features (units of functionality). Individual products can be derived from the product line by selecting among the optional features. Companies that successfully employ SPLs report dramatic improvements in rapid product development, software quality, labour needs, support for mass customization, and time to market. In a product line of reasonable size, it is impractical to verify every product because the number of possible feature combinations is exponential in the number of features. As a result, developers might verify only a small fraction of products and limit the choices offered to consumers, thereby foregoing one of the greatest promises of product lines: mass customization. To improve the efficiency of analyzing SPLs, (1) we analyze a model of an SPL rather than its code, and (2) we analyze the SPL model itself rather than models of its products. We extract a model comprising facts (e.g., functions, variables, assignments) from an SPL's source-code artifacts. The facts from different software components are linked together into a lightweight model of the code, called a factbase. The resulting factbase is a typed graphical model that can be analyzed using the Neo4j graph database. In this thesis, we lift the Neo4j query engine to reason over a factbase of an entire SPL. By lifting the Neo4j query engine, we enable any analysis that can be expressed in the query language to be applied to an SPL model. The lifted analyses return variability-aware results, in which each result is annotated with a feature expression denoting the products to which the result applies. We evaluated lifted Neo4j on five real-world open-source SPLs, with respect to ten commonly used analyses of interest. The first evaluation compares the performance of a post-processing approach versus an on-the-fly approach for computing the feature expressions that annotate the variability-aware results of lifted Neo4j; in general, the on-the-fly approach has a smaller runtime. The second evaluation assesses the overhead of analyzing a model of an SPL versus a model of a single product, which ranges from 1.88% to 456%. In the third evaluation, we compare the outputs and performance of lifted Neo4j to related work that employs the variability-aware V-Soufflé Datalog engine. We found that lifted Neo4j is usually more efficient than V-Soufflé when returning the same results (i.e., the end points of path results). When lifted Neo4j returns complete path results, it is generally slower than V-Soufflé, although lifted Neo4j can outperform V-Soufflé on analyses that return short fixed-length paths.
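The effect of lifting can be sketched with a toy variability-aware factbase in which each fact carries a presence condition, and a lifted query conjoins the conditions along each result. This simplified sketch only mimics, outside the database, what lifted Neo4j does inside the query engine; all names and conditions are hypothetical:

```python
# Hypothetical variability-aware factbase: each "calls" fact carries a
# presence condition (feature expression) naming the products it exists in.
calls = [
    ("main", "init_logging", "LOG"),
    ("main", "plan_route", "True"),
    ("plan_route", "encrypt_msg", "ENC"),
]

def conjoin(pc1, pc2):
    """Conjoin two presence conditions, simplifying the trivial cases."""
    if pc1 == "True":
        return pc2
    if pc2 == "True":
        return pc1
    return f"{pc1} & {pc2}"

def lifted_two_hop(edges):
    """A lifted two-hop call query: each result is annotated with the
    conjunction of the presence conditions along its path."""
    return [(a, c, conjoin(pc1, pc2))
            for (a, b, pc1) in edges
            for (b2, c, pc2) in edges if b == b2]

for caller, callee, pc in lifted_two_hop(calls):
    print(f"{caller} ->* {callee}  [products: {pc}]")
# main ->* encrypt_msg  [products: ENC]
```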