Browsing by Author "Wang, Yiran"

Latra: A Template-Based Language-Agnostic Transformation Framework for Program Reduction
(University of Waterloo, 2025-04-29) Wang, Yiran
Program reduction is essential for debugging compilers and interpreters, but existing reduction tools face a fundamental trade-off. Language-specific reducers, such as C-Reduce and ddSMT, offer highly effective reductions but require substantial engineering effort for each target language. Conversely, language-agnostic reducers, like Vulcan, sacrifice effectiveness for broad applicability. To bridge this gap, we present Latra, a novel template-based framework that combines language-agnostic reduction with user-defined, language-specific transformations, enabling general, effective, and targeted program reduction. Users define these transformations in a domain-specific language based on simple matching and rewriting templates, which minimizes the need for deep knowledge of formal grammars and lets users tailor reductions to specific languages with reduced implementation overhead. Our evaluation shows that Latra significantly outperforms Vulcan: it removes 33.77% more tokens in C and 9.17% more tokens in SMT-LIB, with 32.27% faster execution in SMT-LIB. Notably, Latra closely matches the effectiveness of the language-specific reducers C-Reduce and ddSMT (89 vs. 85 and 103 vs. 109 tokens, respectively) while requiring significantly less engineering effort (167 vs. 5,508 and 62 vs. 118 lines of code, respectively). We believe that Latra provides a practical and cost-efficient approach to program reduction, balancing language-specific effectiveness with language-agnostic generality.
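The idea of reducing a program with matching and rewriting templates can be illustrated with a minimal Python sketch. This is not Latra's domain-specific language or algorithm: regular expressions stand in for templates, and the names `reduce_program`, `templates`, and `is_interesting` are hypothetical. A rewrite is kept only if the program gets shorter and still triggers the behavior under investigation.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical (match, rewrite) pairs; regexes stand in for a real template DSL.
Template = Tuple[str, str]


def reduce_program(source: str,
                   templates: List[Template],
                   is_interesting: Callable[[str], bool]) -> str:
    """Greedily apply rewrites, keeping one only if the result is shorter
    and still satisfies the interestingness test (e.g. still triggers the bug)."""
    changed = True
    while changed:
        changed = False
        for pattern, replacement in templates:
            for match in re.finditer(pattern, source):
                candidate = (source[:match.start()]
                             + match.expand(replacement)
                             + source[match.end():])
                if len(candidate) < len(source) and is_interesting(candidate):
                    source = candidate
                    changed = True
                    break  # the text changed, so stop scanning stale matches
    return source


# Toy usage: simplify "(x + 0)" to "x" and unwrap "if (1) { ... }".
toy_templates = [
    (r"\((\w+) \+ 0\)", r"\1"),
    (r"if \(1\) \{ (.*?) \}", r"\1"),
]
toy_source = "int x = (a + 0); if (1) { crash(x); }"
print(reduce_program(toy_source, toy_templates, lambda s: "crash(" in s))
```

Because every accepted rewrite strictly shortens the text, the loop always terminates. A real reducer would operate on tokens or syntax trees rather than raw strings.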
Modeling and Bayesian Computations for Capture-Recapture Studies
(University of Waterloo, 2024-08-19) Wang, Yiran; Beliveau, Audrey; Lysy, Martin
Capture-recapture methods are often used for population size estimation, which plays a fundamental role in informing management decisions in ecology and epidemiology. In this thesis, we develop novel approaches to population size estimation that more comprehensively incorporate sources of statistical uncertainty in the data that are often overlooked. By addressing these uncertainties, our methods provide more accurate and reliable estimates of the parameters of interest. Furthermore, we introduce techniques to enhance computational efficiency, particularly for the Markov chain Monte Carlo (MCMC) algorithms used for Bayesian inference.
In Chapter 2, we study the plant-capture method, a special case of classical capture-recapture techniques in which decoys referred to as "plants" are introduced into the population to estimate the capture probability. The method has shown considerable success in estimating population sizes from limited samples in many epidemiological, ecological, and demographic studies. However, previous plant-capture studies have not systematically accounted for uncertainty in the capture status of each individual plant. To address this issue, we propose a novel modeling framework that formally incorporates into the plant-capture model the uncertainty arising from (i) the capture status of plants and (ii) the heterogeneity between multiple survey sites. We present two inference methods and compare their performance through simulation studies. We then apply these methods to estimate the homeless population size in five U.S. cities using the large-scale "S-night" study conducted by the U.S. Census Bureau.
In Chapter 3, we turn to uncertainty in compositional data. Understanding population composition is essential in many ecological, evolutionary, conservation, and management contexts. Modern methods like genetic stock identification (GSI) allow for estimating the proportions of individuals from different subpopulations using genetic data. These estimates are ideally obtained through mixture analysis, which can provide standard errors that accurately reflect the uncertainty in population composition. However, traditional methods that rely on historical data often account only for sample-level uncertainty, making them inadequate for estimating population-level uncertainty. To address this issue, we develop a reverse Dirichlet-multinomial model and multiple variance estimators to propagate uncertainty from the sample-level composition to the population level. We extend this approach to genetic mark-recapture scenarios, validate it with simulation studies, and apply it to estimate the escapement of Sockeye Salmon (Oncorhynchus nerka) in the Taku River.
In Chapter 4, motivated by the long run times of some of the Bayesian computations in this thesis, we shift our focus to the development and evaluation of Bayesian credible intervals. MCMC methods are crucial for sampling from posterior distributions in Bayesian analysis. However, when computational resources are limited, slow convergence or mixing can prevent obtaining a large effective sample size. This issue is particularly significant when estimating the quantiles that define credible intervals, which require more MCMC iterations than posterior means, medians, or variances. Consequently, stopping MCMC chains prematurely can lead to inaccurate credible interval estimates. To mitigate this issue in cases where the posterior distribution is approximately normal, we make a case for using parametric quantile estimation to determine credible interval endpoints. This chapter investigates the asymptotic properties of parametric quantile estimation and compares it with the empirical quantile method as MCMC chains are prolonged. Furthermore, we apply these techniques to a real-world capture-recapture dataset on Leisler’s bat to compare their performance in a practical scenario.
Overall, this thesis contributes to the field of population size estimation by developing innovative statistical methods that improve accuracy and computational efficiency. Our work addresses critical uncertainties and provides practical solutions for ecological and epidemiological applications, demonstrating the broad applicability and impact of advanced capture-recapture methodologies.
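The contrast drawn in Chapter 4 between empirical and parametric quantile estimation can be sketched in Python. This is an illustration under stated assumptions, not the thesis's estimator: the parametric version simply fits a normal distribution using the sample mean and standard deviation of the draws, the function names are hypothetical, and independent simulated draws stand in for real (autocorrelated) MCMC output.

```python
import random
import statistics
from statistics import NormalDist


def sample_quantile(ordered, p):
    """Linearly interpolated sample quantile of an already sorted list."""
    pos = p * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def empirical_interval(draws, level=0.95):
    """Equal-tailed credible interval from the sample quantiles of the draws."""
    ordered = sorted(draws)
    alpha = (1.0 - level) / 2.0
    return sample_quantile(ordered, alpha), sample_quantile(ordered, 1.0 - alpha)


def parametric_interval(draws, level=0.95):
    """Equal-tailed interval assuming the posterior is approximately normal."""
    mu = statistics.fmean(draws)
    sigma = statistics.stdev(draws)
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return mu - z * sigma, mu + z * sigma


# Toy comparison on a short "chain" of independent normal draws (an assumption).
random.seed(1)
short_chain = [random.gauss(10.0, 2.0) for _ in range(200)]
print("empirical: ", empirical_interval(short_chain))
print("parametric:", parametric_interval(short_chain))
```

The parametric endpoints depend on the draws only through their mean and standard deviation, which is why they can stabilize with fewer iterations than tail quantiles when the normal approximation is reasonable.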