Robust Reinforcement Learning for Linear Temporal Logic Specifications with Finite Trajectory Duration
Date
2024-09-23
Advisor
Vardhan Pant, Yash
Fischmeister, Sebastian
Publisher
University of Waterloo
Abstract
Linear Temporal Logic (LTL) is a formal behavioral specification language that offers a mathematically unambiguous and succinct way to represent operating requirements for a wide variety of systems, including autonomous and robotic systems. Traditional methods in this domain rely on model-checking approaches to ensure that a devised policy adheres to the provided specification. However, these methods are limited in the scope of problems they can solve and often lack generalizability to novel specifications and environments. Despite progress in synthesizing satisfying policies for LTL specifications under different operating conditions, learning policies that reliably satisfy complex LTL specifications in challenging environments remains an open problem. With the emergence of Machine Learning (ML) approaches, researchers have explored the use of ML-based techniques for LTL policy synthesis. Among the various approaches investigated, Reinforcement Learning (RL) has garnered particular attention for this task.
While LTL specifications are evaluated over infinite-length trajectories, this work focuses on satisfying a class of specifications, expressed in LTL over finite traces (LTLf), within a finite number of steps. This setting reflects most real-world applications involving robotic or autonomous systems, where a robot's run-time is limited before it must recharge, e.g., a robot vacuum that has to complete certain cleaning tasks before returning to its dock. Accordingly, this work develops an RL-based technique for generating trajectories that satisfy a given LTLf specification in a system with finite (discrete) states and actions and a priori unknown transition probabilities, modeled as a Markov Decision Process (MDP). The proposed approach builds upon the popular AlphaGo Zero RL framework, which has found great success in the two-player game of Go, to learn policies that can satisfy an LTLf specification given a limit on the trajectory duration.
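The finite-duration setting described above can be sketched in miniature: an LTLf formula is typically compiled into an automaton, a policy then acts on the product of the MDP state and the automaton state, and a trajectory counts as a success only if an accepting automaton state is reached within the step budget. The corridor MDP, the hand-coded automaton for "eventually reach the goal," and both policies below are illustrative placeholders of this general setup, not the thesis's actual AlphaGo Zero-based method.

```python
import random

# Toy corridor MDP with states 0..4 and a hand-coded two-state DFA for the
# LTLf formula "F goal" (eventually visit state 4). An RL agent would learn
# a policy over the product state (mdp_state, dfa_state); here simple fixed
# policies stand in for the learned one.

N_STATES = 5
GOAL = 4
MAX_STEPS = 20  # finite trajectory duration (step budget)

def dfa_step(q, s):
    """DFA for 'F goal': stay accepting (q=1) once the goal has been seen."""
    return 1 if (q == 1 or s == GOAL) else 0

def rollout(policy, seed=0):
    """Run one episode; return True iff the DFA accepts within MAX_STEPS."""
    rng = random.Random(seed)
    s, q = 0, 0
    for _ in range(MAX_STEPS):
        a = policy(s, q, rng)            # action in {-1, +1}
        # a priori unknown dynamics: the chosen move succeeds with prob. 0.8
        if rng.random() < 0.8:
            s = min(max(s + a, 0), N_STATES - 1)
        q = dfa_step(q, s)
        if q == 1:                       # accepting => LTLf spec satisfied
            return True
    return False                         # horizon exhausted, spec not met

random_policy = lambda s, q, rng: rng.choice([-1, 1])
greedy_policy = lambda s, q, rng: 1      # always move toward the goal

# success rate of the goal-directed policy over 100 seeded episodes
success = sum(rollout(greedy_policy, seed=i) for i in range(100))
```

The point of the product construction is that satisfaction of the temporal formula becomes a reachability condition on the automaton component, which is what allows an episodic, step-limited RL loop to assign a terminal success or failure to each finite trajectory.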
This thesis first provides the motivation and the necessary background for the problem, followed by a brief overview of existing methods. The problem statement is then introduced, the proposed methodology and its variants are presented, and extensive simulations of complex robot motion planning problems are conducted and analyzed. These simulations demonstrate that the approach achieves higher success rates under time constraints than state-of-the-art methods. The thesis concludes with a discussion of potential directions for future work and an examination of the results and their implications.