Robust Reinforcement Learning for Linear Temporal Logic Specifications with Finite Trajectory Duration


Date

2024-09-23

Advisor

Vardhan Pant, Yash
Fischmeister, Sebastian

Publisher

University of Waterloo

Abstract

Linear Temporal Logic (LTL) is a formal behavioral specification language that offers a mathematically unambiguous and succinct way to express operating requirements for a wide variety of systems, including autonomous and robotic systems. Traditional methods in this domain rely on model checking to ensure that a synthesized policy adheres to the given specification, but such methods are limited in the scope of problems they can solve and often do not generalize to novel specifications and environments. Despite progress in synthesizing satisfying policies for LTL specifications under different operating conditions, learning policies that reliably satisfy complex LTL specifications in challenging environments remains an open problem. With the emergence of Machine Learning (ML), researchers have explored ML-based techniques for LTL policy synthesis, and among these, Reinforcement Learning (RL) has garnered particular attention. Although LTL specifications are evaluated over infinite-length trajectories, this work focuses on satisfying a class of specifications, expressed in LTL over finite traces (LTLf), within a finite number of steps. This setting reflects most real-world robotic and autonomous applications, where a robot's run time is limited before it must recharge; a robot vacuum, for example, has to complete its cleaning tasks before returning to its charging dock. Accordingly, this work develops an RL-based technique for generating trajectories that satisfy a given LTLf specification in a system with finite (discrete) states and actions and a priori unknown transition probabilities, modeled as a Markov Decision Process (MDP). The proposed approach builds on the AlphaGo Zero RL framework, which has found great success in the two-player game of Go, to learn policies that satisfy an LTLf specification within a given limit on trajectory duration.
This thesis first provides the motivation and necessary background, followed by a brief overview of existing methods. It then introduces the problem statement, presents the proposed methodology and its variants, and reports extensive simulations of complex robot motion planning problems. These simulations demonstrate that the approach achieves higher success rates under time constraints than state-of-the-art methods. The thesis concludes with a discussion of the results, their implications, and potential directions for future work.
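The finite-horizon satisfaction problem described above can be illustrated with a toy sketch (this is an illustration of the problem setup only, not the thesis's actual method; the line-world MDP, the specification, and all names such as `GOAL`, `HAZARD`, and `rollout` are invented for this example). It models a small MDP with stochastic "slip" transitions and checks whether a sampled finite trajectory satisfies a simple LTLf-style specification: eventually reach the goal state while never entering the hazard state.

```python
import random

# Toy line-world MDP: states 0..4, start at 2, stochastic "slip".
# GOAL and HAZARD are illustrative atomic propositions.
STATES = list(range(5))
ACTIONS = {"left": -1, "right": +1}
START, GOAL, HAZARD = 2, 4, 0
SLIP_PROB = 0.1  # transition noise, unknown to the learning agent

def step(state, action, rng):
    """Sample a successor: with probability SLIP_PROB the agent stays put."""
    if rng.random() < SLIP_PROB:
        return state
    return min(max(state + ACTIONS[action], 0), len(STATES) - 1)

def satisfies_ltlf(trajectory):
    """LTLf-style check over a *finite* trace:
    'eventually GOAL' and 'always not HAZARD'."""
    return GOAL in trajectory and HAZARD not in trajectory

def rollout(policy, horizon, seed=0):
    """Generate one finite trajectory of at most `horizon` steps."""
    rng = random.Random(seed)
    state, trajectory = START, [START]
    for _ in range(horizon):
        state = step(state, policy(state), rng)
        trajectory.append(state)
        if state == GOAL:  # spec already resolved; stop early
            break
    return trajectory

# A hand-written policy that always moves right reaches GOAL without
# ever visiting HAZARD, so the finite trace satisfies the specification.
traj = rollout(lambda s: "right", horizon=10)
print(traj, satisfies_ltlf(traj))  # → [2, 3, 4] True
```

In the setting considered by the thesis, the transition probabilities are not available to the planner, and a general LTLf specification would be tracked via its equivalent deterministic finite automaton rather than a hand-written membership check as above.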
