Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction

Mingyu Cai; Shaoping Xiao; Baoluo Li; Zhiliang Li; Zhen Kan

doi:10.1109/ICRA48506.2021.9561903

Conference proceeding

2021 IEEE International Conference on Robotics and Automation (ICRA), pp.806-812

05/30/2021

Files and links (1)

https://arxiv.org/pdf/2010.06797 (Open Access)

Abstract

This paper presents a model-free reinforcement learning (RL) algorithm that synthesizes a control policy maximizing the satisfaction probability of complex tasks expressed as linear temporal logic (LTL) specifications. To account for environment and motion uncertainties, the robot motion is modeled as a probabilistic labeled Markov decision process (PL-MDP) with unknown transition probabilities and probabilistic labeling functions. The LTL task specification is converted to a limit deterministic generalized Büchi automaton (LDGBA) with several accepting sets to maintain dense rewards during learning. The novelty lies in constructing an embedded LDGBA (E-LDGBA) by designing a synchronous tracking-frontier function, which records the non-visited accepting sets of the LDGBA at each round of the repeated visiting pattern and thereby overcomes the difficulties of directly applying a conventional LDGBA. With appropriate dependent reward and discount functions, rigorous analysis shows that any method that optimizes the expected discounted return of the RL-based approach is guaranteed to find the optimal policy that maximizes the satisfaction probability of the LTL specifications. A model-free RL-based motion planning strategy is developed to generate the optimal policy. The effectiveness of the RL-based control synthesis is demonstrated via simulation and experimental results.
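
The tracking-frontier idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is a simplified reading of the abstract, not the paper's definition: the function name `update_frontier`, the representation of accepting sets as frozensets of automaton states, and the reset rule (restore all accepting sets once every set has been visited in the current round) are assumptions made for illustration.

```python
from typing import FrozenSet, Hashable, Iterable, Tuple

State = Hashable
AcceptingSet = FrozenSet[State]
Frontier = FrozenSet[AcceptingSet]


def update_frontier(
    q: State,
    frontier: Frontier,
    accepting_sets: Iterable[AcceptingSet],
) -> Tuple[Frontier, bool]:
    """Hypothetical tracking-frontier update (assumed form, not the paper's).

    `frontier` holds the accepting sets not yet visited in the current round.
    Returns the new frontier and a flag that is True when a round completes.
    """
    # Drop every accepting set that contains the current automaton state.
    remaining = frozenset(F for F in frontier if q not in F)
    if remaining:
        return remaining, False
    # Every accepting set has been visited: start a new round.
    return frozenset(accepting_sets), True


# Usage: two accepting sets, visited one after the other.
sets = [frozenset({"a"}), frozenset({"b"})]
frontier = frozenset(sets)
frontier, done = update_frontier("a", frontier, sets)  # {"b"} set remains
frontier, done = update_frontier("b", frontier, sets)  # round completes, reset
```

Keeping the frontier alongside the automaton state is what would let a reward signal fire on each newly visited accepting set within a round, which is one plausible way to read the abstract's remark about dense rewards.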

Conferences

Learning automata

Markov processes

Probabilistic logic

Reinforcement learning

Robot motion

Uncertainty

Details

Title: Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction
Creators: Mingyu Cai - University of Iowa
Shaoping Xiao - University of Iowa
Baoluo Li - University of Science and Technology of China
Zhiliang Li - University of Science and Technology of China
Zhen Kan - University of Science and Technology of China
Resource Type: Conference proceeding
Publication Details: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp.806-812
DOI: 10.1109/ICRA48506.2021.9561903
eISSN: 2577-087X
Publisher: IEEE
Grant note: National Natural Science Foundation of China (10.13039/501100001809)
Language: English
Date published: 05/30/2021
Academic Unit: Mechanical Engineering
Record Identifier: 9984201441502771

Metrics

10 Record Views

21 Times Cited - Web of Science