Finite-time optimal policy identification for the stochastic shortest path problem

Wangzhi Zhou; Yuanqiu Mo; Soura Dasgupta

doi:10.1109/LCSYS.2026.3697256

Journal article

Peer reviewed

IEEE control systems letters, Vol.10, pp.409-414

2026

Abstract

This letter investigates finite-time optimal policy identification for stochastic shortest path (SSP) problems. Rather than relying on classical assumptions such as graph acyclicity or the a priori requirement that all policies be proper, the proposed approach employs a Lyapunov-like function to guarantee that the optimal policy is obtained within a finite number of iterations under both value iteration (VI) and policy iteration (PI). The proposed condition is further shown to be both necessary and sufficient for policy properness. Moreover, within this framework, the expected hitting time to the terminal state is guaranteed to be finite under any stationary policy. Simulation results are provided to validate the theoretical findings.
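
For readers unfamiliar with the setting, the following is a minimal sketch of textbook value iteration on a toy SSP instance. It is not the authors' method and does not reproduce their Lyapunov-like condition; the states, costs, transition probabilities, and the names `transitions`, `q_value`, and `value_iteration` are all hypothetical and chosen only for illustration. In this toy instance, choosing action 0 in state 0 and action 1 in state 1 cycles between the two states forever, so not every policy is proper.

```python
# Toy SSP with hypothetical numbers: states 0 and 1 are non-terminal,
# state 2 is the cost-free absorbing terminal state.
# transitions[s][a] = (stage cost, [(next_state, probability), ...])
TERMINAL = 2
transitions = {
    0: {0: (1.0, [(1, 1.0)]),
        1: (2.0, [(TERMINAL, 0.5), (0, 0.5)])},
    1: {0: (1.0, [(TERMINAL, 1.0)]),
        1: (0.5, [(0, 1.0)])},
}


def q_value(J, s, a):
    """Expected cost of taking action a in state s, then following J."""
    cost, outcomes = transitions[s][a]
    return cost + sum(p * J[nxt] for nxt, p in outcomes)


def value_iteration(tol=1e-9, max_iters=1000):
    """Synchronous value iteration starting from J = 0.

    Returns the final cost-to-go estimate, a greedy policy with respect
    to it, and the number of sweeps performed.
    """
    J = [0.0, 0.0, 0.0]  # J[TERMINAL] stays 0 throughout
    sweeps = 0
    for sweeps in range(1, max_iters + 1):
        J_new = [min(q_value(J, s, a) for a in transitions[s])
                 for s in (0, 1)] + [0.0]
        delta = max(abs(x - y) for x, y in zip(J_new, J))
        J = J_new
        if delta < tol:
            break
    # Greedy policy: the cost-minimizing action in each non-terminal state.
    policy = [min(transitions[s], key=lambda a: q_value(J, s, a))
              for s in (0, 1)]
    return J, policy, sweeps


if __name__ == "__main__":
    J, policy, sweeps = value_iteration()
    print("cost-to-go:", J)
    print("greedy policy:", policy)
    print("sweeps:", sweeps)
```

In this particular instance the iterates stop changing after a few sweeps, which loosely illustrates the kind of finite-time behavior the abstract describes; the sketch makes no claim about when this happens in general.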

Details

Title: Finite-time optimal policy identification for the stochastic shortest path problem
Creators: Wangzhi Zhou - Southeast University
Yuanqiu Mo - Southeast University
Soura Dasgupta - University of Iowa
Resource Type: Journal article
Publication Details: IEEE control systems letters, Vol.10, pp.409-414
DOI: 10.1109/LCSYS.2026.3697256
ISSN: 2475-1456
eISSN: 2475-1456
Publisher: IEEE
Language: English
Electronic publication date: 05/26/2026
Date published: 2026
Academic Unit: Electrical and Computer Engineering
Record Identifier: 9985166828202771
