Journal article
On the Convergence of Modified Policy Iteration in Risk-Sensitive Exponential Cost Markov Decision Processes
Operations Research
11/27/2025
DOI: 10.1287/opre.2024.0818
Abstract
Modified policy iteration (MPI) is a dynamic programming algorithm that combines elements of policy iteration and value iteration. The convergence of MPI is well-studied in the context of discounted and average-cost Markov decision processes (MDPs). In this work, we consider the exponential cost risk-sensitive MDP formulation, which is known to provide some robustness to model parameters. Although policy iteration and value iteration are well-studied in the context of risk-sensitive MDPs, MPI remains unexplored. To the best of our knowledge, we provide the first proof that MPI also converges for the risk-sensitive problem in the case of finite state and action spaces. Because the exponential cost formulation involves a multiplicative Bellman equation, our main contribution is a convergence proof that is quite different from existing results for the discounted and risk-neutral average-cost settings, as well as for risk-sensitive value and policy iteration.
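The record does not reproduce the algorithm, so the following is only a minimal Python sketch of the idea the abstract describes, under stated assumptions. It assumes the standard multiplicative Bellman equation for the exponential (risk-sensitive average) cost criterion, exp(θλ) W(i) = min_a exp(θ c(i,a)) Σ_j P(j | i,a) W(j), with a risk parameter θ > 0, finite state and action spaces, and a model under which normalized (relative) value iteration converges. MPI alternates a greedy policy-improvement step with m applications of the fixed-policy operator T_μ, so m = 1 recovers value iteration and a large m approaches policy iteration. The function name `risk_sensitive_mpi`, the array layouts `P[a][i][j]` and `c[i][a]`, and the reference-state normalization are hypothetical choices for illustration, not the authors' implementation or their exact convergence conditions.

```python
import math
from typing import List, Tuple


def risk_sensitive_mpi(
    P: List[List[List[float]]],  # hypothetical layout: P[a][i][j] = Prob(j | state i, action a)
    c: List[List[float]],        # hypothetical layout: c[i][a] = one-step cost of action a in state i
    theta: float = 1.0,          # risk parameter; assumed > 0 (risk-averse)
    m: int = 5,                  # partial-evaluation steps per improvement (m = 1 is value iteration)
    iters: int = 200,            # number of improvement rounds
    ref_state: int = 0,          # state whose entry is used to normalize W
) -> Tuple[List[int], float, List[float]]:
    n, num_actions = len(c), len(c[0])
    W = [1.0] * n  # positive multiplicative "value" vector
    rho = 1.0      # running estimate of exp(theta * lambda)

    def q(i: int, a: int, W: List[float]) -> float:
        # One multiplicative Bellman backup: exp(theta*c(i,a)) * sum_j P(j|i,a) W(j)
        return math.exp(theta * c[i][a]) * sum(P[a][i][j] * W[j] for j in range(n))

    policy = [0] * n
    for _ in range(iters):
        # Policy improvement: act greedily with respect to the current W.
        policy = [min(range(num_actions), key=lambda a: q(i, a, W)) for i in range(n)]
        # Partial policy evaluation: apply T_mu m times, renormalizing each time
        # by the reference-state entry so that W stays bounded and positive.
        for _ in range(m):
            TW = [q(i, policy[i], W) for i in range(n)]
            rho = TW[ref_state]
            W = [x / rho for x in TW]

    # The normalizer approximates exp(theta * lambda), so lambda ~ log(rho) / theta.
    return policy, math.log(rho) / theta, W
```

For example, with a single action, costs c = [[0.0], [1.0]] and uniform transitions P = [[[0.5, 0.5], [0.5, 0.5]]], `risk_sensitive_mpi(P, c, theta=1.0)` should return λ = log(0.5 + 0.5e) ≈ 0.620, above the risk-neutral average cost of 0.5, which reflects the risk aversion of the exponential criterion when θ > 0. Normalizing by a fixed reference state mirrors relative value iteration; other normalizations are possible.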
Details
- Title: Subtitle
- On the Convergence of Modified Policy Iteration in Risk-Sensitive Exponential Cost Markov Decision Processes
- Creators
- Yashaswini Murthy - California Institute of Technology
- Mehrdad Moharrami - University of Iowa, Computer Science, Iowa City, IA 52242, USA
- Rayadurgam Srikant - University of Illinois Urbana-Champaign
- Resource Type
- Journal article
- Publication Details
- Operations Research
- DOI
- 10.1287/opre.2024.0818
- ISSN
- 0030-364X
- eISSN
- 1526-5463
- Publisher
- Informs
- Number of pages
- 13
- Grant note
- 22-07547; 23-12714 / National Science Foundation (NSF); FA9550-24-1-0002 / Air Force Office of Scientific Research (AFOSR); United States Department of Defense
- Language
- English
- Electronic publication date
- 11/27/2025
- Academic Unit
- Computer Science
- Record Identifier
- 9985091816402771