On the Convergence of Modified Policy Iteration in Risk-Sensitive Exponential Cost Markov Decision Processes

Yashaswini Murthy; Mehrdad Moharrami; Rayadurgam Srikant

doi:10.1287/opre.2024.0818

Journal article

Peer reviewed

Operations Research, Vol. 74(3), pp. 1425-1436

05/2026

Abstract

Modified policy iteration (MPI) is a dynamic programming algorithm that combines elements of policy iteration and value iteration. The convergence of MPI is well studied in the context of discounted and average-cost Markov decision processes (MDPs). In this work, we consider the exponential cost risk-sensitive MDP formulation, which is known to provide some robustness to model parameters. Although policy iteration and value iteration are well studied in the context of risk-sensitive MDPs, MPI is unexplored. To the best of our knowledge, we provide the first proof that MPI also converges for the risk-sensitive problem in the case of finite state and action spaces. Because the exponential cost formulation deals with the multiplicative Bellman equation, our main contribution is a convergence proof that is quite different from existing results for discounted and risk-neutral average-cost problems, as well as for risk-sensitive value and policy iteration.
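
To make the abstract's terms concrete, the following is a minimal illustrative sketch of an MPI-style loop for the exponential cost setting. It is not the algorithm as stated in the paper. It assumes the standard multiplicative Bellman form rho * V(s) = min_a exp(alpha * c(s, a)) * sum_j P(j | s, a) * V(j). The function names, the normalization by state 0, the parameters m and iters, and the toy data are all hypothetical choices made for illustration. The toy transitions do not depend on the state or action, so the result can be checked by hand.

```python
import math

def backup(P, c, alpha, V, s, a):
    # Multiplicative one-step backup: exp(alpha * cost) times expected next value.
    return math.exp(alpha * c[s][a]) * sum(p * v for p, v in zip(P[s][a], V))

def modified_policy_iteration(P, c, alpha, m=5, iters=100):
    # P[s][a] is a list of next-state probabilities; c[s][a] is the one-step cost.
    n = len(P)
    V = [1.0] * n          # positive initial guess for the multiplicative value
    policy = [0] * n
    rho = 1.0
    for _ in range(iters):
        # Policy improvement: greedy action under the multiplicative backup.
        policy = []
        for s in range(n):
            scores = [backup(P, c, alpha, V, s, a) for a in range(len(P[s]))]
            policy.append(scores.index(min(scores)))
        # Partial policy evaluation: m backups under the fixed policy
        # (m = 1 resembles value iteration, large m resembles policy iteration).
        for _ in range(m):
            W = [backup(P, c, alpha, V, s, policy[s]) for s in range(n)]
            rho = W[0] / V[0]           # growth-factor estimate at reference state 0
            V = [w / W[0] for w in W]   # rescale so values stay bounded (assumed choice)
    # Convert the multiplicative growth factor into a risk-sensitive average cost.
    return policy, math.log(rho) / alpha

# Hypothetical toy problem: 2 states, 2 actions, next state uniform regardless of (s, a).
P = [[[0.5, 0.5], [0.5, 0.5]],
     [[0.5, 0.5], [0.5, 0.5]]]
c = [[1.0, 3.0],
     [4.0, 2.0]]
policy, cost = modified_policy_iteration(P, c, alpha=1.0)
```

In this toy problem the greedy policy simply picks the cheaper action in each state, and the returned cost equals (1/alpha) * ln(0.5 * e^1 + 0.5 * e^2), the exponential certainty equivalent of an i.i.d. cost that is 1 or 2 with equal probability.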

Social Sciences

Technology

Business & Economics

Management

Operations Research & Management Science

Science & Technology

Details

Title: On the Convergence of Modified Policy Iteration in Risk-Sensitive Exponential Cost Markov Decision Processes
Creators: Yashaswini Murthy - California Institute of Technology
Mehrdad Moharrami - University of Iowa, Computer Science, Iowa City, IA 52242, USA
Rayadurgam Srikant - University of Illinois Urbana-Champaign
Resource Type: Journal article
Publication Details: Operations Research, Vol. 74(3), pp. 1425-1436
DOI: 10.1287/opre.2024.0818
ISSN: 0030-364X
eISSN: 1526-5463
Publisher: INFORMS
Number of pages: 13
Grant note: National Science Foundation (NSF): 22-07547, 23-12714; Air Force Office of Scientific Research (AFOSR), United States Department of Defense: FA9550-24-1-0002
Language: English
Electronic publication date: 11/27/2025
Date published: 05/2026
Academic Unit: Computer Science
Record Identifier: 9985091816402771
