Journal article
A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP
Mathematics of operations research, Vol.50(1), pp.431-458
02/2025
DOI: 10.1287/moor.2022.0139
Abstract
We study the risk-sensitive exponential cost Markov decision process (MDP) formulation and develop a trajectory-based gradient algorithm to find the stationary point of the cost associated with a set of parameterized policies. We derive a formula that can be used to compute the policy gradient from (state, action, cost) information collected from sample paths of the MDP for each fixed parameterized policy. Unlike the traditional average cost problem, standard stochastic approximation theory cannot be used to exploit this formula. To address the issue, we introduce a truncated and smooth version of the risk-sensitive cost and show that this new cost criterion can be used to approximate the risk-sensitive cost and its gradient uniformly under some mild assumptions. We then develop a trajectory-based gradient algorithm to minimize the smooth truncated estimation of the risk-sensitive cost and derive conditions under which a sequence of truncations can be used to solve the original, untruncated cost problem.
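The abstract does not state the cost criterion explicitly. As a rough illustration only, the sketch below estimates a finite-horizon exponential (risk-sensitive) cost from sampled per-step costs, assuming the commonly used form (1/(αT)) log E[exp(α Σ c_t)]. The function name `risk_sensitive_cost`, the risk parameter `alpha`, and the finite-horizon sample average are assumptions made for this example; this is not the paper's truncated smooth criterion or its gradient estimator.

```python
import math


def risk_sensitive_cost(trajectory_costs, alpha):
    """Empirical exponential cost over equal-length sample paths.

    Hypothetical helper, not taken from the paper. Computes
    (1 / (alpha * T)) * log(mean_i exp(alpha * sum_t c_{i,t})),
    where T is the common path length.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if not trajectory_costs:
        raise ValueError("need at least one trajectory")
    horizon = len(trajectory_costs[0])
    if horizon == 0 or any(len(path) != horizon for path in trajectory_costs):
        raise ValueError("trajectories must be non-empty and of equal length")

    # Scaled total cost of each path: alpha * sum of its per-step costs.
    totals = [alpha * sum(path) for path in trajectory_costs]

    # Log-mean-exp with the maximum subtracted, so exp() cannot overflow.
    peak = max(totals)
    mean_exp = sum(math.exp(v - peak) for v in totals) / len(totals)
    log_mean_exp = peak + math.log(mean_exp)

    return log_mean_exp / (alpha * horizon)
```

For sample paths with identical total cost the estimate equals the ordinary per-step average cost. When the totals vary and `alpha` is positive, Jensen's inequality puts the estimate above the plain average, which is the sense in which the criterion penalizes variability.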
Details
- Title: Subtitle
- A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP
- Creators
- Mehrdad Moharrami - University of Iowa
- Yashaswini Murthy - University of Illinois Urbana-Champaign
- Arghyadip Roy - Indian Institute of Technology Guwahati
- R. Srikant - University of Illinois Urbana-Champaign
- Resource Type
- Journal article
- Publication Details
- Mathematics of operations research, Vol.50(1), pp.431-458
- Publisher
- INFORMS
- DOI
- 10.1287/moor.2022.0139
- ISSN
- 0364-765X
- eISSN
- 1526-5471
- Number of pages
- 29
- Grant note
- 17-04970; 19-34986 / Division of Computing and Communication Foundations; National Science Foundation (NSF); NSF - Directorate for Computer & Information Science & Engineering (CISE)
- W911NF-19-1-0379 / Army Research Office
- N0001419-1-2566 / Office of Naval Research Global; Office of Naval Research
- 21-06801 / Division of Computer and Network Systems; National Science Foundation (NSF); NSF - Directorate for Computer & Information Science & Engineering (CISE)
- Language
- English
- Electronic publication date
- 03/11/2024
- Date published
- 02/2025
- Academic Unit
- Computer Science
- Record Identifier
- 9984582433902771