Preprint
A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP
ArXiv.org
Cornell University
02/08/2022
DOI: 10.48550/arxiv.2202.04157
Abstract
We study the risk-sensitive exponential cost MDP formulation and develop a trajectory-based gradient algorithm to find the stationary point of the cost associated with a set of parameterized policies. We derive a formula that can be used to compute the policy gradient from (state, action, cost) information collected from sample paths of the MDP for each fixed parameterized policy. Unlike the traditional average-cost problem, standard stochastic approximation theory cannot be used to exploit this formula. To address the issue, we introduce a truncated and smooth version of the risk-sensitive cost and show that this new cost criterion can be used to approximate the risk-sensitive cost and its gradient uniformly under some mild assumptions. We then develop a trajectory-based gradient algorithm to minimize the smooth truncated estimation of the risk-sensitive cost and derive conditions under which a sequence of truncations can be used to solve the original, untruncated cost problem.
Details
- Title
- A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP
- Creators
- Mehrdad Moharrami, Yashaswini Murthy, Arghyadip Roy, R Srikant
- Resource Type
- Preprint
- Publication Details
- ArXiv.org
- DOI
- 10.48550/arxiv.2202.04157
- ISSN
- 2331-8422
- Publisher
- Cornell University
- Language
- English
- Date posted
- 02/08/2022
- Academic Unit
- Computer Science
- Record Identifier
- 9984446722902771