The effectiveness of numerical approximation for dynamic programming problems

Wyatt Jones

doi:10.17077/etd.r7aw-1ipf

Dissertation

Open access

University of Iowa

Doctor of Philosophy (PhD), University of Iowa

Summer 2019

Abstract

This thesis studies numerical methods for solving dynamic programming problems. In the first chapter, I present methods for reducing the computational cost imposed by the curse of dimensionality in the action and state spaces when solving for a Markov perfect equilibrium (MPE). These methods make it feasible to solve Markov perfect models in which the state space or action space is large, because they can approximate a highly nonlinear or even discontinuous value function and thus allow the algorithm to use a small subset of the state space to approximate the rest of the value function. Using a dynamic oligopoly model with heterogeneous firms from the dynamic industrial organization (IO) literature, I compare approximation with multidimensional Chebyshev polynomials against approximation with artificial neural networks (ANNs), and I present issues that arise when trying to use the gradient of the value function for approximation with Hermite interpolation. I also discuss how approximating the value function with continuous functions allows the optimal action to be found quickly using gradient-based optimization techniques.
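As a rough illustration of approximating a value function from a small subset of the state space, the sketch below fits a one-dimensional Chebyshev interpolant at 12 nodes and evaluates it elsewhere. It is not taken from the thesis: the toy value function `toy_value`, the state interval [0, 10], and all helper names are assumptions chosen for illustration, and the thesis works with multidimensional polynomials rather than this one-dimensional case.

```python
import math

def chebyshev_nodes(n):
    """Return the n Chebyshev nodes (roots of T_n) on [-1, 1]."""
    return [math.cos(math.pi * (2 * k + 1) / (2 * n)) for k in range(n)]

def fit_chebyshev(values, nodes):
    """Interpolation coefficients from function values at the Chebyshev nodes."""
    n = len(nodes)
    coeffs = []
    for j in range(n):
        # T_j(x) = cos(j * arccos(x)); discrete orthogonality gives c_j directly.
        s = sum(v * math.cos(j * math.acos(x)) for v, x in zip(values, nodes))
        coeffs.append((1.0 if j == 0 else 2.0) * s / n)
    return coeffs

def eval_chebyshev(coeffs, x):
    """Evaluate sum_j c_j * T_j(x) for x in [-1, 1]."""
    theta = math.acos(max(-1.0, min(1.0, x)))
    return sum(c * math.cos(j * theta) for j, c in enumerate(coeffs))

def to_unit(s, lo, hi):
    """Map a state s in [lo, hi] to [-1, 1]."""
    return 2.0 * (s - lo) / (hi - lo) - 1.0

def from_unit(x, lo, hi):
    """Map x in [-1, 1] back to the state interval [lo, hi]."""
    return lo + (x + 1.0) * (hi - lo) / 2.0

def toy_value(s):
    """Hypothetical smooth value function, used only for this example."""
    return math.log(1.0 + s)

LO, HI = 0.0, 10.0
nodes = chebyshev_nodes(12)
# Evaluate the (expensive, in practice) value function only at the 12 nodes.
node_values = [toy_value(from_unit(x, LO, HI)) for x in nodes]
coeffs = fit_chebyshev(node_values, nodes)

# Approximate the value function on a much finer grid of states.
grid = [LO + (HI - LO) * i / 200 for i in range(201)]
max_error = max(
    abs(eval_chebyshev(coeffs, to_unit(s, LO, HI)) - toy_value(s)) for s in grid
)
```

Because the toy function is smooth, a dozen nodes are enough here; a discontinuous value function would not be approximated this well by a polynomial of the same size.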

In the second chapter, I examine the use of reinforcement learning to solve a high-dimensional operations research problem. This methodology extends the value function approximation of the first chapter with an algorithm that uses both a value function approximation and a policy function approximation. I focus on the traveling salesman problem (TSP) and train recurrent neural networks (RNNs) that, given a set of city coordinates, output a probability distribution over the next city to visit in a route. Using route labels provided by Google’s OR-Tools TSP solver, I train multiple network architectures using supervised learning. I compare the performance of these architectures with their performance when trained by reinforcement learning using the route length as a reward signal. I show that supervised learning is a useful tool for tuning the hyperparameters that will be used in reinforcement learning and for evaluating the performance improvement from an architecture change. I also provide evidence that, while reinforcement learning is a more general optimization framework than handcrafted heuristics, in practice it is necessary to build a neural network architecture specific to the problem of interest, and that a network architecture’s ability to be trained using supervised learning does not guarantee that it can be trained using reinforcement learning.
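To make the idea of a route-length reward concrete, the following sketch samples routes one city at a time from a probability distribution over unvisited cities and scores each route by its negative length. It is an assumption-laden illustration, not the thesis implementation: `next_city_probs` is a hypothetical stand-in for the trained RNN (a softmax over negative distances), and the mean-reward baseline is one common choice rather than the one used in the thesis.

```python
import math
import random

def tour_length(cities, route):
    """Length of the closed tour that visits cities in `route` order and returns home."""
    return sum(
        math.dist(cities[route[i]], cities[route[(i + 1) % len(route)]])
        for i in range(len(route))
    )

def next_city_probs(cities, current, visited, temperature=0.1):
    """Hypothetical policy: softmax over negative distance, masked to unvisited cities."""
    scores = {
        j: -math.dist(cities[current], cities[j]) / temperature
        for j in range(len(cities))
        if j not in visited
    }
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {j: math.exp(s - top) for j, s in scores.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def sample_route(cities, rng):
    """Build a route by repeatedly sampling the next city from the policy."""
    route, visited = [0], {0}
    while len(route) < len(cities):
        probs = next_city_probs(cities, route[-1], visited)
        choices = list(probs)
        nxt = rng.choices(choices, weights=[probs[j] for j in choices])[0]
        route.append(nxt)
        visited.add(nxt)
    return route

rng = random.Random(0)
cities = [(rng.random(), rng.random()) for _ in range(8)]

# Sample a batch of routes; the reward is the negative route length.
routes = [sample_route(cities, rng) for _ in range(16)]
rewards = [-tour_length(cities, r) for r in routes]

# A mean-reward baseline turns rewards into advantages for a policy-gradient update.
baseline = sum(rewards) / len(rewards)
advantages = [r - baseline for r in rewards]
```

In a policy-gradient setup, routes with positive advantage would have their sampled choices made more likely; in the supervised setting the same network would instead be trained to reproduce the solver's route labels.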

Economics

Details

Title: The effectiveness of numerical approximation for dynamic programming problems
Creators: Wyatt Jones - University of Iowa
Contributors: Rabah Amir (Advisor)
Barrett Thomas (Committee Member)
Anne Villamil (Committee Member)
Suyong Song (Committee Member)
Nicholas C Yannelis (Committee Member)
Resource Type: Dissertation
Degree Awarded: Doctor of Philosophy (PhD), University of Iowa
Degree in: Economics
Date degree season: Summer 2019
DOI: 10.17077/etd.r7aw-1ipf
Publisher: University of Iowa
Number of pages: x, 76 pages
Language: English
Description illustrations: color illustrations
Description bibliographic: Includes bibliographical references (pages 71-76).
Public Abstract (ETD): The motivation of this thesis is to research how recent advancements in applied mathematics and computer science can allow economic and operations research models to study more complex environments. In the first chapter, I use a model from the economic field of industrial organization in which many firms compete in a market. Each firm has a product that consumers perceive as having a different level of quality. Because there are so many possible combinations of firm quality levels, the model can only be solved when there are a few firms. I use artificial neural networks (ANNs), a recent advancement in computer science, to expand the number of firms that the model can include. I also discuss how to combine this method with other advancements in applied mathematics and why ANNs are an improvement over the current methodology.

In the second chapter, I use recent advancements in recurrent neural networks (RNNs) and reinforcement learning (RL) to solve the traveling salesman problem (TSP). In this problem there are many cities, and a salesman must choose the shortest possible route through all the cities and back home. This is one of the oldest problems in operations research, and over many years several methods have emerged that perform very well on it. These methods took many years to develop, and they do not perform well when the TSP’s problem statement is changed slightly. There is therefore interest in using RL to automatically discover such methods for the TSP and other similar problems. I detail how the implementation of the RNN must be built specifically for the problem of interest, which limits how much more general this methodology is than existing methods; how to use supervised learning (SL) to find the correct settings so that RL will work; and the issue of reproducibility in RL.
Academic Unit: Economics
Record Identifier: 9983777015002771
