The motivation of this thesis is the study of numerical methods for solving dynamic programming problems. In the first chapter, I present methods for reducing the computational cost imposed by the curse of dimensionality in the action and state space that occurs when solving for a Markov perfect equilibrium (MPE). These methods circumvent the computational problems for Markov perfect models in which the size of the state space or action space is large. This is because these methods are able to overcome the issue with approximating a highly non-linear or even discontinuous value function and thus allow the algorithm to use a small subset of the state space to approximate the rest of the value function. I use a model from the dynamic IO literature which is a dynamic oligopoly model with heterogeneous firms to evaluate the difference between using multidimensional Chebyshev polynomials and using artificial neural nets (ANN) for approximation, and present issues that arise when trying to use the gradient of the value function for approximations with Hermite interpolation. I also discuss how value function approximation with continuous functions can be used to find the optimal action quickly through the use of gradient based optimization techniques.
In the second chapter I examine the use of reinforcement learning to solve a high dimensional operations research problem. This methodology expands upon the value function approximation used in the first chapter through the use of an algorithm that uses both a value function approximation and a policy function approximation. I focus on the traveling salesman problem (TSP) and train recurrent neural networks (RNN) that, given a set of city coordinates, output a probability distribution over the next city to visit in a route. Using route labels provided by Google’s OR-Tools TSP solver I train multiple network architectures using supervised learning. I compare the performance of these architectures to the performance when using the route length as a reward signal to train the networks using reinforcement learning. I show that supervised learning is a useful tool for the optimization of hyperparameters that will be used in reinforcement learning, and to evaluate the performance improvement of an architecture change. I also provide evidence that while reinforcement learning is a more general optimization framework than using handcrafted heuristics, in practice it is necessary to build a neural network architecture specific to the problem of interest and that a network architectures ability to be trained using supervised learning does not guarantee the ability to be trained using reinforcement learning.