Enhancing reinforcement learning models by including direct and indirect pathways improves performance on striatal dependent tasks

Kim T. Blackwell; Kenji Doya

doi:10.1371/journal.pcbi.1011385

Journal article

Open access

Peer reviewed

PLoS computational biology, Vol.19(8), 1011385

08/18/2023

PMCID: PMC10479916

PMID: 37594982

Files and links

https://doi.org/10.1371/journal.pcbi.1011385

Published (Version of record) Open Access

Abstract

A major advance in understanding learning behavior stems from experiments showing that reward learning requires dopamine inputs to striatal neurons and arises from synaptic plasticity of cortico-striatal synapses. Numerous reinforcement learning models mimic this dopamine-dependent synaptic plasticity by using the reward prediction error, which resembles dopamine neuron firing, to learn the best action in response to a set of cues. Though these models can explain many facets of behavior, reproducing some types of goal-directed behavior, such as renewal and reversal, requires additional model components. Here we present a reinforcement learning model, TD2Q, which better corresponds to the basal ganglia with two Q matrices, one representing direct pathway neurons (G) and another representing indirect pathway neurons (N). Unlike previous two-Q architectures, a novel and critical aspect of TD2Q is that the G and N matrices are updated using the temporal difference reward prediction error. A best action is selected for N and G using a softmax with a reward-dependent adaptive exploration parameter, and then differences are resolved using a second selection step applied to the two action probabilities. The model is tested on a range of multi-step tasks including extinction, renewal, discrimination; switching reward probability learning; and sequence learning. Simulations show that TD2Q produces behaviors similar to rodents in choice and sequence learning tasks, and that use of the temporal difference reward prediction error is required to learn multi-step tasks. Blocking the update rule on the N matrix blocks discrimination learning, as observed experimentally. Performance in the sequence learning task is dramatically improved with two matrices. These results suggest that including additional aspects of basal ganglia physiology can improve the performance of reinforcement learning models, better reproduce animal behaviors, and provide insight into the role of direct- and indirect-pathway striatal neurons.

Author summary

Humans and animals are exceedingly adept at learning to perform complicated tasks when the only feedback is reward for correct actions. Early phases of learning are characterized by exploration of possible actions, and later phases of learning are characterized by optimizing the action sequence. Experimental evidence suggests that reward is encoded by the dopamine signal, and that dopamine can also influence the degree of exploration. Reinforcement learning algorithms are machine learning algorithms that use the reward signal to determine the value of taking an action. These algorithms have some similarity to information processing by the basal ganglia, and can explain several types of learning behavior. We extend one of these algorithms, Q learning, to increase the similarity to basal ganglia circuitry, and evaluate performance on several learning tasks. We show that by incorporating two opposing basal ganglia pathways, we can improve performance on operant conditioning tasks and a difficult sequence learning task. These results suggest that incorporating additional aspects of brain circuitry could further improve performance of reinforcement learning algorithms.
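
The abstract names three ingredients: two Q matrices (G and N) updated with a temporal difference reward prediction error, a softmax with a reward-dependent exploration parameter, and a second selection step that resolves disagreement between the two matrices. The Python sketch below is only an illustration of how those pieces might fit together, not the authors' implementation. The abstract gives no equations, so the following are all assumptions made for this example: the opposite-sign update of N, the form of the prediction error, the `adaptive_beta` rule, the learning rates, and the toy one-choice task.

```python
import numpy as np

def softmax(values, beta):
    """Softmax over action values with inverse temperature beta."""
    z = beta * (values - np.max(values))  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

def td_error(G, state, action, reward, next_state, gamma=0.9):
    """Temporal difference reward prediction error, computed from G (assumed form)."""
    return reward + gamma * np.max(G[next_state]) - G[state, action]

def update(G, N, state, action, delta, alpha_g=0.2, alpha_n=0.1):
    """Apply the same TD error to both matrices with opposite sign (assumption)."""
    G[state, action] += alpha_g * delta
    N[state, action] -= alpha_n * delta

def adaptive_beta(mean_reward, beta_min=0.5, beta_max=3.0):
    """Hypothetical rule: more recent reward gives higher beta, so less exploration."""
    return beta_min + (beta_max - beta_min) * float(np.clip(mean_reward, 0.0, 1.0))

def select_action(G, N, state, beta, rng):
    """Pick one candidate per matrix, then resolve any disagreement."""
    p_g = softmax(G[state], beta)
    p_n = softmax(-N[state], beta)  # low N means "not suppressed" (assumption)
    a_g = rng.choice(len(p_g), p=p_g)
    a_n = rng.choice(len(p_n), p=p_n)
    if a_g == a_n:
        return int(a_g)
    # Second selection step, applied to the two candidates' probabilities.
    p2 = softmax(np.array([p_g[a_g], p_n[a_n]]), beta)
    return int(a_g if rng.random() < p2[0] else a_n)

def run_demo(n_trials=200, seed=0):
    """Toy task: state 0 offers two actions, only action 1 is rewarded.
    State 1 is a terminal state whose values are never updated and stay 0."""
    rng = np.random.default_rng(seed)
    G = np.zeros((2, 2))
    N = np.zeros((2, 2))
    mean_reward = 0.0
    for _ in range(n_trials):
        beta = adaptive_beta(mean_reward)
        action = select_action(G, N, 0, beta, rng)
        reward = 1.0 if action == 1 else 0.0
        delta = td_error(G, 0, action, reward, next_state=1)
        update(G, N, 0, action, delta)
        mean_reward += 0.1 * (reward - mean_reward)  # running average of reward
    return G, N

G, N = run_demo()
```

In this toy setting the rewarded action ends up with the larger G value and the smaller N value, so both matrices come to favor it. The multi-step tasks listed in the abstract would need more states and a non-terminal `next_state`, which is where the bootstrapped term in `td_error` starts to matter.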

Biochemical Research Methods

Biochemistry & Molecular Biology

Life Sciences & Biomedicine

Mathematical & Computational Biology

Science & Technology

Details

Title: Enhancing reinforcement learning models by including direct and indirect pathways improves performance on striatal dependent tasks
Creators: Kim T. Blackwell - George Mason University
Kenji Doya - Okinawa Institute of Science and Technology Graduate University
Resource Type: Journal article
Publication Details: PLoS computational biology, Vol.19(8), 1011385
DOI: 10.1371/journal.pcbi.1011385
PMID: 37594982
PMCID: PMC10479916
NLM abbreviation: PLoS Comput Biol
ISSN: 1553-734X
eISSN: 1553-7358
Publisher: Public Library of Science
Number of pages: 31
Grant note: R01AA016022 / NIH; United States Department of Health & Human Services; National Institutes of Health (NIH) - USA
Okinawa Institute of Science and Technology Graduate University; Okinawa Institute of Science & Technology Graduate University
23120007; 16K21738; 16H06561; 16H06563 / JSPS KAKENHI; Ministry of Education, Culture, Sports, Science and Technology, Japan (MEXT); Japan Society for the Promotion of Science; Grants-in-Aid for Scientific Research (KAKENHI)
Language: English
Date published: 08/18/2023
Academic Unit: Roy J. Carver Department of Biomedical Engineering; Iowa Neuroscience Institute
Record Identifier: 9984460454402771
