Institut de Neurosciences Cognitives et Intégratives d'Aquitaine (UMR5287)

Aquitaine Institute for Cognitive and Integrative Neuroscience





A new mechanism for inter-individual learning differences

by Wolff - published on

Manipulating the revision of reward value during the intertrial interval increases sign tracking and dopamine release. Lee, B., Gentry, R., Bissonette, G.B., Herman, R.J., Mallon, J.J., Bryden, D.W., Calu, D.J., Schoenbaum, G., Coutureau, E., Marchand, A., Khamassi, M. and Roesch, M.R. (2018). PLoS Biology. https://doi.org/10.1371/journal.pbio.2004015

In an article just published in PLOS Biology, Etienne Coutureau and Alain Marchand from the DECAD team worked with a neurobiology team from Baltimore (M. Roesch, USA) and a neurocomputational team from Paris (M. Khamassi) to uncover the mechanisms of inter-individual learning differences. Individuals differ in the way they attribute motivational salience to environmental stimuli. Some are attracted by a signal, others by the reward associated with this signal. These behaviors correspond to different response profiles in midbrain dopaminergic neurons. The French-American consortium has just confirmed the predictions of a neurocomputational model that postulates a balance between model-free and model-based learning processes. This study was supported by a joint ANR-NSF grant for Collaborative Research in Computational Neuroscience (CRCNS).

Learning the value of objects and events in our environment is an essential function for survival.
Stimuli associated with a reward acquire motivational salience in some individuals, termed sign-trackers. This specific behavior has been associated in humans with increased vulnerability to drugs of abuse and relapse. Understanding the basis of inter-individual differences may therefore have a major impact on public health policy.
Sign-trackers differ from goal-trackers, who instead focus on the reward itself. Dopaminergic neurons in sign-trackers have typical reward prediction error (RPE) properties. In particular, they initially respond to unexpected rewards, but this response gradually vanishes as the reward becomes fully predicted. In goal-trackers, however, the same responses persist across training, and learning is not dependent on dopamine.
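
The RPE property described for sign-trackers (a response to unexpected reward that fades as the reward becomes predicted) can be illustrated with a minimal Python sketch. It assumes a simple delta rule, which is a textbook simplification and not the model used in the study. The function name `simulate_rpe` and the parameters `alpha` (learning rate) and `reward` are hypothetical and chosen for illustration only.

```python
def simulate_rpe(n_trials=50, alpha=0.2, reward=1.0):
    """Return the prediction error at reward delivery on each trial.

    Assumption: a plain delta rule, value <- value + alpha * (reward - value).
    """
    value = 0.0          # learned prediction of the reward
    errors = []
    for _ in range(n_trials):
        delta = reward - value   # reward prediction error (RPE)
        errors.append(delta)
        value += alpha * delta   # the prediction moves toward the reward
    return errors


errors = simulate_rpe()
print(f"RPE on first trial: {errors[0]:.3f}")
print(f"RPE on last trial:  {errors[-1]:.6f}")
```

In this sketch the error is largest on the first trial, when the reward is entirely unexpected, and shrinks on every later trial as the prediction improves.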
A computational model may account for these differences based on the notion that each individual behaves according to a balance between two types of processes: a model-free learning mechanism relying on RPE and dopamine, and a more flexible model-based mechanism that explicitly anticipates actions and events. Depending on the relative influence of each system, different subjects could express sign tracking or goal tracking (Figure 1).


Figure 1: The STGT computational model assumes that in sign-trackers, a dopamine-dependent model-free learning process has more influence, so that the dopaminergic signal transfers from the reward to the predictive cue (the lever). By contrast, goal-trackers rely more on the dopamine-independent model-based learning process, so that the dopaminergic response is not transferred.
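
The idea of a balance between the two systems can be sketched as a weighted mix of their value estimates, passed through a softmax choice rule. This is only an illustration under stated assumptions: it does not reproduce the published STGT equations, and the names `omega` (weight of the model-free system), `beta` (softmax inverse temperature) and all numerical values are hypothetical.

```python
import math


def action_probabilities(mf_values, mb_values, omega, beta=5.0):
    """Softmax over a weighted mix of model-free and model-based values.

    omega close to 1: model-free values dominate.
    omega close to 0: model-based values dominate.
    """
    mixed = {
        action: omega * mf_values[action] + (1.0 - omega) * mb_values[action]
        for action in mf_values
    }
    norm = sum(math.exp(beta * v) for v in mixed.values())
    return {action: math.exp(beta * v) / norm for action, v in mixed.items()}


# Illustrative values only (assumptions, not data from the study):
# the model-free system favors the lever, the model-based system the magazine.
mf = {"lever": 1.0, "magazine": 0.4}
mb = {"lever": 0.2, "magazine": 1.0}

for omega in (0.9, 0.1):
    probs = action_probabilities(mf, mb, omega)
    print(f"omega={omega}: P(lever)={probs['lever']:.2f}, "
          f"P(magazine)={probs['magazine']:.2f}")
```

With these assumed values, a high `omega` yields a preference for the lever (a sign-tracking profile) and a low `omega` a preference for the magazine (a goal-tracking profile).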

Furthermore, the model predicts that manipulating various parameters of the task should alter the type of response expressed by subjects. Thus, increasing the duration of the inter-trial interval (ITI) should be sufficient to increase the proportion of sign-trackers in the population and to restore RPE-like dopaminergic signals when compared to a short ITI (Figure 2).


Figure 2: The STGT model predicts that under a long ITI, the value attributed to the food magazine is revised down, allowing increased learning by the dopamine-dependent model-free mechanism and more sign-tracking. Under a short ITI, the model predicts the opposite result (more goal-tracking).
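
The downward revision of the magazine value during the ITI can be sketched as follows. This is a simplified illustration, not the published model: it assumes that the lever and the magazine each carry a learned value, that both are updated toward the reward on every trial, and that only the magazine, which remains present during the ITI, is updated toward zero on each unrewarded visit. The function name, `iti_visits`, `alpha` and `alpha_iti` are hypothetical.

```python
def feature_values_after_training(iti_visits, n_trials=100,
                                  alpha=0.2, alpha_iti=0.05):
    """Return (lever_value, magazine_value) after training.

    Assumption: iti_visits stands in for ITI duration, i.e. the number of
    unrewarded magazine visits between two trials.
    """
    lever = 0.0
    magazine = 0.0
    for _ in range(n_trials):
        # Rewarded trial: both features move toward the reward (1.0).
        lever += alpha * (1.0 - lever)
        magazine += alpha * (1.0 - magazine)
        # ITI: the lever is absent, so only the magazine is revised down.
        for _ in range(iti_visits):
            magazine += alpha_iti * (0.0 - magazine)
    return lever, magazine


short_iti = feature_values_after_training(iti_visits=2)
long_iti = feature_values_after_training(iti_visits=10)
print("short ITI (lever, magazine):", short_iti)
print("long ITI  (lever, magazine):", long_iti)
```

Under these assumptions the lever value is the same in both conditions, while the magazine value ends up lower after a long ITI, which widens the gap in favor of the lever.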

To test this hypothesis, rats were trained in a Pavlovian task where an 8 s stimulus (an inactive lever) was unconditionally followed by food delivery. As expected, they behaved either as sign-trackers focusing on the lever, or as goal-trackers focusing on the food magazine. Using fast-scan cyclic voltammetry, the researchers from the University of Maryland showed that rapid variations in dopamine release conformed to the predictions of the model for different ITIs.
These results support the idea that all individuals are potentially able to express sign-tracking or goal-tracking behaviors according to environmental constraints. They raise new questions about the neural basis of model-free and model-based learning, their relationship with RPE and dopamine, and their link with disorders such as drug addiction or obsessive-compulsive disorder.