Manipulating the revision of reward value during the intertrial interval increases sign tracking and dopamine release. Lee, B., Gentry, R., Bissonette, G.B., Herman, R.J., Mallon, J.J., Bryden, D.W., Calu, D.J., Schoenbaum, G., Coutureau, E., Marchand, A., Khamassi, M. and Roesch, M.R. (2018). PLOS Biology. https://doi.org/10.1371/journal.pbio.2004015
In an article just published in PLOS Biology, Etienne Coutureau and Alain Marchand from team DECAD worked with a neurobiology team from Baltimore (M. Roesch, USA) and a neurocomputational team from Paris (M. Khamassi) to uncover the mechanisms behind inter-individual differences in learning. Individuals differ in the way they attribute motivational salience to environmental stimuli. Some are attracted by a signal, others by the reward associated with this signal. These behaviors correspond to different response profiles in midbrain dopaminergic neurons. The French-American consortium has now confirmed the predictions of a neurocomputational model that postulates a balance between model-free and model-based learning processes. This study was supported by a joint ANR-NSF grant for Collaborative Research in Computational Neuroscience (CRCNS).
Learning the value of objects and events in our environment is an essential function for survival.
Stimuli associated with a reward acquire motivational salience in some individuals, termed sign-trackers. In humans, this behavior has been associated with increased vulnerability to drugs of abuse and to relapse. Understanding the basis of inter-individual differences may therefore have a major impact on public health policy.
Sign-trackers differ from goal-trackers, who instead focus on the reward itself. Dopaminergic neurons in sign-trackers have typical reward prediction error (RPE) properties. In particular, they initially respond to unexpected rewards, but this response gradually vanishes as the reward becomes fully predicted. In goal-trackers, however, the same responses persist across training, and learning is not dependent on dopamine.
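The RPE property described above can be illustrated with a minimal sketch. It assumes a simple delta-rule learner, which is a textbook simplification and not the model used in the study; the function name `simulate_rpe`, the learning rate `alpha`, and the reward size are all illustrative choices.

```python
def simulate_rpe(n_trials=50, alpha=0.2, reward=1.0):
    """Return the prediction error at reward delivery on each trial.

    Assumption: a plain delta rule, not the study's actual model.
    """
    value = 0.0  # learned prediction of the reward that follows the cue
    errors = []
    for _ in range(n_trials):
        delta = reward - value   # RPE: obtained reward minus predicted reward
        errors.append(delta)
        value += alpha * delta   # move the prediction toward the outcome
    return errors


errors = simulate_rpe()
print(f"RPE on first trial: {errors[0]:.3f}")
print(f"RPE on last trial:  {errors[-1]:.5f}")
```

On the first trial the reward is entirely unexpected, so the error equals the reward. As the prediction converges, the error at reward time shrinks toward zero, which mirrors the vanishing response described for sign-trackers.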
A computational model may account for these differences based on the notion that each individual behaves according to a balance between two types of processes: a model-free learning mechanism relying on RPE and dopamine, and a more flexible model-based mechanism that explicitly anticipates actions and events. Depending on the relative influence of each system, different subjects could express sign tracking or goal tracking (Figure 1).
Figure 1: The STGT computational model assumes that in sign-trackers, a dopamine-dependent model-free learning process has more influence, so that the dopaminergic signal transfers from the reward to the predictive cue (the lever). By contrast, goal-trackers rely more on the dopamine-independent model-based learning process so that the dopaminergic response is not transferred.
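One way to picture the balance between the two systems is a single weight that mixes their valuations before an action is chosen. The sketch below is an assumption made for illustration: the weight `omega`, the softmax choice rule, the inverse temperature `beta`, and every numeric value are hypothetical and are not taken from the article.

```python
import math


def action_probabilities(omega, mf_values, mb_values, beta=5.0):
    """Mix model-free and model-based values, then apply a softmax.

    omega = 1.0 means purely model-free, omega = 0.0 purely model-based.
    All parameters here are illustrative assumptions.
    """
    combined = {
        action: beta * (omega * mf_values[action] + (1.0 - omega) * mb_values[action])
        for action in mf_values
    }
    top = max(combined.values())  # subtract the max for numerical stability
    exps = {action: math.exp(v - top) for action, v in combined.items()}
    total = sum(exps.values())
    return {action: e / total for action, e in exps.items()}


# Hypothetical valuations: the model-free system favors the lever,
# the model-based system favors the food magazine.
mf = {"lever": 0.8, "magazine": 0.4}
mb = {"lever": 0.3, "magazine": 0.9}

sign_tracker = action_probabilities(omega=0.9, mf_values=mf, mb_values=mb)
goal_tracker = action_probabilities(omega=0.1, mf_values=mf, mb_values=mb)
print("omega=0.9:", sign_tracker)
print("omega=0.1:", goal_tracker)
```

With a high `omega` the simulated subject mostly approaches the lever, and with a low `omega` it mostly approaches the magazine, which is the qualitative contrast between sign-tracking and goal-tracking.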
Furthermore, the model predicts that manipulating various parameters of the task should alter the type of response expressed by subjects. Thus, increasing the duration of the inter-trial interval (ITI) should be sufficient to increase the proportion of sign-trackers in the population and to restore RPE-like dopaminergic signals when compared to a short ITI (Figure 2).
Figure 2: The STGT model predicts that under a long ITI, the value attributed to the food magazine is revised downward, allowing increased learning by the dopamine-dependent model-free mechanism and more sign-tracking. For a short ITI, the model predicts the opposite result (more goal-tracking).
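The downward revision of the magazine's value can be sketched as follows. The mechanism assumed here is that each time step of the ITI acts as an unrewarded encounter with the magazine that generates a negative prediction error; this, along with the function name, step counts, and learning rate, is a hypothetical simplification rather than a description of the published model.

```python
def magazine_value_after_iti(start_value, iti_steps, alpha=0.1):
    """Revise the magazine's value during an ITI with no food.

    Assumption: each ITI step is an unrewarded encounter (outcome 0.0),
    so the value decays toward zero by a delta rule.
    """
    value = start_value
    for _ in range(iti_steps):
        value += alpha * (0.0 - value)  # negative error lowers the value
    return value


short_iti = magazine_value_after_iti(start_value=1.0, iti_steps=2)
long_iti = magazine_value_after_iti(start_value=1.0, iti_steps=10)
print(f"magazine value after short ITI: {short_iti:.3f}")
print(f"magazine value after long ITI:  {long_iti:.3f}")
```

Under these assumptions the magazine retains less value after the long ITI than after the short one, which leaves more room for the lever to attract behavior.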
To test this hypothesis, rats were trained in a Pavlovian task in which an 8 s stimulus (an inactive lever) was unconditionally followed by food delivery. As expected, they behaved either as sign-trackers focusing on the lever, or as goal-trackers focusing on the food magazine. Using fast-scan cyclic voltammetry, the researchers from the University of Maryland showed that rapid variations in dopamine release conformed to the predictions of the model for different ITIs.
These results support the idea that all individuals are potentially able to express sign-tracking or goal-tracking behaviors depending on environmental constraints. They raise new questions about the neural basis of model-free and model-based learning, their relationship with RPE and dopamine, and their link with disorders such as drug addiction or obsessive-compulsive disorder.