How mice decide: Stimulation of striatal D1 and D2 neurons biases choice in opposite ways

“It is our choices, Harry, that show us what we truly are, far more than our abilities.”
–Albus Dumbledore

“We are our choices.”
–Jean-Paul Sartre 

The question of how animals make decisions based upon prior experiences has plagued neuroscience since the field’s inception. Ideally, an animal chooses so as to maximize its benefit, selecting the action with the greatest “action value”. So how does the animal do this?

Reams of journal articles and volumes of books have been dedicated to this problem; for instance, Jonah Lehrer’s mediocre 2009 #1 NY Times best-seller How We Decide posited several woefully wanting answers to just that question. More scientific, nuanced approaches have recently implicated the striatum of the basal ganglia, an area associated with planning, habit formation, and some executive functions (e.g. working memory), in reward-based choice informed by previous experience. One key study associated neuronal signals in the striatum with judging expected reward outcomes (Lauwereyns et al. 2002), and another demonstrated that striatal activity reflects learning based on previous experience in a reward-based task (Barnes et al. 2005). In this vein, UCSF scientist Linda Wilbrecht has been looking at neuronal structure and function associated with learning. Her most recent paper examines the two most common types of striatal neurons, those expressing dopamine D1 or D2 receptors, and their roles in reward-based decision making (Tai et al. 2012).

(a) Sequence of events in a probabilistic switching task. Mice learned to initiate a trial at the center port and to choose a left or right peripheral port for water reinforcement. Only one peripheral port was rewarded at a time. In 25% of trials, neither port was rewarded. The rewarded port was switched only after a rewarded trial. (b) Fraction of choices for left port (n = 28 subjects) for trials before and after a switch of the rewarded port (at trial 0). (c) Fraction of choices for the left port from one subject for reward histories in which two consecutive choices to either the left or the right port were made during the previous two trials. Data from mixed choice histories are not shown for brevity. All error bars represent s.e.m. (From Tai et al. 2012).
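The task in the caption above can be made concrete with a small simulation. The following is a minimal Python sketch, not the authors' code: the function names, the per-reward switch probability `p_switch`, and the win-stay/lose-shift strategy are all illustrative assumptions, since the caption does not specify the switching schedule or how mice actually choose. Only the 25% unrewarded-trial rate and the rule that the rewarded port switches only after a rewarded trial come from the caption.

```python
import random


def simulate_task(n_trials, choose, p_no_reward=0.25, p_switch=0.1, seed=0):
    """Simulate a probabilistic switching task.

    p_no_reward comes from the caption (25% of trials pay nothing).
    p_switch is a hypothetical parameter: the chance that the rewarded
    port flips after a rewarded trial.
    """
    rng = random.Random(seed)
    rewarded_port = "L"
    history = []  # one (choice, rewarded, rewarded_port) tuple per trial
    for _ in range(n_trials):
        choice = choose(history, rng)
        armed = rng.random() >= p_no_reward  # False on unrewarded trials
        rewarded = armed and choice == rewarded_port
        history.append((choice, rewarded, rewarded_port))
        # The rewarded port may switch only after a rewarded trial.
        if rewarded and rng.random() < p_switch:
            rewarded_port = "R" if rewarded_port == "L" else "L"
    return history


def win_stay_lose_shift(history, rng):
    """Hypothetical strategy: repeat a rewarded choice, otherwise switch."""
    if not history:
        return rng.choice("LR")
    last_choice, last_rewarded, _ = history[-1]
    if last_rewarded:
        return last_choice
    return "R" if last_choice == "L" else "L"


if __name__ == "__main__":
    trials = simulate_task(1000, win_stay_lose_shift)
    reward_rate = sum(rewarded for _, rewarded, _ in trials) / len(trials)
    print(f"reward rate: {reward_rate:.2f}")
```

Swapping in a different `choose` function (for example, one that weighs the last two outcomes) is a simple way to explore how reward history could shape the fraction of left choices.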

In a fairly straightforward assay, mice were presented with three ports. When a mouse stuck its snout in the center port, lights would turn on above the side ports, and if it then stuck its snout in the left or right port (depending on the trial), it would receive water (not quite as good as juice, but I guess it works) 75% of the time. The mice learned this task quickly, with their tendency to go to the left port largely determined by whether they had received a water reward on the past two trials. The authors introduced a Cre-dependent light-activated ion channel (ChR2) into lines of mice expressing Cre recombinase in either striatal D1 or D2 neurons (for a well-written primer on the use of optogenetics with Cre, see Luo et al. 2008). By then shining a laser into the striatum, the authors were able to selectively activate striatal D1 or D2 neurons. These mice were then run through the aforementioned reward paradigm, with optical stimulation pulsed for half a second upon a center nose poke (the beginning of the trial). They found that, compared to ChR2-lacking controls, activation of D1 neurons led to a stronger tendency to go to the right port, while D2 neuronal activation had the opposite effect: the mice were more apt to go to the left port. Importantly, in both cases, the magnitude of the effect depended on whether the mice had received water from a particular port in the previous trials, which points to an interaction with learned reward history. These data are nicely illustrated in Figure 3, particularly in d and e, which show opposite trends, demonstrating a bias left (3e, D2 activation) or right (3d, D1) when plotted against control responses.

(a) Timing of optical stimulation in the task. In 6% of trials, optical stimulation was delivered to the dorsal striatum during a 500-ms period starting at the same time as the Go light cues. Stimulation occurred at 5, 10 or 20 Hz, delivering 3, 5 or 10 pulses, respectively, of 5-ms light stimulation. (b,c) Examples showing the effect of 10-Hz stimulation in the left dorsomedial striatum (DMS) of a D1-Cre mouse (b) and a D2-Cre mouse (c) expressing ChR2-eYFP. Individual bars represent the fraction of left choices for various reward histories in trials in which the mouse previously made two consecutive responses at the same port. Red bars indicate stimulation trials and blue bars represent trials without stimulation. (d,e) Fraction of left choices with and without stimulation for all possible combinations of choices and outcomes in the previous two trials with more than five total occurrences. (d) Data from the D1-Cre mouse shown in b. The frequency of trials with a given reward history are indicated by the relative size of the circle. Filled circles represent a significant change in fraction of left choice with stimulation (P < 0.05, Fisher’s exact test). The red curve relates the probabilities of choice with and without stimulation for a fixed odds ratio (odds ratio = e^(−1.33 ± 0.20)). (e) Data from the D2-Cre mouse shown in c (odds ratio = e^(1.45 ± 0.18)). All error bars represent s.e.m. (From Tai et al. 2012)
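To see what a "fixed odds ratio" curve means in practice, here is a minimal Python sketch that applies a constant shift in log odds to a baseline probability of choosing left. The function name `apply_log_odds_shift` is a hypothetical helper, and this is the standard odds-ratio arithmetic rather than the authors' fitting code; the two exponents are the central values quoted in the Figure 3 caption, with their uncertainties ignored.

```python
import math


def apply_log_odds_shift(p_no_stim, log_odds_ratio):
    """Return the choice probability after multiplying the odds by
    exp(log_odds_ratio). Hypothetical helper for illustration."""
    if not 0.0 < p_no_stim < 1.0:
        raise ValueError("p_no_stim must be strictly between 0 and 1")
    odds = p_no_stim / (1.0 - p_no_stim)
    shifted_odds = odds * math.exp(log_odds_ratio)
    return shifted_odds / (1.0 + shifted_odds)


# Central exponents from the caption (example D1-Cre and D2-Cre mice).
D1_LOG_ODDS_RATIO = -1.33  # fewer left choices with stimulation
D2_LOG_ODDS_RATIO = 1.45   # more left choices with stimulation

if __name__ == "__main__":
    for p_left in (0.2, 0.5, 0.8):
        d1 = apply_log_odds_shift(p_left, D1_LOG_ODDS_RATIO)
        d2 = apply_log_odds_shift(p_left, D2_LOG_ODDS_RATIO)
        print(f"baseline {p_left:.1f}: D1 stim {d1:.2f}, D2 stim {d2:.2f}")
```

For a mouse with no baseline preference (0.5), these exponents correspond to choosing left roughly 21% of the time under D1 stimulation and roughly 81% under D2 stimulation. Because the shift acts on the odds rather than the probability, baselines already near 0 or 1 move much less, which is one way to picture an effect that depends on reward history.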

In short, these data suggest that stimulating a subset of striatal neurons, those expressing D1 or D2 dopamine receptors, biases decision-making. The effect of stimulation is not deterministic but depends on previous reward experience, which strongly hints that D1 and D2 neurons are involved in learning and choice, specifically in biasing a choice one way or the other. With this article, the Wilbrecht lab inches closer to revealing how we actually decide at the neuronal level.

Please join us for the next installment of our 2012-2013 Neuroscience Seminar Series at 4 pm on Tuesday, October 30th in the CNBC Large Conference Room to learn more about mouse decision making from Linda Wilbrecht!

Matt Boisvert is a first year in the Neurosciences PhD program. He is completing his first rotation with Dr. Sascha du Lac at the Salk Institute. 

Tai L.H., Lee A.M., Benavidez N., Bonci A. & Wilbrecht L. (2012). Transient stimulation of distinct subpopulations of striatal neurons mimics changes in action value, Nature Neuroscience, 15 (9) 1281-1289. DOI: