
Decision Making, Models

Definition
Models of decision making attempt to describe, using stochastic differential equations which represent either neural activity or more abstract psychological variables, the dynamical process that produces a commitment to a single action/outcome as a result of incoming evidence that can be ambiguous as to the action it supports.

Background
Decision making can be separated into four processes (Doya, 2008):

1) Acquisition of sensory information to determine the state of the environment and the organism within it.

2) Evaluation of potential actions (options) in terms of the cost and benefit to the organism given its belief about the current state.

3) Selection of an action based on, ideally, an optimal tradeoff between the costs and benefits.

4) Use of the outcome of the action to update the costs and benefits associated with it.

Models of the dynamics of decision making have focused on perceptual decisions with only two possible responses available. The term two-alternative forced choice (TAFC) applies to such tasks when two stimuli are provided, but the term is now generally used for any binary choice discrimination task.

In a perceptual decision, the response, or action, is directly determined by the current percept. Thus the decision in these tasks is essentially one of perceptual categorization, namely process (1) above, though the same models can be used for action selection given ambiguous information about the current state (process 3).

Evaluation of the possible responses in terms of their value or the resulting state's utility (process 2) (Sugrue et al., 2005), given both uncertainty in the current state and uncertainty in the outcomes of an action given the state, is the subject of expected utility theory and prospect theory.

The necessary learning and updating of the values of different actions given the actual outcomes they produce (process 4) is the subject of instrumental conditioning and reinforcement learning, for example via temporal-difference learning (Seymour et al., 2004) and actor-critic models (Joel et al., 2002).

This article is primarily concerned with the dynamics of the production of either a single percept given unreliable sensory evidence (process 1) or a single action given uncertainty in the outcomes (process 3).

General features of discrimination tasks or TAFC tasks
In a TAFC task, a single decision variable can be defined, representing the likelihood ratio: the probability that the evidence to date favors one alternative over the other. While TAFC tasks (Figure 1) have provided the dominant paradigm for analysis of choice behavior, the restriction to only two choices is lifted in many of the more recent models of decision making based on multiple variables, allowing for the fitting of a wider range of data sets.


The tasks can either be based on a free response paradigm, in which a subject responds after as much or as little time as she wants, or an interrogation (forced response) paradigm, in which the stimulus duration is limited and the subject must make a response within a given time interval. The free response paradigm is perhaps more powerful, since each trial produces two types of information: accuracy (correct or incorrect) and response time. However, by variation of the time allowed when responses are forced, both paradigms are valuable for constraining models, since they can provide a distribution of response times for both correct and incorrect trials, as well as the proportion of trials that are correct or incorrect with a given stimulus. These behavioral data can be modified by task difficulty, task instructions (such as "respond rapidly" versus "respond accurately"), reward schedules, and inter-trial intervals.

Most models of the dynamics of decision making focus on tasks where the time from stimulus onset to response is no more than one to two seconds, a timescale over which neural spiking can be maintained. Choices requiring much more time than this are likely to depend upon multiple memory stores, neural circuits and strategies, which become difficult to identify, extract and model in a dynamical systems framework (a state-based framework is more appropriate).

 

Figure 1. Scheme of the two-alternative forced choice (TAFC) task. Two streams of sensory input, each containing stimulus information, or a signal ($S_1$ and $S_2$) combined with noise ($\sigma_1\eta_1(t)$ and $\sigma_2\eta_2(t)$), are compared in a decision-making circuit. The circuit must produce one of two responses (A or B) indicating which of the two signals is the stronger. The optimal method for achieving this discrimination is the sequential probability ratio test (SPRT), which requires the decision-making circuit to integrate inputs over time.

In the standard setup of the models, two parallel streams of noisy sensory input are available, with each stream supplying evidence in support of one of the two allowed actions (see Figure 1). The sensory inputs can be of either discrete or continuous quantities and can arrive discretely or continuously in time. The majority of models focus on continuous update in continuous time, so they can be formulated as stochastic differential equations (Gillespie, 1992, Lawler, 2006). The momentary sensory evidence produces a decision variable, which indicates the likelihood of choosing one of the two alternatives given current evidence and all prior evidence. The primary difference between models is in how sensory evidence determines the decision variable. While most models incorporate a form of temporal integration of evidence (Cain and Shea-Brown, 2012) and include a negative interaction between the two sources of evidence, differences arise in the stability of initial states, which determines whether integration is perfect, and in the nature of the interaction: feedforward between the inputs, feedforward between outputs, or feedback from outputs to decision variables (Bogacz et al., 2006). Models can also differ in their choice of decision threshold (the value of the decision variable at which a response is produced) in the free response paradigm (Simen et al., 2009, Deneve, 2012, Drugowitsch et al., 2012), and in particular in whether this parameter or other model parameters, such as input gain, which also affect the response time distribution, are static or dynamic across a trial (Shea-Brown et al., 2008, Thura et al., 2012).

As the time available for acquisition of sensory information increases, so the accuracy of responses increases in a perceptual discrimination task. Accuracy is measured as the probability of choosing the response leading to more reward, which is equivalent to obtaining a veridical percept in these tasks. All of the models discussed below can produce such a speed-accuracy tradeoff by parameter adjustment. If parameters are adjusted so as to increase the mean response time, then accuracy increases. Such a tradeoff is observed in behavioral tasks when either instructions or the schedule of reward and punishment encourages participants to respond as quickly as possible while being less concerned about making errors, or to respond as accurately as possible while being less concerned about the time it takes to decide. The simplest way to effect such a tradeoff is to adjust the inter-trial interval, which, if long compared to the decision time, means that the accuracy of responses impacts reward rate much more than the time for the decision itself. Models can replicate such behavior when optimal performance is based on the maximal reward rate. Typical parameter adjustments to increase accuracy while slowing responses would be a multiplicative scaling down of inputs (and the concurrent input noise) or a scaling up of the range across which the decision variable can vary by raising a decision threshold (Figs 2-3) (Ratcliff, 2002, Simen et al., 2009, Balci et al., 2011). A similar effect can be achieved in alternative, attractor-based models through the level of a global applied current, which affects the stability of the initial "undecided" state (Figs 6-9) (Miller and Katz, 2013).

From a neuroscience perspective, the decision variable is typically interpreted as either the mean firing rate of a group of neurons or a linear combination of rates of many neurons (Beck et al., 2008) (the difference between two groups being the simplest such combination). There has been remarkable progress in matching the observed firing patterns of neurons (Newsome et al., 1989, Shadlen and Newsome, 2001, Huk and Shadlen, 2005) with the dynamics of a decision variable in more mathematical models of decision making (Glimcher, 2001, Gold and Shadlen, 2001, Glimcher, 2003, Smith and Ratcliff, 2004, Gold and Shadlen, 2007, Ratcliff et al., 2007).
This has led to the introduction of biophysically based models of neural circuits (Wang, 2008), which have accounted for much of the concordance between simple mathematical models, neural activity and behavior.

Optimal Decision Making
An optimal decision-making strategy either maximizes expected reward over a given time or minimizes risk. In TAFC perceptual tasks, a response is either correct or an error. In the interrogation paradigm, with fixed time per decision, the optimal strategy is the one leading to greatest accuracy, that is, the lowest expected error rate. In the free response paradigm, the optimal strategy either delivers the greatest accuracy for a given mean response time, or produces the fastest mean response time for a given accuracy. In these tasks, the sequential probability ratio test (SPRT), introduced by Wald and Wolfowitz (Wald, 1947, Wald and Wolfowitz, 1948), and in its continuous form, the drift diffusion model (DDM) (Ratcliff and Smith, 2004, Ratcliff and McKoon, 2008), leads to optimal choice behavior by any of these measures of optimality (see (Bogacz et al., 2006) for a thorough review).

Using the SPRT in the interrogation paradigm, one simply accumulates over time the log-likelihood ratio of the probabilities of each alternative given the stream of evidence, where the observed sensory input per unit time has a certain probability given alternative A and another probability given alternative B. Integrating the log-likelihood ratio over time, after setting the initial condition as the log-likelihood ratio of the prior probabilities, $\log[P(A)/P(B)]$, leads to a quantity $\log[P(A|S)/P(B|S)]$, which is greater than zero if A is more likely than B given the stimulus and less than zero otherwise. Thus the optimal procedure is to choose A or B depending on the sign of the summed, or in the continuous limit, integrated, log-likelihood ratio.

In the free response paradigm a stopping criterion must be included. This is achieved by setting two thresholds for the integrated log-likelihood ratio, a positive one ($+a$) for choice A and a negative one ($-b$) for choice B. The further the thresholds are from the origin, the lower the chance of error, but the longer the integration time before reaching a decision. Thus the thresholds reflect the fraction of errors that can be tolerated, with

$$a = \log\frac{1-\beta}{\alpha} \quad \text{and} \quad b = \log\frac{1-\alpha}{\beta},$$

where $\alpha$ is the probability of choosing A when B is correct and $\beta$ is the probability of choosing B when A is correct.
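As an illustration, the following sketch (a hypothetical function, assuming a Gaussian evidence model with illustrative parameters) implements the SPRT in discrete time: if each sample is drawn from $N(+S,\sigma^2)$ under alternative A and from $N(-S,\sigma^2)$ under alternative B, each sample adds $2Se/\sigma^2$ to the log-likelihood ratio.

function [choice, n] = sprt_trial(Strue, S, sigma, a, b, logprior)
% SPRT_TRIAL: sketch of the sequential probability ratio test for a TAFC
% task. Strue is the mean of the generating distribution (+S if A is
% correct, -S if B is correct); a and b are the two log-likelihood
% thresholds; logprior is the initial value log[P(A)/P(B)].
L = logprior;                   % accumulated log-likelihood ratio
n = 0;                          % number of evidence samples used
while ( L < a && L > -b )
    e = Strue + sigma*randn;    % one noisy evidence sample
    L = L + 2*S*e/sigma^2;      % log[p(e|A)/p(e|B)] for Gaussian evidence
    n = n + 1;
end
if ( L >= a )
    choice = 'A';
else
    choice = 'B';
end
end

For example, tolerating error rates $\alpha = \beta = 0.05$ would set $a = b = \log(0.95/0.05) \approx 2.94$.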

The Models

Accumulator Models
The first models of decision making in humans or animals were accumulator models, sometimes called counter models or race models. In these models, evidence accumulates separately for each possible outcome. This has the advantage that if many outcomes are possible, the models are simply extended by addition of one more variable for each additional alternative, with evidence for each alternative accumulating within its allotted variable. In the interrogation paradigm, one simply reads out the highest variable, so the choice depends on the sign of the difference of the two variables in the TAFC paradigm. Thus, if the difference in accumulated quantities matched the difference in integrated log probabilities of the two types of evidence, such readout from an accumulator model would be equivalent to the SPRT, and so would be optimal.

In the free response paradigm, accumulator models produce a choice when any one of the accumulated variables reaches a threshold, so these models can be called "race to threshold models" or simply "race models". The original accumulator models included neither interaction between accumulators nor the ability for variables to decrease. However, for decisions in nature or in laboratory protocols, evidence in favor of one alternative is typically evidence against the other alternative. This is particularly problematic in the free response paradigm, because the time at which one variable reaches threshold and produces the corresponding choice is independent of the evidence accumulated for other choices. Thus the behavior of simple accumulator models is not optimal. Comparisons of the response time distributions of these models with behavioral responses showed the models to be inaccurate in this regard: observed response-time distributions are skewed with a long tail, whereas the response times of accumulator models were much more symmetric about the mean. These discrepancies led to the ascendance of Ratcliff's drift diffusion model (Ratcliff, 1978).


The Drift Diffusion Model
The drift diffusion model (DDM) is an integrator with thresholds (Figure 2); more precisely, the decision variable $x$ follows a Wiener process with two absorbing boundaries (Figure 3). It includes a deterministic (drift) term, $S$, proportional to the rate of incoming evidence, and a diffusive noise term of variance $\sigma^2$, which produces variability in response times and can lead to errors:

$$\frac{dx}{dt} = S + \sigma\eta(t),$$

where $\eta(t)$ is a white noise term defined by $\langle\eta(t)\eta(t')\rangle = \delta(t-t')$.
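A minimal Euler-Maruyama simulation of this equation with absorbing boundaries, in the style of the Matlab code of Table 2 (all parameter values here are illustrative):

% Sketch: Euler-Maruyama simulation of the DDM with absorbing boundaries.
S = 0.5; sigma = 1;            % drift rate and noise level (illustrative)
a = 1; b = 1;                  % thresholds: +a for choice A, -b for choice B
dt = 0.001; tmax = 10;         % timestep and maximum trial duration (s)
Ntrials = 1000;
RT = nan(Ntrials,1); choseA = false(Ntrials,1);
for trial = 1:Ntrials
    x = 0;                     % unbiased starting point
    for i = 1:round(tmax/dt)
        x = x + S*dt + sigma*sqrt(dt)*randn;  % Wiener increment ~ N(0,dt)
        if ( x >= a || x <= -b )              % a boundary absorbs the trial
            RT(trial) = i*dt;
            choseA(trial) = ( x >= a );
            break;
        end
    end
end
done = ~isnan(RT);             % trials that reached a boundary
fprintf('P(choose A) = %.2f, mean RT = %.2f s\n', ...
    mean(choseA(done)), mean(RT(done)));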

Figure 2. The drift diffusion model (DDM). The DDM is a one-dimensional model, so the two competing inputs and their noise terms are first combined: in this case $S = S_1 - S_2$ and $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$.

If the model is scaled to a given level of noise, then its three independent parameters are the drift rate ($S$) and the positions of each of the two thresholds ($+a$, $-b$) with respect to the starting point. When the model was introduced, these parameters were assumed fixed for a given subject in a specific task. The threshold spacing determines where one operates in the speed-accuracy tradeoff, so it can be optimized as a function of the relative cost of making an incorrect response and the time between trials. Any starting point away from the midpoint represents bias or prior information. The drift rate is proportional to stimulus strength.
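As a sketch of such threshold optimization, the closed-form error rate and mean decision time of the pure, unbiased DDM with symmetric thresholds $\pm z$ (standard results; see Bogacz et al., 2006) can be scanned to find the threshold maximizing reward rate, assuming reward only for correct responses and an illustrative inter-trial interval:

% Sketch: optimizing the DDM threshold for reward rate, using the standard
% closed-form results ER = 1/(1+exp(2*S*z/sigma^2)) and
% DT = (z/S)*tanh(S*z/sigma^2) for symmetric thresholds +/-z.
S = 1; sigma = 1;              % drift and noise (illustrative)
T0 = 0.3; ITI = 2;             % non-decision time and inter-trial interval (s)
z = 0.01:0.01:3;               % candidate thresholds
ER = 1./(1 + exp(2*S*z/sigma^2));     % error rate falls as z rises
DT = (z/S).*tanh(S*z/sigma^2);        % mean decision time rises with z
RR = (1 - ER)./(DT + T0 + ITI);       % reward rate: correct trials per second
[RRmax, imax] = max(RR);
fprintf('Reward rate is maximized at z = %.2f (RR = %.3f/s)\n', z(imax), RRmax);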

 

 


Figure 3. The drift diffusion model is a Wiener process (one-dimensional Brownian motion) with absorbing boundaries. In the absence of boundaries the probability distribution is Gaussian, centered at a distance $St$ from its starting point, with standard deviation increasing as $\sigma\sqrt{t}$.

With fixed parameters, which could be fitted to any subject's responses, the DDM reproduces key features of the behavioral data: notably the skewed shape of response time distributions and the covariation of mean response times and response accuracy with task difficulty. Skewed response time distributions arise because the variance of a Wiener process increases linearly with time: responses much earlier than the mean response time, when the variance in the decision variable is low, are less likely than responses much later, when the variance in the decision variable is high. A more difficult perceptual choice is represented by a drift rate closer to zero, which increases response times and increases the probability of error. Such covariation of response times with accuracy matches behavioral data well, so long as an additional processing time is added to the model; the additional time represents a variable sensory transduction delay on the input side and a motor delay on the output side, both of which contribute to response times in addition to the processing within any decision circuit.

While the original DDM included trial-to-trial variability through the diffusive noise term, additional trial-to-trial variability in the parameters was needed to account for differences between the response time distributions of correct trials and those of incorrect trials. With fixed parameters and no initial bias, the DDM produces identical distributions (though with different magnitudes) for the timing of correct responses and errors. However, response times of human subjects are typically slower when they produce errors, unless they are instructed to respond as quickly as possible, in which case the reverse is true. These behaviors are accounted for in the DDM by including trial-to-trial variability in the drift rate and/or the starting point. Trial-to-trial variability in drift rate leads to slower errors, as errors become more likely on those trials when the drift rate is altered from its mean toward zero and mean responses are longer. Trial-to-trial variability in the starting point leads to faster errors, as errors become more likely on those trials in which the starting point is closer to the error boundary, in which case the response is faster. Error response times are reduced more by such variability than are correct response times, since the correct responses include more of the trials started at the midpoint, where mean times are longer.

Prior information or bias can be incorporated in the drift diffusion model either through a shift in the starting point of integration or through an additional bias current added to the inputs to be integrated.
A shift in starting point is the more optimal of the two and has led to two-stage models, where the first stage sets the starting point from the values of the two choices, before a second stage of integration toward threshold commences. Such a model is supported by electrophysiological data (Rorie et al., 2010).
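The slower-errors effect described above can be checked by adding across-trial drift variability to a simulation like the one given earlier (all values illustrative); errors then come out slower, on average, than correct responses:

% Sketch: trial-to-trial variability in drift rate produces slow errors.
Smean = 0.5; Ssd = 0.4;        % mean drift and its across-trial spread
sigma = 1; a = 1; b = 1; dt = 0.001; tmax = 10; Ntrials = 5000;
RT = nan(Ntrials,1); correct = false(Ntrials,1);
for trial = 1:Ntrials
    S = Smean + Ssd*randn;     % this trial's drift rate
    x = 0;
    for i = 1:round(tmax/dt)
        x = x + S*dt + sigma*sqrt(dt)*randn;
        if ( x >= a || x <= -b )
            RT(trial) = i*dt;
            correct(trial) = ( x >= a );  % upper boundary is correct (Smean > 0)
            break;
        end
    end
end
done = ~isnan(RT);
fprintf('mean RT: correct %.2f s, error %.2f s\n', ...
    mean(RT(done & correct)), mean(RT(done & ~correct)));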


The  Leaky  Competing  Accumulator  Model

Figure 4: The Leaky Competing Accumulator (LCA) model. Two separate integration processes for the two separate stimuli produce two decision variables, $X_1$ and $X_2$. A "leak" term proportional to $k$ causes a decay of the decision variable through self-inhibition, while a cross-inhibition term proportional to $\beta$ produces competition. In the interrogation paradigm a decision is made according to the greater of $X_1$ and $X_2$ at the response time, while in the free response paradigm a choice is made when one decision variable first reaches its threshold (given as $+a$ and $+b$ respectively).

The Leaky Competing Accumulator (LCA), introduced by Usher and McClelland (Usher and McClelland, 2001), was suggested to be in better accord with neural data and to better fit some behavioral data than the DDM. The LCA is a two-variable model, with each variable integrating evidence in support of one of the two alternatives in a TAFC task. The model includes a "leak" term, as decay back to baseline for each individual variable in the absence of incoming evidence. The "competition" in the LCA is a cross-inhibition between the two variables, thus improving upon the original accumulator models by allowing evidence for one variable to contribute to a reduction in the other variable:

$$\tau\frac{dX_1}{dt} = S_1 - kX_1 - \beta X_2 + \sigma_1\sqrt{\tau}\,\eta_1(t)$$

$$\tau\frac{dX_2}{dt} = S_2 - kX_2 - \beta X_1 + \sigma_2\sqrt{\tau}\,\eta_2(t)$$

The difference in the two LCA variables, $X_d = X_1 - X_2$, follows an Ornstein-Uhlenbeck process:

$$\tau\frac{dX_d}{dt} = S_d - (k-\beta)X_d + \sigma_d\sqrt{\tau}\,\eta(t),$$

where $S_d = S_1 - S_2$ and $\sigma_d = \sqrt{\sigma_1^2 + \sigma_2^2}$. It should be noted that the DDM is retrieved as a special case of the LCA if the coefficient of the term linear in $X_d$, that is $\beta - k$, is set to zero.

If the only criterion for choosing the best model were its match to behavioral data, one could use the Akaike Information Criterion or the Bayesian Information Criterion to assess whether inclusion of a non-zero linear term in the LCA is justified by the better fit to data so produced. As well as the ability to fit human response times, the ability of a model to match neural activity and to be generated robustly in a neural circuit should be considered when assessing its value and relevance. One achievement of the LCA is to reproduce an initial increase in both variables, before the competition between variables causes one variable to be suppressed while the other variable accelerates its rise. Such behavior matches electrophysiological data recorded in monkeys during perceptual choices; in particular, firing rates of neurons representing the response not chosen increase upon stimulus onset before they are suppressed (cf. trajectories in Figure 6).
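A minimal free-response simulation of the two LCA equations above in discretized form (parameters illustrative; decision variables are clipped at zero, as in Usher and McClelland, 2001):

% Sketch: one free-response trial of the Leaky Competing Accumulator.
S1 = 1.1; S2 = 1.0;            % evidence favoring each alternative
k = 2; beta = 2;               % leak and cross-inhibition
sigma = 0.3; tau = 0.1;        % noise level and time constant (s)
a = 0.6; b = 0.6;              % thresholds for X1 and X2
dt = 0.001; tmax = 5; t = 0; choice = 0;
X1 = 0; X2 = 0;
while ( t < tmax && choice == 0 )
    X1new = X1 + dt/tau*( S1 - k*X1 - beta*X2 ) + sigma*sqrt(dt/tau)*randn;
    X2new = X2 + dt/tau*( S2 - k*X2 - beta*X1 ) + sigma*sqrt(dt/tau)*randn;
    X1 = max(X1new, 0); X2 = max(X2new, 0);   % keep variables non-negative
    t = t + dt;
    if ( X1 >= a ), choice = 1; end           % respond A
    if ( X2 >= b ), choice = 2; end           % respond B
end

With $k = \beta$, as here, the difference $X_1 - X_2$ integrates the evidence perfectly, in line with the reduction to the DDM noted above.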

Neural Network Models and Attractor States
Some of the first models of perceptual categorization which have had significant impact on neuroscience were Hopfield networks, Cohen-Grossberg neural networks and Willshaw networks. While these models are primarily aimed at the formation and storage of memories, the retrieval of a memory via corrupted information is identical to a perceptual decision.

A correspondence between memory retrieval and decision making should not be surprising, since Ratcliff's introduction of the DDM (the archetype of decision-making models) was within a paper entitled "A theory of memory retrieval". Ratcliff was focused on fitting response times, and thus produced an inherently dynamic model: memory retrieval was treated as a set of parallel DDMs, each representing an individual item with a "match" and "non-match" threshold to indicate its recognition or not. The rate of evidence accumulation in each individual DDM was taken as the closeness of match between a current item and one in memory.

Models such as those of Hopfield respond to the match between a current item and memories of previously encoded items, which are contained within the network as attractor states. The network's activity reaches a stable attractor state more rapidly if the match is close. The temporal dynamics of memory retrieval or pattern completion, which comprises the decision process, is not addressed carefully in neural network models, in which neural units can be binary and time can be discretized, since this is not their goal. However, the use of attractor states to represent the outcome of a decision or a perceptual categorization has achieved success in more biophysical models based on spiking neurons (Wang, 2008), albeit with far simpler attractor states than those of neural networks.

Biophysical Models
The LCA model (Usher and McClelland, 2001), being motivated by neurophysiology, is similar in spirit to the more detailed biophysical models that followed it. In particular, the first model of decision making based on spiking neurons assumes two competing integrators, where integration is produced through tuned recurrent excitation and competition is the result of cross-inhibition between the pools of neurons (see Figure 5). However, when even the most elementary properties of spiking neurons are taken into account, a few additional complications arise (Wong and Wang, 2006).

Figure 5. A neural circuit model of decision-making. Integration of stimuli ($S_1$, $S_2$) by groups of neurons with rates $r_1$ and $r_2$ can be achieved through strong excitatory recurrent connections within groups (looped connections with arrows). The mean firing rate of cells in the inhibitory pool ("INHIB") increases with both $r_1$ and $r_2$, producing competition through inhibitory input to both cell-groups (solid circles) (Wang, 2002).

First, neurons emit spikes as a point process in time, so that even at a constant firing rate they supply a variable current to other cells. The variability in the current can be reduced with an increase in the number of cells in a group, so long as the spike times are uncorrelated, that is, cells are firing asynchronously. Asynchrony is most easily achieved when neurons spike irregularly, a feature produced by adding additional noise to each neuron in the decision-making circuit. So, in accord with most likely realizations of a decision-making circuit in vivo, biophysical models introduce additional noise into the decision-making model itself, which inevitably adds to any already-present stimulus noise (Wang, 2002).

Second, neurons are either excitatory or inhibitory, so in order for two groups of cells with self-excitation to inhibit each other, at a minimum a third group of inhibitory cells must be added. The need for an extra cell-group to mediate the inhibition adds a small delay to the cross-inhibition compared to self-excitation, though this effect can be counteracted with fast responses in the synaptic connections to and from inhibitory cells versus slower synaptic responses in excitatory connections. The second effect of an intermediate cell-group is to add the nonlinearity of the inhibitory interneurons' response function into the cross-inhibition. Thus the inhibitory input to one excitatory cell-group is not a linear function of the firing rate of the other cell-group. The consequences of this and other nonlinearities are discussed further below.

Third, biophysical models of neurons, just like neurons in vivo, respond nonlinearly to increased inputs. Similarly, synaptic inputs to a cell saturate, so they are a nonlinear function of presynaptic firing rates. In the LCA model (Usher and McClelland, 2001), both the neural response and the feedback are linear functions passing through the origin, so via the tuning of one variable the two curves can match each other and produce an integrator. Integrators require such a matching of synaptic feedback to neural response so that they can retain a stable firing rate in the absence of input: the firing rate produced by a given synaptic input must be exactly that needed to feed back the same synaptic input, and this must be true for a wide range of firing rates. Such matching of synaptic feedback to neural response produces a state of marginal stability, typically called a line attractor or continuous attractor.
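The importance of this matching can be seen in a purely linear reduction (an illustrative toy model, not the biophysical circuit): a single rate unit obeying $\tau\,dr/dt = -r + wr + I$ integrates its input without leak only when the feedback $w$ exactly cancels the decay, $w = 1$.

% Sketch: a linear rate unit integrates only when feedback matches decay.
tau = 0.05; dt = 0.001; tvec = 0:dt:1;
I = 1.0*(tvec >= 0.2 & tvec < 0.4);    % a 200-ms input pulse
for w = [0.9 1.0 1.1]                  % leaky, matched, unstable feedback
    r = zeros(size(tvec));
    for i = 2:length(tvec)
        r(i) = r(i-1) + dt/tau*( -r(i-1) + w*r(i-1) + I(i-1) );
    end
    fprintf('w = %.1f: r at end of pulse %.2f, 0.6 s later %.2f\n', ...
        w, r(round(0.4/dt)), r(end));
end
% w = 1.0 holds its value after the pulse; w = 0.9 decays; w = 1.1 grows.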

   

Figure 6. Nonlinearity of neural response functions produces stable fixed points, toward which neural activity evolves. Nullclines are represented by the solid thick red/green lines, the green line where $dr_2/dt = 0$ at fixed $r_1$ and the red line where $dr_1/dt = 0$ at fixed $r_2$. Crossings of the lines are fixed points of the system, with stable fixed points denoted by solid black circles. Decision states are stable fixed points with $r_2 \gg r_1$ or $r_1 \gg r_2$. As a symmetric input current is added to the network, the spontaneous "undecided" state, that is, the fixed point of low rates with $r_1 = r_2$, becomes unstable, so that by $I_1 = I_2 = 10$ the only stable fixed points correspond to one choice or the other and a decision is forced. With high enough applied input current, $I_1 = I_2 = 15$, a new symmetric stable state appears at high firing rates, but this plays no role in the decision-making process. Deterministic trajectories from a range of starting points are indicated by the thin colored lines, which terminate at a fixed point. Panels show $I_1 = I_2 = 0$, 5, 10 and 15, with $r_1$ and $r_2$ in Hz. See Table 1 for the xppaut code, which produces the nullclines, and Table 2 for the Matlab code, which produces the trajectories.

 

Figure 7. Stochastic noise in the simulation produces trial-to-trial variability in the neural responses. With the same neural circuit as in Fig. 6, the addition of noise can cause neural activity to end up in different states, even with the same starting points. Small colored dots indicate trajectories, with black solid circles denoting end points after 5 s. All simulations begin with $r_1 = r_2 = 0.1$ Hz. Far left: with no applied current, neural activity is maintained near the low-rate spontaneous state. Center left: with moderate applied current, noise can induce transitions to one of the two "decision states". Center and far right: with increased applied current, trajectories always end up at a decision state; even with $I_1 = I_2 = 15$, the two decision states are more stable than the symmetric high-rate state (far right). Panels show $I_1 = I_2 = 0$, 5, 10 and 15, with $r_1$ and $r_2$ in Hz. See Table 1 for the xppaut code, which produces the nullclines, and Table 2 for the Matlab code, which produces the trajectories.

However, in the absence of symmetry or a remarkable similarity in the shape of neural response curves and synaptic feedback curves, the nonlinear curves of biophysical neurons intersect at no more than three points (Figure 6), leading to the possibility of two discrete stable attractor states for a group of cells (with an unstable fixed point in between). When two such groups are coupled, such as by cross-inhibition, the network can have at most four stable states, given by the combinations of low and high firing rates for the two groups. In the winner-takes-all model of decision making, three of these states can be stable in the absence of input: the state in which both cell-groups have low, or spontaneous, activity, and the two states with just one of the cell-groups possessing high activity. The fixed point with both groups possessing high activity is unstable. In the presence of input, the symmetric low-activity state becomes unstable and only two stable states remain: the decision states with one group active ("the winner") and the other group inactive ("the loser"). Importantly, with a combination of slow synaptic time constants (NMDA receptors with a time constant of 50-100 ms are an essential ingredient of the model) and sufficient fine tuning of the parameters of the network, the time course for the network's activity to shift from the unstable initial state to one of the two remaining stable attractor states is slow enough to match neural and behavioral response times.


Figure 8. A bias in the inputs causes neural activity to favor one decision state over the other, though noise means that "errors" can arise. Left: in the deterministic system more trajectories terminate at the attractor point of high $r_1$, because group 1 receives the higher input current. In particular, a symmetric initial condition ($r_1 = r_2$) results in termination with high $r_1$ and low $r_2$; that is, the basin of attraction for the state with $r_1 > r_2$ includes the line $r_1 = r_2$. Right: with added noise, some trials with a symmetric starting point terminate in the state with high $r_2$; these correspond to "errors" in the standard terminology of decision making. Both panels use $I_1 = 11$, $I_2 = 10$. See Table 1 for the xppaut code, which produces the nullclines, and Table 2 for the Matlab code, which produces the trajectories.

Figure 9. In a noisy system the initial state can remain deterministically stable, yet responses terminate in one of the decision states, with the bias favoring one final state over the other. Left: with no noise but a bias in the inputs, more trajectories evolve to the fixed point favored by the input, but many terminate at the "undecided" state. Right: with added noise and symmetric initial conditions, most otherwise "undecided" responses switch to the decision state favored by the input bias (corresponding to "correct" responses), while a few terminate in the other decision state ("incorrect" responses) and one remains in the symmetric low-rate state (an "undecided" response). Both panels use $I_1 = 5.5$, $I_2 = 5$. See Table 1 for the xppaut code, which produces the nullclines, and Table 2 for the Matlab code, which produces the trajectories.

Extension of models to multiple choices
Many decision-making models for TAFC are simply extended to the case of multiple alternatives (Bogacz et al., 2007b, Furman and Wang, 2008, Niwa and Ditterich, 2008, Ditterich, 2010). Electrophysiological data suggest that neural firing rates reach a threshold independent of the number of alternatives, but that neurons receive greater inhibition, as revealed by reduced firing rates in their initial, spontaneous activity state (Churchland et al., 2008, Churchland and Ditterich, 2012). This is akin to a reduction in the prior probability of each individual choice alternative before stimulus onset, if prior probability impacts the starting point of integration. Models in which the total amount of inhibition depends on the total number of alternative choices (see Figure 10, below) reproduce such behavior. One consequence of the increased inhibition is a slowing of decision times as the number of alternatives increases.
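A minimal sketch of the scheme of Figure 10 (left, below): N leaky accumulators whose shared negative feedback is proportional to the sum of all the decision variables (parameters illustrative).

% Sketch: race among N leaky accumulators with summed inhibitory feedback.
N = 4;                          % number of alternatives
S = ones(N,1); S(1) = 1.2;      % alternative 1 has the strongest evidence
k = 1; beta = 1;                % leak and shared-inhibition strength
sigma = 0.4; tau = 0.1; thresh = 0.7;
dt = 0.001; tmax = 5; t = 0;
X = zeros(N,1);
while ( t < tmax && max(X) < thresh )   % undecided if no threshold crossing
    X = X + dt/tau*( S - k*X - beta*sum(X) ) + sigma*sqrt(dt/tau)*randn(N,1);
    X = max(X, 0);              % keep decision variables non-negative
    t = t + dt;
end
[~, winner] = max(X);
fprintf('%d alternatives: chose %d after %.2f s\n', N, winner, t);

Because the inhibitory feedback grows with the summed activity, adding alternatives lowers each variable's quasi-steady level, reproducing the slower decisions noted above.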


Figure 10: Models for decision-making with multiple alternatives. Left: extension of the leaky competing accumulator model to multiple alternatives, such that the negative feedback is proportional to the sum of multiple decision variables. Right: extension of the biophysical model, in which cells providing inhibitory feedback are activated by the summed activity of multiple groups of excitatory cells.

Sequential Discrimination, Context-Dependent Decisions and Prior Information
The most developed dynamical models of decision-making pertain to the identification of an incoming sensory stimulus or the comparison of two or more concurrent stimuli. However, many decisions require a comparison of successive stimuli, and even when two stimuli are concurrent, our attention typically switches from one to the other when making a perceptual choice based on the two stimuli. Thus, models of decision-making have been developed in which the stimuli are separated in time, so a form of short-term memory is required, with the contents of short-term memory affecting the choice of action given a later sensory input (Romo and Salinas, 2003, Machens et al., 2005, Miller and Wang, 2006). The process of making a decision by combining short-term memory, which can represent the current "context" (Salinas, 2004), with sensory input provides the essential ingredient for working memory tasks and for model-based strategies of action selection (Deco and Rolls, 2005).

Prior information can make one response either a more likely alternative or a more rewarding alternative given ambiguous sensory information, and can lead to across-trial dependencies in decision-making behavior. Deciding how much to weight prior information relative to the current stimulus requires a separate choice of how long one should continue to acquire sensory input (Hanks et al., 2011). Such a choice is akin to setting a decision-making threshold. Factors affecting the optimal period of obtaining sensory input include the relative clarity of incoming information compared to the strength of the prior (Deneve, 2012), as well as the intrinsic cost in reduced reward rate when taking more time to decide (Drugowitsch et al., 2012). At some point in time, awaiting further sensory evidence does not improve one's probability of a correct choice sufficiently to warrant any extra delay of reward. Solution by dynamic programming (Bellman, 1957) of a model that takes these factors into account suggests that monkeys respond according to an optimal, time-varying cost function (Drugowitsch et al., 2012).
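The diminishing return from waiting can be illustrated with the unbounded DDM read out at an interrogation time $t$, for which the probability correct is $\Phi(S\sqrt{t}/\sigma)$, with $\Phi$ the standard normal cumulative distribution. This is a far cruder criterion than the full dynamic programming solution, but it shows an interior optimum of reward rate (values illustrative):

% Sketch: accuracy grows with viewing time with diminishing returns,
% so reward rate peaks at a finite interrogation time.
S = 0.5; sigma = 1; ITI = 2;   % drift, noise, inter-trial interval (s)
t = 0.01:0.01:5;               % candidate interrogation times
acc = 0.5*(1 + erf( S*sqrt(t)/(sigma*sqrt(2)) ));  % P(correct) = Phi(S*sqrt(t)/sigma)
RR = acc./(t + ITI);           % reward rate if only correct trials rewarded
[~, imax] = max(RR);
fprintf('Reward rate peaks at t = %.2f s (accuracy %.2f)\n', t(imax), acc(imax));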


Testing Decision-Making Models
Decision-making models can be tested more stringently using tasks in which the stimulus is not held constant across the time allotted for producing a response (Zhou et al., 2009, Stanford et al., 2010, Shankar et al., 2011, Rüter et al., 2012). For example, in models based on perfect integration, such as the DDM, if a stimulus is altered or even reversed for a short amount of time, the alteration has the same effect on response time and choice probability whether it occurs early or late, so long as it falls within the period of integration. However, in models where the initial state is unstable, a late altered stimulus has a weaker impact on the decision-making dynamics than an early one. Conversely, in models where the initial state is stable, such as the LCA with a positive leak term, only stimuli shortly before the final response contribute to it: the time constant of the drift back to the initial state corresponds to a time constant for the forgetting of earlier evidence.

Alternatively, if one sets up a task in which noise in the stimulus, controlled by the experimenter, is the dominant noise source in the decision-making process, one can analyze correct trials and error trials separately and align them by either the time of stimulus onset or the time of response, to assess the impact of noise fluctuations on choice probability or response times. Care must be taken when aligning by response times, since threshold crossings are inevitably produced by noise fluctuations in the direction of the threshold crossed. However, in all cases model predictions can be tested with experimental measurements. Current results appear to be task dependent, as some data sets suggest all evidence is equally impactful (supporting a perfect integrator) (Huk and Shadlen, 2005, Brunton et al., 2013), while others suggest the weighting of sensory evidence is higher early (Ludwig and Davies, 2011), higher late (Cisek et al., 2009, Thura et al., 2012), or oscillatory (Wyart et al., 2012) across the stimulus duration.
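A deterministic sketch of this stimulus-pulse test, comparing leaky, perfect and unstable integration (all values illustrative):

% Sketch: response of leaky (lambda<0), perfect (lambda=0) and unstable
% (lambda>0) integrators to a brief evidence pulse delivered early or late,
% with dx/dt = ( lambda*x + s(t) )/tau and no noise.
tau = 0.1; dt = 0.001; tvec = 0:dt:1;
for lambda = [-1 0 1]
    for tpulse = [0.1 0.8]                        % early or late pulse onset
        s = 0.1*ones(size(tvec));                 % weak baseline evidence
        s(tvec >= tpulse & tvec < tpulse + 0.1) = 1;  % 100-ms pulse
        x = 0;
        for i = 1:length(tvec)
            x = x + dt/tau*( lambda*x + s(i) );
        end
        fprintf('lambda = %+d, pulse at %.1f s: x(1 s) = %.2f\n', ...
            lambda, tpulse, x);
    end
end
% lambda = 0: final x is identical for early and late pulses;
% lambda = +1: the early pulse dominates; lambda = -1: only the late pulse
% is still "remembered" at the response time.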
Beyond Discrimination: Action Selection
In the models considered heretofore, the decision of what action to take has been equivalent to the question of what is perceived. This is because in the relevant tasks the difficulty is in unraveling the cause of a sensory input, which has been degraded either at source, or through sensory processing, or as a result of imperfections of memory encoding and recall. The requisite action given a sensory percept is conveyed either via a straightforward instruction for human subjects, or produced by weeks to months of training in non-human animal subjects. Thus, in the post-training stage used to acquire data, the step from percept to action can be considered very fast and independent of the parameters varied by the experimenter to modify task difficulty.

However, most decisions require us to select a course of action given a percept, or given a combination of percepts. Two general strategies, termed model-based and model-free (Dayan and Daw, 2008, Dayan and Niv, 2008), are possible for action selection. Model-based strategies require an evaluation of all possible consequences, with their likelihoods, similar in a chess game to calculating all the combinations of moves in response to one move, or conversely, all the possible causes of an observation. A model-free system simply learns the value of a given state and selects an action based on the immediately reachable state with the highest value. For example, one could move a chess piece to produce the pattern of pieces that has led to the most games won in the past.

A wide literature in the field of REINFORCEMENT LEARNING (Barto, 1994, Redish et al., 2007) and its possible neural underpinnings (Daw and Doya, 2006, Johnson et al., 2007, Lee and Seo, 2007) addresses how one can learn the value of states over multiple trials.


This literature has influenced model-free biophysical models of decision making (Soltani and Wang, 2008, 2010), with the principal requirement being a need for enhanced Hebbian learning when the resulting action leads to positive reinforcement, but not otherwise. The neural activity underlying the decision precedes the reinforcement signal, so in order for the appropriate synapses to be potentiated when positive reinforcement arrives, authors have suggested either that the activity itself remains in a persistent state of high firing rate (Soltani and Wang, 2006), or that a molecular "eligibility trace" of earlier activity persists through the time of the reinforcement signal (Bogacz et al., 2007a, Izhikevich, 2007).

A model-based method for making a decision is more flexible and can be more accurate, but in all but the simplest cases the combinatorial explosion of alternatives becomes unwieldy and time-consuming to calculate. Hierarchical reinforcement learning renders such models more manageable (Barto and Mahadevan, 2003, Ito and Doya, 2011, Botvinick, 2012).
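As an illustration of the model-free scheme described above, the following sketch implements a reward-gated Hebbian update for two actions (loosely in the spirit of Soltani and Wang, 2006; all values illustrative):

% Sketch: model-free action selection with reward-gated Hebbian plasticity.
pReward = [0.7 0.3];           % true reward probability of each action
c = [0.5 0.5];                 % synaptic strengths biasing the choice
epsilon = 0.1;                 % learning rate
for trial = 1:1000
    act = 1 + ( rand > c(1)/(c(1)+c(2)) );   % choose action 1 or 2
    r = rand < pReward(act);                 % stochastic reward outcome
    if ( r )
        c(act) = c(act) + epsilon*(1 - c(act));  % potentiate only if rewarded
    else
        c(act) = c(act) - epsilon*c(act);        % otherwise depress
    end
end
fprintf('learned strengths: c1 = %.2f, c2 = %.2f\n', c(1), c(2));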


Table 1: Code based on xppaut to produce nullclines and fixed points for two interacting groups of neurons, connected in a decision-making circuit, as used in Figs. 6-9.

# Rate-based model of two coupled neuron populations.
# I1 and I2 are the inputs to each population, which should be varied.
par I1=5
par I2=5.5
# Strength of self-excitation
par wself=1
# Cross connections are net inhibitory
par wcross=-0.2
# tau is the time constant -- the slowest synaptic time constant, that of NMDA, is used.
# tau has no effect on steady state solutions.
par tau=0.075
# Sigmoidal f-I curves require three parameters, these are:
# rmax, the maximum rate
par rmax=100
# Ithresh, current to produce half-maximum rate
par Ithresh=50
# delta, determines the width of the sigmoid (sensitivity near Ithresh).
par delta=20
# Initial firing rates can be varied.
init r1=0
init r2=0
dr1/dt=-r1/tau+rmax/(1+exp(-(I1+wself*r1+wcross*r2-Ithresh)/delta))/tau
dr2/dt=-r2/tau+rmax/(1+exp(-(I2+wself*r2+wcross*r1-Ithresh)/delta))/tau
# The following are default values for the start-up of xpp.
@method=rk,total=2,bound=500,dt=.005,dtmin=1e-12,atoler=1e-7
@toler=1e-5,xhi=2,yhi=100,ylo=0,njmp=5
@ bell=off,nout=50
done

       


Table 2. The Matlab function used to simulate multiple decision-making trials as plotted in each panel of Figs 6-9.

function [ r1, r2 ] = WTAtrials( Iapp1, Iapp2, sigma )
% WTAtrials produces multiple trials of a winner-takes-all network produced
% by coupling two sigmoidal firing rate models, each representing a neural
% population.
% Iapp1 and Iapp2 are the respective current inputs to the 2 populations.
% sigma is the level of noise (added independently to each population).
% The function returns the firing rate as a function of time for each
% trial for each population.
% Number of trials is defined within the function as Ntrials = 10.

dt = 0.001;          % timestep for simulation
tmax = 5.0;          % max time for integration
tvec = 0:dt:tmax;    % time vector
Ntrials = 10;
r1 = zeros(Ntrials, length(tvec));  % records rate of one cell group
r2 = zeros(Ntrials, length(tvec));  % records rate of other cell group
tau = 0.05;          % time constant (cf NMDA receptors)
rmax = 100;          % max rate of cells
Ithresh = 50;        % input needed for 1/2-max firing
Idelta = 20;         % determines steepness of sigmoidal f-I curve
wself = 1;           % recurrent excitatory feedback
wcross = -0.2;       % strength of cross-inhibition
rhighstart = 0*rmax; % If nonzero, provides a range of starting points.

for trial = 1:Ntrials
    if ( trial <= Ntrials/2 )
        r1init = 0.1+rhighstart*(Ntrials/2-trial)/(Ntrials/2-1);
        r2init = 0.1;
    else
        r2init = 0.1+rhighstart*(Ntrials-trial)/(Ntrials/2-1);
        r1init = 0.1;
    end
    % Implement initial conditions
    r1(trial,1) = r1init;
    r2(trial,1) = r2init;
    % Produce independent random noise for each population
    noise1 = sigma*sqrt(dt/tau)*randn(size(tvec));
    noise2 = sigma*sqrt(dt/tau)*randn(size(tvec));
    % Now integrate with basic Euler-Maruyama method using sigmoid f-I
    % curves and current, I, linear in rate.
    for i = 2:length(tvec)
        r1(trial,i) = r1(trial,i-1) + ...
            dt/tau*( rmax/(1+exp(-(Iapp1+wself*r1(trial,i-1)+wcross*r2(trial,i-1)-Ithresh)/Idelta)) ...
            -r1(trial,i-1) ) + noise1(i);
        r2(trial,i) = r2(trial,i-1) + ...
            dt/tau*( rmax/(1+exp(-(Iapp2+wself*r2(trial,i-1)+wcross*r1(trial,i-1)-Ithresh)/Idelta)) ...
            -r2(trial,i-1) ) + noise2(i);
    end
end
end

Page 17: Decision(Making,Models( Definition( action

Balci  F,  Simen  P,  Niyogi  R,  Saxe  A,  Hughes  JA,  Holmes  P,  Cohen  JD  (2011)  Acquisition  of  decision  making  criteria:  reward  rate  ultimately  beats  accuracy.  Atten  Percept  Psychophys  73:640-­‐657.  

Barto  AG  (1994)  Reinforcement  learning  control.  Current  Opinion  in  Neurobiology  4:888-­‐893.  

Barto  AG,  Mahadevan  S  (2003)  Recent  advances  in  hierarchical  reinforcement  learning.  Discrete  Event  Dynamical  Systems:  Theory  and  Applications  13:343-­‐379.  

Beck  JM,  Ma  WJ,  Kiani  R,  Hanks  T,  Churchland  AK,  Roitman  J,  Shadlen  MN,  Latham  PE,  Pouget  A  (2008)  Probabilistic  population  codes  for  Bayesian  decision  making.  Neuron  60:1142-­‐1152.  

Bellman  R  (1957)  Dynamic  Programming.  Princeton,  NJ:  Princeton  University  Press.  Bogacz  R,  Brown  E,  Moehlis  J,  Holmes  P,  Cohen  JD  (2006)  The  physics  of  optimal  

decision  making:  a  formal  analysis  of  models  of  performance  in  two-­‐alternative  forced-­‐choice  tasks.  Psychological  Review  113:700-­‐765.  

Bogacz  R,  McClure  SM,  Li  J,  Cohen  JD,  Montague  PR  (2007a)  Short-­‐term  memory  traces  for  action  bias  in  human  reinforcement  learning.  Brain  Res  1153:111-­‐121.  

Bogacz  R,  Usher  M,  Zhang  J,  McClelland  JL  (2007b)  Extending  a  biologically  inspired  model  of  choice:  multi-­‐alternatives,  nonlinearity  and  value-­‐based  multidimensional  choice.  Philos  Trans  R  Soc  Lond  B  Biol  Sci  362:1655-­‐1670.  

Botvinick  MM  (2012)  Hierarchical  reinforcement  learning  and  decision  making.  Current  Opinion  in  Neurobiology  22:956-­‐962.  

Brunton  BW,  Botvinick  MM,  Brody  CD  (2013)  Rats  and  humans  can  optimally  accumulate  evidence  for  decision-­‐making.  Science  340:95-­‐98.  

Cain  N,  Shea-­‐Brown  E  (2012)  Computational  models  of  decision  making:  integration,  stability,  and  noise.  Current  Opinion  in  Neurobiology  22:1047-­‐1053.  

Churchland  AK,  Ditterich  J  (2012)  New  advances  in  understanding  decisions  among  multiple  alternatives.  Current  Opinion  in  Neurobiology  22:920-­‐926.  

Churchland  AK,  Kiani  R,  Shadlen  MN  (2008)  Decision-­‐making  with  multiple  alternatives.  Nature  Neuroscience  11:693-­‐702.  

Cisek P, Puskas GA, El-Murr S (2009) Decisions in changing conditions: the urgency-gating model. Journal of Neuroscience 29:11560-11571.

Daw ND, Doya K (2006) The computational neurobiology of learning and reward. Current Opinion in Neurobiology 16:199-204.

Dayan P, Daw ND (2008) Decision theory, reinforcement learning, and the brain. Cognitive, Affective, & Behavioral Neuroscience 8:429-453.

Dayan P, Niv Y (2008) Reinforcement learning: the good, the bad and the ugly. Current Opinion in Neurobiology 18:185-196.

Deco G, Rolls ET (2005) Attention, short-term memory, and action selection: a unifying theory. Progress in Neurobiology 76:236-256.

Deneve S (2012) Making decisions with unknown sensory reliability. Frontiers in Neuroscience 6:75.

Ditterich J (2010) A comparison between mechanisms of multi-alternative perceptual decision making: ability to explain human behavior, predictions for neurophysiology, and relationship with decision theory. Frontiers in Neuroscience 4:184.

Doya K (2008) Modulators of decision making. Nature Neuroscience 11:410-416.

Drugowitsch J, Moreno-Bote R, Churchland AK, Shadlen MN, Pouget A (2012) The cost of accumulating evidence in perceptual decision making. Journal of Neuroscience 32:3612-3628.

Furman M, Wang XJ (2008) Similarity effect and optimal control of multiple-choice decision making. Neuron 60:1153-1168.

Gillespie DT (1992) Markov Processes: An Introduction for Physical Scientists. San Diego: Academic Press.

Glimcher PW (2001) Making choices: the neurophysiology of visual-saccadic decision making. Trends in Neurosciences 24:654-659.

Glimcher PW (2003) The neurobiology of visual-saccadic decision making. Annual Review of Neuroscience 26:133-179.

Gold JI, Shadlen MN (2001) Neural computations that underlie decisions about sensory stimuli. Trends in Cognitive Sciences 5:10-16.

Gold JI, Shadlen MN (2007) The neural basis of decision making. Annual Review of Neuroscience 30:535-574.

Hanks TD, Mazurek ME, Kiani R, Hopp E, Shadlen MN (2011) Elapsed decision time affects the weighting of prior probability in a perceptual decision task. Journal of Neuroscience 31:6339-6352.

Huk AC, Shadlen MN (2005) Neural activity in macaque parietal cortex reflects temporal integration of visual motion signals during perceptual decision making. Journal of Neuroscience 25:10420-10436.

Ito M, Doya K (2011) Multiple representations and algorithms for reinforcement learning in the cortico-basal ganglia circuit. Current Opinion in Neurobiology 21:368-373.

Izhikevich EM (2007) Solving the distal reward problem through linkage of STDP and dopamine signaling. Cerebral Cortex 17:2443-2452.

Joel D, Niv Y, Ruppin E (2002) Actor-critic models of the basal ganglia: new anatomical and computational perspectives. Neural Networks 15:535-547.

Johnson A, van der Meer MA, Redish AD (2007) Integrating hippocampus and striatum in decision-making. Current Opinion in Neurobiology 17:692-697.

Lawler GF (2006) Introduction to Stochastic Processes. Boca Raton: Chapman & Hall/CRC.

Lee D, Seo H (2007) Mechanisms of reinforcement learning and decision making in the primate dorsolateral prefrontal cortex. Annals of the New York Academy of Sciences 1104:108-122.

Ludwig CJ, Davies JR (2011) Estimating the growth of internal evidence guiding perceptual decisions. Cognitive Psychology 63:61-92.

Machens CK, Romo R, Brody CD (2005) Flexible control of mutual inhibition: a neural model of two-interval discrimination. Science 307:1121-1124.


Miller P, Katz DB (2013) Accuracy and response-time distributions for decision-making: linear perfect integrators versus nonlinear attractor-based neural circuits. Journal of Computational Neuroscience.

Miller P, Wang XJ (2006) Discrimination of temporally separated stimuli by integral feedback control. Proceedings of the National Academy of Sciences USA 103:201-206.

Newsome WT, Britten KH, Movshon JA (1989) Neuronal correlates of a perceptual decision. Nature 341:52-54.

Niwa M, Ditterich J (2008) Perceptual decisions between multiple directions of visual motion. Journal of Neuroscience 28:4435-4445.

Ratcliff R (1978) A theory of memory retrieval. Psychological Review 85:59-108.

Ratcliff R (2002) A diffusion model account of response time and accuracy in a brightness discrimination task: fitting real data and failing to fit fake but plausible data. Psychonomic Bulletin & Review 9:278-291.

Ratcliff R, Hasegawa YT, Hasegawa RP, Smith PL, Segraves MA (2007) Dual diffusion model for single-cell recording data from the superior colliculus in a brightness-discrimination task. Journal of Neurophysiology 97:1756-1774.

Ratcliff R, McKoon G (2008) The diffusion decision model: theory and data for two-choice decision tasks. Neural Computation 20:873-922.

Ratcliff R, Smith PL (2004) A comparison of sequential sampling models for two-choice reaction time. Psychological Review 111:333-367.

Redish AD, Jensen S, Johnson A, Kurth-Nelson Z (2007) Reconciling reinforcement learning models with behavioral extinction and renewal: implications for addiction, relapse, and problem gambling. Psychological Review 114:784-805.

Romo R, Salinas E (2003) Flutter discrimination: neural codes, perception, memory and decision making. Nature Reviews Neuroscience 4:203-218.

Rorie AE, Gao J, McClelland JL, Newsome WT (2010) Integration of sensory and reward information during perceptual decision-making in lateral intraparietal cortex (LIP) of the macaque monkey. PLoS ONE 5:e9308.

Rüter J, Marcille N, Sprekeler H, Gerstner W, Herzog MH (2012) Paradoxical evidence integration in rapid decision processes. PLoS Computational Biology 8:e1002382.

Salinas E (2004) Fast remapping of sensory stimuli onto motor actions on the basis of contextual modulation. Journal of Neuroscience 24:1113-1118.

Seymour B, O'Doherty JP, Dayan P, Koltzenburg M, Jones AK, Dolan RJ, Friston KJ, Frackowiak RS (2004) Temporal difference models describe higher-order learning in humans. Nature 429:664-667.

Shadlen MN, Newsome WT (2001) Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. Journal of Neurophysiology 86:1916-1936.

Shankar S, Massoglia DP, Zhu D, Costello MG, Stanford TR, Salinas E (2011) Tracking the temporal evolution of a perceptual judgment using a compelled-response task. Journal of Neuroscience 31:8406-8421.


Shea-Brown E, Gilzenrat MS, Cohen JD (2008) Optimization of decision making in multilayer networks: the role of locus coeruleus. Neural Computation 20:2863-2894.

Simen P, Contreras D, Buck C, Hu P, Holmes P, Cohen JD (2009) Reward rate optimization in two-alternative decision making: empirical tests of theoretical predictions. Journal of Experimental Psychology: Human Perception and Performance 35:1865-1897.

Smith PL, Ratcliff R (2004) Psychology and neurobiology of simple decisions. Trends in Neurosciences 27:161-168.

Soltani A, Wang XJ (2006) A biophysically based neural model of matching law behavior: melioration by stochastic synapses. Journal of Neuroscience 26:3731-3744.

Soltani A, Wang XJ (2008) From biophysics to cognition: reward-dependent adaptive choice behavior. Current Opinion in Neurobiology 18:209-216.

Soltani A, Wang XJ (2010) Synaptic computation underlying probabilistic inference. Nature Neuroscience 13:112-119.

Stanford TR, Shankar S, Massoglia DP, Costello MG, Salinas E (2010) Perceptual decision making in less than 30 milliseconds. Nature Neuroscience 13:379-385.

Sugrue LP, Corrado GS, Newsome WT (2005) Choosing the greater of two goods: neural currencies for valuation and decision making. Nature Reviews Neuroscience 6:363-375.

Thura D, Beauregard-Racine J, Fradet CW, Cisek P (2012) Decision making by urgency gating: theory and experimental support. Journal of Neurophysiology 108:2912-2930.

Usher M, McClelland JL (2001) The time course of perceptual choice: the leaky, competing accumulator model. Psychological Review 108:550-592.

Wald A (1947) Sequential Analysis. New York: Wiley.

Wald A, Wolfowitz J (1948) Optimum character of the sequential probability ratio test. Annals of Mathematical Statistics 19:326-339.

Wang XJ (2002) Probabilistic decision making by slow reverberation in cortical circuits. Neuron 36:955-968.

Wang XJ (2008) Decision making in recurrent neuronal circuits. Neuron 60:215-234.

Wong KF, Wang XJ (2006) A recurrent network mechanism of time integration in perceptual decisions. Journal of Neuroscience 26:1314-1328.

Wyart V, de Gardelle V, Scholl J, Summerfield C (2012) Rhythmic fluctuations in evidence accumulation during decision making in the human brain. Neuron 76:847-858.

Zhou X, Wong-Lin K, Holmes P (2009) Time-varying perturbations can distinguish among integrate-to-threshold models for perceptual decision making in reaction time tasks. Neural Computation 21:2336-2362.