TRANSCRIPT
Cognitive Modeling, University of Groningen / Artificial Intelligence
Rensselaer Cognitive Science, CogWorks Laboratories
› Christian P. Janssen
› Wayne D. Gray
› Michael J. Schoelles
How a Modeler’s Conception of Rewards Influences a Model’s Behavior: Investigating ACT-R 6’s Utility Learning Mechanism
Temporal difference learning & ACT-R
› Temporal difference learning has recently been introduced as ACT-R’s new utility learning mechanism (e.g., Fu & Anderson, 2004; Anderson, 2006, 2007; Bothell, 2005)
› Utility learning optimizes a model’s behavior so as to maximize the rewards that the model receives
› A model can:
• Receive rewards at different moments in time
• Receive rewards of different magnitudes
› There are no guidelines for choosing when a reward should be given and what its magnitude should be
New issues for ACT-R
› We studied two aspects of TD learning:
• When the reward is given
• The magnitude of the reward
› This is a new issue for ACT-R:
• When the reward is given: could already be varied in ACT-R 5
• Magnitude of the reward: could not be varied in ACT-R 5
› As we will show, the modeler’s conception of rewards has a big influence on a model’s behavior
› Case study: Blocks World task (Gray et al., 2006)
Why the Blocks World task?
› Previous work indicates that the utility learning mechanism is crucial for this task
• ACT-R 5 models (Gray, Sims, & Schoelles, 2005)
• Regular ACT-R 5 cannot provide a good fit to the human data
• Because rewards in ACT-R 5 are binary (i.e., successes and failures), not scalar
• Ideal Performer Model (Gray et al., 2006)
• A model outside of ACT-R that uses temporal difference learning provided a very good fit
Blocks World task
› So what’s the task?
Blocks World task
Task: “Copy pattern in target window by moving blocks from resource window to workspace window”
Blocks World task
Windows are covered with gray rectangles: accessing information requires interaction with the interface
Blocks World task
› Blocks World task:
• Information in the Target Window is only available after waiting for a lockout time
• 0, 400, or 3200 milliseconds (between subjects)
Blocks World task: human data (Gray et al., 2006)
› Size of lockout time influences human behavior:
[Figure: number of blocks placed after the 1st visit to the target window (0–5) as a function of lockout time (0.0–3.0 s)]
Blocks World task: Modeling Strategies
› Strategy: How many blocks do you plan to place after a visit to the target window?
› 8 encode-x production rules
• “Study x blocks”
• encode-1 through encode-8
› Model learns utility value of each production rule using ACT-R’s temporal difference learning algorithm
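Conflict resolution among these eight rules can be sketched as follows. This is an illustration, not the official ACT-R 6 implementation: the noise parameter s, the initial utilities, and the exact noise distribution are assumptions for this example; ACT-R selects the production with the highest noisy utility.

```python
# Illustrative sketch (not the official ACT-R 6 code) of utility-based
# conflict resolution among the eight encode-x rules: the rule with the
# highest utility plus logistic noise is selected. The noise parameter
# s and the initial utilities are assumptions for this example.
import math
import random

def logistic_noise(s=0.5):
    """Sample logistic noise, as ACT-R's utility noise is commonly described."""
    u = min(max(random.random(), 1e-9), 1 - 1e-9)  # avoid log(0)
    return s * math.log((1 - u) / u)

utilities = {f"encode-{x}": 0.0 for x in range(1, 9)}  # encode-1 .. encode-8

def choose_rule(utilities, s=0.5):
    # Pick the rule whose (utility + noise) is largest.
    return max(utilities, key=lambda rule: utilities[rule] + logistic_noise(s))

rule = choose_rule(utilities)
print(rule)  # one of encode-1 .. encode-8, at random while utilities are equal
```

With all utilities equal, the noise makes selection effectively random; as learning pushes the utilities apart, the better-rewarded strategies win more often.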
Utility learning
› Utility learning requires the incorporation of rewards
› Two choices are crucial:
• When is the reward given?
• What is the magnitude of the reward?
› After some experience, the utility of a production rule approximates (Anderson, 2007):
U_i = r(t_x) − (t_x − t_i)
where r(t_x) is the magnitude of the reward, t_x is the time at which the reward is given, and t_i is the time at which production i fired
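As a sketch of how utilities approach this value, the underlying update (Anderson, 2007) moves the utility a fraction alpha toward the experienced reward minus the delay. The reward, delay, and learning-rate values below are illustrative, not taken from the slides:

```python
# Sketch of the utility update behind the asymptotic value above:
#   U_i(n) = U_i(n-1) + alpha * (R_i(n) - U_i(n-1))
# where R_i(n) = r(t_x) - (t_x - t_i). The reward, delay, and alpha
# values below are illustrative, not taken from the slides.

def update_utility(u, reward, delay, alpha=0.2):
    """One update for a production that fired `delay` seconds before
    a reward of magnitude `reward` arrived."""
    effective = reward - delay          # R_i(n) = r(t_x) - (t_x - t_i)
    return u + alpha * (effective - u)

u = 0.0
for _ in range(50):                     # repeated identical experience
    u = update_utility(u, reward=10.0, delay=3.0)
print(round(u, 2))  # → 7.0, i.e. r(t_x) - (t_x - t_i)
```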
Utility learning
› Choice 1: When is the reward given?
› Important because:
• The utility value has a linear relationship with the time at which the reward is given
› Choice in Blocks World:
• Once model: update once, at the end of the trial
• Each model: update each time that part of the task is completed, i.e., a (set of) block(s) has been placed and the model either returns to the target window to study more blocks or finishes the trial
U_i = r(t_x) − Δt(t_i, t_x)
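To illustrate why this choice matters: because U_i depends linearly on the delay, the same production gets a very different utility when the reward arrives at the end of the trial (“once”) than when it arrives as soon as the sub-task completes (“each”). All timings below are hypothetical, not data from the task:

```python
# Hypothetical illustration of the "once" vs. "each" choice: the same
# production, fired at the same time, ends up with different utilities
# because the delay t_x - t_i differs. All timings are made up.

def asymptotic_utility(reward, t_fired, t_reward):
    return reward - (t_reward - t_fired)    # U_i = r(t_x) - (t_x - t_i)

t_fired = 1.0   # an encode-x rule fires early in the trial
u_once = asymptotic_utility(reward=10.0, t_fired=t_fired, t_reward=20.0)  # end of trial
u_each = asymptotic_utility(reward=10.0, t_fired=t_fired, t_reward=5.0)   # sub-task done
print(u_once, u_each)  # → -9.0 6.0
```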
Utility learning
› Choice 2: What is the magnitude of the reward?
› Important because:
• The utility value has a linear relationship with the magnitude of the reward
› But how to set this value?
• Experimental tweaking? → unfavorable
• Fixed range of values (e.g., between 0 and 1)? → difficult
• Relate to neurological data? → not available for most models
U_i = r(t_x) − Δt(t_i, t_x)
Utility learning
› Choice 2: What is the magnitude of the reward?
› Choice in Blocks World:
• Relate the reward to what might be important in the task
• Accuracy: the accuracy with which the task is performed; options:
• Success: # blocks placed (once)
• Success: # blocks placed (each)
• Success & failure: # blocks placed − # blocks forgotten (each)
• Time: how much time does (part of) the task take; options:
• Time spent on the task: −1 × time spent (once)
• Time spent waiting for a specific aspect of the task: −1 × lockout size × number of visits to the target window (once)
• Number of blocks placed per second (each)
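The options above can be made concrete with a small sketch. The trial statistics are hypothetical; the formulas simply transcribe the bullet points:

```python
# Hypothetical trial statistics used to compute each reward option
# listed above; the numbers are illustrative, the formulas transcribe
# the bullets.
blocks_placed = 8
blocks_forgotten = 1
trial_time = 40.0        # seconds spent on the whole trial
lockout = 3.2            # lockout size in seconds
visits = 3               # visits to the target window

r_success           = blocks_placed                     # success (once/each)
r_success_failure   = blocks_placed - blocks_forgotten  # success & failure (each)
r_time_on_task      = -1 * trial_time                   # time spent (once)
r_time_waiting      = -1 * lockout * visits             # time waiting (once)
r_blocks_per_second = blocks_placed / trial_time        # rate (each)

print(r_success, r_success_failure, r_time_on_task,
      r_blocks_per_second)  # → 8 7 -40.0 0.2
```

Note how the accuracy-based rewards and the time-based rewards differ not only in sign but in scale, which is one reason the magnitude choice changes the model’s behavior.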
Blocks World task: Modeling Strategies
› 6 models were developed
› Each model is run 6 times for each of 3 experimental conditions:
• 0, 400, and 3200 milliseconds of lockout time
› Models interact with the same interface as human participants
Blocks World task: general results
› Each model has unique results
Blocks World task: general results
› What is the impact of:
• When the reward is given (once/each)
• The concept of the reward (related to accuracy/time)
› Results averaged over 3 models
Utility learning: impact of when reward is given
Utility learning: impact of concept of reward
Comparison with ACT-R 5 (Gray, Sims & Schoelles, 2005)
Conclusion
› Rewards can be given at different times during a trial and according to different concepts
› There are no guidelines for what the best choices are
› Blocks World suggests that rewards should:
• Be given once: the model can optimize behavior over the entire task
• Relate to the concept of time: because different strategy choices have a big impact on reward size
› Models of other tasks should show whether this finding generalizes
Conclusion
› This is not just a Blocks World issue
• General Computer Science / AI issue: representing a task in the right way is crucial (e.g., Russell & Norvig, 1995; Sutton & Barto, 1998)
• Many experiments involve manipulations and measurements of accuracy and speed of performance
› This is a new issue for ACT-R:
• When the reward is given: could already be varied in ACT-R 5
• Magnitude of the reward: could not be varied in ACT-R 5
Thank you for your attention
› Questions?
› More information:
• [email protected]
• www.ai.rug.nl/~cjanssen
• www.cogsci.rpi.edu/cogworks
• Poster session @ CogSci 2008, Thursday, July 24th: “Cognitive Models of Strategy Shifts in Interactive Behavior” (session: “Attention and Implicit Learning”)
References
› Anderson, J. R. (2006). A new utility learning mechanism. Paper presented at the 2006 ACT-R workshop.
› Anderson, J. R. (2007). How can the human mind occur in the physical universe? New York: Oxford University Press.
› Bothell, D. (2005). ACT-R 6 Official Release. Proceedings of the 12th ACT-R Workshop.
› Fu, W. T., & Anderson, J. R. (2004). Extending the computational abilities of the procedural learning mechanism in ACT-R. Proceedings of the 26th annual meeting of the Cognitive Science Society, 416-421.
› Gray, W. D., Schoelles, M. J., & Sims, C. R. (2005). Adapting to the task environment: Explorations in expected value. Cognitive Systems Research, 6(1), 27-40.
› Gray, W. D., Sims, C. R., Fu, W. T., & Schoelles, M. J. (2006). The soft constraints hypothesis: A rational analysis approach to resource allocation for interactive behavior. Psychological Review, 113(3), 461-482.
› Russell, S. J., & Norvig, P. (1995). Artificial intelligence: a modern approach. Upper Saddle River, NJ: Prentice-Hall, Inc.
› Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.