Cognitive Modeling, University of Groningen / Artificial Intelligence | RENSSELAER Cognitive Science, CogWorks Laboratories

› Christian P. Janssen
› Wayne D. Gray
› Michael J. Schoelles

How a Modeler’s Conception of Rewards Influences a Model’s Behavior: Investigating ACT-R 6’s utility learning mechanism


Page 1:

How a Modeler’s Conception of Rewards Influences a Model’s Behavior: Investigating ACT-R 6’s utility learning mechanism

Page 2:

Temporal difference learning & ACT-R

› Temporal difference learning has recently been introduced as ACT-R’s new utility learning mechanism (e.g., Fu & Anderson, 2004; Anderson, 2006, 2007; Bothell, 2005)

› Utility learning optimizes behavior so as to maximize the rewards that the model receives

› A model can:
• Receive rewards at different moments in time
• Receive rewards of different magnitudes

› There are no guidelines for choosing when a reward should be given and what its magnitude should be

Page 3:

New issues for ACT-R

› We studied two aspects of TD learning:
• When the reward is given
• The magnitude of the reward

› This is a new issue for ACT-R:
• When the reward is given: could be varied in ACT-R 5
• The magnitude of the reward: could not be varied in ACT-R 5

› As we will show, the modeler’s conception of rewards has a big influence on a model’s behavior

› Case study: Blocks World task (Gray et al., 2006)

Page 4:

Why the Blocks World task?

› Previous work indicates that the utility learning mechanism is crucial for this task
• ACT-R 5 models (Gray, Schoelles, & Sims, 2005)
• Regular ACT-R 5 cannot provide a good fit to the human data, because rewards in ACT-R 5 are binary (i.e., successes and failures) rather than scalar
• Ideal Performer Model: a model outside of ACT-R that uses temporal difference learning provided a very good fit (Gray et al., 2006)

Page 5:

Blocks World task

› So what’s the task?

Page 6:

Blocks World task

Task: “Copy the pattern in the target window by moving blocks from the resource window to the workspace window”

Page 7:

Blocks World task

Windows are covered with gray rectangles: accessing information requires interaction with the interface




Page 11:

Blocks World task

› Blocks World task:
• Information in the target window is only available after waiting for a lockout time
• 0, 400, or 3200 milliseconds (between subjects)

Page 12:

Blocks World task: human data (Gray et al., 2006)

› The size of the lockout time influences human behavior:

[Figure: number of blocks placed after 1st visit to target window (0 to 5) as a function of lockout time (0.0 to 3.0 s)]

Page 13:

Blocks World task: Modeling Strategies

› Strategy: How many blocks do you plan to place after a visit to the target window?

› 8 encode-x production rules:
• “study x blocks”
• encode-1 through encode-8

› Model learns utility value of each production rule using ACT-R’s temporal difference learning algorithm
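As a rough illustration of the selection step, conflict resolution in ACT-R perturbs each candidate rule’s utility with noise and fires the rule with the highest noisy utility. The sketch below (not code from the talk; the utility values and the `s` noise parameter are invented for illustration) shows how the eight encode-x rules might compete:

```python
import math
import random

# Hypothetical sketch of ACT-R-style conflict resolution among the eight
# encode-x rules: each candidate's utility is perturbed with logistic noise
# (scale parameter s) and the rule with the highest noisy utility fires.
# The utility values below are invented for illustration.

def choose_production(utilities, s=0.5, rng=random):
    """Pick a production rule by noisy-max over utilities."""
    def noisy(u):
        if s == 0.0:
            return u  # no noise: deterministic argmax
        # clamp away from 0 and 1 to keep the log finite
        p = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        return u + s * math.log(p / (1.0 - p))  # logistic(0, s) noise
    return max(utilities, key=lambda rule: noisy(utilities[rule]))

utilities = {f"encode-{x}": u for x, u in
             zip(range(1, 9), [2.0, 3.5, 4.0, 3.8, 3.0, 2.5, 2.0, 1.5])}
print(choose_production(utilities))  # usually a rule with high utility
```

With noise switched off the choice is simply the highest-utility rule; with noise on, lower-utility rules fire occasionally, which is what lets the model keep exploring alternative encode-x strategies while it learns.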

Page 14:

Utility learning

› Utility learning requires the incorporation of rewards

› Two choices are crucial:
• When is the reward given?
• What is the magnitude of the reward?

› After some experience, the utility of a production rule approximates (Anderson, 2007):

U_i = r(t_x) − (t_x − t_i)

where r(t_x) is the magnitude of the reward and t_x is the time at which the reward is given

Page 15:

Utility learning

› Choice 1: When is the reward given?
› Important because:
• The utility value has a linear relationship with the time at which the reward is given

› Choice in Blocks World:
• Once model: update once, at the end of the trial
• Each model: update each time that part of the task is completed: a (set of) block(s) has been placed and the model either returns to the target window to study more blocks, or finishes the trial

U_i = r(t_x) − Δt(t_i, t_x)
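The timing choice can be made concrete with invented numbers (not from the talk): because the asymptotic utility U_i = r(t_x) − Δt(t_i, t_x) penalizes a rule linearly for the delay between firing and reward, the same rule ends up with a different utility under the two schemes:

```python
# Illustrative numbers showing why reward timing matters: the asymptotic
# utility U_i = r(t_x) - (t_x - t_i) penalizes a rule linearly for the
# delay between its firing and the reward. All values here are invented.

def asymptotic_utility(reward, t_reward, t_fired):
    return reward - (t_reward - t_fired)

# An encode rule fires at t = 0 s; the trial ends at t = 12 s.
# "Once" scheme: a single reward of 10 arrives at the end of the trial.
u_once = asymptotic_utility(10.0, t_reward=12.0, t_fired=0.0)
# "Each" scheme: a reward of 5 arrives when the first set of blocks is
# placed at t = 6 s.
u_each = asymptotic_utility(5.0, t_reward=6.0, t_fired=0.0)
print(u_once, u_each)  # -2.0 -1.0
```

Under the "each" scheme the delay to the reward is shorter, so early rules are penalized less; under the "once" scheme the full trial duration counts against every rule, which is what lets the model trade off over the whole task.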

Page 16:

Utility learning

› Choice 2: What is the magnitude of the reward?
› Important because:
• The utility value has a linear relationship with the magnitude of the reward

› But how to set this value?
• Experimental tweaking? -> unfavorable
• Fixed range of values (e.g., between 0 and 1)? -> difficult
• Relate to neurological data? -> not available for most models

U_i = r(t_x) − Δt(t_i, t_x)

Page 17:

Utility learning

› Choice 2: What is the magnitude of the reward?
› Choice in Blocks World:
• Relate the reward to what might be important in the task

• Accuracy: the accuracy with which the task is performed. Options:
• Success: # blocks placed (once)
• Success: # blocks placed (each)
• Success & failure: # blocks placed − # blocks forgotten (each)

• Time: how much time does (part of) the task take? Options:
• Time spent on the task: −1 × time spent (once)
• Time spent waiting for a specific aspect of the task: −1 × lockout size × number of visits to the target window (once)
• Number of blocks placed per second (each)
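The reward concepts above can be made concrete with a hypothetical trial (all numbers invented, not data from the experiment): the same trial yields very different reward magnitudes depending on which concept the modeler picks:

```python
# Hypothetical trial statistics to illustrate the reward concepts above
# (all numbers invented): 8 blocks placed, 1 block forgotten, a 40 s
# trial, a 3.2 s lockout, and 3 visits to the target window.
blocks_placed, blocks_forgotten = 8, 1
trial_time, lockout, visits = 40.0, 3.2, 3

rewards = {
    "success: blocks placed": blocks_placed,
    "success & failure": blocks_placed - blocks_forgotten,
    "time spent on task": -1 * trial_time,
    "time spent waiting": -1 * lockout * visits,
    "blocks placed per second": blocks_placed / trial_time,
}
for concept, r in rewards.items():
    print(f"{concept}: {r}")
```

The spread in magnitudes (here roughly −40 to +8) matters because utility is linear in reward size: a time-based concept and an accuracy-based concept put the model on very different parts of the utility scale for the same behavior.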

Page 18:

Blocks World task: Modeling Strategies

› 6 models were developed
› Each model is run 6 times for each of 3 experimental conditions:
• 0, 400, and 3200 milliseconds lockout

› Models interact with the same interface as human participants

Page 19:

Blocks World task: general results

› Each model has unique results

Page 20:

Blocks World task: general results

› What is the impact of:
• When the reward is given (once/each)?
• The concept of the reward (related to accuracy/time)?

› Results averaged over 3 models

Page 21:

Utility learning: impact of when reward is given

Page 22:

Utility learning: impact of concept of reward

Page 23:

Comparison with ACT-R 5 (Gray, Schoelles, & Sims, 2005)

Page 24:

Conclusion

› Rewards can be given at different times during a trial and according to different concepts

› There are no guidelines for what the best choices are

› Blocks World suggests that rewards should:
• Be given once: the model can then optimize behavior over the entire task
• Relate to the concept of time: different strategy choices have a big impact on reward size

› Models of other tasks should show whether this finding generalizes

Page 25:

Conclusion

› This is not just a Blocks World issue
• General computer science / AI issue: representing a task in the right way is crucial (e.g., Russell & Norvig, 1995; Sutton & Barto, 1998)
• Many experiments involve manipulations and measurements of accuracy and speed of performance

› This is a new issue for ACT-R:
• When the reward is given: could be varied in ACT-R 5
• The magnitude of the reward: could not be varied in ACT-R 5

Page 26:

Thank you for your attention

› Questions?

› More information:
• [email protected]
• www.ai.rug.nl/~cjanssen
• www.cogsci.rpi.edu/cogworks
• Poster session @ CogSci 2008, Thursday, July 24th: “Cognitive Models of Strategy Shifts in Interactive Behavior” (session: “Attention and Implicit Learning”)

Page 27:

References

› Anderson, J. R. (2006). A new utility learning mechanism. Paper presented at the 2006 ACT-R workshop.

› Anderson, J. R. (2007). How can the human mind occur in the physical universe? New York: Oxford University Press.

› Bothell, D. (2005). ACT-R 6 Official Release. Proceedings of the 12th ACT-R Workshop.

› Fu, W. T., & Anderson, J. R. (2004). Extending the computational abilities of the procedural learning mechanism in ACT-R. Proceedings of the 26th annual meeting of the Cognitive Science Society, 416-421.

› Gray, W. D., Schoelles, M. J., & Sims, C. R. (2005). Adapting to the task environment: Explorations in expected value. Cognitive Systems Research, 6(1), 27-40.

› Gray, W. D., Sims, C. R., Fu, W. T., & Schoelles, M. J. (2006). The soft constraints hypothesis: A rational analysis approach to resource allocation for interactive behavior. Psychological Review, 113(3), 461-482.

› Russell, S. J., & Norvig, P. (1995). Artificial intelligence: a modern approach. Upper Saddle River, NJ: Prentice-Hall, Inc.

› Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.