Soar One-hour Tutorial
John E. Laird
University of Michigan
March 2009
http://sitemaker.umich.edu/soar
Supported in part by DARPA and ONR
Tutorial Outline
1. Cognitive Architecture
2. Soar History
3. Overview of Soar
4. Details of Basic Soar Processing and Syntax
   – Internal decision cycle
   – Interaction with external environments
   – Subgoals and meta-reasoning
   – Chunking
5. Recent extensions to Soar
   – Reinforcement Learning
   – Semantic Memory
   – Episodic Memory
   – Visual Imagery
How can we build a human-level AI?
[Diagram: levels of description for human behavior, from neurons, neural circuits, and brain structure up to learning and tasks (calculus, history, reading, Sudoku, shopping, driving, talking on a cell phone), shown next to the corresponding computer levels: electrical circuits, logic circuits, computer architecture, programs.]
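[Diagram: the cognitive architecture level filled in with Soar's components: symbolic long-term memories (procedural, learned through chunking and reinforcement learning; semantic, with semantic learning; episodic, with episodic learning), symbolic short-term memory, a decision procedure, appraisals, imagery, and perception and action connecting the architecture to the body.]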
Cognitive Architecture
• Fixed mechanisms underlying cognition
  – Memories, processing elements, control, interfaces
  – Representations of knowledge
  – Separation of fixed processes and variable knowledge
  – Complex behavior arises from composition of simple primitives
• Purpose:
  – Bring knowledge to bear to select actions to achieve goals
• Not just a framework
  – BDI, NN, logic & probability, rule-based systems
• Important constraints:
  – Continual performance
  – Real-time performance
  – Incremental, on-line learning
[Diagram: Architecture, Knowledge, and Goals interacting with the Task Environment.]
Common Structures of Many Cognitive Architectures
[Diagram: short-term memory connected to procedural long-term memory (with procedure learning), declarative long-term memory (with declarative learning), action selection, goals, and perception and action.]
Different Goals of Cognitive Architecture
• Biological plausibility: Does the architecture correspond to what we know about the brain?
• Psychological plausibility: Does the architecture capture the details of human performance in a wide range of cognitive tasks?
• Functionality: Does the architecture explain how humans achieve their high level of intellectual function?
  – Building Human-level AI
Short History of Soar
[Timeline, 1980 to 2005, tracking both functionality and modeling:
• Pre-Soar: problem spaces, production systems, heuristic search
• Multi-method, multi-task problem solving; subgoaling; chunking
• UTC (Unified Theories of Cognition), natural language, HCI, external environment
• Integration, large bodies of knowledge, teamwork, real applications
• Virtual agents; learning from experience, observation, instruction
• New capabilities]
Distinctive Features of Soar
• Emphasis on functionality
  – Take engineering and scaling issues seriously
  – Interfaces to real-world systems
  – Can build very large systems in Soar that exist for a long time
• Integration with perception and action
  – Mental imagery and spatial reasoning
• Integrates reaction, deliberation, and meta-reasoning
  – Dynamically switching between them
• Integrated learning
  – Chunking, reinforcement learning, episodic & semantic learning
• Useful in cognitive modeling
  – Expanding this is the emphasis of many current projects
• Easy to integrate with other systems & environments
  – SML efficiently supports many languages and inter-process communication
System Architecture (top layer first)
• Application: any language
• SWIG Language Layer: wrapper for Java/Tcl (not needed if the application is in C++)
• ClientSML: encodes/decodes function calls and responses in XML (C++)
• SML: Soar Markup Language
• KernelSML: encodes/decodes function calls and responses in XML (C++)
• gSKI: higher-level interface (C++)
• Soar Kernel: Soar 9.0 kernel (C)
Soar Basics
• Operators: Deliberate changes to internal/external state
• Activity is a series of operators controlled by knowledge:
  1. Input from environment
  2. Elaborate current situation: parallel rules
  3. Propose and evaluate operators via preferences: parallel rules
  4. Select operator
  5. Apply operator: Modify internal data structures: parallel rules
  6. Output to motor system
[Diagram: an agent in a real or virtual world; each selected operator moves the agent into a new state.]
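The six steps above can be read as one loop per decision. The following is a minimal Python sketch of that loop, not Soar's kernel: rules are modeled as plain functions over a dictionary, evaluation by preferences is replaced by a placeholder that takes the first candidate, and the names `Agent`, `make_walker`, and the step-left/step-right task are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

State = Dict[str, object]


@dataclass
class Agent:
    """Toy agent whose 'rules' are plain Python functions (hypothetical stand-ins for productions)."""
    elaborations: List[Callable[[State], None]] = field(default_factory=list)
    proposals: List[Callable[[State], List[str]]] = field(default_factory=list)
    applications: Dict[str, Callable[[State], None]] = field(default_factory=dict)
    state: State = field(default_factory=dict)

    def select(self, candidates: List[str]) -> Optional[str]:
        # Placeholder decision: take the first candidate (Soar decides using preferences).
        return candidates[0] if candidates else None

    def cycle(self, percept: object) -> Optional[object]:
        self.state["input"] = percept            # 1. input from environment
        self.state.pop("output", None)
        for elaborate in self.elaborations:      # 2. elaborate (Soar fires matching rules in
            elaborate(self.state)                #    parallel; this sketch runs them in order)
        candidates = [op for propose in self.proposals for op in propose(self.state)]  # 3. propose
        operator = self.select(candidates)       # 4. select
        if operator is not None:
            self.applications[operator](self.state)  # 5. apply
        return self.state.get("output")          # 6. output to motor system


def make_walker(target: int) -> Agent:
    """Hypothetical task: step left or right until the perceived position equals `target`."""
    def gap(s: State) -> None:
        s["gap"] = target - s["input"]

    def propose_right(s: State) -> List[str]:
        return ["step-right"] if s["gap"] > 0 else []

    def propose_left(s: State) -> List[str]:
        return ["step-left"] if s["gap"] < 0 else []

    def apply_right(s: State) -> None:
        s["output"] = 1

    def apply_left(s: State) -> None:
        s["output"] = -1

    return Agent(elaborations=[gap],
                 proposals=[propose_right, propose_left],
                 applications={"step-right": apply_right, "step-left": apply_left})


if __name__ == "__main__":
    walker = make_walker(target=3)
    position = 0
    while (move := walker.cycle(position)) is not None:
        position += move
    print(position)  # 3
```

The loop stops when no operator is proposed; in real Soar that situation would instead produce an impasse.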
Basic Soar Architecture
[Diagram: procedural long-term memory (production memory), learned through chunking, and symbolic short-term memory (working memory) feed a decision procedure; perception and action connect working memory to the body. The processing cycle comprises Input, Elaborate State, Propose Operators, Evaluate Operators, Select Operator, Elaborate Operator, Apply Operator, and Output, with phase labels Elaborate, Decide, and Apply.]
Soar 101: Eaters
[Diagram: an Eater that can move North, South, or East, and the processing cycle: Input, Propose Operator, Evaluate Operators, Select Operator, Apply Operator, Output.]
• Propose Operator: If cell in direction <d> is not a wall, --> propose operator move <d>
• Evaluate Operators: If operator <o1> will move to a bonus food and operator <o2> will move to a normal food, --> operator <o1> > <o2>
• Evaluate Operators: If operator <o1> will move to an empty cell, --> operator <o1> <
• Resulting preferences: North > East, South > East, North = South
• Select Operator: North (chosen between the indifferent North and South)
• Apply Operator: If an operator is selected to move <d>, --> create output move-direction <d>
• Output: move-direction North
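How the Eaters preferences turn into a single choice can be sketched as a small resolution function. This is a simplified Python sketch under stated assumptions, not Soar's complete decision procedure: it handles only acceptable candidates, binary better preferences, unary worst preferences, and indifference, and the function name `decide` and its return labels are hypothetical.

```python
import random
from itertools import combinations
from typing import Iterable, Optional, Set, Tuple


def decide(candidates: Iterable[str],
           better: Iterable[Tuple[str, str]] = (),
           worst: Iterable[str] = (),
           indifferent: Iterable[Tuple[str, str]] = (),
           rng: Optional[random.Random] = None) -> Tuple[str, object]:
    """Simplified preference resolution (illustrative only; not Soar's complete semantics).

    Returns ("select", operator), ("tie", remaining), ("conflict", beaten),
    or ("no-candidates", None).
    """
    rng = rng or random.Random()
    remaining: Set[str] = set(candidates)
    if not remaining:
        return ("no-candidates", None)

    # Binary "better" preferences: drop every candidate beaten by another candidate.
    beaten = {loser for winner, loser in better
              if winner in remaining and loser in remaining}
    if not remaining - beaten:
        return ("conflict", beaten)      # e.g. a > b and b > a
    remaining -= beaten

    # Unary "worst" preferences: drop worst candidates unless nothing else is left.
    not_worst = remaining - set(worst)
    if not_worst:
        remaining = not_worst

    if len(remaining) == 1:
        return ("select", next(iter(remaining)))

    # Several candidates left: pick at random only if every pair is marked indifferent.
    pairs = {frozenset(p) for p in indifferent}
    if all(frozenset(p) in pairs for p in combinations(sorted(remaining), 2)):
        return ("select", rng.choice(sorted(remaining)))
    return ("tie", remaining)


if __name__ == "__main__":
    # North and South lead to bonus food, East to normal food (the slide's situation).
    prefs = dict(better=[("north", "east"), ("south", "east")],
                 indifferent=[("north", "south")])
    print(decide(["north", "south", "east"], **prefs))                 # selects north or south at random
    print(decide(["north", "south", "east"], better=prefs["better"]))  # a tie between north and south
```

Dropping the North = South preference leaves two undominated candidates with no way to choose, which is exactly the tie impasse discussed later in the tutorial.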
Example Working Memory
[Figure: block A stacked on block B, which rests on a table.]
(s1 ^block b1 ^block b2 ^table t1)
(b1 ^color blue ^name A ^ontop b2 ^size 1 ^type block ^weight 14)
(b2 ^color yellow ^name B ^ontop t1 ^size 1 ^type block ^under b1 ^weight 14)
(t1 ^color gray ^shape square ^type table ^under b2)
Working memory is a graph. All working memory elements must be “linked” directly or indirectly to a state.
[Figure: the same structure drawn as a graph rooted at S1, with ^block edges to b1 and b2, a ^table edge to t1, and attribute edges (^color, ^name, ^size, ^type, ^weight, ^under, ^ontop) leading to values such as yellow, B, 1, block, and 14.]
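The linkage requirement can be checked mechanically by treating each working memory element as an (identifier, attribute, value) triple and walking the graph from the state. This is a minimal Python sketch, assuming (as a simplification) that a value counts as an identifier when it appears as the first element of some triple; the names `BLOCKS_WM` and `unlinked` are hypothetical.

```python
from collections import deque
from typing import List, Tuple

WME = Tuple[str, str, object]   # (identifier, ^attribute, value)

# The blocks-world working memory from the slide, one triple per working memory element.
BLOCKS_WM: List[WME] = [
    ("s1", "block", "b1"), ("s1", "block", "b2"), ("s1", "table", "t1"),
    ("b1", "color", "blue"), ("b1", "name", "A"), ("b1", "ontop", "b2"),
    ("b1", "size", 1), ("b1", "type", "block"), ("b1", "weight", 14),
    ("b2", "color", "yellow"), ("b2", "name", "B"), ("b2", "ontop", "t1"),
    ("b2", "size", 1), ("b2", "type", "block"), ("b2", "under", "b1"),
    ("b2", "weight", 14),
    ("t1", "color", "gray"), ("t1", "shape", "square"), ("t1", "type", "table"),
    ("t1", "under", "b2"),
]


def unlinked(wmes: List[WME], state: str = "s1") -> List[WME]:
    """Return the WMEs whose identifier cannot be reached from `state`.

    Assumption: a value counts as an identifier if it appears as the first
    element of some WME (real Soar distinguishes identifiers syntactically).
    """
    identifiers = {ident for ident, _, _ in wmes}
    reachable = {state}
    frontier = deque([state])
    while frontier:                       # breadth-first walk of the graph
        node = frontier.popleft()
        for ident, _, value in wmes:
            if ident == node and value in identifiers and value not in reachable:
                reachable.add(value)
                frontier.append(value)
    return [w for w in wmes if w[0] not in reachable]


if __name__ == "__main__":
    print(unlinked(BLOCKS_WM))                             # []
    print(unlinked(BLOCKS_WM + [("x9", "color", "red")]))  # [('x9', 'color', 'red')]
```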
Soar Processing Cycle
[Diagram: the processing cycle (Input, Elaborate State, Propose Operators, Evaluate Operators, Select Operator, Elaborate Operator, Apply Operator, Output), driven by rules. An impasse leads to a subgoal, which runs its own copy of the same cycle.]
TankSoar
[Figure: a TankSoar map labeled with the red tank’s shield, borders (stone), walls (trees), a health charger, a missile pack, the blue tank (Ouch!), an energy charger, and the green tank’s radar.]
Soar 103: Subgoals
[Diagram: the processing cycle (Input, Propose Operator, Compare Operators, Select Operator, Apply Operator, Output) at the top level, with a second copy of the cycle running in a subgoal.]
• If enemy not sensed, then wander. Wander is carried out in a subgoal by the Move and Turn operators.
• If enemy is sensed, then attack. Attack is carried out in a subgoal by the Shoot operator.
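The Wander and Attack decomposition can be sketched as a recursive procedure: a primitive operator is applied directly, while an abstract operator has no applying rule, so the sketch descends into a subgoal and works through that operator's sub-operators. This is a simplified Python sketch, not Soar's subgoaling mechanism: the tables `PRIMITIVES` and `SUBOPERATORS` are hypothetical, and sub-operators run in a fixed order rather than being chosen one decision at a time.

```python
from typing import Dict, List, Tuple

# Hypothetical TankSoar-style knowledge. Primitive operators are applied directly by rules;
# abstract operators have no apply rule, so selecting one causes an impasse and a subgoal.
PRIMITIVES = {"move", "turn", "shoot"}
SUBOPERATORS: Dict[str, List[str]] = {"wander": ["move", "turn"], "attack": ["shoot"]}


def propose_top(enemy_sensed: bool) -> str:
    """Top-level proposal rules from the slide."""
    return "attack" if enemy_sensed else "wander"


def execute(operator: str, depth: int = 0) -> List[Tuple[int, str]]:
    """Return (subgoal depth, operator) pairs in the order they are selected."""
    trace = [(depth, operator)]
    if operator in PRIMITIVES:
        return trace                       # an apply rule fires: output to the tank
    # No apply rule for an abstract operator: impasse, so work on it in a subgoal.
    for sub in SUBOPERATORS[operator]:     # simplification: sub-operators run in a fixed order
        trace.extend(execute(sub, depth + 1))
    return trace


if __name__ == "__main__":
    print(execute(propose_top(enemy_sensed=False)))  # [(0, 'wander'), (1, 'move'), (1, 'turn')]
    print(execute(propose_top(enemy_sensed=True)))   # [(0, 'attack'), (1, 'shoot')]
```

Deeper hierarchies, such as TacAir-Soar's task decomposition, arise the same way when a sub-operator is itself abstract.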
TacAir-Soar [1997]
Controls simulated aircraft in real-time training exercises (>3000 entities)
Flies all U.S. air missions
Dynamically changes missions as appropriate
Communicates and coordinates with computer and human controlled planes
Large knowledge base (8000 rules)
No learning
TacAir-Soar Task Decomposition
Execute Mission
  – Fly-route, Ground Attack, Fly-Wing, Intercept
Intercept: If instructed to intercept an enemy, then propose intercept
  – Achieve Proximity, Employ Weapons, Search, Execute Tactic, Scram
Employ Weapons: If intercepting an enemy and the enemy is within range and ROE are met, then propose employ-weapons
  – Get Missile LAR, Select Missile, Get Steering Circle, Sort Group, Launch Missile
Launch Missile: If employing-weapons and missile has been selected and the enemy is in the steering circle and LAR has been achieved, then propose launch-missile
  – Lock Radar, Lock IR, Fire-Missile, Wait-for-Missile-Clear
Lock IR: If launching a missile and it is an IR missile and there is currently no IR lock, then propose lock-IR
>250 goals, >600 operators, >8000 rules
Impasse/Substate Implications:
• Substate is really a meta-state that allows the system to reflect
• Substate = goal to resolve the impasse
  – Generate operator
  – Select operator (deliberate control)
  – Apply operator (task decomposition)
• All basic problem-solving functions are open to reflection
  – Operator creation, selection, application, state elaboration
• Substate is where knowledge to resolve the impasse can be found
• Hierarchy of substates/subgoals arises through recursive impasses
Tie Subgoals and Chunking
[Diagram: in Eaters, North, South, and East are proposed with no knowledge to choose among them, producing a tie impasse. In the substate, evaluate-operator is applied to each candidate: North = 10, South = 10, East = 5. The resulting preferences are North > East, South > East, North = South.]
• Chunking creates a rule that applies evaluate-operator
• Chunking creates rules that create preferences based on what was tested
Chunking Analysis
• Converts deliberate reasoning/planning to reaction
• Generality of learning is based on generality of reasoning
  – Leads to many different types of learning
  – If reasoning is inductive, so is learning
• Soar only learns what it thinks about
• Chunking is impasse driven
  – Learning arises from a lack of knowledge
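The tie example and the analysis above can be illustrated with a memoization analogy: the first time a situation produces a tie, the agent evaluates each operator deliberately in a substate and caches the resulting preferences keyed on what it tested; the next time, the cached "rule" fires and no impasse occurs. This Python sketch is only an analogy, not Soar's chunking (which builds rule conditions by backtracing through the substate's reasoning); `ChunkingAgent` and the food scores are hypothetical, with the scores chosen to reproduce the slide's North = 10, South = 10, East = 5.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Pref = Tuple[str, str, str]          # (operator, ">" or "=", operator)


class ChunkingAgent:
    """Memoization analogy for chunking (hypothetical; not Soar's backtracing algorithm)."""

    def __init__(self, evaluate: Callable[[str, str], int]):
        self.evaluate = evaluate       # deliberate look-ahead used in the tie substate
        self.chunks: Dict[FrozenSet[Tuple[str, str]], List[Pref]] = {}
        self.impasses = 0

    def preferences(self, cells: Dict[str, str]) -> List[Pref]:
        # The "conditions" of a learned rule: what the substate tested (direction -> cell contents).
        key = frozenset(cells.items())
        if key in self.chunks:         # a learned rule matches, so no impasse arises
            return self.chunks[key]
        self.impasses += 1             # tie impasse: evaluate each operator in a substate
        scores = {d: self.evaluate(d, contents) for d, contents in cells.items()}
        prefs: List[Pref] = []
        dirs = sorted(scores)
        for i, a in enumerate(dirs):
            for b in dirs[i + 1:]:
                if scores[a] > scores[b]:
                    prefs.append((a, ">", b))
                elif scores[a] < scores[b]:
                    prefs.append((b, ">", a))
                else:
                    prefs.append((a, "=", b))
        self.chunks[key] = prefs       # chunking: cache the result as a new "rule"
        return prefs


if __name__ == "__main__":
    FOOD_VALUE = {"bonusfood": 10, "normalfood": 5, "empty": 0}   # hypothetical scores
    agent = ChunkingAgent(lambda direction, contents: FOOD_VALUE[contents])
    cells = {"north": "bonusfood", "south": "bonusfood", "east": "normalfood"}
    print(agent.preferences(cells))  # [('north', '>', 'east'), ('south', '>', 'east'), ('north', '=', 'south')]
    agent.preferences(cells)
    print(agent.impasses)            # 1
```

Because the cache key contains only what was tested, the learned "rule" generalizes exactly as far as the reasoning did, which mirrors the point that the generality of learning follows the generality of reasoning.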
Extending Soar
• Learn from internal rewards
  – Reinforcement learning
• Learn facts
  – What you know
  – Semantic memory
• Learn events
  – What you remember
  – Episodic memory
• Basic drives and …
  – Emotions, feelings, mood
• Non-symbolic reasoning
  – Mental imagery
• Learn from regularities
  – Spatial and temporal clusters
[Diagram: the extended Soar architecture: procedural, semantic, and episodic symbolic long-term memories with chunking, reinforcement learning, semantic learning, and episodic learning; symbolic short-term memory; decision procedure; appraisal detector; reinforcement learning; clustering; visual imagery; and perception and action connecting to the body.]
Theoretical Commitments
Stayed the Same
• Problem Space Computational Model
• Long-term & short-term memories
• Associative procedural knowledge
• Fixed decision procedure
• Impasse-driven reasoning
• Incremental, experience-driven learning
• No task-specific modules
Changed
• Multiple long-term memories
• Multiple learning mechanisms
• Modality-specific representations & processing
• Non-symbolic processing
  – Symbol generation (clustering)
  – Control (numeric preferences)
  – Learning Control (reinforcement learning)
  – Intrinsic reward (appraisals)
  – Aid memory retrieval (WM activation)
  – Non-symbolic reasoning (visual imagery)
Reinforcement Learning
Shelly Nason
RL in Soar
1. Encode the value function as operator evaluation rules with numeric preferences.
2. Combine all numeric preferences for an operator dynamically.
3. Adjust value of numeric preferences with experience.
[Diagram: perception supplies the internal state and a reward; the value function drives action selection and the resulting action; the reward is used to update the value function.]
The Q-function in Soar
The value function is stored in rules that test the state and operator and create numeric preferences.
sp {rl-rule
   (state <s> ^operator <o> +)
   …
-->
   (<s> ^operator <o> = 0.34)}
Operator Q-value = the sum of all numeric preferences.
Selection: epsilon-greedy or Boltzmann
O1: {.34, .45, .02} = .81
O2: {.25, .11, .12} = .48
O3: {-.04, .14, -.05} = .05
epsilon-greedy: With probability ε the agent selects an action at random. Otherwise the agent takes the action with the highest expected value. [Balance exploration/exploitation]
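Summing numeric preferences and then selecting with epsilon-greedy or Boltzmann exploration can be sketched in a few lines. This is a minimal Python sketch of the two selection policies the slide names, assuming operators are identified by strings; the function names are hypothetical, and this is not Soar's implementation or its parameter interface.

```python
import math
import random
from typing import Dict, List, Optional


def q_values(numeric_prefs: Dict[str, List[float]]) -> Dict[str, float]:
    """Q-value of each operator = the sum of its numeric preferences."""
    return {op: sum(values) for op, values in numeric_prefs.items()}


def epsilon_greedy(q: Dict[str, float], epsilon: float,
                   rng: Optional[random.Random] = None) -> str:
    rng = rng or random.Random()
    ops = sorted(q)
    if rng.random() < epsilon:
        return rng.choice(ops)                  # explore: any operator
    return max(ops, key=lambda op: q[op])       # exploit: highest Q-value


def boltzmann(q: Dict[str, float], temperature: float,
              rng: Optional[random.Random] = None) -> str:
    rng = rng or random.Random()
    ops = sorted(q)
    top = max(q.values())                       # subtract the max for numerical stability
    weights = [math.exp((q[op] - top) / temperature) for op in ops]
    return rng.choices(ops, weights=weights, k=1)[0]


if __name__ == "__main__":
    prefs = {"O1": [.34, .45, .02], "O2": [.25, .11, .12], "O3": [-.04, .14, -.05]}
    q = q_values(prefs)
    print({op: round(v, 2) for op, v in q.items()})  # {'O1': 0.81, 'O2': 0.48, 'O3': 0.05}
    print(epsilon_greedy(q, epsilon=0.0))             # O1
```

Boltzmann selection favors high-valued operators in proportion to exp(Q/T), so a low temperature approaches greedy choice and a high one approaches uniform exploration.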
Updating operator values
Sarsa update: Q(s,O1) ← Q(s,O1) + α[r + γQ(s′,O2) - Q(s,O1)]
• Q(s,O1) = sum of numeric prefs. for O1: R1(O1) = .20, R2(O1) = .15, R3(O1) = -.02, so Q(s,O1) = .33
• r = reward = .2
• Q(s′,O2) = sum of numeric prefs. of the selected operator (O2) = .11
• With α = .1 and γ = .9: .1 × [.2 + .9 × .11 - .33] = .1 × (-.031) ≈ -.003
• The update is split evenly among the three rules contributing to O1 (≈ -.001 each): R1 ≈ .199, R2 ≈ .149, R3 ≈ -.021
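The update arithmetic, including the even split across contributing rules, can be reproduced with a short function. This is a minimal Python sketch using the slide's values (α = .1, discount .9, reward .2); `sarsa_update` is a hypothetical name, and the even split follows the slide rather than any documented Soar setting.

```python
from typing import List


def sarsa_update(rule_values: List[float], reward: float, q_next: float,
                 alpha: float = 0.1, gamma: float = 0.9) -> List[float]:
    """Sarsa update for one operator whose Q-value is the sum of several rules' numeric preferences.

    The change is divided evenly among the contributing rules, as on the slide.
    """
    q = sum(rule_values)                             # Q(s, O1)
    delta = alpha * (reward + gamma * q_next - q)    # α[r + γQ(s′, O2) − Q(s, O1)]
    share = delta / len(rule_values)
    return [v + share for v in rule_values]


if __name__ == "__main__":
    new = sarsa_update([.20, .15, -.02], reward=.2, q_next=.11)
    print([round(v, 4) for v in new])   # [0.199, 0.149, -0.021]
```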
Results with Eaters
[Chart: total score (0 to 1200) vs. move # (1 to 289), with curves labeled Random, After 5, After 10, After 15, and After 20.]
RL TankSoar Agent
[Chart: average margin of victory (-20 to 60) over successive games (1 to 171).]
Semantic Memory
Yongjia Wang
Memory Systems
[Diagram: memory divides into long-term and short-term memory. Long-term memory is either declarative (semantic memory, episodic memory) or procedural (perceptual representation system, procedural memory); short-term memory is working memory.]
Declarative Memory Alternatives
• Working Memory
  – Keep everything in working memory
• Retrieve dynamically with rules
  – Rules provide asymmetric access
  – Data chunking to learn (complex)
• Separate Declarative Memories
  – Semantic memory (facts)
  – Episodic memory (events)
Basic Semantic Memory Functionalities
• Encoding
  – What to save?
  – When to add a new declarative chunk?
  – How to update knowledge?
• Retrieval
  – How is the cue placed and matched?
  – What are the different types of retrieval?
• Storage
  – What are the storage structures?
  – How are they maintained?
Semantic Memory Functionalities
[Diagram: structures moving between working memory (state, cue, and NIL results) and semantic memory, illustrating Save, Retrieval by feature match, Expand, Update with complex structure, AutoCommit, and Remove-No-Change.]
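The save, feature-match retrieval, and expand operations can be illustrated with a toy store of declarative chunks. This is a minimal Python sketch, not Soar's semantic memory: the class `SemanticMemory`, the merge-on-save update policy, and the rule of returning the first matching chunk by id are all assumptions made for illustration, with `None` standing in for a NIL result.

```python
from typing import Dict, Optional, Tuple

Features = Dict[str, object]


class SemanticMemory:
    """Toy long-term store of declarative chunks (a sketch, not Soar's semantic memory)."""

    def __init__(self) -> None:
        self.chunks: Dict[str, Features] = {}

    def save(self, ident: str, features: Features) -> None:
        # Add a new chunk, or merge new features into an existing one (an assumed update policy).
        self.chunks.setdefault(ident, {}).update(features)

    def retrieve(self, cue: Features) -> Optional[Tuple[str, Features]]:
        """Feature match: first chunk (by id) containing every cue feature; None plays the role of NIL."""
        for ident in sorted(self.chunks):
            features = self.chunks[ident]
            if all(key in features and features[key] == value for key, value in cue.items()):
                return ident, dict(features)
        return None

    def expand(self, ident: str) -> Optional[Features]:
        """Copy all stored features of a known chunk back out (None if unknown)."""
        features = self.chunks.get(ident)
        return None if features is None else dict(features)


if __name__ == "__main__":
    smem = SemanticMemory()
    smem.save("grand-canyon", {"isa": "landmark", "state": "Arizona"})
    smem.save("arizona", {"isa": "state", "capital": "Phoenix"})
    print(smem.retrieve({"isa": "landmark"}))  # ('grand-canyon', {'isa': 'landmark', 'state': 'Arizona'})
    print(smem.retrieve({"isa": "river"}))     # None
```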
Episodic Memory
Andrew Nuxoll
Episodic vs. Semantic Memory
• Semantic Memory
  – Knowledge of what we “know”
  – Example: what state the Grand Canyon is in
• Episodic Memory
  – History of specific events
  – Example: a family vacation to the Grand Canyon
Characteristics of Episodic Memory: Tulving
• Architectural:
  – Does not compete with reasoning.
  – Task independent
• Automatic:
  – Memories created without deliberate decision.
• Autonoetic:
  – Retrieved memory is distinguished from sensing.
• Autobiographical:
  – Episode remembered from own perspective.
• Variable Duration:
  – The time period spanned by a memory is not fixed.
• Temporally Indexed:
  – Rememberer has a sense of when the episode occurred.
Episodic Memory: Current Implementation
[Diagram: working memory with input, output, a cue, and the retrieved episode, alongside procedural long-term memory (production rules), episodic memory, and episodic learning.]
• Encoding initiation: when the agent takes an action.
• Encoding content: the entire working memory is stored in the episode.
• Storage (episode structure): episodes are stored in a separate memory.
• Retrieval initiation/cue: the cue is placed in an architecture-specific buffer.
• Retrieval: the closest partial match is retrieved.
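The five design choices listed above can be sketched as a small store of working-memory snapshots with a best-overlap retrieval. This is a minimal Python sketch, not Soar's episodic memory module: `EpisodicMemory` is a hypothetical class, overlap is measured as the number of shared working memory elements, and breaking ties in favor of the most recent episode is an assumption.

```python
from typing import FrozenSet, Iterable, List, Optional, Tuple

WME = Tuple[str, str, object]


class EpisodicMemory:
    """Toy episodic store following the slide's design choices (a sketch, not Soar's module)."""

    def __init__(self) -> None:
        self.episodes: List[FrozenSet[WME]] = []

    def record(self, working_memory: Iterable[WME]) -> None:
        # Encoding: called whenever the agent acts; content: a snapshot of all of working memory.
        self.episodes.append(frozenset(working_memory))

    def retrieve(self, cue: Iterable[WME]) -> Optional[int]:
        """Closest partial match: the episode sharing the most WMEs with the cue.

        Ties go to the most recent episode (an assumption); None if nothing overlaps.
        """
        cue_set = frozenset(cue)
        best, best_overlap = None, 0
        for index, episode in enumerate(self.episodes):
            overlap = len(episode & cue_set)
            if overlap > 0 and overlap >= best_overlap:
                best, best_overlap = index, overlap
        return best


if __name__ == "__main__":
    epmem = EpisodicMemory()
    epmem.record({("s1", "location", "A"), ("s1", "sees", "wall")})
    epmem.record({("s1", "location", "B"), ("s1", "sees", "charger")})
    epmem.record({("s1", "location", "C"), ("s1", "sees", "wall")})
    print(epmem.retrieve({("s1", "sees", "charger")}))  # 1
    print(epmem.retrieve({("s1", "sees", "wall")}))     # 2
```

A query such as "Have I seen a charger?" in the virtual-sensing example amounts to a cue like the first retrieval here.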
Cognitive Capability: Virtual Sensing
• Retrieve prior perception that is relevant to the current task
• Tank recursively searches memory
  – Have I seen a charger from here?
  – Have I seen a place where I can see a charger?
Virtual Sensors Results
[Chart: average number of moves (0 to 250) over subsequent searches (1 to 19), comparing Average Random with Episodic Memory.]
Cognitive Capability: Action Modeling
• Agent attempts to choose a direction, but its knowledge is insufficient, so it reaches an impasse
• Evaluate moving in each available direction (North, South, East)
• Create a memory cue
• Episodic retrieval: retrieve the best matching memory
• Retrieve the next memory
• Use the change in score to evaluate the proposed action (e.g., Move North = 10 points)
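The action-modeling steps can be sketched as: build a cue from the current situation plus the proposed action, retrieve the best-matching episode, retrieve the episode that followed it, and use the score difference as the evaluation. This is a minimal Python sketch under stated assumptions: episodes are hypothetical (feature set, score) pairs, the feature strings such as `"food-north"` and `"action=north"` are invented for illustration, and this is not Soar's episodic memory interface.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

Episode = Tuple[FrozenSet[str], int]   # (features, including the action taken; score at that time)


def best_match(episodes: List[Episode], cue: FrozenSet[str]) -> Optional[int]:
    """Index of the episode overlapping the cue most; the last episode is skipped
    because it has no 'next' memory to compare against."""
    best, best_overlap = None, 0
    for index, (features, _) in enumerate(episodes[:-1]):
        overlap = len(features & cue)
        if overlap > best_overlap:
            best, best_overlap = index, overlap
    return best


def evaluate_action(episodes: List[Episode], situation: Set[str], action: str) -> Optional[int]:
    cue = frozenset(situation) | {f"action={action}"}   # 1. create a memory cue
    index = best_match(episodes, cue)                    # 2. retrieve the best-matching memory
    if index is None:
        return None                                      # nothing relevant remembered
    _, score_then = episodes[index]
    _, score_next = episodes[index + 1]                  # 3. retrieve the next memory
    return score_next - score_then                       # 4. change in score evaluates the action


if __name__ == "__main__":
    history: List[Episode] = [
        (frozenset({"food-north", "action=north"}), 0),
        (frozenset({"empty-east", "action=east"}), 10),
        (frozenset({"food-north", "action=east"}), 10),
        (frozenset({"wall-south"}), 10),
    ]
    print(evaluate_action(history, {"food-north"}, "north"))  # 10
    print(evaluate_action(history, {"food-north"}, "east"))   # 0
```

The resulting numbers would then become preferences that resolve the original impasse.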
Episodic Memory: Multi-Step Action Projection
[Andrew Nuxoll]
• Learn tactics from prior success and failure
  – Fight/flight
  – Back away from enemy (and fire)
  – Dodging
[Chart: average margin of victory (-30 to 40) over successive games (1 to 174).]
Episodic Memory Enables Cognitive Capabilities
• Sensing
  – Detect Changes
  – Detect Repetition
  – Virtual Sensing
• Reasoning
  – Model Actions
  – Use Previous Successes/Failures
  – Model the Environment
  – Manage Long Term Goals
  – Explain Behavior
• Learning
  – Retroactive Learning
  – Allows Reanalysis Given New Knowledge
  – “Boost” other Learning Mechanisms
Mental Imagery and Spatial Reasoning
Scott Lathrop
Sam Wintermute
See AGI Talks
WHAT IS VISUAL IMAGERY?
VISUAL IMAGERY
• VISUAL-SPATIAL (sentential/algebraic algorithms)
  – Location, orientation
  – Sentential, quantitative representations
  – Linear algebra and computational geometry algorithms
• VISUAL-DEPICTIVE (depictive/ordinal algorithms)
  – Shape, color, topology, spatial properties
  – Depictive, pixel-based representations
  – Image algebra algorithms
Where can you put A next to I?
Spatial Problem Solving with Mental Imagery [Scott Lathrop & Sam Wintermute]
[Diagram: the Environment passes quantitative descriptions of environmental objects to a Spatial Scene; the Spatial Scene gives Soar qualitative descriptions of object relationships; Soar returns qualitative descriptions of new objects in relation to existing objects.]
Example (block A, target I, obstacle O):
(on A I)
(imagine_left_of A I) → (intersect A′ O)
(imagine_right_of A I) → (no_intersect A′)
(move_right_of A I)
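The imagine-then-test pattern in this example can be sketched with axis-aligned bounding boxes: create an imagined copy A′ on one side of I, test it for intersection with obstacles, and move to whichever side is clear. This is a simplified Python sketch; the function names echo the slide's predicates, but the `Box` type, the coordinates, and the flush-placement rule are hypothetical and not the architecture's actual spatial representation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box: lower-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float

    def intersects(self, other: "Box") -> bool:
        # Overlapping interiors only; boxes that merely touch do not intersect.
        return (self.x < other.x + other.w and other.x < self.x + self.w and
                self.y < other.y + other.h and other.y < self.y + self.h)


def imagine_left_of(a: Box, ref: Box) -> Box:
    """Imagined copy A′ of `a`, placed flush against the left side of `ref`."""
    return Box(ref.x - a.w, ref.y, a.w, a.h)


def imagine_right_of(a: Box, ref: Box) -> Box:
    """Imagined copy A′ of `a`, placed flush against the right side of `ref`."""
    return Box(ref.x + ref.w, ref.y, a.w, a.h)


def side_to_place(a: Box, ref: Box, obstacles: Iterable[Box]) -> Optional[str]:
    """Try each imagined placement and return the first side where A′ hits no obstacle."""
    blockers = list(obstacles)
    for side, imagine in (("left", imagine_left_of), ("right", imagine_right_of)):
        a_prime = imagine(a, ref)
        if not any(a_prime.intersects(o) for o in blockers):
            return side
    return None


if __name__ == "__main__":
    i_block = Box(5, 0, 2, 2)
    obstacle = Box(3, 0, 2, 2)        # O sits immediately left of I
    a_block = Box(0, 5, 2, 2)
    print(side_to_place(a_block, i_block, [obstacle]))  # right
```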
Upcoming Challenges
• Continued refinement and integration
• Integrate with complex perception and motor systems
• Adding/learning lots of world knowledge
  – Language, Spatial, Temporal Reasoning, …
• Scaling up to large bodies of knowledge
  – Build up from instruction, experience, exploration, …
Soar Community
• Soar Website
  – http://sitemaker.umich.edu/soar
• Soar Workshop every June in Ann Arbor
  – June 22-26, 2009
• Soar-group
  – http://lists.sourceforge.net/lists/listinfo/soar-group
  – Low traffic
Thanks to
Funding Agencies: NSF, DARPA, ONR
Ph.D. students: Nate Derbinsky, Nicholas Gorski, Scott Lathrop, Robert Marinier, Andrew Nuxoll, Yongjia Wang, Samuel Wintermute, Joseph Xu
Research Programmers: Karen Coulter, Jonathan Voigt
Continued inspiration: Allen Newell
Challenges in Cognitive Architecture Research
• Dynamic taskability
  – Pursue novel tasks
• Learning
  – Always learning, learning in unexpected and unplanned ways (wild learning)
  – Transition from programming to learning by imitation, instruction, experience, reflection, …
• Natural language
  – Active area but much left to do.
• Social behavior
  – Interaction with humans and other entities
• Connect to the real world
  – Cognitive robotics with long-term existence
• Applications
  – Expand domains and problems
  – Putting cognitive architectures to work
• Connect to unfolding research on the brain, psychology, and the rest of AI.