soar one-hour tutorial

Page 1: Soar One-hour Tutorial

Soar One-hour Tutorial
John E. Laird
University of Michigan
March 2009

http://sitemaker.umich.edu/soar [email protected]

Supported in part by DARPA and ONR

Page 2: Soar One-hour Tutorial

Tutorial Outline
1. Cognitive Architecture
2. Soar History
3. Overview of Soar
4. Details of Basic Soar Processing and Syntax
– Internal decision cycle
– Interaction with external environments
– Subgoals and meta-reasoning
– Chunking
5. Recent extensions to Soar
– Reinforcement Learning
– Semantic Memory
– Episodic Memory
– Visual Imagery

Page 3: Soar One-hour Tutorial

How can we build a human-level AI?

[Figure labels: Learning; Tasks (Calculus, History, Reading, Sudoku, Shopping, Driving, Talking on cell phone); Brain Structure; Neural Circuits; Neurons]

Page 4: Soar One-hour Tutorial

How can we build a human-level AI?

[Figure labels: Learning; Tasks (Calculus, History, Reading, Sudoku, Shopping, Driving, Talking on cell phone); Brain Structure; Neural Circuits; Neurons; Programs; Computer Architecture; Logic Circuits; Electrical circuits]

Page 5: Soar One-hour Tutorial

How can we build a human-level AI?

[Figure labels: Learning; Tasks (Calculus, History, Reading, Sudoku, Shopping, Driving, Talking on cell phone); Brain Structure; Neural Circuits; Neurons; Programs; Computer Architecture; Logic Circuits; Electrical circuits]

[Figure labels, Cognitive Architecture: Symbolic Long-Term Memories (Procedural, Semantic, Episodic); Symbolic Short-Term Memory; Decision Procedure; Chunking; Reinforcement Learning; Semantic Learning; Episodic Learning; Perception; Action; Imagery; Appraisals]

Page 6: Soar One-hour Tutorial

Cognitive Architecture

Fixed mechanisms underlying cognition
– Memories, processing elements, control, interfaces
– Representations of knowledge
– Separation of fixed processes and variable knowledge
– Complex behavior arises from composition of simple primitives

Purpose:
– Bring knowledge to bear to select actions to achieve goals

Not just a framework
– BDI, NN, logic & probability, rule-based systems

Important constraints:
– Continual performance
– Real-time performance
– Incremental, on-line learning

[Figure labels: Goals; Knowledge; Architecture; Body; Task Environment]

Page 7: Soar One-hour Tutorial

Common Structures of Many Cognitive Architectures

[Figure labels: Short-term Memory; Procedural Long-term Memory; Declarative Long-term Memory; Procedure Learning; Declarative Learning; Action Selection; Goals; Perception; Action]

Page 8: Soar One-hour Tutorial

Different Goals of Cognitive Architecture

• Biological plausibility: Does the architecture correspond to what we know about the brain?

• Psychological plausibility: Does the architecture capture the details of human performance in a wide range of cognitive tasks?

• Functionality: Does the architecture explain how humans achieve their high level of intellectual function?
– Building Human-level AI

Page 9: Soar One-hour Tutorial

Short History of Soar

[Figure: timeline with ticks at 1980, 1985, 1990, 1995, 2000, 2005 and the labels Functionality and Modeling. Entries, in the order they appear:
– Pre-Soar: Problem Spaces, Production Systems, Heuristic Search
– Multi-method, multi-task problem solving; Subgoaling; Chunking
– UTC; Natural Language; HCI; External Environment
– Integration; Large bodies of knowledge; Teamwork; Real Application
– Virtual Agents; Learning from Experience, Observation, Instruction; New Capabilities]

Page 10: Soar One-hour Tutorial

Distinctive Features of Soar
• Emphasis on functionality
– Take engineering, scaling issues seriously
– Interfaces to real world systems
– Can build very large systems in Soar that exist for a long time
• Integration with perception and action
– Mental imagery and spatial reasoning
• Integrates reaction, deliberation, meta-reasoning
– Dynamically switching between them
• Integrated learning
– Chunking, reinforcement learning, episodic & semantic
• Useful in cognitive modeling
– Expanding this is emphasis of many current projects
• Easy to integrate with other systems & environments
– SML efficiently supports many languages, inter-process

Page 11: Soar One-hour Tutorial

System Architecture

Layers, from the kernel outward:
– Soar Kernel: Soar 9.0 Kernel (C)
– gSKI: Higher-level Interface (C++)
– KernelSML: Encodes/Decodes function calls and responses in XML (C++)
– SML: Soar Markup Language
– ClientSML: Encodes/Decodes function calls and responses in XML (C++)
– SWIG Language Layer: Wrapper for Java/Tcl (Not needed if app is in C++)
– Application: Application (any language)

Page 12: Soar One-hour Tutorial

Soar Basics

• Operators: Deliberate changes to internal/external state
• Activity is a series of operators controlled by knowledge:
1. Input from environment
2. Elaborate current situation: parallel rules
3. Propose and evaluate operators via preferences: parallel rules
4. Select operator
5. Apply operator: Modify internal data structures: parallel rules
6. Output to motor system

[Figure: agent in real or virtual world; each selected operator leads to the agent in a new state]

Page 13: Soar One-hour Tutorial

Basic Soar Architecture

[Figure labels: Body; Perception; Action; Symbolic Short-Term Memory; Procedural Long-Term Memory; Chunking; Decision Procedure]

[Figure, processing cycle: Input; Elaborate State; Propose Operators; Evaluate Operators; Decide (Select Operator); Apply (Apply Operator, Elaborate Operator); Output]

Page 14: Soar One-hour Tutorial

Soar 101: Eaters

[Figure: decision cycle stages Input, Propose Operator, Evaluate Operators, Select Operator, Apply Operator, Output, with Production Memory and Working Memory; candidate operators North, South, East]

Propose Operator:
If cell in direction <d> is not a wall,
--> propose operator move <d>

Evaluate Operators:
If operator <o1> will move to a bonus food and operator <o2> will move to a normal food,
--> operator <o1> > <o2>

If operator <o1> will move to an empty cell
--> operator <o1> <

Apply Operator:
If an operator is selected to move <d>
--> create output move-direction <d>

Preferences (first example): North > East, South > East, North = South
Preferences (second example): North > East, South <
Output: move-direction North
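The way preferences narrow the candidate operators can be sketched in Python. This is a simplified illustration, not Soar's actual decision procedure: the function `select_operator` and its argument encoding are hypothetical, only the better (>), worst (<) and indifferent (=) preferences from the Eaters slide are handled, and an indifferent set is resolved by taking the alphabetically first operator so the result is deterministic.

```python
def select_operator(candidates, better=(), worst=(), indifferent=()):
    """Pick an operator from symbolic preferences (simplified sketch).

    better:      pairs (a, b) meaning a > b
    worst:       operators carrying the worst preference '<'
    indifferent: pairs (a, b) meaning a = b
    Returns the chosen operator, or None to signal a tie impasse.
    """
    remaining = set(candidates)
    # Drop every operator that another candidate is better than.
    dominated = {b for a, b in better if a in remaining and b in remaining}
    remaining -= dominated
    # Drop worst-marked operators only if something else is left over.
    not_worst = remaining - set(worst)
    if not_worst:
        remaining = not_worst
    if len(remaining) == 1:
        return next(iter(remaining))
    # Several survivors: acceptable only if all are mutually indifferent.
    pairs = {frozenset(p) for p in indifferent}
    ordered = sorted(remaining)
    all_indifferent = all(
        frozenset((x, y)) in pairs
        for i, x in enumerate(ordered)
        for y in ordered[i + 1:]
    )
    return ordered[0] if all_indifferent and ordered else None


# Second Eaters example: North > East, South marked worst.
choice = select_operator(
    ["North", "South", "East"],
    better=[("North", "East")],
    worst=["South"],
)
```

With no preferences at all the function returns `None`, which corresponds to the tie impasse discussed later in the tutorial.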

Page 15: Soar One-hour Tutorial

Example Working Memory

[Figure: block A on top of block B]

(s1 ^block b1 ^block b2 ^table t1)
(b1 ^color blue ^name A ^ontop b2 ^size 1 ^type block ^weight 14)
(b2 ^color yellow ^name B ^ontop t1 ^size 1 ^type block ^under b1 ^weight 14)
(t1 ^color gray ^shape square ^type table ^under b2)

Working memory is a graph.
All working memory elements must be “linked” directly or indirectly to a state.

[Figure: graph view. S1 links through ^block to b1 and b2 and through ^table to t1; b2 carries ^color yellow, ^name B, ^size 1, ^type block, ^weight 14, ^under, ^ontop]
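The "linked to a state" requirement can be illustrated with a small Python sketch. It treats each working memory element as an (identifier, attribute, value) triple taken from the blocks example and walks the graph from the state `s1`. The helper names `reachable_ids` and `unlinked` are hypothetical, and the traversal only illustrates the idea; it is not how the Soar kernel maintains working memory.

```python
from collections import deque

# Working memory elements as (identifier, attribute, value) triples,
# copied from the blocks example.
WMES = [
    ("s1", "block", "b1"), ("s1", "block", "b2"), ("s1", "table", "t1"),
    ("b1", "color", "blue"), ("b1", "name", "A"), ("b1", "ontop", "b2"),
    ("b1", "size", 1), ("b1", "type", "block"), ("b1", "weight", 14),
    ("b2", "color", "yellow"), ("b2", "name", "B"), ("b2", "ontop", "t1"),
    ("b2", "size", 1), ("b2", "type", "block"), ("b2", "under", "b1"),
    ("b2", "weight", 14),
    ("t1", "color", "gray"), ("t1", "shape", "square"),
    ("t1", "type", "table"), ("t1", "under", "b2"),
]


def reachable_ids(wmes, root="s1"):
    """Return every identifier reachable from the state by following values."""
    ids = {ident for ident, _, _ in wmes}
    seen, queue = {root}, deque([root])
    while queue:
        node = queue.popleft()
        for ident, _, value in wmes:
            # A value is a link only if it is itself an identifier.
            if ident == node and value in ids and value not in seen:
                seen.add(value)
                queue.append(value)
    return seen


def unlinked(wmes, root="s1"):
    """Return the elements whose identifier is not linked to the state."""
    linked = reachable_ids(wmes, root)
    return [wme for wme in wmes if wme[0] not in linked]
```

Adding a triple such as `("x9", "color", "red")` with no path from `s1` makes `unlinked` report it, which is the situation the slide rules out.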

Page 16: Soar One-hour Tutorial

Soar Processing Cycle

[Figure, processing cycle: Input; Elaborate State; Propose Operators; Evaluate Operators; Decide (Select Operator); Apply (Apply Operator, Elaborate Operator); Output. Rules drive each phase. An Impasse creates a Subgoal, in which the same cycle repeats.]

Page 17: Soar One-hour Tutorial

TankSoar

[Figure: TankSoar screenshot with callouts: Red Tank’s Shield; Borders (stone); Walls (trees); Health charger; Missile pack; Blue tank (Ouch!); Energy charger; Green tank’s radar]

Page 18: Soar One-hour Tutorial

Soar 103: Subgoals

[Figure: decision cycle (Input, Propose Operator, Compare Operators, Select Operator, Apply Operator, Output) in the top state and again in the subgoal]

If enemy not sensed, then wander
– Selected operator: Wander
– Operators in the subgoal: Move, Turn

Page 19: Soar One-hour Tutorial

Soar 103: Subgoals

[Figure: decision cycle (Input, Propose Operator, Compare Operators, Select Operator, Apply Operator, Output)]

If enemy is sensed, then attack
– Selected operator: Attack
– Operator in the subgoal: Shoot

Page 20: Soar One-hour Tutorial

TacAir-Soar [1997]

Controls simulated aircraft in real-time training exercises (>3000 entities)

Flies all U.S. air missions

Dynamically changes missions as appropriate

Communicates and coordinates with computer and human controlled planes

Large knowledge base (8000 rules)

No learning

Page 21: Soar One-hour Tutorial

TacAir-Soar Task Decomposition

Operator hierarchy:
– Execute Mission: Fly-route, Ground Attack, Fly-Wing, Intercept
– Intercept: Achieve Proximity, Employ Weapons, Search, Execute Tactic, Scram
– Employ Weapons: Get Missile LAR, Select Missile, Get Steering Circle, Sort Group, Launch Missile
– Launch Missile: Lock Radar, Lock IR, Fire-Missile, Wait-for Missile-Clear

Example rules:
– If instructed to intercept an enemy, then propose intercept
– If intercepting an enemy and the enemy is within range and ROE are met, then propose employ-weapons
– If employing-weapons and missile has been selected and the enemy is in the steering circle and LAR has been achieved, then propose launch-missile
– If launching a missile and it is an IR missile and there is currently no IR lock, then propose lock-IR

>250 goals, >600 operators, >8000 rules

Page 22: Soar One-hour Tutorial

Impasse/Substate Implications:

• Substate is really meta-state that allows system to reflect
• Substate = goal to resolve impasse
– Generate operator
– Select operator (deliberate control)
– Apply operator (task decomposition)
• All basic problem solving functions open to reflection
– Operator creation, selection, application, state elaboration
• Substate is where knowledge to resolve impasse can be found
• Hierarchy of substates/subgoals arises through recursive impasses

Page 23: Soar One-hour Tutorial

Tie Subgoals and Chunking

[Figure: decision cycle (Input, Propose Operator, Evaluate Operators, Select Operator, Apply Operator, Output) with candidate operators North, South, East]

Tie Impasse
– Evaluate-operator (North): North = 10
– Evaluate-operator (South): = 10
– Evaluate-operator (East): = 5

Resulting preferences: North > East, South > East, North = South

Chunking creates rule that applies evaluate-operator
Chunking creates rules that create preferences based on what was tested
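The tie example can be sketched in Python as follows. This is a loose analogy under stated assumptions: `resolve_tie` is a hypothetical helper, the `evaluate` callback stands in for whatever reasoning happens in the subgoal, and the `chunks` dictionary is keyed only by operator name, whereas real chunking builds rules whose conditions reflect what was tested in the subgoal.

```python
def resolve_tie(candidates, evaluate, chunks):
    """Resolve a tie by evaluating each candidate and caching the results.

    evaluate: stand-in for subgoal reasoning; returns a number per operator.
    chunks:   dict acting as learned rules; filled in as a side effect.
    Returns (better, indifferent) preference lists over the candidates.
    """
    for op in candidates:
        if op not in chunks:  # no learned result yet, so do the subgoal work
            chunks[op] = evaluate(op)
    better = [(a, b) for a in candidates for b in candidates
              if chunks[a] > chunks[b]]
    indifferent = [(a, b) for i, a in enumerate(candidates)
                   for b in candidates[i + 1:] if chunks[a] == chunks[b]]
    return better, indifferent


calls = []  # records how often the expensive evaluation runs


def evaluate_move(op):
    calls.append(op)
    return {"North": 10, "South": 10, "East": 5}[op]


learned = {}
first = resolve_tie(["North", "South", "East"], evaluate_move, learned)
second = resolve_tie(["North", "South", "East"], evaluate_move, learned)
```

The second call reuses the cached evaluations and never invokes `evaluate_move`, mirroring how learned rules let the agent avoid the impasse the next time.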

Page 24: Soar One-hour Tutorial

Chunking Analysis
• Converts deliberate reasoning/planning to reaction
• Generality of learning based on generality of reasoning
– Leads to many different types of learning
– If reasoning is inductive, so is learning
• Soar only learns what it thinks about
• Chunking is impasse driven
– Learning arises from a lack of knowledge

Page 25: Soar One-hour Tutorial

Extending Soar

• Learn from internal rewards
– Reinforcement learning
• Learn facts
– What you know
– Semantic memory
• Learn events
– What you remember
– Episodic memory
• Basic drives and …
– Emotions, feelings, mood
• Non-symbolic reasoning
– Mental imagery
• Learn from regularities
– Spatial and temporal clusters

[Figure labels: Body; Perception; Action; Visual Imagery; Symbolic Short-Term Memory; Decision Procedure; Symbolic Long-Term Memories (Procedural, Semantic, Episodic); Chunking; Reinforcement Learning; Semantic Learning; Episodic Learning; Appraisal Detector; Clustering]

Page 26: Soar One-hour Tutorial

Theoretical Commitments

Stayed the Same
• Problem Space Computational Model
• Long-term & short-term memories
• Associative procedural knowledge
• Fixed decision procedure
• Impasse-driven reasoning
• Incremental, experience-driven learning
• No task-specific modules

Changed
• Multiple long-term memories
• Multiple learning mechanisms
• Modality-specific representations & processing
• Non-symbolic processing
– Symbol generation (clustering)
– Control (numeric preferences)
– Learning Control (reinforcement learning)
– Intrinsic reward (appraisals)
– Aid memory retrieval (WM activation)
– Non-symbolic reasoning (visual imagery)

Page 27: Soar One-hour Tutorial

Reinforcement Learning
Shelly Nason

Page 28: Soar One-hour Tutorial

RL in Soar

1. Encode the value function as operator evaluation rules with numeric preferences.
2. Combine all numeric preferences for an operator dynamically.
3. Adjust value of numeric preferences with experience.

[Figure labels: Perception; Reward; Internal State; Value Function; Update Value Function; Action Selection; Action]

Page 29: Soar One-hour Tutorial

The Q-function in Soar

The value function is stored in rules that test the state and operator, and create numeric preferences.

sp {rl-rule
   (state <s> ^operator <o> +)
   …
-->
   (<s> ^operator <o> = 0.34)}

Operator Q-value = the sum of all numeric preferences.
Selection: epsilon-greedy or Boltzmann

O1: {.34, .45, .02} = .81
O2: {.25, .11, .12} = .48
O3: {-.04, .14, -.05} = .05

epsilon-greedy: With probability ε the agent selects an action at random. Otherwise the agent takes the action with the highest expected value. [Balance exploration/exploitation]
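A minimal Python sketch of the two ideas on this slide: summing an operator's numeric preferences into a Q-value, and choosing epsilon-greedily. The helper names `q_value` and `epsilon_greedy` are hypothetical and the code is independent of the Soar kernel; Boltzmann selection is not shown. For the three operators listed, the sums work out to 0.81, 0.48 and 0.05.

```python
import random


def q_value(numeric_prefs):
    """Q-value of an operator: the sum of its numeric preferences."""
    return sum(numeric_prefs)


def epsilon_greedy(prefs_by_operator, epsilon, rng=random):
    """With probability epsilon explore at random, otherwise exploit."""
    operators = list(prefs_by_operator)
    if rng.random() < epsilon:
        return rng.choice(operators)
    return max(operators, key=lambda op: q_value(prefs_by_operator[op]))


# Numeric preferences taken from the slide.
PREFS = {
    "O1": [0.34, 0.45, 0.02],
    "O2": [0.25, 0.11, 0.12],
    "O3": [-0.04, 0.14, -0.05],
}

greedy_choice = epsilon_greedy(PREFS, epsilon=0.0)  # never explores
```

Setting `epsilon` to 0 always exploits, so the call above picks the operator with the largest summed value; larger values trade some of that for exploration.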

Page 30: Soar One-hour Tutorial

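Updating operator values

Sarsa update:
Q(s,O1) ← Q(s,O1) + α[r + λQ(s′,O2) − Q(s,O1)]

Q(s,O1) = sum of numeric prefs. = .33, from R1(O1) = .20, R2(O1) = .15, R3(O1) = -.02
r = reward = .2
Q(s′,O2) = sum of numeric prefs. of selected operator (O2) = .11

Bracketed term: .2 + .9*.11 - .33 = -.031, about -.03.

Update is split evenly between rules contributing to O1. The slide's worked numbers split -.03 three ways, -.01 per rule: R1 = .19, R2 = .14, R3 = -.03. Scaled by α = .1 as written in the formula, the total update would instead be .1 * (-.031) = -.0031, about -.001 per rule.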
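The update arithmetic can be checked with a short Python sketch. The function `sarsa_update` is a hypothetical helper, not part of Soar. It follows the formula as written (learning rate α, discount written λ on the slide) and returns the temporal-difference term separately, so both the bracketed value and the per-rule share are visible.

```python
def sarsa_update(rule_values, reward, next_q, alpha, discount):
    """Apply one Sarsa update, split evenly across the contributing rules.

    rule_values: numeric preference per rule for the current operator
    next_q:      Q-value of the operator selected in the next state
    Returns (updated rule values, temporal-difference error).
    """
    q = sum(rule_values.values())
    td_error = reward + discount * next_q - q
    share = alpha * td_error / len(rule_values)  # even split across rules
    updated = {name: value + share for name, value in rule_values.items()}
    return updated, td_error


# Values from the slide: R1..R3 sum to .33, r = .2, Q(s',O2) = .11.
RULES = {"R1": 0.20, "R2": 0.15, "R3": -0.02}
new_rules, td = sarsa_update(RULES, reward=0.2, next_q=0.11,
                             alpha=0.1, discount=0.9)
```

With these inputs the temporal-difference term is -.031. Multiplying by α = .1 moves each rule by roughly -.001, while passing `alpha=1.0` reproduces a shift of about -.01 per rule.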

Page 31: Soar One-hour Tutorial

Results with Eaters

[Figure: total score (0 to 1200) vs. move number (1 to 289). Series: Random, After 5, After 10, After 15, After 20. Label: Figure 2a rule.]

Page 32: Soar One-hour Tutorial

RL TankSoar Agent

[Figure: average margin of victory (-20 to 60) vs. successive games (1 to 171).]

Page 33: Soar One-hour Tutorial

Semantic Memory
Yongjia Wang

Page 34: Soar One-hour Tutorial

Memory Systems

[Figure: taxonomy of memory. Labels: Memory; Long Term Memory; Short Term Memory; Declarative; Procedural; Semantic Memory; Episodic Memory; Perceptual Representation System; Procedural Memory; Working Memory]

Page 35: Soar One-hour Tutorial

Declarative Memory Alternatives

• Working Memory
– Keep everything in working memory
• Retrieve dynamically with rules
– Rules provide asymmetric access
– Data chunking to learn (complex)
• Separate Declarative Memories
– Semantic memory (facts)
– Episodic memory (events)

Page 36: Soar One-hour Tutorial

Basic Semantic Memory Functionalities

• Encoding
– What to save?
– When to add a new declarative chunk?
– How to update knowledge?
• Retrieval
– How is the cue placed and matched?
– What are the different types of retrieval?
• Storage
– What are the storage structures?
– How are they maintained?

Page 37: Soar One-hour Tutorial

Semantic Memory Functionalities

[Figure: structures A to F moving between Working Memory (state, Cue, Expand, Save, NIL) and Semantic Memory. Labeled functions: Feature Match; Retrieval; Update with Complex Structure; AutoCommit; Remove-No-Change]

Page 38: Soar One-hour Tutorial

Episodic Memory
Andrew Nuxoll

Page 39: Soar One-hour Tutorial

Memory Systems

[Figure: taxonomy of memory. Labels: Memory; Long Term Memory; Short Term Memory; Declarative; Procedural; Semantic Memory; Episodic Memory; Perceptual Representation System; Procedural Memory; Working Memory]

Page 40: Soar One-hour Tutorial

Episodic vs. Semantic Memory

• Semantic Memory
– Knowledge of what we “know”
– Example: what state the Grand Canyon is in
• Episodic Memory
– History of specific events
– Example: a family vacation to the Grand Canyon

Page 41: Soar One-hour Tutorial

Characteristics of Episodic Memory: Tulving
• Architectural:
– Does not compete with reasoning.
– Task independent
• Automatic:
– Memories created without deliberate decision.
• Autonoetic:
– Retrieved memory is distinguished from sensing.
• Autobiographical:
– Episode remembered from own perspective.
• Variable Duration:
– The time period spanned by a memory is not fixed.
• Temporally Indexed:
– Rememberer has a sense of when the episode occurred.

Page 42: Soar One-hour Tutorial

Implementation

Encoding
– Initiation? When the agent takes an action.
Storage
Retrieval

[Figure labels: Long-term Procedural Memory (Production Rules); Working Memory (Input, Output, Cue, Retrieved)]

Page 43: Soar One-hour Tutorial

Current Implementation

Encoding
– Initiation
– Content? The entire working memory is stored in the episode.
Storage
Retrieval

[Figure labels: Long-term Procedural Memory (Production Rules); Working Memory (Input, Output, Cue, Retrieved)]

Page 44: Soar One-hour Tutorial

Current Implementation

Encoding
– Initiation
– Content
Storage
– Episode Structure? Episodes are stored in a separate memory.
Retrieval

[Figure labels: Long-term Procedural Memory (Production Rules); Working Memory (Input, Output, Cue, Retrieved); Episodic Memory; Episodic Learning]

Page 45: Soar One-hour Tutorial

Current Implementation

Encoding
– Initiation
– Content
Storage
– Episode Structure
Retrieval
– Initiation/Cue? Cue is placed in an architecture-specific buffer.

[Figure labels: Long-term Procedural Memory (Production Rules); Working Memory (Input, Output, Cue, Retrieved); Episodic Memory; Episodic Learning]

Page 46: Soar One-hour Tutorial

Current Implementation

Encoding
– Initiation
– Content
Storage
– Episode Structure
Retrieval
– Initiation/Cue
– Retrieval: The closest partial match is retrieved.

[Figure labels: Long-term Procedural Memory (Production Rules); Working Memory (Input, Output, Cue, Retrieved); Episodic Memory; Episodic Learning]
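The design points from these implementation slides (record the whole working memory, keep episodes in a separate store, retrieve the closest partial match to a cue) can be sketched in Python. The class `EpisodicMemory` and its methods are hypothetical, working memory is reduced to a set of (attribute, value) pairs, and the match score is simply the number of cue features present in an episode. Preferring the most recent episode on a tie is an assumption of this sketch, not something the slides state.

```python
class EpisodicMemory:
    """Toy episodic store with closest-partial-match retrieval."""

    def __init__(self):
        self.episodes = []  # separate memory, ordered by time of recording

    def record(self, working_memory):
        """Store a snapshot of the entire working memory; return its index."""
        self.episodes.append(frozenset(working_memory))
        return len(self.episodes) - 1

    def retrieve(self, cue):
        """Return the index of the episode sharing the most features with cue.

        Ties go to the most recent episode (an assumption of this sketch).
        Returns None when no episode shares any feature with the cue.
        """
        cue = set(cue)
        best_index, best_score = None, 0
        for index, episode in enumerate(self.episodes):
            score = len(cue & episode)
            if score > 0 and score >= best_score:
                best_index, best_score = index, score
        return best_index


memory = EpisodicMemory()
first_episode = memory.record({("x", 1), ("y", 2), ("sees", "charger")})
second_episode = memory.record({("x", 3), ("y", 2)})
hit = memory.retrieve({("sees", "charger"), ("y", 2)})
```

Here the cue shares two features with the first episode and one with the second, so the first is retrieved even though it is older.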

Page 47: Soar One-hour Tutorial

Cognitive Capability: Virtual Sensing
• Retrieve prior perception that is relevant to the current task
• Tank recursively searches memory
– Have I seen a charger from here?
– Have I seen a place where I can see a charger?

Page 48: Soar One-hour Tutorial

Virtual Sensors Results

[Figure: average number of moves (0 to 250) vs. subsequent searches (1 to 19). Series: Average Random, Episodic Memory.]

Page 49: Soar One-hour Tutorial

Cognitive Capability: Action Modeling

[Figure: candidate moves North, South, East]

– Agent attempts to choose direction
– Agent’s knowledge is insufficient: impasse
– Evaluate moving in each available direction
– Create a memory cue
– Episodic Retrieval: retrieve the best matching memory
– Retrieve Next Memory: retrieve the next memory
– Use the change in score to evaluate the proposed action
– Move North = 10 points

Page 50: Soar One-hour Tutorial

Episodic Memory: Multi-Step Action Projection
[Andrew Nuxoll]

• Learn tactics from prior success and failure
– Fight/flight
– Back away from enemy (and fire)
– Dodging

[Figure: Average Margin of Victory. Margin of victory (-30 to 40) vs. successive games (1 to 174).]

Page 51: Soar One-hour Tutorial

Episodic Memory Enables Cognitive Capabilities
• Sensing
– Detect Changes
– Detect Repetition
– Virtual Sensing
• Reasoning
– Model Actions
– Use Previous Successes/Failures
– Model the Environment
– Manage Long Term Goals
– Explain Behavior
• Learning
– Retroactive Learning
– Allows Reanalysis Given New Knowledge
– “Boost” other Learning Mechanisms

Page 52: Soar One-hour Tutorial

Mental Imagery and Spatial Reasoning
Scott Lathrop
Sam Wintermute

See AGI Talks

Page 53: Soar One-hour Tutorial

WHAT IS VISUAL IMAGERY?

VISUAL-SPATIAL
• Location, orientation
• Sentential, quantitative representations
• Linear algebra and computational geometry algorithms
(Sentential/Algebraic algorithms)

VISUAL-DEPICTIVE
• Shape, color, topology, spatial properties
• Depictive, pixel-based representations
• Image algebra algorithms
(Depictive/Ordinal algorithms)

Page 54: Soar One-hour Tutorial

Where can you put A next to I?

54

Page 55: Soar One-hour Tutorial

Spatial Problem Solving with Mental Imagery
[Scott Lathrop & Sam Wintermute]

[Figure labels: Environment; Spatial Scene; Soar. Flows: quantitative descriptions of environmental objects; qualitative descriptions of object relationships; qualitative description of new objects in relation to existing objects. Scene objects: O, A, I, A′. Example exchanges, in the order shown: (on A I); (imagine_left_of A I); (intersect A′ O); (no_intersect A′); (imagine_right_of A I); (move_right_of A I)]

Page 56: Soar One-hour Tutorial

Upcoming Challenges

• Continued refinement and integration
• Integrate with complex perception and motor systems
• Adding/learning lots of world knowledge
– Language, Spatial, Temporal Reasoning, …
• Scaling up to large bodies of knowledge
– Build up from instruction, experience, exploration, …

Page 57: Soar One-hour Tutorial

Soar Community

• Soar Website
– http://sitemaker.umich.edu/soar
• Soar Workshop every June in Ann Arbor
– June 22-26, 2009
• Soar-group
– http://lists.sourceforge.net/lists/listinfo/soar-group
– Low traffic

Page 58: Soar One-hour Tutorial

Thanks to

Funding Agencies: NSF, DARPA, ONR

Ph.D. students: Nate Derbinsky, Nicholas Gorski, Scott Lathrop, Robert Marinier, Andrew Nuxoll, Yongjia Wang, Samuel Wintermute, Joseph Xu

Research Programmers: Karen Coulter, Jonathan Voigt

Continued inspiration: Allen Newell

Page 59: Soar One-hour Tutorial
Page 60: Soar One-hour Tutorial

Challenges in Cognitive Architecture Research

• Dynamic taskability
– Pursue novel tasks
• Learning
– Always learning, learning in unexpected and unplanned ways (wild learning)
– Transition from programming to learning by imitation, instruction, experience, reflection, …
• Natural language
– Active area but much left to do.
• Social behavior
– Interaction with humans and other entities
• Connect to the real world
– Cognitive robotics with long-term existence
• Applications
– Expand domains and problems
– Putting cognitive architectures to work
• Connect to unfolding research on the brain, psychology, and the rest of AI.