
A Multi-Domain Evaluation of Scaling in Soar’s Episodic Memory

Nate Derbinsky, Justin Li, John E. Laird
University of Michigan

Soar Workshop 2012 - Ann Arbor, MI

Motivation

Prior Work
• Nuxoll & Laird (‘12): integration and capabilities
• Derbinsky & Laird (‘09): efficient algorithms

Core Question
To what extent is Soar’s episodic memory effective and efficient for real-time agents that persist for long periods of time across a variety of tasks?

20 June 2012


Approach: Multi-Domain Evaluation

• Existing agents from diverse tasks (49)
  – Linguistics, planning, games, robotics

• Long agent runs
  – Hours to days of real time (10^5 to 10^8 episodes)

• Evaluate every X episodes
  – Memory consumption
  – Reactivity for >100 task-relevant cues
    • Maximum time for cue matching <? 50 msec.
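The evaluation loop described above could be sketched roughly as follows. This is a minimal illustration rather than the harness used in the study: `step`, `query`, and the `clock` parameter are hypothetical stand-ins for advancing an agent by one episode, issuing a cue-based retrieval, and reading a timer. Memory-consumption measurement is omitted.

```python
import time


def check_reactivity(query, cues, budget_msec=50.0, clock=time.perf_counter):
    """Time each cue once; return the worst time and the cues at or over budget."""
    worst = 0.0
    too_slow = []
    for cue in cues:
        start = clock()
        query(cue)  # hypothetical retrieval call
        elapsed_msec = (clock() - start) * 1000.0
        worst = max(worst, elapsed_msec)
        if elapsed_msec >= budget_msec:
            too_slow.append(cue)
    return worst, too_slow


def run_with_checkpoints(step, query, cues, total_episodes, every):
    """Advance the agent and evaluate every cue each `every` episodes."""
    report = {}
    for episode in range(1, total_episodes + 1):
        step()  # hypothetical: run the agent until one more episode is stored
        if episode % every == 0:
            report[episode] = check_reactivity(query, cues)
    return report
```

Passing the clock as a parameter keeps the timing logic testable with a fake timer; a real run would use the default.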


Outline

• Overview of Soar’s EpMem
• Word Sense Disambiguation (WSD)
• Planning
• Video Games & Robotics


Episodic Memory: Problem Formulation

Representation
• Episode: connected di-graph
• Store: temporal sequence

Encoding/Storage
• Automatic
• No dynamics

Retrieval
• Cue: acyclic graph
• Semantics: desired features in context
• Find the most recent episode that shares the most leaf nodes in common with the cue

[Diagram labels: Working Memory, Episodic Memory, Encoding, Storage, Matching, Cue]
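The retrieval semantics above can be stated as a brute-force reference. The sketch below assumes a simplification that is not in the slides: each episode and the cue are flattened to sets of hashable leaf features, so graph structure is ignored. The name `best_match` is illustrative.

```python
def best_match(episodes, cue_leaves):
    """Return (index, score) of the most recent episode sharing the most cue leaves.

    `episodes` is a list of sets ordered oldest to newest (an assumption of
    this sketch). Returns (None, 0) when no episode shares any leaf.
    """
    best_index, best_score = None, 0
    # Walk from newest to oldest so that ties go to the most recent episode.
    for index in range(len(episodes) - 1, -1, -1):
        score = len(cue_leaves & episodes[index])
        if score > best_score:
            best_index, best_score = index, score
    return best_index, best_score
```

This scans every episode, so its cost grows with the size of the store; it only pins down what a retrieval should return.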


Episodic Memory: Algorithmic Overview

Storage
– Capture WM-changes as temporal intervals

Cue Matching (reverse walk of cue-relevant Δ’s)
– 2-phase search
  • Only graph-match episodes that have all cue features independently
– Only evaluate episodes that have changes relevant to cue features
– Incrementally re-score episodes
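A toy version of the interval idea might look like the sketch below. It is not Soar's implementation: features are assumed to be flat hashable values, the graph-match phase of the 2-phase search is left out, and `IntervalStore` with its methods is a hypothetical name. It shows interval storage of changes, a reverse walk that visits only episodes where a cue feature changed, and incremental re-scoring.

```python
from collections import defaultdict


class IntervalStore:
    """Toy store: each feature maps to the episode intervals in which it was present."""

    def __init__(self):
        self.intervals = defaultdict(list)  # feature -> [[start, end], ...]; end None = still present
        self.current = set()
        self.n_episodes = 0

    def record(self, features):
        """Store one episode, touching only the features that changed."""
        features = set(features)
        self.n_episodes += 1
        t = self.n_episodes
        for f in features - self.current:   # feature appeared: open an interval
            self.intervals[f].append([t, None])
        for f in self.current - features:   # feature disappeared: close its interval
            self.intervals[f][-1][1] = t - 1
        self.current = features
        return t

    def query(self, cue):
        """Return (episode, score) for the most recent episode sharing the most cue features."""
        cue = set(cue)
        last = self.n_episodes
        # delta[t]: score change when the backward walk enters episode t.
        delta = defaultdict(int)
        for f in cue:
            for start, end in self.intervals.get(f, []):
                delta[last if end is None else end] += 1
                if start > 1:
                    delta[start - 1] -= 1
        best_episode, best_score, score = None, 0, 0
        for t in sorted(delta, reverse=True):  # only episodes with cue-relevant changes
            score += delta[t]                  # incremental re-score
            if score > best_score:
                best_episode, best_score = t, score
                if best_score == len(cue):     # perfect match: nothing older can beat it
                    break
        return best_episode, best_score
```

In this sketch, query cost depends on how many interval endpoints the cue features have, not on the total number of stored episodes.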


Episodic Memory: Storage Characterization

[Plot: Avg. Bytes per Episode (0 to 6000) vs. Avg. Working Memory Changes (0 to 160)]
Linear fit: f(x) = 27.8774973529299 x + 52.0287559751198, R² = 0.982502738087084
Plot annotations: 1 week, 8GB; 1 day, 8GB
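The linear fit can be turned into a rough capacity estimate. The coefficients below come from the slide; everything else is an assumption of this sketch, including the function names, reading 8GB as 8 × 2^30 bytes, and the episode rate, which the slide does not state and which is therefore left as a parameter.

```python
# Coefficients of the linear fit reported on the slide.
SLOPE = 27.8774973529299       # bytes per working-memory change
INTERCEPT = 52.0287559751198   # bytes per episode at zero changes


def bytes_per_episode(avg_wm_changes):
    """Predicted average bytes per episode for a given average number of WM changes."""
    return SLOPE * avg_wm_changes + INTERCEPT


def episodes_until_full(budget_bytes, per_episode_bytes):
    """Whole episodes that fit in the memory budget."""
    return int(budget_bytes // per_episode_bytes)


def seconds_until_full(budget_bytes, per_episode_bytes, episodes_per_second):
    """Run time until the budget is exhausted, at an assumed (hypothetical) episode rate."""
    return episodes_until_full(budget_bytes, per_episode_bytes) / episodes_per_second
```

For example, `seconds_until_full(8 * 2**30, bytes_per_episode(20), rate)` estimates how long an agent averaging 20 working-memory changes per episode could run at `rate` episodes per second before filling the budget.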


Episodic Memory: Retrieval Characterization

Assumptions
– Few changes per episode (temporal contiguity)
– Representational re-use (structural regularity)
– Small cue

Scaling
– Search distance (# changes to walk)
  • Temporal Selectivity: how often does a WME change
  • Feature Co-Occurrence: how often do WMEs co-occur within a single episode (related to search-space size)
– Episode scoring (similar to rule matching)
  • Structural Selectivity: how many ways can a cue WME match an episode (i.e. multi-valued attributes)


Word Sense Disambiguation: Experimental Setup

• Input: <“word”, POS>; Output: sense #; Result
  – Corpus: SemCor (~900K eps/exposure)

• Agent
  – Maintain context as n-gram
  – Query EpMem for context
    • If success, get next episode, output result
    • If failure, null

Accuracy   First     Second
2-gram     14.57%    92.82%
3-gram     2.32%     99.47%
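The agent's query-then-step-forward strategy could be sketched as below. This is an illustration under assumptions, not the agent's actual rules: episodes are modeled as a plain Python list scanned linearly instead of Soar's EpMem, and the class and method names (`NGramSenseMemory`, `observe`, `feedback`, `predict`) are hypothetical.

```python
class NGramSenseMemory:
    """Toy n-gram context memory: one episode per input and one per result."""

    def __init__(self, n):
        self.n = n
        self.context = ()    # last n (word, POS) pairs
        self.episodes = []   # oldest first

    def observe(self, word, pos):
        """Episode recorded when an input arrives; the sense is not yet known."""
        self.context = (self.context + ((word, pos),))[-self.n:]
        self.episodes.append({"context": self.context, "sense": None})

    def feedback(self, sense):
        """Episode recorded when the result (correct sense) arrives."""
        self.episodes.append({"context": self.context, "sense": sense})

    def predict(self):
        """Find the most recent earlier episode with the same context, then read the next one."""
        for i in range(len(self.episodes) - 2, -1, -1):  # skip the current episode
            episode = self.episodes[i]
            if episode["context"] == self.context and episode["sense"] is None:
                return self.episodes[i + 1]["sense"]     # success: result from the next episode
        return None                                      # failure: null
```

A typical cycle is `observe(word, pos)`, `predict()`, then `feedback(sense)`; a context never seen before returns `None`, which would count as an error on first exposure.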


Word Sense Disambiguation: Results

Storage
– Avg. 234 bytes/episode

Cue Matching
– All 1-, 2-, and 3-gram cues reactive
– 0.2% of 4-grams exceed 50 msec.


N-gram Retrieval Scaling: Retrieval Time (msec) vs. Episodes (x1000)

[Plots: two panels, labeled Feature Co-Occurrence and Temporal Selectivity]
Series (y-axis 0 to 2.5): {be, say} (69), {say, group} (6), {friday, say} (1), {say}
Series (y-axis 0 to 25): {well, be, say}, {friday, say, group}, {friday, say}, {say}


Planning: Experimental Setup

• 12 automatically converted PDDL domains
  – Logistics, Blocksworld, Eight-puzzle, Grid, Gripper, Hanoi, Maze, Mine-Eater, Miconic, Mystery, Rockets, and Taxi
  – 44 distinct problem instances (e.g. # blocks)

• Agent: randomly explore state space
  – 50K episodes, measure every 1K


Planning: Results

Storage
– Reactive: <12.04 msec./episode
– Memory: 562 – 5454 bytes/episode

Cue Matching (reactive: < 50 msec.)
1. Full State: only smallest state + space size (12)
2. Relational: none
3. Schema: all (max = 0.08 msec.)


Video Games & Mobile Robotics: Experimental Setup

• Hand-coded cues (per domain)

Domain           Agent                              Duration    Eval. Rate
TankSoar         mapping-bot                        3.5M        50K
Eaters           advanced-move                      3.5M        50K
Infinite Mario   [Mohan & Laird ‘11]                3.5M        50K
Rooms World      [Laird, Derbinsky & Voigt ‘11]     12 hours    300K


Data: Eaters

[Plot: Retrieval Time (msec), 0 to 50, vs. Episodes (x1 Million), 0 to 3.5M; series 1 to 7]

813 bytes/episode


Data: Infinite Mario

[Plot: Retrieval Time (msec), 0 to 50, vs. Episodes (x1 Million), 0 to 3.5M; series 1 to 14; annotation: Structural Selectivity]

2646 bytes/episode


Data: TankSoar

[Plot: Retrieval Time (msec), 0 to 50, vs. Episodes (x1 Million), 0 to 3.5M; series 1 to 15; annotation: Co-Occurrence]

1035 bytes/episode


Data: Mobile Robotics

[Plot: Retrieval Time (msec), 0 to 50, vs. Episodes (x1 Million), 0 to 110M; series 1 to 6; annotation: Temporal Selectivity]

113 bytes/episode


Summary of Results

Generality
– Demonstrated 7 cognitive capabilities
  • Virtual sensing, action modeling, long-term goal management, …

Reactivity
– <50 msec. storage time for all tasks (ex. temporal discontiguity)
– <50 msec. cue matching for many cues

Scalability
– No growth in cue matching for many cues (days!)
  • Validated predictive performance models
– 0.18 - 4 KB/episode (days – months)


Evaluation

Nuggets
• Unprecedented evaluation of general episodic memory
  – Breadth, temporal extent, analysis
• Characterization of EpMem performance via task-independent properties
• Soar’s EpMem (v9.3.2) is effective and efficient for many tasks and cues!
• Domains and cues available

Coal
• Still easy to construct domain/cue that makes Soar unreactive
• Unbounded memory consumption (given enough time)

For more details, see paper in proceedings of AAAI 2012