Information Processing Technology Office Learning Workshop, April 12, 2004: Seedling Overview

Learning in the Large

Information Processing Technology Office

Learning Workshop

April 12, 2004

Seedling Overview

Learning in the Large

MIT CSAIL

PIs: Leslie Pack Kaelbling, Tomás Lozano-Pérez, Tommi Jaakkola

Three Subprojects

• Learning to behave in huge domains

• Transfer of learned knowledge across problems and domains

• Learning to recognize objects and interpret scenes

Learning Objective

• Learn to act effectively in highly complex dynamic domains
  – Learn models of complex world dynamics involving objects, properties, and relations
  – Learn “meta-cognition” strategies for deciding how to focus computational attention for action selection

• Learning is crucial for both problems because human designers are unable to build appropriate models by hand

What Is Being Learned?

• Learning probabilistic dynamic rules, e.g.:

  pickup(X): on(X,Y), clear(X), table(Z), inhand-nil →
    0.8: inhand(X), ¬on(X,Y), clear(Y), ¬clear(X), ¬inhand-nil
    0.2: ¬on(X,Y), clear(Y), on(X,Z)

• An important goal is to learn partial models: some aspects of the dynamics will be easy to learn to predict, others will take longer

• Take advantage of partial models as soon as they’re learned
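To make the pickup rule concrete, here is a minimal Python sketch of one way to store and sample from a grounded probabilistic rule. It assumes a state is a set of true ground atoms, that a negated literal in an outcome (e.g. ¬on(X,Y)) means the atom is deleted, and that the rule has already been grounded by hand with X=a, Y=b, Z=t. The class names `Outcome` and `GroundRule` are hypothetical; the sketch does no variable binding and no learning.

```python
import random
from dataclasses import dataclass

# Assumed representation (hypothetical): a state is a frozenset of true ground atoms,
# and a negated literal in an outcome is treated as deleting that atom.

@dataclass(frozen=True)
class Outcome:
    prob: float
    add: frozenset
    delete: frozenset

@dataclass(frozen=True)
class GroundRule:
    action: str
    context: frozenset  # atoms that must all be true for the rule to apply
    outcomes: tuple     # mutually exclusive outcomes whose probabilities sum to 1

    def applies(self, state):
        return self.context <= state

    def sample_next(self, state, rng):
        """Draw one outcome according to its probability and apply it to the state."""
        if not self.applies(state):
            raise ValueError("rule context does not hold in this state")
        r = rng.random()
        cumulative = 0.0
        for outcome in self.outcomes:
            cumulative += outcome.prob
            if r < cumulative:
                return (state - outcome.delete) | outcome.add
        # Guard against floating-point round-off in the probabilities.
        last = self.outcomes[-1]
        return (state - last.delete) | last.add

# pickup(X) grounded by hand with X=a, Y=b, Z=t (hypothetical block names).
pickup_a = GroundRule(
    action="pickup(a)",
    context=frozenset({"on(a,b)", "clear(a)", "table(t)", "inhand-nil"}),
    outcomes=(
        Outcome(0.8, add=frozenset({"inhand(a)", "clear(b)"}),
                delete=frozenset({"on(a,b)", "clear(a)", "inhand-nil"})),
        Outcome(0.2, add=frozenset({"clear(b)", "on(a,t)"}),
                delete=frozenset({"on(a,b)"})),
    ),
)

state = frozenset({"on(a,b)", "on(b,t)", "clear(a)", "table(t)", "inhand-nil"})
rng = random.Random(0)
samples = [pickup_a.sample_next(state, rng) for _ in range(10_000)]
print(sum("inhand(a)" in s for s in samples) / len(samples))  # close to 0.8
```

Atoms that an outcome does not mention are left unchanged, so each outcome only has to describe what the action changes.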

How is it Being Learned?

• Search in rule space
  – logic-based methods for learning structure
  – convex optimization for probabilities

• Effectiveness of learned models is tested by using them in a planner to select actions

• Learning is automatic

• The amount of data needed depends on the frequency and reliability of the phenomenon being modeled
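The slide says only that rule probabilities are fit by convex optimization. As an illustration, assume each observed transition has been labeled with the set C_j of outcomes of one rule that are consistent with it. The maximum-likelihood probabilities then maximize Σ_j log Σ_{i∈C_j} p_i over the probability simplex, which is a concave problem. The sketch below solves it with the standard EM fixed-point update for mixture weights; this is a swapped-in solver chosen for brevity, not necessarily the one used in the project, and `fit_outcome_probs` is a hypothetical name.

```python
def fit_outcome_probs(coverage, n_outcomes, iters=500):
    """Maximum-likelihood outcome probabilities for a single rule.

    coverage[j] is the set of outcome indices consistent with observed transition j.
    The objective sum_j log(sum_{i in coverage[j]} p_i) is concave on the simplex;
    each EM update below does not decrease it and moves p toward the maximum.
    """
    if not coverage or any(not covered for covered in coverage):
        raise ValueError("need at least one example, each covered by some outcome")
    n = len(coverage)
    p = [1.0 / n_outcomes] * n_outcomes  # start from the uniform distribution
    for _ in range(iters):
        new_p = [0.0] * n_outcomes
        for covered in coverage:
            total = sum(p[i] for i in covered)
            # Split this example's unit of credit among the outcomes that explain it.
            for i in covered:
                new_p[i] += p[i] / total
        p = [x / n for x in new_p]
    return p

# Disjoint outcomes: the answer is just the normalized counts.
print(fit_outcome_probs([{0}] * 8 + [{1}] * 2, n_outcomes=2))
# One ambiguous example: the weight drifts to outcome 0, which explains every example.
print(fit_outcome_probs([{0}] * 3 + [{0, 1}], n_outcomes=2))
```

When outcomes never overlap, the update reaches the normalized counts after one iteration; the iteration only matters when some transitions are explained by more than one outcome.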

How is the Knowledge Represented?

• Probabilistic dynamics rules

• No background knowledge currently, but it would be easy to build in some rules

• Knowledge is task-independent (though we may use utility to focus learning)

• Models can account for only parts of the state evolution, and they are probabilistic

• Currently, no

What is the Domain?

• Currently: a physics simulator of a blocks world

• Would like a simulation of a more complex environment, e.g.,
  – battlefield
  – disaster relief
  – making breakfast

How is Progress Being Measured?

• First, by human inspection of the rules for plausibility

• Second, by the performance of an agent that uses the rules for planning

• Nothing changes in the experimental set-up except the learned rules

• Metrics:
  – utility gained by the agent
  – computation speed

• Easily done overnight on a workstation
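The evaluation protocol above (a fixed set-up in which only the learned rules vary, scored by utility and computation time) could be scripted along these lines. This is a sketch under assumptions: `run_episode(rules, seed)` is a hypothetical stand-in for running the planning agent and returning its utility, and the toy world below, with two actions and fixed true success rates, is invented purely for illustration.

```python
import random
import time
from statistics import mean

def evaluate(run_episode, rule_sets, n_episodes=50):
    """Score each rule set under an identical set-up; only the rules differ.

    Every rule set is run on the same seeds (common random numbers), so
    differences in utility come from the rules rather than from different draws.
    """
    results = {}
    for name, rules in rule_sets.items():
        utilities, seconds = [], []
        for seed in range(n_episodes):
            start = time.perf_counter()
            utilities.append(run_episode(rules, seed))
            seconds.append(time.perf_counter() - start)
        results[name] = {"utility": mean(utilities), "seconds": mean(seconds)}
    return results

# Toy world (hypothetical): the true success rate of each action.
TRUE_P = {"pickup": 0.8, "push": 0.3}

def toy_episode(rules, seed, steps=20):
    """Plan with the learned rules, then act in the toy world for a fixed budget."""
    rng = random.Random(seed)
    action = max(rules, key=rules.get)  # choose the action the rules rate highest
    return sum(rng.random() < TRUE_P[action] for _ in range(steps))

report = evaluate(toy_episode, {
    "learned": {"pickup": 0.8, "push": 0.3},
    "miscalibrated": {"pickup": 0.2, "push": 0.6},
})
for name, metrics in report.items():
    print(name, metrics)
```

Reusing the seeds makes the comparison paired: each rule set faces exactly the same random draws, so a better model cannot lose simply through bad luck.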

What are the Technical Milestones?

• Defined by model sophistication rather than overt performance in the task
  – Learn rules with quantifiers
  – Learn to ground symbolic predicates in perception
  – Learn rules in partially observable environments
  – Postulate hidden causes
  – Focus rule-learning based on utility

What is Being Learned?

• Learning to formulate a small planning problem from a huge state space and competing goals
  – What are useful subgoals?
  – When is it appropriate to ignore certain aspects of the domain?

[Diagram linking perception, learning, inference, planning, and action]

How is it Being Learned?

• Learning parameters in abstract models
  – partial observability makes it hard
  – gradient descent works, but may be weak
  – take advantage of Russell’s methods?

• Compare the speed and utility of the resulting action-selection system

• Learning is automatic

• The amount of data needed depends on the frequency and reliability of the phenomenon being modeled
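As an illustration of gradient-based parameter learning that trades utility against computation, the sketch below tunes a single abstraction parameter by finite-difference gradient ascent. Everything here is an assumption: `theta` stands for a hypothetical knob on a hand-coded abstraction (larger values keep more detail), and `score` is an invented, deterministic stand-in for what would really be a noisy, simulation-based estimate of utility minus computation cost.

```python
def score(theta):
    """Hypothetical score of the action-selection system built with parameter theta.

    Keeping more detail (larger theta) raises utility with diminishing returns
    but costs computation linearly.
    """
    utility = 10.0 * (1.0 - 2.0 ** (-theta))
    computation_cost = 0.5 * theta
    return utility - computation_cost

def tune(theta=1.0, steps=300, lr=0.05, h=0.25):
    """Finite-difference gradient ascent on score, keeping theta non-negative."""
    for _ in range(steps):
        # Central difference: two evaluations of the system per gradient step.
        grad = (score(theta + h) - score(theta - h)) / (2 * h)
        theta = max(0.0, theta + lr * grad)
    return theta

# Approaches the toy optimum theta = log2(20 ln 2), roughly 3.79.
print(round(tune(), 2))
```

Each step needs two full evaluations of the system, and with a real, noisy, partially observable score the difference estimates would be far less reliable than in this toy, which is one concrete sense in which gradient descent "may be weak".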

How is the Knowledge Represented?

• Parameters in strategies for building abstractions

• Currently most of the abstraction structure is hand-coded

• The knowledge depends on the distribution of problems an agent has to solve, but not on particular low-level tasks

• Uncertainty isn’t represented explicitly, but is handled implicitly in statistical learning

• We are learning at multiple levels of abstraction

What is the Domain?

• NetHack

• Would like a more complex simulated domain

What are the Technical Milestones?

• Meta-learning
  – Learn parameters in hand-built abstractions for MDPs
  – Learn new abstractions for MDPs
  – Learn to compose abstractions
  – Do it all for POMDPs