Information Processing Technology Office
Learning Workshop
April 12, 2004
Seedling Overview
Learning in the Large
MIT CSAIL
PIs: Leslie Pack Kaelbling,
Tomás Lozano-Pérez, Tommi Jaakkola
Three Subprojects
• Learning to behave in huge domains
• Transfer of learned knowledge across problems and domains
• Learning to recognize objects and interpret scenes
Learning Objective
• Learn to act effectively in highly complex dynamic domains
  – Learn models of complex world dynamics involving objects, properties, and relations
  – Learn “meta-cognition” strategies for deciding how to focus computational attention for action selection
• Learning is crucial for both problems because human designers are unable to build appropriate models by hand
What Is Being Learned?
• Learning probabilistic dynamic rules
  pickup(X): on(X,Y), clear(X), table(Z), inhand-nil
    0.8: inhand(X), ¬on(X,Y), clear(Y), ¬clear(X), ¬inhand-nil
    0.2: ¬on(X,Y), clear(Y), on(X,Z)
• Important goal is to learn partial models: some aspects will be easy to learn to predict, others will take longer
• Take advantage of partial models as soon as they’re learned
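The pickup rule above can be sketched as a small data structure. This is only an illustration, not the project's actual representation: the class names (`Outcome`, `Rule`), the add/delete encoding of outcomes, and the ground instance with blocks `a`, `b` and table `t` are all assumptions made for the example.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class Outcome:
    prob: float          # probability of this outcome
    add: frozenset       # atoms that become true
    delete: frozenset    # atoms that become false (the negated literals)


@dataclass(frozen=True)
class Rule:
    action: str
    context: frozenset   # atoms that must hold for the rule to apply
    outcomes: tuple      # tuple of Outcome

    def applies(self, state):
        return self.context <= state

    def sample(self, state, rng):
        """Sample a successor state from the rule's outcome distribution."""
        r = rng.random()
        cumulative = 0.0
        for outcome in self.outcomes:
            cumulative += outcome.prob
            if r < cumulative:
                return (state - outcome.delete) | outcome.add
        # Any leftover probability mass is treated as "no change" here;
        # this is an assumption, chosen so a partial model stays usable.
        return state


# Hypothetical ground instance of pickup(X) with X=a, Y=b, Z=t.
pickup_a = Rule(
    action="pickup(a)",
    context=frozenset({"on(a,b)", "clear(a)", "table(t)", "inhand-nil"}),
    outcomes=(
        Outcome(0.8,
                add=frozenset({"inhand(a)", "clear(b)"}),
                delete=frozenset({"on(a,b)", "clear(a)", "inhand-nil"})),
        Outcome(0.2,
                add=frozenset({"clear(b)", "on(a,t)"}),
                delete=frozenset({"on(a,b)"})),
    ),
)

state = frozenset({"on(a,b)", "clear(a)", "table(t)", "inhand-nil"})
next_state = pickup_a.sample(state, random.Random())
```

A planner could call `applies` to find usable rules in a state and `sample` to simulate their effects, falling back to "no change" for aspects no rule covers yet.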
How is it Being Learned?
• Search in rule space
  – logic-based methods for learning structure
  – convex optimization for probabilities
• Effectiveness of learned models tested using planner to select actions
• Learning is automatic
• Amount of data needed depends on the frequency and reliability of phenomenon being modeled
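To make the "convex optimization for probabilities" step concrete, here is a minimal sketch under assumptions the slides do not state: the rule structure is already fixed, and each training example is annotated with which outcomes are consistent with the observed transition. The log-likelihood, sum over examples of log(sum_j p_j * c_ij), is then concave in the outcome probabilities p. The EM-style update below is one simple way to maximize it; the optimizer actually used in the project is not given here, and the function name is hypothetical.

```python
def fit_outcome_probs(consistency, iters=200):
    """Estimate outcome probabilities for one rule with fixed structure.

    consistency[i][j] is truthy when outcome j could explain example i.
    Returns a list of probabilities, one per outcome.
    """
    n = len(consistency[0])
    p = [1.0 / n] * n                      # start from the uniform distribution
    for _ in range(iters):
        counts = [0.0] * n
        for row in consistency:
            z = sum(p[j] for j in range(n) if row[j])
            if z == 0.0:
                continue                   # example not explained by any outcome
            for j in range(n):
                if row[j]:
                    counts[j] += p[j] / z  # fractional credit for outcome j
        total = sum(counts)
        if total == 0.0:
            break                          # nothing explained; keep current p
        p = [c / total for c in counts]
    return p


# Eight examples explained only by outcome 0, two only by outcome 1.
probs = fit_outcome_probs([[1, 0]] * 8 + [[0, 1]] * 2)
```

When every example is consistent with exactly one outcome, this reduces to relative frequencies; the iteration only matters when outcomes overlap.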
How is the Knowledge Represented?
• Probabilistic dynamics rules
• No background knowledge currently, but it would be easy to build in some rules
• Knowledge is task-independent (though we may use utility to focus learning)
• Models can account for only parts of the state evolution; and they’re probabilistic
• Currently, no
What is the Domain?
• Currently: physics simulator of blocks world
• Would like simulation of more complex environment, e.g.,
  – battlefield
  – disaster relief
  – making breakfast
How is Progress Being Measured?
• First, human inspection of rules for plausibility
• Second by performance of agent using rules for planning
• Nothing changes in the experimental set-up except the learned rules
• Metrics:
  – utility gained by the agent
  – computation speed
• Easily done overnight on a workstation
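The measurement protocol above (hold everything fixed except the learned rules, then compare utility and computation speed) can be sketched as a small harness. This is an illustration only: `compare_rule_sets`, `toy_episode`, and the rule-set names are hypothetical, and the toy episode merely stands in for running the planner in the simulator.

```python
import random
import time
from statistics import mean


def compare_rule_sets(rule_sets, run_episode, seeds):
    """Run the same episodes with each rule set; only the rules vary."""
    results = {}
    for name, rules in rule_sets.items():
        start = time.perf_counter()
        utilities = [run_episode(rules, seed) for seed in seeds]
        elapsed = time.perf_counter() - start
        results[name] = {"mean_utility": mean(utilities), "seconds": elapsed}
    return results


def toy_episode(rules, seed):
    # Placeholder for "agent plans with these rules in the simulator".
    # Utility here is just the rule count plus seed-determined noise.
    rng = random.Random(seed)
    return len(rules) + rng.random()


results = compare_rule_sets(
    {"before": [], "after": ["pickup"]},
    toy_episode,
    seeds=range(5),
)
```

Reusing the same seeds for every rule set keeps the comparison paired, so differences in utility can be attributed to the rules rather than to the random draw of episodes.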
What are the Technical Milestones?
• Defined by model sophistication rather than overt performance in the task
  – Learn rules with quantifiers
  – Learn to ground symbolic predicates in perception
  – Learn rules in partially observable environments
  – Postulate hidden causes
  – Focus rule-learning based on utility
What is Being Learned?
• Learning to formulate small planning problem, from a huge state space and competing goals
  – what are useful subgoals?
  – when is it appropriate to ignore certain aspects of the domain?
[Figure: diagram labeled learning, inference, planning, perception, action]
How is it Being Learned?
• Learning parameters in abstract models
  – partial observability makes it hard
  – gradient descent works, but may be weak
  – take advantage of Russell’s methods?
• Compare speed and utility of resulting action-selection system
• Learning is automatic
• Amount of data needed depends on the frequency and reliability of phenomenon being modeled
How is the Knowledge Represented?
• Parameters in strategies for building abstractions
• Currently most of the abstraction structure is hand-coded
• The knowledge depends on the distribution of problems an agent has to solve, but not on particular low-level tasks
• Uncertainty isn’t represented explicitly, but is handled implicitly in statistical learning
• We are learning at multiple levels of abstraction
What is the Domain?
• Nethack
• Would like more complex simulated domain