1
Towards a Testbed for Evaluating Learning Systems in Gaming Simulators
21 March 2004, Symposium on Reasoning and Learning
Outline:
1. Objective
2. Specification
3. Intended functionality
4. Example of use
5. Status and Goals
David W. Aha
Intelligent Decision Aids Group
Naval Research Laboratory, Code 5515
Washington, DC
home.earthlink.net/[email protected]
(working with Matt Molineaux)
TIELT
2
Objective & Expected Benefits
Objective
Support the machine learning community by providing an API for a set of gaming engines, the ability to select a wide variety of learning and performance tasks, and an editor for specifying and conducting an evaluation methodology.
Benefits
1. Reduce costs (time, $) to create these integrations
   • Costs are often prohibitive, encouraging isolated studies
2. Encourage research on learning in cognitive systems:
   • Embedded (e.g., process-aware) • Rapid (i.e., few trials) • Knowledge-intensive • Enduring
3. Support analysis of alternative learning systems for a given task
3
DARPA’s Cognitive Systems Thrust
IPTO: Information Processing Technology Office
• Director: Ron Brachman
• Assistant Director: Barbara Yoon

History: IPTO has many impressive contributions
• e.g., time-sharing, interactive computing, Internet
• Long-term goal: human-computer symbiosis
Current goal: Develop “cognitive” systems that know what they’re doing:
– can reason, using substantial amounts of appropriately represented knowledge
– can learn from their experience to perform better tomorrow than they did today
– can explain themselves and be told what to do
– can be aware of their own capabilities and reflect on their own behavior
– can respond robustly to surprise
4
IPTO’s View of a Cognitive Agent
[Architecture diagram: a Cognitive Agent embedded in an External Environment. Sensors feed Perception and Effectors carry out Action. Reactive, Deliberative (prediction, planning, other reasoning), and Reflective processes operate over STM and LTM (KB), which holds concepts and sentences. Communication (language, gesture, image), Affect, Attention, and Learning span the architecture.]
5
Learning Foci in Cognitive Systems (Langley & Laird, 2002)
Capability | Knowledge Container(s)
Recognition & Categorization | Patterns, Pattern recognizer; Categories, Pattern categorizer
Decision Making & Choice | Space of possible decisions; Decision selector, Conflict resolver; Decision application procedure
Perception & Situation Assessment | Situation categories, Situation categorization; Information fuser
Prediction & Monitoring | Environment model; Monitoring focus
Problem Solving & Planning | Plans, Plan generator (e.g., search method); Plan adaptor; Action preconditions; Action effects
Reasoning & Belief Maintenance | Beliefs & belief relations; Inferencing knowledge and procedures
Execution & Action | Action executer; Action utility; Resource allocater
Interaction & Communication | NL interpretation; Dialogue coordination
Remembering & Reflection | Recall procedure; Explanation generation
Many opportunities for learning
6
Typical Current Practice
Comparatively few machine learning (ML) efforts on cognitive systems:
• Isolated studies of learning algorithms
• Single-step, non-interactive tasks
• Knowledge-poor learning contexts
Few of today’s cognitive systems support realistic learning capabilities.

[Diagram: each ML system connects only to a database; few connect to a cognitive system.]
7
Wanted: A New Interface (thanks to W. Cohen)
[Diagram: today, each ML system connects to its own database (e.g., the UCI Repository); the proposed interface (e.g., TIELT?) would instead connect multiple ML systems to multiple cognitive systems through a single shared interface.]

Curmudgeon’s Viewpoint:
• This might encourage research on more challenging problems
• But don’t count on it
8
Your Potential Uses of TIELT (Hastily Considered… and Reaching)
1. Randy Jones: Smart way to update rule conditions
   • Use: Updating game model’s tasks
2. Doug Pearson: Changing conditions on operators
   • Use: Controlling game agents
3. Prasad Tadepalli: Learning hierarchies in the game model
   • Use: Active development of a game model’s task hierarchy
4. Jim Blythe: Knowledge acquisition
   • Use: Acquiring game model constraints
5. Gheorghe Tecuci: Mixed-initiative learning for knowledge acquisition
   • Use: Active learning of task models, etc.
6. Karen Myers: Incorporating guidance from humans for agent control
   • Use: Learning agent controls (assuming players can provide direct feedback)
7. Barney Pell: Learning to play any of a category of games given their rules
   • Use: Hmm… agent control, if collaborating with a game model-updating system
8. Afzal Upal: Updating plan quality
   • Use: Induce task-specific control rules
9. Susan Epstein: Learning to solve (large) CSPs
   • Use: Reasoning with game model’s constraints
10. Frank Ritter: Recognition tasks (e.g., for strategies?)
   • Use: Learning opponent strategies
11. Dan Roth: Using multiple classifiers to solve problems
   • Use: Set of (coordinated) learning systems for problem solving
12. Ken Forbus: Analogical reasoning and companion cognitive systems
   • Use: Qualitative representation for game model, predicting human/agent intentions
13. Daniel Borrajo: Learning control knowledge for planning
   • Use: Incremental learning for agent control tasks
14. Niels Taatgen: Learning for real-time tasks
   • Use: Agent control, several RTS applications
9
Outline (cont)
1. Objective
2. Specification of a testbed
   • Select a category of cognitive systems
   • Category-specific challenges
3. Intended functionality
4. Example of use
5. Status of project
6. Goals for future work
10
Interface Comparison

Characteristic | ML System + Database (e.g., a supervised learning system) | ML System + Cognitive System (e.g., involving planning)
1. Performance API: Input | — | Effects
1. Performance API: Output | Common data format | State
2. Learning API: Input | Matches input format | State
2. Learning API: Output | — | Decision
3. Integration | Data input module | Message passing
4. Performance task | Classification | Achieve a goal(s)
5. Learning task | Set weights, create tree, etc. | Create a plan
6. Domain knowledge (reasoning) | — | Significant (temporal, qualitative, …)
7. Evaluation methodology | Accuracy, ROC curves, etc. | Plan execution measures
11
What type of Cognitive System?
Desiderata:
1. Available implementations
   • Inexpensive to acquire and run
2. Pushes ML research boundaries
   • Challenging embedded learning tasks
3. Significant interest/excitement
   • Military, industry, academia, funding

Candidate: Interactive Gaming Simulators
12
Gaming Genres (Laird & van Lent, 2001)
Genre | Example | Description | Sub-Genres | AI Roles
Action | Quake, Unreal | Control a character | 1st vs. 3rd person, solo vs. team play | Control enemies
Role-Playing | Diablo | Be a character | Solo vs. (massively) multi-player | Control enemies, partners, and supporting characters
Adventure | King’s Quest, Blade Runner | Player solves puzzles, interacting w/ others | Linear vs. dynamic scripting | Control supporting characters
Strategy | Age of Empires, Warcraft, Civilization | God’s eye view, controls many units (e.g., tactical warfare) | — | Control all units and strategic enemies
God | SimCity, The Sims | Control a simulated world & its units | — | Control unit goals and goal-achievement strategies
Individual Sports | Many (e.g., driving games) | Individual competition | 1st vs. 3rd person | Control enemy
Team Sports | Madden NFL Football | Act as coach and a key player | — | Control units and strategic enemy (i.e., other coach), commentator
Unfortunately,…reaction time and aiming skill are the most important factors in success in a first-person shooter game. Deep reasoning about tactics and strategy don’t end up playing a big role as might be expected. (van Lent et al., 2004)
13
Real-Time Strategy (RTS) Games (Buro & Furtak, 2003)
Fundamental AI research problems
1. Adversarial real-time planning
   • Motivates need for abstractions of world state
2. Decision making under uncertainty
   • e.g., opponent intentions
3. Opponent modeling, learning
   • “One of the biggest shortcomings of current RTS game AI systems is their inability to learn quickly… Current ML approaches… are inadequate in this area.”
4. Spatial and temporal reasoning
5. Resource management
6. Collaboration
7. Pathfinding
14
Military: Learning in Simulators for Computer Generated Forces (CGF)
Purpose: Training (present) & planning (future)
• Simulators: JWARS, OneSAF, Full Spectrum Command, etc.
• Target: Control strategic opponent or own units

Evidence of commitment: Some Claims
• “Learning is an essential ability of intelligent systems” (NRC, 1998)
• “To realize the full benefit of a human behavior model within an intelligent simulator,… the model should incorporate learning” (Hunter et al., CCGBR’00)
• “Successful employment of human behavior models… requires that [they] possess the ability to integrate learning” (Banks & Stytz, CCGBR’00)

Status:
• No CGF simulator has been deployed with learning (D. Reece, 2003)
• Problems: Performance (costly training), overtraining, behavioral accuracy (e.g., learned behaviors may become unpredictable), constraint violations (learned behaviors do not follow doctrine), difficulty isolating the utility of learning (Petty, CGFBR’01)
15
Industry: Learning in Video and Computer Games
Focus: Increase sales via enhanced gaming experience
• Simulators: Many! (e.g., SimCity, Quake, SoF, UT)
• Target: Control avatars, unit behaviors

Status
• Few deployed systems have used learning (Kirby, 2004): e.g.,
  1. Black & White: on-line, explicit (player immediately reinforces behavior)
  2. C&C Renegade: on-line, implicit (agent updates set of legal paths)
  3. Re-Volt: off-line, implicit (GA tunes racecar behaviors prior to shipping)
• Problems: Performance, constraints (preventing learning “something dumb”), trust in learning system

Evidence of commitment
• Developers: “keenly interested in building AIs that might learn, both from the player & environment around them.” (GDC’03 Roundtable Report)
• Middleware products that support learning (e.g., MASA, SHAI, LearningMachine)
• Long-term investments in learning (e.g., iKuni, Inc.)
• “A computer that learns is worth 10 Microsofts.” (B. Gates, 2004)
16
Academia: Learning in Interactive Computer Games

Focus: Several research thrusts
Status: Publication options (specific to AI & gaming)
• AAAI symposia and workshops (several), e.g., AAAI’04 Workshop on Challenges in Game AI
• International Conference on Computers and Games
• Journals: J. of Game Development, Int. Computer Games J.

Research thrusts:
• Game engines (e.g., GameBots, ORTS, RoboCup Soccer Server); use of (other) open source engines (e.g., FreeCiv, Stratagus)
• Representation (e.g., Forbus et al., 2001; Houk, 2004; Muñoz-Avila & Fisher, 2004)
• Knowledge acquisition (e.g., Hieb et al., 1995)
• Supervised learning of lower-level behaviors (e.g., Geisler, 2002)
• Learning plans (e.g., Fasciano, 1996)
• Learning opponent unit models (e.g., Laird, 2001; Hill et al., 2002)
• Learning to provide advice (e.g., Sweetser & Dennis, 2003)
• Learning hierarchical knowledge (e.g., van Lent & Laird, 1998)
• Learning rule preferences (e.g., Ponsen, 2004)
17
Academia: Learning in Interactive Computer Games (cont.)

Example integrations
Name + Reference | Game Engine | Learning Approach | Tasks
(Goodman, AAAI’93) | Bilestoad | Projective visualization | Fighting maneuvers
CAPTAIN (Hieb et al., CCGFBR’95) | ModSAF | Multistrategy (e.g., version spaces) | Platoon placement
MAYOR (Fasciano, 1996; U. Chicago Dept. of CS TR 96-05) | SimCity | Case-based reasoning | City development
(Fogel et al., CCGFBR’96) | ModSAF | Genetic programming | Tank movements
KnoMic (van Lent & Laird, ICML’98) | ModSAF | Rule condition learning in SOAR | Aircraft maneuvers
(Geisler, 2002) | Soldier of Fortune | Multiple (e.g., boosting, backprop) | FPS action selection
(Sweetser & Dennis, 2003) | Tubby Terror | Regression | Advice generation
(Chia & Williams, BRIMS’03) | TankSoar | Naïve Bayes classification | Tank behaviors
(Ponsen, 2004) | Wargus/Stratagus | Genetic algorithms (dynamic scripting) | Strategic rule selection
18
Summary: Some Additional Challenges with Embedding Learning in Gaming Simulators
1. Low CPU requirements (e.g., in real-time games)
2. Constraining learned knowledge
   • Must not violate expectations
3. Learning & reasoning (e.g., planning)
4. Isolating learning contributions (for evaluation)
19
Specification for Integrating Learning Systems with Gaming Simulators
1. Simplifies integration!
   • Interests ML researchers
   • Interests game developers
2. Learning focus concerns at least three types of models:
   • Task (e.g., learn how to perform, or advise on, a task)
   • Player (e.g., learn a human player’s strategies)
   • Game (e.g., learn its objects, their relations & functions)
   • State interpretation/abstraction
3. Learning methods: A wide variety
   • They should be able to output their learned behaviors for inspection (e.g., by game developers)
4. Game engines: Those with challenging learning tasks
   • i.e., large hypothesis spaces, knowledge-intensive
5. Supports reuse via modularity (to be at all feasible)
   • Abstracts interface definitions from game & task models
6. Free (unlike some similar commercial tools)
   • Preferably, open source
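To illustrate specification item 5, here is a minimal, hypothetical sketch (not TIELT's actual code; every name is invented) of how abstracting the interface definitions lets a middleware loop stay independent of any particular game engine or learning system:

```python
from abc import ABC, abstractmethod

class GameInterface(ABC):
    """Abstracts how middleware talks to any game engine."""
    @abstractmethod
    def read_state(self) -> dict: ...
    @abstractmethod
    def send_action(self, action: str) -> None: ...

class LearnerInterface(ABC):
    """Abstracts how middleware talks to any learning system."""
    @abstractmethod
    def decide(self, processed_state: dict) -> str: ...

class StubGame(GameInterface):
    """Toy engine: one counter the agent can increment."""
    def __init__(self):
        self.counter = 0
    def read_state(self):
        return {"counter": self.counter}
    def send_action(self, action):
        if action == "inc":
            self.counter += 1

class GreedyLearner(LearnerInterface):
    def decide(self, processed_state):
        return "inc"

def run_episode(game: GameInterface, learner: LearnerInterface, steps: int) -> dict:
    # The loop depends only on the abstract interfaces, so engines
    # and learners can be swapped independently (the reuse goal above).
    for _ in range(steps):
        state = game.read_state()
        game.send_action(learner.decide(state))
    return game.read_state()
```

Swapping in a different engine or learner then requires implementing only the two small interfaces, not touching the loop.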
20
Outline (cont)
1. Objective
2. Specification of a testbed
3. Intended functionality
   • Interaction
   • Types of use
   • Open issues
4. Example of use
5. Status and Goals
21

TIELT: Testbed for Integrating and Evaluating Learning Techniques

[Architecture diagram: the TIELT User works through Editors to create five Knowledge Bases — Game Model Description, Game Interface Description, Learning Interface Description, Task Descriptions, and Evaluation Methodology Description. TIELT mediates between a Game Engine (e.g., Stratagus, FreeCiv) with its Game Player(s) and one or more Learning Systems (#1…#n), accumulating Learned Knowledge and driving Prediction, Advice, and Evaluation Displays.]
22
TIELT Knowledge Bases
• Game Model Description: Defines interpretation of the game (e.g., objects, operators, behaviors model, tasks, initial state)
• Game Interface Description: Defines communication processes with the game engine
• Learning Interface Description: Defines communication processes with the learning system
• Task Descriptions: Define the selected learning and performance tasks (selected from the game model description)
• Evaluation Methodology Description: Defines the empirical evaluation to conduct
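The five knowledge bases might be expressed as declarative data along these lines; all field names and values are invented for illustration, not TIELT's real format:

```python
# Hypothetical declarative descriptions of the five knowledge bases.
game_model_description = {
    "objects": ["pod", "city"],
    "operators": ["move", "build_city"],
    "initial_state": {"pod_at": (3, 4)},
}
game_interface_description = {
    "transport": "socket",                 # how to reach the game engine
    "messages": {"See": ["object", "x", "y"]},
}
learning_interface_description = {
    "inputs": ["processed_state"],         # what the learning system receives
    "outputs": ["decision"],               # what it returns
}
task_descriptions = [
    {"name": "city_placement",
     "performance": "place cities well",
     "learning": "improve placement from observed games"},
]
evaluation_methodology_description = {
    "trials": 10, "metric": "score", "baseline": "random placement",
}

def validate(kbs: dict) -> bool:
    """Check minimal cross-references between the knowledge bases."""
    tasks_ok = all("name" in t for t in kbs["tasks"])
    return tasks_ok and "messages" in kbs["game_interface"]
```

Keeping the descriptions declarative is what lets the editors create and store them separately from any engine or learner.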
23
Data Sources and Targeted Functionalities

Example learning functionalities supported:
1. Learning from observations (e.g., behavioral cloning)
2. Active learning
3. Learning from advice (requires inputs from user)
4. Learning to advise
5. …

Data sources:
1. Game (world) model (possibly incomplete, incorrect)
2. Simulator
   • Passive state observations (e.g., behavioral cloning)
   • Active testing (e.g., apply an action in a state)
3. Humans
   • Advice
24

Example TIELT Usage: Controlling a Game Character

[Same architecture diagram as slide 21; here the game engine’s raw state flows through TIELT as a processed state to the learning system, whose decision returns through TIELT to the game engine as an action.]
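A hedged sketch of this control path (raw state → processed state → decision → action); every name here is invented for illustration, not TIELT's API:

```python
def process_state(raw_state: dict, game_model: dict) -> dict:
    """Interpret raw engine output via the game model (abstraction step):
    keep only the features the task description cares about."""
    return {k: raw_state[k]
            for k in game_model["relevant_features"] if k in raw_state}

def control_step(raw_state: dict, game_model: dict, learner) -> dict:
    """One pass of the controlling-a-character loop:
    raw state -> processed state -> learner decision -> engine action."""
    processed = process_state(raw_state, game_model)
    decision = learner(processed)      # learning system picks a decision
    return {"action": decision}        # translated back into an engine action

# Toy learner: flee when health is low, otherwise attack.
learner = lambda s: "flee" if s["health"] < 20 else "attack"
model = {"relevant_features": ["health", "enemy_dist"]}
```

The abstraction step is what lets the same learner drive characters in different engines, provided each engine's raw state maps onto the model's features.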
25

Example TIELT Usage: Advising a Game Player

[Same architecture diagram; the game engine’s raw state flows through TIELT as a processed state to the learning system, whose decision and reason return through TIELT to the player as advice and an explanation.]
26

Example TIELT Usage: Predicting a Game Player’s Actions

[Same architecture diagram; the processed state flows to the learning system, whose prediction and reason return through TIELT as a prediction and explanation.]
27

Example TIELT Usage: Updating a Game Model

[Same architecture diagram; here the learning system edits the Game Model Description based on the processed state.]
28

Example TIELT Usage: Building a Task/Player Model

[Same architecture diagram; here the learning system builds a task/player model (stored as learned knowledge) from the processed state.]
29
Intended Use Cases

Game Developer (drawing on a repository of learning systems and learning interface descriptions):
1. Define/store game engine interface
2. Define/store game model
3. Select learning system & interface
4. Select learning and performance tasks
5. Define (or select) evaluation methodology
6. Run experiments
7. Analyze displayed results

ML Researcher (drawing on a repository of game engines and game interface descriptions):
1. Define/store learning system interface
2. Select game engine & interface
3. Select game model
4. Select learning and performance tasks
5. Define (or select) evaluation methodology
6. Run experiments
7. Analyze displayed results
30
Some Open Questions
1. Game Model:
   • What representation? STRIPS operators? Hierarchical task networks? Explicit constraints?
   • How to communicate it to the learning system?
   • Should it instead be maintained in the learning system?
2. What standards for:
   • Game engine message passing
   • Learning system message passing
   • Output format for learned knowledge
3. Support both on-line and off-line studies?
4. What representations for advice and explanations?
5. How to explicitly represent & apply constraints on learned knowledge?
6. How to evaluate TIELT’s utility?
31
Outline (cont)
1. Objective
2. Specification of a testbed
3. Intended functionality
   • Interaction
   • Types of use
   • Open issues
4. Example of use
   • Demonstration of initial GUI
   • Simple “city placement” task
5. Status and Goals
32
Outline (cont)
1. Objective
2. Specification of a testbed
3. Intended functionality
   • Interaction
   • Types of use
   • Open issues
4. Example of use
5. Status and Goals
33
Status and Goals
TIELT Specification
TIELT (Initial GUI)
Matt Molineaux
34
Status and Goals: Recent Influences

1. Full Spectrum Command (van Lent et al., 2004)
   • Multiple AI systems, one game engine
2. ORTS (Open Real-Time Strategy) project: open source RTS game engine
   • Free
   • Flexible game specification (via scripts)
   • Hack-free server-side simulation
   • Open message protocol: players have total control
   • Prefer ORTS to Stratagus?
3. Collaboration with Lehigh University (Asst. Prof. H. Muñoz-Avila)
   • Extended Hierarchical Task Network (HTN) process representation for the Game Model’s tasks?
   • Fall 2004 PhD candidate: First to integrate ML with Stratagus
   • Fall 2004 student: Will develop Game Models for us
35
Conclusion
Objective
Support the machine learning community by providing an API for a set of gaming engines, the ability to select a wide variety of learning and performance tasks, and an editor for specifying and conducting an evaluation methodology.

Status
• Started 12/03, effectively
• Initial GUI implementation
• Many open research questions

Goals
• 9/04: First complete implementation
• Incrementally integrate with game engines, learning systems
• Document & publicize for use to gain ML interest
• Subsequently, seek military/industry interest
And game-developer community? Other research communities?
36
Backup Slides
37
TIELT: Initial Vision (DARPA, 11/13/03)

Goal: Wargaming testbed for the machine learning community
– Explore learning techniques in the context of today’s latest simulations & video games
– Facilitate exploration of strategies and “what if” scenarios
– Provide common platform for evaluating different learning techniques

[Diagram: New Learning Techniques plug into a Development Environment, which connects via an API to a Video Wargaming Testbed.]

Technical Approach: Enable insertion of learning/KA techniques into state-of-the-art video combat & strategy games
– Create API for integrating learning into selected video games
  • e.g., comm. module, socket interface, client-server comms protocol & language
– Create API that enables learning in computer generated forces (CGF) tools
38
Functionality: Supervised learning using a passive dataset

Performance (Classifier):
• Task: Classification
• Interface:
  – Input: None
  – Output: Common access format (across all tasks & datasets)

Learning:
• Task: Varies (e.g., tree, weight settings)
• Interface:
  – Input: Data instance or set (common format across all tasks & systems)
  – Output: Classification decision
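This passive-dataset interface amounts to the familiar fit/predict contract; a minimal sketch, with a toy decision stump standing in for an arbitrary learner (nothing here is TIELT code):

```python
class DecisionStump:
    """Minimal learner with the passive-dataset interface described above:
    input is a data set in a common format, output is a classification."""
    def fit(self, rows):
        # rows: list of (feature_value, label); learn a single threshold
        pos = [x for x, y in rows if y == 1]
        neg = [x for x, y in rows if y == 0]
        self.threshold = (min(pos) + max(neg)) / 2
        return self
    def predict(self, x):
        # classification decision: the learner's only output
        return 1 if x >= self.threshold else 0

# A shared data format across tasks and systems is the whole integration:
data = [(0.1, 0), (0.3, 0), (0.7, 1), (0.9, 1)]
stump = DecisionStump().fit(data)
```

Because the dataset is passive, the learner never influences what data it sees — exactly the limitation the cognitive-learning API on the next slide removes.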
39
API for Cognitive Learning
Functionality: Learning by doing/being told/observation/etc.

Performance (Cognitive System):
• Task: Varied (e.g., planning, design, diagnosis, …, classification)
• Interface:
  – Input: Action
  – Output: Current state

Learning:
• Task: Varies (e.g., rule application conditions)
• Interface:
  – Input: Processed current state
  – Output: Decision
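A minimal sketch of this interactive contract (performance side: input = action, output = current state; learning side: input = processed current state, output = decision). The environment and policy are toys invented for illustration:

```python
class GridWorld:
    """Tiny interactive environment implementing the performance-side API:
    it accepts an action and returns the current state."""
    def __init__(self):
        self.pos = 0
    def step(self, action: str) -> dict:
        if action == "right":
            self.pos += 1
        elif action == "left":
            self.pos = max(0, self.pos - 1)
        return {"pos": self.pos}   # current state after the action

def learn_by_doing(env, goal: int, max_steps: int = 20) -> dict:
    """Learning-side API: given the processed current state, emit a decision.
    Here 'learning by doing' is stubbed by a trivial goal-seeking policy."""
    state = env.step("noop")       # observe the initial state
    for _ in range(max_steps):
        if state["pos"] == goal:
            break
        decision = "right" if state["pos"] < goal else "left"
        state = env.step(decision)  # act, then observe the new state
    return state
```

The key contrast with the isolated-study API is the closed loop: each decision changes what state the learner observes next.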
40
Commercial Game Roles for AI (Laird & van Lent, 2001)

Role | Focus | State of the Art | AI Needs
Tactical Enemies | Challenge human player | Cheats, scripts using FSMs, path planning, expert systems | Situation assessment, user modeling, spatial & temporal reasoning, planning, plan recognition, learning
Partners | Cooperation & coordination w/ human | Scripted responses to specific commands | Speech recognition, NLP, gesture recognition, user modeling, adaptation
Support Characters | Guide/interact with human | Canned responses | NL understanding & generation, path planning, coordination
Strategic Opponents | Develop high-level strategy, allocate resources, & issue unit-level commands | Cheating, etc. | Integrated planning, commonsense reasoning, spatial reasoning, plan recognition, resource allocation
Units | Carry out high-level commands autonomously | FSMs and path planning | Commonsense reasoning & coordination
Commentators | Observe and comment on game play | NL generation, plan recognition | NL generation, plan recognition
41
TIELT
[Architecture diagram: TIELT mediates between Game Engines (e.g., Stratagus, FreeCiv) and Learning Systems #1..#n. User-facing editors (Game Interface Editor, Game Model Editor, Learning Interface Editor, Task Editor) produce the Game Interface Description, Game Model Description, Learning Interface Description, Task Descriptions, and Evaluation Methodology/Settings. A Model Updater applies percepts from the Game Engine to the Current State; the Controller coordinates the Learning Translator (Mapper), which passes a translated model subset and learning task to the Learning Systems, and the Action Translator (Mapper), which turns learning outputs into game actions; an Evaluator drives the Evaluation Display; Stored States are kept in a Database; an Advice Display presents recommendations.]
42
TIELT
[Diagram: Game Model Editor → Game Model Description; Game Interface Editor → Game Interface Description; Game Engine → sensors → Model Updater → Current State; Model Updater → Controller]
1. Sensing the Game State
1. In the Game Engine, the game begins and the colony pod is created and placed.
2. The Game Engine sends a "See" sensor message stating where the pod is.
3. The Model Updater receives the sensor message and finds the corresponding message template in the Game Interface Description.
4. The message template provides updates to the Game Model Description, which tell the Current Model that there is a pod at the location See describes.
5. The Model Updater notifies the Controller that the See action event has occurred.
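The template-matching in steps 3 and 4 can be sketched as follows; the template syntax (a regex with named groups) and the dictionary standing in for the Game Interface Description are assumptions for illustration, not TIELT's actual formats:

```python
import re

# Hypothetical stand-in for the Game Interface Description:
# template name -> pattern whose named groups bind message fields.
GAME_INTERFACE_DESCRIPTION = {
    "See": re.compile(r"See (?P<object>\w+) at \((?P<x>\d+),(?P<y>\d+)\)"),
}

def update_model(current_model, message):
    """Match a sensor message against the templates; on a match, apply the
    bound fields to the game model and return the event name for the Controller."""
    for name, template in GAME_INTERFACE_DESCRIPTION.items():
        match = template.match(message)
        if match:
            fields = match.groupdict()
            current_model[fields["object"]] = (int(fields["x"]), int(fields["y"]))
            return name
    return None

model = {}
event = update_model(model, "See ColonyPod at (12,7)")
```

Keeping the templates in an editable description (rather than in code) is what lets the Game Interface Editor retarget TIELT to a new engine's message vocabulary.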
43
TIELT
[Diagram: Task Editor → Task Descriptions; Learning Interface Editor → Learning Interface Description; Controller → Learning Translator → Learning Systems #1..#n → Action Translator; Current State; Stored State]
2. Fetching Decisions from the Learning System
1. The Controller notifies the Learning Translator that it has received a See message.
2. The Learning Translator finds a city-location task that is triggered by the See message. It queries the Controller for the learning mode, then creates a TestInput message, with information on the pod's location and the map from the Current State, to send to the learning system.
3. The Learning Translator transmits the TestInput message to the appropriate Learning System(s).
4. The Learning System(s) transmit their output to the Action Translator.
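Steps 1 and 2 can be sketched as a lookup from triggering event to task, building a TestInput from the Current State; the task-description structure and field names below are hypothetical:

```python
# Hypothetical stand-in for the Task Descriptions: each task names the
# event that triggers it.
TASK_DESCRIPTIONS = {
    "city-location": {"trigger": "See"},
}

def build_test_input(event, current_state, mode="test"):
    """Find the task triggered by this event and assemble its TestInput
    message from the relevant pieces of the current state."""
    for task, spec in TASK_DESCRIPTIONS.items():
        if spec["trigger"] == event:
            return {
                "message": "TestInput",
                "task": task,
                "mode": mode,
                "pod_location": current_state["ColonyPod"],
                "map": current_state["map"],
            }
    return None

state = {"ColonyPod": (12, 7), "map": [[0, 0], [0, 1]]}
test_input = build_test_input("See", state)
```

The translator, not the learning system, decides which slice of the state to forward, so the learner only ever sees task-relevant inputs in a common format.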
44
TIELT
[Diagram: Game Interface Editor → Game Interface Description; Learning Interface Editor → Learning Interface Description; Action Translator → actions → Game Engine; Advice Display; Current State]
3. Acting in the Game World
1. The Action Translator receives a TestOutput message from a Learning System.
2. The Action Translator finds the TestOutput message template, determines that it is associated with the city-location task, and builds a MovePod operator (defined by the Current State) with the parameters of TestOutput.
3. The Action Translator determines that the Move action from the Game Interface Description is triggered by the MovePod operator, binds Move using information from MovePod, then sends Move to the Game Engine.
4. The Game Engine receives Move and updates the game to move the pod toward its destination, or
5. the Advice Display receives Move and displays advice to a human player on what to do next.
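The TestOutput-to-MovePod-to-Move chain in steps 2 and 3 can be sketched as two mapping stages; the dictionaries standing in for the task and Game Interface Descriptions are hypothetical:

```python
def to_operator(test_output):
    """Step 2: bind a task-level MovePod operator from TestOutput parameters."""
    return {"operator": "MovePod", "dest": test_output["dest"]}

def bind_action(operator, game_interface):
    """Step 3: find the game action triggered by this operator and bind it."""
    action_name = game_interface[operator["operator"]]
    return {"action": action_name, "dest": operator["dest"]}

# Hypothetical slice of the Game Interface Description: operator -> game action.
GAME_INTERFACE = {"MovePod": "Move"}

test_output = {"message": "TestOutput", "dest": (12, 8)}
action = bind_action(to_operator(test_output), GAME_INTERFACE)
```

The two-stage binding is the point: the learner emits task-level decisions, and only the second mapping knows the target engine's action vocabulary.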
45
TIELT
[Diagram: Task Editor → Task Descriptions; Controller → Evaluator; Evaluator reads the Current State and drives the Evaluation Display]
4. Displaying an Evaluation to the User
1. The Evaluator is triggered by the Controller according to a trigger from the Evaluation Settings.
2. The Evaluator obtains performance metrics from each Task and calculates them on the Current State.
3. The Evaluator sends the new metric values to the Evaluation Display, which updates with the new information.
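A minimal sketch of step 2, assuming each task contributes a metric as a function of the current state; the task names, metrics, and state fields are invented for illustration:

```python
# Hypothetical per-task metrics obtained from the Task Descriptions.
TASK_METRICS = {
    "city-location": lambda state: state["cities_founded"],
    "survival": lambda state: 1.0 if state["pod_alive"] else 0.0,
}

def evaluate(current_state):
    """Calculate every task's metric on the current state; the resulting
    values would be forwarded to the Evaluation Display."""
    return {task: metric(current_state) for task, metric in TASK_METRICS.items()}

values = evaluate({"cities_founded": 2, "pod_alive": True})
```

Defining metrics per task keeps the Evaluator generic: comparing alternative learning systems is then just a matter of running each against the same metric set.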
46
TIELT
[Diagram: Controller → Database Engine ↔ Database; Current State; Stored State; Learning Translator (Mapper)]
5. Retrieving States from a Database
1. When in Record mode, the Controller triggers the Database Engine when the state updates.
2. The Database Engine records the Current State in a Database for later use.
3. Later, in Playback mode, the Controller triggers the Database Engine after the Learning System indicates readiness.
4. The Database Engine then queries a Database and retrieves a Stored State.
5. Finally, the Controller notifies the Learning System that an update has arrived and to query the Stored State for message info.
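The record/playback cycle above can be sketched with an in-memory store; this illustrates the mode logic only, not TIELT's actual Database Engine:

```python
class StateDatabase:
    """Minimal stand-in for the Database Engine plus its Database."""
    def __init__(self):
        self._states = []
        self._cursor = 0

    def record(self, state):
        # Record mode: snapshot the current state as it updates.
        self._states.append(dict(state))

    def next_stored_state(self):
        # Playback mode: return stored states in recorded order.
        state = self._states[self._cursor]
        self._cursor += 1
        return state

db = StateDatabase()
db.record({"turn": 1, "pod": (12, 7)})   # Record mode: a state update arrives
db.record({"turn": 2, "pod": (12, 8)})
replayed = db.next_stored_state()        # Playback mode: learner signals readiness
```

Playback paced by learner readiness (rather than by wall-clock game speed) is what makes recorded episodes reusable for the rapid, few-trial learning studies the talk motivates.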