CASTLE: A Framework for Integrating Cognitive
Models into Virtual Environments
Sponsored by U.S. Air Force Office of Scientific Research
Dr. Art Pope, SET Corporation
Dr. Pat Langley, Arizona State University
CASTLE In a Nutshell
[Diagram: cognitive models receive sensory inputs from, and send motor controls and other actions to, a virtual environment; human performance models and data mediate this exchange; the framework tracks world states and events, maintains a world model, and connects to other virtual environment simulations.]
Why a virtual environment?
Linking Cognitive Models to Military Simulations
CASTLE Solves This Problem
Military Training Simulations
– Simulate complex, real-time 3-D worlds
– Standard interfaces available
– Both human- and machine-controlled actors
– Need better machine-controlled ones
Computational Cognitive Models
– Emulate human problem solving
– Recently have perceptual & motor modules
– Allow “embodiment” in simulated worlds
– Need suitable problem domains
Problem: Limited integration
– Special-purpose interfaces
– Little re-use across research programs
– Lab-quality software
– Human-like limits not always imposed
Some CASTLE Design Goals
Scalable
– Multiple levels of complexity and fidelity
– Multiple levels of sense / action abstraction (see the sketch below)
• E.g., visual input as pixels, surfaces or objects
• E.g., muscle control or “pick up x”
Platform and language portable
Efficient
– Participation in real-time simulations *
Accessible and extensible
– Run-time discovery of senses and actions
– Full as well as biologically-constrained access to state & events
* depending on scene complexity, required fidelity, cognitive model, computing resources…
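A minimal Java sketch of the sense-abstraction goal, assuming hypothetical interface names; nothing here is CASTLE's actual API, only an illustration of one sense exposed at three levels:

```java
import java.util.List;

// Placeholder types standing in for whatever the framework delivers.
interface Surface { }        // a visible 3-D surface patch
interface WorldObject { }    // a recognized, labeled object

// One sense, three abstraction levels: a cognitive model picks the
// level matching its fidelity requirements.
interface VisionSense {
    float[] readPixels();             // lowest level: raw image samples
    List<Surface> readSurfaces();     // middle level: visible surface patches
    List<WorldObject> readObjects();  // highest level: whole objects
}
```

A model doing low-level perception research would consume readPixels(), while one focused on planning could work directly with readObjects().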
Support for Experimentation
Import world models from diverse sources
Provide tools for monitoring and control
Log data for analysis and repeatability
What’s Hard
Simulating physics
– detecting collisions (contacts) among surfaces
– integrating motion in the presence of constraints
– these are computationally expensive and unstable
Simulating optics
– building 3-D models with rich structure and texture
– efficiently rendering shadows, penumbras, reflections, irradiance
Gaming applications are driving solutions
– GPU-accelerated physics simulation
– real-time ray tracing
With Consistency!
CASTLE Design
[Diagram: CASTLE architecture, with components annotated as follows.]
– API through which cognitive model controls agent
– Interfaces to specific simulation environments; allows distribution across machines, languages and platforms
– Renders visual input for cognitive model
– Simulates physics for cognitive model's agent
– Records world states and events
– Keeps cache of world state
– Allows software to control framework
– Allows user to control framework
CASTLE Implementation
Ice middleware
Hibernate persistence engine
PostgreSQL database
Java3D scene graph
JOODE physics engine
Portico HLA RTI
Written in Java
Based on open source components
Data modeled and documented using UML
Attaching a Cognitive Model
Cognitive model can access API in any of several languages
– C++, Java, .NET, Python, Ruby, …
– LISP via foreign function interface
Cognitive model can be on same or separate machine
– on same LAN or across internet gateways and firewalls
[Diagram: protocol between the cognitive model and the framework.]
1. Setup
– Connect to simulation
– Select actor to control
– Attach cognitive model as actor's controller
– Query for actor's characteristics
2. Execution (repeating loop)
– Cognitive model: read sensory data, cogitate, output motor commands
– Framework: update simulation state, poll other controllers
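The protocol above maps naturally onto a callback-style API. The following Java sketch shows one plausible shape for it; every interface and method name is an assumption for illustration, not the actual CASTLE API:

```java
import java.util.Map;

interface SensoryData { double[] channel(String senseName); }          // normalized inputs
interface MotorCommands { void set(String motorName, double value); }  // normalized outputs

// The cognitive model implements this; the framework calls it each tick.
interface Controller {
    void step(SensoryData senses, MotorCommands motors);  // read, cogitate, act
}

interface Actor {
    Map<String, String> characteristics();  // run-time discovery of senses and actions
    void attachController(Controller c);    // setup: attach model as controller
}

interface Simulation {
    Actor selectActor(String name);  // setup: select actor to control
    void run();  // execution: update simulation state, poll all attached controllers
}
```

Setup then amounts to connecting, selecting an actor, attaching a Controller, and querying characteristics() to discover available senses and motors before the loop starts.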
Filtering — Imposing Human Limitations
Framework imposes human-like limitations on sensory inputs, motor outputs
– E.g., visibility, accuracy, speed, repeatability
Based on psychophysical and human performance data
Applied through mechanism accessible to experimentalists
Example: Human Visual Performance
Field of view
– Monocular: 160° (w) x 135° (h)
– Binocular overlap: 120° (w) x 135° (h)
Resolution
– 60 cycles/deg. to 2.5 cycles/deg. (depends on eccentricity, contrast)
Dynamic range
– 10² : 1 in a single “exposure”
– 10⁵ : 1 in a single scene
– 10⁹ : 1 with 30 minutes adaptation
Temporal resolution
– Max. flicker frequency: 50-60 Hz (depends on contrast, wavelength, extent)
– Min. temporal separation: 15-20 ms
Movement
– Tremor: 10-30 arc-seconds; up to 80 Hz
– Drift: 1-5 arc-minutes
– Microsaccades: 2-120 arc-minutes in 10-20 ms
– Saccades: up to 1000°/s for 20-200 ms
Example: Human Auditory Performance
Frequency response: 16 - 20,000 Hz
Sensitivity: 130 dB range (worked check below)
– Min.: 10⁻¹² W/m²
– Max.: 10 W/m²
– Temporary Threshold Shift (TTS): reduced sensitivity after exposure to >70 dB(A)
Localization accuracy
– Front: 2° horizontal, 3.5° vertical
– Peripheral: 20°
Masking effects
– Frequency or simultaneous masking: inability to distinguish simultaneous sounds
– Temporal masking: offset attenuation lasting 50 ms
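A quick consistency check of those sensitivity figures (standard acoustics arithmetic, not from the slides): taking the threshold of hearing $I_0 = 10^{-12}\ \mathrm{W/m^2}$ as reference, the stated maximum of $10\ \mathrm{W/m^2}$ spans exactly the quoted 130 dB:

$$
L = 10\,\log_{10}\frac{I_{\max}}{I_0}
  = 10\,\log_{10}\frac{10\ \mathrm{W/m^2}}{10^{-12}\ \mathrm{W/m^2}}
  = 10 \times 13 = 130\ \mathrm{dB}.
$$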
Filtering — Summary
No. | Name | Description
H04 | Front sound localization | A human can localize sounds in front to within 2° horizontal, 3.5° vertical
H05 | Peripheral sound localization | A human can localize sounds to the side to within 20°
H06 | Sound masking | Loud sounds mask a human's perception (detection) of quiet ones
H07 | Sound dynamic range | A human perceives sounds over a dynamic range of 130 dB
H08 | Distant sounds | A human is given perceptual cues as to the distance of sound sources
H09 | Sound characterization | A human can distinguish various categories of sounds, such as explosions, vehicles, and contact noises
S04 | Field of view | A human eye has a field of view of 160° x 135°
S05 | Foveal resolution | Foveal resolution of a human eye is 60 cycles/deg
S06 | Parafoveal resolution | Parafoveal resolution of a human eye is 15 cycles/deg
S07 | Peripheral resolution | Peripheral resolution of a human eye is 2.5 cycles/deg
[Diagram: the human performance “spec sheet” supplies models & data to the CASTLE framework; filtering algorithms are expressed in Java, their parameters in XML.]
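To make the Java-algorithms / XML-parameters split concrete, here is a hedged sketch of how one spec-sheet family (S05-S07, visual resolution) might be realized as a filter. The class name, the interpolation scheme, and the eccentricity knots (5° and 40°) are assumptions; only the three resolution values come from the spec sheet above:

```java
// Resolution limit as a function of retinal eccentricity, linearly
// interpolated between spec-sheet values (cycles/deg). In CASTLE the
// numeric parameters would be loaded from XML rather than hard-coded.
final class AcuityFilter {
    private static final double FOVEAL = 60.0;      // S05, at 0° (fovea)
    private static final double PARAFOVEAL = 15.0;  // S06, assumed knot at 5°
    private static final double PERIPHERAL = 2.5;   // S07, assumed knot at 40°

    double resolutionAt(double eccentricityDeg) {
        double e = Math.abs(eccentricityDeg);
        if (e <= 5.0)  return FOVEAL + (PARAFOVEAL - FOVEAL) * (e / 5.0);
        if (e <= 40.0) return PARAFOVEAL + (PERIPHERAL - PARAFOVEAL) * ((e - 5.0) / 35.0);
        return PERIPHERAL;
    }
}
```

The framework could then blur or subsample rendered pixels so that no detail finer than resolutionAt(e) survives at eccentricity e.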
Configurable Senses and Motors
Vision sense
– any number of “eyes”
– each with specified field-of-view, resolution, saccade muscles
– delivers images and/or sets of visible objects
Kinesthetic sense
– delivers approximate position and force of each joint
Tactile sense
– entire visible surface is touch sensitive
– 3-D “skin” surface mapped to 2-D touch position manifold
– resolution and sensitivity interpolated from discrete nodes
– delivers approximate position and pressure, with stochastic error
Time sense
– delivers approximate elapsed time, with stochastic error
Motors / muscles
– control hinged joints and wheels
– each has maximum velocity, acceleration, torque
– cognitive model can command position, velocity, or torque
– commanded values are stochastically perturbed
No cheating: All cognitive model inputs and outputs are normalized, unit-less quantities with stochastic noise.
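A minimal sketch of that “no cheating” rule for one motor, assuming Gaussian perturbation; the noise model and class names are assumptions, not CASTLE's actual implementation:

```java
import java.util.Random;

// A commanded value is clamped to the normalized range and stochastically
// perturbed before the motor's physical limit is applied.
final class NoisyMotor {
    private final double maxVelocity;  // motor's configured physical limit
    private final double noiseStdDev;  // drawn from human performance data
    private final Random rng = new Random();

    NoisyMotor(double maxVelocity, double noiseStdDev) {
        this.maxVelocity = maxVelocity;
        this.noiseStdDev = noiseStdDev;
    }

    /** Accepts a normalized, unit-less command in [-1, 1]; returns the velocity applied. */
    double commandVelocity(double normalized) {
        double clamped = Math.max(-1.0, Math.min(1.0, normalized));
        double perturbed = clamped + rng.nextGaussian() * noiseStdDev;
        return perturbed * maxVelocity;
    }
}
```

The same pattern (clamp, perturb, scale) would apply to position and torque commands.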
Example: Simple Humanoid Robot
[Figure: a simple humanoid robot featuring monocular vision, tactile sense on all surfaces, a stable wheeled base with pivot steering, neck rotation, shoulder & elbow rotation, waist rotation, proprioception in all joints, and a temporal sense.]
Example: Outdoor Urban Environment
Current Status of CASTLE Framework
Senses implemented:
– Vision: both pixel and visible surface representations
– Tactile: contact location and force
– Kinesthetic: joint angles and forces
– Time
Actions implemented:
– Position, velocity and force control of muscles / motors on hinged and rotary joints
Simulation interfaces implemented:
– Interface to simulation environments via High-Level Architecture (HLA)
– Testing with Delta3D simulation environment
Cognitive architecture interfaces implemented:
– Interface to Icarus via Common Lisp foreign function interface
Virtual environments:
– Indoor world of rooms, furniture and objects
– Outdoor world of roads, buildings and vehicles
Free, open source for educational and research use