CASTLE: A Framework for Integrating Cognitive
Models into Virtual Environments
Sponsored by U.S. Air Force Office of Scientific Research
Dr. Art Pope, SET Corporation
Dr. Pat Langley, Arizona State University
CASTLE In a Nutshell
[Diagram: cognitive models receive sensory inputs from, and send motor controls and other actions to, a virtual environment; human performance models and data mediate this exchange; the framework tracks world states and events, maintains a world model, and connects to other virtual environment simulations.]
Why a virtual environment?
Linking Cognitive Models to Military Simulations
CASTLE Solves This Problem
Military Training Simulations
– Simulate complex, real-time 3-D worlds
– Standard interfaces available
– Both human- and machine-controlled actors
– Need better machine-controlled ones
Computational Cognitive Models
– Emulate human problem solving
– Recently have perceptual & motor modules
– Allow “embodiment” in simulated worlds
– Need suitable problem domains
Problem: Limited integration
– Special-purpose interfaces
– Little re-use across research programs
– Lab-quality software
– Human-like limits not always imposed
Some CASTLE Design Goals
Scalable
– Multiple levels of complexity and fidelity
– Multiple levels of sense / action abstraction (see the sketch below)
• E.g., visual input as pixels, surfaces or objects
• E.g., muscle control or “pick up x”
Platform and language portable
Efficient
– Participation in real-time simulations *
Accessible and extensible
– Run-time discovery of senses and actions
– Full as well as biologically-constrained access to state & events
* depending on scene complexity, required fidelity, cognitive model, computing resources…
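A minimal Java sketch of the sense-abstraction goal, assuming hypothetical interface names; nothing here is CASTLE's actual API, only an illustration of one sense exposed at three levels:

```java
import java.util.List;

// Placeholder types standing in for whatever the framework delivers.
interface Surface { }        // a visible 3-D surface patch
interface WorldObject { }    // a recognized, labeled object

// One sense, three abstraction levels: a cognitive model picks the
// level matching its fidelity requirements.
interface VisionSense {
    float[] readPixels();             // lowest level: raw image samples
    List<Surface> readSurfaces();     // middle level: visible surface patches
    List<WorldObject> readObjects();  // highest level: whole objects
}
```

A model doing low-level perception research would consume readPixels(), while one focused on planning could work directly with readObjects().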
Support for Experimentation
Import world models from diverse sources
Provide tools for monitoring and control
Log data for analysis and repeatability
What’s Hard
Simulating physics
– detecting collisions (contacts) among surfaces
– integrating motion in the presence of constraints
– these are computationally expensive and unstable
Simulating optics
– building 3-D models with rich structure and texture
– efficiently rendering shadows, penumbras, reflections, irradiance
Gaming applications are driving solutions
– GPU-accelerated physics simulation
– real-time ray tracing
With Consistency!
CASTLE Design
[Diagram: CASTLE architecture, with components annotated as follows.]
– API through which cognitive model controls agent
– Interfaces to specific simulation environments; allows distribution across machines, languages and platforms
– Renders visual input for cognitive model
– Simulates physics for cognitive model's agent
– Records world states and events
– Keeps cache of world state
– Allows software to control framework
– Allows user to control framework
CASTLE Implementation
Ice middleware
Hibernate persistence engine
PostgreSQL database
Java3D scene graph
JOODE physics engine
Portico HLA RTI
Written in Java
Based on open source components
Data modeled and documented using UML
Attaching a Cognitive Model
Cognitive model can access API in any of several languages
– C++, Java, .NET, Python, Ruby, …
– LISP via foreign function interface
Cognitive model can be on same or separate machine
– on same LAN or across internet gateways and firewalls
[Diagram: protocol between the cognitive model and the framework.]
1. Setup
– Connect to simulation
– Select actor to control
– Attach cognitive model as actor's controller
– Query for actor's characteristics
2. Execution (repeating loop)
– Cognitive model: read sensory data, cogitate, output motor commands
– Framework: update simulation state, poll other controllers
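The protocol above maps naturally onto a callback-style API. The following Java sketch shows one plausible shape for it; every interface and method name is an assumption for illustration, not the actual CASTLE API:

```java
import java.util.Map;

interface SensoryData { double[] channel(String senseName); }          // normalized inputs
interface MotorCommands { void set(String motorName, double value); }  // normalized outputs

// The cognitive model implements this; the framework calls it each tick.
interface Controller {
    void step(SensoryData senses, MotorCommands motors);  // read, cogitate, act
}

interface Actor {
    Map<String, String> characteristics();  // run-time discovery of senses and actions
    void attachController(Controller c);    // setup: attach model as controller
}

interface Simulation {
    Actor selectActor(String name);  // setup: select actor to control
    void run();  // execution: update simulation state, poll all attached controllers
}
```

Setup then amounts to connecting, selecting an actor, attaching a Controller, and querying characteristics() to discover available senses and motors before the loop starts.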
Filtering — Imposing Human Limitations
Framework imposes human-like limitations on sensory inputs, motor outputs
– E.g., visibility, accuracy, speed, repeatability
Based on psychophysical and human performance data
Applied through mechanism accessible to experimentalists
Example: Human Visual Performance
Field of view
– Monocular: 160° (w) x 135° (h)
– Binocular overlap: 120° (w) x 135° (h)
Resolution
– 60 cycles/deg. to 2.5 cycles/deg. (depends on eccentricity, contrast)
Dynamic range
– 10² : 1 in a single “exposure”
– 10⁵ : 1 in a single scene
– 10⁹ : 1 with 30 minutes adaptation
Temporal resolution
– Max. flicker frequency: 50-60 Hz (depends on contrast, wavelength, extent)
– Min. temporal separation: 15-20 ms
Movement
– Tremor: 10-30 arc-seconds; up to 80 Hz
– Drift: 1-5 arc-minutes
– Microsaccades: 2-120 arc-minutes in 10-20 ms
– Saccades: up to 1000°/s for 20-200 ms
Example: Human Auditory Performance
Frequency response: 16 - 20,000 Hz
Sensitivity: 130 dB range (worked check below)
– Min.: 10⁻¹² W/m²
– Max.: 10 W/m²
– Temporary Threshold Shift (TTS): reduced sensitivity after exposure to >70 dB(A)
Localization accuracy
– Front: 2° horizontal, 3.5° vertical
– Peripheral: 20°
Masking effects
– Frequency or simultaneous masking: inability to distinguish simultaneous sounds
– Temporal masking: offset attenuation lasting 50 ms
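A quick consistency check of those sensitivity figures (standard acoustics arithmetic, not from the slides): taking the threshold of hearing $I_0 = 10^{-12}\ \mathrm{W/m^2}$ as reference, the stated maximum of $10\ \mathrm{W/m^2}$ spans exactly the quoted 130 dB:

$$
L = 10\,\log_{10}\frac{I_{\max}}{I_0}
  = 10\,\log_{10}\frac{10\ \mathrm{W/m^2}}{10^{-12}\ \mathrm{W/m^2}}
  = 10 \times 13 = 130\ \mathrm{dB}.
$$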
Filtering — Summary
No. | Name | Description
H04 | Front sound localization | A human can localize sounds in front to within 2° horizontal, 3.5° vertical
H05 | Peripheral sound localization | A human can localize sounds to the side to within 20°
H06 | Sound masking | Loud sounds mask a human's perception (detection) of quiet ones
H07 | Sound dynamic range | A human perceives sounds over a dynamic range of 130 dB
H08 | Distant sounds | A human is given perceptual cues as to the distance of sound sources
H09 | Sound characterization | A human can distinguish various categories of sounds, such as explosions, vehicles, and contact noises
S04 | Field of view | A human eye has a field of view of 160° x 135°
S05 | Foveal resolution | Foveal resolution of a human eye is 60 cycles/deg
S06 | Parafoveal resolution | Parafoveal resolution of a human eye is 15 cycles/deg
S07 | Peripheral resolution | Peripheral resolution of a human eye is 2.5 cycles/deg
[Diagram: the human performance “spec sheet” supplies models & data to the CASTLE framework; filtering algorithms are expressed in Java, their parameters in XML.]
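To make the Java-algorithms / XML-parameters split concrete, here is a hedged sketch of how one spec-sheet family (S05-S07, visual resolution) might be realized as a filter. The class name, the interpolation scheme, and the eccentricity knots (5° and 40°) are assumptions; only the three resolution values come from the spec sheet above:

```java
// Resolution limit as a function of retinal eccentricity, linearly
// interpolated between spec-sheet values (cycles/deg). In CASTLE the
// numeric parameters would be loaded from XML rather than hard-coded.
final class AcuityFilter {
    private static final double FOVEAL = 60.0;      // S05, at 0° (fovea)
    private static final double PARAFOVEAL = 15.0;  // S06, assumed knot at 5°
    private static final double PERIPHERAL = 2.5;   // S07, assumed knot at 40°

    double resolutionAt(double eccentricityDeg) {
        double e = Math.abs(eccentricityDeg);
        if (e <= 5.0)  return FOVEAL + (PARAFOVEAL - FOVEAL) * (e / 5.0);
        if (e <= 40.0) return PARAFOVEAL + (PERIPHERAL - PARAFOVEAL) * ((e - 5.0) / 35.0);
        return PERIPHERAL;
    }
}
```

The framework could then blur or subsample rendered pixels so that no detail finer than resolutionAt(e) survives at eccentricity e.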
Configurable Senses and Motors
Vision sense
– any number of “eyes”
– each with specified field-of-view, resolution, saccade muscles
– delivers images and/or sets of visible objects
Kinesthetic sense
– delivers approximate position and force of each joint
Tactile sense
– entire visible surface is touch sensitive
– 3-D “skin” surface mapped to 2-D touch position manifold
– resolution and sensitivity interpolated from discrete nodes
– delivers approximate position and pressure, with stochastic error
Time sense
– delivers approximate elapsed time, with stochastic error
Motors / muscles
– control hinged joints and wheels
– each has maximum velocity, acceleration, torque
– cognitive model can command position, velocity, or torque
– commanded values are stochastically perturbed
No cheating: All cognitive model inputs and outputs are normalized, unit-less quantities with stochastic noise.
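A minimal sketch of that “no cheating” rule for one motor, assuming Gaussian perturbation; the noise model and class names are assumptions, not CASTLE's actual implementation:

```java
import java.util.Random;

// A commanded value is clamped to the normalized range and stochastically
// perturbed before the motor's physical limit is applied.
final class NoisyMotor {
    private final double maxVelocity;  // motor's configured physical limit
    private final double noiseStdDev;  // drawn from human performance data
    private final Random rng = new Random();

    NoisyMotor(double maxVelocity, double noiseStdDev) {
        this.maxVelocity = maxVelocity;
        this.noiseStdDev = noiseStdDev;
    }

    /** Accepts a normalized, unit-less command in [-1, 1]; returns the velocity applied. */
    double commandVelocity(double normalized) {
        double clamped = Math.max(-1.0, Math.min(1.0, normalized));
        double perturbed = clamped + rng.nextGaussian() * noiseStdDev;
        return perturbed * maxVelocity;
    }
}
```

The same pattern (clamp, perturb, scale) would apply to position and torque commands.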
Example: Simple Humanoid Robot
[Figure: a simple humanoid robot featuring monocular vision, tactile sense on all surfaces, a stable wheeled base with pivot steering, neck rotation, shoulder & elbow rotation, waist rotation, proprioception in all joints, and a temporal sense.]
Example: Outdoor Urban Environment
Current Status of CASTLE Framework
Senses implemented:
– Vision: both pixel and visible surface representations
– Tactile: contact location and force
– Kinesthetic: joint angles and forces
– Time
Actions implemented:
– Position, velocity and force control of muscles / motors on hinged and rotary joints
Simulation interfaces implemented:
– Interface to simulation environments via High-Level Architecture (HLA)
– Testing with Delta3D simulation environment
Cognitive architecture interfaces implemented:
– Interface to Icarus via Common Lisp foreign function interface
Virtual environments:
– Indoor world of rooms, furniture and objects
– Outdoor world of roads, buildings and vehicles
Free, open source for educational and research use