
Page 1

Frank E. Ritter, Urmila Kukreja, Robert St. Amant

Including a Model of Visual Processing With a Cognitive Architecture to Model a Simple Teleoperation Task

Journal of Cognitive Engineering and Decision Making, 1(2), 2007.

Page 2: Contents

Introduction
More Direct Visual Processing for a Cognitive Architecture
Exploring This Approach in an Example HRI Task
Results
Conclusion

Page 3: Introduction (1/5)

Although HCI evaluation techniques are robust and may have clear application to HRI, HRI poses more difficult problems for evaluation. HRI covers a wider range of tasks and environments.

Yanco et al. (2004) identified three classes of evaluation methods common in HCI that might be used in HRI:
- Inspection methods carried out by interface design experts
- Empirical methods based on observation and testing with users
- Formal methods based on analytical models of interaction

There are other differences between HCI and HRI that make some of the evaluation techniques less applicable. Scholtz (2003) reviewed these differences along six dimensions.

The focus of this article is on a technique for HRI evaluation based on detailed cognitive modeling.

Page 4: Introduction (2/5)

Proponents of cognitive modeling can point to a number of potential results that make pursuit of the approach worthwhile.

- CMs provide accounts of user behavior that generalize across a variety of environments and tasks: both user and model can perform the tasks in those environments.
- CMs can offer such predictions and explanations at a more detailed level than is often possible with other techniques, indicating what cognitive, perceptual, and motor mechanisms are being used.
- CMs offer new opportunities for experimental evaluation of interactive systems, at lower cost than testing with real users.
- CMs are dynamic. Interactions may be difficult to specify and to analyze with a static description.

An important first challenge is determining whether it is feasible to build low-level cognitive models that can carry out HRI tasks as surrogate users working directly with interfaces.

Page 5: Introduction (3/5)

Cognitive Models for Evaluating Interactive Systems
A typical user interface evaluation process using a cognitive model:
1) Pilot studies and literature review
2) Developing a cognitive model for one or more tasks
3) Designing an experiment
4) Collecting data for comparison between human and model performance
5) Validating and improving the cognitive model

Page 6: Introduction (4/5)

Implications for User Modeling in HRI
Some of the requirements for interaction with a robot control task (the user must):
- Carry out a repetitive task in a changing environment.
- Interpret scenes captured by one or more cameras on a robot.
- Deal with unfamiliar classes of information from novel sensors.
- Integrate information from a physical and a virtual environment.
- Often provide high-level guidance to a robot.
- Control multiple robots at the same time.

Scholtz et al. (2004) described six dimensions in which HRI differs from HCI:
- A wider variety of roles that users and operators play in controlling a robot.
- The less predictable ways in which a robot platform interacts with the physical environment.
- The dynamic behavior of the hardware (leading to autonomous changes).
- The nature of the user's interaction environment.
- The number of robots that the user controls.
- The autonomous behavior of the robot.

Page 7: Introduction (5/5)

Implications for User Modeling in HRI
The user in an HRI environment must potentially deal with several coupled problem-solving environments:
- Decision making in the environment (partially observable, dynamic, continuous).
- Monitoring of robot hardware and robot behaviors in the environment.
- Interacting with an HRI control interface.

In the environmental features noted previously, there are two recurring issues:
- The need to interpret a complex information environment (with most of the data provided by visual channels), and
- The need to manage complexity in controlling the behavior of robots.

Page 8: More Direct Visual Processing for a Cognitive Architecture (1/5)

When evaluating a computer system, either in an HCI or an HRI context, a cognitive model needs access to the information that a user sees.

An open issue is how environmental information can best be translated into visual objects.

ACT-R models typically receive information about a visual environment through specialized functions that interact with specially instrumented interface windows. These windows are built into a tool that is included with ACT-R to create simulated task environments.

We worked toward extending the visual-processing capabilities of a cognitive modeling architecture to use the environment without modifications.
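To make the contrast concrete, here is a minimal, hypothetical sketch (not SegMan's or ACT-R's actual API) of the alternative route: instead of reading visual objects out of an instrumented window, perception starts from a capture of the screen bitmap itself. The sketch assumes the Pillow library is available for the screen grab; the function name is illustrative.

```python
# Illustrative sketch only: perception fed from the live screen bitmap rather
# than from an instrumented interface window. Assumes Pillow (PIL) is installed.
from PIL import ImageGrab


def capture_visual_scene(bbox=None):
    """Grab the current screen (or the sub-rectangle bbox = (left, top, right,
    bottom)) and return a pixel accessor plus the image size, as raw material
    for later segmentation into visual objects."""
    image = ImageGrab.grab(bbox=bbox)
    return image.load(), image.size


if __name__ == "__main__":
    pixels, (width, height) = capture_visual_scene()
    print(f"Captured a {width}x{height} bitmap; top-left pixel = {pixels[0, 0]}")
```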

Page 9: More Direct Visual Processing for a Cognitive Architecture (2/5)

SegMan (Segmentation/Manipulation) Overview
SegMan is an image-processing substrate (a cognitive modeling component). It extends ACT-R's (and other cognitive architectures') visual processing to work with interfaces by parsing their bitmaps.

SegMan serves as an intermediary between an environment (visual scene) and the cognitive modeling system (ACT-R). (Source: St. Amant et al., 2005)

Page 10: More Direct Visual Processing for a Cognitive Architecture (3/5)

SegMan represents three types of visual properties: properties of pixel regions, relationships between pixel regions, and composite properties of pixel groups.

Figure 1 illustrates four pixel regions. (Figure 1. A schematic of four pixel regions, with annotations.)

SegMan summarizes a pixel's neighbors with the neighbor-value function; in the example shown:

v(p) = 2^2 + 2^4 + 2^6 = 84
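The exponents in this sum suggest a one-bit-per-neighbor encoding: each of the eight neighbors of p gets an index i from 0 to 7 and contributes 2^i when it matches. The sketch below illustrates that reading; the specific neighbor ordering and the matching test (same color as the center pixel) are assumptions for illustration, not SegMan's documented convention.

```python
# Sketch of the neighbor-value encoding implied by v(p) = 2^2 + 2^4 + 2^6 = 84.
# Assumption: neighbors of p are indexed 0-7 (the order below is arbitrary) and
# neighbor i contributes 2**i when its color matches the color at p.

NEIGHBOR_OFFSETS = [(1, 0), (1, 1), (0, 1), (-1, 1),
                    (-1, 0), (-1, -1), (0, -1), (1, -1)]


def neighbor_value(pixels, x, y):
    """Sum 2**i over the neighbors i of (x, y) that share (x, y)'s color."""
    center = pixels[(x, y)]
    value = 0
    for i, (dx, dy) in enumerate(NEIGHBOR_OFFSETS):
        if pixels.get((x + dx, y + dy)) == center:
            value += 2 ** i
    return value


# Matching neighbors at indices 2, 4, and 6 give 4 + 16 + 64 = 84, as on the slide.
example = {(0, 0): 1, (0, 1): 1, (-1, 0): 1, (0, -1): 1}
assert neighbor_value(example, 0, 0) == 84
```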

Page 11: More Direct Visual Processing for a Cognitive Architecture (4/5)

SegMan Operation
SegMan generates and maintains its representation in three stages:
1) Segmentation: sampling pixels at different resolutions.
2) Feature computation: properties are computed for each pixel region, regions are combined into pixel groups, and group properties are computed.
3) Interpretation: a top-down matching process against library templates.

SegMan's current design is strongly influenced by Roelfsema's (2005) theory of visual processing. Roelfsema (2005) identified two main classes of elemental operators:
- Binding operators: cuing, searching, tracing, region filling, and association.
- Maintenance (matching) operators: record intermediate results for further processing.

Cuing and Searching
- Cuing: identification of properties of the visual pattern at a specified location.
- Search: determining the location of a predefined visual pattern.
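As an illustration of the first two binding operators, here is a hedged Python sketch of cuing and searching over a pixel grid stored as a dict mapping (x, y) to a color value. The names and the exact property set are invented for illustration and are not SegMan's API.

```python
def cue(pixels, x, y):
    """Cuing: report properties of the visual pattern at a specified location."""
    color = pixels.get((x, y))
    same_neighbors = sum(
        1
        for dx in (-1, 0, 1) for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0) and pixels.get((x + dx, y + dy)) == color
    )
    return {"color": color, "matching_neighbors": same_neighbors}


def search(pixels, width, height, pattern):
    """Search: return the first (x, y) whose neighborhood matches `pattern`,
    where `pattern` maps relative offsets (dx, dy) to required color values."""
    for y in range(height):
        for x in range(width):
            if all(pixels.get((x + dx, y + dy)) == color
                   for (dx, dy), color in pattern.items()):
                return (x, y)
    return None  # the pattern is not present in the scene
```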

Page 12: More Direct Visual Processing for a Cognitive Architecture (5/5)

Tracing and Region Filling
- Tracing: iteratively identifying connectivity properties between simple visual elements.
- Region filling: can be viewed as a generalization of curve tracing to two dimensions, identifying areas to be characterized as single visual objects (a sketch follows this slide).

Association
- The linkage of features that co-occur repeatedly.

Matching
- The process of identifying similarities (or differences) between stimuli.

Tracking moving objects
- SegMan reprocesses information iteratively, and it maintains a representation of its immediately preceding results.
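Region filling lends itself to a compact sketch: starting from a seed pixel, collect the connected set of same-colored pixels so the whole area can be treated as a single visual object. This is a generic flood fill over the same dict-of-pixels representation as above, not SegMan's implementation.

```python
from collections import deque


def fill_region(pixels, seed):
    """Return the set of 4-connected coordinates that share the seed's color."""
    color = pixels.get(seed)
    region, frontier = {seed}, deque([seed])
    while frontier:
        x, y = frontier.popleft()
        for neighbor in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if neighbor not in region and pixels.get(neighbor) == color:
                region.add(neighbor)
                frontier.append(neighbor)
    return region
```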

Page 13: Exploring This Approach in an Example HRI Task (1/2)

SegMan can be evaluated in two ways.
- First, can it perform the task of interest using psychologically plausible mechanisms? We have used SegMan in combination with ACT-R to model a variety of tasks.
- Second, is the system's performance adequate for dynamic tasks?

We carried out both an evaluative and a functional test of this combined architecture of SegMan and ACT-R as part of a larger HRI study.

HRI User Model
- The model performs the same driving task as humans do.
- The model uses a pseudo-fovea that is focused on the camera view window.
- A pixel region template was defined to identify the path as a visual object in the camera view.
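A hypothetical sketch of the idea behind this part of the model: crop a pseudo-fovea from the camera-view window, mark the pixels that match a simple color template for the path, and report where the path lies relative to the fovea center so a driving decision can follow. The template color, tolerance, and function names are invented for illustration; the paper's actual template is not reproduced here.

```python
PATH_COLOR = (200, 200, 200)  # assumed light-gray path color (illustrative)
TOLERANCE = 30                # per-channel color tolerance (illustrative)


def matches_path(rgb):
    """Does one (r, g, b) pixel fall within the path template's color band?"""
    return all(abs(c - t) <= TOLERANCE for c, t in zip(rgb, PATH_COLOR))


def path_offset(fovea_rows):
    """Horizontal offset of the path's centroid from the fovea center.

    `fovea_rows` is a list of rows cropped from the camera-view window, each a
    list of (r, g, b) pixels. Negative means the path lies to the left; None
    means the path is not visible and the model must fall back to search.
    """
    xs = [x for row in fovea_rows for x, rgb in enumerate(row) if matches_path(rgb)]
    if not xs:
        return None
    width = len(fovea_rows[0])
    return sum(xs) / len(xs) - width / 2
```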

Page 14: Exploring This Approach in an Example HRI Task (2/2)

HRI Study
- Participants: 30 students.
- Users were divided into 3 groups: visible, previously seen, and unseen conditions.
- Task: navigate the robot to pick up the cup and return, and program it to do two tasks with its visual programming language.

Figure 2. The ER1 Robot System.
Figure 3. The HRI interface that users saw when driving the ER1 robot remotely.
Figure 4. The physical task environment for the human users.

Page 15: Results (1/2)

Figure 5. Comparison of the model's predicted task time with human performance.
Figure 6. Comparison of learning on the predicted and actual task time for the model and three conditions.
Figure 7. The number of mouse clicks by the human drivers in each group and by the model to navigate the course and pick up the cup.
Figure 8. Average mouse click duration for human drivers in each group and by the model.

Page 16: Results (2/2)

John and Newell (1989) recommend using models as reasonable approximations of human performance if outputs do not deviate by more than 20% from observed data.
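A minimal sketch of applying that criterion, assuming a simple relative-error check against the observed human value (names are illustrative):

```python
def within_criterion(predicted, observed, tolerance=0.20):
    """True when the prediction deviates from the observed value by at most 20%."""
    return abs(predicted - observed) <= tolerance * abs(observed)


# Example: a predicted task time of 54 s against an observed 60 s deviates by
# 10%, so it would count as a reasonable approximation under this criterion.
assert within_criterion(54.0, 60.0)
```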

This model does not account for the differences between the conditions in the human navigation experiment. We used the different conditions to explore performance in this task. The model most closely approximates users who are in the seen condition.

Limitations on the model as a representation of human performance:
- Model development and veridicality: The model was not tailored to the differences between the previously seen, unseen, and visible conditions for users.
- Environmental constraints: The navigation path, without obstacles, was basically static and deterministic.
- Task constraints: The navigation task was a very simple form of navigation.
- Model constraints: The model resorted to random search when the path was no longer visible.

Page 17: Conclusion

Recommendations for SegMan and ACT-R
- The use of SegMan should be seen as an extension of ACT-R to support direct interaction with existing interfaces.
- One modification to SegMan was necessary to support interacting with this more complex task.

Recommendations for HRI Interface Design
- This study has not yet progressed to the point that it is possible to identify interaction problems that cannot be identified by more conventional evaluation methods, such as inspection or simple user testing.

Limitations, Implications, and Future Work
- More tasks will have to be modeled for evaluation on a routine basis.
- Because SegMan interacts with hardware, time predictions are more difficult to record accurately: the model's timing has to be synchronized with and interact with the robot platform, and it has to run in real time.

Page 18: Roelfsema's Elemental Operators in Vision

Binding operators
- Binding operators establish groupings among visual objects that are not computed in early visual processing.

Maintenance operators
- It is not enough to select an object; the observer must be able to maintain the object in memory for future cognitive manipulations.