vincze cognitive vision fermo2006 - univpmpsfmr.univpm.it/slide/vincze_cognitive_vision.pdf ·...

Cognitive Vision

Markus Vincze

Automation Control Institute

Vienna University of Technology

vincze@acin.tuwien.ac.at

www.acin.tuwien.ac.at

PSFMR – Fermo, 11.-16.9.2006

Idea of Today

• Overview of Cognitive Vision Methodology

• Scratch at cognitive science and cognitive systems

• Open your view to other disciplines

• Point out many open problems that are simply awaiting a good student for resolution

Content

• Overview

• Tracking

• Detection

• Cognitive Vision

– Vision systems

– Integration

– Computer Vision Cognitive Vision ...

– Cognitive Systems

Ideas, Drives, Future(s)

• Ambient Intelligence

• “Natural” computer interfaces (e.g., MIT, MS)

• Japan: developmental (humanoid) robotics

Figure labels: STARTREK, Odyssey 2001, MIT icom, MS EasyLiving

Personal Assistance

• Support user by being aware of situation

• Distributed mobile and ambient devices

Example situations:

• Information assistance, guidance to location, assembly help

• Alerting of dangerous situations

Personal Assistant

• User guidance to

– operate a machine (e.g., copy machine, video/CD-player)

– assemble objects (e.g., furniture, machine maintenance)

• Exploit Augmented Reality to display information

• On-line interpretation to aid user

Personal Assistance – Ingredients

Capabilities

• Detection, tracking, recognition, spatio-temporal reasoning, ...

• Interpret human intentions before acting

• Personalised behaviour

Austrian CV Project: understand and react to situations

Robot Helper

„James, please bring me my cup“

Capabilities

• Navigate, avoid obstacles

• Detect & recognise objects

• Grasp objects

• Interact with user

• Cope with new situations

• Dependable and safe behaviour

Vision for Interaction

Capabilities

• Robust detection, tracking

• Object and gesture recognition

• Spatio-temporal object relationships (3D)

• Interpretation, understanding

ActIPret: interpretation of humans who handle objects

Cognitive Vision Components (EU project ActIPret)

Object recognition (CMP)

Robust object detection

Stereo hand tracking (FORTH)

Object tracking

Cognitive Vision Components (EU project ActIPret)

Hand gesture recognition (COGS)

Spatio-temporal object reasoning (in 3D)

Semantic interpretation: 'Hand 0 picked up object cd-linux-0'

GUI – Graphical User Interface

On-line display of 3D results (trajectories, recognition and interpretation results)

'Hand 0 pressed button ejectButton-2'

'Hand 0 picked up object cd-linux-0'

Stereo observation

Off-line VR replay of activity

EU project ActIPret

MOVEMENT

• Movement of

– Persons, objects, information

• Stereo vision for navigation

– Segment floor

– Obstacle detection

– Detection of tables and chairs

Figure labels: person, information, object

MOVEMENT – EU IST Project 2004-2007

Vision Capabilities

• Vision can provide many capabilities

• Vision itself has many capabilities (redundancies)

– Temporal redundancy

– Stereo, many views

– Many cues per image

– Vast number of features

– Multiple representations

• Integrate with other system functions
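One common way to exploit redundant cues is to fuse several independent estimates of the same quantity, weighting each by its reliability. The sketch below is not taken from the slides: it assumes each cue reports a value with a variance and that errors are independent and Gaussian, and the function name `fuse_estimates` and the numbers are hypothetical.

```python
def fuse_estimates(estimates):
    """Inverse-variance weighted fusion of (value, variance) pairs.

    Assumes independent, unbiased cues; this is an illustrative choice,
    not a method named in the lecture.
    """
    weights = [1.0 / variance for _, variance in estimates]
    total_weight = sum(weights)
    fused_value = sum(w * value for w, (value, _) in zip(weights, estimates)) / total_weight
    fused_variance = 1.0 / total_weight
    return fused_value, fused_variance


# Hypothetical depth estimates in metres: a stereo cue and a motion cue.
depth, variance = fuse_estimates([(2.0, 0.04), (2.2, 0.16)])
```

The fused variance is never larger than the smallest input variance, which is one way to state why redundancy helps.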

Summary

• Many, many vision (perception) capabilities

• Capabilities operate in context

A consequence:

• To solve even a simple task ⇒ system

Another consequence:

• More than vision – e.g., cognitive vision

• Integration – architecture and tool

Content

• Overview

• Tracking

• Detection

• Cognitive Vision

– Vision systems

– Integration

– Computer Vision Cognitive Vision ...

– Cognitive Systems

Integration: Vision is not Alone

• Other sensors

– odometry, distance, touch

– Time-of-flight, ultrasound, infrared, ...

– acoustic, olfactory, ...

• „Envisioned“ embodiment

• Task, situation

• Knowledge representations, common sense

• Semantics, language

System Requirements

• Task and data-oriented

• Context-based

• Reactive control

• Enable distribution

• Separated development

• Modular + scaleable

• Reusability (!)

Some Options

• Dataflow: pipes & filters [Unix shell]

• Layer architectures [OS, ISO-OSI]

• Object-oriented [Corba, ...]

• Event-driven [HMI]

• Shared data: blackboard [DBs]

• Agent-based [AI]

• Component based [Software engineering]
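As an illustration of the first option, a pipes-and-filters dataflow can be sketched with Python generators, where each filter consumes a stream and yields a new one. The frame layout, the filter names, and the threshold are all hypothetical stand-ins, not part of the lecture.

```python
def frames(count):
    """Source: yields toy 'frames' (hypothetical layout: id plus three pixel values)."""
    for i in range(count):
        yield {"id": i, "pixels": [i, i + 1, i + 2]}


def add_mean_brightness(stream):
    """Filter: annotates each frame with its mean pixel value."""
    for frame in stream:
        mean = sum(frame["pixels"]) / len(frame["pixels"])
        yield dict(frame, mean=mean)


def keep_bright(stream, threshold):
    """Filter: passes on only frames whose mean brightness reaches the threshold."""
    for frame in stream:
        if frame["mean"] >= threshold:
            yield frame


# Compose the pipeline the way a shell would chain commands with '|'.
pipeline = keep_bright(add_mean_brightness(frames(4)), threshold=2.0)
bright_ids = [frame["id"] for frame in pipeline]
```

Each stage knows only its input stream, which makes stages easy to reuse but leaves no channel for feedback or control, one reason the other options in the list exist.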

Integration based on Components

• Components encapsulate functions

– Service principle - „Yellow pages“

– Dynamic linking

• Reusable, distributed, scalable

• Simple (installation, programming)

• „Fast“
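The service principle can be pictured as a directory in which providers register under a service name and requesters look them up at run time. The following is a minimal in-process sketch under that assumption; the class `YellowPages`, the service name, and the stand-in detector are hypothetical and do not reproduce any ActIPret interface.

```python
from typing import Callable, Dict, List


class YellowPages:
    """Hypothetical service directory mapping service names to provider callables."""

    def __init__(self) -> None:
        self._providers: Dict[str, List[Callable[..., object]]] = {}

    def register(self, service: str, provider: Callable[..., object]) -> None:
        # A provider announces that it offers the named service.
        self._providers.setdefault(service, []).append(provider)

    def lookup(self, service: str) -> List[Callable[..., object]]:
        # A requester asks who offers the service; unknown names give an empty list.
        return list(self._providers.get(service, []))


def detect_objects(image: str) -> list:
    """Stand-in for a real detector component."""
    return [f"object-in-{image}"]


directory = YellowPages()
directory.register("object_detection", detect_objects)

# The requester binds to a provider only at call time (dynamic linking).
providers = directory.lookup("object_detection")
result = providers[0]("frame-0")
```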

Capabilities of Each Component

• Function

• Communication

• Memory

• Self-evaluation (reports confidence, accuracy, resource demands)

• Control (processing, view)

• Context (exploit it, report it)
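Self-evaluation becomes useful when a requester has to choose between components. The sketch below assumes each component reports a confidence, an accuracy, and a CPU load, and that the requester picks the most confident component it can afford; the field names, units, and selection rule are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SelfEvaluation:
    confidence: float   # assumed range 0..1
    accuracy_mm: float  # assumed expected error in millimetres
    cpu_load: float     # assumed fraction of one CPU, 0..1


@dataclass
class Component:
    name: str
    evaluation: SelfEvaluation


def select_component(candidates: List[Component], max_cpu_load: float) -> Optional[Component]:
    """Pick the most confident component whose resource demand fits the budget."""
    affordable = [c for c in candidates if c.evaluation.cpu_load <= max_cpu_load]
    if not affordable:
        return None
    return max(affordable, key=lambda c: c.evaluation.confidence)


trackers = [
    Component("edge_tracker", SelfEvaluation(confidence=0.7, accuracy_mm=5.0, cpu_load=0.2)),
    Component("model_tracker", SelfEvaluation(confidence=0.9, accuracy_mm=2.0, cpu_load=0.8)),
]
chosen = select_component(trackers, max_cpu_load=0.5)
```

With a budget of 0.5 the cheaper tracker is chosen even though the other reports higher confidence.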

Re-active Dynamic Integration

Figure: components linked by control and data.

• Avoiding negotiation

Example Architecture for ActIPret

Task-related Space of Interest

Zwork (Zillich‘s network)

• RPC (Remote Procedure Calls)

• Asynchronous

• Automatic marshalling of messages

• Simple debugging (gdb/ddd)

• Logging

• GUI Component
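The combination of asynchronous calls and automatic marshalling can be illustrated with an in-process stand-in. This is not Zwork's API: the names `AsyncRpcClient`, `handle_request`, and `track_hand` are hypothetical, JSON stands in for the message format, and a thread pool stands in for the network.

```python
import json
from concurrent.futures import Future, ThreadPoolExecutor


def track_hand(frame_id: int) -> dict:
    """Stand-in service; a real provider would run a tracker."""
    return {"frame": frame_id, "hand": 0, "position": [0.1, 0.2, 0.5]}


SERVICES = {"track_hand": track_hand}


def handle_request(message: str) -> str:
    """Provider side: unmarshal the request, run the service, marshal the reply."""
    request = json.loads(message)
    reply = SERVICES[request["method"]](*request["args"])
    return json.dumps(reply)


class AsyncRpcClient:
    """Requester side: call() returns at once with a Future for the reply."""

    def __init__(self) -> None:
        self._pool = ThreadPoolExecutor(max_workers=2)

    def call(self, method: str, *args) -> Future:
        message = json.dumps({"method": method, "args": list(args)})
        return self._pool.submit(lambda: json.loads(handle_request(message)))

    def close(self) -> None:
        self._pool.shutdown()


client = AsyncRpcClient()
future = client.call("track_hand", 7)  # does not block the caller
reply = future.result(timeout=5)       # collect the reply when it is needed
client.close()
```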

Figure: Component structure (service requester, service providers, managing frame, implementation, provider interface, requester interface).

Conclusion Integration

• Every lab has another approach:

– Definition and notation of functions, skills (and context)

• Key: practicability + ease-of-use

• „Standard“ interface definitions

– Get more specific along project

• Integration: learn to build systems

– How do parts work together?

– Learn which parts work together.

Content

• Overview

• Tracking

• Detection

– Integration

– Computer Vision Cognitive Vision ...

– Cognitive Systems

Computer Vision

• Computer Vision is a subfield of AI concerned with processing of images from the real world.

• Purpose: program a computer to "understand" a scene or features in an image.

• Methods: detection, segmentation, tracking, pose estimation, mapping to 3D model, recognition of objects in images (e.g., human faces, robot navigation)

• Achieved by means of pattern recognition, statistical learning, projective geometry, image processing, graph theory and other fields.

Pattern Recognition

• "the act of taking in raw data and taking an action based on the category of the data" [1]

• Goal: detect and learn known patterns

• Methods: statistics, machine learning, ...

[1] Richard O. Duda, Peter E. Hart, David G. Stork (2001) Pattern classification (2nd edition), Wiley, New York.
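The quoted definition, raw data in, category out, action chosen by category, can be made concrete with a very small classifier. The sketch uses a nearest-centroid rule, picked here only because it is short; the feature vectors, labels, and the `ACTIONS` table are hypothetical.

```python
import math
from collections import defaultdict


def train_centroids(samples):
    """Learn one mean feature vector per label from (features, label) pairs."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical mapping from category to action.
ACTIONS = {"cup": "grasp", "obstacle": "avoid"}

training = [
    ([1.0, 1.0], "cup"),
    ([1.2, 0.8], "cup"),
    ([5.0, 5.0], "obstacle"),
    ([4.8, 5.2], "obstacle"),
]
centroids = train_centroids(training)
label = classify(centroids, [1.1, 0.9])
action = ACTIONS[label]
```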

Computer Vision – A Summary

Many solutions and many more problems, e.g.:

1. Real world?

2. Brittle, thresholds.

3. „Understand“ scene?

4. „Understand“ features?

5. Segment ⇔ recognise?

6. Replication of experiments?

7. Formal description of capabilities?

Computer Vision – Lessons Learned

Problems: start to work on the core problems, e.g.:

1. Real world? ⇒ Serious work with real world images.

2. Brittle, thresholds. ⇒ Robust, threshold-free methods.

3. „Understand“ scene? ⇒ To „understand“, act in scene.

4. „Understand“ features? ⇒ Huge set of features.

5. Segment ⇔ recognise? ⇒ Segment AND recognise.

6. Replication of experiments? ⇒ PETS, DBs.

7. Formal description of capabilities?

Machine Vision

• = application of computer vision to factory automation.

• A MV system is a computer that makes decisions based on the analysis of digital images.

Figure: Components of a machine vision system (lighting, light, media, object, reflection, sensor, data, processing, results, control).

Machine Vision

• Problem 1: narrow applications

• Problem 2: understanding?

• Lesson learned: more options to control

• Lesson learned: consider complete system


Cognitive Vision

• embodiment (AmI, PDA, AR, VR, robotics)

• representations

• computer vision

• system architecture

• machine learning

• user experiments, usability

• neuroscience

• information theory

• cognitive science

• artificial intelligence

• systems engineering

Figure labels: Cognitive Science, Computer Vision, Cognitive Vision

Cognitive Science

• = scientific study of either mind or intelligence; it is inherently interdisciplinary

– E.g., psychology, neuroscience, linguistics, philosophy, computer science, and biology

• Cognition = „coming to know“ = act of acquiring knowledge

• be aware of and judge the result of this act

• "cognitive" - any kind of mental operation or structure that can be studied in precise terms [Lakoff, Johnson, 1999]

Cognitive (Computer) Vision

One definition:

CV is the act of seeing to obtain empirical factual knowledge

– Act: some form of body, self-awareness, communication, evolving

– Seeing: all computer and biological vision has to offer

– Empirical: based on observation and experiment

– Factual: objective reality, repeatable

– Knowledge: facts acquired, models, procedures

Cognitive (Computer) Vision

Another definition of Cognitive Vision

• 4 levels of generic computer vision functionality

– Detection, localization, recognition, and understanding

– Purposive goal-directed behaviour

– Adapting to unforeseen changes of environment

– Anticipate the occurrence of objects or events.

• Achieves capabilities through

– Learning semantic knowledge (i.e. contextualized understanding of form, function, and behaviour)

– Knowledge about environment, itself, and relationships

ECVision (2002 - 2005), www.ecvision.org

Why Cognitive Vision?

• Old terms did not succeed, new term might

• Interdisciplinarity is something good

• „New“ understanding:

– Active (since 1987)

– Learning, evolving

– Embodied (envisioned)

• 1000x more computing power in last 15 years

• Understanding cognitive science

Cognitive Vision - Essentials

• Seeing: eyes, head, vision data processing

• See what? Objects, humans, environment

– „Come to know“ about them

– Only knowledge relevant to seeing

• Act of seeing upon

– what is visible or

– becomes visible before it becomes invisible again

For Example: Hide & Seek

Kind.mov

• Key: combining top-down (cognitive) with bottom-up (vision) processes

Cognitive Vision - Challenges

• Object permanence (hiding but existing)

• Spatial and dimensional awareness (close or far range, spatial relationships, stacking objects)

• Temporal awareness (synchronous events, e.g., pointing)

• Hierarchical object concepts

• Detect something new

• Awareness of camera/body (view point reasoning, self-localisation)

Cognitive Vision System (CVS)

• Instantiation of the bits and pieces necessary for cognitive vision

– System = interacting group of items forming a unified whole [Merriam Webster]

• Is it only the „seeing“ part of a system?

• Action? How much action?

• Body? How much body?

CVS as part of a Cognitive System

Cognitive Vision System

• Vision under egomotion (e.g., + inertial sensors)

• Interaction with other (vision) systems

• Hand-eye coordination (throw objects, eye-body coordination)

• Interpreting gestures

• Search for sounds with eyes (+ auditory cues)

• Viewing the world as seen from a third person’s perspective (= “perspective taking”)

More Challenges

• Representations: objects, relationships, situations, context, „visual“ semantics

• Understanding: function and use of objects

• System: support multiple tasks & autonomy

• Real world: work with it and use it (context)

Possible Methodological Approach

• Cognitive reference: people

• Learn from system evaluation by people

• Build system close to people, i.e., in their environment

• Learn from, not copy, biological vision systems

Content

• Overview

• Tracking

• Detection

– Integration

– Computer Vision Cognitive Vision ...

– Cognitive Systems and examples

Cognitive Systems

• Foresight Cognitive System Project, UK

– natural or artificial information processing systems

– perception, learning, reasoning and decision making

– communication and action

• FP6, EU

– physically instantiated (embodied) systems

– perceive, understand (semantics) and interact

– evolve in order to achieve human-like performance in activities requiring context specific knowledge.

Interdisciplinarity of Cognitive Systems and Cognitive Science

• E.g., psychology, neuroscience, linguistics, philosophy, computer science, biology, …

• Human as reference for cognitive capabilities

• Several examples

– Navigation

– Vision

• Source of inspiration

Path Integration in Insects

[Mallot]
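Path integration means keeping a running estimate of one's position by accumulating each movement, so that a direct home vector is available at any time. The sketch below is a plain dead-reckoning calculation under idealised assumptions (noise-free turns and distances on a flat plane); it illustrates the concept and is not a model from the cited work.

```python
import math


def integrate_path(segments):
    """Accumulate (turn_degrees, distance) steps into a position.

    Starts at the origin facing along +x; positive turns are counter-clockwise.
    """
    x = y = 0.0
    heading = 0.0
    for turn_degrees, distance in segments:
        heading += math.radians(turn_degrees)
        x += distance * math.cos(heading)
        y += distance * math.sin(heading)
    return x, y, heading


def home_vector(x, y):
    """Distance and world-frame bearing (degrees) of the straight path back to the start."""
    distance = math.hypot(x, y)
    bearing = math.degrees(math.atan2(-y, -x))
    return distance, bearing


# Walk 3 units ahead, turn left 90 degrees, walk 4 units.
x, y, _ = integrate_path([(0, 3.0), (90, 4.0)])
distance, bearing = home_vector(x, y)
```

For this outbound path the home vector has length 5, regardless of how many segments were walked to get there.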

Path Complexity does not Impair Visual Path Integration

• Path segments in VR

• No effect of number of segments

• Direction and distance encoding?

[Wiener]

Geographical Slant as Compass

• Ground plane slanted 4 degrees

– Perceived visually and via force feedback

• Pointing the right way becomes easier

[Restat, Mallot]

Orientation in Children

• Children re-orient by the shape of the room

• Sensitive to surface layout: distance, angle, sense

• Do not use landmarks for orientation

• Landmarks are detected and remembered

[Gouteux, Spelke, 2001]

Orientation in Adults

• Landmarks are used and described verbally

• Not used with verbal interference

• With interference, adults become like rats.

• Orientation on surfaces: children, rats, fish

[Ratliff, Newcombe, 2005]

SLAM?

• Do humans SLAM?

• Orientation based on main structure

• Icon-based navigation

– Plus obstacle avoidance

• Knowledge about what to expect

– On airport, train station, ...

Cognition as Control

• Hierarchy to cope with complexity

[Hollnagel]

Neuroscience

• Study of the human nervous system, brain, and biological basis of consciousness, perception, memory, and learning

• Brain has a triad structure

– reptilian brain controls basic sensory motor functions

– mammalian brain: emotions, memory, biorhythms

– neocortex or thinking brain that controls cognition, reasoning, language, and higher intelligence

• Continued reconnecting and learning

– Learn from real experiences, integrated "whole" ideas

Cortex – Examples

• About 30 regions involved in vision, half of cortex

• MT: detecting areas of motion in images (0.1 s after motion is in image)

• V1: cells respond to oriented edges

• V1: BUT 85% of axons come not from retina

• Hippocampus – place cells, direction cells

• Cognitive map [Tolman 1948]

Illusions

• Study human vision system

• Experience of eye: world is benign

– Counter example: Gorilla

• Computer Vision suffers from serial processing

– Human: all cues in parallel

– Subsequent fight for what is most plausible

Object Perception

• Spatio-temporal constraints to form objects (4-month olds)

[Spelke: Principles of Object Perception, 1990]

Human Vision Learns

• Child: perceived as one object

• First: motion, surfaces (see before)

• Later: shape, appearance

[Spelke]

Gestalt Laws

• Develop in humans

• Occlusion: completion depends on experience

• Criteria

– Good continuation

– Similarity

• Animals?

[Spelke]

Chicks

• Perception of occluded objects without experience of occlusion

• Inborn object completion

[Regolin]

Number

• Multiple Object Tracking

– A) Following several moving targets

– B) Connections disrupt expectations

– C) Too many to follow

• Set size limit: 3-4

– Children, adults, animals

• Perception

– Cohesion, contact, continuity

– Auditory set size limit: 3-4

[Scholl, Wynn, Mitroff, van Marle]

Sensitivity to Geometry

• Response to geometrical relationships

[Dehaene, Izard, Pica, Spelke, 2006]

Sensitivity to Geometry – Results

• Strikingly similar patterns

– Munduruku live in Amazonas region

– Adults in Boston improve over children

• Relationships represent Euclidean geometry

Final Conclusion

• Human is the only working vision and cognitive system

• Cognitive Science and related fields throw some light on how it may work ⇒ inspiration

• Cognitive science tries to put it all together

Experience, recommendation:

• Design methods without parameters

• Work with system, not individual components
