Dagstuhl Oct09 1
Object Recognition Through Reasoning About Functionality: A Survey of Related Work and Open Problems
Louise Stark, University of the Pacific
Stockton, California
Melanie Sutton, University of West Florida
Pensacola, Florida
Dagstuhl Oct09 2
Function-Based Research
Dr. Louise Stark, University of the Pacific, Stockton, CA
Dagstuhl Oct09 6
Seminar Goals
• This seminar brings together scientists from disciplines such as computer science, neuroscience, robotics, developmental psychology, and cognitive science
Dagstuhl Oct09 7
Seminar Goals
• Hope to further the knowledge of
  • how the perception of form relates to object function
  • how intention and task knowledge (and hence function) aid in the recognition of relevant objects
Dagstuhl Oct09 8
Overview
• Recognition based on functionality
• Overview of GRUFF approach
• Functionality in Related Disciplines
• Open Problem Areas
Dagstuhl Oct09 9
Function-based Approaches
Cognitive Psychology/Human Perception
  • Representations of object categories
  • Human-robot interaction strategies
  • Wayfinding
Artificial Intelligence
  • Formal representations of knowledge
  • Machine learning techniques to automate reasoning
Computer Vision
  • Document/aerial image analysis
  • Interpreting human motion
  • Object recognition/categorization
Robotics
  • Mapping of indoor environments
  • Object detection
  • Navigation/interaction plans
  • Formalisms for autonomous robot control
Dagstuhl Oct09 10
Computer Vision?
• Deriving meaningful descriptions of the environment from images
• Descriptions needed for
  • Recognition
  • Manipulation
  • Reasoning about objects
Dagstuhl Oct09 11
Generic Object Recognition
• Minsky (1991)
  • Argued for the necessity of representing knowledge about functionality
  • “… rarely use a representation in an intentional vacuum, but we always have goals…”
  • “… we must classify things… according to what they can be used for.”
Dagstuhl Oct09 12
Motivation
[Figure panels: Parameterized Model; Structural Model]
Could these be recognized?
Dagstuhl Oct09 13
GRUFF
chair (cher) n. - a piece of furniture for one person to sit on
Generic Recognition Using Form and Function
Dagstuhl Oct09 14
What is the goal?
Develop alternative approaches to generic object recognition and manipulation
- concentrate on man-made objects (artifacts)
Human artifacts: the existence or non-existence of properties can be deduced by analyzing the shape of an object.
For any particular object category, there is some set of functional properties shared by ALL objects in that category.
Dagstuhl Oct09 15
Approach to the Problem
• Derive the format of my function-based representation
• Confirm feasibility of approach on a test domain: perfect input, planar face models
• Expand the domains
• Test on real data
• Interact to confirm functionality
• Exploit contextual information
Dagstuhl Oct09 16
Knowledge in GRUFF is of three types:
A category hierarchy which specifies superordinate / basic / subordinate categories
furniture / chair / arm chair
Functional properties that define each category (provides_sittable_surface, provides_stability, ...)
Knowledge primitives used to reason about shape (dimensions, relative orientation, ...)
All organized into a "category definition tree", which is GRUFF's knowledge about the world.
Dagstuhl Oct09 17
Category Representation Tree
Conventional Chair
Provides Sittable Surface
Provides Stable Support
Dagstuhl Oct09 18
We imagine the definition of a generic object category to be something like...
straight_back_chair ::= provides_seating_surface +
provides_stability + provides_back_support_surface
and recognition is conceptualized as ...
[Figure labels: provides_arm_support, provides_sittable_surface, provides_stable_support, provides_back_support]
Dagstuhl Oct09 19
Shape-based Knowledge Primitives
A functional requirement such as provides_sittable_surface is implemented as a sequence of calls to shape-based operators:
dimensions(shape_element, dimensions_type, range_parameters)
relative_orientation(normal_1, normal_2, range_parameters)
clearance(shape_element, clearance_volume)
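A minimal sketch of how a functional requirement might be composed from shape-based operators. The `Face` type, the numeric ranges, and the boolean return values are assumptions made for illustration only; they are not taken from GRUFF, and the `clearance` operator is omitted to keep the example short.

```python
import math
from dataclasses import dataclass


@dataclass
class Face:
    """Hypothetical planar shape element (not GRUFF's actual data structure)."""
    normal: tuple   # surface normal as (x, y, z)
    area: float     # square metres
    height: float   # metres above the ground plane


def dimensions(shape_element, dimensions_type, range_parameters):
    """Check one metric property of a shape element against a (low, high) range."""
    low, high = range_parameters
    return low <= getattr(shape_element, dimensions_type) <= high


def relative_orientation(normal_1, normal_2, range_parameters):
    """Check that the angle between two normals, in degrees, lies in a (low, high) range."""
    dot = sum(a * b for a, b in zip(normal_1, normal_2))
    length = math.sqrt(sum(a * a for a in normal_1)) * math.sqrt(sum(b * b for b in normal_2))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, dot / length))))
    low, high = range_parameters
    return low <= angle <= high


UP = (0.0, 0.0, 1.0)


def provides_sittable_surface(face):
    """A sequence of operator calls; every range below is an illustrative guess."""
    return (dimensions(face, "area", (0.1, 0.6))
            and dimensions(face, "height", (0.35, 0.55))
            and relative_orientation(face.normal, UP, (0.0, 10.0)))


seat = Face(normal=(0.0, 0.0, 1.0), area=0.2, height=0.45)
wall = Face(normal=(1.0, 0.0, 0.0), area=0.2, height=0.45)
print(provides_sittable_surface(seat), provides_sittable_surface(wall))
```

The vertical face fails only the orientation check, which shows how each operator contributes an independent constraint.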
Dagstuhl Oct09 20
Knowledge Primitives
Abstract shape reasoning
• Metric dimensions (width, depth, height, area, contiguous surface, volume)
• Proximity
• Relative orientation
• Clearance
• Stability
• Enclosure
Dagstuhl Oct09 21
Knowledge Primitives
Physical interaction reasoning
• Change orientation
• Apply force
• Observe deformation
Dagstuhl Oct09 22
Evaluation Measures
Value returned from knowledge primitive invocation
[Figure: evaluation measure (vertical axis, 0.0 to 1.0) plotted against values of the shape property (horizontal axis), with breakpoints labeled least, low ideal, high ideal, and greatest]
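One way to read the four breakpoints on this slide is as a trapezoidal function. The sketch below assumes the measure is 1.0 between the two ideal values and falls off linearly to 0.0 at least and greatest; the slide names only the breakpoints, so the linear ramps and the function name are assumptions.

```python
def evaluation_measure(value, least, low_ideal, high_ideal, greatest):
    """Map a measured shape property to [0, 1] with an assumed trapezoidal profile."""
    if value <= least or value >= greatest:
        return 0.0
    if value < low_ideal:
        # Rising ramp between least and low ideal.
        return (value - least) / (low_ideal - least)
    if value <= high_ideal:
        # Plateau: the property is within its ideal range.
        return 1.0
    # Falling ramp between high ideal and greatest.
    return (greatest - value) / (greatest - high_ideal)


# Illustrative seat-height breakpoints in metres (made-up values).
for height in (0.25, 0.35, 0.45, 0.70):
    print(height, evaluation_measure(height, 0.30, 0.40, 0.50, 0.60))
```

Returning a graded measure rather than a yes/no answer is what lets evidence from several primitives be combined later.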
Dagstuhl Oct09 23
Combining Evidence
• Combine required measurements using probabilistic AND (0-1)
• Combine descendant subcategory node measures using probabilistic OR
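The slide names the operators but not their formulas. The sketch below assumes the usual definitions, product for probabilistic AND and a + b - ab for probabilistic OR; the function names are hypothetical.

```python
from functools import reduce


def prob_and(measures):
    """Probabilistic AND: the product of the measures (assumed definition)."""
    return reduce(lambda a, b: a * b, measures, 1.0)


def prob_or(measures):
    """Probabilistic OR: a + b - a*b, folded over the measures (assumed definition)."""
    return reduce(lambda a, b: a + b - a * b, measures, 0.0)


# Required properties of one category node are ANDed together.
print(prob_and([0.9, 0.8]))
# Measures of alternative subcategories are ORed together.
print(prob_or([0.5, 0.5]))
```

With these definitions a single failed requirement (measure 0.0) zeroes the AND, while the OR never falls below its strongest alternative.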
Dagstuhl Oct09 24
Recognition Process
• Category representation graph is the control structure
• Structural constraint propagation: subcategory nodes are constrained by what was found for the parent
Dagstuhl Oct09 25
Recognition Stage
2 approaches
1. Check all known categories in the knowledge base
2. Confirm/deny that the object can/cannot function as a specified (sub)category
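The two approaches can be sketched as a traversal of a small category tree. Everything here is a simplified assumption: the `CategoryNode` type, the `slot` helper that looks up precomputed measures, and the rule that a child's measure is multiplied by its parent's. Constraint propagation is modeled only loosely, by not visiting subcategories when the parent has no support.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class CategoryNode:
    """Hypothetical node of a category tree."""
    name: str
    requirements: list = field(default_factory=list)  # callables: obj -> measure in [0, 1]
    children: list = field(default_factory=list)


def evaluate(node, obj):
    """Return {category: measure} for every node with non-zero support."""
    measure = prod(req(obj) for req in node.requirements)  # probabilistic AND
    if measure <= 0.0:
        return {}  # the parent failed, so its subcategories are never examined
    results = {node.name: measure}
    for child in node.children:
        for name, child_measure in evaluate(child, obj).items():
            results[name] = measure * child_measure
    return results


def recognize_all(root, obj):
    """Approach 1: check all known categories."""
    return evaluate(root, obj)


def confirm(root, obj, category):
    """Approach 2: confirm or deny one specified (sub)category."""
    return evaluate(root, obj).get(category, 0.0)


def slot(name):
    """Stand-in for a functional requirement: read a precomputed measure."""
    return lambda obj: obj.get(name, 0.0)


straight_back_chair = CategoryNode("straight_back_chair", [slot("provides_back_support")])
arm_chair = CategoryNode(
    "arm_chair", [slot("provides_back_support"), slot("provides_arm_support")])
chair = CategoryNode(
    "chair",
    [slot("provides_sittable_surface"), slot("provides_stable_support")],
    [straight_back_chair, arm_chair])
furniture = CategoryNode("furniture", [], [chair])

observed = {"provides_sittable_surface": 0.9,
            "provides_stable_support": 1.0,
            "provides_back_support": 0.8}
print(recognize_all(furniture, observed))
print(confirm(furniture, observed, "arm_chair"))
```

In this toy version the second approach simply filters the first; a real system could stop as soon as the requested category is confirmed or denied.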
Dagstuhl Oct09 28
Context-based Reasoning
GRUFF - Generic object recognition system. Reasons about and generates plans for understanding 3D scenes of objects.
Extension to context-based reasoning - Determine the significance of accumulated functional evidence to infer the existence of scene concepts.
Dagstuhl Oct09 29
Functionality in the Large
What makes an 'office' an office?
A desk with at least one chair in close proximity.
You categorize areas or workspaces by the functional configuration of the objects in the area.
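The office rule above can be sketched as a proximity test over functional evidence. The `FunctionalEvidence` type, the 2D positions, and the 1.5 m threshold are assumptions for illustration, not values from the talk.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class FunctionalEvidence:
    """Hypothetical record of one piece of functional evidence in a scene."""
    label: str        # "seating" or "worksurface"
    position: tuple   # (x, y) in metres


def infer_offices(evidence, max_distance=1.5):
    """Pair each worksurface with the seating found within max_distance of it."""
    seats = [e for e in evidence if e.label == "seating"]
    offices = []
    for surface in (e for e in evidence if e.label == "worksurface"):
        nearby = [s for s in seats if dist(s.position, surface.position) <= max_distance]
        if nearby:  # at least one chair in close proximity
            offices.append((surface, nearby))
    return offices


scene = [
    FunctionalEvidence("worksurface", (0.0, 0.0)),
    FunctionalEvidence("seating", (1.0, 0.0)),
    FunctionalEvidence("seating", (5.0, 5.0)),
]
for surface, nearby in infer_offices(scene):
    print("office at", surface.position, "with", len(nearby), "seat(s)")
```

The rule never asks what the objects are called, only what they provide and how they are arranged.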
Dagstuhl Oct09 30
Context-based Reasoning
[Figure: frame with slots Name: Office; Type: Category; Function Verification Plan; Realized by; Potential Results]
Context-based reasoning
  • Name: Provides potential seating
  • Name: Provides potential worksurfaces
Shape-based reasoning
  • Name: Infer Seating Areas
  • Name: Infer Back Support
  • Name: Infer worksurfaces
Dagstuhl Oct09 31
What Did Change?
• Multiple objects in scene
• Relax functional requirements
• Allow partial evidence
Dagstuhl Oct09 32
What Did Not Change?
• Basic set of functional primitives
• Organization of the representation
• Categorization, not identification
Dagstuhl Oct09 33
Test Data
Simulated data
- Complete 3D models evaluated, no occlusion surfaces
- Partial 3D models derived from laser range finder simulation tool
Real data
- Stereo camera system generating range data (SRI's Small Vision System software)
Dagstuhl Oct09 34
Test Scenes Used in Context-based Reasoning
Dagstuhl Oct09 35
Test Scenes Used in Context-based Reasoning
Dagstuhl Oct09 36
Context-based Reasoning System
Infer contextual relationships from accumulated functional evidence
Provides potential seating (back support and/or seating area)
Provides potential worksurfaces
Dagstuhl Oct09 37
What is the goal?
Question – How do we recognize objects we have never previously encountered?
- we don't have a model (or do we?)
Essentially - we categorize objects using some type of "model"
Dagstuhl Oct09 38
Earlier Work
Roberts, “Machine perception of three dimensional solids”, 1965
• Analyze intensity image
• Extract edge information
• Match against library of geometric models
- “Model-based vision” paradigm
- “Single arbitrary view 3-D object recognition” paradigm
Dagstuhl Oct09 39
Earlier Work
Binford, “Survey of model-based image analysis systems”, 1982
“The essential definition of object class is functional. …
Object classes have an associated 3-D form: form equals function. …”
Dagstuhl Oct09 40
Earlier Work
Binford, “Survey of model-based image analysis systems”, 1982
“An object’s function is often a geometric function. The function of a room is to be an enclosing volume. … The function of a chair… is to be a flat surface at a comfortable height for sitting….”
Dagstuhl Oct09 41
Earlier Work
Winston, Binford, Katz and Lowry, “Learning physical descriptions from functional definitions, examples and precedents”, 1984
• Discussed use of function-based definitions of object categories
• Infinity of individual physical descriptions of objects in a category…
• Single functional description to represent all (cup example)
Dagstuhl Oct09 42
Earlier Work
Brady, Agre, Braunegg and Connell, “The mechanic's mate”, 1985
Connell and Brady, “Generating and generalizing models of visual objects”, 1987
• Discussed relation between geometric structure and functional significance
• Generalized structural description learned from sequence of examples
Dagstuhl Oct09 43
Earlier Work
Minsky, “The Society of Mind”, 1985
“… The solution is that we need to combine at least two different kinds of descriptions.
On one side, we need structural descriptions for recognizing chairs when we see them.”
Dagstuhl Oct09 44
Earlier Work
Minsky, “The Society of Mind”, 1985
“… On the other side we need functional descriptions in order to know what we can do with them… we need connections between parts of the chair structure and the requirements of the human body that those parts are supposed to serve.”
Dagstuhl Oct09 45
Background
DiManzo, Trucco, Giunchiglia, Ricci, “FUR: Understanding Functional Reasoning”, 1989
• Utilized functional knowledge within an expert system framework
• Primitives defined as individual expert systems that evaluate 3D information
Dagstuhl Oct09 46
Background
Rivlin and Rosenfeld, “Navigational Functionalities”, 1995
• Explored functionality as it relates to mobile robots
• Navigating agent may classify objects in its environment in functional terms as “threat,” “landmark” and so on.
Dagstuhl Oct09 47
Function-based Approaches
Cognitive Psychology/Human Perception
  • Representations of object categories
  • Human-robot interaction strategies
  • Wayfinding
Artificial Intelligence
  • Formal representations of knowledge
  • Machine learning techniques to automate reasoning
Computer Vision
  • Document/aerial image analysis
  • Interpreting human motion
  • Object recognition/categorization
Robotics
  • Mapping of indoor environments
  • Object detection
  • Navigation/interaction plans
  • Formalisms for autonomous robot control
Dagstuhl Oct09 48
Artificial Intelligence
Two areas within AI that impact function-based research
• Work on formal representations of knowledge about functionality
• Application of machine learning techniques to automate the process of constructing function-based systems
Dagstuhl Oct09 49
Artificial Intelligence
• AI approach developed greater formalism and depth than that in computer vision
• Advantage as complexity of system requirements increases
Dagstuhl Oct09 50
Robotics
• Incorporate best practices from other fields
• Evolution
  • Service robots (controlled environment)
  • Interaction to confirm function
  • General navigational systems
Dagstuhl Oct09 51
Human Perception Theories
• Klatzky et al. (2005)
  • observed how children interact with objects associated with a specific function
  • use this information in the design of algorithms for robotic interaction with objects to reason about their function
Dagstuhl Oct09 52
Functional Knowledge Representation
• Barsalou et al. (2005)
  • HIPE (History, Intentional perspective, Physical environment, and Event sequences)
• Raubal and Moratz (2007)
  • expanded on theory
  • representation of affordance-based attributes
Dagstuhl Oct09 53
Affordances?
Goal is object recognition using function
According to Webster…
Affordance - <graphics> A visual clue to the function of an object.
Yes, GRUFF uses affordances
Dagstuhl Oct09 54
Affordances
Some interpretations of Gibson's affordances
• Automatic
• Pop out – no processing necessary
Have to admit – there were (are) different camps
Dagstuhl Oct09 55
Affordances
According to Gibson
“If you know what can be done with… an object, what it can be used for, you can call it whatever you please.”
Dagstuhl Oct09 56
Affordances
• Considered an error if an object is misclassified. Yes or no?
www.businesssupply.com
Dagstuhl Oct09 57
Affordances
According to Gibson
“If a surface of support is knee-high above the ground, it affords sitting on.
We call it a seat in general.
If it can be discriminated as having just these properties, it should look sit-on-able.
If it does, the affordance is perceived visually.”
Yes, GRUFF uses affordances
Dagstuhl Oct09 59
Gibson’s Theory of Affordances
• Properties noted: Knowledge Primitives
  • horizontal: Relative Orientation
  • flat: Planar
  • extended: Metric Dimensions
  • rigid: Requires Interaction
Physical properties, measured relative to the animal. (Shape Properties)
The Ecological Approach to Visual Perception, J.J. Gibson
Dagstuhl Oct09 60
Open Problems: Across Disciplines
Work to ensure:
• scalability
• efficiency
• accuracy
• ability to learn
End Goals
Infer contextual relationships from accumulated functional evidence…
• Provides potential seating (back support and/or seating area)
• Provides potential worksurfaces
Infer affordances “in the large”… (in scale-space)
• Provides potential table area
• Provides potential containment
Factors Influencing System Complexity
• Degree of Interaction
• Feedback from Interaction
• Complexity of Interaction
From: Function from Visual Analysis and Physical Interaction. M. Sutton, L. Stark, & K. Bowyer. Image and Vision Computing 16 (1998) 745-763.
Knowledge Representation
The internal architecture utilized for reasoning about affordances:
Summary of Unpredicted Subsystem Failures
Category | Model Building Subsystem | Shape-based Reasoning Subsystem | Interaction-based Reasoning Subsystem
Chairs | 13/45 (29%) | 8/32 (25%) | 3/18 (17%)
Cups | - | 0/27 (0%) | 7/27 (26%)
Task/Affordance Driven Data Flow
[Figure: flowchart with the stages below, a decision point (?), and two "Reset parameters" feedback paths]
• Capture image pair
• Use task information
• Calculate disparity and range data (and evaluate)
• Perform segmentation (and evaluate)
• Perform function-based reasoning (and evaluate)
Data flow from function-based reasoning to refinement of image acquisition and range segmentation parameters.
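The feedback loop described here can be sketched as a pipeline that is re-run with adjusted parameters whenever a stage evaluates poorly. The stage functions, the quality scores, the acceptance threshold, and the parameter-halving rule are all toy assumptions; they stand in for real image capture, stereo, segmentation, and reasoning components.

```python
def run_with_feedback(stages, params, adjust, max_rounds=5, accept=0.7):
    """Run stages in order; on a poor evaluation, reset parameters and start over.

    stages: list of (name, fn) where fn(data, params) -> (data, quality in [0, 1]).
    adjust: fn(params, failed_stage_name) -> new params.
    """
    for round_number in range(1, max_rounds + 1):
        data, failed = None, None
        for name, stage in stages:
            data, quality = stage(data, params)
            if quality < accept:
                failed = name
                break
        if failed is None:
            return data, params, round_number
        params = adjust(params, failed)  # the "reset parameters" feedback path
    return None, params, max_rounds


# Toy stages: only segmentation is sensitive to the (hypothetical) threshold.
def capture(_, params):
    return "image pair", 1.0


def disparity(data, params):
    return "range data", 1.0


def segment(data, params):
    return "surfaces", 1.0 if params["threshold"] <= 0.5 else 0.4


def reason(data, params):
    return "chair", 0.9


def adjust(params, failed_stage):
    return {**params, "threshold": params["threshold"] / 2}


result, final_params, rounds = run_with_feedback(
    [("capture", capture), ("disparity", disparity),
     ("segmentation", segment), ("reasoning", reason)],
    {"threshold": 2.0}, adjust)
print(result, final_params, rounds)
```

In a task-driven version, `adjust` could also consult the task information to decide which parameter to change.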
Implementation Level: Metrics / Error Calculations
Surface Extraction and Use of Affordances
Capture image pair -> calculate disparity and range -> evaluate range data -> perform/evaluate range segmentation -> perform/evaluate object recognition
Question: How can use of affordances be incorporated into feedback loops?
Guiding Questions AND ANSWERS! (from previous Dagstuhl seminar)
How could or should a robot control architecture look like that makes use of affordances as first-class items in perceiving the environment?
How could or should such an architecture make use of affordances for action and reasoning?
Is there more to affordances than function-oriented perception, action and reasoning?
Guiding Questions AND ANSWERS! (from previous Dagstuhl seminar)
Should affordances in a robot be programmed or learned? (Can they be programmed in the first place?)
What about an affordance needs to be represented in a robot, and how?
How and where in the architecture would attention, intention, or other internal states filter affordances that were perceived on a low level?
How would affordance-based control go together with behavior-based and plan-based control? Is it complementary? Redundant? Inconsistent?
How can affordances be used for reasoning and action?
Affordances:
…within subsystems…
…supervisors, specialists, agents…
QUESTIONS?
In a similar vein, trying to understand perception by studying only neurons
is like trying to understand bird flight by studying only feathers:
It just cannot be done. In order to understand bird flight,
we have to understand aerodynamics; only then do
the structure of feathers and the different
shapes of birds’ wings make sense.
David Marr (1982)