NEURO VISION
What is so great about it?
Deceptively simple anatomical appearance.
Incredibly complicated structure.
Developed through millennia of evolution.
Hard-wired to prefer certain objects right at birth.
Invariant to position, scale and rotation of the object.
“Tuned” to quickly recognize objects.
IS HUMAN VISION PERFECT?
PENROSE’S TRIANGLE
Tricks of the brain
I’d love to see someone try to get to the top
PENROSE’S STAIRS
Why is it so?
Our visual system’s neural network has been tuned to recognize, process and classify phenomena that were vital to our survival and progress. In this context, every species has a different vision system.
Hence, we are not very good at dealing with artificially generated images, as such phenomena rarely occurred in nature during our evolution.
However, we are the best at natural images!
How human visual perception works
Perceptions of static scenes are inadequate to describe motion
Gibson’s theory of affordances
Vision evolved in organisms embedded in a dynamically changing environment
What is important to an organism is a collection of processes, not a single unique one
These processes are at different levels of abstraction. E.g. we see waves on a shore, and also the innumerable molecules in them moving
How human visual perception works (contd.)
“Seeing involves multi-level process simulations in partial registration using different ontologies, with rich (but changing) structural relations between levels”
Use of structures of various sorts:
Agglomeration/grouping: structures of different sizes at the same level of abstraction
Interpretation: structures at different levels of abstraction, mapping to a new ontology
Fragments are recognized in parallel and assembled into larger wholes; these may trigger higher-level fragments, or redirect processing at lower levels to resolve ambiguities, etc.
Functions of vision
Segment the image (or scene) and recognize the objects distinguished
Compute distance to contact in every direction
Provide feedback and triggers for action
Provide a low-level summary of the 2-D and/or 3-D features of the image, leaving it to the central non-visual processes to draw conclusions
Is something left out?
Visual/spatial reasoning
Our ability to use diagrams and visual images to reason about very abstract mathematical problems, like thinking about the complexity of a search strategy
“Seeing” that 7+5=12 by a rearrangement of dots
“Seeing” that the angles of a triangle add up to a straight line
Visualizing the infinitely thin and long lines of Euclidean geometry
Many more examples
Visual/spatial reasoning (contd.)
Uses of spatial reasoning: knowing where to search for an object thrown over a wall, assembling a toy crane from a toy set, using spatial concepts (such as the notion of a search space) in program design
Reasoning using a grasp of spatial structures requires at least: the ability to see the various structures involved in the proof, the possibilities for variations (rearrangements) in them, the structures that stay invariant during the rearrangements, etc.
In contrast, a reasoning system like logic is completely discrete, and all syntactic composition involves function application
The specification of the requirements for visual reasoning is very vague, and would not be easy to mechanize
Visual perception involves much more...
Visual perception involves “affordances”
Affordances are the possibilities for, and constraints on, action and change in a situation. Seeing the possibility of things that do not exist, but might exist
Example: A person perceiving a chair can immediately see the possibility of sitting on it, that is, the chair "affords" sitting
Visual perception involves much more... (contd.)
POPEYE (1970s): The Popeye project investigated how humans can see structure in very cluttered scenes, where structure exists at different levels of abstraction. It showed that we recognize full words before individual letters
Consider looking at a smiling or a sad face. Does it involve only perceiving the structure of the pattern? We are able to perceive mental states of happiness or sadness
Visual perception involves much more... (contd.)
What may appear to be only one task might consist of many different tasks in different contexts, e.g. estimating the length of a plank to fit across a ditch
When a sequence of images is flashed rapidly before the eyes, people can see, at least roughly, what sort of scene each image depicts. This implies that our visual mechanisms can find low-level features, use them to cue features of the images at various levels of size and abstraction, and arrive at percepts involving known types of objects, all within 1 or 2 seconds
High level precisions are made in less than 1/2 a second
Artificial vision systems: what they aim at
recognize objects or people in static images without acquiring or reasoning about the information using 3-D structure
track moving objects represented as simple shapes (points or blobs), often using 2-D representations
explore an environment, building a 2-D map of walls, doors, etc., without a possible human understanding of the maps
control a moving robot, regarded as a moving object
obtain some 3-D information about the environment, only to generate new images
What vision systems cannot do
After Freddy, the Edinburgh robot built in 1973, there was a need to move from 2-D to 3-D. The attempts failed because of limited computational power and the difficulty of choosing a representation
Consider a cup on a table. Humans can "see" the orientation required of the grasping object at different grasping locations; artificial vision systems cannot. Alignment of the grasped surfaces with the grasping ones is important
Seeing affordances in the object being grasped (e.g. whether it has sharp corners, or whether some part of it is more fragile than others) requires a grasp of counterfactual conditionals involving processes that do not actually exist
Why the limitation?
Developing a human-like visual system that does what a small child or many animals can do requires an adequate analysis of the requirements for such a system
The requirements might seem much simpler than they actually are if they are not studied in sufficient depth
The failure to achieve set goals is not a fault of the choice of domains or of the representation; it is a problem of overoptimistic predictions
Where is the complexity?
Different levels of perception are needed: a high level of precision to lift a hair with a pair of tweezers, much lower precision to see that something is not graspable
Perception can involve multi-strand relationships requiring much richer forms of representation than just a logical form
Multiple levels of abstraction, affordances, causation: all are needed
Many more subtleties...
Visual Pathway
Hierarchical Neural Network Architecture
Contents
Brain Mechanism of Vision
Hubel’s and Wiesel’s hierarchy model
Cerebral Cortex
The evolution of the cerebral cortex is one of the great successes in the history of living beings.
Insights into cortical organization: division into different regions with different functions,
e.g., visual, auditory, somatic sensory, speech and motor regions
Visual Pathway
Retina to the Visual Cortex
Hubel’s and Wiesel’s Model
Hierarchical model of cortical cells.
The cortical cells are divided into various types:
Layer IV cells
Simple cells
Complex cells
Hubel’s and Wiesel’s Model
Layer IV cells have circularly symmetric receptive fields.
The receptive field of a cell is one of two kinds:
On-center (excitatory center and inhibitory surround)
Off-center (inhibitory center and excitatory surround)
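A minimal sketch of such a center-surround receptive field, assuming it can be approximated by a difference of two Gaussians (a common textbook approximation, not something stated on the slide). The function names, grid size and sigma values are hypothetical choices for illustration.

```python
import math

def dog_kernel(size=9, sigma_center=1.0, sigma_surround=2.0, on_center=True):
    """Return a size x size difference-of-Gaussians receptive field (list of rows)."""
    half = size // 2

    def gauss(r2, sigma):
        # Normalised 2-D Gaussian evaluated at squared radius r2.
        return math.exp(-r2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

    # An off-center field is the sign-flipped on-center field.
    sign = 1.0 if on_center else -1.0
    return [[sign * (gauss(x * x + y * y, sigma_center)
                     - gauss(x * x + y * y, sigma_surround))
             for x in range(-half, half + 1)]
            for y in range(-half, half + 1)]

def response(kernel, patch):
    """Linear response: weighted sum of the light falling on the field."""
    return sum(w * p
               for k_row, p_row in zip(kernel, patch)
               for w, p in zip(k_row, p_row))

def spot(size, radius):
    """Bright disc of the given radius centred in a dark size x size patch."""
    half = size // 2
    return [[1.0 if x * x + y * y <= radius * radius else 0.0
             for x in range(-half, half + 1)]
            for y in range(-half, half + 1)]

on_cell = dog_kernel(on_center=True)
off_cell = dog_kernel(on_center=False)
small_spot = spot(9, 1)    # light on the center only
full_field = spot(9, 99)   # light covering center and surround

print("on-center, small spot :", response(on_cell, small_spot))
print("on-center, full field :", response(on_cell, full_field))
print("off-center, small spot:", response(off_cell, small_spot))
```

In this sketch the on-center cell is driven more strongly by a small central spot than by uniform illumination, because light on the surround subtracts from the response; the off-center cell has the opposite sign.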
Hubel’s and Wiesel’s Model
Simple Cells
Respond to an optimally oriented line in a narrowly defined location.
This is achieved by pooling input from layer IV cells whose receptive-field centers lie along the line.
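The wiring idea above can be sketched as follows, assuming each layer IV input is an on-center difference-of-Gaussians unit and the simple cell simply sums the units whose centers lie on a line and then rectifies. The summation rule, the sigma values and all names are illustrative assumptions, not the authors' specification.

```python
import math

def dog_weight(dx, dy, sigma_center=1.0, sigma_surround=2.0):
    """On-center difference-of-Gaussians weight at offset (dx, dy)."""
    r2 = dx * dx + dy * dy

    def gauss(sigma):
        return math.exp(-r2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

    return gauss(sigma_center) - gauss(sigma_surround)

def center_surround(image, cx, cy, reach=4):
    """Response of one center-surround unit whose field is centred at (cx, cy)."""
    total = 0.0
    for dy in range(-reach, reach + 1):
        for dx in range(-reach, reach + 1):
            x, y = cx + dx, cy + dy
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                total += dog_weight(dx, dy) * image[y][x]
    return total

# Direction in which the input units are lined up for each preferred orientation.
STEPS = {"vertical": (0, 1), "horizontal": (1, 0)}

def simple_cell(image, cx, cy, orientation, length=5):
    """Sum center-surround units whose centers lie along a line, then rectify."""
    step_x, step_y = STEPS[orientation]
    half = length // 2
    drive = sum(center_surround(image, cx + k * step_x, cy + k * step_y)
                for k in range(-half, half + 1))
    return max(0.0, drive)

def vertical_bar(size, column):
    """Dark size x size image with one bright vertical line."""
    return [[1.0 if x == column else 0.0 for x in range(size)]
            for _ in range(size)]

bar = vertical_bar(21, 10)       # line through the cell's location
shifted = vertical_bar(21, 13)   # same line, three pixels to the side

print("vertical cell, aligned bar  :", simple_cell(bar, 10, 10, "vertical"))
print("horizontal cell, aligned bar:", simple_cell(bar, 10, 10, "horizontal"))
print("vertical cell, shifted bar  :", simple_cell(shifted, 10, 10, "vertical"))
```

With these toy parameters the vertically wired cell responds most to the aligned vertical bar, much less when wired horizontally, and not at all once the bar moves onto the inhibitory surrounds, which mirrors the "optimally oriented line in a narrowly defined location" behaviour.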
Hubel’s and Wiesel’s Model
Complex cells
The main features of a complex cell:
o They are less particular about location and are concerned mainly with orientation.
o Their input is acquired from a number of simple cells.
o They detect motion (direction-specific).
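A toy sketch of the first two properties, assuming a complex cell takes the maximum over simple cells that share an orientation but sit at different positions. The simple cell here is a deliberately crude stand-in (column brightness minus its flanks), all names are hypothetical, and direction-specific motion sensitivity is not modelled.

```python
def column_fill(image, col):
    """Fraction of lit pixels in one column (0.0 outside the image)."""
    if not 0 <= col < len(image[0]):
        return 0.0
    return sum(row[col] for row in image) / len(image)

def simple_cell(image, col):
    """Toy vertical-line detector tied to one column, with inhibitory flanks."""
    flank = (column_fill(image, col - 1) + column_fill(image, col + 1)) / 2
    return max(0.0, column_fill(image, col) - flank)

def complex_cell(image, cols):
    """Pool same-orientation simple cells over several positions with a max."""
    return max(simple_cell(image, c) for c in cols)

def vertical_bar(size, col):
    return [[1 if x == col else 0 for x in range(size)] for _ in range(size)]

def horizontal_bar(size, row):
    return [[1 if y == row else 0 for _ in range(size)] for y in range(size)]

pool = range(1, 6)  # interior columns of a 7 x 7 image

print("simple cell at column 2, bar at column 2:", simple_cell(vertical_bar(7, 2), 2))
print("simple cell at column 2, bar at column 4:", simple_cell(vertical_bar(7, 4), 2))
print("complex cell, bar at column 2:", complex_cell(vertical_bar(7, 2), pool))
print("complex cell, bar at column 4:", complex_cell(vertical_bar(7, 4), pool))
print("complex cell, horizontal bar :", complex_cell(horizontal_bar(7, 3), pool))
```

The simple cell loses its response when the bar moves, while the pooled cell keeps responding wherever the vertical bar falls inside its pool and stays silent for the horizontal bar.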
Hubel’s and Wiesel’s Model
Biological Visual Systems as Guides
Modelling attempts to imitate primate vision systems
Extended Hubel-Wiesel
Hubel-Wiesel hierarchical models have been extended to obtain a fine balance between selectivity and invariance.
Simple and complex cells are interleaved at different levels of the inferotemporal (IT) lobes.
Max-like pooling mechanisms have been suggested at certain levels, as opposed to a weighted sum of afferents, to boost invariance to scale, position and rotation.
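To illustrate why a max-like operation helps invariance, here is a small sketch comparing the two pooling rules on made-up afferent responses. The numbers and weights are hypothetical: three afferents are assumed to detect the same feature at three positions, and the object is assumed to move from the first position to the third.

```python
def weighted_sum_pool(responses, weights):
    """Pooling by a fixed weighted sum of the afferent responses."""
    return sum(w * r for w, r in zip(weights, responses))

def max_pool(responses):
    """Max-like pooling: the strongest afferent determines the output."""
    return max(responses)

WEIGHTS = [0.5, 0.3, 0.2]        # hypothetical synaptic weights

object_left = [0.9, 0.1, 0.0]    # feature seen by the first afferent
object_right = [0.0, 0.1, 0.9]   # same feature, now seen by the third afferent

print("weighted sum:", weighted_sum_pool(object_left, WEIGHTS),
      "vs", weighted_sum_pool(object_right, WEIGHTS))
print("max pooling :", max_pool(object_left), "vs", max_pool(object_right))
```

Under these assumptions the weighted sum changes when the object moves, because different afferents carry different weights, whereas the max output stays the same.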
Feedforward Architecture
The S cells (simple cells) in the previous figure pass information on to the C cells (complex cells) by a bell-tuned weighted sum or a max-like operation.
These cells are further arranged in a hierarchy of higher feature levels. Some cells bypass a level in propagating information.
This model considers only the feedforward architecture for the primary visual cortex, V4 and the posterior IT lobe, plus a top-level supervised learning mode (coloured regions). [Serre et al. 2007]
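A minimal one-dimensional sketch of one S-to-C stage, assuming the S units use Gaussian (bell-shaped) tuning around a stored pattern and the C unit takes the max over S units at different positions. The preferred pattern, the tuning width and the function names are hypothetical; this is not the configuration of the cited model.

```python
import math

def s_unit(afferents, preferred, sigma=0.5):
    """Bell-shaped tuning: 1.0 for the preferred pattern, falling off with distance."""
    dist2 = sum((a - p) ** 2 for a, p in zip(afferents, preferred))
    return math.exp(-dist2 / (2 * sigma ** 2))

def c_unit(s_responses):
    """Max-like pooling over S units with the same tuning."""
    return max(s_responses)

PREFERRED = [0.0, 1.0, 0.0]   # hypothetical stored pattern: an isolated bump

def c_response(signal, width=3):
    """Apply the same S unit at every position of the signal, then pool."""
    windows = [signal[i:i + width] for i in range(len(signal) - width + 1)]
    return c_unit([s_unit(w, PREFERRED) for w in windows])

bump_left = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
bump_right = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
flat = [0.5] * 6

print("bump on the left :", c_response(bump_left))
print("bump on the right:", c_response(bump_right))
print("flat input       :", c_response(flat))
```

The tuning step provides selectivity (the flat input scores low) and the max step provides position tolerance (both bump positions score the same).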
Feedforward Architecture
Primates have a very advanced level of attention modulation (fixation), which is a feedback propagation from the IT lobes to the primary visual cortex and lower levels.
This mechanism allows attention to shift from one part of the image to another.
However, crude object recognition is done within a very short time after the stimulus, which indicates that only the feedforward architecture is used for rapid categorization.
Such a model was attempted at the McGovern Institute for Brain Research at MIT, with some simplifications.
The input consisted of 4 different orientations and several scales, densely covering the gray-value input image of 7° x 7°
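One way to picture an input stage with 4 orientations and several scales is a bank of oriented filters. The sketch below assumes Gabor filters, which the slide does not name, and uses two made-up scales; the sizes, wavelengths and sigma values are hypothetical.

```python
import math

def gabor(size, wavelength, sigma, theta, aspect=0.3):
    """Return a size x size Gabor kernel whose carrier varies along direction theta."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so that xr runs across the stripes.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (aspect * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

ORIENTATIONS = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]
SCALES = [(7, 3.5, 2.8), (11, 5.6, 4.5)]   # hypothetical (size, wavelength, sigma)

# One kernel per (scale, orientation) pair, keyed by (size, orientation index).
bank = {(size, i): gabor(size, wavelength, sigma, theta)
        for size, wavelength, sigma in SCALES
        for i, theta in enumerate(ORIENTATIONS)}

def response(kernel, patch):
    """Dot product of a kernel with an image patch of the same size."""
    return sum(w * p
               for k_row, p_row in zip(kernel, patch)
               for w, p in zip(k_row, p_row))

def grating(size, wavelength):
    """Patch of vertical stripes: brightness varies along x only."""
    half = size // 2
    return [[math.cos(2 * math.pi * x / wavelength)
             for x in range(-half, half + 1)]
            for _ in range(size)]

patch = grating(7, 3.5)
responses = [abs(response(bank[(7, i)], patch)) for i in range(len(ORIENTATIONS))]
best = responses.index(max(responses))
print("strongest orientation index for vertical stripes:", best)
```

In a dense arrangement, every kernel in the bank would be applied at every image position; the sketch evaluates a single position to keep it short.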
Results
The model was evaluated against human responses for an input stimulus of 20 ms followed by a varying inter-stimulus interval (ISI).
No single model parameter was adjusted to fit the human data. All unsupervised parts were fixed and constant throughout all the runs.
The supervised mode was tuned differently in different runs using different test images. Humans were also shown these test images.
An evaluation across all such runs for the identification of animal objects was done for both humans and the model. The results were compared.
Results
Images from various categories, with different clutter, scale, position and rotation, were presented.
Maximum similarity was found for ISIs up to 80 ms.
Conclusions
Biologically inspired computation models have shown very promising results. They are versatile and fast learners. Why not learn from nature’s best?
Advances in neuroscience are picking up, allowing us greater understanding. Also, simulations of hypothetical models will help us validate neuroscience findings.
References
Talks by Aaron Sloman, Univ. of Birmingham, UK, 2005-2007, http://www.cs.bham.ac.uk/~axs/invited-talks.html
http://www.lifesci.sussex.ac.uk/home/George_Mather/Linked%20Pages/Physiol/Cortex.html (last accessed 13 April 2008)
Brain Mechanisms of Vision, David H. Hubel and Torsten N. Wiesel, Scientific American, September 1979
How We See What We See, V. Demidov, Mir Publishers, 1986
A feedforward architecture accounts for rapid categorization, Serre et al., PNAS, 2007
Hierarchical Models of Object Recognition in Cortex, Poggio et al., Nature America, 1999
http://www.thebrain.mcgill.ca (last accessed 13 April 2008)