
Active vision system for embodied intelligence based on retina sampling model and hierarchical representation

Janusz A. Starzyk, Xinming Yu
Ohio University, Athens, OH

INTRODUCTION

Retina structure is fundamental to the human vision system, which is much more efficient than any current robotic vision system.
● Photoreceptors (cones and rods) are concentrated around the fovea, giving the highest resolution on the target
● The retina processes the sampled scene using ganglion cells, and sends activation through the optic nerve and the LGN to the primary visual cortex (V1)
● The neurons in V1 fire in groups, responding to different visual features from the retina

The retina sampling model uses a prespecified sampling density. Fig. 2 shows the distribution density curves of the cones inside the retina.

● Correlation-based sparse connections are used to mimic the neuron connections in V1
● Neurons which are locally correlated connect to the same group of neurons in the higher layer
● Winners and their neighbors fire and have their weights adjusted together, for smooth processing and increased robustness.

Fig. 1: Retina structure

Fig. 2: Cone densities in human retina [1]

Retina sampling: Model of data collection

The retina, unlike a camera, does not simply send a picture to the brain. The retina spatially encodes (compresses) the image to fit the limited capacity of the optic nerve.

● The photoreceptors are not evenly distributed inside the retina. Most of them are concentrated on or around the fovea
● The 1D probability distribution curve is shown in Fig. 3.

● The cortex receives distorted images, which are sharper in the fovea area.
● The fovea is the reference point of gaze shifting, and focuses on the most interesting part of the scene.

Fig. 4 shows the sampling points for the retina model, with higher density in the center than on the periphery

Fig. 3: PDF of the photoreceptors (cones and rods) [2]

Fig. 4: Sampling points for retina model
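A minimal sketch of how a prespecified density can produce sampling points that are dense in the center and sparse on the periphery. The power-law warp and its `gamma` parameter are assumptions chosen for illustration; they are not the cone-density curve of [1] that the poster's model is based on. The helper `central_share` (also hypothetical) reports what share of the grid points falls inside a centered window, for the warped grid and for an evenly spaced one.

```python
import math


def foveated_axis(n_samples, size, gamma=2.0):
    """Return n_samples positions on an axis of `size` pixels,
    packed densely near the center and sparsely near the edges."""
    center = (size - 1) / 2.0
    positions = []
    for i in range(n_samples):
        u = 2.0 * i / (n_samples - 1) - 1.0   # evenly spaced in [-1, 1]
        r = abs(u) ** gamma                    # gamma > 1 pulls points inward
        positions.append(center + math.copysign(r, u) * center)
    return positions


def central_share(axis, size, window):
    """Fraction of the 2-D grid points (axis x axis) inside a centered
    square whose side is `window` times the image side."""
    center = (size - 1) / 2.0
    inside = sum(1 for p in axis if abs(p - center) <= window * center)
    return (inside / len(axis)) ** 2


size, n = 900, 60
retina = foveated_axis(n, size)               # assumed foveated layout
uniform = foveated_axis(n, size, gamma=1.0)   # gamma = 1 gives even spacing
for window in (0.5, 0.2):
    print(window,
          central_share(retina, size, window),
          central_share(uniform, size, window))
```

With `gamma` above 1, the warped grid always places a larger share of its points in the central window than the even grid does, which is the qualitative behavior the retina model relies on.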

● Using the retina sampling, the vision system receives more useful information.
● Table 1 shows a comparison of the retina sampling model (human vision) and uniform sampling (computer vision).
● The resolution (density of the sampling points) in the center part of the retina sampling is much higher than that of uniform sampling.

Table 1: Comparison (human vs. computer vision), percentage of sampling points

Region               Retina Sampling (Human Vision)   Uniform Sampling (Computer Vision)
Whole range          100%                             100%
Part inside green    78%                              50%
Part inside red      63%                              25%
Part inside black    52%                              14%
Part inside blue     31%                              4%

When this artificial retina sampling is applied to a visual scene, the vision system receives much more data from the object in focus, while still retaining peripheral vision.

Fig. 5: An example of retina sampling. Original resolution: 900x900; resolution after sampling: 60x60
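The resolution change quoted for Fig. 5 can be illustrated with a short sketch that resamples a 900x900 image to 60x60 by nearest-neighbor lookup at foveated positions. The function names, the power-law warp, and `gamma` are assumptions for illustration only; the poster does not specify how its sampling is implemented.

```python
import math


def retina_indices(n_out, size, gamma=2.0):
    """Pixel indices sampled along one axis: dense at the center,
    sparse toward the edges (assumed power-law warp)."""
    center = (size - 1) / 2.0
    indices = []
    for i in range(n_out):
        u = 2.0 * i / (n_out - 1) - 1.0
        offset = math.copysign(abs(u) ** gamma, u) * center
        indices.append(int(round(center + offset)))
    return indices


def retina_sample(image, n_out, gamma=2.0):
    """Resample a square image (list of rows) to n_out x n_out by
    nearest-neighbor lookup at the foveated positions."""
    idx = retina_indices(n_out, len(image), gamma)
    return [[image[r][c] for c in idx] for r in idx]


# Synthetic 900x900 "image" whose pixel value encodes its own position.
size = 900
image = [[r * size + c for c in range(size)] for r in range(size)]
small = retina_sample(image, 60)
print(len(small), len(small[0]))   # 60 60
print(size * size // (60 * 60))    # 225 original pixels per retained sample
```

In this sketch neighboring samples at the center are one pixel apart, so the focused object keeps its full resolution, while samples at the border are roughly 30 pixels apart.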

● In the primary visual cortex (V1), neurons are activated by the stimuli from similar groups of inputs.
● The connections built based on the correlation of the input reflect observed relations in the real world. Fig. 6 shows the correlations based on real images.

Correlation-based Connection

Fig. 6: Correlation of the input data

● Linsker obtained useful features in the visual field with a fixed connectivity model and noise input for self-organizing training [3]. The disadvantages of his fixed connectivity model are:
◘ It may not deliver connections to remote but correlated areas of the visual field. Fig. 7 shows the existence of such a remote but correlated area
◘ It may not result in useful features on higher levels
◘ The local connectivity region is set arbitrarily

Fig. 7: Correlation-based connections with remote but correlated area

CONCLUSIONS

An active vision system for embodied intelligence based on a retina sampling model and hierarchical representation is developed. The retina sampling model mimics the efficiency of the human vision system. A hierarchical representation is built up with sparse connections, which are locally generated from the neurons’ activity correlation. Using the goal creation system learning scheme, the active vision system can learn complex knowledge. Goals evolve from the simple ones through interaction with the environment. Such organization of the learning process is conducive to the creation of a general intelligence, with self-organizing structure and dynamic goals.

BIBLIOGRAPHY

[1] Curcio, C.A., Sloan, K.R. Jr, Packer, O., Hendrickson, A.E. & Kalina, R.E. (1987). Distribution of cones in human and monkey retina: individual variability and radial asymmetry. Science 236, pp. 579-582.
[2] Riedel G., Physiology of Human Cells, Available: http://www.aberdeen.ac.uk/sms/ugradteaching/course.php?ID=10
[3] Linsker R., “From Basic Network Principles to Neural Architecture: Emergence of Spatial-Opponent Cells”, Proc. National Academy of Sciences, Vol. 83, pp. 7508-7512, 1986.
[4] Oja E., “Simplified neuron model as a principal component analyzer”, Journal of Mathematical Biology 15 (3), pp. 267-273, 1982.

● Local winners are used to adjust the connection weights
◘ The local winners are activated (e.g. the green one in layer N)
◘ The weights of connections to neighbors of the local winner are adjusted
◘ The local winners help the neighbors to fire together (horizontal red arrows are excitatory)
◘ All groups of winner sets in layer N (local winners and their neighbors) are used to activate layer N+1
◘ Use Oja’s learning rule [4] to adjust the weights of connections to the winners

Procedure of the weight adjustment:

Activate layer N-1 → find the strongest winners in layer N → excite the neighbors as co-winners → adjust weights for all activated neurons → activate layer N+1
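The weight-adjustment procedure can be sketched as follows. Oja's rule [4] updates a weight vector w for input x and output y = w·x as w ← w + η·y·(x − y·w). Everything else here is an assumption made for illustration: layer N is laid out in one dimension, a single strongest winner is chosen, its neighbors are the neurons within `radius` index positions, and winner and co-winners share the same learning rate.

```python
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def oja_update(w, x, lr):
    """One step of Oja's rule: w <- w + lr * y * (x - y * w), y = w . x."""
    y = dot(w, x)
    return [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]


def train_step(weights, x, lr=0.05, radius=1):
    """weights[j] is the weight vector of layer-N neuron j; x is the
    activation of layer N-1. Updates `weights` in place."""
    # Activate layer N from layer N-1.
    activations = [dot(w, x) for w in weights]
    # Find the strongest winner.
    winner = max(range(len(weights)), key=lambda j: activations[j])
    # Excite the neighbors as co-winners (assumed 1-D neighborhood).
    lo = max(0, winner - radius)
    hi = min(len(weights) - 1, winner + radius)
    activated = list(range(lo, hi + 1))
    # Adjust weights for all activated neurons.
    for j in activated:
        weights[j] = oja_update(weights[j], x, lr)
    return winner, activated


rng = random.Random(0)
weights = [[rng.uniform(0.0, 1.0) for _ in range(4)] for _ in range(6)]
x = [1.0, 0.0, 0.0, 1.0]
winner, activated = train_step(weights, x)
print("winner:", winner, "activated set:", activated)
```

The activated set returned by `train_step` is what would then drive layer N+1; that last step is left out of the sketch.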

In the active vision system, we apply a connection mechanism based on the correlation between the input neurons’ activation and the activation of local winners.

● Correlation between input neurons’ activation
◘ Use real image data instead of noise
◘ Use images organized in time sequences to obtain feedback connections for invariance building
◘ Process the input data from layer N-1 and calculate the correlations
◘ For each neuron, find the best correlated set of neurons, and create connections to those neurons
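The last two steps above can be sketched as follows. The sketch assumes that "correlation" means the Pearson correlation of activation traces recorded over a set of images, and that each neuron connects to its `k` best correlated peers; both choices, and the function names, are illustrative rather than taken from the poster.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long activation traces."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant trace carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def correlated_connections(traces, k):
    """traces[i] holds the activations of neuron i over all samples.
    Returns {i: indices of the k neurons best correlated with i}."""
    connections = {}
    for i in range(len(traces)):
        scored = [(pearson(traces[i], traces[j]), j)
                  for j in range(len(traces)) if j != i]
        scored.sort(reverse=True)
        connections[i] = [j for _, j in scored[:k]]
    return connections


# Four layer N-1 neurons observed over four images (made-up values).
traces = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 2.0, 3.0, 5.0],
]
print(correlated_connections(traces, k=1))
```

Because the selection depends only on correlation and not on distance, a remote but correlated neuron can be chosen, which a fixed local connectivity region would exclude.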

Fig. 8: The excitation of local winner and its neighbors

An active servo system, shown in Fig. 9, is being built with real-time video input to demonstrate the active vision system for embodied intelligence. Both the retina sampling model and the correlation-based connections are used to work with the servo system.
◘ The webcam is used to capture the visual data.
◘ The raw data is uniformly distributed (320x240 pixels); it is first processed by the retina model and compressed to 40x30 with little data loss in the center.
◘ With the compressed data and the correlation-based sparse connections, the active vision system processes the real-time input, finds the interesting object and generates the object coordinates.
◘ The servo system receives the real-time coordinates and follows the object with a laser pointer.
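The last two stages of this pipeline, locating the object on the compressed 40x30 grid and turning its position into servo commands, might look like the sketch below. All names and numbers other than the 320x240 and 40x30 resolutions are hypothetical: the response map, the lookup tables `xs` and `ys`, the 60 by 45 degree field of view, and the linear pixel-to-angle mapping are assumptions, not details of the authors' system.

```python
def find_peak(response):
    """Return (row, col) of the strongest response in a 2-D map."""
    rows, cols = len(response), len(response[0])
    return max(((r, c) for r in range(rows) for c in range(cols)),
               key=lambda rc: response[rc[0]][rc[1]])


def to_frame_coords(row, col, xs, ys):
    """Map a compressed-grid cell back to pixel coordinates in the raw frame,
    using per-column (xs) and per-row (ys) lookup tables."""
    return xs[col], ys[row]


def to_pan_tilt(x, y, width=320, height=240, hfov_deg=60.0, vfov_deg=45.0):
    """Approximate pan/tilt offsets in degrees from the image center,
    assuming angle grows linearly with pixel offset."""
    pan = (x - (width - 1) / 2.0) / width * hfov_deg
    tilt = (y - (height - 1) / 2.0) / height * vfov_deg
    return pan, tilt


# Hypothetical lookup tables: raw-frame position of each compressed column/row.
# Evenly spaced here for brevity; a retina model would supply foveated positions.
xs = [c * 8 + 3.5 for c in range(40)]
ys = [r * 8 + 3.5 for r in range(30)]

response = [[0.0] * 40 for _ in range(30)]
response[15][20] = 1.0                     # pretend the object was found here
row, col = find_peak(response)
x, y = to_frame_coords(row, col, xs, ys)
pan, tilt = to_pan_tilt(x, y)
print((row, col), (x, y), (pan, tilt))
```

Keeping the lookup tables separate from the peak search means the same servo code works whichever sampling layout produced the compressed frame.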

Fig. 9: Servo system

Fig. 10: The servo system working with the active vision system to follow the object in view

Building up memories from environment

Fig. 11: The pathways through which the system is built up from interactions with the unspecified environment

In learning, it is not easy to obtain examples of desired behavior that are both correct and representative of all the situations in which the agent has to act.

Reinforcement learning (RL) is a good choice for learning in an unspecified external environment.

As shown in Fig. 12, the agent (A) receives data from the environment (E), consisting of an input (I) and a reward (R), and sends an action (M) back to the environment. With the aid of the reward, the agent learns which actions yield the maximum reward.
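The interaction loop of Fig. 12 can be sketched with a deliberately small example. The environment, the epsilon-greedy value table, and all parameter values below are assumptions for illustration; the task is a one-step (contextual bandit) toy case, not the goal creation scheme described on the poster.

```python
import random


class Environment:
    """Toy environment E: the input I is a state in {0, 1}; the reward R
    is 1 when the action M equals the current state, else 0."""

    def __init__(self, rng):
        self.rng = rng
        self.state = rng.randrange(2)

    def step(self, action):
        reward = 1.0 if action == self.state else 0.0
        self.state = self.rng.randrange(2)   # next input is drawn at random
        return self.state, reward


class Agent:
    """Agent A: keeps a value estimate per (input, action) pair."""

    def __init__(self, rng, n_states=2, n_actions=2, lr=0.1, epsilon=0.1):
        self.rng = rng
        self.lr = lr
        self.epsilon = epsilon
        self.q = [[0.0] * n_actions for _ in range(n_states)]

    def act(self, state):
        if self.rng.random() < self.epsilon:          # explore
            return self.rng.randrange(len(self.q[state]))
        row = self.q[state]
        return row.index(max(row))                    # exploit

    def learn(self, state, action, reward):
        # Move the estimate toward the observed reward.
        self.q[state][action] += self.lr * (reward - self.q[state][action])


def run(steps=2000, seed=0):
    rng = random.Random(seed)
    env, agent = Environment(rng), Agent(rng)
    state, total = env.state, 0.0
    for _ in range(steps):
        action = agent.act(state)                 # M
        next_state, reward = env.step(action)     # I, R
        agent.learn(state, action, reward)
        total += reward
        state = next_state
    return agent.q, total / steps


q_table, average_reward = run()
print(q_table, average_reward)
```

After training, the value table favors the action that matches each input, so the average reward approaches the ceiling set by the exploration rate.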

The goal creation system provides a mechanism that organizes the learning of intentional representations and of associations between sensory and motor pathways. When an agent realizes that a specific action resulted in a desirable effect related to the current goal, it stores a representation of the perceived object involved in that action and learns associations between the sensory and motor pathways.

Hierarchical representation learning is based on external reinforcement for primitive goals, and on the internal goal creation system for abstract goals and internal rewards.

Fig. 12: The reinforcement learning model

[Fig. 12 diagram labels: A, E, I, R, M, S]

[Fig. 11 diagram labels: INPUT, OUTPUT, Simulation or Real-World System, Task Environment, EI Architecture, Goal Creation, Act, Perceive, Competing goals, Planning, Pain]