A cortical model for learning complex temporal structure in sensory streams
Subutai Ahmad, Yuwei Cui, and Jeff Hawkins
[sahmad, ycui, jhawkins]@numenta.com


Sequence learning is ubiquitous in cortex. What is the neural mechanism for sequence learning?

HTM sequence memory:
1. Neurons learn to recognize hundreds of patterns using active dendrites.
2. Recognition of a pattern acts as a prediction by depolarizing the cell without generating an immediate action potential.
3. A network of neurons with active dendrites forms a powerful sequence memory.
4. Sparse representations lead to highly robust recognition.
5. The model agrees well with experimental evidence.

HTM neuron model:
- Active dendrites act as pattern detectors; each cell can recognize hundreds of unique patterns.
- The proximal dendrite (hundreds of feedforward synapses, the "classic" receptive field) recognizes a feedforward input pattern and activates the cell.
- Each distal dendrite segment (drawing on thousands of contextual synapses) can recognize a particular pattern and cause a dendritic spike, which depolarizes the neuron into a "predicted state".
[Figure: a pyramidal neuron alongside the corresponding HTM model neuron.]

Experimental support for active dendrites: Poirazi et al., 2003; Palmer et al., 2014; Major, Larkum and Schiller, 2013. A minimal code sketch of this neuron model follows below.
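The prediction mechanism is easy to state in code. The following is a minimal illustrative sketch, not Numenta's implementation; the spike threshold is an assumption (the poster gives no value), and cells are identified by integer indices.

```python
# Sketch of the HTM neuron's "predicted state": a distal segment fires a
# dendritic spike when enough of its synapses connect to active cells.
SEGMENT_THRESHOLD = 15  # assumed number of coincident active synapses

def cell_is_predicted(distal_segments, active_cells):
    """distal_segments: list of sets of presynaptic cell indices, one set
    per dendritic segment; active_cells: set of cell indices that fired
    on the previous time step. Returns True if any segment detects its
    pattern, i.e. the cell becomes depolarized (predicted)."""
    return any(len(segment & active_cells) >= SEGMENT_THRESHOLD
               for segment in distal_segments)
```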


HTM network model for sequence learning

Activation rules:
- Select the top 2% of columns with the strongest inputs on their proximal dendrites as the active columns.
- If any cell in an active column is predicted, only the predicted cells fire.
- If no cell in an active column is predicted, all cells in the column fire.
- A pattern detected on a distal dendrite causes the cell to become depolarized (predicted).

Unsupervised Hebbian-like learning rules (Hawkins and Ahmad, 2016):
- If a depolarized cell subsequently becomes active, its active dendritic segment is reinforced.
- If a depolarized cell does not become active, a small decay is applied to that cell's active segments.
- If no cell in an active column is predicted, the cell with the most activated segment is reinforced.

These rules are sketched in code below.
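A minimal sketch of the activation and learning rules above, under illustrative assumptions (cells indexed globally, a fixed `cells_per_column`, permanence increment and decay rates chosen arbitrarily). This is not the NuPIC API.

```python
import numpy as np

def select_active_columns(overlaps, sparsity=0.02):
    """Activation rule 1: keep the top `sparsity` fraction of columns,
    ranked by proximal-dendrite input overlap."""
    k = max(1, int(round(len(overlaps) * sparsity)))
    return set(int(c) for c in np.argsort(overlaps)[-k:])

def activate_cells(active_columns, predicted_cells, cells_per_column):
    """Activation rules 2-3: predicted cells in an active column fire
    alone; a column with no predicted cell "bursts" (all cells fire)."""
    active = set()
    for col in active_columns:
        column_cells = set(range(col * cells_per_column,
                                 (col + 1) * cells_per_column))
        winners = column_cells & predicted_cells
        active |= winners if winners else column_cells
    return active

def update_segment(segment, prediction_came_true, inc=0.10, dec=0.01):
    """Learning rules 1-2: reinforce a segment whose prediction was
    confirmed, otherwise apply a small decay. `segment` maps presynaptic
    cell index -> synapse permanence in [0, 1]. (Rule 3, reinforcing the
    best-matching segment in a bursting column, is omitted for brevity.)"""
    delta = inc if prediction_came_true else -dec
    for pre in segment:
        segment[pre] = min(1.0, max(0.0, segment[pre] + delta))
```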

[Figure: learning the high-order sequences ABCD and XBCY. Before learning, every cell in each active column fires. After learning, only one cell per column is active: ABCD is represented as A, B', C', D' and XBCY as X, B'', C'', Y'', so the shared subsequence BC is carried by different cells depending on context. Legend: active cells, depolarized (predictive) cells, inactive cells; time runs left to right.]

The network learns complex high-order sequences, e.g. it distinguishes ABCD from XBCY: the same columns are active, but only one cell per column is active after learning. A toy usage of the activation sketch above follows.
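To make the one-cell-per-column idea concrete, here is a toy use of the `activate_cells` sketch above; the column index, column size, and cell indices are hypothetical.

```python
# Column 5 holds cells 160..191 (32 cells per column).
burst = activate_cells({5}, predicted_cells=set(), cells_per_column=32)
assert len(burst) == 32          # before learning: the whole column bursts

# After learning, suppose cell 160 is predicted in context A (...ABCD)
# and cell 161 in context X (...XBCY).
assert activate_cells({5}, {160}, cells_per_column=32) == {160}  # "C'"
assert activate_cells({5}, {161}, cells_per_column=32) == {161}  # "C''"
```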

Testable predictions and experimental validation
(In collaboration with Spencer Smith and Yiyi Yu, UNC, Chapel Hill)

Setup: wide field-of-view calcium imaging (Stirman et al., 2016) of neural populations in mouse visual areas V1 and AL during repeated presentations of a naturalistic movie on a screen.
[Figure: example population responses (neuron # vs. time, 0-30 s) in V1 and AL on trial 1 and trial 20.]

Prediction: HTM sequence memory predicts structured, sparser activity for a learned sequence.
- Data: sparser activity over repeated presentations (activation density in V1 and AL declines across trials 0-20).
- Data: the sparse activity is a subset of the activity before learning (number of cells active on the last trial vs. number active on both the first and last trials).

Prediction: HTM sequence memory predicts the presence of high-order cell assemblies.
- Data: high-order cell assemblies are present (number of unique cell assemblies of order 3-6 in V1 and AL, compared against shuffled-spike and Poisson controls; subset index for data vs. shuffled and Poisson controls).
- Data: cell assemblies are locked to the stimulus (probability of a repeated cell assembly vs. time jitter, ±1 s, for 3-order assemblies and single cells).
[Figure: responses of three example cells and of a cell assembly across trials, shown with original and shuffled spikes.]

References
- Cui Y, Ahmad S, Hawkins J (2016). Neural Computation 28: 2474–2504.
- Hawkins J, Ahmad S (2016). Front. Neural Circuits 10: 1–13.
- Major G, Larkum ME, Schiller J (2013). Annu. Rev. Neurosci. 36: 1–24.
- Miller JK, Ayzenshtat I, Carrillo-Reid L, Yuste R (2014). PNAS 111(38): E4053–61.
- Piuri V (2001). J. Parallel Distrib. Comput. 61: 18–48.
- Poirazi P, Brannon T, Mel BW (2003). Neuron 37: 989–999.
- Stirman JN, Smith IT, Kudenov MW, Smith SL (2016). Nat. Biotechnol. 34: 857–862.

HTM works well on real-world problems

Task: predict taxi passenger counts in New York City (data source: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml).
[Figure: passenger counts (0-25k) over one week, Monday 2015-04-20 through Sunday 2015-04-26.]

HTM has comparable performance to state-of-the-art algorithms.
[Figure: mean absolute percent error (0.00-0.30) for Shift, ARIMA, ESN, LSTM1000, LSTM3000, LSTM6000, and HTM.]

The reported error metric is sketched below.
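For reference, mean absolute percent error can be computed with a small helper; the function name and the example counts below are illustrative, not from the poster.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percent error over a sequence of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))

# Example with made-up passenger counts:
print(mape([10000, 12000], [11000, 11500]))  # ~0.071
```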

HTM exhibits high fault tolerance to neuron death

[Figure: prediction accuracy after cell death for HTM vs. LSTM as the fraction of dead cells increases from 0.0 to 0.6.]

HTM is fault tolerant due to the properties of sparse distributed representations (Hawkins & Ahmad, 2016). In contrast, LSTM and most other artificial neural networks are sensitive to the loss of neurons or synapses (Piuri, 2001). The cell-death protocol is sketched below.
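A sketch of the kind of ablation such an experiment uses; this is illustrative only, `evaluate` is a hypothetical scoring function, and the cell count and sweep values are assumptions.

```python
import numpy as np

def surviving_cells(n_cells, death_fraction, seed=0):
    """Simulate random cell death: return indices of the surviving cells."""
    rng = np.random.default_rng(seed)
    n_dead = int(n_cells * death_fraction)
    dead = rng.choice(n_cells, size=n_dead, replace=False)
    return np.setdiff1d(np.arange(n_cells), dead)

# Sweep the death fraction as in the figure (0.0 to 0.6) and re-score the
# network using only the surviving cells; `evaluate` is hypothetical.
# for f in np.arange(0.0, 0.7, 0.1):
#     alive = surviving_cells(n_cells=65536, death_fraction=f)
#     print(f, evaluate(alive))
```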

HTM adapts quickly to changes

Injected change to the data: a 20% increase in weekday night traffic and a 20% decrease in weekday morning traffic.

[Figure: mean absolute percent error (0.10-0.35) for LSTM6000 and HTM from Apr 01 to May 06, around the injected change.]

HTM adapts quickly to changes in statistics due to its continuous, unsupervised Hebbian learning rule (Cui et al., 2016). A sketch of the injected perturbation follows.
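A minimal sketch of the injected change, assuming a datetime-indexed passenger-count series; the exact hour ranges for "night" and "morning" are assumptions, since the poster does not specify them.

```python
import pandas as pd

def inject_change(series):
    """Apply the described perturbation: +20% on weekday nights,
    -20% on weekday mornings (hour ranges assumed)."""
    out = series.copy()
    weekday = out.index.dayofweek < 5
    night = (out.index.hour >= 21) | (out.index.hour < 5)    # assumed
    morning = (out.index.hour >= 6) & (out.index.hour < 10)  # assumed
    out[weekday & night] *= 1.2
    out[weekday & morning] *= 0.8
    return out

# Usage with synthetic data:
# idx = pd.date_range("2015-04-01", periods=24 * 7, freq="h")
# perturbed = inject_change(pd.Series(10000.0, index=idx))
```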