CS 182
Sections 101 - 102
Eva Mok ([email protected])
Feb 11, 2004
(http://www2.hi.net/s4/strangebreed.htm)
bad puns alert!
Announcements
• a3 part 1 is due tonight (submit as a3-1)
• The second tester file is up, so please start part 2.
• The quiz is graded (get it after class).
Where we stand
• Last Week
– Backprop
• This Week
– Recruitment learning
– Color
• Coming up
– Imaging techniques (e.g. fMRI)
The Big (and complicated) Picture

[Course-map diagram: an abstraction axis runs from Biology up through Computational Neurobiology, Structured Connectionism, and Computation to Cognition and Language. Topics are placed along it: Neural Development, Neural Net & Learning, Triangle Nodes, SHRUTI, Spatial Relation (Regier Model), Motor Control (Bailey Model), Metaphor (Narayanan Model), Grammar (Chang Model), Visual System, and Psycholinguistics Experiments, with the Quiz, Midterm, and Finals marked along the way.]
Quiz
1. What is a localist representation? What is a distributed representation? Why are they both bad?
2. What is coarse-fine encoding? Where is it used in our brain?
3. What can Back-Propagation do that Hebb’s Rule can’t?
4. Derive the Back-Propagation Algorithm
5. What (intuitively) does the learning rate do? How about the momentum term?
Distributed vs Localist Rep'n

Distributed:           Localist:
John   1 1 0 0         John   1 0 0 0
Paul   0 1 1 0         Paul   0 1 0 0
George 0 0 1 1         George 0 0 1 0
Ringo  1 0 0 1         Ringo  0 0 0 1

What are the drawbacks of each representation?
Distributed vs Localist Rep'n

Distributed:           Localist:
John   1 1 0 0         John   1 0 0 0
Paul   0 1 1 0         Paul   0 1 0 0
George 0 0 1 1         George 0 0 1 0
Ringo  1 0 0 1         Ringo  0 0 0 1

• Distributed: what happens if you want to represent a group?
  How many persons can you represent with n bits? 2^n
• Localist: what happens if one neuron dies?
  How many persons can you represent with n bits? n
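A minimal Python sketch of both points, using the slide's own 4-bit patterns (the superpose helper is mine, just for illustration):

from itertools import product

# the slide's distributed patterns
dist = {"John": (1, 1, 0, 0), "Paul": (0, 1, 1, 0),
        "George": (0, 0, 1, 1), "Ringo": (1, 0, 0, 1)}

# capacity: n localist units name only n people,
# but n distributed units offer 2**n distinct patterns
print(len(dist), "vs", len(list(product([0, 1], repeat=4))))  # 4 vs 16

def superpose(a, b):
    # represent a group by activating every unit active in either member
    return tuple(x | y for x, y in zip(a, b))

# but group representation is ambiguous in the distributed code:
print(superpose(dist["John"], dist["George"]))  # (1, 1, 1, 1)
print(superpose(dist["Paul"], dist["Ringo"]))   # (1, 1, 1, 1) -- same pattern!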
Visual System
• 1000 x 1000 visual map
• For each location, encode:
– orientation
– direction of motion
– speed
– size
– color
– depth
• Blows up combinatorially!
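To see the blowup, here is a rough count in Python; the per-feature value counts are illustrative assumptions, not numbers from the slides:

locations = 1000 * 1000
# hypothetical number of distinguishable values per feature:
values = {"orientation": 10, "direction": 10, "speed": 5,
          "size": 5, "color": 10, "depth": 10}

conjunctive = 1
for v in values.values():
    conjunctive *= v             # one unit per combination: 250,000 per location
separate = sum(values.values())  # one small pool per feature: 50 per location

print(locations * conjunctive)   # 250,000,000,000 units -- blows up
print(locations * separate)      # 50,000,000 units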
Coarse Coding
The info you can encode with one fine-resolution unit = the info you can encode with a few coarse-resolution units.
Now, as long as we need fewer coarse units in total, we're good.
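A 1-D Python sketch of that tradeoff (the receptive-field width and spacing are made-up numbers): wide, overlapping units distinguish more positions than there are units, so fewer units buy the same resolution:

# coarse units: receptive fields of width 10, centers every 3 over [0, 100]
centers = list(range(0, 101, 3))  # 34 units

def pattern(pos):
    # which coarse units fire for a stimulus at pos
    return tuple(c - 5 <= pos < c + 5 for c in centers)

regions = {pattern(p) for p in range(100)}
print(len(centers), "coarse units distinguish", len(regions), "regions")
# -> 34 coarse units distinguish 65 regions; a fine code
#    (one unit per region) would need 65 units for the same resolution.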
Coarse-Fine Coding

...but we can run into ghost "images".

[Figure: feature space with Feature 1 (e.g. Orientation) on one axis and Feature 2 (e.g. Direction of Motion) on the other. One population is coarse in F2 but fine in F1; the other is coarse in F1 but fine in F2. Two real stimuli X and Y activate both populations, but the conjunctions (X-Orientation, Y-Dir) and (Y-Orientation, X-Dir) also light up, producing two ghosts G.]
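A tiny sketch of how the ghosts arise (the stimulus labels are hypothetical): each pool reports only its own feature, so decoding the conjunctions cannot tell real pairings from spurious ones:

# two real stimuli, each an (orientation, direction) pair
X = ("X-orientation", "X-direction")
Y = ("Y-orientation", "Y-direction")

# the pool fine in F1 reports orientations; the pool fine in F2 reports directions
orientations = {X[0], Y[0]}
directions = {X[1], Y[1]}

# all conjunctions consistent with the two reports: 4 candidates, 2 of them ghosts
candidates = {(o, d) for o in orientations for d in directions}
print(candidates - {X, Y})
# -> {('X-orientation', 'Y-direction'), ('Y-orientation', 'X-direction')}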
Back-Propagation Algorithm

We define the error term for a single node to be $t_i - y_i$.

[Diagram: inputs $y_j$ feed node $i$ through weights $w_{ij}$; the node applies $f$ to produce $y_i$]

$x_i = \sum_j w_{ij} y_j$
$y_i = f(x_i)$
$t_i$: target

Sigmoid: $y_i = f(x_i) = \dfrac{1}{1 + e^{-x_i}}$
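(One step the slides use later without proof: the sigmoid's derivative in one line.)

$f'(x) = \dfrac{d}{dx}\left(\dfrac{1}{1+e^{-x}}\right) = \dfrac{e^{-x}}{(1+e^{-x})^2} = \dfrac{1}{1+e^{-x}} \cdot \dfrac{e^{-x}}{1+e^{-x}} = f(x)\,\big(1 - f(x)\big)$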
Gradient Descent

[Figure: error surface over two weights, $i_1$ and $i_2$; the global minimum is your goal. It should be 4-D (3 weights) but you get the idea.]
The output layer

[Network: layer $k$ → hidden layer $j$ → output layer $i$, with weights $w_{jk}$ and $w_{ij}$]

$E = \text{Error} = \frac{1}{2} \sum_i (t_i - y_i)^2$, with $t_i$ the target and $\eta$ the learning rate.

$W_{ij} \leftarrow W_{ij} + \Delta W_{ij}, \qquad \Delta W_{ij} = -\eta \dfrac{\partial E}{\partial W_{ij}}$

$\Delta W_{ij} = -\eta \dfrac{\partial E}{\partial W_{ij}} = -\eta \dfrac{\partial E}{\partial y_i} \dfrac{\partial y_i}{\partial x_i} \dfrac{\partial x_i}{\partial W_{ij}} = \eta\,(t_i - y_i)\,f'(x_i)\,y_j$

The derivative of the sigmoid is just $y_i(1 - y_i)$, so

$\Delta W_{ij} = \eta\,(t_i - y_i)\,y_i(1 - y_i)\,y_j = \eta\,\delta_i\,y_j$, where $\delta_i = (t_i - y_i)\,y_i(1 - y_i)$.
The hidden layer

[Same network: layer $k$ → hidden layer $j$ → output layer $i$, weights $w_{jk}$ and $w_{ij}$]

$E = \text{Error} = \frac{1}{2} \sum_i (t_i - y_i)^2$, with $t_i$ the target.

$\Delta W_{jk} = -\eta \dfrac{\partial E}{\partial W_{jk}} = -\eta \dfrac{\partial E}{\partial y_j} \dfrac{\partial y_j}{\partial x_j} \dfrac{\partial x_j}{\partial W_{jk}}$

$\dfrac{\partial E}{\partial y_j} = \sum_i \dfrac{\partial E}{\partial y_i} \dfrac{\partial y_i}{\partial x_i} \dfrac{\partial x_i}{\partial y_j} = -\sum_i (t_i - y_i)\,f'(x_i)\,W_{ij}$

$\Delta W_{jk} = \eta \left[\sum_i (t_i - y_i)\,f'(x_i)\,W_{ij}\right] f'(x_j)\,y_k$

$\Delta W_{jk} = \eta \left[\sum_i (t_i - y_i)\,y_i(1 - y_i)\,W_{ij}\right] y_j(1 - y_j)\,y_k$

$\Delta W_{jk} = \eta\,\delta_j\,y_k$, where $\delta_j = \left[\sum_i \delta_i\,W_{ij}\right] y_j(1 - y_j)$.
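Putting both layers together, here is a minimal numpy sketch of one training step; the shapes, names, and the 2-3-1 example network are my own, not from the slides:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(yk, W_jk, W_ij, t, eta):
    # forward pass: k -> j -> i
    yj = sigmoid(W_jk @ yk)        # hidden activations y_j = f(x_j)
    yi = sigmoid(W_ij @ yj)        # output activations y_i = f(x_i)
    # output deltas: delta_i = (t_i - y_i) y_i (1 - y_i)
    delta_i = (t - yi) * yi * (1 - yi)
    # hidden deltas: delta_j = (sum_i delta_i W_ij) y_j (1 - y_j)
    delta_j = (W_ij.T @ delta_i) * yj * (1 - yj)
    # updates: Delta W_ij = eta delta_i y_j and Delta W_jk = eta delta_j y_k
    return W_jk + eta * np.outer(delta_j, yk), W_ij + eta * np.outer(delta_i, yj)

# example: a tiny 2-3-1 network
rng = np.random.default_rng(0)
W_jk, W_ij = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
W_jk, W_ij = backprop_step(np.array([0.0, 1.0]), W_jk, W_ij, np.array([1.0]), eta=0.5)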
Let's just do an example

[Network: inputs $i_1$, $i_2$, and bias $b = 1$ feed one sigmoid output node $y_0$ through weights $w_{01} = 0.8$, $w_{02} = 0.6$, $w_{0b} = 0.5$]

$E = \frac{1}{2}(t_0 - y_0)^2$

Training data:

i1  i2 | y0
 0   0 |  0
 0   1 |  1
 1   0 |  1
 1   1 |  1

For the input $i_1 = 0$, $i_2 = 0$ (target $t_0 = 0$):

$x_0 = 0.8 \cdot 0 + 0.6 \cdot 0 + 0.5 \cdot 1 = 0.5$
$y_0 = 1/(1 + e^{-0.5}) = 0.6224$
$E = \frac{1}{2}(0 - 0.6224)^2 = 0.1937$

$\Delta W_{ij} = \eta\,\delta_i\,y_j$, where $\delta_i = (t_i - y_i)\,y_i(1 - y_i)$

$\delta_0 = (t_0 - y_0)\,y_0(1 - y_0) = (0 - 0.6224)(0.6224)(1 - 0.6224) = -0.1463$

$\Delta W_{01} = \eta\,\delta_0\,i_1 = 0$ and $\Delta W_{02} = \eta\,\delta_0\,i_2 = 0$ (both inputs are 0), while $\Delta W_{0b} = \eta\,\delta_0\,b$.

Suppose the learning rate $\eta = 0.5$:

$\Delta W_{0b} = 0.5 \cdot (-0.1463) \cdot 1 = -0.0731$
$W_{0b} = 0.5 - 0.0731 = 0.4268$
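A few lines of Python to check the slide's arithmetic (values match up to the slide's rounding):

import math

w01, w02, w0b = 0.8, 0.6, 0.5
i1, i2, b, t0 = 0.0, 0.0, 1.0, 0.0

x0 = w01 * i1 + w02 * i2 + w0b * b      # 0.5
y0 = 1.0 / (1.0 + math.exp(-x0))        # sigmoid output
E = 0.5 * (t0 - y0) ** 2                # squared error
delta0 = (t0 - y0) * y0 * (1 - y0)      # output delta

eta = 0.5
w0b_new = w0b + eta * delta0 * b        # updated bias weight
print(round(y0, 4), round(E, 4), round(delta0, 4), round(w0b_new, 4))
# -> 0.6225 0.1937 -0.1463 0.4269 (the slide truncates to 0.6224 and 0.4268)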