
Page 1:

CS 182
Sections 101 - 102
Eva Mok (emok@icsi.berkeley.edu)

Feb 11, 2004

(http://www2.hi.net/s4/strangebreed.htm)

bad puns alert!

Page 2:

Announcements

• a3 part 1 is due tonight (submit as a3-1)

• The second tester file is up, so please start part 2.

• The quiz is graded (get it after class).

Page 3:

Where we stand

• Last Week

– Backprop

• This Week

– Recruitment learning

– color

• Coming up

– Imaging techniques (e.g. fMRI)

Page 4:

The Big (and complicated) Picture

[Diagram: the course map, spanning levels from Cognition and Language through Computation, Structured Connectionism, and Computational Neurobiology down to Biology (increasing abstraction upward). Topics placed on the map include Neural Development, Triangle Nodes, Neural Nets & Learning, Spatial Relations, Motor Control, Metaphor, SHRUTI, Grammar, the Regier, Bailey, Narayanan, and Chang models, the Visual System, and Psycholinguistics Experiments. Exam milestones: Quiz, Midterm, Finals.]

Page 5:

Quiz

1. What is a localist representation? What is a distributed representation? Why are they both bad?

2. What is coarse-fine encoding? Where is it used in our brain?

3. What can Back-Propagation do that Hebb’s Rule can’t?

4. Derive the Back-Propagation Algorithm

5. What (intuitively) does the learning rate do? How about the momentum term?

Page 6:

Distributed vs Localist Rep’n

Distributed representation:

John    1 1 0 0
Paul    0 1 1 0
George  0 0 1 1
Ringo   1 0 0 1

Localist representation:

John    1 0 0 0
Paul    0 1 0 0
George  0 0 1 0
Ringo   0 0 0 1

What are the drawbacks of each representation?
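A minimal sketch of the contrast (hypothetical Python, reusing the codes above): with n units, a localist code names at most n people, while a distributed code has 2^n possible patterns, but combining distributed patterns can become ambiguous.

import itertools

n = 4
people = ["John", "Paul", "George", "Ringo"]

# Localist: one dedicated unit per person (one-hot), so n units name only n people.
localist = {p: tuple(int(i == k) for i in range(n)) for k, p in enumerate(people)}

# Distributed: any pattern over the n units could name someone, up to 2^n patterns.
patterns = list(itertools.product([0, 1], repeat=n))
print(len(localist), "localist codes vs", len(patterns), "possible distributed patterns")

# Drawback of the distributed code above: groups blur together.
john, george = (1, 1, 0, 0), (0, 0, 1, 1)
paul, ringo = (0, 1, 1, 0), (1, 0, 0, 1)
group1 = tuple(a | b for a, b in zip(john, george))
group2 = tuple(a | b for a, b in zip(paul, ringo))
print(group1 == group2)  # True: {John, George} looks exactly like {Paul, Ringo}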

Page 7:

Distributed vs Localist Rep’n

• Distributed code: What happens if you want to represent a group?

• Distributed code: How many persons can you represent with n bits? 2^n

• Localist code: What happens if one neuron dies?

• Localist code: How many persons can you represent with n bits? n


Page 8:

Visual System

• 1000 x 1000 visual map

• For each location, encode:

– orientation
– direction of motion
– speed
– size
– color
– depth

• Blows up combinatorially!
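A rough count of the blow-up (hypothetical Python; the per-feature resolutions are assumptions for illustration, not from the slides): dedicating a unit to every conjunction of feature values multiplies the counts, whereas coding each feature separately only adds them.

locations = 1000 * 1000
values_per_feature = {"orientation": 18, "direction": 18, "speed": 8,
                      "size": 8, "color": 10, "depth": 10}

conjunctive = 1
for v in values_per_feature.values():
    conjunctive *= v                         # one unit per combination of values
separate = sum(values_per_feature.values())  # one small pool per feature

print(conjunctive * locations)   # ~2e12 units for conjunctive coding
print(separate * locations)      # ~7e7 units if features are coded independently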

Page 9:

Coarse Coding

The information you can encode with one fine-resolution unit can also be encoded with a few coarse-resolution (overlapping) units.

As long as we need fewer coarse units in total, we come out ahead.
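A minimal sketch of coarse coding in one dimension (hypothetical Python; the tuning widths and unit counts are made up): a few broadly tuned, overlapping units localize a value through the pattern of which of them respond, so far fewer units are needed than with one narrowly tuned unit per value.

def fine_code(value, n_units=100):
    # One narrowly tuned unit per integer value in [0, 100).
    return [int(int(value) == i) for i in range(n_units)]

def coarse_code(value, n_units=10, width=15):
    # Broadly tuned units with overlapping receptive fields (centers 5, 15, ..., 95).
    centers = [10 * i + 5 for i in range(n_units)]
    return [int(abs(value - c) <= width) for c in centers]

print(fine_code(42).index(1))   # needs 100 units to say "42"
print(coarse_code(42))          # [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]: the overlap of the
                                # three active fields pins the value down to roughly 40-50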

Page 10:

Coarse-Fine Coding

but we can run into ghost “images”

[Diagram: two stimuli X and Y plotted against Feature 1 (e.g. orientation) and Feature 2 (e.g. direction of motion). One population of units is coarse in F2 and fine in F1; the other is coarse in F1 and fine in F2. The active units are also consistent with two ghost points G that pair X's value on one feature with Y's value on the other.]
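A tiny sketch of where the ghosts come from (hypothetical Python; the feature values are made up): if two simultaneous stimuli are encoded only by which orientation units and which direction units fire, the binding between features is lost and extra conjunctions become consistent with the activity.

from itertools import product

X = ("vertical", "leftward")     # real stimulus 1: (orientation, direction)
Y = ("horizontal", "upward")     # real stimulus 2

active_orientations = {X[0], Y[0]}   # what the orientation units report
active_directions = {X[1], Y[1]}     # what the direction units report

candidates = set(product(active_orientations, active_directions))
ghosts = candidates - {X, Y}
print(ghosts)  # {('vertical', 'upward'), ('horizontal', 'leftward')}: the ghost "images"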

Page 11:

Back-Propagation Algorithm

We define the error term for a single node to be t_i − y_i.

Unit i receives inputs y_j through weights w_ij and squashes their weighted sum with f:

x_i = ∑_j w_ij y_j

y_i = f(x_i)

t_i : target

Sigmoid: f(x_i) = 1 / (1 + e^(−x_i))
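A minimal sketch of this forward pass for a single unit (hypothetical Python; the weights and inputs are taken from the worked example on page 15):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(weights, inputs):
    x = sum(w * y for w, y in zip(weights, inputs))  # x_i = sum_j w_ij * y_j
    return sigmoid(x)                                # y_i = f(x_i)

print(forward([0.8, 0.6, 0.5], [0.0, 0.0, 1.0]))  # ≈ 0.6224, as on page 15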

Page 12:

Gradient Descent

[Plot: error surface with axes i1 and i2; the global minimum is your goal. It should really be 4-D (3 weights), but you get the idea.]
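A bare-bones illustration (hypothetical Python; the error surface is made up): gradient descent repeatedly steps each weight a little way downhill along the negative gradient.

def gradient_descent(grad, w, alpha=0.1, steps=200):
    for _ in range(steps):
        w = [wi - alpha * g for wi, g in zip(w, grad(w))]  # w <- w - alpha * dE/dw
    return w

# Toy error E(w) = (w0 - 1)^2 + (w1 + 2)^2; its gradient is written out by hand.
grad = lambda w: [2 * (w[0] - 1), 2 * (w[1] + 2)]
print(gradient_descent(grad, [0.0, 0.0]))  # ≈ [1.0, -2.0], the global minimum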

Page 13:

The output layer

Network: layer k feeds layer j, which feeds output layer i, through weights w_jk and w_ij; y_i is the output of unit i and t_i its target.

E = Error = ½ ∑_i (t_i − y_i)^2

Weights move down the error gradient, scaled by the learning rate α:

W_ij ← W_ij + ΔW_ij,   where ΔW_ij = −α ∂E/∂W_ij

By the chain rule,

∂E/∂W_ij = (∂E/∂y_i)(∂y_i/∂x_i)(∂x_i/∂W_ij) = −(t_i − y_i) f′(x_i) y_j

The derivative of the sigmoid is just f′(x_i) = y_i (1 − y_i), so for the output layer

ΔW_ij = α (t_i − y_i) y_i (1 − y_i) y_j = α δ_i y_j,   with δ_i = (t_i − y_i) y_i (1 − y_i)
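A direct transcription of the output-layer rule into code (hypothetical Python; the variable names and shapes are assumptions): deltas first, then the weight change.

def output_deltas(targets, outputs):
    # delta_i = (t_i - y_i) * y_i * (1 - y_i), using the sigmoid derivative
    return [(t - y) * y * (1 - y) for t, y in zip(targets, outputs)]

def update_output_weights(W, deltas, hidden_ys, alpha):
    # DeltaW_ij = alpha * delta_i * y_j, where W[i][j] connects hidden unit j to output unit i
    for i, d in enumerate(deltas):
        for j, yj in enumerate(hidden_ys):
            W[i][j] += alpha * d * yj
    return W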

Page 14:

The hidden layer

Same network: layer k feeds hidden layer j, which feeds output layer i, through weights w_jk and w_ij.

E = Error = ½ ∑_i (t_i − y_i)^2

ΔW_jk = −α ∂E/∂W_jk

By the chain rule,

∂E/∂W_jk = (∂E/∂y_j)(∂y_j/∂x_j)(∂x_j/∂W_jk)

A hidden output y_j affects the error through every output unit i, so

∂E/∂y_j = −∑_i (t_i − y_i) f′(x_i) W_ij

Putting the pieces together:

ΔW_jk = α [∑_i (t_i − y_i) f′(x_i) W_ij] f′(x_j) y_k
      = α [∑_i (t_i − y_i) y_i (1 − y_i) W_ij] y_j (1 − y_j) y_k

i.e. ΔW_jk = α δ_j y_k,   with δ_j = [∑_i δ_i W_ij] y_j (1 − y_j)
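Continuing the sketch from the output layer (same assumed names), the hidden-layer delta just sums each output delta weighted by W_ij before applying the sigmoid derivative at unit j:

def hidden_deltas(out_deltas, W_out, hidden_ys):
    # delta_j = (sum_i delta_i * W_ij) * y_j * (1 - y_j)
    deltas = []
    for j, yj in enumerate(hidden_ys):
        back = sum(d * W_out[i][j] for i, d in enumerate(out_deltas))
        deltas.append(back * yj * (1 - yj))
    return deltas

def update_hidden_weights(W_hid, deltas, input_ys, alpha):
    # DeltaW_jk = alpha * delta_j * y_k, where W_hid[j][k] connects input k to hidden unit j
    for j, d in enumerate(deltas):
        for k, yk in enumerate(input_ys):
            W_hid[j][k] += alpha * d * yk
    return W_hid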

Page 15:

Let’s just do an example

E = Error = ½ ∑_i (t_i − y_i)^2

Network: a single sigmoid output unit y_0 takes inputs i1 and i2 plus a bias input b = 1, through weights w_01, w_02 and w_0b. For one output unit the error is just E = ½ (t_0 − y_0)^2.

Training data:

i1  i2  target y_0
0   0   0
0   1   1
1   0   1
1   1   1

Initial weights: w_01 = 0.8, w_02 = 0.6, w_0b = 0.5.

Forward pass for input (i1, i2) = (0, 0), target t_0 = 0:

x_0 = 0 · 0.8 + 0 · 0.6 + 1 · 0.5 = 0.5
y_0 = 1 / (1 + e^(−0.5)) = 0.6224

E = ½ (0 − 0.6224)^2 = 0.1937

Output-layer rule: ΔW_0j = α δ_0 y_j with δ_0 = (t_0 − y_0) y_0 (1 − y_0).

δ_0 = (0 − 0.6224) · 0.6224 · (1 − 0.6224) = −0.1463

ΔW_01 = α δ_0 · i1 = 0   (i1 = 0)
ΔW_02 = α δ_0 · i2 = 0   (i2 = 0)
ΔW_0b = α δ_0 · b = α δ_0   (α: learning rate)

Suppose α = 0.5:

ΔW_0b = 0.5 · (−0.1463) ≈ −0.0731

New bias weight: w_0b ← 0.5 − 0.0731 ≈ 0.4268
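A quick numerical check of this example (hypothetical Python, plugging in the slide's numbers):

import math

w01, w02, w0b = 0.8, 0.6, 0.5
i1, i2, b, t0 = 0.0, 0.0, 1.0, 0.0
alpha = 0.5

x0 = i1 * w01 + i2 * w02 + b * w0b       # 0.5
y0 = 1.0 / (1.0 + math.exp(-x0))         # ≈ 0.6224
E = 0.5 * (t0 - y0) ** 2                 # ≈ 0.1937
delta0 = (t0 - y0) * y0 * (1 - y0)       # ≈ -0.1463
w0b += alpha * delta0 * b                # ≈ 0.4269 (the slide rounds to 0.4268)
print(y0, E, delta0, w0b)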