Deep Learning — Digging into Data (UMIACS, jbg/teaching/DATA_DIGGING/lecture_10.pdf)

Page 1:

Deep Learning

Digging into Data

April 28, 2014

Based on material from Andrew Ng


Page 2:

Outline

1 Why Deep Learning

2 Review of Logistic Regression

3 Can’t Somebody else do it? (Feature Engineering)

4 Deep Learning from Data

5 Examples

6 Tricks and Toolkits

7 Toolkits for Deep Learning


Page 3:

Deep Learning was once known as “Neural Networks”


Page 4:

But it came back . . .

Political Ideology Detection Using Recursive Neural Networks

Mohit Iyyer1, Peter Enns2, Jordan Boyd-Graber3,4, Philip Resnik2,4

1 Computer Science, 2 Linguistics, 3 iSchool, and 4 UMIACS, University of Maryland

{miyyer,peter,jbg}@umiacs.umd.edu, [email protected]

Abstract

An individual's words often reveal their political ideology. Existing automated techniques to identify ideology from text focus on bags of words or wordlists, ignoring syntax. Taking inspiration from recent work in sentiment analysis that successfully models the compositional aspect of language, we apply a recursive neural network (RNN) framework to the task of identifying the political position evinced by a sentence. To show the importance of modeling subsentential elements, we crowdsource political annotations at a phrase and sentence level. Our model outperforms existing models on our newly annotated dataset and an existing dataset.

1 Introduction

Many of the issues discussed by politicians and the media are so nuanced that even word choice entails choosing an ideological position. For example, what liberals call the "estate tax" conservatives call the "death tax"; there are no ideologically neutral alternatives (Lakoff, 2002). While objectivity remains an important principle of journalistic professionalism, scholars and watchdog groups claim that the media are biased (Groseclose and Milyo, 2005; Gentzkow and Shapiro, 2010; Niven, 2003), backing up their assertions by publishing examples of obviously biased articles on their websites. Whether or not it reflects an underlying lack of objectivity, quantitative changes in the popular framing of an issue over time, favoring one ideologically-based position over another, can have a substantial effect on the evolution of policy (Dardis et al., 2008).

Manually identifying ideological bias in political text, especially in the age of big data, is an impractical and expensive process. Moreover, bias may be localized to a small portion of a document, undetectable by coarse-grained methods. In this paper, we examine the problem of detecting ideological bias on the sentence level. We say a sentence contains ideological bias if its author's political position (here liberal or conservative, in the sense of U.S. politics) is evident from the text.

[Figure 1, a parse tree over the sentence "They dubbed it the 'death tax' and created a big lie about its adverse effects on small businesses": An example of compositionality in ideological bias detection (red = conservative, blue = liberal, gray = neutral) in which modifier phrases and punctuation cause polarity switches at higher levels of the parse tree.]

Ideological bias is difficult to detect, even for humans: the task relies not only on political knowledge but also on the annotator's ability to pick up on subtle elements of language use. For example, the sentence in Figure 1 includes phrases typically associated with conservatives, such as "small businesses" and "death tax". When we take more of the structure into account, however, we find that scare quotes and a negative propositional attitude (a lie about X) yield an evident liberal bias.

Existing approaches toward bias detection have not gone far beyond "bag of words" classifiers, thus ignoring richer linguistic context of this kind and often operating at the level of whole documents. In contrast, recent work in sentiment analysis has used deep learning to discover compositional effects (Socher et al., 2011b; Socher et al., 2013b).

Building from those insights, we introduce a recursive neural network (RNN) to detect ideological bias on the sentence level. This model requires

More data

Better tricks(regularization)

Faster computers


Pages 5–7:

And companies are investing . . .

Page 8:

Outline (section divider; same outline as Page 2)

Pages 9–13:

Map inputs to output

Input: a vector $x_1 \ldots x_d$, with inputs encoded as real numbers

Output: $f\left(\sum_i W_i x_i + b\right)$, i.e. multiply the inputs by weights and add a bias

Activation: $f(z) \equiv \frac{1}{1 + \exp(-z)}$, which passes the result through a nonlinear sigmoid
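To make this concrete, here is a minimal sketch of the forward computation in Python with NumPy (the input, weight, and bias values below are illustrative, not from the lecture):

import numpy as np

def sigmoid(z):
    # Activation f(z) = 1 / (1 + exp(-z)): squashes any real z into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def output(x, W, b):
    # Multiply inputs by weights, add the bias, pass through the sigmoid.
    return sigmoid(np.dot(W, x) + b)

x = np.array([0.5, -1.2, 3.0])  # input vector x_1 ... x_d (illustrative)
W = np.array([0.4, 0.1, -0.2])  # one weight per input (illustrative)
b = 0.1                         # bias (illustrative)
print(output(x, W, b))          # prints a value in (0, 1)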

Page 14:

What’s a sigmoid?


Page 15:

In the shallow end

This is still logistic regression

Engineering features x is difficult (and requires expertise)

Can we learn how to represent the inputs that feed into the final decision?


Page 16:

Outline (section divider; same outline as Page 2)

Pages 17–20:

Learn the features and the function

$a^{(2)}_1 = f\left(W^{(1)}_{11} x_1 + W^{(1)}_{12} x_2 + W^{(1)}_{13} x_3 + b^{(1)}_1\right)$

$a^{(2)}_2 = f\left(W^{(1)}_{21} x_1 + W^{(1)}_{22} x_2 + W^{(1)}_{23} x_3 + b^{(1)}_2\right)$

$a^{(2)}_3 = f\left(W^{(1)}_{31} x_1 + W^{(1)}_{32} x_2 + W^{(1)}_{33} x_3 + b^{(1)}_3\right)$

$h_{W,b}(x) = a^{(3)}_1 = f\left(W^{(2)}_{11} a^{(2)}_1 + W^{(2)}_{12} a^{(2)}_2 + W^{(2)}_{13} a^{(2)}_3 + b^{(2)}_1\right)$
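As a sketch, the same two-layer forward pass in Python with NumPy (three inputs, three hidden units, one output; the small random initialization is an assumption for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x  = np.array([1.0, 0.5, -0.5])           # inputs x_1, x_2, x_3 (illustrative)
W1 = rng.normal(scale=0.01, size=(3, 3))  # W^(1): hidden-layer weights
b1 = np.zeros(3)                          # b^(1)
W2 = rng.normal(scale=0.01, size=(1, 3))  # W^(2): output-layer weights
b2 = np.zeros(1)                          # b^(2)

a2 = sigmoid(W1 @ x + b1)    # hidden activations a^(2)
h  = sigmoid(W2 @ a2 + b2)   # h_{W,b}(x) = a^(3)
print(h)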

Pages 21–27:

Objective Function

For every example $x, y$ of our supervised training set, we want the label $y$ to match the prediction $h_{W,b}(x)$.

$J(W,b;x,y) \equiv \frac{1}{2} \left\lVert h_{W,b}(x) - y \right\rVert^2 \qquad (1)$

We want this value, summed over all of the examples, to be as small as possible.

We also want the weights not to be too large:

$\frac{\lambda}{2} \sum_{l=1}^{n_l - 1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2 \qquad (2)$

The outer sum runs over all layers, the middle sum over all source nodes $i$, and the inner sum over all destination nodes $j$.

Pages 28–31:

Objective Function

Putting it all together:

$J(W,b) = \left[ \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left\lVert h_{W,b}(x^{(i)}) - y^{(i)} \right\rVert^2 \right] + \frac{\lambda}{2} \sum_{l=1}^{n_l - 1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2 \qquad (3)$

Our goal is to minimize $J(W,b)$ as a function of $W$ and $b$

Initialize $W$ and $b$ to small random values near zero

Adjust parameters to optimize $J$
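A sketch of equation (3) in Python with NumPy, for the two-layer network above (the shapes, the data, and the value of lambda are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    a2 = sigmoid(W1 @ x + b1)
    return sigmoid(W2 @ a2 + b2)

def J(X, Y, W1, b1, W2, b2, lam=0.01):
    # Average squared error over the m examples ...
    m = len(X)
    data = sum(0.5 * np.sum((forward(x, W1, b1, W2, b2) - y) ** 2)
               for x, y in zip(X, Y)) / m
    # ... plus the penalty (lambda / 2) * sum of all squared weights.
    reg = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    return data + reg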

Page 32:

Outline (section divider; same outline as Page 2)

Pages 33–42:

Gradient Descent

Goal: Optimize $J$ with respect to the variables $W$ and $b$

[Figure: the objective plotted against a parameter; successive gradient steps labeled 1, 2, 3 move downhill toward the minimum, with the unexplored region beyond labeled "Undiscovered Country".]

Update rule:

$W^{(l)}_{ij} \leftarrow W^{(l)}_{ij} - \alpha \frac{\partial J}{\partial W^{(l)}_{ij}}$
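As a minimal illustration (not from the slides), gradient descent on the one-parameter objective J(w) = (w - 3)^2:

def grad(w):
    return 2.0 * (w - 3.0)   # dJ/dw

w, alpha = 10.0, 0.1         # starting point and learning rate (illustrative)
for _ in range(100):
    w = w - alpha * grad(w)  # w <- w - alpha * dJ/dw
print(w)                     # ends up near the minimizer w = 3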

Pages 43–48:

Backpropagation

For convenience, write the input to the sigmoid as

$z^{(l)}_i = \sum_{j=1}^{n} W^{(l-1)}_{ij} x_j + b^{(l-1)}_i \qquad (4)$

The gradient is a function of a node's error $\delta^{(l)}_i$

For output nodes, the error is obvious:

$\delta^{(n_l)}_i = \frac{\partial}{\partial z^{(n_l)}_i} \frac{1}{2} \left\lVert y - h_{W,b}(x) \right\rVert^2 = -\left( y_i - a^{(n_l)}_i \right) \cdot f'\left( z^{(n_l)}_i \right) \qquad (5)$

Other nodes must "backpropagate" downstream error based on connection strength (the chain rule):

$\delta^{(l)}_i = \left( \sum_{j=1}^{s_{l+1}} W^{(l+1)}_{ji} \delta^{(l+1)}_j \right) f'\left( z^{(l)}_i \right) \qquad (6)$
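A sketch of equations (5) and (6) for the small network above, in Python with NumPy (vectorized over each layer, and using the fact that the sigmoid satisfies f'(z) = f(z)(1 - f(z))):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)                  # f'(z) = f(z) (1 - f(z))

def deltas(x, y, W1, b1, W2, b2):
    # Forward pass, keeping the pre-activations z for the backward pass.
    z2 = W1 @ x + b1; a2 = sigmoid(z2)
    z3 = W2 @ a2 + b2; a3 = sigmoid(z3)
    # Equation (5): error at the output layer.
    d3 = -(y - a3) * sigmoid_prime(z3)
    # Equation (6): backpropagate through the weights, scaled by f'(z).
    d2 = (W2.T @ d3) * sigmoid_prime(z2)
    return d2, d3, a2, a3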

Page 49:

Partial Derivatives

For weights, the partial derivatives are

$\frac{\partial}{\partial W^{(l)}_{ij}} J(W,b;x,y) = a^{(l)}_j \, \delta^{(l+1)}_i \qquad (7)$

For the bias terms, the partial derivatives are

$\frac{\partial}{\partial b^{(l)}_i} J(W,b;x,y) = \delta^{(l+1)}_i \qquad (8)$

But this is just for a single example . . .

Page 50:

Full Gradient Descent Algorithm

1. Initialize $U^{(l)}$ and $V^{(l)}$ as zero
2. For each example $i = 1 \ldots m$:
   1. Use backpropagation to compute $\nabla_W J$ and $\nabla_b J$
   2. Update the weight shifts: $U^{(l)} = U^{(l)} + \nabla_{W^{(l)}} J(W,b;x,y)$
   3. Update the bias shifts: $V^{(l)} = V^{(l)} + \nabla_{b^{(l)}} J(W,b;x,y)$
3. Update the parameters:

   $W^{(l)} = W^{(l)} - \alpha \left( \frac{1}{m} U^{(l)} \right) \qquad (9)$

   $b^{(l)} = b^{(l)} - \alpha \left( \frac{1}{m} V^{(l)} \right) \qquad (10)$

4. Repeat until the weights stop changing
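Putting pages 43–50 together, a sketch of one full batch update in Python with NumPy (the gradients come from equations (7) and (8), the updates from (9) and (10); the shapes and learning rate are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_update(X, Y, W1, b1, W2, b2, alpha=0.5):
    m = len(X)
    U1, V1 = np.zeros_like(W1), np.zeros_like(b1)  # weight and bias shifts
    U2, V2 = np.zeros_like(W2), np.zeros_like(b2)
    for x, y in zip(X, Y):
        # Forward pass.
        z2 = W1 @ x + b1; a2 = sigmoid(z2)
        z3 = W2 @ a2 + b2; a3 = sigmoid(z3)
        # Backpropagate the errors (equations 5 and 6).
        d3 = -(y - a3) * a3 * (1.0 - a3)
        d2 = (W2.T @ d3) * a2 * (1.0 - a2)
        # Accumulate per-example gradients (equations 7 and 8).
        U2 += np.outer(d3, a2); V2 += d3
        U1 += np.outer(d2, x);  V1 += d2
    # Update the parameters (equations 9 and 10).
    W1 -= alpha * U1 / m; b1 -= alpha * V1 / m
    W2 -= alpha * U2 / m; b2 -= alpha * V2 / m
    return W1, b1, W2, b2

Repeating batch_update until the weights stop changing implements the full algorithm.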

Page 51:

Outline (section divider; same outline as Page 2)

Pages 52–54:

What do you learn?

Page 55:

Political Framing

Death Tax

Estate Tax

Pro-Choice

Pro-Life

Entitlements

Obamacare


Page 56:

Political Framing as Deep Learning

[Figure: a parse tree over the phrase "so-called climate change". The words $w_a$ = climate, $w_b$ = change, and $w_d$ = so-called have vectors $x_a$, $x_b$, $x_d$. The node $p_c$ = "climate change" (vector $x_c$) combines $x_a$ and $x_b$ through the matrices $W_L$ and $W_R$; the node $p_e$ = "so-called climate change" (vector $x_e$) combines $x_d$ and $x_c$ the same way.]

$x_p = f(W_L \cdot x_a + W_R \cdot x_b + b_1), \qquad y_d = \mathrm{softmax}(W_{cat} \cdot x_p + b_2)$
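A sketch of this recursive composition in Python with NumPy (the embedding size, the random parameters, the three word vectors, and tanh as the nonlinearity f are all illustrative assumptions):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

d, n_labels = 4, 3                         # embedding size; three ideology labels
rng = np.random.default_rng(0)
WL = rng.normal(scale=0.1, size=(d, d))    # left-child matrix W_L
WR = rng.normal(scale=0.1, size=(d, d))    # right-child matrix W_R
b1 = np.zeros(d)
Wcat = rng.normal(scale=0.1, size=(n_labels, d))
b2 = np.zeros(n_labels)

x_climate, x_change, x_socalled = rng.normal(size=(3, d))  # word vectors

# p_c = "climate change": combine the two child vectors.
x_c = np.tanh(WL @ x_climate + WR @ x_change + b1)
# p_e = "so-called climate change": combine "so-called" with p_c.
x_e = np.tanh(WL @ x_socalled + WR @ x_c + b1)

print(softmax(Wcat @ x_e + b2))   # label distribution y_d for the phrase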

Page 57:

Annotation


Page 58:

Results

[Figure: four example parse trees (A–D) with node-level ideology predictions, over the sentences: "the law should not be used as an instrument to achieve charitable or social ends"; "An entertainer once said a sucker is born every minute, and surely this is the case with those who support nationalized health care"; "Thus, the harsh conditions for farmers, caused by a number of factors and made worse by the implementing of free-market ideology, have created a continuing stream of people leaving the countryside and going to live in cities that do not have jobs for them."; "But taxpayers do know already that TARP was designed in a way that allowed the same corporations who were saved by huge amounts of taxpayer money to continue to show the same arrogant traits that should have destroyed their companies."]

Page 59:

Results

Accuracy on the Convote and IBC datasets:

Model                         Convote   IBC
Random                        50%       50%
Bag of Words                  64.7%     62.1%
Phrase Annotations            –         61.9%
Syntax                        66.9%     62.6%
word2vec Regression           66.6%     63.7%
RNN                           69.4%     66.2%
RNN w/ word2vec               70.2%     67.1%
RNN w/ word2vec and phrases   –         69.3%

Page 60:

Outline (section divider; same outline as Page 2)

Page 61:

Tricks

Stochastic gradient: compute the gradient from just a few examples at a time

Hardware: do matrix computations on GPUs

Dropout: randomly set some inputs to zero (sketched below)

Initialization: using an autoencoder can help the learned representation

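As an illustration of the dropout item above (a sketch, not from the slides; the "inverted" rescaling variant is an assumption):

import numpy as np

def dropout(a, p_drop, rng, train=True):
    # During training, zero each unit with probability p_drop and rescale
    # the survivors so the expected activation is unchanged at test time.
    if not train:
        return a
    mask = rng.random(a.shape) >= p_drop
    return a * mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
a = np.ones(10)
print(dropout(a, 0.5, rng))   # roughly half the entries zeroed, rest scaled to 2.0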

Page 62:

Outline (section divider; same outline as Page 2)

Page 63:

Toolkits for Deep Learning

Theano: Python package (Yoshua Bengio)

Torch7: Lua package (Yann LeCun)

ConvNetJS: JavaScript package (Andrej Karpathy)

All three automatically compute gradients and have numerical optimization

Working group this summer at UMD
