
Lecture 6 - Linear Classification

Fall 2010


Classification

- Today, we are going to talk about classifying, or separating, image patches.

- Classification is the Swiss Army knife of computer vision.

- Many important vision problems can be reduced to “Is this a ___ or not?”


Basic Setup

- We want to separate the X’s and the O’s.

- Today, we will see how to solve this (seemingly) simple task mathematically.

[Figure: scatter plot of X’s and O’s in the two-dimensional feature plane]


Basic Vocabulary

- We’ll call each thing that we are classifying a point.

- Each point is described by a set of numbers that we call features.

- It is your job to decide on the features. You need to pick features that help you do your job.

- For instance, if you are looking for edges, you’ll probably want to look at the response of different derivative filters.

- In the graph below, every point is described by two features.

- The point at index i in the dataset will be described as x_i.

[Figure: the same scatter plot; every point is described by its two feature values]


Separating Points Linearly

- In building mathematical models for classifying, we are going to focus on dividing these points with a straight line.

- This is called linear classification.

[Figure: the scatter plot with a straight line dividing the X’s from the O’s]


Expressing Linear Separation Mathematically

- Given a single point x = (x, y), we can express the classification of the point as

  sign(ax + by + c)

  where a, b, and c are constants that define a line. We’ll have to choose these somehow.

- This function will return +1 if ax + by + c is positive and −1 otherwise.
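To make the decision rule concrete, here is a minimal sketch in Python (the lecture itself shows no code; the function name and the line constants below are made up for illustration):

    def classify(x, y, a, b, c):
        # Which side of the line ax + by + c = 0 does the point fall on?
        return 1 if a * x + b * y + c > 0 else -1

    # Hypothetical line constants a = 1, b = -1, c = 1 (the line x - y + 1 = 0)
    print(classify(3.0, 1.0, 1.0, -1.0, 1.0))   # prints 1
    print(classify(-3.0, 1.0, 1.0, -1.0, 1.0))  # prints -1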


How this Translates Graphically

- Effectively, we are projecting every point onto a line.

- Every point projects to some point on the line. The sign of the location along the line determines the classification of the point.

[Figure: points projected onto a line, labeled with their signed locations: −0.9, 3.5, −2.6, 1.5, −2.3, 2.1]


How can we find the separating line?

- This separating line can be found by looking at the line

  ax + by + c = 0

[Figure: the scatter plot with the separating line drawn through it]


How do we get the parameters of this line?

- We’ll find the parameters using a machine learning approach.

- We’ll form a training set of examples, along with the correct classification.

- We’ll find the line that best separates the training examples.

- Now, we have to form a set of mathematical steps for finding the line that “best separates” the training set.


Optimizing the parameters of the line

- Notice that the points that are the farthest from the red separating line have the largest response.

- In some sense, we can be more confident in a point’s classification as the point gets farther from the separating line.

- So, the bigger the magnitude of a response, the more confident in the classification we can be.

[Figure: the projection plot again; points farther from the separating line have larger-magnitude responses: −0.9, 3.5, −2.6, 1.5, −2.3, 2.1]


Looking at the confidence in classification

- We can translate this response into a probability.

- We will use the logistic function to transform the response into a probability of the correct classification.

- We will assume that each point will be labeled with a label l; l can take values +1 or −1.

  P[l = +1 | x] = 1 / (1 + exp(−(ax + by + c)))
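A minimal sketch of this probability in Python (continuing the illustration above; `prob_positive` is a hypothetical name, not from the slides):

    import math

    def prob_positive(x, y, a, b, c):
        # P[l = +1 | x]: the logistic function applied to the linear response
        return 1.0 / (1.0 + math.exp(-(a * x + b * y + c)))

A large positive response maps to a probability near 1, a large negative response to a probability near 0, and a point exactly on the line to 0.5.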


Finding the line parameters

- Now, for any set of line constants, we can find out the probability assigned to the correct label of each item in the training set:

  ∏_{i=1}^{N} P[l_i | x_i] = ∏_{i=1}^{N} 1 / (1 + exp(−l_i(ax_i + by_i + c)))

- We’ve inserted l_i into the exponent because, if l_i = −1,

  P[l = −1 | x] = 1 / (1 + exp(ax + by + c))

- We multiply the probabilities because we believe the points are drawn independently.

- Note that for each item, this number tells us how much the model defined by that particular line believes that the ground-truth answer is the right answer.
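As a sketch, this product can be computed directly, assuming the training set is stored as a list of (x, y) pairs with ±1 labels (a made-up data layout for illustration):

    import math

    def likelihood(a, b, c, points, labels):
        # Product over the training set of P[l_i | x_i]
        total = 1.0
        for (x, y), l in zip(points, labels):
            total *= 1.0 / (1.0 + math.exp(-l * (a * x + b * y + c)))
        return total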


Finding the line

∏_{i=1}^{N} P[l_i | x_i] = ∏_{i=1}^{N} 1 / (1 + exp(−l_i(ax_i + by_i + c)))

- This is also called the likelihood of the data.

- Ideally, this likelihood should be as close to 1 as possible.

- We can find the parameters of the line by looking for the line parameters that maximize this likelihood.

- In other words, find the set of line parameters that make the right answers have as high a probability as possible.


Finding the line

∏_{i=1}^{N} P[l_i | x_i] = ∏_{i=1}^{N} 1 / (1 + exp(−l_i(ax_i + by_i + c)))

- Before going on, we are going to do a quick math trick.

- Because the log (natural logarithm, or ln) function is monotonically increasing (i.e. it is always increasing), the parameters that maximize the above equation will also maximize

  log(∏_{i=1}^{N} P[l_i | x_i]) = ∑_{i=1}^{N} −log(1 + exp(−l_i(ax_i + by_i + c)))


Finding the line

log(∏_{i=1}^{N} P[l_i | x_i]) = ∑_{i=1}^{N} −log(1 + exp(−l_i(ax_i + by_i + c)))

- Similarly, we can multiply this equation by −1 to change from a maximization problem to a minimization problem,

- leading to a function that we will call a “loss function” or “cost function”, denoted L:

  L = ∑_{i=1}^{N} log(1 + exp(−l_i(ax_i + by_i + c)))

- Our goal is to find the line parameters that minimize L.
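A sketch of L under the same assumed data layout as before (math.log1p(t) computes log(1 + t); using it is an implementation choice, not something from the lecture):

    import math

    def log_loss(a, b, c, points, labels):
        # L = sum_i log(1 + exp(-l_i * (a*x_i + b*y_i + c)))
        return sum(math.log1p(math.exp(-l * (a * x + b * y + c)))
                   for (x, y), l in zip(points, labels))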


Minimizing L

- We can minimize L by using its gradient:

  ∇L = [ ∂L/∂a, ∂L/∂b, ∂L/∂c ]^T

- The gradient can be viewed as a vector that points in the direction of steepest ascent.

- So, the negative gradient points in the direction of steepest descent.
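Differentiating one term of L, with z_i = l_i(ax_i + by_i + c), gives ∂/∂a log(1 + exp(−z_i)) = −l_i x_i / (1 + exp(z_i)), and likewise for b (with y_i) and c (with 1). A sketch of the full gradient under the same assumed data layout:

    import math

    def grad_loss(a, b, c, points, labels):
        # Returns (dL/da, dL/db, dL/dc)
        ga = gb = gc = 0.0
        for (x, y), l in zip(points, labels):
            # This weight is near 1 for badly misclassified points
            # and near 0 for confidently correct ones
            w = 1.0 / (1.0 + math.exp(l * (a * x + b * y + c)))
            ga -= l * x * w
            gb -= l * y * w
            gc -= l * w
        return ga, gb, gc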


An algorithm for minimizing L

- This leads to a simple algorithm for optimizing the line parameters.

- We’ll create a vector

  p = [a, b, c]^T

- The steps are:

  1. Initialize p to some value p0.
  2. Repeat these steps:
     2.1 Calculate ∇L using the current value of p (the value of the gradient depends on p!)
     2.2 p ← p + η(−∇L)

- We go in the direction of the negative gradient because that is the direction of steepest descent.
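Putting the pieces together, a sketch of the descent loop, reusing the hypothetical grad_loss, points, and labels from the earlier sketches; the step size η = 0.1 and the fixed iteration count are assumptions, not values from the lecture:

    # Step 1: initialize p = (a, b, c) to some value p0
    a, b, c = 0.0, 0.0, 1.0
    eta = 0.1  # step size; assumed value

    # Step 2: repeatedly step in the direction of the negative gradient
    for _ in range(1000):
        ga, gb, gc = grad_loss(a, b, c, points, labels)
        a, b, c = a - eta * ga, b - eta * gb, c - eta * gc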

An Example

[Figures: a sequence of slides stepping through gradient descent iterations on an example, the estimate moving downhill one update at a time]

Learning Line parameters

- How does this relate to learning line parameters?

- We differentiate L with respect to the line parameters and optimize.

- The following few slides show how the line changes as the parameters change.

An Example

[Figures: a sequence of slides showing the separating line changing as the parameters are updated]

How does this get used in computer vision

- Step 1 - Define a labeling task

- Step 2 - Get a bunch of training examples, with ground-truth labels

- Step 3 - Train the classifier

- Step 4 - Get a bunch of test examples, also with ground-truth labels

- Step 5 - Test the classifier on the test examples


Let’s talk more about step 3

- Each training example is numerically represented by a set of features.

- These features are stored in a vector. The vector for each training example will be denoted x_i, where the i represents the i-th training example.

- Each training example also has a label, l_i.

- If we denote the weight vector as w, then we can rewrite the log-loss criterion as

  L = ∑_{i=1}^{N} log(1 + exp(−l_i(w^T x_i)))    (1)

- (In the earlier example, w = [a, b, c]^T.)
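In this vector form the loss vectorizes naturally; a sketch using NumPy (an assumed dependency), where X is an N×d matrix with one feature vector per row and labels is a vector of ±1 values:

    import numpy as np

    def log_loss_vec(w, X, labels):
        # L = sum_i log(1 + exp(-l_i * w^T x_i)), computed for all i at once
        margins = labels * (X @ w)
        return np.sum(np.log1p(np.exp(-margins)))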


Is it really the same?

- Compare

  L = ∑_{i=1}^{N} log(1 + exp(−l_i(ax_i + by_i + c)))

- with

  L = ∑_{i=1}^{N} log(1 + exp(−l_i(w^T x_i)))    (2)

- What’s missing?

- In the upper equation, c is a constant that is not multiplied by any feature.

- This is called the bias term.


Function of the bias term

- Going back to this example:

[Figure: the projection plot again, with signed locations −0.9, 3.5, −2.6, 1.5, −2.3, 2.1]


How do we implement it?

- It’s actually quite easy.

- Make one feature be equal to all ones; the weight learned for that feature then plays the role of the bias c (see the sketch below).
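A sketch of the trick with NumPy (assumed): append a column of ones to the feature matrix, so the last entry of w acts as c.

    import numpy as np

    X = np.array([[3.0, 1.0],      # made-up 2-D feature vectors
                  [-2.0, 4.0]])
    # Append a constant-1 feature to every example
    X_bias = np.hstack([X, np.ones((X.shape[0], 1))])
    # X_bias is now [[ 3., 1., 1.], [-2., 4., 1.]];
    # with w = [a, b, c], w^T x_bias = a*x + b*y + c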


How does this get used in computer vision

- Step 1 - Define a labeling task

- Step 2 - Get a bunch of training examples, with ground-truth labels

- Step 3 - Train the classifier

- Step 4 - Get a bunch of test examples, also with ground-truth labels

- Step 5 - Test the classifier on the test examples


Steps 4 and 5 - Testing

- We want to know how well the system will perform on data that it has never seen.

- Overfitting can be a problem, especially if there isn’t much training data.

- In classification, we usually measure the percentage of points correctly labeled (see the sketch below).
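A sketch of that measurement under the same assumed NumPy setup, with hypothetical arrays X_test (test features, ones column included) and l_test (±1 labels):

    import numpy as np

    def accuracy(w, X_test, l_test):
        # Fraction of test points whose predicted sign matches the true label
        predictions = np.sign(X_test @ w)
        return float(np.mean(predictions == l_test))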


From here

- The next problem set will involve implementing a learning-based edge detector.