
Page 1: Linear Models & Clustering


Linear Models & Clustering

Presented by Kwak, Nam-ju

Page 2: Linear Models & Clustering

Coverage
Classification
• Some tools for classification
• Linear regression
• Multiresponse linear regression
• Logistic regression
• Perceptron
Instance-based learning
• Basic understanding
• kD-tree
• Ball tree
Clustering
• Clustering and types of clustering
• Iterative distance-based clustering
• Faster distance calculation

Page 3: Linear Models & Clustering

Classification
• Some tools for classification
• Linear regression
• Multiresponse linear regression
• Logistic regression
• Perceptron

Page 4: Linear Models & Clustering

Some tools for classification
An input is categorized into one of several collections of data based on its features or attributes.
Classification is important in that it lets us distinguish a set of data having common characteristics from others.
[Figure: a classification operation mapping an input to a class]

Page 5: Linear Models & Clustering

Some tools for classification
Decision tree & classification rule: both compute x XOR y.
Decision tree:
[Figure: the root tests x=1?; each yes/no branch then tests y=1?; the four leaves assign classes b, a, a, b]
Classification rule:
If x=1 and y=0 then class=a
If x=0 and y=1 then class=a
If x=0 and y=0 then class=b
If x=1 and y=1 then class=b
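As a small illustration (my own sketch, not from the slides), the same tree and rules can be written directly in Python:

```python
# A minimal sketch: the XOR decision tree above, written as nested tests.
def classify(x: int, y: int) -> str:
    """Return class 'a' when exactly one of x, y is 1 (x XOR y), else 'b'."""
    if x == 1:
        return "b" if y == 1 else "a"   # x=1: y=1 -> b, y=0 -> a
    else:
        return "a" if y == 1 else "b"   # x=0: y=1 -> a, y=0 -> b

# The four classification rules, checked exhaustively:
for x in (0, 1):
    for y in (0, 1):
        print(f"x={x}, y={y} -> class={classify(x, y)}")
```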

Page 6: Linear Models & Clustering

Linear regression
If the attributes and classes are numeric values, we can express the resulting class of a given input as a linear transformation of the input's attributes with a certain set of weights:
x = w0a0 + w1a1 + … + wkak
x: class, wi: weight, ai: attribute (a0 is conventionally fixed to 1, so w0 acts as a bias)
It is important to set the wi well, so that the transformation results in a desirable class for the given attributes of an input.

Page 7: Linear Models & Clustering

Linear regression
Here, we introduce a simple way to make a machine "learn".
A machine takes several training instances, which are associations of a set of attributes to a class.
It extracts a rule from the training instances, then builds and tunes a mechanism to infer a class from an unknown test example.
It then gives us an inferred class using the "learnt" knowledge.

Page 8: Linear Models & Clustering

Linear regression
n training instances will be given; that is, n sets of attributes and n corresponding classes are provided.
x(i): the corresponding ACTUAL class for the i-th training instance
aj(i): the j-th attribute of the i-th training instance
It is clear that we should find the set of wj's minimizing the sum of squared errors:
Σ_{i=1..n} ( x(i) - Σ_{j=0..k} wj aj(i) )²
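As an aside (not on the slides), this minimization is the standard least-squares problem; a minimal NumPy sketch with made-up data:

```python
import numpy as np

# Hypothetical training data: 20 instances, 3 attributes each.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))             # attribute matrix, one row per instance
x = A @ np.array([2.0, -1.0, 0.5]) + 3   # classes generated by a known linear rule

# Prepend a0 = 1 to every instance so w0 acts as the bias weight.
A1 = np.hstack([np.ones((A.shape[0], 1)), A])

# np.linalg.lstsq finds the w minimizing the sum of squared errors.
w, *_ = np.linalg.lstsq(A1, x, rcond=None)
print(w)  # should be close to [3, 2, -1, 0.5]
```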

Page 9: Linear Models & Clustering

Multiresponse linear regression
Linear regression is performed for each appearing class individually, over all training instances, such that the value of the linear transformation becomes 1 for training instances of that class and 0 for the others.
Let us assume we are doing linear regression for a certain class. The target for each training instance is then:
1 if the instance is of that class, 0 otherwise.
It looks like a membership function.

Page 10: Linear Models & Clustering

Multiresponse linear regression
Now, with a given test example, we evaluate the linear transformation for each class using that class's wi.
Select the class which gives the largest value as the class of the test example.
[Figure: the test example is fed into the model of each class (For Class 1, For Class 2, … , For Class n); each model outputs a value, and the largest value determines the class]
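A minimal sketch of this scheme (the function names are my own, and A is assumed to already contain the a0 = 1 bias column):

```python
import numpy as np

def fit_multiresponse(A, classes):
    """One least-squares model per class: target 1 for that class, 0 otherwise."""
    W = {}
    for c in np.unique(classes):
        target = (classes == c).astype(float)         # membership function: 1 or 0
        W[c], *_ = np.linalg.lstsq(A, target, rcond=None)
    return W

def predict(W, a):
    """Evaluate every class's linear model and pick the largest value."""
    return max(W, key=lambda c: a @ W[c])
```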

Page 11: Linear Models & Clustering

Logistic regression
Logit function: logit(p) = ln( p / (1 - p) )
Inverse logit function: logit^-1(z) = 1 / (1 + e^(-z))
(From Wikipedia)

Page 12: Linear Models & Clustering

Logistic regression
P(1|a1, … , ak): for a certain class, the probability that a test example consisting of a1, … , ak is of that class.
The model is the inverse logit of the linear transformation:
P(1|a1, … , ak) = 1 / (1 + e^(-(w0 + w1a1 + … + wkak)))
We set the wi's to maximize the log-likelihood (equivalently, minimize the negative log-likelihood) for each class.
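A toy sketch of that fit by gradient ascent on the log-likelihood (my own illustration, not the slides' method; A includes the a0 = 1 bias column and y holds 0/1 class membership):

```python
import numpy as np

def sigmoid(z):
    """The inverse logit function."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(A, y, lr=0.1, steps=2000):
    """Gradient ascent on the log-likelihood of the training instances."""
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        p = sigmoid(A @ w)                    # P(1 | a1..ak) for every instance
        w += lr * A.T @ (y - p) / len(y)      # gradient of the log-likelihood
    return w
```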

Page 13: Linear Models & Clustering

Logistic regression
Plain multiresponse regression doesn't guarantee that each linear transformation value is between 0 and 1.
With logistic regression, the value is between 0 and 1, and so satisfies one of the important conditions for being regarded as a probability.
However, the sum of the values over all the classes may not be 1.

Page 14: Linear Models & Clustering

Logistic regression
Pairwise classification: for every pair of classes, namely, the first one and the second one, the meaning of P(1|a1, … , ak) changes somewhat.
P(1|a1, … , ak): the probability that a test example consisting of a1, … , ak is of the first class
P(0|a1, … , ak) = 1 - P(1|a1, … , ak): the probability that a test example consisting of a1, … , ak is of the second class
The regression is done only over training instances of either the first or the second class of the pair.

Page 15: Linear Models & Clustering

Logistic regression
For each pair of classes, namely, the first one and the second one, if P(1|a1, … , ak) is above 0.5, then the resulting class is the first one; otherwise, the second one.
We count how many times each class wins a pairwise classification. The class which wins most often is the final resulting class for the given test example (see the sketch below).
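A minimal sketch of the voting step (a hypothetical models dict, keyed by class pairs, stands in for the fitted pairwise regressions):

```python
from collections import Counter
from itertools import combinations

def pairwise_classify(models, a):
    """One-vs-one voting: models[(i, j)] is assumed to return P(class i | a)."""
    votes = Counter()
    classes = sorted({c for pair in models for c in pair})
    for i, j in combinations(classes, 2):
        p_first = models[(i, j)](a)
        votes[i if p_first >= 0.5 else j] += 1
    return votes.most_common(1)[0][0]   # the class that wins most often
```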

Page 16: Linear Models & Clustering

Logistic regression
[Figure: a tournament of pairwise duels; for each pair of classes (h vs. i, i vs. j, i vs. k, …) the question "P(1|a1, … , ak) ≥ 0.5? Or not?" decides the winner of that duel, and the final winner is the class that wins most often]

Page 17: Linear Models & Clustering

Perceptron
Sometimes, we only need to know which class a test example belongs to, without any information about probabilities.
Assumptions for simplification:
• Only two classes are of interest.
• Linearly separable: the data space can be separated with a single hyperplane.
[Figure: two sets of points, one linearly separable, one not linearly separable]

Page 18: Linear Models & Clustering

Perceptron
Remember that this is about a pair of two classes, namely, the first class and the second one.
If a test example makes w0a0 + w1a1 + … + wkak greater than 0, the example is of the first class; if it makes the sum less than 0, it is of the second class.
We will find wj's as described above. In other words, we're looking for a hyperplane:
w0a0 + w1a1 + … + wkak = 0

Page 19: Linear Models & Clustering

Perceptron
Algorithm: PERCEPTRON LEARNING RULE
When a misclassified instance is found, the parameters of the perceptron hyperplane are modified, so that the instance may be classified correctly in the future.
If A is added into the wj's:
• (w0, w1, … , wk) ☞ (w0+a0, w1+a1, … , wk+ak)
• w0a0 + w1a1 + … + wkak ☞ w0a0 + w1a1 + … + wkak + Σaj², which moves the output for A toward the first (positive) class
The algorithm:
Initialize all wj's to 0
Until all the training instances are properly classified
  For each training instance A
    If A is wrongly classified by the current perceptron
      If A is actually of the first class, add A into the wj's
      If A is actually of the second class, subtract A from the wj's
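A minimal sketch of that rule (my own code; A includes the a0 = 1 bias column, labels are +1 for the first class and -1 for the second, and the data is assumed linearly separable):

```python
import numpy as np

def perceptron(A, labels, max_epochs=100):
    """The perceptron learning rule, as sketched above."""
    w = np.zeros(A.shape[1])                 # initialize all wj to 0
    for _ in range(max_epochs):
        mistakes = 0
        for a, label in zip(A, labels):
            predicted = 1 if a @ w > 0 else -1
            if predicted != label:           # wrongly classified:
                w += label * a               # add A for the first class, subtract for the second
                mistakes += 1
        if mistakes == 0:                    # all instances properly classified
            break
    return w
```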

Page 20: Linear Models & Clustering

Perceptron
Perceptron: the hyperplane found in such a way.
The perceptron is the grandfather/grandmother of neural networks.
[Figure: input nodes a0, a1, … , aj, … , ak connected to an output node by weights w0, w1, … , wj, … , wk]
An instance is input into the perceptron. The attributes of the instance activate the input layer. The attributes are linearly transformed with the weights and sent to the output node. The output node signals 1 if the received value is above 0, and -1 otherwise.

Page 21: Linear Models & Clustering

Instance-based learning
• Basic understanding
• kD-tree
• Ball tree

Page 22: Linear Models & Clustering

Basic understanding
Find the training instance which is the closest to the test example and predict the class from it.
Distance measures:
• Euclidean distance
• Alternatives
Normalizing attributes keeps attributes with large scales from dominating the distance (a small sketch follows).
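A minimal sketch of normalized Euclidean distance and nearest-neighbor prediction (the names are illustrative; lo and hi are per-attribute minima and maxima taken from the training data, assumed distinct):

```python
import numpy as np

def normalized_euclidean(a, b, lo, hi):
    """Euclidean distance after min-max normalizing each attribute to [0, 1]."""
    a_n = (a - lo) / (hi - lo)
    b_n = (b - lo) / (hi - lo)
    return np.sqrt(np.sum((a_n - b_n) ** 2))

def nearest_neighbor(train, classes, query, lo, hi):
    """Predict the class of the single closest training instance."""
    d = [normalized_euclidean(t, query, lo, hi) for t in train]
    return classes[int(np.argmin(d))]
```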

Page 23: Linear Models & Clustering

kD-tree
k: the number of attributes. Assume that k = 2.
[Figure: four points (2, 2), (3, 8), (6, 7), (7, 4) in the plane, and the kD-tree built over them by splitting on one attribute at each level]
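A compact construction sketch (my own illustration of the standard median-split scheme, not the slides' exact figure):

```python
class Node:
    """One kD-tree node: a point, the attribute it splits on, and two subtrees."""
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis = point, axis
        self.left, self.right = left, right

def build(points, depth=0, k=2):
    """Split on attribute depth % k, using the median point as the node."""
    if not points:
        return None
    axis = depth % k
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1, k),
                build(points[mid + 1:], depth + 1, k))

tree = build([(2, 2), (3, 8), (6, 7), (7, 4)])
print(tree.point)  # the root point
```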

Page 24: Linear Models & Clustering


kD-tree

Page 25: Linear Models & Clustering

Ball tree
[Figure: a ball tree drawn as nested balls, with nodes labeled 8; 5, 5; 2, 2, 3, 2]
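A hedged sketch of nearest-neighbor search in a ball tree (the Ball class and its fields are my own; the key idea is that a whole ball can be skipped when it provably cannot contain a closer point):

```python
import numpy as np

class Ball:
    """One ball-tree node: a center, a radius, leaf points, and child balls."""
    def __init__(self, center, radius, points=None, children=()):
        self.center, self.radius = np.asarray(center), radius
        self.points = points or []      # stored only at leaves
        self.children = children

def nn_search(node, query, best=(None, np.inf)):
    """Skip any ball that cannot beat the best distance found so far."""
    gap = np.linalg.norm(query - node.center) - node.radius
    if gap > best[1]:
        return best                      # the whole ball is too far away: prune
    for p in node.points:
        d = np.linalg.norm(query - np.asarray(p))
        if d < best[1]:
            best = (p, d)
    for child in node.children:
        best = nn_search(child, query, best)
    return best
```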

Page 26: Linear Models & Clustering

Clustering
• Iterative distance-based clustering
• Faster distance calculation

Page 27: Linear Models & Clustering

Clustering and types of clustering
No class to be predicted; instances are to be divided into groups.
Types of clustering:
• Exclusive
• Overlapping
• Probabilistic
• Hierarchical

Page 28: Linear Models & Clustering

Iterative distance-based clustering
Also called k-means.
Step 1: Select k points randomly as the centers of k clusters.
Step 2: Each instance is associated with the center that is closest to it.
Step 3: After all the instances are associated, the centroid of each cluster is computed from the instances of that cluster. This centroid becomes the new center of the cluster.
Step 4: With the new centers, the same steps are repeated (see the sketch below).
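A minimal sketch of the four steps (my own code; empty clusters are not handled for brevity):

```python
import numpy as np

def kmeans(X, k, steps=100, seed=0):
    """The iterative distance-based (k-means) procedure described above."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]    # Step 1
    for _ in range(steps):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)                             # Step 2
        new_centers = np.array([X[assign == j].mean(axis=0)   # Step 3
                                for j in range(k)])
        if np.allclose(new_centers, centers):                 # converged
            break
        centers = new_centers                                 # Step 4: repeat
    return centers, assign
```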

Page 29: Linear Models & Clustering

Iterative distance-based clustering
[Figure: the best solution for k=2, contrasted with the result when the randomly selected initial centers fall badly]
If the randomly selected centers fall badly, the iteration may converge to a clustering worse than the best solution.

Page 30: Linear Models & Clustering

Faster distance calculation
For each node of a ball tree, keep the sum of all instances and the number of instances belonging to the ball the node represents.
Traversing the tree from top to bottom, find the closest cluster center for each instance.
If an entire ball of a node belongs to a certain cluster center, we need not traverse its child nodes; we simply use the sum and count stored in the node (a sketch follows).
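A hedged sketch of that pruning test for the k-means assignment step (node.sum, node.count, node.points, and node.children are hypothetical fields of a ball-tree node, and k ≥ 2 is assumed; afterwards the new centers are sums / counts):

```python
import numpy as np

def assign_ball(node, centers, sums, counts):
    """Credit a whole ball to one center when no point in it can be closer to another."""
    d = np.linalg.norm(centers - node.center, axis=1)
    best = d.argmin()
    second = np.partition(d, 1)[1]
    # Every point in the ball lies within node.radius of node.center, so if the
    # best center wins even in the worst case, the entire ball belongs to it:
    if d[best] + node.radius < second - node.radius:
        sums[best] += node.sum        # precomputed sum of instances in the ball
        counts[best] += node.count    # precomputed number of instances
        return
    if not node.children:             # leaf: assign each instance individually
        for p in node.points:
            j = np.linalg.norm(centers - p, axis=1).argmin()
            sums[j] += p
            counts[j] += 1
        return
    for child in node.children:       # otherwise descend into the child balls
        assign_ball(child, centers, sums, counts)
```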

Page 31: Linear Models & Clustering


Faster distance calculation

Page 32: Linear Models & Clustering

Conclusion
Any questions?