
Page 1: Linear Models & Clustering


Linear Models & Clustering

Presented by Kwak, Nam-ju

Page 2: Linear Models & Clustering

Coverage
Classification
• Some tools for classification
• Linear regression
• Multiresponse linear regression
• Logistic regression
• Perceptron
Instance-based learning
• Basic understanding
• kD-tree
• Ball tree
Clustering
• Clustering and types of clustering
• Iterative distance-based clustering
• Faster distance calculation

Page 3: Linear Models & Clustering

Classification
• Some tools for classification
• Linear regression
• Multiresponse linear regression
• Logistic regression
• Perceptron

Page 4: Linear Models & Clustering

Some tools for classification
An input is categorized into one of several collections of data based on its features or attributes.
Classification is important in that it lets us distinguish a set of data having common characteristics from others.
[Figure: a classification operation mapping an input to a class]

Page 5: Linear Models & Clustering

Some tools for classification
Decision tree & classification rule: both compute x XOR y.
Decision tree:
[Figure: the root tests x=1?; each yes/no branch then tests y=1?; the four leaves assign classes b, a, a, b]
Classification rule:
If x=1 and y=0 then class=a
If x=0 and y=1 then class=a
If x=0 and y=0 then class=b
If x=1 and y=1 then class=b
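As a small illustration (my own sketch, not from the slides), the same tree and rules can be written directly in Python:

```python
# A minimal sketch: the XOR decision tree above, written as nested tests.
def classify(x: int, y: int) -> str:
    """Return class 'a' when exactly one of x, y is 1 (x XOR y), else 'b'."""
    if x == 1:
        return "b" if y == 1 else "a"   # x=1: y=1 -> b, y=0 -> a
    else:
        return "a" if y == 1 else "b"   # x=0: y=1 -> a, y=0 -> b

# The four classification rules, checked exhaustively:
for x in (0, 1):
    for y in (0, 1):
        print(f"x={x}, y={y} -> class={classify(x, y)}")
```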

Page 6: Linear Models & Clustering

Linear regression
If the attributes and classes are numeric values, we can express the resulting class of a given input as a linear transformation of the input's attributes with a certain set of weights:
x = w0a0 + w1a1 + … + wkak
x: class, wi: weight, ai: attribute (a0 is conventionally fixed to 1, so w0 acts as a bias)
It is important to set the wi well, so that the transformation results in a desirable class for the given attributes of an input.

Page 7: Linear Models & Clustering

Linear regression
Here, we introduce a simple way to make a machine "learn".
A machine takes several training instances, which are associations of a set of attributes to a class.
It extracts a rule from the training instances, then builds and tunes a mechanism to infer a class from an unknown test example.
It then gives us an inferred class using the "learnt" knowledge.

Page 8: Linear Models & Clustering

Linear regression
n training instances will be given; that is, n sets of attributes and n corresponding classes are provided.
x(i): the corresponding ACTUAL class for the i-th training instance
aj(i): the j-th attribute of the i-th training instance
It is clear that we should find the set of wj's minimizing the sum of squared errors:
Σ_{i=1..n} ( x(i) - Σ_{j=0..k} wj aj(i) )²
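As an aside (not on the slides), this minimization is the standard least-squares problem; a minimal NumPy sketch with made-up data:

```python
import numpy as np

# Hypothetical training data: 20 instances, 3 attributes each.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))             # attribute matrix, one row per instance
x = A @ np.array([2.0, -1.0, 0.5]) + 3   # classes generated by a known linear rule

# Prepend a0 = 1 to every instance so w0 acts as the bias weight.
A1 = np.hstack([np.ones((A.shape[0], 1)), A])

# np.linalg.lstsq finds the w minimizing the sum of squared errors.
w, *_ = np.linalg.lstsq(A1, x, rcond=None)
print(w)  # should be close to [3, 2, -1, 0.5]
```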

Page 9: Linear Models & Clustering

Multiresponse linear regression
Linear regression is performed for each appearing class individually, over all training instances, such that the value of the linear transformation becomes 1 for training instances of that class and 0 for the others.
Let us assume we are doing linear regression for a certain class. The target for each training instance is then:
1 if the instance is of that class, 0 otherwise.
It looks like a membership function.

Page 10: Linear Models & Clustering

Multiresponse linear regression
Now, with a given test example, we evaluate the linear transformation for each class using that class's wi.
Select the class which gives the largest value as the class of the test example.
[Figure: the test example is fed into the model of each class (For Class 1, For Class 2, … , For Class n); each model outputs a value, and the largest value determines the class]
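A minimal sketch of this scheme (the function names are my own, and A is assumed to already contain the a0 = 1 bias column):

```python
import numpy as np

def fit_multiresponse(A, classes):
    """One least-squares model per class: target 1 for that class, 0 otherwise."""
    W = {}
    for c in np.unique(classes):
        target = (classes == c).astype(float)         # membership function: 1 or 0
        W[c], *_ = np.linalg.lstsq(A, target, rcond=None)
    return W

def predict(W, a):
    """Evaluate every class's linear model and pick the largest value."""
    return max(W, key=lambda c: a @ W[c])
```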

Page 11: Linear Models & Clustering

Logistic regression
Logit function: logit(p) = ln( p / (1 - p) )
Inverse logit function: logit^-1(z) = 1 / (1 + e^(-z))
(From Wikipedia)

Page 12: Linear Models & Clustering

Logistic regression
P(1|a1, … , ak): for a certain class, the probability that a test example consisting of a1, … , ak is of that class.
The model is the inverse logit of the linear transformation:
P(1|a1, … , ak) = 1 / (1 + e^(-(w0 + w1a1 + … + wkak)))
We set the wi's to maximize the log-likelihood (equivalently, minimize the negative log-likelihood) for each class.
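A toy sketch of that fit by gradient ascent on the log-likelihood (my own illustration, not the slides' method; A includes the a0 = 1 bias column and y holds 0/1 class membership):

```python
import numpy as np

def sigmoid(z):
    """The inverse logit function."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(A, y, lr=0.1, steps=2000):
    """Gradient ascent on the log-likelihood of the training instances."""
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        p = sigmoid(A @ w)                    # P(1 | a1..ak) for every instance
        w += lr * A.T @ (y - p) / len(y)      # gradient of the log-likelihood
    return w
```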

Page 13: Linear Models & Clustering

Logistic regression
Plain multiresponse regression doesn't guarantee that each linear transformation value is between 0 and 1.
With logistic regression, the value is between 0 and 1, and so satisfies one of the important conditions for being regarded as a probability.
However, the sum of the values over all the classes may not be 1.

Page 14: Linear Models & Clustering

Logistic regression
Pairwise classification: for every pair of classes, namely, the first one and the second one, the meaning of P(1|a1, … , ak) changes somewhat.
P(1|a1, … , ak): the probability that a test example consisting of a1, … , ak is of the first class
P(0|a1, … , ak) = 1 - P(1|a1, … , ak): the probability that a test example consisting of a1, … , ak is of the second class
The regression is done only over training instances of either the first or the second class of the pair.

Page 15: Linear Models & Clustering

Logistic regression
For each pair of classes, namely, the first one and the second one, if P(1|a1, … , ak) is above 0.5, then the resulting class is the first one; otherwise, the second one.
We count how many times each class wins a pairwise classification. The class which wins most often is the final resulting class for the given test example (see the sketch below).
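A minimal sketch of the voting step (a hypothetical models dict, keyed by class pairs, stands in for the fitted pairwise regressions):

```python
from collections import Counter
from itertools import combinations

def pairwise_classify(models, a):
    """One-vs-one voting: models[(i, j)] is assumed to return P(class i | a)."""
    votes = Counter()
    classes = sorted({c for pair in models for c in pair})
    for i, j in combinations(classes, 2):
        p_first = models[(i, j)](a)
        votes[i if p_first >= 0.5 else j] += 1
    return votes.most_common(1)[0][0]   # the class that wins most often
```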

Page 16: Linear Models & Clustering

Logistic regression
[Figure: a tournament of pairwise duels; for each pair of classes (h vs. i, i vs. j, i vs. k, …) the question "P(1|a1, … , ak) ≥ 0.5? Or not?" decides the winner of that duel, and the final winner is the class that wins most often]

Page 17: Linear Models & Clustering

Perceptron
Sometimes, we only need to know which class a test example belongs to, without any information about probabilities.
Assumptions for simplification:
• Only two classes are of interest.
• Linearly separable: the data space can be separated with a single hyperplane.
[Figure: two sets of points, one linearly separable, one not linearly separable]

Page 18: Linear Models & Clustering

Perceptron
Remember that this is about a pair of two classes, namely, the first class and the second one.
If a test example makes w0a0 + w1a1 + … + wkak greater than 0, the example is of the first class; if it makes the sum less than 0, it is of the second class.
We will find wj's as described above. In other words, we're looking for a hyperplane:
w0a0 + w1a1 + … + wkak = 0

Page 19: Linear Models & Clustering

Perceptron
Algorithm: PERCEPTRON LEARNING RULE
When a misclassified instance is found, the parameters of the perceptron hyperplane are modified, so that the instance may be classified correctly in the future.
If A is added into the wj's:
• (w0, w1, … , wk) ☞ (w0+a0, w1+a1, … , wk+ak)
• w0a0 + w1a1 + … + wkak ☞ w0a0 + w1a1 + … + wkak + Σaj², which moves the output for A toward the first (positive) class
The algorithm:
Initialize all wj's to 0
Until all the training instances are properly classified
  For each training instance A
    If A is wrongly classified by the current perceptron
      If A is actually of the first class, add A into the wj's
      If A is actually of the second class, subtract A from the wj's
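A minimal sketch of that rule (my own code; A includes the a0 = 1 bias column, labels are +1 for the first class and -1 for the second, and the data is assumed linearly separable):

```python
import numpy as np

def perceptron(A, labels, max_epochs=100):
    """The perceptron learning rule, as sketched above."""
    w = np.zeros(A.shape[1])                 # initialize all wj to 0
    for _ in range(max_epochs):
        mistakes = 0
        for a, label in zip(A, labels):
            predicted = 1 if a @ w > 0 else -1
            if predicted != label:           # wrongly classified:
                w += label * a               # add A for the first class, subtract for the second
                mistakes += 1
        if mistakes == 0:                    # all instances properly classified
            break
    return w
```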

Page 20: Linear Models & Clustering

Perceptron
Perceptron: the hyperplane found in such a way.
The perceptron is the grandfather/grandmother of neural networks.
[Figure: input nodes a0, a1, … , aj, … , ak connected to an output node by weights w0, w1, … , wj, … , wk]
An instance is input into the perceptron. The attributes of the instance activate the input layer. The attributes are linearly transformed with the weights and sent to the output node. The output node signals 1 if the received value is above 0, and -1 otherwise.

Page 21: Linear Models & Clustering

Instance-based learning
• Basic understanding
• kD-tree
• Ball tree

Page 22: Linear Models & Clustering

Basic understanding
Find the training instance which is the closest to the test example and predict the class from it.
Distance measures:
• Euclidean distance
• Alternatives
Normalizing attributes keeps attributes with large scales from dominating the distance (a small sketch follows).
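A minimal sketch of normalized Euclidean distance and nearest-neighbor prediction (the names are illustrative; lo and hi are per-attribute minima and maxima taken from the training data, assumed distinct):

```python
import numpy as np

def normalized_euclidean(a, b, lo, hi):
    """Euclidean distance after min-max normalizing each attribute to [0, 1]."""
    a_n = (a - lo) / (hi - lo)
    b_n = (b - lo) / (hi - lo)
    return np.sqrt(np.sum((a_n - b_n) ** 2))

def nearest_neighbor(train, classes, query, lo, hi):
    """Predict the class of the single closest training instance."""
    d = [normalized_euclidean(t, query, lo, hi) for t in train]
    return classes[int(np.argmin(d))]
```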

Page 23: Linear Models & Clustering

kD-tree
k: the number of attributes. Assume that k = 2.
[Figure: four points (2, 2), (3, 8), (6, 7), (7, 4) in the plane, and the kD-tree built over them by splitting on one attribute at each level]
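A compact construction sketch (my own illustration of the standard median-split scheme, not the slides' exact figure):

```python
class Node:
    """One kD-tree node: a point, the attribute it splits on, and two subtrees."""
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis = point, axis
        self.left, self.right = left, right

def build(points, depth=0, k=2):
    """Split on attribute depth % k, using the median point as the node."""
    if not points:
        return None
    axis = depth % k
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1, k),
                build(points[mid + 1:], depth + 1, k))

tree = build([(2, 2), (3, 8), (6, 7), (7, 4)])
print(tree.point)  # the root point
```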

Page 24: Linear Models & Clustering


kD-tree

Page 25: Linear Models & Clustering

Ball tree
[Figure: a ball tree drawn as nested balls, with nodes labeled 8; 5, 5; 2, 2, 3, 2]
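A hedged sketch of nearest-neighbor search in a ball tree (the Ball class and its fields are my own; the key idea is that a whole ball can be skipped when it provably cannot contain a closer point):

```python
import numpy as np

class Ball:
    """One ball-tree node: a center, a radius, leaf points, and child balls."""
    def __init__(self, center, radius, points=None, children=()):
        self.center, self.radius = np.asarray(center), radius
        self.points = points or []      # stored only at leaves
        self.children = children

def nn_search(node, query, best=(None, np.inf)):
    """Skip any ball that cannot beat the best distance found so far."""
    gap = np.linalg.norm(query - node.center) - node.radius
    if gap > best[1]:
        return best                      # the whole ball is too far away: prune
    for p in node.points:
        d = np.linalg.norm(query - np.asarray(p))
        if d < best[1]:
            best = (p, d)
    for child in node.children:
        best = nn_search(child, query, best)
    return best
```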

Page 26: Linear Models & Clustering

Clustering
• Iterative distance-based clustering
• Faster distance calculation

Page 27: Linear Models & Clustering

Clustering and types of clustering
No class to be predicted; instances are to be divided into groups.
Types of clustering:
• Exclusive
• Overlapping
• Probabilistic
• Hierarchical

Page 28: Linear Models & Clustering

Iterative distance-based clustering
Also called k-means.
Step 1: Select k points randomly as the centers of k clusters.
Step 2: Each instance is associated with the center that is closest to it.
Step 3: After all the instances are associated, the centroid of each cluster is computed from the instances of that cluster. This centroid becomes the new center of the cluster.
Step 4: With the new centers, the same steps are repeated (see the sketch below).
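A minimal sketch of the four steps (my own code; empty clusters are not handled for brevity):

```python
import numpy as np

def kmeans(X, k, steps=100, seed=0):
    """The iterative distance-based (k-means) procedure described above."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]    # Step 1
    for _ in range(steps):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)                             # Step 2
        new_centers = np.array([X[assign == j].mean(axis=0)   # Step 3
                                for j in range(k)])
        if np.allclose(new_centers, centers):                 # converged
            break
        centers = new_centers                                 # Step 4: repeat
    return centers, assign
```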

Page 29: Linear Models & Clustering

Iterative distance-based clustering
[Figure: the best solution for k=2, contrasted with the result when the randomly selected initial centers fall badly]
If the randomly selected centers fall badly, the iteration may converge to a clustering worse than the best solution.

Page 30: Linear Models & Clustering

Faster distance calculation
For each node of a ball tree, keep the sum of all instances and the number of instances belonging to the ball the node represents.
Traversing the tree from top to bottom, find the closest cluster center for each instance.
If an entire ball of a node belongs to a certain cluster center, we need not traverse its child nodes; we simply use the sum and count stored in the node (a sketch follows).
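A hedged sketch of that pruning test for the k-means assignment step (node.sum, node.count, node.points, and node.children are hypothetical fields of a ball-tree node, and k ≥ 2 is assumed; afterwards the new centers are sums / counts):

```python
import numpy as np

def assign_ball(node, centers, sums, counts):
    """Credit a whole ball to one center when no point in it can be closer to another."""
    d = np.linalg.norm(centers - node.center, axis=1)
    best = d.argmin()
    second = np.partition(d, 1)[1]
    # Every point in the ball lies within node.radius of node.center, so if the
    # best center wins even in the worst case, the entire ball belongs to it:
    if d[best] + node.radius < second - node.radius:
        sums[best] += node.sum        # precomputed sum of instances in the ball
        counts[best] += node.count    # precomputed number of instances
        return
    if not node.children:             # leaf: assign each instance individually
        for p in node.points:
            j = np.linalg.norm(centers - p, axis=1).argmin()
            sums[j] += p
            counts[j] += 1
        return
    for child in node.children:       # otherwise descend into the child balls
        assign_ball(child, centers, sums, counts)
```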

Page 31: Linear Models & Clustering


Faster distance calculation

Page 32: Linear Models & Clustering

Conclusion
Any questions?