Fast Perceptron Decision Tree Learning from Evolving Data Streams

Albert Bifet, Geoff Holmes, Bernhard Pfahringer, and Eibe Frank
University of Waikato, Hamilton, New Zealand

Hyderabad, 23 June 2010
14th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’10)


DESCRIPTION

This talk explains how to use perceptrons and combine them with decision trees for evolving data streams.

TRANSCRIPT

Page 1: Fast Perceptron Decision Tree Learning from Evolving Data Streams

Fast Perceptron Decision Tree Learning from Evolving Data Streams

Albert Bifet, Geoff Holmes, Bernhard Pfahringer, and Eibe Frank

University of Waikato, Hamilton, New Zealand

Hyderabad, 23 June 2010
14th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’10)

Page 2

Motivation

RAM-Hours: time and memory in one measure

Hoeffding Decision Trees with Perceptron learners at the leaves

Improve the performance of classification methods for data streams

Page 3

Outline

1 RAM-Hours

2 Perceptron Decision Tree Learning

3 Empirical evaluation

Page 4

Mining Massive Data

2007 Digital Universe: 281 exabytes (billion gigabytes).
The amount of information created exceeded available storage for the first time.

Web 2.0

106 million registered users
600 million search queries per day
3 billion requests a day via its API

Page 5

Green Computing

Green Computing

Study and practice of using computing resources efficiently.

Algorithmic efficiency: a main approach of Green Computing

Data streams: fast methods that do not store the full dataset in memory

Page 6

Data stream classification cycle

1 Process an example at a time, and inspect it only once (at most)

2 Use a limited amount of memory

3 Work in a limited amount of time

4 Be ready to predict at any point

Page 7

Mining Massive Data

Koichi Kawana: “Simplicity means the achievement of maximum effect with minimum means.”

[Diagram: data stream mining trades off time, accuracy, and memory]

Page 8

Evaluation Example

             Accuracy  Time  Memory
Classifier A      70%   100      20
Classifier B      80%    20      40

Which classifier is performing better?

Page 9

RAM-Hours

RAM-Hour: every GB of RAM deployed for 1 hour

Cloud Computing Rental Cost Options

Page 10

Evaluation Example

             Accuracy  Time  Memory  RAM-Hours
Classifier A      70%   100      20      2,000
Classifier B      80%    20      40        800

Which classifier is performing better?
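The RAM-Hours column above is just memory multiplied by time. A minimal sketch, reading the table's abstract time and memory units as hours and GB (the slides leave the units unspecified):

```python
def ram_hours(memory_gb: float, time_hours: float) -> float:
    # One RAM-Hour = one GB of RAM deployed for one hour.
    return memory_gb * time_hours

# The evaluation example's numbers:
cost_a = ram_hours(20, 100)  # Classifier A -> 2,000 RAM-Hours
cost_b = ram_hours(40, 20)   # Classifier B ->   800 RAM-Hours
```

Under this measure Classifier B is both more accurate and cheaper to run.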

Page 11

Outline

1 RAM-Hours

2 Perceptron Decision Tree Learning

3 Empirical evaluation

Page 12

Hoeffding Trees

Hoeffding Tree: VFDT

Pedro Domingos and Geoff Hulten. Mining high-speed data streams. 2000.

With high probability, constructs a model identical to the one a traditional (greedy) method would learn, with theoretical guarantees on the error rate.
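The "high probability" guarantee rests on the Hoeffding bound, which the slide does not spell out; as a sketch (the formula is the standard one from the VFDT paper, the function name is illustrative):

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    # After n independent observations of a random variable with range R,
    # the true mean lies within epsilon of the observed mean with
    # probability at least 1 - delta.
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# e.g. the epsilon used to decide whether the best split attribute is
# reliably better than the second best after 1,000 examples:
eps = hoeffding_bound(value_range=1.0, delta=1e-7, n=1000)
```

When the observed gain gap between the two best attributes exceeds this epsilon, the tree can split without seeing more data.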

[Diagram: example decision tree splitting on “Contains ‘Money’” (Yes/No) and on Time (Day/Night)]

Page 13

Hoeffding Naive Bayes Tree

Hoeffding Tree: Majority Class learner at the leaves

Hoeffding Naive Bayes Tree

G. Holmes, R. Kirkby, and B. Pfahringer. Stress-testing Hoeffding trees. 2005.

monitors the accuracy of a Majority Class learner
monitors the accuracy of a Naive Bayes learner
predicts using the most accurate method
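The leaf rule above can be sketched as follows. `AdaptiveLeaf` and the model names are illustrative, not MOA's API; the two leaf models are assumed to expose `learn`/`predict`, and accuracy is tracked prequentially (score on the example before training on it):

```python
class AdaptiveLeaf:
    """Track both leaf predictors and answer with the more accurate one."""

    def __init__(self, mc_model, nb_model):
        self.mc, self.nb = mc_model, nb_model  # Majority Class, Naive Bayes
        self.mc_hits = self.nb_hits = 0

    def learn(self, x, y):
        # Score each predictor on the example before updating it.
        if self.mc.predict(x) == y:
            self.mc_hits += 1
        if self.nb.predict(x) == y:
            self.nb_hits += 1
        self.mc.learn(x, y)
        self.nb.learn(x, y)

    def predict(self, x):
        best = self.mc if self.mc_hits >= self.nb_hits else self.nb
        return best.predict(x)
```

The same bookkeeping extends to the three-learner leaves introduced later in the talk.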

Page 14

Perceptron

[Diagram: a perceptron with five input attributes, weights w1–w5, and output h_w(x_i)]

Data stream: ⟨x_i, y_i⟩
Classical perceptron: h_w(x_i) = sgn(w^T x_i)
Minimize mean-square error: J(w) = (1/2) Σ_i (y_i − h_w(x_i))²

Page 15

Perceptron

[Diagram: a perceptron with five input attributes, weights w1–w5, and output h_w(x_i)]

We use the sigmoid function h_w(x) = σ(w^T x), where

σ(z) = 1/(1 + e^(−z))

σ′(z) = σ(z)(1 − σ(z))
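A quick numerical check of the derivative identity (illustrative, not from the slides): the analytic form σ(z)(1 − σ(z)) should agree with a central finite difference at any point.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Compare sigma'(z) = sigma(z) * (1 - sigma(z)) against a central difference.
for z in (-2.0, 0.0, 1.5):
    eps = 1e-6
    numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
    analytic = sigmoid(z) * (1 - sigmoid(z))
    assert abs(numeric - analytic) < 1e-8
```

This identity is what makes the gradient in the next slide cheap to compute: the derivative reuses the forward-pass value.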

Page 16

Perceptron

Minimize mean-square error: J(w) = (1/2) Σ_i (y_i − h_w(x_i))²

Stochastic gradient descent: w ← w − η ∇J

Gradient of the error function:

∇J = −Σ_i (y_i − h_w(x_i)) ∇h_w(x_i)

∇h_w(x_i) = h_w(x_i)(1 − h_w(x_i)) x_i

Weight update rule:

w ← w + η Σ_i (y_i − h_w(x_i)) h_w(x_i)(1 − h_w(x_i)) x_i

Page 17

Perceptron

PERCEPTRON LEARNING(Stream, η)
1  for each class
2    do PERCEPTRON LEARNING(Stream, class, η)

PERCEPTRON LEARNING(Stream, class, η)
1  ▷ Let w0 and w be randomly initialized
2  for each example (x, y) in Stream
3    do if class = y
4         then δ = (1 − h_w(x)) · h_w(x) · (1 − h_w(x))
5         else δ = (0 − h_w(x)) · h_w(x) · (1 − h_w(x))
6       w = w + η · δ · x

PERCEPTRON PREDICTION(x)
1  return argmax_class h_{w_class}(x)
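The pseudocode above can be sketched in Python as one sigmoid perceptron per class (one-vs-rest), trained with the delta rule and predicting by argmax. Class and method names, and the small random initialization range, are assumptions, not the authors' code:

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

class MultiClassPerceptron:
    """One sigmoid perceptron per class; predict with the highest output."""

    def __init__(self, n_features, classes, eta=0.1, seed=0):
        rnd = random.Random(seed)
        self.eta = eta
        # Randomly initialized weight vector per class.
        self.w = {c: [rnd.uniform(-0.05, 0.05) for _ in range(n_features)]
                  for c in classes}

    def _h(self, w, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

    def learn(self, x, y):
        # Delta rule: target is 1 for the true class, 0 for the others.
        for c, w in self.w.items():
            h = self._h(w, x)
            delta = ((1 if c == y else 0) - h) * h * (1 - h)
            for i, xi in enumerate(x):
                w[i] += self.eta * delta * xi

    def predict(self, x):
        return max(self.w, key=lambda c: self._h(self.w[c], x))
```

Each example is touched once and the model is a fixed-size set of weight vectors, which is what makes this learner attractive at the leaves of a stream tree.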

Page 18

Hybrid Hoeffding Trees

Hoeffding Naive Bayes Tree
Two learners at the leaves: Naive Bayes and Majority Class

Hoeffding Perceptron Tree
Two learners at the leaves: Perceptron and Majority Class

Hoeffding Naive Bayes Perceptron Tree
Three learners at the leaves: Naive Bayes, Perceptron, and Majority Class

Page 19

Outline

1 RAM-Hours

2 Perceptron Decision Tree Learning

3 Empirical evaluation

Page 20

What is MOA?

{M}assive {O}nline {A}nalysis is a framework for online learning from data streams.

It is closely related to WEKA.
It includes a collection of offline and online methods as well as tools for evaluation:

boosting and bagging
Hoeffding Trees

with and without Naïve Bayes classifiers at the leaves.

Page 21

What is MOA?

Easy to extendEasy to design and run experiments

Philipp Kranen, Hardy Kremer, Timm Jansen, Thomas Seidl, Albert Bifet, Geoff Holmes, Bernhard Pfahringer
RWTH Aachen University, University of Waikato

Benchmarking Stream Clustering Algorithms within the MOA Framework
KDD 2010 Demo

Page 22

MOA: the bird

The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct.


Page 25

Concept Drift Framework

[Figure: sigmoid transition function f(t) rising from 0 to 1 around t0 over a window of width W, with slope angle α]

Definition
Given two data streams a, b, we define c = a ⊕^W_{t0} b as the data stream built by joining the two data streams a and b, where

Pr[c(t) = b(t)] = 1 / (1 + e^(−4(t − t0)/W))
Pr[c(t) = a(t)] = 1 − Pr[c(t) = b(t)]
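The ⊕ operator can be sketched as a generator that, at each time step, samples from the second stream with the sigmoid probability above (function name, iterable inputs, and seeding are illustrative assumptions):

```python
import math
import random

def drift_join(stream_a, stream_b, t0, w, seed=42):
    """Sketch of c = a (+)^W_t0 b: at time t, emit stream_b's element with
    probability 1 / (1 + exp(-4 (t - t0) / W)), else stream_a's element."""
    rnd = random.Random(seed)
    for t, (xa, xb) in enumerate(zip(stream_a, stream_b)):
        p_b = 1.0 / (1.0 + math.exp(-4.0 * (t - t0) / w))
        yield xb if rnd.random() < p_b else xa
```

Nesting calls to such an operator reproduces the composed drift streams shown on the next slide, e.g. joining a third stream onto the output of a first join.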

Page 26

Concept Drift Framework

[Figure: sigmoid transition function f(t) rising from 0 to 1 around t0 over a window of width W]

Example

(((a ⊕^{W0}_{t0} b) ⊕^{W1}_{t1} c) ⊕^{W2}_{t2} d) …

(((SEA9 ⊕^W_{t0} SEA8) ⊕^W_{2t0} SEA7) ⊕^W_{3t0} SEA9.5)

CovPokElec = (CoverType ⊕^{5,000}_{581,012} Poker) ⊕^{5,000}_{1,000,000} ELEC2

Page 27

Empirical evaluation

Accuracy

[Plot: accuracy (%) vs. number of instances (10,000 to 1,000,000) for ht, htp, htnb, htnbp]

Figure: Accuracy on dataset LED with three concept drifts.

Page 28

Empirical evaluation

RunTime

[Plot: time (sec.) vs. number of instances for ht, htp, htnb, htnbp]

Figure: Time on dataset LED with three concept drifts.

Page 29

Empirical evaluation

Memory

[Plot: memory (MB) vs. number of instances for ht, htp, htnb, htnbp]

Figure: Memory on dataset LED with three concept drifts.

Page 30

Empirical evaluation

RAM-Hours

[Plot: RAM-Hours vs. number of instances for ht, htp, htnb, htnbp]

Figure: RAM-Hours on dataset LED with three concept drifts.

Page 31

Empirical evaluation Cover Type Dataset

                      Accuracy    Time   Mem  RAM-Hours
Perceptron               81.68   12.21  0.05       1.00
Naïve Bayes              60.52   22.81  0.08       2.99
Hoeffding Tree           68.30   13.43  2.59      56.98
Trees
  Naïve Bayes HT         81.06   24.73  2.59     104.92
  Perceptron HT          83.59   16.53  3.46      93.68
  NB Perceptron HT       85.77   22.16  3.46     125.59
Bagging
  Naïve Bayes HT         85.73  165.75  0.80     217.20
  Perceptron HT          86.33   50.06  1.66     136.12
  NB Perceptron HT       87.88  115.58  1.25     236.65

Page 32

Empirical evaluation Electricity Dataset

                      Accuracy   Time   Mem  RAM-Hours
Perceptron               79.07   0.53  0.01       1.00
Naïve Bayes              73.36   0.55  0.01       1.04
Hoeffding Tree           75.35   0.86  0.12      19.47
Trees
  Naïve Bayes HT         80.69   0.96  0.12      21.74
  Perceptron HT          84.24   0.93  0.21      36.85
  NB Perceptron HT       84.34   1.07  0.21      42.40
Bagging
  Naïve Bayes HT         84.36   3.17  0.13      77.75
  Perceptron HT          85.22   2.59  0.44     215.02
  NB Perceptron HT       86.44   3.55  0.30     200.94

Page 33

Summary

http://moa.cs.waikato.ac.nz/

Sensor networks: use the Perceptron

Handheld computers: use the Hoeffding Naive Bayes Perceptron Tree

Servers: use the Bagging Hoeffding Naive Bayes Perceptron Tree

Page 34

Summary

http://moa.cs.waikato.ac.nz/

Conclusions
RAM-Hours as a new measure of time and memory
Hoeffding Perceptron Tree
Hoeffding Naive Bayes Perceptron Tree

Future Work
An adaptive learning rate for the Perceptron.
