Fast Perceptron Decision Tree Learning from Evolving Data Streams
DESCRIPTION
This talk explains how to use perceptrons and combine them with decision trees for evolving data streams.
TRANSCRIPT
Fast Perceptron Decision Tree Learning from Evolving Data Streams
Albert Bifet, Geoff Holmes, Bernhard Pfahringer, and Eibe Frank
University of Waikato, Hamilton, New Zealand
Hyderabad, 23 June 2010
14th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'10)
Motivation
RAM-Hours: time and memory in one measure
Hoeffding Decision Trees with Perceptron learners at leaves
Improve performance of classification methods for data streams
Outline
1 RAM-Hours
2 Perceptron Decision Tree Learning
3 Empirical evaluation
Mining Massive Data
2007 Digital Universe: 281 exabytes (billion gigabytes)
The amount of information created exceeded available storage for the first time

Web 2.0
106 million registered users
600 million search queries per day
3 billion requests a day via its API
Green Computing
Study and practice of using computing resources efficiently.

Algorithmic Efficiency: a main approach of Green Computing
Data Streams: fast methods that avoid storing the whole dataset in memory
Data stream classification cycle
1 Process an example at a time, and inspect it only once (at most)
2 Use a limited amount of memory
3 Work in a limited amount of time
4 Be ready to predict at any point
Mining Massive Data
Koichi Kawana: "Simplicity means the achievement of maximum effect with minimum means."

[Figure: the data-stream trade-off triangle linking time, accuracy, and memory]
Evaluation Example
             Accuracy  Time  Memory
Classifier A    70%     100      20
Classifier B    80%      20      40

Which classifier is performing better?
RAM-Hours
RAM-Hour: every GB of RAM deployed for 1 hour

Cloud computing rental cost options
Evaluation Example
             Accuracy  Time  Memory  RAM-Hours
Classifier A    70%     100      20      2,000
Classifier B    80%      20      40        800

Which classifier is performing better?
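The RAM-Hours column follows directly from the definition: memory multiplied by deployment time. A minimal sketch, assuming the table's time and memory columns are already in hours and GB:

```python
def ram_hours(memory_gb: float, time_hours: float) -> float:
    """Cost in RAM-Hours: every GB of RAM deployed for 1 hour."""
    return memory_gb * time_hours

# Classifier A: 70% accuracy, time 100, memory 20 -> 2,000 RAM-Hours
# Classifier B: 80% accuracy, time 20, memory 40  ->   800 RAM-Hours
print(ram_hours(20, 100), ram_hours(40, 20))
```

Under this single measure B is both more accurate and cheaper, which the separate time and memory columns could not show.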
Outline
1 RAM-Hours
2 Perceptron Decision Tree Learning
3 Empirical evaluation
Hoeffding Trees
Hoeffding Tree: VFDT
Pedro Domingos and Geoff Hulten. Mining high-speed data streams. 2000
With high probability, constructs a model identical to the one a traditional (greedy) method would learn
With theoretical guarantees on the error rate
[Figure: example decision tree splitting on Contains "Money" (Yes/No) and on Time (Day/Night), with YES/NO leaves]
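The split decisions that make such a tree a Hoeffding tree rest on the Hoeffding bound: after n observations of a random variable with range R, the observed mean is, with probability 1 − δ, within ε = sqrt(R² ln(1/δ) / (2n)) of the true mean. A quick numeric sketch:

```python
import math

def hoeffding_bound(R: float, delta: float, n: int) -> float:
    """Margin epsilon such that, with probability 1 - delta, the true mean of
    a variable with range R lies within epsilon of the mean of n observations."""
    return math.sqrt(R * R * math.log(1.0 / delta) / (2.0 * n))

# The margin shrinks with n, so after enough stream examples the best split
# can be chosen with high confidence, without revisiting past data.
for n in (100, 1000, 10000):
    print(n, hoeffding_bound(R=1.0, delta=1e-7, n=n))
```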
Hoeffding Naive Bayes Tree
Hoeffding Tree: Majority Class learner at leaves

Hoeffding Naive Bayes Tree
G. Holmes, R. Kirkby, and B. Pfahringer. Stress-testing Hoeffding trees, 2005.
monitors accuracy of a Majority Class learner
monitors accuracy of a Naive Bayes learner
predicts using the most accurate method
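That leaf-level strategy can be sketched as a small racing wrapper (a hypothetical illustration, not MOA's actual implementation): track each candidate learner's prequential error and answer with the most accurate one so far.

```python
class AdaptiveHybridLeaf:
    """Sketch of a leaf that races several predictors (e.g. Majority Class
    vs. Naive Bayes) and predicts with the most accurate one so far."""

    def __init__(self, learners):
        self.learners = learners            # objects with predict(x) / update(x, y)
        self.errors = [0] * len(learners)

    def learn(self, x, y):
        # Test-then-train: score each learner on (x, y) before updating it.
        for i, learner in enumerate(self.learners):
            if learner.predict(x) != y:
                self.errors[i] += 1
            learner.update(x, y)

    def predict(self, x):
        best = min(range(len(self.learners)), key=self.errors.__getitem__)
        return self.learners[best].predict(x)
```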
Perceptron
[Figure: perceptron with inputs Attribute 1 to Attribute 5, weights w1 to w5, and output h_w(x_i)]

Data stream: <x_i, y_i>
Classical perceptron: h_w(x_i) = sgn(w^T x_i)
Minimize mean-square error: J(w) = (1/2) Σ_i (y_i − h_w(x_i))²
Perceptron
[Figure: the same perceptron diagram]

We use the sigmoid function h_w(x) = σ(w^T x), where
σ(x) = 1/(1 + e^(−x))
σ′(x) = σ(x)(1 − σ(x))
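This derivative identity is what makes the gradient cheap to evaluate; a quick finite-difference check confirms it:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x = 0.7
analytic = sigmoid(x) * (1.0 - sigmoid(x))                 # sigma'(x) = sigma(x)(1 - sigma(x))
numeric = (sigmoid(x + 1e-6) - sigmoid(x - 1e-6)) / 2e-6   # central finite difference
print(abs(analytic - numeric))                             # difference is negligible
```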
Perceptron
Minimize mean-square error: J(w) = (1/2) Σ_i (y_i − h_w(x_i))²

Stochastic gradient descent on each example x_i: w = w − η ∇J

Gradient of the error function:
∇J = −Σ_i (y_i − h_w(x_i)) ∇h_w(x_i)
with ∇h_w(x_i) = h_w(x_i)(1 − h_w(x_i)) x_i

Weight update rule
w = w + η Σ_i (y_i − h_w(x_i)) h_w(x_i)(1 − h_w(x_i)) x_i
Perceptron
PERCEPTRON LEARNING(Stream, η)
1 for each class
2   do PERCEPTRON LEARNING(Stream, class, η)

PERCEPTRON LEARNING(Stream, class, η)
1 ▷ Let w0 and w be randomly initialized
2 for each example (x, y) in Stream
3   do if class = y
4     then δ = (1 − h_w(x)) · h_w(x) · (1 − h_w(x))
5     else δ = (0 − h_w(x)) · h_w(x) · (1 − h_w(x))
6   w = w + η · δ · x

PERCEPTRON PREDICTION(x)
1 return arg max_class h_w_class(x)
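The pseudocode above translates almost line for line into Python. A self-contained sketch (class and helper names are my own, not MOA's): one sigmoid perceptron per class, trained online, predicting by arg max.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class MulticlassPerceptron:
    """One sigmoid perceptron per class, trained online as in the
    PERCEPTRON LEARNING pseudocode (a sketch, not MOA's exact code)."""

    def __init__(self, n_features, classes, eta=0.1, seed=0):
        rng = random.Random(seed)
        # w[c][0] is the bias w0; w[c][1:] are the feature weights.
        self.w = {c: [rng.uniform(-0.05, 0.05) for _ in range(n_features + 1)]
                  for c in classes}
        self.eta = eta

    def _h(self, c, x):
        w = self.w[c]
        return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

    def update(self, x, y):
        for c, w in self.w.items():
            h = self._h(c, x)
            target = 1.0 if c == y else 0.0
            delta = (target - h) * h * (1.0 - h)   # lines 4-5 of the pseudocode
            w[0] += self.eta * delta               # bias input is implicitly 1
            for i, xi in enumerate(x):
                w[i + 1] += self.eta * delta * xi  # line 6: w = w + eta * delta * x

    def predict(self, x):
        return max(self.w, key=lambda c: self._h(c, x))
```

Trained prequentially, each example is inspected only once, which is what keeps the method stream-friendly.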
Hybrid Hoeffding Trees
Hoeffding Naive Bayes Tree: two learners at leaves, Naive Bayes and Majority Class

Hoeffding Perceptron Tree: two learners at leaves, Perceptron and Majority Class

Hoeffding Naive Bayes Perceptron Tree: three learners at leaves, Naive Bayes, Perceptron and Majority Class
Outline
1 RAM-Hours
2 Perceptron Decision Tree Learning
3 Empirical evaluation
What is MOA?
{M}assive {O}nline {A}nalysis is a framework for online learning from data streams.

It is closely related to WEKA.
It includes a collection of offline and online methods as well as tools for evaluation:
boosting and bagging
Hoeffding Trees, with and without Naïve Bayes classifiers at the leaves.
What is MOA?
Easy to extend
Easy to design and run experiments

Philipp Kranen, Hardy Kremer, Timm Jansen, Thomas Seidl, Albert Bifet, Geoff Holmes, Bernhard Pfahringer
RWTH Aachen University, University of Waikato
Benchmarking Stream Clustering Algorithms within the MOA Framework
KDD 2010 Demo
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like theWeka, but also extinct.
Concept Drift Framework
[Figure: sigmoid function f(t) rising from 0 to 1, centred at t0, with transition width W and slope angle α]

Definition
Given two data streams a, b, we define c = a ⊕^W_t0 b as the data stream built by joining the two data streams a and b, where
Pr[c(t) = b(t)] = 1/(1 + e^(−4(t−t0)/W))
Pr[c(t) = a(t)] = 1 − Pr[c(t) = b(t)]
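The ⊕ operator is straightforward to sketch: walk both streams in parallel and, at time t, emit b's example with the sigmoid probability above. A hypothetical implementation, assuming the streams are plain Python iterables:

```python
import math
import random

def drift_join(stream_a, stream_b, t0, W, seed=0):
    """Sketch of c = a (+)^W_t0 b: a gradual transition from stream a to
    stream b, centred at t0, with transition width W."""
    rng = random.Random(seed)
    for t, (ea, eb) in enumerate(zip(stream_a, stream_b)):
        p_b = 1.0 / (1.0 + math.exp(-4.0 * (t - t0) / W))  # Pr[c(t) = b(t)]
        yield eb if rng.random() < p_b else ea

# Long before t0 the output follows a; long after t0 it follows b.
c = list(drift_join(['a'] * 2000, ['b'] * 2000, t0=1000, W=100))
```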
Concept Drift Framework
[Figure: the same sigmoid transition function f(t)]

Example
(((a ⊕^W0_t0 b) ⊕^W1_t1 c) ⊕^W2_t2 d) ...
(((SEA9 ⊕^W_t0 SEA8) ⊕^W_2t0 SEA7) ⊕^W_3t0 SEA9.5)
CovPokElec = (CoverType ⊕^5,000_581,012 Poker) ⊕^5,000_1,000,000 ELEC2
Empirical evaluation
Accuracy

[Plot: accuracy (%) from 40 to 80 over 10,000 to 1,000,000 instances; curves: htnbp, htnb, htp, ht]

Figure: Accuracy on dataset LED with three concept drifts.
Empirical evaluation
RunTime

[Plot: time (sec.) from 0 to 35 over 10,000 to 890,000 instances; curves: htnbp, htnb, htp, ht]

Figure: Time on dataset LED with three concept drifts.
Empirical evaluation
Memory

[Plot: memory (MB) from 0 to 5 over 10,000 to 970,000 instances; curves: htnbp, htnb, htp, ht]

Figure: Memory on dataset LED with three concept drifts.
Empirical evaluation
RAM-Hours

[Plot: RAM-Hours from 0 to 4.5e-05 over 10,000 to 970,000 instances; curves: htnbp, htnb, htp, ht]

Figure: RAM-Hours on dataset LED with three concept drifts.
Empirical evaluation: Cover Type dataset

                    Accuracy    Time    Mem  RAM-Hours
Perceptron             81.68   12.21   0.05       1.00
Naïve Bayes            60.52   22.81   0.08       2.99
Hoeffding Tree         68.30   13.43   2.59      56.98
Trees
Naïve Bayes HT         81.06   24.73   2.59     104.92
Perceptron HT          83.59   16.53   3.46      93.68
NB Perceptron HT       85.77   22.16   3.46     125.59
Bagging
Naïve Bayes HT         85.73  165.75   0.80     217.20
Perceptron HT          86.33   50.06   1.66     136.12
NB Perceptron HT       87.88  115.58   1.25     236.65
Empirical evaluation: Electricity dataset

                    Accuracy    Time    Mem  RAM-Hours
Perceptron             79.07    0.53   0.01       1.00
Naïve Bayes            73.36    0.55   0.01       1.04
Hoeffding Tree         75.35    0.86   0.12      19.47
Trees
Naïve Bayes HT         80.69    0.96   0.12      21.74
Perceptron HT          84.24    0.93   0.21      36.85
NB Perceptron HT       84.34    1.07   0.21      42.40
Bagging
Naïve Bayes HT         84.36    3.17   0.13      77.75
Perceptron HT          85.22    2.59   0.44     215.02
NB Perceptron HT       86.44    3.55   0.30     200.94
Summary
http://moa.cs.waikato.ac.nz/
Summary
Sensor Networks: use Perceptron
Handheld Computers: use Hoeffding Naive Bayes Perceptron Tree
Servers: use Bagging Hoeffding Naive Bayes Perceptron Tree
Summary
http://moa.cs.waikato.ac.nz/
Conclusions
RAM-Hours as a new measure of time and memory
Hoeffding Perceptron Tree
Hoeffding Naive Bayes Perceptron Tree

Future Work
Adaptive learning rate for the Perceptron.