Efficient Data Stream Classification via Probabilistic Adaptive Windows
Albert Bifet (1), Jesse Read (2), Bernhard Pfahringer (3), Geoff Holmes (3)
(1) Yahoo! Research Barcelona; (2) Universidad Carlos III, Madrid, Spain; (3) University of Waikato, Hamilton, New Zealand
SAC 2013, 19 March 2013


Page 1: Efficient Data Stream Classification via Probabilistic Adaptive Windows

Efficient Data Stream Classification via Probabilistic Adaptive Windows

Albert Bifet (1), Jesse Read (2), Bernhard Pfahringer (3), Geoff Holmes (3)

(1) Yahoo! Research Barcelona
(2) Universidad Carlos III, Madrid, Spain
(3) University of Waikato, Hamilton, New Zealand

SAC 2013, 19 March 2013

Page 2: Efficient Data Stream Classification via Probabilistic Adaptive Windows

Data Streams

Big Data & Real Time

Page 3: Efficient Data Stream Classification via Probabilistic Adaptive Windows

Data Streams

- Sequence is potentially infinite
- High amount of data: sublinear space
- High speed of arrival: sublinear time per example
- Once an element from a data stream has been processed, it is discarded or archived


Page 4: Efficient Data Stream Classification via Probabilistic Adaptive Windows

Data Streams

Approximation algorithms

- Small error rate with high probability
- An algorithm $(\varepsilon, \delta)$-approximates $F$ if it outputs $\tilde{F}$ for which

$\Pr[\,|\tilde{F} - F| > \varepsilon F\,] < \delta$.
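To make the $(\varepsilon, \delta)$ definition concrete, here is a minimal Python sketch that measures how often a randomized estimator misses the true value $F$ by more than a relative error $\varepsilon$. The function name `empirical_failure_rate` and the sample numbers are illustrative assumptions, not part of the slides; the sketch also assumes $F > 0$.

```python
def empirical_failure_rate(estimates, true_value, eps):
    """Fraction of estimates whose error exceeds eps * true_value.

    An (eps, delta)-approximation requires this fraction to stay below
    delta in expectation. Assumes true_value > 0.
    """
    failures = sum(
        1 for est in estimates if abs(est - true_value) > eps * true_value
    )
    return failures / len(estimates)


# Hypothetical outputs of four runs of some estimator of F = 100.
runs = [90, 100, 110, 150]
rate = empirical_failure_rate(runs, true_value=100, eps=0.2)
print(rate)  # only 150 is off by more than 20, so the rate is 0.25
```

Comparing the measured rate with the target $\delta$ gives a quick sanity check on an approximation algorithm, although it is no substitute for a proof of the bound.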


Page 5: Efficient Data Stream Classification via Probabilistic Adaptive Windows

Data Stream Sliding Window

Sampling algorithms

- Giving equal weight to old and new examples: RESERVOIR SAMPLING
- Giving more weight to recent examples: PROBABILISTIC APPROXIMATE WINDOW
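The slides name reservoir sampling without spelling it out. The sketch below shows the classic single-pass version (often called Algorithm R), in which every element seen so far has the same chance of being in the sample. The function name `reservoir_sample` and its parameters are assumptions made for illustration.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from an iterable."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for t, item in enumerate(stream):
        if t < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item t replaces a random slot with probability k / (t + 1).
            j = rng.randint(0, t)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(10_000), k=5, rng=random.Random(0))
print(sample)
```

Memory stays at `k` items regardless of stream length, but old and new examples are equally likely to be kept, which is the property the probabilistic approximate window is meant to change.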


Page 6: Efficient Data Stream Classification via Probabilistic Adaptive Windows

8 Bits Counter

1 0 1 0 1 0 1 0

What is the largest number we can store in 8 bits?


Page 8: Efficient Data Stream Classification via Probabilistic Adaptive Windows

8 Bits Counter

[Plot: $f(x) = \log(1 + x)/\log 2$ against $x$, both axes from 0 to 100; $f(0) = 0$, $f(1) = 1$]

Page 9: Efficient Data Stream Classification via Probabilistic Adaptive Windows

8 Bits Counter

[Plot: $f(x) = \log(1 + x)/\log 2$ against $x$, both axes from 0 to 10; $f(0) = 0$, $f(1) = 1$]

Page 10: Efficient Data Stream Classification via Probabilistic Adaptive Windows

8 Bits Counter

[Plot: $f(x) = \log(1 + x/30)/\log(1 + 1/30)$ against $x$, both axes from 0 to 10; $f(0) = 0$, $f(1) = 1$]

Page 11: Efficient Data Stream Classification via Probabilistic Adaptive Windows

8 Bits Counter

[Plot: $f(x) = \log(1 + x/30)/\log(1 + 1/30)$ against $x$, both axes from 0 to 100; $f(0) = 0$, $f(1) = 1$]
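Both plotted functions share the form $f(x) = \log(1 + x/a)/\log(1 + 1/a)$, with $a = 1$ and $a = 30$. The sketch below evaluates $f$ and its inverse to see how large a count an 8-bit value (at most 255) can represent on each scale. The parameter name `a` and the helper `f_inverse` are assumptions introduced for this illustration.

```python
import math


def f(x, a=1.0):
    """Compressed value of count x: log(1 + x/a) / log(1 + 1/a)."""
    return math.log(1 + x / a) / math.log(1 + 1 / a)


def f_inverse(c, a=1.0):
    """Count x whose compressed value is c (inverse of f)."""
    return a * ((1 + 1 / a) ** c - 1)


for a in (1.0, 30.0):
    largest = f_inverse(255, a)
    print(f"a = {a:g}: f(0) = {f(0, a):g}, f(1) = {f(1, a):g}, "
          f"largest count for c = 255 is about {largest:.3g}")
```

With $a = 1$ the largest representable count is $2^{255} - 1$, far more range than is usually needed. With $a = 30$ the range shrinks to the order of $10^5$, but the scale is much finer, so the compressed value tracks the true count more closely.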

Page 12: Efficient Data Stream Classification via Probabilistic Adaptive Windows

8 bits Counter

MORRIS APPROXIMATE COUNTING ALGORITHM

Init counter c ← 0
for every event in the stream
    do rand = random number between 0 and 1
       if rand < p
          then c ← c + 1

What is the largest number we can store in 8 bits?

Page 13: Efficient Data Stream Classification via Probabilistic Adaptive Windows


With $p = 1/2$ we can store $2 \times 256$ with standard deviation $\sigma = \sqrt{n}/2$

Page 14: Efficient Data Stream Classification via Probabilistic Adaptive Windows


With $p = 2^{-c}$, $E[2^c] = n + 2$ with variance $\sigma^2 = n(n + 1)/2$

Page 15: Efficient Data Stream Classification via Probabilistic Adaptive Windows


If $p = b^{-c}$ then $E[b^c] = n(b - 1) + b$, $\sigma^2 = (b - 1)\,n(n + 1)/2$
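The Morris counter with $p = b^{-c}$ can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the class name `MorrisCounter` and its methods are assumptions. It follows the pseudocode in starting $c$ at 0, in which case $E[b^c] = n(b - 1) + 1$ and $(b^c - 1)/(b - 1)$ is an unbiased estimate of $n$. The closed forms quoted on the slides correspond to a counter that starts at 1.

```python
import random


class MorrisCounter:
    """Approximate counter that stores only the small exponent c."""

    def __init__(self, base=2.0, rng=None):
        if base <= 1.0:
            raise ValueError("base must be greater than 1")
        self.base = base
        self.c = 0  # as in the pseudocode: Init counter c <- 0
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Increase c with probability p = base ** (-c).
        if self.rng.random() < self.base ** (-self.c):
            self.c += 1

    def estimate(self):
        # Unbiased estimate of the number of events when c starts at 0.
        return (self.base ** self.c - 1) / (self.base - 1)


counter = MorrisCounter(base=2.0, rng=random.Random(1))
for _ in range(100_000):
    counter.increment()
print(counter.c, counter.estimate())
```

A base closer to 1 lowers the variance of the estimate at the price of a larger stored value `c`, which is the trade-off between the two logarithmic scales plotted earlier.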

Page 16: Efficient Data Stream Classification via Probabilistic Adaptive Windows

PROBABILISTIC APPROXIMATE WINDOW

Init window w ← ∅
for every instance i in the stream
    do store the new instance i in window w
       for every instance j in the window
           do rand = random number between 0 and 1
              if rand > $b^{-1}$
                 then remove instance j from window w

PAW maintains a sample of instances in logarithmic memory, giving greater weight to newer instances.
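The PAW pseudocode translates almost line for line into Python. The sketch below is an illustration under stated assumptions: the function name `paw_sample`, the choice of `b`, and the use of a plain list for the window are mine, not the authors'.

```python
import random


def paw_sample(stream, b, rng=None):
    """Run the probabilistic approximate window over an iterable.

    Every stored instance, including the newest, is removed when
    rand > 1/b and kept otherwise. Requires b >= 1.
    """
    if rng is None:
        rng = random.Random()
    keep_prob = 1.0 / b
    window = []
    for instance in stream:
        window.append(instance)  # store the new instance
        # One pass over the window: keep j unless rand > b^{-1}.
        window = [j for j in window if rng.random() <= keep_prob]
    return window


window = paw_sample(range(10_000), b=1.05, rng=random.Random(0))
print(len(window), window[:5], window[-5:])
```

Because each instance survives a pass with probability $b^{-1}$, an instance that has been through $t$ passes is still present with probability $b^{-t}$. The retained sample is therefore dominated by recent arrivals, with a thin tail of older instances.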

Page 17: Efficient Data Stream Classification via Probabilistic Adaptive Windows

Experiments: Methods

Abbr.    Classifier                      Parameters
NB       Naive Bayes
HT       Hoeffding Tree
HTLB     Leveraging Bagging with HT      n = 10
kNN      k Nearest Neighbour             w = 1000, k = 10
kNNW     kNN with PAW                    w = 1000, k = 10
kNNWA    kNN with PAW+ADWIN              w = 1000, k = 10
kNNLBW   Leveraging Bagging with kNNW    n = 10

The methods we consider. Leveraging Bagging methods use n models. kNNWA empties its window (of max w) when drift is detected (using the ADWIN drift detector).
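As a rough picture of the plain kNN baseline, the sketch below classifies by majority vote among the `k` nearest stored instances and keeps only the `w` most recent ones. It assumes that the baseline window is a simple sliding window and that distance is Euclidean; neither detail is stated on the slide, and the class name `SlidingWindowKNN` is hypothetical.

```python
import math
from collections import Counter, deque


class SlidingWindowKNN:
    """kNN classifier over the w most recent labelled instances."""

    def __init__(self, w=1000, k=10):
        self.window = deque(maxlen=w)  # oldest instance is dropped first
        self.k = k

    def learn(self, x, y):
        self.window.append((tuple(x), y))

    def predict(self, x):
        if not self.window:
            return None
        # Sort stored instances by Euclidean distance to x, keep the k nearest.
        nearest = sorted(
            self.window, key=lambda item: math.dist(item[0], x)
        )[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


model = SlidingWindowKNN(w=1000, k=3)
for point, label in [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]:
    model.learn(point, label)
print(model.predict((0.2, 0.4)), model.predict((5.5, 5.5)))
```

Swapping the `deque` for a probabilistically maintained window would give the kNNW variant, and clearing the window on a drift signal would give the behaviour described for kNNWA.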

Page 18: Efficient Data Stream Classification via Probabilistic Adaptive Windows

Experimental Evaluation

Table: The window size for kNN and corresponding performance.

Accuracy        -w 100   -w 500   -w 1000   -w 5000
Real Avg.        77.88    77.78     79.59     78.23
Synth. Avg.      57.99    81.93     84.74     86.03
Overall Avg.     62.53    80.28     82.59     83.11

Results

Page 19: Efficient Data Stream Classification via Probabilistic Adaptive Windows

Experimental Evaluation

Table: The window size for kNN and corresponding performance.

Time (seconds)  -w 100   -w 500   -w 1000   -w 5000
Real Tot.          297      998      1754      7900
Synth. Tot.        371     1297      2313     10671
Overall Tot.       668     2295      4067     18570

Results

Page 20: Efficient Data Stream Classification via Probabilistic Adaptive Windows

Experimental Evaluation

Table: The window size for kNN and corresponding performance.

RAM Hours       -w 100   -w 500   -w 1000   -w 5000
Real Tot.        0.007    0.082     0.269     5.884
Synth. Tot.      0.002    0.026     0.088     1.988
Overall Tot.     0.009    0.108     0.357     7.872

Results

Page 21: Efficient Data Stream Classification via Probabilistic Adaptive Windows

Experimental Evaluation

Table: Summary of Efficiency: Accuracy and RAM-Hours.

            NB      HT      HTLB     kNN     kNNW    kNNWA   kNNLBW
Accuracy    56.19   73.95   83.75    82.59   82.92   83.19   84.67
RAM-Hrs     0.02    1.57    300.02   0.36    8.08    8.80    250.98

Results

Page 22: Efficient Data Stream Classification via Probabilistic Adaptive Windows

Conclusions

Sampling algorithms for kNN

- Giving equal weight to old and new examples: RESERVOIR SAMPLING
- Giving more weight to recent examples: PROBABILISTIC APPROXIMATE WINDOW


Page 23: Efficient Data Stream Classification via Probabilistic Adaptive Windows

Thanks!