Data Mining using Fractals and Power laws

Page 1: Data Mining using Fractals and  Power  laws

School of Computer Science, Carnegie Mellon

Data Mining using Fractals and Power laws

Christos Faloutsos

Carnegie Mellon University

Page 2: Data Mining using Fractals and  Power  laws

ICAST, June'09 C. Faloutsos 2

Thank you!

• Prof. Hsing-Kuo Kenneth PAO

• Prof. Yuh-Jye LEE

• Hsin Yeh

Page 3: Data Mining using Fractals and  Power  laws

And also thanks to

Ching-Hao (Eric) MAO

Ming-Kung (Morgan) SUN

Yi-Ren (Ian) YEH

Lei LI

Leman AKOGLU

Ian ROLEWICZ

Page 4: Data Mining using Fractals and  Power  laws

Overview

• Goals/motivation: find patterns in large datasets:
  – (A) Sensor data
  – (B) network intrusion data

• Solutions: self-similarity and power laws

• Discussion

Page 5: Data Mining using Fractals and  Power  laws

Applications of sensors/streams

• network monitoring

[Figure: # alerts vs. time]

Page 6: Data Mining using Fractals and  Power  laws

Applications of sensors/streams

• Financial, sales, economic series

• Medical: ECGs and more; monitoring of blood pressure etc.

Page 7: Data Mining using Fractals and  Power  laws

Motivation - Applications

• Scientific data: seismological; astronomical; environment / anti-pollution; meteorological

[Figure: Sunspot Data; time series, vertical axis from 0 to 300]

Page 8: Data Mining using Fractals and  Power  laws

Motivation - Applications (cont’d)

• Computer systems

– web servers (buffering, prefetching)

– ...

http://repository.cs.vt.edu/lbl-conn-7.tar.Z

Page 9: Data Mining using Fractals and  Power  laws

Web traffic

• [Crovella Bestavros, SIGMETRICS’96]

Page 11: Data Mining using Fractals and  Power  laws

Self-* Storage (Ganger+)

• “self-*” = self-managing, self-tuning, self-healing, …

• Goal: 1 petabyte (PB)

• www.pdl.cmu.edu/SelfStar

[Figure: survivable, self-managing storage infrastructure; a storage brick (0.5–5 TB); ~1 PB]

Page 12: Data Mining using Fractals and  Power  laws

Problem definition

• Given: one or more sequences x_1, x_2, …, x_t, … (y_1, y_2, …, y_t, …)

• Find: patterns; clusters; outliers; forecasts

Page 16: Data Mining using Fractals and  Power  laws

Problem

• Find patterns, in large datasets

[Figure: # bytes vs. time]

• Poisson; independent, identically distributed (i.i.d.)

Q: Then, how to generate such bursty traffic?

Page 17: Data Mining using Fractals and  Power  laws

Solutions

• New tools: power laws, self-similarity and ‘fractals’ work, where traditional assumptions fail

• Let’s see the details:

Page 18: Data Mining using Fractals and  Power  laws

Overview

• Goals/motivation: find patterns in large datasets:
  – (A) Sensor data
  – (B) network data

• Solutions: self-similarity and power laws

• Discussion

Page 20: Data Mining using Fractals and  Power  laws

What is a fractal?

= self-similar point set, e.g., Sierpinski triangle:

• zero area: (3/4)^inf

• infinite length: (3/2)^inf

Q: What is its dimensionality?

A: log 3 / log 2 = 1.58 (!?!)
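The 1.58 can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the usual similarity-dimension formula log(copies) / log(shrink factor): the Sierpinski triangle is made of 3 copies of itself, each shrunk by a factor of 2. The function names are illustrative and not from the talk; the per-step perimeter factor of 3/2 is my own derivation (three times as many triangles, each with half the side).

```python
import math


def similarity_dimension(copies: int, shrink: float) -> float:
    """Dimension of a set built from `copies` pieces, each scaled down by `shrink`."""
    return math.log(copies) / math.log(shrink)


def sierpinski_area_and_perimeter(steps: int) -> tuple[float, float]:
    """Relative area and perimeter after `steps` subdivisions (both start at 1)."""
    return (3 / 4) ** steps, (3 / 2) ** steps


print(similarity_dimension(copies=3, shrink=2))  # Sierpinski triangle
print(similarity_dimension(copies=2, shrink=2))  # line segment
print(similarity_dimension(copies=4, shrink=2))  # filled square
print(sierpinski_area_and_perimeter(40))         # area shrinks, perimeter grows
```

The same formula returns the familiar integer dimensions for a line segment and a filled square, which is why a non-integer answer for the triangle is not a contradiction.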

Page 22: Data Mining using Fractals and  Power  laws

Intrinsic (‘fractal’) dimension

• Q: fractal dimension of a line?

• A: nn(<= r) ~ r^1 (‘power law’: y = x^a)

• Q: fd of a plane?

• A: nn(<= r) ~ r^2

fd == slope of (log(nn) vs. log(r))
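A rough numerical check of the two answers, as a sketch under stated assumptions: nn(<= r) is read here as the number of neighbors within distance r, and it is approximated by the total number of point pairs within r (proportional to the average neighbor count). The point sets, radii, and function names are all hypothetical choices for illustration.

```python
import bisect
import math
import random


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


def intrinsic_dimension(points, radii):
    """Slope of log(#pairs within r) vs. log(r) for 2-d points."""
    dists = sorted(
        math.hypot(p[0] - q[0], p[1] - q[1])
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )
    counts = [bisect.bisect_right(dists, r) for r in radii]
    return loglog_slope(radii, counts)


rng = random.Random(0)
# Points on a slanted line segment embedded in the plane.
line = [(t, 0.5 * t) for t in (rng.random() for _ in range(600))]
# Points spread uniformly over the unit square.
plane = [(rng.random(), rng.random()) for _ in range(600)]
radii = [0.02 * 1.5 ** k for k in range(5)]

fd_line = intrinsic_dimension(line, radii)
fd_plane = intrinsic_dimension(plane, radii)
print(fd_line, fd_plane)  # expected to land near 1 and near 2
```

Both sets live in 2-d space, but only the second one fills it; the slope picks up that difference. Boundary effects pull both estimates slightly below the ideal values.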

Page 23: Data Mining using Fractals and  Power  laws

Sierpinski triangle

[Figure: log(#pairs within <= r) vs. log(r); slope 1.58]

== ‘correlation integral’

= CDF of pairwise distances
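The plot can be reproduced approximately on synthetic data. A minimal sketch, assuming points are sampled with the "chaos game" (repeatedly jumping halfway toward a random vertex), which is a swapped-in sampling method and not something the slides describe. The sample size, radii, and names are illustrative.

```python
import bisect
import math
import random


def sierpinski_points(n, seed=0):
    """Sample n points on the Sierpinski triangle with the chaos game."""
    rng = random.Random(seed)
    vertices = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
    x, y = 0.3, 0.3
    points = []
    for step in range(n + 20):          # the first 20 steps are a burn-in
        vx, vy = rng.choice(vertices)
        x, y = (x + vx) / 2, (y + vy) / 2
        if step >= 20:
            points.append((x, y))
    return points


def correlation_integral(points, radii):
    """Number of point pairs within each radius (unnormalized CDF of distances)."""
    dists = sorted(
        math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]
    )
    return [bisect.bisect_right(dists, r) for r in radii]


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return num / sum((a - mx) ** 2 for a in lx)


radii = [1 / 128, 1 / 64, 1 / 32, 1 / 16]
pair_counts = correlation_integral(sierpinski_points(1500), radii)
fd_sierpinski = loglog_slope(radii, pair_counts)
print(fd_sierpinski)  # should come out close to log 3 / log 2
```

The radii are powers of 1/2 on purpose: the triangle repeats itself at every halving, so sampling the curve at those scales avoids most of the wiggle a self-similar set adds to the log-log plot.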

Page 24: Data Mining using Fractals and  Power  laws

Observations: Fractals <-> power laws

Closely related:

• fractals <=>

• self-similarity <=>

• scale-free <=>

• power laws (y = x^a; F = K r^-2)

• (vs. y = e^(-ax) or y = x^a + b)

[Figure: log(#pairs within <= r) vs. log(r); slope 1.58]

Page 25: Data Mining using Fractals and  Power  laws

Outline

• Problems

• Self-similarity and power laws

• Solutions to posed problems

• Discussion

Page 26: Data Mining using Fractals and  Power  laws

Solution #1: traffic

• disk traces: self-similar (also: [Leland+94])

• How to generate such traffic?

[Figure: #bytes vs. time]

Page 27: Data Mining using Fractals and  Power  laws

Solution #1: traffic

• disk traces (80-20 ‘law’) – ‘multifractals’

[Figure: #bytes vs. time, with the two parts labeled 20% and 80%]

Page 29: Data Mining using Fractals and  Power  laws

80-20 / multifractals

• p ; (1-p) in general

• yes, there are dependencies

[Figure: recursive split, with the parts labeled 20 and 80]
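One way to read the 80-20 recipe is as a recursive split: give a fraction p of the volume to one half of the time interval, 1-p to the other, and repeat inside each half. The sketch below is an illustration under that assumption (a binomial multiplicative cascade); the function name, the random choice of which half gets the larger share, and the parameter values are mine, not the talk's.

```python
import random


def bursty_traffic(total, levels, p=0.8, rng=None):
    """Split `total` bytes over 2**levels time ticks with a recursive p / (1-p) rule."""
    rng = rng or random.Random()
    volumes = [float(total)]
    for _ in range(levels):
        split = []
        for volume in volumes:
            big, small = p * volume, (1 - p) * volume
            # Randomly decide which half of the interval receives the larger share.
            if rng.random() < 0.5:
                split.extend((big, small))
            else:
                split.extend((small, big))
        volumes = split
    return volumes


traffic = bursty_traffic(1_000_000, levels=10, p=0.8, rng=random.Random(42))
busiest_first = sorted(traffic, reverse=True)
share_top_fifth = sum(busiest_first[: len(traffic) // 5]) / sum(traffic)
print(len(traffic), max(traffic), share_top_fifth)
```

With p = 0.5 the output is flat (uniform traffic); pushing p toward 1 concentrates the volume into fewer and fewer ticks, which is the burstiness the slides are after.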

Page 30: Data Mining using Fractals and  Power  laws

More on 80/20: PQRS

• Part of ‘self-* storage’ project

[Figure: cylinder# vs. time]

Page 31: Data Mining using Fractals and  Power  laws

More on 80/20: PQRS

• Part of ‘self-* storage’ project

[Figure: square split into quadrants labeled p, q, r, s, with the split repeated at a finer level]

Page 32: Data Mining using Fractals and  Power  laws

Overview

• Goals/motivation: find patterns in large datasets:
  – (A) Sensor data
  – (B) network data

• Solutions: self-similarity and power laws
  – sensor/traffic data
  – network data

• Discussion

Page 33: Data Mining using Fractals and  Power  laws

Problem definition

<source-ip, target-ip, timestamp, alert-type>

e.g., <192.168.2.5; 128.2.220.159; 3am June 6; ICMP-redirect-host>

goal: find patterns / anomalies

Page 34: Data Mining using Fractals and  Power  laws

Power laws in intrusion data

[Figure: count vs. rank]

Page 36: Data Mining using Fractals and  Power  laws

[Figure: example sequences labeled human-like, robot-like, and robot-like (bursty)]

Q: Can we visually summarize / classify our sequences?

Page 37: Data Mining using Fractals and  Power  laws

Answer: yes!

two features:

• F1: how periodic (24h-cycle) is a sequence

• F2: how bursty it is

Q: how to measure burstiness?

A: Fractal dimension!

Page 39: Data Mining using Fractals and  Power  laws

Burstiness & f.d.

uniform: fd = 1

@same time-tick: fd = 0

bursts within bursts within bursts: 0 < fd < 1
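The three cases can be checked on synthetic timestamps. This sketch swaps in a box-counting estimate (count the occupied time intervals at several resolutions) because it gives clean numbers on integer ticks; the slides do not say which estimator was used. The Cantor-style "bursts within bursts" generator, the tick range, and all names are hypothetical.

```python
import math
import random

LEVELS = 8
SPAN = 3 ** LEVELS  # timestamps are integer ticks in [0, SPAN)


def box_counting_dimension(ticks, max_level=5):
    """Slope of log(#occupied boxes) vs. log(1 / box size), boxes of SPAN / 3**level."""
    xs, ys = [], []
    for level in range(1, max_level + 1):
        box = SPAN // 3 ** level
        occupied = len({t // box for t in ticks})
        xs.append(level * math.log(3))
        ys.append(math.log(occupied))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return num / sum((a - mx) ** 2 for a in xs)


def cantor_tick(rng):
    """A tick whose base-3 digits are all 0 or 2: activity only in nested outer thirds."""
    return sum(rng.choice((0, 2)) * 3 ** k for k in range(LEVELS))


rng = random.Random(7)
uniform = [rng.randrange(SPAN) for _ in range(5000)]
same_tick = [1234] * 5000
bursty = [cantor_tick(rng) for _ in range(5000)]

fd_uniform = box_counting_dimension(uniform)
fd_same_tick = box_counting_dimension(same_tick)
fd_bursty = box_counting_dimension(bursty)
print(fd_uniform, fd_same_tick, fd_bursty)
```

A single number between 0 and 1 therefore summarizes how bursty a sequence of timestamps is, which is what makes it usable as a plotting feature.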

Page 40: Data Mining using Fractals and  Power  laws

6. Proposed Methods: The FDP Plot

• Notice: clustering with respect to alert types!

Page 42: Data Mining using Fractals and  Power  laws

Examples: human-like behavior

Page 46: Data Mining using Fractals and  Power  laws

Outline

• problems

• Fractals

• Solutions

• Discussion
  – what else can they solve?
  – how frequent are fractals?

Page 47: Data Mining using Fractals and  Power  laws

What else can they solve?

• separability [KDD’02]
• forecasting [CIKM’02]
• dimensionality reduction [SBBD’00]
• non-linear axis scaling [KDD’02]
• disk trace modeling [PEVA’02]
• selectivity of spatial/multimedia queries [PODS’94, VLDB’95, ICDE’00]
• ...

Page 48: Data Mining using Fractals and  Power  laws

Problem #3 - spatial d.m.

Galaxies (Sloan Digital Sky Survey w/ B. Nichol): ‘spiral’ and ‘elliptical’ galaxies

- patterns? (not Gaussian; not uniform)

- attraction/repulsion?

- separability??

Page 49: Data Mining using Fractals and  Power  laws

Solution#3: spatial d.m.

CORRELATION INTEGRAL!

[Figure: log(#pairs within <= r) vs. log(r), with curves spi-spi, spi-ell, ell-ell]

- 1.8 slope

- plateau!

- repulsion!

[w/ Seeger, Traina, Traina, SIGMOD00]
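For two different point sets (say spirals and ellipticals), the quantity on the vertical axis is the number of cross pairs, one point from each set, within distance r. A minimal sketch of that count follows; the uniform random points are stand-ins and not galaxy data, and the function name is illustrative.

```python
import bisect
import math
import random


def cross_pair_counts(set_a, set_b, radii):
    """For each radius r, count pairs (a, b) with a in set_a, b in set_b, dist <= r."""
    dists = sorted(math.dist(a, b) for a in set_a for b in set_b)
    return [bisect.bisect_right(dists, r) for r in radii]


rng = random.Random(1)
spirals = [(rng.random(), rng.random()) for _ in range(300)]      # placeholder points
ellipticals = [(rng.random(), rng.random()) for _ in range(200)]  # placeholder points
radii = [0.01 * 2 ** k for k in range(9)]                         # 0.01 up to 2.56

counts = cross_pair_counts(spirals, ellipticals, radii)
for r, c in zip(radii, counts):
    print(f"r = {r:<5} #pairs = {c}")
```

Plotting log(count) against log(r) for spi-spi, spi-ell, and ell-ell pairs gives the three curves; a curve that stays flat or empty at small r is how a lack of close cross pairs would show up.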

Page 51: Data Mining using Fractals and  Power  laws

Solution#3: spatial d.m.

[Figure: annotated with radii r1 and r2]

Heuristic on choosing # of clusters

Page 53: Data Mining using Fractals and  Power  laws

Outline

• problems

• Fractals

• Solutions

• Discussion
  – what else can they solve?
  – how frequent are fractals?

Page 54: Data Mining using Fractals and  Power  laws

Fractals & power laws:

appear in numerous settings:

• medical

• geographical / geological

• social

• computer-system related

• and many, many more! See [Mandelbrot]

Page 55: Data Mining using Fractals and  Power  laws

Fractals: Brain scans

• brain-scans

[Figure: log(#octants) vs. octree levels; slope 2.63 = fd]

Page 56: Data Mining using Fractals and  Power  laws

More fractals

• periphery of malignant tumors: ~1.5

• benign: ~1.3

• [Burdet+]

Page 57: Data Mining using Fractals and  Power  laws

More fractals:

• cardiovascular system: 3 (!)

• lungs: ~2.9

Page 58: Data Mining using Fractals and  Power  laws

Fractals & power laws:

appear in numerous settings:

• medical

• geographical / geological

• social

• computer-system related

Page 59: Data Mining using Fractals and  Power  laws

More fractals:

• Coastlines: 1.2-1.58

[Figure: example curves labeled 1, 1.1 and 1.3]

Page 61: Data Mining using Fractals and  Power  laws

More fractals:

• the fractal dimension for the Amazon river is 1.85 (Nile: 1.4)

[ems.gphys.unc.edu/nonlinear/fractals/examples.html]

Page 63: Data Mining using Fractals and  Power  laws

Cross-roads of Montgomery county:

• any rules?

[Figure: GIS points]

Page 64: Data Mining using Fractals and  Power  laws

GIS

A: self-similarity:

• intrinsic dim. = 1.51

[Figure: log(#pairs within <= r) vs. log(r); slope 1.51]

Page 65: Data Mining using Fractals and  Power  laws

Examples: LB county

• Long Beach county of CA (road end-points)

[Figure: log(#pairs) vs. log(r); slope 1.7]

Page 66: Data Mining using Fractals and  Power  laws

More power laws: areas – Korcak’s law

Scandinavian lakes

Any pattern?

Page 67: Data Mining using Fractals and  Power  laws

More power laws: areas – Korcak’s law

Scandinavian lakes: area vs. complementary cumulative count (log-log axes)

[Figure: log(count(>= area)) vs. log(area)]
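The complementary cumulative count is simple to compute: for each area threshold, count how many lakes are at least that large. The sketch below uses synthetic, Pareto-distributed "areas" as a stand-in for the lake data; the exponent 0.75, the sample size, and the function names are hypothetical.

```python
import bisect
import math
import random


def ccdf_counts(values, thresholds):
    """count(>= t) for each threshold t."""
    ordered = sorted(values)
    return [len(ordered) - bisect.bisect_left(ordered, t) for t in thresholds]


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return num / sum((a - mx) ** 2 for a in lx)


rng = random.Random(3)
areas = [rng.paretovariate(0.75) for _ in range(20000)]  # synthetic, not real lakes
thresholds = [2 ** k for k in range(7)]                  # 1, 2, 4, ..., 64

counts = ccdf_counts(areas, thresholds)
korcak_slope = loglog_slope(thresholds, counts)
print(counts, korcak_slope)  # slope should be near -0.75 for this synthetic input
```

A straight line on these log-log axes is the signature of a power law; its slope is the (negated) exponent.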

Page 68: Data Mining using Fractals and  Power  laws

More power laws: Korcak

Japan islands: area vs. cumulative count (log-log axes)

[Figure: log(count(>= area)) vs. log(area)]

Page 69: Data Mining using Fractals and  Power  laws

More power laws

• Energy of earthquakes (Gutenberg-Richter law) [simscience.org]

[Figure: energy released vs. day]

[Figure: log(count) vs. magnitude = log(energy)]

Page 70: Data Mining using Fractals and  Power  laws

Fractals & power laws:

appear in numerous settings:

• medical

• geographical / geological

• social

• computer-system related

Page 71: Data Mining using Fractals and  Power  laws

A famous power law: Zipf’s law

• Bible - rank vs. frequency (log-log)

[Figure: “Rank/frequency plot”: log(freq) vs. log(rank), with the words “the” and “a” marked]
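A rank/frequency table is a word count sorted in decreasing order. The sketch below builds one and fits the log-log slope; the tiny sample sentence and the idealized 1/rank frequencies are illustrative inputs, not the Bible data from the slide.

```python
import math
from collections import Counter


def rank_frequency(words):
    """Return (rank, word, frequency) triples, most frequent word first."""
    ordered = Counter(words).most_common()
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ordered, start=1)]


def zipf_slope(table):
    """Least-squares slope of log(frequency) against log(rank)."""
    lx = [math.log(rank) for rank, _, _ in table]
    ly = [math.log(freq) for _, _, freq in table]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return num / sum((a - mx) ** 2 for a in lx)


sample = "the cat and the dog and the bird".split()
table = rank_frequency(sample)
print(table[:2])

# Idealized Zipf data: frequency proportional to 1 / rank.
ideal = [(rank, f"w{rank}", 1000 / rank) for rank in range(1, 51)]
print(zipf_slope(ideal))
```

On real text the slope is only approximately constant; the point of the plot is that the points fall close to a line at all.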

Page 72: Data Mining using Fractals and  Power  laws

TELCO data

[Figure: count of customers vs. # of service units, with the ‘best customer’ marked]

Page 73: Data Mining using Fractals and  Power  laws

SALES data – store#96

[Figure: count of products vs. # units sold, with “aspirin” marked]

Page 74: Data Mining using Fractals and  Power  laws

Olympic medals (Sydney’00, Athens’04):

[Figure: log(#medals) vs. log(rank), one series each for Athens and Sydney]

Page 76: Data Mining using Fractals and  Power  laws

Even more power laws:

• Income distribution (Pareto’s law)

• size of firms

• publication counts (Lotka’s law)

Page 77: Data Mining using Fractals and  Power  laws

Even more power laws:

library science (Lotka’s law of publication count); and citation counts: (citeseer.nj.nec.com 6/2001)

[Figure: log(count) vs. log(#citations), with “Ullman” marked]

Page 78: Data Mining using Fractals and  Power  laws

Even more power laws:

• web hit counts [w/ A. Montgomery]

[Figure: Web Site Traffic; log(count) vs. log(freq); Zipf; “yahoo.com” marked]

Page 79: Data Mining using Fractals and  Power  laws

Autonomous systems

• Power law in the degree distribution [SIGCOMM99]

[Figure: internet domains; log(degree) vs. log(rank); slope -0.82; att.com and ibm.com marked]

Page 80: Data Mining using Fractals and  Power  laws

The Peer-to-Peer Topology

• Number of immediate peers (= degree) follows a power law [Jovanovic+]

[Figure: count vs. degree]

Page 81: Data Mining using Fractals and  Power  laws

epinions.com

• who-trusts-whom [Richardson + Domingos, KDD 2001]

[Figure: count vs. (out) degree]

Page 82: Data Mining using Fractals and  Power  laws

Fractals & power laws:

appear in numerous settings:

• medical

• geographical / geological

• social

• computer-system related

Page 83: Data Mining using Fractals and  Power  laws

Power laws, cont’d

• In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER]

[Figure: log(freq) vs. log indegree; from [Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins]]

Page 85: Data Mining using Fractals and  Power  laws

Power laws, cont’d

• In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER]

[Figure: log(freq) vs. log indegree]

Q: ‘how can we use these power laws?’

Page 87: Data Mining using Fractals and  Power  laws

“Foiled by power law”

• [Broder+, WWW’00]

“The anomalous bump at 120 on the x-axis is due to a large clique formed by a single spammer”

[Figure: (log) count vs. (log) in-degree]
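A bump like this can be flagged automatically: fit a straight line in log-log space and look for the degree whose count sits farthest above it. The sketch below does that on synthetic data; the exponent -2.1, the size of the injected bump, and the function names are assumptions for illustration, not values taken from [Broder+, WWW’00].

```python
import math


def fit_power_law(degrees, counts):
    """Least-squares line log(count) = intercept + slope * log(degree)."""
    lx = [math.log(d) for d in degrees]
    ly = [math.log(c) for c in counts]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    slope = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sum(
        (a - mx) ** 2 for a in lx
    )
    return slope, my - slope * mx


def largest_deviation(degrees, counts):
    """Degree whose count lies farthest above the fitted power law."""
    slope, intercept = fit_power_law(degrees, counts)
    residuals = {
        d: math.log(c) - (intercept + slope * math.log(d))
        for d, c in zip(degrees, counts)
    }
    return max(residuals, key=residuals.get)


degrees = list(range(1, 201))
counts = [1e6 * d ** -2.1 for d in degrees]  # synthetic power-law counts
counts[119] *= 30                            # inject a bump at in-degree 120

fitted_slope, _ = fit_power_law(degrees, counts)
suspect = largest_deviation(degrees, counts)
print(fitted_slope, suspect)
```

This is one answer to "how can we use these power laws?": once the expected shape is known, whatever violates it is worth a closer look.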

Page 88: Data Mining using Fractals and  Power  laws

Power laws, cont’d

• In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER]

• length of file transfers [Crovella+Bestavros ‘96]

• duration of UNIX jobs

Page 89: Data Mining using Fractals and  Power  laws

Conclusions

• Fascinating problems in Data Mining: find patterns in streams & network data

Page 90: Data Mining using Fractals and  Power  laws

Conclusions - cont’d

New tools for Data Mining: self-similarity & power laws: appear in many cases

Bad news:

• lead to skewed distributions (no Gaussian, Poisson, uniformity, independence, mean, variance)

Good news:

• ‘correlation integral’ for separability
• rank/frequency plots
• 80-20 (multifractals)
• (Hurst exponent,
• strange attractors,
• renormalization theory,
• ++)

Page 91: Data Mining using Fractals and  Power  laws

Resources

• Manfred Schroeder “Chaos, Fractals and Power Laws”, 1991

Page 92: Data Mining using Fractals and  Power  laws

References

• [vldb95] Alberto Belussi and Christos Faloutsos, Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension, Proc. of VLDB, p. 299-310, 1995

• [Broder+’00] Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, Janet Wiener, Graph structure in the web, WWW’00

• M. Crovella and A. Bestavros, Self similarity in World wide web traffic: Evidence and possible causes, SIGMETRICS ’96.

Page 93: Data Mining using Fractals and  Power  laws

References

• J. Considine, F. Li, G. Kollios and J. Byers, Approximate Aggregation Techniques for Sensor Databases (ICDE’04, best paper award).

• [pods94] Christos Faloutsos and Ibrahim Kamel, Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension, PODS, Minneapolis, MN, May 24-26, 1994, pp. 4-13

Page 94: Data Mining using Fractals and  Power  laws

References

• [vldb96] Christos Faloutsos, Yossi Matias and Avi Silberschatz, Modeling Skewed Distributions Using Multifractals and the `80-20 Law’, Conf. on Very Large Data Bases (VLDB), Bombay, India, Sept. 1996.

• [sigmod2000] Christos Faloutsos, Bernhard Seeger, Agma J. M. Traina and Caetano Traina Jr., Spatial Join Selectivity Using Power Laws, SIGMOD 2000

Page 95: Data Mining using Fractals and  Power  laws

References

• [vldb96] Christos Faloutsos and Volker Gaede, Analysis of the Z-Ordering Method Using the Hausdorff Fractal Dimension, VLDB, Bombay, India, Sept. 1996

• [sigcomm99] Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos, What does the Internet look like? Empirical Laws of the Internet Topology, SIGCOMM 1999

Page 96: Data Mining using Fractals and  Power  laws

References

• [Leskovec 05] Jure Leskovec, Jon M. Kleinberg, Christos Faloutsos: Graphs over time: densification laws, shrinking diameters and possible explanations. KDD 2005: 177-187

Page 97: Data Mining using Fractals and  Power  laws

References

• [ieeeTN94] W. E. Leland, M.S. Taqqu, W. Willinger, D.V. Wilson, On the Self-Similar Nature of Ethernet Traffic, IEEE Transactions on Networking, 2, 1, pp 1-15, Feb. 1994.

• [brite] Alberto Medina, Anukool Lakhina, Ibrahim Matta, and John Byers. BRITE: An Approach to Universal Topology Generation. MASCOTS '01

Page 98: Data Mining using Fractals and  Power  laws

References

• [icde99] Guido Proietti and Christos Faloutsos, I/O complexity for range queries on region data stored using an R-tree (ICDE’99)

• Stan Sclaroff, Leonid Taycher and Marco La Cascia, "ImageRover: A content-based image browser for the world wide web", Proc. IEEE Workshop on Content-based Access of Image and Video Libraries, pp 2-9, 1997.

Page 99: Data Mining using Fractals and  Power  laws

References

• [kdd2001] Agma J. M. Traina, Caetano Traina Jr., Spiros Papadimitriou and Christos Faloutsos: Tri-plots: Scalable Tools for Multidimensional Data Mining, KDD 2001, San Francisco, CA.

Page 100: Data Mining using Fractals and  Power  laws

Thank you!

Contact info: christos <at> cs.cmu.edu

www.cs.cmu.edu/~christos

(w/ papers, datasets, code for fractal dimension estimation, etc)