school of computer science carnegie mellon thank you! data mining using fractals … ·...

30
MSU, 2005 C. Faloutsos 1 School of Computer Science Carnegie Mellon Data Mining using Fractals and Power laws Christos Faloutsos Carnegie Mellon University MSU, 2005 C. Faloutsos 2 School of Computer Science Carnegie Mellon THANK YOU! Prof. Rong Jin Prof. Pang-Ning Tan MSU, 2005 C. Faloutsos 3 School of Computer Science Carnegie Mellon Overview Goals/ motivation: find patterns in large datasets: – (A) Sensor data – (B) network/graph data Solutions: self-similarity and power laws • Discussion MSU, 2005 C. Faloutsos 4 School of Computer Science Carnegie Mellon Applications of sensors/streams ‘Smart house’: monitoring temperature, humidity etc Financial, sales, economic series 10/21/2005 23:09

Upload: others

Post on 06-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 1

School of Computer Science

Carnegie Mellon

Data Mining using Fractals and

Power laws

Christos Faloutsos

Carnegie Mellon University

MSU, 2005 C. Faloutsos 2

School of Computer Science

Carnegie Mellon

THANK YOU!

• Prof. Rong Jin

• Prof. Pang-Ning Tan

MSU, 2005 C. Faloutsos 3

School of Computer Science

Carnegie Mellon

Overview

• Goals/ motivation: find patterns in large

datasets:

– (A) Sensor data

– (B) network/graph data

• Solutions: self-similarity and power laws

• Discussion

MSU, 2005 C. Faloutsos 4

School of Computer Science

Carnegie Mellon

Applications of sensors/streams

• ‘Smart house’: monitoring temperature,

humidity etc

• Financial, sales, economic series

10/21/2005 23:09

Page 2: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 5

School of Computer Science

Carnegie Mellon

Motivation - Applications

• Medical: ECGs +; blood

pressure etc monitoring

• Scientific data: seismological;

astronomical; environment /

anti-pollution; meteorologicalSunspot Data

0

50

100

150

200

250

300

MSU, 2005 C. Faloutsos 6

School of Computer Science

Carnegie Mellon

Motivation - Applications

(cont’d)

• civil/automobile infrastructure

– bridge vibrations [Oppenheim+02]

– road conditions / traffic monitoring

Automobile traffic

0200400600800

100012001400160018002000

time

# cars

MSU, 2005 C. Faloutsos 7

School of Computer Science

Carnegie Mellon

Motivation - Applications

(cont’d)

• Computer systems

– web servers (buffering, prefetching)

– network traffic monitoring

– ...

http://repository.cs.vt.edu/lbl-conn-7.tar.Z

MSU, 2005 C. Faloutsos 8

School of Computer Science

Carnegie Mellon

Web traffic

• [Crovella Bestavros, SIGMETRICS’96]

10/21/2005 23:09

Page 3: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 9

School of Computer Science

Carnegie Mellon

...

survivable,

self-managing storage

infrastructure

...

a storage brick

(0.5–5 TB)~1 PB

� “self-*” = self-managing, self-tuning, self-healing, …

� Goal: 1 petabyte (PB) for CMU researchers

� www.pdl.cmu.edu/SelfStar

Self-* Storage (Ganger+)

MSU, 2005 C. Faloutsos 10

School of Computer Science

Carnegie Mellon

Problem definition

• Given: one or more sequences

x1 , x2 , … , xt , …; (y1, y2, … , yt, …)

• Find

– patterns; clusters; outliers; forecasts;

MSU, 2005 C. Faloutsos 11

School of Computer Science

Carnegie Mellon

Problem #1

• Find patterns, in large

datasets

time

# bytes

MSU, 2005 C. Faloutsos 12

School of Computer Science

Carnegie Mellon

Problem #1

• Find patterns, in large

datasets

time

# bytes

Poisson

indep.,

ident. distr

10/21/2005 23:09

Page 4: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 13

School of Computer Science

Carnegie Mellon

Problem #1

• Find patterns, in large

datasets

time

# bytes

Poisson

indep.,

ident. distr

MSU, 2005 C. Faloutsos 14

School of Computer Science

Carnegie Mellon

Problem #1

• Find patterns, in large

datasets

time

# bytes

Poisson

indep.,

ident. distr

Q: Then, how to generate

such bursty traffic?

MSU, 2005 C. Faloutsos 15

School of Computer Science

Carnegie Mellon

Overview

• Goals/ motivation: find patterns in large datasets:

– (A) Sensor data

– (B) network/graph data

• Solutions: self-similarity and power laws

• Discussion

MSU, 2005 C. Faloutsos 16

School of Computer Science

Carnegie Mellon

Problem #2 - network and

graph mining

• How does the Internet look like?

• How does the web look like?

• What constitutes a ‘normal’ social

network?

• What is the ‘network value’ of a

customer?

• which gene/species affects the others

the most?

10/21/2005 23:09

Page 5: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 17

School of Computer Science

Carnegie Mellon

Network and graph mining

Food Web

[Martinez ’91]

Protein Interactions

[genomebiology.com]Friendship Network

[Moody ’01]

Graphs are everywhere!

MSU, 2005 C. Faloutsos 18

School of Computer Science

Carnegie Mellon

Problem#2

Given a graph:

• which node to market-to /

defend / immunize first?

• Are there un-natural sub-

graphs? (eg., criminals’ rings)?

[from Lumeta: ISPs 6/1999]

MSU, 2005 C. Faloutsos 19

School of Computer Science

Carnegie Mellon

Solutions

• New tools: power laws, self-similarity and

‘fractals’ work, where traditional

assumptions fail

• Let’s see the details:

MSU, 2005 C. Faloutsos 20

School of Computer Science

Carnegie Mellon

Overview

• Goals/ motivation: find patterns in large

datasets:

– (A) Sensor data

– (B) network/graph data

• Solutions: self-similarity and power laws

• Discussion

10/21/2005 23:09

Page 6: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 21

School of Computer Science

Carnegie Mellon

What is a fractal?

= self-similar point set, e.g., Sierpinski triangle:

...zero area: (3/4)^inf

infinite length!

(4/3)^inf

Q: What is its dimensionality??

MSU, 2005 C. Faloutsos 22

School of Computer Science

Carnegie Mellon

What is a fractal?

= self-similar point set, e.g., Sierpinski triangle:

...zero area: (3/4)^inf

infinite length!

(4/3)^inf

Q: What is its dimensionality??

A: log3 / log2 = 1.58 (!?!)

MSU, 2005 C. Faloutsos 23

School of Computer Science

Carnegie Mellon

Intrinsic (‘fractal’) dimension

• Q: fractal dimension

of a line?

• Q: fd of a plane?

MSU, 2005 C. Faloutsos 24

School of Computer Science

Carnegie Mellon

Intrinsic (‘fractal’) dimension

• Q: fractal dimension of a line?

• A: nn ( <= r ) ~ r^1

(‘power law’: y=x^a)

• Q: fd of a plane?

• A: nn ( <= r ) ~ r^2

fd== slope of (log(nn) vs.. log(r) )

10/21/2005 23:09

Page 7: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 25

School of Computer Science

Carnegie Mellon

Sierpinsky triangle

log( r )

log(#pairs

within <=r )

1.58

== ‘correlation integral’

= CDF of pairwise distances

MSU, 2005 C. Faloutsos 26

School of Computer Science

Carnegie Mellon

Observations: Fractals <->

power laws

Closely related:

• fractals <=>

• self-similarity <=>

• scale-free <=>

• power laws ( y= xa ;

F=K r-2)

• (vs y=e-ax or y=xa+b)log( r )

log(#pairs

within <=r )

1.58

MSU, 2005 C. Faloutsos 27

School of Computer Science

Carnegie Mellon

Outline

• Problems

• Self-similarity and power laws

• Solutions to posed problems

• Discussion

MSU, 2005 C. Faloutsos 28

School of Computer Science

Carnegie Mellon

time

#bytes

Solution #1: traffic

• disk traces: self-similar: (also: [Leland+94])

• How to generate such traffic?

10/21/2005 23:09

Page 8: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 29

School of Computer Science

Carnegie Mellon

Solution #1: traffic

• disk traces (80-20 ‘law’) – ‘multifractals’

time

#bytes

20% 80%

MSU, 2005 C. Faloutsos 30

School of Computer Science

Carnegie Mellon

80-20 / multifractals20 80

MSU, 2005 C. Faloutsos 31

School of Computer Science

Carnegie Mellon

80-20 / multifractals

20• p ; (1-p) in general

• yes, there are

dependencies

80

MSU, 2005 C. Faloutsos 32

School of Computer Science

Carnegie Mellon

More on 80/20: PQRS

• Part of ‘self-* storage’ project

time

cylinder#

10/21/2005 23:09

Page 9: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 33

School of Computer Science

Carnegie Mellon

More on 80/20: PQRS

• Part of ‘self-* storage’ project

p q

r s

q

r s

MSU, 2005 C. Faloutsos 34

School of Computer Science

Carnegie Mellon

Overview

• Goals/ motivation: find patterns in large datasets:

– (A) Sensor data

– (B) network/graph data

• Solutions: self-similarity and power laws

– sensor/traffic data

– network/graph data

• Discussion

MSU, 2005 C. Faloutsos 35

School of Computer Science

Carnegie Mellon

Problem #2 - topology

How does the Internet look like? Any rules?

MSU, 2005 C. Faloutsos 36

School of Computer Science

Carnegie Mellon

Patterns?

• avg degree is, say 3.3

• pick a node at random

– guess its degree,

exactly (-> “mode”)

degree

count

avg: 3.3

10/21/2005 23:09

Page 10: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 37

School of Computer Science

Carnegie Mellon

Patterns?

• avg degree is, say 3.3

• pick a node at random

– guess its degree,

exactly (-> “mode”)

• A: 1!!

degree

count

avg: 3.3MSU, 2005 C. Faloutsos 38

School of Computer Science

Carnegie Mellon

Patterns?

• avg degree is, say 3.3

• pick a node at random - what is the degree you expect it to have?

• A: 1!!

• A’: very skewed distr.

• Corollary: the mean is meaningless!

• (and std -> infinity (!))

degree

count

avg: 3.3

MSU, 2005 C. Faloutsos 39

School of Computer Science

Carnegie Mellon

Solution#2: Rank exponent R• A1: Power law in the degree distribution

[SIGCOMM99]

internet domains

log(rank)

log(degree)

-0.82

att.com

ibm.com

MSU, 2005 C. Faloutsos 40

School of Computer Science

Carnegie Mellon

Solution#2’: Eigen Exponent E

• A2: power law in the eigenvalues of the adjacency matrix

E = -0.48

Exponent = slope

Eigenvalue

Rank of decreasing eigenvalue

May 2001

10/21/2005 23:09

Page 11: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 41

School of Computer Science

Carnegie Mellon

Power laws - discussion

• do they hold, over time?

• do they hold on other graphs/domains?

MSU, 2005 C. Faloutsos 42

School of Computer Science

Carnegie Mellon

Power laws - discussion

• do they hold, over time?

• Yes! for multiple years [Siganos+]

• do they hold on other graphs/domains?

• Yes!

– web sites and links [Tomkins+], [Barabasi+]

– peer-to-peer graphs (gnutella-style)

– who-trusts-whom (epinions.com)

MSU, 2005 C. Faloutsos 43

School of Computer Science

Carnegie Mellon

Time Evolution: rank R

-1

-0.9

-0.8

-0.7

-0.6

-0.5

0 200 400 600 800

Instances in time: Nov'97 and on

Ra

nk

ex

po

ne

nt

• The rank exponent has not changed!

[Siganos+]

Domain

level

log(rank)

log(degree)

-0.82

att.com

ibm.com

MSU, 2005 C. Faloutsos 44

School of Computer Science

Carnegie Mellon

The Peer-to-Peer Topology

• Number of immediate peers (= degree), follows a

power-law

[Jovanovic+]

degree

count

10/21/2005 23:09

Page 12: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 45

School of Computer Science

Carnegie Mellon

epinions.com

• who-trusts-whom

[Richardson +

Domingos, KDD

2001]

(out) degree

count

MSU, 2005 C. Faloutsos 46

School of Computer Science

Carnegie Mellon

Why care about these patterns?

• better graph generators [BRITE, INET]

– for simulations

– extrapolations

• ‘abnormal’ graph and subgraph detection

MSU, 2005 C. Faloutsos 47

School of Computer Science

Carnegie Mellon

Recent discoveries [KDD’05]

• How do graphs evolve?

• degree-exponent seems constant - anything

else?

MSU, 2005 C. Faloutsos 48

School of Computer Science

Carnegie Mellon

Evolution of diameter?

• Prior analysis, on power-law-like graphs,

hints that

diameter ~ O(log(N)) or

diameter ~ O( log(log(N)))

• i.e.., slowly increasing with network size

• Q: What is happening, in reality?

10/21/2005 23:09

Page 13: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 49

School of Computer Science

Carnegie Mellon

Evolution of diameter?

• Prior analysis, on power-law-like graphs,

hints that

diameter ~ O(log(N)) or

diameter ~ O( log(log(N)))

• i.e.., slowly increasing with network size

• Q: What is happening, in reality?

• A: It shrinks(!!), towards a constant value

MSU, 2005 C. Faloutsos 50

School of Computer Science

Carnegie Mellon

Shrinking diameter

ArXiv physics papers

and their citations

[Leskovec+05a]

MSU, 2005 C. Faloutsos 51

School of Computer Science

Carnegie Mellon

Shrinking diameter

ArXiv: who wrote what

MSU, 2005 C. Faloutsos 52

School of Computer Science

Carnegie Mellon

Shrinking diameter

U.S. patents citing each

other

10/21/2005 23:09

Page 14: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 53

School of Computer Science

Carnegie Mellon

Shrinking diameter

Autonomous systems

MSU, 2005 C. Faloutsos 54

School of Computer Science

Carnegie Mellon

Temporal evolution of graphs

• N(t) nodes; E(t) edges at time t

• suppose that

N(t+1) = 2 * N(t)

• Q: what is your guess for

E(t+1) =? 2 * E(t)

MSU, 2005 C. Faloutsos 55

School of Computer Science

Carnegie Mellon

Temporal evolution of graphs

• N(t) nodes; E(t) edges at time t

• suppose that

N(t+1) = 2 * N(t)

• Q: what is your guess for

E(t+1) =? 2 * E(t)

• A: over-doubled!

x

MSU, 2005 C. Faloutsos 56

School of Computer Science

Carnegie Mellon

Temporal evolution of graphs

• A: over-doubled - but obeying:

E(t) ~ N(t)a for all t

where 1<a<2

a=1: constant avg degree

a=2: ~full clique

• Real graphs densify over time [Leskovec+05]

10/21/2005 23:09

Page 15: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 57

School of Computer Science

Carnegie Mellon

Temporal evolution of graphs

• A: over-doubled - but obeying:

E(t) ~ N(t)a for all t

• Identically:

log(E(t)) / log(N(t)) = constant for all t

MSU, 2005 C. Faloutsos 58

School of Computer Science

Carnegie Mellon

Densification Power Law

ArXiv: Physics papers

and their citations

1.69

N(t)

E(t)

MSU, 2005 C. Faloutsos 59

School of Computer Science

Carnegie Mellon

Densification Power Law

U.S. Patents, citing each

other

1.66

N(t)

E(t)

MSU, 2005 C. Faloutsos 60

School of Computer Science

Carnegie Mellon

Densification Power Law

Autonomous Systems

1.18

N(t)

E(t)

10/21/2005 23:09

Page 16: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 61

School of Computer Science

Carnegie Mellon

Densification Power Law

ArXiv: who wrote what

1.15

N(t)

E(t)

MSU, 2005 C. Faloutsos 62

School of Computer Science

Carnegie Mellon

Outline

• problems

• Fractals

• Solutions

• Discussion

– what else can they solve?

– how frequent are fractals?

MSU, 2005 C. Faloutsos 63

School of Computer Science

Carnegie Mellon

What else can they solve?

• separability [KDD’02]

• forecasting [CIKM’02]

• dimensionality reduction [SBBD’00]

• non-linear axis scaling [KDD’02]

• disk trace modeling [PEVA’02]

• selectivity of spatial/multimedia queries

[PODS’94, VLDB’95, ICDE’00]

• ...

MSU, 2005 C. Faloutsos 64

School of Computer Science

Carnegie Mellon

Storyboard

• Search results

(ranked)

Collage with maps,

common phrases,

named entities and

dynamic query sliders

• Query (6TB of

data)

Full Content Indexing, Search and Retrieval from Digital Video Archives

www.informedia.cs.cmu.edu

10/21/2005 23:09

Page 17: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 65

School of Computer Science

Carnegie Mellon

Index trees

• OMNI trees [ICDE’01, w/ Roberto Filho,

Caetano and Agma Traina]

• Slim-trees [EDBT’00, w/ C+A Traina,

Seeger]

• ...

MSU, 2005 C. Faloutsos 66

School of Computer Science

Carnegie Mellon

What else can they solve?

• separability [KDD’02]

• forecasting [CIKM’02]

• dimensionality reduction [SBBD’00]

• non-linear axis scaling [KDD’02]

• disk trace modeling [PEVA’02]

• selectivity of spatial/multimedia queries

[PODS’94, VLDB’95, ICDE’00]

• ...

MSU, 2005 C. Faloutsos 67

School of Computer Science

Carnegie Mellon

Problem #3 - spatial d.m.

Galaxies (Sloan Digital Sky Survey w/ B.

Nichol) - ‘spiral’ and ‘elliptical’

galaxies

- patterns? (not Gaussian; not

uniform)

-attraction/repulsion?

- separability??

MSU, 2005 C. Faloutsos 68

School of Computer Science

Carnegie Mellon

Solution#3: spatial d.m.

log(r)

log(#pairs within <=r )

spi-spi

spi-ell

ell-ell

- 1.8 slope

- plateau!

- repulsion!

CORRELATION INTEGRAL!

10/21/2005 23:09

Page 18: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 69

School of Computer Science

Carnegie Mellon

Solution#3: spatial d.m.

log(r)

log(#pairs within <=r )

spi-spi

spi-ell

ell-ell

- 1.8 slope

- plateau!

- repulsion!

[w/ Seeger, Traina, Traina, SIGMOD00]

MSU, 2005 C. Faloutsos 70

School of Computer Science

Carnegie Mellon

Solution#3: spatial d.m.

r1r2

r1

r2

Heuristic on choosing # of

clusters

MSU, 2005 C. Faloutsos 71

School of Computer Science

Carnegie Mellon

Solution#3: spatial d.m.

log(r)

log(#pairs within <=r )

spi-spi

spi-ell

ell-ell

- 1.8 slope

- plateau!

- repulsion!

MSU, 2005 C. Faloutsos 72

School of Computer Science

Carnegie Mellon

What else can they solve?

• separability [KDD’02]

• forecasting [CIKM’02]

• dimensionality reduction [SBBD’00]

• non-linear axis scaling [KDD’02]

• disk trace modeling [PEVA’02]

• selectivity of spatial/multimedia queries

[PODS’94, VLDB’95, ICDE’00]

• ...

10/21/2005 23:09

Page 19: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 73

School of Computer Science

Carnegie Mellon

Problem#4: dim. reduction

• given attributes x1, ... xn

– possibly, non-linearly correlated

• drop the useless ones mpg

cc

MSU, 2005 C. Faloutsos 74

School of Computer Science

Carnegie Mellon

Problem#4: dim. reduction

• given attributes x1, ... xn

– possibly, non-linearly correlated

• drop the useless ones

(Q: why?

A: to avoid the ‘dimensionality curse’)

Solution: keep on dropping attributes, until the f.d. changes! [w/ Traina+, SBBD’00]

mpg

cc

MSU, 2005 C. Faloutsos 75

School of Computer Science

Carnegie Mellon

Outline

• problems

• Fractals

• Solutions

• Discussion

– what else can they solve?

– how frequent are fractals?

MSU, 2005 C. Faloutsos 76

School of Computer Science

Carnegie Mellon

Fractals & power laws:

appear in numerous settings:

• medical

• geographical / geological

• social

• computer-system related

• <and many-many more! see [Mandelbrot]>

10/21/2005 23:09

Page 20: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 77

School of Computer Science

Carnegie Mellon

Fractals: Brain scans

• brain-scans

octree levels

Log(#octants)

2.63 =

fd

MSU, 2005 C. Faloutsos 78

School of Computer Science

Carnegie Mellon

fMRI brain scans

• Center for Cognitive Brain Imaging @ CMU

• Tom Mitchell, Marcel Just, ++

fMRI Goal: human brain function

Which voxels are active,

for a given cognitive task?

MSU, 2005 C. Faloutsos 79

School of Computer Science

Carnegie Mellon

More fractals

• periphery of malignant tumors: ~1.5

• benign: ~1.3

• [Burdet+]

MSU, 2005 C. Faloutsos 80

School of Computer Science

Carnegie Mellon

More fractals:

• cardiovascular system: 3 (!) lungs: ~2.9

10/21/2005 23:09

Page 21: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 81

School of Computer Science

Carnegie Mellon

Fractals & power laws:

appear in numerous settings:

• medical

• geographical / geological

• social

• computer-system related

MSU, 2005 C. Faloutsos 82

School of Computer Science

Carnegie Mellon

More fractals:

• Coastlines: 1.2-1.58

1 1.1

1.3

MSU, 2005 C. Faloutsos 83

School of Computer Science

Carnegie Mellon

MSU, 2005 C. Faloutsos 84

School of Computer Science

Carnegie Mellon

More fractals:

• the fractal dimension for the Amazon river

is 1.85 (Nile: 1.4)[ems.gphys.unc.edu/nonlinear/fractals/examples.html]

10/21/2005 23:09

Page 22: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 85

School of Computer Science

Carnegie Mellon

More fractals:

• the fractal dimension for the Amazon river

is 1.85 (Nile: 1.4)[ems.gphys.unc.edu/nonlinear/fractals/examples.html]

MSU, 2005 C. Faloutsos 86

School of Computer Science

Carnegie Mellon

Cross-roads of

Montgomery county:

•any rules?

GIS points

MSU, 2005 C. Faloutsos 87

School of Computer Science

Carnegie Mellon

GIS

A: self-similarity:

• intrinsic dim. = 1.51

log( r )

log(#pairs(within <= r))

1.51

MSU, 2005 C. Faloutsos 88

School of Computer Science

Carnegie Mellon

Examples:LB county

• Long Beach county of CA (road end-points)

1.7

log(r)

log(#pairs)

10/21/2005 23:09

Page 23: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 89

School of Computer Science

Carnegie Mellon

More power laws: areas –

Korcak’s law

Scandinavian lakes

Any pattern?

MSU, 2005 C. Faloutsos 90

School of Computer Science

Carnegie Mellon

More power laws: areas –

Korcak’s law

Scandinavian lakes

area vs

complementary

cumulative count

(log-log axes)

log(count( >= area))

log(area)

MSU, 2005 C. Faloutsos 91

School of Computer Science

Carnegie Mellon

More power laws: Korcak

Japan islands;

area vs cumulative

count (log-log axes) log(area)

log(count( >= area))

MSU, 2005 C. Faloutsos 92

School of Computer Science

Carnegie Mellon

More power laws

• Energy of earthquakes (Gutenberg-Richter

law) [simscience.org]

log(count)

Magnitude = log(energy)day

Energy

released

10/21/2005 23:09

Page 24: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 93

School of Computer Science

Carnegie Mellon

Fractals & power laws:

appear in numerous settings:

• medical

• geographical / geological

• social

• computer-system related

MSU, 2005 C. Faloutsos 94

School of Computer Science

Carnegie Mellon

A famous power law: Zipf’s

law

• Bible - rank vs.

frequency (log-log)

log(rank)

log(freq)

“a”

“the”

“Rank/frequency plot”

MSU, 2005 C. Faloutsos 95

School of Computer Science

Carnegie Mellon

TELCO data

# of service units

count of

customers

‘best customer’

MSU, 2005 C. Faloutsos 96

School of Computer Science

Carnegie Mellon

SALES data – store#96

# units sold

count of

products

“aspirin”

10/21/2005 23:09

Page 25: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 97

School of Computer Science

Carnegie Mellon

Olympic medals (Sidney’00,

Athens’04):

log( rank)

log(#medals)

0

0.5

1

1.5

2

2.5

0 0.5 1 1.5 2

athens

sidney

MSU, 2005 C. Faloutsos 98

School of Computer Science

Carnegie Mellon

Olympic medals (Sidney’00,

Athens’04):

log( rank)

log(#medals)

0

0.5

1

1.5

2

2.5

0 0.5 1 1.5 2

athens

sidney

MSU, 2005 C. Faloutsos 99

School of Computer Science

Carnegie Mellon

Even more power laws:

• Income distribution (Pareto’s law)

• size of firms

• publication counts (Lotka’s law)

MSU, 2005 C. Faloutsos 100

School of Computer Science

Carnegie Mellon

Even more power laws:

library science (Lotka’s law of publication

count); and citation counts:

(citeseer.nj.nec.com 6/2001)

log(#citations)

log(count)

Ullman

10/21/2005 23:09

Page 26: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 101

School of Computer Science

Carnegie Mellon

Even more power laws:

• web hit counts [w/ A. Montgomery]

Web Site Traffic

log(freq)

log(count)

Zipf

“yahoo.com”

MSU, 2005 C. Faloutsos 102

School of Computer Science

Carnegie Mellon

Fractals & power laws:

appear in numerous settings:

• medical

• geographical / geological

• social

• computer-system related

MSU, 2005 C. Faloutsos 103

School of Computer Science

Carnegie Mellon

Power laws, cont’d

• In- and out-degree distribution of web sites

[Barabasi], [IBM-CLEVER]

log indegree

- log(freq)

from [Ravi Kumar,

Prabhakar Raghavan,

Sridhar Rajagopalan,

Andrew Tomkins ]

MSU, 2005 C. Faloutsos 104

School of Computer Science

Carnegie Mellon

Power laws, cont’d

• In- and out-degree distribution of web sites

[Barabasi], [IBM-CLEVER]

log indegree

log(freq)

from [Ravi Kumar,

Prabhakar Raghavan,

Sridhar Rajagopalan,

Andrew Tomkins ]

10/21/2005 23:09

Page 27: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 105

School of Computer Science

Carnegie Mellon

Power laws, cont’d

• In- and out-degree distribution of web sites

[Barabasi], [IBM-CLEVER]

• length of file transfers [Crovella+Bestavros

‘96]

• duration of UNIX jobs

MSU, 2005 C. Faloutsos 106

School of Computer Science

Carnegie Mellon

Conclusions

• Fascinating problems in Data Mining: find

patterns in

– sensors/streams

– graphs/networks

MSU, 2005 C. Faloutsos 108

School of Computer Science

Carnegie Mellon

Resources

• Manfred Schroeder “Chaos, Fractals and

Power Laws”, 1991

10/21/2005 23:09

Page 28: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 109

School of Computer Science

Carnegie Mellon

References

• [vldb95] Alberto Belussi and Christos Faloutsos,

Estimating the Selectivity of Spatial Queries Using

the `Correlation' Fractal Dimension Proc. of

VLDB, p. 299-310, 1995

• M. Crovella and A. Bestavros, Self similarity in

World wide web traffic: Evidence and possible

causes , SIGMETRICS ’96.

MSU, 2005 C. Faloutsos 110

School of Computer Science

Carnegie Mellon

References

• J. Considine, F. Li, G. Kollios and J. Byers,

Approximate Aggregation Techniques for Sensor

Databases (ICDE’04, best paper award).

• [pods94] Christos Faloutsos and Ibrahim Kamel,

Beyond Uniformity and Independence: Analysis of

R-trees Using the Concept of Fractal Dimension,

PODS, Minneapolis, MN, May 24-26, 1994, pp.

4-13

MSU, 2005 C. Faloutsos 111

School of Computer Science

Carnegie Mellon

References

• [vldb96] Christos Faloutsos, Yossi Matias and Avi

Silberschatz, Modeling Skewed Distributions

Using Multifractals and the `80-20 Law’ Conf. on

Very Large Data Bases (VLDB), Bombay, India,

Sept. 1996.

• [sigmod2000] Christos Faloutsos, Bernhard

Seeger, Agma J. M. Traina and Caetano Traina Jr.,

Spatial Join Selectivity Using Power Laws,

SIGMOD 2000

MSU, 2005 C. Faloutsos 112

School of Computer Science

Carnegie Mellon

References

• [vldb96] Christos Faloutsos and Volker Gaede

Analysis of the Z-Ordering Method Using the

Hausdorff Fractal Dimension VLD, Bombay,

India, Sept. 1996

• [sigcomm99] Michalis Faloutsos, Petros Faloutsos

and Christos Faloutsos, What does the Internet

look like? Empirical Laws of the Internet

Topology, SIGCOMM 1999

10/21/2005 23:09

Page 29: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 113

School of Computer Science

Carnegie Mellon

References

• [Leskovec 05] Jure Leskovec, Jon M. Kleinberg,

Christos Faloutsos: Graphs over time:

densification laws, shrinking diameters and

possible explanations. KDD 2005: 177-187

MSU, 2005 C. Faloutsos 114

School of Computer Science

Carnegie Mellon

References

• [ieeeTN94] W. E. Leland, M.S. Taqqu, W.

Willinger, D.V. Wilson, On the Self-Similar

Nature of Ethernet Traffic, IEEE Transactions on

Networking, 2, 1, pp 1-15, Feb. 1994.

• [brite] Alberto Medina, Anukool Lakhina, Ibrahim

Matta, and John Byers. BRITE: An Approach to

Universal Topology Generation. MASCOTS '01

MSU, 2005 C. Faloutsos 115

School of Computer Science

Carnegie Mellon

References

• [icde99] Guido Proietti and Christos Faloutsos,

I/O complexity for range queries on region data

stored using an R-tree (ICDE’99)

• Stan Sclaroff, Leonid Taycher and Marco La

Cascia , "ImageRover: A content-based image

browser for the world wide web" Proc. IEEE

Workshop on Content-based Access of Image and

Video Libraries, pp 2-9, 1997.

MSU, 2005 C. Faloutsos 116

School of Computer Science

Carnegie Mellon

References

• [kdd2001] Agma J. M. Traina, Caetano Traina Jr.,

Spiros Papadimitriou and Christos Faloutsos: Tri-

plots: Scalable Tools for Multidimensional Data

Mining, KDD 2001, San Francisco, CA.

10/21/2005 23:09

Page 30: School of Computer Science Carnegie Mellon THANK YOU! Data Mining using Fractals … · 2005-10-23 · School of Computer Science Carnegie Mellon Solutions • New tools: power laws,

MSU, 2005 C. Faloutsos 117

School of Computer Science

Carnegie Mellon

Thank you!

Contact info:

christos <at> cs.cmu.edu

www. cs.cmu.edu /~christos

(w/ papers, datasets, code for fractal dimension

estimation, etc)

10/21/2005 23:09