· pocs|@pocsvox power-lawsize distributions ourintuition definition examples wildvs.mild ccdfs...

160
PoCS | @pocsvox Power-Law Size Distributions Our Intuition Definition Examples Wild vs. Mild CCDFs Zipf’s law Zipf CCDF Appendix References . . . . . . . . . . . . . 1 of 52 Power-Law Size Distributions Principles of Complex Systems | @pocsvox CSYS/MATH 300, Fall, 2015 | #FallPoCS2015 Prof. Peter Dodds | @peterdodds Dept. of Mathematics & Statistics | Vermont Complex Systems Center Vermont Advanced Computing Core | University of Vermont What's the Story? Principles of Complex Systems @pocsvox PoCS Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.

Upload: others

Post on 14-Oct-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................1 of 52

Power-Law Size DistributionsPrinciples of Complex Systems | @pocsvoxCSYS/MATH 300, Fall, 2015 | #FallPoCS2015

Prof. Peter Dodds | @peterdodds

Dept. of Mathematics & Statistics | Vermont Complex Systems CenterVermont Advanced Computing Core | University of Vermont

What's the Story?

Principles ofComplex Systems

@pocsvox

PoCS

Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.

Page 2:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................2 of 52

These slides are brought to you by:

Page 3:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................3 of 52

Outline

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

Page 4:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................4 of 52

The deal:

Page 5:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................4 of 52

The deal:

Page 6:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................4 of 52

The deal:

Page 7:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................4 of 52

The deal:

Page 8:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................5 of 52

.Two of the many things we struggle withcognitively:..

.

1. Probability. Ex. The Monty Hall Problem. Ex. Daughter/Son born on Tuesday.(see next two slides; Wikipedia entry here.)

2. Logarithmic scales.

.On counting and logarithms:..

.

Listen to Radiolab’s 2009 piece:“Numbers.”. Later: Benford’s Law.

.

.Also to be enjoyed: the magnificence of theDunning-Kruger effect

Page 9:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................5 of 52

.Two of the many things we struggle withcognitively:..

.

1. Probability. Ex. The Monty Hall Problem. Ex. Daughter/Son born on Tuesday.(see next two slides; Wikipedia entry here.)

2. Logarithmic scales.

.On counting and logarithms:..

.

Listen to Radiolab’s 2009 piece:“Numbers.”. Later: Benford’s Law.

.

.Also to be enjoyed: the magnificence of theDunning-Kruger effect

Page 10:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................5 of 52

.Two of the many things we struggle withcognitively:..

.

1. Probability. Ex. The Monty Hall Problem. Ex. Daughter/Son born on Tuesday.(see next two slides; Wikipedia entry here.)

2. Logarithmic scales.

.On counting and logarithms:..

.

Listen to Radiolab’s 2009 piece:“Numbers.”. Later: Benford’s Law.

.

.Also to be enjoyed: the magnificence of theDunning-Kruger effect

Page 11:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................6 of 52

Homo probabilisticus?.The set up:..

. A parent has two children.

.Simple probability question:..

.

What is the probability that both children are girls?

1/4...

.The next set up:..

.

A parent has two children. We know one of them is a girl.

.The next probabilistic poser:..

.

What is the probability that both children are girls?

1/3...

Page 12:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................6 of 52

Homo probabilisticus?.The set up:..

. A parent has two children.

.Simple probability question:..

.

What is the probability that both children are girls?

1/4...

.The next set up:..

.

A parent has two children. We know one of them is a girl.

.The next probabilistic poser:..

.

What is the probability that both children are girls?

1/3...

Page 13:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................6 of 52

Homo probabilisticus?.The set up:..

. A parent has two children.

.Simple probability question:..

.

What is the probability that both children are girls?

1/4...

.The next set up:..

.

A parent has two children. We know one of them is a girl.

.The next probabilistic poser:..

.

What is the probability that both children are girls?

1/3...

Page 14:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................6 of 52

Homo probabilisticus?.The set up:..

. A parent has two children.

.Simple probability question:..

.

What is the probability that both children are girls?

1/4...

.The next set up:..

.

A parent has two children. We know one of them is a girl.

.The next probabilistic poser:..

.

What is the probability that both children are girls?

1/3...

Page 15:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................6 of 52

Homo probabilisticus?.The set up:..

. A parent has two children.

.Simple probability question:..

.

What is the probability that both children are girls?

1/4...

.The next set up:..

.

A parent has two children. We know one of them is a girl.

.The next probabilistic poser:..

.

What is the probability that both children are girls?

1/3...

Page 16:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................6 of 52

Homo probabilisticus?.The set up:..

. A parent has two children.

.Simple probability question:..

.

What is the probability that both children are girls?

1/4...

.The next set up:..

.

A parent has two children. We know one of them is a girl.

.The next probabilistic poser:..

.

What is the probability that both children are girls?

1/3...

Page 17:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................6 of 52

Homo probabilisticus?.The set up:..

. A parent has two children.

.Simple probability question:..

.

What is the probability that both children are girls?

1/4...

.The next set up:..

.

A parent has two children. We know one of them is a girl.

.The next probabilistic poser:..

.

What is the probability that both children are girls?

1/3...

Page 18:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................6 of 52

Homo probabilisticus?.The set up:..

. A parent has two children.

.Simple probability question:..

.

What is the probability that both children are girls? 1/4...

.The next set up:..

.

A parent has two children. We know one of them is a girl.

.The next probabilistic poser:..

.

What is the probability that both children are girls?

1/3...

Page 19:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................6 of 52

Homo probabilisticus?.The set up:..

. A parent has two children.

.Simple probability question:..

.

What is the probability that both children are girls? 1/4...

.The next set up:..

.

A parent has two children. We know one of them is a girl.

.The next probabilistic poser:..

.

What is the probability that both children are girls? 1/3...

Page 20:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................7 of 52

.Try this one:..

.

A parent has two children. We know one of them is a girl born on a Tuesday.

.Simple question #3:..

.

What is the probability that both children are girls?

?

.Last:..

.

A parent has two children. We know one of them is a girl born on December31.

.And …..

.

What is the probability that both children are girls?

?

Page 21:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................7 of 52

.Try this one:..

.

A parent has two children. We know one of them is a girl born on a Tuesday.

.Simple question #3:..

.

What is the probability that both children are girls?

?

.Last:..

.

A parent has two children. We know one of them is a girl born on December31.

.And …..

.

What is the probability that both children are girls?

?

Page 22:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................7 of 52

.Try this one:..

.

A parent has two children. We know one of them is a girl born on a Tuesday.

.Simple question #3:..

.

What is the probability that both children are girls?

?

.Last:..

.

A parent has two children. We know one of them is a girl born on December31.

.And …..

.

What is the probability that both children are girls?

?

Page 23:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................7 of 52

.Try this one:..

.

A parent has two children. We know one of them is a girl born on a Tuesday.

.Simple question #3:..

.

What is the probability that both children are girls?

?

.Last:..

.

A parent has two children. We know one of them is a girl born on December31.

.And …..

.

What is the probability that both children are girls?

?

Page 24:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................7 of 52

.Try this one:..

.

A parent has two children. We know one of them is a girl born on a Tuesday.

.Simple question #3:..

.

What is the probability that both children are girls?

?

.Last:..

.

A parent has two children. We know one of them is a girl born on December31.

.And …..

.

What is the probability that both children are girls?

?

Page 25:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................7 of 52

.Try this one:..

.

A parent has two children. We know one of them is a girl born on a Tuesday.

.Simple question #3:..

.

What is the probability that both children are girls?

?

.Last:..

.

A parent has two children. We know one of them is a girl born on December31.

.And …..

.

What is the probability that both children are girls?

?

Page 26:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................7 of 52

.Try this one:..

.

A parent has two children. We know one of them is a girl born on a Tuesday.

.Simple question #3:..

.

What is the probability that both children are girls?

?

.Last:..

.

A parent has two children. We know one of them is a girl born on December31.

.And …..

.

What is the probability that both children are girls?

?

Page 27:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................7 of 52

.Try this one:..

.

A parent has two children. We know one of them is a girl born on a Tuesday.

.Simple question #3:..

.

What is the probability that both children are girls? ?

.Last:..

.

A parent has two children. We know one of them is a girl born on December31.

.And …..

.

What is the probability that both children are girls?

?

Page 28:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................7 of 52

.Try this one:..

.

A parent has two children. We know one of them is a girl born on a Tuesday.

.Simple question #3:..

.

What is the probability that both children are girls? ?

.Last:..

.

A parent has two children. We know one of them is a girl born on December31.

.And …..

.

What is the probability that both children are girls? ?

Page 29:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................8 of 52

Let’s test our collective intuition:.

.

Money≡Belief

.Two questions about wealth distribution in theUnited States:..

.

1. Please estimate the percentage of all wealthowned by individuals when grouped into quintiles.

2. Please estimate what you believe each quintileshould own, ideally.

3. Extremes: 100, 0, 0, 0, 0 and 20, 20, 20, 20, 20

Page 30:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................8 of 52

Let’s test our collective intuition:.

.

Money≡Belief

.Two questions about wealth distribution in theUnited States:..

.

1. Please estimate the percentage of all wealthowned by individuals when grouped into quintiles.

2. Please estimate what you believe each quintileshould own, ideally.

3. Extremes: 100, 0, 0, 0, 0 and 20, 20, 20, 20, 20

Page 31:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................8 of 52

Let’s test our collective intuition:.

.

Money≡Belief

.Two questions about wealth distribution in theUnited States:..

.

1. Please estimate the percentage of all wealthowned by individuals when grouped into quintiles.

2. Please estimate what you believe each quintileshould own, ideally.

3. Extremes: 100, 0, 0, 0, 0 and 20, 20, 20, 20, 20

Page 32:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................8 of 52

Let’s test our collective intuition:.

.

Money≡Belief

.Two questions about wealth distribution in theUnited States:..

.

1. Please estimate the percentage of all wealthowned by individuals when grouped into quintiles.

2. Please estimate what you believe each quintileshould own, ideally.

3. Extremes: 100, 0, 0, 0, 0 and 20, 20, 20, 20, 20

Page 33:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................8 of 52

Let’s test our collective intuition:.

.

Money≡Belief

.Two questions about wealth distribution in theUnited States:..

.

1. Please estimate the percentage of all wealthowned by individuals when grouped into quintiles.

2. Please estimate what you believe each quintileshould own, ideally.

3. Extremes: 100, 0, 0, 0, 0 and 20, 20, 20, 20, 20

Page 34:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................9 of 52

.Wealth distribution in the United States: [11]..

.“Building a better America—One wealth quintile at a time”Norton and Ariely, 2011. [11]

Page 35:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................10 of 52

.Wealth distribution in the United States: [11]..

.A highly watched video based on this research is here.

Page 36:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................11 of 52

.

.

The sizes of many systems’ elements appear to obey aninverse power-law size distribution:(size = ) ∼ −

where 0 < min < < max and > 1..

.

min = lower cutoff, max = upper cutoff Negative linear relationship in log-log space:

log10 () = log10 − log10 We use base 10 because we are good people.

.

.

power-law decays in probability:The Statistics of Surprise.

Page 37:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................11 of 52

.

.

The sizes of many systems’ elements appear to obey aninverse power-law size distribution:(size = ) ∼ −

where 0 < min < < max and > 1..

.

min = lower cutoff, max = upper cutoff Negative linear relationship in log-log space:

log10 () = log10 − log10 We use base 10 because we are good people.

.

.

power-law decays in probability:The Statistics of Surprise.

Page 38:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................11 of 52

.

.

The sizes of many systems’ elements appear to obey aninverse power-law size distribution:(size = ) ∼ −

where 0 < min < < max and > 1..

.

min = lower cutoff, max = upper cutoff Negative linear relationship in log-log space:

log10 () = log10 − log10 We use base 10 because we are good people.

.

.

power-law decays in probability:The Statistics of Surprise.

Page 39:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................11 of 52

.

.

The sizes of many systems’ elements appear to obey aninverse power-law size distribution:(size = ) ∼ −

where 0 < min < < max and > 1..

.

min = lower cutoff, max = upper cutoff Negative linear relationship in log-log space:

log10 () = log10 − log10 We use base 10 because we are good people.

.

.

power-law decays in probability:The Statistics of Surprise.

Page 40:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................11 of 52

.

.

The sizes of many systems’ elements appear to obey aninverse power-law size distribution:(size = ) ∼ −

where 0 < min < < max and > 1..

.

min = lower cutoff, max = upper cutoff Negative linear relationship in log-log space:

log10 () = log10 − log10 We use base 10 because we are good people.

.

.

power-law decays in probability:The Statistics of Surprise.

Page 41:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................12 of 52

Size distributions:

.Usually, only the tail of the distribution obeys apower law:..

.

() ∼ − for large. Still use term ‘power-law size distribution.’ Other terms: Fat-tailed distributions. Heavy-tailed distributions.

.Beware:..

.

Inverse power laws aren’t the only ones:lognormals, Weibull distributions, …

Page 42:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................12 of 52

Size distributions:

.Usually, only the tail of the distribution obeys apower law:..

.

() ∼ − for large. Still use term ‘power-law size distribution.’ Other terms: Fat-tailed distributions. Heavy-tailed distributions.

.Beware:..

.

Inverse power laws aren’t the only ones:lognormals, Weibull distributions, …

Page 43:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................12 of 52

Size distributions:

.Usually, only the tail of the distribution obeys apower law:..

.

() ∼ − for large. Still use term ‘power-law size distribution.’ Other terms: Fat-tailed distributions. Heavy-tailed distributions.

.Beware:..

.

Inverse power laws aren’t the only ones:lognormals, Weibull distributions, …

Page 44:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................12 of 52

Size distributions:

.Usually, only the tail of the distribution obeys apower law:..

.

() ∼ − for large. Still use term ‘power-law size distribution.’ Other terms: Fat-tailed distributions. Heavy-tailed distributions.

.Beware:..

.

Inverse power laws aren’t the only ones:lognormals, Weibull distributions, …

Page 45:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................13 of 52

Size distributions:

.Many systems have discrete sizes :..

.

Word frequency Node degree in networks: # friends, # hyperlinks,etc. # citations for articles, court decisions, etc.

.

.

() ∼ −where min ≤ ≤ max

Obvious fail for = 0. Again, typically a description of distribution’s tail.

Page 46:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................13 of 52

Size distributions:

.Many systems have discrete sizes :..

.

Word frequency Node degree in networks: # friends, # hyperlinks,etc. # citations for articles, court decisions, etc.

.

.

() ∼ −where min ≤ ≤ max

Obvious fail for = 0. Again, typically a description of distribution’s tail.

Page 47:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................13 of 52

Size distributions:

.Many systems have discrete sizes :..

.

Word frequency Node degree in networks: # friends, # hyperlinks,etc. # citations for articles, court decisions, etc.

.

.

() ∼ −where min ≤ ≤ max

Obvious fail for = 0. Again, typically a description of distribution’s tail.

Page 48:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................13 of 52

Size distributions:

.Many systems have discrete sizes :..

.

Word frequency Node degree in networks: # friends, # hyperlinks,etc. # citations for articles, court decisions, etc.

.

.

() ∼ −where min ≤ ≤ max

Obvious fail for = 0. Again, typically a description of distribution’s tail.

Page 49:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................14 of 52

The statistics of surprise—words:.Brown Corpus (∼ 06 words):..

.

rank word % q1. the 6.88722. of 3.58393. and 2.84014. to 2.57445. a 2.29966. in 2.10107. that 1.04288. is 0.99439. was 0.966110. he 0.939211. for 0.934012. it 0.862313. with 0.717614. as 0.713715. his 0.6886

rank word % q1945. apply 0.00551946. vital 0.00551947. September 0.00551948. review 0.00551949. wage 0.00551950. motor 0.00551951. fifteen 0.00551952. regarded 0.00551953. draw 0.00551954. wheel 0.00551955. organized 0.00551956. vision 0.00551957. wild 0.00551958. Palmer 0.00551959. intensity 0.0055

Page 50:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................15 of 52

Jonathan Harris’s Wordcount:.A word frequency distribution explorer:..

.

Page 51:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................16 of 52

The statistics of surprise—words:.First—a Gaussian example:..

.

()d = √ −(−)2/22dlinear:

0 5 10 15 200

0.1

0.2

0.3

0.4

x

P(x

)

log-log

−3 −2 −1 0 1 2−25

−20

−15

−10

−5

0

log10

x

log 1

0P

(x)

mean = 0, variance 2 = 1.

.

. Activity: Sketch () ∼ −1 for = to = 07.

Page 52:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................17 of 52

The statistics of surprise—words:

.Raw ‘probability’ (binned) for Brown Corpus:..

.

linear:

0 1 2 3 4 5 6 70

200

400

600

800

1000

1200

q

Nq

= frequency of occurrence of word expressed as apercentage. = number of distinct words that have a frequencyof occurrence .

Page 53:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................17 of 52

The statistics of surprise—words:

.Raw ‘probability’ (binned) for Brown Corpus:..

.

linear:

0 1 2 3 4 5 6 70

200

400

600

800

1000

1200

q

Nq

log-log

−2.5 −2 −1.5 −1 −0.5 0 0.5 10

0.5

1

1.5

2

2.5

3

3.5

log10

q

log

10 N

q

= frequency of occurrence of word expressed as apercentage. = number of distinct words that have a frequencyof occurrence .

Page 54:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................18 of 52

The statistics of surprise—words:

.Complementary Cumulative ProbabilityDistribution >:..

.

linear:

0 1 2 3 4 5 6 70

500

1000

1500

2000

2500

q

N>

q

log-log

−2.5 −2 −1.5 −1 −0.5 0 0.5 10

0.5

1

1.5

2

2.5

3

3.5

log10

q

log

10 N

> q

Also known as the ‘Exceedance Probability.’

Page 55:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................19 of 52

My, what big words you have...

.

.

Test capitalizes on word frequency following aheavily skewed frequency distribution with adecaying power-law tail.

Page 56:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................20 of 52

The statistics of surprise:.Gutenberg-Richter law..

.

Log-log plot Base 10 Slope = -1( > ) ∝ −1.

.

From both the very awkwardly similar Christensenet al. and Bak et al.:“Unified scaling law for earthquakes” [3, 1]

Page 57:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................21 of 52

The statistics of surprise:.From: “Quake Moves Japan Closer to U.S. andAlters Earth’s Spin” by Kenneth Chang, March13, 2011, NYT:..

.

‘What is perhaps most surprising about the Japanearthquake is how misleading history can be. In thepast 300 years, no earthquake nearly thatlarge—nothing larger than magnitude eight—hadstruck in the Japan subduction zone. That, in turn, ledto assumptions about how large a tsunami mightstrike the coast.’.

.

“‘It did them a giant disservice,” said Dr. Stein of thegeological survey. That is not the first time that theearthquake potential of a fault has beenunderestimated. Most geophysicists did not think theSumatra fault could generate a magnitude 9.1earthquake, …’

Page 58:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................21 of 52

The statistics of surprise:.From: “Quake Moves Japan Closer to U.S. andAlters Earth’s Spin” by Kenneth Chang, March13, 2011, NYT:..

.

‘What is perhaps most surprising about the Japanearthquake is how misleading history can be. In thepast 300 years, no earthquake nearly thatlarge—nothing larger than magnitude eight—hadstruck in the Japan subduction zone. That, in turn, ledto assumptions about how large a tsunami mightstrike the coast.’.

.

“‘It did them a giant disservice,” said Dr. Stein of thegeological survey. That is not the first time that theearthquake potential of a fault has beenunderestimated. Most geophysicists did not think theSumatra fault could generate a magnitude 9.1earthquake, …’

Page 59:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................21 of 52

The statistics of surprise:.From: “Quake Moves Japan Closer to U.S. andAlters Earth’s Spin” by Kenneth Chang, March13, 2011, NYT:..

.

‘What is perhaps most surprising about the Japanearthquake is how misleading history can be. In thepast 300 years, no earthquake nearly thatlarge—nothing larger than magnitude eight—hadstruck in the Japan subduction zone. That, in turn, ledto assumptions about how large a tsunami mightstrike the coast.’.

.

“‘It did them a giant disservice,” said Dr. Stein of thegeological survey. That is not the first time that theearthquake potential of a fault has beenunderestimated. Most geophysicists did not think theSumatra fault could generate a magnitude 9.1earthquake, …’

Page 60:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................21 of 52

The statistics of surprise:.From: “Quake Moves Japan Closer to U.S. andAlters Earth’s Spin” by Kenneth Chang, March13, 2011, NYT:..

.

‘What is perhaps most surprising about the Japanearthquake is how misleading history can be. In thepast 300 years, no earthquake nearly thatlarge—nothing larger than magnitude eight—hadstruck in the Japan subduction zone. That, in turn, ledto assumptions about how large a tsunami mightstrike the coast.’.

.

“‘It did them a giant disservice,” said Dr. Stein of thegeological survey. That is not the first time that theearthquake potential of a fault has beenunderestimated. Most geophysicists did not think theSumatra fault could generate a magnitude 9.1earthquake, …’

Page 61:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................21 of 52

The statistics of surprise:.From: “Quake Moves Japan Closer to U.S. andAlters Earth’s Spin” by Kenneth Chang, March13, 2011, NYT:..

.

‘What is perhaps most surprising about the Japanearthquake is how misleading history can be. In thepast 300 years, no earthquake nearly thatlarge—nothing larger than magnitude eight—hadstruck in the Japan subduction zone. That, in turn, ledto assumptions about how large a tsunami mightstrike the coast.’.

.

“‘It did them a giant disservice,” said Dr. Stein of thegeological survey. That is not the first time that theearthquake potential of a fault has beenunderestimated. Most geophysicists did not think theSumatra fault could generate a magnitude 9.1earthquake, …’

Page 62:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................21 of 52

The statistics of surprise:.From: “Quake Moves Japan Closer to U.S. andAlters Earth’s Spin” by Kenneth Chang, March13, 2011, NYT:..

.

‘What is perhaps most surprising about the Japanearthquake is how misleading history can be. In thepast 300 years, no earthquake nearly thatlarge—nothing larger than magnitude eight—hadstruck in the Japan subduction zone. That, in turn, ledto assumptions about how large a tsunami mightstrike the coast.’.

.

“‘It did them a giant disservice,” said Dr. Stein of thegeological survey. That is not the first time that theearthquake potential of a fault has beenunderestimated. Most geophysicists did not think theSumatra fault could generate a magnitude 9.1earthquake, …’

Page 63:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................22 of 52

ingredients such as salt, sugar, and egg constitute a major part of

our every-day diet. As a result, the set of distinct ingredients

roughly follows Heap’s law, as seen in Fig. 4, with an exponent

around 0:64. According to the method in previous work [20], the

exponent of Zipf’s law corresponding to Fig. 3 can be estimated by1

l1. The product of this exponent and the exponent of Heap’s

law (0.64) is close to 1, which is consistent with the previous result

[21].

Quantifying similarity between cuisinesOur dataset can be considered as a bipartite network with a set

of recipes and a set of ingredients. An edge between a recipe and

an ingredient indicates that the recipe contains the corresponding

ingredient. Since each recipe belongs to one and only one regional

cuisine, the edges could be categorized into cuisines. Given a

cuisine c and an ingredient i, we use nci to denote the degree of

ingredient i, counted with edges in cuisine c. In other words, nci is

the number of recipes (in cuisine c) that use ingredient i.

Therefore, the ingredient-usage vector of regional cuisine c is

written in the following form:

fPcPc~(pc1,p

c2, . . . ,p

ci , . . . ,p

cn), ð1Þ

where pci~nciPi~1 n

ci

is the probability of ingredient i appears in

cuisine c. For example, if recipes in a regional cuisine c use 1,000

ingredients (with duplicates) in total and ingredient i appears in 10

recipes in that cuisine, we have pci~10

1000.

Since common ingredients carry little information, we use an

ingredient-usage vector inspired by TF-IDF (Term Frequency

Inverse Document Frequency) [22]:

Pc~(w1p

c1,w2p

c2, . . . ,wjp

ci , . . . ,wnp

cn), ð2Þ

where a prior weight wi~log

Pc

Pi n

ciP

c nci

is introduced to penalize a

popular ingredient. We use Pc for all calculations in this paper.

With this representation in hand, we quantify the similarity

between two cuisines using the Pearson correlation coefficient (Eq.

3) and cosine similarity (Eq. 4).

(i) Pearson product-moment correlation [23]: This metric

measures the extent to which a linear relationship is present

between the two vectors. It is defined as

Figure 1. Map of regional cuisines in China.doi:10.1371/journal.pone.0079161.g001

0 10 20 30 400

0.04

0.08

0.12

Number of ingredients per recipe (k)

P(k

)

Lu

Chuan

Yue

Su

MIn

Zhe

Xiang

Hui

Figure 2. Probability distribution of the number of ingredientsper recipe. All regional cuisines show similar distributions, which havea peak around 10.doi:10.1371/journal.pone.0079161.g002

Geography and Similarity of Chinese Cuisines

PLOS ONE | www.plosone.org 2 November 2013 | Volume 8 | Issue 11 | e79161

“Geography and Similarity of RegionalCuisines in China”Zhu et al.,PLoS ONE, 8, e79161, 2013. [16] Fraction of ingredients

that appear in at least recipes. Oops in notation: () isthe ComplementaryCumulative Distribution≥()

Page 64:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................23 of 52

“On a class of skew distributionfunctions”Herbert A. Simon,Biometrika, 42, 425–440, 1955. [13]

2 Power laws, Pareto distributions and Zipf’s law

0 50 100 150 200 250

heights of males

0

2

4

6

perc

enta

ge

0 20 40 60 80 100

speeds of cars

0

1

2

3

4

FIG. 1 Left: histogram of heights in centimetres of American males. Data from the National Health Examination Survey,1959–1962 (US Department of Health and Human Services). Right: histogram of speeds in miles per hour of cars on UKmotorways. Data from Transport Statistics 2003 (UK Department for Transport).

0 2×105

4×105

population of city

0

0.001

0.002

0.003

0.004

perc

enta

ge o

f ci

ties

104

105

106

107

10-8

10-7

10-6

10-5

10-4

10-3

10-2

FIG. 2 Left: histogram of the populations of all US cities with population of 10 000 or more. Right: another histogram of thesame data, but plotted on logarithmic scales. The approximate straight-line form of the histogram in the right panel impliesthat the distribution follows a power law. Data from the 2000 US Census.

is fixed, it is determined by the requirement that thedistribution p(x) sum to 1; see Section III.A.)

Power-law distributions occur in an extraordinarily di-verse range of phenomena. In addition to city popula-tions, the sizes of earthquakes [3], moon craters [4], solarflares [5], computer files [6] and wars [7], the frequency ofuse of words in any human language [2, 8], the frequencyof occurrence of personal names in most cultures [9], thenumbers of papers scientists write [10], the number ofcitations received by papers [11], the number of hits onweb pages [12], the sales of books, music recordings andalmost every other branded commodity [13, 14], the num-bers of species in biological taxa [15], people’s annual in-comes [16] and a host of other variables all follow power-law distributions.1

1 Power laws also occur in many situations other than the statis-

Power-law distributions are the subject of this arti-cle. In the following sections, I discuss ways of detectingpower-law behaviour, give empirical evidence for powerlaws in a variety of systems and describe some of themechanisms by which power-law behaviour can arise.

Readers interested in pursuing the subject further mayalso wish to consult the reviews by Sornette [18] andMitzenmacher [19], as well as the bibliography by Li.2

tical distributions of quantities. For instance, Newton’s famous1/r2 law for gravity has a power-law form with exponent α = 2.While such laws are certainly interesting in their own way, theyare not the topic of this paper. Thus, for instance, there hasin recent years been some discussion of the “allometric” scal-ing laws seen in the physiognomy and physiology of biologicalorganisms [17], but since these are not statistical distributionsthey will not be discussed here.

2 http://linkage.rockefeller.edu/wli/zipf/.

“Power laws, Pareto distributions and Zipf’slaw”M. E. J. Newman,Contemporary Physics, 46, 323–351,2005. [10]

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

SIAM REVIEW c© 2009 Society for Industrial and Applied MathematicsVol. 51, No. 4, pp. 661–703

Power-Law Distributions in

Empirical Data∗

Aaron Clauset†

Cosma Rohilla Shalizi‡

M. E. J. Newman§

Abstract. Power-law distributions occur in many situations of scientific interest and have significantconsequences for our understanding of natural and man-made phenomena. Unfortunately,the detection and characterization of power laws is complicated by the large fluctuationsthat occur in the tail of the distribution—the part of the distribution representing largebut rare events—and by the difficulty of identifying the range over which power-law behav-ior holds. Commonly used methods for analyzing power-law data, such as least-squaresfitting, can produce substantially inaccurate estimates of parameters for power-law dis-tributions, and even in cases where such methods return accurate answers they are stillunsatisfactory because they give no indication of whether the data obey a power law atall. Here we present a principled statistical framework for discerning and quantifyingpower-law behavior in empirical data. Our approach combines maximum-likelihood fittingmethods with goodness-of-fit tests based on the Kolmogorov–Smirnov (KS) statistic andlikelihood ratios. We evaluate the effectiveness of the approach with tests on syntheticdata and give critical comparisons to previous approaches. We also apply the proposedmethods to twenty-four real-world data sets from a range of different disciplines, each ofwhich has been conjectured to follow a power-law distribution. In some cases we find theseconjectures to be consistent with the data, while in others the power law is ruled out.

Key words. power-law distributions, Pareto, Zipf, maximum likelihood, heavy-tailed distributions,likelihood ratio test, model selection

AMS subject classifications. 62-07, 62P99, 65C05, 62F99

DOI. 10.1137/070710111

1. Introduction. Many empirical quantities cluster around a typical value. The

speeds of cars on a highway, the weights of apples in a store, air pressure, sea level,

the temperature in New York at noon on a midsummer’s day: all of these things vary

somewhat, but their distributions place a negligible amount of probability far from

the typical value, making the typical value representative of most observations. For

instance, it is a useful statement to say that an adult male American is about 180cm

tall because no one deviates very far from this height. Even the largest deviations,

which are exceptionally rare, are still only about a factor of two from the mean in

∗Received by the editors December 2, 2007; accepted for publication (in revised form) February2, 2009; published electronically November 6, 2009. This work was supported in part by the SantaFe Institute (AC) and by grants from the James S. McDonnell Foundation (CRS and MEJN) andthe National Science Foundation (MEJN).

http://www.siam.org/journals/sirev/51-4/71011.html†Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, and Department of Computer

Science, University of New Mexico, Albuquerque, NM 87131.‡Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213.§Department of Physics and Center for the Study of Complex Systems, University of Michigan,

Ann Arbor, MI 48109.

661

“Power-law distributions in empiricaldata”Clauset, Shalizi, and Newman,SIAM Review, 51, 661–703, 2009. [4]

Page 65:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................24 of 52

.

.

100

102

104

word frequency

100

102

104

100

102

104

citations

100

102

104

106

100

102

104

web hits

100

102

104

106

107

books sold

1

10

100

100

102

104

106

telephone calls received

100

103

106

2 3 4 5 6 7

earthquake magnitude

102

103

104

0.01 0.1 1

crater diameter in km

10-4

10-2

100

102

102

103

104

105

peak intensity

101

102

103

104

1 10 100

intensity

1

10

100

109

1010

net worth in US dollars

1

10

100

104

105

106

name frequency

100

102

104

103

105

107

population of city

100

102

104

(a) (b) (c)

(d) (e) (f)

(g) (h) (i)

(j) (k) (l)FIG

.4

Cum

ula

tive

distr

ibutionsor

“ra

nk/fr

equen

cyplo

ts”

oftw

elve

quantities

repute

dto

follow

pow

erla

ws.

The

distr

ibutions

wer

eco

mpute

das

des

crib

edin

Appen

dix

A.

Data

inth

esh

aded

regio

ns

wer

eex

cluded

from

the

calc

ula

tions

ofth

eex

ponen

tsin

Table

I.Sourc

ere

fere

nce

sfo

rth

edata

are

giv

enin

the

text.

(a)

Num

ber

sofocc

urr

ence

sofw

ord

sin

the

nov

elM

oby

Dic

k

by

Her

mann

Mel

ville

.(b

)N

um

ber

sof

cita

tions

tosc

ientific

paper

spublish

edin

1981,

from

tim

eof

publica

tion

until

June

1997.

(c)

Num

ber

sofhits

on

web

site

sby

60

000

use

rsofth

eA

mer

ica

Online

Inte

rnet

serv

ice

for

the

day

of1

Dec

ember

1997.

(d)

Num

ber

sof

copie

sof

bes

tsel

ling

books

sold

inth

eU

Sbet

wee

n1895

and

1965.

(e)

Num

ber

of

calls

rece

ived

by

AT

&T

tele

phone

cust

om

ersin

the

US

fora

single

day

.(f

)M

agnitude

ofea

rthquakes

inC

alifo

rnia

bet

wee

nJanuary

1910

and

May

1992.

Magnitude

ispro

port

ionalto

the

logarith

mofth

em

axim

um

am

plitu

de

ofth

eea

rthquake,

and

hen

ceth

edistr

ibution

obey

sa

pow

erla

wev

enth

ough

the

horizo

nta

laxis

islinea

r.(g

)D

iam

eter

ofcr

ate

rson

the

moon.

Ver

tica

laxis

ism

easu

red

per

square

kilom

etre

.(h

)Pea

kgam

ma-r

ayin

tensity

of

sola

rflare

sin

counts

per

seco

nd,

mea

sure

dfr

om

Eart

horb

itbet

wee

nFeb

ruary

1980

and

Nov

ember

1989.

(i)

Inte

nsity

ofw

ars

from

1816

to1980,m

easu

red

as

batt

ledea

ths

per

10000

ofth

epopula

tion

ofth

epart

icip

ating

countr

ies.

(j)

Aggre

gate

net

wort

hin

dollars

ofth

erich

est

indiv

iduals

inth

eU

Sin

Oct

ober

2003.

(k)

Fre

quen

cyofocc

urr

ence

offa

mily

nam

esin

the

US

inth

eyea

r1990.

(l)

Popula

tions

ofU

Sci

ties

inth

eyea

r2000.

Page 66:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................25 of 52

Size distributions:

.Some examples:..

.

Earthquake magnitude (Gutenberg-Richterlaw): [8, 1] () ∝ −2 # war deaths: [12] ( ) ∝ −1.8 Sizes of forest fires [7] Sizes of cities: [13] () ∝ −2.1 # links to and from websites [2]

.

. Note: Exponents range in error

Page 67:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................25 of 52

Size distributions:

.Some examples:..

.

Earthquake magnitude (Gutenberg-Richterlaw): [8, 1] () ∝ −2 # war deaths: [12] ( ) ∝ −1.8 Sizes of forest fires [7] Sizes of cities: [13] () ∝ −2.1 # links to and from websites [2]

.

. Note: Exponents range in error

Page 68:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................25 of 52

Size distributions:

.Some examples:..

.

Earthquake magnitude (Gutenberg-Richterlaw): [8, 1] () ∝ −2 # war deaths: [12] ( ) ∝ −1.8 Sizes of forest fires [7] Sizes of cities: [13] () ∝ −2.1 # links to and from websites [2]

.

. Note: Exponents range in error

Page 69:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................25 of 52

Size distributions:

.Some examples:..

.

Earthquake magnitude (Gutenberg-Richterlaw): [8, 1] () ∝ −2 # war deaths: [12] ( ) ∝ −1.8 Sizes of forest fires [7] Sizes of cities: [13] () ∝ −2.1 # links to and from websites [2]

.

. Note: Exponents range in error

Page 70:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................25 of 52

Size distributions:

.Some examples:..

.

Earthquake magnitude (Gutenberg-Richterlaw): [8, 1] () ∝ −2 # war deaths: [12] ( ) ∝ −1.8 Sizes of forest fires [7] Sizes of cities: [13] () ∝ −2.1 # links to and from websites [2]

.

. Note: Exponents range in error

Page 71:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................25 of 52

Size distributions:

.Some examples:..

.

Earthquake magnitude (Gutenberg-Richterlaw): [8, 1] () ∝ −2 # war deaths: [12] ( ) ∝ −1.8 Sizes of forest fires [7] Sizes of cities: [13] () ∝ −2.1 # links to and from websites [2]

.

. Note: Exponents range in error

Page 72:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................26 of 52

Size distributions:.More examples:..

.

# citations to papers: [5, 6] () ∝ −3. Individual wealth (maybe): () ∝ −2. Distributions of tree trunk diameters: ( ) ∝ −2. The gravitational force at a random point in theuniverse: [9] () ∝ −5/2. (See the Holtsmarkdistribution and stable distributions.) Diameter of moon craters: [10] ( ) ∝ −3. Word frequency: [13] e.g., () ∝ −2.2 (variable). # religious adherents in cults: [4] () ∝ −1.8±0.1. # sightings of birds per species (North AmericanBreeding Bird Survey for 2003): [4] () ∝ −2.1±0.1. # species per genus: [15, 13, 4] () ∝ −2.4±0.2.

Page 73:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................26 of 52

Size distributions:.More examples:..

.

# citations to papers: [5, 6] () ∝ −3. Individual wealth (maybe): () ∝ −2. Distributions of tree trunk diameters: ( ) ∝ −2. The gravitational force at a random point in theuniverse: [9] () ∝ −5/2. (See the Holtsmarkdistribution and stable distributions.) Diameter of moon craters: [10] ( ) ∝ −3. Word frequency: [13] e.g., () ∝ −2.2 (variable). # religious adherents in cults: [4] () ∝ −1.8±0.1. # sightings of birds per species (North AmericanBreeding Bird Survey for 2003): [4] () ∝ −2.1±0.1. # species per genus: [15, 13, 4] () ∝ −2.4±0.2.

Page 74:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................26 of 52

Size distributions:.More examples:..

.

# citations to papers: [5, 6] () ∝ −3. Individual wealth (maybe): () ∝ −2. Distributions of tree trunk diameters: ( ) ∝ −2. The gravitational force at a random point in theuniverse: [9] () ∝ −5/2. (See the Holtsmarkdistribution and stable distributions.) Diameter of moon craters: [10] ( ) ∝ −3. Word frequency: [13] e.g., () ∝ −2.2 (variable). # religious adherents in cults: [4] () ∝ −1.8±0.1. # sightings of birds per species (North AmericanBreeding Bird Survey for 2003): [4] () ∝ −2.1±0.1. # species per genus: [15, 13, 4] () ∝ −2.4±0.2.

Page 75:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................26 of 52

Size distributions:.More examples:..

.

# citations to papers: [5, 6] () ∝ −3. Individual wealth (maybe): () ∝ −2. Distributions of tree trunk diameters: ( ) ∝ −2. The gravitational force at a random point in theuniverse: [9] () ∝ −5/2. (See the Holtsmarkdistribution and stable distributions.) Diameter of moon craters: [10] ( ) ∝ −3. Word frequency: [13] e.g., () ∝ −2.2 (variable). # religious adherents in cults: [4] () ∝ −1.8±0.1. # sightings of birds per species (North AmericanBreeding Bird Survey for 2003): [4] () ∝ −2.1±0.1. # species per genus: [15, 13, 4] () ∝ −2.4±0.2.

Page 76:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................26 of 52

Size distributions:.More examples:..

.

# citations to papers: [5, 6] () ∝ −3. Individual wealth (maybe): () ∝ −2. Distributions of tree trunk diameters: ( ) ∝ −2. The gravitational force at a random point in theuniverse: [9] () ∝ −5/2. (See the Holtsmarkdistribution and stable distributions.) Diameter of moon craters: [10] ( ) ∝ −3. Word frequency: [13] e.g., () ∝ −2.2 (variable). # religious adherents in cults: [4] () ∝ −1.8±0.1. # sightings of birds per species (North AmericanBreeding Bird Survey for 2003): [4] () ∝ −2.1±0.1. # species per genus: [15, 13, 4] () ∝ −2.4±0.2.

Page 77:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................26 of 52

Size distributions:.More examples:..

.

# citations to papers: [5, 6] () ∝ −3. Individual wealth (maybe): () ∝ −2. Distributions of tree trunk diameters: ( ) ∝ −2. The gravitational force at a random point in theuniverse: [9] () ∝ −5/2. (See the Holtsmarkdistribution and stable distributions.) Diameter of moon craters: [10] ( ) ∝ −3. Word frequency: [13] e.g., () ∝ −2.2 (variable). # religious adherents in cults: [4] () ∝ −1.8±0.1. # sightings of birds per species (North AmericanBreeding Bird Survey for 2003): [4] () ∝ −2.1±0.1. # species per genus: [15, 13, 4] () ∝ −2.4±0.2.

Page 78:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................26 of 52

Size distributions:.More examples:..

.

# citations to papers: [5, 6] () ∝ −3. Individual wealth (maybe): () ∝ −2. Distributions of tree trunk diameters: ( ) ∝ −2. The gravitational force at a random point in theuniverse: [9] () ∝ −5/2. (See the Holtsmarkdistribution and stable distributions.) Diameter of moon craters: [10] ( ) ∝ −3. Word frequency: [13] e.g., () ∝ −2.2 (variable). # religious adherents in cults: [4] () ∝ −1.8±0.1. # sightings of birds per species (North AmericanBreeding Bird Survey for 2003): [4] () ∝ −2.1±0.1. # species per genus: [15, 13, 4] () ∝ −2.4±0.2.

Page 79:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................26 of 52

Size distributions:.More examples:..

.

# citations to papers: [5, 6] () ∝ −3. Individual wealth (maybe): () ∝ −2. Distributions of tree trunk diameters: ( ) ∝ −2. The gravitational force at a random point in theuniverse: [9] () ∝ −5/2. (See the Holtsmarkdistribution and stable distributions.) Diameter of moon craters: [10] ( ) ∝ −3. Word frequency: [13] e.g., () ∝ −2.2 (variable). # religious adherents in cults: [4] () ∝ −1.8±0.1. # sightings of birds per species (North AmericanBreeding Bird Survey for 2003): [4] () ∝ −2.1±0.1. # species per genus: [15, 13, 4] () ∝ −2.4±0.2.

Page 80:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................26 of 52

Size distributions:.More examples:..

.

# citations to papers: [5, 6] () ∝ −3. Individual wealth (maybe): () ∝ −2. Distributions of tree trunk diameters: ( ) ∝ −2. The gravitational force at a random point in theuniverse: [9] () ∝ −5/2. (See the Holtsmarkdistribution and stable distributions.) Diameter of moon craters: [10] ( ) ∝ −3. Word frequency: [13] e.g., () ∝ −2.2 (variable). # religious adherents in cults: [4] () ∝ −1.8±0.1. # sightings of birds per species (North AmericanBreeding Bird Survey for 2003): [4] () ∝ −2.1±0.1. # species per genus: [15, 13, 4] () ∝ −2.4±0.2.

Page 81:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

.Table 3 from Clauset, Shalizi, and Newman [4]:..

.

Basic parameters of the data sets described in section 6, along with their power-law fits and the corresponding p-values (statistically significant valuesare denoted in bold).

Quantity n 〈x〉 σ xmax xmin α ntail p

count of word use 18 855 11.14 148.33 14 086 7 ± 2 1.95(2) 2958 ± 987 0.49

protein interaction degree 1846 2.34 3.05 56 5 ± 2 3.1(3) 204 ± 263 0.31

metabolic degree 1641 5.68 17.81 468 4 ± 1 2.8(1) 748 ± 136 0.00Internet degree 22 688 5.63 37.83 2583 21 ± 9 2.12(9) 770 ± 1124 0.29

telephone calls received 51 360 423 3.88 179.09 375 746 120 ± 49 2.09(1) 102 592 ± 210 147 0.63

intensity of wars 115 15.70 49.97 382 2.1 ± 3.5 1.7(2) 70 ± 14 0.20

terrorist attack severity 9101 4.35 31.58 2749 12 ± 4 2.4(2) 547 ± 1663 0.68

HTTP size (kilobytes) 226 386 7.36 57.94 10 971 36.25 ± 22.74 2.48(5) 6794 ± 2232 0.00species per genus 509 5.59 6.94 56 4 ± 2 2.4(2) 233 ± 138 0.10

bird species sightings 591 3384.36 10 952.34 138 705 6679 ± 2463 2.1(2) 66 ± 41 0.55

blackouts (×103) 211 253.87 610.31 7500 230 ± 90 2.3(3) 59 ± 35 0.62

sales of books (×103) 633 1986.67 1396.60 19 077 2400 ± 430 3.7(3) 139 ± 115 0.66

population of cities (×103) 19 447 9.00 77.83 8 009 52.46 ± 11.88 2.37(8) 580 ± 177 0.76

email address books size 4581 12.45 21.49 333 57 ± 21 3.5(6) 196 ± 449 0.16

forest fire size (acres) 203 785 0.90 20.99 4121 6324 ± 3487 2.2(3) 521 ± 6801 0.05solar flare intensity 12 773 689.41 6520.59 231 300 323 ± 89 1.79(2) 1711 ± 384 1.00

quake intensity (×103) 19 302 24.54 563.83 63 096 0.794 ± 80.198 1.64(4) 11 697 ± 2159 0.00religious followers (×106) 103 27.36 136.64 1050 3.85 ± 1.60 1.8(1) 39 ± 26 0.42

freq. of surnames (×103) 2753 50.59 113.99 2502 111.92 ± 40.67 2.5(2) 239 ± 215 0.20

net worth (mil. USD) 400 2388.69 4 167.35 46 000 900 ± 364 2.3(1) 302 ± 77 0.00citations to papers 415 229 16.17 44.02 8904 160 ± 35 3.16(6) 3455 ± 1859 0.20

papers authored 401 445 7.21 16.52 1416 133 ± 13 4.3(1) 988 ± 377 0.90

hits to web sites 119 724 9.83 392.52 129 641 2 ± 13 1.81(8) 50 981 ± 16 898 0.00links to web sites 241 428 853 9.15 106 871.65 1 199 466 3684 ± 151 2.336(9) 28 986 ± 1560 0.00

We’ll explore various exponent measurementtechniques in assignments.

Page 82:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................28 of 52

power-law size distributions

.Gaussians versus power-law size distributions:..

.

Mediocristan versus Extremistan Mild versus Wild (Mandelbrot) Example: Height versus wealth.

.

.

See “The Black Swan” by NassimTaleb. [14]

Page 83:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................29 of 52

Turkeys...

From “The Black Swan” [14]

Page 84:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................30 of 52

Taleb’s table [14]

.Mediocristan/Extremistan..

.

Most typical member is mediocre/Most typical is eithergiant or tiny Winners get a small segment/Winner take almost alleffects When you observe for a while, you know what’s goingon/It takes a very long time to figure out what’s goingon Prediction is easy/Prediction is hard History crawls/History makes jumps Tyranny of the collective/Tyranny of the rare andaccidental

Page 85:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................30 of 52

Taleb’s table [14]

.Mediocristan/Extremistan..

.

Most typical member is mediocre/Most typical is eithergiant or tiny Winners get a small segment/Winner take almost alleffects When you observe for a while, you know what’s goingon/It takes a very long time to figure out what’s goingon Prediction is easy/Prediction is hard History crawls/History makes jumps Tyranny of the collective/Tyranny of the rare andaccidental

Page 86:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................30 of 52

Taleb’s table [14]

.Mediocristan/Extremistan..

.

Most typical member is mediocre/Most typical is eithergiant or tiny Winners get a small segment/Winner take almost alleffects When you observe for a while, you know what’s goingon/It takes a very long time to figure out what’s goingon Prediction is easy/Prediction is hard History crawls/History makes jumps Tyranny of the collective/Tyranny of the rare andaccidental

Page 87:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................30 of 52

Taleb’s table [14]

.Mediocristan/Extremistan..

.

Most typical member is mediocre/Most typical is eithergiant or tiny Winners get a small segment/Winner take almost alleffects When you observe for a while, you know what’s goingon/It takes a very long time to figure out what’s goingon Prediction is easy/Prediction is hard History crawls/History makes jumps Tyranny of the collective/Tyranny of the rare andaccidental

Page 88:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................30 of 52

Taleb’s table [14]

.Mediocristan/Extremistan..

.

Most typical member is mediocre/Most typical is eithergiant or tiny Winners get a small segment/Winner take almost alleffects When you observe for a while, you know what’s goingon/It takes a very long time to figure out what’s goingon Prediction is easy/Prediction is hard History crawls/History makes jumps Tyranny of the collective/Tyranny of the rare andaccidental

Page 89:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................30 of 52

Taleb’s table [14]

.Mediocristan/Extremistan..

.

Most typical member is mediocre/Most typical is eithergiant or tiny Winners get a small segment/Winner take almost alleffects When you observe for a while, you know what’s goingon/It takes a very long time to figure out what’s goingon Prediction is easy/Prediction is hard History crawls/History makes jumps Tyranny of the collective/Tyranny of the rare andaccidental

Page 90:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................30 of 52

Taleb’s table [14]

.Mediocristan/Extremistan..

.

Most typical member is mediocre/Most typical is eithergiant or tiny Winners get a small segment/Winner take almost alleffects When you observe for a while, you know what’s goingon/It takes a very long time to figure out what’s goingon Prediction is easy/Prediction is hard History crawls/History makes jumps Tyranny of the collective/Tyranny of the rare andaccidental

Page 91:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................31 of 52

Size distributions:

.

.

Power-law size distributions aresometimes calledPareto distributions after Italianscholar Vilfredo Pareto..

.

Pareto noted wealth in Italy wasdistributed unevenly (80–20 rule;misleading). Term used especially bypractitioners of the DismalScience.

Page 92:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................31 of 52

Size distributions:

.

.

Power-law size distributions aresometimes calledPareto distributions after Italianscholar Vilfredo Pareto..

.

Pareto noted wealth in Italy wasdistributed unevenly (80–20 rule;misleading). Term used especially bypractitioners of the DismalScience.

Page 93:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................31 of 52

Size distributions:

.

.

Power-law size distributions aresometimes calledPareto distributions after Italianscholar Vilfredo Pareto..

.

Pareto noted wealth in Italy wasdistributed unevenly (80–20 rule;misleading). Term used especially bypractitioners of the DismalScience.

Page 94:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................32 of 52

Devilish power-law size distribution details:

.Exhibit A:..

.

Given () = − with 0 < min < < max,the mean is ( ≠ ):⟨⟩ = − (2−

max − 2−min ) .

.

.

Mean ‘blows up’ with upper cutoff if < . Mean depends on lower cutoff if > . < : Typical sample is large. > : Typical sample is small.

Insert question from assignment 2

Page 95:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................32 of 52

Devilish power-law size distribution details:

.Exhibit A:..

.

Given () = − with 0 < min < < max,the mean is ( ≠ ):⟨⟩ = − (2−

max − 2−min ) .

.

.

Mean ‘blows up’ with upper cutoff if < . Mean depends on lower cutoff if > . < : Typical sample is large. > : Typical sample is small.

Insert question from assignment 2

Page 96:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................32 of 52

Devilish power-law size distribution details:

.Exhibit A:..

.

Given () = − with 0 < min < < max,the mean is ( ≠ ):⟨⟩ = − (2−

max − 2−min ) .

.

.

Mean ‘blows up’ with upper cutoff if < . Mean depends on lower cutoff if > . < : Typical sample is large. > : Typical sample is small.

Insert question from assignment 2

Page 97:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................32 of 52

Devilish power-law size distribution details:

.Exhibit A:..

.

Given () = − with 0 < min < < max,the mean is ( ≠ ):⟨⟩ = − (2−

max − 2−min ) .

.

.

Mean ‘blows up’ with upper cutoff if < . Mean depends on lower cutoff if > . < : Typical sample is large. > : Typical sample is small.

Insert question from assignment 2

Page 98:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................32 of 52

Devilish power-law size distribution details:

.Exhibit A:..

.

Given () = − with 0 < min < < max,the mean is ( ≠ ):⟨⟩ = − (2−

max − 2−min ) .

.

.

Mean ‘blows up’ with upper cutoff if < . Mean depends on lower cutoff if > . < : Typical sample is large. > : Typical sample is small.

Insert question from assignment 2

Page 99:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................33 of 52

And in general...

.Moments:..

.

All moments depend only on cutoffs. No internal scale that dominates/matters. Compare to a Gaussian, exponential, etc.

.For many real size distributions: < < ..

.

mean is finite (depends on lower cutoff) 2 = variance is ‘infinite’ (depends on upper cutoff) Width of distribution is ‘infinite’ If > , distribution is less terrifying and may beeasily confused with other kinds of distributions.

Insert question from assignment 3

Page 100:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................33 of 52

And in general...

.Moments:..

.

All moments depend only on cutoffs. No internal scale that dominates/matters. Compare to a Gaussian, exponential, etc.

.For many real size distributions: < < ..

.

mean is finite (depends on lower cutoff) 2 = variance is ‘infinite’ (depends on upper cutoff) Width of distribution is ‘infinite’ If > , distribution is less terrifying and may beeasily confused with other kinds of distributions.

Insert question from assignment 3

Page 101:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................33 of 52

And in general...

.Moments:..

.

All moments depend only on cutoffs. No internal scale that dominates/matters. Compare to a Gaussian, exponential, etc.

.For many real size distributions: < < ..

.

mean is finite (depends on lower cutoff) 2 = variance is ‘infinite’ (depends on upper cutoff) Width of distribution is ‘infinite’ If > , distribution is less terrifying and may beeasily confused with other kinds of distributions.

Insert question from assignment 3

Page 102:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................33 of 52

And in general...

.Moments:..

.

All moments depend only on cutoffs. No internal scale that dominates/matters. Compare to a Gaussian, exponential, etc.

.For many real size distributions: < < ..

.

mean is finite (depends on lower cutoff) 2 = variance is ‘infinite’ (depends on upper cutoff) Width of distribution is ‘infinite’ If > , distribution is less terrifying and may beeasily confused with other kinds of distributions.

Insert question from assignment 3

Page 103:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................33 of 52

And in general...

.Moments:..

.

All moments depend only on cutoffs. No internal scale that dominates/matters. Compare to a Gaussian, exponential, etc.

.For many real size distributions: < < ..

.

mean is finite (depends on lower cutoff) 2 = variance is ‘infinite’ (depends on upper cutoff) Width of distribution is ‘infinite’ If > , distribution is less terrifying and may beeasily confused with other kinds of distributions.

Insert question from assignment 3

Page 104:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................33 of 52

And in general...

.Moments:..

.

All moments depend only on cutoffs. No internal scale that dominates/matters. Compare to a Gaussian, exponential, etc.

.For many real size distributions: < < ..

.

mean is finite (depends on lower cutoff) 2 = variance is ‘infinite’ (depends on upper cutoff) Width of distribution is ‘infinite’ If > , distribution is less terrifying and may beeasily confused with other kinds of distributions.

Insert question from assignment 3

Page 105:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................33 of 52

And in general...

.Moments:..

.

All moments depend only on cutoffs. No internal scale that dominates/matters. Compare to a Gaussian, exponential, etc.

.For many real size distributions: < < ..

.

mean is finite (depends on lower cutoff) 2 = variance is ‘infinite’ (depends on upper cutoff) Width of distribution is ‘infinite’ If > , distribution is less terrifying and may beeasily confused with other kinds of distributions.

Insert question from assignment 3

Page 106:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................33 of 52

And in general...

.Moments:..

.

All moments depend only on cutoffs. No internal scale that dominates/matters. Compare to a Gaussian, exponential, etc.

.For many real size distributions: < < ..

.

mean is finite (depends on lower cutoff) 2 = variance is ‘infinite’ (depends on upper cutoff) Width of distribution is ‘infinite’ If > , distribution is less terrifying and may beeasily confused with other kinds of distributions.

Insert question from assignment 3

Page 107:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................34 of 52

Moments.Standard deviation is a mathematicalconvenience:..

.

Variance is nice analytically... Another measure of distribution width:

Mean average deviation (MAD) = ⟨| − ⟨⟩|⟩ For a pure power law with < < :⟨| − ⟨⟩|⟩ is finite. But MAD is mildly unpleasant analytically... We still speak of infinite ‘width’ if < .Insert question from assignment 2

Page 108:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................34 of 52

Moments.Standard deviation is a mathematicalconvenience:..

.

Variance is nice analytically... Another measure of distribution width:

Mean average deviation (MAD) = ⟨| − ⟨⟩|⟩ For a pure power law with < < :⟨| − ⟨⟩|⟩ is finite. But MAD is mildly unpleasant analytically... We still speak of infinite ‘width’ if < .Insert question from assignment 2

Page 109:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................34 of 52

Moments

.Standard deviation is a mathematicalconvenience:..

.

Variance is nice analytically... Another measure of distribution width:

Mean average deviation (MAD) = ⟨| − ⟨⟩|⟩ For a pure power law with < < :⟨| − ⟨⟩|⟩ is finite. But MAD is mildly unpleasant analytically... We still speak of infinite ‘width’ if < .Insert question from assignment 2

Page 110:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................34 of 52

Moments

.Standard deviation is a mathematicalconvenience:..

.

Variance is nice analytically... Another measure of distribution width:

Mean average deviation (MAD) = ⟨| − ⟨⟩|⟩ For a pure power law with < < :⟨| − ⟨⟩|⟩ is finite. But MAD is mildly unpleasant analytically... We still speak of infinite ‘width’ if < .Insert question from assignment 2

Page 111:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................34 of 52

Moments

.Standard deviation is a mathematicalconvenience:..

.

Variance is nice analytically... Another measure of distribution width:

Mean average deviation (MAD) = ⟨| − ⟨⟩|⟩ For a pure power law with < < :⟨| − ⟨⟩|⟩ is finite. But MAD is mildly unpleasant analytically... We still speak of infinite ‘width’ if < .Insert question from assignment 2

Page 112:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................34 of 52

Moments

.Standard deviation is a mathematicalconvenience:..

.

Variance is nice analytically... Another measure of distribution width:

Mean average deviation (MAD) = ⟨| − ⟨⟩|⟩ For a pure power law with < < :⟨| − ⟨⟩|⟩ is finite. But MAD is mildly unpleasant analytically... We still speak of infinite ‘width’ if < .Insert question from assignment 2

Page 113:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................35 of 52

How sample sizes grow...

.Given () ∼ −:..

.

We can show that after samples, we expect thelargest sample to be1 ≳ ′1/(−1)

Sampling from a finite-variance distribution givesa much slower growth with . e.g., for () = −, we find1 ≳ ln.

Insert question from assignment 2

Page 114:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................35 of 52

How sample sizes grow...

.Given () ∼ −:..

.

We can show that after samples, we expect thelargest sample to be1 ≳ ′1/(−1)

Sampling from a finite-variance distribution givesa much slower growth with . e.g., for () = −, we find1 ≳ ln.

Insert question from assignment 2

Page 115:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................35 of 52

How sample sizes grow...

.Given () ∼ −:..

.

We can show that after samples, we expect thelargest sample to be1 ≳ ′1/(−1)

Sampling from a finite-variance distribution givesa much slower growth with . e.g., for () = −, we find1 ≳ ln.

Insert question from assignment 2

Page 116:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................36 of 52

Complementary Cumulative Distribution Function:.CCDF:..

.

≥() = (′ ≥ ) = − (′ < ) = ∫∞′= (′)d′ ∝ ∫∞′=(′)−d′ = − + (′)−+1∣∞′= ∝ −+1

Page 117:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................36 of 52

Complementary Cumulative Distribution Function:.CCDF:..

.

≥() = (′ ≥ ) = − (′ < ) = ∫∞′= (′)d′ ∝ ∫∞′=(′)−d′ = − + (′)−+1∣∞′= ∝ −+1

Page 118:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................36 of 52

Complementary Cumulative Distribution Function:.CCDF:..

.

≥() = (′ ≥ ) = − (′ < ) = ∫∞′= (′)d′ ∝ ∫∞′=(′)−d′ = − + (′)−+1∣∞′= ∝ −+1

Page 119:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................36 of 52

Complementary Cumulative Distribution Function:.CCDF:..

.

≥() = (′ ≥ ) = − (′ < ) = ∫∞′= (′)d′ ∝ ∫∞′=(′)−d′ = − + (′)−+1∣∞′= ∝ −+1

Page 120:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................36 of 52

Complementary Cumulative Distribution Function:.CCDF:..

.

≥() = (′ ≥ ) = − (′ < ) = ∫∞′= (′)d′ ∝ ∫∞′=(′)−d′ = − + (′)−+1∣∞′= ∝ −+1

Page 121:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................36 of 52

Complementary Cumulative Distribution Function:.CCDF:..

.

≥() = (′ ≥ ) = − (′ < ) = ∫∞′= (′)d′ ∝ ∫∞′=(′)−d′ = − + (′)−+1∣∞′= ∝ −+1

Page 122:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................37 of 52

Complementary Cumulative Distribution Function:.CCDF:..

.

≥() ∝ −+1 Use when tail of follows a power law. Increases exponent by one. Useful in cleaning up data.

Page 123:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................37 of 52

Complementary Cumulative Distribution Function:.CCDF:..

.

≥() ∝ −+1 Use when tail of follows a power law. Increases exponent by one. Useful in cleaning up data.

Page 124:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................37 of 52

Complementary Cumulative Distribution Function:.CCDF:..

.

≥() ∝ −+1 Use when tail of follows a power law. Increases exponent by one. Useful in cleaning up data.

Page 125:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................37 of 52

Complementary Cumulative Distribution Function:.CCDF:..

.

≥() ∝ −+1 Use when tail of follows a power law. Increases exponent by one. Useful in cleaning up data.

Page 126:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................37 of 52

Complementary Cumulative Distribution Function:.CCDF:..

.

≥() ∝ −+1 Use when tail of follows a power law. Increases exponent by one. Useful in cleaning up data.

.

.

PDF:

−2.5 −2 −1.5 −1 −0.5 0 0.5 10

0.5

1

1.5

2

2.5

3

3.5

log10

q

log

10 N

q

CCDF:

−2.5 −2 −1.5 −1 −0.5 0 0.5 10

0.5

1

1.5

2

2.5

3

3.5

log10

q

log

10 N

> q

Page 127:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................38 of 52

Complementary Cumulative Distribution Function:

.

.

Same story for a discrete variable: () ∼ −. ≥() = (′ ≥ )

= ∞∑′= ()∝ −+1

Use integrals to approximate sums.

Page 128:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................38 of 52

Complementary Cumulative Distribution Function:

.

.

Same story for a discrete variable: () ∼ −. ≥() = (′ ≥ )= ∞∑′= ()

∝ −+1

Use integrals to approximate sums.

Page 129:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................38 of 52

Complementary Cumulative Distribution Function:

.

.

Same story for a discrete variable: () ∼ −. ≥() = (′ ≥ )= ∞∑′= ()∝ −+1

Use integrals to approximate sums.

Page 130:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................38 of 52

Complementary Cumulative Distribution Function:

.

.

Same story for a discrete variable: () ∼ −. ≥() = (′ ≥ )= ∞∑′= ()∝ −+1

Use integrals to approximate sums.

Page 131:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................39 of 52

Zipfian rank-frequency plots

.George Kingsley Zipf:..

.

Noted various rank distributionshave power-law tails, often with exponent -1(word frequency, city sizes...) Zipf’s 1949 Magnum Opus:

We’ll study Zipf’s law in depth...

Page 132:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................39 of 52

Zipfian rank-frequency plots

.George Kingsley Zipf:..

.

Noted various rank distributionshave power-law tails, often with exponent -1(word frequency, city sizes...) Zipf’s 1949 Magnum Opus:

“Human Behaviour and the Principle ofLeast-Effort”by G. K. Zipf (1949). [17]

We’ll study Zipf’s law in depth...

Page 133:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................39 of 52

Zipfian rank-frequency plots

.George Kingsley Zipf:..

.

Noted various rank distributionshave power-law tails, often with exponent -1(word frequency, city sizes...) Zipf’s 1949 Magnum Opus:

“Human Behaviour and the Principle ofLeast-Effort”by G. K. Zipf (1949). [17]

We’ll study Zipf’s law in depth...

Page 134:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................40 of 52

Zipfian rank-frequency plots

.Zipf’s way:..

.

Given a collection of entities, rank them by size,largest to smallest. = the size of the th ranked entity. = corresponds to the largest size. Example: 1 could be the frequency of occurrenceof the most common word in a text. Zipf’s observation: ∝ −

Page 135:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................40 of 52

Zipfian rank-frequency plots

.Zipf’s way:..

.

Given a collection of entities, rank them by size,largest to smallest. = the size of the th ranked entity. = corresponds to the largest size. Example: 1 could be the frequency of occurrenceof the most common word in a text. Zipf’s observation: ∝ −

Page 136:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................40 of 52

Zipfian rank-frequency plots

.Zipf’s way:..

.

Given a collection of entities, rank them by size,largest to smallest. = the size of the th ranked entity. = corresponds to the largest size. Example: 1 could be the frequency of occurrenceof the most common word in a text. Zipf’s observation: ∝ −

Page 137:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................40 of 52

Zipfian rank-frequency plots

.Zipf’s way:..

.

Given a collection of entities, rank them by size,largest to smallest. = the size of the th ranked entity. = corresponds to the largest size. Example: 1 could be the frequency of occurrenceof the most common word in a text. Zipf’s observation: ∝ −

Page 138:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................40 of 52

Zipfian rank-frequency plots

.Zipf’s way:..

.

Given a collection of entities, rank them by size,largest to smallest. = the size of the th ranked entity. = corresponds to the largest size. Example: 1 could be the frequency of occurrenceof the most common word in a text. Zipf’s observation: ∝ −

Page 139:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................40 of 52

Zipfian rank-frequency plots

.Zipf’s way:..

.

Given a collection of entities, rank them by size,largest to smallest. = the size of the th ranked entity. = corresponds to the largest size. Example: 1 could be the frequency of occurrenceof the most common word in a text. Zipf’s observation: ∝ −

Page 140:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................41 of 52

Size distributions:.Brown Corpus (1,015,945 words):..

.

CCDF:

−2.5 −2 −1.5 −1 −0.5 0 0.5 10

0.5

1

1.5

2

2.5

3

3.5

log10

q

log

10 N

> q

Zipf:

0 0.5 1 1.5 2 2.5 3 3.5−2.5

−2

−1.5

−1

−0.5

0

0.5

1

log10

rank ilo

g10 q

i

.

.

The, of, and, to, a, ... = ‘objects’ ‘Size’ = word frequency

Beep: (Important) CCDF and Zipf plots arerelated...

Page 141:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................41 of 52

Size distributions:.Brown Corpus (1,015,945 words):..

.

CCDF:

−2.5 −2 −1.5 −1 −0.5 0 0.5 10

0.5

1

1.5

2

2.5

3

3.5

log10

q

log

10 N

> q

Zipf:

0 0.5 1 1.5 2 2.5 3 3.5−2.5

−2

−1.5

−1

−0.5

0

0.5

1

log10

rank ilo

g10 q

i

.

.

The, of, and, to, a, ... = ‘objects’ ‘Size’ = word frequency Beep: (Important) CCDF and Zipf plots arerelated...

Page 142:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................42 of 52

Size distributions:.Brown Corpus (1,015,945 words):..

.

CCDF:

−2.5 −2 −1.5 −1 −0.5 0 0.5 10

0.5

1

1.5

2

2.5

3

3.5

log10

q

log

10 N

> q

Zipf:

−3 −2 −1 0 10

0.5

1

1.5

2

2.5

3

3.5

log10

qi

log

10 r

ank

i

.

.

The, of, and, to, a, ... = ‘objects’ ‘Size’ = word frequency Beep: (Important) CCDF and Zipf plots arerelated...

Page 143:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................43 of 52

.Observe:..

.

≥() = the number of objects with size at least where = total number of objects. If an object has size , then ≥() is its rank . So ∝ − = (≥())−

∝ (−+1)(−) since ≥() ∼ −+1.We therefore have 1 = (− + 1)(−) or: = 1 − 1

A rank distribution exponent of = 1 corresponds to asize distribution exponent = 2.

Page 144:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................43 of 52

.Observe:..

.

≥() = the number of objects with size at least where = total number of objects. If an object has size , then ≥() is its rank . So ∝ − = (≥())−

∝ (−+1)(−) since ≥() ∼ −+1.We therefore have 1 = (− + 1)(−) or: = 1 − 1

A rank distribution exponent of = 1 corresponds to asize distribution exponent = 2.

Page 145:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................43 of 52

.Observe:..

.

≥() = the number of objects with size at least where = total number of objects. If an object has size , then ≥() is its rank . So ∝ − = (≥())−

∝ (−+1)(−) since ≥() ∼ −+1.We therefore have 1 = (− + 1)(−) or: = 1 − 1

A rank distribution exponent of = 1 corresponds to asize distribution exponent = 2.

Page 146:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................43 of 52

.Observe:..

.

≥() = the number of objects with size at least where = total number of objects. If an object has size , then ≥() is its rank . So ∝ − = (≥())−

∝ (−+1)(−) since ≥() ∼ −+1.We therefore have 1 = (− + 1)(−) or: = 1 − 1

A rank distribution exponent of = 1 corresponds to asize distribution exponent = 2.

Page 147:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................43 of 52

.Observe:..

.

≥() = the number of objects with size at least where = total number of objects. If an object has size , then ≥() is its rank . So ∝ − = (≥())−

∝ (−+1)(−) since ≥() ∼ −+1.We therefore have 1 = (− + 1)(−) or: = 1 − 1 A rank distribution exponent of = 1 corresponds to asize distribution exponent = 2.

Page 148:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................43 of 52

.Observe:..

.

≥() = the number of objects with size at least where = total number of objects. If an object has size , then ≥() is its rank . So ∝ − = (≥())−

∝ (−+1)(−) since ≥() ∼ −+1.We therefore have 1 = (− + 1)(−) or: = 1 − 1 A rank distribution exponent of = 1 corresponds to asize distribution exponent = 2.

Page 149:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................44 of 52

The Don..Extreme deviations in test cricket:..

.

1000 10 20 30 9040 50 60 70 80

Don Bradman’s batting average= 166% next best.

That’s pretty solid. Later in the course: Understanding success—is the Mona Lisa like Don Bradman?

Page 150:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................44 of 52

The Don..Extreme deviations in test cricket:..

.

1000 10 20 30 9040 50 60 70 80

Don Bradman’s batting average= 166% next best. That’s pretty solid. Later in the course: Understanding success—is the Mona Lisa like Don Bradman?

Page 151:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................44 of 52

The Don..Extreme deviations in test cricket:..

.

1000 10 20 30 9040 50 60 70 80

Don Bradman’s batting average= 166% next best. That’s pretty solid. Later in the course: Understanding success—is the Mona Lisa like Don Bradman?

Page 152:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................44 of 52

The Don..Extreme deviations in test cricket:..

.

1000 10 20 30 9040 50 60 70 80

Don Bradman’s batting average= 166% next best. That’s pretty solid. Later in the course: Understanding success—is the Mona Lisa like Don Bradman?

Page 153:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................45 of 52

.A good eye:..

.

Page 154:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................46 of 52

Actual:Fall 2014:

0 20 40 60 80 100

Ideal:Fall 2014:

0 20 40 60 80 100

Page 155:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................47 of 52

Actual:Fall 2013:

0 20 40 60 80 100

Spring 2013:

0 20 40 60 80 100

Ideal:Fall 2013:

0 20 40 60 80 100

Spring 2013:

0 20 40 60 80 100

Page 156:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................48 of 52

References I

[1] P. Bak, K. Christensen, L. Danon, and T. Scanlon.Unified scaling law for earthquakes.Phys. Rev. Lett., 88:178501, 2002. pdf

[2] A.-L. Barabási and R. Albert.Emergence of scaling in random networks.Science, 286:509–511, 1999. pdf

[3] K. Christensen, L. Danon, T. Scanlon, and P. Bak.Unified scaling law for earthquakes.Proc. Natl. Acad. Sci., 99:2509–2513, 2002. pdf

[4] A. Clauset, C. R. Shalizi, and M. E. J. Newman.Power-law distributions in empirical data.SIAM Review, 51:661–703, 2009. pdf

Page 157:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................49 of 52

References II[5] D. J. de Solla Price.

Networks of scientific papers.Science, 149:510–515, 1965. pdf

[6] D. J. de Solla Price.A general theory of bibliometric and othercumulative advantage processes.J. Amer. Soc. Inform. Sci., 27:292–306, 1976.

[7] P. Grassberger.Critical behaviour of the Drossel-Schwabl forestfire model.New Journal of Physics, 4:17.1–17.15, 2002. pdf

[8] B. Gutenberg and C. F. Richter.Earthquake magnitude, intensity, energy, andacceleration.Bull. Seism. Soc. Am., 499:105–145, 1942. pdf

Page 158:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................50 of 52

References III[9] J. Holtsmark.

Über die verbreiterung von spektrallinien.Ann. Phys., 58:577–, 1919.

[10] M. E. J. Newman.Power laws, Pareto distributions and Zipf’s law.Contemporary Physics, 46:323–351, 2005. pdf

[11] M. I. Norton and D. Ariely.Building a better America—One wealth quintile ata time.Perspectives on Psychological Science, 6:9–12,2011. pdf

[12] L. F. Richardson.Variation of the frequency of fatal quarrels withmagnitude.J. Amer. Stat. Assoc., 43:523–546, 1949. pdf

Page 159:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................51 of 52

References IV

[13] H. A. Simon.On a class of skew distribution functions.Biometrika, 42:425–440, 1955. pdf

[14] N. N. Taleb.The Black Swan.Random House, New York, 2007.

[15] G. U. Yule.A mathematical theory of evolution, based on theconclusions of Dr J. C. Willis, F.R.S.Phil. Trans. B, 213:21–, 1924.

[16] Y.-X. Zhu, J. Huang, Z.-K. Zhang, Q.-M. Zhang,T. Zhou, and Y.-Y. Ahn.Geography and similarity of regional cuisines inchina.PLoS ONE, 8:e79161, 2013. pdf

Page 160:  · PoCS|@pocsvox Power-LawSize Distributions OurIntuition Definition Examples Wildvs.Mild CCDFs Zipf’slaw Zipf⇔CCDF Appendix References......... 14of52

PoCS|@pocsvox

Power-Law SizeDistributions

Our Intuition

Definition

Examples

Wild vs. Mild

CCDFs

Zipf’s law

Zipf ⇔ CCDF

Appendix

References

................52 of 52

References V

[17] G. K. Zipf.Human Behaviour and the Principle ofLeast-Effort.Addison-Wesley, Cambridge, MA, 1949.