Probability, Risk and (Anti)fragility


1 Risk is Not in The Past (the Turkey Problem)

Introduction: Fragility, not Statistics

Fragility (Chapter 2) can be defined as an accelerating sensitivity to a harmful stressor: this response plots as a concave curve and mathematically culminates in more harm than benefit from the disorder cluster [(i) uncertainty, (ii) variability, (iii) imperfect, incomplete knowledge, (iv) chance, (v) chaos, (vi) volatility, (vii) disorder, (viii) entropy, (ix) time, (x) the unknown, (xi) randomness, (xii) turmoil, (xiii) stressor, (xiv) error, (xv) dispersion of outcomes, (xvi) unknowledge]. Antifragility is the opposite, producing a convex response that leads to more benefit than harm. We do not need to know the history and statistics of an item to measure its fragility or antifragility, or to be able to predict rare and random ('black swan') events. All we need is to be able to assess whether the item is accelerating towards harm or benefit. The relation of fragility, convexity and sensitivity to disorder is thus mathematical and not derived from empirical data.
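The "more harm than benefit from disorder" point for a concave response is just Jensen's inequality. A minimal Python sketch (the harm function and the noise level are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def response(dose):
    # Hypothetical concave (fragile) dose-response: harm accelerates with the dose
    return -dose**2

dose = 10.0
noise = rng.normal(0, 3.0, 100_000)          # "disorder" around the average dose

steady = response(dose)                       # response to the fixed average dose
variable = response(dose + noise).mean()      # average response under variability

print(steady, variable)   # variable < steady: a concave (fragile) response is hurt by volatility
```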

The risk of breaking of the coffee cup is not necessarily in the past time series of the variable; in fact surviving objects have to have had a "rosy" past.

The problem with risk management is that "past" time series can be (and actually are) unreliable. Some finance journalist (Bloomberg) was commenting on my statement in Antifragile about our chronic inability to get the risk of a variable from the past with economic time series. "Where is he going to get the risk from since we cannot get it from the past? from the future?", he wrote. Not really, you finance-imbecile: from the present, the present state of the system. This explains in a way why the detection of fragility is vastly more potent than that of risk -- and much easier.

But this is not just a problem with journalism. Naive inference from time series is incompatible with rigorous statistical inference; yet workers with time series believe that it is statistical inference.

Turkey Problems

Definition: Take, as of time T, a standard sequence X = \{X_{t_0 + i\,\Delta t}\}_{i=0}^{N} as the discretely monitored history of a function of the process X_t over the interval (t_0, T], T = t_0 + N\,\Delta t. The estimator M_T^X(A, f) is defined as

M_T^X(A, f) \equiv \frac{\sum_{i=0}^{N} 1_A\, f(X_{t_0 + i\,\Delta t})}{\sum_{i=0}^{N} 1_A}

where 1_A : X \to \{0, 1\} is an indicator function taking the value 1 if X \in A and 0 otherwise, and f is a function of X. f(X) = 1, f(X) = X, and f(X) = X^N correspond to the probability, the first moment, and the Nth moment, respectively.

a) Standard Estimator. M_t^X(A, f) where f(x) = x and A is defined on the domain of the process X: standard measures from x, such as moments of order z, etc., "as of period" T. The measure might be useful for the knowledge of a process, but remains insufficient for decision making, as the decision-maker may be concerned for risk management purposes with the left tail (for distributions that are not entirely skewed, such as purely loss functions, such as damage from earthquakes, terrorism, etc.).

b) Standard Risk Estimator. The shortfall S = E[M | X < K], estimated by

M_T^X(A, f) \equiv \frac{\sum_{i=0}^{N} 1_A\, X_{t_0 + i\,\Delta t}}{\sum_{i=0}^{N} 1_A}, \quad A = (-\infty, K], \; f(x) = x

Criterion: The measures M or S are considered to be an estimator over the interval (t - N\,\Delta t, t] if and only if it holds in expectation over the period X_{t + i\,\Delta t}, across counterfactuals of the process, with a threshold \xi, so that

|E[M_{t + i\,\Delta t}^X(A, f)] - M_t^X(A, f)| < \xi

In other words, it should have some stability for the estimator not to be considered random: it estimates the "true" value of the variable. This is standard sampling theory; actually, it is at the core of statistics. Let us rephrase:

Standard statistical theory doesn't allow claims on estimators made in a given set unless these can "generalize", that is, reproduce out of sample, into the part of the series that has not taken place (or has not been seen), i.e., for time series, for t' > t.

An Application: Test of Turkey Problem on Macroeconomic data

Performance of Standard Parametric Risk Estimators, f(x) = x^n (Norm ℓ2)

With economic variables one single observation in 10,000, that is, one single day in 40 years, can explain the bulk of the "kurtosis", a measure of "fat tails", that is, both a measure of how much the distribution under consideration departs from the standard Gaussian, and of the role of remote events in determining the total properties. For the U.S. stock market, a single day, the crash of 1987, determined 80% of the kurtosis. The same problem is found with interest and exchange rates, commodities, and other variables. The problem is not just that the data had "fat tails", something people knew but sort of wanted to forget; it was that we would never be able to determine "how fat" the tails were. Never.

The implication is that those tools used in economics that are based on squaring variables (more technically, the Euclidian, or L-2 norm), such as standard deviation, variance, correlation, regression, or value-at-risk, the kind of stuff you find in textbooks, are not valid scientifically (except in some rare cases where the variable is bounded). The so-called "p values" you find in studies have no meaning with economic and financial variables. Even the more sophisticated techniques of stochastic calculus used in mathematical finance do not work in economics except in selected pockets.

The results of most papers in economics based on these standard statistical methods -- the kind of stuff people learn in statistics class -- are thus not expected to replicate, and they effectively don't. Further, these tools invite foolish risk taking. Neither do alternative techniques yield reliable measures of rare events, except that we can tell if a remote event is underpriced, without assigning an exact value.

From Taleb (2009), using log returns,

X_t \equiv \log\!\left(\frac{P(t)}{P(t - i\,\Delta t)}\right)

Take the measure M_t^X((-\infty, \infty), X^4) of the fourth noncentral moment

M_t^X((-\infty, \infty), X^4) \equiv \frac{1}{N} \sum_{i=0}^{N} (X_{t - i\,\Delta t})^4

and the N-sample maximum quartic observation \max\{X_{t - i\,\Delta t}^4\}_{i=0}^{N}. Q(N) is the contribution of the maximum quartic variation:

Q(N) \equiv \frac{\max\{X_{t - i\,\Delta t}^4\}_{i=0}^{N}}{\sum_{i=0}^{N} (X_{t - i\,\Delta t})^4}
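A minimal sketch of the estimator Q(N) in Python, run here on synthetic data since the macroeconomic dataset itself is not included in the text:

```python
import numpy as np

def max_quartic_share(log_returns):
    """Q(N): share of the total fourth moment owed to the single largest observation."""
    x4 = np.asarray(log_returns) ** 4
    return x4.max() / x4.sum()

# Illustration: thin-tailed vs fat-tailed returns
rng = np.random.default_rng(1)
gaussian = rng.standard_normal(10_000)
fat      = rng.standard_t(df=3, size=10_000)

print(max_quartic_share(gaussian))  # small: no single day dominates the kurtosis
print(max_quartic_share(fat))       # often a large fraction of the total, as in the table below
```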


VARIABLE              Q (Max Quartic Contr.)   N (years)
Silver                0.94                     46
SP500                 0.79                     56
CrudeOil              0.79                     26
Short Sterling        0.75                     17
Heating Oil           0.74                     31
Nikkei                0.72                     23
FTSE                  0.54                     25
JGB                   0.48                     24
Eurodollar Depo 1M    0.31                     19
Sugar #11             0.30                     48
Yen                   0.27                     38
Bovespa               0.27                     16
Eurodollar Depo 3M    0.25                     28
CT                    0.25                     48
DAX                   0.20                     18

Description of dataset: the entire dataset.

Naively, the fourth moment expresses the stability of the second moment. The higher the variations the higher...

For a Gaussian (i.e., the distribution of the square of a Chi-square distributed variable), the maximum contribution should be around .008 ± .0028.

Performance of Standard NonParametric Risk Estimators, f(x) = x or |x| (Norm ℓ1)

Does the past resemble the future?

[Scatter plots of M[t] (horizontal axis) against M[t+1] (vertical axis), both ranging up to about 0.004]

Fig 1 Comparing M[t-1, t] and M[t, t+1], where t = 1 year, 252 days, for macroeconomic data using extreme deviations, A = (-∞, -2 standard deviations (equivalent)], f(x) = x (replication of data from The Fourth Quadrant, Taleb, 2009)

Concentration of tail events without predecessors

Concentration of tail events without successors

[Scatter plot of M[t] against M[t+1] for the 4-standard-deviation threshold; both axes range up to about 0.0005]

Fig 2 These are a lot worse for large deviations, A = (-∞, -4 standard deviations (equivalent)], f(x) = x

[Scatter plot of M[t] against M[t+1] for regular deviations; both axes range up to about 0.030]

Fig 3 The "regular" is predictive of the regular, that is, mean deviation. Comparing M[t] and M[t+1 year] for macroeconomic data using regular deviations, A = (-∞, ∞), f(x) = |x|

Typical Manifestations of The Turkey Surprise


[Plot: roughly 1,000 periods of quiet history followed by a single large drop, to about -50]

Fig x The Turkey Problem (The Black Swan, 2007/2010)

When the generating process is powerlaw with low exponent, plenty of confusion can take place. For instance, take Pinker (2011) claiming that the generating process has a tail exponent ~1.15 and drawing quantitative conclusions from it. The next two figures show the realizations of two subsamples, one before, and the other after the turkey problem, illustrating the inability of a set to deliver true probabilities.

Fig x First 100 years (Sample Path): A Monte Carlo generated realization of a process of the "80/20 or 80/02 style", that is, tail exponent α = 1.1

Fig x: The Turkey Surprise: Now 200 years, the second 100 years dwarf the first; these are realizations of the exact same process, seen with a longer window and at a different scale.
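A sketch of the same experiment, assuming a Pareto-type generator with tail exponent α = 1.1; the parameter choices and the one-event-per-year convention are illustrative, not the original simulation:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha = 1.1                                      # "80/20-style" tail exponent

# Pareto(alpha) draws via inverse transform: x = u**(-1/alpha), u ~ Uniform(0,1)
events = rng.uniform(size=200) ** (-1 / alpha)   # one "event" per year, 200 years

first_century  = events[:100].sum()
both_centuries = events.sum()

print(first_century, both_centuries)
# Run it a few times: the second hundred years frequently dwarfs the first,
# because a single draw can dominate the whole sample when alpha is close to 1.
```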


Summary and Conclusion

The Problem With The Use of Statistics in Social Science

Many social scientists do not have a clear idea of the difference between science and journalism, or the one between rigorous empiricism and anecdotal statements. Science is not about making claims about a sample, but using a sample to make general claims and discuss properties that apply outside the sample.

Take M* the estimator we saw above from the realizations (a sample path) for some process, and M the "true" mean that would emanate from knowledge of the generating process for such a variable. When someone says: "Crime rate in NYC dropped between 2000 and 2010", the claim is about M* the observed mean, not M the true mean, hence the claim can be deemed merely journalistic, not scientific, and journalists are there to report "facts" not theories. No scientific and causal statement should be made from M* on "why violence has dropped" unless one establishes a link to M the true mean. M* cannot be deemed "evidence" by itself. Working with M* cannot be called "empiricism".

What I just wrote is at the foundation of statistics (and, it looks like, science). Bayesians disagree on how M* converges to M, etc., never on this point. From his statements, Pinker seems to be aware that M* may have dropped (which is a straight equality) and sort of perhaps we might not be able to make claims on M, which might not have really been dropping.

Now Pinker is excusable. The practice is widespread in social science, where academics use mechanistic techniques of statistics without understanding the properties of the statistical claims. And in some areas not involving time series, the difference between M* and M is negligible. So I rapidly jot down a few rules before showing proofs and derivations (limiting M to the arithmetic mean). Where E is the expectation operator under the "real-world" probability measure P:

Tails Sampling Property: E[|M* - M|] increases with fat-tailedness (the mean deviation of M* seen from the realizations in different samples of the same process). In other words, fat tails tend to mask the distributional properties. (A simulation sketch follows this list.)

Counterfactual Property: Another way to view the previous point: μ[M*], the distance between different values of M* one gets from repeated sampling of the process (say counterfactual history), increases with fat tails.

Survivorship Bias Property: E[M* - M] increases under the presence of an absorbing barrier for the process. (Casanova effect)

Left Tail Sample Insufficiency: E[M* - M] increases with negative skewness of the true underlying variable.

Asymmetry in Inference: Under both negative skewness and fat tails, negative deviations from the mean are more informational than positive deviations.

Power of Extreme Deviations (N=1 is OK): Under fat tails, large deviations from the mean are vastly more informational than small ones. They are not "anecdotal". (The last two properties correspond to the black swan problem.)
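A minimal sketch of the Tails Sampling and Counterfactual properties: compare how the sample mean M* scatters across counterfactual samples of the same size. The distributions and sample sizes below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples, n_obs = 1_000, 1_000

# Thin-tailed process: Gaussian with unit variance
thin_means = rng.standard_normal((n_samples, n_obs)).mean(axis=1)

# Fat-tailed process: Student t with 2.5 degrees of freedom (finite mean, fat tails)
fat_means = rng.standard_t(df=2.5, size=(n_samples, n_obs)).mean(axis=1)

# Mean absolute deviation of M* around the true mean M (0 in both cases)
print(np.abs(thin_means).mean())   # small and stable
print(np.abs(fat_means).mean())    # noticeably larger: M* tells us less about M
```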

The Problem of Past Time Series

The four aspects of what we will call the nonreplicability issue, particularly for measures that are in the tails:

a- Statistical rigor (or Pinker Problem). The idea that an estimator is not about fitness to past data, but related to how it can capture future realizations of a process, seems absent from the discourse. Much of econometrics/risk management methods do not meet this simple point and the rigor required by orthodox, basic statistical theory.

b- Statistical argument on the limit of knowledge of tail events. Problems of replicability are acute for tail events. Tail events are impossible to price owing to the limitations from the size of the sample. Naively, rare events have little data, hence what estimator we may have is noisier.

c- Mathematical argument about statistical decidability. No probability without metaprobability. Metadistributions matter more with tail events, and with fat-tailed distributions.

• The soft problem: we accept the probability distribution, but the imprecision in the calibration (or parameter errors) percolates in the tails.

• The hard problem (Taleb and Pilpel, 2001, Taleb and Douady, 2009): we need to specify an a priori probability distribution from which we depend, or alternatively, propose a metadistribution with compact support.

• Both problems are bridged in that a nested stochastization of standard deviation (or the scale of the parameters) for a Gaussian turns a thin-tailed distribution into a powerlaw (and stochastization that includes the mean turns it into a jump-diffusion or mixed-Poisson); a small simulation below illustrates the fattening.

d- Economic arguments: the Friedman-Phelps and Lucas critiques, Goodhart's law. Acting on statistical information (a metric, a response) changes the statistical properties of some processes.
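A sketch of the "nested stochastization" point: randomizing the Gaussian's own scale fattens the tails dramatically. The lognormal choice for the random volatility is an assumption for illustration; a power law proper requires a specific mixing distribution (e.g., an inverse-gamma variance, which yields a Student t):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 1_000_000

# Plain Gaussian: excess kurtosis ~ 0
plain = rng.standard_normal(n)

# Same Gaussian, but with a randomized (stochastic) standard deviation
sigma = rng.lognormal(mean=0.0, sigma=0.75, size=n)
stochastic_vol = sigma * rng.standard_normal(n)

print(stats.kurtosis(plain))            # ~ 0
print(stats.kurtosis(stochastic_vol))   # large and unstable: the tails have fattened
```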


2 Preasymptotics and The Central Limit in the Real World

An Erroneous Notion of Limit:

Take the conventional formulation of the Central Limit Theorem (Grimmett & Stirzaker, 1982; Feller 1971, Vol. II):

Let X_1, X_2, ... be a sequence of independent identically distributed random variables with mean m and variance σ² satisfying m < ∞ and 0 < σ² < ∞; then

\frac{\sum_{i=1}^{n} X_i - n\,m}{\sigma \sqrt{n}} \xrightarrow{D} N(0, 1) \quad \text{as } n \to \infty

where →D denotes convergence "in distribution". Granted, convergence "in distribution" is about the weakest form of convergence. Effectively we are dealing with a double problem.

The first, as uncovered by Jaynes, corresponds to the abuses of measure theory: some properties that hold at infinity might not hold in all limiting processes -- a manifestation of the classical problem of uniform and pointwise convergence.

Jaynes 2003 (p. 44): "The danger is that the present measure theory notation presupposes the infinite limit already accomplished, but contains no symbol indicating which limiting process was used (...) Any attempt to go directly to the limit can result in nonsense."

Granted, Jaynes is still too Platonic (he falls headlong for the Gaussian by mixing thermodynamics and information). But we accord with him on this point -- along with the definition of probability as information incompleteness, about which later.

The second problem is that we do not have a "clean" limiting process --the process is itself idealized.

Now how should we look at the Central Limit Theorem? Let us see how we arrive at it assuming "independence".

The Problem of Convergence

The CLT does not fill in uniformly, but in a Gaussian way -- indeed, disturbingly so. Simply, whatever your distribution (assuming one mode), your sample is going to be skewed to deliver more central observations, and fewer tail events. The consequence is that, under aggregation, the sum of these variables will converge "much" faster in the body of the distribution than in the tails. As N, the number of observations, increases, the Gaussian zone should cover more grounds... but not in the "tails".

This quick note shows the intuition of the convergence and presents the difference between distributions.

Take the sum of random independent variables X_i with finite variance under distribution φ(X). Assume 0 mean for simplicity (and symmetry, absence of skewness, to simplify). A more useful formulation of the Central Limit Theorem (Kolmogorov et al., x):

P\!\left[-u \le Z = \frac{\sum_{i=0}^{n} X_i}{\sqrt{n}\,\sigma} \le u\right] = \frac{1}{\sqrt{2\pi}} \int_{-u}^{u} e^{-\frac{Z^2}{2}} \, dZ

So the distribution is going to be:


\frac{1}{\sqrt{2\pi}} \int_{-u}^{u} e^{-\frac{Z^2}{2}} \, dZ \quad \text{for } -u \le z \le u

inside the "tunnel" [-u, u] -- the odds of falling inside the tunnel itself -- and

\int_{-\infty}^{-u} \varphi'[n](Z) \, dz + \int_{u}^{\infty} \varphi'[n](Z) \, dz

outside the tunnel [-u, u],

where φ'[n] is the n-summed distribution of φ. How φ'[n] behaves is a bit interesting here -- it is distribution dependent.
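A sketch of the body-versus-tails convergence, using standardized sums of n exponential variables against the Gaussian limit; the choice of distribution, of n, and of the thresholds is illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, trials = 30, 200_000

# Standardized sums of n exponential(1) variables
sums = rng.exponential(size=(trials, n)).sum(axis=1)
z = (sums - n) / np.sqrt(n)

for u in (1.0, 2.0, 4.0):
    empirical = (np.abs(z) > u).mean()
    gaussian = 2 * stats.norm.sf(u)
    print(u, empirical, gaussian)
# At u = 1 the match is already decent; at u = 4 the empirical tail remains several
# times the Gaussian value: the "tunnel" fills in long before the tails do.
```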

Width of the Tunnel [-u, u]

Clearly we do not have a "tunnel", but rather a statistical area for crossover points. I break it into two situations:

1) Case 1: The distribution φ(x) is not scale free, i.e., for x large, \frac{\varphi(n x)}{\varphi(x)} > \frac{\varphi(2 n x)}{\varphi(n x)}; in other words the distribution has an exponential tail e^{-k x}. In other words: it has all the moments.

2) Case 2: The distribution φ(x) is scale free, i.e., for x large, \frac{\varphi(n x)}{\varphi(x)} depends on n, not x.

Dealing With the Distribution of the Summed Distribution φ

Assume the very simple case of a mixed distribution, where X follows a Gaussian (μ1, σ1) with probability p and with probability (1 - p) follows another Gaussian (μ2, σ2). Where (1 - p) is very small, μ2 very large and σ2 small, we can be dealing with a jump (at the limit it becomes a Poisson). Alternatively, a route I am taking here for simplification of the calculations, I can take means of 0, and the variance in the small-probability case to be very large, leading to a huge, but unlikely jump.

Take ξ(t) the characteristic function, ξ^n the one under n-convolutions.

3) Using Log Cumulants & Observing Convergence to the Gaussian

The normalized cumulant of order n, C(n), is the nth derivative of the log of the characteristic function φ, which we convolute N times, divided by the second cumulant (i.e., second moment):

C(n, N) = \frac{(-i)^n \, \partial_z^n \log(\phi^N)}{\left(-\partial_z^2 \log(\phi)\, N\right)^{n-1}} \;\Big|_{z \to 0}

Since C(N+M) = C(N) + C(M), the additivity of the log characteristic function under convolution makes it easy to see the speed of the convergence to the Gaussian. Fat tails imply that higher moments implode -- not just the 4th.

Table of Normalized Cumulants -- Speed of Convergence (dividing by σ^n, where n is the order of the cumulant):

Distribution            Normal(μ, σ)                       Poisson(λ)                  Exponential(λ)         Γ(a, b)
PDF                     e^(-(x-μ)^2/(2σ^2)) / (√(2π) σ)    e^(-λ) λ^x / x!             λ e^(-x λ)             b^(-a) e^(-x/b) x^(a-1) / Γ(a)
N-convoluted log CF     N log(e^(i z μ - z^2 σ^2/2))       N log(e^((-1+e^(i z)) λ))   N log(λ / (λ - i z))   N log((1 - i b z)^(-a))
2nd cumulant            1                                  1                           1                      1
3rd                     0                                  1/(N λ)                     2 λ / N                2 / (a b N)
4th                     0                                  1/(N λ)^2                   3! λ^2 / N^2           3! / (a b N)^2
5th                     0                                  1/(N λ)^3                   4! λ^3 / N^3           4! / (a b N)^3
6th                     0                                  1/(N λ)^4                   5! λ^4 / N^4           5! / (a b N)^4
7th                     0                                  1/(N λ)^5                   6! λ^5 / N^5           6! / (a b N)^5
8th                     0                                  1/(N λ)^6                   7! λ^6 / N^6           7! / (a b N)^6
9th                     0                                  1/(N λ)^7                   8! λ^7 / N^7           8! / (a b N)^7
10th                    0                                  1/(N λ)^8                   9! λ^8 / N^8           9! / (a b N)^8

Distribution            Mixed Gaussians (Stoch Vol)                                             StudentT(3)                      StudentT(4)
PDF                     p e^(-x^2/(2σ1^2))/(√(2π) σ1) + (1-p) e^(-x^2/(2σ2^2))/(√(2π) σ2)       6√3 / (π (x^2 + 3)^2)            12 (1/(x^2 + 4))^(5/2)
N-convoluted log CF     N log(p e^(-z^2 σ1^2/2) + (1-p) e^(-z^2 σ2^2/2))                        N (log(√3 |z| + 1) - √3 |z|)     N log(2 |z|^2 K_2(2 |z|))
2nd cumulant            1                                                                       1                                1
3rd                     0                                                                       Ind                              0
4th                     -(3 (-1+p) p (σ1^2 - σ2^2)^2) / (N^2 (p σ1^2 - (-1+p) σ2^2)^3)          Ind                              Ind
5th                     0                                                                       Ind                              Ind
6th                     (15 (-1+p) p (-1+2p) (σ1^2 - σ2^2)^3) / (N^4 (p σ1^2 - (-1+p) σ2^2)^5)  Ind                              Ind
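A numerical check of the table's message for one case. Under the standard normalization (fourth cumulant over variance squared, i.e., excess kurtosis), the sum of N exponentials approaches the Gaussian value of 0 only as fast as 6/N; this normalization is an assumption chosen for the check and differs slightly from the table's own:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Excess kurtosis of a sum of N exponential(1) variables is 6/N
for N in (1, 10, 100):
    sums = rng.exponential(size=(200_000, N)).sum(axis=1)
    print(N, stats.kurtosis(sums), 6 / N)
```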

Note: On "Infinite Kurtosis" -- Discussion

Note on Chebyshev's inequality and the upper bound on deviations under finite variance. Even a finite variance does not mean much. Consider Chebyshev's inequality:

P[X > a] \le \frac{\sigma^2}{a^2}

P[X > n\,\sigma] \le \frac{1}{n^2}

which effectively accommodates power laws but puts a bound on the probability of large deviations -- a bound that is still significant.

The Effect of Finiteness of Variance

This table shows the probability of exceeding a certain σ for the Gaussian, expressed as odds of 1 in x, against the Chebyshev bound, the best that can be said for any distribution with finite variance:

Deviation    Gaussian (1 in ...)    Chebyshev bound (1 in ...)
3            7 × 10^2               9
4            3 × 10^4               16
5            3 × 10^6               25
6            1 × 10^9               36
7            8 × 10^11              49
8            2 × 10^15              64
9            9 × 10^18              81
10           1 × 10^23              100
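The table can be regenerated directly; a short sketch, with the odds expressed as "1 in x" to match the table:

```python
from scipy import stats

print("sigma   1/P Gaussian      Chebyshev bound")
for k in range(3, 11):
    gaussian_odds = 1 / stats.norm.sf(k)   # one-sided odds of exceeding k sigma under the Gaussian
    chebyshev_odds = k**2                  # P(|X| > k sigma) <= 1/k^2 for any finite-variance distribution
    print(f"{k:>5}   {gaussian_odds:14.3g}   {chebyshev_odds:>6}")
```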

Extreme Value Theory: Fuhgetaboudit

Extreme Value Theory has been considered a panacea for dealing with extreme events by some "risk modelers". On paper it looks great. But only on paper. The problem is the calibration and parameter uncertainty -- in the real world we don't know the parameters. The ranges in the probabilities we get are monstrous. This is a short presentation of the idea, followed by an exposition of the difficulty.

What is Extreme Value Theory? A Simplified Exposition

Case 1, Thin Tailed Distribution

Let us proceed with a simple example.

The Extremum of a Gaussian variable: Say we generate N Gaussian variables \{Z_i\}_{i=1}^{N} with mean 0 and unitary standard deviation, and take the highest value we find. We take the upper bound E_j for the N-size sample run j:

E_j = \max \{Z_{i,j}\}_{i=1}^{N}

Assume we do so M times, to get M samples of maxima for the set E:

E = \left\{ \max \{Z_{i,j}\}_{i=1}^{N} \right\}_{j=1}^{M}

The next figure will plot a histogram of the result.

Figure 1: Taking M samples of Gaussian maxima; here N= 30,000, M=10,000. We get the Mean of the maxima = 4.11159 Standard Deviation= 0.286938; Median = 4.07344

Let us fit to the sample an Extreme Value Distribution (Gumbel) with location and scale parameters a and b, respectively:

f(x; a, b) = \frac{1}{b} \, e^{-\frac{x-a}{b} - e^{-\frac{x-a}{b}}}


Figure 2: Fitting an extreme value distribution (Gumbel) a= 3.97904, b= 0.235239
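A sketch of the experiment behind Figures 1 and 2, with the same N and M as in the text; the use of scipy's gumbel_r for the fit (its location and scale play the roles of a and b) is an assumption about tooling, not the original code:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
N, M = 30_000, 10_000

# M samples of the maximum of N standard Gaussians
maxima = np.array([rng.standard_normal(N).max() for _ in range(M)])
print(maxima.mean(), maxima.std(), np.median(maxima))   # roughly 4.11, 0.29, 4.07

# Fit a Gumbel (Extreme Value) distribution: location a, scale b
a, b = stats.gumbel_r.fit(maxima)
print(a, b)   # roughly 3.98 and 0.24, as in Figure 2
```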

So far, beautiful. Let us next move to fat(ter) tails.

Case 2, Fat-Tailed Distribution

Now let us generate, exactly as before, but change the distribution: N random powerlaw distributed variables Z_i, with tail exponent μ = 3, generated from a Student T Distribution with 3 degrees of freedom. Again, we take the upper bound. This time it is not the Gumbel, but the Fréchet distribution that would fit the result:

f(x; \alpha, \beta) = \frac{\alpha}{\beta} \, e^{-\left(\frac{x}{\beta}\right)^{-\alpha}} \left(\frac{x}{\beta}\right)^{-1-\alpha}, \quad \text{for } x > 0

Figure 3: Fitting a Fréchet distribution to the Student T generated with 3 degrees of freedom. The Fréchet distribution α = 3, β = 32 fits up to higher values of E. The next two graphs show the fit more closely.


Figure 5: Q-Q plot. Fits up to extremely high values of E, the rest of course owing to sample insufficiency for extremely large values, a bias that typically causes the underestimation of tails, as the points tend to fall to the right.

Figure 4: Seen more closely

How Extreme Value Has a Severe Inverse Problem In the Real World

In the previous case we started with the distribution, with the assumed parameters, then got the corresponding values, as these "risk modelers" do. In the real world, we don't quite know the calibration, the α of the distribution, assuming (generously) that we know the distribution. So here we go with the inverse problem. The next table illustrates the different calibrations of P_K, the probabilities that the maximum exceeds a certain value K (as a multiple of β), under different values of K and α.


α        1/P(>3β)     1/P(>10β)    1/P(>20β)
1.       3.52773      10.5083      20.5042
1.25     4.46931      18.2875      42.7968
1.5      5.71218      32.1254      89.9437
1.75     7.3507       56.7356      189.649
2.       9.50926      100.501      400.5
2.25     12.3517      178.328      846.397
2.5      16.0938      316.728      1789.35
2.75     21.0196      562.841      3783.47
3.       27.5031      1000.5       8000.5
3.25     36.0363      1778.78      16918.4
3.5      47.2672      3162.78      35777.6
3.75     62.048       5623.91      75659.8
4.       81.501       10000.5      160000.
4.25     107.103      17783.3      338359.
4.5      140.797      31623.3      715542.
4.75     185.141      56234.6      1.51319 × 10^6
5.       243.5        100001.      3.2 × 10^6

Consider that the error in estimating the α of a distribution is quite large, often > 1/2, and typically overestimated. So we can see that we get the probabilities mixed up by more than an order of magnitude. In other words, the imprecision in the computation of the α compounds in the evaluation of the probabilities of extreme values.
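A sketch that regenerates the table above and shows how an error in α propagates; the ±0.5 error band around α = 3 is an assumption chosen for illustration:

```python
import numpy as np

def inv_p_exceed(k, alpha):
    """Odds (1 in x) that a Frechet(alpha, beta) variable exceeds k * beta."""
    return 1 / (1 - np.exp(-k ** (-alpha)))

for alpha in (1.5, 2.0, 2.5, 3.0):
    print(alpha, [round(inv_p_exceed(k, alpha), 1) for k in (3, 10, 20)])

# A +/- 0.5 error band around alpha = 3 spans an order of magnitude
# in the odds of the maximum exceeding 10 * beta:
print(inv_p_exceed(10, 2.5), inv_p_exceed(10, 3.0), inv_p_exceed(10, 3.5))
```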


3 On the Difference Between Binaries and Vanillas

This explains how and where prediction markets (or, more generally, discussions of betting matters) do not correspond to reality and have little to do with exposures to fat tails and "Black Swan" effects. Elementary facts, but with implications. This shows how, for instance, the "long shot bias" is misapplied in real life variables, and why political predictions are more robust than economic ones. This discussion is based on Taleb (1997) showing the difference between a binary and a vanilla option.

Definitions

A binary bet (or just "a binary" or "a digital"): an outcome with payoff 0 or 1 (or, yes/no, -1/1, etc.) Example: "prediction market", "election", most games and "lottery tickets". Also called digital. Any statistic based on a YES/NO switch.

Binaries are effectively bets on probability. They are rarely ecological, except for political predictions.(More technically, they are mapped by the Heaviside function.)

An exposure or "vanilla": an outcome with no preset limit: say "revenues", "market crash", "casualties from war", "success", "growth", "inflation", "epidemics"... in other words, about everything.

Exposures are generally "expectations", or the arithmetic mean, never bets on probability, rather the pair probability × payoff.

A bounded exposure: an exposure (vanilla) with an upper and lower bound: say an insurance policy with a cap, or a lottery ticket. When the boundary is close, it approaches a binary bet in properties. When the boundary is remote (and unknown), it can be treated like a pure exposure. The idea of "clipping tails" of exposures transforms them into such a category.

The Problem

The properties of binaries diverge from those of vanilla exposures. This note is to show how conflation of the two takes place: prediction markets, the ludic fallacy (using the world of games to apply to real life).

1. They have diametrically opposite responses to skewness.
2. They respond differently to fat-tailedness (sometimes in opposite directions). Fat tails make binaries more tractable.
3. A rise in complexity lowers the value of the binary and increases that of the exposure.

Some direct applications:

1- Studies of “long shot biases” that typically apply to binaries should not port to vanillas.

2- Many are surprised that I find many econometricians total charlatans, while finding Nate Silver to be immune to my problem. This explains why.

3- Why prediction markets provide very limited information outside specific domains.

4- Etc.

The Elementary Betting Mistake

One can hold beliefs that a variable can go lower yet bet that it is going higher. Simply, the digital and the vanilla diverge: P(X > X_0) > 1/2, but E(X) < E(X_0). This is normal in the presence of skewness and extremely common with economic variables. Philosophers have a related problem called the lottery paradox, which in statistical terms is not a paradox.

The Elementary Fat Tails Mistake
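A sketch of the betting mistake with a skewed toy variable (the payoff numbers are invented for illustration): the digital bet and the vanilla point in opposite directions.

```python
import numpy as np

# Hypothetical skewed variable: small gain 90% of the time, large loss 10% of the time
outcomes      = np.array([+1.0, -15.0])
probabilities = np.array([0.9, 0.1])

p_up = probabilities[outcomes > 0].sum()           # the binary: P(X > 0)
expectation = (outcomes * probabilities).sum()     # the vanilla: E[X]

print(p_up)         # 0.9  -> the digital bet says "up"
print(expectation)  # -0.6 -> the exposure loses money
```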

A slightly more difficult problem. When I ask economists or social scientists, "what happens to the probability of a deviation > 1σ when you fatten the tail (while preserving other properties)?", almost all answer: it increases (so far all have made the mistake). Wrong. They miss the idea that fat tails is the contribution of the extreme events to the total properties, and that it is the pair probability × payoff that matters, not just probability.


I've asked variants of the same question. "The Gaussian distribution spends 68.2% of the time between ±1 standard deviation. The real world has fat tails. In finance, how much time do stocks spend between ±1 standard deviations?" The answer has been invariably "lower". Why? "Because there are more deviations." Sorry, there are fewer deviations: stocks spend between 78% and 98% of the time between ±1 standard deviations (computed from past samples).

Some simple derivations: Let x follow a Gaussian distribution (μ, σ). Assume μ = 0 for the exercise. What is the probability of exceeding one standard deviation?

P_{>1\sigma} = 1 - \frac{1}{2}\,\mathrm{erfc}\!\left(-\frac{1}{\sqrt{2}}\right)

where erfc is the complementary error function; P_{>1σ} = P_{<-1σ} ≈ 15.86%, and the probability of staying within the "stability tunnel" between ±1σ is ≈ 68.2%.

Let us fatten the tail, using a standard method of linear combination of two Gaussians with two standard deviations separated by σ√(1+a) and σ√(1-a), where a is the "vvol" (which is variance preserving, a technicality of no big effect here, as a standard-deviation-preserving spreading gives the same qualitative result). Such a method leads to an immediate raising of the kurtosis by a factor of (1 + a²), since E(x⁴)/E(x²)² = 3(a² + 1). Then

P_{>1\sigma} = P_{<-1\sigma} = 1 - \frac{1}{4}\,\mathrm{erfc}\!\left(-\frac{1}{\sqrt{2}\sqrt{1-a}}\right) - \frac{1}{4}\,\mathrm{erfc}\!\left(-\frac{1}{\sqrt{2}\sqrt{1+a}}\right)

So, for different values of a, as we can see, the probability of staying inside 1 sigma increases.
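A sketch checking both claims for the variance-preserving mixture (a is the "vvol"; the value 0.8 is an arbitrary choice): the kurtosis rises to 3(1 + a²) while the fraction of observations inside ±1σ increases.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, a = 2_000_000, 0.8

# Variance-preserving mixture: half the mass with sigma*sqrt(1-a), half with sigma*sqrt(1+a)
scale = np.where(rng.random(n) < 0.5, np.sqrt(1 - a), np.sqrt(1 + a))
x = scale * rng.standard_normal(n)

print(stats.kurtosis(x, fisher=False), 3 * (1 + a**2))   # ~4.92 for a = 0.8
print(np.mean(np.abs(x) < 1))                            # > 0.682: more time inside the tunnel
```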

[Density plots of the mixture for several values of a over the range -4 to 4; peak heights rise from about 0.4 to 0.6 as a increases]

Fatter and fatter tails: different values of a. We notice that a higher peak implies a lower probability of leaving the ±1σ tunnel.

The Event Timing Mistake

Fatter tails increase the time spent between deviations, giving the illusion of absence of volatility when in fact events are delayed and made worse (my critique of the "Great Moderation").

Stopping Time & Fattening of the Tails of a Brownian Motion: Consider the distribution of the time it takes for a continuously monitored Brownian motion S to exit from a "tunnel" with a lower bound L and an upper bound H. Counterintuitively, fatter tails make an exit (at some sigma) take longer. You are likely to spend more time inside the tunnel -- since exits are far more dramatic.

ψ is the distribution of the exit time τ, where τ ≡ inf{t: S ∉ [L, H]}.

From Taleb (1997) we have the following approximation:

\psi(t \mid \sigma) = \frac{\pi \sigma^2}{(\log H - \log L)^2} \, e^{-\frac{t \sigma^2}{8}} \sum_{n=1}^{m} \frac{1}{H\,L} (-1)^n \, n\, S \, e^{-\frac{n^2 \pi^2 t \sigma^2}{2 (\log H - \log L)^2}} \left[ L \sin\!\left(\frac{n \pi (\log L - \log S)}{\log H - \log L}\right) - H \sin\!\left(\frac{n \pi (\log H - \log S)}{\log H - \log L}\right) \right]

and the fatter-tailed distribution from mixing Brownians with σ separated by a coefficient a:


\psi(t \mid \sigma, a) = \frac{1}{2}\,\psi\!\left(t \mid \sigma\sqrt{1-a}\right) + \frac{1}{2}\,\psi\!\left(t \mid \sigma\sqrt{1+a}\right)

This graph shows the lengthening of the stopping time between events coming from fatter tails.

[Left: distribution of the exit time (probability against exit time, 2 to 8). Right: expected exit time (3 to 8) as a function of v, the vvol (0.1 to 0.7).]
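A Monte Carlo sketch of the exit-time claim; the tunnel [L, H], the drift-free log dynamics, and the value of a are assumptions chosen for illustration, not the original computation. Mixing the two volatilities lengthens the average exit time because the expected exit time is convex in 1/σ².

```python
import numpy as np

rng = np.random.default_rng(9)

def mean_exit_time(sigmas, L=0.9, H=1.1, S0=1.0, dt=1/252, n_paths=4_000, max_steps=10_000):
    """Average time for driftless log-price paths to leave [L, H]; half the paths use each sigma."""
    times = np.empty(n_paths)
    for j in range(n_paths):
        sigma = sigmas[j % len(sigmas)]
        s = np.log(S0)
        for t in range(1, max_steps + 1):
            s += sigma * np.sqrt(dt) * rng.standard_normal()
            if not (np.log(L) < s < np.log(H)):
                break
        times[j] = t * dt
    return times.mean()

a = 0.8
print(mean_exit_time([0.2]))                                         # single volatility
print(mean_exit_time([0.2 * np.sqrt(1 - a), 0.2 * np.sqrt(1 + a)]))  # fatter tails: longer expected wait
```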

More Complicated: MetaProbabilities (TK)

{This is a note. A more advanced discussion explains why a more uncertain mean (vanilla) might mean a less uncertain probability (prediction), etc. Also see "Why We Don't Know What We Are Talking About When We Talk About Probability".}
