Appendix A: Symbols, Useful Formulas, and Normal Table

A.1 Glossary of Symbols

Ω                              sample space
P(B | A)                       conditional probability
{π(1), ..., π(n)}              permutation of {1, ..., n}
F                              CDF
F̄                              1 − F (survival function)
F^{-1}, Q                      quantile function
iid, IID                       independent and identically distributed
p(x, y); f(x_1, ..., x_n)      joint pmf; joint density
F(x_1, ..., x_n)               joint CDF
f(y | x); E(Y | X = x)         conditional density and expectation
var, Var                       variance
Var(Y | X = x)                 conditional variance
Cov                            covariance
ρ_{X,Y}                        correlation
G(s); ψ(t)                     generating function; mgf or characteristic function
ψ(t_1, ..., t_n)               joint mgf
β, γ                           skewness and kurtosis
μ_k                            E(X − μ)^k, the kth central moment
m_k                            sample kth central moment
κ_r                            rth cumulant
γ_r                            rth standardized cumulant
ρ_r                            correlation of lag r
r, θ                           polar coordinates
J                              Jacobian
F_n                            empirical CDF
F_n^{-1}                       sample quantile function
X^{(n)}                        sample observation vector (X_1, ..., X_n)
M_n                            sample median
X_{(k)}, X_{k:n}               kth-order statistic


W_n                            sample range
IQR                            interquartile range
⇒_P, →_P                       convergence in probability
o_p(1)                         convergence in probability to zero
O_p(1)                         bounded in probability
a_n ≍ b_n                      0 < lim inf a_n/b_n ≤ lim sup a_n/b_n < ∞
a_n ∼ b_n                      lim a_n/b_n = 1
⇒_{a.s.}, →_{a.s.}             almost sure convergence
w.p. 1                         with probability 1
a.e.                           almost everywhere
i.o.                           infinitely often
⇒_L, →_L                       convergence in distribution
⇒_r, →_r                       convergence in rth mean
u.i.                           uniformly integrable
LIL                            law of the iterated logarithm
VST                            variance-stabilizing transformation
δ_x                            point mass at x
P({x})                         probability of the point x
λ                              Lebesgue measure
∗                              convolution
≪                              absolute continuity
dP/dμ                          Radon–Nikodym derivative
⊗                              product measure; Kronecker product
I(θ)                           Fisher information function or matrix
T                              natural parameter space
f(θ | x); π(θ | X^{(n)})       posterior density of θ
LRT                            likelihood ratio test
Λ_n                            likelihood ratio
S                              sample covariance matrix
T²                             Hotelling's T² statistic
MLE                            maximum likelihood estimate
l(θ, X)                        complete data likelihood in EM
L(θ, X)                        log l(θ, X)
l(θ, Y)                        likelihood for observed data
L(θ, Y)                        log l(θ, Y)
L̂_k(θ, Y)                      function to be maximized in the M-step
H_Boot                         bootstrap distribution of a statistic
P^∗                            bootstrap measure
p_{ij}                         transition probabilities in a Markov chain


p_{ij}(n)                      n-step transition probabilities
T_i                            first passage time
π                              stationary distribution of a Markov chain
S_n                            random walk; partial sums
τ(x, n); τ(x, T)               local time
W(t), B(t)                     Brownian motion and Brownian bridge
W_d(t)                         d-dimensional Brownian motion
γ(s, t)                        covariance kernel
X(t), N(t)                     Poisson process
Π                              Poisson point process
α_n(t), u_n(y)                 uniform empirical and quantile processes
F_n(t), β_n(t)                 general empirical process
P_n                            empirical measure
D_n                            Kolmogorov–Smirnov statistic
MCMC                           Markov chain Monte Carlo
q_{ij}                         proposal probabilities
α_{ij}                         acceptance probabilities
S(n, C)                        shattering coefficient
SLEM                           second largest eigenvalue in modulus
δ(P)                           Dobrushin's coefficient
V(x)                           energy or drift function
VC(C)                          VC dimension
N(ε, F, ||·||)                 covering number
N_{[]}(ε, F, ||·||)            bracketing number
D(x, ε, P)                     packing number
λ(x)                           intensity function of a Poisson process
R                              real line
R^d                            d-dimensional Euclidean space
C(X)                           real-valued continuous functions on X
C_k(R)                         k times continuously differentiable functions
C_0(R)                         real continuous functions f on R such that f(x) → 0 as |x| → ∞
F                              family of functions
∇, Δ                           gradient vector and Laplacian
f^{(m)}                        mth derivative
D^k f(x_1, ..., x_n)           Σ_{m_1,...,m_n ≥ 0; m_1+···+m_n = k} ∂^k f / (∂x_1^{m_1} ··· ∂x_n^{m_n})
D^+, D_+                       Dini derivatives
||·||                          Euclidean norm
||·||_∞                        supnorm
tr                             trace of a matrix
|A|                            determinant of a matrix
K(x), K(x, y)                  kernels


||x||_1, ||x||                 L_1 and L_2 norms in R^n
H                              Hilbert space
||x||_H                        Hilbert norm
(x, y)                         inner product
g(X), g_n                      classification rules
φ_x, φ(x)                      Mercer's feature maps
B(x, r)                        sphere with center at x and radius r
U, U°, Ū                       domain, interior, and closure
I_A                            indicator function of A
I(t)                           large deviation rate function
{·}                            fractional part
⌊·⌋                            integer part
sgn, sign                      signum function
x^+, x_+                       max{x, 0}
max, min                       maximum, minimum
sup, inf                       supremum, infimum
K_ν, I_ν, J_ν                  Bessel functions
L_p(μ), L^p(μ)                 set of functions such that ∫ |f|^p dμ < ∞
d, L, Δ                        Kolmogorov, Lévy, and total variation distances
H, K                           Hellinger and Kullback–Leibler distances
W, ℓ_2                         Wasserstein distance
D                              separation distance
d_f                            f-divergence
N(μ, σ²)                       normal distribution
φ, Φ                           standard normal density and CDF
N_p(μ, Σ), MVN(μ, Σ)           multivariate normal distribution
BVN                            bivariate normal distribution
MN(n, p_1, ..., p_k)           multinomial distribution with these parameters
t_n, t(n)                      t-distribution with n degrees of freedom
Ber(p), Bin(n, p)              Bernoulli and binomial distributions
Poi(λ)                         Poisson distribution
Geo(p)                         geometric distribution
NB(r, p)                       negative binomial distribution
Exp(λ)                         exponential distribution with mean λ
Gamma(α, λ)                    Gamma density with shape parameter α and scale parameter λ
χ²_n, χ²(n)                    chi-square distribution with n degrees of freedom
C(μ, σ)                        Cauchy distribution
D_n(α)                         Dirichlet distribution
W_p(k, Σ)                      Wishart distribution
DoubleExp(μ, λ)                double exponential with parameters μ, λ


A.2 Moments and MGFs of Common Distributions

Discrete Distributions

Uniform: p(x) = 1/n, x = 1, ..., n. Mean (n+1)/2; variance (n² − 1)/12; skewness 0; kurtosis −6(n² + 1)/(5(n² − 1)); MGF (e^{(n+1)t} − e^t)/(n(e^t − 1)).

Binomial: p(x) = C(n, x) p^x (1 − p)^{n−x}, x = 0, ..., n. Mean np; variance np(1 − p); skewness (1 − 2p)/√(np(1 − p)); kurtosis (1 − 6p(1 − p))/(np(1 − p)); MGF (pe^t + 1 − p)^n.

Poisson: p(x) = e^{−λ} λ^x/x!, x = 0, 1, .... Mean λ; variance λ; skewness 1/√λ; kurtosis 1/λ; MGF e^{λ(e^t − 1)}.

Geometric: p(x) = p(1 − p)^{x−1}, x = 1, 2, .... Mean 1/p; variance (1 − p)/p²; skewness (2 − p)/√(1 − p); kurtosis 6 + p²/(1 − p); MGF pe^t/(1 − (1 − p)e^t).

Neg. Bin.: p(x) = C(x − 1, r − 1) p^r (1 − p)^{x−r}, x ≥ r. Mean r/p; variance r(1 − p)/p²; skewness (2 − p)/√(r(1 − p)); kurtosis 6/r + p²/(r(1 − p)); MGF (pe^t/(1 − (1 − p)e^t))^r.

Hypergeometric: p(x) = C(D, x) C(N − D, n − x)/C(N, n). Mean nD/N; variance n(D/N)(1 − D/N)(N − n)/(N − 1); skewness, kurtosis, and MGF complex.

Benford: p(x) = log₁₀(1 + 1/x), x = 1, ..., 9. Mean 3.44; variance 6.057; skewness .796; kurtosis 2.45; MGF Σ_{x=1}^{9} e^{tx} p(x).
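The entries above can be cross-checked numerically. The following Python snippet is an illustrative sketch, not part of the original appendix; it assumes SciPy is available, and relies on the fact that scipy.stats reports skewness and excess kurtosis in the same convention as the binomial and Poisson rows.

```python
# Sanity-check a few rows of the discrete table against scipy.stats.
from scipy import stats

n, p = 10, 0.3
mean, var, skew, kurt = stats.binom.stats(n, p, moments="mvsk")
assert abs(mean - n * p) < 1e-9                                   # np
assert abs(var - n * p * (1 - p)) < 1e-9                          # np(1-p)
assert abs(skew - (1 - 2 * p) / (n * p * (1 - p)) ** 0.5) < 1e-9  # (1-2p)/sqrt(np(1-p))
assert abs(kurt - (1 - 6 * p * (1 - p)) / (n * p * (1 - p))) < 1e-9

lam = 2.5
mean, var, skew, kurt = stats.poisson.stats(lam, moments="mvsk")
assert abs(mean - lam) < 1e-9 and abs(var - lam) < 1e-9           # both equal lambda
assert abs(skew - lam ** -0.5) < 1e-9                             # 1/sqrt(lambda)
assert abs(kurt - 1 / lam) < 1e-9                                 # 1/lambda
```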


Continuous Distributions

Uniform: f(x) = 1/(b − a), a ≤ x ≤ b. Mean (a + b)/2; variance (b − a)²/12; skewness 0; kurtosis −6/5.

Exponential: f(x) = e^{−x/λ}/λ, x ≥ 0. Mean λ; variance λ²; skewness 2; kurtosis 6.

Gamma: f(x) = e^{−x/λ} x^{α−1}/(λ^α Γ(α)), x ≥ 0. Mean αλ; variance αλ²; skewness 2/√α; kurtosis 6/α.

χ²_m: f(x) = e^{−x/2} x^{m/2−1}/(2^{m/2} Γ(m/2)), x ≥ 0. Mean m; variance 2m; skewness √(8/m); kurtosis 12/m.

Weibull: f(x) = (β/λ)(x/λ)^{β−1} e^{−(x/λ)^β}, x > 0. Mean μ = λΓ(1 + 1/β); variance σ² = λ²Γ(1 + 2/β) − μ²; skewness (λ³Γ(1 + 3/β) − 3μσ² − μ³)/σ³; kurtosis complex.

Beta: f(x) = x^{α−1}(1 − x)^{β−1}/B(α, β), 0 ≤ x ≤ 1. Mean α/(α + β); variance αβ/((α + β)²(α + β + 1)); skewness 2(β − α)√(α + β + 1)/(√(αβ)(α + β + 2)); kurtosis complex.

Normal: f(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)}, x ∈ R. Mean μ; variance σ²; skewness 0; kurtosis 0.

Lognormal: f(x) = (1/(σ√(2π) x)) e^{−(log x − μ)²/(2σ²)}, x > 0. Mean e^{μ+σ²/2}; variance (e^{σ²} − 1)e^{2μ+σ²}; skewness (e^{σ²} + 2)√(e^{σ²} − 1); kurtosis complex.


Cauchy: f(x) = 1/(πσ(1 + (x − μ)²/σ²)), x ∈ R. Mean, variance, skewness, and kurtosis: none exist.

t_m: f(x) = Γ((m+1)/2)/(√(mπ) Γ(m/2)) · (1 + x²/m)^{−(m+1)/2}, x ∈ R. Mean 0 (m > 1); variance m/(m − 2) (m > 2); skewness 0 (m > 3); kurtosis 6/(m − 4) (m > 4).

F: f(x) = (β/α)^β x^{α−1}/(B(α, β)(x + β/α)^{α+β}), x > 0. Mean β/(β − 1) (β > 1); variance β²(α + β − 1)/(α(β − 2)(β − 1)²) (β > 2); skewness and kurtosis complex.

Double Exp.: f(x) = e^{−|x−μ|/λ}/(2λ), x ∈ R. Mean μ; variance 2λ²; skewness 0; kurtosis 3.

Pareto: f(x) = ασ^α/x^{α+1}, x ≥ σ > 0. Mean ασ/(α − 1) (α > 1); variance ασ²/((α − 1)²(α − 2)) (α > 2); skewness (2(α + 1)/(α − 3))√((α − 2)/α) (α > 3); kurtosis complex.

Gumbel: f(x) = (1/σ) e^{−e^{−(x−μ)/σ}} e^{−(x−μ)/σ}, x ∈ R. Mean μ + γσ; variance π²σ²/6; skewness 12√6 ζ(3)/π³; kurtosis 12/5.

Note: For the Gumbel distribution, γ ≈ .577216 is the Euler constant, and ζ(3) is Riemann's zeta function, ζ(3) = Σ_{n=1}^∞ 1/n³ ≈ 1.20206.
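The Gumbel row above can likewise be checked numerically. The sketch below is illustrative only; it assumes SciPy's gumbel_r, whose location-scale parameterization matches the density in the table, and uses the approximate constants from the note.

```python
# Check the Gumbel mean, variance, skewness, and kurtosis entries.
import numpy as np
from scipy import stats

mu, sigma = 1.0, 2.0
euler_gamma = 0.577216            # Euler's constant, from the note above
zeta3 = 1.20206                   # Riemann zeta(3), from the note above

g = stats.gumbel_r(loc=mu, scale=sigma)
assert abs(g.mean() - (mu + euler_gamma * sigma)) < 1e-4          # mu + gamma*sigma
assert abs(g.var() - np.pi ** 2 * sigma ** 2 / 6) < 1e-9          # pi^2 sigma^2 / 6
skew, kurt = g.stats(moments="sk")
assert abs(skew - 12 * np.sqrt(6) * zeta3 / np.pi ** 3) < 1e-4    # 12 sqrt(6) zeta(3) / pi^3
assert abs(kurt - 12 / 5) < 1e-9                                  # excess kurtosis 12/5
```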


Table of MGFs of Continuous Distributions

Distribution    f(x)                                                      MGF

Uniform         1/(b − a), a ≤ x ≤ b                                      (e^{bt} − e^{at})/((b − a)t)
Exponential     e^{−x/λ}/λ, x ≥ 0                                         (1 − λt)^{−1}  (t < 1/λ)
Gamma           e^{−x/λ} x^{α−1}/(λ^α Γ(α)), x ≥ 0                        (1 − λt)^{−α}  (t < 1/λ)
χ²_m            e^{−x/2} x^{m/2−1}/(2^{m/2} Γ(m/2)), x ≥ 0                (1 − 2t)^{−m/2}  (t < 1/2)
Weibull         (β/λ)(x/λ)^{β−1} e^{−(x/λ)^β}, x > 0                      Σ_{n=0}^∞ ((λt)^n/n!) Γ(1 + n/β)
Beta            x^{α−1}(1 − x)^{β−1}/B(α, β), 0 ≤ x ≤ 1                   ₁F₁(α; α + β; t)
Normal          (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)}, x ∈ R                     e^{tμ + t²σ²/2}
Lognormal       (1/(σ√(2π) x)) e^{−(log x − μ)²/(2σ²)}, x > 0             None
Cauchy          1/(πσ(1 + (x − μ)²/σ²)), x ∈ R                            None
t_m             Γ((m+1)/2)/(√(mπ) Γ(m/2)) (1 + x²/m)^{−(m+1)/2}, x ∈ R    None
F               (β/α)^β x^{α−1}/(B(α, β)(x + β/α)^{α+β}), x > 0           None
Double Exp.     e^{−|x−μ|/λ}/(2λ), x ∈ R                                  e^{tμ}/(1 − λ²t²)  (|t| < 1/λ)
Pareto          ασ^α/x^{α+1}, x ≥ σ > 0                                   None
Gumbel          (1/σ) e^{−e^{−(x−μ)/σ}} e^{−(x−μ)/σ}, x ∈ R               e^{tμ} Γ(1 − tσ)  (t < 1/σ)
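Each closed-form MGF above can be verified by integrating E[e^{tX}] directly against the stated density. The following sketch is illustrative, not from the text (it assumes SciPy for the quadrature), and does this for the Gamma row.

```python
# Verify the Gamma MGF entry: E[e^{tX}] = (1 - lambda*t)^{-alpha} for t < 1/lambda.
import numpy as np
from scipy import integrate, special

alpha, lam, t = 3.0, 0.5, 1.2          # here t < 1/lam = 2, so the MGF exists

def gamma_density(x):
    # Density as written in the table: e^{-x/lam} x^{alpha-1} / (lam^alpha Gamma(alpha))
    return np.exp(-x / lam) * x ** (alpha - 1) / (lam ** alpha * special.gamma(alpha))

mgf_numeric, _ = integrate.quad(lambda x: np.exp(t * x) * gamma_density(x), 0, np.inf)
mgf_closed = (1 - lam * t) ** (-alpha)  # = 0.4^{-3} = 15.625
assert abs(mgf_numeric - mgf_closed) < 1e-6
```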


A.3 Normal Table

Standard Normal Probabilities P(Z ≤ t) and Standard Normal Percentiles

The quantity tabulated below is Φ(t) = P(Z ≤ t) for given t ≥ 0, where Z ~ N(0, 1). For example, from the table, P(Z ≤ 1.52) = .9357.

For any positive t, P(−t ≤ Z ≤ t) = 2Φ(t) − 1, and P(Z < −t) = P(Z > t) = 1 − Φ(t).

Selected standard normal percentiles z_α are given below. Here, the meaning of z_α is P(Z > z_α) = α.

α       z_α
.25     .675
.2      .84
.1      1.28
.05     1.645
.025    1.96
.02     2.055
.01     2.33
.005    2.575
.001    3.08
.0001   3.72


t     0       1       2       3       4       5       6       7       8       9

0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7   0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
3.1   0.9990  0.9991  0.9991  0.9991  0.9992  0.9992  0.9992  0.9992  0.9993  0.9993
3.2   0.9993  0.9993  0.9994  0.9994  0.9994  0.9994  0.9994  0.9995  0.9995  0.9995
3.3   0.9995  0.9995  0.9995  0.9996  0.9996  0.9996  0.9996  0.9996  0.9996  0.9997
3.4   0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9998
3.5   0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998
3.6   0.9998  0.9998  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999
3.7   0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999
3.8   0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999
3.9   1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
4.0   1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
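Tables of Φ(t) are easy to regenerate in software. The sketch below is illustrative (it assumes SciPy) and reproduces the worked example P(Z ≤ 1.52) = .9357, the percentile z_{.05} = 1.645, and the t = 1.5 row of the table.

```python
# Reproduce entries of the normal table and the percentile list above.
from scipy.stats import norm

assert round(norm.cdf(1.52), 4) == 0.9357        # the worked example in A.3
assert round(norm.ppf(1 - 0.05), 3) == 1.645     # z_{.05}: P(Z > 1.645) = .05
assert round(2 * norm.cdf(1.96) - 1, 4) == 0.95  # P(-1.96 <= Z <= 1.96) = 2*Phi(t) - 1

# The t = 1.5 row: Phi(1.50), Phi(1.51), ..., Phi(1.59)
row = [round(norm.cdf(1.5 + j / 100), 4) for j in range(10)]
print(row)   # [0.9332, 0.9345, 0.9357, 0.937, 0.9382, ...]
```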

Page 11: Appendix A Symbols, Useful Formulas, and Normal Table978-1-4419-9634-3/1.pdf · A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals, 747 and Advanced Topics,

Author Index

AAdler, R.J., 576Aitchison, J., 188Aizerman, M., 732Alexander, K., 545Alon, N., 20Aronszajn, N., 726Ash, R., 1, 249Athreya, K., 614, 667, 671, 694Azuma, K., 483

BBalakrishnan, N., 54Barbour, A., 34Barnard, G., 623Barndorff-Nielsen, O., 583Basu, D., 188, 208, 600, 604Basu, S., 570Beran, R., 545Berlinet, A., 731, 733Bernstein, S., 51, 52, 89Berry, A., 308Besag, J., 615, 623Bhattacharya, R.N., 1, 71, 76, 82, 249, 308,

339, 402, 408, 409, 414, 423Bickel, P.J., 230, 249, 282, 583, 603, 690, 693,

704, 714Billingsley, P., 1, 421, 531Blackwell, D., 188Borell, C., 572Bose, R., 208Braverman, E., 732Breiman, L., 1, 249, 402Bremaud, P., 339, 366, 614, 652, 654, 655, 658Brown, B.M., 498Brown, L.D., 76, 416, 429, 583, 600, 601, 612Brown, R., 401

Bucklew, J., 560Burkholder, D.L., 481, 482

CCai, T., 76Carlin, B., 614, 673Carlstein, E., 702Casella, G., 583, 600, 614, 704, 714Chan, K., 671, 714Cheney, W., 731, 734Chernoff, H., 51, 52, 68–70, 89Chervonenkis, A., 540, 541, 736Chibisov, D., 534Chow, Y.S., 249, 259, 264, 267, 276, 463Chung, K.L., 1, 381, 393, 394, 463Cirelson, B.S., 572Clifford, P., 615, 623Coles, S., 238Cowles, M., 673Cox, D., 157Cramer, H., 249Cressie, N., 570Cristianini, N., 736Csaki, E., 535Csorgo, M., 421, 423, 425, 427, 527, 536Csorgo, S., 421

DDasGupta, A., 1, 3, 71, 76, 215, 238, 243, 249,

257, 311, 317, 323, 327, 328, 333, 337,421, 429, 505, 527, 570, 695, 699, 704,714, 723

Dasgupta, S., 208David, H.A., 221, 228, 234, 236, 238, 323Davis, B., 481, 482de Haan, L., 323, 329, 331, 332, 334Deheuvels, P., 527, 553

757

Page 12: Appendix A Symbols, Useful Formulas, and Normal Table978-1-4419-9634-3/1.pdf · A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals, 747 and Advanced Topics,

758 Author Index

del Barrio, E., 527, 553Dembo, A., 560, 570Dempster, A., 705den Hollander, F., 560Devroye, L., 485–488, 560, 725Diaconis, P., 67, 339, 505, 614, 615, 652, 657,

671Dimakos, X.K., 615Do, K.-A., 634Dobrushin, R.L., 652, 657Doksum, K., 249, 282, 583, 603, 704, 714Donsker, M., 421, 422, 530Doob, J.L., 463Doss, H., 614, 667, 671Dubhashi, D., 560Dudley, R.M., 1, 505, 508, 527, 530, 531, 541,

578Durrett, R., 402Dvoretzky, A., 242Dym, H., 390

EEaton, M., 208Efron, B., 570, 571, 690Eicker, F., 535Einmahl, J., 535Einmahl, U., 425Embrechts, P., 238Erdos, P., 421, 422Esseen, C., 308Ethier, S., 243Everitt, B., 54

FFalk, M., 238Feller, W., 1, 62, 71, 76, 77, 249, 254, 258,

277, 285, 307, 308, 316, 317, 339, 375,386, 389

Ferguson, T., 188, 249Fernique, X., 576Fill, J., 615, 657Finch, S., 382Fisher, R.A., 25, 212, 586, 602Fishman, G.S., 614, 624Freedman, D., 339, 402, 693Fristedt, B., 463, 472, 493, 496Fuchs, W., 381, 394

GGalambos, J., 221, 238, 323, 328, 331Gamerman, D., 614Garren, S., 674

Gelman, A., 614, 674Geman, D., 614Geman, S., 614Genz, A., 157Geyer, C., 614Ghosh, M., 208Gibbs, A., 505, 508, 509, 516Gilks, W., 614Gine, E., 527, 538, 541, 542, 545, 548, 570,

694Glauber, R., 646Gnedenko, B.V., 328Gotze, F., 570Gray, L., 463, 472, 493, 496Green, P.J., 615Groeneboom, P., 563Gundy, R.F., 481, 482Gyorfi, L., 560, 725

HHaff, L.R., 208, 327, 429Hall, P., 34, 71, 224, 231, 249, 308, 331, 421,

463, 497, 498, 560, 570, 623, 634, 690,691, 694, 701, 704

Hastings, W., 614Heyde, C., 249, 421, 463, 497, 498Higdon, D., 615Hinkley, D., 174Hobert, J., 671Hoeffding, W., 483, 570Horowitz, J., 704Hotelling, H., 210Husler, J., 238

IIbragimov, I.A., 508Isaacson, D., 339

JJaeschke, D., 535Jing, B., 704Johnson, N., 54Jones, G., 671

KKac, M., 421, 422Kagan, A., 69, 157Kamat, A., 156Karatzas, I., 414, 416, 417, 463Karlin, S., 402, 411, 427, 437, 463

Page 13: Appendix A Symbols, Useful Formulas, and Normal Table978-1-4419-9634-3/1.pdf · A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals, 747 and Advanced Topics,

Author Index 759

Kass, R., 519Kemperman, J., 339Kendall, M.G., 54, 62, 65Kendall, W., 615Kesten, H., 254, 257Khare, K., 614, 671Kiefer, J., 242Kingman, J.F.C., 437, 439, 442, 451,

456, 457Kluppelberg, C., 238Knight, S., 673Komlos, J., 421, 425, 426, 537Korner, T., 417Kosorok, M., 527Kotz, S., 54Krishnan, T., 705, 711, 714, 739Kunsch, H.R., 702

LLahiri, S.N., 690, 702–704Laird, N., 705Landau, H.J., 577Lange, K., 714Lawler, G., 402, 437Le Cam, L., 34, 71, 509, 704Leadbetter, M., 221Ledolter, J., 714Ledoux, M., 560Lehmann, E.L., 238, 249, 583, 600,

699, 704Leise, F., 505Levine, R., 714Light, W., 731Lindgren, G., 221Linnik, Y., 69, 157Liu, J., 614, 657Logan, B.F., 570Lugosi, G., 560, 725

MMadsen, R., 339Mahalanobis, P., 208Major, P., 421, 425, 426, 537Mallows, C.L., 570Martynov, G., 238Mason, D.M., 535, 537, 570Massart, P., 242McDiarmid, C., 485, 487, 488, 560McKean, H., 390McLachlan, G., 705, 711, 714, 739Mee, R., 157Mengersen, K., 667, 669, 673

Metropolis, N., 614Meyn, S., 339Mikosch, T., 238Millar, P., 545Minh, H., 734Morris, C., 612Morters, P., 402, 420Murray, G.D., 712Mykland, P., 674

NNiyogi, P., 734Norris, J., 339, 366

OO’Reilly, N., 534Olkin, I., 208Oosterhoff, J., 563Owen, D., 157

PPaley, R.E., 20Panconesi, A., 560Parzen, E., 437Patel, J., 157Peres, Y., 402, 420Perlman, M., 208Petrov, V., 62, 249, 301, 308–311Pitman, J., 1Plackett, R., 157Politis, D., 702, 704Pollard, D., 527, 538, 548Polya, G., 382Port, S., 239, 264, 265, 302, 303, 306,

315, 437Propp, J., 615Pyke, R., 421

RRachev, S.T., 505Rao, C.R., 20, 62, 69, 157, 210, 505, 519, 520,

522Rao, R.R., 71, 76, 82, 249, 308Read, C., 157Reiss, R., 221, 231, 234, 236, 238, 285, 323,

326, 328, 505, 516Renyi, A., 375, 380, 389Resnick, S., 221, 402Revesz, P., 254, 420, 421, 423, 425,

427, 527

Page 14: Appendix A Symbols, Useful Formulas, and Normal Table978-1-4419-9634-3/1.pdf · A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals, 747 and Advanced Topics,

760 Author Index

Revuz, D., 402Rice, S.O., 570Richardson, S., 614Ripley, B.D., 614Robert, C., 614, 624, 673Roberts, G., 614, 667Romano, J., 702Rootzen, H., 221Rosenblatt, M., 721Rosenbluth, A., 614Rosenbluth, M., 614Rosenthal, J.S., 615, 667, 671Ross, S., 1, 614, 624Roy, S., 208Rozonoer, L., 732Rubin, D., 614, 674, 705Rubin, H., 337, 628Rudin, W., 726, 730

SSaloff-Coste, L., 505, 614, 652, 671Sauer, N., 539, 540Schmeiser, B., 624Sen, P.K., 328, 329Seneta, E., 339, 361Serfling, R., 249, 311, 323, 326Sethuraman, J., 614, 667, 671Shao, J., 690, 695, 701Shao, Q.M., 560, 570, 571Shawe-Taylor, J., 736Shepp, L., 570, 577Shorack, G., 238, 527Shreve, S., 414, 416, 417, 463Singer, J., 328, 329Singh, K., 693, 699, 701Sinha, B., 208Smith, A.F.M., 614Smith, R.L., 674Sparre-Andersen, E., 389Spencer, J., 20Spiegelhalter, D., 614Spitzer, F., 375, 389, 390Steele, J.M., 34Stein, C., 67Stern, H., 614Stigler, S., 71Stirzaker, D., 1, 339Strassen, V., 425Strawderman, W.E., 429Stroock, D., 560, 614Stuart, A., 54, 62, 65Sturmfels, B., 615Su, F., 505, 508, 509, 516

Sudakov, V.N., 572Sundberg, R., 705

TTalagrand, M., 572, 578Tanner, M., 614, 714Taylor, H.M., 402, 411, 427, 437, 463Teicher, H., 249, 259, 264, 267, 276, 463Teller, A., 614Teller, E., 614Thomas-Agnan, C., 731, 733Thonnes, E., 615Tibshirani, R., 690Tierney, L., 614, 667, 668, 671Titterington, D.M., 623Tiwari, R., 188Tong, Y., 138, 154, 201, 203, 204, 208, 210,

211, 215, 698Tu, D., 690, 695, 701Tusnady, G., 421, 425, 426, 537Tweedie, R., 339, 667, 669

VVajda, I., 505van de Geer, S., 527, 553van der Vaart, A., 249, 310, 527, 546,

547, 557van Zwet, W.R., 537Vapnik, V., 540, 541, 736Varadhan, S.R.S., 456, 560Vos, P., 519

WWang, Q., 560Wasserman, L., 67Waymire, E., 1, 339, 402, 408, 409,

414, 423Wei, G., 714Wellner, J., 238, 310, 527, 538, 546,

547, 557Wermuth, N., 157White, A., 704Whitt, W., 421Widder, D., 53Williams, D., 463Wilson, B., 615Wolf, M., 702Wolfowitz, J., 242Wong, W., 614Wu, C.F.J., 711–713

Page 15: Appendix A Symbols, Useful Formulas, and Normal Table978-1-4419-9634-3/1.pdf · A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals, 747 and Advanced Topics,

Author Index 761

Y

Yang, G., 704

Yao, Y., 734

Yor, M., 402

Yu, B., 674

ZZabell, S., 67Zeitouni, O., 560Zinn, J., 538, 694Zolotarev, V.M., 505Zygmund, A., 20

Page 16: Appendix A Symbols, Useful Formulas, and Normal Table978-1-4419-9634-3/1.pdf · A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals, 747 and Advanced Topics,

Subject Index

AABO allele, 709–711Acceptance distribution, 664Accept-reject method

beta generation, 627–628described, 625generation, standard normal values,

626–627scheme efficiency, 628

Almost surely, 252, 253, 256, 259, 261–262,267, 287, 288, 314, 410, 461, 470,491–494, 496, 498, 503, 535, 537,542, 555, 576, 577, 578, 615, 618,632, 641, 694, 725

Ancillarydefinition, 601statistic, 602, 604, 605

Anderson’s inequality, 214, 218Annulus, Dirichlet problem, 417–418Aperiodic state, 351Approximation of moments

first-order, 278, 279scalar function, 282second-order, 279, 280variance, 281

Arc sine law, 386ARE. See Asymptotic relative efficiencyArrival rate, 552, 553

intensity function, 455Poisson process, 439, 441, 442, 446,

460–462Arrival time

definition, 438independent Poisson process, 445interarrival times, 438

Asymptotic independence, 337Asymptotic relative efficiency (ARE)

IQR-based estimation, 327

median, 335variance ratio, 325

Asymptoticsconvergence, distribution

CDF, 262CLT, 266Cramer–Wold theorem, 263–264Helly’s theorem, 264LIL, 267multivariate CLT, 267Polya’s theorem, 265Portmanteau theorem, 265–266

densities and Scheffe’s theorem, 282–286laws, large numbers

Borel-Cantelli Lemma, 254–256Glivenko-Cantelli theorem, 258–259strong, 256–257weak, 256–258

moments, convergence (see Convergenceof moments)

notation and convergence, 250–254preservation, convergence

continuous mapping, 260–261Delta theorem, 269–272multidimension, 260sample correlation, 261–262sample variance, 261Slutsky’s theorem, 268–269transformations, 259variance stabilizing transformations,

272–274Asymptotics, extremes and order statistics

application, 325–326several, 326–327single, 323–325

convergence, types theoremlimit distributions, types, 332Mills ratio, 333

distribution theory, 323

763

Page 17: Appendix A Symbols, Useful Formulas, and Normal Table978-1-4419-9634-3/1.pdf · A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals, 747 and Advanced Topics,

764 Subject Index

Asymptotics, extremes and orderstatistics (cont.)

easy applicable limit theoremGumbel distribution, 331Hermite polynomial, 329

Fisher–Tippet familydefinition, 333theorem, 334–335

Avogadro’s constant, 401Azuma’s inequality, 483–486

BBallot theorem, 390Barker’s algorithm, 643Basu’s theorem

applications, probabilityconvergence result, 606–607covariance calculation, 605expectation calculation, 606exponential distribution result, 605mean and variance independence,

604–605exponential family, 602and Mahalanobis’s D2, 611and Neyman–Fisher factorization

definition, 602–603factorization, 603–604general, 604

Bayes estimate, 147–152, 467–468, 493–494Bayes theorem

conditional densities, 141, 147, 149use, 147

Berry–Esseen bounddefinition, 76normal approximation, 76–79

Bessel’s inequality, 742Best linear predictor, 110–111, 154Best predictor, 142, 154Beta-Binomial distribution

Gibbs sampler, 965–966simulation, 643–644

Beta densitydefintion, 59–60mean, 150mode, 89percentiles, 618

Bhattacharya affinity, 525Bikelis local bound, 322Binomial confidence interval

normal approximation, 74score confidence, 76Wald confidence, 74–76

Binomial distributionhypergeometric distribution problems, 30negative, 28–29

Bivariate Cauchy, 195Bivariate normal

definition, 136distribution, 212five-parameter distribution, 138–139formula, 138joint distribution, 140mean and variance independence, 139–140missing coordinates, 707–709property, 202simulation, 136, 137, 200–201

Bivariate normal conditional distributionsconditional expectation, 154Galton’s observation, 155

Bivariate normal formulas, 138, 156Bivariate Poisson, 120

Gibbs chain, 683MGF, 121

Bivariate uniform, 125–126, 133Block bootstrap, 701, 702, 703Bochner’s theorem, 306–307Bonferroni bounds, 5Bootstrap

consistencycorrelation coefficient, 697–698Kolmogorov metric, 692–693Mallows-Wasserstein metric, 693sample variance, 696Skewness Coefficient, 697t -statistic, 698

distribution, 690, 691, 694, 696, 738failure, 698–699higher-order accuracy

CBB, 702MBB, 702optimal block length, 704second-order accuracy, 699–700smooth function, 703–704thumb rule, 700–701

Monte Carlo, 694, 697, 737ordinary bootstrap distribution, 690–691resampling, 689–690variance estimate, 737

Borel–Cantelli lemmaalmost sure convergence, binomial

proportion, 255–256pairwise independence, 254–255

Borell inequality, 215Borel’s paradox

Jacobian transformation, 159

Page 18: Appendix A Symbols, Useful Formulas, and Normal Table978-1-4419-9634-3/1.pdf · A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals, 747 and Advanced Topics,

Subject Index 765

marginal and conditional density, 159–160mean residual life, 160–161

Boundary crossing probabilitiesannulus, 417–418domain and harmonic, 417harmonic function, 416irregular boundary points, 416recurrence and transience, 418–419

Bounded in probability, 251–252Box–Mueller transformation, 194Bracket, 546Bracketing number, 546Branching process, 500Brownian bridge

Brownian motion, 403empirical process, iid random variables,

404Karhunen–Loeve expansion, 554maximum, 410standard, 405

Brownian motioncovariance functions, 406d -dimensional, 405Dirichlet problem and boundary crossing

probabilitiesannulus, 417–418domain and harmonic, 417harmonic function, 416irregular boundary points, 416recurrence and transience, 418–419

distributional propertiesfractal nature, level sets, 415–416path properties and behavior, 412–414reflection principle, 410–411

explicit constructionKarhunen–Loeve expansion, 408

Gaussian process, Markovcontinuous random variables, 407correlation function, 406

invariance principle and statisticsconvergence, partial sum process,

423–424Donsker’s, 424–425partial sums, 421Skorohod embedding theorem, 422,

423uniform metric, 422

local timeLebesgue measure, 419one-dimensional, 420zero, 420–421

negative drift and density, 427–428Ornstein–Uhlenbeck process

convergence, 430–431

covariance function, 429Gaussian process, 430

random walksscaled, simulated plot, 403

state visited, planar, 406stochastic process

definition, 403Gaussian process, 404real-valued, 404standard Wiener process, 404–405

strong invariance principle and KMTtheorem, 425–427

transition density and heat equation,428–429

CCampbell’s theorem

characteristic functional …, 456Poisson random variables, 457shot effects, 456stable laws, 458

Canonical exponential familydescription, 590form and properties

binomial distribution, 590closure, 594–596convexity, 590–591moment and generating function,

591–594one parameter, 589–590

k-parameter, 597Canonical metric

definition, 575Capture-recapture, 32–33Cauchy

density, 44distribution, 44, 48

Cauchy order statistics, 232Cauchy random walk, 395Cauchy–Schwarz inequality, 21, 109, 259,

516, 731CDF. See Cumulative distribution functionCentral limit theorem (CLT)

approximation, 83binomial confidence interval, 74–76binomial probabilities., 72–73continuous distributions, 71de Moivre–Laplace, 72discrete distribution, 72empirical measures, 543–547error, 76–79iid case, 304martingale, 498multivariate, 267

Page 19: Appendix A Symbols, Useful Formulas, and Normal Table978-1-4419-9634-3/1.pdf · A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals, 747 and Advanced Topics,

766 Subject Index

Central limit theorem (CLT) (cont.)random walks, 73and WLLN, 305–306

Change of variable, 13Change point problem, 650–651Chapman–Kolmogorov equation, 256, 315,

318transition probabilities, 345–346

Characteristic function, 171, 263, 383–385,394–396, 449, 456, 458, 462, 500,533

Bochner’s theorem, 306–307CLT error

Berry–Esseen theorem, 309–310CDF sequence, 308–309theorem, 310–311

continuity theorems, 303–304Cramer–Wold theorem, 263Euler’s formula, 293inequalities

Bikelis nonuniform, 317Hoeffding’s, 318Kolmogorov’s maximal, 318partial sums moment and von

Bahr–Esseen, 318Rosenthal, 318–319

infinite divisibility and stable lawsdistributions, 317triangular arrays, 316

inversion and uniquenessdistribution determining property,

299–300Esseen’s Lemma, 301lattice random variables, 300–301theorems, 298–299

Lindeberg–Feller theorem, 311–315Polya’s criterion, 307–308proof

CLT, 305Cramer–Wold theorem, 306WLLN, 305

random sums, 320standard distributions

binomial, normal and Poisson, 295–296exponential, double exponential, and

Cauchy, 296n-dimensional unit ball, 296–297

Taylor expansions, differentiability andmoments

CDF, 302Riemann–Lebesgue lemma, 302–303

Chebyshev polynomials, 742Chebyshev’s inequality

bound, 52large deviation probabilities, 51

Chernoff’s variance inequalityequality holding, 68–69normal distribution, 68

Chibisov–O’Reilly theorem, 534–535Chi square density

degree of freedom, 45Gamma densities, 58

Chung–Fuchs theorem, 381, 394–396Circular Block Bootstrap (CBB), 702Classification rule, 724, 725, 734, 735, 736,

743, 744Communicating classes

equivalence relation, 350identification, 350–351irreducibility, 349period computation, 351

Completenessapplications, probability

covariance calculation, 605expectation calculation, 606exponential distribution result, 605mean and variance independence,

604weak convergence result, 606–607

definition, 601Neyman–Fisher factorization and Basu’s

theoremdefinition, 602–603

Complete randomnesshomogeneous Poisson process, 551property, 446–448

Compound Poisson Process, 445, 448, 449,459, 461

Concentration inequalitiesAzuma’s, 483Burkholder, Davis and Gundy

general square integrable martingale,481–482

martingale sequence, 480Cirel’son and Borell, 215generalized Hoeffding lemma, 484–485Hoeffding’s lemma, 483–484maximal

martingale, 477–478moment bounds, 479–480sharper bounds near zero, 479submartingale, 478

McDiarmid and DevroyeKolmogorov–Smirnov statistic,

487–488martingale decomposition and

two-point distribution, 486–487optional stopping theorem, 477upcrossing, 488–490

Page 20: Appendix A Symbols, Useful Formulas, and Normal Table978-1-4419-9634-3/1.pdf · A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals, 747 and Advanced Topics,

Subject Index 767

Conditional densityBayes theorem, 141best predictor, 142definition, 140–141two stage experiment, 143–144

Conditional distribution, 202–205binomial, 119definition, 100–101interarrival times, 460and marginal, 189–190and Markov property, 235–238Poisson, 104recurrence times, 399

Conditional expectation, 141, 150, 154Jensen’s inequality, 494order statistics, 247

Conditional probabilitydefinition, 5prior to posterior belief, 8

Conditional variance, 143, 487, 497definition, 103, 141

Confidence band, continuous CDF, 242–243Confidence interval

binomial, 74–75central limit theorem, 61–62normal approximation theorem, 81Poisson distribution, 80sample size calculation., 66

Confidence interval, quantile, 246, 326Conjugate priors, 151–152Consistency bootstrap, 693–699Consistency of kernel classification, 743Continuity of Gaussian process

logarithm tail, maxima and Landau–Shepptheorem, 577

Wiener process, 576–577Continuous mapping, 260–262, 268, 270, 275,

306, 327, 694Convergence diagnostics

Garren–Smith multiple chain Gibbsmethod, 675

Gelman–Rubin multiple chain method, 674spectral and drift methods, 673–674Yu–Mykland single chain method,

674–675Convergence in distribution

CLT, 266Cramer–Wold theorem, 263Delta theorem, 269–272description, 262Helly’s theorem, 264LIL, 267multivariate CLT, 267Polya’s theorem, 265

Portmanteau theorem, 265–266Slutsky’s theorem, 268–269

Convergence in mean, 253Convergence in probability

continuous mapping theorem, 270sure convergence, 252, 260

Convergence of Gibbs samplerdiscrete and continuous state spaces, 671drift method, 672–673failure, 670–671joint density, 672

Convergence of martingalesL1 and L2

basic convergence theorem, 493Bayes estimates, 493–494Polya’s urn, 493

theoremFatou’s lemma and monotone, 491

Convergence of MCMCDobrushin’s inequality and

Diaconis–Fill–Stroock bound,657–659

drift and minorization methods, 659–662geometric and uniform ergodic, 653separation and chi-square distance, 653SLEM, 651–652spectral bounds, 653–657stationary Markov chain, 651total variation distance, 652

Convergence of medians, 287Convergence of moments

approximation, 278–282distribution, 277–278uniform integrability

conditions, 276–277dominated convergence theorem,

275–276Convergence of types, 328, 332Convex function theorem, 468, 495–496Convolution, 715

definition, 167–168density, 168double exponential, 195n-fold, 168–169random variable symmetrization, 170–172uniform and exponential, 194

Correlation, 137, 195, 201, 204, 271–272coefficient, 212–213, 697–698convergence, 261–262definition, 108exponential order statistics, 234inequality, 120order statistics, 244, 245

Page 21: Appendix A Symbols, Useful Formulas, and Normal Table978-1-4419-9634-3/1.pdf · A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals, 747 and Advanced Topics,

768 Subject Index

Correlation (cont.)Poisson process, 461properties, 108–109

Correlation coefficient distributionbivariate normal, 212–213bootstrapping, 697–698

Countable additivitydefinition, 2–3

Coupon collection, 288Covariance, 544, 574, 735

calculation, 107, 605definition, 107function, 412, 429inequality, 120, 215matrix, 207, 216multinomial, 610properties, 108–109

Covariance matrix, 136–137, 200, 203, 205,207, 210, 216, 217, 267, 272, 280,326, 530, 597, 611, 709

Cox–Wermuth approximationapproximations test, 157–158

Cramer–Chernoff theoremCauchy case, 564exponential tilting technique, 563inequality, 561large deviation, 562rate function

Bernoulli case, 563–564definition, 560normal, 563

Cramer–Levy theorem, 293Cramer–von Mises statistic, 532–533Cramer–Wold theorem, 263, 266, 305, 306Cubic lattice random walks

distribution theory, 378–379Polya’s formula, 382–383recurrence and transience, 379–381recurrent state, 377three dimensions, 375two simulated, 376, 377

Cumulantsdefinition, 25–26recursion relations, 26

Cumulative distribution function (CDF)application, 87–88, 90, 245continuous random variable, 36–37definition, 9empirical, 241–243exchangeable normals, 206–207and independence, 9–12joint, 177, 193jump function, 10Markov property, 236

moments and tail, 49–50nonnegative integer-valued random

variables, 16PDF and median, 38Polya’s theorem, 265quantile transformation, 44range, 226standard normal density, 41–42, 62

Curse of dimensionality, 131–132, 184Curved exponential family

application, 612definition, 583, 608density, 607–608Poissons, random covariates, 608–609specific bivariate normal, 608

DDelta theorem, continuous mapping theorem

Cramer, 269–270random vectors, 269sample correlation, 271–272sample variance and standard deviation,

271Density

midrange, 245range, 226

Density functioncontinuous random variable, 38description, 36location scale parameter, 39normal densities, 40standard normal density, 42

Detailed balance equation, 639–640Diaconis–Fill–Stroock bound, 657–658Differential metrics

Fisher informationdirect differentiation, 521multivariate densities, 520

Kullback–Leibler distance curvature,519–520

Rao’s Geodesic distances, distributionsgeodesic curves, 522variance-stabilizing transformation, 522

Diffusion coefficientBrownian motion, 427, 428

Dirichlet cross moment, 196Dirichlet distribution

density, 189Jacobian density theorem, 188–189marginal and conditional, 189–190and normal, 190Poincare’s Lemma, 191subvector sum, 190

Page 22: Appendix A Symbols, Useful Formulas, and Normal Table978-1-4419-9634-3/1.pdf · A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals, 747 and Advanced Topics,

Subject Index 769

Dirichlet problemannulus, 417–418domain and harmonic, 417harmonic function, 416irregular boundary points, 416recurrence and transience, 418–419

Discrete random variableCDF and independence, 9–12definition, 8expectation and moments, 13–19inequalities, 19–22moment-generating functions, 22–26

Discrete uniform distribution, 2, 25, 277Distribution

binomialhypergeometric distribution problems,

30Cauchy, 44, 48CDF

definition, 9Chebyshev’s inequality, 20conditional, 203–205continuous, 71correlation coefficient, 212–213discrete, 72discrete uniform, 2, 25, 277lognormal, 65noncentral, 213–214Poisson, 80quadratic forms, Hotelling’s T2

Fisher Cochran theorem, 211sampling, statistics

correlation coefficient, 212–213Hotelling’s T2 and quadratic forms,

209–212Wishart, 207–208Wishart identities, 208–209

DKW inequality, 241–242, 541, 553Dobrushin coefficient

convergence of Metropolis chains,668–669

defined, 657–658Metropolis–Hastings, truncated geometric,

659two-state chain, 658–659

Dobrushin’s inequalitycoefficient, 657–658Metropolis–Hastings, truncated geometric,

659stationary Markov chain, 658two-state chain, 658–659

Domaindefinition, 417Dirichlet problem, 416, 417

irregular boundary point, 416maximal attraction, 332, 336

Dominated convergence theorem, 275, 285Donsker class, 544Donsker’s theorem, 423, 531Double exponential density, 39–40Doubly stochastic

bistochastic, 340one-step transition matrix, 347transition probability matrix, 342

DriftBayes example, 672–673drift-diffusion equation, 429and minorization methods, 659–662negative, Brownian motion, 427–428time-dependent, 429

Drift-diffusion equation., 429

EEdgeworth expansion, 82–83, 302, 622, 699,

700Ehrenfest model, 342, 373EM algorithm, 689, 704–744

ABO allele, 709–711background plus signal model, 738censoring, 740Fisher scoring method, 705genetics problem, 739mixture problem, 739monotone ascent and convergence, 711Poisson, missing values, 706–707truncated geometric, 738

Empirical CDFconfidence band, continuous, 242, 243definition, 527DKW inequality, 241–242

Empirical measureCLTs

entropy bounds, 544–546notation and formulation, 543–544

definition, 222, 528Empirical process

asymptotic propertiesapproximation, 530–531Brownian bridge, 529–530invariance principle and statistical

applications, 531–533multivariate central limit theorem, 530quantile process, 536strong approximations, 537weighted empirical process, 534–535

CLTs, measures and applicationscompact convex subsets, 547

Page 23: Appendix A Symbols, Useful Formulas, and Normal Table978-1-4419-9634-3/1.pdf · A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals, 747 and Advanced Topics,

770 Subject Index

Empirical process (cont.)entropy bounds, 544–546invariance principles, 543monotone and Lipschitz functions, 547notation and formulation, 543–544

maximal inequalities and symmetrizationGaussian processes, 547–548

notation and definitionscadlag functions and Glivenko–Cantelli

theorem, 529Poisson process

distributional equality, 552exceedance probability, 553randomness property, 551Stirling’s approximation, 553

Vapnik–Chervonenkis (VC) theorydimensions and classes, 540–541Glivenko–Cantelli class, 538–539measurability conditions, 542shattering coefficient and Sauer’s

lemma, 539–540uniform convergence, relative

frequencies, 542–543Empirical risk, 736Entropy bounds

bracket and bracketing number, 546covering number, 545–546Kolmogorov–Smirnov type statistic, 545P -Donsker, 546VC-subgraph, 545

Equally likelyconcept, 3finite sample space, 3sample points, 3

Ergodic theorem, 366, 640, 641Error function, 90Error of CLT

Berry–Esseen bound, 77Error probability

likelihood ratio test, 565, 580type I and II, 565Wald’s SPRT, 476–477

Esseen’s lemma, 301, 309Estimation

density estimator, 721histogram estimate, 683, 720–721, 737,

740optimal local bandwidth, 723plug-in estimate, 212, 630, 704, 720, 740series estimation, 720–721

Exceedance probability, 553Exchangeable normal

variablesCDF, 206–207

construction, 205–206definition, 205

Expectationcontinuous random variables, 45–46definitions, 46–47gamma function, 47

Experimentcountable additivity, 2–3countably infinite set, 2definition, 2inclusion–exclusion formula, 5

Exponential densitybasic properties, 55definition, 38densities, 55mean and variance, 47standard double, 40–41

Exponential familycanonical form and properties

closure, 594–596convexity, 590–591definition, 589–590moments and moment generating

function, 591–594curved (see Curved exponential family)multiparameter, 596–600one-parameter

binomial distribution, 586definition, 585–586gamma distribution, 587irregular distribution, 588–589normal distribution, mean and variance,

584–586unusual gamma distribution, 587variables errors, 586–587

sufficiency and completenessapplications, probability, 604–607definition, 601–602description, 600Neyman–Fisher factorization and

Basu’s theorem, 602–604Exponential kernel, 736, 740, 742, 744Exponential tilting

technique, 563Extremes. See also Asymptotics, extremes and

order statisticsdefinition, 221discrete-time stochastic sequence, 574–575finite sample theory, 221–247

FFactorial moment, 22Factorization theorem, 602–604

Page 24: Appendix A Symbols, Useful Formulas, and Normal Table978-1-4419-9634-3/1.pdf · A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals, 747 and Advanced Topics,

Subject Index 771

Failure of EM, kernel, 712–713Fatou’s lemma, 491, 492F-distribution, 174–175, 218f -divergence

finite partition property, 507, 517Feature maps, 732–744Filtered Poisson process, 443–444Finite additivity, 3Finite dimensional distributions, 575Finite sample theory

advanced distribution theorydensity function, 226density, standard normals, 229exponential order statistics, 227–228joint density, 225–226moment formulas, uniform case, 227normal order statistics, 228

basic distribution theoryempirical CDF, 222joint density, 222–223minimum, median and maximum

variables, 225order statistics, 221quantile function, 222uniform order statistics, 223–224

distribution, multinomial maximumequiprobable case, 243Poissonization technique, 243

empirical CDFconfidence band, continuous CDF,

242–243DKW inequality, 241–242

existence of moments, 230–232quantile transformation theorem, 229–230records

density, values and times, 240–241interarrival times, 238

spacingsdescription, 233exponential and Reyni’s representation,

233–234uniform, 234–235

First passage, 354–359, 383–386, 409, 410,427, 475

Fisher Cochran theorem, 211Fisher information

and differential metrics, 520–521matrix, 597–600, 610–611

Fisher information matrixapplication, 597–610definition, 597and differential, 520–521multiparameter exponential family,

597–600

Fisher linear, 735Fisher’s Iris data, 744Fisher–Tippet family

definition, 333theorem, 334–335

Formulachange of variable, 13inclusion-exclusion, 4Tailsum, 16

Full conditionals, 645–646, 648–651, 671, 672Function theorem

convex, 468, 495–496implicit, 569

GGambler’s ruin, 351–353, 370, 392, 464,

468–470, 475, 501Gamma density

distributions, 59exponential density, 56skewness, 82

Gamma functiondefinition, 47

Garren–Smith multiple chain Gibbs method,675

Gartner–Ellis theoremconditions, 580large deviation rate function, 567multivariate normal mean, 568–569non-iid setups, 567–568

Gaussian factorization, 176–177Gaussian process

definition, 404Markov property, 406–407stationary, 430

Gaussian radial kernel, 736Gegenbauer polynomials, 741Gelman–Rubin multiple chain method, 674Generalized chi-square distance, 525, 652, 653Generalized Hoeffding lemma, 484–485Generalized negative binomial distribution,

28–29, 609Generalized Wald identity, 475–476Generalized Wasserstein distance, 525Generating function

central moment, 25–26Chernoff–Bernstein inequality, 51–52definition, 22distribution-determining property, 24–25inversion, 53Jensen’s inequality, 52–53Poisson distribution, 23–24standard exponential, 51

Page 25: Appendix A Symbols, Useful Formulas, and Normal Table978-1-4419-9634-3/1.pdf · A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals, 747 and Advanced Topics,

772 Subject Index

Geometrically ergodicconvergence of MCMC, 653Metropolis chain, 669–670spectral bounds, 654

Geometric distributiondefinition, 28exponential densities, 55lack of memory, 32

Gibbs samplerbeta-binomial pair, 647–648change point problem, 650–651convergence, 670–673definition, 646Dirichlet distributions, 649–650full conditionals, 645–646Gaussian Bayes

formal and improper priors, 648Markov chain generation problem, 645random scan, 646–647systematic scan, 647

Glivenko–Cantelli class, 539, 541Glivenko–Cantelli theorem, 258–259, 336,

487, 529, 538, 541, 545, 555, 690Gumbel density

definition, 61standard, 61

HHajek–Renyi inequality, 396Hardy–Weinberg law, 609Harmonic

definition, 417function, 416

Harris recurrenceconvergence property, 672drift and minorization methods, 660Metropolis chains, 667–668

Heat equation, 429, 435Hellinger metric, 506–507, 514Helly’s theorem, 264Helmert’s transformation, 186–187Hermite polynomials, 83, 320, 329, 741High dimensional formulas, 191–192Hilbert norm, 729Hilbert space, 725–732, 734, 735, 743Histogram estimate, 683, 720–721, 737, 740Hoeffding’s inequality, 318, 483–484Holder continuity, 413Holder’s inequality, 21Hotelling’s T2

Fisher Cochran theorem, 211quadratic form, 210

Hypergeometric distributiondefinition, 29ingenious use, 32

IIID. See Independent and identically

distributedImportance sampling

Bayesian calculations, 629–630binomial Bayes problem, 631described, 619–620distribution, 629, 633–634Radon–Nikodym derivatives, 629theoretical properties, 632–633

Importance sampling distribution, 633–634Inclusion-exclusion formula, 4, 5Incomplete inner product space, 728Independence

definition, 5, 9Independence of mean and variance, 139–140,

185–187, 211, 604–605Independent and identically distributed (IID)

observations, 140random variables, 20

Independent increments, 404–405, 409, 412,427, 433, 449

Independent metropolis algorithm, 643Indicator variable

Bernoulli variable, 11binomial random variable, 34distribution, 11mathematical calculations, 10

InequalityAnderson’s, 214Borell concentration, 215Cauchy–Schwarz, 21central moment, 25Chen, 215Chernoff–Bernstein inequality, 51–52Cirel’son, 215covariance, 215–216distribution, 19Jensen’s inequality, 52–53monotonicity, 214–215, 593positive dependence, 215probability, 20–21Sidak, 215Slepian’s I and II, 214

Infinite divisibility and stable lawsdescription, 315distributions, 317triangular arrays, 316

Page 26: Appendix A Symbols, Useful Formulas, and Normal Table978-1-4419-9634-3/1.pdf · A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals, 747 and Advanced Topics,

Subject Index 773

Initial distribution, 369, 641, 651, 652, 655,658, 668

definition, 340Markov chain, 361weak ergodic theorem, 366

Inner product space, 726, 727, 728, 740–742Inspection paradox, 444Integrated Brownian motion, 432Intensity, 557, 682

function, 454, 455, 460piecewise linear, 454–456

Interarrival time, 238, 240, 241conditional distribution, 453Poisson process, 438transformed process, 453

Interquartile range (IQR)definition, 327limit distribution, 335

Invariance principleapplication, 434convergence, partial sum process, 423–424Donsker’s, 424–425, 538partial sums, 421Skorohod embedding theorem, 422, 423and statistics

Cramer–von Mises statistic, 532–533Kolmogorov–Smirnov statistic,

531–532strong

KMT theorem, 425–427partial sum process, 530–531, 537

uniform metric, 422weak, 536

Invariant measure, 663Inverse Gamma density, 59Inverse quadratic kernel, 736, 744Inversion of mgf

moment-generating function, 53Inversion theorem

CDF, 298–299failure, 307–308Plancherel’s identity, 298

IQR. See Interquartile rangeIrreducible, 569, 581, 640–643, 645, 647,

653–655, 660, 661, 663, 667, 671,685

definition, 349loop chains, 371regular chain, 363

Isolated points, 415, 416Isotropic, 717, 742, 743Iterated expectation, 105, 119, 146, 162, 172

applications, 105–106

formula, 105, 146, 167, 468higher-order, 107

Iterated variance, 119, 623applications, 105–106formula, 105

JJacobian formula

CDF, 42technique, 45

Jacobi polynomials, 741, 742Jensen’s inequality, 52–53, 280, 468, 491, 494,

516Joint cumulative distribution function, 97, 112,

124, 177, 193, 261, 263, 447Joint density

bivariate uniform, 125–126continuous random vector, 125defined, 123–124dimensionality curse, 131–132nonuniform, uniform marginals, 128–129

Joint moment-generating function (mgf),112–114, 121, 156, 597

Joint probability mass function (pmf), 121,148, 152, 585, 599, 608

definition, 96, 112function expectation, 100

KKarhunen–Loeve expansion, 408, 533, 554Kernel classification rule, 724–731Kernels and classification

annhilator, 743definition, 715density, 719–723estimation

density estimator, 721histograms, 720optimal local bandwidth, 723plug-in, 720series estimation, 720–721

Fourier, 717kernel plots, 742product kernels, 743smoothing

definition, 715–716density estimation, 718

statistical classificationexponential kernel, 736Fisher linear, 735Gaussian radial kernel, 736Hilbert norm, 729Hilbert space, 725–727

Page 27: Appendix A Symbols, Useful Formulas, and Normal Table978-1-4419-9634-3/1.pdf · A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals, 747 and Advanced Topics,

774 Subject Index

Kernels and classification (cont.)inner product space, 728linearly separable data, 735linear span, 726–727, 729, 742Mercer’s theorem, 732–744operator norm, 729parallelogram law, 727polynomial kernel, 736positive definiteness, 732Pythagorean identity, 727reproducing kernel, 725Riesz representation theorem, 730rule, 724–725sigmoid kernel, 736support vector machines, 734–736

Weierstrass kernels, 719Kolmogorov metric, 506, 511, 692, 693, 695,

696Kolmogorov–Smirnov statistic

invariance principle, 531–532McDiarmid and Devroye inequalities,

487–488null hypothesis testing, 545weighted, 534–535

Komlos, Major, Tusnady theoremrate, 537strong invariance principle, 425–427

Kullback–Leibler distancedefinition, 507inequality, 711, 712

Kurtosiscoefficient, 82defined, 18skewness, 82

LLack of memory

exponential densities, 55exponential distribution, 55–56geometric distribution, 32

Laguerre polynomials, 741Landau–Shepp theorem, 577Laplacian, 417Large deviation, 51, 463, 483

continuous timeextreme statistics, 574finite-dimensional distribution (FDD),

575Gaussian process, 576–577metric entropy, supremum, 577–579

Cramer–Chernoff theorem, 560–564Cramer’s theorem, general set, 566–567

Gartner–Ellis theorem and Markov chain,567–570

Lipschitz functions and Talagrand’sinequality

correlated normals, 573–574outer parallel body, 572–573Taylor expansion, 572

multiple testing, 559rate function, properties

error probabilities, likelihood ratio test,565–566

shape and smoothness, 564self-normalization, 560t -statistic

definition, 570normal case, 571–572rate function, 570

Law of iterated logarithm (LIL)application, 433Brownian motion, 419use, 267

Legendre polynomials, 741Level sets

fractal nature, 415–416Levy inequality, 397, 400Likelihood function

complete data, 708, 710definition, 148, 565log, 712shape, 153

Likelihood ratiosconvergence, 490–491error probabilities, 565–566Gibbs sampler, 645martingale formation, 467sequential tests, statistics, 469

Lindeberg–Feller theoremfailure condition, 315IID variables, linear combination,

314–315Lyapounov’s theorem,

311–312Linear functional, 729, 730, 743Linear span, 726, 727, 729, 742Local limit theorem, 82Local time

Brownian motionLebesgue measure, 419one-dimensional, 420zero, 420–421

Lognormal densityfinite mgf, 65skewness, 65

Loop chains, 369, 371

Page 28: Appendix A Symbols, Useful Formulas, and Normal Table978-1-4419-9634-3/1.pdf · A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals, 747 and Advanced Topics,

Subject Index 775

Lower semi-continuousdefinition, 564

Lyapounov inequality, 21–22, 52–53

MMapping theorem, 261, 262, 275, 327, 422,

424, 694continuous, 269–272intensity measure, 452, 453lower-dimensional projections, 452

Marginal densitybivariate uniform, 125–126function, 125

Marginal probability mass function (pmf), 117definition, 98E.X/ calculation, 110

Margin of error, 66, 91Markov chain, 381, 463, 466, 500, 567–570,

581, 613–686Chapman–Kolmogorov equation

transition probabilities, 345–346communicating classes

equivalence relation, 350identification, 350–351irreducibility, 349period computation, 351

gambler’s ruin, 352–353long run evolution and stationary

distributionasymmetric random walk, 365–366Ehrenfest Urn, 364–365finite Markov chain, 361one-step transition probability matrix,

360weak ergodic theorem, 366

recurrence, transience and first passagetimes

definition, 354infinite expectation, 355–358simple random walk, 354–355

Markov chain large deviation, 567–570Markov chain Monte Carlo (MCMC) methods

  algorithms, 638
  convergence and bounds, 651–662
  general spaces
    Gibbs sampler, convergence, 670–673
    independent Metropolis scheme, 665
    independent sampling, 664
    Markov transition kernel, 662–663
    Metropolis chains, 664
    Metropolis schemes, convergence, 666–670
    nonconventional multivariate distribution, 666
    practical convergence diagnostics, 673–675
    random walk Metropolis scheme, 664
    simulation, t-distribution, 665
    stationary Markov chain, 663–664
    transition density, 663
  Gibbs sampler, 645–651
  Metropolis algorithm
    Barker and independent, 643
    beta–binomial distribution, 643–644
    independent sampling, 642–643
    Metropolis–Hastings algorithm, 643
    proposal distribution, 645
    transition matrix, 642
    truncated geometric distribution, 644–645
  Monte Carlo
    conventional, 614
    ordinary, 615–624
    principle, 614
  reversible Markov chains
    discrete state space, 640–641
    ergodic theorem, 641–642
    irreducible and aperiodic stationary distribution, 640
    stationary distribution, 639–640
  simulation
    described, 613
    textbook techniques, 614–615, 624–636
  target distribution, 613, 637–638
Markov process
  definition, 404
  Ornstein–Uhlenbeck process, 431
  two-dimensional Brownian motion, 433
Markov's inequality
  description, 68
Markov transition kernel, 662–663
Martingale
  applications
    adapted to sequence, 464–465
    Bayes estimates, 467–468
    convex function theorem, 468
    gambler's fortune, 463–464
    Jensen's inequality, 468
    likelihood ratios, 467
    matching problem, 465–466
    partial sums, 465
    Pólya urn scheme, 466
    sums of squares, 465
    supermartingale, 464
    Wright–Fisher Markov chain, 466–467
  central limit theorem, 497–498
  concentration inequalities, 477–490
  convergence, 490–494
  Kolmogorov's SLLN proof, 496
  optional stopping theorem, 468–477
  reverse, 494–496
Martingale central limit theorem
  CLT, 498
  conditions
    concentration, 497
    Martingale Lindeberg, 498
Martingale Lindeberg condition, 498
Martingale maximal inequality, 477–478
Matching problem, 84, 465–466
Maximal inequality
  Kolmogorov's, 318, 396
  martingale and concentration, 477–480
  and symmetrization, 547–551
Maximum likelihood estimate (MLE)
  closed-form formulas, 705
  definition, 152–153
  endpoint, uniform, 153–154
  exponential mean, 153
  normal mean and variance, 153
McDiarmid's inequality
  Kolmogorov–Smirnov statistic, 487–488
  martingale decomposition and two-point distribution, 486–487
Mean absolute deviation
  definition, 17
  and mode, 30
Mean measure, 450, 557
Mean residual life, 160–161
Median
  CDF to PDF, 38
  definition, 9
  distribution, 55
  exponential, 55
Mercer's theorem, 732–744
Metric entropy, 545, 577–579
Metric inequalities
  Cauchy–Schwarz, 516
  variation vs. Hellinger, 517–518
Metropolis–Hastings algorithm
  beta–binomial distribution, 644
  described, 643
  SLEM, 656–657
  truncated geometric, 659
Midpoint process, 462
Midrange, 245
Mills ratio, 160, 333
Minkowski's inequality, 21
Minorization methods
  Bernoulli experiment, 661–662
  coupling, 661
  drift condition, 660
  Harris recurrence, 660
  hierarchical Bayes linear models, 660–661
  nonreversible chains, 659–660
Mixed distribution, 88, 434
Mixtures, 39, 214, 300, 307, 611, 661, 721, 739
MLE. See Maximum likelihood estimate
Mode, 30–31, 89, 91, 232, 252, 282, 552, 627, 628, 714
Moment generating function (MGF)
  central moment, 25–26
  Chernoff–Bernstein inequality, 51–52
  definition, 22
  distribution-determining property, 24–25
  inversion, 53
  Jensen's inequality, 52–53
  Poisson distribution, 23–24
  standard exponential, 51
Moment generating function (mgf), 116, 121
Moment problem, 277–278
Moments
  continuous random variables, 45–46
  definitions, 46–47
  standard normal, 48–49
Monotone convergence theorem, 491, 492
Monte Carlo
  bootstrap, 691–692
  conventional, 614–615
  EM algorithm, 706
  Markov chain, 637–645
  ordinary
    Bayesian, 618–619
    computation, confidence intervals, 616
    P-values, 622–623
    Rao–Blackwellization, 623–624
Monte Carlo P-values
  computation and application, 622–623
  statistical hypothesis-testing problem, 622
Moving Block Bootstrap (MBB), 702
Multidimensional densities
  bivariate normal
    conditional distributions, 154–155
    five-parameter, density, 136, 138–139
    formulas, 138, 155–158
    mean and variance, 139–140
    normal marginals, 140
    singular, 137
    variance–covariance matrix, 136–137
  Borel's paradox, 158–161
  conditional density and expectations, 140–147
  posterior densities, likelihood functions and Bayes estimates, 147–152
Multinomial distribution, 243, 542, 599–600
  MGF, 116
  pmf, 114
  and Poisson process, 451–452
Multinomial expansion, 116
Multinomial maximum distribution
  maximum cell frequency, 243
Multiparameter exponential family
  assumption, 596–597
  definition, 596, 597
  Fisher information matrix definition, 597
  general multivariate normal distribution, 598–599
  multinomial distribution, 599–600
  two-parameter
    gamma, 598
    inverse Gaussian distribution, 600
    normal distribution, 597–598
Multivariate Cauchy, 196
Multivariate CLT, 267
Multivariate Jacobian formula, 177–178
Multivariate normal
  conditional distributions
    bivariate normal case, 202
    independent, 203
    quadrant probability, 204–205
  definition and properties
    bivariate normal simulation, 200, 201
    characterization, 201–202
    density, 200
    linear transformations density, 200–201
  exchangeable normal variables, 205–207
  inequalities, 214–216
  noncentral distributions
    chi-square, 214
    t-distribution, 213

N
Natural parameter
  definition, 589, 599
  space, 589, 590, 596, 598
  and sufficient statistic, 600
Natural sufficient statistic
  definition, 585, 596
  one-parameter exponential family, 595–596
Nearest neighbor, 462, 554, 556
Negative
  binomial distribution, 28–29, 609
  drift, 427–428
Neighborhood recurrent, 394, 418, 419
Noncentral chi-square distribution, 214, 218
Noncentral F distribution, 210
Noncentral t-distribution, 213, 218
Non-lattice, 701
Nonreversible chain, 659–660
Normal approximation
  binomial
    confidence interval, 74–76
Normal density
  CDF, 41–42
  definition, 40
Normalizing
  density function, 39–40
Normal order statistics, 228
Null recurrent
  definition, 356
  Markov chain, 373

O
Operator norm, 729
Optimal local bandwidth, 723
Optional stopping theorem
  applications
    error probabilities, Wald's SPRT, 476–477
    gambler's ruin, 475
    generalized Wald identity, 475–476
    hitting times, random walk, 475
    Wald identities, 474–475
  stopping times
    defined, 469
    sequential tests and Wald's SPRT, 469
Order statistics
  basic distribution theory, 221–222
  Cauchy, 232
  conditional distributions, 235–236
  density, 225
  description, 221
  existence of moments, 230–232
  exponential, 227–228
  joint density, 222–223
  moments, uniform, 227
  normal, 228–229
  quantile transformation, 229–230
  and range, 226–227
  spacings, 233–235
  uniform, 223–224
Orlicz norms, 556–557
Ornstein–Uhlenbeck process
  convergence, 430–431
  covariance function, 429
  Gaussian process, 430
Orthonormal bases, Parseval's identity, 728–729
Orthonormal system, 741

P
Packing number, 548
Parallelogram law, 727
Pareto density, 60
Partial correlation, 204
Partial sum process
  convergence, Brownian motion, 423–424
  interpolated, 425
  strong invariance principle, 537
Pattern problems
  discrete probability, 26
  recursion relation, 27
  variance formula, 27
Period
  burn-in, 675
  circular block bootstrap (CBB), 702
  computation, 351
  intensity function, 454, 455
Perron–Frobenius theorem, 361–363, 569, 654
Plancherel's identity, 298–299
Plug-in estimate, 212, 630, 704, 720, 740
Poincaré's lemma, 191
Poisson approximation
  applications, 35
  binomial random variable, 34
  confidence intervals, 80–82
Poisson distribution
  characteristic function, 295–296
  confidence interval, 80–81
  moment-generating function, 23–24
Poissonization, 116–118, 121, 243, 244
Poisson point process
  higher-dimensional
    intensity/mean measure, 450
    Mapping theorem, 452–453
    multinomial distribution, 451–452
    Nearest Event site, 451
    Nearest Neighbor, 462
    polar coordinates, 460
Poisson process, 551–553, 557, 636, 682
  Campbell's theorem and shot noise
    characteristic functional …, 456–457
    shot effects, 456
    stable laws, 458
  1-D nonhomogeneous processes
    intensity and mean function, plots, 455
    mapping theorem, 453–454
    piecewise linear intensity, 454–456
  higher-dimensional Poisson point processes
    distance, nearest event site, 451
    mapping theorem, 452–453
    stationary/homogeneous, 450
Polar coordinates
  spherical calculations
    definition, 182–183
    dimensionality curse, 184
    joint density, 183–184
    spherically symmetric facts, 185
  two dimensions
    n uniforms product, 181–182
    polar transformation usefulness, 181
    transformation, 180–181
  use, 134–135
Polar transformation
  spherical calculations, 182–185
  use, 181
Pólya's criterion
  characteristic functions, 308
  inversion theorem failure, 307–308
  stable distributions, 307
Pólya's formula, 382–383
Pólya's theorem
  CDF, 265, 509
  return probability, 382–383
Pólya's urn, 466, 493
Polynomial kernel, 736
Portmanteau theorem
  definition, 265
Positive definiteness, 568, 717, 732
Positive recurrent, 356–357, 361, 363, 365, 651
Posterior density
  definition, 147–148
  exponential mean, 148–149
  normal mean, 151–152
  Poisson mean, 150
Posterior mean
  approximation, 651
  binomial, 149–150
Prior density
  definition, 148
  uses, 151
Probability metrics
  differential metrics, 519–522
  metric inequalities, 515–518
  properties
    coupling identity, 508
    f-divergences, 513–514
    Hellinger distance, 510–511
    joint and marginal distribution distances, 509–510
  standard probability, statistics
    f-divergences, 507
    Hellinger metric, 506–507
    Kolmogorov metric, 506
    Kullback–Leibler distance, 507
    Lévy–Prokhorov metric, 507
    separation distance, 506
    total variation metric, 506
    Wasserstein metric, 506
Product kernels, 743
Product martingale, 499
Projection formula, 742
Prophet inequality
  application, 400
  Bickel, 397
  Doob–Klass, 397
Proportions, convergence of EM, 282, 711–713
Proposal distribution, 645, 664, 665
Pythagorean identity, 727

Q
Quadrant probability, 204
Quadratic exponential family, 612
Quadratic forms distribution
  Fisher Cochran theorem, 211
  Hotelling's T² statistic, 209–210
  independence, 211–212
  spectral decomposition theorem, 210
Quantile process
  normalized, 528–529
  restricted Glivenko–Cantelli property, 536
  uniform, 529
  weak invariance principle, 536
Quantile transformation
  accept–reject method, 625
  application, 242
  Cauchy distribution, 44
  closed-form formulas, 625
  definition, 44
  theorem, 229–230
Quartile ratio, 335

R
Random permutation, 636
Random scan Gibbs sampler, 646, 647, 672
Random walks
  arc sine law, 386
  Brownian motion, 402, 403
  Chung–Fuchs theorem, 394–396
  cubic lattice
    distribution theory, 378–379
    Pólya's formula, 382–383
    recurrence and transience, 379–381
    recurrent state, 377
    three dimensions, 375
    two simulated, 376, 377
  Erdős–Kac generalization, 390
  first passage time
    definition, 383
    distribution, return times, 383–386
  inequalities
    Bickel, 397
    Hájek–Rényi, 396
    Kolmogorov's maximal, 396
    Lévy and Doob–Klass Prophet, 397
  Sparre–Andersen generalization, 389, 390
  Wald's identity
    stopping time, 391, 392
Rao–Blackwellization
  Monte Carlo estimate, 623–624
  variance formula, 623
Rao's geodesic distance, 522
Rate function, 52
Ratio estimate, 630
Real analytic, 564–565
Records
  density, values and times, 240–241
  interarrival times, 238
Recurrence, 354–359, 379–381, 383, 385, 393, 396, 399, 418–419, 472, 660, 667, 668
Recurrent state, 355, 357–359, 370, 373, 377, 394
Recurrent value, 393
Reflection principle, 410–411, 432
Regular chain
  communicating classes, 349
  irreducible, 363
Regular variation, 332, 333
Reproducing kernel, 725–731, 733, 735, 743
Resampling, 543, 689, 690, 691, 692, 701
Restricted Glivenko–Cantelli property, 536
Return probability, 382–383
Return times
  distribution, 383
  scaled, asymptotic density, 385
Reverse martingale
  convergence theorem, 496
  defined, 494
  sample means, 494–495
  second convex function theorem, 495–496
Reverse submartingale, 494–496
Reversible chain, 373, 640, 645, 652, 685
Rényi's representation, 233–234
Riemann–Lebesgue lemma, 302, 303
Riesz representation theorem, 730, 743
RKHS, 743
Rosenthal inequality, 318–319, 322

S
Sample maximum, 224, 235, 274, 276, 287, 337, 603–604
  asymptotic distribution, 330
  Cauchy, mode, 232
  domain of attraction, 336
Sample points
  concept, 3
Sample range, 337
Sample space
  countable additivity, 2–3
  countably infinite set, 2
  definition, 2
  inclusion–exclusion formula, 5
Sauer's lemma, 539–540
Schauder functions, 408
Scheffé's theorem, 284–285
Score confidence interval, 74, 76
Second arcsine law, 410
Second largest eigenvalue in modulus (SLEM)
  approximation, 686
  calculation, 685
  convergence, Gibbs sampler, 671
  vs. Dobrushin coefficients, 686
  Metropolis–Hastings algorithm, 656–657
  transition probability matrix, 651–652
Second-order accuracy, 699–701
Separation distance, 506, 653
Sequential probability ratio test (SPRT), 469, 476–477
Sequential tests, 391, 469
Series estimate, 720–721
Shattering coefficient, 539, 540, 542
Shot noise, 456–458
Šidák inequality, 215
Sigmoid kernel, 736
Simple random walk, 354, 375, 393, 395, 397, 399, 474
  mathematical formulation, 345
  periodicity and, 369
  simulated, 376, 377
Simulation
  algorithms, common distributions, 634–636
  beta–binomial distribution, 643–644
  bivariate normal, 137, 201
  textbook techniques
    accept–reject method, 625–628
    quantile transformation, 624–625
  truncated geometric distribution, 644–645
Skewness
  defined, 18
  normal approximation, 82
Skorohod embedding, 422, 423
SLEM. See Second largest eigenvalue in modulus
Slepian's inequality, 214
SLLN. See Strong law of large numbers
Slutsky's theorem
  application, 269, 270
Spacings
  description, 233
  exponential and Rényi's representation, 233–234
  uniform, 234–235
Spatial Poisson process, 459
Spectral bounds
  decomposition, 655–656
  Perron–Frobenius theorem, 653–654
  SLEM, Metropolis–Hastings algorithm, 656–657
Spherically symmetric, 135, 180, 182–186, 191
SPRT. See Sequential probability ratio test
Stable laws
  index 1/2, 385–386
  infinite divisibility
    description, 315
    distributions, 317
    triangular arrays, 316
  and Poisson process, 458
Stable random walk, 395
Standard discrete distributions
  binomial, 28
  geometric, 28
  hypergeometric, 29
  negative binomial, 28–29
  Poisson, 29–34
Stationary, 152, 153, 163, 404, 406, 430, 431, 440, 449, 450, 466, 569, 575, 576, 614, 639, 640, 641, 645–648, 651–656, 658–664, 667, 672, 674, 684, 685, 701–703, 711, 713
Stationary distribution, 614, 639, 640, 646–648, 651, 653, 658, 659, 661, 663, 667, 672, 674, 684, 685
Statistical classification, kernels
  exponential kernel, 736
  Fisher linear, 735
  Gaussian radial kernel, 736
  Hilbert norm, 729
  inner product space, 728
  linear functional, 729, 730
  linearly separable data, 735
  linear span, 726–727, 729, 742
  Mercer's theorem, 732–744
  orthonormal bases, 728–729
  parallelogram law, 727
  polynomial kernel, 736
  Pythagorean identity, 727
  Riesz representation theorem, 730
  sigmoid kernel, 736
  support vector machines, 734–736
Statistics and machine learning tools
  bootstrap
    consistency, 692–699
    dependent data, 701–704
    higher-order accuracy, 699–701
  EM algorithm
    ABO allele, 709–711
    modifications, 714
    monotone ascent and convergence, 711–713
    Poisson, missing values, 706–707
  kernels
    compact support, 718
    Fourier, 717
    Mercer's theorem, 722–734
    nonnegativity, 717
    rapid decay, 718
    smoothing, 715–717
Stein's lemma
  characterization, 70
  normal distribution, 66
  principal applications, 67–68
Stochastic matrix, 340, 654
Stochastic process
  continuous-time, 422
  d-dimensional, 418
  definition, 403
  standard Wiener process, 404
Stopped martingale, 472
Stopping time
  definition, 391, 478
  optional stopping theorem, 470–471
  sequential tests, 469
  Wald's SPRT, 469
Strong approximations, 530–531, 537
Strong ergodic theorem, 641
Strong invariance principle
  and KMT theorem, 425–427
  partial sum process, 530–531, 537
Strong law of large numbers (SLLN)
  application, 261
  Cramér–Chernoff theorem, 579
  Kolmogorov's, 258, 496, 615, 632
  Markov chain, 640
  proof, 496
  Zygmund–Marcinkiewicz, 693–695
Strong Markov property, 409, 411, 423
Submartingale
  convergence theorem, 491–492
  convex function theorem, 468
  nonnegative, 477–478
  optional stopping theorem, 470–471
  reverse, 494–496
  upcrossing inequality, 489
Sufficiency
  Neyman–Fisher factorization and Basu's theorem
    definition, 602
    general, 604
Sum of uniforms
  distribution, 77
  normal density, 78, 79
Supermartingale, 464, 491, 494
Support vector machine, 734–736
Symmetrization, 170–172, 319, 547–551
Systematic scan Gibbs sampler
  bivariate distribution, 647–648
  change point problem, 650–651
  convergence property, 672
  defined, 646
  Dirichlet distribution, 650

T
Tailsum formula, 16–19, 86, 357
Target distribution
  convergence, MCMC, 651–652
  Metropolis chains, 664
Taylor expansion, characteristic function, 303
t confidence interval, 187–188, 290–291
t-distribution
  noncentral, 213, 218
  simulation, 665, 668
  Student, 175–176, 187
Time reversal, 412
Total probability, 5, 6, 117
Total variation distance
  closed-form formula, 283
  definition, 282
  and densities, 282–283
  normals, 283–284
Total variation metric, 506, 509
Transformations
  arctanh, 274
  Box–Mueller, 194
  Helmert's, 186–187
  linear, 200–201
  multivariate Jacobian formula, 177–178
  n-dimensional polar, 191
  nonmonotone, 45
  polar coordinates, 180–185
  quantile, 44–45, 229–232, 624–628
  simple linear, 42–43
  variance stabilizing, 272–274, 291
Transience, 354–359, 379–381, 394, 399, 418–419
Transition density
  definition, 428, 662–663
  drift-diffusion equation, 429
  Gibbs chain, 672
Transition probability
  Markov chain, 341
  matrix, 340
  nonreversible chain, 684
  n-step, 346
  one-step, 344
Triangular arrays, 316
Triangular density
  definition, 39–40
  piecewise linear polynomial, 77

U
Uniform density
  basic properties, 54
Uniform empirical process, 528, 530, 531, 552, 553
Uniform integrability
  conditions, 276–277
  dominated convergence theorem, 275–276
Uniformly ergodic, 653, 669–670
Uniform metric, 422, 424, 425
Uniform order statistics, 223–225
  joint density, 235
  joint distribution, 230
  moments, 227
Uniform spacings, 234–235
Unimodal
  properties, 44
Unimodality, order statistics, 39, 246
Universal Donsker class, 544
Upcrossing, 488–490
Upcrossing inequality
  discrete time process, 488
  optional stopping theorem, 490
  stopping times, 488–489
  submartingale and decomposition, 489

V
Vapnik–Chervonenkis (VC)
  dimension, 539–541
  subgraph, 545, 546
Variance
  definition, 17, 46
  linear function, 68–69
  mean and, 19
Variance stabilizing transformations (VST)
  binomial case, 273–274
  confidence intervals, 272
  Fisher's, 274
  unusual, 274
Void probabilities, 450
VST. See Variance stabilizing transformations

W
Wald confidence interval, 74
Wald's identity
  application, 399
  first and second, 474–477
  stopping time, 391, 392
Wasserstein metric, 506, 693
Weak law of large numbers (WLLN), 20, 256, 257, 269, 270, 305
Weakly stationary, 404
Weibull density, 56
Weierstrass's theorem, 265–266, 268, 742
Weighted empirical process
  Chibisov–O'Reilly theorem, 534–535
  test statistics, 534
Wiener process
  continuous, 576, 577
  increments, 581
  standard, 404, 405
  tied down, 405
Wishart distribution, 207–208
Wishart identities, 208–209
WLLN. See Weak law of large numbers
Wright–Fisher Markov chain, 466–467

Y
Yu–Mykland single chain method, 674–675

Z
Zygmund–Marcinkiewicz SLLN, 693–694