Appendix ASymbols, Useful Formulas, and Normal Table
A.1 Glossary of Symbols
Ω  sample space
P(B | A)  conditional probability
{π(1), ..., π(n)}  permutation of {1, ..., n}
F  CDF
F̄  1 - F
F^(-1), Q  quantile function
iid, IID  independent and identically distributed
p(x, y); f(x_1, ..., x_n)  joint pmf; joint density
F(x_1, ..., x_n)  joint CDF
f(y | x); E(Y | X = x)  conditional density and expectation
var, Var  variance
Var(Y | X = x)  conditional variance
Cov  covariance
ρ_{X,Y}  correlation
G(s); ψ(t)  generating function; mgf or characteristic function
ψ(t_1, ..., t_n)  joint mgf
β, γ  skewness and kurtosis
μ_k  E(X - μ)^k
m_k  sample kth central moment
κ_r  rth cumulant
ρ_r  rth standardized cumulant; correlation of lag r
r, θ  polar coordinates
J  Jacobian
F_n  empirical CDF
F_n^(-1)  sample quantile function
X^(n)  sample observation vector (X_1, ..., X_n)
M_n  sample median
X_(k), X_{k:n}  kth-order statistic
A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals and Advanced Topics, Springer Texts in Statistics, DOI 10.1007/978-1-4419-9634-3, © Springer Science+Business Media, LLC 2011
W_n  sample range
IQR  interquartile range
⇒_P, →_P  convergence in probability
o_p(1)  convergence in probability to zero
O_p(1)  bounded in probability
a_n ≍ b_n  0 < lim inf a_n/b_n <= lim sup a_n/b_n < ∞
a_n ~ b_n  lim a_n/b_n = 1
⇒_a.s., →_a.s.  almost sure convergence
w.p. 1  with probability 1
a.e.  almost everywhere
i.o.  infinitely often
⇒_L, →_L  convergence in distribution
⇒_r, →_r  convergence in rth mean
u.i.  uniformly integrable
LIL  law of iterated logarithm
VST  variance stabilizing transformation
δ_x  point mass at x
P({x})  probability of the point x
λ  Lebesgue measure
*  convolution
≪  absolutely continuous
dP/dμ  Radon–Nikodym derivative
⊗  product measure; Kronecker product
I(θ)  Fisher information function or matrix
T  natural parameter space
f(θ | x); π(θ | X^(n))  posterior density of θ
LRT  likelihood ratio test
Λ_n  likelihood ratio
S  sample covariance matrix
T^2  Hotelling's T^2 statistic
MLE  maximum likelihood estimate
l(θ, X)  complete data likelihood in EM
L(θ, X)  log l(θ, X)
l(θ, Y)  likelihood for observed data
L(θ, Y)  log l(θ, Y)
L̂_k(θ, Y)  function to be maximized in M-step
H_Boot  bootstrap distribution of a statistic
P_*  bootstrap measure
p_ij  transition probabilities in a Markov chain
p_ij(n)  n-step transition probabilities
T_i  first passage time
π  stationary distribution of a Markov chain
S_n  random walk; partial sums
ℓ(x, n); ℓ(x, T)  local time
W(t), B(t)  Brownian motion and Brownian bridge
W_d(t)  d-dimensional Brownian motion
C(s, t)  covariance kernel
X(t), N(t)  Poisson process
Π  Poisson point process
α_n(t), u_n(y)  uniform empirical and quantile processes
F_n(t), β_n(t)  general empirical process
P_n  empirical measure
D_n  Kolmogorov–Smirnov statistic
MCMC  Markov chain Monte Carlo
π_ij  proposal probabilities
α_ij  acceptance probabilities
S(n, C)  shattering coefficient
SLEM  second largest eigenvalue in modulus
δ(P)  Dobrushin's coefficient
V(x)  energy or drift function
VC(C)  VC dimension
N(ε, F, ||·||)  covering number
N_[](ε, F, ||·||)  bracketing number
D(x, ε, P)  packing number
λ(x)  intensity function of a Poisson process
R  real line
R^d  d-dimensional Euclidean space
C(X)  real-valued continuous functions on X
C_k(R)  k times continuously differentiable functions
C_0(R)  real continuous functions f on R such that f(x) → 0 as |x| → ∞
F  family of functions
∇, Δ  gradient vector and Laplacian
f^(m)  mth derivative
D_k f(x_1, ..., x_n)  Σ_{m_1, ..., m_n >= 0, m_1 + ... + m_n = k} ∂^k f / (∂x_1^{m_1} ... ∂x_n^{m_n})
D^+, D_+  Dini derivatives
||·||  Euclidean norm
||·||_∞  supnorm
tr  trace of a matrix
|A|  determinant of a matrix
K(x), K(x, y)  kernels
||x||_1, ||x||  L_1, L_2 norm in R^n
H  Hilbert space
||x||_H  Hilbert norm
(x, y)  inner product
g(X), g_n  classification rules
φ_x, φ(x)  Mercer's feature maps
B(x, r)  sphere with center at x and radius r
U, U°, Ū  domain, interior, and closure
I_A  indicator function of A
I(t)  large deviation rate function
{x}  fractional part
⌊x⌋  integer part
sgn, sign  signum function
x^+, x_+  max{x, 0}
max, min  maximum, minimum
sup, inf  supremum, infimum
K_ν, I_ν, J_ν  Bessel functions
L_p(μ), L^p(μ)  set of functions such that ∫ |f|^p dμ < ∞
d, L, d_TV  Kolmogorov, Lévy, and total variation distance
H, K  Hellinger and Kullback–Leibler distance
W, ℓ_2  Wasserstein distance
D  separation distance
d_f  f-divergence
N(μ, σ^2)  normal distribution
φ, Φ  standard normal density and CDF
N_p(μ, Σ), MVN(μ, Σ)  multivariate normal distribution
BVN  bivariate normal distribution
MN(n, p_1, ..., p_k)  multinomial distribution with these parameters
t_n, t(n)  t-distribution with n degrees of freedom
Ber(p), Bin(n, p)  Bernoulli and binomial distribution
Poi(λ)  Poisson distribution
Geo(p)  geometric distribution
NB(r, p)  negative binomial distribution
Exp(λ)  exponential distribution with mean λ
Gamma(α, λ)  Gamma density with shape parameter α and scale parameter λ
χ²_n, χ²(n)  chi-square distribution with n degrees of freedom
C(μ, σ)  Cauchy distribution
D_n(α)  Dirichlet distribution
W_p(k, Σ)  Wishart distribution
DoubleExp(μ, λ)  double exponential with parameters μ, λ
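Two of the sample-based objects in this glossary, the empirical CDF F_n and the sample quantile function F_n^(-1) (with X_(k) the kth order statistic), are easy to make concrete; a minimal sketch in Python (the function names are illustrative, not from the book):

```python
import bisect
import math

def empirical_cdf(sample):
    """Return F_n, where F_n(t) = (1/n) * #{i : X_i <= t}."""
    xs, n = sorted(sample), len(sample)
    return lambda t: bisect.bisect_right(xs, t) / n

def sample_quantile(sample, p):
    """F_n^{-1}(p) = inf{t : F_n(t) >= p}, i.e. the order statistic X_(ceil(np))."""
    xs, n = sorted(sample), len(sample)
    return xs[max(1, math.ceil(n * p)) - 1]
```

For instance, for the sample (3, 1, 4, 2), F_n(2) = 0.5 and the sample median F_n^(-1)(0.5) is the second order statistic, 2.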
A.2 Moments and MGFs of Common Distributions

Discrete Distributions

Uniform
  p(x) = 1/n, x = 1, ..., n
  Mean: (n+1)/2;  Variance: (n^2 - 1)/12;  Skewness: 0;  Kurtosis: -6(n^2 + 1)/(5(n^2 - 1))
  MGF: (e^((n+1)t) - e^t)/(n(e^t - 1))

Binomial
  p(x) = C(n, x) p^x (1-p)^(n-x), x = 0, ..., n
  Mean: np;  Variance: np(1-p);  Skewness: (1 - 2p)/sqrt(np(1-p));  Kurtosis: (1 - 6p(1-p))/(np(1-p))
  MGF: (p e^t + 1 - p)^n

Poisson
  p(x) = e^(-λ) λ^x / x!, x = 0, 1, ...
  Mean: λ;  Variance: λ;  Skewness: 1/sqrt(λ);  Kurtosis: 1/λ
  MGF: e^(λ(e^t - 1))

Geometric
  p(x) = p(1-p)^(x-1), x = 1, 2, ...
  Mean: 1/p;  Variance: (1-p)/p^2;  Skewness: (2-p)/sqrt(1-p);  Kurtosis: 6 + p^2/(1-p)
  MGF: p e^t / (1 - (1-p)e^t)

Negative Binomial
  p(x) = C(x-1, r-1) p^r (1-p)^(x-r), x >= r
  Mean: r/p;  Variance: r(1-p)/p^2;  Skewness: (2-p)/sqrt(r(1-p));  Kurtosis: 6/r + p^2/(r(1-p))
  MGF: (p e^t / (1 - (1-p)e^t))^r

Hypergeometric
  p(x) = C(D, x) C(N-D, n-x) / C(N, n)
  Mean: nD/N;  Variance: n(D/N)(1 - D/N)(N-n)/(N-1);  Skewness: complex;  Kurtosis: complex
  MGF: complex

Benford
  p(x) = log(1 + 1/x)/log 10, x = 1, ..., 9
  Mean: 3.44;  Variance: 6.057;  Skewness: .796;  Kurtosis: 2.45
  MGF: Σ_{x=1}^{9} e^(tx) p(x)
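The numerical entries in the Benford row can be checked directly from the pmf; a small sanity-check sketch (this code is illustrative, not from the book):

```python
import math

# Benford pmf: p(x) = log10(1 + 1/x), x = 1, ..., 9
p = {x: math.log10(1 + 1 / x) for x in range(1, 10)}

mean = sum(x * px for x, px in p.items())
var = sum((x - mean) ** 2 * px for x, px in p.items())
skew = sum((x - mean) ** 3 * px for x, px in p.items()) / var ** 1.5
# fourth standardized moment E(X - mean)^4 / var^2; the printed 2.45 matches this convention
kurt = sum((x - mean) ** 4 * px for x, px in p.items()) / var ** 2
```

Running this reproduces the tabulated values mean ≈ 3.44, variance ≈ 6.057, skewness ≈ .796, kurtosis ≈ 2.45.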
Continuous Distributions

Uniform
  f(x) = 1/(b-a), a <= x <= b
  Mean: (a+b)/2;  Variance: (b-a)^2/12;  Skewness: 0;  Kurtosis: -6/5

Exponential
  f(x) = e^(-x/λ)/λ, x >= 0
  Mean: λ;  Variance: λ^2;  Skewness: 2;  Kurtosis: 6

Gamma
  f(x) = e^(-x/λ) x^(α-1) / (λ^α Γ(α)), x >= 0
  Mean: αλ;  Variance: αλ^2;  Skewness: 2/sqrt(α);  Kurtosis: 6/α

Chi-square (χ²_m)
  f(x) = e^(-x/2) x^(m/2 - 1) / (2^(m/2) Γ(m/2)), x >= 0
  Mean: m;  Variance: 2m;  Skewness: sqrt(8/m);  Kurtosis: 12/m

Weibull
  f(x) = (β/λ)(x/λ)^(β-1) e^(-(x/λ)^β), x > 0
  Mean: λΓ(1 + 1/β);  Variance: λ^2 Γ(1 + 2/β) - μ^2;
  Skewness: (λ^3 Γ(1 + 3/β) - 3μσ^2 - μ^3)/σ^3;  Kurtosis: complex

Beta
  f(x) = x^(α-1) (1-x)^(β-1) / B(α, β), 0 <= x <= 1
  Mean: α/(α+β);  Variance: αβ/((α+β)^2 (α+β+1));
  Skewness: 2(β-α) sqrt(α+β+1) / (sqrt(αβ)(α+β+2));  Kurtosis: complex

Normal
  f(x) = (1/(σ sqrt(2π))) e^(-(x-μ)^2/(2σ^2)), x ∈ R
  Mean: μ;  Variance: σ^2;  Skewness: 0;  Kurtosis: 0

Lognormal
  f(x) = (1/(σ sqrt(2π) x)) e^(-(log x - μ)^2/(2σ^2)), x > 0
  Mean: e^(μ + σ^2/2);  Variance: (e^(σ^2) - 1) e^(2μ + σ^2);
  Skewness: (e^(σ^2) + 2) sqrt(e^(σ^2) - 1);  Kurtosis: complex
Continuous Distributions (continued)

Cauchy
  f(x) = 1/(πσ(1 + (x-μ)^2/σ^2)), x ∈ R
  Mean: none;  Variance: none;  Skewness: none;  Kurtosis: none

t (m degrees of freedom)
  f(x) = Γ((m+1)/2) / (sqrt(mπ) Γ(m/2) (1 + x^2/m)^((m+1)/2)), x ∈ R
  Mean: 0 (m > 1);  Variance: m/(m-2) (m > 2);  Skewness: 0 (m > 3);  Kurtosis: 6/(m-4) (m > 4)

F
  f(x) = (β/α)^β x^(α-1) / (B(α, β) (x + β/α)^(α+β)), x > 0
  Mean: β/(β-1) (β > 1);  Variance: β^2 (α+β-1)/(α(β-2)(β-1)^2) (β > 2);
  Skewness: complex;  Kurtosis: complex

Double Exponential
  f(x) = e^(-|x-μ|/λ)/(2λ), x ∈ R
  Mean: μ;  Variance: 2λ^2;  Skewness: 0;  Kurtosis: 3

Pareto
  f(x) = ασ^α / x^(α+1), x >= σ > 0
  Mean: ασ/(α-1) (α > 1);  Variance: ασ^2/((α-1)^2 (α-2)) (α > 2);
  Skewness: (2(α+1)/(α-3)) sqrt((α-2)/α) (α > 3);  Kurtosis: complex

Gumbel
  f(x) = (1/σ) e^(-e^(-(x-μ)/σ)) e^(-(x-μ)/σ), x ∈ R
  Mean: μ + γσ;  Variance: σ^2 π^2/6;  Skewness: 12 sqrt(6) ζ(3)/π^3;  Kurtosis: 12/5

Note: For the Gumbel distribution, γ ≈ .577216 is the Euler constant, and ζ(3) is Riemann's zeta function, ζ(3) = Σ_{n=1}^∞ 1/n^3 ≈ 1.20206.
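The Gumbel row can be verified numerically for the standard case μ = 0, σ = 1; a crude midpoint-integration sketch (illustrative only, with an arbitrary grid and truncation range):

```python
import math

# Standard Gumbel density (mu = 0, sigma = 1): f(x) = exp(-exp(-x)) * exp(-x)
def f(x):
    return math.exp(-math.exp(-x)) * math.exp(-x)

dx = 0.001
grid = [-10 + (i + 0.5) * dx for i in range(40000)]   # midpoints covering [-10, 30]
mean = sum(x * f(x) * dx for x in grid)               # should approach Euler's gamma
var = sum((x - mean) ** 2 * f(x) * dx for x in grid)  # should approach pi^2 / 6
skew = sum((x - mean) ** 3 * f(x) * dx for x in grid) / var ** 1.5

zeta3 = sum(1 / n ** 3 for n in range(1, 10001))      # partial sum for zeta(3)
```

The computed mean, variance, and skewness agree with γ, π²/6, and 12√6 ζ(3)/π³ to within the integration error.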
Table of MGFs of Continuous Distributions
(The densities f(x) are as in the table above.)

Distribution          MGF
Uniform               (e^(bt) - e^(at)) / ((b-a)t)
Exponential           (1 - λt)^(-1)  (t < 1/λ)
Gamma                 (1 - λt)^(-α)  (t < 1/λ)
Chi-square (χ²_m)     (1 - 2t)^(-m/2)  (t < 1/2)
Weibull               Σ_{n=0}^∞ (λt)^n Γ(1 + n/β) / n!
Beta                  1F1(α; α+β; t)
Normal                e^(tμ + t^2 σ^2/2)
Lognormal             none
t                     none
F                     none
Double Exponential    e^(tμ) / (1 - λ^2 t^2)  (|t| < 1/λ)
Pareto                none
Gumbel                e^(tμ) Γ(1 - tσ)  (t < 1/σ)
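Any closed-form entry can be sanity-checked by numerically integrating e^(tx) f(x). A sketch for the Gamma entry, with arbitrary illustrative choices of α, λ, and t (not values from the book):

```python
import math

# Check the Gamma MGF entry: E[e^(tX)] = (1 - lam*t)^(-alpha) for t < 1/lam.
alpha, lam, t = 2.5, 1.5, 0.2   # arbitrary illustrative values; note t < 1/lam

def density(x):
    return math.exp(-x / lam) * x ** (alpha - 1) / (lam ** alpha * math.gamma(alpha))

# midpoint Riemann sum over [0, 60]; the integrand is negligible beyond that
dx = 0.002
mgf_numeric = sum(math.exp(t * x) * density(x) * dx
                  for x in (dx * (i + 0.5) for i in range(30000)))

mgf_closed = (1 - lam * t) ** (-alpha)
```

The numerical integral and the closed form agree to several decimal places, which is a quick way to catch a transcription error in any row.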
A.3 Normal Table
Standard Normal Probabilities P(Z <= t) and Standard Normal Percentiles

The quantity tabulated on the next page is Φ(t) = P(Z <= t) for given t >= 0, where Z ~ N(0, 1). For example, from the table, P(Z <= 1.52) = .9357.

For any positive t, P(-t <= Z <= t) = 2Φ(t) - 1, and P(Z <= -t) = P(Z > t) = 1 - Φ(t).

Selected standard normal percentiles z_α are given below. Here the meaning of z_α is P(Z > z_α) = α.
α       z_α
.25     .675
.2      .84
.1      1.28
.05     1.645
.025    1.96
.02     2.055
.01     2.33
.005    2.575
.001    3.08
.0001   3.72
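These percentiles can be reproduced with the inverse normal CDF in the Python standard library; a quick check (not part of the book's table):

```python
from statistics import NormalDist

# z_alpha satisfies P(Z > z_alpha) = alpha, so z_alpha = Phi^{-1}(1 - alpha)
def z(alpha):
    return NormalDist().inv_cdf(1 - alpha)
```

For example, z(0.025) ≈ 1.96 and z(0.05) ≈ 1.645, matching the table.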
t     0      1      2      3      4      5      6      7      8      9
0.0  0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1  0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2  0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3  0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4  0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5  0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6  0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7  0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8  0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9  0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0  0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1  0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2  0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3  0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4  0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5  0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6  0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7  0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8  0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9  0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0  0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1  0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2  0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3  0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4  0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5  0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6  0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7  0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8  0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9  0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0  0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1  0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2  0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3  0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4  0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
3.5  0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998
3.6  0.9998 0.9998 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999
3.7  0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999
3.8  0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999
3.9  1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
4.0  1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
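The Φ(t) entries can be regenerated from the error function; a minimal sketch that reproduces the worked example P(Z ≤ 1.52) = .9357:

```python
from math import erf, sqrt

def Phi(t):
    """Standard normal CDF: Phi(t) = (1 + erf(t / sqrt(2))) / 2."""
    return (1 + erf(t / sqrt(2))) / 2

print(round(Phi(1.52), 4))   # 0.9357, as in the table
```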
Author Index
A
Adler, R.J., 576
Aitchison, J., 188
Aizerman, M., 732
Alexander, K., 545
Alon, N., 20
Aronszajn, N., 726
Ash, R., 1, 249
Athreya, K., 614, 667, 671, 694
Azuma, K., 483
B
Balakrishnan, N., 54
Barbour, A., 34
Barnard, G., 623
Barndorff-Nielsen, O., 583
Basu, D., 188, 208, 600, 604
Basu, S., 570
Beran, R., 545
Berlinet, A., 731, 733
Bernstein, S., 51, 52, 89
Berry, A., 308
Besag, J., 615, 623
Bhattacharya, R.N., 1, 71, 76, 82, 249, 308, 339, 402, 408, 409, 414, 423
Bickel, P.J., 230, 249, 282, 583, 603, 690, 693, 704, 714
Billingsley, P., 1, 421, 531
Blackwell, D., 188
Borell, C., 572
Bose, R., 208
Braverman, E., 732
Breiman, L., 1, 249, 402
Bremaud, P., 339, 366, 614, 652, 654, 655, 658
Brown, B.M., 498
Brown, L.D., 76, 416, 429, 583, 600, 601, 612
Brown, R., 401
Bucklew, J., 560
Burkholder, D.L., 481, 482
C
Cai, T., 76
Carlin, B., 614, 673
Carlstein, E., 702
Casella, G., 583, 600, 614, 704, 714
Chan, K., 671, 714
Cheney, W., 731, 734
Chernoff, H., 51, 52, 68–70, 89
Chervonenkis, A., 540, 541, 736
Chibisov, D., 534
Chow, Y.S., 249, 259, 264, 267, 276, 463
Chung, K.L., 1, 381, 393, 394, 463
Cirelson, B.S., 572
Clifford, P., 615, 623
Coles, S., 238
Cowles, M., 673
Cox, D., 157
Cramer, H., 249
Cressie, N., 570
Cristianini, N., 736
Csaki, E., 535
Csorgo, M., 421, 423, 425, 427, 527, 536
Csorgo, S., 421
D
DasGupta, A., 1, 3, 71, 76, 215, 238, 243, 249, 257, 311, 317, 323, 327, 328, 333, 337, 421, 429, 505, 527, 570, 695, 699, 704, 714, 723
Dasgupta, S., 208
David, H.A., 221, 228, 234, 236, 238, 323
Davis, B., 481, 482
de Haan, L., 323, 329, 331, 332, 334
Deheuvels, P., 527, 553
del Barrio, E., 527, 553
Dembo, A., 560, 570
Dempster, A., 705
den Hollander, F., 560
Devroye, L., 485–488, 560, 725
Diaconis, P., 67, 339, 505, 614, 615, 652, 657, 671
Dimakos, X.K., 615
Do, K.-A., 634
Dobrushin, R.L., 652, 657
Doksum, K., 249, 282, 583, 603, 704, 714
Donsker, M., 421, 422, 530
Doob, J.L., 463
Doss, H., 614, 667, 671
Dubhashi, D., 560
Dudley, R.M., 1, 505, 508, 527, 530, 531, 541, 578
Durrett, R., 402
Dvoretzky, A., 242
Dym, H., 390
E
Eaton, M., 208
Efron, B., 570, 571, 690
Eicker, F., 535
Einmahl, J., 535
Einmahl, U., 425
Embrechts, P., 238
Erdos, P., 421, 422
Esseen, C., 308
Ethier, S., 243
Everitt, B., 54
F
Falk, M., 238
Feller, W., 1, 62, 71, 76, 77, 249, 254, 258, 277, 285, 307, 308, 316, 317, 339, 375, 386, 389
Ferguson, T., 188, 249
Fernique, X., 576
Fill, J., 615, 657
Finch, S., 382
Fisher, R.A., 25, 212, 586, 602
Fishman, G.S., 614, 624
Freedman, D., 339, 402, 693
Fristedt, B., 463, 472, 493, 496
Fuchs, W., 381, 394
G
Galambos, J., 221, 238, 323, 328, 331
Gamerman, D., 614
Garren, S., 674
Gelman, A., 614, 674
Geman, D., 614
Geman, S., 614
Genz, A., 157
Geyer, C., 614
Ghosh, M., 208
Gibbs, A., 505, 508, 509, 516
Gilks, W., 614
Gine, E., 527, 538, 541, 542, 545, 548, 570, 694
Glauber, R., 646
Gnedenko, B.V., 328
Gotze, F., 570
Gray, L., 463, 472, 493, 496
Green, P.J., 615
Groeneboom, P., 563
Gundy, R.F., 481, 482
Gyorfi, L., 560, 725
H
Haff, L.R., 208, 327, 429
Hall, P., 34, 71, 224, 231, 249, 308, 331, 421, 463, 497, 498, 560, 570, 623, 634, 690, 691, 694, 701, 704
Hastings, W., 614
Heyde, C., 249, 421, 463, 497, 498
Higdon, D., 615
Hinkley, D., 174
Hobert, J., 671
Hoeffding, W., 483, 570
Horowitz, J., 704
Hotelling, H., 210
Husler, J., 238
I
Ibragimov, I.A., 508
Isaacson, D., 339
J
Jaeschke, D., 535
Jing, B., 704
Johnson, N., 54
Jones, G., 671
K
Kac, M., 421, 422
Kagan, A., 69, 157
Kamat, A., 156
Karatzas, I., 414, 416, 417, 463
Karlin, S., 402, 411, 427, 437, 463
Kass, R., 519
Kemperman, J., 339
Kendall, M.G., 54, 62, 65
Kendall, W., 615
Kesten, H., 254, 257
Khare, K., 614, 671
Kiefer, J., 242
Kingman, J.F.C., 437, 439, 442, 451, 456, 457
Kluppelberg, C., 238
Knight, S., 673
Komlos, J., 421, 425, 426, 537
Korner, T., 417
Kosorok, M., 527
Kotz, S., 54
Krishnan, T., 705, 711, 714, 739
Kunsch, H.R., 702
L
Lahiri, S.N., 690, 702–704
Laird, N., 705
Landau, H.J., 577
Lange, K., 714
Lawler, G., 402, 437
Le Cam, L., 34, 71, 509, 704
Leadbetter, M., 221
Ledolter, J., 714
Ledoux, M., 560
Lehmann, E.L., 238, 249, 583, 600, 699, 704
Leise, F., 505
Levine, R., 714
Light, W., 731
Lindgren, G., 221
Linnik, Y., 69, 157
Liu, J., 614, 657
Logan, B.F., 570
Lugosi, G., 560, 725
M
Madsen, R., 339
Mahalanobis, P., 208
Major, P., 421, 425, 426, 537
Mallows, C.L., 570
Martynov, G., 238
Mason, D.M., 535, 537, 570
Massart, P., 242
McDiarmid, C., 485, 487, 488, 560
McKean, H., 390
McLachlan, G., 705, 711, 714, 739
Mee, R., 157
Mengersen, K., 667, 669, 673
Metropolis, N., 614
Meyn, S., 339
Mikosch, T., 238
Millar, P., 545
Minh, H., 734
Morris, C., 612
Morters, P., 402, 420
Murray, G.D., 712
Mykland, P., 674
N
Niyogi, P., 734
Norris, J., 339, 366
O
O'Reilly, N., 534
Olkin, I., 208
Oosterhoff, J., 563
Owen, D., 157
P
Paley, R.E., 20
Panconesi, A., 560
Parzen, E., 437
Patel, J., 157
Peres, Y., 402, 420
Perlman, M., 208
Petrov, V., 62, 249, 301, 308–311
Pitman, J., 1
Plackett, R., 157
Politis, D., 702, 704
Pollard, D., 527, 538, 548
Polya, G., 382
Port, S., 239, 264, 265, 302, 303, 306, 315, 437
Propp, J., 615
Pyke, R., 421
R
Rachev, S.T., 505
Rao, C.R., 20, 62, 69, 157, 210, 505, 519, 520, 522
Rao, R.R., 71, 76, 82, 249, 308
Read, C., 157
Reiss, R., 221, 231, 234, 236, 238, 285, 323, 326, 328, 505, 516
Renyi, A., 375, 380, 389
Resnick, S., 221, 402
Revesz, P., 254, 420, 421, 423, 425, 427, 527
Revuz, D., 402
Rice, S.O., 570
Richardson, S., 614
Ripley, B.D., 614
Robert, C., 614, 624, 673
Roberts, G., 614, 667
Romano, J., 702
Rootzen, H., 221
Rosenblatt, M., 721
Rosenbluth, A., 614
Rosenbluth, M., 614
Rosenthal, J.S., 615, 667, 671
Ross, S., 1, 614, 624
Roy, S., 208
Rozonoer, L., 732
Rubin, D., 614, 674, 705
Rubin, H., 337, 628
Rudin, W., 726, 730
S
Saloff-Coste, L., 505, 614, 652, 671
Sauer, N., 539, 540
Schmeiser, B., 624
Sen, P.K., 328, 329
Seneta, E., 339, 361
Serfling, R., 249, 311, 323, 326
Sethuraman, J., 614, 667, 671
Shao, J., 690, 695, 701
Shao, Q.M., 560, 570, 571
Shawe-Taylor, J., 736
Shepp, L., 570, 577
Shorack, G., 238, 527
Shreve, S., 414, 416, 417, 463
Singer, J., 328, 329
Singh, K., 693, 699, 701
Sinha, B., 208
Smith, A.F.M., 614
Smith, R.L., 674
Sparre-Andersen, E., 389
Spencer, J., 20
Spiegelhalter, D., 614
Spitzer, F., 375, 389, 390
Steele, J.M., 34
Stein, C., 67
Stern, H., 614
Stigler, S., 71
Stirzaker, D., 1, 339
Strassen, V., 425
Strawderman, W.E., 429
Stroock, D., 560, 614
Stuart, A., 54, 62, 65
Sturmfels, B., 615
Su, F., 505, 508, 509, 516
Sudakov, V.N., 572
Sundberg, R., 705
T
Talagrand, M., 572, 578
Tanner, M., 614, 714
Taylor, H.M., 402, 411, 427, 437, 463
Teicher, H., 249, 259, 264, 267, 276, 463
Teller, A., 614
Teller, E., 614
Thomas-Agnan, C., 731, 733
Thonnes, E., 615
Tibshirani, R., 690
Tierney, L., 614, 667, 668, 671
Titterington, D.M., 623
Tiwari, R., 188
Tong, Y., 138, 154, 201, 203, 204, 208, 210, 211, 215, 698
Tu, D., 690, 695, 701
Tusnady, G., 421, 425, 426, 537
Tweedie, R., 339, 667, 669
V
Vajda, I., 505
van de Geer, S., 527, 553
van der Vaart, A., 249, 310, 527, 546, 547, 557
van Zwet, W.R., 537
Vapnik, V., 540, 541, 736
Varadhan, S.R.S., 456, 560
Vos, P., 519
W
Wang, Q., 560
Wasserman, L., 67
Waymire, E., 1, 339, 402, 408, 409, 414, 423
Wei, G., 714
Wellner, J., 238, 310, 527, 538, 546, 547, 557
Wermuth, N., 157
White, A., 704
Whitt, W., 421
Widder, D., 53
Williams, D., 463
Wilson, B., 615
Wolf, M., 702
Wolfowitz, J., 242
Wong, W., 614
Wu, C.F.J., 711–713
Y
Yang, G., 704
Yao, Y., 734
Yor, M., 402
Yu, B., 674
Z
Zabell, S., 67
Zeitouni, O., 560
Zinn, J., 538, 694
Zolotarev, V.M., 505
Zygmund, A., 20
Subject Index
A
ABO allele, 709–711
Acceptance distribution, 664
Accept-reject method
  beta generation, 627–628
  described, 625
  generation, standard normal values, 626–627
  scheme efficiency, 628
Almost surely, 252, 253, 256, 259, 261–262, 267, 287, 288, 314, 410, 461, 470, 491–494, 496, 498, 503, 535, 537, 542, 555, 576, 577, 578, 615, 618, 632, 641, 694, 725
Ancillary
  definition, 601
  statistic, 602, 604, 605
Anderson's inequality, 214, 218
Annulus, Dirichlet problem, 417–418
Aperiodic state, 351
Approximation of moments
  first-order, 278, 279
  scalar function, 282
  second-order, 279, 280
  variance, 281
Arc sine law, 386
ARE. See Asymptotic relative efficiency
Arrival rate, 552, 553
  intensity function, 455
  Poisson process, 439, 441, 442, 446, 460–462
Arrival time
  definition, 438
  independent Poisson process, 445
  interarrival times, 438
Asymptotic independence, 337
Asymptotic relative efficiency (ARE)
  IQR-based estimation, 327
  median, 335
  variance ratio, 325
Asymptotics
  convergence, distribution
    CDF, 262
    CLT, 266
    Cramer–Wold theorem, 263–264
    Helly's theorem, 264
    LIL, 267
    multivariate CLT, 267
    Polya's theorem, 265
    Portmanteau theorem, 265–266
  densities and Scheffe's theorem, 282–286
  laws, large numbers
    Borel-Cantelli Lemma, 254–256
    Glivenko-Cantelli theorem, 258–259
    strong, 256–257
    weak, 256–258
  moments, convergence (see Convergence of moments)
  notation and convergence, 250–254
  preservation, convergence
    continuous mapping, 260–261
    Delta theorem, 269–272
    multidimension, 260
    sample correlation, 261–262
    sample variance, 261
    Slutsky's theorem, 268–269
    transformations, 259
    variance stabilizing transformations, 272–274
Asymptotics, extremes and order statistics
  application, 325–326
    several, 326–327
    single, 323–325
  convergence, types theorem
    limit distributions, types, 332
    Mills ratio, 333
  distribution theory, 323
Asymptotics, extremes and order statistics (cont.)
  easy applicable limit theorem
    Gumbel distribution, 331
    Hermite polynomial, 329
  Fisher–Tippet family
    definition, 333
    theorem, 334–335
Avogadro's constant, 401
Azuma's inequality, 483–486
B
Ballot theorem, 390
Barker's algorithm, 643
Basu's theorem
  applications, probability
    convergence result, 606–607
    covariance calculation, 605
    expectation calculation, 606
    exponential distribution result, 605
    mean and variance independence, 604–605
  exponential family, 602
  and Mahalanobis's D2, 611
  and Neyman–Fisher factorization
    definition, 602–603
    factorization, 603–604
    general, 604
Bayes estimate, 147–152, 467–468, 493–494
Bayes theorem
  conditional densities, 141, 147, 149
  use, 147
Berry–Esseen bound
  definition, 76
  normal approximation, 76–79
Bessel's inequality, 742
Best linear predictor, 110–111, 154
Best predictor, 142, 154
Beta-Binomial distribution
  Gibbs sampler, 965–966
  simulation, 643–644
Beta density
  definition, 59–60
  mean, 150
  mode, 89
  percentiles, 618
Bhattacharya affinity, 525
Bikelis local bound, 322
Binomial confidence interval
  normal approximation, 74
  score confidence, 76
  Wald confidence, 74–76
Binomial distribution
  hypergeometric distribution problems, 30
  negative, 28–29
Bivariate Cauchy, 195
Bivariate normal
  definition, 136
  distribution, 212
  five-parameter distribution, 138–139
  formula, 138
  joint distribution, 140
  mean and variance independence, 139–140
  missing coordinates, 707–709
  property, 202
  simulation, 136, 137, 200–201
Bivariate normal conditional distributions
  conditional expectation, 154
  Galton's observation, 155
Bivariate normal formulas, 138, 156
Bivariate Poisson, 120
  Gibbs chain, 683
  MGF, 121
Bivariate uniform, 125–126, 133
Block bootstrap, 701, 702, 703
Bochner's theorem, 306–307
Bonferroni bounds, 5
Bootstrap
  consistency
    correlation coefficient, 697–698
    Kolmogorov metric, 692–693
    Mallows-Wasserstein metric, 693
    sample variance, 696
    skewness coefficient, 697
    t-statistic, 698
  distribution, 690, 691, 694, 696, 738
  failure, 698–699
  higher-order accuracy
    CBB, 702
    MBB, 702
    optimal block length, 704
    second-order accuracy, 699–700
    smooth function, 703–704
    thumb rule, 700–701
  Monte Carlo, 694, 697, 737
  ordinary bootstrap distribution, 690–691
  resampling, 689–690
  variance estimate, 737
Borel–Cantelli lemma
  almost sure convergence, binomial proportion, 255–256
  pairwise independence, 254–255
Borell inequality, 215
Borel's paradox
  Jacobian transformation, 159
  marginal and conditional density, 159–160
  mean residual life, 160–161
Boundary crossing probabilities
  annulus, 417–418
  domain and harmonic, 417
  harmonic function, 416
  irregular boundary points, 416
  recurrence and transience, 418–419
Bounded in probability, 251–252
Box–Mueller transformation, 194
Bracket, 546
Bracketing number, 546
Branching process, 500
Brownian bridge
  Brownian motion, 403
  empirical process, iid random variables, 404
  Karhunen–Loeve expansion, 554
  maximum, 410
  standard, 405
Brownian motion
  covariance functions, 406
  d-dimensional, 405
  Dirichlet problem and boundary crossing probabilities
    annulus, 417–418
    domain and harmonic, 417
    harmonic function, 416
    irregular boundary points, 416
    recurrence and transience, 418–419
  distributional properties
    fractal nature, level sets, 415–416
    path properties and behavior, 412–414
    reflection principle, 410–411
  explicit construction
    Karhunen–Loeve expansion, 408
  Gaussian process, Markov
    continuous random variables, 407
    correlation function, 406
  invariance principle and statistics
    convergence, partial sum process, 423–424
    Donsker's, 424–425
    partial sums, 421
    Skorohod embedding theorem, 422, 423
    uniform metric, 422
  local time
    Lebesgue measure, 419
    one-dimensional, 420
    zero, 420–421
  negative drift and density, 427–428
  Ornstein–Uhlenbeck process
    convergence, 430–431
    covariance function, 429
    Gaussian process, 430
  random walks
    scaled, simulated plot, 403
    state visited, planar, 406
  stochastic process
    definition, 403
    Gaussian process, 404
    real-valued, 404
    standard Wiener process, 404–405
  strong invariance principle and KMT theorem, 425–427
  transition density and heat equation, 428–429
C
Campbell's theorem
  characteristic functional, 456
  Poisson random variables, 457
  shot effects, 456
  stable laws, 458
Canonical exponential family
  description, 590
  form and properties
    binomial distribution, 590
    closure, 594–596
    convexity, 590–591
    moment and generating function, 591–594
    one parameter, 589–590
  k-parameter, 597
Canonical metric
  definition, 575
Capture-recapture, 32–33
Cauchy
  density, 44
  distribution, 44, 48
Cauchy order statistics, 232
Cauchy random walk, 395
Cauchy–Schwarz inequality, 21, 109, 259, 516, 731
CDF. See Cumulative distribution function
Central limit theorem (CLT)
  approximation, 83
  binomial confidence interval, 74–76
  binomial probabilities, 72–73
  continuous distributions, 71
  de Moivre–Laplace, 72
  discrete distribution, 72
  empirical measures, 543–547
  error, 76–79
  iid case, 304
  martingale, 498
  multivariate, 267
Central limit theorem (CLT) (cont.)
  random walks, 73
  and WLLN, 305–306
Change of variable, 13
Change point problem, 650–651
Chapman–Kolmogorov equation, 256, 315, 318
  transition probabilities, 345–346
Characteristic function, 171, 263, 383–385, 394–396, 449, 456, 458, 462, 500, 533
  Bochner's theorem, 306–307
  CLT error
    Berry–Esseen theorem, 309–310
    CDF sequence, 308–309
    theorem, 310–311
  continuity theorems, 303–304
  Cramer–Wold theorem, 263
  Euler's formula, 293
  inequalities
    Bikelis nonuniform, 317
    Hoeffding's, 318
    Kolmogorov's maximal, 318
    partial sums moment and von Bahr–Esseen, 318
    Rosenthal, 318–319
  infinite divisibility and stable laws
    distributions, 317
    triangular arrays, 316
  inversion and uniqueness
    distribution determining property, 299–300
    Esseen's Lemma, 301
    lattice random variables, 300–301
    theorems, 298–299
  Lindeberg–Feller theorem, 311–315
  Polya's criterion, 307–308
  proof
    CLT, 305
    Cramer–Wold theorem, 306
    WLLN, 305
  random sums, 320
  standard distributions
    binomial, normal and Poisson, 295–296
    exponential, double exponential, and Cauchy, 296
    n-dimensional unit ball, 296–297
  Taylor expansions, differentiability and moments
    CDF, 302
    Riemann–Lebesgue lemma, 302–303
Chebyshev polynomials, 742Chebyshev’s inequality
bound, 52large deviation probabilities, 51
Chernoff’s variance inequalityequality holding, 68–69normal distribution, 68
Chibisov–O’Reilly theorem, 534–535Chi square density
degree of freedom, 45Gamma densities, 58
Chung–Fuchs theorem, 381, 394–396Circular Block Bootstrap (CBB), 702Classification rule, 724, 725, 734, 735, 736,
743, 744Communicating classes
equivalence relation, 350identification, 350–351irreducibility, 349period computation, 351
Completenessapplications, probability
covariance calculation, 605expectation calculation, 606exponential distribution result, 605mean and variance independence,
604weak convergence result, 606–607
definition, 601Neyman–Fisher factorization and Basu’s
theoremdefinition, 602–603
Complete randomnesshomogeneous Poisson process, 551property, 446–448
Compound Poisson Process, 445, 448, 449,459, 461
Concentration inequalitiesAzuma’s, 483Burkholder, Davis and Gundy
general square integrable martingale,481–482
martingale sequence, 480Cirel’son and Borell, 215generalized Hoeffding lemma, 484–485Hoeffding’s lemma, 483–484maximal
martingale, 477–478moment bounds, 479–480sharper bounds near zero, 479submartingale, 478
McDiarmid and DevroyeKolmogorov–Smirnov statistic,
487–488martingale decomposition and
two-point distribution, 486–487optional stopping theorem, 477upcrossing, 488–490
Conditional density
  Bayes theorem, 141
  best predictor, 142
  definition, 140–141
  two stage experiment, 143–144
Conditional distribution, 202–205
  binomial, 119
  definition, 100–101
  interarrival times, 460
  and marginal, 189–190
  and Markov property, 235–238
  Poisson, 104
  recurrence times, 399
Conditional expectation, 141, 150, 154
  Jensen's inequality, 494
  order statistics, 247
Conditional probability
  definition, 5
  prior to posterior belief, 8
Conditional variance, 143, 487, 497
  definition, 103, 141
Confidence band, continuous CDF, 242–243
Confidence interval
  binomial, 74–75
  central limit theorem, 61–62
  normal approximation theorem, 81
  Poisson distribution, 80
  sample size calculation, 66
Confidence interval, quantile, 246, 326
Conjugate priors, 151–152
Consistency bootstrap, 693–699
Consistency of kernel classification, 743
Continuity of Gaussian process
  logarithm tail, maxima and Landau–Shepp theorem, 577
  Wiener process, 576–577
Continuous mapping, 260–262, 268, 270, 275, 306, 327, 694
Convergence diagnostics
  Garren–Smith multiple chain Gibbs method, 675
  Gelman–Rubin multiple chain method, 674
  spectral and drift methods, 673–674
  Yu–Mykland single chain method, 674–675
Convergence in distribution
  CLT, 266
  Cramer–Wold theorem, 263
  Delta theorem, 269–272
  description, 262
  Helly's theorem, 264
  LIL, 267
  multivariate CLT, 267
  Polya's theorem, 265
  Portmanteau theorem, 265–266
  Slutsky's theorem, 268–269
Convergence in mean, 253
Convergence in probability
  continuous mapping theorem, 270
  sure convergence, 252, 260
Convergence of Gibbs sampler
  discrete and continuous state spaces, 671
  drift method, 672–673
  failure, 670–671
  joint density, 672
Convergence of martingales
  L1 and L2
    basic convergence theorem, 493
    Bayes estimates, 493–494
    Polya's urn, 493
  theorem
    Fatou's lemma and monotone, 491
Convergence of MCMC
  Dobrushin's inequality and Diaconis–Fill–Stroock bound, 657–659
  drift and minorization methods, 659–662
  geometric and uniform ergodic, 653
  separation and chi-square distance, 653
  SLEM, 651–652
  spectral bounds, 653–657
  stationary Markov chain, 651
  total variation distance, 652
Convergence of medians, 287
Convergence of moments
  approximation, 278–282
  distribution, 277–278
  uniform integrability
    conditions, 276–277
    dominated convergence theorem, 275–276
Convergence of types, 328, 332
Convex function theorem, 468, 495–496
Convolution, 715
  definition, 167–168
  density, 168
  double exponential, 195
  n-fold, 168–169
  random variable symmetrization, 170–172
  uniform and exponential, 194
Correlation, 137, 195, 201, 204, 271–272
  coefficient, 212–213, 697–698
  convergence, 261–262
  definition, 108
  exponential order statistics, 234
  inequality, 120
  order statistics, 244, 245
Correlation (cont.)
  Poisson process, 461
  properties, 108–109
Correlation coefficient distribution
  bivariate normal, 212–213
  bootstrapping, 697–698
Countable additivity
  definition, 2–3
Coupon collection, 288
Covariance, 544, 574, 735
  calculation, 107, 605
  definition, 107
  function, 412, 429
  inequality, 120, 215
  matrix, 207, 216
  multinomial, 610
  properties, 108–109
Covariance matrix, 136–137, 200, 203, 205, 207, 210, 216, 217, 267, 272, 280, 326, 530, 597, 611, 709
Cox–Wermuth approximation
  approximations test, 157–158
Cramer–Chernoff theorem
  Cauchy case, 564
  exponential tilting technique, 563
  inequality, 561
  large deviation, 562
  rate function
    Bernoulli case, 563–564
    definition, 560
    normal, 563
Cramer–Levy theorem, 293
Cramer–von Mises statistic, 532–533
Cramer–Wold theorem, 263, 266, 305, 306
Cubic lattice random walks
  distribution theory, 378–379
  Polya's formula, 382–383
  recurrence and transience, 379–381
  recurrent state, 377
  three dimensions, 375
  two simulated, 376, 377
Cumulants
  definition, 25–26
  recursion relations, 26
Cumulative distribution function (CDF)
  application, 87–88, 90, 245
  continuous random variable, 36–37
  definition, 9
  empirical, 241–243
  exchangeable normals, 206–207
  and independence, 9–12
  joint, 177, 193
  jump function, 10
  Markov property, 236
  moments and tail, 49–50
  nonnegative integer-valued random variables, 16
  PDF and median, 38
  Polya's theorem, 265
  quantile transformation, 44
  range, 226
  standard normal density, 41–42, 62
Curse of dimensionality, 131–132, 184
Curved exponential family
  application, 612
  definition, 583, 608
  density, 607–608
  Poissons, random covariates, 608–609
  specific bivariate normal, 608

D
Delta theorem, continuous mapping theorem
  Cramer, 269–270
  random vectors, 269
  sample correlation, 271–272
  sample variance and standard deviation, 271
Density
  midrange, 245
  range, 226
Density function
  continuous random variable, 38
  description, 36
  location scale parameter, 39
  normal densities, 40
  standard normal density, 42
Detailed balance equation, 639–640
Diaconis–Fill–Stroock bound, 657–658
Differential metrics
  Fisher information
    direct differentiation, 521
    multivariate densities, 520
  Kullback–Leibler distance curvature, 519–520
  Rao's Geodesic distances, distributions
    geodesic curves, 522
    variance-stabilizing transformation, 522
Diffusion coefficient
  Brownian motion, 427, 428
Dirichlet cross moment, 196
Dirichlet distribution
  density, 189
  Jacobian density theorem, 188–189
  marginal and conditional, 189–190
  and normal, 190
  Poincare's Lemma, 191
  subvector sum, 190
Dirichlet problem
  annulus, 417–418
  domain and harmonic, 417
  harmonic function, 416
  irregular boundary points, 416
  recurrence and transience, 418–419
Discrete random variable
  CDF and independence, 9–12
  definition, 8
  expectation and moments, 13–19
  inequalities, 19–22
  moment-generating functions, 22–26
Discrete uniform distribution, 2, 25, 277
Distribution
  binomial
    hypergeometric distribution problems, 30
  Cauchy, 44, 48
  CDF
    definition, 9
  Chebyshev's inequality, 20
  conditional, 203–205
  continuous, 71
  correlation coefficient, 212–213
  discrete, 72
  discrete uniform, 2, 25, 277
  lognormal, 65
  noncentral, 213–214
  Poisson, 80
  quadratic forms, Hotelling's T2
    Fisher Cochran theorem, 211
  sampling, statistics
    correlation coefficient, 212–213
    Hotelling's T2 and quadratic forms, 209–212
    Wishart, 207–208
    Wishart identities, 208–209
DKW inequality, 241–242, 541, 553
Dobrushin coefficient
  convergence of Metropolis chains, 668–669
  defined, 657–658
  Metropolis–Hastings, truncated geometric, 659
  two-state chain, 658–659
Dobrushin's inequality
  coefficient, 657–658
  Metropolis–Hastings, truncated geometric, 659
  stationary Markov chain, 658
  two-state chain, 658–659
Domain
  definition, 417
  Dirichlet problem, 416, 417
  irregular boundary point, 416
  maximal attraction, 332, 336
Dominated convergence theorem, 275, 285
Donsker class, 544
Donsker's theorem, 423, 531
Double exponential density, 39–40
Doubly stochastic
  bistochastic, 340
  one-step transition matrix, 347
  transition probability matrix, 342
Drift
  Bayes example, 672–673
  drift-diffusion equation, 429
  and minorization methods, 659–662
  negative, Brownian motion, 427–428
  time-dependent, 429
Drift-diffusion equation, 429

E
Edgeworth expansion, 82–83, 302, 622, 699, 700
Ehrenfest model, 342, 373
EM algorithm, 689, 704–744
  ABO allele, 709–711
  background plus signal model, 738
  censoring, 740
  Fisher scoring method, 705
  genetics problem, 739
  mixture problem, 739
  monotone ascent and convergence, 711
  Poisson, missing values, 706–707
  truncated geometric, 738
Empirical CDF
  confidence band, continuous, 242, 243
  definition, 527
  DKW inequality, 241–242
Empirical measure
  CLTs
    entropy bounds, 544–546
    notation and formulation, 543–544
  definition, 222, 528
Empirical process
  asymptotic properties
    approximation, 530–531
    Brownian bridge, 529–530
    invariance principle and statistical applications, 531–533
    multivariate central limit theorem, 530
    quantile process, 536
    strong approximations, 537
    weighted empirical process, 534–535
  CLTs, measures and applications
    compact convex subsets, 547
Empirical process (cont.)
    entropy bounds, 544–546
    invariance principles, 543
    monotone and Lipschitz functions, 547
    notation and formulation, 543–544
  maximal inequalities and symmetrization
    Gaussian processes, 547–548
  notation and definitions
    cadlag functions and Glivenko–Cantelli theorem, 529
  Poisson process
    distributional equality, 552
    exceedance probability, 553
    randomness property, 551
    Stirling's approximation, 553
  Vapnik–Chervonenkis (VC) theory
    dimensions and classes, 540–541
    Glivenko–Cantelli class, 538–539
    measurability conditions, 542
    shattering coefficient and Sauer's lemma, 539–540
    uniform convergence, relative frequencies, 542–543
Empirical risk, 736
Entropy bounds
  bracket and bracketing number, 546
  covering number, 545–546
  Kolmogorov–Smirnov type statistic, 545
  P-Donsker, 546
  VC-subgraph, 545
Equally likely
  concept, 3
  finite sample space, 3
  sample points, 3
Ergodic theorem, 366, 640, 641
Error function, 90
Error of CLT
  Berry–Esseen bound, 77
Error probability
  likelihood ratio test, 565, 580
  type I and II, 565
  Wald's SPRT, 476–477
Esseen's lemma, 301, 309
Estimation
  density estimator, 721
  histogram estimate, 683, 720–721, 737, 740
  optimal local bandwidth, 723
  plug-in estimate, 212, 630, 704, 720, 740
  series estimation, 720–721
Exceedance probability, 553
Exchangeable normal variables
  CDF, 206–207
  construction, 205–206
  definition, 205
Expectation
  continuous random variables, 45–46
  definitions, 46–47
  gamma function, 47
Experiment
  countable additivity, 2–3
  countably infinite set, 2
  definition, 2
  inclusion–exclusion formula, 5
Exponential density
  basic properties, 55
  definition, 38
  densities, 55
  mean and variance, 47
  standard double, 40–41
Exponential family
  canonical form and properties
    closure, 594–596
    convexity, 590–591
    definition, 589–590
    moments and moment generating function, 591–594
  curved (see Curved exponential family)
  multiparameter, 596–600
  one-parameter
    binomial distribution, 586
    definition, 585–586
    gamma distribution, 587
    irregular distribution, 588–589
    normal distribution, mean and variance, 584–586
    unusual gamma distribution, 587
    variables errors, 586–587
  sufficiency and completeness
    applications, probability, 604–607
    definition, 601–602
    description, 600
    Neyman–Fisher factorization and Basu's theorem, 602–604
Exponential kernel, 736, 740, 742, 744
Exponential tilting
  technique, 563
Extremes. See also Asymptotics, extremes and order statistics
  definition, 221
  discrete-time stochastic sequence, 574–575
  finite sample theory, 221–247

F
Factorial moment, 22
Factorization theorem, 602–604
Failure of EM, kernel, 712–713
Fatou's lemma, 491, 492
F-distribution, 174–175, 218
f-divergence
  finite partition property, 507, 517
Feature maps, 732–744
Filtered Poisson process, 443–444
Finite additivity, 3
Finite dimensional distributions, 575
Finite sample theory
  advanced distribution theory
    density function, 226
    density, standard normals, 229
    exponential order statistics, 227–228
    joint density, 225–226
    moment formulas, uniform case, 227
    normal order statistics, 228
  basic distribution theory
    empirical CDF, 222
    joint density, 222–223
    minimum, median and maximum variables, 225
    order statistics, 221
    quantile function, 222
    uniform order statistics, 223–224
  distribution, multinomial maximum
    equiprobable case, 243
    Poissonization technique, 243
  empirical CDF
    confidence band, continuous CDF, 242–243
    DKW inequality, 241–242
  existence of moments, 230–232
  quantile transformation theorem, 229–230
  records
    density, values and times, 240–241
    interarrival times, 238
  spacings
    description, 233
    exponential and Renyi's representation, 233–234
    uniform, 234–235
First passage, 354–359, 383–386, 409, 410, 427, 475
Fisher Cochran theorem, 211
Fisher information
  and differential metrics, 520–521
  matrix, 597–600, 610–611
Fisher information matrix
  application, 597–610
  definition, 597
  and differential, 520–521
  multiparameter exponential family, 597–600
Fisher linear, 735
Fisher's Iris data, 744
Fisher–Tippett family
  definition, 333
  theorem, 334–335
Formula
  change of variable, 13
  inclusion-exclusion, 4
  Tailsum, 16
Full conditionals, 645–646, 648–651, 671, 672
Function theorem
  convex, 468, 495–496
  implicit, 569

G
Gambler's ruin, 351–353, 370, 392, 464, 468–470, 475, 501
Gamma density
  distributions, 59
  exponential density, 56
  skewness, 82
Gamma function
  definition, 47
Garren–Smith multiple chain Gibbs method, 675
Gartner–Ellis theorem
  conditions, 580
  large deviation rate function, 567
  multivariate normal mean, 568–569
  non-iid setups, 567–568
Gaussian factorization, 176–177
Gaussian process
  definition, 404
  Markov property, 406–407
  stationary, 430
Gaussian radial kernel, 736
Gegenbauer polynomials, 741
Gelman–Rubin multiple chain method, 674
Generalized chi-square distance, 525, 652, 653
Generalized Hoeffding lemma, 484–485
Generalized negative binomial distribution, 28–29, 609
Generalized Wald identity, 475–476
Generalized Wasserstein distance, 525
Generating function
  central moment, 25–26
  Chernoff–Bernstein inequality, 51–52
  definition, 22
  distribution-determining property, 24–25
  inversion, 53
  Jensen's inequality, 52–53
  Poisson distribution, 23–24
  standard exponential, 51
Geometrically ergodic
  convergence of MCMC, 653
  Metropolis chain, 669–670
  spectral bounds, 654
Geometric distribution
  definition, 28
  exponential densities, 55
  lack of memory, 32
Gibbs sampler
  beta-binomial pair, 647–648
  change point problem, 650–651
  convergence, 670–673
  definition, 646
  Dirichlet distributions, 649–650
  full conditionals, 645–646
  Gaussian Bayes
    formal and improper priors, 648
  Markov chain generation problem, 645
  random scan, 646–647
  systematic scan, 647
Glivenko–Cantelli class, 539, 541
Glivenko–Cantelli theorem, 258–259, 336, 487, 529, 538, 541, 545, 555, 690
Gumbel density
  definition, 61
  standard, 61

H
Hajek–Renyi inequality, 396
Hardy–Weinberg law, 609
Harmonic
  definition, 417
  function, 416
Harris recurrence
  convergence property, 672
  drift and minorization methods, 660
  Metropolis chains, 667–668
Heat equation, 429, 435
Hellinger metric, 506–507, 514
Helly's theorem, 264
Helmert's transformation, 186–187
Hermite polynomials, 83, 320, 329, 741
High dimensional formulas, 191–192
Hilbert norm, 729
Hilbert space, 725–732, 734, 735, 743
Histogram estimate, 683, 720–721, 737, 740
Hoeffding's inequality, 318, 483–484
Holder continuity, 413
Holder's inequality, 21
Hotelling's T2
  Fisher Cochran theorem, 211
  quadratic form, 210
Hypergeometric distribution
  definition, 29
  ingenious use, 32

I
IID. See Independent and identically distributed
Importance sampling
  Bayesian calculations, 629–630
  binomial Bayes problem, 631
  described, 619–620
  distribution, 629, 633–634
  Radon–Nikodym derivatives, 629
  theoretical properties, 632–633
Importance sampling distribution, 633–634
Inclusion-exclusion formula, 4, 5
Incomplete inner product space, 728
Independence
  definition, 5, 9
Independence of mean and variance, 139–140, 185–187, 211, 604–605
Independent and identically distributed (IID)
  observations, 140
  random variables, 20
Independent increments, 404–405, 409, 412, 427, 433, 449
Independent Metropolis algorithm, 643
Indicator variable
  Bernoulli variable, 11
  binomial random variable, 34
  distribution, 11
  mathematical calculations, 10
Inequality
  Anderson's, 214
  Borell concentration, 215
  Cauchy–Schwarz, 21
  central moment, 25
  Chen, 215
  Chernoff–Bernstein inequality, 51–52
  Cirel'son, 215
  covariance, 215–216
  distribution, 19
  Jensen's inequality, 52–53
  monotonicity, 214–215, 593
  positive dependence, 215
  probability, 20–21
  Sidak, 215
  Slepian's I and II, 214
Infinite divisibility and stable laws
  description, 315
  distributions, 317
  triangular arrays, 316
Initial distribution, 369, 641, 651, 652, 655, 658, 668
  definition, 340
  Markov chain, 361
  weak ergodic theorem, 366
Inner product space, 726, 727, 728, 740–742
Inspection paradox, 444
Integrated Brownian motion, 432
Intensity, 557, 682
  function, 454, 455, 460
  piecewise linear, 454–456
Interarrival time, 238, 240, 241
  conditional distribution, 453
  Poisson process, 438
  transformed process, 453
Interquartile range (IQR)
  definition, 327
  limit distribution, 335
Invariance principle
  application, 434
  convergence, partial sum process, 423–424
  Donsker's, 424–425, 538
  partial sums, 421
  Skorohod embedding theorem, 422, 423
  and statistics
    Cramer–von Mises statistic, 532–533
    Kolmogorov–Smirnov statistic, 531–532
  strong
    KMT theorem, 425–427
    partial sum process, 530–531, 537
  uniform metric, 422
  weak, 536
Invariant measure, 663
Inverse Gamma density, 59
Inverse quadratic kernel, 736, 744
Inversion of mgf
  moment-generating function, 53
Inversion theorem
  CDF, 298–299
  failure, 307–308
  Plancherel's identity, 298
IQR. See Interquartile range
Irreducible, 569, 581, 640–643, 645, 647, 653–655, 660, 661, 663, 667, 671, 685
  definition, 349
  loop chains, 371
  regular chain, 363
Isolated points, 415, 416
Isotropic, 717, 742, 743
Iterated expectation, 105, 119, 146, 162, 172
  applications, 105–106
  formula, 105, 146, 167, 468
  higher-order, 107
Iterated variance, 119, 623
  applications, 105–106
  formula, 105

J
Jacobian formula
  CDF, 42
  technique, 45
Jacobi polynomials, 741, 742
Jensen's inequality, 52–53, 280, 468, 491, 494, 516
Joint cumulative distribution function, 97, 112, 124, 177, 193, 261, 263, 447
Joint density
  bivariate uniform, 125–126
  continuous random vector, 125
  defined, 123–124
  dimensionality curse, 131–132
  nonuniform, uniform marginals, 128–129
Joint moment-generating function (mgf), 112–114, 121, 156, 597
Joint probability mass function (pmf), 121, 148, 152, 585, 599, 608
  definition, 96, 112
  function expectation, 100

K
Karhunen–Loeve expansion, 408, 533, 554
Kernel classification rule, 724–731
Kernels and classification
  annihilator, 743
  definition, 715
  density, 719–723
  estimation
    density estimator, 721
    histograms, 720
    optimal local bandwidth, 723
    plug-in, 720
    series estimation, 720–721
  Fourier, 717
  kernel plots, 742
  product kernels, 743
  smoothing
    definition, 715–716
    density estimation, 718
  statistical classification
    exponential kernel, 736
    Fisher linear, 735
    Gaussian radial kernel, 736
    Hilbert norm, 729
    Hilbert space, 725–727
Kernels and classification (cont.)
    inner product space, 728
    linearly separable data, 735
    linear span, 726–727, 729, 742
    Mercer's theorem, 732–744
    operator norm, 729
    parallelogram law, 727
    polynomial kernel, 736
    positive definiteness, 732
    Pythagorean identity, 727
    reproducing kernel, 725
    Riesz representation theorem, 730
    rule, 724–725
    sigmoid kernel, 736
    support vector machines, 734–736
  Weierstrass kernels, 719
Kolmogorov metric, 506, 511, 692, 693, 695, 696
Kolmogorov–Smirnov statistic
  invariance principle, 531–532
  McDiarmid and Devroye inequalities, 487–488
  null hypothesis testing, 545
  weighted, 534–535
Komlos, Major, Tusnady theorem
  rate, 537
  strong invariance principle, 425–427
Kullback–Leibler distance
  definition, 507
  inequality, 711, 712
Kurtosis
  coefficient, 82
  defined, 18
  skewness, 82

L
Lack of memory
  exponential densities, 55
  exponential distribution, 55–56
  geometric distribution, 32
Laguerre polynomials, 741
Landau–Shepp theorem, 577
Laplacian, 417
Large deviation, 51, 463, 483
  continuous time
    extreme statistics, 574
    finite-dimensional distribution (FDD), 575
    Gaussian process, 576–577
    metric entropy, supremum, 577–579
  Cramer–Chernoff theorem, 560–564
  Cramer's theorem, general set, 566–567
  Gartner–Ellis theorem and Markov chain, 567–570
  Lipschitz functions and Talagrand's inequality
    correlated normals, 573–574
    outer parallel body, 572–573
    Taylor expansion, 572
  multiple testing, 559
  rate function, properties
    error probabilities, likelihood ratio test, 565–566
    shape and smoothness, 564
  self-normalization, 560
  t-statistic
    definition, 570
    normal case, 571–572
    rate function, 570
Law of iterated logarithm (LIL)
  application, 433
  Brownian motion, 419
  use, 267
Legendre polynomials, 741
Level sets
  fractal nature, 415–416
Levy inequality, 397, 400
Likelihood function
  complete data, 708, 710
  definition, 148, 565
  log, 712
  shape, 153
Likelihood ratios
  convergence, 490–491
  error probabilities, 565–566
  Gibbs sampler, 645
  martingale formation, 467
  sequential tests, statistics, 469
Lindeberg–Feller theorem
  failure condition, 315
  IID variables, linear combination, 314–315
  Lyapounov's theorem, 311–312
Linear functional, 729, 730, 743
Linear span, 726, 727, 729, 742
Local limit theorem, 82
Local time
  Brownian motion
    Lebesgue measure, 419
    one-dimensional, 420
    zero, 420–421
Lognormal density
  finite mgf, 65
  skewness, 65
Loop chains, 369, 371
Lower semi-continuous
  definition, 564
Lyapounov inequality, 21–22, 52–53

M
Mapping theorem, 261, 262, 275, 327, 422, 424, 694
  continuous, 269–272
  intensity measure, 452, 453
  lower-dimensional projections, 452
Marginal density
  bivariate uniform, 125–126
  function, 125
Marginal probability mass function (pmf), 117
  definition, 98
  E(X) calculation, 110
Margin of error, 66, 91
Markov chain, 381, 463, 466, 500, 567–570, 581, 613–686
  Chapman–Kolmogorov equation
    transition probabilities, 345–346
  communicating classes
    equivalence relation, 350
    identification, 350–351
    irreducibility, 349
    period computation, 351
  gambler's ruin, 352–353
  long run evolution and stationary distribution
    asymmetric random walk, 365–366
    Ehrenfest Urn, 364–365
    finite Markov chain, 361
    one-step transition probability matrix, 360
    weak ergodic theorem, 366
  recurrence, transience and first passage times
    definition, 354
    infinite expectation, 355–358
    simple random walk, 354–355
Markov chain large deviation, 567–570
Markov chain Monte Carlo (MCMC) methods
  algorithms, 638
  convergence and bounds, 651–662
  general spaces
    Gibbs sampler, convergence, 670–673
    independent Metropolis scheme, 665
    independent sampling, 664
    Markov transition kernel, 662–663
    Metropolis chains, 664
    Metropolis schemes, convergence, 666–670
    nonconventional multivariate distribution, 666
    practical convergence diagnostics, 673–675
    random walk Metropolis scheme, 664
    simulation, t-distribution, 665
    stationary Markov chain, 663–664
    transition density, 663
  Gibbs sampler, 645–651
  Metropolis algorithm
    Barker and independent, 643
    beta–binomial distribution, 643–644
    independent sampling, 642–643
    Metropolis–Hastings algorithm, 643
    proposal distribution, 645
    transition matrix, 642
    truncated geometric distribution, 644–645
  Monte Carlo
    conventional, 614
    ordinary, 615–624
    principle, 614
  reversible Markov chains
    discrete state space, 640–641
    ergodic theorem, 641–642
    irreducible and aperiodic stationary distribution, 640
    stationary distribution, 639–640
  simulation
    described, 613
    textbook techniques, 614–615, 624–636
  target distribution, 613, 637–638
Markov process
  definition, 404
  Ornstein–Uhlenbeck process, 431
  two-dimensional Brownian motion, 433
Markov's inequality
  description, 68
Markov transition kernel, 662–663
Martingale
  applications
    adapted to sequence, 464–465
    Bayes estimates, 467–468
    convex function theorem, 468
    gambler's fortune, 463–464
    Jensen's inequality, 468
    likelihood ratios, 467
    matching problem, 465–466
    partial sums, 465
    Polya urn scheme, 466
    sums of squares, 465
    supermartingale, 464
    Wright–Fisher Markov chain, 466–467
Martingale (cont.)
  central limit theorem, 497–498
  concentration inequalities, 477–490
  convergence, 490–494
  Kolmogorov's SLLN proof, 496
  optional stopping theorem, 468–477
  reverse, 494–496
Martingale central limit theorem
  CLT, 498
  conditions
    concentration, 497
    Martingale Lindeberg, 498
Martingale Lindeberg condition, 498
Martingale maximal inequality, 477–478
Matching problem, 84, 465–466
Maximal inequality
  Kolmogorov's, 318, 396
  martingale and concentration, 477–480
  and symmetrization, 547–551
Maximum likelihood estimate (MLE)
  closed-form formulas, 705
  definition, 152–153
  endpoint, uniform, 153–154
  exponential mean, 153
  normal mean and variance, 153
McDiarmid's inequality
  Kolmogorov–Smirnov statistic, 487–488
  martingale decomposition and two-point distribution, 486–487
Mean absolute deviation
  definition, 17
  and mode, 30
Mean measure, 450, 557
Mean residual life, 160–161
Median
  CDF to PDF, 38
  definition, 9
  distribution, 55
  exponential, 55
Mercer's theorem, 732–744
Metric entropy, 545, 577–579
Metric inequalities
  Cauchy–Schwarz, 516
  variation vs. Hellinger, 517–518
Metropolis–Hastings algorithm
  beta–binomial distribution, 644
  described, 643
  SLEM, 656–657
  truncated geometric, 659
Midpoint process, 462
Midrange, 245
Mills ratio, 160, 333
Minkowski's inequality, 21
Minorization methods
  Bernoulli experiment, 661–662
  coupling, 661
  drift condition, 660
  Harris recurrence, 660
  hierarchical Bayes linear models, 660–661
  nonreversible chains, 659–660
Mixed distribution, 88, 434
Mixtures, 39, 214, 300, 307, 611, 661, 721, 739
MLE. See Maximum likelihood estimate
Mode, 30–31, 89, 91, 232, 252, 282, 552, 627, 628, 714
Moment generating function (MGF)
  central moment, 25–26
  Chernoff–Bernstein inequality, 51–52
  definition, 22
  distribution-determining property, 24–25
  inversion, 53
  Jensen's inequality, 52–53
  Poisson distribution, 23–24
  standard exponential, 51
Moment generating function (mgf), 116, 121
Moment problem, 277–278
Moments
  continuous random variables, 45–46
  definitions, 46–47
  standard normal, 48–49
Monotone convergence theorem, 491, 492
Monte Carlo
  bootstrap, 691–692
  conventional, 614–615
  EM algorithm, 706
  Markov chain, 637–645
  ordinary
    Bayesian, 618–619
    computation, confidence intervals, 616
    P-values, 622–623
    Rao–Blackwellization, 623–624
Monte Carlo P-values
  computation and application, 622–623
  statistical hypothesis-testing problem, 622
Moving Block Bootstrap (MBB), 702
Multidimensional densities
  bivariate normal
    conditional distributions, 154–155
    five-parameter, density, 136, 138–139
    formulas, 138, 155–158
    mean and variance, 139–140
    normal marginals, 140
    singular, 137
    variance–covariance matrix, 136–137
  Borel's paradox, 158–161
  conditional density and expectations, 140–147
  posterior densities, likelihood functions and Bayes estimates, 147–152
Multinomial distribution, 243, 542, 599–600
  MGF, 116
  pmf, 114
  and Poisson process, 451–452
Multinomial expansion, 116
Multinomial maximum distribution
  maximum cell frequency, 243
Multiparameter exponential family
  assumption, 596–597
  definition, 596, 597
  Fisher information matrix definition, 597
  general multivariate normal distribution, 598–599
  multinomial distribution, 599–600
  two-parameter
    gamma, 598
    inverse Gaussian distribution, 600
    normal distribution, 597–598
Multivariate Cauchy, 196
Multivariate CLT, 267
Multivariate Jacobian formula, 177–178
Multivariate normal
  conditional distributions
    bivariate normal case, 202
    independent, 203
    quadrant probability, 204–205
  definition and properties
    bivariate normal simulation, 200, 201
    characterization, 201–202
    density, 200
    linear transformations density, 200–201
  exchangeable normal variables, 205–207
  inequalities, 214–216
  noncentral distributions
    chi-square, 214
    t-distribution, 213

N
Natural parameter
  definition, 589, 599
  space, 589, 590, 596, 598
  and sufficient statistic, 600
Natural sufficient statistic
  definition, 585, 596
  one-parameter exponential family, 595–596
Nearest neighbor, 462, 554, 556
Negative
  binomial distribution, 28–29, 609
  drift, 427–428
Neighborhood recurrent, 394, 418, 419
Noncentral chi square distribution, 214, 218
Noncentral F distribution, 210
Noncentral t-distribution, 213, 218
Non-lattice, 701
Nonreversible chain, 659–660
Normal approximation
  binomial
    confidence interval, 74–76
Normal density
  CDF, 41–42
  definition, 40
Normalizing
  density function, 39–40
Normal order statistics, 228
Null recurrent
  definition, 356
  Markov chain, 373

O
Operator norm, 729
Optimal local bandwidth, 723
Optional stopping theorem
  applications
    error probabilities, Wald's SPRT, 476–477
    gambler's ruin, 475
    generalized Wald identity, 475–476
    hitting times, random walk, 475
    Wald identities, 474–475
  stopping times
    defined, 469
    sequential tests and Wald's SPRT, 469
Order statistics
  basic distribution theory, 221–222
  Cauchy, 232
  conditional distributions, 235–236
  density, 225
  description, 221
  existence of moments, 230–232
  exponential, 227–228
  joint density, 222–223
  moments, uniform, 227
  normal, 228–229
  quantile transformation, 229–230
  and range, 226–227
  spacings, 233–235
  uniform, 223–224
Orlicz norms, 556–557
Ornstein–Uhlenbeck process
  convergence, 430–431
  covariance function, 429
  Gaussian process, 430
Orthonormal bases, Parseval's identity, 728–729
Orthonormal system, 741
P
Packing number, 548
Parallelogram law, 727
Pareto density, 60
Partial correlation, 204
Partial sum process
  convergence, Brownian motion, 423–424
  interpolated, 425
  strong invariance principle, 537
Pattern problems
  discrete probability, 26
  recursion relation, 27
  variance formula, 27
Period
  burn-in, 675
  circular block bootstrap (CBB), 702
  computation, 351
  intensity function, 454, 455
Perron–Frobenius theorem, 361–363, 569, 654
Plancherel's identity, 298–299
Plug-in estimate, 212, 630, 704, 720, 740
Poincaré's lemma, 191
Poisson approximation
  applications, 35
  binomial random variable, 34
  confidence intervals, 80–82
Poisson distribution
  characteristic function, 295–296
  confidence interval, 80–81
  moment-generating function, 23–24
Poissonization, 116–118, 121, 243, 244
Poisson point process
  higher-dimensional
    intensity/mean measure, 450
    Mapping theorem, 452–453
    multinomial distribution, 451–452
    Nearest Event site, 451
    Nearest Neighbor, 462
    polar coordinates, 460
Poisson process, 551–553, 557, 636, 682
  Campbell's theorem and shot noise
    characteristic functional …, 456–457
    shot effects, 456
    stable laws, 458
  1-D nonhomogeneous processes
    intensity and mean function, plots, 455
    mapping theorem, 453–454
    piecewise linear intensity, 454–456
  higher-dimensional Poisson point processes
    distance, nearest event site, 451
    mapping theorem, 452–453
    stationary/homogeneous, 450
Polar coordinates
  spherical calculations
    definition, 182–183
    dimensionality curse, 184
    joint density, 183–184
    spherically symmetric facts, 185
  two dimensions
    n uniforms product, 181–182
    polar transformation usefulness, 181
    transformation, 180–181
  use, 134–135
Polar transformation
  spherical calculations, 182–185
  use, 181
Polya's criterion
  characteristic functions, 308
  inversion theorem failure, 307–308
  stable distributions, 307
Polya's formula, 382–383
Polya's theorem
  CDF, 265, 509
  return probability, 382–383
Polya's urn, 466, 493
Polynomial kernel, 736
Portmanteau theorem
  definition, 265
Positive definiteness, 568, 717, 732
Positive recurrent, 356–357, 361, 363, 365, 651
Posterior density
  definition, 147–148
  exponential mean, 148–149
  normal mean, 151–152
  Poisson mean, 150
Posterior mean
  approximation, 651
  binomial, 149–150
Prior density
  definition, 148
  uses, 151
Probability metrics
  differential metrics, 519–522
  metric inequalities, 515–518
  properties
    coupling identity, 508
    f-divergences, 513–514
    Hellinger distance, 510–511
    joint and marginal distribution distances, 509–510
  standard probability, statistics
    f-divergences, 507
    Hellinger metric, 506–507
    Kolmogorov metric, 506
    Kullback–Leibler distance, 507
    Levy–Prokhorov metric, 507
    separation distance, 506
    total variation metric, 506
    Wasserstein metric, 506
Product kernels, 743
Product martingale, 499
Projection formula, 742
Prophet inequality
  application, 400
  Bickel, 397
  Doob–Klass, 397
Proportions, convergence of EM, 282, 711–713
Proposal distribution, 645, 664, 665
Pythagorean identity, 727

Q
Quadrant probability, 204
Quadratic exponential family, 612
Quadratic forms distribution
  Fisher Cochran theorem, 211
  Hotelling's T² statistic, 209–210
  independence, 211–212
  spectral decomposition theorem, 210
Quantile process
  normalized, 528–529
  restricted Glivenko–Cantelli property, 536
  uniform, 529
  weak invariance principle, 536
Quantile transformation
  accept–reject method, 625
  application, 242
  Cauchy distribution, 44
  closed-form formulas, 625
  definition, 44
  theorem, 229–230
Quartile ratio, 335

R
Random permutation, 636
Random scan Gibbs sampler, 646, 647, 672
Random walks
  arc sine law, 386
  Brownian motion, 402, 403
  Chung–Fuchs theorem, 394–396
  cubic lattice
    distribution theory, 378–379
    Polya's formula, 382–383
    recurrence and transience, 379–381
    recurrent state, 377
    three dimensions, 375
    two simulated, 376, 377
  Erdos–Kac generalization, 390
  first passage time
    definition, 383
    distribution, return times, 383–386
  inequalities
    Bickel, 397
    Hajek–Renyi, 396
    Kolmogorov's maximal, 396
    Levy and Doob–Klass Prophet, 397
  Sparre–Andersen generalization, 389, 390
  Wald's identity
    stopping time, 391, 392
Rao–Blackwellization
  Monte Carlo estimate, 623–624
  variance formula, 623
Rao's geodesic distance, 522
Rate function, 52
Ratio estimate, 630
Real analytic, 564–565
Records
  density, values and times, 240–241
  interarrival times, 238
Recurrence, 354–359, 379–381, 383, 385, 393, 396, 399, 418–419, 472, 660, 667, 668
Recurrent state, 355, 357–359, 370, 373, 377, 394
Recurrent value, 393
Reflection principle, 410–411, 432
Regular chain
  communicating classes, 349
  irreducible, 363
Regular variation, 332, 333
Reproducing kernel, 725–731, 733, 735, 743
Resampling, 543, 689, 690, 691, 692, 701
Restricted Glivenko–Cantelli property, 536
Return probability, 382–383
Return times
  distribution, 383
  scaled, asymptotic density, 385
Reverse martingale
  convergence theorem, 496
  defined, 494
  sample means, 494–495
  second convex function theorem, 495–496
Reverse submartingale, 494–496
Reversible chain, 373, 640, 645, 652, 685
Reyni's representation, 233–234
Riemann–Lebesgue lemma, 302, 303
Riesz representation theorem, 730, 743
RKHS, 743
Rosenthal inequality, 318–319, 322
S
Sample maximum, 224, 235, 274, 276, 287, 337, 603–604
  asymptotic distribution, 330
  Cauchy, mode, 232
  domain of attraction, 336
Sample points
  concept, 3
Sample range, 337
Sample space
  countable additivity, 2–3
  countably infinite set, 2
  definition, 2
  inclusion–exclusion formula, 5
Sauer's lemma, 539–540
Schauder functions, 408
Scheffé's theorem, 284–285
Score confidence interval, 74, 76
Second arc sine law, 410
Second largest eigenvalue in modulus (SLEM)
  approximation, 686
  calculation, 685
  convergence, Gibbs sampler, 671
  vs. Dobrushin coefficients, 686
  Metropolis–Hastings algorithm, 656–657
  transition probability matrix, 651–652
Second order accuracy, 699–701
Separation distance, 506, 653
Sequential probability ratio test (SPRT), 469, 476–477
Sequential tests, 391, 469
Series estimate, 720–721
Shattering coefficient, 539, 540, 542
Shot noise, 456–458
Sidak inequality, 215
Sigmoid kernel, 736
Simple random walk, 354, 375, 393, 395, 397, 399, 474
  mathematical formulation, 345
  periodicity and, 369
  simulated, 376, 377
Simulation
  algorithms, common distributions, 634–636
  beta-binomial distribution, 643–644
  bivariate normal, 137, 201
  textbook techniques
    accept–reject method, 625–628
    quantile transformation, 624–625
  truncated geometric distribution, 644–645
Skewness
  defined, 18
  normal approximation, 82
Skorohod embedding, 422, 423
SLEM. See Second largest eigenvalue in modulus
Slepian's inequality, 214
SLLN. See Strong law of large numbers
Slutsky's theorem
  application, 269, 270
Spacings
  description, 233
  exponential and Reyni's representation, 233–234
  uniform, 234–235
Spatial Poisson process, 459
Spectral bounds
  decomposition, 655–656
  Perron–Frobenius theorem, 653–654
  SLEM, Metropolis–Hastings algorithm, 656–657
Spherically symmetric, 135, 180, 182–186, 191
SPRT. See Sequential probability ratio test
Stable laws
  index 1/2, 385–386
  infinite divisibility
    description, 315
    distributions, 317
    triangular arrays, 316
  and Poisson process, 458
Stable random walk, 395
Standard discrete distributions
  binomial, 28
  geometric, 28
  hypergeometric, 29
  negative binomial, 28–29
  Poisson, 29–34
Stationary, 152, 153, 163, 404, 406, 430, 431, 440, 449, 450, 466, 569, 575, 576, 614, 639, 640, 641, 645–648, 651–656, 658–664, 667, 672, 674, 684, 685, 701–703, 711, 713
Stationary distribution, 614, 639, 640, 646–648, 651, 653, 658, 659, 661, 663, 667, 672, 674, 684, 685
Statistical classification, kernels
  exponential kernel, 736
  Fisher linear, 735
  Gaussian radial kernel, 736
  Hilbert norm, 729
  inner product space, 728
  linear functional, 729, 730
  linearly separable data, 735
  linear span, 726–727, 729, 742
  Mercer's theorem, 732–744
  orthonormal bases, 728–729
  parallelogram law, 727
  polynomial kernel, 736
  Pythagorean identity, 727
  Riesz representation theorem, 730
  sigmoid kernel, 736
  support vector machines, 734–736
Statistics and machine learning tools
  bootstrap
    consistency, 692–699
    dependent data, 701–704
    higher-order accuracy, 699–701
  EM algorithm
    ABO allele, 709–711
    modifications, 714
    monotone ascent and convergence, 711–713
    Poisson, missing values, 706–707
  kernels
    compact support, 718
    Fourier, 717
    Mercer's theorem, 722–734
    nonnegativity, 717
    rapid decay, 718
    smoothing, 715–717
Stein's lemma
  characterization, 70
  normal distribution, 66
  principal applications, 67–68
Stochastic matrix, 340, 654
Stochastic process
  continuous-time, 422
  d-dimensional, 418
  definition, 403
  standard Wiener process, 404
Stopped martingale, 472
Stopping time
  definition, 391, 478
  optional stopping theorem, 470–471
  sequential tests, 469
  Wald's SPRT, 469
Strong approximations, 530–531, 537
Strong ergodic theorem, 641
Strong invariance principle
  and KMT theorem, 425–427
  partial sum process, 530–531, 537
Strong law of large numbers (SLLN)
  application, 261
  Cramer–Chernoff theorem, 579
  Kolmogorov's, 258, 496, 615, 632
  Markov chain, 640
  proof, 496
  Zygmund–Marcinkiewicz, 693–695
Strong Markov property, 409, 411, 423
Submartingale
  convergence theorem, 491–492
  convex function theorem, 468
  nonnegative, 477–478
  optional stopping theorem, 470–471
  reverse, 494–496
  upcrossing inequality, 489
Sufficiency
  Neyman–Fisher factorization and Basu's theorem
    definition, 602
    general, 604
Sum of uniforms
  distribution, 77
  normal density, 78, 79
Supermartingale, 464, 491, 494
Support vector machine, 734–736
Symmetrization, 170–172, 319, 547–551
Systematic scan Gibbs sampler
  bivariate distribution, 647–648
  change point problem, 650–651
  convergence property, 672
  defined, 646
  Dirichlet distribution, 650

T
Tailsum formula, 16–19, 86, 357
Target distribution
  convergence, MCMC, 651–652
  Metropolis chains, 664
Taylor expansion, characteristic function, 303
t confidence interval, 187–188, 290–291
t-distribution
  noncentral, 213, 218
  simulation, 665, 668
  student, 175–176, 187
Time reversal, 412
Total probability, 5, 6, 117
Total variation distance
  closed-form formula, 283
  definition, 282
  and densities, 282–283
  normals, 283–284
Total variation metric, 506, 509
Transformations
  arctanh, 274
  Box–Muller, 194
  Helmert's, 186–187
  linear, 200–201
  multivariate Jacobian formula, 177–178
  n-dimensional polar, 191
  nonmonotone, 45
  polar coordinates, 180–185
  quantile, 44–45, 229–232, 624–628
  simple linear, 42–43
  variance stabilizing, 272–274, 291
Transience, 354–359, 379–381, 394, 399, 418–419
Transition density
  definition, 428, 662–663
  drift-diffusion equation, 429
  Gibbs chain, 672
Transition probability
  Markov chain, 341
  matrix, 340
  nonreversible chain, 684
  n-step, 346
  one-step, 344
Triangular arrays, 316
Triangular density
  definition, 39–40
  piecewise linear polynomial, 77

U
Uniform density
  basic properties, 54
Uniform empirical process, 528, 530, 531, 552, 553
Uniform integrability
  conditions, 276–277
  dominated convergence theorem, 275–276
Uniformly ergodic, 653, 669–670
Uniform metric, 422, 424, 425
Uniform order statistics, 223–225
  joint density, 235
  joint distribution, 230
  moments, 227
Uniform spacings, 234–235
Unimodal
  properties, 44
Unimodality, order statistics, 39, 246
Universal Donsker class, 544
Upcrossing, 488–490
Upcrossing inequality
  discrete time process, 488
  optional stopping theorem, 490
  stopping times, 488–489
  submartingale and decomposition, 489

V
Vapnik–Chervonenkis (VC)
  dimension, 539–541
  subgraph, 545, 546
Variance
  definition, 17, 46
  linear function, 68–69
  mean and, 19
Variance stabilizing transformations (VST)
  binomial case, 273–274
  confidence intervals, 272
  Fisher's, 274
  unusual, 274
Void probabilities, 450
VST. See Variance stabilizing transformations

W
Wald confidence interval, 74
Wald's identity
  application, 399
  first and second, 474–477
  stopping time, 391, 392
Wasserstein metric, 506, 693
Weak law of large numbers (WLLN), 20, 256, 257, 269, 270, 305
Weakly stationary, 404
Weibull density, 56
Weierstrass's theorem, 265–266, 268, 742
Weighted empirical process
  Chibisov–O'Reilly theorem, 534–535
  test statistics, 534
Wiener process
  continuous, 576, 577
  increments, 581
  standard, 404, 405
  tied down, 405
Wishart distribution, 207–208
Wishart identities, 208–209
WLLN. See Weak law of large numbers
Wright–Fisher Markov chain, 466–467

Y
Yu–Mykland single chain method, 674–675

Z
Zygmund–Marcinkiewicz SLLN, 693–694