probability theory pdf

Upload: linyujui

Post on 30-May-2018


TRANSCRIPT

  • 8/14/2019 Probability Theory PDF

    1/125

    MIT OpenCourseWare
    http://ocw.mit.edu

    18.175 Theory of Probability

    Fall 2008

    For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

    Contents

    1 Probability Spaces, Properties of Probability.
    2 Random variables and their properties. Expectation.
    3 Kolmogorov's Theorem about consistent distributions.
    4 Laws of Large Numbers.
    5 Bernstein Polynomials. Hausdorff and de Finetti theorems.
    6 0 - 1 Laws. Convergence of random series.
    7 Stopping times, Wald's identity. Another proof of SLLN.
    8 Convergence of Laws. Selection Theorem.
    9 Characteristic Functions. Central Limit Theorem on R.
    10 Multivariate normal distributions and CLT.
    11 Lindeberg's CLT. Levy's Equivalence Theorem. Three Series Theorem.
    12 Levy's Continuity Theorem. Poisson Approximation. Conditional Expectation.
    13 Martingales. Doob's Decomposition. Uniform Integrability.
    14 Optional stopping. Inequalities for martingales.
    15 Convergence of martingales. Fundamental Wald's identity.
    16 Convergence on metric spaces. Portmanteau Theorem. Lipschitz Functions.
    17 Metrics for convergence of laws. Empirical measures.
    18 Convergence and uniform tightness.
    19 Strassen's Theorem. Relationships between metrics.
    20 Kantorovich-Rubinstein Theorem.
    21 Prekopa-Leindler inequality, entropy and concentration.
    22 Stochastic Processes. Brownian Motion.
    23 Donsker Invariance Principle.
    24 Empirical process and Kolmogorov's chaining.
    25 Markov property of Brownian motion. Reflection principles.
    26 Laws of Brownian motion at stopping times. Skorohod's imbedding.


    List of Figures

    2.1 A random variable defined by quantile transformation.
    2.2 $\sigma(X)$ generated by $X$.
    2.3 Pairwise independent but not independent r.v.s.
    5.1 Polya urn model.
    7.1 A sequence of stopping times.
    8.1 Approximating indicator.
    14.1 Stopping times of level crossings.
    25.1 Reflecting the Brownian motion.



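    Section 1
    Probability Spaces, Properties of Probability.

    A pair $(\Omega, \mathcal{A})$ is a measurable space if $\mathcal{A}$ is a $\sigma$-algebra of subsets of $\Omega$. A collection $A$ of subsets of $\Omega$ is an algebra (ring) if:

    1. $\Omega \in A$.
    2. $C, B \in A \implies C \cap B,\ C \cup B \in A$.
    3. $B \in A \implies \Omega \setminus B \in A$.
    4. $A$ is a $\sigma$-algebra if, in addition, $C_i \in A,\ \forall i \ge 1 \implies \bigcup_{i \ge 1} C_i \in A$.

    $(\Omega, \mathcal{A}, \mathbb{P})$ is a probability space if $\mathbb{P}$ is a probability measure on $\mathcal{A}$, i.e.

    1. $\mathbb{P}(\Omega) = 1$.
    2. $\mathbb{P}(A) \ge 0,\ \forall A \in \mathcal{A}$.
    3. $\mathbb{P}$ is countably additive: $A_i \in \mathcal{A},\ \forall i \ge 1$, $A_i \cap A_j = \emptyset\ \forall i \ne j \implies \mathbb{P}\bigl(\bigcup_{i \ge 1} A_i\bigr) = \sum_{i \ge 1} \mathbb{P}(A_i)$.

    An equivalent formulation of Property 3 is:

    3'. $\mathbb{P}$ is a finitely additive measure and $B_n \supseteq B_{n+1},\ \bigcap_{n \ge 1} B_n = B \implies \mathbb{P}(B) = \lim_{n \to \infty} \mathbb{P}(B_n)$.

    Lemma 1 Properties 3 and 3' are equivalent.
    Proof.
    $3 \implies 3'$: Let $C_n = B_n \setminus B_{n+1}$, then $B_n = B \cup \bigcup_{k \ge n} C_k$ - all disjoint. By 3,
    $$\mathbb{P}(B_n) = \mathbb{P}(B) + \sum_{k \ge n} \mathbb{P}(C_k) \to \mathbb{P}(B) \text{ when } n \to \infty.$$
    $3' \implies 3$: $\bigcup_{i \ge 1} A_i = A_1 \cup A_2 \cup \dots \cup A_n \cup \bigcup_{i > n} A_i$, so
    $$\mathbb{P}\Bigl(\bigcup_{i \ge 1} A_i\Bigr) = \mathbb{P}(A_1) + \dots + \mathbb{P}(A_n) + \mathbb{P}(B_n) \text{ where } B_n = \bigcup_{i > n} A_i.$$
    Since $B_n \supseteq B_{n+1}$ we have $\mathbb{P}(B_n) \to \mathbb{P}\bigl(\bigcap_{n \ge 1} B_n\bigr) = \mathbb{P}(\emptyset) = 0$ because the $A_i$'s are disjoint.

    Given an algebra $A$, let $\mathcal{A} = \sigma(A)$ be the $\sigma$-algebra generated by $A$, i.e. the intersection of all $\sigma$-algebras that contain $A$. It is easy to see that the intersection of all such $\sigma$-algebras is itself a $\sigma$-algebra. Indeed, consider a sequence $A_i$ for $i \ge 1$ such that each $A_i$ belongs to all $\sigma$-algebras that contain $A$. Then $\bigcup_{i \ge 1} A_i$ belongs to all these $\sigma$-algebras and therefore to their intersection.

    Let us recall an important result from measure theory.
    Theorem 1 (Caratheodory extension) If $A$ is an algebra of sets and $\mu : A \to \mathbb{R}$ is a non-negative countably additive function on $A$, then $\mu$ can be extended to a measure on the $\sigma$-algebra $\sigma(A)$. If $\mu$ is $\sigma$-finite, then this extension is unique. ($\sigma$-finite means that $\Omega = \bigcup A_i$ for a disjoint sequence $A_i$ and $\mu(A_i) < \infty$.)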

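    Consider $D_1, \dots, D_n \in \mathcal{D}$. If a sequence $C_{ij} \in A$ for $j \ge 1$ approximates $D_i$,
    $$\mathbb{P}(C_{ij} \,\triangle\, D_i) \to 0, \quad j \to \infty,$$
    then by properties 1 - 3, $C^n_j := \bigcup_{i \le n} C_{ij}$ approximates $D^n := \bigcup_{i \le n} D_i$, which means that $D^n \in \mathcal{D}$. Let $D = \bigcup_{i \ge 1} D_i$. Then
    $$\mathbb{P}(D) = \mathbb{P}(D^n) + \mathbb{P}(D \setminus D^n)$$
    and obviously $\mathbb{P}(D \setminus D^n) \to 0$ as $n \to \infty$. Therefore, $D \in \mathcal{D}$ and $\mathcal{D}$ is a $\sigma$-algebra.
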

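    Section 2
    Random variables and their properties. Expectation.

    Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space and $(S, \mathcal{B})$ be a measurable space where $\mathcal{B}$ is a $\sigma$-algebra of subsets of $S$. A random variable $X : \Omega \to S$ is a measurable function, i.e.
    $$B \in \mathcal{B} \implies X^{-1}(B) \in \mathcal{A}.$$
    When $S = \mathbb{R}$ we will usually consider the $\sigma$-algebra $\mathcal{B}$ of Borel measurable sets generated by sets $\bigcup_{i \le n} (a_i, b_i]$ (or, equivalently, generated by sets $(a_i, b_i)$ or by open sets).

    Lemma 3 $X : \Omega \to \mathbb{R}$ is a random variable iff for all $t \in \mathbb{R}$
    $$\{X \le t\} := \{\omega \in \Omega : X(\omega) \in (-\infty, t]\} \in \mathcal{A}.$$
    Proof. Only the $\Leftarrow$ direction requires proof. We will prove that
    $$\mathcal{D} = \{D \subseteq \mathbb{R} : X^{-1}(D) \in \mathcal{A}\}$$
    is a $\sigma$-algebra. Since sets $(-\infty, t] \in \mathcal{D}$ this will imply that $\mathcal{B} \subseteq \mathcal{D}$. The result follows simply because taking pre-image preserves set operations. For example, if we consider a sequence $D_i \in \mathcal{D}$ for $i \ge 1$ then
    $$X^{-1}\Bigl(\bigcup_{i \ge 1} D_i\Bigr) = \bigcup_{i \ge 1} X^{-1}(D_i) \in \mathcal{A}$$
    because $X^{-1}(D_i) \in \mathcal{A}$ and $\mathcal{A}$ is a $\sigma$-algebra. Therefore, $\bigcup_{i \ge 1} D_i \in \mathcal{D}$. Other properties can be checked similarly, so $\mathcal{D}$ is a $\sigma$-algebra.

    Let us define a measure $\mathbb{P}_X$ on $\mathcal{B}$ by $\mathbb{P}_X = \mathbb{P} \circ X^{-1}$, i.e. for $B \in \mathcal{B}$,
    $$\mathbb{P}_X(B) = \mathbb{P}(X \in B) = \mathbb{P}(X^{-1}(B)) = \mathbb{P} \circ X^{-1}(B).$$
    $(S, \mathcal{B}, \mathbb{P}_X)$ is called the sample space of a random variable $X$ and $\mathbb{P}_X$ is called the law of $X$. Clearly, on this space a random variable $\xi : S \to S$ defined by the identity $\xi(s) = s$ has the same law as $X$.

    When $S = \mathbb{R}$, the function $F(t) = \mathbb{P}(X \le t)$ is called the cumulative distribution function (c.d.f.) of $X$.

    Lemma 4 $F$ is a c.d.f. of some r.v. $X$ iff

    1. $0 \le F(t) \le 1$,
    2. $F$ is non-decreasing, right-continuous,
    3. $\lim_{t \to -\infty} F(t) = 0$, $\lim_{t \to +\infty} F(t) = 1$.

    Proof. The fact that any c.d.f. satisfies properties 1 - 3 is obvious. Let us show that $F$ which satisfies properties 1 - 3 is a c.d.f. of some r.v. $X$. Consider the algebra $A$ consisting of sets $\bigcup_{i \le n} (a_i, b_i]$ for disjoint intervals and for all $n \ge 1$. Let us define a function $\mathbb{P}$ on $A$ by
    $$\mathbb{P}\Bigl(\bigcup_{i \le n} (a_i, b_i]\Bigr) = \sum_{i \le n} \bigl(F(b_i) - F(a_i)\bigr).$$
    One can show that $\mathbb{P}$ is countably additive on $A$. Then, by the Caratheodory extension Theorem 1, $\mathbb{P}$ extends uniquely to a measure $\mathbb{P}$ on $\sigma(A) = \mathcal{B}$ - Borel measurable sets. This means that $(\mathbb{R}, \mathcal{B}, \mathbb{P})$ is a probability space and, clearly, the random variable $X : \mathbb{R} \to \mathbb{R}$ defined by $X(x) = x$ has c.d.f. $\mathbb{P}(X \le t) = F(t)$. Below we will sometimes abuse the notation and let $F$ denote both the c.d.f. and the probability measure $\mathbb{P}$.

    Alternative proof. Consider a probability space $([0,1], \mathcal{B}, \lambda)$, where $\lambda$ is the Lebesgue measure. Define a r.v. $X : [0,1] \to \mathbb{R}$ by the quantile transformation
    $$X(t) = \inf\{x \in \mathbb{R},\ F(x) \ge t\}.$$
    The c.d.f. of $X$ is $\lambda(t : X(t) \le a) = F(a)$ since
    $$X(t) \le a \iff \inf\{x : F(x) \ge t\} \le a \iff \exists\, a_n \to a,\ F(a_n) \ge t \iff F(a) \ge t.$$

    Figure 2.1: A random variable defined by quantile transformation.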
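The quantile transformation can be tried out on a small discrete distribution. The sketch below is only an illustration under assumed data: the atoms, their probabilities and the helper names `cdf`, `quantile` and `lebesgue_measure_of_event` are hypothetical and not part of the notes. It checks on a midpoint grid in $[0,1]$ that the Lebesgue measure of $\{t : X(t) \le a\}$ agrees with $F(a)$.

```python
from fractions import Fraction
from itertools import accumulate

# Hypothetical discrete law: atoms and their probabilities (assumed for illustration).
atoms = [-1, 0, 2]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
cumulative = list(accumulate(probs))  # values of F at the atoms


def cdf(a):
    """F(a) = P(X <= a) for the discrete law above."""
    return sum((p for x, p in zip(atoms, probs) if x <= a), Fraction(0))


def quantile(t):
    """X(t) = inf{x : F(x) >= t} for t in (0, 1]."""
    for x, c in zip(atoms, cumulative):
        if c >= t:
            return x
    raise ValueError("t must lie in (0, 1]")


def lebesgue_measure_of_event(a, n=1000):
    """Fraction of midpoints t of n equal cells of [0, 1] with X(t) <= a."""
    grid = [Fraction(2 * k + 1, 2 * n) for k in range(n)]
    return Fraction(sum(1 for t in grid if quantile(t) <= a), n)


for a in (-1, 0, 1, 2):
    print(a, cdf(a), lebesgue_measure_of_event(a))
```

For a general $F$ the infimum would have to be approximated numerically (for example by bisection); the exact arithmetic here works only because the law is discrete and the grid is aligned with the jump sizes.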

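    Definition. Given a probability space $(\Omega, \mathcal{A}, \mathbb{P})$ and a r.v. $X : \Omega \to S$ let $\sigma(X)$ be the $\sigma$-algebra generated by the collection of sets $\{X^{-1}(B) : B \in \mathcal{B}\}$. Clearly, $\sigma(X) \subseteq \mathcal{A}$. Moreover, the above collection of sets is itself a $\sigma$-algebra. Indeed, consider a sequence $A_i = X^{-1}(B_i)$ for some $B_i \in \mathcal{B}$. Then
    $$\bigcup_{i \ge 1} A_i = \bigcup_{i \ge 1} X^{-1}(B_i) = X^{-1}\Bigl(\bigcup_{i \ge 1} B_i\Bigr) = X^{-1}(B)$$
    where $B = \bigcup_{i \ge 1} B_i \in \mathcal{B}$. $\sigma(X)$ is called the $\sigma$-algebra generated by a r.v. $X$.

    Figure 2.2: $\sigma(X)$ generated by $X$.

    Example. Consider a r.v. defined in figure 2.2. We have $\mathbb{P}(X = 0) = \frac{1}{2}$, $\mathbb{P}(X = 1) = \frac{1}{2}$ and
    $$\sigma(X) = \Bigl\{\emptyset,\ \bigl[0, \tfrac{1}{2}\bigr],\ \bigl(\tfrac{1}{2}, 1\bigr],\ [0,1]\Bigr\}.$$
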

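    Lemma 5 Consider a probability space $(\Omega, \mathcal{A}, \mathbb{P})$, a measurable space $(S, \mathcal{B})$ and random variables $X : \Omega \to S$ and $Y : \Omega \to \mathbb{R}$. Then the following are equivalent:

    1. $Y = g(X)$ for some (Borel) measurable function $g : S \to \mathbb{R}$.
    2. $Y : \Omega \to \mathbb{R}$ is measurable on $(\Omega, \sigma(X))$, i.e. with respect to the $\sigma$-algebra generated by $X$.

    Remark. It should be obvious from the proof that $\mathbb{R}$ can be replaced by any separable metric space.
    Proof. The fact that 1 implies 2 is obvious since for any Borel set $B \subseteq \mathbb{R}$ the set $B' := g^{-1}(B) \in \mathcal{B}$ and, therefore,
    $$\{Y = g(X) \in B\} = \{X \in g^{-1}(B) = B'\} = X^{-1}(B') \in \sigma(X).$$
    Let us show that 2 implies 1. For all integer $n$ and $k$ consider sets
    $$A_{n,k} = \Bigl\{\omega : Y(\omega) \in \Bigl[\frac{k}{2^n}, \frac{k+1}{2^n}\Bigr)\Bigr\} = Y^{-1}\Bigl(\Bigl[\frac{k}{2^n}, \frac{k+1}{2^n}\Bigr)\Bigr).$$
    By 2, $A_{n,k} \in \sigma(X) = \{X^{-1}(B) : B \in \mathcal{B}\}$ and, therefore, $A_{n,k} = X^{-1}(B_{n,k})$ for some $B_{n,k} \in \mathcal{B}$. Let us consider a function
    $$g_n(X) = \sum_{k \in \mathbb{Z}} \frac{k}{2^n}\, I(X \in B_{n,k}).$$
    By construction, $|Y - g_n(X)| \le \frac{1}{2^n}$ since
    $$Y(\omega) \in \Bigl[\frac{k}{2^n}, \frac{k+1}{2^n}\Bigr) \iff X(\omega) \in B_{n,k} \iff g_n(X(\omega)) = \frac{k}{2^n}.$$
    It is easy to see that $g_n(x) \le g_{n+1}(x)$ and, therefore, $g(x) = \lim_{n \to \infty} g_n(x)$ is a measurable function on $(S, \mathcal{B})$ and, clearly, $Y = g(X)$.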

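    Discrete random variables. A r.v. $X : \Omega \to S$ is called discrete if $\mathbb{P}_X(\{S_i\}_{i \ge 1}) = 1$ for some sequence $S_i \in S$.

    Absolutely continuous random variables. On a measure space $(S, \mathcal{B})$, a measure $\mathbb{P}$ is called absolutely continuous w.r.t. a measure $\lambda$ if
    $$\forall B \in \mathcal{B},\ \lambda(B) = 0 \implies \mathbb{P}(B) = 0.$$
    The following is a well known result from measure theory.
    Theorem 2 (Radon-Nikodym) If $\mathbb{P}$ and $\lambda$ are sigma-finite and $\mathbb{P}$ is absolutely continuous w.r.t. $\lambda$ then there exists a Radon-Nikodym derivative $f \ge 0$ such that for all $B \in \mathcal{B}$
    $$\mathbb{P}(B) = \int_B f(s)\, d\lambda(s).$$
    $f$ is uniquely defined up to $\lambda$-null sets. In a typical setting of $S = \mathbb{R}^k$, a probability measure $\mathbb{P}$ and Lebesgue measure $\lambda$, $f$ is called the density of the distribution $\mathbb{P}$.

    Independence. Consider a probability space $(\Omega, \mathcal{C}, \mathbb{P})$ and two $\sigma$-algebras $\mathcal{A}, \mathcal{B} \subseteq \mathcal{C}$. $\mathcal{A}$ and $\mathcal{B}$ are called independent if
    $$\mathbb{P}(A \cap B) = \mathbb{P}(A)\mathbb{P}(B) \text{ for all } A \in \mathcal{A},\ B \in \mathcal{B}.$$
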

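    $\sigma$-algebras $\mathcal{A}_i \subseteq \mathcal{C}$ for $i \le n$ are independent if
    $$\mathbb{P}(A_1 \cap \dots \cap A_n) = \prod_{i \le n} \mathbb{P}(A_i) \text{ for all } A_i \in \mathcal{A}_i.$$
    $\sigma$-algebras $\mathcal{A}_i \subseteq \mathcal{C}$ for $i \le n$ are pairwise independent if
    $$\mathbb{P}(A_i \cap A_j) = \mathbb{P}(A_i)\mathbb{P}(A_j) \text{ for all } A_i \in \mathcal{A}_i,\ A_j \in \mathcal{A}_j,\ i \ne j.$$
    Random variables $X_i : \Omega \to S$ for $i \le n$ are (pairwise) independent if the $\sigma$-algebras $\sigma(X_i)$, $i \le n$ are (pairwise) independent, which is just another convenient way to state the familiar
    $$\mathbb{P}(X_1 \in B_1, \dots, X_n \in B_n) = \mathbb{P}(X_1 \in B_1) \cdots \mathbb{P}(X_n \in B_n)$$
    for any events $B_1, \dots, B_n \in \mathcal{B}$.

    Example. Consider a regular tetrahedron die, Figure 2.3, with red, green and blue sides and a red-green-blue base. If we roll this die then indicators of different colors provide an example of pairwise independent r.v.s that are not independent since
    $$\mathbb{P}(r) = \mathbb{P}(b) = \mathbb{P}(g) = \frac{1}{2} \text{ and } \mathbb{P}(rb) = \mathbb{P}(rg) = \mathbb{P}(bg) = \frac{1}{4}$$
    but
    $$\mathbb{P}(rbg) = \frac{1}{4} \ne \mathbb{P}(r)\mathbb{P}(b)\mathbb{P}(g) = \Bigl(\frac{1}{2}\Bigr)^3.$$

    Figure 2.3: Pairwise independent but not independent r.v.s.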
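The tetrahedron example is small enough to verify by enumerating the four equally likely faces. The following sketch assumes a hypothetical encoding of each face as the set of colors it shows; the names `faces` and `prob` are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

# Each of the four equally likely faces, encoded by the colors it shows
# (three single-color sides and one red-green-blue base).
faces = [{"r"}, {"g"}, {"b"}, {"r", "g", "b"}]


def prob(colors):
    """Probability that the face rolled shows every color in `colors`."""
    hits = sum(1 for face in faces if set(colors) <= face)
    return Fraction(hits, len(faces))


# Product rule for every pair of color indicators.
pairwise_ok = all(
    prob(pair) == prob(pair[0]) * prob(pair[1])
    for pair in combinations("rgb", 2)
)
# Product rule for all three indicators at once.
triple_ok = prob("rgb") == prob("r") * prob("g") * prob("b")

print(pairwise_ok, triple_ok)
```

Checking the product rule on the events "color shown" is enough here because the $\sigma$-algebra generated by an indicator consists only of the event, its complement, $\emptyset$ and $\Omega$.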

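    Independence of $\sigma$-algebras can be checked on generating algebras:
    Lemma 6 If algebras $A_i$, $i \le n$ are independent then the $\sigma$-algebras $\sigma(A_i)$ are independent.
    Proof. Obvious by Approximation Lemma 2.

    Lemma 7 Consider r.v.s $X_i : \Omega \to \mathbb{R}$ on a probability space $(\Omega, \mathcal{A}, \mathbb{P})$.

    1. $X_i$'s are independent iff
    $$\mathbb{P}(X_1 \le t_1, \dots, X_n \le t_n) = \mathbb{P}(X_1 \le t_1) \cdots \mathbb{P}(X_n \le t_n). \quad (2.0.1)$$
    2. If the laws of $X_i$'s have densities $f_i(x)$ then $X_i$'s are independent iff a joint density exists and
    $$f(x_1, \dots, x_n) = \prod f_i(x_i).$$

    Proof. 1 is obvious by Lemma 6 because (2.0.1) implies the same equality for intervals
    $$\mathbb{P}(X_1 \in (a_1, b_1], \dots, X_n \in (a_n, b_n]) = \mathbb{P}(X_1 \in (a_1, b_1]) \cdots \mathbb{P}(X_n \in (a_n, b_n])$$
    and, therefore, for finite unions of disjoint such intervals. To check this for intervals (for example, for $n = 2$) we can write $\mathbb{P}(a_1 < X_1 \le b_1,\ a_2 < X_2 \le b_2)$ as
    $$\mathbb{P}(X_1 \le b_1, X_2 \le b_2) - \mathbb{P}(X_1 \le a_1, X_2 \le b_2) - \mathbb{P}(X_1 \le b_1, X_2 \le a_2) + \mathbb{P}(X_1 \le a_1, X_2 \le a_2)$$
    $$= \mathbb{P}(X_1 \le b_1)\mathbb{P}(X_2 \le b_2) - \mathbb{P}(X_1 \le a_1)\mathbb{P}(X_2 \le b_2) - \mathbb{P}(X_1 \le b_1)\mathbb{P}(X_2 \le a_2) + \mathbb{P}(X_1 \le a_1)\mathbb{P}(X_2 \le a_2)$$
    $$= \bigl(\mathbb{P}(X_1 \le b_1) - \mathbb{P}(X_1 \le a_1)\bigr)\bigl(\mathbb{P}(X_2 \le b_2) - \mathbb{P}(X_2 \le a_2)\bigr) = \mathbb{P}(a_1 < X_1 \le b_1)\,\mathbb{P}(a_2 < X_2 \le b_2).$$
    To prove 2 we start with "$\Leftarrow$".
    $$\mathbb{P}\Bigl(\bigcap \{X_i \in A_i\}\Bigr) = \mathbb{P}(X \in A_1 \times \dots \times A_n) = \int_{A_1 \times \dots \times A_n} \prod f_i(x_i)\, dx$$
    $$= \prod_{i \le n} \int_{A_i} f_i(x_i)\, dx_i \ \{\text{by Fubini's Theorem}\} = \prod_{i \le n} \mathbb{P}(X_i \in A_i).$$
    Next, we prove "$\Rightarrow$". First of all, by independence and Fubini's Theorem,
    $$\mathbb{P}(X \in A_1 \times \dots \times A_n) = \prod \mathbb{P}(X_i \in A_i) = \int_{A_1 \times \dots \times A_n} \prod f_i(x_i)\, dx.$$
    Therefore, the same equality holds for sets in the algebra $A$ that consists of finite unions of disjoint sets $A_1 \times \dots \times A_n$, i.e.
    $$\mathbb{P}(X \in B) = \int_B \prod f_i(x_i)\, dx \text{ for } B \in A.$$
    Both $\mathbb{P}(X \in B)$ and $\int_B \prod f_i(x_i)\, dx$ are countably additive on $A$ and finite,
    $$\mathbb{P}(\mathbb{R}^n) = \int_{\mathbb{R}^n} \prod f_i(x_i)\, dx = 1.$$
    By the Caratheodory extension Theorem 1, they extend uniquely to all Borel sets $\mathcal{B} = \sigma(A)$, so
    $$\mathbb{P}(X \in B) = \int_B \prod f_i(x_i)\, dx \text{ for } B \in \mathcal{B}.$$
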

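    Expectation. If $X : \Omega \to \mathbb{R}$ is a random variable on $(\Omega, \mathcal{A}, \mathbb{P})$ then the expectation of $X$ is defined as
    $$\mathbb{E}X = \int_\Omega X(\omega)\, d\mathbb{P}(\omega).$$
    In other words, expectation is just another term for the integral with respect to a probability measure and, as a result, expectation has all the usual properties of integrals. Let us emphasize some of them.

    Lemma 8
    1. If $F$ is the c.d.f. of $X$ then for any measurable function $g : \mathbb{R} \to \mathbb{R}$,
    $$\mathbb{E}g(X) = \int_{\mathbb{R}} g(x)\, dF(x).$$
    2. If $X$ is discrete, i.e. $\mathbb{P}(X \in \{x_i\}_{i \ge 1}) = 1$, then
    $$\mathbb{E}X = \sum_{i \ge 1} x_i\, \mathbb{P}(X = x_i).$$
    3. If $X : \Omega \to \mathbb{R}^k$ has a density $f(x)$ on $\mathbb{R}^k$ and $g : \mathbb{R}^k \to \mathbb{R}$ then
    $$\mathbb{E}g(X) = \int g(x) f(x)\, dx.$$

    Proof. All these properties follow by making a change of variables $x = X(\omega)$ or $\omega = X^{-1}(x)$, i.e.
    $$\mathbb{E}g(X) = \int_\Omega g(X(\omega))\, d\mathbb{P}(\omega) = \int_{\mathbb{R}} g(x)\, d\mathbb{P} \circ X^{-1}(x) = \int_{\mathbb{R}} g(x)\, d\mathbb{P}_X(x),$$
    where $\mathbb{P}_X = \mathbb{P} \circ X^{-1}$ is the law of $X$. Another way to see this would be to start with indicator functions of sets $g(x) = I(x \in B)$ for which
    $$\mathbb{E}g(X) = \mathbb{P}(X \in B) = \mathbb{P}_X(B) = \int_{\mathbb{R}} I(x \in B)\, d\mathbb{P}_X(x)$$
    and, therefore, the same is true for simple step functions
    $$g(x) = \sum_{i \le n} w_i\, I(x \in B_i)$$
    for disjoint $B_i$. By approximation, this is true for any measurable function.
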

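    Section 3
    Kolmogorov's Theorem about consistent distributions.

    The notions of a general probability space $(\Omega, \mathcal{A}, \mathbb{P})$ and a random variable $X : \Omega \to \mathbb{R}$ on this space are rather abstract and often one is really interested in the law $\mathbb{P}_X$ of $X$ on the sample space $(\mathbb{R}, \mathcal{B}, \mathbb{P}_X)$. One can always define a random variable with this law by taking $X : \mathbb{R} \to \mathbb{R}$ to be the identity $X(x) = x$. Similarly, one can define a random vector $X = (X_1, \dots, X_k)$ on $\mathbb{R}^k$ by defining the distribution on the Borel $\sigma$-algebra $\mathcal{B}_k$ first. How can we define a distribution on an infinite dimensional space or, in other words, how can we define an infinite family of random variables
    $$(X_t)_{t \in T} \in \mathbb{R}^T = \prod_{t \in T} \mathbb{R}_t = \{f : T \to \mathbb{R}\}$$
    for some infinite set $T$? Obviously, there are various ways to do that, for example, we can define explicitly $X_t = \cos(tU)$ for some random variable $U$. In this section we will consider a typical situation when we start by defining the distribution on any finite subset of coordinates, i.e. for any finite subset $N \subseteq T$ the law $\mathbb{P}_N$ of $(X_t)_{t \in N}$ on the Borel $\sigma$-algebra $\mathcal{B}_N$ on $\mathbb{R}^N$ is given. Clearly, these laws must satisfy a natural consistency assumption: for any finite subsets $N \subseteq M$ and any Borel set $B \in \mathcal{B}_N$,
    $$\mathbb{P}_N(B) = \mathbb{P}_M(B \times \mathbb{R}^{M \setminus N}). \quad (3.0.1)$$
    Then the problem is to define a sample space simultaneously for the entire family $(X_t)_{t \in T}$, i.e. we need to define a $\sigma$-algebra $\mathcal{A}$ of measurable events in $\mathbb{R}^T$ and a probability measure $\mathbb{P}$ on it that agrees with our finite dimensional distributions $\mathbb{P}_N$. At the very least, $\mathcal{A}$ should contain events expressed in terms of a finite number of coordinates, i.e. the following algebra of sets on $\mathbb{R}^T$:
    $$A = \{B \times \mathbb{R}^{T \setminus N} : B \in \mathcal{B}_N\}.$$
    (It is easy to check that $A$ is an algebra.) A set $B \times \mathbb{R}^{T \setminus N}$ is called a cylinder and $B$ is the base of the cylinder. The probability $\mathbb{P}$ on such sets is of course defined by
    $$\mathbb{P}(B \times \mathbb{R}^{T \setminus N}) = \mathbb{P}_N(B).$$
    Notice that, by the consistency assumption, $\mathbb{P}$ is well defined. Given two finite subsets $N_1, N_2 \subseteq T$ and $B_1 \in \mathcal{B}_{N_1}$, the same set can be represented as
    $$B_1 \times \mathbb{R}^{T \setminus N_1} = \bigl(B_1 \times \mathbb{R}^{(N_1 \cup N_2) \setminus N_1}\bigr) \times \mathbb{R}^{T \setminus (N_1 \cup N_2)}.$$
    However, by consistency, $\mathbb{P}$ will not depend on the representation. Let $\mathcal{A} = \sigma(A)$ be the $\sigma$-algebra generated by the algebra $A$, i.e. the minimal $\sigma$-algebra that contains all cylinders.

    Definition. $A$ is called the cylindrical algebra and $\mathcal{A}$ is the cylindrical $\sigma$-algebra on $\mathbb{R}^T$.
    Example. If $\mathbb{N} \subseteq T$ then $\{\sup_{i \ge 1} X_i \le 1\}$ is a measurable event in $\mathcal{A}$.
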

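    Theorem 3 (Kolmogorov) For a consistent family of distributions (3.0.1), $\mathbb{P}$ can be uniquely extended to $\mathcal{A}$.
    Proof. To use the Caratheodory extension Theorem 1, we need to show that $\mathbb{P}$ is countably additive on $A$ or, equivalently, that it satisfies the continuity of measure property: given a sequence $B_n \in A$,
    $$B_n \supseteq B_{n+1},\ \bigcap_{n \ge 1} B_n = \emptyset \implies \mathbb{P}(B_n) \to 0.$$
    We will prove that if there exists $\varepsilon > 0$ such that $\mathbb{P}(B_n) > \varepsilon$ for all $n$ then $\bigcap_{n \ge 1} B_n \ne \emptyset$. We have
    $$B_n = C_n \times \mathbb{R}^{T \setminus N_n}, \quad N_n \text{ - finite subset of } T \text{ and } C_n \in \mathcal{B}_{N_n}.$$
    Since $B_n \supseteq B_{n+1}$, we can assume that $N_n \subseteq N_{n+1}$. First of all, by regularity of the measure $\mathbb{P}_{N_n}$ there exists a compact set $K_n \subseteq C_n$ such that
    $$\mathbb{P}_{N_n}(C_n \setminus K_n) \le \frac{\varepsilon}{2^{n+1}}.$$
    We have,
    $$\bigcap_{i \le n} C_i \times \mathbb{R}^{T \setminus N_i} \setminus \bigcap_{i \le n} K_i \times \mathbb{R}^{T \setminus N_i} \subseteq \bigcup_{i \le n} (C_i \setminus K_i) \times \mathbb{R}^{T \setminus N_i}$$
    and, therefore,
    $$\mathbb{P}\Bigl(\bigcap_{i \le n} C_i \times \mathbb{R}^{T \setminus N_i} \setminus \bigcap_{i \le n} K_i \times \mathbb{R}^{T \setminus N_i}\Bigr) \le \mathbb{P}\Bigl(\bigcup_{i \le n} (C_i \setminus K_i) \times \mathbb{R}^{T \setminus N_i}\Bigr) \le \sum_{i \le n} \mathbb{P}\bigl((C_i \setminus K_i) \times \mathbb{R}^{T \setminus N_i}\bigr) \le \sum_{i \le n} \frac{\varepsilon}{2^{i+1}} \le \frac{\varepsilon}{2}.$$
    Since $\mathbb{P}(B_n) = \mathbb{P}\bigl(\bigcap_{i \le n} C_i \times \mathbb{R}^{T \setminus N_i}\bigr) > \varepsilon$ this implies that
    $$\mathbb{P}\Bigl(\bigcap_{i \le n} K_i \times \mathbb{R}^{T \setminus N_i}\Bigr) \ge \frac{\varepsilon}{2} > 0.$$
    We can write
    $$\bigcap_{i \le n} K_i \times \mathbb{R}^{T \setminus N_i} = \bigcap_{i \le n} (K_i \times \mathbb{R}^{N_n \setminus N_i}) \times \mathbb{R}^{T \setminus N_n} = K^n \times \mathbb{R}^{T \setminus N_n}$$
    where $K^n = \bigcap_{i \le n} (K_i \times \mathbb{R}^{N_n \setminus N_i})$ is a compact in $\mathbb{R}^{N_n}$, since $K_n$ is a compact in $\mathbb{R}^{N_n}$. We proved that
    $$\mathbb{P}_{N_n}(K^n) = \mathbb{P}(K^n \times \mathbb{R}^{T \setminus N_n}) = \mathbb{P}\Bigl(\bigcap_{i \le n} K_i \times \mathbb{R}^{T \setminus N_i}\Bigr) > 0$$
    and, therefore, there exists a point
    $$x^n = (x^n_1, \dots, x^n_{N_n}, \dots) \in K^n \times \mathbb{R}^{T \setminus N_n}.$$
    We also have the following inclusion property. For $m > n$,
    $$x^m \in K^m \times \mathbb{R}^{T \setminus N_m} \subseteq K^n \times \mathbb{R}^{T \setminus N_n}$$
    and, therefore, $(x^m_1, \dots, x^m_{N_n}) \in K^n$. Any sequence on a compact has a converging subsequence. Let $\{n^1_k\}_{k \ge 1}$ be such that
    $$(x^{n^1_k}_1, \dots, x^{n^1_k}_{N_1}) \to (x_1, \dots, x_{N_1}) \in K^1 \text{ as } k \to \infty.$$
    Then we can take a subsequence $\{n^2_k\}_{k \ge 1} \subseteq \{n^1_k\}_{k \ge 1}$ such that
    $$(x^{n^2_k}_1, \dots, x^{n^2_k}_{N_2}) \to (x_1, \dots, x_{N_2}) \in K^2.$$
    By iteration, we can find a subsequence $\{n^m_k\}_{k \ge 1} \subseteq \{n^{m-1}_k\}_{k \ge 1}$ such that
    $$(x^{n^m_k}_1, \dots, x^{n^m_k}_{N_m}) \to (x_1, \dots, x_{N_m}) \in K^m.$$
    Therefore, a point
    $$(x_1, x_2, \dots) \in \bigcap_{n \ge 1} K^n \times \mathbb{R}^{T \setminus N_n} \subseteq \bigcap_{n \ge 1} B_n,$$
    so this last set is not empty.
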

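    Section 4
    Laws of Large Numbers.

    Consider a r.v. $X$ and a sequence of r.v.s $(X_n)_{n \ge 1}$ on some probability space. We say that $X_n$ converges to $X$ in probability if for all $\varepsilon > 0$
    $$\lim_{n \to \infty} \mathbb{P}(|X_n - X| \ge \varepsilon) = 0.$$
    We say that $X_n$ converges to $X$ almost surely or with probability 1 if
    $$\mathbb{P}\bigl(\omega : \lim_{n \to \infty} X_n(\omega) = X(\omega)\bigr) = 1.$$

    Lemma 9 (Chebyshev's inequality) If a r.v. $X \ge 0$ then for $t > 0$,
    $$\mathbb{P}(X \ge t) \le \frac{\mathbb{E}X}{t}.$$
    Proof.
    $$\mathbb{E}X = \mathbb{E}X I(X < t) + \mathbb{E}X I(X \ge t) \ge \mathbb{E}X I(X \ge t) \ge t\, \mathbb{E}I(X \ge t) = t\, \mathbb{P}(X \ge t).$$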
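The inequality of Lemma 9 can be checked exactly on a small example. The sketch below uses an assumed non-negative discrete distribution; the values, probabilities and function names are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from fractions import Fraction

# Hypothetical non-negative discrete random variable (assumed for illustration).
values = [0, 1, 2, 5]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 5), Fraction(1, 20)]


def expectation():
    """E X as a finite sum over the atoms."""
    return sum(x * p for x, p in zip(values, probs))


def tail(t):
    """P(X >= t)."""
    return sum((p for x, p in zip(values, probs) if x >= t), Fraction(0))


def bound(t):
    """Right-hand side E X / t of Lemma 9."""
    return expectation() / t


for t in (1, 2, 5):
    print(t, tail(t), bound(t), tail(t) <= bound(t))
```

The bound is far from tight for small $t$ and becomes informative only in the tail, which is typical for this inequality.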

    Theorem 4 (Weak law of large numbers) Consider a sequence of r.v.s $(X_i)_{i \ge 1}$ that are centered, $\mathbb{E}X_i = 0$, have finite second moments, $\mathbb{E}X_i^2 \le K$

    Then $X_k \to 0$ in probability, since for $0 <$


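    Strong law of large numbers. The following simple observation will be useful. If a random variable $X \ge 0$ then $\mathbb{E}X = \int_0^\infty \mathbb{P}(X \ge x)\, dx$. Indeed,
    $$\mathbb{E}X = \int_0^\infty x\, dF(x) = \int_0^\infty \int_0^x 1\, ds\, dF(x) = \int_0^\infty \int_s^\infty 1\, dF(x)\, ds = \int_0^\infty \mathbb{P}(X \ge s)\, ds.$$
    For $X \ge 0$ such that $\mathbb{E}X$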


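    and if $k_0 = \min\{k : \alpha^k \ge i\}$ then
    $$\sum_{n(k) \ge i} \frac{1}{n(k)^2} \le \sum_{\alpha^k \ge i} \frac{4}{\alpha^{2k}} = \frac{4}{\alpha^{2k_0}(1 - \alpha^{-2})} \le \frac{K}{i^2}.$$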

    Wecancontinue, 1 m+1 1 m+1

    () =i2 x2dF(x) = i2 x2dF(x)

    i1 mm m 1 m+1 m+1m+ 1 x2dF(x) xdF(x) =EX

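    Section 5
    Bernstein Polynomials. Hausdorff and de Finetti theorems.

    Let us look at some applications related to the law of large numbers. Consider an i.i.d. sequence of real valued r.v. $(X_i)$ with distribution $\mathbb{P}_\theta$ from a family of distributions parametrized by $\theta \in \Theta \subseteq \mathbb{R}$ such that
    $$\mathbb{E}_\theta X_i = \theta, \quad \sigma^2(\theta) := \mathrm{Var}_\theta(X_i) \le K$$

    for any $\varepsilon > 0$,
    $$|\mathbb{E}_\theta u(\bar{X}_n) - u(\theta)| \le \mathbb{E}_\theta |u(\bar{X}_n) - u(\theta)|$$
    $$= \mathbb{E}_\theta |u(\bar{X}_n) - u(\theta)| \bigl(I(|\bar{X}_n - \theta| \le \varepsilon) + I(|\bar{X}_n - \theta| > \varepsilon)\bigr)$$
    $$\le \max_{|x - \theta| \le \varepsilon} |u(x) - u(\theta)| + 2 \max_x |u(x)|\, \mathbb{P}_\theta(|\bar{X}_n - \theta| > \varepsilon)$$
    $$\le \delta(\varepsilon) + 2\|u\|_\infty \frac{1}{\varepsilon^2}\, \mathbb{E}_\theta(\bar{X}_n - \theta)^2 \le \delta(\varepsilon) + \frac{2\|u\|_\infty K}{n\varepsilon^2},$$
    where $\delta(\varepsilon)$ is the modulus of continuity of $u$. Letting $\varepsilon = \varepsilon_n \to 0$ so that $n\varepsilon_n^2 \to \infty$ finishes the proof.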

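    Example. Let $(X_i)$ be i.i.d. with Bernoulli distribution $B(\theta)$ with probability of success $\theta \in [0,1]$, i.e.
    $$\mathbb{P}_\theta(X_i = 1) = \theta, \quad \mathbb{P}_\theta(X_i = 0) = 1 - \theta,$$
    and let $u : [0,1] \to \mathbb{R}$ be continuous. Then, by the above Theorem, the following Bernstein polynomials
    $$B_n(\theta) := \mathbb{E}_\theta u(\bar{X}_n) = \sum_{k=0}^n u\Bigl(\frac{k}{n}\Bigr) \mathbb{P}_\theta\Bigl(\sum_{i=1}^n X_i = k\Bigr) = \sum_{k=0}^n u\Bigl(\frac{k}{n}\Bigr) \binom{n}{k} \theta^k (1-\theta)^{n-k} \to u(\theta)$$
    uniformly on $[0,1]$.

    Example. Let $(X_i)$ have Poisson distribution $\Pi(\theta)$ with intensity parameter $\theta > 0$ defined by
    $$\mathbb{P}_\theta(X_i = k) = \frac{\theta^k}{k!} e^{-\theta} \text{ for integer } k \ge 0.$$
    Then it is well known (and easy to check) that $\mathbb{E}_\theta X_i = \theta$, $\sigma^2(\theta) = \theta$ and the sum $X_1 + \dots + X_n$ has Poisson distribution $\Pi(n\theta)$. If $u$ is bounded and continuous on $[0, +\infty)$ then
    $$\mathbb{E}_\theta u(\bar{X}_n) = \sum_{k=0}^\infty u\Bigl(\frac{k}{n}\Bigr) \mathbb{P}_\theta\Bigl(\sum_{i=1}^n X_i = k\Bigr) = \sum_{k=0}^\infty u\Bigl(\frac{k}{n}\Bigr) \frac{(n\theta)^k}{k!} e^{-n\theta} \to u(\theta)$$
    uniformly on compact sets.

    Moment problem. Consider a random variable $X \in [0,1]$ and let $\mu_k = \mathbb{E}X^k$ be its moments. Given a sequence $(c_0, c_1, c_2, \dots)$ let us define a sequence of increments by $\Delta c_k = c_{k+1} - c_k$. Then
    $$-\Delta\mu_k = \mu_k - \mu_{k+1} = \mathbb{E}(X^k - X^{k+1}) = \mathbb{E}X^k(1 - X),$$
    $$(-\Delta)(-\Delta\mu_k) = (-1)^2 \Delta^2 \mu_k = \mathbb{E}X^k(1-X) - \mathbb{E}X^{k+1}(1-X) = \mathbb{E}X^k(1-X)^2$$
    and by induction
    $$(-1)^r \Delta^r \mu_k = \mathbb{E}X^k(1-X)^r.$$
    Clearly, $(-1)^r \Delta^r \mu_k \ge 0$ since $X \in [0,1]$. If $u$ is a continuous function on $[0,1]$ and $B_n$ is its corresponding Bernstein polynomial then
    $$\mathbb{E}B_n(X) = \sum_{k=0}^n u\Bigl(\frac{k}{n}\Bigr) \binom{n}{k} \mathbb{E}X^k(1-X)^{n-k} = \sum_{k=0}^n u\Bigl(\frac{k}{n}\Bigr) \binom{n}{k} (-1)^{n-k} \Delta^{n-k} \mu_k.$$
    Since $B_n(X)$ converges uniformly to $u(X)$, $\mathbb{E}B_n(X)$ converges to $\mathbb{E}u(X)$. Let us define
    $$p^{(n)}_k = \binom{n}{k} (-1)^{n-k} \Delta^{n-k} \mu_k \ge 0, \quad \sum_{k=0}^n p^{(n)}_k = 1 \ (\text{take } u = 1).$$
    We can think of $p^{(n)}_k$ as the distribution of a r.v. $X^{(n)}$ such that
    $$\mathbb{P}\Bigl(X^{(n)} = \frac{k}{n}\Bigr) = p^{(n)}_k. \quad (5.0.1)$$
    We showed that
    $$\mathbb{E}B_n(X) = \mathbb{E}u(X^{(n)}) \to \mathbb{E}u(X)$$
    for any continuous function $u$. We will later see that by definition this means that $X^{(n)}$ converges to $X$ in distribution. Given the moments of a r.v. $X$, this construction allows us to approximate the distribution of $X$ and the expectation of $u(X)$.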
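The construction (5.0.1) can be carried out mechanically from a list of moments. The sketch below is an illustration under assumptions: the function names are hypothetical, and the moments $\mu_k = 1/(k+1)$ of the uniform distribution on $[0,1]$ are used as test input. It computes $p^{(n)}_k$ by expanding $\mathbb{E}X^k(1-X)^{n-k}$ with the binomial formula and then approximates $\mathbb{E}u(X)$ by $\sum_k u(k/n)\, p^{(n)}_k$.

```python
from fractions import Fraction
from math import comb


def approximating_distribution(moments, n):
    """Weights p_k^(n) = C(n,k) * E X^k (1-X)^(n-k), computed from mu_0..mu_n.

    Uses E X^k (1-X)^r = sum_j C(r,j) (-1)^j mu_{k+j}, which equals
    (-1)^r Delta^r mu_k in the notation of the text.
    """
    weights = []
    for k in range(n + 1):
        r = n - k
        diff = sum(comb(r, j) * (-1) ** j * moments[k + j] for j in range(r + 1))
        weights.append(comb(n, k) * diff)
    return weights


def approximate_expectation(u, moments, n):
    """E u(X^(n)) = sum_k u(k/n) p_k^(n), an approximation of E u(X)."""
    weights = approximating_distribution(moments, n)
    return sum(u(Fraction(k, n)) * weights[k] for k in range(n + 1))


n = 10
# Assumed input: moments of the uniform distribution on [0, 1].
uniform_moments = [Fraction(1, k + 1) for k in range(n + 1)]
p = approximating_distribution(uniform_moments, n)
estimate = approximate_expectation(lambda x: x * x, uniform_moments, n)
print(p[0], sum(p), estimate)
```

For the uniform moments every weight equals $1/(n+1)$, and the estimate of $\mathbb{E}X^2$ is $(2n+1)/(6n)$, which approaches the true value $1/3$ as $n$ grows. Exact rational arithmetic is used because the alternating sums cancel badly in floating point for larger $n$.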

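    Next, given a sequence $(\mu_k)$, when is it the sequence of moments of some $[0,1]$ valued r.v. $X$? By the above, it is necessary that
    $$\mu_k \ge 0,\ \mu_0 = 1 \text{ and } (-1)^r \Delta^r \mu_k \ge 0 \text{ for all } k, r. \quad (5.0.2)$$
    It turns out that this is also sufficient.

    Theorem 7 (Hausdorff) There exists a r.v. $X \in [0,1]$ such that $\mu_k = \mathbb{E}X^k$ iff (5.0.2) holds.
    Proof. The idea of the proof is as follows. If $\mu_k$ are the moments of the distribution of some r.v. $X$, then the discrete distributions defined in (5.0.1) should approximate it. Therefore, our goal will be to show that condition (5.0.2) ensures that $(p^{(n)}_k)$ is indeed a distribution and then show that the moments of (5.0.1) converge to $\mu_k$. As a result, any limit of these distributions will be a candidate for the distribution of $X$.

    First of all, let us express $\mu_k$ in terms of $(p^{(n)}_k)$. Since $\Delta\mu_k = \mu_{k+1} - \mu_k$ we have the following inversion formula:
    $$\mu_k = \mu_{k+1} - \Delta\mu_k = (\mu_{k+2} - \Delta\mu_{k+1}) + (-\Delta\mu_{k+1} + \Delta^2\mu_k)$$
    $$= \mu_{k+2} - 2\Delta\mu_{k+1} + \Delta^2\mu_k = \sum_{j=0}^r \binom{r}{j} (-1)^{r-j} \Delta^{r-j} \mu_{k+j},$$
    by induction. Take $r = n - k$. Then
    $$\mu_k = \sum_{j=0}^{n-k} \binom{n-k}{j} (-1)^{n-(k+j)} \Delta^{n-(k+j)} \mu_{k+j} = \sum_{j=0}^{n-k} \frac{\binom{n-k}{j}}{\binom{n}{k+j}}\, p^{(n)}_{k+j}.$$
    We have
    $$\frac{\binom{n-k}{j}}{\binom{n}{k+j}} = \frac{(n-k)!}{j!(n-k-j)!} \cdot \frac{(k+j)!(n-k-j)!}{n!} = \frac{\binom{k+j}{k}}{\binom{n}{k}}$$
    so that
    $$\mu_k = \sum_{j=0}^{n-k} \frac{\binom{k+j}{k}}{\binom{n}{k}}\, p^{(n)}_{k+j} = \sum_{m=k}^n \frac{\binom{m}{k}}{\binom{n}{k}}\, p^{(n)}_m.$$
    By (5.0.2), $p^{(n)}_m \ge 0$ and $\sum_{m \le n} p^{(n)}_m = \mu_0 = 1$ so we can consider a r.v. $X^{(n)}$ such that
    $$\mathbb{P}\Bigl(X^{(n)} = \frac{m}{n}\Bigr) = p^{(n)}_m \text{ for } 0 \le m \le n.$$
    We have
    $$\mu_k = \sum_{m=k}^n \frac{\binom{m}{k}}{\binom{n}{k}}\, p^{(n)}_m = \sum_{m=k}^n \frac{m(m-1) \cdots (m-k+1)}{n(n-1) \cdots (n-k+1)}\, p^{(n)}_m = \sum_{m=k}^n \frac{\frac{m}{n}\bigl(\frac{m}{n} - \frac{1}{n}\bigr) \cdots \bigl(\frac{m}{n} - \frac{k-1}{n}\bigr)}{1 \cdot \bigl(1 - \frac{1}{n}\bigr) \cdots \bigl(1 - \frac{k-1}{n}\bigr)}\, p^{(n)}_m$$
    $$\approx \sum_{m=0}^n \Bigl(\frac{m}{n}\Bigr)^k p^{(n)}_m = \mathbb{E}\bigl(X^{(n)}\bigr)^k \text{ as } n \to \infty.$$
    Any continuous function $u$ can be approximated by (for example, Bernstein) polynomials so the limit $\lim_{n \to \infty} \mathbb{E}u(X^{(n)})$ exists. By the selection theorem that we will prove later in the course, one can choose a subsequence $X^{(n_i)}$ that converges to some r.v. $X$ in distribution and, as a result,
    $$\mathbb{E}\bigl(X^{(n_i)}\bigr)^k \to \mathbb{E}X^k = \mu_k,$$
    which means that $\mu_k$ are the moments of $X$.
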

    de Finetti's theorem. Consider an exchangeable sequence $X_1, X_2, \dots, X_n, \dots$ of Bernoulli random variables, which means that for any $n \ge 1$ the probability
    $$\mathbb{P}(X_1 = x_1, \dots, X_n = x_n)$$
    depends only on $x_1 + \dots + x_n$, i.e. it does not depend on the order of 1's or 0's. Another way to say this is that for any $n \ge 1$ and any permutation $\pi$ of $1, \dots, n$ the distribution of $(X_{\pi(1)}, \dots, X_{\pi(n)})$ does not depend on $\pi$. Then the following holds.

    Theorem 8 (de Finetti) There exists a distribution $F$ on $[0,1]$ such that
    $$p_k := \mathbb{P}(X_1 + \dots + X_n = k) = \binom{n}{k} \int_0^1 x^k (1-x)^{n-k}\, dF(x).$$
    This means that to generate such an exchangeable sequence we can first pick $x \in [0,1]$ from the distribution $F$ and then generate a sequence of i.i.d. Bernoulli random variables with probability of success $x$.
    Proof. Let $\mu_0 = 1$ and for $k \ge 1$ define
    $$\mu_k = \mathbb{P}(X_1 = 1, \dots, X_k = 1). \quad (5.0.3)$$
    We have
    $$\mathbb{P}(X_1 = 1, \dots, X_k = 1, X_{k+1} = 0) = \mathbb{P}(X_1 = 1, \dots, X_k = 1) - \mathbb{P}(X_1 = 1, \dots, X_k = 1, X_{k+1} = 1) = \mu_k - \mu_{k+1} = -\Delta\mu_k.$$
    Next, using exchangeability
    $$\mathbb{P}(X_1 = 1, \dots, X_k = 1, X_{k+1} = 0, X_{k+2} = 0) = \mathbb{P}(X_1 = 1, \dots, X_k = 1, X_{k+1} = 0) - \mathbb{P}(X_1 = 1, \dots, X_k = 1, X_{k+1} = 0, X_{k+2} = 1)$$
    $$= -\Delta\mu_k - (-\Delta\mu_{k+1}) = \Delta^2\mu_k.$$
    Similarly, by induction,
    $$\mathbb{P}(X_1 = 1, \dots, X_k = 1, X_{k+1} = 0, \dots, X_n = 0) = (-1)^{n-k} \Delta^{n-k} \mu_k \ge 0.$$
    By the Hausdorff theorem, $\mu_k = \mathbb{E}X^k$ for some r.v. $X \in [0,1]$ and, therefore,
    $$\mathbb{P}(X_1 = 1, \dots, X_k = 1, X_{k+1} = 0, \dots, X_n = 0) = (-1)^{n-k} \Delta^{n-k} \mu_k = \mathbb{E}X^k(1-X)^{n-k} = \int_0^1 x^k (1-x)^{n-k}\, dF(x).$$
    Since, by exchangeability, changing the order of 1's and 0's does not affect the probability, we get
    $$\mathbb{P}(X_1 + \dots + X_n = k) = \binom{n}{k} \int_0^1 x^k (1-x)^{n-k}\, dF(x).$$


    Example. (Polya urn model). Suppose we have $b$ blue and $r$ red balls in the urn. We pick a ball randomly and return it with $c$ balls of the same color.

    Figure 5.1: Polya urn model.

    Consider r.v.s
    $$X_i = \begin{cases} 1 & \text{if the } i\text{th ball picked is blue} \\ 0 & \text{otherwise.} \end{cases}$$
    $X_i$'s are not independent but exchangeable. For example,
    $$\mathbb{P}(bbr) = \frac{b}{b+r} \cdot \frac{b+c}{b+r+c} \cdot \frac{r}{b+r+2c}, \quad \mathbb{P}(brb) = \frac{b}{b+r} \cdot \frac{r}{b+r+c} \cdot \frac{b+c}{b+r+2c}$$
    are equal. To identify the distribution $F$ in de Finetti's theorem, let us look at its moments $\mu_k$ in (5.0.3),
    $$\mu_k = \mathbb{P}(\underbrace{b \dots b}_{k \text{ times}}) = \frac{b}{b+r} \cdot \frac{b+c}{b+r+c} \cdots \frac{b+(k-1)c}{b+r+(k-1)c}.$$
    One can recognize or easily check that $\mu_k$ are the moments of the Beta$(\alpha, \beta)$ distribution with the density
    $$\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}$$
    on $[0,1]$ with parameters $\alpha = b/c$, $\beta = r/c$. By de Finetti's theorem, we can generate $X_i$'s by first picking $x$ from the distribution Beta$(b/c, r/c)$ and then generating i.i.d. Bernoulli $(X_i)$'s with probability of success $x$. By the strong law of large numbers, the proportion of blue balls in the first $n$ repetitions will converge to this probability of success $x$, i.e. in the limit it will be random with Beta distribution. This example will come up once more when we talk about convergence of martingales.
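Both claims of the Polya urn example, exchangeability and the Beta moments, can be checked with exact arithmetic. The sketch below assumes hypothetical urn parameters $b = 2$, $r = 3$, $c = 4$ and illustrative function names; the Beta moment formula $\prod_{j<k} (\alpha+j)/(\alpha+\beta+j)$ is the standard one.

```python
from fractions import Fraction
from itertools import permutations


def sequence_probability(colors, b, r, c):
    """Exact probability of drawing the colors ('b' or 'r') in the given order."""
    blue, red = b, r
    prob = Fraction(1)
    for color in colors:
        total = blue + red
        if color == "b":
            prob *= Fraction(blue, total)
            blue += c  # the ball is returned together with c more blue balls
        else:
            prob *= Fraction(red, total)
            red += c  # the ball is returned together with c more red balls
    return prob


def beta_moment(k, alpha, beta):
    """k-th moment of Beta(alpha, beta): prod_{j<k} (alpha + j) / (alpha + beta + j)."""
    moment = Fraction(1)
    for j in range(k):
        moment *= (alpha + j) / (alpha + beta + j)
    return moment


b, r, c = 2, 3, 4  # assumed urn parameters, for illustration only
orderings = {"".join(p) for p in permutations("bbr")}
probs = {s: sequence_probability(s, b, r, c) for s in orderings}
print(probs)

alpha, beta = Fraction(b, c), Fraction(r, c)
moments_match = all(
    sequence_probability("b" * k, b, r, c) == beta_moment(k, alpha, beta)
    for k in range(6)
)
print(moments_match)
```

All three orderings of two blue draws and one red draw get the same probability, and the probabilities of $k$ blue draws in a row coincide with the Beta$(b/c, r/c)$ moments.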

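    Section 6
    0 - 1 Laws. Convergence of random series.

    Consider a sequence $(X_i)_{i \ge 1}$ of real valued independent random variables and let $\sigma((X_i)_{i \ge 1})$ be the $\sigma$-algebra of events generated by this sequence, i.e. $\{(X_i)_{i \ge 1} \in B\}$ for $B$ in the cylindrical $\sigma$-algebra on $\mathbb{R}^{\mathbb{N}}$.

    Definition. An event $A \in \sigma((X_i)_{i \ge 1})$ is called a tail event if $A \in \sigma((X_i)_{i \ge n})$ for all $n \ge 1$.
    For example, if $A_i \in \sigma(X_i)$ then
    $$\{A_i \text{ i.o.}\} = \bigcap_{n \ge 1} \bigcup_{i \ge n} A_i$$
    is a tail event. It turns out that such events have probability 0 or 1.

    Theorem 9 (Kolmogorov's 0-1 law) If $A$ is a tail event then $\mathbb{P}(A) = 0$ or $1$.
    Proof. For a finite subset $F = \{i_1, \dots, i_n\} \subset \mathbb{N}$, let us denote by $X_F = (X_{i_1}, \dots, X_{i_n})$. The $\sigma$-algebra $\sigma((X_i)_{i \ge 1})$ is generated by the algebra
    $$\bigl\{\{X_F \in B\} : F \text{ - finite } \subseteq \mathbb{N},\ B \in \mathcal{B}(\mathbb{R}^{|F|})\bigr\}.$$
    By the approximation lemma, we can approximate any event $A \in \sigma((X_i)_{i \ge 1})$ by events in this generating algebra. Therefore, for any $\varepsilon > 0$ there exists a set $A'$ in this algebra such that $\mathbb{P}(A \,\triangle\, A') \le \varepsilon$ and by definition $A' \in \sigma(X_1, \dots, X_n)$ for large enough $n$. This implies
    $$|\mathbb{P}(A) - \mathbb{P}(A')| \le \varepsilon, \quad |\mathbb{P}(A) - \mathbb{P}(A \cap A')| \le \varepsilon.$$
    Since $A$ is a tail event, $A \in \sigma((X_i)_{i \ge n+1})$ which means that $A, A'$ are independent, i.e. $\mathbb{P}(A \cap A') = \mathbb{P}(A)\mathbb{P}(A')$. We get
    $$\mathbb{P}(A) \approx \mathbb{P}(A \cap A') = \mathbb{P}(A)\mathbb{P}(A') \approx \mathbb{P}(A)\mathbb{P}(A)$$
    and letting $\varepsilon \to 0$ proves that $\mathbb{P}(A) = \mathbb{P}(A)^2$.

    Examples.
    1. $\bigl\{\sum_{i \ge 1} X_i \text{ converges}\bigr\}$ is a tail event, it has probability 0 or 1.
    2. Consider the series $\sum_{i \ge 1} X_i z^i$ on the complex plane, $z \in \mathbb{C}$. Its radius of convergence is
    $$r = \liminf_{i \to \infty} |X_i|^{-\frac{1}{i}}.$$
    For any $x \ge 0$, the event $\{r \le x\}$ is, obviously, a tail event. This implies that $r = \text{const}$ with probability 1.
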

    The Savage-Hewitt 0 - 1 law. Next we will prove a stronger result under the more restrictive assumption that the r.v.s $X_i$, $i \ge 1$ are not only independent but also identically distributed with the law $\mu$. Without loss of generality, we can assume that each $X_i$ is given by the identity $X_i(x) = x$ on its sample space $(\mathbb{R}, \mathcal{B}, \mu)$. By Kolmogorov's consistency theorem the entire sequence $(X_i)_{i \ge 1}$ can be defined on the sample space $(\mathbb{R}^{\mathbb{N}}, \mathcal{B}^\infty, \mathbb{P})$ where $\mathcal{B}^\infty$ is the cylindrical $\sigma$-algebra and $\mathbb{P}$ is the measure guaranteed by the Caratheodory extension theorem. In our case $X_i$'s are i.i.d. and $\mathbb{P} = \mu^\infty$ is called the infinite product measure. It will be convenient to use the notation $\sigma((X_i)_{i \ge 1})$ for the cylindrical $\sigma$-algebra since similar notation can be used for the cylindrical $\sigma$-algebra on any subset of coordinates.

    Definition. An event $A \in \sigma((X_i)_{i \ge 1})$ is called exchangeable/symmetric if for all $n \ge 1$,
    $$(x_1, x_2, \dots, x_n, x_{n+1}, \dots) \in A \implies (x_n, x_2, \dots, x_{n-1}, x_1, x_{n+1}, \dots) \in A.$$
    In other words, the set $A$ is symmetric under permutations of a finite number of coordinates. Note that any tail event is symmetric.

    Theorem 10 (Savage-Hewitt 0-1 law) If $A$ is symmetric then $\mathbb{P}(A) = 0$ or $1$.
    Proof. Given a sequence $x = (x_1, x_2, \dots)$ let us define an operator
    $$\Gamma x = (x_{n+1}, \dots, x_{2n}, x_1, \dots, x_n, x_{2n+1}, \dots)$$
    that switches the first $n$ coordinates with the second $n$ coordinates. Since $A$ is symmetric,
    $$\Gamma A = \{\Gamma x : x \in A\} = A.$$
    By the Approximation Lemma 2 for any $\varepsilon > 0$ for large enough $n$, there exists $A_n \in \sigma(X_1, \dots, X_n)$ such that $\mathbb{P}(A_n \,\triangle\, A) \le \varepsilon$. Clearly,
    $$B_n = \Gamma A_n \in \sigma(X_{n+1}, \dots, X_{2n})$$
    and by i.i.d.
    $$\mathbb{P}(B_n \,\triangle\, A) = \mathbb{P}(\Gamma A_n \,\triangle\, \Gamma A) = \mathbb{P}(A_n \,\triangle\, A) \le \varepsilon,$$
    which implies that $\mathbb{P}\bigl((A_n \cap B_n) \,\triangle\, A\bigr) \le 2\varepsilon$. Therefore, we can conclude that
    $$\mathbb{P}(A) \approx \mathbb{P}(A_n), \quad \mathbb{P}(A) \approx \mathbb{P}(A_n \cap B_n) = \mathbb{P}(A_n)\mathbb{P}(B_n) = \mathbb{P}(A_n)^2$$
    where we used the fact that the events $A_n, B_n$ are defined in terms of different sets of coordinates and, thus, are independent. Letting $\varepsilon \to 0$ implies that $\mathbb{P}(A) = \mathbb{P}(A)^2$.

    Example. Let $S_n = X_1 + \dots + X_n$ and let
    $$r = \limsup_{n \to \infty} \frac{S_n - a_n}{b_n}.$$
    The event $\{r \le x\}$ is symmetric since changing the order of any finite set of coordinates does not affect $S_n$ for large enough $n$. As a result, $\mathbb{P}(r \le x) = 0$ or $1$, which implies that $r = \text{const}$ with probability 1.

    Random series. We already saw above that, by Kolmogorov's 0-1 law, the series $\sum_{i \ge 1} X_i$ for independent $(X_i)_{i \ge 1}$ converges with probability 0 or 1. This means that either $S_n = X_1 + \dots + X_n$ converges to its limit $S$ with probability one, or with probability one it does not converge. Two sections back, before the proof of the strong law of large numbers, we saw the example of a sequence which with probability one does not converge yet converges to 0 in probability. In the case when with probability one $S_n$ does not converge, is it still possible that it converges to some random variable in probability? The answer is no because we will now prove that for random series convergence in probability implies a.s. convergence.


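    Theorem 11 (Kolmogorov's inequality) Suppose that $(X_i)_{i \ge 1}$ are independent and $S_n = X_1 + \dots + X_n$. If for all $j \le n$,
    $$\mathbb{P}(|S_n - S_j| \ge a) \le p < 1, \quad (6.0.1)$$
    then for $x > a$,
    $$\mathbb{P}\Bigl(\max_{1 \le j \le n} |S_j| \ge x\Bigr) \le \frac{1}{1-p}\, \mathbb{P}(|S_n| > x - a).$$
    Proof. First of all, let us notice that this inequality is obvious without the maximum because (6.0.1) is equivalent to $1 - p \le \mathbb{P}(|S_n - S_j| < a)$ and we can write
    $$(1-p)\,\mathbb{P}(|S_j| \ge x) \le \mathbb{P}(|S_n - S_j| < a)\,\mathbb{P}(|S_j| \ge x) = \mathbb{P}(|S_n - S_j| < a,\ |S_j| \ge x) \le \mathbb{P}(|S_n| > x - a).$$
    The equality is true because the events $\{|S_j| \ge x\}$ and $\{|S_n - S_j| < a\}$ are independent since the first depends only on $X_1, \dots, X_j$ and the second only on $X_{j+1}, \dots, X_n$. The last inequality is true simply by the triangle inequality. To deal with the maximum, instead of looking at an arbitrary partial sum $S_j$ we will look at the first partial sum that crosses level $x$. We define that first time by $\tau = \min\{j \le n : |S_j| \ge x\}$ and let $\tau = n+1$ if all $|S_j| < x$. The event $\{\tau = j\}$ depends only on $X_1, \dots, X_j$ and $|S_j| \ge x$ on this event, so, exactly as above,
    $$(1-p)\,\mathbb{P}(\tau = j) \le \mathbb{P}(|S_n - S_j| < a)\,\mathbb{P}(\tau = j) = \mathbb{P}(|S_n - S_j| < a,\ \tau = j) \le \mathbb{P}(|S_n| > x - a,\ \tau = j).$$
    Summing over $j \le n$ gives
    $$(1-p)\,\mathbb{P}(\tau \le n) \le \mathbb{P}(|S_n| > x - a,\ \tau \le n) \le \mathbb{P}(|S_n| > x - a)$$
    and notice that $\tau \le n$ is equivalent to $\max_{j \le n} |S_j| \ge x$.
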

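    Theorem 12 (Kolmogorov) If the series $\sum_{i \ge 1} X_i$ converges in probability then it converges almost surely.
    Proof. Suppose that partial sums $S_n$ converge to some r.v. $S$ in probability, i.e. for any $\varepsilon > 0$, for large enough $n \ge n_0(\varepsilon)$ we have $\mathbb{P}(|S_n - S| \ge \varepsilon) \le \varepsilon$. If $k \ge j \ge n \ge n_0(\varepsilon)$ then
    $$\mathbb{P}(|S_k - S_j| \ge 2\varepsilon) \le \mathbb{P}(|S_k - S| \ge \varepsilon) + \mathbb{P}(|S_j - S| \ge \varepsilon) \le 2\varepsilon.$$
    Next, we use Kolmogorov's inequality for $x = 4\varepsilon$ and $a = 2\varepsilon$ (we let partial sums start at $n$):
    $$\mathbb{P}\Bigl(\max_{n \le j \le k} |S_j - S_n| \ge 4\varepsilon\Bigr) \le \frac{1}{1 - 2\varepsilon}\, \mathbb{P}(|S_k - S_n| \ge 2\varepsilon) \le \frac{2\varepsilon}{1 - 2\varepsilon} \le 3\varepsilon,$$
    for small $\varepsilon$. The events $\{\max_{n \le j \le k} |S_j - S_n| \ge 4\varepsilon\}$ are increasing as $k \uparrow \infty$ and by continuity of measure
    $$\mathbb{P}\Bigl(\max_{n \le j} |S_j - S_n| \ge 4\varepsilon\Bigr) \le 3\varepsilon.$$
    Finally, since $\mathbb{P}(|S_n - S| \ge \varepsilon) \le \varepsilon$ we get
    $$\mathbb{P}\Bigl(\max_{n \le j} |S_j - S| \ge 5\varepsilon\Bigr) \le 4\varepsilon.$$
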

    This kind of maximal statement about any sequence $S_j$ is actually equivalent to its a.s. convergence. To see this take $\varepsilon = \frac{1}{m^2}$, take $n(m) = n_0(\varepsilon)$ and consider an event
    $$A_m = \Bigl\{\max_{n(m) \le j} |S_j - S| \ge \frac{5}{m^2}\Bigr\}.$$
    We proved that $\mathbb{P}(A_m) \le \frac{4}{m^2}$


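    Example. Consider the random series $\sum_{i \ge 1} \frac{\varepsilon_i}{i^\alpha}$ where $\mathbb{P}(\varepsilon_i = \pm 1) = \frac{1}{2}$. We have
    $$\sum_{i \ge 1} \mathbb{E}\Bigl(\frac{\varepsilon_i}{i^\alpha}\Bigr)^2 = \sum_{i \ge 1} \frac{1}{i^{2\alpha}} < \infty \text{ if } \alpha > \frac{1}{2},$$
    so the series converges a.s. for such $\alpha$.
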

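    Section 7
    Stopping times, Wald's identity. Another proof of SLLN.

    Consider a sequence $(X_i)_{i \ge 1}$ of independent r.v.s and an integer valued random variable $V \in \{1, 2, \dots\}$. We say that $V$ is independent of the future if $\{V \le n\}$ is independent of $\sigma((X_i)_{i \ge n+1})$. We say that $V$ is a stopping time (Markov time) if $\{V \le n\} \in \sigma(X_1, \dots, X_n)$ for all $n$. Clearly, a stopping time is independent of the future. An example of a stopping time is $V = \min\{k \ge 1,\ S_k \ge 1\}$.

    Suppose that $V$ is independent of the future. We can write
    $$\mathbb{E}S_V = \sum_{k \ge 1} \mathbb{E}S_V I(V = k) = \sum_{k \ge 1} \mathbb{E}S_k I(V = k)$$
    $$= \sum_{k \ge 1} \sum_{n \le k} \mathbb{E}X_n I(V = k) \overset{(*)}{=} \sum_{n \ge 1} \sum_{k \ge n} \mathbb{E}X_n I(V = k) = \sum_{n \ge 1} \mathbb{E}X_n I(V \ge n).$$
    In (*) we can interchange the order of summation if, for example, the double sequence is absolutely summable, by the Fubini-Tonelli theorem. Since $V$ is independent of the future, the event $\{V \ge n\} = \{V \le n-1\}^c$ is independent of $\sigma(X_n)$ and we get
    $$\mathbb{E}S_V = \sum_{n \ge 1} \mathbb{E}X_n\, \mathbb{P}(V \ge n). \quad (7.0.1)$$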

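    This implies the following.
    Theorem 14 (Wald's identity.) If $(X_i)_{i \ge 1}$ are i.i.d., $\mathbb{E}|X_1| < \infty$ and $\mathbb{E}V < \infty$, then $\mathbb{E}S_V = \mathbb{E}X_1\, \mathbb{E}V$.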
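For i.i.d. steps, (7.0.1) reads $\mathbb{E}S_V = \mathbb{E}X_1 \sum_{n \ge 1} \mathbb{P}(V \ge n) = \mathbb{E}X_1\, \mathbb{E}V$, and this can be verified exactly for a bounded stopping time by enumerating all paths. The sketch below is an illustration under assumptions: steps are $\pm 1$ with $\mathbb{P}(X_i = 1) = p$, and $V$ is the first time $S_k \ge 1$, capped at a fixed horizon so that the enumeration is finite. The function name and parameters are hypothetical.

```python
from fractions import Fraction
from itertools import product


def wald_check(p, horizon):
    """Return (E S_V, E X_1 * E V) for V = min{k : S_k >= 1}, capped at `horizon`.

    Steps are +1 with probability p and -1 with probability 1 - p.
    The cap keeps V a bounded stopping time, so all sums are finite.
    """
    step_prob = {1: p, -1: 1 - p}
    expected_stopped_sum = Fraction(0)
    expected_stopping_time = Fraction(0)
    for path in product((1, -1), repeat=horizon):
        weight = Fraction(1)
        for x in path:
            weight *= step_prob[x]
        partial_sum, stop = 0, horizon
        for k, x in enumerate(path, start=1):
            partial_sum += x
            if partial_sum >= 1:
                stop = k  # first time the walk reaches level 1
                break
        expected_stopped_sum += weight * partial_sum
        expected_stopping_time += weight * stop
    mean_step = p - (1 - p)
    return expected_stopped_sum, mean_step * expected_stopping_time


lhs, rhs = wald_check(Fraction(2, 3), 8)
print(lhs, rhs, lhs == rhs)
```

The two sides agree exactly for every choice of $p$ and horizon, since $\{V \le n\}$ is determined by $X_1, \dots, X_n$. Exact enumeration is exponential in the horizon, so it is only practical for small values.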


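    where $\overset{dist}{=}$ means equality in distribution.

    Proof. Given a subset $N \subseteq \mathbb{N}$ and sequences $(B_i)$ and $(C_i)$ of Borel sets on $\mathbb{R}$, define events

    $$A = \{V \in N, X_1 \in B_1, \ldots, X_V \in B_V\}$$

    and for any $k \ge 1$,

    $$D = \{X_{V+1} \in C_1, \ldots, X_{V+k} \in C_k\}.$$

    We have,

    $$\mathbb{P}(D \cap A) = \sum_{n \ge 1} \mathbb{P}(D \cap A \cap \{V = n\}) = \sum_{n \ge 1} \mathbb{P}(D_n \cap A \cap \{V = n\})$$

    where $D_n = \{X_{n+1} \in C_1, \ldots, X_{n+k} \in C_k\}$. The intersection of events

    $$A \cap \{V = n\} = \begin{cases} \{V = n, X_1 \in B_1, \ldots, X_n \in B_n\}, & n \in N, \\ \emptyset, & \text{otherwise.} \end{cases}$$

    Since $V$ is a stopping time, $\{V = n\} \in \sigma(X_1, \ldots, X_n)$ and $A \cap \{V = n\} \in \sigma(X_1, \ldots, X_n)$. On the other hand, $D_n \in \sigma(X_{n+1}, \ldots)$ and, as a result,

    $$\mathbb{P}(D \cap A) = \sum_{n \ge 1} \mathbb{P}(D_n)\,\mathbb{P}(A \cap \{V = n\}) = \sum_{n \ge 1} \mathbb{P}(D_0)\,\mathbb{P}(A \cap \{V = n\}) = \mathbb{P}(D_0)\,\mathbb{P}(A),$$

    and this finishes the proof.

    Remark. One could be a little bit more careful when talking about the events generated by a vector $(V, X_1, \ldots, X_V)$ that has random length. In the proof we implicitly assumed that such events are generated by events

    $$A = \{V \in N, X_1 \in B_1, \ldots, X_V \in B_V\}$$

    which is a rather intuitive definition. However, one could be more formal and define a $\sigma$-algebra of events generated by $(V, X_1, \ldots, X_V)$ as events $A$ such that $A \cap \{V \le n\} \in \sigma(X_1, \ldots, X_n)$ for any $n \ge 1$. This means that when $V \le n$ the event $A$ is expressed only in terms of $X_1, \ldots, X_n$. It is easy to check that with this more formal definition the proof remains exactly the same.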

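    Let us give one interesting application of Markov property and Wald's identity that will yield another proof of strong law of large numbers.

    Theorem 16 Suppose that $(X_i)_{i \ge 1}$ are i.i.d. such that $\mathbb{E}X_1 > 0$. If $Z = \inf_{n \ge 1} S_n$ then $\mathbb{P}(Z > -\infty) = 1$. (Partial sums can not drift down to $-\infty$ if $\mathbb{E}X_1 > 0$. Of course, this is obvious by SLLN.)

    Proof. Let us define (see figure 7.1),

    $$\tau_1 = \min\{k \ge 1, S_k \ge 1\}, \quad Z_1 = \min_{k \le \tau_1} S_k, \quad S_k^{(2)} = S_{\tau_1 + k} - S_{\tau_1},$$
    $$\tau_2 = \min\{k \ge 1, S_k^{(2)} \ge 1\}, \quad Z_2 = \min_{k \le \tau_2} S_k^{(2)}, \quad S_k^{(3)} = S^{(2)}_{\tau_2 + k} - S^{(2)}_{\tau_2}.$$

    By induction,

    $$\tau_n = \min\{k \ge 1, S_k^{(n)} \ge 1\}, \quad Z_n = \min_{k \le \tau_n} S_k^{(n)}, \quad S_k^{(n+1)} = S^{(n)}_{\tau_n + k} - S^{(n)}_{\tau_n}.$$

    $Z_1, \ldots, Z_n$ are i.i.d. by Markov property.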

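    Figure 7.1: A sequence of stopping times.

    Notice that, by construction, $S_{\tau_1 + \cdots + \tau_{n-1}} \ge n - 1$ and

    $$Z = \inf_{k \ge 1} S_k = \inf\{Z_1,\ S_{\tau_1} + Z_2,\ S_{\tau_1 + \tau_2} + Z_3, \ldots\}.$$

    We have,

    $$\{Z \le -N\} = \bigcup_{k \ge 1} \{S_{\tau_1 + \ldots + \tau_{k-1}} + Z_k \le -N\} \subseteq \bigcup_{k \ge 1} \{k - 1 + Z_k \le -N\}.$$

    Therefore,

    $$\mathbb{P}(Z \le -N) \le \sum_{k \ge 1} \mathbb{P}(k - 1 + Z_k \le -N) = \sum_{k \ge 1} \mathbb{P}(Z_k \le -N - k + 1) = \sum_{k \ge 1} \mathbb{P}(Z_1 \le -N - k + 1) = \sum_{j \ge N} \mathbb{P}(Z_1 \le -j) \le \sum_{j \ge N} \mathbb{P}(|Z_1| \ge j) \to 0$$

    as $N \to \infty$ if we can show that $\mathbb{E}|Z_1| < \infty$ ...

    ... with probability one. This means that for all $n \ge 1$, $S_n + n\varepsilon \ge -M > -\infty$ for some large enough $M$. Dividing both sides by $n$ and letting $n \to \infty$ we get

    $$\liminf_{n \to \infty} \frac{S_n}{n} \ge -\varepsilon$$

    with probability one. We can then let $\varepsilon \to 0$ over some sequence. Similarly, we prove that $\limsup_{k \to \infty} S_k / k \le 0$ with probability one.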

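    Section 8
    Convergence of Laws. Selection Theorem.

    In this section we will begin the discussion of weak convergence of distributions on metric spaces. Let $(S, d)$ be a metric space with a metric $d$. Consider a measurable space $(S, \mathcal{B})$ with Borel $\sigma$-algebra $\mathcal{B}$ generated by open sets and let $(P_n)_{n \ge 1}$ and $P$ be probability distributions on $\mathcal{B}$. We define

    $$C_b(S) = \{f : S \to \mathbb{R} \text{ - continuous and bounded}\}.$$

    We say that $P_n \to P$ weakly if

    $$\int f\,dP_n \to \int f\,dP \quad \text{for all } f \in C_b(S).$$

    Theorem 18 If $S = \mathbb{R}$ then $P_n \to P$ iff

    $$F_n(t) = P_n\bigl((-\infty, t]\bigr) \to F(t) = P\bigl((-\infty, t]\bigr)$$

    for any point of continuity $t$ of $F(t)$.

    Proof. "$\Longrightarrow$" Let us approximate an indicator function by continuous functions as in figure 8.1, i.e.

    $$\varphi_1(X) \le I(X \le t) \le \varphi_2(X), \quad \varphi_1, \varphi_2 \in C_b(\mathbb{R}).$$

    For convenience of notations, instead of writing integrals w.r.t. $P_n$ we will write expectations of a r.v. $X_n$ with distribution $P_n$.

    Figure 8.1: Approximating indicator.

    $$\mathbb{P}(X \le t - \varepsilon) \le \mathbb{E}\varphi_1(X) \leftarrow \mathbb{E}\varphi_1(X_n) \le F_n(t) = \mathbb{P}(X_n \le t) \le \mathbb{E}\varphi_2(X_n) \to \mathbb{E}\varphi_2(X) \le \mathbb{P}(X \le t + \varepsilon)$$

    as $n \to \infty$. Therefore, for any $\varepsilon > 0$,

    $$F(t - \varepsilon) \le \liminf F_n(t) \le \limsup F_n(t) \le F(t + \varepsilon).$$

    Since $t$ is a point of continuity of $F$, letting $\varepsilon \to 0$ proves the result.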

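    "$\Longleftarrow$" Let $PC(F)$ be the set of points of continuity of $F$. Since $F$ is monotone, the set $PC(F)$ is dense in $\mathbb{R}$. Take $M$ large enough such that both $M, -M \in PC(F)$ and $P([-M, M]^c) \le \varepsilon$. Clearly, for large enough $k$ we have $P_k([-M, M]^c) \le 2\varepsilon$. For any $n > 1$, take a sequence of points

    $$-M = x_1^n \le x_2^n \le \cdots \le x_n^n = M$$

    such that all $x_i^n \in PC(F)$ and $\max_i |x_{i+1}^n - x_i^n| \to 0$ as $n \to \infty$. Given a function $f \in C_b(\mathbb{R})$, consider an approximating function

    $$f_n(x) = \sum_{1 < i \le n} f(x_i^n)\,I\bigl(x \in (x_{i-1}^n, x_i^n]\bigr) + 0 \cdot I\bigl(x \notin [-M, M]\bigr).$$

    ...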

    $F(x)$ is a c.d.f. on $\mathbb{R}^k$ (exercise). The fact that $P_n$ are uniformly tight ensures that $F(x) \to 0$ or $1$ if all $x_i \to -\infty$ or $+\infty$. Let $x$ be a point of continuity of $F(x)$ and let $a, b \in A$ such that $a_i < x_i < b_i$ for all $i$. We have,

    $$F(a) \leftarrow F_{n(k)}(a) \le F_{n(k)}(x) \le F_{n(k)}(b) \to F(b)$$

    as $k \to \infty$. Since $x$ is a point of continuity and $A$ is dense,

    $$F(a) \to F(x) \text{ as } a \uparrow x, \qquad F(b) \to F(x) \text{ as } b \downarrow x,$$

    and this proves that $F_{n(k)}(x) \to F(x)$ for all such $x$. Similarly to one-dimensional case one can show that for any $f \in C_b(\mathbb{R}^k)$,

    $$\int f\,dF_{n(k)} \to \int f\,dF.$$

    Proof of Theorem 19. If $K$ is a compact then $C_b(K) = C(K)$. Later in these lectures, when we deal in more detail with convergence on general metric spaces, we will prove the following fact which is well-known and is a consequence of the Stone-Weierstrass theorem.

    Fact. $C(K)$ is separable w.r.t. the $\ell_\infty$ norm $\|f\|_\infty = \sup_{x \in K} |f(x)|$.

    Even though we are proving Selection theorem for a general metric space, right now we are mostly interested in the case $S = \mathbb{R}^k$ where this fact is a simple consequence of the Weierstrass theorem that any continuous function can be approximated by polynomials.

    Since $P_n$ are uniformly tight, for any $r \ge 1$ we can find a compact $K_r$ such that $P_n(K_r) > 1 - \frac{1}{r}$. Let $C_r \subseteq C(K_r)$ be a countable and dense subset of $C(K_r)$. By Cantor's diagonalization argument there exists a subsequence $(n(k))$ such that $P_{n(k)}(f)$ converges for all $f \in C_r$ for all $r \ge 1$. Since $C_r$ is dense in $C(K_r)$ this implies that $P_{n(k)}(f)$ converges for all $f \in C(K_r)$ for all $r \ge 1$. Next, for any $f \in C_b(S)$,

    $$\Bigl| \int f\,dP_{n(k)} - \int_{K_r} f\,dP_{n(k)} \Bigr| \le \int_{K_r^c} |f|\,dP_{n(k)} \le \|f\|_\infty P_{n(k)}(K_r^c) \le \frac{\|f\|_\infty}{r}.$$

    This implies that the limit

    $$I(f) := \lim_{k \to \infty} \int f\,dP_{n(k)} \qquad (8.0.1)$$

    exists. The question is why this limit is an integral over some probability measure $P$? On each of the compacts $K_r$ we could use Riesz's representation theorem for continuous functionals on $C(K_r)$ and then extend this representation to the union of $K_r$. Instead, we will prove this as a consequence of a more general result, the Stone-Daniell theorem from measure theory, which says the following.

    A family of functions $\alpha = \{f : S \to \mathbb{R}\}$ is called a vector lattice if

    $$f, g \in \alpha \Longrightarrow cf + g \in \alpha,\ \forall c \in \mathbb{R} \quad \text{and} \quad f \wedge g,\ f \vee g \in \alpha.$$

    A functional $I : \alpha \to \mathbb{R}$ is called a pre-integral if

    1. $I(cf + g) = cI(f) + I(g)$,
    2. $f \ge 0 \Longrightarrow I(f) \ge 0$,
    3. $f_n \downarrow 0$, $\|f_n\|_\infty < \infty$ ...

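    On any compact $K_r$, $f_n \downarrow 0$ uniformly, i.e.

    $$\varepsilon_{n,r} = \|f_n\|_{\infty, K_r} \downarrow 0 \quad \text{as } n \to \infty.$$

    Since

    $$\int f_n\,dP_{n(k)} = \int_{K_r} f_n\,dP_{n(k)} + \int_{K_r^c} f_n\,dP_{n(k)} \le \varepsilon_{n,r} + \frac{1}{r}\|f_1\|_\infty,$$

    we get

    $$I(f_n) = \lim_{k \to \infty} \int f_n\,dP_{n(k)} \le \varepsilon_{n,r} + \frac{1}{r}\|f_1\|_\infty.$$

    Letting $n \to \infty$ and $r \to \infty$ we get that $I(f_n) \to 0$. By the Stone-Daniell theorem

    $$I(f) = \int f\,dP$$

    for some measure on $\sigma(C_b(S))$. The choice of $f = 1$ gives $I(f) = 1 = P(S)$ which means that $P$ is a probability measure. Finally, let us show that $\sigma(C_b(S)) = \mathcal{B}$ - Borel $\sigma$-algebra generated by open sets. Since any $f \in C_b(S)$ is measurable on $\mathcal{B}$ we get $\sigma(C_b(S)) \subseteq \mathcal{B}$. On the other hand, let $F \subseteq S$ be any closed set and take a function $f(x) = \min(1, d(x, F))$. We have, $|f(x) - f(y)| \le d(x, y)$ so $f \in C_b(S)$ and

    $$f^{-1}(\{0\}) \in \sigma(C_b(S)).$$

    However, since $F$ is closed, $f^{-1}(\{0\}) = \{x : d(x, F) = 0\} = F$ and this proves that $\mathcal{B} \subseteq \sigma(C_b(S))$.

    Theorem 21 If $P_n$ converges weakly to $P$ on $\mathbb{R}^k$ then $(P_n)_{n \ge 1}$ is uniformly tight.

    Proof. For any $\varepsilon > 0$ there exists large enough $M > 0$ such that $P(|x| > M) \le \varepsilon$. Consider a continuous function $\varphi$ such that $I(|x| > 2M) \le \varphi(x) \le I(|x| > M)$. Then

    $$P_n(|x| > 2M) \le \int \varphi(x)\,dP_n \to \int \varphi(x)\,dP \le P(|x| > M) \le \varepsilon.$$

    For $n$ large enough, $n \ge n_0$, we get $P_n(|x| > 2M) \le 2\varepsilon$. For $n < n_0$ choose $M_n$ so that $P_n(|x| > M_n) \le 2\varepsilon$. Take $M' = \max\{M_1, \ldots, M_{n_0 - 1}, 2M\}$. As a result, $P_n(|x| > M') \le 2\varepsilon$ for all $n \ge 1$.

    Lemma 13 If for any sequence $(n(k))_{k \ge 1}$ there exists a subsequence $(n(k(r)))_{r \ge 1}$ such that $P_{n(k(r))} \to P$ weakly then $P_n \to P$ weakly.

    Proof. Suppose not. Then for some $f \in C_b(S)$ and for some $\varepsilon > 0$ there exists a subsequence $(n(k))$ such that

    $$\Bigl| \int f\,dP_{n(k)} - \int f\,dP \Bigr| > \varepsilon.$$

    But this contradicts the fact that for some subsequence $P_{n(k(r))} \to P$ weakly.

    Consider r.v.s $X$ and $X_n$ on some probability space $(\Omega, \mathcal{A}, \mathbb{P})$ with values in a metric space $(S, d)$. Let $P$ and $P_n$ be their corresponding laws on Borel sets $\mathcal{B}$ in $S$. Convergence of $X_n$ to $X$ in probability and almost surely is defined exactly the same way as for $S = \mathbb{R}$ by replacing $|X_n - X|$ with $d(X_n, X)$.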

    Lemma 14 $X_n \to X$ in probability iff for any sequence $(n(k))$ there exists a subsequence $(n(k(r)))$ such that $X_{n(k(r))} \to X$ a.s.

    Proof. "$\Longleftarrow$". Suppose $X_n$ does not converge to $X$ in probability. Then for small enough $\varepsilon > 0$ there exists a subsequence $(n(k))$ such that

    $$\mathbb{P}\bigl(d(X, X_{n(k)}) \ge \varepsilon\bigr) \ge \varepsilon.$$

    This contradicts the existence of a subsequence $X_{n(k(r))}$ that converges to $X$ a.s.

    "$\Longrightarrow$". Given a subsequence $(n(k))$ let us choose $(k(r))$ so that

    $$\mathbb{P}\Bigl(d(X_{n(k(r))}, X) \ge \frac{1}{r}\Bigr) \le \frac{1}{r^2}.$$

    By Borel-Cantelli lemma, these events can occur i.o. with probability 0, which means that with probability one for large enough $r$

    $$d(X_{n(k(r))}, X) \le \frac{1}{r},$$

    i.e. $X_{n(k(r))} \to X$ a.s.

    Lemma 15 If $X_n \to X$ in probability then $X_n \to X$ weakly.

    Proof. By Lemma 14, for any subsequence $(n(k))$ there exists a subsequence $(n(k(r)))$ such that $X_{n(k(r))} \to X$ a.s. Given $f \in C_b(S)$, by dominated convergence theorem,

    $$\mathbb{E}f(X_{n(k(r))}) \to \mathbb{E}f(X),$$

    i.e. $X_{n(k(r))} \to X$ weakly. By Lemma 13, $X_n \to X$ weakly.

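    Section 9
    Characteristic Functions. Central Limit Theorem on R.

    Let $X = (X_1, \ldots, X_k)$ be a random vector on $\mathbb{R}^k$ with distribution $P$ and let $t = (t_1, \ldots, t_k) \in \mathbb{R}^k$. Characteristic function of $X$ is defined by

    $$f(t) = \mathbb{E}e^{i(t, X)} = \int e^{i(t, x)}\,dP(x).$$

    If $X$ has standard normal distribution $N(0, 1)$ and $\lambda \in \mathbb{R}$ then

    $$\mathbb{E}e^{\lambda X} = \frac{1}{\sqrt{2\pi}} \int e^{\lambda x - \frac{x^2}{2}}\,dx = e^{\frac{\lambda^2}{2}} \frac{1}{\sqrt{2\pi}} \int e^{-\frac{(x - \lambda)^2}{2}}\,dx = e^{\frac{\lambda^2}{2}}.$$

    For complex $\lambda = it$, consider analytic function

    $$\varphi(x) = e^{itx} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \quad \text{for } x \in \mathbb{C}.$$

    By Cauchy's theorem, integral over a closed path is equal to 0. Let us take a closed path $x + i0$ for $x$ from $-\infty$ to $+\infty$ and $x + it$ for $x$ from $+\infty$ to $-\infty$. Then

    $$f(t) = \frac{1}{\sqrt{2\pi}} \int e^{itx - \frac{x^2}{2}}\,dx = \frac{1}{\sqrt{2\pi}} \int e^{it(it + x) - \frac{1}{2}(it + x)^2}\,dx$$
    $$= \frac{1}{\sqrt{2\pi}} \int e^{-t^2 + itx + \frac{1}{2}t^2 - itx - \frac{1}{2}x^2}\,dx = e^{-\frac{t^2}{2}} \frac{1}{\sqrt{2\pi}} \int e^{-\frac{x^2}{2}}\,dx = e^{-\frac{t^2}{2}}. \qquad (9.0.1)$$

    If $Y$ has normal distribution $N(m, \sigma^2)$ then

    $$\mathbb{E}e^{itY} = \mathbb{E}e^{it(m + \sigma X)} = e^{itm - \frac{t^2 \sigma^2}{2}}.$$

    Lemma 16 If $X$ is a real-valued r.v. such that $\mathbb{E}|X|^r$ ...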

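    If $P$ has density $p$ then

    $$P * Q(A) = \iint I(x + y \in A)\,p(x)\,dx\,dQ(y) = \iint I(z \in A)\,p(z - y)\,dz\,dQ(y) = \int\!\!\int_A p(z - y)\,dz\,dQ(y) = \int_A \Bigl( \int p(z - y)\,dQ(y) \Bigr) dz$$

    which means that $P * Q$ has density

    $$f(x) = \int p(x - y)\,dQ(y). \qquad (9.0.2)$$

    If, in addition, $Q$ has density $q$ then

    $$f(x) = \int p(x - y)\,q(y)\,dy.$$

    Denote by $N(0, \sigma^2 I)$ the law of the random vector $X = (X_1, \ldots, X_k)$ of i.i.d. $N(0, \sigma^2)$ random variables whose density on $\mathbb{R}^k$ is

    $$\prod_{i=1}^k \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{1}{2\sigma^2} x_i^2} = \Bigl( \frac{1}{\sqrt{2\pi}\sigma} \Bigr)^k e^{-\frac{1}{2\sigma^2}|x|^2}.$$

    For a distribution $P$ denote $P^\sigma = P * N(0, \sigma^2 I)$.

    Lemma 18 $P^\sigma = P * N(0, \sigma^2 I)$ has density

    $$p^\sigma(x) = \Bigl( \frac{1}{2\pi} \Bigr)^k \int f(t)\,e^{-i(t, x) - \frac{\sigma^2}{2}|t|^2}\,dt$$

    where $f(t) = \int e^{i(t, x)}\,dP(x)$.

    Proof. By (9.0.2), $P * N(0, \sigma^2 I)$ has density

    $$p^\sigma(x) = \Bigl( \frac{1}{\sqrt{2\pi}\sigma} \Bigr)^k \int e^{-\frac{1}{2\sigma^2}|x - y|^2}\,dP(y).$$

    Using (9.0.1), we can write

    $$e^{-\frac{1}{2\sigma^2}(x_i - y_i)^2} = \frac{1}{\sqrt{2\pi}} \int e^{-i\frac{1}{\sigma}(x_i - y_i) z_i}\,e^{-\frac{1}{2} z_i^2}\,dz_i$$

    and taking a product over $i \le k$ we get

    $$e^{-\frac{1}{2\sigma^2}|x - y|^2} = \Bigl( \frac{1}{\sqrt{2\pi}} \Bigr)^k \int e^{-i\frac{1}{\sigma}(x - y, z)}\,e^{-\frac{1}{2}|z|^2}\,dz.$$

    Then we can continue

    $$p^\sigma(x) = \Bigl( \frac{1}{2\pi\sigma} \Bigr)^k \iint e^{-i\frac{1}{\sigma}(x - y, z) - \frac{1}{2}|z|^2}\,dz\,dP(y)$$
    $$= \Bigl( \frac{1}{2\pi\sigma} \Bigr)^k \iint e^{-i\frac{1}{\sigma}(x - y, z) - \frac{1}{2}|z|^2}\,dP(y)\,dz$$
    $$= \Bigl( \frac{1}{2\pi\sigma} \Bigr)^k \int f\Bigl( \frac{z}{\sigma} \Bigr) e^{-i\frac{1}{\sigma}(x, z) - \frac{1}{2}|z|^2}\,dz.$$

    Let $z = t\sigma$.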

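    Theorem 23 (Uniqueness) If

    $$\int e^{i(t, x)}\,dP(x) = \int e^{i(t, x)}\,dQ(x)$$

    then $P = Q$.

    Proof. By the above Lemma, $P^\sigma = Q^\sigma$. If $X \sim P$ and $g \sim N(0, I)$ then $X + \sigma g \to X$ almost surely as $\sigma \to 0$ and, therefore, $P^\sigma \to P$ weakly. Similarly, $Q^\sigma \to Q$.

    We proved that the characteristic function of $S_n / \sqrt{n}$ converges to the c.f. of $N(0, \sigma^2)$. Also, the sequence

    $$\Bigl( \mathcal{L}\Bigl( \frac{S_n}{\sqrt{n}} \Bigr) \Bigr)_{n \ge 1}$$

    is uniformly tight, since by Chebyshev's inequality

    $$\mathbb{P}\Bigl( \Bigl| \frac{S_n}{\sqrt{n}} \Bigr| > M \Bigr) \le \frac{\sigma^2}{M^2} < \varepsilon$$

    for large enough $M$. To finish the proof of the CLT on the real line we apply the following.

    Lemma 19 If $(P_n)$ is uniformly tight and

    $$f_n(t) = \int e^{itx}\,dP_n(x) \to f(t)$$

    then $P_n \to P$ and $f(t) = \int e^{itx}\,dP(x)$.

    Proof. For any sequence $(n(k))$, by Selection Theorem, there exists a subsequence $(n(k(r)))$ such that $P_{n(k(r))}$ converges weakly to some distribution $P$. Since $e^{i(t, x)}$ is bounded and continuous,

    $$\int e^{i(t, x)}\,dP_{n(k(r))} \to \int e^{i(t, x)}\,dP(x)$$

    as $r \to \infty$ and, therefore, $f$ is a c.f. of $P$. By uniqueness theorem, distribution $P$ does not depend on the sequence $(n(k))$. By Lemma 13, $P_n \to P$ weakly.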

    Section 10
    Multivariate normal distributions and CLT.

    Let $P$ be a probability distribution on $\mathbb{R}^k$ and let

    $$g(t) = \int e^{i(t, x)}\,dP(x).$$

    We proved that $P^\sigma = P * N(0, \sigma^2 I)$ has density

    $$p^\sigma(x) = (2\pi)^{-k} \int g(t)\,e^{-i(t, x) - \frac{1}{2}\sigma^2 |t|^2}\,dt.$$

    Lemma 20 (Fourier inversion formula) If $\int |g(t)|\,dt$ ...

    It is now a simple exercise to show that for any bounded open set $U$,

    $$\int_U dP(x) = \int_U p(x)\,dx.$$

    This means that $P$ restricted to bounded sets has density $p(x)$ and, hence, on entire $\mathbb{R}^k$.

    For a random vector $X = (X_1, \ldots, X_k) \in \mathbb{R}^k$ we denote $\mathbb{E}X = (\mathbb{E}X_1, \ldots, \mathbb{E}X_k)$.

    Theorem 24 Consider a sequence $(X_i)_{i \ge 1}$ of i.i.d. random vectors on $\mathbb{R}^k$ such that $\mathbb{E}X_1 = 0$, $\mathbb{E}|X_1|^2 < \infty$. Then $\mathcal{L}\bigl( S_n / \sqrt{n} \bigr)$ converges weakly to distribution $P$ which has characteristic function ...

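    for any set $\Omega \subseteq \mathbb{R}^k$ we can write

    $$\mathbb{P}(Ag \in \Omega) = \mathbb{P}(g \in A^{-1}\Omega) = \int_{A^{-1}\Omega} \Bigl( \frac{1}{\sqrt{2\pi}} \Bigr)^k \exp\Bigl( -\frac{1}{2}|x|^2 \Bigr) dx.$$

    Let us now make the change of variables $y = Ax$ or $x = A^{-1}y$. Then

    $$\mathbb{P}(Ag \in \Omega) = \int_{\Omega} \Bigl( \frac{1}{\sqrt{2\pi}} \Bigr)^k \exp\Bigl( -\frac{1}{2}|A^{-1}y|^2 \Bigr) \frac{1}{|\det(A)|}\,dy.$$

    But since

    $$\det(C) = \det(AA^T) = \det(A)\det(A^T) = \det(A)^2$$

    we have $|\det(A)| = \sqrt{\det(C)}$. Also

    $$|A^{-1}y|^2 = (A^{-1}y)^T (A^{-1}y) = y^T (A^T)^{-1} A^{-1} y = y^T (AA^T)^{-1} y = y^T C^{-1} y.$$

    Therefore, we get

    $$\mathbb{P}(Ag \in \Omega) = \int_{\Omega} \Bigl( \frac{1}{\sqrt{2\pi}} \Bigr)^k \frac{1}{\sqrt{\det(C)}} \exp\Bigl( -\frac{1}{2} y^T C^{-1} y \Bigr) dy.$$

    This means that the distribution $N(0, C)$ has the density

    $$\Bigl( \frac{1}{\sqrt{2\pi}} \Bigr)^k \frac{1}{\sqrt{\det(C)}} \exp\Bigl( -\frac{1}{2} y^T C^{-1} y \Bigr).$$

    General case. Let us take, for example, a vector $X = QD^{1/2}g$ for i.i.d. standard normal vector $g$ so that $X \sim N(0, C)$. If $q_1, \ldots, q_k$ are the column vectors of $Q$ then

    $$X = QD^{1/2}g = (\lambda_1^{1/2} g_1) q_1 + \ldots + (\lambda_k^{1/2} g_k) q_k.$$

    Therefore, in the orthonormal coordinate basis $q_1, \ldots, q_k$ a random vector $X$ has coordinates $\lambda_1^{1/2} g_1, \ldots, \lambda_k^{1/2} g_k$. These coordinates are independent with normal distributions with variances $\lambda_1, \ldots, \lambda_k$ correspondingly. When $\det(C) = 0$, i.e. $C$ is not invertible, some of its eigenvalues will be zero, say, $\lambda_{n+1} = \ldots = \lambda_k = 0$. Then the random vector $X$ will be concentrated on the subspace spanned by vectors $q_1, \ldots, q_n$ but it will not have density on the entire space $\mathbb{R}^k$. On the subspace spanned by vectors $q_1, \ldots, q_n$ a vector $X$ will have a density

    $$f(x_1, \ldots, x_n) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\lambda_i}} \exp\Bigl( -\frac{x_i^2}{2\lambda_i} \Bigr).$$

    Let us look at a couple of properties of normal distributions.

    Lemma 21 If $X \sim N(0, C)$ on $\mathbb{R}^k$ and $A : \mathbb{R}^k \to \mathbb{R}^m$ is linear then $AX \sim N(0, ACA^T)$ on $\mathbb{R}^m$.

    Proof. The c.f. of $AX$ is

    $$\mathbb{E}e^{i(t, AX)} = \mathbb{E}e^{i(A^T t, X)} = e^{-\frac{1}{2}(CA^T t, A^T t)} = e^{-\frac{1}{2}(ACA^T t, t)}.$$

    Lemma 22 $X$ is normal on $\mathbb{R}^k$ iff $(t, X)$ is normal on $\mathbb{R}$ for all $t \in \mathbb{R}^k$.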

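    Proof. "$\Longrightarrow$". The c.f. of real-valued random variable $(t, X)$ is

    $$f(\lambda) = \mathbb{E}e^{i\lambda(t, X)} = \mathbb{E}e^{i(\lambda t, X)} = e^{-\frac{1}{2}(C\lambda t, \lambda t)} = e^{-\frac{1}{2}\lambda^2 (Ct, t)}$$

    which means that $(t, X) \sim N(0, (Ct, t))$.

    "$\Longleftarrow$". If $(t, X)$ is normal then

    $$\mathbb{E}e^{i(t, X)} = e^{-\frac{1}{2}(Ct, t)}$$

    because the variance of $(t, X)$ is $(Ct, t)$.

    Lemma 23 Let $Z = (X, Y)$ where $X = (X_1, \ldots, X_i)$ and $Y = (Y_1, \ldots, Y_j)$ and suppose that $Z$ is normal on $\mathbb{R}^{i+j}$. Then $X$ and $Y$ are independent iff $\mathrm{Cov}(X_m, Y_n) = 0$ for all $m, n$.

    Proof. One way is obvious. The other way around, suppose that

    $$C = \mathrm{Cov}(Z) = \begin{pmatrix} D & 0 \\ 0 & F \end{pmatrix}.$$

    Then the c.f. of $Z$ is

    $$\mathbb{E}e^{i(t, Z)} = e^{-\frac{1}{2}(Ct, t)} = e^{-\frac{1}{2}(Dt_1, t_1) - \frac{1}{2}(Ft_2, t_2)} = \mathbb{E}e^{i(t_1, X)}\,\mathbb{E}e^{i(t_2, Y)},$$

    where $t = (t_1, t_2)$. By uniqueness, $X$ and $Y$ are independent.

    Lemma 24 (Continuous Mapping.) Suppose that $P_n \to P$ on $X$ and $G : X \to Y$ is a continuous map. Then $P_n \circ G^{-1} \to P \circ G^{-1}$ on $Y$. In other words, if r.v. $Z_n \to Z$ weakly then $G(Z_n) \to G(Z)$ weakly.

    Proof. This is obvious, because for any $f \in C_b(Y)$, we have $f \circ G \in C_b(X)$ and therefore,

    $$\mathbb{E}f(G(Z_n)) \to \mathbb{E}f(G(Z)).$$

    Lemma 25 If $P_n \to P$ on $\mathbb{R}^k$ and $Q_n \to Q$ on $\mathbb{R}^m$ then $P_n \times Q_n \to P \times Q$ on $\mathbb{R}^{k+m}$.

    Proof. By Fubini theorem, the c.f.

    $$\int e^{i(t, x)}\,d(P_n \times Q_n)(x) = \int e^{i(t_1, x_1)}\,dP_n \int e^{i(t_2, x_2)}\,dQ_n \to \int e^{i(t_1, x_1)}\,dP \int e^{i(t_2, x_2)}\,dQ = \int e^{i(t, x)}\,d(P \times Q).$$

    By Lemma 19 it remains to show that $(P_n \times Q_n)$ is uniformly tight. By Theorem 21, since $P_n \to P$, $(P_n)$ is uniformly tight. Therefore, there exists a compact $K$ on $\mathbb{R}^k$ such that $P_n(K) > 1 - \varepsilon$. Similarly, for some compact $K'$ on $\mathbb{R}^m$, $Q_n(K') > 1 - \varepsilon$. We have,

    $$P_n \times Q_n(K \times K') > 1 - 2\varepsilon$$

    and $K \times K'$ is a compact on $\mathbb{R}^{k+m}$.

    Corollary 1 If $P_n \to P$ and $Q_n \to Q$ both on $\mathbb{R}^k$ then $P_n * Q_n \to P * Q$.

    Proof. Since a function $G : \mathbb{R}^{k+k} \to \mathbb{R}^k$ given by $G(x, y) = x + y$ is continuous, by continuous mapping lemma,

    $$P_n * Q_n = (P_n \times Q_n) \circ G^{-1} \to (P \times Q) \circ G^{-1} = P * Q.$$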

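    Section 11
    Lindeberg's CLT. Levy's Equivalence Theorem. Three Series Theorem.

    Instead of considering i.i.d. sequences, for each $n \ge 1$ we will consider a vector $(X_1^n, \ldots, X_n^n)$ of independent r.v., not necessarily identically distributed. This setting is called triangular arrays because the entire vector may change with $n$.

    Theorem 25 Consider a vector $(X_i^n)_{1 \le i \le n}$ of independent r.v.s such that

    $$\mathbb{E}X_i^n = 0, \quad \mathrm{Var}(S_n) = \sum_{i \le n} \mathbb{E}(X_i^n)^2 = 1.$$

    Suppose that the following Lindeberg's condition is satisfied:

    $$\sum_{i=1}^n \mathbb{E}(X_i^n)^2 I(|X_i^n| > \varepsilon) \to 0 \text{ as } n \to \infty \text{ for all } \varepsilon > 0. \qquad (11.0.1)$$

    Then $\mathcal{L}\bigl( \sum_{i \le n} X_i^n \bigr) \to N(0, 1)$.

    Proof. First of all, $\mathcal{L}\bigl( \sum_{i \le n} X_i^n \bigr)$ is uniformly tight, because by Chebyshev's inequality

    $$\mathbb{P}\Bigl( \Bigl| \sum_{i \le n} X_i^n \Bigr| > M \Bigr) \le \frac{1}{M^2} \le \varepsilon$$

    for large enough $M$. It remains to show that the characteristic function of $S_n$ converges to $e^{-\frac{\lambda^2}{2}}$. For simplicity of notations let us omit the upper index $n$ and write $X_i$ instead of $X_i^n$. Since,

    $$\mathbb{E}e^{i\lambda S_n} = \prod_{i \le n} \mathbb{E}e^{i\lambda X_i}$$

    it is enough to show that

    $$\log \mathbb{E}e^{i\lambda S_n} = \sum_{i \le n} \log\bigl( 1 + (\mathbb{E}e^{i\lambda X_i} - 1) \bigr) \to -\frac{\lambda^2}{2}. \qquad (11.0.2)$$

    It is an easy exercise to prove, by induction on $m$, that for any $a \in \mathbb{R}$,

    $$\Bigl| e^{ia} - \sum_{k \le m} \frac{(ia)^k}{k!} \Bigr| \le \frac{|a|^{m+1}}{(m+1)!}. \qquad (11.0.3)$$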

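    (Just integrate this inequality to make the induction step.) Using this for $m = 1$,

    $$\bigl| \mathbb{E}e^{i\lambda X_i} - 1 \bigr| = \bigl| \mathbb{E}e^{i\lambda X_i} - 1 - i\lambda \mathbb{E}X_i \bigr| \le \frac{\lambda^2}{2} \mathbb{E}X_i^2 \le \frac{\lambda^2}{2}\varepsilon^2 + \frac{\lambda^2}{2} \mathbb{E}X_i^2 I(|X_i| > \varepsilon) \le \lambda^2 \varepsilon^2 \qquad (11.0.4)$$

    for large $n$ by (11.0.1) and for small enough $\varepsilon$. Using the expansion of $\log(1 + z)$ it is easy to check that

    $$|\log(1 + z) - z| \le |z|^2 \quad \text{for } |z| \le \frac{1}{2}$$

    and, therefore, we can write

    $$\Bigl| \sum_{i \le n} \log\bigl( 1 + (\mathbb{E}e^{i\lambda X_i} - 1) \bigr) - \sum_{i \le n} \bigl( \mathbb{E}e^{i\lambda X_i} - 1 \bigr) \Bigr| \le \sum_{i \le n} \bigl| \mathbb{E}e^{i\lambda X_i} - 1 \bigr|^2 \le \frac{\lambda^4}{4} \sum_{i \le n} (\mathbb{E}X_i^2)^2$$
    $$\le \frac{\lambda^4}{4} \max_{i \le n} \mathbb{E}X_i^2 \sum_{i \le n} \mathbb{E}X_i^2 = \frac{\lambda^4}{4} \max_{i \le n} \mathbb{E}X_i^2 \to 0$$

    because, as in (11.0.4),

    $$\mathbb{E}X_i^2 \le \varepsilon^2 + \mathbb{E}X_i^2 I(|X_i| > \varepsilon) \to 0$$

    for large $n$ by (11.0.1) and for $\varepsilon \to 0$. Finally, to show (11.0.2) it remains to show that

    $$\sum_{i \le n} \bigl( \mathbb{E}e^{i\lambda X_i} - 1 \bigr) \to -\frac{\lambda^2}{2}.$$

    Using (11.0.3) for $m = 1$, on the event $|X_i| > \varepsilon$,

    $$\bigl| e^{i\lambda X_i} - 1 - i\lambda X_i \bigr|\,I(|X_i| > \varepsilon) \le \frac{\lambda^2}{2} X_i^2 I(|X_i| > \varepsilon)$$

    and, therefore,

    $$\Bigl| e^{i\lambda X_i} - 1 - i\lambda X_i + \frac{\lambda^2}{2} X_i^2 \Bigr|\,I(|X_i| > \varepsilon) \le \lambda^2 X_i^2 I(|X_i| > \varepsilon).$$

    Using (11.0.3) for $m = 2$, on the event $|X_i| \le \varepsilon$,

    $$\Bigl| e^{i\lambda X_i} - 1 - i\lambda X_i + \frac{\lambda^2}{2} X_i^2 \Bigr|\,I(|X_i| \le \varepsilon) \le \frac{|\lambda|^3}{6} |X_i|^3 I(|X_i| \le \varepsilon) \le \frac{|\lambda|^3 \varepsilon}{6} X_i^2.$$

    Combining the last two equations and using that $\mathbb{E}X_i = 0$,

    $$\Bigl| \mathbb{E}e^{i\lambda X_i} - 1 + \frac{\lambda^2}{2} \mathbb{E}X_i^2 \Bigr| \le \lambda^2 \mathbb{E}X_i^2 I(|X_i| > \varepsilon) + \frac{|\lambda|^3 \varepsilon}{6} \mathbb{E}X_i^2.$$

    Finally,

    $$\Bigl| \sum_{i \le n} \bigl( \mathbb{E}e^{i\lambda X_i} - 1 \bigr) + \frac{\lambda^2}{2} \Bigr| = \Bigl| \sum_{i \le n} \Bigl( \mathbb{E}e^{i\lambda X_i} - 1 + \frac{\lambda^2}{2} \mathbb{E}X_i^2 \Bigr) \Bigr| \le \lambda^2 \sum_{i \le n} \mathbb{E}X_i^2 I(|X_i| > \varepsilon) + \frac{|\lambda|^3 \varepsilon}{6} \to 0$$

    as $n \to \infty$ (using Lindeberg's condition) and $\varepsilon \to 0$.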
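The simplest triangular array makes both ingredients of the proof explicit. The following sketch is an illustration under an assumed example not taken from the notes: $X_i^n = e_i / \sqrt{n}$ with independent random signs $e_i$, so $|X_i^n| = 1/\sqrt{n}$ is constant, the Lindeberg sum can be computed in closed form, and $\mathbb{E}e^{i\lambda S_n} = \cos(\lambda/\sqrt{n})^n$. The function names are hypothetical.

```python
import math


def lindeberg_sum(n, eps):
    """sum_{i<=n} E (X_i^n)^2 I(|X_i^n| > eps) for X_i^n = e_i / sqrt(n),
    where e_i are random signs. Each |X_i^n| equals 1/sqrt(n) exactly."""
    x = 1.0 / math.sqrt(n)
    return n * x * x if x > eps else 0.0


def char_fn_sum(lam, n):
    """E exp(i*lam*S_n) = (E exp(i*lam*X_1^n))^n = cos(lam / sqrt(n))^n."""
    return math.cos(lam / math.sqrt(n)) ** n


if __name__ == "__main__":
    for n in (4, 100, 10000):
        print(n, lindeberg_sum(n, 0.1), char_fn_sum(1.0, n), math.exp(-0.5))
```

For fixed $\varepsilon$ the Lindeberg sum is exactly zero as soon as $1/\sqrt{n} \le \varepsilon$, and the characteristic function approaches $e^{-\lambda^2/2}$, in line with (11.0.1) and (11.0.2).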

    Lemma 26 If $P, Q$ are distributions on $\mathbb{R}$ such that $P * Q = P$ then $Q(\{0\}) = 1$.

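    by Borel-Cantelli lemma $\mathbb{P}(\{X_i \ne Z_i\} \text{ i.o.}) = 0$ which means that $\sum_{i \ge 1} X_i$ converges iff $\sum_{i \ge 1} Z_i$ converges. By 2, it is enough to show that $\sum_{i \ge 1} (Z_i - \mathbb{E}Z_i)$ converges, but this follows from Theorem 13 by 3.

    "$\Longrightarrow$". If $\sum_{i \ge 1} X_i$ converges a.s.,

    $$\mathbb{P}(\{|X_i| > 1\} \text{ i.o.}) = 0$$

    and since $(X_i)$ are independent, by Borel-Cantelli, $\sum_{i \ge 1} \mathbb{P}(|X_i| > 1) < \infty$ ...

    ... for any $\varepsilon > 0$ for $m, n$ large enough. Suppose that $\sum_{i \ge 1} \mathrm{Var}(Z_i) = \infty$. Then

    $$\sigma_{mn}^2 = \mathrm{Var}(S_{mn}) = \sum_{m \le k \le n} \mathrm{Var}(Z_k) \to \infty$$

    as $n \to \infty$ for any fixed $m$. Intuitively, this should not happen: $S_{mn} \to 0$ in probability but their variance goes to infinity. In principle, one can construct such sequence of random variables but in our case it will be ruled out by Lindeberg's CLT. Because $\sigma_{mn} \to \infty$, Lindeberg's theorem will imply that,

    $$T_{mn} = \frac{S_{mn} - \mathbb{E}S_{mn}}{\sigma_{mn}} = \sum_{m \le k \le n} \frac{Z_k - \mathbb{E}Z_k}{\sigma_{mn}} \to N(0, 1),$$

    if $m, n \to \infty$ and $\sigma_{mn}^2 \to \infty$. We only need to check that

    $$\sum_{m \le k \le n} \mathbb{E}\Bigl( \frac{Z_k - \mathbb{E}Z_k}{\sigma_{mn}} \Bigr)^2 I\Bigl( \Bigl| \frac{Z_k - \mathbb{E}Z_k}{\sigma_{mn}} \Bigr| > \varepsilon \Bigr) \to 0$$

    as $m, n, \sigma_{mn} \to \infty$. Since $|Z_k - \mathbb{E}Z_k|$ ...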

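    Section 12
    Levy's Continuity Theorem. Poisson Approximation. Conditional Expectation.

    Let us start with the following bound.

    Lemma 27 Let $X$ be a real-valued r.v. with distribution $P$ and let

    $$f(t) = \mathbb{E}e^{itX} = \int e^{itx}\,dP(x).$$

    Then,

    $$\mathbb{P}\Bigl( |X| > \frac{1}{u} \Bigr) \le \frac{7}{u} \int_0^u (1 - \mathrm{Re} f(t))\,dt.$$

    Proof. Since

    $$\mathrm{Re} f(t) = \int \cos tx\,dP(x)$$

    we have

    $$\frac{1}{u} \int_0^u \int_{\mathbb{R}} (1 - \cos tx)\,dP(x)\,dt = \int_{\mathbb{R}} \frac{1}{u} \int_0^u (1 - \cos tx)\,dt\,dP(x) = \int_{\mathbb{R}} \Bigl( 1 - \frac{\sin xu}{xu} \Bigr) dP(x)$$
    $$\ge \int_{|xu| \ge 1} \Bigl( 1 - \frac{\sin xu}{xu} \Bigr) dP(x) \ge (1 - \sin 1) \int_{|xu| \ge 1} 1\,dP(x) \ge \frac{1}{7} \mathbb{P}\Bigl( |X| \ge \frac{1}{u} \Bigr),$$

    where in the second inequality we used that $\frac{\sin y}{y} \le \sin 1$ for $|y| \ge 1$.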
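The bound of Lemma 27 can be checked numerically in a case where both sides are explicit. The sketch below is an illustration under an assumed example, not from the notes: $X \sim N(0, 1)$, so that $f(t) = e^{-t^2/2}$ by (9.0.1) and $\mathbb{P}(|X| > x) = \mathrm{erfc}(x/\sqrt{2})$. The integral is approximated by the midpoint rule, and the function names are hypothetical.

```python
import math


def tail_prob_normal(x):
    """P(|X| > x) for X ~ N(0, 1)."""
    return math.erfc(x / math.sqrt(2.0))


def cf_bound(u, steps=10000):
    """(7/u) * integral_0^u (1 - Re f(t)) dt with f(t) = exp(-t^2 / 2),
    approximated by the midpoint rule with the given number of steps."""
    h = u / steps
    integral = h * sum(
        1.0 - math.exp(-((j + 0.5) * h) ** 2 / 2.0) for j in range(steps)
    )
    return 7.0 * integral / u


if __name__ == "__main__":
    for u in (0.5, 1.0, 2.0):
        print(u, tail_prob_normal(1.0 / u), cf_bound(u))
```

In this example the right-hand side is several times larger than the tail probability, which is consistent with the constant 7 being a convenient rather than a sharp choice.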

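    Theorem 28 (Levy continuity) Let $(X_n)$ be a sequence of r.v. on $\mathbb{R}^k$. Suppose that

    $$f_n(t) = \mathbb{E}e^{i(t, X_n)} \to f(t)$$

    and $f(t)$ is continuous at 0 along each axis. Then there exists a probability distribution $P$ such that

    $$f(t) = \int e^{i(t, x)}\,dP(x)$$

    and $\mathcal{L}(X_n) \to P$.

    Proof. By Lemma 19 we only need to show that $\{\mathcal{L}(X_n)\}$ is uniformly tight. If we denote

    $$X_n = (X_{n,1}, \ldots, X_{n,k})$$

    then the c.f.s along the $i$th coordinate:

    $$f_n^i(t_i) := f_n(0, \ldots, t_i, 0, \ldots, 0) = \mathbb{E}e^{i t_i X_{n,i}} \to f(0, \ldots, t_i, \ldots, 0) =: f^i(t_i).$$

    Since $f_n(0) = 1$ and, therefore, $f(0) = 1$, for any $\varepsilon > 0$ we can find $\delta > 0$ such that for all $i \le k$

    $$|f^i(t_i) - 1| \le \varepsilon \quad \text{if } |t_i| \le \delta.$$

    This implies that for large enough $n$

    $$|f_n^i(t_i) - 1| \le 2\varepsilon \quad \text{if } |t_i| \le \delta.$$

    Using previous Lemma,

    $$\mathbb{P}\Bigl( |X_{n,i}| > \frac{1}{\delta} \Bigr) \le \frac{7}{\delta} \int_0^\delta \bigl( 1 - \mathrm{Re} f_n^i(t_i) \bigr) dt_i \le \frac{7}{\delta} \int_0^\delta \bigl| 1 - f_n^i(t_i) \bigr|\,dt_i \le 7 \cdot 2\varepsilon.$$

    The union bound implies that

    $$\mathbb{P}\Bigl( |X_n| > \frac{\sqrt{k}}{\delta} \Bigr) \le 14 k \varepsilon$$

    and $\{\mathcal{L}(X_n)\}_{n \ge 1}$ is uniformly tight.

    CLT describes how sums of independent r.v.s are approximated by normal distribution. We will now give a simple example of a different approximation. Consider independent Bernoulli random variables $X_i^n \sim B(p_i^n)$ for $i \le n$, i.e. $\mathbb{P}(X_i^n = 1) = p_i^n$ and $\mathbb{P}(X_i^n = 0) = 1 - p_i^n$. If $p_i^n = p > 0$ then by CLT

    $$\frac{S_n - np}{\sqrt{np(1 - p)}} \to N(0, 1).$$

    However, if $p = p_i^n \to 0$ fast enough then, for example, the Lindeberg conditions will be violated. It is well-known that if $p_i^n = p_n$ and $n p_n \to \lambda$ then $S_n$ has approximately Poisson distribution $\Pi_\lambda$ with p.f.

    $$f(k) = \frac{\lambda^k}{k!} e^{-\lambda} \quad \text{for } k = 0, 1, 2, \ldots$$

    Here is a version of this result.

    Theorem 29 Consider independent $X_i \sim B(p_i)$ for $i \le n$ and let

    $$S_n = X_1 + \ldots + X_n \quad \text{and} \quad \lambda = p_1 + \ldots + p_n.$$

    Then for any subset of integers $B \subseteq \mathbb{Z}$,

    $$|\mathbb{P}(S_n \in B) - \Pi_\lambda(B)| \le \sum_{i \le n} p_i^2.$$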

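    Proof. The proof is based on the construction on one probability space. Let us construct Bernoulli r.v. $X_i \sim B(p_i)$ and Poisson r.v. $X_i^* \sim \Pi_{p_i}$ on the same probability space as follows. Let us consider a probability space $([0, 1], \mathcal{B}, \lambda)$ with Lebesgue measure $\lambda$. Define

    $$X_i = X_i(x) = \begin{cases} 0, & 0 \le x \le 1 - p_i, \\ 1, & 1 - p_i < x \le 1. \end{cases}$$

    Clearly, $X_i \sim B(p_i)$. Let us construct $X_i^*$ as follows. If for $k \ge 0$ we define

    $$c_k = \sum_{0 \le l \le k} \frac{(p_i)^l}{l!} e^{-p_i}$$

    then

    $$X_i^* = X_i^*(x) = \begin{cases} 0, & 0 \le x \le c_0, \\ 1, & c_0 < x \le c_1, \\ 2, & c_1 < x \le c_2, \\ \ldots \end{cases}$$

    Clearly, $X_i^* \sim \Pi_{p_i}$. When $X_i \ne X_i^*$? Since $1 - p_i \le e^{-p_i} = c_0$, this can only happen for $1 - p_i < x \le c_0$ and $c_1 < x \le 1$, i.e.

    $$\mathbb{P}(X_i \ne X_i^*) = e^{-p_i} - (1 - p_i) + (1 - e^{-p_i} - p_i e^{-p_i}) = p_i (1 - e^{-p_i}) \le p_i^2.$$

    We construct pairs $(X_i, X_i^*)$ on separate coordinates of a product space, thus, making them independent for $i \le n$. It is well-known that $S_n^* = \sum_{i \le n} X_i^* \sim \Pi_\lambda$ and, finally, we get

    $$\mathbb{P}(S_n \ne S_n^*) \le \sum_{i \le n} \mathbb{P}(X_i \ne X_i^*) \le \sum_{i \le n} p_i^2.$$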
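The bound of Theorem 29 can be verified exactly for small examples. The sketch below is an illustration, not part of the notes: it computes the distribution of $S_n$ by convolution and uses the standard fact that $\sup_B |\mathbb{P}(S_n \in B) - \Pi_\lambda(B)|$ equals half the $\ell_1$ distance between the two probability functions. The function names and the sample values of $p_i$ are assumptions.

```python
import math


def bernoulli_sum_pmf(ps):
    """Exact p.f. of S_n = X_1 + ... + X_n for independent X_i ~ B(p_i)."""
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1.0 - p)   # X_i = 0
            new[k + 1] += mass * p       # X_i = 1
        pmf = new
    return pmf


def poisson_pmf(lam, k):
    """Poisson p.f. f(k) = lam^k / k! * exp(-lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def max_set_discrepancy(ps):
    """sup over sets B of |P(S_n in B) - Pi_lambda(B)|, as half the l1 distance."""
    lam = sum(ps)
    pmf = bernoulli_sum_pmf(ps)
    l1 = sum(abs(q - poisson_pmf(lam, k)) for k, q in enumerate(pmf))
    # Poisson mass on {n+1, n+2, ...}, where S_n has no mass.
    l1 += 1.0 - sum(poisson_pmf(lam, k) for k in range(len(pmf)))
    return 0.5 * l1


if __name__ == "__main__":
    ps = [0.1, 0.05, 0.2, 0.02]
    print(max_set_discrepancy(ps), sum(p * p for p in ps))
```

The printed discrepancy should not exceed $\sum_i p_i^2$; in examples like this one it is typically noticeably smaller, since the coupling argument is not tight.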

    Conditional expectation. Let $(\Omega, \mathcal{B}, \mathbb{P})$ be a probability space and $X : \Omega \to \mathbb{R}$ be a random variable such that $\mathbb{E}|X| < \infty$ ...

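    By definition $Y = \mathbb{E}(X|\mathcal{A})$.

    2. (Uniqueness) Suppose there exists $Y' = \mathbb{E}(X|\mathcal{A})$ such that $\mathbb{P}(Y \ne Y') > 0$, i.e. $\mathbb{P}(Y > Y') > 0$ or $\mathbb{P}(Y < Y') > 0$. Since both $Y, Y'$ are measurable on $\mathcal{A}$ the set $A = \{Y > Y'\} \in \mathcal{A}$. On one hand, $\mathbb{E}(Y - Y')I_A > 0$. On the other hand,

    $$\mathbb{E}(Y - Y')I_A = \mathbb{E}XI_A - \mathbb{E}XI_A = 0$$

    - a contradiction.

    3. $\mathbb{E}(cX + Y|\mathcal{A}) = c\mathbb{E}(X|\mathcal{A}) + \mathbb{E}(Y|\mathcal{A})$.

    4. If $\sigma$-algebras $\mathcal{C} \subseteq \mathcal{A} \subseteq \mathcal{B}$ then

    $$\mathbb{E}(\mathbb{E}(X|\mathcal{A})|\mathcal{C}) = \mathbb{E}(X|\mathcal{C}).$$

    Consider a set $C \in \mathcal{C} \subseteq \mathcal{A}$. Then

    $$\mathbb{E}I_C(\mathbb{E}(\mathbb{E}(X|\mathcal{A})|\mathcal{C})) = \mathbb{E}I_C \mathbb{E}(X|\mathcal{A}) = \mathbb{E}I_C X \quad \text{and} \quad \mathbb{E}I_C(\mathbb{E}(X|\mathcal{C})) = \mathbb{E}XI_C.$$

    We conclude by uniqueness.

    5. $\mathbb{E}(X|\mathcal{B}) = X$, $\mathbb{E}(X|\{\emptyset, \Omega\}) = \mathbb{E}X$, $\mathbb{E}(X|\mathcal{A}) = \mathbb{E}X$ if $X$ is independent of $\mathcal{A}$.

    6. If $X \le Z$ then $\mathbb{E}(X|\mathcal{A}) \le \mathbb{E}(Z|\mathcal{A})$ a.s.; proof is similar to proof of uniqueness.

    7. (Monotone convergence) If $\mathbb{E}|X_n|$ ...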

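    We can assume that $X, Y \ge 0$ by decomposing $X = X^+ - X^-$, $Y = Y^+ - Y^-$. Consider a sequence of simple functions

    $$Y_n = \sum w_k I_{C_k}, \quad C_k \in \mathcal{A}$$

    measurable on $\mathcal{A}$ such that $0 \le Y_n \uparrow Y$. By monotone convergence theorem, it is enough to prove that

    $$\mathbb{E}(XI_{C_k}|\mathcal{A}) = I_{C_k}\mathbb{E}(X|\mathcal{A}).$$

    Take $B \in \mathcal{A}$. Since $B \cap C_k \in \mathcal{A}$,

    $$\mathbb{E}I_B I_{C_k}\mathbb{E}(X|\mathcal{A}) = \mathbb{E}I_{B \cap C_k}\mathbb{E}(X|\mathcal{A}) = \mathbb{E}XI_{B \cap C_k} = \mathbb{E}(XI_{C_k})I_B.$$

    10. (Jensen's inequality) If $f : \mathbb{R} \to \mathbb{R}$ is convex then

    $$f(\mathbb{E}(X|\mathcal{A})) \le \mathbb{E}(f(X)|\mathcal{A}).$$

    By convexity,

    $$f(X) - f(\mathbb{E}(X|\mathcal{A})) \ge \partial f(\mathbb{E}(X|\mathcal{A}))\,(X - \mathbb{E}(X|\mathcal{A})).$$

    Taking conditional expectation of both sides,

    $$\mathbb{E}(f(X)|\mathcal{A}) - f(\mathbb{E}(X|\mathcal{A})) \ge \partial f(\mathbb{E}(X|\mathcal{A}))\,(\mathbb{E}(X|\mathcal{A}) - \mathbb{E}(X|\mathcal{A})) = 0.$$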

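    Section 13
    Martingales. Doob's Decomposition. Uniform Integrability.

    Let $(\Omega, \mathcal{B}, \mathbb{P})$ be a probability space and let $(T, \le)$ be a linearly ordered set. Consider a family of $\sigma$-algebras $\mathcal{B}_t$, $t \in T$ such that for $t \le u$, $\mathcal{B}_t \subseteq \mathcal{B}_u \subseteq \mathcal{B}$.

    Definition. A family $(X_t, \mathcal{B}_t)_{t \in T}$ is called a martingale if

    1. $X_t : \Omega \to \mathbb{R}$ is measurable w.r.t. $\mathcal{B}_t$; in other words, $X_t$ is adapted to $\mathcal{B}_t$.
    2. $\mathbb{E}|X_t|$ ...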

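    Thus, $(Z_n, \mathcal{B}_n)_{n \ge 1}$ is a right-closed martingale.

    Lemma 28 Let $f : \mathbb{R} \to \mathbb{R}$ be a convex function. Suppose that either one of two conditions holds:

    1. $(X_t, \mathcal{B}_t)$ is a martingale,
    2. $(X_t, \mathcal{B}_t)$ is a submartingale and $f$ is increasing.

    Then $(f(X_t), \mathcal{B}_t)$ is a submartingale.

    Proof. 1. For $t \le u$, by Jensen's inequality,

    $$f(X_t) = f(\mathbb{E}(X_u|\mathcal{B}_t)) \le \mathbb{E}(f(X_u)|\mathcal{B}_t).$$

    2. For $t \le u$, since $X_t \le \mathbb{E}(X_u|\mathcal{B}_t)$ and $f$ is increasing,

    $$f(X_t) \le f(\mathbb{E}(X_u|\mathcal{B}_t)) \le \mathbb{E}(f(X_u)|\mathcal{B}_t),$$

    where the last step is again Jensen's inequality.

    Theorem 30 (Doob's decomposition) If $(X_n, \mathcal{B}_n)_{n \ge 0}$ is a submartingale then it can be uniquely decomposed

    $$X_n = Z_n + Y_n,$$

    where $(Y_n, \mathcal{B}_n)$ is a martingale, $Z_0 = 0$, $Z_n \le Z_{n+1}$ almost surely and $Z_n$ is $\mathcal{B}_{n-1}$-measurable.

    Proof. Let $D_n = X_n - X_{n-1}$ and

    $$G_n = \mathbb{E}(D_n|\mathcal{B}_{n-1}) = \mathbb{E}(X_n|\mathcal{B}_{n-1}) - X_{n-1} \ge 0$$

    by the definition of submartingale. Let,

    $$H_n = D_n - G_n, \quad Y_n = X_0 + H_1 + \ldots + H_n, \quad Z_n = G_1 + \cdots + G_n,$$

    so that $Z_n + Y_n = X_0 + D_1 + \ldots + D_n = X_n$. Since $G_n \ge 0$ a.s., $Z_n \le Z_{n+1}$ and, by construction, $Z_n$ is $\mathcal{B}_{n-1}$-measurable. We have,

    $$\mathbb{E}(H_n|\mathcal{B}_{n-1}) = \mathbb{E}(D_n|\mathcal{B}_{n-1}) - G_n = 0$$

    and, therefore, $\mathbb{E}(Y_n|\mathcal{B}_{n-1}) = Y_{n-1}$. Uniqueness follows by construction. Suppose that $X_n = Z_n + Y_n$ with all stated properties. First, since $Z_0 = 0$, $Y_0 = X_0$. By induction, given a unique decomposition up to $n - 1$, we can write

    $$Z_n = \mathbb{E}(Z_n|\mathcal{B}_{n-1}) = \mathbb{E}(X_n - Y_n|\mathcal{B}_{n-1}) = \mathbb{E}(X_n|\mathcal{B}_{n-1}) - Y_{n-1}$$

    and $Y_n = X_n - Z_n$.

    Definition. We say that $(X_n)_{n \ge 1}$ is uniformly integrable if

    $$\sup_n \mathbb{E}|X_n| < \infty \quad \text{and} \quad \sup_n \mathbb{E}|X_n| I(|X_n| > M) \to 0 \text{ as } M \to \infty.$$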
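Doob's decomposition in Theorem 30 can be made concrete on a standard example. The sketch below is an illustration under an assumed example not taken from the notes: $X_n = S_n^2$ for a simple symmetric random walk $S_n$ started at 0, for which $G_n = \mathbb{E}(2 S_{n-1} e_n + 1 | \mathcal{B}_{n-1}) = 1$, so $Z_n = n$ and $Y_n = S_n^2 - n$. The function names are hypothetical, and the martingale property of $Y_n$ is checked by exact enumeration of all step histories.

```python
from itertools import product


def doob_decomposition(path):
    """For X_n = S_n^2, with S_n the partial sums of the +-1 steps in `path`,
    return the lists (X, Z, Y) for n = 0, ..., len(path), where
    Z_n = n is the predictable increasing part and Y_n = S_n^2 - n."""
    s, xs, zs, ys = 0, [0], [0], [0]
    for n, step in enumerate(path, start=1):
        s += step
        xs.append(s * s)
        zs.append(n)
        ys.append(s * s - n)
    return xs, zs, ys


def martingale_defect(n):
    """max over all histories of |E(Y_n | first n-1 steps) - Y_{n-1}|."""
    worst = 0.0
    for history in product((-1, 1), repeat=n - 1):
        y_prev = doob_decomposition(history)[2][-1]
        cond_mean = sum(
            doob_decomposition(history + (e,))[2][-1] for e in (-1, 1)
        ) / 2.0
        worst = max(worst, abs(cond_mean - y_prev))
    return worst


if __name__ == "__main__":
    print(doob_decomposition((1, 1, -1)))
    print([martingale_defect(n) for n in range(1, 7)])
```

The defect is zero for every history, which reflects that $S_n^2 - n$ is a martingale while $Z_n = n$ carries all of the predictable growth of the submartingale $S_n^2$.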


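    Proof. 1. If $X_n = \mathbb{E}(Y|\mathcal{B}_n)$ then

    $$|X_n| = |\mathbb{E}(Y|\mathcal{B}_n)| \le \mathbb{E}(|Y|\,|\mathcal{B}_n) \quad \text{and} \quad \mathbb{E}|X_n| \le \mathbb{E}|Y| < \infty.$$

    Since $\{|X_n| > M\} \in \mathcal{B}_n$,

    $$X_n I(|X_n| > M) = I(|X_n| > M)\,\mathbb{E}(Y|\mathcal{B}_n) = \mathbb{E}(YI(|X_n| > M)|\mathcal{B}_n)$$

    and, therefore,

    $$\mathbb{E}|X_n| I(|X_n| > M) \le \mathbb{E}|Y| I(|X_n| > M) \le K\,\mathbb{P}(|X_n| > M) + \mathbb{E}|Y| I(|Y| > K)$$
    $$\le \frac{K}{M} \mathbb{E}|X_n| + \mathbb{E}|Y| I(|Y| > K) \le \frac{K}{M} \mathbb{E}|Y| + \mathbb{E}|Y| I(|Y| > K).$$

    Letting $M \to \infty$, $K \to \infty$ proves that $\sup_n \mathbb{E}|X_n| I(|X_n| > M) \to 0$ as $M \to \infty$.

    2. Since $(X_n, \mathcal{B}_n)_{n \le \infty}$ is a submartingale, for $Y = X_\infty$ we have $X_n \le \mathbb{E}(Y|\mathcal{B}_n)$. Below we will use the following observation. Since a function $\max(a, x)$ is convex and increasing in $x$, by Jensen's inequality

    $$\max(a, X_n) \le \mathbb{E}(\max(a, Y)|\mathcal{B}_n). \qquad (13.0.1)$$

    Since,

    $$|\max(X_n, a)| \le |a| + X_n I(X_n > |a|)$$

    and $\{X_n > |a|\} \in \mathcal{B}_n$ we can write

    $$\mathbb{E}|\max(X_n, a)| \le |a| + \mathbb{E}X_n I(X_n > |a|) \le |a| + \mathbb{E}YI(X_n > |a|) \le |a| + \mathbb{E}|Y| < \infty.$$

    If we take $M > |a|$ then

    $$\mathbb{E}|\max(X_n, a)| I(|\max(X_n, a)| > M) = \mathbb{E}X_n I(X_n > M) \le \mathbb{E}YI(X_n > M)$$
    $$\le K\,\mathbb{P}(X_n > M) + \mathbb{E}|Y| I(|Y| > K) \le K \frac{\mathbb{E}\max(X_n, 0)}{M} + \mathbb{E}|Y| I(|Y| > K)$$
    $$\le K \frac{\mathbb{E}\max(Y, 0)}{M} + \mathbb{E}|Y| I(|Y| > K) \quad \text{by (13.0.1)}.$$

    Letting $M \to \infty$ and $K \to \infty$ finishes the proof.

    Uniform integrability plays an important role when studying the convergence of martingales. The following strengthening of the dominated convergence theorem will be useful.

    Lemma 30 Consider r.v.s $(X_n)$ and $X$ such that $\mathbb{E}|X_n| < \infty$ ...

    $$\ldots + 2\mathbb{E}|X_n| I(|X_n| > K) + 2\mathbb{E}|X| I(|X| > K)$$
    $$\le \varepsilon + 2K\,\mathbb{P}(|X_n - X| > \varepsilon) + 2\sup_n \mathbb{E}|X_n| I(|X_n| > K) + 2\mathbb{E}|X| I(|X| > K).$$

    Letting $n \to \infty$ and then $\varepsilon \to 0$, $K \to \infty$ proves the result.

    $1 \Longrightarrow 2$. By Chebyshev's inequality,

    $$\mathbb{P}(|X_n - X| > \varepsilon) \le \frac{1}{\varepsilon} \mathbb{E}|X_n - X| \to 0$$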

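    as $n \to \infty$ so $X_n \to X$ in probability. To prove uniform integrability let us first show that for any $\varepsilon > 0$ there exists $\delta > 0$ such that

    $$\mathbb{P}(A) < \delta \Longrightarrow \mathbb{E}|X| I_A < \varepsilon.$$

    ... for some $\varepsilon > 0$ one can find a sequence of events $A(n)$ such that

    $$\mathbb{P}(A(n)) \le \frac{1}{2^n} \quad \text{and} \quad \mathbb{E}|X| I_{A(n)} > \varepsilon.$$

    Since $\sum_{n \ge 1} \mathbb{P}(A(n)) < \infty$ ...

    ... $\varepsilon > 0$, take $\delta$ as above and take $M > 0$ large enough so that for all $n \ge 1$

    $$\mathbb{P}(|X_n| > M) \le \frac{\mathbb{E}|X_n|}{M} < \delta.$$

    ...

    $$\mathbb{E}|X_n| I(|X_n| > M) \le \mathbb{E}|X_n - X| + \mathbb{E}|X| I(|X_n| > M) \le \mathbb{E}|X_n - X| + \varepsilon.$$

    For large enough $n \ge n_0$, $\mathbb{E}|X_n - X| \le \varepsilon$ and, therefore,

    $$\mathbb{E}|X_n| I(|X_n| > M) \le 2\varepsilon.$$

    We can also choose $M$ large enough so that $\mathbb{E}|X_n| I(|X_n| > M) \le 2\varepsilon$ for $n \le n_0$ and this finishes the proof.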

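    Section 14
    Optional stopping. Inequalities for martingales.

    Consider a sequence of $\sigma$-algebras $(\mathcal{B}_n)_{n \ge 0}$ such that $\mathcal{B}_n \subseteq \mathcal{B}_{n+1}$. Integer valued r.v. $\tau \in \{1, 2, \ldots\}$ is called a stopping time if $\{\tau \le n\} \in \mathcal{B}_n$. Let us denote by $\mathcal{B}_\tau$ a $\sigma$-algebra of the events $B$ such that

    $$\{\tau \le n\} \cap B \in \mathcal{B}_n, \quad \forall n \ge 1.$$

    If $(X_n)$ is adapted to $(\mathcal{B}_n)$ then random variables such as $X_\tau$ or $\sum_{k=1}^{\tau} X_k$ are measurable on $\mathcal{B}_\tau$. For example,

    $$\{X_\tau \in A\} = \bigcup_{n \ge 1} \{\tau = n\} \cap \{X_n \in A\} = \bigcup_{n \ge 1} \bigl( \{\tau \le n\} \setminus \{\tau \le n - 1\} \bigr) \cap \{X_n \in A\} \in \mathcal{B}_\tau.$$

    Theorem 31 (Optional stopping) Let $(X_n, \mathcal{B}_n)$ be a martingale and $\tau_1, \tau_2$ ...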

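    The second condition in (14.0.1) is violated since $\mathbb{P}(\tau_2 = n) = 2^{-n}$ and

    $$\mathbb{E}|S_n| I(n \le \tau_2) = 2\,\mathbb{P}(\tau_2 = n) + (2^{n+1} - 2)\,\mathbb{P}(n + 1 \le \tau_2) = 2 \not\to 0.$$

    Proof of Theorem 31. Consider a set $A \in \mathcal{B}_{\tau_1}$. We have,

    $$\mathbb{E}X_{\tau_2} I_A I(\tau_1 \le \tau_2) = \sum_{n \ge 1} \mathbb{E}X_{\tau_2} I\bigl( A \cap \{\tau_1 = n\} \bigr) I(n \le \tau_2)$$
    $$\overset{(*)}{=} \sum_{n \ge 1} \mathbb{E}X_n I\bigl( A \cap \{\tau_1 = n\} \bigr) I(n \le \tau_2) = \mathbb{E}X_{\tau_1} I_A I(\tau_1 \le \tau_2).$$

    To prove (*) it is enough to prove that for $A_n = A \cap \{\tau_1 = n\} \in \mathcal{B}_n$,

    $$\mathbb{E}X_{\tau_2} I_{A_n} I(n \le \tau_2) = \mathbb{E}X_n I_{A_n} I(n \le \tau_2). \qquad (14.0.2)$$

    We can write

    $$\mathbb{E}X_n I_{A_n} I(n \le \tau_2) = \mathbb{E}X_n I_{A_n} I(\tau_2 = n) + \mathbb{E}X_n I_{A_n} I(n + 1 \le \tau_2)$$
    $$= \mathbb{E}X_{\tau_2} I_{A_n} I(\tau_2 = n) + \mathbb{E}X_n I_{A_n} I(n + 1 \le \tau_2)$$

    since $\{n + 1 \le \tau_2\} = \{\tau_2 \le n\}^c \in \mathcal{B}_n$, by martingale property

    $$= \mathbb{E}X_{\tau_2} I_{A_n} I(\tau_2 = n) + \mathbb{E}X_{n+1} I_{A_n} I(n + 1 \le \tau_2)$$

    and, by induction,

    $$= \sum_{n \le k < m} \mathbb{E}X_{\tau_2} I_{A_n} I(\tau_2 = k) + \mathbb{E}X_m I_{A_n} I(m \le \tau_2)$$

    ...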

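    On the event $A$, $X_{\tau_1} \ge M$ and, therefore,

    $$\mathbb{E}X_n I_A = \mathbb{E}X_{\tau_2} I_A \ge \mathbb{E}X_{\tau_1} I_A \ge M\,\mathbb{E}I_A = M\,\mathbb{P}(A).$$

    On the other hand, $\mathbb{E}X_n I_A \le \mathbb{E}X_n^+$ and this finishes the proof.

    As a corollary we obtain the second Kolmogorov's inequality. If $(X_i)$ are independent and $\mathbb{E}X_i = 0$ then $S_n = \sum_{1 \le i \le n} X_i$ is a martingale and $S_n^2$ is a submartingale. Therefore,

    $$\mathbb{P}\Bigl( \max_{1 \le k \le n} |S_k| \ge M \Bigr) = \mathbb{P}\Bigl( \max_{1 \le k \le n} S_k^2 \ge M^2 \Bigr) \le \frac{1}{M^2} \mathbb{E}S_n^2 = \frac{1}{M^2} \sum_{1 \le k \le n} \mathrm{Var}(X_k).$$

    Exercises.

    1. Show that for any random variable $Y$, $\mathbb{E}|Y|^p = \int_0^\infty p t^{p-1}\,\mathbb{P}(|Y| \ge t)\,dt$.
    2. Let $X, Y$ be two non-negative random variables such that for every $t > 0$, $\mathbb{P}(Y \ge t) \le t^{-1} \int XI(Y \ge t)\,d\mathbb{P}$. For any $p > 1$, $\|f\|_p = (\int |f|^p\,d\mathbb{P})^{1/p}$ and $1/p + 1/q = 1$, show that $\|Y\|_p \le q\|X\|_p$.
    3. Given a non-negative submartingale $(X_n, \mathcal{B}_n)$, let $X_n^* := \max_{j \le n} X_j$ and $X^* := \max_{j \ge 1} X_j$. Prove that for any $p > 1$ and $1/p + 1/q = 1$, $\|X^*\|_p \le q \sup_n \|X_n\|_p$. Hint: use exercise 2 and Doob's maximal inequality.

    Doob's upcrossing inequality. Let $(X_n, \mathcal{B}_n)_{n \ge 1}$ be a submartingale. Given two real numbers $a < b$ we will define a sequence of stopping times $(\tau_n)$ when $X_n$ is crossing $a$ downward and $b$ upward as in figure 14.1. Namely, we define

    Figure 14.1: Stopping times of level crossings.

    $$\tau_1 = \min\{n \ge 1, X_n \le a\}, \quad \tau_2 = \min\{n > \tau_1 : X_n \ge b\}$$

    and, by induction, for $k \ge 2$

    $$\tau_{2k-1} = \min\{n > \tau_{2k-2}, X_n \le a\}, \quad \tau_{2k} = \min\{n > \tau_{2k-1}, X_n \ge b\}.$$

    Define

    $$\nu(a, b, n) = \max\{k : \tau_{2k} \le n\}$$

    - the number of upward crossings of $[a, b]$ before time $n$.

    Theorem 33 (Doob's upcrossing inequality) We have,

    $$\mathbb{E}\nu(a, b, n) \le \frac{\mathbb{E}(X_n - a)^+}{b - a}. \qquad (14.0.4)$$

    Proof. Since $x \to (x - a)^+$ is increasing convex function, $Z_n = (X_n - a)^+$ is also a submartingale. Clearly,

    $$\nu_X(a, b, n) = \nu_Z(0, b - a, n)$$

    which means that it is enough to prove (14.0.4) for nonnegative submartingales. From now on we can assume that $0 \le X_n$ and we would like to show that

    $$\mathbb{E}\nu(0, b, n) \le \frac{\mathbb{E}X_n}{b}.$$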

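    Let us define a sequence of r.v.s

    $$\eta_j = \begin{cases} 1, & \tau_{2k-1} < j \le \tau_{2k} \text{ for some } k, \\ 0, & \text{otherwise,} \end{cases}$$

    i.e. $\eta_j$ is the indicator of the event that at time $j$ the process is crossing $[0, b]$ upward. Define $X_0 = 0$. Then

    $$b\,\nu(0, b, n) \le \sum_{j=1}^n \eta_j (X_j - X_{j-1}) = \sum_{j=1}^n I(\eta_j = 1)(X_j - X_{j-1}).$$

    The event

    $$\{\eta_j = 1\} = \bigcup_k \{\tau_{2k-1} < j \le \tau_{2k}\} = \bigcup_k \bigl( \{\tau_{2k-1} \le j - 1\} \cap \{\tau_{2k} \le j - 1\}^c \bigr) \in \mathcal{B}_{j-1},$$

    since both $\{\tau_{2k-1} \le j - 1\}$ and $\{\tau_{2k} \le j - 1\}^c$ are in $\mathcal{B}_{j-1}$, i.e. the fact that at time $j$ we are crossing upward is determined completely by the sequence up to time $j - 1$. Then

    $$b\,\mathbb{E}\nu(0, b, n) \le \sum_{j=1}^n \mathbb{E}\,\mathbb{E}\bigl( I(\eta_j = 1)(X_j - X_{j-1}) \big| \mathcal{B}_{j-1} \bigr) = \sum_{j=1}^n \mathbb{E}I(\eta_j = 1)\,\mathbb{E}(X_j - X_{j-1}|\mathcal{B}_{j-1})$$
    $$= \sum_{j=1}^n \mathbb{E}I(\eta_j = 1)\bigl( \mathbb{E}(X_j|\mathcal{B}_{j-1}) - X_{j-1} \bigr) \le \sum_{j=1}^n \mathbb{E}(X_j - X_{j-1}) = \mathbb{E}X_n,$$

    where in the last inequality we used that $(X_j, \mathcal{B}_j)$ is a submartingale, $\mathbb{E}(X_j|\mathcal{B}_{j-1}) \ge X_{j-1}$, which implies that

    $$I(\eta_j = 1)\bigl( \mathbb{E}(X_j|\mathcal{B}_{j-1}) - X_{j-1} \bigr) \le \mathbb{E}(X_j|\mathcal{B}_{j-1}) - X_{j-1}.$$

    This finishes the proof.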
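The upcrossing count used in Doob's upcrossing inequality is straightforward to compute along a given path. The sketch below is an illustration, not part of the notes: the function name `count_upcrossings` is hypothetical, and it follows the alternating stopping times described above (first a value at or below $a$, then a later value at or above $b$).

```python
def count_upcrossings(xs, a, b):
    """Number of completed upward crossings of [a, b] by the sequence xs.

    Mirrors the stopping times in the text: wait for a value <= a, then for a
    later value >= b; each completed pair counts as one upcrossing. Assumes a < b.
    """
    count = 0
    below = False  # True once the path has been at or below a since the last upcrossing
    for x in xs:
        if not below:
            if x <= a:
                below = True
        elif x >= b:
            count += 1
            below = False
    return count


if __name__ == "__main__":
    path = [0, 2, -1, 3, 0, 1, -2, 5]
    print(count_upcrossings(path, 0, 2))
```

Because $a < b$, a single value can never trigger both conditions, so a single boolean flag is enough to track which of the two stopping times is awaited next.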


    Section 15
    Convergence of martingales. Fundamental Wald's identity.

    We finally get to our main result about the convergence of martingales and submartingales.

    Theorem 34 Let $(X_n, \mathcal{B}_n)$ ...

    By Doob's inequality,

    $$\mathbb{E}\nu_Y(a, b, n) \le \frac{\mathbb{E}(Y_n - a)^+}{b - a} = \frac{\mathbb{E}(X_{-1} - a)^+}{b - a}$$

    ...

    Corollary 2 Martingale $(X_n, \mathcal{B}_n)$ is right-closable iff it is uniformly integrable.

    To prove this, apply case 3 above to $(X_n)$ and $(-X_n)$ which are both submartingales.

    Theorem 35 (Levy's convergence) Let $(\Omega, \mathcal{B}, \mathbb{P})$ be a probability space and $X$ be a real-valued random variable on it. Given a sequence of $\sigma$-algebras

    $$\mathcal{B}_1 \subseteq \ldots \subseteq \mathcal{B}_n \subseteq \ldots \subseteq \mathcal{B}_{+\infty} \subseteq \mathcal{B}$$

    where $\mathcal{B}_{+\infty} = \sigma\bigl( \bigcup_{1 \le n < \infty} \mathcal{B}_n \bigr)$ ...

    as M . Therefore, the limit Y = limYn exists and it is an easy exercise to show that Sn/bn 0(Kroneckerslemma).

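    3. (Polya urn scheme) Let us recall the Polya urn scheme from Section 5. Let us consider a sequence

    $$Y_n = \frac{\#(\text{blue balls after } n \text{ iterations})}{\#(\text{total after } n \text{ iterations})}.$$

    $Y_n$ is a martingale because, given that at step $n$ the numbers of blue and red balls are $b$ and $r$, the expected proportion of blue balls at step $n+1$ will be

    $$E(Y_{n+1} \mid \mathcal{B}_n) = \frac{b}{b+r}\cdot\frac{b+c}{b+r+c} + \frac{r}{b+r}\cdot\frac{b}{b+r+c} = \frac{b}{b+r} = Y_n.$$

    Since $Y_n$ is bounded, by the martingale convergence theorem, the limit $Y = \lim_{n\to\infty} Y_n$ exists. What is the distribution of $Y$? Let us consider a sequence

    $$X_i = \begin{cases} 1, & \text{blue at step } i \\ 0, & \text{red at step } i \end{cases}$$

    and let $S_n = \sum_{i\le n} X_i$. Clearly,

    $$Y_n = \frac{b + S_n c}{b + r + nc} \approx \frac{S_n}{n}$$

    as $n \to \infty$ and, therefore, $S_n/n \to Y$. The sequence $(X_n)$ is exchangeable and by de Finetti's theorem in Section 5 we showed that

    $$P(S_n = k) = \binom{n}{k} \int_0^1 x^k (1-x)^{n-k}\, d\beta_{b/c,\,r/c}(x).$$

    For any function $u \in C([0,1])$,

    $$E u\Big(\frac{S_n}{n}\Big) = \sum_{k=0}^{n} u\Big(\frac{k}{n}\Big) \binom{n}{k} \int_0^1 x^k (1-x)^{n-k}\, d\beta_{b/c,\,r/c}(x) = \int_0^1 B_n(x)\, d\beta_{b/c,\,r/c}(x),$$

    where $B_n(x)$ is the Bernstein polynomial that approximates $u(x)$ uniformly on $[0,1]$. Therefore,

    $$E u\Big(\frac{S_n}{n}\Big) \to \int_0^1 u(x)\, d\beta_{b/c,\,r/c}(x),$$

    which means that

    $$\mathcal{L}\Big(\frac{S_n}{n}\Big) \to \beta_{b/c,\,r/c} = \mathcal{L}(Y),$$

    i.e. the limit $Y$ has Beta distribution $\beta_{b/c,\,r/c}$.

    Optional stopping for martingales revisited. Let $\tau$ be a stopping time. We would like to determine when $E X_\tau = E X_1$. As we saw above in the case of two stopping times, some kind of integrability assumptions are necessary. In this simpler case, the necessary conditions are clear from the proof.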

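    Lemma 31 We have

    $$E X_1 = \lim_{n\to\infty} E X_\tau I(\tau \le n) \iff \lim_{n\to\infty} E X_n I(\tau \ge n) = 0.$$

    Proof. We can write,

    $$E X_\tau I(\tau \le n) = \sum_{1\le k\le n} E X_k I(\tau = k) = \sum_{1\le k\le n} \big( E X_k I(\tau \ge k) - E X_k I(\tau \ge k+1) \big)$$

    and, since $\{\tau \ge k+1\} = \{\tau \le k\}^c \in \mathcal{B}_k$,

    $$= \sum_{1\le k\le n} \big( E X_k I(\tau \ge k) - E X_{k+1} I(\tau \ge k+1) \big) = E X_1 - E X_{n+1} I(\tau \ge n+1).$$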
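Returning to the Polya urn scheme in example 3 above, the Beta mixture formula for $P(S_n = k)$ can be checked against a direct computation. The following is a minimal sketch that is not part of the original notes: the function names and the parameters b = 2, r = 3, c = 1, n = 30 are illustrative assumptions. It assumes that at each step a ball is drawn uniformly at random and returned together with c extra balls of the same colour, which is consistent with the conditional expectation computed in that example.

```python
import math


def urn_blue_draw_distribution(b, r, c, n):
    """Exact law of S_n (number of blue draws in n steps), by dynamic programming."""
    probs = [1.0]  # probs[k] = P(S_i = k) after i draws
    for i in range(n):
        nxt = [0.0] * (i + 2)
        total = b + r + i * c          # balls in the urn before draw i + 1
        for k, p in enumerate(probs):
            blue = b + k * c           # blue balls after k blue draws
            nxt[k + 1] += p * blue / total
            nxt[k] += p * (total - blue) / total
        probs = nxt
    return probs


def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def beta_mixture_distribution(b, r, c, n):
    """C(n, k) times the integral of x^k (1 - x)^(n - k) against Beta(b/c, r/c)."""
    a1, a2 = b / c, r / c
    return [
        math.comb(n, k) * math.exp(log_beta(a1 + k, a2 + n - k) - log_beta(a1, a2))
        for k in range(n + 1)
    ]


exact = urn_blue_draw_distribution(b=2, r=3, c=1, n=30)
mixed = beta_mixture_distribution(b=2, r=3, c=1, n=30)
max_gap = max(abs(p - q) for p, q in zip(exact, mixed))
print("largest difference between the two laws:", max_gap)
```

The two lists should agree up to floating-point rounding, which is the finite-$n$ content of the de Finetti representation used in the example.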


    Example. Given 0 < p


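    by symmetry. Therefore,

    $$E\,\mathrm{ch}(\lambda)^{-\tau} = \frac{1}{\mathrm{ch}(\lambda z)} \quad\text{and}\quad E e^{-\gamma\tau} = \frac{1}{\mathrm{ch}\big(\mathrm{ch}^{-1}(e^{\gamma})\, z\big)}$$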

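    by a change of variables $e^{-\gamma} = 1/\mathrm{ch}(\lambda)$.

    For more general stopping times the condition (15.0.1) might not be easy to check. We will now show another approach that is helpful to verify a fundamental Wald's identity. If $P$ is the distribution of the $X_i$'s, let $P_\lambda$ be a distribution with Radon-Nikodym derivative w.r.t. $P$ given by

    $$\frac{dP_\lambda}{dP} = \frac{e^{\lambda x}}{\varphi(\lambda)}.$$

    This is, indeed, a density since

    $$\int_{\mathbb{R}} \frac{e^{\lambda x}}{\varphi(\lambda)}\, dP(x) = \frac{\varphi(\lambda)}{\varphi(\lambda)} = 1.$$

    We will think of $(X_n)$ as defined on the product space $(\mathbb{R}^\infty, \mathcal{B}^\infty, P^\infty)$. For example, a set

    $$\{\tau = n\} \in \sigma(X_1, \ldots, X_n),$$

    identified with $\mathcal{B}^n$, is a Borel set on $\mathbb{R}^n$. We can write,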

    S Sn (x1+ +xn) e e e E

    ()I(


    Section 16
    Convergence on metric spaces. Portmanteau Theorem. Lipschitz Functions.

    Let $(S, d)$ be a metric space and $\mathcal{B}$ a Borel $\sigma$-algebra generated by open sets. Let us recall that $P_n \to P$ weakly on $\mathcal{B}$ if

    $$\int f\, dP_n \to \int f\, dP$$

    for all $f \in C_b(S)$, the real-valued bounded continuous functions on $S$.

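    For a set $A \subseteq S$, we denote by $\bar{A}$ the closure of $A$, by $\mathrm{int} A$ the interior of $A$ and by $\partial A = \bar{A} \setminus \mathrm{int} A$ the boundary of $A$. $A$ is called a continuity set of $P$ if $P(\partial A) = 0$.

    Theorem 36 (Portmanteau theorem) The following are equivalent.

    1. $P_n \to P$ weakly.
    2. For any open set $U \subseteq S$, $\liminf_{n\to\infty} P_n(U) \ge P(U)$.
    3. For any closed set $F \subseteq S$, $\limsup_{n\to\infty} P_n(F) \le P(F)$.
    4. For any continuity set $A$ of $P$, $\lim_{n\to\infty} P_n(A) = P(A)$.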
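The inequalities in statements 2 and 3 can be strict, and statement 4 really needs the boundary condition. A minimal sketch, not part of the original notes: it takes the illustrative example $P_n = \delta_{1/n}$ and $P = \delta_0$ on the real line (so that $\int f\, dP_n = f(1/n) \to f(0)$ for every bounded continuous $f$), and the helper names are assumptions made for this illustration.

```python
def point_mass(x):
    """The law delta_x, represented as a function acting on set indicators."""
    return lambda indicator: 1.0 if indicator(x) else 0.0


def in_open_set(s):
    """U = (0, 1): open, with boundary {0, 1}."""
    return 0.0 < s < 1.0


def in_closed_set(s):
    """F = [-1, 0]: closed."""
    return -1.0 <= s <= 0.0


def in_continuity_set(s):
    """A = (-1, 0.5): its boundary {-1, 0.5} has delta_0 measure zero."""
    return -1.0 < s < 0.5


P = point_mass(0.0)
tail = [point_mass(1.0 / n) for n in range(100, 201)]  # P_n for large n

liminf_open = min(Pn(in_open_set) for Pn in tail)
limsup_closed = max(Pn(in_closed_set) for Pn in tail)
values_on_continuity_set = {Pn(in_continuity_set) for Pn in tail}

print("open set:       ", liminf_open, ">=", P(in_open_set))
print("closed set:     ", limsup_closed, "<=", P(in_closed_set))
print("continuity set: ", values_on_continuity_set, "vs", P(in_continuity_set))
```

Here $U = (0,1)$ is not a continuity set of $\delta_0$, which is why $P_n(U)$ does not converge to $P(U)$ even though $P_n \to P$ weakly.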

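    Proof. 1 $\Longrightarrow$ 2. Let $U$ be an open set and $F = U^c$. Consider a sequence of functions in $C_b(S)$

    $$f_m(s) = \min(1, m\, d(s, F))$$

    such that $f_m(s) \uparrow I_U(s)$. (This is not necessarily true if $U$ is not open.) Since $P_n \to P$,

    $$P_n(U) \ge \int f_m\, dP_n \to \int f_m\, dP \text{ as } n \to \infty \quad\text{and}\quad \liminf_{n\to\infty} P_n(U) \ge \int f_m\, dP.$$

    Letting $m \to \infty$, by the monotone convergence theorem,

    $$\liminf_{n\to\infty} P_n(U) \ge \int I_U\, dP = P(U).$$

    2 $\Longleftrightarrow$ 3. By taking complements.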


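    2, 3 $\Longrightarrow$ 4. Since $\mathrm{int} A$ is open and $\bar{A}$ is closed and $\mathrm{int} A \subseteq A \subseteq \bar{A}$, by 2 and 3,

    $$P(\mathrm{int} A) \le \liminf_{n\to\infty} P_n(\mathrm{int} A) \le \limsup_{n\to\infty} P_n(\bar{A}) \le P(\bar{A}).$$

    If $P(\partial A) = 0$ then $P(\bar{A}) = P(\mathrm{int} A) = P(A)$ and, therefore, $\lim P_n(A) = P(A)$.

    4 $\Longrightarrow$ 1. Consider $f \in C_b(S)$ and let $F_y = \{s \in S : f(s) = y\}$ be a level set of $f$. There exist at most countably many $y$ such that $P(F_y) > 0$. Therefore, for any $\varepsilon > 0$ we can find a sequence $a_1 \le \ldots \le a_N$ such that

    $$\max(a_{k+1} - a_k) \le \varepsilon, \quad P(F_{a_k}) = 0 \text{ for all } k$$

    and the range of $f$ is inside the interval $(a_1, a_N)$. Let

    $$B_k = \{s \in S : a_k \le f(s) < a_{k+1}\} \quad\text{and}\quad f_\varepsilon(s) = \sum_k a_k I(s \in B_k).$$

    Since $f$ is continuous, $\partial B_k \subseteq F_{a_k} \cup F_{a_{k+1}}$ and $P(\partial B_k) = 0$. By 4,

    $$\int f_\varepsilon\, dP_n = \sum_k a_k P_n(B_k) \to \sum_k a_k P(B_k) = \int f_\varepsilon\, dP.$$

    Since, by construction, $|f_\varepsilon(s) - f(s)| \le \varepsilon$, letting $\varepsilon \to 0$ proves that $\int f\, dP_n \to \int f\, dP$.

    Lipschitz functions. For a function $f : S \to \mathbb{R}$, let us define a Lipschitz semi-norm by

    $$\|f\|_L = \sup_{x \ne y} \frac{|f(x) - f(y)|}{d(x, y)}.$$

    Clearly, $\|f\|_L = 0$ iff $f$ is constant so $\|f\|_L$ is not a norm. Let us define a bounded Lipschitz norm by

    $$\|f\|_{BL} = \|f\|_L + \|f\|_\infty,$$

    where $\|f\|_\infty = \sup_{s \in S} |f(s)|$. Let $BL(S, d) = \{ f : S \to \mathbb{R} : \|f\|_{BL}$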


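    Proof. Proof of 1. It is enough to consider $k = 2$. For specificity, take $* = \vee$. Given $x, y \in S$, suppose that

    $$f_1 \vee f_2(x) \ge f_1 \vee f_2(y) = f_1(y).$$

    Then

    $$|f_1 \vee f_2(y) - f_1 \vee f_2(x)| = f_1 \vee f_2(x) - f_1 \vee f_2(y) \le \begin{cases} f_1(x) - f_1(y), & \text{if } f_1(x) \ge f_2(x) \\ f_2(x) - f_2(y), & \text{otherwise} \end{cases}$$

    $$\le \big(\|f_1\|_L \vee \|f_2\|_L\big)\, d(x, y).$$

    This finishes the proof of 1.

    Proof of 2. First of all, obviously,

    $$\|f_1 * \cdots * f_k\|_\infty \le \max_{1 \le i \le k} \|f_i\|_\infty.$$

    Therefore, using 1,

    $$\|f_1 * \cdots * f_k\|_{BL} \le \max_i \|f_i\|_\infty + \max_i \|f_i\|_L \le 2 \max_i \|f_i\|_{BL}.$$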

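    Theorem 37 (Extension theorem) Given a set $A \subseteq S$ and a bounded Lipschitz function $f \in BL(A, d)$ on $A$, there exists an extension $h \in BL(S, d)$ such that

    $$f = h \text{ on } A \quad\text{and}\quad \|h\|_{BL} = \|f\|_{BL}.$$

    Proof. Let us first find an extension such that $\|h\|_L = \|f\|_L$. We will start by extending $f$ to one point $x \in S \setminus A$. The value $y = h(x)$ must satisfy

    $$|y - f(s)| \le \|f\|_L\, d(x, s) \text{ for all } s \in A$$

    or, equivalently,

    $$\inf_{s \in A} \big(f(s) + \|f\|_L\, d(x, s)\big) \ge y \ge \sup_{s \in A} \big(f(s) - \|f\|_L\, d(x, s)\big).$$

    Such $y$ exists iff for all $s_1, s_2 \in A$,

    $$f(s_1) + \|f\|_L\, d(x, s_1) \ge f(s_2) - \|f\|_L\, d(x, s_2).$$

    This inequality is satisfied because, by the triangle inequality,

    $$f(s_2) - f(s_1) \le \|f\|_L\, d(s_1, s_2) \le \|f\|_L \big(d(s_1, x) + d(s_2, x)\big).$$

    It remains to apply Zorn's lemma to show that $f$ can be extended to the entire $S$. Define order by inclusion:

    $f_1 \preceq f_2$ if $f_1$ is defined on $A_1$, $f_2$ on $A_2$, $A_1 \subseteq A_2$, $f_1 = f_2$ on $A_1$ and $\|f_1\|_L = \|f_2\|_L$.

    For any chain $\{f_\alpha\}$, $f = \bigcup_\alpha f_\alpha \succeq f_\alpha$. By Zorn's lemma there exists a maximal element $h$. It is defined on the entire $S$ because, otherwise, we could extend to one more point. To extend preserving the BL norm take

    $$h' = (h \wedge \|f\|_\infty) \vee (-\|f\|_\infty).$$

    By part 1 of the previous lemma, it is easy to see that $\|h'\|_{BL} = \|f\|_{BL}$.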
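When $A$ is finite, the extension can be written down explicitly instead of invoking Zorn's lemma. The sketch below is not the argument used in the proof: it swaps in the explicit formula $h(x) = \inf_{s \in A} (f(s) + \|f\|_L\, d(x, s))$, which is the upper end of the admissible interval from the one-point step taken at every $x$ at once, followed by the truncation to $[-\|f\|_\infty, \|f\|_\infty]$. The points, values and function names are illustrative assumptions.

```python
from itertools import combinations


def lipschitz_seminorm(points, values, dist):
    """Largest ratio |f(x) - f(y)| / d(x, y) over pairs of distinct points."""
    return max(
        abs(values[i] - values[j]) / dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    )


def extend(points, values, dist):
    """Return (h, L, M): an extension h of f from `points`, with L = ||f||_L, M = ||f||_inf."""
    L = lipschitz_seminorm(points, values, dist)
    M = max(abs(v) for v in values)

    def h(x):
        # Upper envelope from the one-point extension step, then truncation.
        upper = min(v + L * dist(x, s) for s, v in zip(points, values))
        return max(-M, min(M, upper))

    return h, L, M


def real_line_distance(x, y):
    return abs(x - y)


A = [0.0, 1.0, 2.5]          # hypothetical finite set A on the real line
f_on_A = [0.0, 1.0, -0.5]    # hypothetical values of f on A
h, L, M = extend(A, f_on_A, real_line_distance)

grid = [-3.0 + 0.01 * i for i in range(901)]
h_vals = [h(x) for x in grid]
max_slope = max(
    abs(h_vals[i + 1] - h_vals[i]) / (grid[i + 1] - grid[i])
    for i in range(len(grid) - 1)
)
print("L =", L, " M =", M, " largest slope on grid =", max_slope)
```

On the grid the extension should agree with $f$ on $A$, stay within $[-M, M]$ and have slopes bounded by $L$, which is what $\|h'\|_{BL} = \|f\|_{BL}$ requires.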

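    Stone-Weierstrass Theorem.

    A set $A \subseteq S$ is totally bounded if for any $\varepsilon > 0$ there exists a finite $\varepsilon$-cover of $A$, i.e. a set of points $a_1, \ldots, a_N$ such that

    $$A \subseteq \bigcup_{i \le N} B(a_i, \varepsilon),$$

    where $B(a, \varepsilon) = \{y \in S : d(a, y) \le \varepsilon\}$ is a ball of radius $\varepsilon$ centered at $a$. Let us recall the following theorem from analysis.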


    Theorem 38 (Arzela-Ascoli) Let $(S, d)$ be a compact metric space and let $(C(S), d_\infty)$ be the space of continuous real-valued functions on $S$ with the uniform convergence metric

    $$d_\infty(f, g) = \sup_{x \in S} |f(x) - g(x)|.$$

    A subset $F \subseteq C(S)$ is totally bounded in the $d_\infty$ metric iff $F$ is equicontinuous and uniformly bounded.

    Remark. Equicontinuous means that for any $\varepsilon > 0$ there exists $\delta > 0$ such that if $d(x, y) \le \delta$ then for all $f \in F$, $|f(x) - f(y)| \le \varepsilon$.

    Theorem 39 (Stone-Weierstrass) Let $(S, d)$ be a compact metric space and $F \subseteq C(S)$ is such that

    1. $F$ is an algebra, i.e. for all $f, g \in F$, $c \in \mathbb{R}$, we have $cf + g \in F$, $fg \in F$.
    2. $F$ separates points, i.e. if $x \ne y \in S$ then there exists $f \in F$ such that $f(x) \ne f(y)$.
    3. $F$ contains constants.

    Then $F$ is dense in $C(S)$.

    Corollary 3 If $(S, d)$ is a compact space then $BL(S, d)$ is dense in $C(S)$.

    Proof. For $F = BL(S, d)$ in the Stone-Weierstrass theorem, 3 is obvious, 1 follows from Lemma 32 and 2 follows from the extension Theorem 37, since a function defined on two points $x \ne y$ such that $f(x) \ne f(y)$ can be extended to the entire $S$.

    Proof of Theorem 39. Consider bounded $f \in F$, i.e. $|f(x)| \le M$. A function $x \to |x|$ defined on the interval $[-M, M]$ can be uniformly approximated by polynomials of $x$ by the Weierstrass theorem on the real line or, for example, using Bernstein's polynomials. Therefore, $|f(x)|$ can be uniformly approximated by polynomials of $f(x)$, and by properties 1 and 3, by functions in $F$. Therefore, if $\bar{F}$ is the closure of $F$ in the $d_\infty$ norm then for any $f \in \bar{F}$ its absolute value $|f| \in \bar{F}$. Therefore, for any $f, g \in \bar{F}$ we have

    $$\min(f, g) = \frac{1}{2}(f + g) - \frac{1}{2}|f - g| \in \bar{F}, \quad \max(f, g) = \frac{1}{2}(f + g) + \frac{1}{2}|f - g| \in \bar{F}. \quad (16.0.1)$$

    Given any points $x \ne y$ and $c, d \in \mathbb{R}$ one can always find $f \in F$ such that $f(x) = c$ and $f(y) = d$. Indeed, by property 2 we can find $g \in F$ such that $g(x) \ne g(y)$ and, as a result, a system of equations

    $$a g(x) + b = c, \quad a g(y) + b = d$$

    has a solution $a, b$. Then the function $f = ag + b$ satisfies the above and it is in $F$ by 1.

    Take $h \in C(S)$ and fix $x$. For any $y$ let $f_y \in F$ be such that

    $$f_y(x) = h(x), \quad f_y(y) = h(y).$$

    By continuity of $f_y$, for any $y \in S$ there exists an open neighborhood $U_y$ of $y$ such that

    $$f_y(s) \ge h(s) - \varepsilon \text{ for } s \in U_y.$$

    Since $(U_y)$ is an open cover of the compact $S$, there exists a finite subcover $U_{y_1}, \ldots, U_{y_N}$. Let us define a function

    $$f^x(s) = \max(f_{y_1}(s), \ldots, f_{y_N}(s)) \in \bar{F} \text{ by (16.0.1)}.$$

    By construction, it has the following properties:

    $$f^x(x) = h(x), \quad f^x(s) \ge h(s) - \varepsilon \text{ for all } s \in S.$$


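    Again, by continuity of $f^x(s)$ there exists an open neighborhood $U_x$ of $x$ such that

    $$f^x(s) \le h(s) + \varepsilon \text{ for } s \in U_x.$$

    Take a finite subcover $U_{x_1}, \ldots, U_{x_M}$ and define

    $$h'(s) = \min\big(f^{x_1}(s), \ldots, f^{x_M}(s)\big) \in \bar{F} \text{ by (16.0.1)}.$$

    By construction, $h'(s) \le h(s) + \varepsilon$ and $h'(s) \ge h(s) - \varepsilon$ for all $s \in S$, which means that $d_\infty(h', h) \le \varepsilon$. Since $h' \in \bar{F}$, this proves that $\bar{F}$ is dense in $C(S)$.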
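The first step of the proof above relies on uniform polynomial approximation of $x \to |x|$ on $[-M, M]$. The following sketch, which is an illustration and not part of the notes, uses the Bernstein polynomials mentioned in the proof after the change of variables $x = M(2t - 1)$, $t \in [0, 1]$. The function names, the grid size and the degrees 25, 100, 400 are assumptions, and the error is measured on a finite grid only.

```python
import math


def bernstein(u, n, t):
    """Value at t in [0, 1] of the degree-n Bernstein polynomial of u."""
    return sum(
        u(k / n) * math.comb(n, k) * t**k * (1 - t) ** (n - k)
        for k in range(n + 1)
    )


def abs_approximation_error(M, n, grid_size=201):
    """Largest gap on a grid between |x| and its Bernstein approximation on [-M, M]."""

    def u(t):
        # |x| expressed through t, where x = M * (2t - 1).
        return abs(M * (2 * t - 1))

    worst = 0.0
    for i in range(grid_size):
        t = i / (grid_size - 1)
        worst = max(worst, abs(u(t) - bernstein(u, n, t)))
    return worst


errors = {n: abs_approximation_error(M=1.0, n=n) for n in (25, 100, 400)}
for n, err in errors.items():
    print(f"degree {n}: grid error {err:.4f}")
```

The error decreases slowly as the degree grows, because of the kink of $|x|$ at zero, but it does go to zero, which is all the proof needs.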

    Corollary 4 If $(S, d)$ is a compact space then $C(S)$ is separable in $d_\infty$.

    Remark. Recall that this fact was used in the proof of the Selection Theorem, which was proved for general metric spaces.

    Proof. By the above theorem, $BL(S, d)$ is dense in $C(S)$. For any integer $n \ge 1$, the set $\{f : \|f\|_{BL} \le n\}$ is uniformly bounded and equicontinuous. By the Arzela-Ascoli theorem, it is totally bounded and, therefore, separable, which can be seen by taking finite $1/m$-covers for all $m \ge 1$. The union

    $$\bigcup_{n \ge 1} \{\|f\|_{BL} \le n\} = BL(S, d)$$

    is therefore separable in $C(S)$ which is, as a result, also separable.


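    Section 17
    Metrics for convergence of laws. Empirical measures.

    Levy-Prohorov metric. Consider a metric space $(S, d)$. For a set $A \subseteq S$ let us denote by

    $$A^\varepsilon = \{y \in S : d(x, y) < \varepsilon \text{ for some } x \in A\}$$

    its $\varepsilon$-neighborhood. Let $\mathcal{B}$ be a Borel $\sigma$-algebra on $S$.

    Definition. If $P, Q$ are probability distributions on $\mathcal{B}$ then

    $$\rho(P, Q) = \inf\{\varepsilon > 0 : P(A) \le Q(A^\varepsilon) + \varepsilon \text{ for all } A \in \mathcal{B}\}$$

    is called the Levy-Prohorov distance between $P$ and $Q$.

    Lemma 34 $\rho$ is a metric on the set of probability laws on $\mathcal{B}$.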
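For laws with finite support the definition can be evaluated by brute force, which gives a feel for the symmetry claimed in Lemma 34. A minimal sketch under stated assumptions, not part of the notes: the laws live on the real line and are stored as dictionaries {point: mass}; it is enough to test subsets $A$ of the support of $P$, since shrinking $A$ to its intersection with the support keeps $P(A)$ and can only shrink $A^\varepsilon$; the infimum is located by bisection, using that the condition is monotone in $\varepsilon$. All names and example laws are hypothetical.

```python
from itertools import combinations


def satisfies(P, Q, eps):
    """Check P(A) <= Q(A^eps) + eps for every nonempty subset A of the support of P."""
    support = list(P)
    for size in range(1, len(support) + 1):
        for A in combinations(support, size):
            mass_A = sum(P[x] for x in A)
            # Q-mass of the open eps-neighborhood of A.
            mass_nbhd = sum(
                q for y, q in Q.items() if any(abs(x - y) < eps for x in A)
            )
            if mass_A > mass_nbhd + eps + 1e-12:
                return False
    return True


def levy_prohorov(P, Q, tol=1e-9):
    """Approximate inf{eps > 0 : P(A) <= Q(A^eps) + eps for all A} by bisection."""
    lo, hi = 0.0, 1.0  # eps = 1 always satisfies the condition
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if satisfies(P, Q, mid):
            hi = mid
        else:
            lo = mid
    return hi


delta_0 = {0.0: 1.0}
delta_03 = {0.3: 1.0}
delta_5 = {5.0: 1.0}
P2 = {0.0: 0.5, 1.0: 0.5}
Q2 = {0.0: 0.5, 1.2: 0.5}

print(levy_prohorov(delta_0, delta_03))
print(levy_prohorov(delta_0, delta_5))
print(levy_prohorov(P2, Q2), levy_prohorov(Q2, P2))
```

For two point masses the value is the smaller of their distance and 1, and swapping the arguments should not change the result, in agreement with the lemma.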

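    Proof. 1. First, let us show that $\rho(Q, P) = \rho(P, Q)$. Suppose that $\rho(P, Q) > \varepsilon$. Then there exists a set $A$ such that $P(A) > Q(A^\varepsilon) + \varepsilon$. Taking complements gives

    $$Q(A^{\varepsilon c}) > P(A^c) + \varepsilon \ge P(A^{\varepsilon c \varepsilon}) + \varepsilon,$$

    where $A^{\varepsilon c} = (A^\varepsilon)^c$ and the last inequality follows from the fact that $A^c \supseteq A^{\varepsilon c \varepsilon}$:

    $$a \in A^{\varepsilon c \varepsilon} \Longrightarrow d(a, A^{\varepsilon c}) < \varepsilon \Longrightarrow d(a, b) < \varepsilon \text{ for some } b \in A^{\varepsilon c};$$

    $$\text{since } b \notin A^\varepsilon,\ d(b, A) \ge \varepsilon \Longrightarrow d(a, A) > 0 \Longrightarrow a \notin A \Longrightarrow a \in A^c.$$

    Therefore, for a set $B = A^{\varepsilon c}$, $Q(B) > P(B^\varepsilon) + \varepsilon$. This means that $\rho(Q, P) \ge \varepsilon$ and, therefore, $\rho(Q, P) \ge \rho(P, Q)$. By symmetry, $\rho(Q, P) \le \rho(P, Q)$ and $\rho(Q, P) = \rho(P, Q)$.

    2. Next, let us show that if $\rho(P, Q) = 0$ then $P = Q$. For any set $F$ and any $n \ge 1$,

    $$P(F) \le Q(F^{1/n}) + \frac{1}{n}.$$

    If $F$ is closed then $F^{1/n} \downarrow F$ as $n \to \infty$ and, by continuity of measure,

    $$P(F) \le Q\Big(\bigcap_{n \ge 1} F^{1/n}\Big) = Q(F).$$

    Similarly, $P(F) \ge Q(F)$ and, therefore, $P(F) = Q(F)$.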


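    3. Finally, let us prove the triangle inequality

    $$\rho(P, R) \le \rho(P, Q) + \rho(Q, R).$$

    If $\rho(P, Q) < x$ and $\rho(Q, R) < y$ then for any set $A$,

    $$P(A) \le Q(A^x) + x \le R\big((A^x)^y\big) + y + x \le R(A^{x+y}) + x + y,$$

    which means that $\rho(P, R) \le x + y$.

    Bounded Lipschitz metric. Given probability distributions $P, Q$ on the metric space $(S, d)$ we define a bounded Lipschitz distance between them by

    $$\beta(P, Q) = \sup\Big\{ \Big| \int f\, dP - \int f\, dQ \Big| : \|f\|_{BL} \le 1 \Big\}.$$

    Lemma 35 $\beta$ is a metric on the set of probability laws on $\mathcal{B}$.

    Proof. $\beta(P, Q) = \beta(Q, P)$ and the triangle inequality are obvious. It remains to prove that $\beta(P, Q) = 0$ implies $P = Q$. Given a closed set $F$, the sequence of functions $f_m(x) = m\, d(x, F) \wedge 1$ converges $f_m \uparrow I_U$, where $U = F^c$. Obviously, $\|f_m\|_{BL} \le m + 1$ and, therefore, $\int f_m\, dP = \int f_m\, dQ$. Letting $m \to \infty$ proves that $P(U) = Q(U)$.

    The law $P$ on $(S, d)$ is tight if for any $\varepsilon > 0$ there exists a compact $K \subseteq S$ such that $P(S \setminus K) \le \varepsilon$.

    Theorem 40 (Ulam) If $(S, d)$ is separable then for any law $P$ on $\mathcal{B}$ and any $\varepsilon > 0$ there exists a closed totally bounded set $K \subseteq S$ such that $P(S \setminus K) \le \varepsilon$. If $(S, d)$ is complete and separable then $K$ is compact and, therefore, every law is tight.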

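    Proof. Consider a sequence $\{x_1, x_2, \ldots\}$ that is dense in $S$. For any $m \ge 1$, $S = \bigcup_{i=1}^{\infty} \bar{B}(x_i, \frac{1}{m})$, where $\bar{B}$ denotes a closed ball, and by continuity of measure, for large enough $n(m)$,

    $$P\Big(S \setminus \bigcup_{i=1}^{n(m)} \bar{B}\Big(x_i, \frac{1}{m}\Big)\Big) \le \frac{\varepsilon}{2^m}.$$

    If we take

    $$K = \bigcap_{m \ge 1} \bigcup_{i=1}^{n(m)} \bar{B}\Big(x_i, \frac{1}{m}\Big)$$

    then

    $$P(S \setminus K) \le \sum_{m \ge 1} \frac{\varepsilon}{2^m} = \varepsilon.$$

    $K$ is closed and totally bounded by construction. If $S$ is complete, $K$ is compact.

    Theorem 41 Suppose that either $(S, d)$ is separable or $P$ is tight. Then the following are equivalent.

    1. $P_n \to P$.
    2. For all $f \in BL(S, d)$, $\int f\, dP_n \to \int f\, dP$.
    3. $\beta(P_n, P) \to 0$.
    4. $\rho(P_n, P) \to 0$.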
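For point masses both metrics can be computed by hand, which makes the relation between statements 3 and 4, and inequality (17.0.1), easy to see numerically. The sketch below is an illustration, not part of the notes, and rests on two closed forms that are derived here rather than taken from the text: for $d = d(x, y)$, $\rho(\delta_x, \delta_y) = \min(d, 1)$, and $\beta(\delta_x, \delta_y) = \sup_a \min(a d, 2(1 - a)) = 2d/(2 + d)$, where $a \in [0, 1]$ plays the role of $\|f\|_L$ and $1 - a$ bounds $\|f\|_\infty$. The function names and the grid search are assumptions.

```python
import math


def rho_point_masses(d):
    """Levy-Prohorov distance between two point masses at distance d (assumed closed form)."""
    return min(d, 1.0)


def beta_point_masses(d, steps=20001):
    """Grid search for sup over a in [0, 1] of min(a * d, 2 * (1 - a)).

    Here a stands for ||f||_L, so that ||f||_inf <= 1 - a and
    |f(x) - f(y)| <= min(a * d, 2 * (1 - a)).
    """
    best = 0.0
    for i in range(steps):
        a = i / (steps - 1)
        best = max(best, min(a * d, 2 * (1 - a)))
    return best


distances = [0.01, 0.1, 0.5, 1.0, 3.0, 10.0]
table = [(d, rho_point_masses(d), beta_point_masses(d)) for d in distances]
for d, rho, beta in table:
    print(f"d = {d:5.2f}   rho = {rho:.4f}   beta = {beta:.4f}   2*sqrt(beta) = {2 * math.sqrt(beta):.4f}")
```

Both quantities go to zero exactly when $d \to 0$, and in every row $\rho$ stays below $2\sqrt{\beta}$.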


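    Proof. 1 $\Longrightarrow$ 2. Obvious.

    3 $\Longrightarrow$ 4. In fact, we will prove that

    $$\rho(P_n, P) \le 2\sqrt{\beta(P_n, P)}. \quad (17.0.1)$$

    Given a Borel set $A \subseteq S$, consider a function

    $$f(x) = 0 \vee \Big(1 - \frac{1}{\varepsilon}\, d(x, A)\Big) \text{ such that } I_A \le f \le I_{A^\varepsilon}.$$

    Obviously, $\|f\|_{BL} \le 1 + \varepsilon^{-1}$ and we can write

    $$P_n(A) \le \int f\, dP_n = \int f\, dP + \Big(\int f\, dP_n - \int f\, dP\Big) \le P(A^\varepsilon) + (1 + \varepsilon^{-1}) \sup\Big\{ \Big|\int f\, dP_n - \int f\, dP\Big| : \|f\|_{BL} \le 1 \Big\}$$

    $$= P(A^\varepsilon) + (1 + \varepsilon^{-1})\, \beta(P_n, P) \le P(A^\delta) + \delta,$$

    where $\delta = \max(\varepsilon, (1 + \varepsilon^{-1})\beta(P_n, P))$. This implies that $\rho(P_n, P) \le \delta$. Since $\varepsilon$ is arbitrary we can minimize $\delta = \delta(\varepsilon)$ over $\varepsilon$. If we take $\varepsilon = \sqrt{\beta}$ then $\delta = \max(\sqrt{\beta}, \beta + \sqrt{\beta}) = \beta + \sqrt{\beta}$ and

    $$\beta \le 1 \Longrightarrow \rho \le 2\sqrt{\beta}; \quad \beta \ge 1 \Longrightarrow \rho \le 1 \le 2\sqrt{\beta}.$$

    4 $\Longrightarrow$ 1. Suppose that $\rho(P_n, P) \to 0$, which means that there exists a sequence $\varepsilon_n \downarrow 0$ such that

    $$P_n(A) \le P(A^{\varepsilon_n}) + \varepsilon_n \text{ for all measurable } A \subseteq S.$$

    If $A$ is closed, then $\bigcap_{n \ge 1} A^{\varepsilon_n} = A$ and, by continuity of measure,

    $$\limsup_{n\to\infty} P_n(A) \le \limsup_{n\to\infty} \big(P(A^{\varepsilon_n}) + \varepsilon_n\big) = P(A).$$

    By the portmanteau theorem, $P_n \to P$.

    2 $\Longrightarrow$ 3. If $P$ is tight, let $K$ be a compact such that $P(S \setminus K) \le \varepsilon$. If $(S, d)$ is separable, by Ulam's theorem, let $K$ be a closed totally bounded set such that $P(S \setminus K) \le \varepsilon$. If we consider a function

    $$f(x) = 0 \vee \Big(1 - \frac{1}{\varepsilon}\, d(x, K)\Big) \text{ with } \|f\|_{BL} \le 1 + \frac{1}{\varepsilon}$$

    then

    $$P_n(K^\varepsilon) \ge \int f\, dP_n \to \int f\, dP \ge P(K) \ge 1 - \varepsilon,$$

    which implies that for $n$ large enough, $P_n(K^\varepsilon) \ge 1 - 2\varepsilon$. This means that all $P_n$ are essentially concentrated on $K^\varepsilon$. Let

    $$B = \{ f : \|f\|_{BL(S,d)} \le 1 \}, \quad B_K = \{ f|_K : f \in B \} \subseteq C(K),$$

    where $f|_K$ denotes the restriction of $f$ to $K$. If $K$ is compact then, by the Arzela-Ascoli theorem, $B_K$ is totally bounded with respect to $d_\infty$. If $K$ is totally bounded then we can isometrically identify functions in $B_K$ with their unique extensions to the completion $K'$ of $K$ and, by the Arzela-Ascoli theorem for the compact $K'$, $B_K$ is again totally bounded with respect to $d_\infty$. In any case, given $\varepsilon > 0$, we can find $f_1, \ldots, f_k \in B$ such that for all $f \in B$

    $$\sup_{x \in K} |f(x) - f_j(x)| \le \varepsilon \text{ for some } j \le k.$$

    This uniform approximation can also be extended to $K^\varepsilon$. Namely, for any $x \in K^\varepsilon$ take $y \in K$ such that $d(x, y) \le \varepsilon$. Then

    $$|f(x) - f_j(x)| \le |f(x) - f(y)| + |f(y) - f_j(y)| + |f_j(y) - f_j(x)| \le \|f\|_L\, d(x, y) + \varepsilon + \|f_j\|_L\, d(x, y) \le 3\varepsilon.$$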


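    Therefore, for any $f \in B$,

    $$\Big|\int f\, dP_n - \int f\, dP\Big| \le \Big|\int_{K^\varepsilon} f\, dP_n - \int_{K^\varepsilon} f\, dP\Big| + \|f\|_\infty \big(P_n((K^\varepsilon)^c) + P((K^\varepsilon)^c)\big)$$

    $$\le \Big|\int_{K^\varepsilon} f\, dP_n - \int_{K^\varepsilon} f\, dP\Big| + 2\varepsilon + \varepsilon$$

    $$\le \Big|\int_{K^\varepsilon} f_j\, dP_n - \int_{K^\varepsilon} f_j\, dP\Big| + 3\varepsilon + 3\varepsilon + 2\varepsilon + \varepsilon$$

    $$\le \Big|\int f_j\, dP_n - \int f_j\, dP\Big| + 3\varepsilon + 3\varepsilon + 3\varepsilon + 2\varepsilon + \varepsilon$$

    $$\le \max_{1 \le j \le k} \Big|\int f_j\, dP_n - \int f_j\, dP\Big| + 12\varepsilon.$$

    Finally,

    $$\beta(P_n, P) = \sup_{f \in B} \Big|\int f\, dP_n - \int f\, dP\Big| \le \max_{1 \le j \le k} \Big|\int f_j\, dP_n - \int f_j\, dP\Big| + 12\varepsilon$$

    and, using assumption 2, $\limsup_{n\to\infty} \beta(P_n, P) \le 12\varepsilon$. Letting $\varepsilon \to 0$ finishes the proof.

    Convergence of empirical measures. Let $(\Omega, P)$ be a probability space and $X_1, X_2, \ldots : \Omega \to S$ be an i.i.d. sequence of random variables with values in a metric space $(S, d)$. Let $\mu$ be the law of $X_i$ on $S$. Let us define the random empirical measures $\mu_n$ on the Borel $\sigma$-algebra $\mathcal{B}$ on $S$ by

    $$\mu_n(A)(\omega) = \frac{1}{n} \sum_{i=1}^{n} I(X_i(\omega) \in A), \quad A \in \mathcal{B}.$$

    By the strong law of large numbers, for any $f \in C_b(S)$,

    $$\int f\, d\mu_n = \frac{1}{n} \sum_{i=1}^{n} f(X_i) \to E f(X_1) = \int f\, d\mu \text{ a.s.}$$

    However, the set of measure zero where this convergence is violated depends on $f$ and it is not obvious that the convergence holds for all $f \in C_b(S)$ with probability one.

    Theorem 42 (Varadarajan) Let $(S, d)$ be a separable metric space. Then $\mu_n$ converges to $\mu$ weakly almost surely,

    $$P\big(\omega : \mu_n(\cdot)(\omega) \to \mu \text{ weakly}\big) = 1.$$

    Proof. Since $(S, d)$ is separable, by Theorem 2.8.2 in R.A.P., there exists a metric $e$ on $S$ such that $(S, e)$ is totally bounded and $e$ and $d$ define the same topology, i.e. $e(s_n, s) \to 0$ if and only if $d(s_n, s) \to 0$. This, of course, means that $C_b(S, d) = C_b(S, e)$ and weak convergence of measures does not change. If $(T, e)$ is the completion of $(S, e)$ then $(T, e)$ is compact. By the Arzela-Ascoli theorem, $BL(T, e)$ is separable with respect to the $d_\infty$ norm and, therefore, $BL(S, e)$ is also separable. Let $(f_m)$ be a dense subset of $BL(S, e)$. Then, by the strong law of large numbers,

    $$\int f_m\, d\mu_n = \frac{1}{n} \sum_{i=1}^{n} f_m(X_i) \to E f_m(X_1) = \int f_m\, d\mu \text{ a.s.}$$

    Therefore, on the set of probability one, $\int f_m\, d\mu_n \to \int f_m\, d\mu$ for all $m \ge 1$. Since $(f_m)$ is dense in $BL(S, e)$, on the same set of probability one, $\int f\, d\mu_n \to \int f\, d\mu$ for all $f \in BL(S, e)$. Since $(S, e)$ is separable, the previous theorem implies that $\mu_n \to \mu$ weakly.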
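A small simulation illustrates the statement on a single sample path. This is a sketch under assumptions that are not in the notes: $S = [0, 1]$, $\mu$ is the uniform law, the sample size and seed are arbitrary, and the three bounded Lipschitz test functions (with their exact integrals $3/8$, $1/4$ and $1/6$) are chosen for illustration. The point is that one and the same sample is used for all test functions.

```python
import random


def empirical_integral(f, sample):
    """Integral of f with respect to the empirical measure of the sample."""
    return sum(f(x) for x in sample) / len(sample)


rng = random.Random(0)  # fixed seed so that the run is reproducible
sample = [rng.random() for _ in range(20000)]  # i.i.d. uniform on [0, 1]

# Each entry: (bounded Lipschitz function, its exact integral under the uniform law).
test_functions = {
    "min(x, 1/2)": (lambda x: min(x, 0.5), 0.375),
    "|x - 1/2|": (lambda x: abs(x - 0.5), 0.25),
    "x (1 - x)": (lambda x: x * (1 - x), 1.0 / 6.0),
}

gaps = {
    name: abs(empirical_integral(f, sample) - exact)
    for name, (f, exact) in test_functions.items()
}
for name, gap in gaps.items():
    print(f"{name:12s} |empirical - exact| = {gap:.5f}")
```

All gaps should be small simultaneously, which is the kind of uniformity over a countable dense family that the proof exploits.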


    Section 18
    Convergence and uniform tightness.

    In this section, we will make several connections between convergence of measures and uniform tightness on general metric spaces, which are similar to the results in the Euclidean setting. First, we will show that, in some sense, uniform tightness is necessary for convergence of laws.

    Theorem 43 If $P_n \to P_0$ on $S$ and each $P_n$ is tight for $n \ge 0$, then $(P_n)_{n \ge 0}$ is uniformly tight.

    Proof. Since $P_n \to P_0$ and $P_0$ is tight, by Theorem 41, the Levy-Prohorov metric $\rho(P_n, P_0) \to 0$. Given $\varepsilon > 0$, let us take a compact $K$ such that $P_0(K) > 1 - \varepsilon$. By definition of $\rho$,

    $$a(n) = \inf\{\delta > 0 : P_n(K^\delta) > 1 - \varepsilon\} \to 0.$$

    By regularity of measure $P_n$, any measurable set $A$ can be approximated by its closed subset $F$. Since $P_n$ is tight, we can choose a compact of measure close to one, and intersecting it with the closed subset $F$, we can approximate any set $A$ by its compact subset. Therefore, there exists a compact $K_n \subseteq K^{2a(n)}$ such that $P_n(K_n) > 1 - \varepsilon$. Let

    $$L = K \cup \Big(\bigcup_{n \ge 1} K_n\Big).$$

    Then $P_n(L) \ge P_n(K_n) > 1 - \varepsilon$. It remains to show that $L$ is compact. Consider a sequence $(x_n)$ on $L$. There are two possibilities. First, if there exists an infinite subsequence $(x_{n(k)})$ that belongs to one of the compacts $K_j$ then it has a converging subsubsequence in $K_j$ and, as a result, in $L$. If not, then there exists a subsequence $(x_{n(k)})$ such that $x_{n(k)} \in K_{m(k)}$ and $m(k) \to \infty$ as $k \to \infty$. Since

    $$K_{m(k)} \subseteq K^{2a(m(k))}$$

    there exists $y_k \in K$ such that

    $$d(x_{n(k)}, y_k) \le 2a(m(k)).$$

    Since $K$ is compact, the sequence $y_k \in K$ has a converging subsequence $y_{k(r)} \to y \in K$, which implies that $d(x_{n(k(r))}, y) \to 0$, i.e. $x_{n(k(r))} \to y \in L$. Therefore, $L$ is compact.

    We already know from the Selection Theorem in Section 8 that any uniformly tight sequence of laws on any metric space has a converging subsequence. Under additional assumptions on $(S, d)$ we can complement the Selection Theorem and make some connections to the metrics defined in the previous section.

    Theorem 44 Let $(S, d)$ be a complete separable metric space and $A$ be a subset of probability laws on $S$. Then the following are equivalent.


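    1. $A$ is uniformly tight.
    2. For any sequence $P_n \in A$ there exists a converging subsequence $P_{n(k)} \to P$ where $P$ is a law on $S$.
    3. $A$ has the compact closure on the space of probability laws equipped with the Levy-Prohorov or bounded Lipschitz metrics $\rho$ or $\beta$.
    4. $A$ is totally bounded with respect to $\rho$ or $\beta$.

    Remark. Implications 1 $\Longrightarrow$ 2 $\Longrightarrow$ 3 $\Longrightarrow$ 4 hold without the completeness assumption and the only implication where completeness will be used is 4 $\Longrightarrow$ 1.

    Proof. 1 $\Longrightarrow$ 2. Any sequence $P_n \in A$ is uniformly tight and, by the selection theorem, there exists a converging subsequence.

    2 $\Longrightarrow$ 3. Since $(S, d)$ is separable, by Theorem 41, $P_n \to P$ if and only if $\rho(P_n, P)$ or $\beta(P_n, P) \to 0$. Every sequence in the closure $\bar{A}$ can be approximated by a sequence in $A$. That sequence has a converging subsequence that, obviously, converges to an element in $\bar{A}$, which means that the closure of $A$ is compact.

    3 $\Longrightarrow$ 4. Compact sets are totally bounded and, therefore, if the closure $\bar{A}$ is compact, the set $A$ is totally bounded.

    4 $\Longrightarrow$ 1. Since $\rho \le 2\sqrt{\beta}$, we will only deal with $\rho$. For any $\varepsilon > 0$, there exists a finite subset $B \subseteq A$ such that $A \subseteq B^\varepsilon$. Since $(S, d)$ is complete and separable, by Ulam's theorem, for each $P \in B$ there exists a compact $K_P$ such that $P(K_P) > 1 - \varepsilon$. Therefore,

    $$K_B = \bigcup_{P \in B} K_P \text{ is a compact and } P(K_B) > 1 - \varepsilon \text{ for all } P \in B.$$

    For any $\varepsilon > 0$, let $F$ be a finite set such that $K_B \subseteq F^\varepsilon$ (here we will denote by $F^\varepsilon$ the closed $\varepsilon$-neighborhood of $F$). Since $A \subseteq B^\varepsilon$, for any $Q \in A$ there exists $P \in B$ such that $\rho(Q, P) < \varepsilon$ and, therefore,

    $$1 - \varepsilon \le P(K_B) \le P(F^\varepsilon) \le Q(F^{2\varepsilon}) + \varepsilon.$$

    Thus, $1 - 2\varepsilon \le Q(F^{2\varepsilon})$ for all $Q \in A$. Given $\delta > 0$, take $\varepsilon_m = \delta / 2^{m+1}$ and find $F_m$ as above, i.e.

    $$1 - \frac{\delta}{2^m} \le Q\big(F_m^{\delta/2^m}\big).$$

    Then $Q\big(\bigcap_{m \ge 1} F_m^{\delta/2^m}\big) \ge 1 - \sum_{m \ge 1} \frac{\delta}{2^m} = 1 - \delta$. Finally, $L = \bigcap_{m \ge 1} F_m^{\delta/2^m}$ is compact because it is closed and totally bounded by construction, and $S$ is complete.

    Corollary 5 (Prohorov) The set of laws on a complete separable metric space is complete with respect to metrics $\rho$ or $\beta$.

    Proof. If a sequence of laws is Cauchy w.r.t. $\rho$ or $\beta$ then it is totally bounded and by the previous theorem it has a converging subsequence. Obviously, the Cauchy sequence will converge to the same limit.

    Finally, let us state as a result the idea which appeared in Lemma 19 in Section 9.

    Lemma 36 Suppose that $(P_n)$ is uniformly tight on a metric space $(S, d)$. Suppose that all converging subsequences $(P_{n(k)})$ converge to the same limit, i.e. if $P_{n(k)} \to P_0$ then $P_0$ is independent of $(n(k))$. Then $P_n \to P_0$.

    Proof. Any subsequence $(P_{n(k)})$ is uniformly tight and, by the selection theorem, it has a converging subsubsequence $(P_{n(k(r))})$ which has to converge to $P_0$. Lemma 13 in Section 8 finishes the proof.

    This will be very useful when proving convergence of laws on metric spaces, such as $C([0,1])$, for example. If we can prove that $(P_n)$ is uniformly tight and, assuming that a subsequence converges, can identify the unique limit, then the sequence $P_n$ must converge to the same limit.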


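    Section 19
    Strassen's Theorem. Relationships between metrics.

    Metric for convergence in probability. Let $(\Omega, \mathcal{B}, P)$ be a probability space, $(S, d)$ a metric space and $X, Y : \Omega \to S$ random variables with values in $S$. The quantity

    $$\alpha(X, Y) = \inf\{\varepsilon \ge 0 : P(d(X, Y) > \varepsilon) \le \varepsilon\}$$

    is called the Ky Fan metric on the set $L^0(\Omega, S)$ of classes of equivalences of such random variables, where two r.v.s are equivalent if they are equal a.s. If we take a sequence

    $$\varepsilon_k \downarrow \varepsilon = \alpha(X, Y)$$

    then $P(d(X, Y) > \varepsilon_k) \le \varepsilon_k$ and since

    $$I(d(X, Y) > \varepsilon_k) \uparrow I(d(X, Y) > \varepsilon),$$

    by the monotone convergence theorem, $P(d(X, Y) > \varepsilon) \le \varepsilon$. Thus, the infimum in the definition of $\alpha(X, Y)$ is attained.

    Lemma 37 $\alpha$ is a metric on $L^0(\Omega, S)$ which metrizes convergence in probability.

    Proof. First of all, clearly, $\alpha(X, Y) = 0$ iff $X = Y$ almost surely. To prove the triangle inequality,

    $$P\big(d(X, Z) > \alpha(X, Y) + \alpha(Y, Z)\big) \le P\big(d(X, Y) > \alpha(X, Y)\big) + P\big(d(Y, Z) > \alpha(Y, Z)\big) \le \alpha(X, Y) + \alpha(Y, Z)$$

    so that $\alpha(X, Z) \le \alpha(X, Y) + \alpha(Y, Z)$. This proves that $\alpha$ is a metric. Next, if $\alpha_n = \alpha(X_n, X) \to 0$ then for any $\varepsilon > 0$ and large enough $n$ such that $\alpha_n < \varepsilon$,

    $$P(d(X_n, X) > \varepsilon) \le P(d(X_n, X) > \alpha_n) \le \alpha_n \to 0.$$

    Conversely, if $X_n \to X$ in probability then for any $m \ge 1$ and large enough $n \ge n(m)$,

    $$P\Big(d(X_n, X) > \frac{1}{m}\Big) \le \frac{1}{m}$$

    which means that $\alpha_n \le 1/m$ so that $\alpha_n \to 0$.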
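When $d(X, Y)$ takes finitely many values, the infimum in the definition of the Ky Fan metric can be computed exactly. A minimal sketch with hypothetical names, not part of the notes: the input is the law of $d(X, Y)$ as a list of (value, probability) pairs, assumed to sum to one. It uses that $\varepsilon \to P(d(X, Y) > \varepsilon)$ is constant between consecutive values of the distance, so on each such interval the smallest admissible $\varepsilon$ is the larger of the left endpoint and the tail probability.

```python
def ky_fan(distance_law):
    """Return inf{eps >= 0 : P(d(X, Y) > eps) <= eps}.

    distance_law: list of (value of d(X, Y), probability) pairs summing to 1.
    """
    breakpoints = sorted({0.0} | {d for d, _ in distance_law})
    right_ends = breakpoints[1:] + [float("inf")]
    for lo, hi in zip(breakpoints, right_ends):
        # P(d(X, Y) > eps) for every eps in [lo, hi).
        tail = sum(p for d, p in distance_law if d > lo)
        candidate = max(lo, tail)
        if candidate < hi:
            return candidate
    # Not reached: on the last interval the tail is 0 and hi is infinite.


rarely_far = [(0.0, 0.9), (1.0, 0.1)]   # X = Y with probability 0.9
always_close = [(0.05, 1.0)]            # d(X, Y) = 0.05 almost surely
always_far = [(2.0, 1.0)]               # d(X, Y) = 2 almost surely

print(ky_fan(rarely_far), ky_fan(always_close), ky_fan(always_far))
```

The three examples show the two ways of being close in this metric (a small distance, or a large distance with small probability) and that the metric never exceeds 1.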

    Lemma 38 For $X, Y \in L^0(\Omega, S)$, the Levy-Prohorov metric $\rho$ satisfies

    $$\rho(\mathcal{L}(X), \mathcal{L}(Y)) \le \alpha(X, Y).$$


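    Proof. Take $\varepsilon > \alpha(X, Y)$ so that $P(d(X, Y) \ge \varepsilon) \le \varepsilon$. For any set $A \subseteq S$,

    $$P(X \in A) = P(X \in A, d(X, Y) < \varepsilon) + P(X \in A, d(X, Y) \ge \varepsilon) \le P(Y \in A^\varepsilon) + \varepsilon$$

    which means that $\rho(\mathcal{L}(X), \mathcal{L}(Y)) \le \varepsilon$. Letting $\varepsilon \downarrow \alpha(X, Y)$ proves the result.

    We will now prove that, in some sense, the opposite is also true. Let $(S, d)$ be a metric space and $P, Q$ be probability laws on $S$. Suppose that these laws are close in the Levy-Prohorov metric $\rho$. Can we construct random variables $s_1$ and $s_2$, with laws $P$ and $Q$, that are defined on the same probability space and are close to each other in the Ky Fan metric? We will construct a distribution on the product space $S \times S$ such that the coordinates $s_1$ and $s_2$ have marginal distributions $P$ and $Q$ and the distribution is concentrated in the neighborhood of the diagonal $s_1 = s_2$, where $s_1$ and $s_2$ are close in metric $d$, and the size of the neighborhood is controlled by $\rho(P, Q)$.

    Consider two sets $X$ and $Y$. Given a subset $K \subseteq X \times Y$ and $A \subseteq X$ we define a $K$-image of $A$ by

    $$A^K = \{y \in Y : \exists x \in A, (x, y) \in K\}.$$

    A $K$-matching $f$ of $X$ into $Y$ is a one-to-one function $f : X \to Y$ such that $(x, f(x)) \in K$. We will need the following well known matching theorem.

    Theorem 45 If $X, Y$ are finite and for all $A \subseteq X$,

    $$\mathrm{card}(A^K) \ge \mathrm{card}(A) \quad (19.0.1)$$

    then there exists a $K$-matching $f$ of $X$ into $Y$.

    Proof. We will prove the result by induction on $m = \mathrm{card}(X)$. The case of $m = 1$ is obvious. For each $x \in X$ there exists $y \in Y$ such that $(x, y) \in K$. If there is a matching $f$ of $X \setminus \{x\}$ into $Y \setminus \{y\}$ then defining $f(x) = y$ extends $f$ to $X$. If not, then since $\mathrm{card}(X \setminus \{x\})$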
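Condition (19.0.1) of Theorem 45 and the existence of a $K$-matching can both be checked directly on small examples. The sketch below is an illustration with hypothetical names and data. It does not follow the induction in the proof: the matching is found with the standard augmenting-path search for bipartite matchings, and the condition is verified by enumerating all subsets, which is only feasible for very small $X$.

```python
from itertools import combinations


def k_image(A, K):
    """A^K = {y : there is x in A with (x, y) in K}."""
    return {y for (x, y) in K if x in A}


def matching_condition(X, K):
    """Check card(A^K) >= card(A) for every nonempty subset A of X (brute force)."""
    return all(
        len(k_image(set(A), K)) >= len(A)
        for size in range(1, len(X) + 1)
        for A in combinations(X, size)
    )


def k_matching(X, K):
    """Return a one-to-one map f with (x, f(x)) in K as a dict, or None if none exists."""
    partner = {}  # y -> x currently matched to y

    def try_assign(x, seen):
        for (u, y) in K:
            if u == x and y not in seen:
                seen.add(y)
                # Use y if it is free, or if its current partner can move elsewhere.
                if y not in partner or try_assign(partner[y], seen):
                    partner[y] = x
                    return True
        return False

    for x in X:
        if not try_assign(x, set()):
            return None
    return {x: y for y, x in partner.items()}


X = ["a", "b", "c"]
K = [("a", 1), ("a", 2), ("b", 1), ("c", 2), ("c", 3)]
f = k_matching(X, K)
print(matching_condition(X, K), f)

K_bad = [("a", 1), ("b", 1)]  # both points can only be sent to y = 1
print(matching_condition(["a", "b"], K_bad), k_matching(["a", "b"], K_bad))
```

In the first example the condition holds and a matching is found; in the second, the set $A = \{a, b\}$ has a one-point $K$-image, the condition fails and no matching exists.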


    Remark. Condition (19.0.2) is a relaxation of the definition of the Levy-Prohorov metric, one can take any $\alpha, \beta > \rho(P, Q)$. Conditions 1 - 3 mean that we can construct a measure on $S \times S$ such that coordinates $x, y$ have marginal distributions $P, Q$, concentrated within distance $\alpha + \varepsilon$ of each other (condition 2) except for the set of measure at most $\beta + \varepsilon$ (condition 3).

    Proof. The proof will proceed in several steps.

    Case A. We will start with the simplest case which is, however, at the core of everything else. Given small $\varepsilon > 0$, take $n \ge 1$ such that $n\varepsilon > 1$. Suppose that laws $P, Q$ are uniform on finite subsets $M, N \subseteq S$ of equal cardinality,

    $$\mathrm{card}(M) = \mathrm{card}(N) = n, \quad P(x) = Q(y) = \frac{1}{n} < \varepsilon, \quad x \in M,\ y \in N.$$

    Using condition (19.0.2), we would like to match as many points from $M$ and $N$ as possible, but only points that are within distance $\alpha$ from each other. To use the matching theorem, we will introduce some auxiliary sets $U$ and $V$ that are not too big, with size controlled by parameter $\beta$, and the union of these sets with $M$ and $N$ satisfies a certain matching condition.

    Take integer $k$ such that

    n

    k ) = 0, (SS)

    n n < +. (19.0.3)


    Finally, both measures are finite sums of point masses which are product measures of point masses.

    Case B. Suppose now that $P$ and $Q$ are concentrated on finitely many poin