Two-Moment Inequalities for Rényi Entropy and Mutual Information
Galen Reeves
Department of ECE and Department of Statistical Science, Duke University
ISIT, June 2017
Table of Contents

Motivation

Inequalities
  A 'half Jensen' inequality
  A 'two-moment' inequality

Rényi Entropy Bounds

Mutual Information Bounds
  Mutual information and variance of conditional density
  Properties of the bounds

Conclusion
How can we show mutual information is small?

1. Use $I(X;Y) = H(X) - H(X \mid Y)$.

2. Jensen's inequality and Rényi divergence:
$$I(X;Y) = D_1(P_{X,Y} \,\|\, P_X P_Y) \le D_\alpha(P_{X,Y} \,\|\, P_X P_Y), \qquad \alpha > 1.$$

3. A 'half Jensen' inequality and a 'two-moment' inequality (today's talk).
Motivation: Conditional CLT for random projections

Consider $Y = AX + \sqrt{t}\, N$ where $A$ is an i.i.d. Gaussian random matrix and $N$ is a Gaussian perturbation. Let $G_Y$ be the Gaussian distribution with the same mean and covariance as $Y$.

$$P_Y \approx G_Y \quad \text{(CLT)} \qquad\qquad P_{Y \mid A}(\cdot \mid A) \approx G_Y \quad \text{(conditional CLT)}$$

To prove entropic bounds (see Friday's talk), we use

$$\{I(A;Y) \approx 0\} \;\text{and CLT} \iff \text{conditional CLT}.$$

Challenges:
- Tight bounds on $h(Y)$ and $h(Y \mid A)$ are difficult.
- $p(Y \mid A)/p(Y)$ can increase without bound as $t \to 0$.
Jensen and Rényi divergence revisited

For random variables $(X,Y) \sim p(x,y)$,
$$I(X;Y) = \mathbb{E}[\log Z], \qquad Z = \frac{p(X,Y)}{p(X)\, p(Y)}.$$

Jensen's inequality:
$$I(X;Y) = \mathbb{E}[\log Z] = \frac{1}{t}\, \mathbb{E}\big[\log Z^t\big] \le \frac{1}{t} \log \mathbb{E}\big[Z^t\big].$$

This is the Rényi divergence of order $\alpha = 1 + t$.

This approach is problematic if $Z$ has heavy tails...
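A quick numerical sketch (my addition, not from the talk): for a bivariate Gaussian pair, both $I(X;Y)$ and the density ratio $Z$ have closed forms, so the Jensen bound can be checked by Monte Carlo.

```python
# Monte Carlo sketch (assumed example, not from the talk): for a bivariate
# Gaussian with correlation rho, check that (1/t) log E[Z^t] upper-bounds
# I(X;Y) = -0.5 * log(1 - rho^2).
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(0)
rho, t = 0.8, 0.5
cov = np.array([[1.0, rho], [rho, 1.0]])
xy = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)

joint = multivariate_normal([0.0, 0.0], cov).pdf(xy)
prod = norm.pdf(xy[:, 0]) * norm.pdf(xy[:, 1])
Z = joint / prod                        # density ratio p(X,Y) / (p(X) p(Y))

I_true = -0.5 * np.log(1 - rho ** 2)    # exact mutual information
I_mc = np.log(Z).mean()                 # E[log Z] = I(X;Y)
jensen = np.log((Z ** t).mean()) / t    # Renyi-divergence bound, order 1 + t
print(I_true, I_mc, jensen)             # jensen >= I_true
```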
A 'half Jensen' inequality [R. 2017]

Start with the 'half Jensen' inequality:
$$I(X;Y) = \mathbb{E}[\log Z] \le \mathbb{E}\big[\log\big(\mathbb{E}[Z \mid Y]\big)\big].$$

The conditional expectation of $Z$ can be expressed in terms of the variance of the conditional density:
$$\mathbb{E}[Z \mid Y = y] = \int \left[\frac{p(y \mid x)}{p(y)}\right]^2 p(x)\, dx = 1 + \frac{\mathrm{Var}(p(y \mid X))}{[p(y)]^2}.$$

For every $0 < t \le 1$, the inequality $\log(1+u) \le \frac{1}{t} u^t$ yields
$$I(X;Y) \le \frac{1}{t} \int [p(y)]^{1-2t}\, [\mathrm{Var}(p(y \mid X))]^t\, dy.$$
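A grid-based sanity check (assumed example, not from the talk): for $X$ uniform on $\{-1,+1\}$ and $Y \mid X \sim \mathcal{N}(X,1)$, both $I(X;Y)$ and the right-hand side reduce to one-dimensional integrals.

```python
# Sketch (assumed example): verify
#   I(X;Y) <= (1/t) \int p(y)^{1-2t} Var(p(y|X))^t dy
# for X uniform on {-1,+1} and Y | X ~ N(X, 1).
import numpy as np
from scipy.stats import norm

y = np.linspace(-10, 10, 20001)
dy = y[1] - y[0]
p_pos, p_neg = norm.pdf(y - 1), norm.pdf(y + 1)
p_y = 0.5 * (p_pos + p_neg)                 # marginal density of Y
var_cond = 0.25 * (p_pos - p_neg) ** 2      # Var(p(y|X)) over X

# I(X;Y) = h(Y) - h(Y|X), with h(Y|X) = 0.5 * log(2 pi e) for unit variance
h_Y = -np.sum(p_y * np.log(p_y)) * dy
I_true = h_Y - 0.5 * np.log(2 * np.pi * np.e)

for t in (0.25, 0.5, 1.0):
    bound = np.sum(p_y ** (1 - 2 * t) * var_cond ** t) * dy / t
    print(t, I_true, bound)                 # bound >= I_true for each t
```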
Special cases of the 'half Jensen' inequality

$$I(X;Y) \le \frac{1}{t} \int [p(y)]^{1-2t}\, [\mathrm{Var}(p(y \mid X))]^t\, dy$$

The case $t = 1$ gives a bound in terms of the chi-square divergence.

The case $t = 1/2$ gives a bound in terms of the standard deviation of the conditional density:
$$I(X;Y) \le 2 \int \sqrt{\mathrm{Var}(p(y \mid X))}\, dy.$$

The case $0 < t < 1/2$ combined with Hölder's inequality gives a bound in terms of the variance of the conditional density and the Rényi entropy of order $r = (1-2t)/(1-t)$:
$$I(X;Y) \le \frac{1}{t} \left[\exp(h_r(Y)) \int \mathrm{Var}(p(y \mid X))\, dy\right]^t.$$

These bounds depend on integrals of fractional powers.
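A sketch (same assumed mixture example as above, restated so it runs standalone) comparing the direct integral for $0 < t < 1/2$ with its Hölder relaxation in terms of $h_r(Y)$:

```python
# Sketch (assumed example): for 0 < t < 1/2 and r = (1-2t)/(1-t), check the
# Holder relaxation
#   (1/t) \int p^{1-2t} Var^t dy  <=  (1/t) [exp(h_r(Y)) \int Var dy]^t.
import numpy as np
from scipy.stats import norm

y = np.linspace(-10, 10, 20001)
dy = y[1] - y[0]
p_pos, p_neg = norm.pdf(y - 1), norm.pdf(y + 1)
p_y = 0.5 * (p_pos + p_neg)            # X uniform on {-1,+1}, Y|X ~ N(X,1)
var_cond = 0.25 * (p_pos - p_neg) ** 2

for t in (0.1, 0.25, 0.4):
    r = (1 - 2 * t) / (1 - t)
    h_r = np.log(np.sum(p_y ** r) * dy) / (1 - r)   # Renyi entropy of Y
    V0 = np.sum(var_cond) * dy                      # \int Var(p(y|X)) dy
    direct = np.sum(p_y ** (1 - 2 * t) * var_cond ** t) * dy / t
    relaxed = (np.exp(h_r) * V0) ** t / t
    print(t, direct, relaxed)          # direct <= relaxed
```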
A 'two-moment' inequality [R. 2017]

Proposition: For any numbers $0 < r < 1$ and $p < 1/r - 1 < q$ and non-negative function $f$ defined on $[0,\infty)$, we have
$$\underbrace{\left(\int |f(x)|^r\, dx\right)^{\frac{1}{r}}}_{\|f\|_r} \le C \Bigg(\underbrace{\int x^p f(x)\, dx}_{p\text{-th moment}}\Bigg)^{\lambda} \Bigg(\underbrace{\int x^q f(x)\, dx}_{q\text{-th moment}}\Bigg)^{1-\lambda}$$
where $\lambda = (q + 1 - 1/r)/(q - p)$.

The best possible constant is given by
$$C = \left[\frac{1}{q - p}\, \bar{B}\!\left(\frac{r\lambda}{1-r}, \frac{r(1-\lambda)}{1-r}\right)\right]^{\frac{1-r}{r}}$$
with $\bar{B}(a,b) = B(a,b)\, (a+b)^{a+b}\, a^{-a}\, b^{-b}$, where $B$ is the Beta function.
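A numerical sketch (assumed test case, not from the paper): for $f(x) = e^{-x}$ on $[0,\infty)$ with $r = 1/2$ and $(p,q) = (0,2)$, both sides have closed forms; the left side is $4$ and the right side is $\pi\sqrt{2} \approx 4.44$.

```python
# Sketch (assumed test case): check ||f||_r <= C * m_p^lam * m_q^(1-lam)
# for f(x) = exp(-x) on [0, inf), r = 1/2, (p, q) = (0, 2).
import numpy as np
from scipy.special import beta
from scipy.integrate import quad

r, p, q = 0.5, 0.0, 2.0                  # requires p < 1/r - 1 = 1 < q
lam = (q + 1 - 1 / r) / (q - p)          # = 0.5

def B_bar(a, b):                          # modified Beta function from the slide
    return beta(a, b) * (a + b) ** (a + b) * a ** (-a) * b ** (-b)

C = (B_bar(r * lam / (1 - r), r * (1 - lam) / (1 - r)) / (q - p)) ** ((1 - r) / r)

f = lambda x: np.exp(-x)
lhs = quad(lambda x: f(x) ** r, 0, np.inf)[0] ** (1 / r)     # ||f||_r = 4
m_p = quad(lambda x: x ** p * f(x), 0, np.inf)[0]            # Gamma(p + 1)
m_q = quad(lambda x: x ** q * f(x), 0, np.inf)[0]            # Gamma(q + 1)
print(lhs, C * m_p ** lam * m_q ** (1 - lam))                # 4.0 <= ~4.44
```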
A 'two-moment' inequality [R. 2017]

Proposition: For any numbers $0 < r < 1$ and $p < 1/r - 1 < q$ and non-negative function $f$ defined on $S \subseteq \mathbb{R}^n$, we have
$$\underbrace{\left(\int |f(x)|^r\, dx\right)^{\frac{1}{r}}}_{\|f\|_r} \le C \Bigg(\underbrace{\int \|x\|^{np} f(x)\, dx}_{np\text{-th moment}}\Bigg)^{\lambda} \Bigg(\underbrace{\int \|x\|^{nq} f(x)\, dx}_{nq\text{-th moment}}\Bigg)^{1-\lambda}$$
where $\lambda = (q + 1 - 1/r)/(q - p)$.

The best possible constant is given by
$$C = \left[\frac{\mathrm{Vol}(B_n \cap \mathrm{cone}(S))}{q - p}\, \bar{B}\!\left(\frac{r\lambda}{1-r}, \frac{r(1-\lambda)}{1-r}\right)\right]^{\frac{1-r}{r}}$$
with $\bar{B}(a,b) = B(a,b)\, (a+b)^{a+b}\, a^{-a}\, b^{-b}$.
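The same check in dimension $n = 2$ (assumed test case): for the standard bivariate Gaussian on $S = \mathbb{R}^2$, $\mathrm{cone}(S) = \mathbb{R}^2$, so $\omega = \mathrm{Vol}(B_2) = \pi$, and all the integrals have closed forms.

```python
# Sketch (assumed test case): the R^n version with n = 2, S = R^2, f the
# standard bivariate Gaussian density, r = 1/2, (p, q) = (0, 2).
import numpy as np
from scipy.special import beta

n, r, p, q = 2, 0.5, 0.0, 2.0
lam = (q + 1 - 1 / r) / (q - p)                   # = 0.5
a, b = r * lam / (1 - r), r * (1 - lam) / (1 - r)
B_bar = beta(a, b) * (a + b) ** (a + b) * a ** (-a) * b ** (-b)
omega = np.pi                                     # Vol(B_2 cap cone(R^2))
C = (omega * B_bar / (q - p)) ** ((1 - r) / r)

lhs = 8 * np.pi    # ||f||_{1/2} = (\int f^{1/2} dx)^2 for the Gaussian
m_np = 1.0         # \int ||x||^{n*p} f dx = 1 since p = 0
m_nq = 8.0         # \int ||x||^{n*q} f dx = E||X||^4, with ||X||^2 ~ chi^2_2
print(lhs, C * m_np ** lam * m_nq ** (1 - lam))   # ~25.13 <= ~27.93
```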
Remarks on the 'two-moment' inequality

The proof is a straightforward consequence of Hölder's inequality and an integral representation of the Beta function.

For $r = 1/2$, Euler's reflection formula for the Beta function leads to the simplified expression
$$C = \frac{\pi\, \lambda^{-\lambda} (1-\lambda)^{-(1-\lambda)}}{(q - p)\, \sin(\pi \lambda)}.$$
It is possible that variations of these inequalities exist in theliterature on weighted Lp-norm inequalities. So far, I have beenunable to find an explicit reference.
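A quick consistency check (my sketch) that the $r = 1/2$ simplification agrees with the general Beta-function constant:

```python
# Sketch: numerical check that for r = 1/2 the general constant reduces to
#   C = pi * lam^(-lam) * (1-lam)^(-(1-lam)) / ((q - p) * sin(pi * lam)).
import numpy as np
from scipy.special import beta

def C_general(r, p, q):
    lam = (q + 1 - 1 / r) / (q - p)
    a, b = r * lam / (1 - r), r * (1 - lam) / (1 - r)
    B_bar = beta(a, b) * (a + b) ** (a + b) * a ** (-a) * b ** (-b)
    return (B_bar / (q - p)) ** ((1 - r) / r)

def C_half(p, q):                       # r = 1/2 via Euler's reflection formula
    lam = (q - 1) / (q - p)
    return (np.pi * lam ** (-lam) * (1 - lam) ** (-(1 - lam))
            / ((q - p) * np.sin(np.pi * lam)))

for p, q in [(0.0, 2.0), (0.5, 3.0), (0.9, 1.5)]:
    print(C_general(0.5, p, q), C_half(p, q))     # the two columns agree
```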
Rényi entropy

Let $X$ have density $p(x)$ on $S \subseteq \mathbb{R}^n$. The Rényi entropy of order $r \in (0,1) \cup (1,\infty)$ is defined according to
$$h_r(X) = \frac{1}{1-r} \log\left(\int_S |p(x)|^r\, dx\right).$$

Properties:
- Decreasing in $r$
- Limit as $r \to 0$ depends on the volume of the support
- Limit as $r \to 1$ is the Shannon entropy
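A grid sketch (my addition) of these properties for a standard Gaussian, whose Shannon entropy is $\frac{1}{2}\log(2\pi e) \approx 1.4189$:

```python
# Sketch: Renyi entropy of N(0,1) on a grid; decreasing in r, and tending
# to the Shannon entropy 0.5 * log(2 pi e) ~ 1.4189 as r -> 1.
import numpy as np
from scipy.stats import norm

x = np.linspace(-12, 12, 40001)
dx = x[1] - x[0]
p = norm.pdf(x)

def h_r(r):
    return np.log(np.sum(p ** r) * dx) / (1 - r)

for r in (0.2, 0.5, 0.9, 0.99, 1.01, 2.0):
    print(r, h_r(r))   # values decrease in r and approach 1.4189 near r = 1
```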
Upper bound for Rényi entropy [R. 2017]

Proposition: For any numbers $0 < r < 1$ and $p < 1/r - 1 < q$ and density function $p(x)$ on $S \subseteq \mathbb{R}^n$, we have
$$h_r(X) \le \log C + \frac{r\lambda}{1-r} \log \mathbb{E}[\|X\|^{np}] + \frac{r(1-\lambda)}{1-r} \log \mathbb{E}[\|X\|^{nq}]$$
with $\lambda = (q + 1 - 1/r)/(q - p)$.

Relationship to existing results:
- One-moment bound: Evaluating with $p = 0$ recovers the entropy-moment inequalities of Costa et al. 2002 and Lutwak et al. 2002. The bound is attained by a maximum entropy distribution.
- Two-moment bound: Evaluating with two carefully chosen moments $(p, q)$ can lead to significant improvements, as illustrated in the sketch below.
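A sketch (assumed test case): the half-normal density on $[0,\infty)$, where I evaluate the bound in the equivalent form $h_r(X) \le \frac{r}{1-r}\big[\log C + \lambda \log\mathbb{E}[X^p] + (1-\lambda)\log\mathbb{E}[X^q]\big]$, which follows directly from the two-moment inequality two slides back (the slide's constant may be normalized differently).

```python
# Sketch (assumed test case): two-moment Renyi entropy bound for the
# half-normal density on [0, inf), compared with the exact h_r.
import numpy as np
from scipy.special import beta
from scipy.integrate import quad

r, p, q = 0.5, 0.5, 2.0                  # requires p < 1/r - 1 = 1 < q
lam = (q + 1 - 1 / r) / (q - p)          # = 2/3

a, b = r * lam / (1 - r), r * (1 - lam) / (1 - r)
B_bar = beta(a, b) * (a + b) ** (a + b) * a ** (-a) * b ** (-b)
C = (B_bar / (q - p)) ** ((1 - r) / r)

dens = lambda x: np.sqrt(2 / np.pi) * np.exp(-x ** 2 / 2)    # half-normal
h_exact = np.log(quad(lambda x: dens(x) ** r, 0, np.inf)[0]) / (1 - r)
m_p = quad(lambda x: x ** p * dens(x), 0, np.inf)[0]         # E[X^p]
m_q = quad(lambda x: x ** q * dens(x), 0, np.inf)[0]         # E[X^q] = 1
h_bound = (r / (1 - r)) * (np.log(C) + lam * np.log(m_p) + (1 - lam) * np.log(m_q))
print(h_exact, h_bound)                  # ~0.92 <= ~1.39
```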
Example: Lognormal distribution

[Figure: gap between the upper bound and the Rényi entropy as a function of $r \in (0,1)$, for the optimal two-moment and optimal one-moment inequalities, at $\sigma^2 = 0.1, 1, 10$.]
Example: Multivariate Gaussian distribution

[Figure: gap between the upper bound and the Rényi entropy ($r = 0.2$) as a function of the dimension $n$, for the optimal two-moment and optimal one-moment inequalities; the large-$n$ limit of the gap equals the gap for the lognormal distribution.]
Variance of conditional density

Let $(X,Y)$ be a random pair such that the conditional distribution of $Y$ given $X$ has density $p(y \mid x)$ on $\mathbb{R}^n$.

The variance of the conditional density is defined by
$$\mathrm{Var}(p(y \mid X)) = \mathbb{E}\big[|p(y \mid X) - p(y)|^2\big], \qquad X \sim P_X.$$

Define the $s$-th moment of the variance:
$$V_s(Y \mid X) = \int \|y\|^s\, \mathrm{Var}(p(y \mid X))\, dy.$$

Note that $V_s(Y \mid X)$ is nonnegative, and equal to zero if and only if $X$ and $Y$ are independent.
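A sketch (assumed example) of this quantity and its independence property: for $X$ uniform on $\{-\epsilon,+\epsilon\}$ and $Y \mid X \sim \mathcal{N}(X,1)$, $V_s(Y \mid X)$ vanishes at $\epsilon = 0$, where $X$ and $Y$ are independent.

```python
# Sketch (assumed example): V_s(Y|X) = \int |y|^s Var(p(y|X)) dy for
# X uniform on {-eps, +eps} and Y | X ~ N(X, 1).
import numpy as np
from scipy.stats import norm

y = np.linspace(-12, 12, 40001)
dy = y[1] - y[0]

def V_s(s, eps):
    p_pos, p_neg = norm.pdf(y - eps), norm.pdf(y + eps)
    var_cond = 0.25 * (p_pos - p_neg) ** 2     # Var(p(y|X)) over X
    return np.sum(np.abs(y) ** s * var_cond) * dy

for eps in (0.0, 0.1, 0.5, 1.0):
    print(eps, V_s(0.0, eps), V_s(2.0, eps))   # zero at eps = 0, then growing
```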
A mutual information bound [R. 2017]

Combining the 'half Jensen' and 'two-moment' inequalities yields:
$$I(X;Y) \le C_\lambda \sqrt{\frac{\omega(S)\, V_{np}^{\lambda}(Y \mid X)\, V_{nq}^{1-\lambda}(Y \mid X)}{q - p}}, \qquad p < 1 < q,$$
where $\lambda = (q - 1)/(q - p)$ and $\omega(S) = \mathrm{Vol}(B_n \cap \mathrm{cone}(S))$.
Useful properties of $V_s(Y \mid X)$

The $s$-th moment of the variance can be expressed as
$$V_s(Y \mid X) = \mathbb{E}\big[K_s(X,X) - K_s(X_1, X_2)\big],$$
where $X_1$ and $X_2$ are independent copies of $X$ and
$$K_s(x_1, x_2) = \int \|y\|^s\, p(y \mid x_1)\, p(y \mid x_2)\, dy$$
is a positive-definite kernel that does not depend on the distribution of $X$.

If $U \to X \to Y$ forms a Markov chain, then
$$V_s(Y \mid U) = \mathbb{E}\big[K_s(X_1', X_2') - K_s(X_1, X_2)\big],$$
where $X_1'$ and $X_2'$ are conditionally independent given $U$.
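A sketch (assumed Gaussian-channel example) checking the kernel identity for $s = 0$, where $K_0(x_1,x_2) = \int \phi(y-x_1)\phi(y-x_2)\, dy$ is the $\mathcal{N}(0,2)$ density evaluated at $x_1 - x_2$:

```python
# Sketch (assumed example): verify V_s(Y|X) = E[K_s(X,X)] - E[K_s(X1,X2)]
# for s = 0, Y | X ~ N(X, 1), X uniform on {-1,+1}; here K_0(x1, x2) is
# the N(0,2) density at x1 - x2 (a Gaussian convolution).
import numpy as np
from scipy.stats import norm

K0 = lambda x1, x2: norm.pdf(x1 - x2, scale=np.sqrt(2))

xs = np.array([-1.0, 1.0])                        # support of X (uniform)
EK_diag = np.mean([K0(x, x) for x in xs])         # E[K_0(X, X)]
EK_indep = np.mean([K0(a, b) for a in xs for b in xs])  # E[K_0(X1, X2)]

# direct grid computation of V_0(Y|X) = \int Var(p(y|X)) dy
y = np.linspace(-12, 12, 40001)
dy = y[1] - y[0]
p_pos, p_neg = norm.pdf(y - 1), norm.pdf(y + 1)
V0_grid = np.sum(0.25 * (p_pos - p_neg) ** 2) * dy
print(EK_diag - EK_indep, V0_grid)                # the two values agree
```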
Illustrative example

Consider the Markov chain $U \to X \to Y$ given by
$$X \mid U \sim \mathcal{N}(0, U), \qquad Y \mid X \sim \mathcal{N}(X, 1).$$

Then
$$K_s(x_1, x_2) = 2^{-\frac{1+s}{2}}\, \mathbb{E}\!\left[\left|\mathcal{N}(0,1) + \frac{x_1 + x_2}{\sqrt{2}}\right|^s\right] \phi\!\left(\frac{x_1 - x_2}{\sqrt{2}}\right),$$
and
$$V_s(Y \mid U) = \frac{\Gamma\!\left(\frac{1+s}{2}\right)}{2\pi}\, \mathbb{E}\!\left[(1+U)^{\frac{s-1}{2}} - \frac{(1+U_1)^{\frac{s}{2}}\, (1+U_2)^{\frac{s}{2}}}{\left(1 + \frac{1}{2}(U_1 + U_2)\right)^{\frac{s+1}{2}}}\right],$$
where $(U_1, U_2)$ are independent copies of $U$.
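A sketch (with an assumed two-point distribution for $U$, since the talk's choice is tied to the figure below) checking the closed form against direct integration; here $Y \mid U = u \sim \mathcal{N}(0, 1+u)$, so $\mathrm{Var}(p(y \mid U))$ is computable on a grid.

```python
# Sketch (assumed distribution for U): check the closed form for V_s(Y|U)
# against direct integration, with U uniform on {0.5, 1.5} and s = 2.
import numpy as np
from scipy.stats import norm
from scipy.special import gamma
from itertools import product

us = np.array([0.5, 1.5])                      # support of U (uniform)
s = 2.0

t1 = np.mean((1 + us) ** ((s - 1) / 2))        # E[(1 + U)^((s-1)/2)]
t2 = np.mean([(1 + u1) ** (s / 2) * (1 + u2) ** (s / 2)
              / (1 + 0.5 * (u1 + u2)) ** ((s + 1) / 2)
              for u1, u2 in product(us, us)])
closed = gamma((1 + s) / 2) / (2 * np.pi) * (t1 - t2)

# direct grid computation of V_s(Y|U) = \int |y|^s Var(p(y|U)) dy,
# using that Y | U = u is N(0, 1 + u)
y = np.linspace(-15, 15, 60001)
dy = y[1] - y[0]
pyu = np.array([norm.pdf(y, scale=np.sqrt(1 + u)) for u in us])
var_cond = np.mean(pyu ** 2, axis=0) - np.mean(pyu, axis=0) ** 2
direct = np.sum(np.abs(y) ** s * var_cond) * dy
print(closed, direct)                          # the two values agree
```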
Illustrative example

[Figure: $I(U;Y)$ as a function of $\epsilon$, compared with the new upper bound and the chi-square divergence bound.]
Conclusion

- Bounds on mutual information using a 'half Jensen' inequality and a 'two-moment' inequality.
- Two carefully chosen moments can lead to significant improvements for Rényi entropy.
- A new measure of dependence with interesting properties.
- One application is precise convergence rates for an entropic conditional central limit theorem [R. 2017].
References

G. Reeves, "Two-moment inequalities for Rényi entropy and mutual information," 2017. [Online]. Available: https://arxiv.org/abs/1702.07302

G. Reeves, "Two-moment inequalities for Rényi entropy and mutual information," in Proc. IEEE Int. Symp. Inform. Theory, Aachen, Germany, Jun. 2017.

G. Reeves, "Conditional central limit theorems for Gaussian projections," Dec. 2016. [Online]. Available: https://arxiv.org/abs/1612.09252

G. Reeves, "Conditional central limit theorems for Gaussian projections," in Proc. IEEE Int. Symp. Inform. Theory, Aachen, Germany, Jun. 2017.