Two-Moment Inequalities for Rényi Entropy and Mutual Information
Galen Reeves
Department of ECE and Department of Statistical Science, Duke University
ISIT, June 2017
Table of Contents

Motivation

Inequalities
  A 'half Jensen' inequality
  A 'two-moment' inequality

Rényi Entropy Bounds

Mutual Information Bounds
  Mutual information and variance of conditional density
  Properties of the bounds

Conclusion
How can we show mutual information is small?

1. Use $I(X;Y) = H(X) - H(X \mid Y)$.

2. Jensen's inequality and Rényi divergence:
$$I(X;Y) = D_1(P_{X,Y} \,\|\, P_X P_Y) \le D_\alpha(P_{X,Y} \,\|\, P_X P_Y), \qquad \alpha > 1.$$

3. A 'half Jensen' inequality and a 'two-moment' inequality (today's talk).
Motivation: Conditional CLT for random projections

Consider $Y = AX + \sqrt{t}\, N$ where $A$ is an i.i.d. Gaussian random matrix and $N$ is a Gaussian perturbation. Let $G_Y$ be the Gaussian distribution with the same mean and covariance as $Y$.

$$P_Y \approx G_Y \quad \text{(CLT)} \qquad\qquad P_{Y \mid A}(\cdot \mid A) \approx G_Y \quad \text{(conditional CLT)}$$

To prove entropic bounds (see Friday's talk), we use

$$\{I(A;Y) \approx 0\} \;\text{and CLT} \iff \text{conditional CLT}.$$

Challenges:
- Tight bounds on $h(Y)$ and $h(Y \mid A)$ are difficult.
- $p(Y \mid A)/p(Y)$ can increase without bound as $t \to 0$.
Jensen and Rényi divergence revisited

For random variables $(X,Y) \sim p(x,y)$,
$$I(X;Y) = \mathbb{E}[\log Z], \qquad Z = \frac{p(X,Y)}{p(X)\, p(Y)}.$$

Jensen's inequality:
$$I(X;Y) = \mathbb{E}[\log Z] = \frac{1}{t}\, \mathbb{E}\big[\log Z^t\big] \le \frac{1}{t} \log \mathbb{E}\big[Z^t\big].$$

This is the Rényi divergence of order $\alpha = 1 + t$.

This approach is problematic if $Z$ has heavy tails...
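A quick numerical sketch (my addition, not from the talk): for a bivariate Gaussian pair, both $I(X;Y)$ and the density ratio $Z$ have closed forms, so the Jensen bound can be checked by Monte Carlo.

```python
# Monte Carlo sketch (assumed example, not from the talk): for a bivariate
# Gaussian with correlation rho, check that (1/t) log E[Z^t] upper-bounds
# I(X;Y) = -0.5 * log(1 - rho^2).
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(0)
rho, t = 0.8, 0.5
cov = np.array([[1.0, rho], [rho, 1.0]])
xy = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)

joint = multivariate_normal([0.0, 0.0], cov).pdf(xy)
prod = norm.pdf(xy[:, 0]) * norm.pdf(xy[:, 1])
Z = joint / prod                        # density ratio p(X,Y) / (p(X) p(Y))

I_true = -0.5 * np.log(1 - rho ** 2)    # exact mutual information
I_mc = np.log(Z).mean()                 # E[log Z] = I(X;Y)
jensen = np.log((Z ** t).mean()) / t    # Renyi-divergence bound, order 1 + t
print(I_true, I_mc, jensen)             # jensen >= I_true
```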
A 'half Jensen' inequality [R. 2017]

Start with the 'half Jensen' inequality:
$$I(X;Y) = \mathbb{E}[\log Z] \le \mathbb{E}\big[\log\big(\mathbb{E}[Z \mid Y]\big)\big].$$

The conditional expectation of $Z$ can be expressed in terms of the variance of the conditional density:
$$\mathbb{E}[Z \mid Y = y] = \int \left[\frac{p(y \mid x)}{p(y)}\right]^2 p(x)\, dx = 1 + \frac{\mathrm{Var}(p(y \mid X))}{[p(y)]^2}.$$

For every $0 < t \le 1$, the inequality $\log(1+u) \le \frac{1}{t} u^t$ yields
$$I(X;Y) \le \frac{1}{t} \int [p(y)]^{1-2t}\, [\mathrm{Var}(p(y \mid X))]^t\, dy.$$
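A grid-based sanity check (assumed example, not from the talk): for $X$ uniform on $\{-1,+1\}$ and $Y \mid X \sim \mathcal{N}(X,1)$, both $I(X;Y)$ and the right-hand side reduce to one-dimensional integrals.

```python
# Sketch (assumed example): verify
#   I(X;Y) <= (1/t) \int p(y)^{1-2t} Var(p(y|X))^t dy
# for X uniform on {-1,+1} and Y | X ~ N(X, 1).
import numpy as np
from scipy.stats import norm

y = np.linspace(-10, 10, 20001)
dy = y[1] - y[0]
p_pos, p_neg = norm.pdf(y - 1), norm.pdf(y + 1)
p_y = 0.5 * (p_pos + p_neg)                 # marginal density of Y
var_cond = 0.25 * (p_pos - p_neg) ** 2      # Var(p(y|X)) over X

# I(X;Y) = h(Y) - h(Y|X), with h(Y|X) = 0.5 * log(2 pi e) for unit variance
h_Y = -np.sum(p_y * np.log(p_y)) * dy
I_true = h_Y - 0.5 * np.log(2 * np.pi * np.e)

for t in (0.25, 0.5, 1.0):
    bound = np.sum(p_y ** (1 - 2 * t) * var_cond ** t) * dy / t
    print(t, I_true, bound)                 # bound >= I_true for each t
```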
Special cases of the 'half Jensen' inequality

$$I(X;Y) \le \frac{1}{t} \int [p(y)]^{1-2t}\, [\mathrm{Var}(p(y \mid X))]^t\, dy$$

The case $t = 1$ gives a bound in terms of the chi-square divergence.

The case $t = 1/2$ gives a bound in terms of the standard deviation of the conditional density:
$$I(X;Y) \le 2 \int \sqrt{\mathrm{Var}(p(y \mid X))}\, dy.$$

The case $0 < t < 1/2$ combined with Hölder's inequality gives a bound in terms of the variance of the conditional density and the Rényi entropy of order $r = (1-2t)/(1-t)$:
$$I(X;Y) \le \frac{1}{t} \left[\exp(h_r(Y)) \int \mathrm{Var}(p(y \mid X))\, dy\right]^t.$$

These bounds depend on integrals of fractional powers.
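A sketch (same assumed mixture example as above, restated so it runs standalone) comparing the direct integral for $0 < t < 1/2$ with its Hölder relaxation in terms of $h_r(Y)$:

```python
# Sketch (assumed example): for 0 < t < 1/2 and r = (1-2t)/(1-t), check the
# Holder relaxation
#   (1/t) \int p^{1-2t} Var^t dy  <=  (1/t) [exp(h_r(Y)) \int Var dy]^t.
import numpy as np
from scipy.stats import norm

y = np.linspace(-10, 10, 20001)
dy = y[1] - y[0]
p_pos, p_neg = norm.pdf(y - 1), norm.pdf(y + 1)
p_y = 0.5 * (p_pos + p_neg)            # X uniform on {-1,+1}, Y|X ~ N(X,1)
var_cond = 0.25 * (p_pos - p_neg) ** 2

for t in (0.1, 0.25, 0.4):
    r = (1 - 2 * t) / (1 - t)
    h_r = np.log(np.sum(p_y ** r) * dy) / (1 - r)   # Renyi entropy of Y
    V0 = np.sum(var_cond) * dy                      # \int Var(p(y|X)) dy
    direct = np.sum(p_y ** (1 - 2 * t) * var_cond ** t) * dy / t
    relaxed = (np.exp(h_r) * V0) ** t / t
    print(t, direct, relaxed)          # direct <= relaxed
```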
A 'two-moment' inequality [R. 2017]

Proposition: For any numbers $0 < r < 1$ and $p < 1/r - 1 < q$ and non-negative function $f$ defined on $[0,\infty)$, we have
$$\underbrace{\left(\int |f(x)|^r\, dx\right)^{\frac{1}{r}}}_{\|f\|_r} \le C \Bigg(\underbrace{\int x^p f(x)\, dx}_{p\text{-th moment}}\Bigg)^{\lambda} \Bigg(\underbrace{\int x^q f(x)\, dx}_{q\text{-th moment}}\Bigg)^{1-\lambda}$$
where $\lambda = (q + 1 - 1/r)/(q - p)$.

The best possible constant is given by
$$C = \left[\frac{1}{q - p}\, \bar{B}\!\left(\frac{r\lambda}{1-r}, \frac{r(1-\lambda)}{1-r}\right)\right]^{\frac{1-r}{r}}$$
with $\bar{B}(a,b) = B(a,b)\, (a+b)^{a+b}\, a^{-a}\, b^{-b}$, where $B$ is the Beta function.
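A numerical sketch (assumed test case, not from the paper): for $f(x) = e^{-x}$ on $[0,\infty)$ with $r = 1/2$ and $(p,q) = (0,2)$, both sides have closed forms; the left side is $4$ and the right side is $\pi\sqrt{2} \approx 4.44$.

```python
# Sketch (assumed test case): check ||f||_r <= C * m_p^lam * m_q^(1-lam)
# for f(x) = exp(-x) on [0, inf), r = 1/2, (p, q) = (0, 2).
import numpy as np
from scipy.special import beta
from scipy.integrate import quad

r, p, q = 0.5, 0.0, 2.0                  # requires p < 1/r - 1 = 1 < q
lam = (q + 1 - 1 / r) / (q - p)          # = 0.5

def B_bar(a, b):                          # modified Beta function from the slide
    return beta(a, b) * (a + b) ** (a + b) * a ** (-a) * b ** (-b)

C = (B_bar(r * lam / (1 - r), r * (1 - lam) / (1 - r)) / (q - p)) ** ((1 - r) / r)

f = lambda x: np.exp(-x)
lhs = quad(lambda x: f(x) ** r, 0, np.inf)[0] ** (1 / r)     # ||f||_r = 4
m_p = quad(lambda x: x ** p * f(x), 0, np.inf)[0]            # Gamma(p + 1)
m_q = quad(lambda x: x ** q * f(x), 0, np.inf)[0]            # Gamma(q + 1)
print(lhs, C * m_p ** lam * m_q ** (1 - lam))                # 4.0 <= ~4.44
```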
A 'two-moment' inequality [R. 2017]

Proposition: For any numbers $0 < r < 1$ and $p < 1/r - 1 < q$ and non-negative function $f$ defined on $S \subseteq \mathbb{R}^n$, we have
$$\underbrace{\left(\int |f(x)|^r\, dx\right)^{\frac{1}{r}}}_{\|f\|_r} \le C \Bigg(\underbrace{\int \|x\|^{np} f(x)\, dx}_{np\text{-th moment}}\Bigg)^{\lambda} \Bigg(\underbrace{\int \|x\|^{nq} f(x)\, dx}_{nq\text{-th moment}}\Bigg)^{1-\lambda}$$
where $\lambda = (q + 1 - 1/r)/(q - p)$.

The best possible constant is given by
$$C = \left[\frac{\mathrm{Vol}(B_n \cap \mathrm{cone}(S))}{q - p}\, \bar{B}\!\left(\frac{r\lambda}{1-r}, \frac{r(1-\lambda)}{1-r}\right)\right]^{\frac{1-r}{r}}$$
with $\bar{B}(a,b) = B(a,b)\, (a+b)^{a+b}\, a^{-a}\, b^{-b}$.
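The same check in dimension $n = 2$ (assumed test case): for the standard bivariate Gaussian on $S = \mathbb{R}^2$, $\mathrm{cone}(S) = \mathbb{R}^2$, so $\omega = \mathrm{Vol}(B_2) = \pi$, and all the integrals have closed forms.

```python
# Sketch (assumed test case): the R^n version with n = 2, S = R^2, f the
# standard bivariate Gaussian density, r = 1/2, (p, q) = (0, 2).
import numpy as np
from scipy.special import beta

n, r, p, q = 2, 0.5, 0.0, 2.0
lam = (q + 1 - 1 / r) / (q - p)                   # = 0.5
a, b = r * lam / (1 - r), r * (1 - lam) / (1 - r)
B_bar = beta(a, b) * (a + b) ** (a + b) * a ** (-a) * b ** (-b)
omega = np.pi                                     # Vol(B_2 cap cone(R^2))
C = (omega * B_bar / (q - p)) ** ((1 - r) / r)

lhs = 8 * np.pi    # ||f||_{1/2} = (\int f^{1/2} dx)^2 for the Gaussian
m_np = 1.0         # \int ||x||^{n*p} f dx = 1 since p = 0
m_nq = 8.0         # \int ||x||^{n*q} f dx = E||X||^4, with ||X||^2 ~ chi^2_2
print(lhs, C * m_np ** lam * m_nq ** (1 - lam))   # ~25.13 <= ~27.93
```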
Remarks on the 'two-moment' inequality

The proof is a straightforward consequence of Hölder's inequality and an integral representation of the Beta function.

For $r = 1/2$, Euler's reflection formula for the Beta function leads to the simplified expression
$$C = \frac{\pi\, \lambda^{-\lambda} (1-\lambda)^{-(1-\lambda)}}{(q - p)\, \sin(\pi \lambda)}.$$
It is possible that variations of these inequalities exist in theliterature on weighted Lp-norm inequalities. So far, I have beenunable to find an explicit reference.
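A quick consistency check (my sketch) that the $r = 1/2$ simplification agrees with the general Beta-function constant:

```python
# Sketch: numerical check that for r = 1/2 the general constant reduces to
#   C = pi * lam^(-lam) * (1-lam)^(-(1-lam)) / ((q - p) * sin(pi * lam)).
import numpy as np
from scipy.special import beta

def C_general(r, p, q):
    lam = (q + 1 - 1 / r) / (q - p)
    a, b = r * lam / (1 - r), r * (1 - lam) / (1 - r)
    B_bar = beta(a, b) * (a + b) ** (a + b) * a ** (-a) * b ** (-b)
    return (B_bar / (q - p)) ** ((1 - r) / r)

def C_half(p, q):                       # r = 1/2 via Euler's reflection formula
    lam = (q - 1) / (q - p)
    return (np.pi * lam ** (-lam) * (1 - lam) ** (-(1 - lam))
            / ((q - p) * np.sin(np.pi * lam)))

for p, q in [(0.0, 2.0), (0.5, 3.0), (0.9, 1.5)]:
    print(C_general(0.5, p, q), C_half(p, q))     # the two columns agree
```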
Rényi entropy

Let $X$ have density $p(x)$ on $S \subseteq \mathbb{R}^n$. The Rényi entropy of order $r \in (0,1) \cup (1,\infty)$ is defined according to
$$h_r(X) = \frac{1}{1-r} \log\left(\int_S |p(x)|^r\, dx\right).$$

Properties:
- Decreasing in $r$
- Limit as $r \to 0$ depends on the volume of the support
- Limit as $r \to 1$ is the Shannon entropy
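A grid sketch (my addition) of these properties for a standard Gaussian, whose Shannon entropy is $\frac{1}{2}\log(2\pi e) \approx 1.4189$:

```python
# Sketch: Renyi entropy of N(0,1) on a grid; decreasing in r, and tending
# to the Shannon entropy 0.5 * log(2 pi e) ~ 1.4189 as r -> 1.
import numpy as np
from scipy.stats import norm

x = np.linspace(-12, 12, 40001)
dx = x[1] - x[0]
p = norm.pdf(x)

def h_r(r):
    return np.log(np.sum(p ** r) * dx) / (1 - r)

for r in (0.2, 0.5, 0.9, 0.99, 1.01, 2.0):
    print(r, h_r(r))   # values decrease in r and approach 1.4189 near r = 1
```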
Upper bound for Rényi entropy [R. 2017]

Proposition: For any numbers $0 < r < 1$ and $p < 1/r - 1 < q$ and density function $p(x)$ on $S \subseteq \mathbb{R}^n$, we have
$$h_r(X) \le \log C + \frac{r\lambda}{1-r} \log \mathbb{E}[\|X\|^{np}] + \frac{r(1-\lambda)}{1-r} \log \mathbb{E}[\|X\|^{nq}]$$
with $\lambda = (q + 1 - 1/r)/(q - p)$.

Relationship to existing results:
- One-moment bound: Evaluating with $p = 0$ recovers the entropy-moment inequalities of Costa et al. 2002 and Lutwak et al. 2002. The bound is attained by a maximum entropy distribution.
- Two-moment bound: Evaluating with two carefully chosen moments $(p, q)$ can lead to significant improvements, as illustrated in the sketch below.
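A sketch (assumed test case): the half-normal density on $[0,\infty)$, where I evaluate the bound in the equivalent form $h_r(X) \le \frac{r}{1-r}\big[\log C + \lambda \log\mathbb{E}[X^p] + (1-\lambda)\log\mathbb{E}[X^q]\big]$, which follows directly from the two-moment inequality two slides back (the slide's constant may be normalized differently).

```python
# Sketch (assumed test case): two-moment Renyi entropy bound for the
# half-normal density on [0, inf), compared with the exact h_r.
import numpy as np
from scipy.special import beta
from scipy.integrate import quad

r, p, q = 0.5, 0.5, 2.0                  # requires p < 1/r - 1 = 1 < q
lam = (q + 1 - 1 / r) / (q - p)          # = 2/3

a, b = r * lam / (1 - r), r * (1 - lam) / (1 - r)
B_bar = beta(a, b) * (a + b) ** (a + b) * a ** (-a) * b ** (-b)
C = (B_bar / (q - p)) ** ((1 - r) / r)

dens = lambda x: np.sqrt(2 / np.pi) * np.exp(-x ** 2 / 2)    # half-normal
h_exact = np.log(quad(lambda x: dens(x) ** r, 0, np.inf)[0]) / (1 - r)
m_p = quad(lambda x: x ** p * dens(x), 0, np.inf)[0]         # E[X^p]
m_q = quad(lambda x: x ** q * dens(x), 0, np.inf)[0]         # E[X^q] = 1
h_bound = (r / (1 - r)) * (np.log(C) + lam * np.log(m_p) + (1 - lam) * np.log(m_q))
print(h_exact, h_bound)                  # ~0.92 <= ~1.39
```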
Example: Lognormal distribution

[Figure: gap between the upper bound and the Rényi entropy as a function of $r \in (0,1)$, for the optimal two-moment and optimal one-moment inequalities, at $\sigma^2 = 0.1, 1, 10$.]
Example: Multivariate Gaussian distribution

[Figure: gap between the upper bound and the Rényi entropy ($r = 0.2$) as a function of the dimension $n$, for the optimal two-moment and optimal one-moment inequalities; the large-$n$ limit of the gap equals the gap for the lognormal distribution.]
Variance of conditional density

Let $(X,Y)$ be a random pair such that the conditional distribution of $Y$ given $X$ has density $p(y \mid x)$ on $\mathbb{R}^n$.

The variance of the conditional density is defined by
$$\mathrm{Var}(p(y \mid X)) = \mathbb{E}\big[|p(y \mid X) - p(y)|^2\big], \qquad X \sim P_X.$$

Define the $s$-th moment of the variance:
$$V_s(Y \mid X) = \int \|y\|^s\, \mathrm{Var}(p(y \mid X))\, dy.$$

Note that $V_s(Y \mid X)$ is nonnegative, and equal to zero if and only if $X$ and $Y$ are independent.
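A sketch (assumed example) of this quantity and its independence property: for $X$ uniform on $\{-\epsilon,+\epsilon\}$ and $Y \mid X \sim \mathcal{N}(X,1)$, $V_s(Y \mid X)$ vanishes at $\epsilon = 0$, where $X$ and $Y$ are independent.

```python
# Sketch (assumed example): V_s(Y|X) = \int |y|^s Var(p(y|X)) dy for
# X uniform on {-eps, +eps} and Y | X ~ N(X, 1).
import numpy as np
from scipy.stats import norm

y = np.linspace(-12, 12, 40001)
dy = y[1] - y[0]

def V_s(s, eps):
    p_pos, p_neg = norm.pdf(y - eps), norm.pdf(y + eps)
    var_cond = 0.25 * (p_pos - p_neg) ** 2     # Var(p(y|X)) over X
    return np.sum(np.abs(y) ** s * var_cond) * dy

for eps in (0.0, 0.1, 0.5, 1.0):
    print(eps, V_s(0.0, eps), V_s(2.0, eps))   # zero at eps = 0, then growing
```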
A mutual information bound [R. 2017]

Combining the 'half Jensen' and 'two-moment' inequalities yields:
$$I(X;Y) \le C_\lambda \sqrt{\frac{\omega(S)\, V_{np}^{\lambda}(Y \mid X)\, V_{nq}^{1-\lambda}(Y \mid X)}{q - p}}, \qquad p < 1 < q,$$
where $\lambda = (q - 1)/(q - p)$ and $\omega(S) = \mathrm{Vol}(B_n \cap \mathrm{cone}(S))$.
Useful properties of $V_s(Y \mid X)$

The $s$-th moment of the variance can be expressed as
$$V_s(Y \mid X) = \mathbb{E}\big[K_s(X,X) - K_s(X_1, X_2)\big],$$
where $X_1$ and $X_2$ are independent copies of $X$ and
$$K_s(x_1, x_2) = \int \|y\|^s\, p(y \mid x_1)\, p(y \mid x_2)\, dy$$
is a positive-definite kernel that does not depend on the distribution of $X$.

If $U \to X \to Y$ forms a Markov chain, then
$$V_s(Y \mid U) = \mathbb{E}\big[K_s(X_1', X_2') - K_s(X_1, X_2)\big],$$
where $X_1'$ and $X_2'$ are conditionally independent given $U$.
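A sketch (assumed Gaussian-channel example) checking the kernel identity for $s = 0$, where $K_0(x_1,x_2) = \int \phi(y-x_1)\phi(y-x_2)\, dy$ is the $\mathcal{N}(0,2)$ density evaluated at $x_1 - x_2$:

```python
# Sketch (assumed example): verify V_s(Y|X) = E[K_s(X,X)] - E[K_s(X1,X2)]
# for s = 0, Y | X ~ N(X, 1), X uniform on {-1,+1}; here K_0(x1, x2) is
# the N(0,2) density at x1 - x2 (a Gaussian convolution).
import numpy as np
from scipy.stats import norm

K0 = lambda x1, x2: norm.pdf(x1 - x2, scale=np.sqrt(2))

xs = np.array([-1.0, 1.0])                        # support of X (uniform)
EK_diag = np.mean([K0(x, x) for x in xs])         # E[K_0(X, X)]
EK_indep = np.mean([K0(a, b) for a in xs for b in xs])  # E[K_0(X1, X2)]

# direct grid computation of V_0(Y|X) = \int Var(p(y|X)) dy
y = np.linspace(-12, 12, 40001)
dy = y[1] - y[0]
p_pos, p_neg = norm.pdf(y - 1), norm.pdf(y + 1)
V0_grid = np.sum(0.25 * (p_pos - p_neg) ** 2) * dy
print(EK_diag - EK_indep, V0_grid)                # the two values agree
```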
Illustrative example

Consider the Markov chain $U \to X \to Y$ given by
$$X \mid U \sim \mathcal{N}(0, U), \qquad Y \mid X \sim \mathcal{N}(X, 1).$$

Then
$$K_s(x_1, x_2) = 2^{-\frac{1+s}{2}}\, \mathbb{E}\!\left[\left|\mathcal{N}(0,1) + \frac{x_1 + x_2}{\sqrt{2}}\right|^s\right] \phi\!\left(\frac{x_1 - x_2}{\sqrt{2}}\right),$$
and
$$V_s(Y \mid U) = \frac{\Gamma\!\left(\frac{1+s}{2}\right)}{2\pi}\, \mathbb{E}\!\left[(1+U)^{\frac{s-1}{2}} - \frac{(1+U_1)^{\frac{s}{2}}\, (1+U_2)^{\frac{s}{2}}}{\left(1 + \frac{1}{2}(U_1 + U_2)\right)^{\frac{s+1}{2}}}\right],$$
where $(U_1, U_2)$ are independent copies of $U$.
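A sketch (with an assumed two-point distribution for $U$, since the talk's choice is tied to the figure below) checking the closed form against direct integration; here $Y \mid U = u \sim \mathcal{N}(0, 1+u)$, so $\mathrm{Var}(p(y \mid U))$ is computable on a grid.

```python
# Sketch (assumed distribution for U): check the closed form for V_s(Y|U)
# against direct integration, with U uniform on {0.5, 1.5} and s = 2.
import numpy as np
from scipy.stats import norm
from scipy.special import gamma
from itertools import product

us = np.array([0.5, 1.5])                      # support of U (uniform)
s = 2.0

t1 = np.mean((1 + us) ** ((s - 1) / 2))        # E[(1 + U)^((s-1)/2)]
t2 = np.mean([(1 + u1) ** (s / 2) * (1 + u2) ** (s / 2)
              / (1 + 0.5 * (u1 + u2)) ** ((s + 1) / 2)
              for u1, u2 in product(us, us)])
closed = gamma((1 + s) / 2) / (2 * np.pi) * (t1 - t2)

# direct grid computation of V_s(Y|U) = \int |y|^s Var(p(y|U)) dy,
# using that Y | U = u is N(0, 1 + u)
y = np.linspace(-15, 15, 60001)
dy = y[1] - y[0]
pyu = np.array([norm.pdf(y, scale=np.sqrt(1 + u)) for u in us])
var_cond = np.mean(pyu ** 2, axis=0) - np.mean(pyu, axis=0) ** 2
direct = np.sum(np.abs(y) ** s * var_cond) * dy
print(closed, direct)                          # the two values agree
```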
Illustrative example

[Figure: $I(U;Y)$ as a function of $\epsilon$, compared with the new upper bound and the chi-square divergence bound.]
Conclusion

- Bounds on mutual information using a 'half Jensen' inequality and a 'two-moment' inequality.
- Two carefully chosen moments can lead to significant improvements for Rényi entropy.
- A new measure of dependence with interesting properties.
- One application is precise convergence rates for an entropic conditional central limit theorem [R. 2017].
References

G. Reeves, "Two-moment inequalities for Rényi entropy and mutual information," 2017. [Online]. Available: https://arxiv.org/abs/1702.07302

G. Reeves, "Two-moment inequalities for Rényi entropy and mutual information," in Proc. IEEE Int. Symp. Inform. Theory, Aachen, Germany, Jun. 2017.

G. Reeves, "Conditional central limit theorems for Gaussian projections," Dec. 2016. [Online]. Available: https://arxiv.org/abs/1612.09252

G. Reeves, "Conditional central limit theorems for Gaussian projections," in Proc. IEEE Int. Symp. Inform. Theory, Aachen, Germany, Jun. 2017.