Group Symmetry and Covariance Regularization

Parikshit Shah, University of Wisconsin
Joint work with Venkat Chandrasekaran
Motivation
- Symmetry is common in science and engineering.
- Symmetry in statistical models.
- How to exploit known group structure?
- Message: Symmetry-aware methods provide huge statistical and computational gains.
Applications: MAR processes
x0 ∼ N(0, I)
x1 = A0 x0 + w1,   x2 = A0 x0 + w2
x3 = A1 x1 + w3,   x6 = A1 x2 + w6

An important class of stochastic models for multi-scale processes, e.g. oceanography, computer vision.

- What is the covariance among the leaf nodes?
- Symmetries: automorphism group of Td.
- Formally: Σ invariant under the action of Z2 wr Z2 wr ... wr Z2.
- Can we exploit symmetries? Haar wavelet transform ...
Applications: Random Fields

- Physical phenomena: oceanography, hydrology, electromagnetics.
- Poisson's equation (stochastic input): ∇²φ(x) = f(x).
- Green's function: covariance process R(x1, x2).
- Symmetry: Laplacian, boundary conditions, R(x1, x2).
- Symmetry-preserving discretization.
Other Applications
- Partial exchangeability: clinical tests
  1. N patients, T groups of similar characteristics
  2. X1, ..., XN physiological responses
  3. Patients within the same group are exchangeable (but not i.i.d.)
- Cyclostationarity: periodic phenomena such as vibrations, sinusoidal components ...

We model symmetry of the covariance Σ via G-invariance.

Problem statement: Given G, infer information about Σ.
Group Theory: Basics
- Finite group G = (G, ◦)
  1. G: a collection of permutations on [p]
  2. ◦: composition
  3. Closure under composition
- Examples
  1. Symmetric group: Sp
  2. Cyclic group: Z/pZ
  3. Cartesian products: G1 × G2
  4. Other products: semi-direct, wreath
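These definitions are easy to make concrete. As a small sketch (not from the slides), the cyclic group Z/pZ can be realized as the p cyclic-shift permutation matrices, and closure under composition checked directly:

```python
import itertools

import numpy as np

def cyclic_group(p):
    """Z/pZ realized concretely: the p cyclic-shift permutation matrices on [p]."""
    return [np.roll(np.eye(p), k, axis=0) for k in range(p)]

G = cyclic_group(4)

# Closure: composing any two shifts yields another shift already in the group.
for Pa, Pb in itertools.product(G, G):
    assert any(np.array_equal(Pa @ Pb, Pg) for Pg in G)
print(len(G))  # 4
```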
Group Theory: Group Action
Let G be a finite group (of permutation matrices), and let R^{p×p}_+ denote the PSD matrices. A group action is a map

A : G × R^{p×p}_+ → R^{p×p}_+,
    (Πg, Σ) ↦ Πg Σ Πg^T.

- G "acts on" matrices by permuting indices.

Definition. Σ is G-invariant if Πg Σ Πg^T = Σ for all Πg ∈ G.

- Formalizes the notion of a symmetric model.
- Fixed-point subspace: WG = {Σ : Πg Σ Πg^T = Σ for all Πg ∈ G}.
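To make G-invariance concrete, here is a minimal numpy sketch (not from the talk) that tests Πg Σ Πg^T = Σ for the cyclic group; a symmetric circulant matrix lies in WG, while a generic diagonal matrix does not:

```python
import numpy as np

def is_G_invariant(Sigma, G, tol=1e-10):
    """Check Pi_g @ Sigma @ Pi_g.T == Sigma for every permutation matrix Pi_g in G."""
    return all(np.allclose(Pg @ Sigma @ Pg.T, Sigma, atol=tol) for Pg in G)

p = 5
G = [np.roll(np.eye(p), k, axis=0) for k in range(p)]  # cyclic group Z/pZ

# A symmetric circulant matrix lies in the fixed-point subspace W_G ...
c = np.array([2.0, 0.5, 0.1, 0.1, 0.5])
Sigma = np.array([[c[(j - i) % p] for j in range(p)] for i in range(p)])
print(is_G_invariant(Sigma, G))  # True

# ... while a generic diagonal matrix does not.
print(is_G_invariant(np.diag(np.arange(p, dtype=float)), G))  # False
```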
Fixed Point Subspace Projection
- Statistical model: X ∼ N(0, Σ), Σ ∈ R^{p×p}.
- Symmetry: Σ ∈ WG.
- Model selection: given i.i.d. samples X1, ..., Xn, recover Σ.

  Σn := (1/n) Σ_{i=1}^n Xi Xi^T

- High-dimensional regime (n ≪ p): Σn is a poor estimate.
- G-empirical covariance: Σ̂ := PG(Σn).
- Main contribution: statistical analysis of this estimator.
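The payoff can be illustrated with a small simulation (an illustrative sketch, not the talk's experiment). Since PG is an orthogonal projection in the Frobenius inner product and Σ ∈ WG, the Frobenius error of PG(Σn) can never exceed that of Σn:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 20, 10                                          # high-dimensional: n < p
G = [np.roll(np.eye(p), k, axis=0) for k in range(p)]  # cyclic group

# G-invariant ground truth: a symmetric circulant, diagonally dominant (hence PD).
c = np.zeros(p); c[0] = 2.0; c[1] = c[-1] = 0.5
Sigma = np.array([[c[(j - i) % p] for j in range(p)] for i in range(p)])

X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Sigma_n = X.T @ X / n                                  # empirical covariance
Sigma_hat = sum(Pg @ Sigma_n @ Pg.T for Pg in G) / len(G)   # P_G(Sigma_n)

err_raw = np.linalg.norm(Sigma - Sigma_n, "fro")
err_proj = np.linalg.norm(Sigma - Sigma_hat, "fro")
print(err_proj <= err_raw)  # True: projecting onto W_G can only shrink the error
```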
Fixed Point Projection: An Example
- MAR process invariant w.r.t. Z2 wr Z2 wr ... wr Z2.

(Figure: two-level binary tree with root 0, internal nodes a and b, and leaves 1, 2, 3, 4.)

Σ = [a b c c; b a c c; c c a b; c c b a],

T = [1/2  1/2   1/√2    0;
     1/2  1/2  −1/√2    0;
     1/2 −1/2    0    1/√2;
     1/2 −1/2    0   −1/√2],

T*ΣT = diag(λ1, λ2, λ3, λ3).

- How to compute the fixed-point subspace projection?
- Use the Haar wavelet transform T:

  PG(Σn) = T D(T* Σn T) T*.
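The 4 × 4 example can be verified numerically. The sketch below (with arbitrary illustrative values a = 3, b = 1, c = 0.5) confirms that the Haar basis T diagonalizes every matrix with this invariance pattern:

```python
import numpy as np

a, b, c = 3.0, 1.0, 0.5                      # arbitrary illustrative values
Sigma = np.array([[a, b, c, c],
                  [b, a, c, c],
                  [c, c, a, b],
                  [c, c, b, a]])

s = 1 / np.sqrt(2)
T = np.array([[0.5,  0.5,   s,  0.0],
              [0.5,  0.5,  -s,  0.0],
              [0.5, -0.5, 0.0,    s],
              [0.5, -0.5, 0.0,   -s]])       # Haar wavelet basis for 4 leaves

assert np.allclose(T.T @ T, np.eye(4))       # T is orthogonal
D = T.T @ Sigma @ T
# Diagonal, with eigenvalues a+b+2c, a+b-2c, and a-b (multiplicity two).
assert np.allclose(D, np.diag([a + b + 2*c, a + b - 2*c, a - b, a - b]))
```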
Statistical gains: Convergence in spectral norm
- ‖Σ − Σn‖ ≤ δ w.h.p. if n = O(p/δ²).
- However, ‖Σ − PG(Σn)‖ ≤ δ w.h.p. if n = O(log p / δ²) for G = cyclic, symmetric.
- Proof: the Fourier transform diagonalizes circulant matrices.
- How do we generalize?
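The proof ingredient for the cyclic case, that the Fourier transform diagonalizes circulant matrices, is easy to check numerically (a sketch, not the slides' code):

```python
import numpy as np

p = 8
c = np.zeros(p); c[0] = 2.0; c[1] = c[-1] = 0.5
C = np.array([[c[(i - j) % p] for j in range(p)] for i in range(p)])  # circulant

# The FFT of the first column gives the spectrum of a circulant matrix ...
eig_fft = np.fft.fft(c)
assert np.allclose(np.sort(eig_fft.real), np.sort(np.linalg.eigvalsh(C)))
assert np.allclose(eig_fft.imag, 0)          # symmetric circulant: real spectrum

# ... because the unitary Fourier basis diagonalizes every circulant matrix.
j, k = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
U = np.exp(2j * np.pi * j * k / p) / np.sqrt(p)
D = U.conj().T @ C @ U
assert np.allclose(D, np.diag(np.diag(D)))   # off-diagonal entries vanish
```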
Group Theory: Representation
- G-invariant matrices can be simultaneously block diagonalized:

  T*MT = diag(M1, ..., M_|I|),   where   Mi = diag(Bi, ..., Bi)   (mi copies of Bi).

  I: (active) irreducible representations
  si: dimension of Bi
  mi: multiplicity of Bi

- Theorem: ‖Σ − PG(Σn)‖ ≤ δ w.h.p. provided

  n = O( max{ max_{i∈I} si/(mi δ²), max_{i∈I} (log p)/(mi δ²) } ).
Statistical gains: Convergence in `∞ norm
- ‖Σ − Σn‖_ℓ∞ ≤ O(√(log p / n)).
- ‖Σ − PG(Σn)‖_ℓ∞ ≤ O(√(log p / (pn))) for G = cyclic.
- Proof idea: Reynolds averaging

  PG(Σn) = (1/|G|) Σ_{g∈G} Πg Σn Πg^T

  ⇒ average over edge orbits.
- For the cyclic group, edge orbits are of size p.
- How do we generalize?
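A small sketch (with a hypothetical input matrix) of why Reynolds averaging is exactly entrywise averaging over edge orbits, for the cyclic group:

```python
import numpy as np

p = 6
rng = np.random.default_rng(1)
A = rng.standard_normal((p, p))
Sigma_n = A @ A.T / p                          # stand-in for a sample covariance

# Reynolds operator: average Sigma_n over the cyclic group action.
G = [np.roll(np.eye(p), k, axis=0) for k in range(p)]
P = sum(Pg @ Sigma_n @ Pg.T for Pg in G) / len(G)

# Entrywise view: entry (i, j) becomes the mean of Sigma_n over the edge
# orbit {(i+g, j+g) mod p : g}, which has size p for the cyclic group.
orbit_avg = np.empty_like(Sigma_n)
for i in range(p):
    for j in range(p):
        orbit_avg[i, j] = np.mean([Sigma_n[(i + g) % p, (j + g) % p] for g in range(p)])

print(np.allclose(P, orbit_avg))  # True
```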
Edge Orbit Parameters
- Combinatorial parameters:

  The edge orbit of (i, j) is O(i, j) := {(g(i), g(j)) | g ∈ G}.
  The degree dij is the maximum number of times any variable appears in O(i, j).

  1. O := min_{i,j} |O(i, j)|
  2. Od := min_{i,j} |O(i, j)|/dij

- Theorem: we have w.h.p. that

  ‖Σ − PG(Σn)‖_ℓ∞ ≤ O( max{ √(log p/(nO)), (log p)/(nOd) } ).

- Delicate issues: non-i.i.d. averaging, sample reuse.
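The orbit parameters can be computed by brute force for small groups. In this sketch the function name and the exact counting of "appearances" for the degree dij are my interpretation of the definition above, not the authors' code:

```python
import numpy as np

def edge_orbit_params(perms, p):
    """Brute-force O = min |O(i,j)| and O_d = min |O(i,j)|/d_ij.
    Each group element is a length-p array mapping i -> g(i)."""
    O, Od = np.inf, np.inf
    for i in range(p):
        for j in range(p):
            orbit = {(int(g[i]), int(g[j])) for g in perms}
            counts = {}
            for (u, v) in orbit:               # count variable appearances
                counts[u] = counts.get(u, 0) + 1
                counts[v] = counts.get(v, 0) + 1
            d_ij = max(counts.values())
            O = min(O, len(orbit))
            Od = min(Od, len(orbit) / d_ij)
    return O, Od

p = 5
cyclic = [np.array([(i + k) % p for i in range(p)]) for k in range(p)]
O, Od = edge_orbit_params(cyclic, p)
print(O)   # 5: every edge orbit of the cyclic group has size p
print(Od)
```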
Application: Covariance Estimation

- Covariance estimation: Bickel-Levina thresholding

  Σ̂ := threshold_t(Σn).

  If Σ has at most d nonzeros per row/column, then w.h.p.

  ‖Σ − Σ̂‖ < √(d² log p / n).

- Symmetry-aware thresholding: consider G = cyclic,

  Σ̂G := threshold_t(PG(Σn)).

  If Σ has at most d nonzeros per row/column, then w.h.p.

  ‖Σ − Σ̂G‖ < √(d² log p / (pn)).

- Rates in the previous slides give results for general groups.
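Both estimators are a few lines of numpy. This is a schematic re-implementation under assumed parameter values (the threshold t and the band pattern are mine), not the paper's experiment:

```python
import numpy as np

def hard_threshold(M, t):
    """Bickel-Levina hard thresholding: zero out entries with |M_ij| <= t."""
    return np.where(np.abs(M) > t, M, 0.0)

rng = np.random.default_rng(2)
p, n = 16, 12
# Sparse, cyclic-invariant ground truth: a banded "ring" covariance.
c = np.zeros(p); c[0] = 1.5; c[1] = c[-1] = 0.4
Sigma = np.array([[c[(j - i) % p] for j in range(p)] for i in range(p)])

X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Sigma_n = X.T @ X / n
# P_G for the cyclic group: average over simultaneous row/column shifts.
P = np.mean([np.roll(np.roll(Sigma_n, k, 0), k, 1) for k in range(p)], axis=0)

t = 0.2
err_plain = np.linalg.norm(Sigma - hard_threshold(Sigma_n, t), 2)
err_sym = np.linalg.norm(Sigma - hard_threshold(P, t), 2)
print(err_plain, err_sym)   # the projected estimate is typically more accurate
```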
Application: Gaussian Graphical Model Selection
- Zeros of Σ⁻¹ encode conditional independence relations.
- ℓ1-regularized log-likelihood [Yuan and Lin; Ravikumar et al.]:

  Θ̂ := argmin_{Θ ∈ S^p_{++}} tr(ΣnΘ) − log det(Θ) + μn‖Θ‖_ℓ1.

  Θ̂ and Σ⁻¹ have the same zero pattern w.h.p. if n = O(d² log p), where d is the degree of the graph.

- If Σ is G-invariant for G = cyclic:

  Θ̂G := argmin_{Θ ∈ S^p_{++} ∩ WG} tr(ΣnΘ) − log det(Θ) + μn‖Θ‖_ℓ1.

  Θ̂G and Σ⁻¹ have the same zero pattern w.h.p. if n = O(d² log p / p).

- Again, the rates in the previous slides ⇒ scaling for general groups.
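The following sketch is not the ℓ1-penalized estimator above: it is a simplified stand-in (projection onto WG, then inversion) meant only to show how restricting to the fixed-point subspace stabilizes estimation of a cyclic-invariant cycle-graph precision matrix. All parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 10, 40
# Cycle-graph precision matrix (cyclic-invariant): nonzeros only on the ring.
Theta = 2.0 * np.eye(p)
for i in range(p):
    Theta[i, (i + 1) % p] = Theta[(i + 1) % p, i] = -0.4
Sigma = np.linalg.inv(Theta)

X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Sigma_n = X.T @ X / n
# Project onto W_G for the cyclic group before inverting.
P = np.mean([np.roll(np.roll(Sigma_n, k, 0), k, 1) for k in range(p)], axis=0)

Theta_raw = np.linalg.inv(Sigma_n)    # ignores the symmetry
Theta_sym = np.linalg.inv(P)          # inverse of the W_G-projected estimate

# Entrywise error on the precision matrix, with and without the projection.
print(np.max(np.abs(Theta_raw - Theta)), np.max(np.abs(Theta_sym - Theta)))
```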
Computational Gains
- When T is known, PG(·) is efficiently computable.
- Exploiting symmetries in convex optimization:

  If the objective and constraint functions are G-invariant, then a solution lies in the fixed-point subspace.
  ⇒ reduction in problem size.
  ⇒ improved numerical conditioning.

- For example:

  argmin_{Θ ∈ S^p_{++} ∩ WG} tr(ΣnΘ) − log det(Θ) + μn‖Θ‖_ℓ1
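One way to quantify the problem-size reduction (a sketch; the helper name is mine): for the cyclic group, a symmetric matrix in WG is circulant and has only ⌊p/2⌋ + 1 free parameters, versus p(p+1)/2 for a generic symmetric matrix:

```python
def num_free_params_cyclic(p):
    """Dimension of the symmetric part of W_G for the cyclic group:
    one free parameter per lag, i.e. floor(p/2) + 1 in total."""
    return p // 2 + 1

for p in (10, 100, 1000):
    full = p * (p + 1) // 2            # generic symmetric matrix
    print(p, full, num_free_params_cyclic(p))
# 10 55 6
# 100 5050 51
# 1000 500500 501

# Sanity check by brute force for small p: count edge orbits under the shift.
p = 6
orbits = {frozenset(tuple(sorted(((i + g) % p, (j + g) % p))) for g in range(p))
          for i in range(p) for j in range(i, p)}
assert len(orbits) == num_free_params_cyclic(p)
```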
Experiments

Gaussian model invariant with respect to the cyclic group, p = 50.

(Figure: Frobenius-norm, spectral-norm, and ℓ∞-norm error versus number of samples, comparing Σn with PG(Σn).)

Inverse covariance corresponding to a cycle graph, invariant with respect to the cyclic group, p = 50.

(Figure: probability of correct model selection versus number of samples for the log-det estimator, comparing PG(Σn) with Σn.)
Conclusion
- Statistical models with symmetries.
- Fixed-point projection as a means of regularization.
- Improved rates for several model selection and estimation tasks.
- Computational benefits.
- Current efforts: approximately symmetric models.
http://arxiv.org/abs/1111.7061