
Page 1

Optimization Methods for Graph Learning

Sandeep Kumar and Prof. Daniel P. Palomar

The Hong Kong University of Science and Technology

ELEC5470/IEDA6100A - Convex Optimization, Fall 2019-20

Page 2

Outline

1 Graphical modeling

2 Types of graphical models

3 Physically inspired model: graphs from smooth signals

4 Probabilistic graphical model: GMRF

5 Structured graph learning via spectral constraints

6 Experiments

Page 3

Outline

1 Graphical modeling

2 Types of graphical models

3 Physically inspired model: graphs from smooth signals

4 Probabilistic graphical model: GMRF

5 Structured graph learning via spectral constraints

6 Experiments

Page 4

Graphical models

Representing knowledge through graphical models

[Figure: an example graph with nodes x1, . . . , x9]

- Nodes correspond to the entities (variables).
- Edges encode the relationships between entities (dependencies between the variables).

Page 5

Examples

Learning relational dependencies among entities benefits numerous application domains.

Figure 1: Financial Graph

Objective: To infer the inter-dependencies of financial companies. Input xi is the economic indices (stock price, volume, etc.) of each entity.

Figure 2: Social Graph

Objective: To model behavioral similarity/influence between people. Input xi is the individual's online activities (tagging, liking, purchasing).

Page 6

Graphical model importance

- Graphs are an intuitive way of representing and visualising the relationships between entities.
- Graphs allow us to abstract out the conditional independence relationships between the variables from the details of their parametric forms. Thus we can answer questions like: "Is x1 dependent on x6 given that we know the value of x8?" just by looking at the graph.
- Graphs are widely used in a variety of applications in machine learning, graph CNN, graph signal processing, etc.
- Graphs offer a language through which different disciplines can seamlessly interact with each other.
- Graph-based approaches with big data and machine learning are driving the current research frontiers.

Graphical Models = Statistics × Graph Theory × Optimization × Engineering

Page 7

How to learn a graph?

Graphical models are about having a graph representation that can encode relationships between entities.

In many cases, the relationships between entities are straightforward:
- Are two people friends in a social network?
- Are two researchers co-authors in a published paper?

In many other cases, relationships are not known and must be learned:
- Does one gene regulate the expression of others?
- Which drug alters the pharmacologic effect of another drug?

The choice of graph representation affects the subsequent analysis and eventually the performance of any graph-based algorithm.

The goal is to learn a graph representation of the data with specific properties (e.g., structures).

Page 8

Schematic of graph learning

- Given a data matrix X ∈ Rn×p = [x1, x2, . . . , xp], each column xi ∈ Rn is assumed to reside on one of the p nodes, and each of the n rows of X is a signal (or feature) on the same graph.
- The goal is to obtain a graph representation of the data.

[Figure: a weighted graph on nodes x1, . . . , x9 with edge weights wij]

A graph is a simple mathematical structure of the form G = (V, E, W), where
- V = {1, 2, . . . , p} is the set of nodes,
- E = {(1, 2), (1, 3), . . . , (i, j), . . . , (p, p − 1)} is the set of edges between pairs of nodes (i, j), and
- the weight matrix W encodes the strength of the relationships.

Page 9

Outline

1 Graphical modeling

2 Types of graphical models

3 Physically inspired model: graphs from smooth signals

4 Probabilistic graphical model: GMRF

5 Structured graph learning via spectral constraints

6 Experiments

Page 10

Graph and its matrix representation

Connectivity matrix C, adjacency matrix W, and Laplacian matrix L:

[C]ij = 1 if (i, j) ∈ E;  [C]ij = 0 if (i, j) /∈ E or i = j.

[W]ij = wij if (i, j) ∈ E;  [W]ij = 0 if (i, j) /∈ E or i = j.

[L]ij = −wij if (i, j) ∈ E;  [L]ij = 0 if (i, j) /∈ E, i ≠ j;  [L]ii = ∑j wij.

Example: V = {1, 2, 3, 4}, E = {(1, 2), (1, 3), (2, 3), (2, 4)}, W = {2, 2, 3, 1}.

C =
[ 0 1 1 0
  1 0 1 1
  1 1 0 0
  0 1 0 0 ]

W =
[ 0 2 2 0
  2 0 3 1
  2 3 0 0
  0 1 0 0 ]

L =
[  4 −2 −2  0
  −2  6 −3 −1
  −2 −3  5  0
   0 −1  0  1 ]

Page 11

Graph matrices

- The adjacency matrix W = [wij] and the Laplacian matrix L both represent the same weighted graph and are related by

  L = D − W,

  where D = Diag(W1) is the diagonal degree matrix whose (i, i)-th element is the sum of the i-th row of W (so Lij = −wij for i ≠ j and Lii = ∑j≠i wij).
- L and W have different mathematical properties, e.g., L is PSD, while W is not.
- A valid set for the adjacency matrix W:

  W = {W ∈ Rp×p | Wij = Wji ≥ 0 for i ≠ j, Wii = 0}.

- A valid set for the Laplacian matrix L:

  L = {L ∈ Rp×p | Lij = Lji ≤ 0 for i ≠ j, Lii = −∑j≠i Lij}.

Page 12

Types of graphical models

- Models encoding direct dependencies: simple and intuitive.
  - Sample-correlation based graph: two entities i and j are connected if their pairwise correlation is greater than a certain threshold (ρij > α), and vice versa.
  - Similarity-function (e.g., Gaussian RBF) based graph: [W]ij = exp(−‖xi − xj‖²/(2σ²)) if i ≠ j, and Wii = 0. Here, the scaling parameter σ² controls how rapidly the affinity [W]ij falls off with the distance between xi and xj.
- Models based on some assumption on the data: X ∼ F(G)
  - Statistical models: F represents a distribution specified by G (e.g., Markov model and Bayesian model).
  - Physically-inspired models: F represents a generative model on G (e.g., diffusion processes on graphs, smooth signals defined over graphs).
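A minimal sketch of the Gaussian-RBF construction above (the function name and data layout are illustrative assumptions; columns of X hold the signals attached to the nodes):

```python
import numpy as np

def rbf_similarity_graph(X, sigma=1.0):
    # X is n x p: column j holds the signal observed at node j.
    sq_dists = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))  # [W]_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    np.fill_diagonal(W, 0.0)                    # W_ii = 0 (no self-loops)
    return W
```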

Page 13

Outline

1 Graphical modeling

2 Types of graphical models

3 Physically inspired model: graphs from smooth signals

4 Probabilistic graphical model: GMRF

5 Structured graph learning via spectral constraints

6 Experiments

Page 14

Graph learning from smooth signals

Quantifying smoothness:

tr(XLX⊤) = ∑i,j wij ‖xi − xj‖².

A smaller tr(XLX⊤) indicates a smoother signal X over the graph G, where L is its Laplacian matrix. When a graph G is not already available, we can learn it directly from the data X by finding the graph weights that minimize the smoothness term combined with some regularization term:

L := arg minL  tr(XLX⊤) + λ h(L),

- A smaller distance ‖xi − xj‖² between data points xi and xj will force the learned graph to have a larger affinity value wij, and vice versa.
- A higher value of the weight wij implies the features xi and xj are similar and, hence, strongly connected.
- h(L) is a regularization function (e.g., ‖L‖1, ‖L‖²F, log det(L)).
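A quick numerical check of the smoothness identity (a sketch; note that summing over unordered pairs i < j matches tr(XLX⊤) exactly, while the double sum over all (i, j) would carry a factor 1/2):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 5, 3
W = rng.random((p, p)); W = np.triu(W, 1); W = W + W.T  # random symmetric weights
L = np.diag(W.sum(axis=1)) - W                          # Laplacian L = D - W
X = rng.standard_normal((n, p))                         # column x_i lives on node i

lhs = np.trace(X @ L @ X.T)
rhs = sum(W[i, j] * np.sum((X[:, i] - X[:, j]) ** 2)
          for i in range(p) for j in range(i + 1, p))
assert np.isclose(lhs, rhs)
```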

Page 15

Ex 1. Learning graphs under constraints

Variable: W = [w1, w2, . . . , wp], where wi ∈ Rp = [wi1, wi2, . . . , wip] is the vector containing the weights of the edges connected to node i. It is often desired to learn a graph with additional properties:
- bounded weights: 0 ≤ wij ≤ 1,
- edge weights per node summing to one: ∑j wij = wi⊤1 = 1,
- bounded energy: ∑j wij².

Problem formulation:

minimize_{wij}  ∑i,j wij ‖xi − xj‖² + (λ/2) ∑i,j wij²,
subject to  ∑j wij = 1, wij ≥ 0, wii = 0, ∀ i, j.

This is a convex optimization problem; we will obtain a closed-form update rule.

Page 16

Contd...

- Define eij = ‖xi − xj‖², and let ei be the vector whose j-th element is eij.
- For wi, we have the following problem:

  minimize_{wi}  (1/2) ‖wi + ei/λ‖²,
  subject to  wi⊤1 = 1, wij ≥ 0, wii = 0.

- The Lagrangian function of this problem is

  L(wi, ηi, βi) = (1/2) ‖wi + ei/λ‖² − ηi(wi⊤1 − 1) − βi⊤wi,

  where ηi ∈ R and βi ∈ Rp are the Lagrange multipliers, with βij ≥ 0 for all j ≠ i.

Page 17

KKT optimality

The optimal solution wi should satisfy that the derivative of the Lagrangian w.r.t. wi is equal to zero, so we have

wi + ei/λ − ηi1 − βi = 0.

Then, for the j-th element of wi, we have

wij + eij/λ − ηi − βij = 0.

Noting that wijβij = 0 (complementary slackness), from the KKT conditions we obtain

wij = (−eij/λ + ηi)+,  where (a)+ = max(a, 0),

and ηi is found to satisfy the constraint wi⊤1 = 1, ∀ i = 1, . . . , p.

Additional goal: sparsity in the graph. How do we enforce sparsity?
i) Use a sparsity-enforcing regularizer, e.g., the ℓ1-norm.
ii) Choose λ such that each node has exactly m ≪ p neighbors, i.e., ‖wi‖0 = m.

We will explore the second path for enforcing sparsity.
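The KKT solution above is exactly a Euclidean projection onto the probability simplex; a minimal sketch (the sorting-based choice of ηi is the standard simplex-projection procedure, and the helper name is an assumption):

```python
import numpy as np

def update_weights(e, lam):
    # Solve: min_w 0.5 * ||w + e/lam||^2  s.t.  w >= 0, 1'w = 1,
    # i.e., w_j = max(-e_j/lam + eta, 0) with eta chosen so the weights sum to 1.
    # (e excludes the self-distance e_ii, so w_ii = 0 is handled outside.)
    v = -e / lam
    u = np.sort(v)[::-1]                              # sort descending
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / k > 0)[0][-1]
    eta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + eta, 0.0)
```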

Page 18

Sparsity by choosing λi: number of neighbors ‖wi‖0 = m

Without loss of generality, suppose ei1, ei2, . . . , eip are ordered increasingly. By design, we have wii = 0. The constraint ‖wi‖0 = m implies wim > 0 and wi,m+1 = 0. Therefore, we have

−eim/λi + ηi > 0,  and  −ei,m+1/λi + ηi ≤ 0.

Combining the solution for wij with the constraint wi⊤1 = 1, we have

∑_{j=1}^m (−eij/λi + ηi) = 1  ⟹  ηi = 1/m + (1/(mλi)) ∑_{j=1}^m eij.

This leads to the following inequality for λi:

m eim − ∑_{j=1}^m eij < λi ≤ m ei,m+1 − ∑_{j=1}^m eij.

Page 19

Contd...

Therefore, to obtain an optimal solution wi with exactly m nonzero values (‖wi‖0 = m), the maximal λi is

λi = m ei,m+1 − ∑_{j=1}^m eij.

Combining the previous results, the optimal {wij}j≠i can be obtained as follows:

wij = (ei,m+1 − eij) / (m ei,m+1 − ∑_{h=1}^m eih)  for j ≤ m,   wij = 0  for j > m.

Page 20

Ex 2. Graph based clustering

Goal of graph-based clustering: Given an initial connected graph W, infer the k-component graph S.

[Figure: heatmaps of (iii) the initial graph W and (iv) the desired k-component graph S; (v) initial graph, (vi) desired graph]

Page 21

Eigenvalue property of Laplacian matrix L

L = U Diag(λ1, λ2, · · · , λp) U⊤,

where U is the matrix of eigenvectors and [λ1, λ2, . . . , λp] are the eigenvalues in increasing order, with the following property:

{λj = 0}_{j=1}^k,  c1 ≤ λk+1 ≤ · · · ≤ λp ≤ c2,

where the number k of zero eigenvalues equals the number of connected components.

[Figure: a 3-component graph on 60 nodes and the eigenvalues of its Laplacian matrix; the 3 zero eigenvalues correspond to the 3 components]

Page 22

Constrained Laplacian Rank (CLR) for clustering

- Compute W from the data as [W]ij = exp(−‖xi − xj‖²/(2σ²)). W may not exhibit the true component structure.
- Goal: To infer S from W such that S captures the true component structure, i.e., a k-component structure.
- Let Ls = Diag(S1) − S denote the Laplacian of S.

CLR problem formulation:

minimize_{S=[s1,..,sp]}  ‖S − W‖²F,
subject to  si⊤1 = 1, si ≥ 0, sii = 0,  rank(Ls) = p − k.

- Let λ(Ls) = {0 ≤ λ1 ≤ λ2 ≤ · · · ≤ λp} denote the eigenvalues of Ls in increasing order.
- rank(Ls) = p − k  ⟹  λ(Ls) = {λi = 0}_{i=1}^k ∪ {λi > 0}_{i=k+1}^p.

Page 23

CLR problem reformulation

minimize_{S=[s1,..,sp]}  ‖S − W‖²F + β ∑_{i=1}^k λi(Ls),
subject to  si⊤1 = 1, si ≥ 0, sii = 0.

For sufficiently large β, the optimal solution S to this problem will make the second term ∑_{i=1}^k λi(Ls) equal to zero.

Ky Fan's theorem [Fan, 1949]:

∑_{i=1}^k λi(Ls) = minimize_{F∈Rp×k, F⊤F=I}  tr(F⊤LsF).

Using Ky Fan's theorem, we can force the first k eigenvalues to zero via the following formulation:

minimize_{S, F∈Rp×k}  ‖S − W‖²F + β tr(F⊤LsF),
subject to  si⊤1 = 1, si ≥ 0, sii = 0,  F⊤F = I.

Page 24

Solving for F and S alternately

- Sub-problem for F:

  minimize_{F∈Rp×k}  tr(F⊤LsF),  subject to F⊤F = I.

- The optimal solution F is formed by the k eigenvectors of Ls corresponding to the k smallest eigenvalues.
- Sub-problem for S:

  minimize_{sij}  ∑_{i,j} (sij − wij)² + β ∑_{i,j} ‖fi − fj‖² sij,
  subject to  si⊤1 = 1, si ≥ 0, sii = 0.

- Note that the problem is independent across i, so we can solve the following problem separately for each i:

  minimize_{si⊤1=1, si≥0}  ∑_j (sij − wij)² + β ∑_j ‖fi − fj‖² sij
  ⟺  minimize_{si⊤1=1, si≥0}  ‖si − (wi − (β/2) vi)‖²,  with vij = ‖fi − fj‖².

- We have seen this problem before; it has a closed-form solution.
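One alternating pass of this scheme, as a hedged sketch (the simplex projection is the standard sorting-based routine; the function names are illustrative):

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection of v onto {s : s >= 0, 1's = 1}.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def clr_step(W, S, beta, k):
    p = S.shape[0]
    Ls = np.diag(S.sum(axis=1)) - S                  # Laplacian of current S
    _, eigvec = np.linalg.eigh(Ls)
    F = eigvec[:, :k]                                # F-step: k smallest eigenpairs
    V = np.sum((F[:, None, :] - F[None, :, :]) ** 2, axis=-1)  # v_ij = ||f_i - f_j||^2
    S_new = np.zeros_like(S)
    for i in range(p):                               # S-step: row-wise projection
        target = W[i] - 0.5 * beta * V[i]
        mask = np.arange(p) != i                     # enforce s_ii = 0
        S_new[i, mask] = project_simplex(target[mask])
    return S_new
```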

Page 25

Outline

1 Graphical modeling

2 Types of graphical models

3 Physically inspired model: graphs from smooth signals

4 Probabilistic graphical model: GMRF

5 Structured graph learning via spectral constraints

6 Experiments

Page 26

Gaussian Markov random field (GMRF)

A random vector x = (x1, x2, . . . , xp)⊤ is called a GMRF with parameters (0, Θ) if its density follows

p(x) = (2π)^{−p/2} det(Θ)^{1/2} exp(−(1/2) x⊤Θx).

The nonzero pattern of Θ determines a conditional graph G = (V, E):

Θij ≠ 0  ⟺  (i, j) ∈ E, ∀ i ≠ j;
xi ⊥ xj | x \ (xi, xj)  ⟺  Θij = 0.

- For Gaussian distributed data x ∼ N(0, Θ†), graph learning is simply an inverse covariance (precision) matrix estimation problem [Lauritzen, 1996].
- If rank(Θ) < p, then x is called an improper GMRF (IGMRF) [Rue and Held, 2005].
- If Θij ≤ 0 ∀ i ≠ j, then x is called an attractive improper GMRF [Slawski and Hein, 2015].

Page 27

Historical timeline of Markov graphical models

Data X = {x(i) ∼ N(0, Σ = Θ†)}_{i=1}^n,  S = (1/n) ∑_{i=1}^n x(i)(x(i))⊤.

- Covariance selection [Dempster, 1972]: graph from the elements of S−1, the inverse sample covariance matrix.
- Neighborhood regression [Meinshausen and Bühlmann, 2006]:

  arg min_{β1}  ‖x(1) − X/x(1) β1‖² + α‖β1‖1.

- ℓ1-regularized MLE [Friedman et al., 2008, Banerjee et al., 2008]:

  maximize_{Θ≻0}  log det(Θ) − tr(ΘS) − α‖Θ‖1.

- Ising model: ℓ1-regularized logistic regression [Ravikumar et al., 2010].
- Attractive IGMRF [Slawski and Hein, 2015].
- Laplacian structure in Θ [Lake and Tenenbaum, 2010].
- ℓ1-regularized MLE with Laplacian structure [Egilmez et al., 2017, Zhao et al., 2019].

Page 28

Setting our goal

- Given data X = {x(i) ∼ N(0, Σ = Θ†)}_{i=1}^n,
- we have the sample covariance matrix S = (1/n) ∑_{i=1}^n x(i)(x(i))⊤.
- Learn the graph matrix Θ.

Page 29

Penalized MLE Gaussian Graphical Modeling (GGM)

- ℓ1-regularized MLE [Friedman et al., 2008, Banerjee et al., 2008]:

  maximize_{Θ≻0}  log det(Θ) − tr(ΘS) − α‖Θ‖1.

- When the data comes from a p-variate Gaussian distribution, this is the MLE of the inverse covariance matrix.
- For arbitrarily distributed data, it is the log-determinant Bregman divergence regularized optimization problem.
- A computationally efficient method to solve this problem is the famous Graphical Lasso (GLasso) algorithm [Friedman et al., 2008].
- It is an elegant and statistically consistent way of associating a graph with given data and has become an integral part of numerous applications.
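For reference, the same ℓ1-penalized Gaussian MLE is available off the shelf; a short usage sketch with scikit-learn (the data here is random, for illustration only):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso  # solves the l1-penalized Gaussian MLE

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))             # n = 200 samples, p = 10 variables

model = GraphicalLasso(alpha=0.1).fit(X)       # alpha plays the role of the penalty
Theta = model.precision_                       # estimated sparse precision matrix
support = np.abs(Theta) > 1e-6                 # nonzero pattern = conditional graph
np.fill_diagonal(support, False)
```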

Page 30

Additional structure

maximize_{Θ≻0}  log det(Θ) − tr(ΘS) − α‖Θ‖1.

- Attractive IGMRF model [Slawski and Hein, 2015]:

  SΘ = {Θ ≻ 0, Θij = Θji ≤ 0, i, j = 1, . . . , p, i ≠ j}.

- Laplacian structure: SL = {Θ | Θij = Θji ≤ 0 for i ≠ j, Θii = −∑j≠i Θij}. But a Laplacian is a PSD matrix, and log det is only defined for a PD matrix.
- Modified Laplacian structure [Lake and Tenenbaum, 2010]:

  maximize_{Θ̃≻0}  log det(Θ̃) − tr(Θ̃S) − α‖Θ̃‖1,  subject to Θ̃ = Θ + (1/σ²)I.

- ℓ1-regularized MLE with Laplacian structure after adding J = (1/p)11⊤ [Egilmez et al., 2017, Zhao et al., 2019]:

  maximize_{Θ∈SL}  log det(Θ + J) − tr(ΘS) − α‖Θ‖1.

Page 31

GLasso algorithm [Friedman et al., 2008]

maximize_{Θ≻0}  log det(Θ) − tr(ΘS) − α‖Θ‖1.

The optimality condition for the above problem is

Θ−1 − S − αΓ = 0,

where Γ is the matrix of component-wise signs of Θ:

[Γ]jk = sign(Θjk) if Θjk ≠ 0;  [Γ]jk ∈ [−1, 1] if Θjk = 0.

The optimality condition is also known as the normal equation. Further, since Θjj is required to be positive, the diagonal of the normal equation implies

Σii = Sii + α,  i = 1, . . . , p,

where Σ = Θ−1.

Page 32

GLasso uses a block-coordinate method for solving the problem. Consider a partitioning of Θ and Σ:

Θ = [Θ11, θ12; θ12⊤, Θ22],   Σ = [Σ11, σ12; σ12⊤, Σ22],

where Θ11 ∈ R(p−1)×(p−1), θ12 ∈ Rp−1, and Θ22 is a scalar, and similarly for the other partitions. Next, from ΘΣ = I, the blocks of Σ = Θ−1 can be expressed as

Σ11 = Θ11−1 + (Θ11−1θ12θ12⊤Θ11−1)/(Θ22 − θ12⊤Θ11−1θ12),
σ12 = −(Θ11−1θ12)/(Θ22 − θ12⊤Θ11−1θ12),
Σ22 = 1/(Θ22 − θ12⊤Θ11−1θ12).

GLasso solves for a row/column of Θ at a time, holding the rest fixed. Considering the p-th column of the normal equation, we get

−σ12 + s12 + αγ12 = 0.

Page 33

GLasso [Mazumder and Hastie, 2012]

Consider reading off σ12 from the partitioned expression:

(Θ11−1θ12)/(Θ22 − θ12⊤Θ11−1θ12) + s12 + αγ12 = 0.

The above also simplifies to

Θ11−1θ12Σ22 + s12 + αγ12 = 0.

With ν = θ12Σ22 (Σ22 fixed) and Θ11 ≻ 0, this is equivalent to the stationarity condition for

minimize_{ν∈Rp−1}  (1/2) ν⊤Θ11−1ν + ν⊤s12 + α‖ν‖1.

Let ν⋆ be the minimizer; then

θ12⋆ = ν⋆/Σ22,
Θ22⋆ = 1/Σ22 + (θ12⋆)⊤Θ11−1θ12⋆.

Page 34

GLasso algorithm summary

1: Initialize: Σ = Diag(S) + αI, and Θ = Σ−1.
2: Repeat until the convergence criterion is met:
   (a) Rearrange rows and columns such that the target column is last (implicitly).
   (b) Compute Θ11−1 = Σ11 − σ12σ12⊤/Σ22.
   (c) Obtain ν⋆ and update θ12⋆ and Θ22⋆.
   (d) Update Θ and Σ using the partitioned expressions, ensuring ΘΣ = I.
3: Output the precision matrix Θ.

Page 35

Learning a graph with Laplacian constraint

Page 36

Solving for the Laplacian constraint [Zhao et al., 2019]

maximize_{Θ∈SL}  log det(Θ + J) − tr(ΘS) − α‖Θ‖1,

where SL = {Θ | Θij = Θji ≤ 0 for i ≠ j, Θii = −∑j≠i Θij}.

Since Θ satisfies the Laplacian constraints, the off-diagonal elements of Θ are non-positive and the diagonal elements are non-negative, so

‖Θ‖1 = tr(ΘH),

where H = 2I − 11⊤. Thus, the objective function becomes

log det(Θ + J) − tr(ΘS) − α tr(ΘH) = log det(Θ + J) − tr(ΘK),

where K = S + αH.

Page 37

Solving with known connectivity information: Approach 1

Goal: To estimate the graph Laplacian weights Θ with known connectivity information C.

maximize_{Θ∈SL(C)}  log det(Θ + J) − tr(ΘK).

The constraint set SL(C) can be written as

Θ ⪰ 0,  Θ1 = 0,
Θij = Θji ≤ 0 if Cij = 1, for i ≠ j,
Θij = Θji = 0 if Cij = 0, for i ≠ j.

Page 38

We further suppose the graph has no self-loops, so the diagonal elements of C are all zero. Then, the constraint set SL(C) can be compactly rewritten in the following way:

Θ = Ω,
Θ ∈ SΘ = {Θ | Θ ⪰ 0, Θ1 = 0},
Ω ∈ SΩ = {Ω | I ◦ Ω ≥ 0, B ◦ Ω = 0, C ◦ Ω ≤ 0},

where B = 11⊤ − I − C and ◦ denotes the elementwise (Hadamard) product. The constraint I ◦ Ω ≥ 0 is implied by the constraint Θ ⪰ 0.

Next, we present an equivalent form of the constraints Θ1 = 0 and Θ ⪰ 0:

Θ = PΞP⊤,  Ξ ⪰ 0,

where P ∈ Rp×(p−1) is the orthogonal complement of 1, i.e., P⊤P = I and P⊤1 = 0. Note that the choice of P is non-unique. Why?

Now, with this alternative form of Θ, we can rewrite the objective function as

tr(ΘK) = tr(ΞK̃),  where K̃ = P⊤KP.

Page 39

Moreover,

log det(Θ + J) = log det(PΞP⊤ + (1/p)11⊤) = log det(Ξ).

Thus, the problem formulation changes to

minimize_{Ξ,Ω}  tr(ΞK̃) − log det(Ξ),
subject to  Ξ ⪰ 0,  PΞP⊤ = Ω,  I ◦ Ω ≥ 0,  B ◦ Ω = 0,  C ◦ Ω ≤ 0.

Note: this is a convex problem with linearly constrained variables Ω, Ξ. An alternating direction method of multipliers (ADMM) based method is suitable for such a problem.

Page 40

Sketch of the ADMM update

minimize_{x,z}  f(x) + g(z)
subject to  Ax + Bz = c,

where x ∈ Rn, z ∈ Rm, A ∈ Rp×n, B ∈ Rp×m, and c ∈ Rp. The augmented Lagrangian is written as

Lρ(x, z, y) = f(x) + g(z) + y⊤(Ax + Bz − c) + (ρ/2)‖Ax + Bz − c‖²,

where ρ > 0 is the penalty parameter. Now, the ADMM subroutine cycles through the following updates until it converges:

1: Initialize: z(0), y(0), ρ
2: t ← 0
3: while stopping criterion is not met do
4:   z(t+1) = arg min_{z∈Z} Lρ(x(t), z, y(t))
5:   x(t+1) = arg min_{x∈X} Lρ(x, z(t+1), y(t))
6:   y(t+1) = y(t) + ρ(Ax(t+1) + Bz(t+1) − c)
7:   t ← t + 1
8: end while

Page 41

Edge based formulation: Approach 2

Suppose there are M edges present in the graph, and the m-th edge connects vertices im and jm. Then we can always perform the following decomposition of the graph Laplacian matrix Θ:

Θ = ∑_{m=1}^M Wimjm (eim eim⊤ + ejm ejm⊤ − eim ejm⊤ − ejm eim⊤)
  = ∑_{m=1}^M Wimjm (eim − ejm)(eim − ejm)⊤
  = E Diag(w) E⊤,

where w = {Wimjm} collects the weights on the edges. The matrix E, known as the incidence matrix, can be inferred from the connectivity matrix C.

This decomposition technique is motivated by highly sparse graph structures (e.g., tree graphs).

Page 42

Problem reformulation

minimize_{w≥0}  − log det(EDiag(w)E⊤ + J) + tr(EDiag(w)E⊤K).

Furthermore, with J = (1/p)11⊤, we have

EDiag(w)E⊤ + (1/p)11⊤ = [E, 1] Diag([w⊤, 1/p]⊤) [E, 1]⊤ = G Diag([w⊤, 1/p]⊤) G⊤.

Now the problem can be expressed as

minimize_{w≥0}  − log det(G Diag([w⊤, 1/p]⊤) G⊤) + tr(EDiag(w)E⊤K).

The problem is convex; we will obtain a simple closed-form update rule via the majorization-minimization (MM) approach [Sun et al., 2016].

We start with the following basic inequality, which is due to the concavity of the log-determinant function:

log det(X) ≤ log det(X0) + tr(X0−1(X − X0)).

Page 43

Majorization function

− log det(GXG⊤) = log det((GXG⊤)−1)
≤ log det((GX0G⊤)−1) + tr([(GX0G⊤)−1]−1 ((GXG⊤)−1 − (GX0G⊤)−1))
= tr(F0 (GXG⊤)−1) + const.,

where F0 = GX0G⊤. Substituting Diag([w⊤, 1/p]⊤) for X, the minimization problem becomes

minimize_{w≥0}  tr(EDiag(w)E⊤K) + tr(F0 (G Diag([w⊤, 1/p]⊤) G⊤)−1),

with F0 = G Diag([w0⊤, 1/p]⊤) G⊤.

Yet, this minimization problem does not yield a simple closed-form solution. For the sake of algorithmic simplicity, we need to further majorize the objective.

Page 44

Double majorization and optimal solution

For any YXY⊤ ≻ 0, the following matrix inequality holds:

(YXY⊤)−1 ⪯ Z0−1 Y X0 X−1 X0 Y⊤ Z0−1,   (1)

where Z0 = YX0Y⊤. Equality is achieved at X = X0. Now, we are able to do the following (define w̃ = [w⊤, 1/p]⊤ and w̃0 = [w0⊤, 1/p]⊤):

tr(F0 (G Diag(w̃) G⊤)−1)
= tr(F0^{1/2} (G Diag(w̃) G⊤)−1 F0^{1/2})
≤ tr(F0^{1/2} F0−1 G Diag(w̃0) Diag(w̃)−1 Diag(w̃0) G⊤ F0−1 F0^{1/2}),

taking Y = G, X = Diag(w̃), X0 = Diag(w̃0), Z0 = F0 in (1). The majorized problem can be expressed as

minimize_{w≥0}  diag(R)⊤w + diag(QM)⊤w−1,

where R = E⊤KE, QM = Q1:M,1:M, and Q = Diag(w̃0) G⊤ (G Diag(w̃0) G⊤)−1 G Diag(w̃0). The optimal solution to this problem is

w⋆ = √(diag(QM) / diag(R))  (elementwise).

Page 45

Outline

1 Graphical modeling

2 Types of graphical models

3 Physically inspired model: graphs from smooth signals

4 Probabilistic graphical model: GMRF

5 Structured graph learning via spectral constraints

6 Experiments

Page 46

Structured graphs

[Figure 3: Useful graph structures: (vii) multi-component, (viii) regular, (ix) modular, (x) bipartite, (xi) grid, (xii) tree]

Page 47

Structured graphs: importance and challenges

Useful structures:
- Multi-component: graphs for clustering, classification.
- Bipartite: graphs for matching and constructing two-channel filter banks.
- Multi-component bipartite: graphs for co-clustering.
- Tree: graphs for sampling algorithms.
- Modular: graphs for social network analysis.
- Connected sparse: graphs for graph signal processing applications.

Structured graph learning from data
- involves both the estimation of structure (graph connectivity) and parameters (graph weights);
- parameter estimation is well explored (e.g., maximum likelihood);
- but structure is a combinatorial property, which makes structure estimation very challenging.

Structure learning is NP-hard for a general class of graphical models [Bogdanov et al., 2008].

Page 48

Structured graphs: direction

State-of-the-art direction
- The effort has been on characterizing the families of structures for which learning can be made feasible, e.g., maximum weight spanning trees for tree structures [Chow and Liu, 1968], and local separation and walk-summability for Erdos-Renyi graphs, power-law graphs, and small-world graphs [Anandkumar et al., 2012].
- Existing methods are restricted to particular structures, and it is difficult to extend them to learn other useful structures, e.g., multi-component, bipartite, etc.

Proposed direction: Graph (structure) ⟺ Graph matrix (spectrum)
- The spectral properties of a graph matrix are one such characterization [Chung, 1997], which is considered in the present work.
- Under this framework, structure learning for a large class of graph structures can be expressed as an eigenvalue problem on the graph matrices.

Page 49

Problem statement

To learn structured graphs via spectral constraints

Page 50

Motivating example 1: structure via Laplacian eigenvalues

Θ = U Diag(λ) U⊤.

For a multi-component graph, the first k eigenvalues of its Laplacian matrix are zero:

Sλ = {λj = 0}_{j=1}^k,  c1 ≤ λk+1 ≤ · · · ≤ λp ≤ c2.

[Figure: a 3-component graph on 60 nodes; the Laplacian spectrum shows 3 zero eigenvalues, one per component]

Page 51

Motivating example 2: structure via adjacency eigenvalues

Adjacency matrix ΘA: ΘA = Diag(diag(Θ)) − Θ, with eigendecomposition

ΘA = V Diag(ψ) V⊤.

For a bipartite graph, the eigenvalues are symmetric about the origin:

Sψ = {ψi = −ψp−i+1, ∀ i = 1, . . . , p}.

[Figure: a bipartite graph on 60 nodes; the adjacency spectrum is symmetric about 0]

Page 52

Proposed unified framework for SGL

maximize_Θ  log gdet(Θ) − tr(ΘS) − α h(Θ),
subject to  Θ ∈ SΘ,  λ(T(Θ)) ∈ ST

- gdet is the generalized determinant, defined as the product of the non-zero eigenvalues,
- SΘ encodes the typical constraints of a Laplacian matrix,
- λ(T(Θ)) is the vector containing the eigenvalues of the matrix T(Θ),
- T(·) is the transformation that allows us to consider the eigenvalues of different graph matrices, and
- ST allows us to include spectral constraints on the eigenvalues.
- Precisely, ST facilitates incorporating the spectral properties required for enforcing structure.

The proposed formulation converts the combinatorial structural constraints into analytical spectral constraints.

Page 53

Optimization for Laplacian spectral constraints

maximize_{Θ,λ,U}  log gdet(Θ) − tr(ΘS) − α‖Θ‖1,
subject to  Θ ∈ SΘ,  Θ = UDiag(λ)U⊤,  λ ∈ Sλ,  U⊤U = I,

where λ = [λ1, λ2, . . . , λp] is the vector of eigenvalues and U is the matrix of eigenvectors.

The resulting formulation is still complicated and intractable, due to:
- the Laplacian structural constraints,
- the non-convex constraint coupling Θ, U, λ, and
- the non-convex constraint on U.

In order to derive a feasible formulation:
- we first introduce a linear operator L that transforms the Laplacian structural constraints into simple algebraic constraints, and
- we then relax the eigen-decomposition expression into the objective function.

Page 54

Linear operator for Θ ∈ SΘ

SΘ = {Θ | Θij = Θji ≤ 0 for i ≠ j, Θii = −∑j≠i Θij}.

Θij = Θji ≤ 0 and Θ1 = 0 imply the target matrix is symmetric, with the degrees of freedom of Θ equal to p(p − 1)/2. We define a linear operator L : w ∈ R+^{p(p−1)/2} → Lw ∈ Rp×p, which maps a weight vector w to the Laplacian matrix:

[Lw]ij = [Lw]ji ≤ 0, i ≠ j;   [Lw]ii = −∑j≠i [Lw]ij.

Example of Lw on w = [w1, w2, w3, w4, w5, w6]⊤:

Lw =
[  ∑i∈{1,2,3} wi   −w1             −w2             −w3
   −w1             ∑i∈{1,4,5} wi   −w4             −w5
   −w2             −w4             ∑i∈{2,4,6} wi   −w6
   −w3             −w5             −w6             ∑i∈{3,5,6} wi ].

Page 55

Problem reformulation

maximize_{Θ,λ,U}  log gdet(Θ) − tr(ΘS) − α‖Θ‖1,
subject to  Θ ∈ SΘ,  Θ = UDiag(λ)U⊤,  λ ∈ Sλ,  U⊤U = I.

Using (i) Θ = Lw and (ii) tr(ΘS) + α h(Θ) = tr(ΘK), with K = S + H and H = α(2I − 11⊤), the proposed problem formulation becomes:

minimize_{w,λ,U}  − log gdet(Diag(λ)) + tr(KLw) + (β/2)‖Lw − UDiag(λ)U⊤‖²F,
subject to  w ≥ 0,  λ ∈ Sλ,  U⊤U = I.

Page 56

SGL algorithm for k-component graph learning

- Variables: X = (w, λ, U).
- Spectral constraint: Sλ = {λj = 0}_{j=1}^k, c1 ≤ λk+1 ≤ · · · ≤ λp ≤ c2.
- Positivity constraint: w ≥ 0.
- Orthogonality constraint: U⊤U = Ip−k.

We develop a block majorization-minimization (block-MM) type method, which updates each block sequentially while keeping the other blocks fixed [Sun et al., 2016, Razaviyayn et al., 2013].

Page 57

Update for w

Sub-problem for w:

minimize_{w≥0}  tr(KLw) + (β/2)‖Lw − UDiag(λ)U⊤‖²F,

which can be compactly rewritten as

minimize_{w≥0}  f(w) = (1/2)‖Lw‖²F − c⊤w,

for an appropriate vector c collecting the linear terms. This is a convex quadratic program, but it does not have a closed-form solution due to the non-negativity constraint w ≥ 0. The function f(w) is majorized at wt by the function

g(w|wt) = f(wt) + (w − wt)⊤∇f(wt) + (L/2)‖w − wt‖²,

where wt is the update from the previous iteration and L is a Lipschitz constant of ∇f [Sun et al., 2016]. This yields a closed-form update by using the MM technique:

wt+1 = (wt − (1/(2p))∇f(wt))+,

where (a)+ = max(a, 0).

Page 58

Update for U

Sub-problem for U:

maximize_U  tr(U⊤LwUDiag(λ)),
subject to  U⊤U = Ip−k.

This sub-problem is an optimization on the orthogonal Stiefel manifold [Absil et al., 2009, Benidis et al., 2016]. From the KKT optimality conditions, the solution is given by

Ut+1 = eigenvectors(Lwt+1)[k + 1 : p],

that is, the p − k eigenvectors of the matrix Lwt+1 corresponding to its eigenvalues in increasing order, skipping the first k.

Page 59

Update for λ

Sub-problem for λ:

minimize_{λ∈Sλ}  − log gdet(Diag(λ)) + (β/2)‖U⊤(Lw)U − Diag(λ)‖²F,

which, keeping only the terms that depend on λ, reduces to

minimize_{c1≤λk+1≤···≤λp≤c2}  − ∑_{i=1}^{p−k} log λk+i + (β/2)‖λ − d‖²,

where d = diag(U⊤(Lw)U). This sub-problem is popularly known as a regularized isotonic regression problem. It is a convex optimization problem and the solution can be obtained from the KKT optimality conditions. We develop an efficient algorithm with fast convergence to the global optimum in at most p − k iterations [Kumar et al., 2019].

Sandeep Kumar, Jiaxi Ying, José Vinícius de M. Cardoso, and Daniel P. Palomar, "A Unified Framework for Structured Graph Learning via Spectral Constraints." arXiv preprint arXiv:1904.09792 (2019).

Page 60

SGL algorithm summary

minimize_{w,λ,U}  − log gdet(Diag(λ)) + tr(KLw) + (β/2)‖Lw − UDiag(λ)U⊤‖²F,
subject to  w ≥ 0,  λ ∈ Sλ,  U⊤U = Ip−k.

Algorithm:

1: Input: SCM S, k, c1, c2, β
2: Output: Lw
3: t ← 0
4: while stopping criterion is not met do
5:   wt+1 = (wt − (1/(2p))∇f(wt))+
6:   Ut+1 ← eigenvectors(Lwt+1), suitably ordered
7:   update λt+1 via the isotonic regression method (at most p − k iterations)
8:   t ← t + 1
9: end while
10: return w(t+1)
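A hedged end-to-end sketch of this loop. The simplifications here are assumptions, not the authors' implementation: the λ-step uses the unconstrained stationary point of −log λ + (β/2)(λ − d)² clipped to [c1, c2] instead of exact isotonic regression, and the w-step uses a 1/(2pβ) step size:

```python
import numpy as np

def L_op(w, p):
    # linear operator w -> Laplacian (see the earlier sketch)
    Theta = np.zeros((p, p)); iu = np.triu_indices(p, 1)
    Theta[iu] = -w; Theta += Theta.T
    Theta[np.diag_indices(p)] = -Theta.sum(axis=1)
    return Theta

def L_adj(Y, p):
    # adjoint operator: [L*(Y)]_k = Y_ii + Y_jj - Y_ij - Y_ji for pair k = (i, j)
    i, j = np.triu_indices(p, 1)
    return Y[i, i] + Y[j, j] - Y[i, j] - Y[j, i]

def sgl(S, k, alpha=0.0, beta=1.0, c1=1e-3, c2=1e3, iters=500):
    p = S.shape[0]
    K = S + alpha * (2 * np.eye(p) - np.ones((p, p)))
    w = np.maximum(L_adj(np.linalg.pinv(S), p), 1e-6)    # crude initialization
    for _ in range(iters):
        Theta = L_op(w, p)
        _, evecs = np.linalg.eigh(Theta)
        U = evecs[:, k:]                                 # U-step: drop k smallest eigenpairs
        d = np.diag(U.T @ Theta @ U)
        lam = 0.5 * (d + np.sqrt(d ** 2 + 4.0 / beta))   # stationary point of the lambda-step
        lam = np.clip(lam, c1, c2)
        M = U @ np.diag(lam) @ U.T
        grad = L_adj(K + beta * (Theta - M), p)          # gradient of the w objective
        w = np.maximum(w - grad / (2.0 * p * beta), 0.0) # projected MM step
    return L_op(w, p)
```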

Page 61

Convergence and the computational complexity

The worst-case computational complexity of the proposed algorithm is O(p³).

Theorem: The limit point (w⋆, U⋆, λ⋆) generated by this algorithm converges to the set of KKT points of the optimization problem.

Sandeep Kumar, Jiaxi Ying, José Vinícius de M. Cardoso, and Daniel P. Palomar, "Structured graph learning via Laplacian spectral constraints," in Advances in Neural Information Processing Systems (NeurIPS), 2019.

Page 62

Outline

1 Graphical modeling

2 Types of graphical models

3 Physically inspired model: graphs from smooth signals

4 Probabilistic graphical model: GMRF

5 Structured graph learning via spectral constraints

6 Experiments

Page 63

Synthetic experiment setup

- Generate a graph with the desired structure.
- Sample weights for the graph edges.
- Obtain the true Laplacian Θtrue.
- Sample data X = {x(i) ∈ Rp ∼ N(0, Σ = Θtrue†)}_{i=1}^n.
- S = (1/n) ∑_{i=1}^n x(i)(x(i))⊤.
- Use S and some prior spectral information, if available.
- Performance metrics:

Relative Error = ‖Θ⋆ − Θtrue‖F / ‖Θtrue‖F,   F-Score = 2tp / (2tp + fp + fn),

where Θ⋆ is the final estimate returned by the algorithm, Θtrue is the true reference graph Laplacian matrix, and tp, fp, fn denote true positives, false positives, and false negatives, respectively.

Page 64

Grid graph

[Figure: learned grid graphs: (i) true, (ii) [Egilmez et al., 2017], (iii) SGL with ℓ1-norm, (iv) SGL with reweighted ℓ1-norm]

Page 65

Noisy multi-component graph

[Figure: noisy multi-component graph: heatmaps (v) true, (vi) noisy, (vii) learned; graphs (viii) true, (ix) noisy, (x) learned]

Page 66

Model mismatch

[Figure: model mismatch: (xiv) true graph with k = 7, (xv) noisy, (xvi) learned with k = 2]

Page 67

Popular multi-component structures

Page 68

Real data: cancer dataset [Weinstein et al., 2013]

[Figure: clustering results on the cancer dataset: (xxiii) CLR (Nie et al., 2016), (xxiv) SGL with k = 5]

Clustering accuracy (ACC): CLR = 0.9862 and SGL = 0.99875.

Page 69

Animal dataset [Osherson et al., 1991]

[Figure: learned animal-relationship graphs, with nodes labeled by animal names (Elephant, Rhino, Horse, . . . , Deer): (xxv) GGL [Egilmez et al., 2017], (xxvi) GLasso [Friedman et al., 2008]]

Page 70

Animal dataset contd...

[Figure: learned animal-relationship graphs: (xxvii) SGL, proposed (k = 1); (xxviii) SGL, proposed (k = 4)]

Page 71

Bipartite structure via adjacency spectral constraints

[Figure: bipartite structure: heatmaps (xxix) true, (xxx) noisy, (xxxi) learned]

Page 72

Multi-component bipartite structure via joint spectral constraints

[Figure: multi-component bipartite structure: heatmaps (xxxii) true, (xxxiii) noisy, (xxxiv) learned]

Page 73

Resources

An R package "spectralGraphTopology" containing code for all the experimental results is available at https://cran.r-project.org/package=spectralGraphTopology

NeurIPS paper: Sandeep Kumar, Jiaxi Ying, José Vinícius de M. Cardoso, and Daniel P. Palomar, "Structured graph learning via Laplacian spectral constraints," in Advances in Neural Information Processing Systems (NeurIPS), 2019. https://arxiv.org/pdf/1909.11594.pdf

Extended version paper: Sandeep Kumar, Jiaxi Ying, José Vinícius de M. Cardoso, and Daniel P. Palomar, "A Unified Framework for Structured Graph Learning via Spectral Constraints" (2019). https://arxiv.org/pdf/1904.09792.pdf


Page 74

Thanks

For more information visit:

https://www.danielppalomar.com


Page 75

References

Absil, P.-A., Mahony, R., and Sepulchre, R. (2009). Optimization Algorithms on Matrix Manifolds. Princeton University Press.

Anandkumar, A., Tan, V. Y., Huang, F., and Willsky, A. S. (2012). High-dimensional Gaussian graphical model selection: Walk summability and local separation criterion. Journal of Machine Learning Research, 13(Aug):2293–2337.

Banerjee, O., Ghaoui, L. E., and d'Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 9(Mar):485–516.

Benidis, K., Sun, Y., Babu, P., and Palomar, D. P. (2016). Orthogonal sparse PCA and covariance estimation via Procrustes reformulation. IEEE Transactions on Signal Processing, 64(23):6211–6226.

Bogdanov, A., Mossel, E., and Vadhan, S. (2008). The complexity of distinguishing Markov random fields. In Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, pages 331–342. Springer.

Page 76

References

Chow, C. and Liu, C. (1968). Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462–467.

Chung, F. R. (1997). Spectral Graph Theory. Number 92. American Mathematical Society.

Dempster, A. P. (1972). Covariance selection. Biometrics, pages 157–175.

Egilmez, H. E., Pavez, E., and Ortega, A. (2017). Graph learning from data under Laplacian and structural constraints. IEEE Journal of Selected Topics in Signal Processing, 11(6):825–841.

Fan, K. (1949). On a theorem of Weyl concerning eigenvalues of linear transformations I. Proceedings of the National Academy of Sciences of the United States of America, 35(11):652.

Page 77

References

Friedman, J., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441.

Kumar, S., Ying, J., Cardoso, J. V. d. M., and Palomar, D. (2019). A unified framework for structured graph learning via spectral constraints. arXiv preprint arXiv:1904.09792.

Lake, B. and Tenenbaum, J. (2010). Discovering structure by learning sparse graphs. In Proceedings of the 33rd Annual Cognitive Science Conference.

Lauritzen, S. L. (1996). Graphical Models, volume 17. Clarendon Press.

Mazumder, R. and Hastie, T. (2012). The graphical lasso: New insights and alternatives. Electronic Journal of Statistics, 6:2125.

Page 78

References

Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3):1436–1462.

Osherson, D. N., Stern, J., Wilkie, O., Stob, M., and Smith, E. E. (1991). Default probability. Cognitive Science, 15(2):251–269.

Ravikumar, P., Wainwright, M. J., Lafferty, J. D., et al. (2010). High-dimensional Ising model selection using ℓ1-regularized logistic regression. The Annals of Statistics, 38(3):1287–1319.

Razaviyayn, M., Hong, M., and Luo, Z.-Q. (2013). A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM Journal on Optimization, 23(2):1126–1153.

Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications. CRC Press.

Page 79

References

Slawski, M. and Hein, M. (2015). Estimation of positive definite M-matrices and structure learning for attractive Gaussian Markov random fields. Linear Algebra and its Applications, 473:145–179.

Sun, Y., Babu, P., and Palomar, D. P. (2016). Majorization-minimization algorithms in signal processing, communications, and machine learning. IEEE Transactions on Signal Processing, 65(3):794–816.

Weinstein, J. N., Collisson, E. A., Mills, G. B., Shaw, K. R. M., Ozenberger, B. A., Ellrott, K., Shmulevich, I., Sander, C., Stuart, J. M., et al. (2013). The Cancer Genome Atlas pan-cancer analysis project. Nature Genetics, 45(10):1113.

Zhao, L., Wang, Y., Kumar, S., and Palomar, D. P. (2019). Optimization algorithms for graph Laplacian estimation via ADMM and MM. IEEE Transactions on Signal Processing, 67(16):4231–4244.