Approximating a single component of the solution to a linear system · 2015-05-21


Page 1:

Approximating a single component of the solution to a linear system

1

Christina E. Lee, Asuman Ozdaglar, Devavrat Shah [email protected] [email protected] [email protected]

MIT LIDS

Page 2:

2

How do I compare to my competitors?

Bonacich Centrality Rank Centrality Pagerank

Page 3:

3 Bonacich Centrality Rank Centrality Pagerank

Stationary distribution of Markov chain

Q: Do I need to compute the entire solution vector in order to get an estimate of my centrality and the centrality of my competitors?

Eek, too expensive!!

Solution to Linear System

Page 4:

Q: How do we efficiently compute the solution for a single coordinate within a large network?

4

(1) Stationary distribution of Markov chain

(2) Solution to Linear System of Equations

• G = P (i.e. a stochastic matrix), z = 0
• Sample short random walks; relies on a sharp characterization of πi

• If A is positive definite, then we can choose G, z such that the spectral radius of G is < 1 and x = Gx + z is equivalent to Ax = b

• Expand computation in local neighborhood, use sparsity of G to bound size of neighborhood

Would like a “local” method, i.e. cheaper than computing full vector.
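As an illustration (not from the slides; the matrices and the choice α = λmax(A) are made up for the example), both formulations can be checked numerically in the common fixed-point form x = Gx + z:

```python
import numpy as np

# (1) Stationary distribution of a Markov chain with transition matrix P:
#     pi = pi P is the fixed-point form with G = P (plus normalization), z = 0.
P = np.array([[0.1, 0.9],
              [0.5, 0.5]])

# (2) Ax = b with A positive definite: choose G = I - A/alpha, z = b/alpha.
#     With alpha = lambda_max(A), the eigenvalues of G lie in [0, 1).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
alpha = np.linalg.eigvalsh(A).max()
G = np.eye(2) - A / alpha
z = b / alpha

rho = max(abs(np.linalg.eigvals(G)))       # spectral radius of G
x = np.linalg.solve(A, b)                  # true solution, for checking
residual = np.linalg.norm(G @ x + z - x)   # x is a fixed point of x = Gx + z
```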

Page 5:

Overview of Literature

5

Stationary Distribution of Markov Chain

x = Gx + z, spectral radius(G) < 1

Probabilistic

Monte Carlo

MCMC methods Ulam von Neumann

• Samples long random walks
• Good if G has good spectral properties
• Challenge is to control the variance of the estimator
• Naturally parallelizable

Page 6:

Overview of Literature

6

Stationary Distribution of Markov Chain

x = Gx + z, spectral radius(G) < 1

Probabilistic

Monte Carlo

MCMC methods Ulam von Neumann

Iterative Algebraic

Power method Linear iterative update, Jacobi, Gauss-Seidel

• Iterative matrix-vector multiplication
• Good if G is sparse and has good spectral properties
• Distributed, but synchronous
• Each multiplication is O(n²)
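The linear iterative update can be sketched in a few lines (an illustrative dense implementation; the matrix values and iteration count are made up, with G, z obtained from the positive-definite reduction):

```python
import numpy as np

# Linear iterative update x(t+1) = G x(t) + z; converges to the fixed point
# whenever rho(G) < 1. Each dense multiply costs O(n^2).
def linear_iterative(G, z, iters=200):
    x = np.zeros_like(z)
    for _ in range(iters):
        x = G @ x + z
    return x

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
alpha = np.linalg.eigvalsh(A).max()
G, z = np.eye(2) - A / alpha, b / alpha

x_hat = linear_iterative(G, z)
x_true = np.linalg.solve(A, b)
```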

Page 7:

Overview of Literature

7

Stationary Distribution of Markov Chain

x = Gx + z, spectral radius(G) < 1

Probabilistic

Monte Carlo

MCMC methods Ulam von Neumann

Other SVD Gaussian elimination

• Not designed for single-component computation
• Requires global computation

What if we specifically thought about solving for a single component?

Iterative Algebraic

Power method Linear iterative update, Jacobi, Gauss-Seidel

Page 8:

Stationary Distribution of Markov Chain

x = Gx + z, spectral radius(G) < 1

Probabilistic

Monte Carlo

MCMC methods Ulam von Neumann

Iterative Algebraic

Power method Linear iterative update, Jacobi, Gauss-Seidel

Other SVD Gaussian elimination

Contributions

8

• New Monte Carlo method which samples “local” truncated returning random walks
• Convergence is a function of the mixing time
• The estimate is always guaranteed to be an upper bound
• Suggest and analyze specific termination criteria, which give multiplicative error bounds for high πi and threshold low πi

[L., Ozdaglar, Shah NIPS 2013]

Page 9:

Stationary Distribution of Markov Chain

x = Gx + z, spectral radius(G) < 1

Probabilistic

Monte Carlo

MCMC methods Ulam von Neumann

Iterative Algebraic

Power method Linear iterative update, Jacobi, Gauss-Seidel

Other SVD Gaussian elimination

Contributions

9

• Can be implemented asynchronously
• Maintains sparse intermediate vectors
• Provides invariants which track the progress of the method

[L., Ozdaglar, Shah, arXiv:1411.2647, 2014]

Page 10:

Stationary Distribution of Markov Chain

x = Gx + z, spectral radius(G) < 1

Probabilistic

Monte Carlo

MCMC methods Ulam von Neumann

Iterative Algebraic

Power method Linear iterative update, Jacobi, Gauss-Seidel

Other SVD Gaussian elimination

Contributions

10

Both methods are inspired by previous algorithms designed for computing PageRank

[JW03, H03, FRCS05, ALNO07, BCG10, BBCT12]

[ABCHMT07]

Page 11:

Computing Stationary Probability

• Consider a Markov chain

– Finite state space

– Transition matrix

– Irreducible, positive recurrent

• Goal: compute stationary probability focusing on nodes with higher values

11

Page 12:

Node-Centric Monte Carlo

• The standard Monte Carlo method samples long random walks until convergence to stationarity (ergodic theorem)

• Our method is based on the property πi = 1/Ei[Ti], where Ti is the return time of the chain to state i

12
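The return-time property πi = 1/Ei[Ti] (Kac's formula) can be checked deterministically on a made-up 3-state chain, computing the expected return time by first-step analysis rather than by sampling:

```python
import numpy as np

# Illustrative 3-state irreducible chain (values are made up).
P = np.array([[0.0, 0.7, 0.3],
              [0.2, 0.5, 0.3],
              [0.5, 0.4, 0.1]])
i = 0
n = P.shape[0]

# Stationary distribution: left eigenvector of P for eigenvalue 1.
w, V = np.linalg.eig(P.T)
idx = int(np.argmin(np.abs(w - 1.0)))
pi = np.real(V[:, idx])
pi = pi / pi.sum()

# Expected hitting times of i from j != i: h = 1 + P_minor h (first-step analysis).
others = [j for j in range(n) if j != i]
Pm = P[np.ix_(others, others)]
h = np.linalg.solve(np.eye(n - 1) - Pm, np.ones(n - 1))

# Expected return time to i; Kac's formula says this equals 1/pi_i.
expected_return = 1.0 + P[i, others] @ h
```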

Page 13:

Intuition

Estimate:

13

Page 14:

Algorithm

Gather Samples (i, N, θ)

Sample N truncated return paths to i

= fraction of samples truncated

[Figure: example walks from i — “Returned to i!”, “Keep walking…”, “Path length exceeded”]

Q: How do we choose appropriate N, θ?

14
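A minimal sketch of the sampling step (the function name and the 2-state chain are illustrative). Truncation can only shorten paths, so the resulting estimate N / (total steps taken) can only overestimate πi = 1/Ei[Ti]:

```python
import random

# Sample N return paths to i, each cut off after theta steps.
def gather_samples(P, i, N, theta, rng):
    states = range(len(P))
    total_steps, truncated = 0, 0
    for _ in range(N):
        s, steps = i, 0
        while True:
            s = rng.choices(states, weights=P[s])[0]
            steps += 1
            if s == i:                      # returned to i!
                break
            if steps >= theta:              # path length exceeded
                truncated += 1
                break
        total_steps += steps
    # (estimate of pi_i, fraction of samples truncated)
    return N / total_steps, truncated / N

P = [[0.5, 0.5],
     [0.5, 0.5]]                            # true pi_0 = 0.5
est, frac = gather_samples(P, i=0, N=5000, theta=20, rng=random.Random(0))
```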

Page 15:

Gather Samples (i, N(k), θ(k))

Terminate if Satisfied

Compute θ(k+1) and N(k+1)

Double θ: θ(k+1) = 2*θ(k)

Increase N such that by Chernoff’s bound:

Algorithm

Increment k

15
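One plausible instantiation of the outer loop (an assumption-laden sketch: the paper's Chernoff-based growth of N(k) is replaced here by simple doubling of both θ and N, and the stopping rule is just "fraction truncated is small"; all constants are illustrative):

```python
import random

# Incremental loop: double theta, grow N, stop when few samples are truncated.
def estimate_pi(P, i, rng, eps=0.02, theta0=2, N0=500, max_rounds=20):
    theta, N = theta0, N0
    for _ in range(max_rounds):
        total_steps, truncated = 0, 0
        for _ in range(N):                      # Gather Samples(i, N, theta)
            s, steps = i, 0
            while True:
                s = rng.choices(range(len(P)), weights=P[s])[0]
                steps += 1
                if s == i or steps >= theta:
                    truncated += (s != i)
                    break
            total_steps += steps
        if truncated / N < eps:                 # terminate if satisfied
            return N / total_steps
        theta, N = 2 * theta, 2 * N             # increment k
    return N / total_steps

P = [[0.5, 0.5],
     [0.5, 0.5]]                                # true pi_0 = 0.5
est = estimate_pi(P, i=0, rng=random.Random(1))
```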

Page 16:

Convergence

Q: Can we design a “smart” termination criterion?

16

Page 17:

Gather Samples (i, N(k), θ(k))

Terminate if Satisfied(Δ)

Output current estimate if

(a) Stationary prob. is small

(b) Fraction of truncated samples is small

Compute θ(k+1) and N(k+1)

Algorithm

Increment k

17

Page 18:

Results for Termination Criteria

18

Results extend to countable-state-space Markov chains.

Termination condition 1 guarantees that πi is small

Termination condition 2 guarantees a multiplicative error bound

Dependent on algorithm parameters, not on the input Markov chain

Page 19:

19

Simulation - Pagerank

[Plot: stationary probability vs. nodes sorted by stationary probability]

Random graph using the configuration model and a power-law degree distribution

Page 20:

Random graph using configuration model and power law degree distribution

Obtain close estimates for important nodes

20

Simulation - Pagerank

[Plot: stationary probability vs. nodes sorted by stationary probability]

Page 21:

Random graph using configuration model and power law degree distribution

Obtain close estimates for important nodes

21

Simulation - Pagerank

[Plot: stationary probability vs. nodes sorted by stationary probability]

The fraction of samples not truncated corrects for the bias!

Page 22:

22

Rank Centrality Pagerank

Compute Stationary Probability by sampling local random walks

Page 23:

23

Pagerank Bonacich Centrality

When spectral radius of G < 1,

• Importance sampling, sample walks and reweight → unbiased estimator

• If , there may not exist a sampling matrix such that the variance is finite

• Does not exploit sparsity

Q: Extend to solving a linear system?

Ulam–von Neumann algorithm
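A sketch of the Ulam–von Neumann idea for one coordinate of x = Gx + z, under the simplifying assumption that G is entrywise nonnegative with row sums below 1, so the walk can use G itself as a substochastic transition matrix and no importance-sampling reweighting is needed (the general reweighted scheme is what the slide's variance caveat refers to):

```python
import random
import numpy as np

# Starting from i, the walk moves u -> v with probability G[u][v] and dies with
# probability 1 - sum(G[u]); the expected total z collected while alive is
# exactly x_i = sum_k (G^k z)_i.
def uvn_estimate(G, z, i, samples, rng):
    n = len(z)
    total = 0.0
    for _ in range(samples):
        u = i
        while True:
            total += z[u]
            stay = sum(G[u])                       # survival probability at u
            if rng.random() >= stay:
                break                              # walk dies
            u = rng.choices(range(n), weights=G[u])[0]
    return total / samples

G = [[0.1, 0.4],
     [0.3, 0.2]]
z = [1.0, 2.0]
x_true = np.linalg.solve(np.eye(2) - np.array(G), np.array(z))
x0_hat = uvn_estimate(G, z, 0, samples=20000, rng=random.Random(0))
```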

Page 24:

Stationary Distribution of Markov Chain

x = Gx + z, spectral radius(G) < 1

Probabilistic

Monte Carlo

MCMC methods Ulam von Neumann

Iterative Algebraic

Power method Linear iterative update, Jacobi, Gauss-Seidel

Other SVD Gaussian elimination

Contributions

24

[JW03, H03, FRCS05, ALNO07, BCG10, BBCT12]

[ABCHMT07]

Page 25:

Synchronous Algorithm

• Standard stationary linear iterative method

• Relies upon the Neumann series representation x = Σk≥0 Gᵏz

• Solving for a single coordinate

25

Sparsity pattern is k-hop neighborhood of i

x(t) as dense as z

Page 26:

Synchronous Algorithm

26

Sparsity of p(t) grows as breadth-first traversal

Initialize:

Update step:

Terminate if:

Output:
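A dense illustrative sketch of the synchronous single-coordinate iteration (variable names and the termination threshold are mine, not the paper's): maintain p(k) = (Gᵀ)ᵏeᵢ, accumulate p(k)·z, and stop once ‖p‖₁ is small; with sparse G, p(k) is supported on the k-hop neighborhood of i.

```python
import numpy as np

# x_i = e_i^T sum_k G^k z = sum_k p(k).z  with  p(0) = e_i, p(k+1) = G^T p(k).
def solve_coordinate_sync(G, z, i, eps=1e-10):
    n = len(z)
    p = np.zeros(n)
    p[i] = 1.0                      # Initialize: p = e_i
    estimate = 0.0
    while np.abs(p).sum() > eps:    # Terminate if ||p||_1 is small
        estimate += p @ z           # accumulate the contribution of G^k z
        p = G.T @ p                 # Update step: expand one more hop
    return estimate

G = np.array([[0.1, 0.4],
              [0.3, 0.2]])
z = np.array([1.0, 2.0])
x_true = np.linalg.solve(np.eye(2) - G, z)
x0_hat = solve_coordinate_sync(G, z, i=0)
```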

Page 27:

Synchronous - Guarantee

27

If , then the algorithm terminates with

(a) Estimate satisfying

(b) Total number of multiplications bounded by

where

Independent of size of matrix

Page 28:

Synchronous to Asynchronous

Initialize:

Update step:

Terminate if:

Output:

Update for node u of r(t),

28

What if we implemented the updates asynchronously instead?

Page 29:

Q: Does this converge?

For all t, for any chosen sequence of u,

Asynchronous Algorithm

29

Initialize:

Update step: Choose coordinate u

Terminate if:

Output:

Yes, if ρ(G) < 1 and every coordinate u is chosen infinitely often.

At what rate?
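An illustrative sketch of the asynchronous update (the max-residual selection rule is one possible choice of u, not necessarily the paper's): keep an estimate and a residual r with the invariant x_i = estimate + r·x; each update substitutes x_u = G[u]·x + z_u for one coordinate u.

```python
import numpy as np

def solve_coordinate_async(G, z, i, eps=1e-10):
    n = len(z)
    r = np.zeros(n)
    r[i] = 1.0                         # Initialize: r = e_i
    estimate = 0.0
    while np.abs(r).sum() > eps:       # Terminate if ||r||_1 is small
        u = int(np.argmax(np.abs(r)))  # Choose coordinate u (one possible rule)
        ru = r[u]
        estimate += ru * z[u]          # collect the z-contribution of u
        r[u] = 0.0
        r += ru * G[u]                 # push u's residual mass to its neighbors
    return estimate

G = np.array([[0.1, 0.4],
              [0.3, 0.2]])
z = np.array([1.0, 2.0])
x_true = np.linalg.solve(np.eye(2) - G, z)
x0_hat = solve_coordinate_async(G, z, i=0)
```

Only entries of r touched so far can be nonzero, which is what keeps the intermediate vectors sparse.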

Page 30:

Asynchronous - Guarantee

30

If , then the algorithm terminates with

(a) Estimate satisfying

(b) With probability > 1 - δ, total number of multiplications is bounded by

where

Independent of size of matrix

Page 31:

Simulations

Compare different choices for

Bonacich centrality in an Erdős–Rényi graph: n = 1000 and p = 0.0276

31

Page 32:

32

Compute Stationary Probability by sampling local random walks

• Can be implemented asynchronously
• Maintains sparse intermediate vectors
• Provides invariants which track the progress of the method

[L., Ozdaglar, Shah, arXiv:1411.2647, 2014]

Solve Ax = b by expanding computation within a neighborhood and utilizing sparsity

• New Monte Carlo method which samples “local” truncated returning random walks
• Convergence is a function of the mixing time
• The estimate is always guaranteed to be an upper bound
• Suggest and analyze specific termination criteria, which give multiplicative error bounds for high πi and threshold low πi

[L., Ozdaglar, Shah NIPS 2013]