gibbs sampling from π‘˜-determinantal point processes13-09-00)-13...Β Β· 2019-06-13Β Β· point...

12
Gibbs Sampling from -Determinantal Point Processes Based on joint work with Shayan Oveis Gharan Alireza Rezaei University of Washington

Upload: others

Post on 30-Jun-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Gibbs Sampling from π‘˜-Determinantal Point Processes13-09-00)-13...Β Β· 2019-06-13Β Β· Point Process: A distribution on subsets of ={ s, t,…, }. Determinantal Point Process: There

Gibbs Sampling from π‘˜-Determinantal Point Processes

Based on joint work with

Shayan Oveis Gharan

Alireza RezaeiUniversity of Washington

Page 2: Gibbs Sampling from π‘˜-Determinantal Point Processes13-09-00)-13...Β Β· 2019-06-13Β Β· Point Process: A distribution on subsets of ={ s, t,…, }. Determinantal Point Process: There

Point Process: A distribution on subsets of 𝑁 = {1,2,… ,𝑁}.

Determinantal Point Process: There is a PSD kernel 𝐿 ∈ ℝ𝑁×𝑁 such that

βˆ€π‘† βŠ† 𝑁 : β„™ 𝑆 ∝ det 𝐿𝑆

𝑳𝑺

Page 3: Gibbs Sampling from π‘˜-Determinantal Point Processes13-09-00)-13...Β Β· 2019-06-13Β Β· Point Process: A distribution on subsets of ={ s, t,…, }. Determinantal Point Process: There

Point Process: A distribution on subsets of 𝑁 = {1,2,… ,𝑁}.

Determinantal Point Process: There is a PSD kernel 𝐿 ∈ ℝ𝑁×𝑁 such that

βˆ€π‘† βŠ† 𝑁 : β„™ 𝑆 ∝ det 𝐿𝑆

π’Œ-DPP: Conditioning of a DPP on picking subsets of size π‘˜

if 𝑆 = π‘˜: β„™ 𝑆 ∝ det 𝐿𝑆

otherwise : β„™ 𝑆 = 0

Focus of the talk:Sampling from π‘˜-

DPPs

𝑳𝑺

Page 4: Gibbs Sampling from π‘˜-Determinantal Point Processes13-09-00)-13...Β Β· 2019-06-13Β Β· Point Process: A distribution on subsets of ={ s, t,…, }. Determinantal Point Process: There

Point Process: A distribution on subsets of 𝑁 = {1,2,… ,𝑁}.

Determinantal Point Process: There is a PSD kernel 𝐿 ∈ ℝ𝑁×𝑁 such that

DPPs are Very popular probabilistic models in machine learning to capture diversity.

βˆ€π‘† βŠ† 𝑁 : β„™ 𝑆 ∝ det 𝐿𝑆

π’Œ-DPP: Conditioning of a DPP on picking subsets of size π‘˜

if 𝑆 = π‘˜: β„™ 𝑆 ∝ det 𝐿𝑆

otherwise : β„™ 𝑆 = 0

Focus of the talk:Sampling from π‘˜-

DPPs

Applications [Kulesza-Taskar’11, Dang’05, Nenkova-Vanderwende-McKeown’06, Mirzasoleiman-Jegelka-Krause’17]

β€” Image search, document and video summarization, tweet timeline generation, pose

estimation, feature selection

𝑳𝑺

Page 5: Gibbs Sampling from π‘˜-Determinantal Point Processes13-09-00)-13...Β Β· 2019-06-13Β Β· Point Process: A distribution on subsets of ={ s, t,…, }. Determinantal Point Process: There

Continuous Domain

Input: PSD operator 𝐿: π’ž Γ— π’ž β†’ ℝand π‘˜

select a subset 𝑆 βŠ‚ π’ž with π‘˜ points from a distribution with PDF function

𝑝(𝑆) ∝ det 𝐿(π‘₯, 𝑦) π‘₯,π‘¦βˆˆπ‘†

Page 6: Gibbs Sampling from π‘˜-Determinantal Point Processes13-09-00)-13...Β Β· 2019-06-13Β Β· Point Process: A distribution on subsets of ={ s, t,…, }. Determinantal Point Process: There

Continuous Domain

Input: PSD operator 𝐿: π’ž Γ— π’ž β†’ ℝand π‘˜

select a subset 𝑆 βŠ‚ π’ž with π‘˜ points from a distribution with PDF function

𝑝(𝑆) ∝ det 𝐿(π‘₯, 𝑦) π‘₯,π‘¦βˆˆπ‘†

Applications.

β€” Hyper-parameter tuning [Dodge-Jamieson-Smith’17]

β€” Learning mixture of Gaussians[Affandi-Fox-Taskar’13]

Ex. Gaussian : 𝐿 π‘₯, 𝑦 = exp βˆ’π‘₯βˆ’π‘¦ Ξ£βˆ’1 π‘₯βˆ’π‘¦

2

Page 7: Gibbs Sampling from π‘˜-Determinantal Point Processes13-09-00)-13...Β Β· 2019-06-13Β Β· Point Process: A distribution on subsets of ={ s, t,…, }. Determinantal Point Process: There

Random Scan Gibbs Sampler for 𝐾-DPP

1. Stay at the current state 𝑆 = {π‘₯1, … π‘₯π‘˜} with prob1

2.

2. Choose π‘₯𝑖 ∈ 𝑆 u.a.r

3. Choose 𝑦 βˆ‰ 𝑆 from the conditional dist πœ‹ . 𝑆 βˆ’ π‘₯𝑖 is chosen)

Continuous: PDF 𝑦 ∝ πœ‹ π‘₯1, … π‘₯π‘–βˆ’1, 𝑦, π‘₯𝑖+1, … , π‘₯π‘˜ )S ∈

[𝑁]

π‘˜

y

π‘₯𝑖

Page 8: Gibbs Sampling from π‘˜-Determinantal Point Processes13-09-00)-13...Β Β· 2019-06-13Β Β· Point Process: A distribution on subsets of ={ s, t,…, }. Determinantal Point Process: There

Main Result

Given a π‘˜-DPP πœ‹, an β€œapproximate” sample from πœ‹ can be generated by running the

Gibbs sampler for 𝝉 = ෩𝑢 π’ŒπŸ’ β‹… π₯𝐨𝐠 (π―πšπ«π…π’‘π

𝒑𝝅) steps where πœ‡ is the starting dist.

Page 9: Gibbs Sampling from π‘˜-Determinantal Point Processes13-09-00)-13...Β Β· 2019-06-13Β Β· Point Process: A distribution on subsets of ={ s, t,…, }. Determinantal Point Process: There

Main Result

Given a π‘˜-DPP πœ‹, an β€œapproximate” sample from πœ‹ can be generated by running the

Gibbs sampler for 𝝉 = ෩𝑢 π’ŒπŸ’ β‹… π₯𝐨𝐠 (π―πšπ«π…π’‘π

𝒑𝝅) steps where πœ‡ is the starting dist.

Discrete: A simple greedy initialization gives 𝜏 = 𝑂 π‘˜5log π‘˜ . Total running time is 𝑂 𝑁 . poly π‘˜ .

Does not improve upon the previous MCMC methods. [Anari-Oveis Gharan-R’16]

Mixing time is independent of 𝑁, so the running time in distributed settings is sublinear.

Page 10: Gibbs Sampling from π‘˜-Determinantal Point Processes13-09-00)-13...Β Β· 2019-06-13Β Β· Point Process: A distribution on subsets of ={ s, t,…, }. Determinantal Point Process: There

Main Result

Given a π‘˜-DPP πœ‹, an β€œapproximate” sample from πœ‹ can be generated by running the

Gibbs sampler for 𝝉 = ෩𝑢 π’ŒπŸ’ β‹… π₯𝐨𝐠 (π―πšπ«π…π’‘π

𝒑𝝅) steps where πœ‡ is the starting dist.

Discrete: A simple greedy initialization gives 𝜏 = 𝑂 π‘˜5log π‘˜ . Total running time is 𝑂 𝑁 . poly π‘˜ .

Does not improve upon the previous MCMC methods. [Anari-Oveis Gharan-R’16]

Mixing time is independent of 𝑁, so the running time in distributed settings is sublinear.

Continuous: Given access to conditional oracles, πœ‡ can be found so 𝜏 = 𝑂(π‘˜5log π‘˜).

First algorithm with a theoretical guarantee for sampling from continuous π‘˜-DPP.

Being able to run the chain.

Page 11: Gibbs Sampling from π‘˜-Determinantal Point Processes13-09-00)-13...Β Β· 2019-06-13Β Β· Point Process: A distribution on subsets of ={ s, t,…, }. Determinantal Point Process: There

Main Result

Given a π‘˜-DPP πœ‹, an β€œapproximate” sample from πœ‹ can be generated by running the

Gibbs sampler for 𝝉 = ෩𝑢 π’ŒπŸ’ β‹… π₯𝐨𝐠 (π―πšπ«π…π’‘π

𝒑𝝅) steps where πœ‡ is the starting dist.

Discrete: A simple greedy initialization gives 𝜏 = 𝑂 π‘˜5log π‘˜ . Total running time is 𝑂 𝑁 . poly π‘˜ .

Does not improve upon the previous MCMC methods. [Anari-Oveis Gharan-R’16]

Mixing time is independent of 𝑁, so the running time in distributed settings is sublinear.

Continuous: Given access to conditional oracles, πœ‡ can be found so 𝜏 = 𝑂(π‘˜5log π‘˜).

First algorithm with a theoretical guarantee for sampling from continuous π‘˜-DPP.

Using a rejection sampler as the conditional oracles for Gaussian kernels 𝐿 π‘₯, 𝑦 = exp(βˆ’π‘₯βˆ’π‘¦ 2

𝜎2)

defined a unit sphere in ℝ𝑑, the total running time is

β€’ If π‘˜ =poly(d): poly(𝑑, 𝜎)

β€’ If π‘˜ ≀ 𝑒𝑑1βˆ’π›Ώ

and 𝜎 = 𝑂 1 : poly 𝑑 β‹… π‘˜π‘‚(1

𝛿)

Being able to run the chain.

Page 12: Gibbs Sampling from π‘˜-Determinantal Point Processes13-09-00)-13...Β Β· 2019-06-13Β Β· Point Process: A distribution on subsets of ={ s, t,…, }. Determinantal Point Process: There