gibbs sampling from π-determinantal point processes13-09-00)-13...Β Β· 2019-06-13Β Β· point...
TRANSCRIPT
Gibbs Sampling from π-Determinantal Point Processes
Based on joint work with
Shayan Oveis Gharan
Alireza RezaeiUniversity of Washington
Point Process: A distribution on subsets of π = {1,2,β¦ ,π}.
Determinantal Point Process: There is a PSD kernel πΏ β βπΓπ such that
βπ β π : β π β det πΏπ
π³πΊ
Point Process: A distribution on subsets of π = {1,2,β¦ ,π}.
Determinantal Point Process: There is a PSD kernel πΏ β βπΓπ such that
βπ β π : β π β det πΏπ
π-DPP: Conditioning of a DPP on picking subsets of size π
if π = π: β π β det πΏπ
otherwise : β π = 0
Focus of the talk:Sampling from π-
DPPs
π³πΊ
Point Process: A distribution on subsets of π = {1,2,β¦ ,π}.
Determinantal Point Process: There is a PSD kernel πΏ β βπΓπ such that
DPPs are Very popular probabilistic models in machine learning to capture diversity.
βπ β π : β π β det πΏπ
π-DPP: Conditioning of a DPP on picking subsets of size π
if π = π: β π β det πΏπ
otherwise : β π = 0
Focus of the talk:Sampling from π-
DPPs
Applications [Kulesza-Taskarβ11, Dangβ05, Nenkova-Vanderwende-McKeownβ06, Mirzasoleiman-Jegelka-Krauseβ17]
β Image search, document and video summarization, tweet timeline generation, pose
estimation, feature selection
π³πΊ
Continuous Domain
Input: PSD operator πΏ: π Γ π β βand π
select a subset π β π with π points from a distribution with PDF function
π(π) β det πΏ(π₯, π¦) π₯,π¦βπ
Continuous Domain
Input: PSD operator πΏ: π Γ π β βand π
select a subset π β π with π points from a distribution with PDF function
π(π) β det πΏ(π₯, π¦) π₯,π¦βπ
Applications.
β Hyper-parameter tuning [Dodge-Jamieson-Smithβ17]
β Learning mixture of Gaussians[Affandi-Fox-Taskarβ13]
Ex. Gaussian : πΏ π₯, π¦ = exp βπ₯βπ¦ Ξ£β1 π₯βπ¦
2
Random Scan Gibbs Sampler for πΎ-DPP
1. Stay at the current state π = {π₯1, β¦ π₯π} with prob1
2.
2. Choose π₯π β π u.a.r
3. Choose π¦ β π from the conditional dist π . π β π₯π is chosen)
Continuous: PDF π¦ β π π₯1, β¦ π₯πβ1, π¦, π₯π+1, β¦ , π₯π )S β
[π]
π
y
π₯π
Main Result
Given a π-DPP π, an βapproximateβ sample from π can be generated by running the
Gibbs sampler for π = ΰ·©πΆ ππ β π₯π¨π (π―ππ«π ππ
ππ ) steps where π is the starting dist.
Main Result
Given a π-DPP π, an βapproximateβ sample from π can be generated by running the
Gibbs sampler for π = ΰ·©πΆ ππ β π₯π¨π (π―ππ«π ππ
ππ ) steps where π is the starting dist.
Discrete: A simple greedy initialization gives π = π π5log π . Total running time is π π . poly π .
Does not improve upon the previous MCMC methods. [Anari-Oveis Gharan-Rβ16]
Mixing time is independent of π, so the running time in distributed settings is sublinear.
Main Result
Given a π-DPP π, an βapproximateβ sample from π can be generated by running the
Gibbs sampler for π = ΰ·©πΆ ππ β π₯π¨π (π―ππ«π ππ
ππ ) steps where π is the starting dist.
Discrete: A simple greedy initialization gives π = π π5log π . Total running time is π π . poly π .
Does not improve upon the previous MCMC methods. [Anari-Oveis Gharan-Rβ16]
Mixing time is independent of π, so the running time in distributed settings is sublinear.
Continuous: Given access to conditional oracles, π can be found so π = π(π5log π).
First algorithm with a theoretical guarantee for sampling from continuous π-DPP.
Being able to run the chain.
Main Result
Given a π-DPP π, an βapproximateβ sample from π can be generated by running the
Gibbs sampler for π = ΰ·©πΆ ππ β π₯π¨π (π―ππ«π ππ
ππ ) steps where π is the starting dist.
Discrete: A simple greedy initialization gives π = π π5log π . Total running time is π π . poly π .
Does not improve upon the previous MCMC methods. [Anari-Oveis Gharan-Rβ16]
Mixing time is independent of π, so the running time in distributed settings is sublinear.
Continuous: Given access to conditional oracles, π can be found so π = π(π5log π).
First algorithm with a theoretical guarantee for sampling from continuous π-DPP.
Using a rejection sampler as the conditional oracles for Gaussian kernels πΏ π₯, π¦ = exp(βπ₯βπ¦ 2
π2)
defined a unit sphere in βπ, the total running time is
β’ If π =poly(d): poly(π, π)
β’ If π β€ ππ1βπΏ
and π = π 1 : poly π β ππ(1
πΏ)
Being able to run the chain.