Vempala - IAS Video Lectures
Cool with a Gaussian:
An O*(n^3) volume algorithm
Ben Cousins and Santosh Vempala
The Volume Problem
Given a measurable, compact set K in n-dimensional space, find a number A such that:
(1 − ε) vol(K) ≤ A ≤ (1 + ε) vol(K)
K is given by:
- a point x_0 ∈ K, s.t. x_0 + B_n ⊆ K
- a membership oracle: answers YES/NO to "x ∈ K?"
Volume: first attempt
- Divide and conquer.
- Difficulty: the number of parts grows exponentially in n.
More generally: Integration
Input: integrable function f: R^n → R_+ specified by an oracle, point x, error parameter ε.
Output: number A such that:
(1 − ε) ∫f ≤ A ≤ (1 + ε) ∫f
Volume is the special case when f is a 0-1 function.
High-dimensional problems
- Integration (volume)
- Optimization
- Learning
- Rounding
- Sampling
All hopelessly intractable in general, even to approximate.
High-dimensional problems
Input:
- A set of points S in n-dimensional space R^n, or a distribution on R^n
- A function f that maps points to real values (could be the indicator of a set)
What is the complexity of computational problems as the dimension grows?
- Dimension = number of variables
- Typically, the size of the input is a function of the dimension.
Structure
Q. What structure makes high-dimensional problems computationally tractable? (i.e., solvable with polynomial complexity)
Convexity and its extensions appear to be the frontier of polynomial-time solvability.
Volume: second attempt: Sandwiching
Thm (John). Any convex body K has an ellipsoid E s.t. E ⊆ K ⊆ nE.
E = the maximum-volume ellipsoid contained in K.
Thm (KLS95). For a convex body K in isotropic position,
√((n+1)/n) · B_n ⊆ K ⊆ √(n(n+1)) · B_n.
Also a factor-n sandwiching, but with a different ellipsoid.
Isotropic position and sandwiching
- For any convex body K (in fact, any set/distribution with bounded second moments), we can apply an affine transformation so that for a random point x from K:
E(x) = 0,  E(x xᵀ) = I_n.
- Thus K "looks like a ball" up to second moments.
- How close is it really to a ball?
K lies between two balls with radii within a factor of n.
Volume via Sandwiching
- The John ellipsoid can be approximated using the Ellipsoid algorithm, s.t.
E ⊆ K ⊆ n^1.5 E
- The inertial ellipsoid can be approximated to within any constant factor (we'll see how)
- Using either one,
E ⊆ K ⊆ n^O(1) E, so vol(E) ≤ vol(K) ≤ n^O(n) vol(E).
- Polytime algorithm, n^O(n) approximation
- Can we do better?
Complexity of Volume Estimation
Thm [E86, BF87]. For any deterministic algorithm that uses at most n^a membership calls to the oracle for a convex body K and computes two numbers A and B such that A ≤ vol(K) ≤ B, there is some convex body for which the ratio B/A is at least
(c n / (a log n))^(n/2),
where c is an absolute constant.
Thm [DF88]. Computing the volume of an explicit polytope Ax ≤ b is #P-hard, even for a totally unimodular matrix A and rational b.
Complexity of Volume Estimation
Thm [BF]. For deterministic algorithms: a trade-off between the number of oracle calls and the achievable approximation factor.
Thm [Dadush-V.13]. Matching upper bound of (1 + ε)^n in time (1/ε)^O(n) poly(n).
Randomized Volume/Integration
[DFK89]. Polynomial-time randomized algorithm that estimates the volume to within relative error (1 + ε) with probability at least 1 − δ in time poly(n, 1/ε, log(1/δ)).
[Applegate-K91]. Polytime randomized algorithm to estimate the integral of any (Lipschitz) logconcave function.
Volume Computation: an ongoing adventure
Algorithm             | Power | New aspects
Dyer-Frieze-Kannan 89 | 23    | everything
Lovász-Simonovits 90  | 16    | localization
Applegate-K 90        | 10    | logconcave integration
L 90                  | 10    | ball walk
DF 91                 | 8     | error analysis
LS 93                 | 7     | multiple improvements
KLS 97                | 5     | speedy walk, isotropy
LV 03,04              | 4     | annealing, wt. isoper.
LV 06                 | 4     | integration, local analysis
Cousins-V. 13, 14     | 3     | Gaussian cooling
(Power = exponent of n in the running time, O*(n^Power).)
Does it work?
- [Lovász-Deák 2012] implemented the [LV] O*(n^4) algorithm
  - worked for cubes up to dimension 9
  - but too slow after that.
- [CV13] Matlab implementation of a new algorithm
(plot: running times on rotated cubes)
Volume: third attempt: Sampling
- Pick random samples from a ball/cube containing K.
- Compute the fraction c of the sample in K.
- Output c · vol(outer ball).
- Need too many samples!
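To see concretely why the naive estimator breaks down, here is a small sketch (not from the talk; the choice of K, sample count, and bounding cube are illustrative) that estimates the volume of the unit ball by counting hits from uniform samples in the cube [-1,1]^n:

```python
import math
import random

def naive_volume(n, samples=100_000, seed=0):
    """Estimate vol(unit ball in R^n) as (fraction of cube samples in K) * vol(cube)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        if sum(c * c for c in x) <= 1.0:  # membership oracle for K = unit ball
            hits += 1
    return (hits / samples) * 2.0 ** n    # vol([-1,1]^n) = 2^n

# n = 2: fine (true value is pi).  n = 20: the hit probability is ~2.5e-8,
# so the estimate is either 0 or, if a stray sample hits, off by a factor > 400.
```

In low dimension the fraction is a constant and the estimate converges; as n grows, the in-K fraction (and hence the useful sample count) collapses exponentially.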
Volume via Sampling [DFK89]
B ⊆ K ⊆ R·B. Let K_i = K ∩ 2^(i/n) B, i = 0, 1, …, m = n log_2 R.
vol(K) = vol(B) · (vol(K_1)/vol(K_0)) · (vol(K_2)/vol(K_1)) ⋯ (vol(K_m)/vol(K_{m−1})).
Estimate each ratio with random samples.
Volume via Sampling
K_i = K ∩ 2^(i/n) B,  i = 0, 1, …, m = n log_2 R.
vol(K) = vol(B) · (vol(K_1)/vol(K_0)) · (vol(K_2)/vol(K_1)) ⋯ (vol(K_m)/vol(K_{m−1})).
Claim. vol(K_{i+1}) ≤ 2 · vol(K_i).
Total #samples = m · O*(m/ε^2) = O*(n^2).
But, how to sample?
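The telescoping product can be exercised end-to-end in low dimension. The sketch below is illustrative, not the talk's algorithm: K is a cube, and rejection sampling from the enclosing ball stands in for the random-walk sampler that the rest of the talk develops.

```python
import math
import random

rng = random.Random(1)

def in_cube(x):  # membership oracle for K = [-1,1]^n
    return all(abs(c) <= 1.0 for c in x)

def sample_Ki(n, r):
    """Uniform sample from K ∩ (r·B) by rejection from the ball r·B.
    (Stand-in for a random-walk sampler; only feasible in low dimension.)"""
    while True:
        d = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(c * c for c in d))
        u = rng.random() ** (1.0 / n)            # uniform radius in the ball
        x = [c * r * u / norm for c in d]
        if in_cube(x):
            return x

def dfk_volume(n, samples=20_000):
    R = math.sqrt(n)                             # B ⊆ K ⊆ R·B for the cube
    m = math.ceil(n * math.log2(R))
    est = math.pi ** (n / 2) / math.gamma(n / 2 + 1)   # vol(K_0) = vol(B)
    for i in range(m):
        r_prev = min(2.0 ** (i / n), R)
        r_next = min(2.0 ** ((i + 1) / n), R)
        # Fraction of K_{i+1}-samples that land in K_i estimates vol(K_i)/vol(K_{i+1}).
        inside = sum(
            1 for _ in range(samples)
            if math.sqrt(sum(c * c for c in sample_Ki(n, r_next))) <= r_prev
        )
        est /= inside / samples
    return est

# For n = 3 the true volume of [-1,1]^3 is 8.
```

Each ratio is at least 1/2 (the Claim above), so a modest number of samples per phase suffices; the whole difficulty, addressed next, is replacing the rejection sampler.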
Sampling
Generate:
- a uniform random point from a compact set S,
- or a point with density proportional to a function f.
Numerous applications in diverse areas: statistics, networking, biology, computer vision, privacy, operations research, etc.
Sampling
Input: function f: R^n → R_+ specified by an oracle, point x, error parameter ε.
Output: A point y from a distribution within distance ε of the distribution with density proportional to f.
Logconcave functions
- f: R^n → R_+ is logconcave if for any x, y ∈ R^n and λ ∈ [0, 1],
f(λx + (1 − λ)y) ≥ f(x)^λ f(y)^(1−λ)
- Examples:
  - indicator functions of convex sets
  - Gaussian density function
  - exponential function
- Level sets of f, L_t = {x : f(x) ≥ t}, are convex.
- Products, minima, and convolutions preserve logconcavity.
Algorithmic Applications
Given a blackbox for sampling logconcave densities, we get efficient algorithms for:
- Rounding
- Convex Optimization
- Volume Computation/Integration
- some Learning problems
Rounding via Sampling
1. Sample m random points from K;
2. Compute the sample mean z = E(x) and sample covariance matrix A = E((x − z)(x − z)ᵀ);
3. Output B = A^(−1/2).
B(K − z) is nearly isotropic.
Thm. C(ε)·n random points suffice to get E(‖A − I‖_2) ≤ ε.
[Adamczak et al.; improving on Bourgain, Rudelson]
I.e., for any unit vector v,
1 − ε ≤ E((vᵀx)^2) ≤ 1 + ε.
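Steps 1-3 are easy to sketch with numpy; the skewed test body, point count, and affine map below are illustrative assumptions standing in for samples from K:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for samples from a convex body K: a uniform cube, skewed by an affine map.
T = np.array([[3.0, 1.0], [0.0, 0.5]])
points = rng.uniform(-1.0, 1.0, size=(50_000, 2)) @ T.T + np.array([5.0, -2.0])

# 1-2. Sample mean and sample covariance matrix.
z = points.mean(axis=0)
A = np.cov(points, rowvar=False)

# 3. B = A^(-1/2); then B(K - z) is nearly isotropic.
evals, evecs = np.linalg.eigh(A)
B = evecs @ np.diag(evals ** -0.5) @ evecs.T
rounded = (points - z) @ B.T

# The rounded sample has mean ~0 and covariance ~I.
```

Whitening with the empirical covariance makes the rounded sample's mean and covariance exactly 0 and I by construction; the theorem above says O(n) fresh samples already make the *body* nearly isotropic.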
How to Sample?
Ball walk: At x,
- pick a random y from x + δB_n
- if y is in K, go to y (else stay at x)
Hit-and-Run: At x,
- pick a random chord L through x
- go to a random point y on L
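Both walks can be sketched in a few lines for K = [-1,1]^n given only a membership oracle; the step size δ, dimension, and iteration counts are illustrative, and for hit-and-run the chord of a cube is computed coordinate-wise:

```python
import math
import random

rng = random.Random(2)
n = 5

def in_K(x):  # membership oracle: K = [-1,1]^n
    return all(abs(c) <= 1.0 for c in x)

def ball_walk_step(x, delta=0.3):
    """Propose y uniform in x + delta*B_n; move only if y stays in K."""
    d = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in d))
    r = delta * rng.random() ** (1.0 / n)
    y = [xi + r * di / norm for xi, di in zip(x, d)]
    return y if in_K(y) else x

def hit_and_run_step(x):
    """Pick a random direction; move to a uniform point on the chord through x."""
    d = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in d))
    d = [c / norm for c in d]
    # For the cube, the chord {x + t*d} ∩ K is an interval [lo, hi], per coordinate.
    lo, hi = -math.inf, math.inf
    for xi, di in zip(x, d):
        if abs(di) > 1e-12:
            t1, t2 = (-1.0 - xi) / di, (1.0 - xi) / di
            lo, hi = max(lo, min(t1, t2)), min(hi, max(t1, t2))
    t = rng.uniform(lo, hi)
    return [xi + t * di for xi, di in zip(x, d)]

x = [0.9] * n                 # a (bad) start near a corner
for _ in range(5_000):
    x = hit_and_run_step(x)
```

Note that hit-and-run always moves, even from a corner, while the ball walk can reject almost every proposal there; this is exactly the local-conductance issue discussed below.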
Complexity of Sampling
Thm. [KLS97] For a convex body, the ball walk with an M-warm start reaches an (independent) nearly random point in poly(n, R, M) steps.
M = sup_S Q_0(S)/Q(S)  (or M_2(Q) = E_{Q_0}(Q_0(x)/Q(x))),
where Q_0 is the starting distribution and Q the stationary one.
Thm. [LV03]. The same holds for arbitrary logconcave density functions. From a warm start, the complexity is O*(n^2 R^2).
- The isotropic transformation makes R = O(√n).
- KLS volume algorithm: n × n × n^3 = n^5 (phases × samples per phase × steps per sample).
Markov chains
- State space K
- A set of measurable subsets that forms a σ-algebra, i.e., closed under complements and countable unions and intersections
- A next-step distribution P_u associated with each point u in the state space
- A starting point
- The walk w_0, w_1, …, w_i, … satisfies
P(w_i ∈ A | w_0, w_1, …, w_{i−1}) = P(w_i ∈ A | w_{i−1})
Convergence
Stationary distribution Q; the ergodic "flow" of a subset A is:
Φ(A) = ∫_A P_u(K∖A) dQ(u)
For any subset A, we have Φ(A) = Φ(K∖A).
Conductance:
φ(A) = ∫_A P_u(K∖A) dQ(u) / min(Q(A), Q(K∖A))
φ = inf_A φ(A)
The rate of convergence is bounded by 1/φ^2 [LS93, JS86].
Conductance
Arbitrary measurable subset S: how large is the conditional escape probability from S?
Local conductance can be arbitrarily small for the ball walk:
ℓ(x) = vol((x + δB_n) ∩ K) / vol(δB_n)
Conductance
Need:
- Nearby points have overlapping one-step distributions
- Large subsets have large boundaries [isoperimetry]: for any partition K = S_1 ∪ S_2 ∪ S_3,
π(S_3) ≥ (c/R) · d(S_1, S_2) · min(π(S_1), π(S_2)),
where R^2 = E_K(|x|^2).
Isoperimetry and the KLS conjecture
π(S_3) ≥ (c/R) · d(S_1, S_2) · min(π(S_1), π(S_2))
A = E((x − x̄)(x − x̄)ᵀ): covariance matrix of π
R^2 = E_π(‖x − x̄‖^2) = tr(A) = λ_1(A) + ⋯ + λ_n(A)
Thm. [KLS95].  π(S_3) ≥ (c/R) · d(S_1, S_2) · min(π(S_1), π(S_2))
Conj. [KLS95].  π(S_3) ≥ (c/√λ_1(A)) · d(S_1, S_2) · min(π(S_1), π(S_2))
KLS hyperplane conjecture
A = E(x xᵀ)
Conj. [KLS95].  π(S_3) ≥ (c/√λ_1(A)) · d(S_1, S_2) · min(π(S_1), π(S_2))
- Could improve sampling complexity by a factor of n
- Implies well-known conjectures in convex geometry: the slicing conjecture and the thin-shell conjecture
- But wide open!
KLS, Slicing, Thin-shell
Conjecture  | Current bound
KLS         | ~ n^(1/3) [Bobkov; Eldan-Klartag]
slicing     | n^(1/4) [Bourgain, Klartag]
thin shell  | n^(1/3) [Guedon-Milman]
All are conjectured to be O(1).
The conjectures are equivalent! [Ball, Eldan-Klartag]
Is rapid mixing possible?
- The ball walk can have bad starts, but hit-and-run escapes from corners
- Min-distance isoperimetry is too coarse
- Average-distance isoperimetry: how to average distance?
h(x) = min{ d(u, v) : u ∈ S_1, v ∈ S_2, x ∈ ℓ(u, v) }
Thm. [LV04]
π(S_3) ≥ E(h(x)) · π(S_1) · π(S_2)
Hit-and-run mixes rapidly
- Thm [LV04]. Hit-and-run mixes in polynomial time from any starting point inside a convex body.
- Conductance = Ω(1/(nD))
- Along with the isotropic transformation, this gives an O*(n^3) sampling algorithm.
Simulated Annealing [LV03, Kalai-V.04]
To estimate ∫f, consider a sequence f_0, f_1, f_2, …, f_m = f with ∫f_0 easy, e.g., a constant function over a ball.
Then,
∫f = ∫f_0 · (∫f_1/∫f_0) · (∫f_2/∫f_1) ⋯ (∫f_m/∫f_{m−1}).
Each ratio R_i = ∫f_{i+1}/∫f_i can be estimated by sampling:
1. Sample X with density proportional to f_i
2. Compute Y = f_{i+1}(X)/f_i(X).
Then,
E(Y) = ∫ (f_{i+1}(x)/f_i(x)) · (f_i(x)/∫f_i) dx = ∫f_{i+1}/∫f_i = R_i.
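The identity E(Y) = ∫f_{i+1}/∫f_i can be checked in one dimension, where the annealed densities are Gaussians we can sample exactly (the σ values below are illustrative):

```python
import math
import random

rng = random.Random(3)

def ratio_estimate(sigma_i, sigma_next, samples=200_000):
    """Monte Carlo estimate of E(Y), Y = f_{i+1}(X)/f_i(X),
    X ~ density proportional to f_i(x) = exp(-x^2 / (2 sigma_i^2))."""
    total = 0.0
    for _ in range(samples):
        x = rng.gauss(0.0, sigma_i)
        total += math.exp(-x * x / (2 * sigma_next ** 2) + x * x / (2 * sigma_i ** 2))
    return total / samples

# Exact value of the ratio: ∫f_{i+1} / ∫f_i = sigma_next / sigma_i
# (1-d Gaussian integrals are sigma * sqrt(2*pi)).
est = ratio_estimate(1.0, 1.2)
```

The whole art of annealing, as the next slides explain, is choosing the schedule so that each Y has small relative variance and few samples suffice per ratio.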
Annealing [LV06]
- Define: f_i(x) = e^(−a_i ‖x‖)
- a_0 = 2n,  a_{i+1} = a_i/(1 + 1/√n),  a_m = ε/(2R)
- m ~ √n log(2n/ε) phases
- ∫f = ∫f_0 · (∫f_1/∫f_0) · (∫f_2/∫f_1) ⋯ (∫f_m/∫f_{m−1})
- The final estimate could be n^Ω(n), so individual ratios could be very large. How can we estimate them with a few samples?!
Annealing [LV03, 06]
- f_i(x) = e^(−a_i ‖x‖),  a_0 = 2n,  a_{i+1} = a_i/(1 + 1/√n),  a_m = ε/(2R)
- Lemma. For Y = f_{i+1}(X)/f_i(X) with X drawn with density proportional to f_i:
E(Y^2) < 4 E(Y)^2.
- Although the expectation of Y can be large (exponential, even), we need only a few samples to estimate it!
- LoVe algorithm: √n × √n × n^3 = n^4
Variance of ratio estimator
Let Z(a, x) = e^(−a‖x‖) for x ∈ K, and let Y = Z(a_{i+1}, X)/Z(a_i, X), where X is drawn with density proportional to Z(a_i, ·). With F(a) = ∫_K e^(−a‖x‖) dx:
E(Y) = ∫ (Z(a_{i+1}, x)/Z(a_i, x)) · (Z(a_i, x)/F(a_i)) dx = F(a_{i+1})/F(a_i).
E(Y^2)/E(Y)^2 = [∫ (Z(a_{i+1}, x)^2/Z(a_i, x)^2) · (Z(a_i, x)/F(a_i)) dx] / (F(a_{i+1})/F(a_i))^2
= F(2a_{i+1} − a_i) · F(a_i) / F(a_{i+1})^2
= F(a(1 − α)) F(a(1 + α)) / F(a)^2,  writing a = a_{i+1} and a_i = a(1 + α).
(This would be at most 1 if F were logconcave…)
Variance of ratio estimator
Lemma. For any logconcave f and a > 0, the function
Z(a) = a^n ∫ f(x)^a dx
is also logconcave.
So a^n F(a) is logconcave, and
(a^n F(a))^2 ≥ (a(1 − α))^n F(a(1 − α)) · (a(1 + α))^n F(a(1 + α)).
Therefore:
F(a(1 − α)) F(a(1 + α)) / F(a)^2 ≤ 1/((1 − α)(1 + α))^n = 1/(1 − α^2)^n ≤ 4
for α ≤ 1/√n.
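A quick numeric sanity check of both steps, for the 1-d choice f = indicator of [0,1] (so F(a) = (1 − e^(−a))/a and Z(a) = a·F(a) = 1 − e^(−a)); the grid and α below are illustrative:

```python
import math

# Z(a) = a^n * F(a) with n = 1, f = indicator of [0,1]:
# F(a) = ∫_0^1 e^(-a t) dt = (1 - e^(-a))/a, so Z(a) = 1 - e^(-a).
def Z(a):
    return 1.0 - math.exp(-a)

# Discrete logconcavity: Z(a)^2 >= Z(a(1-alpha)) * Z(a(1+alpha)).
alpha = 0.3
for a in [0.1 * k for k in range(1, 200)]:
    assert Z(a) ** 2 >= Z(a * (1 - alpha)) * Z(a * (1 + alpha)) - 1e-12

# The resulting bound: F(a(1-alpha))F(a(1+alpha))/F(a)^2 <= (1 - alpha^2)^(-n) <= 4
# for alpha = 1/sqrt(n); note equality at n = 2, and the bound decreases toward e.
for n in [2, 3, 10, 100, 10_000]:
    assert (1.0 - 1.0 / n) ** (-n) <= 4.0 + 1e-9
```

The constant 4 is thus tight only at n = 2; for large n the variance bound approaches e.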
Volume Computation: an ongoing adventure
Algorithm             | Power | New aspects
Dyer-Frieze-Kannan 89 | 23    | everything
Lovász-Simonovits 90  | 16    | localization
Applegate-K 90        | 10    | logconcave integration
L 90                  | 10    | ball walk
DF 91                 | 8     | error analysis
LS 93                 | 7     | multiple improvements
KLS 97                | 5     | speedy walk, isotropy
LV 03,04              | 4     | annealing, wt. isoper.
LV 06                 | 4     | integration, local analysis
Cousins-V. 13, 14     | 3     | Gaussian cooling
Gaussian sampling/volume
- Sample from a Gaussian restricted to K
- Compute the Gaussian measure of K
- Anneal with a Gaussian:
- Define
f_i(x) = e^(−‖x‖^2/(2σ_i^2))
- Start with σ_0 small (~ 1/√n), increase σ in phases.
- Compute the ratios of integrals of consecutive phases:
∫f_{i+1} / ∫f_i
Gaussian sampling
- The KLS conjecture holds for the Gaussian restricted to any convex body (via the Brascamp-Lieb inequality):
Thm.  π(S_3) ≥ (c/σ) · d(S_1, S_2) · min(π(S_1), π(S_2))
- Not enough on its own, but can be used to show:
Thm. [Cousins-V. 13]. For σ^2 = O(1), the ball walk applied to the Gaussian N(0, σ^2 I_n) restricted to any convex body containing the unit ball mixes in O*(n^2) time from a warm start.
Speedy walk: a thought experiment
- Take the sequence of points visited by the ball walk:
w_0, w_1, w_2, w_3, …, w_i, w_{i+1}, w_{i+2}, …
- Consider the subsequence of "proper" attempts that stay inside K
- This subsequence is a Markov chain and is rapidly mixing from any point
- For a warm start, the total number of steps is only a constant factor higher
Gaussian volume
- Theorem [Cousins-V.13]. The Gaussian volume of a convex body K containing the unit ball can be estimated in time O*(n^3).
- No need to adjust for isotropy!
- Each step samples a 1-d Gaussian restricted to an interval
- Can we use this to compute the volume?
Gaussian Cooling
- f_i(x) = e^(−‖x‖^2/(2σ_i^2))
- σ_0^2 = 1/n;  σ_m = O(D).
- Estimate ∫f_{i+1}/∫f_i using samples drawn according to f_i.
- For σ_i^2 ≤ 1, set σ_i^2 = σ_{i−1}^2 (1 + 1/√n)
- For σ_i^2 > 1, set σ_i^2 = σ_{i−1}^2 (1 + σ_{i−1}/√n)
Gaussian Cooling
- f_i(x) = e^(−‖x‖^2/(2σ_i^2))
- For σ_i^2 ≤ 1, we set σ_i^2 = σ_{i−1}^2 (1 + 1/√n)
- Sampling time: n^2; #phases and #samples per phase: √n each
- So, total time = n^2 × √n × √n = n^3
Gaussian Cooling
- f_i(x) = e^(−‖x‖^2/(2σ_i^2))
- For σ_i^2 > 1, we set σ_i^2 = σ_{i−1}^2 (1 + σ_{i−1}/√n)
- Sampling time: σ^2 n^2 (too much??)
- #phases to double σ is √n/σ
- #samples per phase is also √n/σ
- So, the total time to double σ is (√n/σ) × (√n/σ) × σ^2 n^2 = n^3 !!!
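The two-regime schedule above can be sketched directly; the dimension and target below are illustrative assumptions (n = 100, flattening until σ^2 reaches D^2 = 100):

```python
import math

def cooling_schedule(n, target_sigma2):
    """Phase schedule of Gaussian cooling: sigma^2 grows by a factor (1 + 1/sqrt(n))
    while sigma^2 <= 1, and by (1 + sigma/sqrt(n)) afterwards."""
    sigma2 = 1.0 / n                  # start with a concentrated Gaussian
    schedule = [sigma2]
    while sigma2 < target_sigma2:
        if sigma2 <= 1.0:
            sigma2 *= 1.0 + 1.0 / math.sqrt(n)
        else:
            sigma2 *= 1.0 + math.sqrt(sigma2) / math.sqrt(n)
        schedule.append(sigma2)
    return schedule

phases = cooling_schedule(100, 100.0)
```

In the second regime the multiplier grows with σ, so doubling σ takes ever fewer phases; this is the acceleration that keeps the total work at n^3 despite the σ^2 n^2 sampling cost.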
Variance of ratio estimator
- Why can we set σ_i^2 as high as σ_{i−1}^2 (1 + σ_{i−1}/√n)?
f(σ^2, x) = e^(−‖x‖^2/(2σ^2)) for x ∈ K
F(σ^2) = ∫_K f(σ^2, x) dx
Lemma. Let Y = f(σ^2, X)/f(σ^2/(1+α), X), where X is drawn with density proportional to f(σ^2/(1+α), ·), and α = σ/√n. Then,
E(Y) = F(σ^2)/F(σ^2/(1+α)),
E(Y^2)/E(Y)^2 = F(σ^2/(1+α)) F(σ^2/(1−α)) / F(σ^2)^2 = 1 + O(α^2 n/σ^2) = O(1).
Variance of ratio estimator
E(Y^2)/E(Y)^2 = F(σ^2/(1+α)) F(σ^2/(1−α)) / F(σ^2)^2 = 1 + O(α^2 n/σ^2)
First use localization to reduce to a 1-d inequality, for a restricted family of logconcave functions: for K ⊆ R·B_n and −R ≤ ℓ ≤ u ≤ R, let
G(σ^2) = ∫_ℓ^u (t + c)^(n−1) e^(−t^2/(2σ^2)) dt.
Then,
G(σ^2/(1+α)) G(σ^2/(1−α)) / G(σ^2)^2 = 1 + O(α^2 n/σ^2).
Variance of ratio estimator
G(σ^2/(1+α)) G(σ^2/(1−α)) / G(σ^2)^2 = 1 + O(α^2 n/σ^2),
where
E(t^2) = ∫_ℓ^u t^2 (t + c)^(n−1) e^(−t^2/(2σ^2)) dt / ∫_ℓ^u (t + c)^(n−1) e^(−t^2/(2σ^2)) dt,
and the bound follows by controlling how E(t^2) changes with σ.
Warm start
- With σ_i^2 = σ_{i−1}^2 (1 + σ_{i−1}/√n), a random point from one distribution gives a warm start for the next.
f(σ^2, x) = e^(−‖x‖^2/(2σ^2)) for x ∈ K,  F(σ^2) = ∫_K f(σ^2, x) dx
With π_i ∝ f_i = f(σ^2, ·) and π_{i+1} ∝ f_{i+1} = f(σ^2(1+α), ·), for α = σ/√n:
M = E_{π_i}(π_i(x)/π_{i+1}(x)) = (∫ f_i(x)^2/f_{i+1}(x) dx · ∫ f_{i+1}) / (∫ f_i)^2
= F(σ^2(1+α)/(1+2α)) · F(σ^2(1+α)) / F(σ^2)^2 = O(1). Same lemma!
Gaussian Cooling [CV14]
- Accelerated annealing: the rate 1 + σ/√n is best possible.
- Thm. The volume of any well-rounded convex body K can be estimated using O*(n^3) membership queries.
- CV algorithm: (√n/σ) × (√n/σ) × σ^2 n^2 × log n = O*(n^3)
Practical volume/integration
- Start with a concentrated Gaussian
- Run the algorithm till the Gaussian is nearly flat
- In each phase, flatten the Gaussian as much as possible while keeping the variance of the ratio of integrals bounded
- The variance can be estimated with a small constant number of samples
- If the covariance is skewed (as seen by an SVD of O(n) points), scale down the high-variance subspace
- "Adaptive" annealing (also used in [Stefankovic-Vigoda-V.] for discrete problems)
Open questions
- How true is the KLS conjecture?
Open questions
- When to stop a random walk? (how to decide if the current point is "random"?)
- How to get information before reaching stationarity?
- Faster isotropy/rounding?
To make isotropic: run for N steps; transform using the covariance; repeat.
Open questions
- How efficiently can we learn a polytope given only random points?
- With O(mn) points, one cannot "see" the structure, but there is enough information to estimate the polytope! Algorithms?
- For convex bodies:
  - [KOS][GR] need 2^Ω(√n) points
  - [Eldan] need 2^√n points even to estimate the volume!
Open questions
- Can we estimate the volume of an explicit polytope Ax ≤ b in deterministic polynomial time?