Transcript: talagrand-ias (IAS Video Lectures)

Talagrand’s convolution conjecture and
anti-concentration of temperature
James R. Lee
University of Washington
Joint work with Ronen Eldan [MSR, University of Washington,…]
noise and smoothness

$f : \{-1,1\}^n \to \mathbb{R}$

The "heat flow" operator $T_\epsilon : L^2(\{-1,1\}^n) \to L^2(\{-1,1\}^n)$ is defined, for $\epsilon \in [0,1]$, by
$$T_\epsilon f(x) = \mathbb{E}\big[f(x_1^\epsilon, x_2^\epsilon, \ldots, x_n^\epsilon)\big],$$
where $x_i^\epsilon = x_i$ with probability $1-\epsilon$ and $x_i^\epsilon = \pm 1$
with probability $\epsilon/2$ each, independently for each $i$.
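As a concrete illustration (not from the talk), here is a minimal Monte Carlo sketch of $T_\epsilon$ acting on a Boolean function; the choice of `majority` and all parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def majority(x):
    # example f : {-1,1}^n -> R (illustrative choice, not from the talk)
    return np.sign(x.sum(axis=-1))

def T_eps(f, x, eps, samples=20000):
    # T_eps f(x) = E f(x^eps): with probability eps a coordinate is replaced
    # by a fresh uniform sign (so it equals +1 or -1 w.p. eps/2 each),
    # and is kept with probability 1 - eps, independently per coordinate.
    x = np.broadcast_to(x, (samples, x.size))
    rerandomize = rng.random(x.shape) < eps
    fresh = rng.choice([-1, 1], size=x.shape)
    return f(np.where(rerandomize, fresh, x)).mean()

x = np.ones(11)                       # the all-ones point of {-1,1}^11
print(T_eps(majority, x, eps=0.3))    # smoothed value, strictly below f(x) = 1
```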
General principle: $f$ smooth $\Rightarrow$ $T_\epsilon f$ smoother
Many applications:
PCPs, hardness of approximation, social choice,
circuit complexity, information theory,
learning, data structures, ...
noise and smoothness

Hypercontractive inequality [Bonami, Gross, Nelson]: For $\epsilon > 0$,
$$T_\epsilon : L^p(\{-1,1\}^n) \to L^q(\{-1,1\}^n)$$
is a contraction for some $q > p > 1$.

$\|f\|_p$ small for some $p > 1$ $\Rightarrow$ $\|T_\epsilon f\|_q$ small for some $q > p$.

If $f$ is the indicator of a set, then this encodes
"small set expansion."
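For reference, the sharp parameters (a standard fact, added here rather than taken from the slide; with per-coordinate noise rate $\epsilon$, the relevant correlation is $\rho = 1 - \epsilon$):
$$\|T_\epsilon f\|_q \le \|f\|_p \qquad \text{whenever} \qquad (1-\epsilon)^2 \le \frac{p-1}{q-1}.$$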
noise and smoothness

Relative entropy: For $f : \{-1,1\}^n \to \mathbb{R}_+$ with $\mathbb{E} f = 1$,
$$\operatorname{Ent}(f) = \mathbb{E}[f \log f] = \mathbb{E}_{Z \sim f}[\log f(Z)]$$

Gradient:
$$\mathbb{E}\,\|\nabla f\|^2 = \sum_{i=1}^n \mathbb{E}\big[(f(x \oplus e_i) - f(x))^2\big]$$

Log-Sobolev inequality: For every such $f : \{-1,1\}^n \to \mathbb{R}_+$,
$$\operatorname{Ent}(f) \le \mathbb{E}\,\|\nabla \sqrt{f}\,\|^2 \approx -\frac{d}{d\epsilon}\Big|_{\epsilon=0} \operatorname{Ent}(T_\epsilon f)$$
noise and hypercontractivity

Hypercontractive inequality: For $\epsilon > 0$ and $p > 1$, there is a $q > p$ such that
$$\|T_\epsilon f\|_q \le \|f\|_p \qquad \text{for all } f : \{-1,1\}^n \to \mathbb{R}.$$

Log-Sobolev inequality:
$$\operatorname{Ent}(f) \le \mathbb{E}\,\|\nabla \sqrt{f}\,\|^2$$

Talagrand (1989): What about smoothing for arbitrary $f$?
the convolution conjecture

$f : \{-1,1\}^n \to \mathbb{R}_+$ and $\mathbb{E} f = 1$

Markov's inequality: $\mathbb{P}[f \ge \alpha] \le \frac{1}{\alpha}$
(tight for $f$ = scaled indicator of a set of measure $1/\alpha$)

Convolution conjecture [Talagrand 1989, \$1,000]:
For every $\epsilon > 0$, there exists $\phi : \mathbb{R}_+ \to \mathbb{R}_+$ so that for every $f$,
$$\mathbb{P}[T_\epsilon f \ge \alpha] \le \frac{\phi(\alpha)}{\alpha}, \qquad \text{with } \phi(\alpha) \to 0 \text{ as } \alpha \to \infty.$$

- The best function is probably $\phi(\alpha) \sim \frac{1}{\sqrt{\log \alpha}}$ (achieved for halfspaces)
- The conjecture is unresolved for any fixed $\epsilon > 0$, even for indicators $f = \mathbf{1}_S$, $S \subseteq \{-1,1\}^n$
anti-concentration of temperature

Convolution conjecture [Talagrand 1989]:
For every $\epsilon > 0$, there exists $\phi : \mathbb{R}_+ \to \mathbb{R}_+$ so that for every $f$,
$$\mathbb{P}[T_\epsilon f \ge \alpha] \le \frac{\phi(\alpha)}{\alpha}, \qquad \text{with } \phi(\alpha) \to 0 \text{ as } \alpha \to \infty.$$

Equivalent to the conjecture that
$$\mathbb{E}\big[T_\epsilon f \cdot \mathbf{1}_{\{T_\epsilon f \in [\alpha, 2\alpha]\}}\big] \le \phi(\alpha)$$

"Temperature" cannot concentrate at a single (high) level.
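One direction of this equivalence is a two-line check (the constants below are illustrative, not from the slide; the converse needs $\phi$ to decay fast enough that the dyadic sum is still small):
$$\mathbb{E}\big[T_\epsilon f\,\mathbf{1}_{\{T_\epsilon f \in [\alpha,2\alpha]\}}\big] \;\le\; 2\alpha\,\mathbb{P}[T_\epsilon f \ge \alpha] \;\le\; 2\,\phi(\alpha),$$
and conversely, summing over dyadic levels,
$$\mathbb{P}[T_\epsilon f \ge \alpha] \;\le\; \frac{1}{\alpha}\,\mathbb{E}\big[T_\epsilon f\,\mathbf{1}_{\{T_\epsilon f \ge \alpha\}}\big] \;=\; \frac{1}{\alpha}\sum_{k \ge 0} \mathbb{E}\big[T_\epsilon f\,\mathbf{1}_{\{T_\epsilon f \in [2^k\alpha,\,2^{k+1}\alpha]\}}\big] \;\le\; \frac{1}{\alpha}\sum_{k \ge 0}\phi(2^k\alpha).$$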
the Gaussian case

$f : \mathbb{R}^n \to \mathbb{R}_+$ and $\mathbb{E} f = \int f \, d\gamma_n = 1$ (Gaussian measure)

Let $B_t$ be an $n$-dimensional Brownian motion with $B_0 = 0$.

Brownian semi-group: $P_t f(x) = \mathbb{E}[f(x + B_t)]$
- $P_0$ = identity map
- $P_1$ = standard $n$-dim. Gaussian average
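A one-line check of the semigroup property, used repeatedly below (standard, not spelled out on the slide): with $B_t'$ an independent Brownian motion,
$$P_s(P_t f)(x) = \mathbb{E}\big[f(x + B_s + B_t')\big] = \mathbb{E}\big[f(x + B_{s+t})\big] = P_{s+t} f(x),$$
since independent increments give $B_s + B_t' \sim \mathcal{N}(0, (s+t)\,\mathrm{Id})$.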
the Gaussian case

$f : \mathbb{R}^n \to \mathbb{R}_+$ and $\mathbb{E} f = \int f \, d\gamma_n = 1$

Brownian semi-group: $P_t f(x) = \mathbb{E}[f(x + B_t)]$

Gaussian convolution conjecture: For every $t < 1$, there is a
$\phi : \mathbb{R}_+ \to \mathbb{R}_+$ so that for every $f$,
$$\mathbb{P}\big[P_{1-t} f(B_t) \ge \alpha\big] \le \frac{\phi(\alpha)}{\alpha}, \qquad \text{with } \phi(\alpha) \to 0 \text{ as } \alpha \to \infty.$$

- Special case of the discrete cube conjecture
- Previously unknown for any $t$
- $n = 1$ is an exercise
- True in any fixed dimension [Ball, Barthe, Bednorz, Oleszkiewicz, and Wolff 2010]
isoperimetric (dual) version

Fix $t > 0$ and consider a subset $S \subseteq \mathbb{R}^n$.

Consider $g : \mathbb{R}^n \to \mathbb{R}_+$ supported on $S$
such that $\|U_t\, g\|_\infty \le 1$.

Goal: Maximize $\int g \, d\gamma_n$ subject to these constraints.

$g = \mathbf{1}_S$ gets $\int g \, d\gamma_n = \gamma_n(S)$.

Conjecture [restated]: One can achieve $\int g \, d\gamma_n \gg \gamma_n(S)$ as $\gamma_n(S) \to 0$.
the Gaussian case

Theorem [Eldan-L 2014]: If $f : \mathbb{R}^n \to \mathbb{R}_+$ satisfies $\mathbb{E} f = 1$ and
$$\nabla^2 \log f(x) \succeq -\beta\, \mathrm{Id} \qquad \text{for all } x \in \mathbb{R}^n,$$
then for all $\alpha \ge 2$:
$$\mathbb{P}[f \ge \alpha] \le \frac{1}{\alpha} \cdot \frac{C_\beta\,(\log \log \alpha)^4}{\sqrt{\log \alpha}}$$

Fact: For any $f$ with $\mathbb{E} f = 1$,
$$\nabla^2 \log P_{1-t} f(x) \succeq -\frac{\mathrm{Id}}{1-t}.$$

Corollary: If $f : \mathbb{R}^n \to \mathbb{R}_+$ satisfies $\mathbb{E} f = 1$, then for any $t < 1$ and all $\alpha \ge 2$,
$$\mathbb{P}\big[P_{1-t} f(B_t) \ge \alpha\big] \le \frac{1}{\alpha} \cdot \frac{C\,(\log \log \alpha)^4}{(1-t)\sqrt{\log \alpha}}$$

(The corollary follows by applying the theorem to $P_{1-t} f$ with $\beta = 1/(1-t)$, via the Fact.)
some difficulties

Corollary: If $f : \mathbb{R}^n \to \mathbb{R}_+$ satisfies $\mathbb{E} f = 1$, then for any $t < 1$ and all $\alpha \ge 2$,
$$\mathbb{P}\big[P_{1-t} f(B_t) \ge \alpha\big] \le \frac{1}{\alpha} \cdot \frac{C(t)\,(\log \log \alpha)^4}{\sqrt{\log \alpha}}$$

What are the difficult functions $f$?

[Figure: a half space — good: noise insensitive; bad: boundary far from the origin — and "dust": a scattered set]
proof ideas

$\mathbb{E} f = 1, \qquad \nabla^2 \log f(x) \succeq -\beta\, \mathrm{Id}$

$M_t = P_{1-t} f(B_t)$ is a (Doob) martingale, with
- $M_0 = \mathbb{E} f = 1$
- $M_1 = f(B_1)$
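Why this is a martingale (a standard one-line check via the semigroup property, added here for completeness): for $s \le t$,
$$\mathbb{E}[M_t \mid \mathcal{F}_s] = \mathbb{E}\big[P_{1-t} f(B_t) \mid B_s\big] = P_{t-s}\big(P_{1-t} f\big)(B_s) = P_{1-s} f(B_s) = M_s.$$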
Goal: $\mathbb{P}[M_1 > \alpha] \ll \frac{1}{\alpha}$

Arguing about small-probability events = annoying.
random measure conditioning

$W_t = B_t$ conditioned on $B_1 \sim f \, d\gamma_n$

Goal: $\mathbb{P}\big[f(B_1) \in [\alpha, 2\alpha]\big] \ll 1/\alpha$

Suffices to prove that $\mathbb{P}\big[f(W_1) \in [\alpha, 2\alpha]\big] = o(1)$, because
$$\mathbb{P}\big[f(W_1) \in [\alpha, 2\alpha]\big] \sim \alpha \, \mathbb{P}\big[f(B_1) \in [\alpha, 2\alpha]\big].$$
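The "because" is just a change of measure: $W_1$ has density $f$ with respect to $\gamma_n$, so (a step the slide leaves implicit)
$$\mathbb{P}\big[f(W_1) \in [\alpha, 2\alpha]\big] = \mathbb{E}\big[f(B_1)\,\mathbf{1}_{\{f(B_1)\in[\alpha,2\alpha]\}}\big] \in \big[\alpha,\, 2\alpha\big] \cdot \mathbb{P}\big[f(B_1) \in [\alpha, 2\alpha]\big].$$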
π‘Šπ‘‘ as an Itô process: Föllmer’s drift
Consider a process π‘Šπ‘‘ with π‘Š0 = 0, and
π‘‘π‘Šπ‘‘ = 𝑑𝐡𝑑 + 𝑣𝑑 𝑑𝑑
where 𝑣𝑑 is predictable (deterministic function of {𝐡𝑠 : 𝑠 ∈ [0, 𝑑]})
Integrating: π‘Šπ‘‘ =
𝑑
𝐡𝑑 + ∫0 𝑣𝑠
𝑑𝑠 = Brownian motion + drift
Among all such drifts satisfying π‘Š1 = 𝐡1 +
let 𝑣𝑑 be the one which minimizes
1
𝔼
0
𝑣𝑑
2
𝑑𝑑
1
∫0 𝑣𝑑
𝑑𝑑 ∼ 𝑓 𝑑𝛾𝑛 ,
π‘Šπ‘‘ as an Itô process: Föllmer’s drift
Among all such drifts satisfying π‘Š1 = 𝐡1 +
let 𝑣𝑑 be the one which minimizes
1
𝔼
0
𝑣𝑑
2
𝑑𝑑
1
∫0 𝑣𝑑
𝑑𝑑 ∼ 𝑓 𝑑𝛾𝑛 ,
π‘Šπ‘‘ as an Itô process: Föllmer’s drift
Among all such drifts satisfying π‘Š1 = 𝐡1 +
let 𝑣𝑑 be the one which minimizes
1
𝔼
0
𝑣𝑑
2
1
∫0 𝑣𝑑
𝑑𝑑 ∼ 𝑓 𝑑𝛾𝑛 ,
𝑑𝑑
π‘Šπ‘‘
Lemma: 𝑣𝑑 is martingale.
Explicit form: 𝑣𝑑 = 𝛻 log 𝑃1βˆ’π‘‘ 𝑓 π‘Šπ‘‘ =
Theorem [Lehec 2010]:
1
Ent(𝑓) = 𝔼
2
𝛻𝑃1βˆ’π‘‘ 𝑓 π‘Šπ‘‘
𝑃1βˆ’π‘‘ 𝑓 π‘Šπ‘‘
1
0
𝑣𝑑
2
𝑑𝑑
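To make the construction concrete, here is a minimal numerical sketch (my own illustration, not from the talk): Euler–Maruyama for $dW_t = dB_t + \nabla \log P_{1-t} f(W_t)\, dt$, with $P_{1-t} f$ and its gradient estimated by plain Monte Carlo. The helper names (`heat`, `grad_heat`, `follmer_path`), the toy density $f(x) = e^{\langle m, x\rangle - \|m\|^2/2}$ (for which $f\,d\gamma_n = \mathcal{N}(m, \mathrm{Id})$ and the exact drift is constantly $m$), and all parameters are assumptions of the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
m = np.array([1.0, -0.5])                  # toy target: f dgamma_n = N(m, Id)

def f(x):
    # density ratio with E_gamma[f] = 1
    return np.exp(x @ m - 0.5 * m @ m)

def heat(x, s, k=2000):
    # P_s f(x) = E[f(x + B_s)], B_s ~ sqrt(s) N(0, Id): plain Monte Carlo
    z = rng.standard_normal((k, x.size))
    return f(x + np.sqrt(s) * z).mean()

def grad_heat(x, s, k=2000):
    # Gaussian integration by parts: grad P_s f(x) = E[f(x + sqrt(s) Z) Z] / sqrt(s)
    z = rng.standard_normal((k, x.size))
    return (f(x + np.sqrt(s) * z)[:, None] * z).mean(axis=0) / np.sqrt(s)

def follmer_path(steps=50):
    # Euler-Maruyama for dW_t = dB_t + v_t dt, v_t = grad log P_{1-t} f(W_t)
    dt = 1.0 / steps
    w = np.zeros_like(m)
    for i in range(steps):
        s = 1.0 - i * dt                     # remaining time 1 - t
        v = grad_heat(w, s) / heat(w, s)     # Follmer drift (here ~ m, up to MC error)
        w = w + np.sqrt(dt) * rng.standard_normal(m.size) + v * dt
    return w

samples = np.array([follmer_path() for _ in range(100)])
print(samples.mean(axis=0))                  # should be close to m = [1.0, -0.5]
```

For this toy $f$ the drift is constant, so the output can be checked against $W_1 \sim \mathcal{N}(m, \mathrm{Id})$; for a general $f$ only the Monte Carlo route is available, and the estimates get noisy as $s \to 0$.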
an optimal coupling

$\{B_t\}$ $n$-dim Brownian motion, $f : \mathbb{R}^n \to \mathbb{R}_+$ with $\mathbb{E} f = 1$.

Construct $W_t$ so that $W_1 \sim f \, d\gamma_n$:
- $W_0 = 0$
- $dW_t = dB_t + v_t \, dt$
- $v_t = \frac{\nabla P_{1-t} f(W_t)}{P_{1-t} f(W_t)}$ is a martingale

The multi-granular geometry of $f$ is reflected in $\{v_t\}$: for all $t \in [0,1]$,
$$\mathbb{E}\,\|v_t\|^2 = \int \|\nabla P_{1-t} f\|^2 \, d\gamma_n$$
proof sketch

Suffices to prove that $\mathbb{P}\big[f(W_1) \in [\alpha, 2\alpha]\big] = o(1)$.

Idea: Suppose that $\mathbb{P}\big[f(W_1) \in [\alpha, 2\alpha]\big] \ge p$. Then
$$\mathbb{P}\big[f(W_1) \in [2\alpha, 4\alpha]\big] \ge p, \qquad \mathbb{P}\big[f(W_1) \in [4\alpha, 8\alpha]\big] \ge p, \qquad \cdots$$
across $\sim \log \alpha$ levels.

Making $f$ bigger: We'll use $\nabla^2 \log f(x) \succeq -\beta\, \mathrm{Id}$:
$$\log f(W_1 + u) \ge \log f(W_1) + \langle u, \nabla \log f(W_1)\rangle - \beta \|u\|^2 = \log f(W_1) + \langle u, v_1 \rangle - \beta \|u\|^2,$$
using that $v_1 = \nabla \log f(W_1)$ (the drift at time $1$, since $P_0$ is the identity).
proof sketch

Pushing $W_1$ in the direction of the drift at time $t = 1$:
$$f(W_1 + u) \ge f(W_1)\, \exp\big(\langle u, v_1 \rangle - \beta \|u\|^2\big)$$

Setting $u = \delta v_1$ ($\delta$ small) multiplies the value of $f$. We want to say that $W_1$ itself "could have" made this move.
Girsanov's theorem

Consider any Itô process $dX_t = dB_t + v_t \, dt$ (*under suitable conditions).

Let $P$ be the law of the Brownian motion $B_t$ and $Q$ be the law of $X_t$. Then under the change of measure
$$\frac{dQ}{dP} = \exp\left(-\int_0^1 \langle v_t, dB_t \rangle - \frac{1}{2} \int_0^1 \|v_t\|^2 \, dt\right),$$
$\{X_t : t \in [0,1]\}$ has the law of Brownian motion.
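A quick numerical sanity check of this change of measure, in the simplest case of a constant drift $v_t \equiv \theta$ (so $\int_0^1 \langle v_t, dB_t \rangle = \theta B_1$); the test function and all numbers are my own illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n = 0.7, 10**6

b1 = rng.standard_normal(n)                     # B_1 ~ N(0, 1)
x1 = b1 + theta                                 # X_1 = B_1 + theta (constant drift)
weight = np.exp(-theta * b1 - 0.5 * theta**2)   # dQ/dP along each path

F = np.cos                                      # any bounded test function
print(np.mean(F(x1) * weight))                  # reweighted: ~ E[F(B_1)]
print(np.mean(F(b1)))                           # direct estimate, should match
```

Both estimates land near $\mathbb{E}[\cos(Z)] = e^{-1/2} \approx 0.607$ for $Z \sim \mathcal{N}(0,1)$.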
the masochistic part

$$dW_t = dB_t + v_t \, dt, \qquad v_t = \nabla \log P_{1-t} f(W_t)$$
$$dX_t^\delta = dB_t + (1+\delta)\, v_t \, dt$$

Now we can argue that $W_t \approx X_t^\delta$ (Girsanov's theorem).

What about $f(X_1^\delta) \gg f(W_1)$? Note that $X_1^\delta = W_1 + \delta \int_0^1 v_t \, dt$, and recall
$$f(W_1 + u) \ge f(W_1)\, \exp\big(\langle u, v_1 \rangle - o(1)\big).$$

Big question: Does $\int_0^1 v_t \, dt$ point in the direction of the gradient $v_1$?
the calculations and concentration

$$dW_t = dB_t + v_t \, dt, \qquad v_t = \nabla \log P_{1-t} f(W_t), \qquad dX_t^\delta = dB_t + (1+\delta)\, v_t \, dt$$

Let $Q$ be the path measure of $\{W_t\}$ and $Q^\delta$ that of $\{X_t^\delta\}$.

Girsanov:
$$\frac{dQ^\delta}{dQ} = \exp\left(\delta \int_0^1 \langle v_t, dB_t \rangle - \Big(\delta + \frac{\delta^2}{2}\Big) \int_0^1 \|v_t\|^2 \, dt\right)$$

Gradient estimate:
$$f(X_1^\delta) \ge f(W_1)\, \exp\left(\delta \left\langle \int_0^1 v_t \, dt,\; v_1 \right\rangle - \beta \delta^2 \int_0^1 \|v_t\|^2 \, dt\right)$$

Since $v_t$ is a martingale, $\mathbb{E}[v_1 \mid \mathcal{F}_t] = v_t$, so
$$\mathbb{E}\left[\delta \left\langle \int_0^1 v_t \, dt,\; v_1 \right\rangle\right] = \delta\, \mathbb{E} \int_0^1 \|v_t\|^2 \, dt.$$
the calculations and concentration

For $\delta = k/\log \alpha$, $k = 1, 2, \ldots, \log \alpha$: if $\mathbb{P}\big[f(W_1) \in [\alpha, 2\alpha]\big] \ge p$, then
$$\mathbb{P}\big[f(W_1) \in [2\alpha, 4\alpha]\big] \ge p/10, \qquad \mathbb{P}\big[f(W_1) \in [4\alpha, 8\alpha]\big] \ge p/10, \quad \ldots$$
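Summing over the disjoint dyadic levels then forces $p$ to be small (this last step is implicit on the slide; the constant $10$ is the slide's):
$$1 \;\ge\; \sum_{k=1}^{\log \alpha} \mathbb{P}\big[f(W_1) \in [2^k \alpha,\, 2^{k+1}\alpha]\big] \;\ge\; \frac{p}{10}\, \log \alpha \quad\Longrightarrow\quad p \;\le\; \frac{10}{\log \alpha}.$$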
conclusion

- Process for $\{-1,1\}^n$ analogous to $W_t$: sample coordinates one by one to have the right marginals conditioned on the past.
- One can prove log-Sobolev and Talagrand's entropy-transport inequality in a few lines based on this. There is an information-theoretic interpretation: chain rule ↔ martingale property.
- These proofs use first derivatives of $f$, while our proof of the convolution conjecture in Gaussian space uses second derivatives.
- Prove it for the discrete case? There is a difficulty because the drift requires additional randomness.
- Riemannian setting: Bakry–Émery curvature, gradient flow vs. optimal transport.
Kumar-Courtade conjecture

$X \in \{0,1\}^n$ uniformly at random; $Y \sim_\epsilon X$ (an $\epsilon$-noisy copy of $X$).

Conjecture: $I(f(X); Y)$ is maximized among functions $f : \{0,1\}^n \to \{0,1\}$ by $f(X) = X_1$.
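As a small illustration (my own numerical check, not part of the talk), one can enumerate the joint distribution of $(f(X), Y)$ for tiny $n$, modeling $Y \sim_\epsilon X$ as each bit of $X$ flipped independently with probability $\epsilon$, and compare the dictator $f(X) = X_1$ with majority:

```python
import numpy as np
from itertools import product

n, eps = 3, 0.1   # tiny instance; eps = bit-flip probability (illustrative)

def mutual_info(f):
    # Exact joint distribution of (f(X), Y) by enumeration:
    # X uniform on {0,1}^n, Y_i = X_i flipped independently w.p. eps.
    joint = np.zeros((2, 2**n))
    for x in product([0, 1], repeat=n):
        for yi, y in enumerate(product([0, 1], repeat=n)):
            flips = sum(a != b for a, b in zip(x, y))
            joint[f(x), yi] += (0.5**n) * eps**flips * (1 - eps)**(n - flips)
    pf, py = joint.sum(axis=1), joint.sum(axis=0)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / np.outer(pf, py)[mask])))

dictator = lambda x: x[0]
majority = lambda x: int(sum(x) > n / 2)
print("dictator:", mutual_info(dictator))   # = 1 - H(eps), about 0.531 here
print("majority:", mutual_info(majority))   # at most the dictator value, if the conjecture holds
```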