talagrand-ias - IAS Video Lectures
Talagrand's convolution conjecture and
anti-concentration of temperature
James R. Lee
University of Washington
Joint work with Ronen Eldan [MSR, University of Washington, …]
noise and smoothness
f : {−1,1}^n → ℝ

The "heat flow" operator T_ε : L²({−1,1}^n) → L²({−1,1}^n)
is defined, for ε ∈ [0,1], by:

    T_ε f(x) = E[ f(x₁^ε, x₂^ε, …, x_n^ε) ]

where x_i^ε = x_i with probability 1 − ε and x_i^ε = ±1
with probability ε/2 each, independently for each i.
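As a sanity check (illustrative, not from the talk), T_ε can be computed exactly on a small cube by enumeration. The only fact used is that each coordinate is kept with probability 1 − ε and rerandomized with probability ε, so it agrees with x_i with probability 1 − ε/2:

```python
import itertools

def noise_operator(f_vals, cube, eps):
    """Apply the heat-flow operator T_eps exactly by enumerating the cube.

    f_vals: dict mapping each point of {-1,1}^n to f(x).
    A coordinate is kept with prob 1-eps and rerandomized to +-1 with
    prob eps/2 each, so it matches x_i with prob 1 - eps/2.
    """
    out = {}
    for x in cube:
        acc = 0.0
        for y in cube:
            # probability that the rerandomized point equals y, given x
            p = 1.0
            for xi, yi in zip(x, y):
                p *= (1 - eps / 2) if xi == yi else (eps / 2)
            acc += p * f_vals[y]
        out[x] = acc
    return out

n = 3
cube = list(itertools.product([-1, 1], repeat=n))
f = {x: float(sum(x) > 0) for x in cube}   # indicator of majority
mean_f = sum(f.values()) / len(cube)

Tf0 = noise_operator(f, cube, 0.0)   # eps = 0: identity
Tf1 = noise_operator(f, cube, 1.0)   # eps = 1: full rerandomization -> E f
```

At ε = 0 nothing happens, and at ε = 1 every value collapses to the mean E f, the two extreme cases of the definition.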
General principle:
    f smooth ⇒ T_ε f smoother
Many applications:
PCPs, hardness of approximation, social choice,
circuit complexity, information theory,
learning, data structures, ...
noise and smoothness
Hypercontractive inequality [Bonami, Gross, Nelson]:
For ε > 0:

    T_ε : L^p({−1,1}^n) → L^q({−1,1}^n)

is a contraction for some q > p > 1.

    ‖f‖_p small for p > 1  ⇒  ‖T_ε f‖_q small for q > p

If f = indicator of a set, then this encodes
"small set expansion."
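On the cube the sharp exponent relation is classical (Bonami–Beckner): writing ρ = 1 − ε for the correlation of T_ε, ‖T_ε f‖_q ≤ ‖f‖_p whenever q − 1 ≤ (p − 1)/ρ². A quick numerical check on a random f (illustrative, not from the slides):

```python
import itertools
import random

def lp_norm(vals, p):
    # L^p norm with respect to the uniform measure on the cube
    return (sum(abs(v) ** p for v in vals) / len(vals)) ** (1 / p)

def apply_noise(f_vals, cube, eps):
    out = []
    for x in cube:
        acc = 0.0
        for fy, y in zip(f_vals, cube):
            pr = 1.0
            for xi, yi in zip(x, y):
                pr *= (1 - eps / 2) if xi == yi else (eps / 2)
            acc += pr * fy
        out.append(acc)
    return out

n, eps, p = 4, 0.3, 2.0
rho = 1 - eps                   # correlation parameter of T_eps
q = 1 + (p - 1) / rho ** 2      # Bonami-Beckner exponent: rho = sqrt((p-1)/(q-1))

cube = list(itertools.product([-1, 1], repeat=n))
rng = random.Random(0)
f_vals = [rng.uniform(-1, 1) for _ in cube]
Tf = apply_noise(f_vals, cube, eps)
```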
noise and smoothness
Relative entropy: For f : {−1,1}^n → ℝ₊ with E f = 1,

    Ent(f) = E[f log f] = E_{x∼f}[log f(x)]

Gradient:

    E‖∇f‖² = Σ_{i=1}^n E[(f(x ⊕ e_i) − f(x))²]

Log-Sobolev inequality: For every such f : {−1,1}^n → ℝ₊,

    Ent(f) ≤ E‖∇√f‖²

and E‖∇√f‖² is (up to constants) −(∂/∂ε) Ent(T_ε f) |_{ε=0}
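One way to see these quantities interact (an illustrative sketch, not from the slides): relative entropy can only decrease when noise is applied, since T_ε is a Markov operator preserving the uniform measure. The density f below is our own choice:

```python
import itertools
import math

def noise_op(f_vals, cube, eps):
    out = []
    for x in cube:
        acc = 0.0
        for fy, y in zip(f_vals, cube):
            pr = 1.0
            for xi, yi in zip(x, y):
                pr *= (1 - eps / 2) if xi == yi else (eps / 2)
            acc += pr * fy
        out.append(acc)
    return out

def ent(f_vals):
    # Ent(f) = E[f log f] for a density f with E f = 1 (uniform base measure)
    return sum(v * math.log(v) for v in f_vals if v > 0) / len(f_vals)

n = 3
cube = list(itertools.product([-1, 1], repeat=n))
raw = [1.0 + 0.9 * (sum(x) / n) for x in cube]   # a positive function
Z = sum(raw) / len(raw)
f_vals = [v / Z for v in raw]                    # normalize so E f = 1

# entropy along increasing noise: should be nonincreasing, and 0 at eps = 1
ents = [ent(noise_op(f_vals, cube, eps)) for eps in (0.0, 0.25, 0.5, 1.0)]
```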
noise and hypercontractivity
Hypercontractive inequality:
For ε > 0 and p > 1, there is a q > p such that

    ‖T_ε f‖_q ≤ ‖f‖_p

for all f : {−1,1}^n → ℝ.

Log-Sobolev inequality:

    Ent(f) ≤ E‖∇√f‖²

Talagrand (1989): What about smoothing for arbitrary f?
the convolution conjecture
f : {−1,1}^n → ℝ₊ and E f = 1

Markov's inequality: ℙ[f ≥ α] ≤ 1/α
(tight for f = scaled indicator on a set of measure 1/α)

Convolution conjecture [Talagrand 1989, $1,000]:
For every ε > 0, there exists ψ : ℝ₊ → ℝ₊ so that for every f,

    ℙ[T_ε f ≥ α] ≤ ψ(α)/α   and   ψ(α) → 0 as α → ∞

- Best function is probably ψ(α) ∼ 1/√(log α) (achieved for halfspaces)
- Conjecture unresolved for any fixed ε > 0, f = f_n on {−1,1}^n
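The Markov-tight example behaves much better once noise is applied. A small exact computation (illustrative; the choice f = 2^n·1_{x = x₀}, a scaled indicator of a single point, is ours):

```python
import itertools

# f = 2^n * 1_{x = x0}: scaled indicator with E f = 1, tight for Markov.
# T_eps f(x) factorizes: each coordinate matches x0 with prob 1 - eps/2.
n, eps, alpha = 10, 0.5, 10.0
x0 = tuple([1] * n)

def T_f(x):
    val = float(2 ** n)
    for xi, x0i in zip(x, x0):
        val *= (1 - eps / 2) if xi == x0i else (eps / 2)
    return val

cube = list(itertools.product([-1, 1], repeat=n))
tail = sum(1 for x in cube if T_f(x) >= alpha) / len(cube)  # P[T_eps f >= alpha]
markov = 1 / alpha
```

For these parameters the tail is far below the Markov bound 1/α, illustrating the kind of gain the conjecture asks for uniformly in f.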
anti-concentration of temperature
Convolution conjecture [Talagrand 1989]:
For every ε > 0, there exists ψ : ℝ₊ → ℝ₊ so that for every f,

    ℙ[T_ε f ≥ α] ≤ ψ(α)/α   and   ψ(α) → 0 as α → ∞

Equivalent to the conjecture that

    E[ T_ε f · 1_{T_ε f ∈ [α, 2α]} ] ≤ ψ(α)

"Temperature" cannot concentrate at a single (high) level
the Gaussian case
f : ℝ^n → ℝ₊ and E f = ∫ f dγ_n = 1 (Gaussian measure)

Let B_t be an n-dimensional Brownian motion with B₀ = 0.

Brownian semi-group: P_t f(x) = E[f(x + B_t)]

    P₀ = identity map
    P₁ = standard n-dim. Gaussian average
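In one dimension the semigroup can be evaluated by Gauss–Hermite quadrature. A hedged sketch (the quadratic test function is our choice; it has the closed form P_t(x²) = x² + t since E[(x + B_t)²] = x² + t):

```python
import numpy as np

def P_t(f, x, t, deg=40):
    """Heat semigroup P_t f(x) = E[f(x + B_t)], B_t ~ N(0, t), one dimension.

    Gauss-Hermite quadrature; exact for polynomial f of moderate degree.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(deg)
    # hermgauss integrates against exp(-y^2); substituting z = sqrt(2t) y
    # turns the nodes into samples of N(0, t).
    z = np.sqrt(2 * t) * nodes
    return float(np.sum(weights * f(x + z)) / np.sqrt(np.pi))

f = lambda x: x ** 2
x0, t = 1.2, 0.3
val = P_t(f, x0, t)   # closed form: x0^2 + t
```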
the Gaussian case
f : ℝ^n → ℝ₊ and E f = ∫ f dγ_n = 1

Brownian semi-group: P_t f(x) = E[f(x + B_t)]

Gaussian convolution conjecture: For every t < 1, there is a
ψ : ℝ₊ → ℝ₊ so that for every f,

    ℙ[P_{1−t} f(B_t) ≥ α] ≤ ψ(α)/α   and   ψ(α) → 0 as α → ∞

- Special case of discrete cube conjecture
- Previously unknown for any t
- n = 1 is an exercise
- True in any fixed dimension
  [Ball, Barthe, Bednorz, Oleszkiewicz, and Wolff, 2010]
isoperimetric (dual) version
Fix t > 0 and consider a subset S ⊆ ℝ^n.

Consider f : ℝ^n → ℝ₊ supported on S
such that ‖P_t f‖_∞ ≤ 1.

Goal: Maximize ∫ f dγ_n subject to these constraints.

f = 1_S gets ∫ f dγ_n = γ_n(S)

Conjecture [restated]:
One can achieve ∫ f dγ_n ≫ γ_n(S) as γ_n(S) → 0
the Gaussian case
Theorem [Eldan-L 2014]:
If f : ℝ^n → ℝ₊ satisfies E f = 1 and

    ∇² log f(x) ⪰ −β·I_n   for all x ∈ ℝ^n

then for all α ≥ 2:

    ℙ[f ≥ α] ≤ (1/α) · ( C_β log log α / √(log α) )^4
Corollary: If f : ℝ^n → ℝ₊ satisfies E f = 1 then for any
t < 1 and all α ≥ 2,

    ℙ[P_{1−t} f(B_t) ≥ α] ≤ (1/α) · ( C log log α / √((1−t) log α) )^4
the Gaussian case

Fact: For any f with E f = 1,

    ∇² log P_{1−t} f(x) ⪰ −I_n / (1−t)

so the theorem applies with β = 1/(1−t).

Corollary: If f : ℝ^n → ℝ₊ satisfies E f = 1 then for any
t < 1 and all α ≥ 2,

    ℙ[P_{1−t} f(B_t) ≥ α] ≤ (1/α) · ( C log log α / √((1−t) log α) )^4
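A sketch (our reconstruction, not verbatim from the talk) of how the corollary follows from the theorem via the Fact:

```latex
\textbf{Sketch.} Set $g(x) := P_{1-t}f(\sqrt{t}\,x)$, so that
$g(Z) \stackrel{d}{=} P_{1-t}f(B_t)$ for $Z \sim \gamma_n$, and
\[
  \mathbb{E}_{\gamma_n}[g] \;=\; \mathbb{E}\,[P_{1-t}f(B_t)]
  \;=\; \mathbb{E}\,[f(B_1)] \;=\; 1 .
\]
By the Fact (and the chain-rule factor $t \le 1$),
\[
  \nabla^2 \log g(x) \;=\; t\,\nabla^2 \log P_{1-t}f(\sqrt{t}\,x)
  \;\succeq\; -\tfrac{t}{1-t}\, I_n \;\succeq\; -\tfrac{1}{1-t}\, I_n ,
\]
so the theorem applies with $\beta = 1/(1-t)$; tracking the
$\beta$-dependence of $C_\beta$ produces the $\sqrt{(1-t)\log\alpha}$
factor in the corollary.
```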
some difficulties
Corollary: If f : ℝ^n → ℝ₊ satisfies E f = 1 then for any
t < 1 and all α ≥ 2,

    ℙ[P_{1−t} f(B_t) ≥ α] ≤ (1/α) · ( C(t) log log α / √(log α) )^4

What are the difficult functions f?

[figure: half space (Good: noise insensitive; Bad: boundary far from
origin); "dust"]
proof ideas
E f = 1,   ∇² log f(x) ⪰ −β·I_n

M_t = P_{1−t} f(B_t) is a (Doob) martingale

    M₀ = E f = 1
    M₁ = f(B₁)

Goal: ℙ[M₁ > α] ≪ 1/α

arguing about small-probability events = annoying
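A concrete instance of the martingale M_t (our choice, one dimension): for f(x) = exp(λx − λ²/2) one has E_γ f = 1 and the closed form P_{1−t} f(x) = exp(λx − λ²t/2), so M_t = exp(λB_t − λ²t/2) is the classical exponential martingale. A simulated sanity check:

```python
import numpy as np

# Illustrative 1-d example: f(x) = exp(lam*x - lam^2/2), so that
# M_t = P_{1-t} f(B_t) = exp(lam*B_t - lam^2*t/2) with M_0 = 1, M_1 = f(B_1).
rng = np.random.default_rng(0)
lam = 0.5
steps, N = 200, 20_000
dt = 1.0 / steps
t_grid = np.linspace(dt, 1.0, steps)

# simulate N Brownian paths on [0, 1]
B = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(N, steps)), axis=1)
M = np.exp(lam * B - lam ** 2 * t_grid / 2)  # M_t along each path
means = M.mean(axis=0)                       # E[M_t] should stay near M_0 = 1
```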
random measure conditioning

W_t = B_t conditioned on B₁ ∼ f dγ_n

Goal: ℙ[f(B₁) ∈ [α, 2α]] ≪ 1/α

Suffices to prove that:

    ℙ[f(W₁) ∈ [α, 2α]] = o(1)

because

    ℙ[f(W₁) ∈ [α, 2α]] ∼ α · ℙ[f(B₁) ∈ [α, 2α]]
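The equivalence of the two goals is a one-line change of measure (our expansion): W₁ has density f with respect to γ_n, so

```latex
\[
  \mathbb{P}\big[f(W_1)\in[\alpha,2\alpha]\big]
  \;=\; \mathbb{E}\big[f(B_1)\,\mathbf{1}_{\{f(B_1)\in[\alpha,2\alpha]\}}\big]
  \;\in\; [\alpha,\,2\alpha]\cdot
          \mathbb{P}\big[f(B_1)\in[\alpha,2\alpha]\big],
\]
```

which is the asserted comparison up to a factor of 2.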
W_t as an Itô process: Föllmer's drift

Consider a process X_t with X₀ = 0, and

    dX_t = dB_t + v_t dt

where v_t is predictable (deterministic function of {B_s : s ∈ [0, t]})

Integrating: X_t = B_t + ∫₀ᵗ v_s ds = Brownian motion + drift

Among all such drifts satisfying X₁ = B₁ + ∫₀¹ v_t dt ∼ f dγ_n,
let v_t be the one which minimizes

    E ∫₀¹ ‖v_t‖² dt
W_t as an Itô process: Föllmer's drift

Among all such drifts satisfying W₁ = B₁ + ∫₀¹ v_t dt ∼ f dγ_n,
let v_t be the one which minimizes E ∫₀¹ ‖v_t‖² dt.

Lemma: v_t is a martingale.

Explicit form:

    v_t = ∇ log P_{1−t} f(W_t) = ∇P_{1−t} f(W_t) / P_{1−t} f(W_t)

Theorem [Lehec 2010]:

    Ent(f) = (1/2) E ∫₀¹ ‖v_t‖² dt
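An explicit example of the Föllmer drift (our choice, one dimension, f(x) = exp(λx − λ²/2)): then P_{1−t} f(x) = exp(λx − λ²t/2), so v_t = ∇ log P_{1−t} f(W_t) = λ is constant, W₁ = B₁ + λ ∼ N(λ, 1), and Lehec's identity reads Ent(f) = ½ ∫₀¹ λ² dt = λ²/2. A numerical check of both sides:

```python
import numpy as np

# Foellmer drift for f(x) = exp(lam*x - lam^2/2): v_t = lam (constant),
# so the minimal drift energy is (1/2) * int_0^1 lam^2 dt = lam^2 / 2.
lam = 0.7
drift_energy = 0.5 * lam ** 2

# Check Ent(f) = E_gamma[f log f] by Gauss-Hermite quadrature.
nodes, weights = np.polynomial.hermite.hermgauss(80)
z = np.sqrt(2.0) * nodes               # standard Gaussian nodes
fz = np.exp(lam * z - lam ** 2 / 2)    # f at the nodes
ent_f = float(np.sum(weights * fz * (lam * z - lam ** 2 / 2)) / np.sqrt(np.pi))
```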
an optimal coupling
{B_t} n-dim Brownian motion, f : ℝ^n → ℝ₊ with E f = 1

Construct W_t so that W₁ ∼ f dγ_n:

    W₀ = 0
    dW_t = dB_t + v_t dt

    v_t = ∇P_{1−t} f(W_t) / P_{1−t} f(W_t)   is a martingale
an optimal coupling

Multi-granular geometry of f reflected in {v_t}:
For all t ∈ [0,1],

    E‖v_t‖² = ∫ ‖∇P_{1−t} f‖² / P_{1−t} f dμ_t

(μ_t the law of B_t)
proof sketch
Suffices to prove that:

    ℙ[f(W₁) ∈ [α, 2α]] = o(1)

Idea: Suppose that ℙ[f(W₁) ∈ [α, 2α]] ≥ c, then

    ℙ[f(W₁) ∈ [2α, 4α]] ≥ c
    ℙ[f(W₁) ∈ [4α, 8α]] ≥ c
    ⋯   (log α levels)

Making f bigger: We'll use ∇² log f(x) ⪰ −β·I_n:

    log f(W₁ + u) ≥ log f(W₁) + ⟨u, ∇ log f(W₁)⟩ − (β/2)‖u‖²
                  = log f(W₁) + ⟨u, v₁⟩ − (β/2)‖u‖²
proof sketch
Pushing W₁ in the direction of the drift at time t = 1:

    f(W₁ + u) ≥ f(W₁) · exp( ⟨u, v₁⟩ − (β/2)‖u‖² )

Setting u = δv₁ (δ small) multiplies the value of f.

(want to say that W₁ could do it)
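Plugging u = δv₁ into the bound above makes the gain explicit (our expansion):

```latex
\[
  f(W_1 + \delta v_1)
  \;\ge\; f(W_1)\,\exp\!\Big(\delta\|v_1\|^2
          - \tfrac{\beta\delta^2}{2}\|v_1\|^2\Big)
  \;=\; f(W_1)\, e^{\delta\,(1-\beta\delta/2)\,\|v_1\|^2},
\]
```

so for δ < 2/β the push strictly increases f wherever v₁ ≠ 0.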
Girsanov's theorem

Consider any Itô process dX_t = dB_t + v_t dt
(*under suitable conditions)

Let μ be the path measure of B_t and ν be the path measure of X_t.
Then under the change of measure

    dμ/dν = exp( −∫₀¹ ⟨v_t, dB_t⟩ − ½ ∫₀¹ ‖v_t‖² dt )

{X_t : t ∈ [0,1]} has the law of Brownian motion.
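A minimal sanity check of the change of measure (our example: constant drift v, so X₁ = B₁ + v and the density at time 1 is exp(−vB₁ − v²/2)). By quadrature, reweighting X₁ by this density recovers Brownian expectations:

```python
import numpy as np

# Girsanov check at time 1 with constant drift v: for any test function g,
#   E_mu[g(B_1)] = E_nu[g(X_1) * exp(-v*B_1 - v^2/2)],  X_1 = B_1 + v.
nodes, weights = np.polynomial.hermite.hermgauss(80)
z = np.sqrt(2.0) * nodes            # B_1 ~ N(0,1) quadrature nodes
w = weights / np.sqrt(np.pi)

v = 0.8
X1 = z + v                          # X_1 under nu (driving BM increment = z)
density = np.exp(-v * z - v ** 2 / 2)   # dmu/dnu along the path

checks = []
for g in (lambda x: x, lambda x: x ** 2, np.cos):
    lhs = float(np.sum(w * g(z)))              # E_mu[g(B_1)]
    rhs = float(np.sum(w * g(X1) * density))   # E_nu[g(X_1) dmu/dnu]
    checks.append((lhs, rhs))
```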
the masochistic part
dW_t = dB_t + v_t dt,   v_t = ∇ log P_{1−t} f(W_t)

dW_t^δ = dB_t + (1 + δ) v_t dt

Now we can argue that W_t ≈ W_t^δ (Girsanov's theorem).

What about f(W₁^δ) ≫ f(W₁)?

Note that W₁^δ = W₁ + δ ∫₀¹ v_t dt, and recall

    f(W₁ + u) ≥ f(W₁) · exp( ⟨u, v₁⟩ − O(1) )

Big question:
Does ∫₀¹ v_t dt point in the direction of the gradient v₁?
the calculations and concentration
dW_t = dB_t + v_t dt,   v_t = ∇ log P_{1−t} f(W_t)
dW_t^δ = dB_t + (1 + δ) v_t dt

Let ν be the law of W₁ and ν^δ the law of W₁^δ.

Girsanov:

    dν^δ/dν = exp( δ ∫₀¹ ⟨v_t, dW_t⟩ − (δ + δ²/2) ∫₀¹ ‖v_t‖² dt )

Gradient estimate:

    f(W₁^δ) ≥ f(W₁) · exp( δ ⟨∫₀¹ v_t dt, v₁⟩ − O(δ²) ∫₀¹ ‖v_t‖² dt )
the calculations and concentration
Since v_t is a martingale, E[v₁ | ℱ_t] = v_t, so

    E[ δ ⟨∫₀¹ v_t dt, v₁⟩ ] = δ E ∫₀¹ ‖v_t‖² dt

Girsanov:

    dν^δ/dν = exp( δ ∫₀¹ ⟨v_t, dW_t⟩ − (δ + δ²/2) ∫₀¹ ‖v_t‖² dt )

Gradient estimate:

    f(W₁^δ) ≥ f(W₁) · exp( δ ⟨∫₀¹ v_t dt, v₁⟩ − O(δ²) ∫₀¹ ‖v_t‖² dt )
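The first-moment computation in detail (our expansion of the martingale step):

```latex
\[
  \mathbb{E}\Big\langle \int_0^1 v_t\,dt,\; v_1 \Big\rangle
  = \int_0^1 \mathbb{E}\,\langle v_t, v_1\rangle\, dt
  = \int_0^1 \mathbb{E}\,\big\langle v_t,\,
      \mathbb{E}[v_1 \mid \mathcal{F}_t]\big\rangle\, dt
  = \int_0^1 \mathbb{E}\,\|v_t\|^2\, dt .
\]
```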
the calculations and concentration
For δ = j / log α,  j = 1, 2, …, log α:

If ℙ[f(W₁) ∈ [α, 2α]] ≥ c, then

    ℙ[f(W₁) ∈ [2α, 4α]] ≥ c/10
    ℙ[f(W₁) ∈ [4α, 8α]] ≥ c/10  …

Girsanov:

    dν^δ/dν = exp( δ ∫₀¹ ⟨v_t, dW_t⟩ − (δ + δ²/2) ∫₀¹ ‖v_t‖² dt )

Gradient estimate:

    f(W₁^δ) ≥ f(W₁) · exp( δ ⟨∫₀¹ v_t dt, v₁⟩ − O(δ²) ∫₀¹ ‖v_t‖² dt )
conclusion
Process for {−1,1}^n analogous to W_t: Sample coordinates
one by one to have the right marginals conditioned on the past.

Can prove log-Sobolev and Talagrand's entropy-transport inequality
in a few lines based on this. There is an information theory
interpretation: Chain rule ↔ Martingale property.

These proofs use first derivatives of f, while our proof of the
convolution conjecture in Gaussian space uses second derivatives.

Prove for the discrete case? There is a difficulty because the drift
requires additional randomness.

Riemannian setting: Bakry–Émery curvature, gradient flow vs. optimal
transport
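The coordinate-by-coordinate process above rests on the chain rule: the sequential conditional probabilities multiply back to the target mass. A small sanity check on an arbitrary positive density of our choosing:

```python
import itertools

n = 3
cube = list(itertools.product([-1, 1], repeat=n))

# Target measure: a positive density on the cube (our illustrative choice).
raw = {x: 2.0 + x[0] + 0.5 * x[1] * x[2] for x in cube}
total = sum(raw.values())
target = {x: raw[x] / total for x in cube}

def cond_prob(prefix, xi):
    """P[X_i = xi | X_1..X_{i-1} = prefix] under the target measure."""
    num = sum(p for x, p in target.items()
              if x[:len(prefix)] == prefix and x[len(prefix)] == xi)
    den = sum(p for x, p in target.items() if x[:len(prefix)] == prefix)
    return num / den

# Chain rule: sampling coordinates one by one with the right conditional
# marginals reproduces exactly the target probability of every point.
recovered = {}
for x in cube:
    p = 1.0
    for i in range(n):
        p *= cond_prob(x[:i], x[i])
    recovered[x] = p
```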
Kumar-Courtade conjecture
X ∈ {0,1}^n uniformly at random,  Y ∼_ε X

Conjecture: I(b(X) ; Y) is maximized among functions
b : {0,1}^n → {0,1} by b(x) = x₁
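The conjecture can be checked by brute force in small dimension. Below (our illustration) Y is X with each bit flipped independently with probability ε; the dictator achieves the BSC value 1 − h(ε), and for n = 3 it beats majority:

```python
import itertools
import math

n, eps = 3, 0.1
cube = list(itertools.product([0, 1], repeat=n))

def mutual_info(b):
    """I(b(X); Y) in bits, X uniform on {0,1}^n, Y a bitwise eps-flip of X."""
    joint = {}
    for x in cube:
        for y in cube:
            p = 1 / 2 ** n
            for xi, yi in zip(x, y):
                p *= (1 - eps) if xi == yi else eps
            key = (b(x), y)
            joint[key] = joint.get(key, 0.0) + p
    pb, py = {}, {}
    for (bv, y), p in joint.items():
        pb[bv] = pb.get(bv, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (pb[bv] * py[y]))
               for (bv, y), p in joint.items() if p > 0)

def h2(p):
    # binary entropy in bits
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

I_dict = mutual_info(lambda x: x[0])              # dictator b(x) = x_1
I_maj = mutual_info(lambda x: int(sum(x) >= 2))   # majority of 3
```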