Hypercontractive inequality


Regularization under diffusion
and Talagrand’s convolution conjecture
James R. Lee
University of Washington
Joint work with Ronen Eldan (Weizmann)
noise, heat, and smoothness
$f : \{-1,1\}^n \to \mathbb{R}$

The discrete "heat flow" operator $T_\epsilon : L_2(\{-1,1\}^n) \to L_2(\{-1,1\}^n)$ is defined for $\epsilon \in [0,1]$ by

$(T_\epsilon f)(x_1, x_2, \ldots, x_n) = \mathbb{E}\, f(x_1^\epsilon, x_2^\epsilon, \ldots, x_n^\epsilon)$

where $x_i^\epsilon = x_i$ with probability $1-\epsilon$ and $x_i^\epsilon = \pm 1$ with probability $\epsilon/2$ each (independently for each $i$).

$T_\epsilon$ is diagonalized in the Fourier basis and dampens high-degree Fourier coefficients:

$T_\epsilon \chi_S = (1-\epsilon)^{|S|} \chi_S$
noise, heat, and smoothness
General principle: $f$ smooth $\Rightarrow$ $T_\epsilon f$ smoother.

Many applications: PCPs & hardness of approximation, statistical physics, threshold phenomena, social choice, circuit complexity, information theory.
noise, heat, and smoothness
Hypercontractive inequality [Bonami, Gross, Nelson]:

For every $\epsilon > 0$:

$T_\epsilon : L_p(\{-1,1\}^n) \to L_q(\{-1,1\}^n)$

is a contraction for some $q > p > 1$.

Here $\|f\|_p = (\mathbb{E}|f|^p)^{1/p}$, so $\|f\|_p$ small for $p > 1$ $\Rightarrow$ $\|T_\epsilon f\|_q$ small for $q > p$.

If $f \approx$ indicator of a subset $S \subseteq \{-1,1\}^n$, then this encodes the isoperimetric profile / "small-set expansion."
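For intuition, the sharp exponents here are classical (Bonami-Beckner): with correlation $\rho = 1-\epsilon$, the operator is a contraction from $L_p$ to $L_q$ exactly when $q \le 1 + (p-1)/(1-\epsilon)^2$. The following sketch (ours, purely illustrative) checks the contraction numerically on random functions over a small cube.

```python
import itertools
import numpy as np

n, eps, p = 3, 0.4, 2.0
q = 1 + (p - 1) / (1 - eps) ** 2        # sharp hypercontractive exponent

pts = list(itertools.product([-1, 1], repeat=n))
# Transition matrix of the eps-resampling noise: K[x][y] = P(x^eps = y).
K = np.array([[np.prod([(1 - eps / 2) if xi == yi else eps / 2
                        for xi, yi in zip(x, y)]) for y in pts] for x in pts])

def norm(vals, r):
    """||f||_r = (E|f|^r)^{1/r} under the uniform measure."""
    return np.mean(np.abs(vals) ** r) ** (1 / r)

rng = np.random.default_rng(1)
for _ in range(100):
    f = rng.standard_normal(len(pts))
    assert norm(K @ f, q) <= norm(f, p) + 1e-12   # ||T_eps f||_q <= ||f||_p
print("hypercontractivity held on 100 random functions")
```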
noise, heat, and smoothness
Relative entropy: For $f : \{-1,1\}^n \to \mathbb{R}_+$ with $\mathbb{E} f = 1$,

$\mathrm{Ent}(f) = \mathbb{E}[f \log f] = \mathbb{E}_{Z \sim f}[\log f(Z)]$

Gradient: $\mathbb{E}\|\nabla f\|^2 = \sum_{i=1}^n \mathbb{E}\big[\big(f(x \oplus e_i) - f(x)\big)^2\big]$

Log-Sobolev inequality [Gross]: For every such $f : \{-1,1\}^n \to \mathbb{R}_+$,

$\mathrm{Ent}(f) \le \mathbb{E}\|\nabla \sqrt{f}\|^2 \approx -\frac{d}{d\epsilon}\,\mathrm{Ent}(T_\epsilon f)\Big|_{\epsilon=0}$
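A quick numerical sanity check (our own sketch, not from the talk; with the gradient convention displayed above, Gross's sharp form even carries an extra factor $1/2$, so the inequality as stated has slack):

```python
import itertools
import numpy as np

n = 4
pts = list(itertools.product([-1, 1], repeat=n))
index = {x: k for k, x in enumerate(pts)}
# flip[i][k] = index of pts[k] with coordinate i negated
flip = [[index[x[:i] + (-x[i],) + x[i + 1:]] for x in pts] for i in range(n)]

def ent(f):
    """Ent(f) = E[f log f] for f > 0 with E f = 1 (uniform measure)."""
    return np.mean(f * np.log(f))

def grad_sq(g):
    """E||grad g||^2 = sum_i E[(g(x xor e_i) - g(x))^2]."""
    return sum(np.mean((g[fl] - g) ** 2) for fl in flip)

rng = np.random.default_rng(2)
for _ in range(100):
    f = rng.random(len(pts)) + 1e-3
    f /= f.mean()                        # normalize so that E f = 1
    assert ent(f) <= grad_sq(np.sqrt(f)) + 1e-12
print("Ent(f) <= E||grad sqrt(f)||^2 held on 100 random functions")
```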
noise, heat, and smoothness
Hypercontractive inequality:
For $\epsilon > 0$ and $p > 1$, there is a $q > p$ such that

$\|T_\epsilon f\|_q \le \|f\|_p$ for all $f : \{-1,1\}^n \to \mathbb{R}$.

Log-Sobolev inequality: $\mathrm{Ent}(f) \le \mathbb{E}\|\nabla\sqrt{f}\|^2$

Talagrand (1989):
What about regularization if we only know $\mathbb{E} f = 1$?
the convolution conjecture
$f : \{-1,1\}^n \to \mathbb{R}_+$ and $\mathbb{E} f = 1$

Markov's inequality: $\mathbb{P}[f \ge \alpha] \le \frac{1}{\alpha}$

(tight for $f$ = scaled indicator on a set of measure $1/\alpha$)

Convolution conjecture [Talagrand 1989, $1,000]:
For every $\epsilon > 0$, there exists $\phi : \mathbb{R}_+ \to \mathbb{R}_+$ so that for every $f$,

$\mathbb{P}[T_\epsilon f \ge \alpha] \le \frac{\phi(\alpha)}{\alpha}$ and $\phi(\alpha) \to 0$ as $\alpha \to \infty$

- Best function is probably $\phi(\alpha) \sim \frac{1}{\sqrt{\log \alpha}}$ (achieved for halfspaces)
- Conjecture open for any fixed $\epsilon > 0$, even for indicators of sets: $f = \mathbf{1}_S$, $S \subseteq \{-1,1\}^n$
anti-concentration of temperature
Convolution conjecture [Talagrand 1989]:
For every $\epsilon > 0$, there exists $\phi : \mathbb{R}_+ \to \mathbb{R}_+$ so that for every $f$ with $\mathbb{E} f = 1$,

$\mathbb{P}[T_\epsilon f \ge \alpha] \le \frac{\phi(\alpha)}{\alpha}$ and $\phi(\alpha) \to 0$ as $\alpha \to \infty$

Equivalent to the conjecture that

$\mathbb{E}\big[T_\epsilon f \cdot \mathbf{1}_{\{T_\epsilon f \in [\alpha, 2\alpha]\}}\big] \le \phi(\alpha)$

"Temperature" cannot concentrate at a single (high) level.
the Gaussian limiting case
$f : \mathbb{R}^n \to \mathbb{R}_+$ and $\mathbb{E} f = \int f \, d\gamma_n = 1$
($\gamma_n$ is the standard $n$-dimensional Gaussian)

Let $B_t$ be an $n$-dimensional Brownian motion with $B_0 = 0$.

Brownian semi-group: $P_t f(x) = \mathbb{E}[f(x + B_t)]$

For $f = \mathbf{1}_S$: $P_t \mathbf{1}_S(x) = \gamma_{x,t}(S)$
the Gaussian limiting case
$f : \mathbb{R}^n \to \mathbb{R}_+$ and $\mathbb{E} f = \int f \, d\gamma_n = 1$

Brownian semi-group: $P_t f(x) = \mathbb{E}[f(x + B_t)]$

Gaussian convolution conjecture: For every $t < 1$, there is a $\phi : \mathbb{R}_+ \to \mathbb{R}_+$ so that for every such $f$,

$\mathbb{P}\big[P_{1-t} f(B_t) \ge \alpha\big] \le \frac{\phi(\alpha)}{\alpha}$ and $\phi(\alpha) \to 0$ as $\alpha \to \infty$

- Special case of the discrete cube conjecture
- Previously unknown for any $t > 0$
- $n = 1$ is an exercise [*, O'Donnell 2014]
- True in any fixed dimension [Ball, Barthe, Bednorz, Oleszkiewicz, and Wolff 2010]
the Gaussian limiting case
$f : \mathbb{R}^n \to \mathbb{R}_+$ and $\mathbb{E} f = \int f \, d\gamma_n = 1$

Brownian semi-group: $P_t f(x) = \mathbb{E}[f(x + B_t)]$

Gaussian convolution conjecture: For every $t > 0$, there is a $\phi : \mathbb{R}_+ \to \mathbb{R}_+$ so that for every such $f$,

$\mathbb{P}[U_t f \ge \alpha] \le \frac{\phi(\alpha)}{\alpha}$ and $\phi(\alpha) \to 0$ as $\alpha \to \infty$

Ornstein-Uhlenbeck semi-group:

$U_t f(x) = \mathbb{E}_{Z \sim \gamma_n}\big[f\big(e^{-t} x + \sqrt{1 - e^{-2t}}\, Z\big)\big]$
the Gaussian limiting case
Theorem [Eldan-L 2014]:
If $f : \mathbb{R}^n \to \mathbb{R}_+$ satisfies $\mathbb{E} f = 1$ and

$\nabla^2 \log f(x) \succeq -\beta \, \mathrm{Id}$ for all $x \in \mathbb{R}^n$

for some $\beta \ge 1$, then for all $\alpha \ge e^3$:

$\mathbb{P}[f \ge \alpha] \le \frac{1}{\alpha} \cdot \frac{C\beta \, (\log\log\alpha)^4}{\sqrt{\log \alpha}}$
semi-log convexity
Theorem [Eldan-L 2014]:
If $f : \mathbb{R}^n \to \mathbb{R}_+$ satisfies $\mathbb{E} f = 1$ and

$\nabla^2 \log f(x) \succeq -\beta \, \mathrm{Id}$ for all $x \in \mathbb{R}^n$

for some $\beta \ge 1$, then for all $\alpha \ge e^3$:

$\mathbb{P}[f \ge \alpha] \le \frac{1}{\alpha} \cdot \frac{C\beta \, (\log\log\alpha)^4}{\sqrt{\log \alpha}}$

Fact: For any $f$ with $\mathbb{E} f = 1$,

$\nabla^2 \log P_{1-t} f(x) \succeq -\frac{1}{1-t} \, \mathrm{Id}$

Corollary: If $f : \mathbb{R}^n \to \mathbb{R}_+$ satisfies $\mathbb{E} f = 1$, then for any $t < 1$ and all $\alpha \ge e^3$,

$\mathbb{P}\big[P_{1-t} f(B_t) \ge \alpha\big] \le \frac{1}{\alpha} \cdot \frac{C \, (\log\log\alpha)^4}{(1-t)\sqrt{\log \alpha}}$
some difficulties
Corollary: If $f : \mathbb{R}^n \to \mathbb{R}_+$ satisfies $\mathbb{E} f = 1$, then for any $t < 1$ and all $\alpha \ge 2$,

$\mathbb{P}\big[P_{1-t} f(B_t) \ge \alpha\big] \ll \frac{1}{\alpha}$

What are the difficult functions $f$?

Good: noise insensitive (e.g., a half space).
Bad: boundary far from the origin, so $B_t$ has to pass the crest (e.g., dust).
proof sketch
$\mathbb{E} f = 1$, $\quad \nabla^2 \log f(x) \succeq -\beta \, \mathrm{Id}$

$M_t = P_{1-t} f(B_t)$ is a (Doob) martingale:

$M_0 = \mathbb{E}_{\gamma_n} f = 1$
$M_1 = f(B_1)$

Goal: $\mathbb{P}[M_1 > \alpha] \ll \frac{1}{\alpha}$

Arguing about small-probability events isn't so easy...
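For a function where the heat semigroup is explicit, the martingale is easy to probe numerically. A sketch (ours; this particular $f$ is our own test case): with $f(x) = \cosh(\langle a, x\rangle)\, e^{-\|a\|^2/2}$, so $\mathbb{E}_{\gamma_n} f = 1$, one has $P_{1-t} f(x) = \cosh(\langle a, x\rangle)\, e^{-t\|a\|^2/2}$ in closed form, and $\mathbb{E}[M_t] = 1$ at every $t$.

```python
import numpy as np

rng = np.random.default_rng(7)
a = np.array([1.0, 0.5])
m = 400_000

# f(x) = cosh(<a,x>) e^{-|a|^2/2} has E_gamma f = 1, and the heat semigroup
# is explicit: P_{1-t} f (x) = cosh(<a,x>) e^{-t|a|^2/2}.
for t in [0.0, 0.3, 0.7, 1.0]:
    B_t = np.sqrt(t) * rng.standard_normal((m, 2))
    M_t = np.cosh(B_t @ a) * np.exp(-t * (a @ a) / 2)
    print(f"t={t}:  E[M_t] = {M_t.mean():.4f}")  # ~1 at every t, as for a martingale
```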
conditioning
π‘Šπ‘‘ = 𝐡𝑑 conditioned on 𝐡1 ∼ 𝑓𝑑𝛾𝑛
[Doob transform]
Goal:
β„™ 𝑓 𝐡1 ∈ 𝛼, 2𝛼
β‰ͺ 1/𝛼
Suffices to prove that:
β„™ 𝑓 π‘Š1 ∈ 𝛼, 2𝛼
= π‘œ(1)
because
β„™ 𝑓 π‘Š1 ∈ 𝛼, 2𝛼
∼ 𝛼 β„™ 𝑓 𝐡1 ∈ 𝛼, 2𝛼
π‘Šπ‘‘ is an Itô process
Consider a process π‘Šπ‘‘ with π‘Šπ‘‘ = 0, and
π‘‘π‘Šπ‘‘ = 𝑑𝐡𝑑 + 𝑣𝑑 𝑑𝑑
where {𝑣𝑑 } is predictable
(deterministic function of 𝑑, {𝐡𝑠 : 𝑠 ∈ [0, 𝑑]})
𝑑
Integrating: π‘Šπ‘‘ = 𝐡𝑑 + ∫0 𝑣𝑠 𝑑𝑠 = Brownian motion + drift
Among all such drifts satisfying
π‘Š1 =
1
𝐡1 + ∫0 𝑣𝑑
𝑑𝑑 ∼ 𝑓 𝑑𝛾𝑛 ,
let 𝑣𝑑 be the one which minimizes
1
𝔼
0
𝑣𝑑
2
𝑑𝑑
Föllmer’s drift
Among all such drifts satisfying $W_1 = B_1 + \int_0^1 v_t \, dt \sim f \, d\gamma_n$,
let $v_t$ be the one which minimizes $\mathbb{E} \int_0^1 \|v_t\|^2 \, dt$.

Lemma: $v_t$ is a martingale.

Explicit form: $v_t = \nabla \log P_{1-t} f(W_t) = \dfrac{\nabla P_{1-t} f(W_t)}{P_{1-t} f(W_t)}$

Theorem [Lehec 2010]: $\mathrm{Ent}_{\gamma_n}(f) = \frac{1}{2} \, \mathbb{E} \int_0^1 \|v_t\|^2 \, dt$
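Here is a simulation sketch (ours, not from the talk) of this drift for a case where everything is explicit: for the two-point Gaussian mixture $f = \frac{1}{2} e^{\langle a,x\rangle - \|a\|^2/2} + \frac{1}{2} e^{-\langle a,x\rangle - \|a\|^2/2}$ (the density of $\frac{1}{2}N(a, I) + \frac{1}{2}N(-a, I)$ against $\gamma_n$), the drift is $v_t = a \tanh(\langle a, W_t\rangle)$, independent of $t$. Euler-Maruyama then lets us check both that $W_1 \sim f \, d\gamma_n$ and Lehec's identity.

```python
import numpy as np

rng = np.random.default_rng(4)
a = np.array([1.0, 0.5])

def drift(w):
    """v_t = grad log P_{1-t} f (w); for this f it is a*tanh(<a,w>), t-free."""
    return np.tanh(w @ a)[:, None] * a

m, steps = 50_000, 500
dt = 1.0 / steps
W = np.zeros((m, 2))
energy = np.zeros(m)                     # pathwise (1/2) int_0^1 |v_t|^2 dt
for _ in range(steps):
    v = drift(W)
    energy += 0.5 * (v ** 2).sum(axis=1) * dt
    W += np.sqrt(dt) * rng.standard_normal((m, 2)) + v * dt

# Check 1: W_1 ~ (1/2)N(a,I) + (1/2)N(-a,I), so E<a,W_1>^2 = |a|^4 + |a|^2.
print("E<a,W_1>^2:", ((W @ a) ** 2).mean(), " exact:", (a @ a) ** 2 + a @ a)

# Check 2 (Lehec): Ent_gamma(f) = E[(1/2) int_0^1 |v_t|^2 dt].
sign = np.where(rng.random(m) < 0.5, 1.0, -1.0)
X = sign[:, None] * a + rng.standard_normal((m, 2))  # direct samples from f dgamma
ent = (np.log(np.cosh(X @ a)) - a @ a / 2).mean()    # E_{X ~ f dgamma} log f(X)
print("Ent(f):", ent, "  (1/2) E int |v|^2 dt:", energy.mean())
```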
an energy/entropy optimal coupling
$\{B_t\}$ an $n$-dim Brownian motion, $f : \mathbb{R}^n \to \mathbb{R}_+$ with $\mathbb{E} f = 1$.

Construct $W_t$ so that $W_1 \sim f \, d\gamma_n$:

$W_0 = 0$
$dW_t = dB_t + v_t \, dt$

$v_t = \dfrac{\nabla P_{1-t} f(W_t)}{P_{1-t} f(W_t)}$ is a martingale
proof sketch
Suffices to prove that: $\mathbb{P}\big[f(W_1) \in [\alpha, 2\alpha]\big] = o(1)$

Idea: Suppose that $\mathbb{P}\big[f(W_1) \in [\alpha, 2\alpha]\big] \ge p$; then

$\mathbb{P}\big[f(W_1) \in [2\alpha, 4\alpha]\big] \ge p$
$\mathbb{P}\big[f(W_1) \in [4\alpha, 8\alpha]\big] \ge p$
$\cdots$

across $\log \alpha$ levels. Since these events are disjoint, this forces $p \lesssim 1/\log \alpha = o(1)$.

Making $f$ bigger: We'll use $\nabla^2 \log f(x) \succeq -\beta \, \mathrm{Id}$:

$\log f(W_1 + u) \ge \log f(W_1) + \langle u, \nabla \log f(W_1) \rangle - \beta \|u\|^2 = \log f(W_1) + \langle u, v_1 \rangle - \beta \|u\|^2$
proof sketch
Pushing π‘Š1 in the direction of the drift at time 𝑑 = 1:
𝑓 π‘Š1 + 𝑒 β‰₯ 𝑓 π‘Š1 exp 𝑒, 𝑣1 βˆ’ 𝛽 𝑒 2
Setting 𝑒 = 𝛿𝑣1 (𝛿 small) multiplies the value of 𝑓.
want to say that π‘Š1 could do it
(without our help)
Girsanov’s theorem
Consider an Itô process $dX_t = dB_t + v_t \, dt$.

Let $P$ be the underlying Wiener measure of $\{B_t\}$. Then* under the change of measure

$\frac{dQ}{dP} = \exp\Big(-\int_0^1 \langle v_t, dB_t \rangle - \frac{1}{2} \int_0^1 \|v_t\|^2 \, dt\Big),$

the process $\{X_t : t \in [0,1]\}$ has the law of Brownian motion.

(* under suitable conditions)
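A Monte Carlo illustration (our own sketch; the drift $\sin(X_t)$ is an arbitrary bounded toy choice): the exponential density has expectation $1$ under $P$, and reweighting by it makes $X$ look like Brownian motion.

```python
import numpy as np

rng = np.random.default_rng(5)
m, steps = 200_000, 200
dt = 1.0 / steps

X = np.zeros(m)
log_dQ_dP = np.zeros(m)
for _ in range(steps):
    v = np.sin(X)                        # a bounded, predictable drift (toy choice)
    dB = np.sqrt(dt) * rng.standard_normal(m)
    log_dQ_dP += -v * dB - 0.5 * v ** 2 * dt
    X += dB + v * dt

w = np.exp(log_dQ_dP)
print("E_P[dQ/dP] =", w.mean())            # ~ 1: Q is a probability measure
print("E_Q[X_1]   =", (w * X).mean())      # ~ 0, like Brownian motion
print("E_Q[X_1^2] =", (w * X ** 2).mean()) # ~ 1, like Brownian motion
```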
the greedy perturbation
$v_t = \nabla \log P_{1-t} f(W_t)$
$dW_t = dB_t + v_t \, dt$
$dX_t^\delta = dB_t + (1 + \delta)\, v_t \, dt$

Now we can argue that $W_t \approx X_t^\delta$ (Girsanov's theorem).

What about $f(X_1^\delta) \gg f(W_1)$?

Note that $X_1^\delta = W_1 + \delta \int_0^1 v_t \, dt$, and recall

$f(W_1 + u) \ge f(W_1) \exp\big(\langle u, v_1 \rangle - o(1)\big)$

Big question:
Does $\int_0^1 v_t \, dt$ point in the direction of the gradient $v_1$?
a balancing act
$v_t = \nabla \log P_{1-t} f(W_t)$
$dW_t = dB_t + v_t \, dt$
$dX_t^\delta = dB_t + (1 + \delta)\, v_t \, dt$

Let $Q$ be the law of $\{W_t\}$ and $Q^\delta$ the law of $\{X_t^\delta\}$.

Girsanov:

$\frac{dQ^\delta}{dQ} = \exp\Big(\delta \int_0^1 \langle v_t, dB_t \rangle - \Big(\delta + \frac{\delta^2}{2}\Big) \int_0^1 \|v_t\|^2 \, dt\Big)$

Gradient estimate:

$f(X_1^\delta) \ge f(W_1) \exp\Big(\delta \Big\langle \int_0^1 v_t \, dt, \, v_1 \Big\rangle - \beta\delta^2 \int_0^1 \|v_t\|^2 \, dt\Big)$
a balancing act
Since $v_t$ is a martingale, $\mathbb{E}[v_1 \mid \mathcal{F}_t] = v_t$, so

$\mathbb{E}\Big[\delta \Big\langle \int_0^1 v_t \, dt, \, v_1 \Big\rangle\Big] = \delta \, \mathbb{E} \int_0^1 \|v_t\|^2 \, dt$

[The technically difficult part here is concentration: getting these to happen at the same time.]

Girsanov:

$\frac{dQ^\delta}{dQ} = \exp\Big(\delta \int_0^1 \langle v_t, dB_t \rangle - \Big(\delta + \frac{\delta^2}{2}\Big) \int_0^1 \|v_t\|^2 \, dt\Big)$

Gradient estimate:

$f(X_1^\delta) \ge f(W_1) \exp\Big(\delta \Big\langle \int_0^1 v_t \, dt, \, v_1 \Big\rangle - \beta\delta^2 \int_0^1 \|v_t\|^2 \, dt\Big)$
a balancing act
For 𝛿 = π‘˜/ log 𝛼, π‘˜ = 1,2, … , log 𝛼:
If β„™ 𝑓 π‘Š1 ∈ 𝛼, 2𝛼
β‰₯ 𝑝, then
β„™ 𝑓 π‘Š1 ∈ 2𝛼, 4𝛼
β„™ 𝑓 π‘Š1 ∈ 4𝛼, 8𝛼
…
Girsanov:
𝑑𝑄𝛿
= exp 𝛿
𝑑𝑄
1
0
β‰₯ 𝑝/10
β‰₯ 𝑝/10
𝛿2
𝑣𝑑 , 𝑑𝐡𝑑 βˆ’ 𝛿 +
2
Gradient estimate:
𝑓 𝑋1𝛿 β‰₯ 𝑓 π‘Š1 exp
1
𝛿 ∫0 𝑣𝑑 𝑑𝑑, 𝑣1
1
0
𝑣𝑑
2 𝑑𝑑
conclusion
Process for $\{-1,1\}^n$ analogous to $W_t$: Sample coordinates one by one so as to have the right marginals conditioned on the past. (Actually, there is an analogous process for any Markov chain.)
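A minimal sketch of that coordinate-by-coordinate process (ours; brute-force and exponential-time, purely illustrative):

```python
import itertools
import numpy as np

def cond_mean(f, prefix, n):
    """E[f | x_1 ... x_k = prefix] under the uniform measure on {-1,1}^n."""
    tails = itertools.product([-1, 1], repeat=n - len(prefix))
    return np.mean([f(tuple(prefix) + t) for t in tails])

def sample_sequentially(f, n, rng):
    """Sample X ~ f dmu by fixing coordinates one by one, each with the
    correct marginal conditioned on the past (discrete analogue of W_t)."""
    x = []
    for _ in range(n):
        p_plus = cond_mean(f, x + [1], n) / (2 * cond_mean(f, x, n))
        x.append(1 if rng.random() < p_plus else -1)
    return tuple(x)

# Example: f = normalized indicator of the majority set (n odd), so E f = 1.
n = 5
maj = lambda x: float(sum(x) > 0)
Z = np.mean([maj(x) for x in itertools.product([-1, 1], repeat=n)])
f = lambda x: maj(x) / Z
rng = np.random.default_rng(6)
samples = [sample_sequentially(f, n, rng) for _ in range(2000)]
print(np.mean([maj(x) for x in samples]))   # 1.0: every sample lands in the set
```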
Extension to discrete spaces? [$1,000]
Additional randomness causes significant concentration issues.

One can prove the log-Sobolev inequality and Talagrand's entropy-transport inequality in a few lines based on this. There is an information-theoretic interpretation: Chain rule ↔ Martingale property.

These proofs use first-order derivatives of $f$, while our proof of the convolution conjecture in Gaussian space uses second-order properties (perturbation).

Challenge: Stein can prove that $L_2$ mixing $\Rightarrow$ log-Sobolev inequality. Can you?