Hypercontractive inequality


Regularization under diffusion
and Talagrand’s convolution conjecture
James R. Lee
University of Washington
Joint work with Ronen Eldan (Weizmann)
noise, heat, and smoothness
$f : \{-1,1\}^n \to \mathbb{R}$

The discrete "heat flow" operator $T_\epsilon : L_2(\{-1,1\}^n) \to L_2(\{-1,1\}^n)$ is defined for $\epsilon \in [0,1]$ by

$(T_\epsilon f)(x_1, x_2, \ldots, x_n) = \mathbb{E}\, f(x_1^\epsilon, x_2^\epsilon, \ldots, x_n^\epsilon)$

where $x_i^\epsilon = x_i$ with probability $1-\epsilon$ and $x_i^\epsilon = \pm 1$ with probability $\epsilon/2$ each (independently for each $i$).

$T_\epsilon$ is diagonalized in the Fourier basis and dampens high-degree Fourier coefficients:

$T_\epsilon \chi_S = (1-\epsilon)^{|S|} \chi_S$
noise, heat, and smoothness
General principle: $f$ smooth $\Rightarrow$ $T_\epsilon f$ smoother.

Many applications: PCPs & hardness of approximation, statistical physics, threshold phenomena, social choice, circuit complexity, information theory.
noise, heat, and smoothness
Hypercontractive inequality [Bonami, Gross, Nelson]:

For every $\epsilon > 0$:

$T_\epsilon : L_p(\{-1,1\}^n) \to L_q(\{-1,1\}^n)$

is a contraction for some $q > p > 1$.

Here $\|f\|_p = (\mathbb{E}|f|^p)^{1/p}$, so $\|f\|_p$ small for $p > 1$ $\Rightarrow$ $\|T_\epsilon f\|_q$ small for $q > p$.

If $f \approx$ indicator of a subset $S \subseteq \{-1,1\}^n$, then this encodes the isoperimetric profile / "small-set expansion."
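For intuition, the sharp exponents here are classical (Bonami-Beckner): with correlation $\rho = 1-\epsilon$, the operator is a contraction from $L_p$ to $L_q$ exactly when $q \le 1 + (p-1)/(1-\epsilon)^2$. The following sketch (ours, purely illustrative) checks the contraction numerically on random functions over a small cube.

```python
import itertools
import numpy as np

n, eps, p = 3, 0.4, 2.0
q = 1 + (p - 1) / (1 - eps) ** 2        # sharp hypercontractive exponent

pts = list(itertools.product([-1, 1], repeat=n))
# Transition matrix of the eps-resampling noise: K[x][y] = P(x^eps = y).
K = np.array([[np.prod([(1 - eps / 2) if xi == yi else eps / 2
                        for xi, yi in zip(x, y)]) for y in pts] for x in pts])

def norm(vals, r):
    """||f||_r = (E|f|^r)^{1/r} under the uniform measure."""
    return np.mean(np.abs(vals) ** r) ** (1 / r)

rng = np.random.default_rng(1)
for _ in range(100):
    f = rng.standard_normal(len(pts))
    assert norm(K @ f, q) <= norm(f, p) + 1e-12   # ||T_eps f||_q <= ||f||_p
print("hypercontractivity held on 100 random functions")
```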
noise, heat, and smoothness
Relative entropy: For $f : \{-1,1\}^n \to \mathbb{R}_+$ with $\mathbb{E} f = 1$,

$\mathrm{Ent}(f) = \mathbb{E}[f \log f] = \mathbb{E}_{Z \sim f}[\log f(Z)]$

Gradient: $\mathbb{E}\|\nabla f\|^2 = \sum_{i=1}^n \mathbb{E}\big[\big(f(x \oplus e_i) - f(x)\big)^2\big]$

Log-Sobolev inequality [Gross]: For every such $f : \{-1,1\}^n \to \mathbb{R}_+$,

$\mathrm{Ent}(f) \le \mathbb{E}\|\nabla \sqrt{f}\|^2 \approx -\frac{d}{d\epsilon}\,\mathrm{Ent}(T_\epsilon f)\Big|_{\epsilon=0}$
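A quick numerical sanity check (our own sketch, not from the talk; with the gradient convention displayed above, Gross's sharp form even carries an extra factor $1/2$, so the inequality as stated has slack):

```python
import itertools
import numpy as np

n = 4
pts = list(itertools.product([-1, 1], repeat=n))
index = {x: k for k, x in enumerate(pts)}
# flip[i][k] = index of pts[k] with coordinate i negated
flip = [[index[x[:i] + (-x[i],) + x[i + 1:]] for x in pts] for i in range(n)]

def ent(f):
    """Ent(f) = E[f log f] for f > 0 with E f = 1 (uniform measure)."""
    return np.mean(f * np.log(f))

def grad_sq(g):
    """E||grad g||^2 = sum_i E[(g(x xor e_i) - g(x))^2]."""
    return sum(np.mean((g[fl] - g) ** 2) for fl in flip)

rng = np.random.default_rng(2)
for _ in range(100):
    f = rng.random(len(pts)) + 1e-3
    f /= f.mean()                        # normalize so that E f = 1
    assert ent(f) <= grad_sq(np.sqrt(f)) + 1e-12
print("Ent(f) <= E||grad sqrt(f)||^2 held on 100 random functions")
```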
noise, heat, and smoothness
Hypercontractive inequality:
For $\epsilon > 0$ and $p > 1$, there is a $q > p$ such that

$\|T_\epsilon f\|_q \le \|f\|_p$ for all $f : \{-1,1\}^n \to \mathbb{R}$.

Log-Sobolev inequality: $\mathrm{Ent}(f) \le \mathbb{E}\|\nabla\sqrt{f}\|^2$

Talagrand (1989):
What about regularization if we only know $\mathbb{E} f = 1$?
the convolution conjecture
$f : \{-1,1\}^n \to \mathbb{R}_+$ and $\mathbb{E} f = 1$

Markov's inequality: $\mathbb{P}[f \ge \alpha] \le \frac{1}{\alpha}$

(tight for $f$ = scaled indicator on a set of measure $1/\alpha$)

Convolution conjecture [Talagrand 1989, $1,000]:
For every $\epsilon > 0$, there exists $\phi : \mathbb{R}_+ \to \mathbb{R}_+$ so that for every $f$,

$\mathbb{P}[T_\epsilon f \ge \alpha] \le \frac{\phi(\alpha)}{\alpha}$ and $\phi(\alpha) \to 0$ as $\alpha \to \infty$

- Best function is probably $\phi(\alpha) \sim \frac{1}{\sqrt{\log \alpha}}$ (achieved for halfspaces)
- Conjecture open for any fixed $\epsilon > 0$, even for indicators of sets: $f = \mathbf{1}_S$, $S \subseteq \{-1,1\}^n$
anti-concentration of temperature
Convolution conjecture [Talagrand 1989]:
For every $\epsilon > 0$, there exists $\phi : \mathbb{R}_+ \to \mathbb{R}_+$ so that for every $f$ with $\mathbb{E} f = 1$,

$\mathbb{P}[T_\epsilon f \ge \alpha] \le \frac{\phi(\alpha)}{\alpha}$ and $\phi(\alpha) \to 0$ as $\alpha \to \infty$

Equivalent to the conjecture that

$\mathbb{E}\big[T_\epsilon f \cdot \mathbf{1}_{\{T_\epsilon f \in [\alpha, 2\alpha]\}}\big] \le \phi(\alpha)$

"Temperature" cannot concentrate at a single (high) level.
the Gaussian limiting case
$f : \mathbb{R}^n \to \mathbb{R}_+$ and $\mathbb{E} f = \int f \, d\gamma_n = 1$
($\gamma_n$ is the standard $n$-dimensional Gaussian)

Let $B_t$ be an $n$-dimensional Brownian motion with $B_0 = 0$.

Brownian semi-group: $P_t f(x) = \mathbb{E}[f(x + B_t)]$

For $f = \mathbf{1}_S$: $P_t \mathbf{1}_S(x) = \gamma_{x,t}(S)$
the Gaussian limiting case
$f : \mathbb{R}^n \to \mathbb{R}_+$ and $\mathbb{E} f = \int f \, d\gamma_n = 1$

Brownian semi-group: $P_t f(x) = \mathbb{E}[f(x + B_t)]$

Gaussian convolution conjecture: For every $t < 1$, there is a $\phi : \mathbb{R}_+ \to \mathbb{R}_+$ so that for every such $f$,

$\mathbb{P}\big[P_{1-t} f(B_t) \ge \alpha\big] \le \frac{\phi(\alpha)}{\alpha}$ and $\phi(\alpha) \to 0$ as $\alpha \to \infty$

- Special case of the discrete cube conjecture
- Previously unknown for any $t > 0$
- $n = 1$ is an exercise [*, O'Donnell 2014]
- True in any fixed dimension [Ball, Barthe, Bednorz, Oleszkiewicz, and Wolff 2010]
the Gaussian limiting case
$f : \mathbb{R}^n \to \mathbb{R}_+$ and $\mathbb{E} f = \int f \, d\gamma_n = 1$

Brownian semi-group: $P_t f(x) = \mathbb{E}[f(x + B_t)]$

Gaussian convolution conjecture: For every $t > 0$, there is a $\phi : \mathbb{R}_+ \to \mathbb{R}_+$ so that for every such $f$,

$\mathbb{P}[U_t f \ge \alpha] \le \frac{\phi(\alpha)}{\alpha}$ and $\phi(\alpha) \to 0$ as $\alpha \to \infty$

Ornstein-Uhlenbeck semi-group:

$U_t f(x) = \mathbb{E}_{Z \sim \gamma_n}\big[f\big(e^{-t} x + \sqrt{1 - e^{-2t}}\, Z\big)\big]$
the Gaussian limiting case
Theorem [Eldan-L 2014]:
If $f : \mathbb{R}^n \to \mathbb{R}_+$ satisfies $\mathbb{E} f = 1$ and

$\nabla^2 \log f(x) \succeq -\beta \, \mathrm{Id}$ for all $x \in \mathbb{R}^n$

for some $\beta \ge 1$, then for all $\alpha \ge e^3$:

$\mathbb{P}[f \ge \alpha] \le \frac{1}{\alpha} \cdot \frac{C\beta \, (\log\log\alpha)^4}{\sqrt{\log \alpha}}$
semi-log convexity
Theorem [Eldan-L 2014]:
If $f : \mathbb{R}^n \to \mathbb{R}_+$ satisfies $\mathbb{E} f = 1$ and

$\nabla^2 \log f(x) \succeq -\beta \, \mathrm{Id}$ for all $x \in \mathbb{R}^n$

for some $\beta \ge 1$, then for all $\alpha \ge e^3$:

$\mathbb{P}[f \ge \alpha] \le \frac{1}{\alpha} \cdot \frac{C\beta \, (\log\log\alpha)^4}{\sqrt{\log \alpha}}$

Fact: For any $f$ with $\mathbb{E} f = 1$,

$\nabla^2 \log P_{1-t} f(x) \succeq -\frac{1}{1-t} \, \mathrm{Id}$

Corollary: If $f : \mathbb{R}^n \to \mathbb{R}_+$ satisfies $\mathbb{E} f = 1$, then for any $t < 1$ and all $\alpha \ge e^3$,

$\mathbb{P}\big[P_{1-t} f(B_t) \ge \alpha\big] \le \frac{1}{\alpha} \cdot \frac{C \, (\log\log\alpha)^4}{(1-t)\sqrt{\log \alpha}}$
some difficulties
Corollary: If $f : \mathbb{R}^n \to \mathbb{R}_+$ satisfies $\mathbb{E} f = 1$, then for any $t < 1$ and all $\alpha \ge 2$,

$\mathbb{P}\big[P_{1-t} f(B_t) \ge \alpha\big] \ll \frac{1}{\alpha}$

What are the difficult functions $f$?

Good: noise insensitive (e.g., a half space).
Bad: boundary far from the origin, so $B_t$ has to pass the crest (e.g., dust).
proof sketch
$\mathbb{E} f = 1$, $\quad \nabla^2 \log f(x) \succeq -\beta \, \mathrm{Id}$

$M_t = P_{1-t} f(B_t)$ is a (Doob) martingale:

$M_0 = \mathbb{E}_{\gamma_n} f = 1$
$M_1 = f(B_1)$

Goal: $\mathbb{P}[M_1 > \alpha] \ll \frac{1}{\alpha}$

Arguing about small-probability events isn't so easy...
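For a function where the heat semigroup is explicit, the martingale is easy to probe numerically. A sketch (ours; this particular $f$ is our own test case): with $f(x) = \cosh(\langle a, x\rangle)\, e^{-\|a\|^2/2}$, so $\mathbb{E}_{\gamma_n} f = 1$, one has $P_{1-t} f(x) = \cosh(\langle a, x\rangle)\, e^{-t\|a\|^2/2}$ in closed form, and $\mathbb{E}[M_t] = 1$ at every $t$.

```python
import numpy as np

rng = np.random.default_rng(7)
a = np.array([1.0, 0.5])
m = 400_000

# f(x) = cosh(<a,x>) e^{-|a|^2/2} has E_gamma f = 1, and the heat semigroup
# is explicit: P_{1-t} f (x) = cosh(<a,x>) e^{-t|a|^2/2}.
for t in [0.0, 0.3, 0.7, 1.0]:
    B_t = np.sqrt(t) * rng.standard_normal((m, 2))
    M_t = np.cosh(B_t @ a) * np.exp(-t * (a @ a) / 2)
    print(f"t={t}:  E[M_t] = {M_t.mean():.4f}")  # ~1 at every t, as for a martingale
```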
conditioning
π‘Šπ‘‘ = 𝐡𝑑 conditioned on 𝐡1 ∼ 𝑓𝑑𝛾𝑛
[Doob transform]
Goal:
β„™ 𝑓 𝐡1 ∈ 𝛼, 2𝛼
β‰ͺ 1/𝛼
Suffices to prove that:
β„™ 𝑓 π‘Š1 ∈ 𝛼, 2𝛼
= π‘œ(1)
because
β„™ 𝑓 π‘Š1 ∈ 𝛼, 2𝛼
∼ 𝛼 β„™ 𝑓 𝐡1 ∈ 𝛼, 2𝛼
π‘Šπ‘‘ is an Itô process
Consider a process π‘Šπ‘‘ with π‘Šπ‘‘ = 0, and
π‘‘π‘Šπ‘‘ = 𝑑𝐡𝑑 + 𝑣𝑑 𝑑𝑑
where {𝑣𝑑 } is predictable
(deterministic function of 𝑑, {𝐡𝑠 : 𝑠 ∈ [0, 𝑑]})
𝑑
Integrating: π‘Šπ‘‘ = 𝐡𝑑 + ∫0 𝑣𝑠 𝑑𝑠 = Brownian motion + drift
Among all such drifts satisfying
π‘Š1 =
1
𝐡1 + ∫0 𝑣𝑑
𝑑𝑑 ∼ 𝑓 𝑑𝛾𝑛 ,
let 𝑣𝑑 be the one which minimizes
1
𝔼
0
𝑣𝑑
2
𝑑𝑑
Föllmer’s drift
Among all such drifts satisfying $W_1 = B_1 + \int_0^1 v_t \, dt \sim f \, d\gamma_n$,
let $v_t$ be the one which minimizes $\mathbb{E} \int_0^1 \|v_t\|^2 \, dt$.

Lemma: $v_t$ is a martingale.

Explicit form: $v_t = \nabla \log P_{1-t} f(W_t) = \dfrac{\nabla P_{1-t} f(W_t)}{P_{1-t} f(W_t)}$

Theorem [Lehec 2010]: $\mathrm{Ent}_{\gamma_n}(f) = \frac{1}{2} \, \mathbb{E} \int_0^1 \|v_t\|^2 \, dt$
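Here is a simulation sketch (ours, not from the talk) of this drift for a case where everything is explicit: for the two-point Gaussian mixture $f = \frac{1}{2} e^{\langle a,x\rangle - \|a\|^2/2} + \frac{1}{2} e^{-\langle a,x\rangle - \|a\|^2/2}$ (the density of $\frac{1}{2}N(a, I) + \frac{1}{2}N(-a, I)$ against $\gamma_n$), the drift is $v_t = a \tanh(\langle a, W_t\rangle)$, independent of $t$. Euler-Maruyama then lets us check both that $W_1 \sim f \, d\gamma_n$ and Lehec's identity.

```python
import numpy as np

rng = np.random.default_rng(4)
a = np.array([1.0, 0.5])

def drift(w):
    """v_t = grad log P_{1-t} f (w); for this f it is a*tanh(<a,w>), t-free."""
    return np.tanh(w @ a)[:, None] * a

m, steps = 50_000, 500
dt = 1.0 / steps
W = np.zeros((m, 2))
energy = np.zeros(m)                     # pathwise (1/2) int_0^1 |v_t|^2 dt
for _ in range(steps):
    v = drift(W)
    energy += 0.5 * (v ** 2).sum(axis=1) * dt
    W += np.sqrt(dt) * rng.standard_normal((m, 2)) + v * dt

# Check 1: W_1 ~ (1/2)N(a,I) + (1/2)N(-a,I), so E<a,W_1>^2 = |a|^4 + |a|^2.
print("E<a,W_1>^2:", ((W @ a) ** 2).mean(), " exact:", (a @ a) ** 2 + a @ a)

# Check 2 (Lehec): Ent_gamma(f) = E[(1/2) int_0^1 |v_t|^2 dt].
sign = np.where(rng.random(m) < 0.5, 1.0, -1.0)
X = sign[:, None] * a + rng.standard_normal((m, 2))  # direct samples from f dgamma
ent = (np.log(np.cosh(X @ a)) - a @ a / 2).mean()    # E_{X ~ f dgamma} log f(X)
print("Ent(f):", ent, "  (1/2) E int |v|^2 dt:", energy.mean())
```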
an energy/entropy optimal coupling
$\{B_t\}$ an $n$-dim Brownian motion, $f : \mathbb{R}^n \to \mathbb{R}_+$ with $\mathbb{E} f = 1$.

Construct $W_t$ so that $W_1 \sim f \, d\gamma_n$:

$W_0 = 0$
$dW_t = dB_t + v_t \, dt$

$v_t = \dfrac{\nabla P_{1-t} f(W_t)}{P_{1-t} f(W_t)}$ is a martingale
proof sketch
Suffices to prove that: $\mathbb{P}\big[f(W_1) \in [\alpha, 2\alpha]\big] = o(1)$

Idea: Suppose that $\mathbb{P}\big[f(W_1) \in [\alpha, 2\alpha]\big] \ge p$; then

$\mathbb{P}\big[f(W_1) \in [2\alpha, 4\alpha]\big] \ge p$
$\mathbb{P}\big[f(W_1) \in [4\alpha, 8\alpha]\big] \ge p$
$\cdots$

across $\log \alpha$ levels. Since these events are disjoint, this forces $p \lesssim 1/\log \alpha = o(1)$.

Making $f$ bigger: We'll use $\nabla^2 \log f(x) \succeq -\beta \, \mathrm{Id}$:

$\log f(W_1 + u) \ge \log f(W_1) + \langle u, \nabla \log f(W_1) \rangle - \beta \|u\|^2 = \log f(W_1) + \langle u, v_1 \rangle - \beta \|u\|^2$
proof sketch
Pushing π‘Š1 in the direction of the drift at time 𝑑 = 1:
𝑓 π‘Š1 + 𝑒 β‰₯ 𝑓 π‘Š1 exp 𝑒, 𝑣1 βˆ’ 𝛽 𝑒 2
Setting 𝑒 = 𝛿𝑣1 (𝛿 small) multiplies the value of 𝑓.
want to say that π‘Š1 could do it
(without our help)
Girsanov’s theorem
Consider an Itô process $dX_t = dB_t + v_t \, dt$.

Let $P$ be the underlying Wiener measure of $\{B_t\}$. Then* under the change of measure

$\frac{dQ}{dP} = \exp\Big(-\int_0^1 \langle v_t, dB_t \rangle - \frac{1}{2} \int_0^1 \|v_t\|^2 \, dt\Big),$

the process $\{X_t : t \in [0,1]\}$ has the law of Brownian motion.

(* under suitable conditions)
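A Monte Carlo illustration (our own sketch; the drift $\sin(X_t)$ is an arbitrary bounded toy choice): the exponential density has expectation $1$ under $P$, and reweighting by it makes $X$ look like Brownian motion.

```python
import numpy as np

rng = np.random.default_rng(5)
m, steps = 200_000, 200
dt = 1.0 / steps

X = np.zeros(m)
log_dQ_dP = np.zeros(m)
for _ in range(steps):
    v = np.sin(X)                        # a bounded, predictable drift (toy choice)
    dB = np.sqrt(dt) * rng.standard_normal(m)
    log_dQ_dP += -v * dB - 0.5 * v ** 2 * dt
    X += dB + v * dt

w = np.exp(log_dQ_dP)
print("E_P[dQ/dP] =", w.mean())            # ~ 1: Q is a probability measure
print("E_Q[X_1]   =", (w * X).mean())      # ~ 0, like Brownian motion
print("E_Q[X_1^2] =", (w * X ** 2).mean()) # ~ 1, like Brownian motion
```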
the greedy perturbation
$v_t = \nabla \log P_{1-t} f(W_t)$
$dW_t = dB_t + v_t \, dt$
$dX_t^\delta = dB_t + (1 + \delta)\, v_t \, dt$

Now we can argue that $W_t \approx X_t^\delta$ (Girsanov's theorem).

What about $f(X_1^\delta) \gg f(W_1)$?

Note that $X_1^\delta = W_1 + \delta \int_0^1 v_t \, dt$, and recall

$f(W_1 + u) \ge f(W_1) \exp\big(\langle u, v_1 \rangle - o(1)\big)$

Big question:
Does $\int_0^1 v_t \, dt$ point in the direction of the gradient $v_1$?
a balancing act
$v_t = \nabla \log P_{1-t} f(W_t)$
$dW_t = dB_t + v_t \, dt$
$dX_t^\delta = dB_t + (1 + \delta)\, v_t \, dt$

Let $Q$ be the law of $\{W_t\}$ and $Q^\delta$ the law of $\{X_t^\delta\}$.

Girsanov:

$\frac{dQ^\delta}{dQ} = \exp\Big(\delta \int_0^1 \langle v_t, dB_t \rangle - \Big(\delta + \frac{\delta^2}{2}\Big) \int_0^1 \|v_t\|^2 \, dt\Big)$

Gradient estimate:

$f(X_1^\delta) \ge f(W_1) \exp\Big(\delta \Big\langle \int_0^1 v_t \, dt, \, v_1 \Big\rangle - \beta\delta^2 \int_0^1 \|v_t\|^2 \, dt\Big)$
a balancing act
Since $v_t$ is a martingale, $\mathbb{E}[v_1 \mid \mathcal{F}_t] = v_t$, so

$\mathbb{E}\Big[\delta \Big\langle \int_0^1 v_t \, dt, \, v_1 \Big\rangle\Big] = \delta \, \mathbb{E} \int_0^1 \|v_t\|^2 \, dt$

[The technically difficult part here is concentration: getting these to happen at the same time.]

Girsanov:

$\frac{dQ^\delta}{dQ} = \exp\Big(\delta \int_0^1 \langle v_t, dB_t \rangle - \Big(\delta + \frac{\delta^2}{2}\Big) \int_0^1 \|v_t\|^2 \, dt\Big)$

Gradient estimate:

$f(X_1^\delta) \ge f(W_1) \exp\Big(\delta \Big\langle \int_0^1 v_t \, dt, \, v_1 \Big\rangle - \beta\delta^2 \int_0^1 \|v_t\|^2 \, dt\Big)$
a balancing act
For 𝛿 = π‘˜/ log 𝛼, π‘˜ = 1,2, … , log 𝛼:
If β„™ 𝑓 π‘Š1 ∈ 𝛼, 2𝛼
β‰₯ 𝑝, then
β„™ 𝑓 π‘Š1 ∈ 2𝛼, 4𝛼
β„™ 𝑓 π‘Š1 ∈ 4𝛼, 8𝛼
…
Girsanov:
𝑑𝑄𝛿
= exp 𝛿
𝑑𝑄
1
0
β‰₯ 𝑝/10
β‰₯ 𝑝/10
𝛿2
𝑣𝑑 , 𝑑𝐡𝑑 βˆ’ 𝛿 +
2
Gradient estimate:
𝑓 𝑋1𝛿 β‰₯ 𝑓 π‘Š1 exp
1
𝛿 ∫0 𝑣𝑑 𝑑𝑑, 𝑣1
1
0
𝑣𝑑
2 𝑑𝑑
conclusion
Process for $\{-1,1\}^n$ analogous to $W_t$: Sample coordinates one by one so as to have the right marginals conditioned on the past. (Actually, there is an analogous process for any Markov chain.)
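A minimal sketch of that coordinate-by-coordinate process (ours; brute-force and exponential-time, purely illustrative):

```python
import itertools
import numpy as np

def cond_mean(f, prefix, n):
    """E[f | x_1 ... x_k = prefix] under the uniform measure on {-1,1}^n."""
    tails = itertools.product([-1, 1], repeat=n - len(prefix))
    return np.mean([f(tuple(prefix) + t) for t in tails])

def sample_sequentially(f, n, rng):
    """Sample X ~ f dmu by fixing coordinates one by one, each with the
    correct marginal conditioned on the past (discrete analogue of W_t)."""
    x = []
    for _ in range(n):
        p_plus = cond_mean(f, x + [1], n) / (2 * cond_mean(f, x, n))
        x.append(1 if rng.random() < p_plus else -1)
    return tuple(x)

# Example: f = normalized indicator of the majority set (n odd), so E f = 1.
n = 5
maj = lambda x: float(sum(x) > 0)
Z = np.mean([maj(x) for x in itertools.product([-1, 1], repeat=n)])
f = lambda x: maj(x) / Z
rng = np.random.default_rng(6)
samples = [sample_sequentially(f, n, rng) for _ in range(2000)]
print(np.mean([maj(x) for x in samples]))   # 1.0: every sample lands in the set
```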
Extension to discrete spaces? [$1,000]
Additional randomness causes significant concentration issues.

One can prove the log-Sobolev inequality and Talagrand's entropy-transport inequality in a few lines based on this. There is an information-theoretic interpretation: Chain rule ↔ Martingale property.

These proofs use first-order derivatives of $f$, while our proof of the convolution conjecture in Gaussian space uses second-order properties (perturbation).

Challenge: Stein can prove that $L_2$ mixing $\Rightarrow$ log-Sobolev inequality. Can you?