Regularization under diffusion
and Talagrand's convolution conjecture
James R. Lee
University of Washington
Joint work with Ronen Eldan (Weizmann)
noise, heat, and smoothness
For $f : \{-1,1\}^n \to \mathbb{R}$, the discrete "heat flow" operator $T_\epsilon : L^2(\{-1,1\}^n) \to L^2(\{-1,1\}^n)$ is defined for $\epsilon \in [0,1]$ by
$$(T_\epsilon f)(x_1, x_2, \ldots, x_n) = \mathbb{E}\big[f(x_1^\epsilon, x_2^\epsilon, \ldots, x_n^\epsilon)\big],$$
where $x_i^\epsilon = x_i$ with probability $1-\epsilon$ and $x_i^\epsilon = \pm 1$ with probability $\epsilon/2$ each (independently for each $i$).

$T_\epsilon$ is diagonalized in the Fourier basis and dampens high-degree Fourier coefficients:
$$T_\epsilon\, \chi_S = (1-\epsilon)^{|S|}\, \chi_S.$$
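To make the damping concrete, here is a small exact computation (a Python sketch; the kernel matrix `K` and helper `chi` are illustrative names, not from the talk). It builds the Markov kernel of $T_\epsilon$ on $\{-1,1\}^n$ and checks the eigenvalue relation above.

```python
import itertools
import numpy as np

n, eps = 4, 0.3
cube = np.array(list(itertools.product([-1, 1], repeat=n)))  # all 2^n points

# Kernel of T_eps: each coordinate is kept w.p. 1-eps and resampled
# uniformly in {-1,1} w.p. eps, so P(y_i = x_i) = 1 - eps/2.
agree = (cube[:, None, :] == cube[None, :, :]).sum(axis=2)
K = (1 - eps / 2) ** agree * (eps / 2) ** (n - agree)

def chi(S):
    """Fourier character chi_S(x) = prod_{i in S} x_i, on the whole cube."""
    return cube[:, S].prod(axis=1)

# Eigenvalue check: T_eps chi_S = (1-eps)^{|S|} chi_S.
S = [0, 2]
assert np.allclose(K @ chi(S), (1 - eps) ** len(S) * chi(S))
```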
General principle: $f$ smooth $\Rightarrow$ $T_\epsilon f$ smoother.

Many applications: PCPs & hardness of approximation, statistical physics, threshold phenomena, social choice, circuit complexity, information theory.
noise, heat, and smoothness
Hypercontractive inequality [Bonami, Gross, Nelson]: For every $\epsilon > 0$,
$$T_\epsilon : L^p(\{-1,1\}^n) \to L^q(\{-1,1\}^n)$$
is a contraction for some $q > p > 1$. Here $\|f\|_p = (\mathbb{E}\,|f|^p)^{1/p}$:
$$\|f\|_p \text{ small for some } p > 1 \;\Longrightarrow\; \|T_\epsilon f\|_q \text{ small for some } q > p.$$
If $f$ is the indicator of a subset $S \subseteq \{-1,1\}^n$, then this encodes the isoperimetric profile / "small set expansion."
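A quick numerical sanity check (a sketch; it uses the classical Bonami-Beckner exponent relation $(1-\epsilon)^2 = (p-1)/(q-1)$ to pick the extremal $q$ for a given $p$, a fact not spelled out on the slide):

```python
import itertools
import numpy as np

n, eps, p = 4, 0.3, 2.0
q = 1 + (p - 1) / (1 - eps) ** 2    # extremal exponent: (1-eps)^2 = (p-1)/(q-1)

cube = np.array(list(itertools.product([-1, 1], repeat=n)))
agree = (cube[:, None, :] == cube[None, :, :]).sum(axis=2)
K = (1 - eps / 2) ** agree * (eps / 2) ** (n - agree)    # kernel of T_eps

def norm(f, r):
    """L^r norm under the uniform measure: (E |f|^r)^{1/r}."""
    return (np.abs(f) ** r).mean() ** (1 / r)

rng = np.random.default_rng(1)
for _ in range(1000):
    f = rng.exponential(size=2 ** n)             # random test function
    assert norm(K @ f, q) <= norm(f, p) + 1e-12  # contraction
```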
noise, heat, and smoothness
Relative entropy: For $f : \{-1,1\}^n \to \mathbb{R}_+$ with $\mathbb{E}\,f = 1$,
$$\mathrm{Ent}(f) = \mathbb{E}[f \log f] = \mathbb{E}_{x \sim f}[\log f(x)].$$
Gradient: $\displaystyle \mathbb{E}\,\|\nabla f\|^2 = \sum_{i=1}^{n} \mathbb{E}\big[|f(x \oplus e_i) - f(x)|^2\big]$

Log-Sobolev inequality [Gross]: For every such $f : \{-1,1\}^n \to \mathbb{R}_+$,
$$\mathrm{Ent}(f) \le \mathbb{E}\,\big\|\nabla \sqrt{f}\big\|^2,$$
where the right-hand side measures the initial rate of entropy decay under the heat flow, $-\frac{\partial}{\partial \epsilon}\,\mathrm{Ent}(T_\epsilon f)\,\big|_{\epsilon=0}$.
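A numerical check of the log-Sobolev inequality in this normalization (a sketch; the random exponential test functions are arbitrary):

```python
import itertools
import numpy as np

n = 4
cube = np.array(list(itertools.product([-1, 1], repeat=n)))
weights = 2 ** np.arange(n - 1, -1, -1)   # row index of a point in `cube`

def ent(f):
    """Ent(f) = E[f log f] under the uniform measure (assumes E f = 1)."""
    return (f * np.log(f)).mean()

def grad_energy(g):
    """E ||grad g||^2 = sum_i E |g(x xor e_i) - g(x)|^2."""
    total = 0.0
    for i in range(n):
        flipped = cube.copy()
        flipped[:, i] *= -1
        idx = ((flipped + 1) // 2) @ weights   # rows of the flipped points
        total += ((g[idx] - g) ** 2).mean()
    return total

rng = np.random.default_rng(2)
for _ in range(1000):
    f = rng.exponential(size=2 ** n)
    f /= f.mean()                              # normalize so that E f = 1
    assert ent(f) <= grad_energy(np.sqrt(f)) + 1e-12
```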
noise, heat, and smoothness
Hypercontractive inequality: For $\epsilon > 0$ and $p > 1$, there is a $q > p$ such that $\|T_\epsilon f\|_q \le \|f\|_p$ for all $f : \{-1,1\}^n \to \mathbb{R}$.

Log-Sobolev inequality: $\mathrm{Ent}(f) \le \mathbb{E}\,\|\nabla \sqrt{f}\|^2$

Talagrand (1989): What about regularization if we only know $\mathbb{E}\,f = 1$?
the convolution conjecture
$f : \{-1,1\}^n \to \mathbb{R}_+$ and $\mathbb{E}\,f = 1$.

Markov's inequality: $\mathbb{P}[f \ge \alpha] \le \dfrac{1}{\alpha}$
(tight for $f$ = scaled indicator of a set of measure $1/\alpha$)

Convolution conjecture [Talagrand 1989, $1,000]: For every $\epsilon > 0$, there exists $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ so that for every $f$,
$$\mathbb{P}[T_\epsilon f \ge \alpha] \le \frac{\varphi(\alpha)}{\alpha}, \qquad \varphi(\alpha) \to 0 \text{ as } \alpha \to \infty.$$
- The best function is probably $\varphi(\alpha) \sim \dfrac{1}{\sqrt{\log \alpha}}$ (achieved for halfspaces).
- The conjecture is open for any fixed $\epsilon > 0$, even for indicators of sets: $f = \alpha\,1_A$, $A \subseteq \{-1,1\}^n$.
anti-concentration of temperature
Convolution conjecture [Talagrand 1989]: For every $\epsilon > 0$, there exists $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ so that for every $f$ with $\mathbb{E}\,f = 1$,
$$\mathbb{P}[T_\epsilon f \ge \alpha] \le \frac{\varphi(\alpha)}{\alpha}, \qquad \varphi(\alpha) \to 0 \text{ as } \alpha \to \infty.$$
Equivalent to the conjecture that
$$\mathbb{E}\big[\,T_\epsilon f \cdot 1_{\{T_\epsilon f \in [\alpha,\,2\alpha]\}}\big] \le \varphi(\alpha).$$
"Temperature" cannot concentrate at a single (high) level.
the Gaussian limiting case
$f : \mathbb{R}^n \to \mathbb{R}_+$ and $\mathbb{E}\,f = \int f\,d\gamma_n = 1$ ($\gamma_n$ is the standard $n$-dimensional Gaussian measure).

Let $\{B_t\}$ be an $n$-dimensional Brownian motion with $B_0 = 0$.

Brownian semi-group: $P_t f(x) = \mathbb{E}[f(x + B_t)]$

For an indicator $f = 1_A$: $(P_t 1_A)(x) = \gamma_{x,t}(A)$, the measure of $A$ under a Gaussian centered at $x$ with covariance $t \cdot \mathrm{Id}$.
Gaussian convolution conjecture: For every $t < 1$, there is a $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ so that for every such $f$,
$$\mathbb{P}\big[P_{1-t} f(B_t) \ge \alpha\big] \le \frac{\varphi(\alpha)}{\alpha}, \qquad \varphi(\alpha) \to 0 \text{ as } \alpha \to \infty.$$
- Special case of the discrete cube conjecture.
- Previously unknown for any $t > 0$.
- $n = 1$ is an exercise [*, O'Donnell 2014].
- True in any fixed dimension [Ball, Barthe, Bednorz, Oleszkiewicz, and Wolff 2010].
Gaussian convolution conjecture (equivalent stationary form): For every $t > 0$, there is a $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ so that for every such $f$,
$$\mathbb{P}[U_t f \ge \alpha] \le \frac{\varphi(\alpha)}{\alpha}, \qquad \varphi(\alpha) \to 0 \text{ as } \alpha \to \infty,$$
where $U_t$ is the Ornstein-Uhlenbeck semi-group:
$$U_t f(x) = \mathbb{E}_{g \sim \gamma_n}\Big[f\big(e^{-t} x + \sqrt{1 - e^{-2t}}\,g\big)\Big].$$
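Both semi-groups are easy to estimate by Monte Carlo. Below is a one-dimensional sketch; the test function $f(x) = x^2$, with closed forms $P_t f(x) = x^2 + t$ and $U_t f(x) = e^{-2t}x^2 + (1 - e^{-2t})$, is chosen for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def P_t(f, x, t, m=200_000):
    """Brownian semi-group P_t f(x) = E[f(x + B_t)], by Monte Carlo."""
    return f(x + np.sqrt(t) * rng.standard_normal(m)).mean()

def U_t(f, x, t, m=200_000):
    """Ornstein-Uhlenbeck semi-group U_t f(x) = E[f(e^{-t} x + sqrt(1-e^{-2t}) g)]."""
    g = rng.standard_normal(m)
    return f(np.exp(-t) * x + np.sqrt(1 - np.exp(-2 * t)) * g).mean()

f = lambda x: x ** 2        # E_gamma f = 1
x, t = 1.5, 0.4
print(P_t(f, x, t), x ** 2 + t)                                    # ~ closed form
print(U_t(f, x, t), np.exp(-2 * t) * x ** 2 + 1 - np.exp(-2 * t))  # ~ closed form
```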
the Gaussian limiting case
Theorem [Eldan-L 2014]: If $f : \mathbb{R}^n \to \mathbb{R}_+$ satisfies $\mathbb{E}\,f = 1$ and
$$\nabla^2 \log f(x) \succeq -\beta \cdot \mathrm{Id} \quad \text{for all } x \in \mathbb{R}^n$$
for some $\beta \ge 1$, then for all $\alpha \ge e^3$:
$$\mathbb{P}[f \ge \alpha] \le \frac{1}{\alpha} \cdot \frac{C\beta\,(\log\log\alpha)^4}{\sqrt{\log\alpha}}.$$
semi-log convexity
Fact: For any $f$ with $\mathbb{E}\,f = 1$,
$$\nabla^2 \log P_{1-t} f(x) \succeq -\frac{1}{1-t} \cdot \mathrm{Id}.$$
Corollary: If $f : \mathbb{R}^n \to \mathbb{R}_+$ satisfies $\mathbb{E}\,f = 1$, then for any $t < 1$ and all $\alpha \ge e^3$,
$$\mathbb{P}\big[P_{1-t} f(B_t) \ge \alpha\big] \le \frac{1}{\alpha} \cdot \frac{C\,(\log\log\alpha)^4}{(1-t)\sqrt{\log\alpha}}.$$
some difficulties
Corollary: If $f : \mathbb{R}^n \to \mathbb{R}_+$ satisfies $\mathbb{E}\,f = 1$, then for any $t < 1$ and all $\alpha \ge 2$,
$$\mathbb{P}\big[P_{1-t} f(B_t) \ge \alpha\big] \ll \frac{1}{\alpha}.$$
What are the difficult functions $f$?
Good: noise insensitive (e.g. a half space).
Bad: boundary far from the origin; $B_t$ has to pass the crest! (e.g. "dust")
proof sketch
$\mathbb{E}\,f = 1, \qquad \nabla^2 \log f(x) \succeq -\beta \cdot \mathrm{Id}$

$M_t = P_{1-t} f(B_t)$ is a (Doob) martingale, with
$$M_0 = \mathbb{E}_{\gamma_n}[f] = 1, \qquad M_1 = f(B_1).$$
Goal: $\mathbb{P}[M_1 > \alpha] \ll \dfrac{1}{\alpha}$

Arguing about small-probability events isn't so easy...
random measure conditioning
$W_t = B_t$ conditioned on $B_1 \sim f\,d\gamma_n$ [Doob transform]

Goal: $\mathbb{P}\big[f(B_1) \in [\alpha, 2\alpha]\big] \ll 1/\alpha$

It suffices to prove that
$$\mathbb{P}\big[f(W_1) \in [\alpha, 2\alpha]\big] = o(1),$$
because
$$\mathbb{P}\big[f(W_1) \in [\alpha, 2\alpha]\big] \sim \alpha \cdot \mathbb{P}\big[f(B_1) \in [\alpha, 2\alpha]\big].$$
$W_t$ is an Itô process. Consider a process $X_t$ with $X_0 = 0$ and
$$dX_t = dB_t + v_t\,dt,$$
where $\{v_t\}$ is predictable (a deterministic function of $t$ and $\{B_s : s \in [0,t]\}$).

Integrating: $X_t = B_t + \int_0^t v_s\,ds$ = Brownian motion + drift.

Among all such drifts satisfying
$$X_1 = B_1 + \int_0^1 v_t\,dt \;\sim\; f\,d\gamma_n,$$
let $v_t$ be the one which minimizes
$$\mathbb{E}\int_0^1 \|v_t\|^2\,dt.$$
Föllmer's drift
Among all such drifts satisfying $X_1 = B_1 + \int_0^1 v_t\,dt \sim f\,d\gamma_n$, let $v_t$ be the one which minimizes $\mathbb{E}\int_0^1 \|v_t\|^2\,dt$.

Lemma: $v_t$ is a martingale.

Explicit form:
$$v_t = \nabla \log P_{1-t} f(X_t) = \frac{\nabla P_{1-t} f(X_t)}{P_{1-t} f(X_t)}$$
Theorem [Lehec 2010]:
$$\mathrm{Ent}_{\gamma_n}(f) = \frac{1}{2}\,\mathbb{E}\int_0^1 \|v_t\|^2\,dt$$
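This can be simulated directly. Here is a minimal Euler-Maruyama sketch in one dimension, assuming the illustrative density $f(x) = x^2$ (so $\mathbb{E}_{\gamma_1} f = 1$, $P_s f(x) = x^2 + s$, and the drift $v_t(x) = 2x/(x^2 + 1 - t)$ has a closed form); it checks both $X_1 \sim f\,d\gamma_1$ and Lehec's formula:

```python
import numpy as np

# Foellmer drift for f(x) = x^2 w.r.t. the 1-d Gaussian:
#   P_s f(x) = E[(x + B_s)^2] = x^2 + s  ==>  v_t(x) = 2x / (x^2 + 1 - t).
rng = np.random.default_rng(3)
m, steps = 200_000, 2000
dt = 1.0 / steps

X = np.zeros(m)
energy = np.zeros(m)                  # pathwise \int_0^1 |v_t|^2 dt
for k in range(steps):
    t = k * dt
    v = 2 * X / (X ** 2 + 1 - t)
    energy += v ** 2 * dt
    X += v * dt + np.sqrt(dt) * rng.standard_normal(m)

# X_1 should have law f d(gamma): E[X_1^2] = E_gamma[x^4] = 3.
print("E[X_1^2]  =", (X ** 2).mean())
# Lehec: Ent_gamma(f) = (1/2) E \int_0^1 |v_t|^2 dt.
g = rng.standard_normal(m)
print("Ent(f)    =", (g ** 2 * np.log(g ** 2)).mean())   # E_gamma[f log f]
print("energy/2  =", energy.mean() / 2)
```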
an energy/entropy optimal coupling
$\{B_t\}$ an $n$-dimensional Brownian motion, $f : \mathbb{R}^n \to \mathbb{R}_+$ with $\mathbb{E}\,f = 1$.

Construct $X_t$ so that $X_1 \sim f\,d\gamma_n$:
$$X_0 = 0, \qquad dX_t = dB_t + v_t\,dt, \qquad v_t = \frac{\nabla P_{1-t} f(X_t)}{P_{1-t} f(X_t)} \text{ is a martingale.}$$
proof sketch
It suffices to prove that
$$\mathbb{P}\big[f(X_1) \in [\alpha, 2\alpha]\big] = o(1).$$
Idea: Suppose that $\mathbb{P}\big[f(X_1) \in [\alpha, 2\alpha]\big] \ge c$. Then
$$\mathbb{P}\big[f(X_1) \in [2\alpha, 4\alpha]\big] \ge c, \quad \mathbb{P}\big[f(X_1) \in [4\alpha, 8\alpha]\big] \ge c, \quad \cdots$$
for $\approx \log\alpha$ levels.

Making $f$ bigger: We'll use $\nabla^2 \log f(x) \succeq -\beta\,\mathrm{Id}$:
$$\log f(X_1 + u) \ge \log f(X_1) + \langle u, \nabla \log f(X_1)\rangle - \beta\|u\|^2 = \log f(X_1) + \langle u, v_1\rangle - \beta\|u\|^2.$$
Pushing $X_1$ in the direction of the drift at time $t = 1$:
$$f(X_1 + u) \ge f(X_1)\,\exp\big(\langle u, v_1\rangle - \beta\|u\|^2\big).$$
Setting $u = \delta v_1$ ($\delta$ small) multiplies the value of $f$ by $\approx e^{\delta\|v_1\|^2}$.
We want to say that $X_1$ could do it (without our help).
Girsanov's theorem
Consider an Itô process $dX_t = dB_t + v_t\,dt$ (*under suitable conditions).
Let $P$ be the path measure of $\{B_t\}$ and $Q$ the path measure of $\{X_t\}$. Then, under the change of measure
$$\frac{dP}{dQ} = \exp\left(-\int_0^1 \langle v_t,\,dB_t\rangle - \frac{1}{2}\int_0^1 \|v_t\|^2\,dt\right),$$
$\{X_t : t \in [0,1]\}$ has the law of Brownian motion.
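The simplest instance (constant drift, i.e. the Cameron-Martin case) can be checked by Monte Carlo; a sketch with an arbitrary test function:

```python
import numpy as np

# Constant drift v: X_t = B_t + v t. Girsanov (Cameron-Martin) says that
# reweighting Brownian paths by exp(v B_1 - v^2/2) reproduces the law of X_1.
rng = np.random.default_rng(4)
m, v = 1_000_000, 0.7
B1 = rng.standard_normal(m)

g = lambda x: np.cos(x) + x ** 2                    # arbitrary test function
lhs = g(B1 + v).mean()                              # E[g(X_1)]
rhs = (g(B1) * np.exp(v * B1 - v ** 2 / 2)).mean()  # reweighted E[g(B_1)]
print(lhs, rhs)                                     # agree up to MC error
```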
the greedy perturbation
$$v_t = \nabla \log P_{1-t} f(X_t), \qquad dX_t = dB_t + v_t\,dt, \qquad dX_t^\delta = dB_t + (1+\delta)\,v_t\,dt$$
Now we can argue that $X_t \approx X_t^\delta$ (Girsanov's theorem).

What about $f(X_1^\delta) \gg f(X_1)$? Note that $X_1^\delta = X_1 + \delta \int_0^1 v_t\,dt$, and recall
$$f(X_1 + u) \ge f(X_1)\,\exp\big(\langle u, v_1\rangle - O(1)\big).$$
Big question: Does $\int_0^1 v_t\,dt$ point in the direction of the gradient $v_1$?
a balancing act
$$v_t = \nabla \log P_{1-t} f(X_t), \qquad dX_t = dB_t + v_t\,dt, \qquad dX_t^\delta = dB_t + (1+\delta)\,v_t\,dt$$
Let $\mu$ be the law of $X_1$ and $\mu^\delta$ the law of $X_1^\delta$. Girsanov:
$$\frac{d\mu^\delta}{d\mu} = \exp\left(\delta \int_0^1 \langle v_t,\,dB_t\rangle - \Big(\delta + \frac{\delta^2}{2}\Big)\int_0^1 \|v_t\|^2\,dt\right)$$
Gradient estimate:
$$f(X_1^\delta) \ge f(X_1)\,\exp\Big(\Big\langle \delta\int_0^1 v_t\,dt,\; v_1\Big\rangle\Big)$$
Since $v_t$ is a martingale, $\mathbb{E}[v_1 \mid \mathcal{F}_t] = v_t$, so
$$\mathbb{E}\Big[\Big\langle \delta\int_0^1 v_t\,dt,\; v_1\Big\rangle\Big] = \delta\,\mathbb{E}\int_0^1 \|v_t\|^2\,dt.$$
[ The technically difficult part here is concentration: getting these (the Girsanov density and the gradient estimate) to happen at the same time. ]
For $\delta = j/\log\alpha$, $j = 1, 2, \ldots, \log\alpha$: if $\mathbb{P}\big[f(X_1) \in [\alpha, 2\alpha]\big] \ge c$, then
$$\mathbb{P}\big[f(X_1) \in [2\alpha, 4\alpha]\big] \ge c/10, \quad \mathbb{P}\big[f(X_1) \in [4\alpha, 8\alpha]\big] \ge c/10, \quad \ldots$$
The levels are disjoint events under the law of $X_1$, so $c \cdot \log\alpha / 10 \le 1$, i.e. $c \lesssim 1/\log\alpha = o(1)$.
conclusion
Process for $\{-1,1\}^n$ analogous to $X_t$: sample coordinates one by one to have the right marginals conditioned on the past. (Actually, there is an analogous process for any Markov chain.)

Extension to discrete spaces? [$1,000] The additional randomness causes significant concentration issues.

One can prove the log-Sobolev inequality and Talagrand's entropy-transport inequality in a few lines based on this (see the sketch below). There is an information-theoretic interpretation: chain rule ↔ martingale property.

These proofs use first-order derivatives of $f$, while our proof of the convolution conjecture in Gaussian space uses second-order properties (perturbation).
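For instance, here is the few-line log-Sobolev derivation alluded to above (a sketch combining Lehec's formula with the martingale property of the Föllmer drift):

```latex
\begin{align*}
\mathrm{Ent}_{\gamma_n}(f)
  &= \tfrac12\, \mathbb{E}\int_0^1 \|v_t\|^2\, dt
     && \text{(Lehec)} \\
  &\le \tfrac12\, \mathbb{E}\,\|v_1\|^2
     && \text{($v_t$ martingale $\Rightarrow$ $\|v_t\|^2$ submartingale)} \\
  &= \tfrac12\, \mathbb{E}\,\|\nabla \log f(X_1)\|^2
     && \text{($v_1 = \nabla \log P_0 f(X_1)$)} \\
  &= \tfrac12 \int \frac{\|\nabla f\|^2}{f}\, d\gamma_n
     && \text{($X_1 \sim f\, d\gamma_n$),}
\end{align*}
```

which is the Gaussian log-Sobolev inequality $\mathrm{Ent}_{\gamma_n}(f) \le 2\int \|\nabla\sqrt{f}\|^2\,d\gamma_n$.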
Challenge: Stein can prove that $L^2$ mixing $\Rightarrow$ log-Sobolev inequality. Can you?