Transcript Document

Pixel Recovery via l1 Minimization in the Wavelet Domain

Ivan W. Selesnick*, Richard Van Slyke*, and Onur G. Guleryuz# (presenting author)

*: Polytechnic University, Brooklyn, NY
#: DoCoMo Communications Laboratories USA, Inc., San Jose, CA
Overview
• Problem statement: estimation/recovery of missing data.
• Formulation as a linear expansion over an overcomplete basis.
• Expansions that minimize the l1 norm.
• Why do this?
• Connections to adaptive linear estimators and sparsity.
• Connections to recent results and statistics.
• Simulation results and comparisons to our earlier work.
• Why not to do this: analysis of what is going on.
• Conclusion and ways of modifying the solutions for better results.
(The presentation is much more detailed than the paper.)
(Some software is available; please check the paper.)
Problem Statement
1. Original image:
   $x = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix}$, where $x_0$ collects the available pixels and $x_1$ the lost pixels (assume zero mean), with $x_0 \in \mathbb{R}^{n_0}$, $x_1 \in \mathbb{R}^{n_1}$, and $n_0 + n_1 = N$.
2. Lost block.
3. Derive the predicted
   $y = \begin{bmatrix} x_0 \\ \hat{x}_1 \end{bmatrix}$.
Formulation

$y = \begin{bmatrix} x_0 \\ \hat{x}_1 \end{bmatrix}$, with the available-data projection $P_0$ satisfying $P_0 y = x_0$.

1. Take an $N \times M$ matrix of overcomplete basis vectors, $M \geq N$:
   $H = [\, h_1 \;\; h_2 \;\; \ldots \;\; h_M \,]$
2. Write $y$ in terms of the basis:
   $y = Hc = \sum_{i=1}^{M} c_i h_i$
3. Find the expansion coefficients (two ways).
Find the expansion coefficients to minimize the l1 norm:

$\min_{c} \sum_{i=1}^{M} |c_i| \quad \text{subject to} \quad P_0 \sum_{i=1}^{M} c_i h_i = x_0$

The objective is the l1 norm of the expansion coefficients (the regularization); the constraint is the available-data constraint.
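The equality-constrained problem above is a linear program after the standard split $c = u - v$ with $u, v \geq 0$. A minimal sketch of that reduction (not the authors' software; the dictionary H, the projection P0, and the data below are random stand-ins used only to show the mechanics):

```python
# min ||c||_1  s.t.  P0 H c = x0, solved as an LP over [u; v] with c = u - v.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, M, n0 = 16, 32, 10                      # illustrative sizes, M >= N

H = rng.standard_normal((N, M))            # stand-in overcomplete basis
H /= np.linalg.norm(H, axis=0)             # unit-norm columns
P0 = np.eye(N)[:n0]                        # keep the first n0 pixels
x0 = rng.standard_normal(n0)               # stand-in available data

A = P0 @ H                                 # (n0 x M) constraint matrix
A_eq = np.hstack([A, -A])                  # acts on the stacked vector [u; v]
cost = np.ones(2 * M)                      # sum(u) + sum(v) = ||c||_1 at optimum

res = linprog(cost, A_eq=A_eq, b_eq=x0,
              bounds=[(0, None)] * (2 * M), method="highs")
c = res.x[:M] - res.x[M:]                  # expansion coefficients
y = H @ c                                  # recovered signal y = [x0; x_hat1]

print("constraint residual:", np.linalg.norm(P0 @ y - x0))
print("nonzero coefficients:", int(np.sum(np.abs(c) > 1e-8)))
```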
Why minimize the l1 norm?

$\min_{c} \sum_{i=1}^{M} |c_i| \quad \text{subject to} \quad P_0 \sum_{i=1}^{M} c_i h_i = x_0$

"Under an i.i.d. Laplacian model for the coefficient probabilities,
$p(c_i) = \frac{\lambda}{2} e^{-\lambda |c_i|}$,
maximizing $p(c_1, c_2, \ldots, c_M)$ is the same as minimizing the l1 norm."

Bogus reason.
Real reason: sparse decompositions.
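For completeness, the one-line MAP argument behind the "bogus reason" (standard reasoning, spelled out here rather than on the slide):

$\max_{c} \prod_{i=1}^{M} \tfrac{\lambda}{2} e^{-\lambda |c_i|}
\;\Longleftrightarrow\;
\min_{c} \left( -\log \prod_{i=1}^{M} \tfrac{\lambda}{2} e^{-\lambda |c_i|} \right)
\;\Longleftrightarrow\;
\min_{c} \; \lambda \sum_{i=1}^{M} |c_i| + \mathrm{const}
\;\Longleftrightarrow\;
\min_{c} \sum_{i=1}^{M} |c_i|$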
What does sparsity have to do with estimation/recovery?

$y = \begin{bmatrix} x_0 \\ \hat{x}_1 \end{bmatrix}, \qquad y = \sum_{i=1}^{M} c_i h_i, \qquad P_0 \sum_{i=1}^{M} c_i h_i = x_0 \;\;\Rightarrow\;\; y = \sum_{i=1}^{M} c_i(x_0)\, h_i$

1. Any such decomposition builds an adaptive linear estimate:
   $\hat{x}_1 = A x_0, \qquad A = A(x_0)$
2. In fact, "any" estimate can be written in this form.

Onur G. Guleryuz, "Nonlinear Approximation Based Image Recovery Using Adaptive Sparse Reconstructions and Iterated Denoising: Part I - Theory," IEEE Tr. on IP, in review.
http://eeweb.poly.edu/~onur (google: onur guleryuz).
The recovered signal must be sparse

3. The recovered signal becomes
   $y = \begin{bmatrix} I \\ A \end{bmatrix} x_0$,
   where $\begin{bmatrix} I \\ A \end{bmatrix}$ is $N \times n_0$ with $n_0 \leq N$. So $y$ is confined to an $n_0$-dimensional subspace of $\mathbb{R}^N$ (the complementary subspace has dimension $n_1 = N - n_0$), and $y$ has to be sparse:
   $y = \begin{bmatrix} x_0 \\ \hat{x}_1 \end{bmatrix} = \sum_{i=1}^{n_0} d_i g_i$

Onur G. Guleryuz, "Nonlinear Approximation Based Image Recovery Using Adaptive Sparse Reconstructions and Iterated Denoising: Part I - Theory," IEEE Tr. on IP, in review.
http://eeweb.poly.edu/~onur (google: onur guleryuz).
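A small numerical check of point 3 (illustrative only; the estimator A below is random, not one produced by the l1 program): any recovery of the form $y = [I;\, A]\, x_0$ has an exact $n_0$-term representation.

```python
# y = [I; A] x0 lies in an n0-dimensional subspace of R^N, so n0 terms suffice.
import numpy as np

rng = np.random.default_rng(1)
N, n0 = 12, 5
n1 = N - n0

A = rng.standard_normal((n1, n0))      # some adaptive linear estimator (stand-in)
B = np.vstack([np.eye(n0), A])         # the N x n0 matrix [I; A]

x0 = rng.standard_normal(n0)           # available pixels
y = B @ x0                             # recovered signal [x0; x_hat1]

G, _ = np.linalg.qr(B)                 # orthonormal basis g_1, ..., g_{n0} of range([I; A])
d = G.T @ y                            # n0 expansion coefficients
print(np.allclose(G @ d, y))           # True: y = sum_{i=1}^{n0} d_i g_i
```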
Who cares about y, what about the original x?

If successful prediction is possible, x also has to be ~sparse; i.e., if $\| x - y \|^2$ is small, then $x$ is ~sparse.

1. Predictable $\Rightarrow$ sparse.
2. Sparsity of x is not a bad leap of faith to make in estimation:
   if x is not sparse, we cannot estimate it well anyway.
   (Caveat: the data may be sparse, but not in the given basis.)

Onur G. Guleryuz, "Nonlinear Approximation Based Image Recovery Using Adaptive Sparse Reconstructions and Iterated Denoising: Part I - Theory," IEEE Tr. on IP, in review.
http://eeweb.poly.edu/~onur (google: onur guleryuz).
Why minimize the l1 norm?

$\min_{c} \sum_{i=1}^{M} |c_i| \quad \text{subject to} \quad P_0 \sum_{i=1}^{M} c_i h_i = x_0$

Under certain conditions the l1 problem gives the solution to the l0 problem:

$\min_{S(x_0)} \operatorname{card}(S(x_0)) \quad \text{subject to} \quad P_0 \sum_{i \in S(x_0)} c_i h_i = x_0$

Find the "most predictable"/sparsest expansion that agrees with the data.
(Solving the l1 problem is convex, not combinatorial.)

D. Donoho, M. Elad, and V. Temlyakov, "Stable Recovery of Sparse Overcomplete Representations in the Presence of Noise".
http://www-stat.stanford.edu/~donoho/reports.html
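To see why "combinatorial" matters, here is what attacking the l0 problem directly looks like: a brute-force search over supports. This is only a toy sketch (a hypothetical helper, exponential in M); it is exactly the cost the convex l1 relaxation avoids.

```python
# Smallest support S such that P0 * sum_{i in S} c_i h_i = x0 is satisfied exactly.
import itertools
import numpy as np

def l0_recover(P0H, x0, tol=1e-9):
    """P0H: (n0 x M) matrix whose columns are P0 h_i; x0: available data."""
    n0, M = P0H.shape
    for k in range(1, M + 1):                        # supports of growing size
        for S in itertools.combinations(range(M), k):
            A = P0H[:, list(S)]
            c_S = np.linalg.lstsq(A, x0, rcond=None)[0]
            if np.linalg.norm(A @ c_S - x0) <= tol:  # exact fit on available data
                c = np.zeros(M)
                c[list(S)] = c_S
                return c, S                          # sparsest consistent expansion
    return None, None
```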
Why minimize the l1 norm?

Experience from the statistics literature: the "lasso" is known to generate sparse expansions.

$\min_{c} \left\| P_0 \sum_{i=1}^{M} c_i h_i - x_0 \right\|^2 \quad \text{subject to} \quad \sum_{i=1}^{M} |c_i| \leq t$

R. Tibshirani, "Regression shrinkage and selection via the lasso," J. Royal. Statist. Soc. B, Vol. 58, No. 1, pp. 267-288.
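A minimal sketch of how such an l1-penalized fit can be computed with plain iterative soft-thresholding (ISTA), applied to the equivalent penalized form $\min_c \|P_0 H c - x_0\|^2 + \lambda \|c\|_1$; the matrix, data, and $\lambda$ below are illustrative stand-ins, not the wavelet setup of the paper.

```python
# ISTA for min ||A c - b||^2 + lam * ||c||_1, with A = P0 H and b = x0.
import numpy as np

def soft(z, t):
    """Soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, b, lam, n_iter=500):
    L = np.linalg.norm(A, 2) ** 2          # squared spectral norm (step-size control)
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ c - b)           # gradient of the quadratic term (up to a factor 2)
        c = soft(c - grad / L, lam / (2 * L))
    return c

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 32))          # stand-in for P0 H
b = rng.standard_normal(10)                # stand-in for x0
c = ista(A, b, lam=0.1)
print("nonzero coefficients:", int(np.sum(np.abs(c) > 1e-6)))
```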
Simulation Results

l1:
$\min_{c} \sum_{i=1}^{M} |c_i| \quad \text{subject to} \quad P_0 \sum_{i=1}^{M} c_i h_i = x_0$

vs.

Iterated Denoising (ID) with no layering and no selective thresholding:
Onur G. Guleryuz, "Nonlinear Approximation Based Image Recovery Using Adaptive Sparse Reconstructions and Iterated Denoising: Part II - Adaptive Algorithms," IEEE Tr. on IP, in review.
http://eeweb.poly.edu/~onur (google: onur guleryuz).

H: two-times expansive (M = 2N), real, dual-tree DWT. Real part of:
N. G. Kingsbury, "Complex wavelets for shift invariant analysis and filtering of signals," Appl. Comput. Harmon. Anal., 10(3):234-253, May 2002.
Simulation Results

[Figure: first example, panels: Original, l1: 23.49 dB, Missing, ID: 25.39 dB]
[Figure: second example, panels: Original, l1: 21.40 dB, Missing, ID: 30.38 dB]
Sparse Modeling Generates Non-Convex Problems

[Figure: pixel coordinates for a "two pixel" image (one available pixel, one missing pixel) with the available-pixel constraint, and the same picture in transform coordinates (c1, c2).]

"Sparse = non-convex", who cares. What about reality, natural images?

[Figure: a natural image written as a convex combination, $\alpha \cdot (\ldots) + (1 - \alpha) \cdot (\ldots) = (\ldots)$.]

Geometry

[Figure: the l1 ball against the available-pixel constraint in transform coordinates; Case 1, Case 2, Case 3. Labels: "Not sparse", "Bogus reason".]
Why not to minimize the l1 norm

What about all the optimality/sparsest results? Results such as D. Donoho et al., "Stable Recovery of Sparse Overcomplete Representations in the Presence of Noise", are very impressive, but they are tied closely to H providing the sparsest decomposition for x.

Here the missing data acts as overwhelming noise:

$w = x - \begin{bmatrix} x_0 \\ 0 \end{bmatrix} = \epsilon_1 + \epsilon_2$, where $x = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix}$ and $\begin{bmatrix} x_0 \\ 0 \end{bmatrix} \rightarrow y$ under the recovery,

with $\epsilon_1$ the modeling error and $\epsilon_2$ the error due to the missing data ($\|\epsilon_2\|^2 \approx n_1 \sigma^2 = a N \sigma^2$), so that

$\mathrm{mse}_{l_1}(x, y) = f(\epsilon_1, \epsilon_2)$.
Why not to minimize the l1 norm

$\min_{c} \sum_{i=1}^{M} |c_i| \quad \text{subject to} \quad P_0 \sum_{i=1}^{M} c_i h_i = x_0$

(The problem is due to the constraint, $\sum_{i=1}^{M} c_i P_0 h_i = x_0$.)

$H = [\, h_1 \;\; h_2 \;\; \ldots \;\; h_M \,]$ : a "nice", "decoherent" basis.

$P_0 H = \begin{bmatrix} \tilde{h}_1 \;\; \tilde{h}_2 \;\; \ldots \;\; \tilde{h}_M \\ 0 \end{bmatrix}$ : a "not nice" basis (due to cropping) that may become very "coherent".
Examples

$H = \begin{bmatrix} 1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6} \\ 1/\sqrt{3} & 0 & -2/\sqrt{6} \\ 1/\sqrt{3} & -1/\sqrt{2} & 1/\sqrt{6} \end{bmatrix}$ : orthonormal, coherency = 0.

$P_0 H = \begin{bmatrix} 1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ : unnormalized coherency = $1/\sqrt{6}$, normalized coherency = 1 (worst possible).

1. The optimal solution sometimes tries to make the coefficients of the scaling functions zero.
2. The l1 solution never sees the actual problem.
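A quick numerical check of this example (an illustrative script, not part of the paper): compute the mutual coherency of H and of the cropped P0 H.

```python
# Verify the coherency numbers for the 3x3 example above.
import numpy as np

s3, s2, s6 = np.sqrt(3.0), np.sqrt(2.0), np.sqrt(6.0)
H = np.array([[1/s3,  1/s2,  1/s6],
              [1/s3,  0.0,  -2/s6],
              [1/s3, -1/s2,  1/s6]])
P0 = np.diag([1.0, 0.0, 0.0])            # only the first pixel is available
P0H = P0 @ H

def coherency(A, normalize):
    """Largest |inner product| between distinct columns of A."""
    if normalize:
        A = A / np.linalg.norm(A, axis=0)
    G = np.abs(A.T @ A)
    np.fill_diagonal(G, 0.0)
    return G.max()

print(coherency(H, normalize=True))      # ~0 (orthonormal)
print(coherency(P0H, normalize=False))   # 0.408... = 1/sqrt(6)
print(coherency(P0H, normalize=True))    # 1.0 (worst possible)
```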
What does ID do?

Start: $w = x - \begin{bmatrix} x_0 \\ 0 \end{bmatrix} = \epsilon_1 + \epsilon_2, \qquad \begin{bmatrix} x_0 \\ 0 \end{bmatrix} \rightarrow y$

Progression 1: $w = x - \begin{bmatrix} x_0 \\ * \end{bmatrix}, \qquad \epsilon_1 \rightarrow \tilde{\epsilon}_1, \;\; \epsilon_2 \rightarrow \tilde{\epsilon}_2$

Progression 2: ...

• Decomposes the big problem into many progressions.
• Arrives at the final complex problem by solving much simpler problems.
• l1 is conceptually a single-step, greedy version of ID.
• Uses correctly modeled components to reduce the overwhelming errors/"noise".
ID is all about robustly selecting sparsity
• Tries to be sparse, not the sparsest.
• Robust to model failures.
• Other constraints are easy to incorporate.
Conclusion
1. Have to be more agnostic than smoothest, sharpest, smallest, sparsest, *est:
   minimum mse is not necessarily = sparsest.
2. Have to be more robust to modeling errors.
   When a convex approximation to the underlying non-convex problem is possible, great. But we have to make sure the assumptions are not violated.
For lasso / l1 fans:

3. Is it still possible to use l1, but with ID principles? Yes.

Start from

$\min_{c} \sum_{i=1}^{M} |c_i| \quad \text{subject to} \quad P_0 \sum_{i=1}^{M} c_i h_i = x_0$

but instead solve

$\min_{c} \sum_{i=1}^{M} |c_i| \quad \text{subject to} \quad \left\| \sum_{i=1}^{M} c_i h_i - \begin{bmatrix} x_0 \\ 0 \end{bmatrix} \right\|_2 \leq T$

(the vector $[x_0;\, 0]$ is the available data) and set $y = \sum_{i=1}^{M} c_i h_i$.

Do you think you reduced the mse? No: you shouldn't have done this. Yes: do it again,

$\min_{c} \sum_{i=1}^{M} |c_i| \quad \text{subject to} \quad \left\| \sum_{i=1}^{M} c_i h_i - y \right\|_2 \leq T$

updating $y = \sum_{i=1}^{M} c_i h_i$, and so on.

But one must ensure no Case 3 problems arise (ID stays away from those).
1. It's not about the lasso or how you tighten the lasso, it's about what (plural) you tighten the lasso to.
2. This is not "LASSO", "LARS", .... This is Iterated Denoising (use hard thresholding!).
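To make the last point concrete, here is a schematic sketch of an iterated-denoising-style loop: hard thresholding in a transform, re-imposing the available pixels, and lowering the threshold across progressions. This is only the flavor of the idea under simplifying assumptions: a plain 2-D DCT stands in for the overcomplete dual-tree DWT, and there is no layering or selective thresholding as in the actual ID algorithm.

```python
# Schematic iterated-denoising-style recovery (illustrative, not the authors' code).
import numpy as np
from scipy.fft import dctn, idctn

def recover(x_obs, known_mask, T0=1.0, T_min=0.05, decay=0.8, iters_per_T=10):
    """x_obs: image with missing pixels set to 0; known_mask: True where available."""
    y = x_obs.copy()
    T = T0
    while T > T_min:                           # progressions: decreasing threshold
        for _ in range(iters_per_T):
            C = dctn(y, norm="ortho")          # analysis transform
            C[np.abs(C) < T] = 0.0             # hard thresholding (enforce sparsity)
            y = idctn(C, norm="ortho")         # synthesis: denoised estimate
            y[known_mask] = x_obs[known_mask]  # re-impose the available data
        T *= decay
    return y

# Usage sketch on random data (purely to show the call pattern):
rng = np.random.default_rng(3)
img = rng.standard_normal((16, 16))
mask = rng.random((16, 16)) > 0.25             # ~75% of the pixels are available
filled = recover(np.where(mask, img, 0.0), mask)
```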