Transcript Document
Pixel Recovery via l1 Minimization in the Wavelet Domain

Ivan W. Selesnick*, Richard Van Slyke*, and Onur G. Guleryuz# (presenting author)
*: Polytechnic University, Brooklyn, NY
#: DoCoMo Communications Laboratories USA, Inc., San Jose, CA

Overview
• Problem statement: estimation/recovery of missing data.
• Formulation as a linear expansion over an overcomplete basis.
• Expansions that minimize the l1 norm.
• Why do this?
• Connections to adaptive linear estimators and sparsity.
• Connections to recent results and statistics.
• Simulation results and comparisons to our earlier work.
• Why not to do this: analysis of what is going on.
• Conclusion and ways of modifying the solutions for better results.
(The presentation is much more detailed than the paper.)
(Some software is available; please check the paper.)

Problem Statement
1. Original image $x = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix}$, where $x_0$ are the available pixels ($n_0$ of them) and $x_1$ the lost pixels ($n_1$ of them), with $n_0 + n_1 = N$ (assume zero mean).
2. A block of pixels is lost.
3. Derive the prediction $y = \begin{bmatrix} x_0 \\ \hat{x}_1 \end{bmatrix}$.

Formulation
$y = \begin{bmatrix} x_0 \\ \hat{x}_1 \end{bmatrix}$, with $P_0 y = x_0$, where $P_0$ is the available-data projection.
1. Take an $N \times M$ matrix of overcomplete basis vectors, $M \ge N$: $H = [h_1\ h_2\ \ldots\ h_M]$.
2. Write $y$ in terms of the basis: $y = Hc = \sum_{i=1}^{M} c_i h_i$.
3. Find the expansion coefficients (two ways).

Find the expansion coefficients to minimize the l1 norm
$$\min_c \sum_{i=1}^{M} |c_i| \quad \text{subject to} \quad P_0 \sum_{i=1}^{M} c_i h_i = x_0$$
The objective is the l1 norm of the expansion coefficients (the regularization); the constraint is the available-data constraint.

Why minimize the l1 norm?
$$\min_c \sum_{i=1}^{M} |c_i| \quad \text{subject to} \quad P_0 \sum_{i=1}^{M} c_i h_i = x_0$$
Under an i.i.d. Laplacian model for the coefficient probabilities, $p(c_i) = \frac{\lambda}{2} e^{-\lambda |c_i|}$, maximizing $p(c_1, c_2, \ldots, c_M)$ is the same as minimizing the l1 norm. Bogus reason.
Real reason: sparse decompositions.

What does sparsity have to do with estimation/recovery?
$y = \sum_{i=1}^{M} c_i h_i$ with $P_0 \sum_{i=1}^{M} c_i h_i = x_0$, i.e., $y = \sum_{i=1}^{M} c_i(x_0)\, h_i$.
1. Any such decomposition builds an adaptive linear estimate: $\hat{x}_1 = A x_0$, with $A = A(x_0)$.
2. In fact, "any" estimate can be written in this form.
Onur G. Guleryuz, "Nonlinear Approximation Based Image Recovery Using Adaptive Sparse Reconstructions and Iterated Denoising: Part I - Theory," IEEE Tr. on IP, in review. http://eeweb.poly.edu/~onur (google: onur guleryuz).

The recovered signal must be sparse
3. The recovered signal becomes
$$y = \begin{bmatrix} x_0 \\ \hat{x}_1 \end{bmatrix} = \begin{bmatrix} I \\ A \end{bmatrix} x_0,$$
a mapping from $n_0$ dimensions into $N$, leaving a complementary space of dimension $n_1 = N - n_0$. So $y$ has to be sparse: $y = \sum_{i=1}^{n_0} d_i g_i$.
(Guleryuz, "Part I - Theory," reference above.)

Who cares about y, what about the original x?
If successful prediction is possible, $x$ also has to be approximately sparse: if $\|x - y\|_2$ is small, then $x$ is ~sparse.
1. Predictable ⇒ sparse.
2. Sparsity of $x$ is not a bad leap of faith to make in estimation: if $x$ is not sparse, it cannot be estimated well anyway. (Caveat: the data may be sparse, but not in the given basis.)
(Guleryuz, "Part I - Theory," reference above.)

Why minimize the l1 norm?
$$\min_c \sum_{i=1}^{M} |c_i| \quad \text{subject to} \quad P_0 \sum_{i=1}^{M} c_i h_i = x_0$$
Under certain conditions the l1 problem gives the solution to the l0 problem:
$$\min_{S(x_0)} \operatorname{card}(S(x_0)) \quad \text{subject to} \quad P_0 \sum_{i \in S(x_0)} c_i h_i = x_0$$
Find the "most predictable"/sparsest expansion that agrees with the data. (Solving the l1 problem is convex, not combinatorial.)
D. Donoho, M. Elad, and V. Temlyakov, "Stable Recovery of Sparse Overcomplete Representations in the Presence of Noise." http://www-stat.stanford.edu/~donoho/reports.html
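As a concrete illustration of the constrained l1 problem above, the sketch below rewrites it as a linear program and solves a tiny instance. The random unit-norm dictionary, the sizes, and the availability mask are illustrative assumptions, not the talk's actual setup (the talk uses an expansive dual-tree wavelet H; see the paper for the actual software).

```python
# Minimal sketch of  min ||c||_1  s.t.  P0 H c = x0,  recast as a linear program.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, M = 16, 32                           # signal length and dictionary size (M >= N)
H = rng.standard_normal((N, M))
H /= np.linalg.norm(H, axis=0)          # unit-norm atoms h_1 ... h_M

x = H[:, :4] @ rng.standard_normal(4)   # a signal that really is sparse in H
avail = np.arange(N) < 10               # P0: the first 10 samples are available
x0 = x[avail]

# Split c = u - v with u, v >= 0, so that ||c||_1 = sum(u) + sum(v).
A_eq = np.hstack([H[avail], -H[avail]])
res = linprog(c=np.ones(2 * M), A_eq=A_eq, b_eq=x0, bounds=(0, None))
c_hat = res.x[:M] - res.x[M:]

y = H @ c_hat                           # y agrees with x0 on the available samples
print("error on missing samples:", np.linalg.norm(y[~avail] - x[~avail]))
```

For image-sized problems one would use a dedicated basis-pursuit solver rather than a dense LP, but the small case shows the mechanics of the formulation.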
Why minimize the l1 norm?
Experience from the statistics literature: the "lasso" is known to generate sparse expansions.
$$\min_c \Big\| P_0 \sum_{i=1}^{M} c_i h_i - x_0 \Big\|_2 \quad \text{subject to} \quad \sum_{i=1}^{M} |c_i| \le t$$
R. Tibshirani, "Regression shrinkage and selection via the lasso," J. Royal Statist. Soc. B, Vol. 58, No. 1, pp. 267-288, 1996.

Simulation Results
l1:
$$\min_c \sum_{i=1}^{M} |c_i| \quad \text{subject to} \quad P_0 \sum_{i=1}^{M} c_i h_i = x_0$$
vs. Iterated Denoising (ID) with no layering and no selective thresholding:
Onur G. Guleryuz, "Nonlinear Approximation Based Image Recovery Using Adaptive Sparse Reconstructions and Iterated Denoising: Part II - Adaptive Algorithms," IEEE Tr. on IP, in review. http://eeweb.poly.edu/~onur (google: onur guleryuz).
H: two-times expansive (M = 2N), real, dual-tree DWT; the real part of:
N. G. Kingsbury, "Complex wavelets for shift invariant analysis and filtering of signals," Appl. Comput. Harmon. Anal., 10(3):234-253, May 2001.

Simulation Results
[Figure: two test blocks, each shown as Original / Missing / l1 recovery / ID recovery. Example 1: l1 23.49 dB, ID 25.39 dB. Example 2: l1 21.40 dB, ID 30.38 dB.]

Sparse Modeling Generates Non-Convex Problems
[Figure: a "two pixel" image with one available and one missing pixel, shown in pixel coordinates (with the available-pixel constraint) and in transform coordinates (c1, c2).]
"Sparse = non-convex", who cares. What about reality, natural images?
[Figure: a convex combination of two natural images, alpha x (image) + (1 - alpha) x (image) = (image).]

Geometry
[Figure: the l1 ball against the available-data constraint in transform coordinates; Case 1, Case 2, Case 3 (not sparse).]
Bogus reason.

Why not to minimize the l1 norm
What about all the optimality/sparsest results? Results such as D. Donoho et al., "Stable Recovery of Sparse Overcomplete Representations in the Presence of Noise," are very impressive, but they are tied closely to $H$ providing the sparsest decomposition for $x$.
Overwhelming noise:
$$x = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix}, \qquad y = \begin{bmatrix} x_0 \\ \hat{x}_1 \end{bmatrix}$$
The recovery faces two error terms: $\sigma_1$, the modeling error ($x_0 \to x_0 + w$, since the sparse model holds only approximately), and $\sigma_2$, the error due to the missing data ($\sigma_2^2$ grows with $n_1$ and can be overwhelming), with
$$\mathrm{mse}_{\ell_1}(x, y) = f(\sigma_1, \sigma_2).$$

Why not to minimize the l1 norm
$$\min_c \sum_{i=1}^{M} |c_i| \quad \text{subject to} \quad P_0 \sum_{i=1}^{M} c_i h_i = x_0 \qquad (\text{the problem is due to } \sigma_2)$$
$$H = [h_1\ h_2\ \ldots\ h_M] \;\longrightarrow\; P_0 H = [\tilde{h}_1\ \tilde{h}_2\ \ldots\ \tilde{h}_M], \qquad \sum_{i=1}^{M} c_i P_0 h_i = x_0$$
$H$: a "nice", "decoherent" basis. $P_0 H$: a "not nice" basis (due to cropping), which may become very "coherent".

Examples
$$H = \begin{bmatrix} 1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6} \\ 1/\sqrt{3} & 0 & -2/\sqrt{6} \\ 1/\sqrt{3} & -1/\sqrt{2} & 1/\sqrt{6} \end{bmatrix}, \qquad P_0 H = \begin{bmatrix} 1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
$H$ is orthonormal, coherency = 0. $P_0 H$: unnormalized coherency = $1/\sqrt{6}$, normalized coherency = 1 (the worst possible).
1. The optimal solution sometimes tries to make the coefficients of scaling functions zero.
2. The l1 solution never sees the actual problem.

What does ID do?
$x = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix}$, $y = \begin{bmatrix} x_0 \\ \hat{x}_1 \end{bmatrix}$, with the error again split into the modeling part ($\sigma_1$) and the missing-data part ($\sigma_2$), handled over Progression 1, Progression 2, ...
• Decomposes the big problem into many progressions.
• Arrives at the final complex problem by solving much simpler problems.
• l1 is conceptually a single-step, greedy version of ID.
Uses correctly modeled components to reduce the overwhelming errors/"noise" ...

ID is all about robustly selecting sparsity
• Tries to be sparse, not the sparsest.
• Robust to model failures.
• Other constraints are easy to incorporate.
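Before the conclusion, the coherency numbers on the "Examples" slide above are easy to verify numerically. The few lines below are a hypothetical helper, not part of the talk's software; the signs in H are one orthonormal completion consistent with the slide's magnitudes (the cropped coherencies do not depend on them).

```python
# Coherence of the 3x3 orthonormal basis from the "Examples" slide, before and
# after the available-data projection P0 (only the first pixel survives).
import numpy as np

H = np.array([[1/np.sqrt(3),  1/np.sqrt(2),  1/np.sqrt(6)],
              [1/np.sqrt(3),  0.0,          -2/np.sqrt(6)],
              [1/np.sqrt(3), -1/np.sqrt(2),  1/np.sqrt(6)]])
P0 = np.diag([1.0, 0.0, 0.0])

def coherence(A, normalize):
    """Largest |<a_i, a_j>| over distinct columns (optionally unit-normalized)."""
    if normalize:
        A = A / np.linalg.norm(A, axis=0)
    G = np.abs(A.T @ A)
    np.fill_diagonal(G, 0.0)
    return G.max()

print(coherence(H, normalize=True))        # ~0       (orthonormal, coherency = 0)
print(coherence(P0 @ H, normalize=False))  # 0.408... (= 1/sqrt(6), unnormalized)
print(coherence(P0 @ H, normalize=True))   # 1.0      (worst possible)
```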
Conclusion
1. Have to be more agnostic than smoothest, sharpest, smallest, sparsest, *est: minimum mse is not necessarily = sparsest.
2. Have to be more robust to modeling errors. When a convex approximation to the underlying non-convex problem is possible, great; but one has to make sure the assumptions are not violated.
For lasso / l1 fans:
3. Is it still possible to use l1, but with ID principles? Yes. Instead of the one-shot problem
$$\min_c \sum_{i=1}^{M} |c_i| \quad \text{subject to} \quad P_0 \sum_{i=1}^{M} c_i h_i = x_0,$$
solve
$$\min_c \sum_{i=1}^{M} |c_i| \quad \text{subject to} \quad \Big\| P_0 \sum_{i=1}^{M} c_i h_i - x_0 \Big\|_2 \le T,$$
set $y = \sum_{i=1}^{M} c_i h_i$ and re-impose the available data. Do you think you reduced the mse? No: you shouldn't have done this. Yes: do it again, now against the full current estimate,
$$\min_c \sum_{i=1}^{M} |c_i| \quad \text{subject to} \quad \Big\| \sum_{i=1}^{M} c_i h_i - y \Big\|_2 \le T,$$
update $y$, ...
But one must ensure there are no Case 3 problems (ID stays away from those).
1. It's not about the lasso or how you tighten the lasso; it's about what (plural) you tighten the lasso to.
2. This is not "LASSO", "LARS", .... This is Iterated Denoising (use hard thresholding!).
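A sketch of the modified iteration in point 3, assuming cvxpy as the convex solver and reusing H, x0, and avail from the earlier linprog sketch; the tolerance T, the pass count, and the stopping rule are placeholders for the mse check the slide describes.

```python
# Sketch of the l1-with-ID-principles loop above: relax the data constraint to an
# l2 ball, re-impose the available pixels, and repeat against the current estimate.
# cvxpy, T, and n_passes are assumptions; the "did mse go down?" check is the user's.
import numpy as np
import cvxpy as cp

def l1_id_passes(H, x0, avail, T=0.1, n_passes=3):
    N, M = H.shape
    A, target = H[avail], x0            # pass 1: constrain only the available data
    y = None
    for _ in range(n_passes):
        c = cp.Variable(M)
        cp.Problem(cp.Minimize(cp.norm1(c)),
                   [cp.norm(A @ c - target, 2) <= T]).solve()
        y = H @ c.value
        y[avail] = x0                   # re-impose the available data
        A, target = H, y                # later passes: pull toward the full estimate
        # (the slide's "Do you think you reduced mse?" check would go here)
    return y
```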
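And, per the final point, a bare-bones hard-thresholding version of the same recipe in the spirit of Iterated Denoising: no layering, no selective thresholding, an orthonormal DCT standing in for the dual-tree wavelets, and an illustrative threshold schedule. This 1-D sketch is an assumption-laden stand-in, not the paper's algorithm.

```python
# Iterated hard thresholding sketch: denoise in a transform domain, re-impose
# the known samples, and repeat with a slowly decreasing threshold.
import numpy as np
from scipy.fft import dct, idct

def id_sketch(x0, avail, n_iters=50, T0=1.0, Tmin=0.05):
    y = np.zeros(avail.size)
    y[avail] = x0                              # initial fill: zeros in the hole
    for T in np.linspace(T0, Tmin, n_iters):
        c = dct(y, norm="ortho")               # analysis
        c[np.abs(c) < T] = 0.0                 # hard thresholding (the "denoising")
        y = idct(c, norm="ortho")              # synthesis
        y[avail] = x0                          # available-data constraint
    return y
```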