Non-local means: a look at non-local self-similarity

Non-local means: a look at non-local self-similarity of images
IT 530, LECTURE NOTES
Partial Differential Equations (PDEs):
Heat Equation
$$\frac{\partial I}{\partial t} = \operatorname{div}(\nabla I) = I_{xx} + I_{yy}$$

$$I^{(t+1)} = I^{(t)} + \Delta t \left( I^{(t)}_{xx} + I^{(t)}_{yy} \right), \qquad I^{(0)} = \text{noisy image}$$
• Inspired by thermodynamics
• Blurs out edges
Executing several iterations of this PDE on a noisy image is equivalent to convolving the same image with a Gaussian! The variance (σ²) of the Gaussian is directly proportional to the number of time-steps of the PDE.
PDEs: Anisotropic Diffusion
I (t 1)  I (t )  t ((I xx  I yy ) g (| I |)  g.I )
• Diffusivity function “g”.
• Decreasing function of gradient
magnitude.
• Preserves edges: diffusion occurs along edges, not across them.
Several papers: Perona and Malik [IEEE PAMI 1990],
Total variation method [Rudin et al, 1992], Beltrami flow
[Sochen et al, IEEE TIP 1998], etc.
PDEs: Total Variation
• Total variation denoising seeks to minimize
the following energy functional:
$$E(I(x,y)) = \int |\nabla I(x,y)|\, dx\, dy$$

$$\frac{\partial I}{\partial t} = \frac{I_{xx} I_y^2 - 2 I_{xy} I_x I_y + I_{yy} I_x^2}{(I_x^2 + I_y^2)^{3/2}}$$
Euler-Lagrange equation (a partial differential equation): exhibits anisotropic behaviour due to the gradient-magnitude term in the denominator. Diffusion is low across strong edges.
[Figure: denoising results compared across the heat equation, the Perona-Malik PDE, and total variation]
Neighborhood Filters for Denoising
$$\hat{I}(x,y) = \frac{1}{|N(x,y)|} \sum_{j \in N(x,y)} I(x_j, y_j)$$
Simple averaging filter – will
cause blurring of edges and
textures in the image
[Figure: denoising with a neighborhood filter]
Neighborhood Filters for Denoising:
Lee Filter
• Weigh the pixels in the neighborhood by
factors inversely proportional to the distance
between the central pixel and the particular
pixel used for weighting.
• This is expressed as:
$$\hat{I}(x,y) = \frac{\sum_{j \in N(x,y)} I(x_j, y_j)\, e^{-((x - x_j)^2 + (y - y_j)^2)/(2\sigma^2)}}{\sum_{j \in N(x,y)} e^{-((x - x_j)^2 + (y - y_j)^2)/(2\sigma^2)}}$$

More weight is given to nearby pixels.
Anisotropic Neighborhood Filter
(Yaroslavsky Filter)
• Weigh the pixels in the neighborhood by
factors inversely proportional to the difference
between the intensity values at those pixels
and the intensity value of the pixel to be
denoised.
• This is expressed as:
More weight to
pixels with similar
intensity values:
better preservation
of edges/boundaries
$$\hat{I}(x,y) = \frac{\sum_{j \in N(x,y)} I(x_j, y_j)\, e^{-(I(x,y) - I(x_j,y_j))^2/(2\sigma^2)}}{\sum_{j \in N(x,y)} e^{-(I(x,y) - I(x_j,y_j))^2/(2\sigma^2)}}$$
Bilateral Filter (Lee+Yaroslavsky Filter)
• Weigh the pixels in the neighborhood by
factors inversely proportional to the difference
between the intensity values at those pixels
and the intensity value of the pixel to be
denoised, and the difference in pixel locations.
• This is expressed as:
More weight to pixels that are nearby and have similar intensity values: better preservation of edges/boundaries.

$$\hat{I}(x,y) = \frac{\sum_{j \in N(x,y)} I(x_j, y_j)\, \exp\left(-\dfrac{(x - x_j)^2 + (y - y_j)^2 + (I(x,y) - I(x_j,y_j))^2}{2\sigma^2}\right)}{\sum_{j \in N(x,y)} \exp\left(-\dfrac{(x - x_j)^2 + (y - y_j)^2 + (I(x,y) - I(x_j,y_j))^2}{2\sigma^2}\right)}$$
Comparative Results
• The anisotropic diffusion algorithm performs
better than the others.
• In the Yaroslavsky/Bilateral filter, the comparison
between the intensity values is not very robust.
This creates artifacts around the edges.
• Performance difference between Yaroslavsky and
bilateral filter is minor.
• All the aforementioned filters are based on the principle of piece-wise constant intensity images.
Non-local self-similarity
Non-local self-similarity
is very useful in
denoising (and almost
everything else in image
processing).
For denoising, you could
simply take an average
of all those patches that
were “similar” (modulo
noise).
Non-local Means
Natural images have a great
deal of redundancy: patches
from different regions can be
very similar
NL-Means: a non-local pixel-based method
(Buades et al, 2005)
•Awate and Whitaker (PAMI 2007)
•Popat and Picard (TIP 1998)
•De-Bonet (MIT Tech report 1998)
•Wang et al (IEEE SPL 2003)
[Figure: patches from different regions of an image, annotated with the difference between patches]
Non-local means: Basic Principle
• Non-local means compares entire patches (not
individual pixel intensity values) to compute
weights for denoising pixel intensities.
• Comparison of entire patches is more robust,
i.e. if two patches are similar in a noisy image,
they will be similar in the underlying clean
image with very high probability.
• We will see this informally and prove it
mathematically in due course.
Non-local means: Variant
$$\hat{I}(x_i, y_i) = \frac{\sum_{(x_j, y_j)} w_j\, I(x_j, y_j)}{\sum_{(x_j, y_j)} w_j}$$

The Euclidean distance between two patches is weighted by a Gaussian with maximum weight at the center of the two patches, decaying outwards:

$$w_j = \exp\left(-\frac{\sum_{t,u = -p/2}^{p/2} G(u,t)\, \big(I(x_i + u, y_i + t) - I(x_j + u, y_j + t)\big)^2}{\sum_{t,u = -p/2}^{p/2} G(u,t)}\right)$$

$$G(u,t) = \exp\left(-\frac{u^2 + t^2}{2\sigma^2}\right)$$
Three principles to evaluate denoising
algorithms
• (1): The residual image (also called “method noise”) –
defined as the difference between the noisy image and
the denoised image – should look like (and have all the
properties of) a pure noise image.
• (2): A denoising algorithm should transform a pure
noise image into another noise image (of lower
variance).
• (3): A competent denoising algorithm should find for
any pixel ‘i’, all and only those pixels ‘j’ that have the
same model as ‘i’ (i.e. those pixels whose intensity
would have most likely been the same as that of ‘i’, if
there were no noise).
Principle 1: Residual Image
Principle 2: Noise to noise
Principle 3: Correct models?
The pixels with high weight in anisotropic diffusion or bilateral filters do NOT line up with our expectation (in all images!). This is because noise affects the gradient computation or the single-intensity-driven weights.
In NL-means, the comparison between patches is MUCH more robust to noise!
Non-local means: Implementation
details
• A drawback of the algorithm is its very high
time complexity – O(N x N) for an image with
N pixels.
• Heuristic work-around: given a reference patch, restrict the search for similar patches to a window of size S x S (called the "search zone") around the center of the reference patch.
Non-local means: Implementation details
• The parameter sigma to compute the weights will
depend on the noise variance. Heuristic relation
is:
  k n , k [0.75,1)
• Patch-size is a free parameter – usually some size
between 7 x 7 and 21 x 21 is chosen. Larger
patch-size – better discrimination of the truly
similar patches, but more expensive and more
(over)smoothing.
• Smaller patch-size – less smoothing.
Patch-size selection
Patch-size too small: mottling effect (fake
edges/patterns in constant intensity
regions)
Patch-size too large: oversmoothing of
subtle textures and edges
Ref: Duval and Gousseau, “A bias-variance approach for the non-local means”
[Figure: a gray region containing patch P, a black region containing patch Q, and a noisy gray region containing patch U(x). Ref: Duval and Gousseau, "A bias-variance approach for the non-local means"]
Assume the patch-size is s x s, the noise values are σ n_i with the n_i drawn i.i.d. from N(0,1), and the gray and black regions differ by a constant intensity Δ, so that U(x) = P + σ n. Then

$$\|U(x) - P\|^2 = \sigma^2 \sum_{i=1}^{s^2} n_i^2$$

$$\|U(x) - Q\|^2 = \sum_{i=1}^{s^2} (\Delta + \sigma n_i)^2 = \sum_{i=1}^{s^2} \left(\Delta^2 + 2\Delta\sigma n_i + \sigma^2 n_i^2\right)$$

$$P\big(\|U(x) - Q\|^2 < \|U(x) - P\|^2\big) = P\left(s^2 \Delta^2 + 2\Delta\sigma \sum_{i=1}^{s^2} n_i < 0\right) = P\left(\frac{1}{s} \sum_{i=1}^{s^2} n_i < -\frac{s\Delta}{2\sigma}\right)$$

Here (1/s) Σ n_i is a zero-mean Gaussian random variable with variance 1, so

$$P\left(\frac{1}{s} \sum_{i=1}^{s^2} n_i < -\frac{s\Delta}{2\sigma}\right) = \int_{-\infty}^{-s\Delta/(2\sigma)} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = \frac{1}{2}\operatorname{erfc}\left(\frac{s\Delta}{2\sqrt{2}\,\sigma}\right)$$

By definition of erfc, this probability decreases as s increases. Discriminability improves as the patch-size increases! This explains why NL-means outperforms single-pixel neighborhood filters!
Extension to Video denoising
• For video-denoising, simply denoising each
individual frame independently ignores temporal
similarity or redundancy.
• Most video denoising algorithms first perform a
motion compensation step: (1) estimate the
motion between consecutive frames, and (2)
align each successive frame to its previous frame.
• Motion estimation is performed typically by
exploiting the “brightness constancy
assumption”, i.e. that the intensity of any physical
point is unchanged throughout the video.
Extension to Video denoising
• The most popular motion compensation
algorithms also assume that the motion of
nearby pixels is similar (motion smoothness
assumption).
• You will study this in more detail in computer
vision: optical flow.
• Denoising is done after motion compensation
(assuming that pixels at the same coordinate in
successive frames will have same/similar
intensities).
Extension to Video denoising
• There are some problems in motion estimation, even more so if the video is noisy.
• One such issue is called the aperture problem
– for any block in one frame, there are many
matching blocks in the next frame.
Extension to Video denoising
• The motion smoothness assumption is one
way to alleviate the aperture problem (again,
you will study this in more detail in computer
vision).
• On the next slide, we will see the performance
of the Lee filter and the Yaroslavsky filter, with
and without motion compensation.
[Figure: comparative video denoising results; NL-means performs much better!]
NL-Means for video denoising
• Video data has tremendous redundancy (more than
individual frames).
• Any reference patch in one frame will have many
similar patches in other frames – the aperture problem
is NO problem for video denoising!
• So forget about motion compensation!
• Run NL-means on each frame, using similar patches
from that frame as well as from nearby frames.
• Advantages: avoids all the inevitable errors in motion
estimation, AND saves computational cost!
An information-theoretic (and
iterated) variant of NL-Means - UINTA
• UINTA = Unsupervised information-theoretic
adaptive filter.
• UINTA is again based on the principle of non-local similarity.
• It uses tools from information theory
(conditional entropy) and kernel density
estimation.
• Uses a simple observation about the entropy
of natural images.
Ref: Awate and Whitaker, "Higher-order image statistics for unsupervised, information-theoretic, adaptive image filtering"
Principle of UINTA
• The conditional entropy of the intensity of a
central pixel given its neighbors is low in a
“clean” natural image.
• As noise is added, this entropy increases.
[Figure: a central pixel X surrounded by neighborhood pixels y1, y2, ..., y24]

To denoise, you can minimize the following quantity at each pixel:

$$h(X = x \mid Y = y)$$
Overview of UINTA algorithm
• For each pixel location i, we seek to minimize
the following quantity:
$$h(X_i = x \mid Y_i)$$

• For this, do a gradient descent (at each location) until convergence:

$$x^{(t+1)} = x^{(t)} - \alpha\, \frac{\partial h(X_i = x \mid Y_i)}{\partial x}, \qquad x^{(0)} = \text{value of pixel } i \text{ in the noisy image}$$
Mathematical details
• For image neighborhoods with n pixels, we
first need to estimate probability density
functions of random variables having n
(or n-1) dimensions.
• Suppose each neighborhood, together with its central pixel, is denoted as Z = (X, Y).
• The expression for the PDF of Z is as follows:
$$p(Z = z_i) = \frac{1}{|A_i|} \sum_{j=1}^{|A_i|} G(z_i - z_j; \Psi)$$

$$\Psi = \text{diagonal covariance matrix with } \sigma^2 \text{ on the diagonal}$$
Mathematical Details
• The expression for the entropy is:
$$h(Z) = -E(\log p(Z)) \approx -\frac{1}{|T|} \sum_{i=1}^{|T|} \log\left(\frac{1}{|A_i|} \sum_{j=1}^{|A_i|} G(z_i - z_j; \Psi)\right)$$
• The gradient descent is given below. Let $z_i = (y_i, x_i)$, where $x_i$ is the central pixel to be denoised and $y_i$ is its neighborhood.

$$x_i^{(t+1)} = x_i^{(t)} - \alpha\, \frac{\partial h(X \mid Y = y_i)}{\partial x_i}$$

But $h(X \mid Y) = h(X, Y) - h(Y)$, and $h(Y)$ is independent of the value of $x$, so

$$\frac{\partial h(X \mid Y = y_i)}{\partial x_i} = -\frac{1}{|T|}\, \frac{\partial \log p(Z = z_i)}{\partial x_i}$$

Applying the chain rule, with $\left(\frac{\partial z_i}{\partial x_i}\right)^{T}$ a projection vector that extracts only the dimension corresponding to the central pixel:

$$\frac{\partial h(X \mid Y = y_i)}{\partial x_i} = \left(\frac{\partial z_i}{\partial x_i}\right)^{T} \frac{1}{|T|} \sum_{z_j \in A_i} \frac{G(z_i - z_j; \Psi)}{\sum_{z_k \in A_i} G(z_i - z_k; \Psi)}\, \Psi^{-1}(z_i - z_j) = \frac{1}{|T|} \sum_{z_j \in A_i} \frac{G(z_i - z_j; \Psi)\,(x_i - x_j)}{\sigma^2 \sum_{z_k \in A_i} G(z_i - z_k; \Psi)}$$

using $\left(\frac{\partial z_i}{\partial x_i}\right)^{T}(z_i - z_j) = x_i - x_j$.

Note! If you set the derivative of the conditional entropy to zero (you do this since you want to minimize the conditional entropy) and rearrange the terms, you get the NL-means update for denoising. So UINTA can be considered an iterated form of NL-means!

$$\frac{1}{|T|} \sum_{z_j \in A_i} \frac{G(z_i - z_j; \Psi)\,(x_i - x_j)}{\sigma^2 \sum_{z_k \in A_i} G(z_i - z_k; \Psi)} = 0 \;\Longrightarrow\; x_i = \frac{\sum_{z_j \in A_i} G(z_i - z_j; \Psi)\, x_j}{\sum_{z_k \in A_i} G(z_i - z_k; \Psi)}$$
Earlier work on non-local similarity
• A technique similar (in principle) to UINTA was
developed by Popat and Picard in 1997.
• A training set of clean and degraded images was
used to learn the joint probability density of
degraded neighborhoods and clean central pixels.
• Given a noisy image, a pixel value is restored
using an MAP estimate.
• Unlike UINTA, this method requires prior training.
Texture synthesis or completion:
another use of non-local similarity
Ref: Efros and Leung,
“Texture Synthesis by
Non-parametric sampling”
Remember: a texture image contains very high repetition of “similar” patches all
over!
Method:
• For every pixel (x,y) that needs to be filled, collect
valid neighboring intensity values.
• Search throughout the image to find “similar”
neighborhoods.
• Assign the intensity at (x,y) as some weighted
combination of such central pixel values.
• Free parameters: size of the neighborhood and
the definition of “similar neighborhoods”.
• For pseudo-code, see
http://graphics.cs.cmu.edu/people/efros/research/EfrosLeung.html
Some more results
Something similar in Natural Language
Processing
• Collect sequences of n consecutive words (or letters) from a large corpus of English text (e.g., newspapers, books, etc.).
• Compute the probability of occurrence of the
(n+1)-th word given a preceding sequence of n
words.
• Sampling from such a conditional probability table allows one to construct plausible English-like text.
Ref: Shannon, "A mathematical theory of communication", 1948