From Pixels to Features:
Review of Part 1
COMP 4900C
Winter 2008
Topics in part 1 – from pixels to features
• Introduction
• what is computer vision? its applications.
• Linear Algebra
• vectors, matrices, points, linear transformations, eigenvalues,
eigenvectors, least squares methods, singular value decomposition.
• Image Formation
• camera lens, pinhole camera, perspective projection.
• Camera Model
• coordinate transformation, homogeneous coordinate,
intrinsic and extrinsic parameters, projection matrix.
• Image Processing
• noise, convolution, filters (average, Gaussian, median).
• Image Features
• image derivatives, edge, corner, line (Hough transform), ellipse.
General Methods
• Mathematical formulation
• Camera model, noise model
• Treat images as functions: I = f(x, y)
• Model intensity changes with derivatives: ∇f = [I_x, I_y]^T
• Approximate derivatives with finite differences.
• First-order approximation:
  I(i+u, j+v) ≈ I(i, j) + I_x u + I_y v = I(i, j) + [u v] ∇f
• Parameter fitting – solving an optimization problem
Vectors and Points
We use vectors to represent points in 2 or 3 dimensions.
Given two points P(x1, y1) and Q(x2, y2), the vector from P to Q is

v = Q − P = [x2 − x1, y2 − y1]^T

The distance between the two points:

D = ||Q − P|| = sqrt( (x2 − x1)^2 + (y2 − y1)^2 )
Homogeneous Coordinates
Go one dimension higher:

[x, y]^T → [wx, wy, w]^T
[x, y, z]^T → [wx, wy, wz, w]^T

w is an arbitrary non-zero scalar; usually we choose w = 1.

From homogeneous coordinates to Cartesian coordinates:

[x1, x2, x3]^T → (x1/x3, x2/x3)
[x1, x2, x3, x4]^T → (x1/x4, x2/x4, x3/x4)
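The two conversions above can be sketched in a few lines of numpy (the function names are mine, introduced for illustration):

```python
import numpy as np

def to_homogeneous(p, w=1.0):
    """Scale a Cartesian point by w and append w as the extra coordinate."""
    return np.append(np.asarray(p, dtype=float) * w, w)

def to_cartesian(ph):
    """Divide by the last component to recover Cartesian coordinates."""
    ph = np.asarray(ph, dtype=float)
    return ph[:-1] / ph[-1]

p = [3.0, 4.0]
print(to_homogeneous(p, w=2.0))       # [6. 8. 2.] -- any non-zero w represents the same point
print(to_cartesian([6.0, 8.0, 2.0]))  # [3. 4.]
```

Note that [6, 8, 2] and [3, 4, 1] are the same homogeneous point; the representation is only defined up to a non-zero scale.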
2D Transformation with Homogeneous Coordinates
2D coordinate transformation:

p' = [cos θ  −sin θ; sin θ  cos θ] p + [Tx; Ty]

2D coordinate transformation using homogeneous coordinates:

[p'_x; p'_y; 1] = [cos θ  −sin θ  Tx; sin θ  cos θ  Ty; 0  0  1] [p_x; p_y; 1]
Eigenvalue and Eigenvector
We say that x is an eigenvector of a square matrix A if

A x = λ x

λ is called an eigenvalue and x is called an eigenvector.
The transformation defined by A changes only the magnitude of the vector x.

Example:

[3 2; 1 4] [1; 1] = [5; 5] = 5 [1; 1]   and   [3 2; 1 4] [2; −1] = [4; −2] = 2 [2; −1]

5 and 2 are eigenvalues, and [1, 1]^T and [2, −1]^T are eigenvectors.
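The example is easy to verify with numpy (np.linalg.eig returns unit-norm eigenvectors, so they match the slide's vectors only up to scale):

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [1.0, 4.0]])

# A x = lambda x holds for every eigenpair returned by eig
vals, vecs = np.linalg.eig(A)
for lam, x in zip(vals, vecs.T):
    assert np.allclose(A @ x, lam * x)

print(sorted(vals))               # the eigenvalues 2 and 5
print(A @ np.array([1.0, 1.0]))   # [5. 5.] = 5 * [1, 1]
print(A @ np.array([2.0, -1.0]))  # [4. -2.] = 2 * [2, -1]
```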
Symmetric Matrix
We say matrix A is symmetric if

A^T = A

Example: B^T B is symmetric for any B, because

(B^T B)^T = B^T (B^T)^T = B^T B

A symmetric matrix has to be a square matrix.
Properties of a symmetric matrix:
• it has real eigenvalues;
• its eigenvectors can be chosen to be orthonormal;
• B^T B has non-negative eigenvalues.
Orthogonal Matrix
A matrix A is orthogonal if

A^T A = I    or    A^T = A^{−1}

The columns of A are orthonormal.

Example:

A = [cos θ  −sin θ; sin θ  cos θ],   A^{−1} = A^T = [cos θ  sin θ; −sin θ  cos θ]
Least Square
When m > n for an m-by-n matrix A,

A x = b

has no solution in general.
In this case, we look for an approximate solution: the vector x such that

||A x − b||^2

is as small as possible.
This is the least squares solution.
Least Square
Least squares solution of the linear system of equations

A x = b

Normal equation:

A^T A x = A^T b

A^T A is square and symmetric.
The least squares solution

x = (A^T A)^{−1} A^T b

makes ||A x − b||^2 minimal.
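Both routes to the least squares solution can be compared on a small overdetermined system (the data below are made up for illustration):

```python
import numpy as np

# Overdetermined system: 3 equations, 2 unknowns (m > n), so Ax = b
# generally has no exact solution; minimise ||Ax - b||^2 instead.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Normal-equation solution: x = (A^T A)^{-1} A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Library solution (uses a more numerically stable factorisation)
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal)  # both routes give the same minimiser
print(x_lstsq)
```

In practice np.linalg.lstsq (or a QR/SVD factorisation) is preferred over forming A^T A explicitly, because A^T A squares the condition number of the problem.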
SVD: Singular Value Decomposition
An m×n matrix A can be decomposed into:

A = U D V^T

U is m×m, V is n×n, and both have orthonormal columns:

U^T U = I,   V^T V = I

D is an m×n diagonal matrix.

Example:

[2 0; 0 3; 0 0] = [1 0 0; 0 1 0; 0 0 1] [2 0; 0 3; 0 0] [1 0; 0 1]^T
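numpy's SVD reproduces the example (it returns V^T directly and lists the singular values in descending order, so the factors may differ from the slide's by a permutation and signs):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0],
              [0.0, 0.0]])

# full_matrices=True gives U as m x m and Vt as n x n
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n "diagonal" matrix D from the singular values
D = np.zeros_like(A)
D[:len(s), :len(s)] = np.diag(s)

assert np.allclose(U @ D @ Vt, A)          # A = U D V^T
assert np.allclose(U.T @ U, np.eye(3))     # U^T U = I
assert np.allclose(Vt @ Vt.T, np.eye(2))   # V^T V = I
print(s)  # singular values in descending order: [3. 2.]
```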
Pinhole Camera
Why Lenses?
Gather more light from each scene point.
Four Coordinate Frames
The camera model involves four coordinate frames: the world frame (Xw, Yw, Zw),
the camera frame (Xc, Yc, Zc) centred at the optical center, the image plane
frame (x, y) centred at the principal point, and the pixel frame (xim, yim).
The camera model is a transformation matrix that maps a world point Pw to its
pixel coordinates pim.
Perspective Projection
A world point P = (X, Y, Z) projects to the image point p = (x, y):

x = f X / Z,   y = f Y / Z

These equations are nonlinear. Using homogeneous coordinates, we have a linear relation:

[u; v; w] = [f 0 0 0; 0 f 0 0; 0 0 1 0] [X; Y; Z; 1]

with x = u/w and y = v/w.

(The principal axis passes through the optical center and the principal point,
perpendicular to the image plane.)
World to Camera Coordinate
Transformation between the camera and world coordinates:

X_c = R X_w + T

or, in homogeneous coordinates:

[Xc; Yc; Zc; 1] = [R T; 0 1] [Xw; Yw; Zw; 1]
Image Coordinates to Pixel Coordinates
x = (xim − ox) s_x,   y = (yim − oy) s_y

s_x, s_y : the pixel sizes; (ox, oy) : the principal point in pixel coordinates.

In homogeneous coordinates:

[xim; yim; 1] = [1/s_x 0 ox; 0 1/s_y oy; 0 0 1] [x; y; 1]
Put All Together – World to Pixel
[x1; x2; x3] = [1/s_x 0 ox; 0 1/s_y oy; 0 0 1] [f 0 0 0; 0 f 0 0; 0 0 1 0] [R T; 0 1] [Xw; Yw; Zw; 1]

             = [f/s_x 0 ox; 0 f/s_y oy; 0 0 1] [1 0 0 0; 0 1 0 0; 0 0 1 0] [R T; 0 1] [Xw; Yw; Zw; 1]

             = K [R T] [Xw; Yw; Zw; 1]

with xim = x1/x3 and yim = x2/x3.
Camera Intrinsic Parameters
K = [f/s_x 0 ox; 0 f/s_y oy; 0 0 1]

K is a 3x3 upper triangular matrix, called the Camera Calibration Matrix.
There are five intrinsic parameters:
(a) the pixel sizes in the x and y directions, s_x and s_y;
(b) the focal length f;
(c) the principal point (ox, oy), which is the point
where the optical axis intersects the image plane.
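Putting the intrinsic and extrinsic pieces together in numpy, with made-up parameter values (focal length, pixel size, principal point and pose below are purely illustrative):

```python
import numpy as np

# Hypothetical intrinsics: f = 0.05 m, square 10-micron pixels, principal point (320, 240)
f, sx, sy, ox, oy = 0.05, 1e-5, 1e-5, 320.0, 240.0
K = np.array([[f / sx, 0.0,    ox],
              [0.0,    f / sy, oy],
              [0.0,    0.0,    1.0]])

# Hypothetical extrinsics: identity rotation, world origin 2 m in front of the camera
R = np.eye(3)
T = np.array([0.0, 0.0, 2.0])

M = K @ np.hstack([R, T.reshape(3, 1)])   # 3x4 projection matrix M = K [R | T]

Pw = np.array([0.1, 0.2, 0.0, 1.0])       # world point in homogeneous coordinates
x1, x2, x3 = M @ Pw
xim, yim = x1 / x3, x2 / x3               # divide by the third coordinate -> pixels
print(xim, yim)                            # 570.0 740.0
```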
Extrinsic Parameters
p_im = [x1; x2; x3] = K [R T] [Xw; Yw; Zw; 1] = M [Xw; Yw; Zw; 1]
[R|T] defines the extrinsic parameters.
The 3x4 matrix M = K[R|T] is called the projection matrix.
Image Noise
Additive and random noise:
Î(x, y) = I(x, y) + n(x, y)
I(x,y) : the true pixel values
n(x,y) : the (random) noise at pixel (x,y)
Gaussian Distribution
Single variable:

p(x) = (1 / (sqrt(2π) σ)) e^{−(x − μ)^2 / (2σ^2)}
Gaussian Distribution
Bivariate, with zero mean and variance σ^2:

G(x, y) = (1 / (2π σ^2)) exp( −(x^2 + y^2) / (2σ^2) )
Gaussian Noise
Gaussian noise is used to model additive random noise:
• the probability of n(x, y) is proportional to e^{−n^2 / (2σ^2)};
• each n(x, y) has zero mean;
• the noise at each pixel is independent.
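A quick numpy illustration of the model (the image size and σ = 5 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

I = np.full((64, 64), 100.0)           # constant "true" image
sigma = 5.0
n = rng.normal(0.0, sigma, I.shape)    # zero-mean, i.i.d. at every pixel
I_noisy = I + n                        # additive model: I^ = I + n

print(n.mean())   # close to 0
print(n.std())    # close to sigma
```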
Impulsive Noise
• Alters random pixels
• Makes their values very different from the true ones
Salt-and-Pepper Noise:
• Is used to model impulsive noise
I_sp(h, k) = I(h, k)                           if x(h, k) < l
             i_min + y(h, k)(i_max − i_min)    otherwise

x, y are uniformly distributed random variables in [0, 1];
l, i_min, i_max are constants.
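A sketch of this noise model in numpy (the constants l, i_min, i_max below are arbitrary example values):

```python
import numpy as np

rng = np.random.default_rng(1)

def salt_and_pepper(I, l=0.95, imin=0.0, imax=255.0, rng=rng):
    """With probability 1 - l, replace a pixel by a uniform value in [imin, imax]."""
    x = rng.uniform(size=I.shape)   # decides which pixels are corrupted
    y = rng.uniform(size=I.shape)   # value given to the corrupted pixels
    return np.where(x < l, I, imin + y * (imax - imin))

I = np.full((100, 100), 128.0)
I_sp = salt_and_pepper(I)
corrupted = (I_sp != I).mean()
print(corrupted)   # roughly 1 - l = 0.05
```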
Image Filtering
Modifying the pixels in an image based on some
function of a local neighbourhood of the pixels:
the output value f(p) at a pixel p is computed from
the values in its neighbourhood N(p).
Linear Filtering – convolution
The output is a linear combination of the neighbourhood pixels:

I_A(i, j) = (I * A)(i, j) = Σ_{h=−m/2}^{m/2} Σ_{k=−m/2}^{m/2} A(h, k) I(i − h, j − k)

The coefficients come from a constant m×m matrix A, called the kernel.
This process, denoted by '*', is called (discrete) convolution.
Example: image patch [1 3 0; 2 10 2; 4 1 1], kernel [1 0 −1; 1 0.1 −1; 1 0 −1].
Centring the kernel on the middle pixel and summing the elementwise products:

1·1 + 3·0 + 0·(−1) + 2·1 + 10·0.1 + 2·(−1) + 4·1 + 1·0 + 1·(−1) = 5

Filter output: 5
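The worked example can be reproduced in numpy. One caveat: summing elementwise products of the kernel and the patch, as the slide does, is cross-correlation; convolution as defined above would flip this (asymmetric) kernel first. For symmetric kernels such as the averaging and Gaussian kernels below, the two operations coincide.

```python
import numpy as np

patch = np.array([[1.0,  3.0, 0.0],
                  [2.0, 10.0, 2.0],
                  [4.0,  1.0, 1.0]])
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.1, -1.0],
                   [1.0, 0.0, -1.0]])

# Centre the kernel on the middle pixel and sum the elementwise products
out = float((patch * kernel).sum())
print(out)  # 5.0
```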
Smoothing by Averaging
Kernel: (1/9) [1 1 1; 1 1 1; 1 1 1]

Convolution can be understood as weighted averaging.
Gaussian Filter
G(x, y) = (1 / (2π σ^2)) exp( −(x^2 + y^2) / (2σ^2) )

Discrete Gaussian kernel:

G(h, k) = (1 / (2π σ^2)) e^{−(h^2 + k^2) / (2σ^2)}

where G(h, k) is an element of an m×m array.
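A sketch of building the discrete kernel. Normalising the samples so they sum to 1 is standard practice (it preserves the mean intensity), though it is not stated on the slide:

```python
import numpy as np

def gaussian_kernel(m, sigma):
    """m x m kernel sampled from G(h, k) = exp(-(h^2 + k^2) / (2 sigma^2)) / (2 pi sigma^2)."""
    r = np.arange(m) - m // 2                 # offsets -m/2 .. m/2
    h, k = np.meshgrid(r, r, indexing="ij")
    G = np.exp(-(h**2 + k**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return G / G.sum()                        # normalise so the weights sum to 1

G = gaussian_kernel(5, sigma=1.0)
print(G.shape)              # (5, 5)
print(G.sum())              # 1.0
print(G[2, 2] == G.max())   # True -- the peak is at the centre
```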
Gaussian Filter
(Example: an image convolved with a Gaussian kernel.)
Gaussian Kernel is Separable
I_G = I * G = Σ_{h=−m/2}^{m/2} Σ_{k=−m/2}^{m/2} G(h, k) I(i − h, j − k)

    = Σ_{h=−m/2}^{m/2} Σ_{k=−m/2}^{m/2} (1/(2πσ^2)) e^{−(h^2 + k^2)/(2σ^2)} I(i − h, j − k)

    = (1/(2πσ^2)) Σ_{h=−m/2}^{m/2} e^{−h^2/(2σ^2)} [ Σ_{k=−m/2}^{m/2} e^{−k^2/(2σ^2)} I(i − h, j − k) ]

since e^{−(h^2 + k^2)/(2σ^2)} = e^{−h^2/(2σ^2)} e^{−k^2/(2σ^2)}.
Gaussian Kernel is Separable
Convolve the rows and then the columns with a 1-D Gaussian kernel,
e.g. (1/38) [1 9 18 9 1]:

I * (1/38)[1 9 18 9 1] = I_r
I_r * (1/38)[1 9 18 9 1]^T = result

The complexity increases linearly with m instead of with m^2.
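The separability claim is easy to check numerically, here with the 1-D kernel (1/38)[1 9 18 9 1] from the slide and a random test image:

```python
import numpy as np

# 1-D kernel from the slide, and the equivalent 2-D kernel (its outer product)
g = np.array([1.0, 9.0, 18.0, 9.0, 1.0]) / 38.0
G2 = np.outer(g, g)

rng = np.random.default_rng(2)
I = rng.uniform(size=(20, 20))

def conv2_valid(I, K):
    """Direct 2-D 'valid' convolution (no padding), for checking only."""
    m = K.shape[0]
    Kf = K[::-1, ::-1]                      # convolution flips the kernel
    H, W = I.shape[0] - m + 1, I.shape[1] - m + 1
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = (I[i:i+m, j:j+m] * Kf).sum()
    return out

# Rows first, then columns, each pass a cheap 1-D convolution
rows = np.apply_along_axis(lambda r: np.convolve(r, g, mode="valid"), 1, I)
sep = np.apply_along_axis(lambda c: np.convolve(c, g, mode="valid"), 0, rows)

assert np.allclose(sep, conv2_valid(I, G2))  # same result, ~m ops/pixel per pass
```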
Gaussian vs. Average
Gaussian Smoothing
Smoothing by Averaging
Nonlinear Filtering – median filter
Replace each pixel value I(i, j) with the median of the values
found in a local neighbourhood of (i, j).
Median Filter
Salt-and-pepper noise
After median filtering
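A direct (and deliberately slow) sketch of the median filter; for simplicity the border pixels are left untouched:

```python
import numpy as np

def median_filter(I, size=3):
    """Replace each interior pixel by the median of its size x size neighbourhood."""
    r = size // 2
    out = I.astype(float).copy()
    for i in range(r, I.shape[0] - r):
        for j in range(r, I.shape[1] - r):
            out[i, j] = np.median(I[i-r:i+r+1, j-r:j+r+1])
    return out

I = np.full((9, 9), 100.0)
I[4, 4] = 255.0                # one "salt" pixel
out = median_filter(I)
print(out[4, 4])               # 100.0 -- the impulse is removed entirely
```

This is why the median filter works so well on salt-and-pepper noise: a single outlier can never reach the middle of the sorted neighbourhood, whereas an averaging filter would smear it over the window.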
Edges in Images
Definition of edges:
• Edges are significant local changes of intensity in an image.
• Edges typically occur on the boundary between two different regions in an image.
Images as Functions
An image is a 2-D function I = f(x, y)
(e.g. the red channel intensity plotted as a surface).
Finite Difference – 2D
Continuous function:

∂f(x, y)/∂x = lim_{h→0} ( f(x+h, y) − f(x, y) ) / h
∂f(x, y)/∂y = lim_{h→0} ( f(x, y+h) − f(x, y) ) / h

Discrete approximation:

I_x ≈ f(i+1, j) − f(i, j)
I_y ≈ f(i, j+1) − f(i, j)

Convolution kernels: [−1 1] and [−1 1]^T
Image Derivatives
I_x = I * [−1 1]
I_y = I * [−1 1]^T
Edge Detection using Derivatives
1-D image: f(x)
1st derivative: f'(x)
Thresholding: |f'(x)| > threshold
Pixels that pass the threshold are edge pixels.
Image Gradient
gradient:  ∇f = [f_x, f_y]^T
magnitude: ||∇f|| = sqrt( f_x^2 + f_y^2 )
direction: θ = arctan( f_y / f_x )
Finite Difference for Gradient
Discrete approximation:

I_x(i, j) = ∂f/∂x ≈ f(i+1, j) − f(i, j)
I_y(i, j) = ∂f/∂y ≈ f(i, j+1) − f(i, j)

Convolution kernels: [−1 1] and [−1 1]^T

magnitude:         G(i, j) = sqrt( I_x(i, j)^2 + I_y(i, j)^2 )
approx. magnitude: G(i, j) = |I_x| + |I_y|
direction:         arctan( I_y / I_x )
Edge Detection Using the Gradient
Properties of the gradient:
• The magnitude of gradient
provides information about the
strength of the edge
• The direction of gradient is
always perpendicular to the
direction of the edge
Main idea:
• Compute derivatives in x and y directions
• Find gradient magnitude
• Threshold gradient magnitude
Edge Detection Algorithm
Image I  *  [−1 1]   → I_x
Image I  *  [−1 1]^T → I_y
sqrt( I_x^2 + I_y^2 )  → Threshold → edges
Edge Detection Example
(The image I and its derivative images I_x and I_y.)
Edge Detection Example
(The gradient magnitude G(i, j) = sqrt( I_x^2(i, j) + I_y^2(i, j) ) of the image I,
and the pixels where G(i, j) > Threshold.)
Finite differences responding to noise
Increasing noise →
(this is zero-mean additive Gaussian noise)
Solution: smooth first.
Where is the edge? Look for peaks in the derivative of the smoothed signal.
Sobel Edge Detector
Approximate the derivatives with central differences:

I_x(i, j) = ∂f/∂x ≈ f(i+1, j) − f(i−1, j)

and smooth by adding the three neighbouring differences,
giving more weight to the middle one.

Convolution kernel for I_x:       Convolution kernel for I_y:

[−1 0 1; −2 0 2; −1 0 1]          [−1 −2 −1; 0 0 0; 1 2 1]
Sobel Operator Example
Convolving the kernels with an image patch [a1 a2 a3; a4 a5 a6; a7 a8 a9]
gives the approximate gradient at a5:

I_x = (a1 − a3) + 2(a4 − a6) + (a7 − a9)
I_y = (a1 − a7) + 2(a2 − a8) + (a3 − a9)
Sobel Edge Detector
Image I  *  [−1 0 1; −2 0 2; −1 0 1]   → I_x
Image I  *  [−1 −2 −1; 0 0 0; 1 2 1]   → I_y
sqrt( I_x^2 + I_y^2 )  → Threshold → edges
Edge Detection Summary
Input: an image I and a threshold τ.
1. Noise smoothing: I_s = I * h  (e.g. h is a Gaussian kernel).
2. Compute two gradient images I_x and I_y by convolving I_s
   with gradient kernels (e.g. the Sobel operator).
3. Estimate the gradient magnitude at each pixel:
   G(i, j) = sqrt( I_x^2(i, j) + I_y^2(i, j) )
4. Mark as edges all pixels (i, j) such that G(i, j) > τ.
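The four-step recipe above can be sketched in numpy; the kernel size, σ and threshold here are illustrative choices, and the slow double loop stands in for an optimised convolution routine:

```python
import numpy as np

def conv2_same(I, K):
    """2-D convolution, zero-padded to keep the image size; flips K as in the definition."""
    m = K.shape[0] // 2
    Ip = np.pad(I, m)
    Kf = K[::-1, ::-1]
    out = np.empty_like(I, dtype=float)
    for i in range(I.shape[0]):
        for j in range(I.shape[1]):
            out[i, j] = (Ip[i:i+K.shape[0], j:j+K.shape[1]] * Kf).sum()
    return out

# Sobel kernels, as in the slides
KX = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])
KY = np.array([[-1.0, -2.0, -1.0], [0.0, 0.0, 0.0], [1.0, 2.0, 1.0]])

def detect_edges(I, tau, sigma=1.0):
    # 1. Noise smoothing with a (5x5) Gaussian kernel h
    r = np.arange(-2, 3)
    h, k = np.meshgrid(r, r, indexing="ij")
    G = np.exp(-(h**2 + k**2) / (2 * sigma**2))
    Is = conv2_same(I, G / G.sum())
    # 2. Gradient images
    Ix, Iy = conv2_same(Is, KX), conv2_same(Is, KY)
    # 3. Gradient magnitude
    Gmag = np.sqrt(Ix**2 + Iy**2)
    # 4. Threshold
    return Gmag > tau

# Vertical step edge: left half dark, right half bright
I = np.zeros((16, 16)); I[:, 8:] = 100.0
edges = detect_edges(I, tau=50.0)
print(edges[8, 8], edges[8, 0])   # True False -- the step is marked, the flat region is not
```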
Corner Feature
Corners are image locations that have large intensity changes
in more than one direction.
Shifting a window in any direction should give a large
change in intensity
Harris Detector: Basic Idea
“flat” region:
no change in
all directions
“edge”:
no change along
the edge direction
“corner”:
significant change
in all directions
C.Harris, M.Stephens. “A Combined Corner and Edge Detector”. 1988
Change of Intensity
The intensity change along some direction can be quantified
by the sum of squared differences (SSD):

D(u, v) = Σ_{i,j} [ I(i+u, j+v) − I(i, j) ]^2
Change Approximation
If u and v are small, by Taylor's theorem:

I(i+u, j+v) ≈ I(i, j) + I_x u + I_y v

where I_x = ∂I/∂x and I_y = ∂I/∂y. Therefore

[ I(i+u, j+v) − I(i, j) ]^2 ≈ [ I_x u + I_y v ]^2
                            = I_x^2 u^2 + 2 I_x I_y u v + I_y^2 v^2
                            = [u v] [I_x^2  I_x I_y; I_x I_y  I_y^2] [u; v]
Gradient Variation Matrix
D(u, v) = [u v] C [u; v],   where   C = Σ [I_x^2  I_x I_y; I_x I_y  I_y^2]

The level curves of this quadratic form are ellipses.
Matrix C characterizes how the intensity changes in a certain direction.
Eigenvalue Analysis
C = Σ [I_x^2  I_x I_y; I_x I_y  I_y^2] = Q^{−1} [λ1 0; 0 λ2] Q

Since C = Σ [I_x; I_y][I_x I_y] = A^T A:
• C is symmetric;
• C has two non-negative eigenvalues λ1, λ2.

The ellipse D(u, v) = const has axes of length proportional to
(λ_max)^{−1/2} and (λ_min)^{−1/2}.

If either λ is close to 0, then this is not a corner, so look for
locations where both are large.
Corner Detection Algorithm
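The eigenvalue analysis above can be sketched in numpy. This is a minimal illustration, not the full Harris recipe: it uses plain finite differences, an unweighted 3×3 window for C, and scores each pixel by the smaller eigenvalue of C (directly implementing the "both λ large" criterion); practical detectors add Gaussian windowing, the Harris response function, and non-maximum suppression.

```python
import numpy as np

def corner_response(I):
    """Smaller eigenvalue of the gradient-variation matrix C over a 3x3 window."""
    Ix = np.zeros_like(I); Iy = np.zeros_like(I)
    Ix[:, :-1] = I[:, 1:] - I[:, :-1]     # finite differences
    Iy[:-1, :] = I[1:, :] - I[:-1, :]
    H, W = I.shape
    R = np.zeros_like(I)
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            wx = Ix[i-1:i+2, j-1:j+2]; wy = Iy[i-1:i+2, j-1:j+2]
            C = np.array([[(wx**2).sum(),  (wx*wy).sum()],
                          [(wx*wy).sum(),  (wy**2).sum()]])
            R[i, j] = np.linalg.eigvalsh(C).min()   # large only when BOTH are large
    return R

# A white square on a black background: corners where two edges meet
I = np.zeros((12, 12)); I[4:8, 4:8] = 1.0
R = corner_response(I)
print(np.unravel_index(R.argmax(), R.shape))  # a corner of the square
```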
Line Detection
The problem:
•How many lines?
•Find the lines.
Equations for Lines
The slope-intercept equation of a line:

y = a x + b

What happens when the line is vertical? The slope a goes to infinity.
A better representation – the polar representation:

x cos θ + y sin θ = ρ
Hough Transform: line-parameter mapping
A line in the plane maps to a point (ρ, θ) in the ρ-θ space.
All lines passing through a point (x, y) map to a sinusoidal curve in the
ρ-θ (parameter) space:

ρ = x cos θ + y sin θ
Mapping of points on a line
Points on the same line define curves in the parameter space
that pass through a single point.
Main idea: transform edge points in the x-y plane to curves in the
parameter space, then find the points in the parameter space
that have many curves passing through them.
Quantize Parameter Space
Detect lines by finding maxima / clustering in the quantized (ρ, θ) parameter space.
Examples
input image
Hough space
lines detected
Image credit: NASA Dryden Research Aircraft Photo Archive
Algorithm
1. Quantize the parameter space:
   int P[0..ρmax][0..θmax];  // accumulators
2. For each edge point (x, y) {
     For (θ = 0; θ <= θmax; θ += Δθ) {
       ρ = x cos θ + y sin θ;  // round off to integer
       P[ρ][θ]++;
     }
   }
3. Find the peaks in P[ρ][θ].
Equations of Ellipse
x^2 / r1^2 + y^2 / r2^2 = 1

a x^2 + b x y + c y^2 + d x + e y + f = 0

Let x = [x^2, x y, y^2, x, y, 1]^T and a = [a, b, c, d, e, f]^T.
Then the ellipse equation becomes

x^T a = 0
Ellipse Fitting: Problem Statement
Given a set of N image points p_i = [x_i, y_i]^T,
find the parameter vector a ≠ 0 such that the ellipse

f(p, a) = x^T a = 0

fits the p_i best in the least squares sense:

min_a Σ_{i=1}^{N} D(p_i, a)^2

where D(p_i, a) is the distance from p_i to the ellipse.
Euclidean Distance Fit
D(p_i, a) = || p̂_i − p_i ||

where p̂_i is the point on the ellipse that is nearest to p_i:

f(p̂_i, a) = 0   and   p̂_i − p_i is normal to the ellipse at p̂_i.
Compute Distance Function
Computing the distance function is a constrained optimization problem:

min_{p̂_i} || p̂_i − p_i ||^2   subject to   f(p̂_i, a) = 0

Using a Lagrange multiplier, define:

L(x, y, λ) = || p̂_i − p_i ||^2 − 2λ f(p̂_i, a),   where p̂_i = [x, y]^T

Then the problem becomes min_{p̂_i} L(x, y, λ). Setting

∂L/∂x = ∂L/∂y = 0

we have

p̂_i − p_i = λ ∇f(p̂_i, a)
Ellipse Fitting with Euclidean Distance
Given a set of N image points p_i = [x_i, y_i]^T,
find the parameter vector a ≠ 0 such that

min_a Σ_{i=1}^{N} f(p_i, a)^2 / || ∇f(p_i, a) ||^2
This problem can be solved by using a numerical nonlinear
optimization system.
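The normalised-distance criterion above needs such a nonlinear optimiser, but the simpler purely algebraic criterion, min ||X a||^2 subject to ||a|| = 1, can be solved directly with the SVD from earlier in this review: the minimiser is the right singular vector with the smallest singular value. A sketch of that simpler surrogate (not the slides' normalised criterion):

```python
import numpy as np

def fit_conic(points):
    """Algebraic least-squares conic fit: minimise sum (x_i^T a)^2 with ||a|| = 1.
    The minimiser is the right singular vector of the design matrix that
    corresponds to the smallest singular value."""
    X = np.array([[x*x, x*y, y*y, x, y, 1.0] for x, y in points])
    _, _, Vt = np.linalg.svd(X)
    return Vt[-1]                      # a = [a, b, c, d, e, f]

# Sample points from the ellipse x^2/4 + y^2 = 1 (r1 = 2, r2 = 1)
t = np.linspace(0.0, 2 * np.pi, 40, endpoint=False)
pts = np.stack([2 * np.cos(t), np.sin(t)], axis=1)
a = fit_conic(pts)

# Every sample point should (nearly) satisfy the fitted conic equation x^T a = 0
residual = np.abs(np.array([[x*x, x*y, y*y, x, y, 1.0] for x, y in pts]) @ a).max()
print(residual)   # close to 0
```

Note the algebraic fit can return any conic, not necessarily an ellipse; for noisy data, constrained methods (or the nonlinear criterion on the slide) are needed to guarantee an ellipse.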