Computer Vision
Fitting & SIFT Descriptor
April 27, 2015
Hyunki Hong
School of Integrative Engineering,
Chung-Ang Univ.
Contents
• Fitting
• SIFT Descriptor
Fitting
• Features: edges and corners
• How to form a higher-level, more
compact representation of the
features in the image by grouping
multiple features according to a
simple model
Fitting
• Choose a parametric model to represent a set of features
simple model: lines
simple model: circles
complicated model: car
Fitting: Issues
Case study: Line detection
• Noise in the measured feature locations
• Extraneous data: clutter (outliers), multiple lines
• Missing data: occlusions
Fitting: Issues
• If we know which points belong to the line, how do we find
the “optimal” line parameters?
– Least squares
• What if there are outliers?
– Robust fitting, RANSAC (RANdom SAmple Consensus)
• What if there are many lines?
– Voting methods: RANSAC, Hough transform
• What if we’re not even sure it’s a line?
– Model selection
Least squares line fitting
• Data: (x1, y1), …, (xn, yn)
• Line equation: yi = m xi + b
• Find (m, b) to minimize
  \[ E = \sum_{i=1}^{n} (y_i - m x_i - b)^2 \]

[Figure: line y = m x + b fit to points (xi, yi)]

• In matrix form, using ||x||² = x·x = xᵀx:

  \[ E = \left\| \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} - \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} \begin{bmatrix} m \\ b \end{bmatrix} \right\|^2 = \| Y - XB \|^2 \]

  \[ = (Y - XB)^T (Y - XB) = Y^T Y - 2 (XB)^T Y + (XB)^T (XB) \]

• Setting the derivative to zero (using ∂(c′x)/∂x = c; see the "Reference: basic matrix theory" slide):

  \[ \frac{dE}{dB} = 2 X^T X B - 2 X^T Y = 0 \quad\Longrightarrow\quad X^T X B = X^T Y \]

Normal equations: least squares solution to XB = Y
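To make the normal equations concrete, here is a minimal C++ sketch (not from the slides; FitLineLS and the use of std::vector are illustrative) that solves the 2×2 system XᵀXB = XᵀY in closed form:

#include <vector>
#include <cstdio>

// Fit y = m*x + b by least squares: solve the 2x2 normal equations
//   [sum(x^2)  sum(x)] [m]   [sum(x*y)]
//   [sum(x)    n     ] [b] = [sum(y)  ]
// Returns false when the system is degenerate (all x equal: a vertical line).
bool FitLineLS(const std::vector<double>& x, const std::vector<double>& y,
               double& m, double& b)
{
    const int n = (int)x.size();
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (int i = 0; i < n; i++) {
        sx += x[i]; sy += y[i];
        sxx += x[i] * x[i]; sxy += x[i] * y[i];
    }
    const double det = sxx * n - sx * sx;   // determinant of X^T X
    if (det == 0.0) return false;
    m = (sxy * n - sx * sy) / det;
    b = (sxx * sy - sx * sxy) / det;
    return true;
}

int main()
{
    // The four points used in the worked example below: expect m = 1, b = 1.5.
    std::vector<double> x = {0, 1, 2, 3}, y = {1, 3, 4, 4};
    double m, b;
    if (FitLineLS(x, y, m, b))
        std::printf("y = %.3f x + %.3f\n", m, b);
    return 0;
}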
Example (linear algebra)
• Data: (x1, y1), …, (xn, yn)
• Line equation: yi = m xi + b, i.e. y1 = m x1 + b, …, yn = m xn + b, or y = M v:

  \[ \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} \begin{bmatrix} m \\ b \end{bmatrix} \]

• With measurement error, the measured points do not lie exactly on the line, so we solve the normal equations Mᵀy = MᵀMv instead:

  \[ v = (M^T M)^{-1} M^T y \]

  : the y = m x + b obtained from this equation is the least squares line of best fit, or regression line.
• Least squares line for (0, 1), (1, 3), (2, 4), (3, 4)?

  \[ M = \begin{bmatrix} 0 & 1 \\ 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix}, \quad y = \begin{bmatrix} 1 \\ 3 \\ 4 \\ 4 \end{bmatrix}, \quad M^T M = \begin{bmatrix} 14 & 6 \\ 6 & 4 \end{bmatrix}, \quad (M^T M)^{-1} = \frac{1}{10} \begin{bmatrix} 2 & -3 \\ -3 & 7 \end{bmatrix} \]

  \[ v = \begin{bmatrix} m \\ b \end{bmatrix} = (M^T M)^{-1} M^T y = \frac{1}{10} \begin{bmatrix} 2 & -3 \\ -3 & 7 \end{bmatrix} \begin{bmatrix} 23 \\ 12 \end{bmatrix} = \begin{bmatrix} 1 \\ 1.5 \end{bmatrix} \]

  so the least squares line is y = x + 1.5.
Reference: basic matrix theory
• Given two n×1 vectors c and x in product form,

  \[ c'x = (c_1, \ldots, c_n) \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \sum_{i=1}^{n} c_i x_i \]

• the partial derivative of c′x with respect to x is

  \[ \frac{\partial (c'x)}{\partial x} = \begin{bmatrix} \partial (c'x) / \partial x_1 \\ \vdots \\ \partial (c'x) / \partial x_n \end{bmatrix} = \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = c \]

• and, in the same way, ∂(x′c)/∂x = c.
• For the quadratic form y′Ay, the partial derivative with respect to y, when A is symmetric, is

  \[ \frac{\partial (y'Ay)}{\partial y} = 2Ay; \qquad \text{if } A \text{ is not symmetric,} \quad \frac{\partial (y'Ay)}{\partial y} = (A + A^T)\, y \]

• Example:

  \[ U = \frac{1}{2} (a_{11} x^2 + 2 a_{12} x y + a_{22} y^2) = \frac{1}{2} X^T A X, \qquad A = \begin{bmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{bmatrix}, \quad X = \begin{bmatrix} x \\ y \end{bmatrix} \]

  \[ \frac{\partial U}{\partial x} = a_{11} x + a_{12} y, \quad \frac{\partial U}{\partial y} = a_{12} x + a_{22} y \quad\Longrightarrow\quad \begin{bmatrix} \partial U / \partial x_i \end{bmatrix} = AX \quad (\text{here } x_i = x, y) \]

  This holds because A is a symmetric matrix.
Problem with “vertical” least squares
• Not rotation-invariant
• Fails completely for vertical lines
Total least squares
• Distance between point (xi, yi) and line ax + by = d (a² + b² = 1): |a xi + b yi − d|

[Figure: line ax + by = d with unit normal N = (a, b) and point (xi, yi)]

• Find (a, b, d) to minimize the sum of squared perpendicular distances

  \[ E = \sum_{i=1}^{n} (a x_i + b y_i - d)^2 \]

  \[ \frac{\partial E}{\partial d} = \sum_{i=1}^{n} -2 (a x_i + b y_i - d) = 0 \quad\Longrightarrow\quad d = \frac{a}{n} \sum_{i=1}^{n} x_i + \frac{b}{n} \sum_{i=1}^{n} y_i = a \bar{x} + b \bar{y} \]

• Substituting d back:

  \[ E = \sum_{i=1}^{n} \big( a (x_i - \bar{x}) + b (y_i - \bar{y}) \big)^2 = \left\| \begin{bmatrix} x_1 - \bar{x} & y_1 - \bar{y} \\ \vdots & \vdots \\ x_n - \bar{x} & y_n - \bar{y} \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} \right\|^2 = (UN)^T (UN) \]

  \[ \frac{dE}{dN} = 2 (U^T U) N = 0 \]
Total least squares

  \[ E = \sum_{i=1}^{n} \big( a (x_i - \bar{x}) + b (y_i - \bar{y}) \big)^2 = (UN)^T (UN), \qquad \frac{dE}{dN} = 2 (U^T U) N = 0 \]

Solution to (UᵀU)N = 0, subject to ||N||² = 1: eigenvector of UᵀU
associated with the smallest eigenvalue (least squares solution
to homogeneous linear system UN = 0)
  \[ U = \begin{bmatrix} x_1 - \bar{x} & y_1 - \bar{y} \\ \vdots & \vdots \\ x_n - \bar{x} & y_n - \bar{y} \end{bmatrix}, \qquad U^T U = \begin{bmatrix} \sum_{i=1}^{n} (x_i - \bar{x})^2 & \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \\ \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) & \sum_{i=1}^{n} (y_i - \bar{y})^2 \end{bmatrix} \]

second moment matrix
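As a companion sketch (not from the slides; names are illustrative), the total least squares line can be computed directly, since the smallest eigenvector of a symmetric 2×2 matrix has a closed form:

#include <cmath>
#include <vector>

// Total least squares line a*x + b*y = d with a^2 + b^2 = 1.
// N = (a, b) is the eigenvector of the second moment matrix U^T U
// associated with its smallest eigenvalue; d = a*xbar + b*ybar.
void FitLineTLS(const std::vector<double>& x, const std::vector<double>& y,
                double& a, double& b, double& d)
{
    const int n = (int)x.size();
    double xbar = 0, ybar = 0;
    for (int i = 0; i < n; i++) { xbar += x[i]; ybar += y[i]; }
    xbar /= n; ybar /= n;

    // Second moment matrix [sxx sxy; sxy syy] of the centered points.
    double sxx = 0, sxy = 0, syy = 0;
    for (int i = 0; i < n; i++) {
        const double dx = x[i] - xbar, dy = y[i] - ybar;
        sxx += dx * dx; sxy += dx * dy; syy += dy * dy;
    }

    // Smallest eigenvalue of the symmetric 2x2 matrix, then its eigenvector:
    // (sxx - lmin)*a + sxy*b = 0  =>  (a, b) ~ (sxy, lmin - sxx).
    const double tr = sxx + syy;
    const double det = sxx * syy - sxy * sxy;
    const double lmin = 0.5 * (tr - std::sqrt(tr * tr - 4.0 * det));
    a = sxy; b = lmin - sxx;
    const double norm = std::sqrt(a * a + b * b);
    if (norm > 0) { a /= norm; b /= norm; }
    else { a = 1; b = 0; }   // sxy == 0 and sxx already the smaller moment
    d = a * xbar + b * ybar;
}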
Reference: Singular Value Decomposition (SVD)
• A rectangular matrix A can be broken down into the product of three matrices: an orthogonal matrix U, a diagonal matrix S, and the transpose of an orthogonal matrix V:

  \[ A_{m \times n} = U_{m \times m}\, S_{m \times n}\, V_{n \times n}^T, \qquad U^T U = I, \quad V^T V = I \]

  The columns of U are orthonormal eigenvectors of AAᵀ, the columns of V are orthonormal eigenvectors of AᵀA, and S is a diagonal matrix containing the square roots of the corresponding (shared) eigenvalues in descending order.
• Example:

  \[ A = \begin{bmatrix} 3 & 1 & 1 \\ -1 & 3 & 1 \end{bmatrix} \]

- To find U, we have to start with AAᵀ:

  \[ A^T = \begin{bmatrix} 3 & -1 \\ 1 & 3 \\ 1 & 1 \end{bmatrix}, \qquad AA^T = \begin{bmatrix} 3 & 1 & 1 \\ -1 & 3 & 1 \end{bmatrix} \begin{bmatrix} 3 & -1 \\ 1 & 3 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 11 & 1 \\ 1 & 11 \end{bmatrix} \]
Reference: Singular Value Decomposition (SVD)
- To find the eigenvalues and corresponding eigenvectors of AAᵀ:

  \[ \begin{bmatrix} 11 & 1 \\ 1 & 11 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \lambda \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \quad\Longleftrightarrow\quad \begin{cases} 11 x_1 + x_2 = \lambda x_1 \\ x_1 + 11 x_2 = \lambda x_2 \end{cases} \]

  \[ \begin{vmatrix} 11 - \lambda & 1 \\ 1 & 11 - \lambda \end{vmatrix} = (11 - \lambda)(11 - \lambda) - 1 \cdot 1 = 0 \quad\Longrightarrow\quad (\lambda - 10)(\lambda - 12) = 0 \]

- For λ = 10: (11 − 10) x1 + x2 = 0 ⟹ x1 = −x2 ⟹ [1, −1]
  For λ = 12: (11 − 12) x1 + x2 = 0 ⟹ x1 = x2 ⟹ [1, 1]
- These eigenvectors become column vectors in a matrix ordered by the size of the corresponding eigenvalue:

  \[ \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \]

- To convert this matrix into an orthogonal matrix, we apply the Gram-Schmidt orthonormalization process to the column vectors.
Reference: Singular Value Decomposition (SVD)
1) Begin by normalizing v1:

  \[ u_1 = \frac{v_1}{\| v_1 \|} = \left[ \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}} \right], \qquad w_2 = v_2 - (u_1 \cdot v_2)\, u_1 = [1, -1] - [0, 0] = [1, -1] \]

2) Normalize u2:

  \[ u_2 = \frac{w_2}{\| w_2 \|} = \left[ \tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}} \right], \qquad U = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix} \]

- The calculation of V starts from AᵀA:

  \[ A^T A = \begin{bmatrix} 3 & -1 \\ 1 & 3 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 3 & 1 & 1 \\ -1 & 3 & 1 \end{bmatrix} = \begin{bmatrix} 10 & 0 & 2 \\ 0 & 10 & 4 \\ 2 & 4 & 2 \end{bmatrix} \]

1) Find the eigenvalues of AᵀA from

  \[ \begin{bmatrix} 10 & 0 & 2 \\ 0 & 10 & 4 \\ 2 & 4 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \lambda \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \qquad \begin{vmatrix} 10 - \lambda & 0 & 2 \\ 0 & 10 - \lambda & 4 \\ 2 & 4 & 2 - \lambda \end{vmatrix} = 0 \quad\Longrightarrow\quad \lambda = 0,\ 10,\ 12 \]
Reference: Singular Value Decomposition (SVD)
2) For λ = 12: v1 = [1, 2, 1]; for λ = 10: v2 = [2, −1, 0]; for λ = 0: v3 = [1, 2, −5]
3) Build an orthonormal matrix by Gram-Schmidt:

  \[ u_1 = \frac{v_1}{\| v_1 \|} = \left[ \tfrac{1}{\sqrt{6}}, \tfrac{2}{\sqrt{6}}, \tfrac{1}{\sqrt{6}} \right] \]

  \[ w_2 = v_2 - (u_1 \cdot v_2)\, u_1 = v_2, \qquad u_2 = \frac{w_2}{\| w_2 \|} = \left[ \tfrac{2}{\sqrt{5}}, -\tfrac{1}{\sqrt{5}}, 0 \right] \]

  \[ w_3 = v_3 - (u_1 \cdot v_3)\, u_1 - (u_2 \cdot v_3)\, u_2 = v_3, \qquad u_3 = \frac{w_3}{\| w_3 \|} = \left[ \tfrac{1}{\sqrt{30}}, \tfrac{2}{\sqrt{30}}, -\tfrac{5}{\sqrt{30}} \right] \]

  \[ V = \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{30}} \\ \frac{2}{\sqrt{6}} & -\frac{1}{\sqrt{5}} & \frac{2}{\sqrt{30}} \\ \frac{1}{\sqrt{6}} & 0 & -\frac{5}{\sqrt{30}} \end{bmatrix}, \qquad V^T = \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} & 0 \\ \frac{1}{\sqrt{30}} & \frac{2}{\sqrt{30}} & -\frac{5}{\sqrt{30}} \end{bmatrix} \]

4) Amn = Umm Smn VnnT:

  \[ A = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \sqrt{12} & 0 & 0 \\ 0 & \sqrt{10} & 0 \end{bmatrix} V^T \]
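As a quick check (a worked multiplication, not on the original slides), the factors above reproduce A. With uᵢ denoting the i-th column of V:

  \[ U S = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \sqrt{12} & 0 & 0 \\ 0 & \sqrt{10} & 0 \end{bmatrix} = \begin{bmatrix} \sqrt{6} & \sqrt{5} & 0 \\ \sqrt{6} & -\sqrt{5} & 0 \end{bmatrix} \]

  \[ (US)\, V^T = \begin{bmatrix} \sqrt{6}\, u_1^T + \sqrt{5}\, u_2^T \\ \sqrt{6}\, u_1^T - \sqrt{5}\, u_2^T \end{bmatrix} = \begin{bmatrix} [1, 2, 1] + [2, -1, 0] \\ [1, 2, 1] - [2, -1, 0] \end{bmatrix} = \begin{bmatrix} 3 & 1 & 1 \\ -1 & 3 & 1 \end{bmatrix} = A \]

since √6·u₁ = [1, 2, 1] and √5·u₂ = [2, −1, 0].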
Total least squares

  \[ U = \begin{bmatrix} x_1 - \bar{x} & y_1 - \bar{y} \\ \vdots & \vdots \\ x_n - \bar{x} & y_n - \bar{y} \end{bmatrix}, \qquad U^T U = \begin{bmatrix} \sum_{i=1}^{n} (x_i - \bar{x})^2 & \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \\ \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) & \sum_{i=1}^{n} (y_i - \bar{y})^2 \end{bmatrix} \]

second moment matrix

[Figure: fitted line through the centroid (x̄, ȳ) with unit normal N = (a, b) and a centered data point (xi − x̄, yi − ȳ)]
Least squares for general curves
• We would like to minimize the sum of squared geometric
distances between the data points and the curve
[Figure: curve C and the geometric distance d((xi, yi), C) from a data point (xi, yi)]
Least squares for conics
• Equation of a general conic:
C(a, x) = a · x = ax² + bxy + cy² + dx + ey + f = 0,
a = [a, b, c, d, e, f], x = [x², xy, y², x, y, 1]
• Minimizing the geometric distance is non-linear even for a
conic
• Algebraic distance: C(a, x)
• Algebraic distance minimization by linear least squares:
  \[ \begin{bmatrix} x_1^2 & x_1 y_1 & y_1^2 & x_1 & y_1 & 1 \\ x_2^2 & x_2 y_2 & y_2^2 & x_2 & y_2 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_n^2 & x_n y_n & y_n^2 & x_n & y_n & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix} = 0 \]
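For instance (an illustrative check, not from the slides), the unit circle fits this template with

  \[ x^2 + y^2 - 1 = 0 \quad\Longleftrightarrow\quad a = [1, 0, 1, 0, 0, -1], \qquad b^2 - 4ac = 0 - 4(1)(1) = -4 < 0 \]

which the discriminant rule on the next slide classifies as an ellipse.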
Least squares for conics
• Least squares system: Da = 0
• Need constraint on a to prevent trivial solution
• Discriminant: b² − 4ac
– Negative: ellipse
– Zero: parabola
– Positive: hyperbola
• Minimizing squared algebraic distance subject to constraints
leads to a generalized eigenvalue problem
– Many variations possible
Matching for Object Recognition
• Motivation
: want to recognize known objects from unknown viewpoints and find them in an image
Database of models
Local Feature based Approaches
• Represent appearance of object by little intensity/feature
patches.
• Try to match patches from object to image
• Geometrically consistent matches tell you the location and
pose of the object
• Example: Represent object by set of 11×11 intensity
templates extracted around Harris corners.
Harris corners
Object “model”
Match patches to
new image using NCC.
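NCC itself is not defined on these slides; here is a minimal sketch of zero-mean normalized cross-correlation between a template and one image window (function and parameter names are illustrative):

#include <cmath>

// Zero-mean normalized cross-correlation between a tH x tW template and
// the image window whose top-left corner is (row, col). The caller must
// keep the window inside the image. Returns a value in [-1, 1];
// 1 means a perfect match up to brightness offset and gain.
float NCC(const unsigned char* img, int imgWidth, int row, int col,
          const unsigned char* tmpl, int tH, int tW)
{
    const int n = tH * tW;
    float meanI = 0.0f, meanT = 0.0f;
    for (int i = 0; i < tH; i++)
        for (int j = 0; j < tW; j++) {
            meanI += img[(row + i) * imgWidth + (col + j)];
            meanT += tmpl[i * tW + j];
        }
    meanI /= n; meanT /= n;

    float num = 0.0f, varI = 0.0f, varT = 0.0f;
    for (int i = 0; i < tH; i++)
        for (int j = 0; j < tW; j++) {
            const float di = img[(row + i) * imgWidth + (col + j)] - meanI;
            const float dt = tmpl[i * tW + j] - meanT;
            num += di * dt; varI += di * di; varT += dt * dt;
        }
    const float denom = std::sqrt(varI * varT);
    return (denom > 0.0f) ? num / denom : 0.0f;  // guard flat patches
}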
Template Matching
• a technique for finding small parts of an image that match a
template image: feature-based and template-based approaches

[Figure: the image under inspection vs. a pre-built model (template)]
- Is the object present?
- Where is it located?
Template Matching
• Measurement
: subtract the intensities of corresponding pixels and accumulate the differences
- MAD (Mean Absolute Difference)
- MSE (Mean Square Error)
- For both, a value of 0 indicates an optimal match.
cf. SAD (Sum of Absolute Differences) = MAD × MN
MSE = (T − I)² = T² + I² − 2TI
Minimizing the MSE → maximizing the product TI;
the product TI is called the cross correlation.
• Counting the number of operations
- template size: M × N
- size of the image to search: R × C
- number of overlap positions: (R − M) × (C − N)
  e.g., template size: 100×100 pixels, search image size: 640×480 pixels
  ⟹ number of overlaps: 540 × 380
• Computational complexity
- with MAD, each overlap position requires M × N subtractions
- total number of subtractions: (R − M) × (C − N) × (M × N)
- this excessive amount of computation has to be addressed.
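Plugging the example numbers into this count makes the cost concrete:

  \[ (R - M)(C - N)(MN) = 540 \times 380 \times 100 \times 100 \approx 2.05 \times 10^9 \ \text{subtractions} \]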
// MAD template matching: slide the template over the image and keep the
// position with the smallest sum of absolute differences.
// m_InImg (the input image) is a member of CWinTestDoc; requires <math.h>.
POINT CWinTestDoc::m_MatchMAD(int height, int width, unsigned char *m_TempImg, int tHeight, int tWidth)
{
    int i, j, m, n, index;
    float SumD, MinCorr = 255.0f * tHeight * tWidth;  // worst possible sum
    POINT OptPos;
    for (m = 0; m < height - tHeight; m++)            // (1) loop over candidate rows
    {
        for (n = 0; n < width - tWidth; n++)          // (2) loop over candidate columns
        {
            SumD = 0.0f;
            for (i = 0; i < tHeight; i++)             // (3) loop over template rows
            {
                index = i * tWidth;
                for (j = 0; j < tWidth; j++)          // (4) loop over template columns
                    SumD += (float)fabs(m_InImg[m + i][n + j] - m_TempImg[index + j]);
            }
            if (SumD < MinCorr)                       // keep the best (smallest) score
            {
                MinCorr = SumD;
                OptPos.y = m;
                OptPos.x = n;
            }
        }
    }
    MinCorr /= (float)(tHeight * tWidth);             // divide by template area: SAD -> MAD
    return OptPos;
}
Implementation
// MSE template matching: identical search, but the per-pixel difference is squared.
POINT CWinTestDoc::m_MatchMSE(int height, int width, unsigned char *m_TempImg, int tHeight, int tWidth)
{
    int i, j, m, n, index;
    int diff;
    float SumD, MinCorr = 255.0f * 255.0f * tHeight * tWidth;  // worst possible sum
    POINT OptPos;
    for (m = 0; m < height - tHeight; m++)
    {
        for (n = 0; n < width - tWidth; n++)
        {
            SumD = 0.0f;
            for (i = 0; i < tHeight; i++)
            {
                index = i * tWidth;
                for (j = 0; j < tWidth; j++)
                {
                    diff = m_InImg[m + i][n + j] - m_TempImg[index + j];
                    SumD += (float)(diff * diff);      // squared error
                }
            }
            if (SumD < MinCorr)                        // keep the best (smallest) score
            {
                MinCorr = SumD;
                OptPos.y = m;
                OptPos.x = n;
            }
        }
    }
    MinCorr /= (float)(tHeight * tWidth);              // mean squared error at best position
    return OptPos;
}
Implementation
Problem with Simple Example
• Using NCC to match intensity patches puts restrictions on the
amount of overall rotation and scaling allowed between the
model and the image appearance.
More General: SIFT
• Image content is transformed into local feature coordinates
that are invariant to translation, rotation, scale, and other
imaging parameters.
David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2
(2004), pp. 91-110.
SIFT Keys: General Idea
• Reliably extract same image points regardless of new
magnification and rotation of the image.
• Normalize image patches, extract feature vector
• Match feature vectors using correlation
• Want to detect/match same features regardless of
- Translation : easy, almost every feature extraction and correlation
matching algorithm is translation invariant
- Rotation : harder. Guess a canonical orientation for each patch
from local gradients
- Scaling : hardest of all. Create a multi-scale representation of the
image and appeal to scale space theory to determine correct scale
at each point.
SIFT
an algorithm to detect and describe local features in images;
the resulting descriptors are stored in a database for matching
• For any object in an image, interesting points on the object can
be extracted to provide a "feature description" of the object.
• This description, extracted from a training image, can then be
used to identify the object when attempting to locate the object
in a test image containing many other objects.
• To perform reliable recognition, it is important that the features
extracted from the training image be detectable even under
changes in image scale, noise and illumination.
Algorithm Overview
• Scale-space extrema detection
: use difference-of-Gaussian function
• Keypoint localization
: Sub-pixel location and scale fit to a model
• Orientation assignment for each keypoint
• Keypoint descriptor
: Created from local image gradients
Scale Space
• Definition:

  \[ L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \qquad G(x, y, \sigma) = \frac{1}{2 \pi \sigma^2}\, e^{-(x^2 + y^2) / 2 \sigma^2} \]

• The image is convolved with Gaussian filters at different
scales, and then the differences of successive Gaussian-blurred
images are taken.
• Keypoints are then taken as maxima/minima of the Difference
of Gaussians (DoG) that occur at multiple scales
: D(x, y, σ) = L(x, y, kiσ) − L(x, y, kjσ),
where L(x, y, kσ) is the convolution of the original image I(x, y) with
G(x, y, kσ) at scale kσ.
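A minimal sketch of one DoG level (not from the slides: a plain separable Gaussian blur with clamped borders; all names are illustrative):

#include <algorithm>
#include <cmath>
#include <vector>

// Separable Gaussian blur of a row-major float image (clamped borders).
static std::vector<float> GaussBlur(const std::vector<float>& img,
                                    int w, int h, float sigma)
{
    const int r = (int)std::ceil(3.0f * sigma);        // kernel radius
    std::vector<float> k(2 * r + 1);
    float sum = 0.0f;
    for (int i = -r; i <= r; i++)
        sum += k[i + r] = std::exp(-(float)(i * i) / (2.0f * sigma * sigma));
    for (float& v : k) v /= sum;                       // normalize to 1

    std::vector<float> tmp(w * h), out(w * h);
    for (int y = 0; y < h; y++)                        // horizontal pass
        for (int x = 0; x < w; x++) {
            float s = 0.0f;
            for (int i = -r; i <= r; i++) {
                int xi = std::min(std::max(x + i, 0), w - 1);
                s += k[i + r] * img[y * w + xi];
            }
            tmp[y * w + x] = s;
        }
    for (int y = 0; y < h; y++)                        // vertical pass
        for (int x = 0; x < w; x++) {
            float s = 0.0f;
            for (int i = -r; i <= r; i++) {
                int yi = std::min(std::max(y + i, 0), h - 1);
                s += k[i + r] * tmp[yi * w + x];
            }
            out[y * w + x] = s;
        }
    return out;
}

// One DoG level: D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma).
std::vector<float> DoGLevel(const std::vector<float>& img, int w, int h,
                            float sigma, float kScale)
{
    std::vector<float> L1 = GaussBlur(img, w, h, sigma);
    std::vector<float> L2 = GaussBlur(img, w, h, kScale * sigma);
    std::vector<float> D(w * h);
    for (int i = 0; i < w * h; i++) D[i] = L2[i] - L1[i];
    return D;
}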
Scale Space Images &
Difference-of-Gaussian Images
• A sample point is selected only if it is a minimum or a maximum
among its neighbors in the current and adjacent DoG images
(its 26 neighbors, per Lowe's paper).
Localization
• A 3D quadratic function is fit to the local sample points.
• Start with the Taylor expansion with the sample point as the origin:

  \[ D(X) = D + \frac{\partial D}{\partial X}^T X + \frac{1}{2} X^T \frac{\partial^2 D}{\partial X^2} X, \qquad X = (x, y, \sigma)^T \]

• Take the derivative with respect to X, and set it to 0, giving the location of the keypoint:

  \[ \hat{X} = - \left( \frac{\partial^2 D}{\partial X^2} \right)^{-1} \frac{\partial D}{\partial X} \]

• This is a 3×3 linear system.
Localization

  \[ \frac{\partial^2 D}{\partial X^2}\, \hat{X} = - \frac{\partial D}{\partial X}, \qquad \text{e.g. for the 2D part:} \quad \begin{bmatrix} \partial^2 D / \partial x^2 & \partial^2 D / \partial x \partial y \\ \partial^2 D / \partial x \partial y & \partial^2 D / \partial y^2 \end{bmatrix} \begin{bmatrix} \hat{x} \\ \hat{y} \end{bmatrix} = - \begin{bmatrix} \partial D / \partial x \\ \partial D / \partial y \end{bmatrix} \]

• Derivatives approximated by finite differences, for example (k is the scale index, (i, j) the pixel position):

  \[ \frac{\partial D}{\partial \sigma} \approx \frac{D^{k+1}_{i,j} - D^{k-1}_{i,j}}{2}, \qquad \frac{\partial^2 D}{\partial \sigma^2} \approx D^{k+1}_{i,j} - 2 D^{k}_{i,j} + D^{k-1}_{i,j} \]

  \[ \frac{\partial^2 D}{\partial \sigma \partial y} \approx \frac{(D^{k+1}_{i+1,j} - D^{k+1}_{i-1,j}) - (D^{k-1}_{i+1,j} - D^{k-1}_{i-1,j})}{4} \]

• Edge response elimination: poorly localized keypoints along edges are
rejected using the 2×2 Hessian of D at the keypoint (see the ratio test below).
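Spelled out, the test from Lowe's cited paper rejects a keypoint when the ratio of principal curvatures is too large (r = 10 in the paper):

  \[ H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}, \qquad \frac{\operatorname{tr}(H)^2}{\det(H)} = \frac{(D_{xx} + D_{yy})^2}{D_{xx} D_{yy} - D_{xy}^2} > \frac{(r + 1)^2}{r} \]

A keypoint is also rejected when det(H) ≤ 0, since the principal curvatures then have opposite signs.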
Orientation assignment
• Descriptor computed relative to keypoint’s orientation
achieves rotation invariance.
• Gradient magnitude and orientation are precomputed for all levels
(useful in descriptor computation):

  \[ m(x, y) = \sqrt{ (L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2 } \]

  \[ \theta(x, y) = \tan^{-1} \big( (L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y)) \big) \]
• Multiple orientations assigned to keypoints from an orientation
histogram
– significantly improve stability of matching
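A sketch of these formulas in code (illustrative names; the 36-bin histogram follows Lowe's paper, while the square window and the omitted Gaussian weighting are simplifications):

#include <cmath>
#include <vector>

// Gradient magnitude and orientation at pixel (x, y) of a blurred image L.
struct Grad { float mag, ori; };   // ori in [0, 2*pi)

static Grad GradAt(const std::vector<float>& L, int w, int x, int y)
{
    const float dx = L[y * w + (x + 1)] - L[y * w + (x - 1)];
    const float dy = L[(y + 1) * w + x] - L[(y - 1) * w + x];
    Grad g;
    g.mag = std::sqrt(dx * dx + dy * dy);
    g.ori = std::atan2(dy, dx);
    if (g.ori < 0.0f) g.ori += 2.0f * 3.14159265f;
    return g;
}

// Dominant orientation around keypoint (kx, ky): the peak of a 36-bin,
// magnitude-weighted orientation histogram. The caller must keep the
// window plus a 1-pixel border inside the image.
float DominantOrientation(const std::vector<float>& L, int w,
                          int kx, int ky, int radius)
{
    float hist[36] = {0};
    for (int y = ky - radius; y <= ky + radius; y++)
        for (int x = kx - radius; x <= kx + radius; x++) {
            Grad g = GradAt(L, w, x, y);
            int bin = (int)(g.ori / (2.0f * 3.14159265f) * 36.0f) % 36;
            hist[bin] += g.mag;                // weight vote by magnitude
        }
    int best = 0;
    for (int b = 1; b < 36; b++)
        if (hist[b] > hist[best]) best = b;
    return (best + 0.5f) * (2.0f * 3.14159265f / 36.0f);  // bin center
}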
Descriptor
• Descriptor has 3 dimensions (x, y, θ)
• Orientation histogram of gradient magnitudes
• Position and orientation of each gradient sample rotated relative
to keypoint orientation
Descriptor
• Weight magnitude of each sample point by Gaussian weighting
function
• Distribute each sample to adjacent bins by trilinear interpolation
(avoids boundary effects)
Descriptor
• Best results achieved with 4×4×8 = 128 descriptor size
• Normalize to unit length
– Reduces effect of illumination change
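A sketch of the normalization step (illustrative names; the 0.2 clamp-and-renormalize is the refinement Lowe reports for non-linear illumination effects):

#include <cmath>

// Normalize a 128-D descriptor to unit length, clamp large components
// (0.2 threshold, per Lowe 2004), and renormalize.
void NormalizeDescriptor(float d[128])
{
    float norm = 0.0f;
    for (int i = 0; i < 128; i++) norm += d[i] * d[i];
    norm = std::sqrt(norm);
    if (norm == 0.0f) return;                  // empty descriptor: nothing to do
    for (int i = 0; i < 128; i++) {
        d[i] /= norm;
        if (d[i] > 0.2f) d[i] = 0.2f;          // curb gradient-magnitude spikes
    }
    norm = 0.0f;                               // renormalize after clamping
    for (int i = 0; i < 128; i++) norm += d[i] * d[i];
    norm = std::sqrt(norm);
    for (int i = 0; i < 128; i++) d[i] /= norm;
}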
Object Recognition
• Create a database of keypoints from
training images
• Match keypoints to a database
– Nearest neighbor search
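A brute-force sketch of the nearest-neighbor search (illustrative names; the 0.8 distance-ratio test for rejecting ambiguous matches is from Lowe's paper):

#include <array>
#include <cfloat>
#include <cmath>
#include <vector>

using Desc = std::array<float, 128>;

static float Dist2(const Desc& a, const Desc& b)   // squared Euclidean distance
{
    float s = 0.0f;
    for (int i = 0; i < 128; i++) { float d = a[i] - b[i]; s += d * d; }
    return s;
}

// For each query descriptor, find its nearest database descriptor and accept
// it only if it is clearly closer than the second-nearest one (ratio test).
// Returns the matched database index per query, or -1 when rejected.
// Assumes db holds at least two descriptors for the ratio to be meaningful.
std::vector<int> MatchDescriptors(const std::vector<Desc>& query,
                                  const std::vector<Desc>& db,
                                  float ratio = 0.8f)
{
    std::vector<int> matches(query.size(), -1);
    for (size_t q = 0; q < query.size(); q++) {
        float best = FLT_MAX, second = FLT_MAX;
        int bestIdx = -1;
        for (size_t i = 0; i < db.size(); i++) {
            const float d = Dist2(query[q], db[i]);
            if (d < best) { second = best; best = d; bestIdx = (int)i; }
            else if (d < second) second = d;
        }
        // Compare true distances, i.e. square roots of the squared ones.
        if (bestIdx >= 0 && std::sqrt(best) < ratio * std::sqrt(second))
            matches[q] = bestIdx;
    }
    return matches;
}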