Transcript PPT

CROSS-INDEXING OF BINARY SCALE INVARIANT FEATURE TRANSFORM CODES FOR LARGE-SCALE IMAGE SEARCH
Presented by Xinyu Chang
Introduction
Image matching is a fundamental aspect of many problems in
computer vision, including object or scene recognition, solving for
3D structure from multiple images, stereo correspondence, and
motion tracking.
In recent years, there has been growing interest in mapping visual
features into compact binary codes for applications on large-scale image collections. Encoding high-dimensional data as
compact binary codes reduces the memory cost of storage.
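To put the storage argument in numbers, the sketch below compares a 128-dimensional float32 SIFT descriptor with a binary code. The code length L = 256 is an assumed example value, not one taken from these slides, and the `hamming` helper is a hypothetical illustration of why binary codes are also cheap to compare (XOR followed by a bit count).

```python
D = 128          # SIFT descriptor dimension
FLOAT_BYTES = 4  # one float32 value
L = 256          # assumed binary code length in bits (illustrative only)

float_bytes = D * FLOAT_BYTES   # bytes per real-valued descriptor
binary_bytes = L // 8           # bytes per binary code

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes stored as Python ints."""
    return bin(a ^ b).count("1")

print(float_bytes, binary_bytes, float_bytes // binary_bytes)  # 512 32 16
print(hamming(0b1011, 0b0010))                                 # 2
```

Under this assumed code length, each binary code takes one sixteenth of the memory of the raw descriptor.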
Introduction
Goal
• Extract distinctive invariant features that can be correctly matched against a large database of features from many images
• Invariance to image scale and rotation
• Robustness to
  • Affine distortion
  • Change in 3D viewpoint
  • Addition of noise
  • Change in illumination
Introduction
Content
• Interest Point Detection
  • Scale-space extrema detection
  • Keypoint localization
  • Orientation assignment
  • Keypoint descriptor
• Flexible Binarization
• Cross Indexing
• Result
Interest Point Detection
Initial Outlier Rejection
The difference of Gaussians (DoG) is most stable across scale
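A minimal single-octave sketch of scale-space extrema detection, assuming NumPy, a base scale sigma0 = 1.6 (the value Lowe suggests for SIFT) and an assumed scale step k = √2. It blurs the image at increasing scales, subtracts adjacent blurred images to obtain the DoG stack, and keeps samples that are the maximum or minimum of their 3×3×3 neighbourhood. The helper names and the `threshold` contrast cutoff are illustrative assumptions; multiple octaves, sub-pixel refinement and edge-response rejection are omitted.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3 sigma and normalised to sum to 1."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur with edge padding; output has the same shape as img."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def dog_stack(img, sigma0=1.6, k=2 ** 0.5, levels=5):
    """DoG images for one octave: blur at sigma0*k^(s+1) minus blur at sigma0*k^s."""
    blurred = [blur(img, sigma0 * k**s) for s in range(levels)]
    return np.stack([blurred[s + 1] - blurred[s] for s in range(levels - 1)])

def dog_extrema(dog, threshold=0.01):
    """(scale, y, x) of samples that are the max or min of their 3x3x3 neighbourhood.
    Samples with |DoG| below `threshold` are skipped (a crude low-contrast rejection)."""
    found = []
    S, H, W = dog.shape
    for s in range(1, S - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                v = dog[s, y, x]
                if abs(v) < threshold:
                    continue
                cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                if v == cube.max() or v == cube.min():
                    found.append((s, y, x))
    return found

# Usage: a single bright Gaussian blob centred at (20, 20).
yy, xx = np.mgrid[0:41, 0:41]
image = np.exp(-((xx - 20) ** 2 + (yy - 20) ** 2) / (2 * 3.0 ** 2))
keypoints = dog_extrema(dog_stack(image))
```

For the blob, the DoG response at its centre is a minimum in both space and scale, so the centre pixel is reported as a keypoint.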
Interest Point Detection
Rotation invariance


To achieve rotation invariance, compute the gradient magnitude
and direction of L (the smoothed image) from central derivatives,
at the scale of the keypoint (x, y)
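The orientation step can be sketched as follows, assuming NumPy and an image indexed as L[y, x]. Central differences give dx and dy, from which the magnitude m = sqrt(dx² + dy²) and direction θ = atan2(dy, dx) follow; a 36-bin histogram weighted by magnitude then picks the dominant direction. This is a simplification: Lowe's SIFT also weights samples with a Gaussian window and interpolates the histogram peak, and the `radius` value here is arbitrary.

```python
import numpy as np

def gradient_mag_ori(L, x, y):
    """Central differences of the smoothed image L (indexed L[y, x]) at pixel (x, y)."""
    dx = L[y, x + 1] - L[y, x - 1]
    dy = L[y + 1, x] - L[y - 1, x]
    magnitude = np.hypot(dx, dy)
    theta = np.degrees(np.arctan2(dy, dx)) % 360.0   # direction in [0, 360)
    return magnitude, theta

def dominant_orientation(L, x, y, radius=4, bins=36):
    """Magnitude-weighted orientation histogram around (x, y); returns the peak bin centre in degrees."""
    width = 360.0 / bins
    hist = np.zeros(bins)
    for yy in range(y - radius, y + radius + 1):
        for xx in range(x - radius, x + radius + 1):
            m, theta = gradient_mag_ori(L, xx, yy)
            hist[int(theta // width) % bins] += m
    peak = int(np.argmax(hist))
    return (peak + 0.5) * width

# Usage: intensity increases to the right, so every gradient points along +x (0 degrees).
ramp = np.tile(np.arange(20, dtype=float), (20, 1))
angle = dominant_orientation(ramp, 10, 10)
```

The returned angle is the centre of the winning 10-degree bin, so the ramp yields 5 degrees; rotating the descriptor frame by this angle is what makes the later descriptor rotation invariant.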
Keypoint descriptor
FLEXIBLE SIFT BINARIZATION
Given an image, the detected interest points are denoted
by $\{f_i\}_{i=0}^{N-1}$, in which $N$ represents the total number of
detected interest points. Each feature $f_i$ includes an L2-normalized descriptor $d_i \in \mathbb{R}^D$; for the SIFT descriptor, $D = 128$.
Our target is to transform the local feature descriptor $d_i$ into an
$L$-bit binary code string $B = \{b_0, b_1, \ldots, b_{L-1}\}$.
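For reference, a simplified way to arrive at the 128 dimensions: a 16×16 patch around the keypoint (assumed already rotated to the keypoint orientation) is split into 4×4 cells, and each cell accumulates an 8-bin orientation histogram, giving 4 × 4 × 8 = 128 values that are then L2-normalized. The clipping at 0.2 followed by renormalization follows Lowe's SIFT; the full method's trilinear interpolation and Gaussian weighting are omitted, and the function name is hypothetical.

```python
import numpy as np

def sift_like_descriptor(mag, ori, clip=0.2):
    """Simplified SIFT-style descriptor from a 16x16 patch.

    mag, ori: 16x16 arrays of gradient magnitude and orientation in degrees,
    assumed to be already rotated relative to the keypoint orientation.
    """
    hist = np.zeros((4, 4, 8))
    for y in range(16):
        for x in range(16):
            b = int(ori[y, x] // 45.0) % 8          # 8 orientation bins of 45 degrees
            hist[y // 4, x // 4, b] += mag[y, x]    # 4x4 spatial cells
    d = hist.ravel()                                # 4 * 4 * 8 = 128 values
    d = d / (np.linalg.norm(d) + 1e-12)             # L2 normalisation
    d = np.minimum(d, clip)                         # damp large gradient magnitudes
    return d / (np.linalg.norm(d) + 1e-12)          # renormalise to unit length

# Usage: uniform gradients pointing along 0 degrees fill bin 0 of every cell.
desc = sift_like_descriptor(np.ones((16, 16)), np.zeros((16, 16)))
```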
FLEXIBLE SIFT BINARIZATION
[Equation: definition of C(i, j) by comparing the magnitudes of the i-th and j-th dimensions of d against the threshold α]
where C represents the 3-D comparison array of size
D × D × 2, and C(i, j) is the comparison result
between the magnitudes in the i-th and the j-th dimensions
of descriptor d. α is a scalar threshold whose impact is
studied in the experiment section.
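The slide's exact definition of C(i, j) did not survive in this transcript. As a hedged illustration only, the sketch below assumes a three-way comparison in which each ordered pair of dimensions yields a 2-bit entry: (1, 0) when d_i exceeds d_j by more than α, (0, 1) when d_j exceeds d_i by more than α, and (0, 0) when the two magnitudes are within α of each other. This rule is an assumption chosen to match the D × D × 2 shape, not necessarily the paper's definition; it assumes NumPy.

```python
import numpy as np

def comparison_array(d, alpha):
    """Hypothetical D x D x 2 comparison array for descriptor d and threshold alpha.

    C[i, j] = (1, 0) if d[i] - d[j] > alpha
              (0, 1) if d[j] - d[i] > alpha
              (0, 0) otherwise (magnitudes within alpha of each other)
    """
    diff = d[:, None] - d[None, :]                   # diff[i, j] = d[i] - d[j]
    C = np.zeros((d.size, d.size, 2), dtype=np.uint8)
    C[..., 0] = diff > alpha
    C[..., 1] = -diff > alpha
    return C

# Usage with a toy 3-D descriptor.
C = comparison_array(np.array([0.5, 0.1, 0.45]), alpha=0.1)
```

Under this assumption, a larger α marks more pairs as "too close to call", which is one way a threshold could trade code stability against discriminative power.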
FLEXIBLE SIFT BINARIZATION
We then concatenate the comparison results into a comparison string S with
$\beta = 2D(D-1)$ bits in total, as shown by the second step in
Fig. 2. To simplify notation, S is written in the following
as $S = \{s_0, s_1, s_2, \ldots, s_{\beta-1}\}$. To obtain an L-bit binary code
$B = \{b_0, b_1, \ldots, b_{L-1}\}$, we next encode the comparison string
S into L bits.
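Concatenating the 2-bit entries for every ordered pair i ≠ j gives exactly β = 2D(D − 1) bits, which for SIFT (D = 128) is 32,512. The sketch below, assuming NumPy and a comparison array of shape (D, D, 2), drops the diagonal and flattens the rest. The random array is only a stand-in for C, and the subsequent step that compresses S into L bits is not specified in this transcript, so it is not attempted here.

```python
import numpy as np

def comparison_string(C):
    """Flatten the (D, D, 2) comparison array into S, skipping the i == j entries."""
    D = C.shape[0]
    off_diagonal = ~np.eye(D, dtype=bool)   # True for every ordered pair i != j
    return C[off_diagonal].reshape(-1)      # D*(D-1) pairs x 2 bits each

# Usage with a random stand-in for C (SIFT: D = 128).
D = 128
rng = np.random.default_rng(0)
C = rng.integers(0, 2, size=(D, D, 2), dtype=np.uint8)
S = comparison_string(C)
beta = 2 * D * (D - 1)
```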
CROSS-INDEXING STRATEGY
Code Word
The first 32 bits of the
binary code serve as the code word.
The visual words are generated by
clustering randomly selected SIFT descriptors. Each feature
is assigned to a visual word by a nearest-neighbor or
approximate nearest-neighbor approach.
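A hedged sketch of the two keys used for indexing: the code word, taken here as the first 32 bits of an L-bit code stored as a Python integer with b0 as the most significant bit (an assumed bit order), and the visual word, assigned by exact nearest-neighbour search over cluster centroids. A large system would replace the brute-force search with an approximate nearest-neighbour method; the function names and toy values are hypothetical, and NumPy is assumed.

```python
import numpy as np

def code_word(code: int, L: int) -> int:
    """First 32 bits of an L-bit code, assuming b0 is the most significant bit."""
    return code >> (L - 32)

def assign_visual_word(desc, centroids):
    """Index of the nearest centroid under L2 distance (exact, brute force)."""
    dists = np.linalg.norm(centroids - desc, axis=1)
    return int(np.argmin(dists))

# Usage with a hypothetical 64-bit code and three toy centroids.
cw = code_word((0xDEADBEEF << 32) | 0x12345678, L=64)
vw = assign_visual_word(np.array([0.9, 1.2]), np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]))
```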
CROSS-INDEXING STRATEGY
In the BoVW (bag-of-visual-words) model, an image is represented by a visual
word histogram with tf-idf weighting, and the
similarity between two images is measured by the L1 or L2
distance of their visual word vectors. In the binary code based
retrieval system, the features' binary codes are used to find the
true matches, and the number of matches measures
the similarity between two images, denoted by Score_i. This
strategy can be formulated as
$$\mathrm{Score}_i = \sum_{q} \sum_{d} f(d, q), \qquad f(d, q) = \begin{cases} 1 & \text{if } H\big(B(d), B(q)\big) \le T \\ 0 & \text{otherwise} \end{cases}$$
in which i represents the i-th database image, q runs over the query features and d over the features of the i-th database image. B(d) and B(q)
denote the binary SIFT codes of the database feature d and
the query feature q, respectively. T is a pre-defined threshold
value whose impact is studied in the experimental
part. H(·, ·) denotes the Hamming distance between two
binary SIFT codes. If two images have the same score value,
we favor the image with fewer features.
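The match-counting score can be sketched directly, assuming binary codes stored as Python integers. Each database image's score counts the (query feature, database feature) pairs whose Hamming distance is at most T; treating the threshold as inclusive is an assumption. Ties are broken in favour of the image with fewer features, as the slide states. The data layout (a dict from a hypothetical image id to a list of codes) is illustrative only.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes stored as Python ints."""
    return bin(a ^ b).count("1")

def rank_images(query_codes, database, T):
    """Score each database image by counting feature matches, then rank.

    database maps a hypothetical image id to the list of its feature codes.
    A pair (q, d) counts as a match when H(B(d), B(q)) <= T (inclusive: an assumption).
    """
    scores = {
        image_id: sum(1 for q in query_codes for d in codes if hamming(q, d) <= T)
        for image_id, codes in database.items()
    }
    # Higher score first; equal scores favour the image with fewer features.
    ranking = sorted(database, key=lambda i: (-scores[i], len(database[i])))
    return scores, ranking

# Usage: both images score 2, so "b" (2 features) ranks ahead of "a" (3 features).
scores, ranking = rank_images(
    [0b1111, 0b0000],
    {"a": [0b1110, 0b0001, 0b1010], "b": [0b1111, 0b0000]},
    T=1,
)
```

This brute-force loop compares every query feature with every database feature; in a real system an index would restrict the comparison to candidate features first.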
Result
Thank you