Structural information

• Structural information deals with the geometry of objects.
We are able to handle only very limited amounts of structural information.
How do we interpret structural information? We showed earlier that this is a difficult problem.
We will introduce the topic with the SHAPE CONTEXT method.
Let us now take a very difficult case.
Handwriting is very difficult: we recognize numbers easily even if they are very distorted.
What are the algorithms achieving this?
We think that first the contour of the object is detected, as illustrated below.
Next, we think that the locations of points on the contour decide the geometry of the object.
• We thus need to measure the location of EACH contour point RELATIVE to all other points. In other words, we need vectors from a point to all other points.
For example, for point Z in the figure we need all 6 red vectors. Having all vectors for all points describes the object, but is very complicated.
So now we reduce the description by using an APPROXIMATE polar coordinate net. The center of the net is located at each point in turn, and we only count HOW MANY other points fall in each area of the net.
Shape histogram
• The shape histogram of a contour point $a_i$ is denoted by $H_i$; it is a vector obtained from the polar net by counting the number of points in each area:

$H_i = \{\, h_i(k) = \#(\text{points in bin } k) \,\}$, with one entry for each bin $k$ of the net.

For a contour with M points we obtain a list of M histograms.
Two contours are similar if the sum of
differences between the histograms is small.
Histogram differences
$\| H_i - H_j \| = \sum_{k=1}^{m} \left| H_i(k) - H_j(k) \right|$
This is the difference for two points i, j (the sum runs over the m bins of the net).
Summing the differences for all contour points gives the difference between the contours (a code sketch follows below).
Two contours which are very similar will have a very small difference.
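As a concrete illustration of the two formulas above, here is a minimal Python/NumPy sketch of the idea: one log-polar histogram per contour point, compared by a sum of absolute differences. The bin counts (5 radius rings, 12 angle sectors), the bin edges, and the index-wise pairing of points are illustrative assumptions, not parameters taken from these slides.

```python
import numpy as np

def shape_histograms(points, n_r=5, n_theta=12):
    """For every contour point, count how many of the other points fall
    into each (log-radius, angle) bin of a polar net centered on it."""
    n = len(points)
    d = points[:, None, :] - points[None, :, :]   # all point-to-point vectors
    r = np.linalg.norm(d, axis=2)
    theta = np.arctan2(d[..., 1], d[..., 0])      # angles in [-pi, pi]
    r_mean = r.sum() / (n * n - n)                # mean pairwise distance
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    hists = np.zeros((n, n_r, n_theta), dtype=int)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            rb = np.searchsorted(r_edges, r[i, j] / r_mean) - 1
            tb = int((theta[i, j] + np.pi) / (2 * np.pi) * n_theta) % n_theta
            if 0 <= rb < n_r:
                hists[i, rb, tb] += 1
    return hists.reshape(n, -1)

def contour_difference(h_a, h_b):
    """Sum of histogram differences over all points; points are paired
    by index here (the full method finds the best pairing)."""
    return np.abs(h_a - h_b).sum()
```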
Example: Below we can see contours with their points marked, and examples of histograms for selected points.
Example: Here we see handwritten numbers and the histograms of their contour points, shown as grey levels.
Here we can see contours with their points and the polar net, with its areas marked in different colours.
What counts is the number of points in each area; this is what forms the histogram.
Other methods - examples
• There are hundreds of other methods for object retrieval and recognition.
It is impossible to lecture about all of them, since they are based on different principles.
To illustrate this, we can look at one of the best methods currently known: the method of eigenfaces, which uses a completely different principle.
EIGENFACES – global method
1. Construction of Face Space
Suppose a face image consists of N pixels, so it can be represented by a vector of dimension N. Let $\Gamma_1, \Gamma_2, \dots, \Gamma_M$ be the training set of face images. The average face of these M images is given by

$\Psi = \frac{1}{M} \sum_{i=1}^{M} \Gamma_i$

Then each face differs from the average face by

$\Phi_i = \Gamma_i - \Psi$
EIGENFACES
Now the covariance matrix of the training images can be constructed:

$C = \frac{1}{M} \sum_{i=1}^{M} \Phi_i \Phi_i^T = A A^T$

where $A = [\Phi_1, \Phi_2, \dots, \Phi_M]$.
The basis vectors of the face space, i.e., the eigenfaces, are then the orthogonal eigenvectors of the covariance matrix $C$.
Since the number of training images is usually less than the number of pixels in an image, there will be only M-1, instead of N, meaningful eigenvectors.
Eigenvalues, eigenvectors
$A x = \lambda x$: x is an eigenvector of the matrix A, $\lambda$ is the corresponding eigenvalue.
If S is a nonsingular n x n matrix, then the matrix $B = S A S^{-1}$ has the same eigenvalues as A.
An n x n matrix has n eigenvalues.
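A quick NumPy check of the similarity-transform property just stated (a sketch, not part of the original slides):

```python
import numpy as np

# A similarity transform B = S A S^{-1} preserves eigenvalues
# (S is any nonsingular matrix).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
S = rng.standard_normal((4, 4))          # almost surely nonsingular
B = S @ A @ np.linalg.inv(S)

eig_A = np.sort_complex(np.linalg.eigvals(A))
eig_B = np.sort_complex(np.linalg.eigvals(B))
print(np.allclose(eig_A, eig_B))         # True (up to rounding)
```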
EIGENFACES
Therefore, the eigenfaces are computed by first finding the eigenvectors $v_l$ of the M by M matrix $L = A^T A$:

$L v_l = \mu_l v_l$

The eigenvectors $u_l$ of the matrix $C = A A^T$ are then expressed as linear combinations of the difference face images $\Phi_k$, weighted by $v_{lk}$:

$u_l = \sum_{k=1}^{M} v_{lk} \Phi_k$

In practice, a smaller set of M' (M' < M) eigenfaces is sufficient for face identification. Hence, only the M' significant eigenvectors of L, corresponding to the M' largest eigenvalues, are selected for the eigenface computation. Thus further data compression can be obtained. M' is determined by a threshold $\theta$ on the ratio of the eigenvalue summation:

$\frac{\sum_{i=1}^{M'} \lambda_i}{\sum_{i=1}^{M} \lambda_i} \ge \theta$
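The construction up to this point can be summarized in a short NumPy sketch. The function name train_eigenfaces and the default threshold value are illustrative choices of ours; the steps follow the formulas above (mean face, difference images, the small M-by-M matrix L, and the eigenvalue-ratio rule for M').

```python
import numpy as np

def train_eigenfaces(faces, theta=0.9):
    """faces: (M, N) array, one flattened face image per row.
    Returns the average face, the M' eigenfaces, and the training
    projections. theta is the eigenvalue-ratio threshold."""
    psi = faces.mean(axis=0)                  # average face Psi
    Phi = faces - psi                         # difference images Phi_i (rows)
    L = Phi @ Phi.T                           # small M x M matrix A^T A
    lam, v = np.linalg.eigh(L)                # eigenvalues ascending
    lam, v = lam[::-1], v[:, ::-1]            # sort descending
    # choose M' so the leading eigenvalues carry a fraction theta
    ratio = np.cumsum(lam) / lam.sum()
    M_prime = int(np.searchsorted(ratio, theta)) + 1
    u = Phi.T @ v[:, :M_prime]                # eigenfaces u_l = sum_k v_lk Phi_k
    u /= np.linalg.norm(u, axis=0)            # normalize each eigenface
    omegas = Phi @ u                          # M'-dim projections Omega_k
    return psi, u, omegas
```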
In the training stage, the face of each known individual, $\Gamma_k$, is projected into the face space and an M'-dimensional vector $\Omega_k$ is obtained:

$\Omega_k = [w_1, w_2, \dots, w_{M'}]^T$, with $w_i = u_i^T (\Gamma_k - \Psi)$, $k = 1, \dots, P$

where P is the number of face classes.
A distance threshold $\theta_c$, which defines the maximum allowable distance from a face class as well as from the face space, is set up by computing half the largest distance between any two face classes:

$\theta_c = \frac{1}{2} \max_{j,k} \| \Omega_j - \Omega_k \|$
In the recognition stage, a new image, $\Gamma$, is projected into the face space to obtain a vector $\Omega$:

$\Omega = [w_1, w_2, \dots, w_{M'}]^T$, with $w_i = u_i^T (\Gamma - \Psi)$

The distance of $\Omega$ to each face class is defined by

$\epsilon_k = \| \Omega - \Omega_k \|, \quad k = 1, \dots, P$

For the purpose of discriminating between face images and non-face-like images, the distance $\epsilon$ between the original image $\Gamma$ and its reconstruction from the eigenface space, $\Gamma_f$, is also computed:

$\epsilon = \| \Gamma - \Gamma_f \|$, where $\Gamma_f = \Psi + \sum_{i=1}^{M'} w_i u_i$

These distances are compared with the threshold $\theta_c$ defined above, and the input image is classified by the following rules:
• IF $\epsilon \ge \theta_c$ THEN the input image is not a face image;
• IF $\epsilon < \theta_c$ AND $\epsilon_k \ge \theta_c$ for all k THEN the input image contains an unknown face;
• IF $\epsilon < \theta_c$ AND $\min_k \epsilon_k < \theta_c$ THEN the input image contains the face of individual $k^* = \arg\min_k \epsilon_k$.
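A minimal sketch of the recognition stage and the three rules, reusing the quantities produced by the training sketch above (classify_face is a hypothetical name):

```python
import numpy as np

def classify_face(gamma, psi, u, omegas, theta_c):
    """Apply the three classification rules above. gamma: flattened
    input image; psi, u, omegas come from training; theta_c is the
    distance threshold."""
    phi = gamma - psi
    omega = u.T @ phi                      # projection into the face space
    gamma_f = psi + u @ omega              # reconstruction from eigenfaces
    eps = np.linalg.norm(gamma - gamma_f)  # distance from the face space
    eps_k = np.linalg.norm(omegas - omega, axis=1)  # distance to each class
    if eps >= theta_c:
        return "not a face"
    if eps_k.min() >= theta_c:
        return "unknown face"
    return int(eps_k.argmin())             # index of the recognized individual
```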
Experimental results
The eigenface-based face recognition method was tested on the ORL face database. 150 images of 15 individuals were selected for the experiments.
Experimental results
In the training stage, three images of each individual were used as training samples, forming a training set totalling 45 images.
The average face of the training set
Experimental results
The first 15 eigenfaces corresponding to the 15 largest eigenvalues.
Experimental results
Recognition rate
The recognition rate depends on the training images – when single-view images are used for training, recognition is much worse.
Experimental results
Faces with calm expressions were used in the training stage, and faces of the same individuals but with various expressions in the testing stage.
Training images (top) and test images (bottom); the lower images are their projections in the face space.
CONCLUSIONS
The eigenfaces method treats images globally; no local information is used. Compression is done at the global level.
The method requires a lot of computation, but the results are good.
Explanation of the good results: images are represented as combinations of "simple" images, and the system is trained on them.
• THERE ARE MANY OTHER METHODS FOR OBJECT RECOGNITION AND REPRESENTATION. THEY CAN BE CLASSIFIED AS:
- STRUCTURAL DESCRIPTIONS (WE ALREADY MENTIONED CHAIN CODES)
- TRANSFORM METHODS
- TRAINING/LEARNING METHODS
BUT THERE ARE ALSO METHODS BASED ON CLEVER TRICKS WHICH WORK VERY WELL… NEXT
• A TRANSFORM METHOD
HERE WE TRY TO TRANSFORM THE PICTURE (OR THE OBJECT INFORMATION) INTO SOME OTHER DOMAIN, TO GET THE INFORMATION IN A MORE CONVENIENT FORM.
• THE METHOD OF MOMENTS
THE MOMENTS OF ORDER p, q ARE DEFINED AS

$m_{pq} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x^p y^q f(x, y) \, dx \, dy, \qquad p, q = 0, 1, 2, \dots$
FOR PHYSICAL OBJECTS THE MOMENTS OF ORDER 1 GIVE THE CENTER OF GRAVITY.
MOMENTS TAKEN RELATIVE TO THE CENTER OF GRAVITY DO NOT DEPEND ON HOW THE OBJECT IS LOCATED - THEY ARE THUS INVARIANT TO LOCATION.
• CENTRAL MOMENTS

$\mu_{pq} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \bar{x})^p (y - \bar{y})^q f(x, y) \, dx \, dy, \qquad p, q = 0, 1, 2, \dots$

where

$\bar{x} = \frac{m_{10}}{m_{00}}, \qquad \bar{y} = \frac{m_{01}}{m_{00}}$
FOR DIGITAL IMAGES

$\mu_{pq} = \sum_{x} \sum_{y} (x - \bar{x})^p (y - \bar{y})^q f(x, y)$
• HIGHER-ORDER CENTRAL MOMENTS

$\mu_{20} = m_{20} - \bar{x} m_{10}$
$\mu_{02} = m_{02} - \bar{y} m_{01}$
$\mu_{11} = m_{11} - \bar{y} m_{10}$
$\mu_{12} = m_{12} - 2 \bar{y} m_{11} - \bar{x} m_{02} + 2 m_{10} \bar{y}^2$
$\mu_{21} = m_{21} - 2 \bar{x} m_{11} - \bar{y} m_{20} + 2 m_{01} \bar{x}^2$
$\mu_{30} = m_{30} - 3 \bar{x} m_{20} + 2 m_{10} \bar{x}^2$

... AND SO ON ...
• NEXT, NORMALIZED CENTRAL MOMENTS ARE CREATED:

$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \qquad \gamma = \frac{p + q}{2} + 1$

AND INVARIANT MOMENTS:

$\phi_1 = \eta_{20} + \eta_{02}$
$\phi_2 = (\eta_{20} - \eta_{02})^2 + 4 \eta_{11}^2$

OTHER MOMENTS $\phi_3, \phi_4, \phi_5, \dots$ CAN BE DEFINED TOO
• THESE MOMENTS ARE INVARIANT TO TRANSLATION, ROTATION, AND SCALE CHANGE.
THUS, ONCE THE MOMENTS ARE CALCULATED, THEY WILL NOT CHANGE WHEN THE OBJECT ROTATES OR CHANGES SIZE. THIS IS A VERY DESIRABLE FEATURE (SEE THE SKETCH BELOW).
HOWEVER, MOMENTS ARE SENSITIVE TO NOISE AND ILLUMINATION CHANGE.
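A minimal NumPy sketch of the formulas above, checking that phi_1 and phi_2 stay (numerically) constant when a simple binary object is rotated by 90 degrees or scaled by a factor of 2. The test object is an arbitrary rectangle chosen for illustration:

```python
import numpy as np

def invariant_moments(img):
    """Compute phi_1 and phi_2 from the normalized central moments
    of a grey-level image img, following the formulas above."""
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m = lambda p, q: (x**p * y**q * img).sum()        # raw moment m_pq
    xb, yb = m(1, 0) / m(0, 0), m(0, 1) / m(0, 0)     # center of gravity
    mu = lambda p, q: ((x - xb)**p * (y - yb)**q * img).sum()
    eta = lambda p, q: mu(p, q) / mu(0, 0)**((p + q) / 2 + 1)
    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2))**2 + 4 * eta(1, 1)**2
    return phi1, phi2

# A binary blob, the same blob rotated by 90 degrees, and scaled 2x:
obj = np.zeros((20, 20)); obj[5:12, 4:16] = 1.0
print(invariant_moments(obj))
print(invariant_moments(np.rot90(obj)))                  # same values
print(invariant_moments(np.kron(obj, np.ones((2, 2)))))  # ~same values
```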
• EXAMPLE: ROTATED AND SCALED OBJECT
HERE THE MOMENTS CALCULATION IS SHOWN; PLEASE NOTE THAT FOR THE TRANSFORMED PICTURE THE MOMENTS ARE CONSTANT.
• PRACTICAL METHODS FOR DEALING WITH VISUAL OBJECTS:
- THEY ARE BASED ON TRICKS WHICH MAKE THEM WORK VERY WELL FOR A SPECIFIC PROBLEM, BUT THEY ARE NOT GENERAL.
WE ILLUSTRATE THIS WITH THE EXAMPLE OF A PRACTICAL FACE TRACKING SYSTEM.
• WHAT IS FACE TRACKING?
THERE IS A CAMERA IN FRONT OF THE PC, AND SOFTWARE WHICH MARKS THE FACE LOCATION AND POSITION OF THE USER SITTING AT THE DISPLAY.
HERE WE DESCRIBE A METHOD AND SYSTEM FOR FACE TRACKING WHICH IS QUITE SIMPLE, ROBUST, AND RUNS IN REAL TIME ON A PC!
THE METHOD IS BASED ON FACE COLOR HISTOGRAM STATISTICS AND MOMENTS.
HERE IS THE BLOCK DIAGRAM OF THE FACE TRACKING ALGORITHM.
FIRST THE COLOR IMAGE IS CONVERTED TO HUE, SATURATION, INTENSITY.
NEXT THE SKIN COLOR HISTOGRAM IS CALCULATED.
FINALLY MOMENTS ARE CALCULATED AND THE WINDOW SIZE IS ADJUSTED ITERATIVELY.
• SKIN COLOR HISTOGRAM
COLOR = HUE IN THE HSI REPRESENTATION.
PEOPLE HAVE THE SAME SKIN COLOR (HUE); ONLY THE SATURATION LEVELS DIFFER.
HERE IS THE DISTRIBUTION OF PLACES CORRESPONDING TO THE FACE "COLOR".
COLOR IS A GOOD FEATURE IF WE HAVE A COLOR CAMERA.
HAVING THE FACE COLOR DISTRIBUTION, WE CAN TREAT IT AS A TWO-DIMENSIONAL FUNCTION I(x, y) AND CALCULATE:
FIRST WE SELECT A WINDOW OF A CERTAIN SIZE.
NEXT WE CALCULATE THE ZEROTH AND FIRST MOMENTS IN THIS WINDOW:

$m_{00} = \sum_{x} \sum_{y} I(x, y), \qquad m_{10} = \sum_{x} \sum_{y} x I(x, y), \qquad m_{01} = \sum_{x} \sum_{y} y I(x, y)$
NEXT THE NEW CENTER OF THE WINDOW IS CALCULATED:

$x_c = \frac{m_{10}}{m_{00}}, \qquad y_c = \frac{m_{01}}{m_{00}}$
AFTER ITERATING THIS CALCULATION, THE ALGORITHM WILL CONVERGE TO A SPECIFIC POSITION.
HOW IS THE WINDOW SIZE SELECTED? IT DEPENDS ON THE SIZE OF THE FACE, SO IT IS ADJUSTED ITERATIVELY, STARTING WITH SIZE 3.
WE THEN SELECT THE WINDOW SIZE TO BE $2\sqrt{m_{00}/\text{max pixel value}}$.
IN THIS WAY THE WINDOW POSITION AND SIZE ARE CONTINUOUSLY ADAPTED UNTIL THEY STABILIZE.
THIS CAN THUS BE USED FOR FACE TRACKING (A SKETCH FOLLOWS BELOW).
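Here is a minimal sketch of the iteration just described, assuming a precomputed skin-probability image prob with values in [0, max_val]; the function name and the iteration cap are illustrative choices:

```python
import numpy as np

def track_window(prob, x0, y0, size=3, n_iter=30, max_val=1.0):
    """Iterate the window update described above on a skin-probability
    image prob: recompute the window's center of gravity and its size
    from the zeroth moment until they stabilize."""
    h, w = prob.shape
    xc, yc, s = x0, y0, size
    for _ in range(n_iter):
        half = s // 2 + 1
        xs = slice(max(0, xc - half), min(w, xc + half + 1))
        ys = slice(max(0, yc - half), min(h, yc + half + 1))
        win = prob[ys, xs]
        y, x = np.mgrid[ys, xs]
        m00 = win.sum()
        if m00 == 0:
            break                                    # no skin under window
        xc = int(round((x * win).sum() / m00))       # x_c = m10 / m00
        yc = int(round((y * win).sum() / m00))       # y_c = m01 / m00
        s = max(3, int(2 * np.sqrt(m00 / max_val)))  # window size from m00
    return xc, yc, s
```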
• THIS PROCESS IS ILLUSTRATED HERE: THE START IS FROM A SMALL WINDOW SIZE; THE SIZE IS ADJUSTED AND THE CENTER OF THE WINDOW IS MOVED UNTIL IT STABILIZES.
HERE THE FACE HAS MOVED; IN THE NEXT PICTURE THE WINDOW WILL ALSO MOVE TO THE NEW POSITION.
• THIS ALGORITHM IS SURPRISINGLY ROBUST.
NOISE DOES NOT HARM IT, AND AS WE CAN SEE IT IS ROBUST AGAINST DISTRACTORS: ANOTHER FACE ON THE LEFT, A HAND ON THE RIGHT.
• THE METHOD CAN ALSO BE USED FOR EVALUATION OF THE HEAD ROLL, WIDTH AND LENGTH.
• PARAMETERS FOR THE HEAD POSITION CAN BE CALCULATED BASED ON THE SYMMETRY OF THE LENGTH L AND WIDTH W (SEE THE SKETCH BELOW).
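The slides do not spell out the roll computation; one standard way to obtain an orientation angle from the tracked distribution is via its second central moments, sketched below under that assumption:

```python
import numpy as np

def head_roll(prob, xc, yc):
    """Estimate the roll angle (in radians) of the tracked blob from
    its second central moments - a standard orientation-from-moments
    formula, not one spelled out on these slides."""
    y, x = np.mgrid[:prob.shape[0], :prob.shape[1]]
    m00 = prob.sum()
    mu20 = ((x - xc)**2 * prob).sum() / m00
    mu02 = ((y - yc)**2 * prob).sum() / m00
    mu11 = ((x - xc) * (y - yc) * prob).sum() / m00
    return 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
```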
THIS SYSTEM CAN BE USED FOR FACE TRACKING, E.G. AS AN INTERFACE TO COMPUTER GAMES.
ANOTHER EXAMPLE: AMBULATORY VIDEO
A COMPUTER WITH A CAMERA, WEARABLE BY THE USER.
THE GOAL IS TO BUILD A COMPUTER WHICH WILL KNOW WHERE THE USER IS.
The user wears a small camera attached e.g. to the head. The camera produces circular pictures which are not very good, but good enough.
HOW TO RECOGNIZE WHERE THE USER IS? (E.G. ROOM, STREET)
FIRST, SPLIT THE VIDEO INTO LIGHT INTENSITY I AND CHROMINANCES, IN A VERY APPROXIMATE WAY:

$I = R + G + B, \qquad C_r = R / I, \qquad C_g = G / I$

SECOND, SEGMENT THE PICTURE INTO REGIONS AND CALCULATE PARAMETERS FOR EACH: THE MEAN AND COVARIANCE.
• FOR EACH ENVIRONMENT THERE WILL BE DIFFERENT STATISTICAL DISTRIBUTIONS OF THE SIGNALS; WE CAN USE THEM TO FIND TO WHICH CLASS A RECORDED VIDEO BELONGS (SEE THE SKETCH BELOW).
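A minimal sketch of this per-region statistics step, assuming an RGB frame and a label image from some segmentation; region_stats is a hypothetical name:

```python
import numpy as np

def region_stats(rgb, labels):
    """Split an RGB frame into intensity and approximate chromaticities
    (I = R+G+B, Cr = R/I, Cg = G/I), then compute the mean and
    covariance of (Cr, Cg) for every segmented region in labels."""
    r, g, b = (rgb[..., c].astype(float) for c in range(3))
    i = r + g + b + 1e-9                 # avoid division by zero
    cr, cg = r / i, g / i
    stats = {}
    for region in np.unique(labels):
        c = np.stack([cr[labels == region], cg[labels == region]])
        stats[region] = (c.mean(axis=1), np.cov(c))
    return stats
```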
FOR 2 HOURS OF RECORDING THE RESULTS ARE VERY GOOD:
Label      Correlation Coeff.
Office     0.9124
Lobby      0.7914
Bedroom    0.8620
Cashier    0.8325
OVERALL CONCLUSION
• WE LACK A GENERAL SOLUTION TO THE OBJECT REPRESENTATION AND RECOGNITION PROBLEMS THAT WOULD BE AS EFFECTIVE AS THE BIOLOGICAL SYSTEMS.
• THERE ARE MANY APPROACHES TO A SOLUTION; WE PRESENTED AN APPROACH BASED ON STATISTICS OF QUANTIZED BLOCK TRANSFORM FEATURES.
• THERE ARE ALSO APPROACHES BASED ON CLEVER TRICKS WHICH WORK WELL FOR SPECIFIC PROBLEMS.