INFORMATION REPRESENTATION

• There is no known general method for representing information about objects that achieves a level of performance similar to that of biological systems.
I think that there are two types of information about objects:
- statistical -> distribution of features
- structural -> location of features
Both types are evaluated and combined in some yet unknown way.
Let’s take a look at a very fresh example:
FujiFilm's Latest Camera Aims at Dogs, Cats
Mar 12, 2010
FujiFilm's Finepix Z700 features a face-detection function that can recognize cat and dog faces, and it can snap a picture automatically when they look towards the camera lens.
When it finds a face, a green box is drawn around it on screen and the camera automatically focuses. In the auto-shooting mode it waits until the animal turns to the camera before taking a picture. It worked well with stuffed animals, but it turns out real dogs and cats can be a little trickier. FujiFilm has a list of dog and cat breeds that are easier for its technology to identify. FujiFilm says the technology can also get confused if the animal has a dark coat, large patches around its eyes, a wrinkly nose or hair over its eyes.
Dog Types Recommended for Detection
Cat Types Recommended for Detection
Detection of dogs with hair covering the eyes, nose or entire face can be difficult.
Detection of cats with hair covering the facial contour can be difficult.
Detection of dogs/cats that have large patches around the eyes or nose (especially black patches) can be difficult.
Detection of cats with thin faces can be difficult.
Detection of blackish dark colored dogs/cats can be difficult.
The question is how such detection algorithms are made. We do not know this (a company secret), but we can assume that the algorithms must work by identifying the locations of basic features: eyes, ears, nose, coat color. But animals also have a coat, which differs in its details but is statistically the same for the same species.
Here we can see that this dog has specific eye, nose and mouth locations, but its fur is statistical.
• At our university we investigate problems in statistical and structural information about objects:
How to produce such information?
How useful is such information?
What is the performance of a system using statistical information only?
What is our approach?
We represent features by quantized block DCT transforms or by vectors built from transform coefficients in neighbouring blocks.
Then we form a histogram of the blocks.
Our approach:
We do not know how to describe the locations of blocks, so...
Let’s think first about GLOBAL content description, in which locations are not considered!
That is, let’s look first at the problem in which only block STATISTICS are considered.
Impact of Quantization
Distribution of DCT coefficients for a typical 8x8 DCT block.
We can see that the higher frequency coefficients are small. If we use strong quantization they will be quantized to zero. Under strong quantization only the first 4x4 block of coefficients will be nonzero. This is equivalent to a 4x4 DCT transform. There is another effect too: the greater the quantization, the smaller the number of DIFFERENT blocks. In fact, with no quantization, almost every block is different. Quantization is rounding the coefficients to a limited number of values.
Coefficients of the 4x4 blocks
(The DC coefficient is the top-left entry of the block; all remaining entries are AC coefficients.)
DC – zero frequency, the average light level in the block
AC – correspond to different frequencies
Quantization by QP:
[DC] = round(DC / QP)
[AC] = round(AC / QP)
Higher QP -> more zeros in the block
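The quantization step above can be sketched in a few lines of Python. The block values here are made up for illustration; this is not any camera's or codec's exact scheme.

```python
import numpy as np

def quantize_block(dct_block, qp):
    """Quantize a block of DCT coefficients by rounding each one
    after division by QP, as in [DC] = round(DC/QP)."""
    return np.round(dct_block / qp).astype(int)

# A made-up 4x4 corner of a DCT block: the DC term (top-left) is large,
# higher-frequency AC terms are small, as the text describes.
block = np.array([[180.0, 24.0,  6.0,  2.0],
                  [ 20.0,  9.0,  3.0,  1.0],
                  [  5.0,  2.0,  1.0,  0.5],
                  [  2.0,  1.0,  0.4,  0.2]])

for qp in (2, 12):
    q = quantize_block(block, qp)
    print(f"QP={qp:2d}: {np.count_nonzero(q)} nonzero coefficients")
```

Running this shows that raising QP drives more coefficients to zero, exactly the effect the slide describes.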
Here is an illustration for a picture. QP is the quantization parameter; we see that as it increases, the number of DCT patterns is strongly reduced.
Now we use the following idea:
Let’s see how the histogram of the quantized DCT blocks looks!
For example, let’s find which blocks appear most often in a picture and create a histogram of, e.g., the first 40 patterns.
The shape of this histogram obviously depends on the quantization. If the quantization is low, the histogram will tend to be flat. If the quantization is high, it will tend to have a peak.
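A minimal sketch of this pattern-histogram idea. The toy random blocks and the choice of hashing patterns via `tobytes()` are our own; the point is only the flat-vs-peaked behaviour.

```python
import numpy as np
from collections import Counter

def pattern_histogram(blocks, qp, top_n=40):
    """Count how often each quantized block pattern occurs and return
    the counts of the top_n most frequent patterns."""
    counts = Counter(np.round(b / qp).astype(int).tobytes() for b in blocks)
    return [c for _, c in counts.most_common(top_n)]

# Toy blocks standing in for the 4x4 DCT blocks of one picture.
rng = np.random.default_rng(0)
blocks = [rng.normal(0.0, 8.0, (4, 4)) for _ in range(1000)]

flat = pattern_histogram(blocks, qp=1)      # weak quantization: many patterns
peaked = pattern_histogram(blocks, qp=100)  # strong quantization: one big peak
print(len(flat), len(peaked))
```

With QP = 1 almost every block is a distinct pattern (a flat histogram); with QP = 100 nearly all blocks collapse into the same pattern (a sharp peak), matching the claim above.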
Let us see an example of histograms for two pictures:
Histograms of two face images
The database retrieval problem based on block histograms
Assume we have a database D of pictures 1, 2, ..., i, ..., j, ..., m.
We take a picture and want to check if it is in the database or if there are similar pictures there.
Example: a database of passport photographs.
In our approach we will use a similarity measure between pictures based on their quantized block histograms. Histograms are treated as vectors and similarity is based on the following formula:

B(i,j) = Σ_{k=1}^{m} | H_i(k) − H_j(k) |,   i, j ∈ D

This is the city-block measure (the sum of absolute differences between histogram entries). Its minimum value is 0, in which case the two histogram vectors are identical; the closer the value is to zero, the more similar the pictures should be. Remember that the blocks are quantized, so noise and nonrelevant features are removed.
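The city-block measure can be sketched directly. The histogram values below are made up for illustration.

```python
import numpy as np

def block_similarity(h_i, h_j):
    """City-block (L1) distance between two block histograms:
    B(i, j) = sum_k |H_i(k) - H_j(k)|. Zero means identical histograms."""
    return np.abs(np.asarray(h_i) - np.asarray(h_j)).sum()

h1 = [120, 80, 40, 10]   # histogram of picture 1 (toy values)
h2 = [118, 85, 38, 9]    # a similar picture: small distance
h3 = [10, 40, 80, 120]   # a dissimilar picture: large distance
print(block_similarity(h1, h2))  # 10
print(block_similarity(h1, h3))  # 300
```

The measure is zero only when the histogram vectors are identical, which is why a value near zero indicates similar pictures.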
The question is what the performance of such a scheme is, but before we can check this, we need to look into the light normalization problem.
Light normalization problem
The values of DCT transform coefficients depend on the light level. If the light level is higher, the values are higher. If we use the same quantization for two identical pictures with different light levels, the quantized blocks will be different.
The light level can be normalized. First, let’s calculate the average light level for a picture. For this we use the values of the DC coefficients in the blocks:

DCmean(j) = (1/N) Σ_{i=1}^{N} DC_i(j)

Here we get the average light level for picture j.
The average light level DCall in a database is calculated in the same way, based on the values of DCmean for each picture. Next, the light level of each picture is rescaled by the factor

R = DCall / DCmean(j)

DCT_{i,j} ← DCT_{i,j} × R,   1 ≤ i ≤ N,  1 ≤ j ≤ M

Rescaling ensures that the values of the coefficients in the quantized blocks will be similar:

DCT_{i,j} ← DCT_{i,j} / QP,   1 ≤ i ≤ N,  1 ≤ j ≤ M
The DC coefficients problem
At high quantization levels very many blocks will have only a DC coefficient. The only information about these blocks will be the DC value, that is, the average light level in the block.
But what is of interest is how the average light level changes between the blocks. We want to use this information. What we do is account for the information in the differences between the DC values of neighbouring blocks.
DC differences between blocks
In a) we see a fragment of a picture in which the DC values of the blocks are shown. Each block has 8 neighbours, as shown in b). We calculate 9 differences between the neighbours (8 for the directions and 1 for the average over all directions), as shown in c). Now we order the differences and form a vector from the first k coefficients, as shown in d) for k = 4.
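A sketch of the DC difference vector construction. Sorting the differences by magnitude is our assumption; the slide says the differences are ordered but not by what key.

```python
import numpy as np

def dc_difference_vector(dc3x3, k=4):
    """From a 3x3 matrix of DC values, compute 8 directional differences
    (neighbour minus centre) plus 1 average difference, order them
    (here: by magnitude, largest first), and keep the first k."""
    centre = dc3x3[1, 1]
    neighbours = np.delete(dc3x3.flatten(), 4)   # the 8 surrounding DCs
    diffs = neighbours - centre                  # 8 directional differences
    diffs = np.append(diffs, diffs.mean())       # 9th: average difference
    order = np.argsort(-np.abs(diffs))           # assumed ordering key
    return diffs[order][:k]

# Toy 3x3 neighbourhood of DC values, centre block DC = 50.
patch = np.array([[52.0, 60.0, 55.0],
                  [48.0, 50.0, 58.0],
                  [45.0, 47.0, 51.0]])
print(dc_difference_vector(patch, k=4))
```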
Combined histogram
A combined histogram for AC blocks and DC vectors is now formed:

H = [ H_AC , α × H_DC ]

where α is a numerical parameter which will be optimized later.
A combined histogram means that we have two vectors in the minimization, weighted together by the parameter α:

B(i,j) = Σ_{k=1}^{m} | H_i(k) − H_j(k) |,   i, j ∈ D
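A minimal sketch of the combined-histogram distance, with made-up histogram values:

```python
import numpy as np

def combined_similarity(hac_i, hdc_i, hac_j, hdc_j, alpha):
    """L1 distance between combined histograms H = [H_AC, alpha * H_DC]."""
    h_i = np.concatenate([hac_i, alpha * np.asarray(hdc_i)])
    h_j = np.concatenate([hac_j, alpha * np.asarray(hdc_j)])
    return np.abs(h_i - h_j).sum()

# alpha controls how much the DC-difference histogram contributes
# relative to the AC-pattern histogram.
print(combined_similarity([10, 5], [4, 2], [8, 6], [1, 1], alpha=0.5))  # 5.0
```

Setting α = 0 ignores the DC vectors entirely; the optimization described below searches for the α that gives the best retrieval.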
Optimization of database retrieval
The question is: how good can database retrieval based on the combined histogram be? This means, e.g., how many errors will be made. But we can also ask another question: what is the best achievable performance of this approach?
Remember that we use only statistical information, but we have several parameters which can be selected:
- quantization level
- size of histograms
- parameter α for combining histograms
- size of DC difference vectors
Optimization procedure
We can check this by taking some databases and optimizing the parameters for the best retrieval. This will show us what the maximum performance is. We did this for face databases using the following scheme:
Evaluation of results
Given a certain classification threshold, an input face image of person A may be falsely classified as person B, even though the target person is person A.
The ratio of how many images of person A have been classified as other persons is called the False Rejection Rate, FRR.
The ratio of how many images of other persons have been classified as person A is called the False Acceptance Rate, FAR.
Equal Error Rate
From the FAR and FRR, the Equal Error Rate (EER) is obtained at the threshold where both measures take equal values. The lower the EER, the better the system's performance, as the total error rate, which is the sum of the FAR and the FRR at the point of the EER, decreases.
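The FAR, FRR and EER definitions can be sketched as follows. The distance values are toy data, and the simple threshold scan (rather than interpolation between thresholds) is our own simplification.

```python
import numpy as np

def far_frr(genuine, impostor, threshold):
    """FRR: fraction of genuine (same-person) distances above threshold,
    i.e. person A's own images rejected. FAR: fraction of impostor
    (different-person) distances at or below threshold, i.e. others accepted."""
    frr = np.mean(np.asarray(genuine) > threshold)
    far = np.mean(np.asarray(impostor) <= threshold)
    return far, frr

def eer(genuine, impostor):
    """Scan candidate thresholds and return the error rate where FAR
    and FRR are closest to equal."""
    best = min((abs(f - r), (f + r) / 2)
               for t in sorted(set(genuine) | set(impostor))
               for f, r in [far_frr(genuine, impostor, t)])
    return best[1]

genuine = [2, 3, 4, 6, 9]      # distances for correct matches (toy data)
impostor = [5, 8, 10, 12, 15]  # distances for wrong matches (toy data)
print(eer(genuine, impostor))  # 0.2
```

Lowering the threshold trades FRR for FAR and vice versa; the EER is the single operating point where the two error rates meet.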
Typical performance of EER histogram for two face databases
Database selection
There are two cases:
1. A database in which there is only one (standard) picture of each person.
2. A database in which there are multiple pictures of each person (and they might be very different).
In case 2 the same person should be retrieved for any of their pictures, which can be difficult.
Research database
The FERET database contains overall more than 10,000 images from more than 1000 individuals, taken in largely varying circumstances. The FERET images are divided into several sets, formed to match its methodology of evaluation. Here we made a test based on the sets fa and fb. Each face has one picture in each of them, with the picture in fb taken seconds after the corresponding picture in fa. The fa set, which has a size of 994 images, serves as the database; the fb set, which has a size of 992 images, is used as the key images for retrieval from fa.
Evaluation of results
FERET is considered a difficult database, used in the evaluation of professional applications:

Method               EER
AC-Patterns          4.6371%
Direction-Vectors    7.06%
Combined Histogram   3.43%

The best EER result is obtained when: QP_AC = 12, number of AC patterns = 400, QP_DC = 12, number of Direction-Vector patterns = 400, α = 0.5, γ = 4.
FERET methodology of evaluation
For FERET there is another methodology, based on calculating how many correct retrievals are obtained among n trials, n = 1, 2, 3, ...
FERET specific evaluation method
The FERET evaluation is called the cumulative match score. Results are shown for our histogram method (red), overlaid with other known good methods. The rank is how many retrievals are made; one retrieval is the most demanding case.
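A sketch of the cumulative match score for a toy distance matrix, assuming for illustration that key image i's correct gallery match sits at the same index i:

```python
import numpy as np

def cumulative_match_score(dist_matrix, rank):
    """Fraction of key images whose correct gallery match appears among
    the `rank` nearest gallery images (smallest distances)."""
    order = np.argsort(dist_matrix, axis=1)   # gallery sorted per key image
    hits = [i in order[i, :rank] for i in range(len(dist_matrix))]
    return np.mean(hits)

# Toy 4x4 key-vs-gallery distance matrix: key i should match gallery i.
d = np.array([[0.1, 0.9, 0.8, 0.7],
              [0.6, 0.5, 0.2, 0.9],   # key 1's true match is only at rank 2
              [0.9, 0.8, 0.1, 0.7],
              [0.4, 0.9, 0.8, 0.3]])
print(cumulative_match_score(d, rank=1), cumulative_match_score(d, rank=2))
```

The score is non-decreasing in the rank, which is why rank 1 (a single retrieval) is the most demanding point on the curve.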
Features based on Binary Feature Vectors
For each non-border 4x4 image block, there are eight blocks surrounding it. Such a 3x3 block matrix is utilized here to generate a Binary Feature Vector (BFV). Taking the DC coefficients as an example: the nine DC coefficients within this area form a 3x3 DC coefficient matrix. By measuring and thresholding the magnitudes of the differences between the non-center DCs and the central DC coefficient, a binary vector of length 8 is formed.
Two different cases are considered here:
Case 1:
0 – current coefficient ≤ threshold
1 – current coefficient > threshold
Case 2:
0 – current coefficient < threshold
1 – current coefficient ≥ threshold
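The two thresholding cases can be sketched as follows, applying the threshold to the magnitude of each DC difference as the text describes (the toy DC values are our own):

```python
import numpy as np

def binary_feature_vector(dc3x3, threshold, strict=False):
    """Binary Feature Vector for the central block: threshold the magnitude
    of each (neighbour - centre) DC difference.
    strict=False gives Case 1 (bit is 1 when magnitude > threshold),
    strict=True gives Case 2 (bit is 1 when magnitude >= threshold)."""
    centre = dc3x3[1, 1]
    diffs = np.abs(np.delete(dc3x3.flatten(), 4) - centre)
    bits = (diffs >= threshold) if strict else (diffs > threshold)
    return bits.astype(int)

patch = np.array([[52.0, 60.0, 55.0],
                  [48.0, 50.0, 58.0],
                  [45.0, 47.0, 51.0]])
print(binary_feature_vector(patch, threshold=3))               # Case 1
print(binary_feature_vector(patch, threshold=3, strict=True))  # Case 2
```

The two cases differ only for differences exactly equal to the threshold; each 8-bit vector then becomes one bin of the BFV histogram described next.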
Example
• DC-BFV Histogram (based on DC coeff.)
• AC-BFV Histogram (based on AC coeff.)
Example of DC-BFV histogram
Performance results for the FERET database
The result is quite good if we take into account that the method uses statistical information only.
How about structural information?
• Until now we compared pictures based on feature histograms treated as vectors. No information about the location of features was taken into account. Thus, the features could be located anywhere in the pictures and the results would be the same. The results presented are valid for faces because we know that the pictures are faces and not some random feature pictures.
BUT the question is: how should we deal with structural information?