Transcript Document

Descriptions
After the segmentation of an image, its regions or edges are represented
and described in a manner appropriate for further processing.
"Shape" is an intrinsic characteristic of 3-D objects or projections
thereof. Many other properties, such as edges and surfaces, can be
derived from an image. Objects and the naming thereof are primarily
defined by shape (and by the function of the object), and not by
properties such as color, reflection, surface texture, etc.
We perceive shape both through outlines, which are mainly 2-D data,
and through surfaces, which are mainly 3-D structures.
To be useful for further processing, shapes must somehow be
represented. This is a difficult but very interesting problem that is
complicated by several factors:
2007
Theo Schouten
1
problems
- Shapes are often complex. Color, motion and intensity can be quantified by a small
number of well-understood parameters. Shape can often only be represented explicitly
using hundreds of parameters. It is not clear which aspects or features of shape are
important for recognition and which can be dropped to reduce the complexity.
- Introspection does not help. A large part of the human brain seems to be devoted to
shape recognition, but this activity occurs primarily subconsciously. Why is
shape recognition (think of faces, for example) so easy for a human and shape
description so difficult? We do not have a precise language for shapes (we speak of
egg-shaped or ellipse-shaped).
- There is little mathematical guidance. Mathematics has traditionally paid little
attention to "computational geometry". For example, only recently has a mathematical
definition of a "solid object" been given that coincides with our intuition of set
operations on solid objects.
- The field is young. Only recently has it become useful to represent complex
shapes in a form that a computer can read, edit and display graphically.
There is no generally accepted representation scheme for all types of shapes; there
are several, each with its own advantages and disadvantages for certain applications.
Algorithms for the manipulation of shapes (for example, how to carry a couch up the
stairs) are extremely complex and still at a rudimentary stage.
Chain codes, signatures
Polygonal approximations
An edge can be approximated to any desired precision by a
polyline. Finding a polyline approximation for a given edge is a
segmentation problem: finding the corner points or breakpoints
that yield a good or the best polyline approximation (according to a
certain criterion). Just as with region segmentation, the methods can
be characterized by the concepts "merging" and "splitting".
The tolerance band method usually
does not find the most economical set
of segments. This is a general problem
of such "one-pass" algorithms: a new
breakpoint is only placed when
something has gone wrong, whereas it is often
better to place a breakpoint at an
earlier stage. Afterwards one can try to
find a better solution by shifting
certain breakpoints.
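As an illustration of the "splitting" idea, the sketch below approximates an open edge by recursively splitting it at the point that lies farthest from the current segment, until every point lies within a tolerance. It is a minimal sketch and not necessarily the exact method shown on the slide; the function names and the `tolerance` parameter are assumptions made for this example, and the edge is assumed to contain at least two points.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length

def split_polyline(points, tolerance):
    """Return the indices of the breakpoints found by recursive splitting."""
    def split(first, last):
        # Look for the point farthest from the segment first..last.
        worst, worst_dist = None, tolerance
        for i in range(first + 1, last):
            d = point_line_distance(points[i], points[first], points[last])
            if d > worst_dist:
                worst, worst_dist = i, d
        if worst is None:          # every point is within the tolerance
            return [first]
        return split(first, worst) + split(worst, last)

    return split(0, len(points) - 1) + [len(points) - 1]

# An L-shaped edge: the only breakpoint needed is the corner at index 3.
edge = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
breakpoints = split_polyline(edge, tolerance=0.5)
print(breakpoints)
```

Unlike a one-pass method, the split approach always places a breakpoint at the worst-fitting point of the current segment, so it does not wait until the error has already grown too large.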
[Figure: split method]
Spatial Occupation-Matrix
The y-axis representation is a run-length coding, per y row,
of the spatial occupation-matrix. There are several ways to
do this:
- for each row that contains 1s, the row number followed by the start and stop
x-positions of each run of 1s, (y, startx, stopx, ...):
{ (2,2,3), (4,4,4,6,6), (5,4,6), (6,6,6) }
- for each y, the lengths of the alternating runs of 0, 1, 0, ...:
{ (8), (1,2,5), (8), (3,1,1,1,2), (3,3,2), (5,1,2), (8), (8) }
Union and intersection can be implemented as sorting and merging
operations on the run-length coded rows, with a running time in principle
proportional to the number of y rows. This representation is more
compact than the occupation-matrix, except when there are long
structures in the y-direction.
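The two codings above can be reproduced with a few lines of code. The sketch below is illustrative only: the 8 x 8 occupation-matrix is reconstructed from the two codings listed above (assuming 1-based x and y coordinates), and the function names are chosen for this example.

```python
def runs_per_row(matrix):
    """Per non-empty row: (y, startx, stopx, startx, stopx, ...), 1-based."""
    encoded = []
    for y, row in enumerate(matrix, start=1):
        runs = []
        x = 0
        while x < len(row):
            if row[x] == 1:
                start = x
                while x < len(row) and row[x] == 1:
                    x += 1
                runs.extend([start + 1, x])  # 1-based start, inclusive stop
            else:
                x += 1
        if runs:
            encoded.append((y, *runs))
    return encoded

def run_lengths_per_row(matrix):
    """Per row: lengths of the alternating runs of 0, 1, 0, ..."""
    encoded = []
    for row in matrix:
        lengths, current, count = [], 0, 0
        for value in row:
            if value == current:
                count += 1
            else:
                lengths.append(count)
                current, count = value, 1
        lengths.append(count)
        encoded.append(tuple(lengths))
    return encoded

occupation = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
start_stop = runs_per_row(occupation)
lengths = run_lengths_per_row(occupation)
```

Note that a row beginning with a 1 gets a leading run of length 0 in the second coding, so that the runs always alternate starting with 0.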
Quad trees are another way of coding the spatial
occupation-matrix. The image is recursively divided
into four parts until every region consists solely
of 1s or solely of 0s. They can easily be constructed from an
intermediate pyramid structure and stored as a linear
structure.
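The recursive division can be sketched as follows. This is a direct top-down construction rather than one built from a pyramid, and it assumes a square image whose side is a power of two; the nested-list tree layout (children in the order NW, NE, SW, SE) and the function names are assumptions made for this example.

```python
def build_quadtree(matrix, x=0, y=0, size=None):
    """Return 0 or 1 for a uniform block, else [NW, NE, SW, SE] subtrees."""
    if size is None:
        size = len(matrix)
    first = matrix[y][x]
    uniform = all(matrix[y + dy][x + dx] == first
                  for dy in range(size) for dx in range(size))
    if uniform:
        return first
    half = size // 2
    return [
        build_quadtree(matrix, x, y, half),                # NW
        build_quadtree(matrix, x + half, y, half),         # NE
        build_quadtree(matrix, x, y + half, half),         # SW
        build_quadtree(matrix, x + half, y + half, half),  # SE
    ]

def count_leaves(tree):
    """Number of uniform blocks in the quad tree."""
    if isinstance(tree, list):
        return sum(count_leaves(child) for child in tree)
    return 1

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
tree = build_quadtree(image)
print(tree, count_leaves(tree))
```

Here three of the four quadrants are uniform and only the lower-right quadrant has to be divided further, so 7 leaves replace the 16 matrix entries.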
Skeleton of a region
The medial axis of a region A is a set of pairs
{ x, ds(x,B) } with ds(x,B) = min { d(x,z) : z in B },
where B is the boundary of the region,
such that the union of the discs with center x
and radius ds(x,B) is equal to region A.
This skeleton is very sensitive to noise on the
boundary, which can be reduced by
smoothing the edge.
[Figure: distance transformations of an original image (4-neighbor DT and 8-neighbor DT); the medial axis is the set of local maxima]
DT’s
Many DT algorithms are possible, for different distance measures:
- 4-neighbor: the minimum number of steps required to reach a 0 via 4-neighbors
- 8-neighbor: via 8-neighbors, always smaller than or equal to the 4-neighbor distance
- approximations of the Euclidean distance (chamfer distances; Borgefors, 1986)
- Euclidean: the true Euclidean distance
There are parallel and sequential versions.
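A sequential version of the 4-neighbor and 8-neighbor DT can be sketched with two raster scans, one forward and one backward. This is a minimal sketch under stated assumptions: the image is a list of rows of 0s and 1s that contains at least one 0, pixels outside the image are ignored, and the function name and the `neighbors` parameter are chosen for this example.

```python
def distance_transform(image, neighbors=4):
    """Two-pass DT: steps from each pixel to the nearest 0 pixel."""
    rows, cols = len(image), len(image[0])
    inf = rows + cols  # larger than any distance inside the image
    dist = [[0 if image[y][x] == 0 else inf for x in range(cols)]
            for y in range(rows)]
    # Offsets (dy, dx) of the neighbors already visited in a forward scan.
    forward = [(-1, 0), (0, -1)]
    if neighbors == 8:
        forward += [(-1, -1), (-1, 1)]
    backward = [(-dy, -dx) for dy, dx in forward]

    def sweep(ys, xs, offsets):
        for y in ys:
            for x in xs:
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols:
                        dist[y][x] = min(dist[y][x], dist[ny][nx] + 1)

    sweep(range(rows), range(cols), forward)
    sweep(range(rows - 1, -1, -1), range(cols - 1, -1, -1), backward)
    return dist

image = [
    [0, 1, 1],
    [1, 1, 1],
    [1, 1, 1],
]
dt4 = distance_transform(image, neighbors=4)
dt8 = distance_transform(image, neighbors=8)
```

In this small example the 8-neighbor distances are nowhere larger than the 4-neighbor distances, in line with the remark in the list above.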
Thinning algorithms, of which there are many,
shrink a (binary) region until only a kind of
medial line is left over, which is then used for
further processing and editing. The distance
information is not stored, so the original
image cannot be reconstructed.
Shape numbers
Shape numbers of order n, derived from
a chain code of length n, can be
assigned to edges. The derivative (first difference) of the
chain code of length n is rotated cyclically such
that the smallest value is obtained. This
shape number is independent of the
position and orientation of the object.
It is also independent of the absolute size of the object;
it depends only on the ratio between the object size
and the cell size of the digitization grid.
By changing the size of this grid, shape
numbers of different orders can be obtained. The
lower the order, the coarser the digitization
and the smaller the differences between
shapes become.
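The computation of a shape number from a chain code can be sketched as follows. The sketch assumes a closed boundary coded with a 4-directional chain code (directions 0 to 3, counted counter-clockwise); the function names are chosen for this example.

```python
def first_difference(chain, directions=4):
    """Cyclic first difference (derivative) of a closed chain code."""
    return [(chain[i] - chain[i - 1]) % directions for i in range(len(chain))]

def shape_number(chain, directions=4):
    """Rotation of the first difference that forms the smallest number."""
    diff = first_difference(chain, directions)
    rotations = [diff[i:] + diff[:i] for i in range(len(diff))]
    return min(rotations)  # all rotations have equal length

# A 1 x 2 rectangle, order 6.
rectangle = [0, 0, 3, 2, 2, 1]
number = shape_number(rectangle)

# The same rectangle turned by 90 degrees, and with another start point.
turned = [(c + 1) % 4 for c in rectangle]
shifted = rectangle[2:] + rectangle[:2]
print(number, shape_number(turned), shape_number(shifted))
```

The first difference removes the dependence on orientation (in steps of 90 degrees for this code), and taking the minimal rotation removes the dependence on the starting point.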
Comparing shapes
The highest order at which two shapes still have the
same shape number is a measure of the similarity of
the shapes.
Fourier descriptors
The curve $\psi(s) = \varphi(s) - 2\pi s/P$, with $\varphi(s)$ the tangent angle along the
boundary as a function of arc length $s$ and $P$ the perimeter, is used as a basis for the shape
description by Fourier transformation. Some shape parameters are
determined from the amplitudes of the lower-order Fourier
components. These parameters give an indication of the "pointiness"
of the shape.
A Fourier description can also be determined directly from the shape,
by treating each boundary point (x,y) as a complex number x+jy.
A shape is usually described well by a
small number of low-order $X_k$ terms.
These are not invariant under rotation,
translation and scaling, but
combinations can be determined that
do have those properties.
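The direct approach with complex numbers can be sketched as follows. The ratio |X_k| / |X_1| used here is one common choice of invariant combination and is an assumption of this example, not necessarily the combination meant on the slide; it requires X_1 to be nonzero, which holds for the counter-clockwise boundary used below. The function names are also chosen for this example.

```python
import cmath

def fourier_descriptors(boundary):
    """DFT of the boundary points (x, y) treated as complex numbers x + jy."""
    z = [complex(x, y) for x, y in boundary]
    n = len(z)
    return [sum(z[m] * cmath.exp(-2j * cmath.pi * k * m / n)
                for m in range(n)) / n
            for k in range(n)]

def invariant_features(boundary, count=3):
    """|X_k| / |X_1| for k = 2 .. count + 1."""
    X = fourier_descriptors(boundary)
    return [abs(X[k]) / abs(X[1]) for k in range(2, count + 2)]

# Boundary of a square, traversed counter-clockwise.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]

# The same square scaled by 3, rotated by 0.7 rad and then translated.
transform = 3 * cmath.exp(0.7j)
moved = []
for x, y in square:
    w = transform * complex(x, y) + complex(5, -2)
    moved.append((w.real, w.imag))

features_square = invariant_features(square)
features_moved = invariant_features(moved)
```

Translation only changes X_0, while scaling and rotation multiply every other X_k by the same complex factor, so the magnitude ratios of the two boundaries agree up to rounding errors.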
Region characteristics
There are several measures of eccentricity. For example, if A is the
chord of maximum length and B the chord perpendicular
to A that is also of maximum length, then: $\varepsilon = A / B$
A measure of compactness is the ratio
perimeter$^2$ / area. This is
minimal for a circle ($4\pi$). It can easily
be calculated from the chain code. This
method is not appropriate for small
discrete objects.
Other eccentricity measures are based on moments:
$M_{ij} = \sum_R (x_0 - x)^i (y_0 - y)^j$ with $x_0 = (1/n) \sum_R x$ and $y_0 = (1/n) \sum_R y$
The orientation $\theta$ of a region (the angle between the principal axis of the region and the x-axis) and the eccentricity $\varepsilon$ are given by:
$\tan 2\theta = 2 M_{11} / (M_{20} - M_{02})$
$\varepsilon = ((M_{20} - M_{02})^2 + 4 M_{11}) / \text{area}$
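The orientation formula can be tried out on a small region. In the sketch below a region is given as a list of pixel coordinates (x, y); the use of `atan2` to pick the quadrant of the angle is an assumption of this example, as are the function names.

```python
import math

def central_moment(pixels, i, j):
    """M_ij of a region given as a list of (x, y) pixel coordinates."""
    n = len(pixels)
    x0 = sum(x for x, _ in pixels) / n
    y0 = sum(y for _, y in pixels) / n
    return sum((x0 - x) ** i * (y0 - y) ** j for x, y in pixels)

def orientation(pixels):
    """Angle (radians) between the principal axis and the x-axis."""
    m11 = central_moment(pixels, 1, 1)
    m20 = central_moment(pixels, 2, 0)
    m02 = central_moment(pixels, 0, 2)
    # tan(2 * theta) = 2 * M11 / (M20 - M02)
    return 0.5 * math.atan2(2 * m11, m20 - m02)

diagonal = [(0, 0), (1, 1), (2, 2), (3, 3)]
bar = [(0, 0), (1, 0), (2, 0)]
theta_diagonal = orientation(diagonal)
theta_bar = orientation(bar)
print(math.degrees(theta_diagonal), math.degrees(theta_bar))
```

For the diagonal line M20 equals M02, so the tangent is undefined as a plain quotient; `atan2` handles this case and gives 45 degrees.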
Moments
Moments for a gray-level image:
$\mu_{pq} = \sum_x \sum_y (x - x_0)^p (y - y_0)^q f[x,y]$
A uniqueness theorem states that if f(x,y) is
continuous and nonzero only in a bounded
region, then the sequence $\mu_{pq}$ is uniquely determined by
f(x,y) and vice versa. From the second-order and third-order
moments a set of seven invariant moments
can be calculated, which do not change under
translation, scaling and rotation of a region. In
practice it is very difficult to use these moments
for the recognition of objects.
Textures
A possible description of texture is: "an image is built up of many interwoven
elements". The idea of interwoven elements is closely related to the idea of texture
resolution, roughly the average number of pixels needed to describe each
texture element. If this number is large enough, one can try to describe the individual
elements in some detail, and especially their positions. When this number comes close to 1, it
becomes more difficult to characterize individual elements. Statistical methods are then used
to describe the distribution of the gray levels in the image.
hierarchical, gradient
Textures can be hierarchical: different levels correspond to different recording
resolutions. When we look at a brick wall closely, we see that each brick has color or
intensity variations, which we can describe using a statistical model. If we look at the
wall from a larger distance, we can recognize half or whole bricks and describe the
location and orientation of those bricks relative to each other. At an even larger
distance each individual brick is only a few pixels in size and is no longer suitable for
a geometric description; we must then switch to a more suitable statistical model.
Texture is almost always a characteristic bound
to a region. It can therefore be used to determine
properties of the region, such as its
orientation with respect to the viewing direction
or its distance to the camera: the so-called
texture gradient techniques.
Statistical pattern recognition
Statistical pattern recognition is concerned with the classification of (individual
occurrences of) patterns. It is a separate field of expertise with many possible
applications.
A basic notion in pattern recognition is the "feature vector", v = (v1,...,vn), with
which the relevant properties of a pattern are represented in a low-dimensional (n-dimensional)
Euclidean space. The feature vector is calculated from the available measurement data.
With effective features the different classes
can be divided into well-defined sub-spaces.
The vectors of instances of a certain class lie
close to each other and are well separated
from vectors in other classes.
Suitable features and a good partition of the feature space can be obtained by:
• analytical methods: when parametric models of the textures are available.
• training: use several texture instances of each class. Devise features and vary them
to minimize the distances within the classes and to maximize the distances between classes.
• learning: take several textures, calculate possible feature spaces and try to find
spatial clusters in them. Try to identify the texture classes using those clusters.
Classification methods
- The "nearest mean" or "minimum
distance" method: every texture class i has
a center point ci in the n-dimensional
feature space. It is determined by training,
for example by averaging the training
samples of each class. A new point v is assigned
to the class i for which the Euclidean distance $\| v - c_i \|_2$
is minimal.
- The "nearest neighbor" classifier: take the training sample that lies closest to the new
point and take its class as the class of the new point.
- With "condensed nearest neighbor" classification, only the
training samples that lie on the edge of each class subspace are kept.
- With the "k-nearest neighbor" (kNN) classifier we look at the k training
samples that are closest to the new point and take the most frequently occurring class among them.
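The nearest mean and k-nearest neighbor classifiers can be sketched in a few lines. The two-dimensional feature vectors and the class names "smooth" and "rough" are made-up illustration data, and the function names are chosen for this example; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def nearest_mean(training, point):
    """Assign point to the class whose mean feature vector is closest."""
    def mean(vectors):
        return [sum(component) / len(vectors) for component in zip(*vectors)]
    centers = {label: mean(vectors) for label, vectors in training.items()}
    return min(centers, key=lambda label: math.dist(point, centers[label]))

def k_nearest_neighbors(training, point, k=3):
    """Majority class among the k training samples closest to point."""
    samples = [(math.dist(point, vector), label)
               for label, vectors in training.items()
               for vector in vectors]
    nearest = sorted(samples)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical training samples: class label -> list of feature vectors.
training = {
    "smooth": [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)],
    "rough": [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],
}
print(nearest_mean(training, (2.0, 2.0)))
print(k_nearest_neighbors(training, (4.5, 5.0), k=3))
```

With `k=1` the second function reduces to the plain nearest neighbor classifier.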
Fourier features
$V_{r_1,r_2} = \iint |F(u,v)|^2 \, du \, dv$, integrated over $r_1^2 \le u^2 + v^2 < r_2^2$
$V_{\theta_1,\theta_2} = \iint |F(u,v)|^2 \, du \, dv$, integrated over $\theta_1 \le \tan^{-1}(v/u) < \theta_2$
Laws method
We can also apply a similar kind of energy measure to the spatial image itself.
The advantage is that the basis is not the Fourier basis (cosine and sine waves) but rather a
more suitable set of basic texture patterns. An example from Laws (1980):
• first flatten the gray-level histogram by transforming the gray levels; this
eliminates the influence of the lighting.
• decompose the image (as with Frei-Chen) into m basic texture
patterns of size 5*5 or 3*3. This results in m images: $f'_k = f \otimes h_k$ (convolution with pattern $h_k$)
• determine the "energy" by averaging over the surrounding 15 * 15 neighborhood
(texture is a regional characteristic):
$f''_k(x,y) = (1/225) \sum |f'_k(x',y')|$ with $|x-x'| \le 7$ and $|y-y'| \le 7$
• these $f''_k$ define an m-dimensional feature vector for each pixel (x,y):
$v(x,y) = \{ f''_1(x,y), f''_2(x,y), \ldots, f''_m(x,y) \}$
Construction kernels
An alternative, the one Laws used, is to construct 25 5*5 convolution
kernels from 5 one-dimensional kernels. Each is obtained by convolving one
horizontal 1-D kernel with one vertical 1-D kernel:
L5 = [  1   4   6   4   1 ]  (Level)
E5 = [ -1  -2   0   2   1 ]  (Edge)
S5 = [ -1   0   2   0  -1 ]  (Spot)
W5 = [ -1   2   0  -2   1 ]  (Wave)
R5 = [  1  -4   6  -4   1 ]  (Ripple)
If the direction of the texture is not of importance, the features can be averaged to a
set of 14 features that remain invariant under the rotation of the texture.
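The construction of the 5*5 kernels can be sketched as follows. Convolving a vertical 1-D kernel with a horizontal 1-D kernel gives their outer product, which is what the code computes. The naming of the 2-D kernels (vertical name followed by horizontal name) is an assumption made for this example.

```python
KERNELS_1D = {
    "L5": [1, 4, 6, 4, 1],     # Level
    "E5": [-1, -2, 0, 2, 1],   # Edge
    "S5": [-1, 0, 2, 0, -1],   # Spot
    "W5": [-1, 2, 0, -2, 1],   # Wave
    "R5": [1, -4, 6, -4, 1],   # Ripple
}

def outer(vertical, horizontal):
    """5*5 kernel: vertical 1-D kernel convolved with horizontal 1-D kernel."""
    return [[v * h for h in horizontal] for v in vertical]

kernels_2d = {
    v_name + h_name: outer(v, h)
    for v_name, v in KERNELS_1D.items()
    for h_name, h in KERNELS_1D.items()
}

def kernel_sum(kernel):
    return sum(sum(row) for row in kernel)

print(len(kernels_2d))
for row in kernels_2d["E5L5"]:
    print(row)
```

Only L5L5 has a nonzero sum (256); the other 24 kernels sum to zero and therefore do not respond to a constant gray level.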
SGLD
Spatial Gray Level Dependence (SGLD) matrices (sometimes also referred to as co-occurrence
matrices) are one of the most popular sources of texture features. The
definition of the SGLD matrix is:
$S(i,j,d,\theta)$: the number of locations (x,y) in the image f with
$f(x,y) = i$ and $f(x + d\cos\theta, y + d\sin\theta) = j$;
i and j are gray values, usually in bins: $\min I, \min I + \Delta I, \ldots, \max I$
d is the distance, smaller than the texel size (a small number of pixels)
$\theta$ is usually restricted to a small number of angles (steps of 45°)
For many textures the reversal of the direction is not relevant:
$S'(d,\theta) = \frac{1}{2} ( S(d,\theta) + S(d,\theta+\pi) )$
Some features that can be derived from the SGLD matrix are:
$E(d,\theta) = \sum_i \sum_j S(i,j,d,\theta)^2$ (Energy)
$H(d,\theta) = \sum_i \sum_j S(i,j,d,\theta) \ln S(i,j,d,\theta)$ (Entropy)
$I(d,\theta) = \sum_i \sum_j (i-j)^2 S(i,j,d,\theta)$ (Inertia, contrast)
These features have no direct relationship with the terms "rough" or "smooth" that people typically
use to describe textures.
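The definition of the SGLD matrix and two of the derived features can be sketched as follows. The sketch works on raw counts, assumes the gray values are already binned to integers 0 .. levels-1, and rounds the displacement to whole pixels; these choices and the function names are assumptions of this example. The small test image is made up for illustration.

```python
import math

def sgld(image, d, theta, levels):
    """S[i][j]: number of (x, y) with f(x, y) = i and the displaced pixel = j."""
    rows, cols = len(image), len(image[0])
    dx = round(d * math.cos(theta))  # displacement rounded to whole pixels
    dy = round(d * math.sin(theta))
    S = [[0] * levels for _ in range(levels)]
    for y in range(rows):
        for x in range(cols):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < cols and 0 <= y2 < rows:
                S[image[y][x]][image[y2][x2]] += 1
    return S

def energy(S):
    return sum(value * value for row in S for value in row)

def inertia(S):
    return sum((i - j) ** 2 * value
               for i, row in enumerate(S) for j, value in enumerate(row))

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
S0 = sgld(image, d=1, theta=0.0, levels=4)
S180 = sgld(image, d=1, theta=math.pi, levels=4)
# Direction-independent version S' = (S(theta) + S(theta + pi)) / 2.
S_sym = [[(S0[i][j] + S180[i][j]) / 2 for j in range(4)] for i in range(4)]
print(S0, energy(S0), inertia(S0))
```

Reversing the direction transposes the matrix, so the averaged matrix `S_sym` is symmetric.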