Semantic Texton Forests for Image Categorization and Segmentation

We would like to thank Amnon Drory
for this deck
Note: the required material is what is taught in class, not what does or does not appear in this deck.
Semantic Texton Forests
1. Input:
Training: Images with pixel-level ground-truth classification
(MSRC 21 database).
Testing: Images.
2. Output:
A classification of each pixel in the test images into an
object class.
Main Mathematical Tools
TextonBoost uses:
1. Conditional Random Fields (CRF)
2. Textons (convolution with filter banks)
3. K-Means
4. Joint Boost
Semantic Texton Forests use Randomized Decision Forests instead.
Decision Trees
Decision Rules (Split Functions)
1. Choose one or two
pixels near pixel of
interest
2. Calculate simple
function of their values:
- Difference
- Sum
- Absolute Difference
- Or just the value of a single pixel
3. Compare the result to a threshold
Motivation: very small
number of computer
operations. Easy to
implement on GPU.
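The decision rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `split_feature` and `goes_left`, the (row, column) offset encoding, the clamping at the image border, and the choice that "below the threshold" means the left branch are all hypothetical.

```python
import numpy as np

def split_feature(image, y, x, kind, off1, off2=(0, 0), ch1=0, ch2=0):
    """Simple function of one or two pixels near the pixel of interest (y, x)."""
    h, w = image.shape[:2]

    def pixel(off, ch):
        # Clamp to the image border (an assumption; other policies are possible).
        py = min(max(y + off[0], 0), h - 1)
        px = min(max(x + off[1], 0), w - 1)
        return float(image[py, px, ch])

    a = pixel(off1, ch1)
    if kind == "value":      # just one pixel value
        return a
    b = pixel(off2, ch2)
    if kind == "sum":
        return a + b
    if kind == "diff":
        return a - b
    if kind == "absdiff":
        return abs(a - b)
    raise ValueError(f"unknown kind: {kind}")

def goes_left(image, y, x, rule):
    """Compare the feature to the rule's threshold (left branch if below)."""
    kind, off1, off2, ch1, ch2, threshold = rule
    return split_feature(image, y, x, kind, off1, off2, ch1, ch2) < threshold
```

Each rule needs only a couple of memory reads and one arithmetic operation, which is the point the slide makes about low cost.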
Training the tree
For each node:
1. Randomly generate a few decision rules.
2. Choose the one that maximally improves the ability of
the tree to separate between classes, as measured by the entropy E(I).
Stop when the tree reaches a predefined depth, or when no
further improvement in classification can be achieved.
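The node-training step above can be sketched as follows. This is an illustration under assumptions, not the authors' code: `features` is a hypothetical precomputed matrix with one row per training pixel and one column per candidate split function, and `best_threshold_rule`, `n_candidates`, and the uniform threshold sampling are names and choices made up for this example. The improvement is measured as the drop in entropy from the parent node to the size-weighted children.

```python
import numpy as np

def entropy(labels, n_classes):
    """Shannon entropy E(I) of a set of class labels, in bits."""
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels, minlength=n_classes) / len(labels)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def best_threshold_rule(features, labels, n_classes, n_candidates=10, rng=None):
    """Try a few random (feature, threshold) rules and keep the best one."""
    rng = np.random.default_rng() if rng is None else rng
    parent = entropy(labels, n_classes)
    best = (None, None, 0.0)  # (feature index, threshold, entropy gain)
    for _ in range(n_candidates):
        j = int(rng.integers(features.shape[1]))
        t = float(rng.uniform(features[:, j].min(), features[:, j].max()))
        left = features[:, j] < t
        n_l, n_r = int(left.sum()), int((~left).sum())
        if n_l == 0 or n_r == 0:
            continue  # this rule does not split the data at all
        child = (n_l * entropy(labels[left], n_classes)
                 + n_r * entropy(labels[~left], n_classes)) / len(labels)
        gain = parent - child
        if gain > best[2]:
            best = (j, t, gain)
    return best
```

If no candidate yields a positive gain, the function returns `(None, None, 0.0)`, which corresponds to the second stopping condition on the slide.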
Decision Forests
For added strength, create several trees instead of just one.
Each tree is trained using a different subset of the training
data.
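Drawing a different training subset for each tree might look like the sketch below. The slide only says that the subsets differ; the subset size (`fraction`) and sampling without replacement are assumptions, and `subsets_for_forest` is a hypothetical helper name.

```python
import numpy as np

def subsets_for_forest(n_samples, n_trees, fraction=0.5, rng=None):
    """Return one random index subset (without replacement) per tree."""
    rng = np.random.default_rng() if rng is None else rng
    size = max(1, int(fraction * n_samples))
    return [rng.choice(n_samples, size=size, replace=False)
            for _ in range(n_trees)]
```

Each returned index array would then be used to select the training examples for one tree.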
Classifying at test time
For each pixel in the test image:
Apply the segmentation forest – marking a path in each tree (yellow).
Each leaf is associated with a histogram of classes.
Average the histograms from all trees, giving a vector of probabilities
of this pixel belonging to each class, e.g.
(0.001, 0.04, 0.05, 0.23, 0.01, …)
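The averaging step can be sketched as follows, assuming each tree has already returned the class histogram stored at the leaf the pixel reached. Normalising each histogram before averaging is an assumption, and `average_leaf_histograms` is a hypothetical name.

```python
import numpy as np

def average_leaf_histograms(leaf_histograms):
    """Average per-tree class histograms into one probability vector."""
    probs = []
    for hist in leaf_histograms:
        hist = np.asarray(hist, dtype=float)
        probs.append(hist / hist.sum())  # turn counts into probabilities
    return np.mean(probs, axis=0)
```

For example, two trees with leaf histograms `[8, 2]` and `[1, 1]` give the probability vector `[0.65, 0.35]`.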
Classifying at test time
The probability vectors derived from the Decision Forests can be used
to classify pixels to classes, by assigning to each pixel the label that is
most likely. The results are very noisy.
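Assigning each pixel its most likely label amounts to an argmax over the class axis. In this sketch `prob_map` is assumed to be an array of shape (height, width, classes) holding the averaged probability vectors, and `label_map` is a hypothetical name.

```python
import numpy as np

def label_map(prob_map):
    """Pick the most probable class index for every pixel."""
    return np.argmax(prob_map, axis=-1)
```

Every pixel is labelled independently of its neighbours here, which is consistent with the noisy results mentioned above.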
Paths on trees represent Texture
[Figure: a decision forest with numbered nodes (1, 2, 3, …, 84, 85, …), with the paths traversed for one pixel marked.]
The texture of the area around a pixel can be represented by
a vector made up of all the nodes in the decision forest
that lie on the paths traversed when applying the forest to
this pixel. In the above example, this would be the vector:
( 1, 3, 6, 10, 17, … , 84, 85, 87, 91 )
This vector is called a Semantic Texton.
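Collecting the visited nodes can be sketched as follows. The tree representation is hypothetical: each node is a dict with an `"id"`, internal nodes also carry `"feature"`, `"threshold"`, `"left"` and `"right"`, node ids are assumed unique across the forest, and the pixel is represented by a precomputed feature vector.

```python
def path_nodes(tree, features):
    """Ids of all nodes visited from the root down to a leaf."""
    visited = []
    node = tree
    while True:
        visited.append(node["id"])
        if "left" not in node:  # reached a leaf
            return visited
        go_left = features[node["feature"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]

def semantic_texton(forest, features):
    """Sorted ids of every node on the paths through all trees."""
    ids = []
    for tree in forest:
        ids.extend(path_nodes(tree, features))
    return sorted(ids)
```

Because internal nodes are included as well as leaves, two pixels that end in different leaves can still share part of their semantic texton.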
Example of Semantic Textons
A visualization of leaf nodes from
one tree (distance d = 21 pixels). Each patch is
the average of all patches in the training images
assigned to a particular leaf node. Features
evident include color, horizontal, vertical and
diagonal edges, blobs, ridges and corners.
Second Randomized Decision Forest
• The split functions at the nodes are based on
Texture-Layout filters.
• Two types of Texture-Layout filters are used:
1. Count the occurrences of a certain node N in the
semantic textons of the pixels in a rectangle.
2. Sum the probabilities of belonging to a certain class K
over all pixels in a rectangle. This is semantic context.
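The two filter types can be sketched as follows. The data layout is an assumption: `texton_map[y, x, n]` is True when node n is in the semantic texton of pixel (y, x), `prob_map[y, x, k]` is the first forest's probability of class k at that pixel, and the rectangle is given in absolute image coordinates with half-open bounds. The function names are hypothetical.

```python
import numpy as np

def count_node_in_rect(texton_map, node, top, left, bottom, right):
    """Filter type 1: number of pixels in the rectangle whose texton contains `node`."""
    return int(texton_map[top:bottom, left:right, node].sum())

def sum_class_prob_in_rect(prob_map, k, top, left, bottom, right):
    """Filter type 2: sum of P(class k) over the pixels in the rectangle."""
    return float(prob_map[top:bottom, left:right, k].sum())
```

A split function in the second forest would then compare one of these values to a threshold. When many rectangles are evaluated per image, an integral image per node or class would avoid repeating the sums.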
STF - Results
Overall Accuracy: 72%
Though less aesthetic, these results are quantitatively
almost as good as those of TextonBoost.
STF - Summary
• To speed up calculation, this algorithm uses
Randomized Decision Forests instead of the other
mathematical tools used in TextonBoost.
• One RDF is used to calculate texture.
• A second RDF is used to classify pixels into object
classes.
• The results are almost as good (quantitatively) as
those of TextonBoost.
• The algorithm is much quicker than TextonBoost.
Real-Time Human Pose Recognition in
Parts from Single Depth Images
J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp,
M. Finocchio, R. Moore, A. Kipman, A. Blake
Overview
Classification
• Classify each pixel to one of 31 body part classes:
– Left Upper / Right Upper / Left Lower / Right Lower head
– neck
– Left / Right shoulder
– Left Upper / Right Upper / Left Lower / Right Lower arm
– Left / Right elbow
– Left / Right wrist
– Left / Right hand
– Left Upper / Right Upper / Left Lower / Right Lower torso
– Left Upper / Right Upper / Left Lower / Right Lower leg
– Left / Right knee
– Left / Right ankle
– Left / Right foot
Randomized Decision Forests
• The Randomized Decision Forests are very deep
(depth = 20).
• => A very strong ability to classify the training set
correctly.
• => A risk of overfitting. This risk is averted by
using a very (very) large training set, containing
examples of the many poses we wish to recognize.
Most of this training set is artificially generated.
Decision Rules
Classification → body part locations
• Separately for each of the 31 body part classes:
– Create a probability map
– Project to 3D space
– Find modes (using mean-shift)
– The modes (with a little post-processing) are the
suggested body part locations (the output of this
algorithm).
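The mode-finding step can be sketched with a weighted Gaussian-kernel mean-shift. This is an illustration under assumptions, not the paper's exact procedure: the pixels are assumed to be already projected to 3D `points`, `weights` stands for the per-pixel probability of one body part class, and `mean_shift_mode`, `bandwidth`, `n_iter` and `tol` are hypothetical names and values.

```python
import numpy as np

def mean_shift_mode(points, weights, start, bandwidth=0.1, n_iter=50, tol=1e-6):
    """Follow mean-shift steps from `start` until a nearby mode is reached."""
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    x = np.asarray(start, dtype=float)
    for _ in range(n_iter):
        d2 = ((points - x) ** 2).sum(axis=1)
        k = weights * np.exp(-d2 / (2.0 * bandwidth ** 2))
        total = k.sum()
        if total == 0.0:
            return x  # no point is close enough to pull the estimate
        new_x = (k[:, None] * points).sum(axis=0) / total
        if np.linalg.norm(new_x - x) < tol:
            return new_x
        x = new_x
    return x
```

Starting the iteration from several high-probability pixels and merging the results that coincide would give the set of candidate locations for that body part.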