Transcript: Human-Machine Interface with KINECT
Random Forest and Graph Cut based segmentation of human limbs
Nadezhda Zlateva, IICT-BAS
7 Sept. 2011
Outline
• Human Pose Recognition
• Case Study
• Randomized Decision Tree
• Random Forest
• Experimental results with RF
• Graph Cut
• Experimental results with GC
• Application to hand classification
• Conclusion
• References
Human Pose Recognition
Recognition via
• conventional intensity cameras
• depth cameras
Frame-to-frame point tracking – slow to re-initialize
Pose Recognition in parts:
• Body parts segmentation - Per pixel classification
• 3D skeletal joints estimation
[1] Shotton et al., 11
Case Study
Upper limbs segmentation for hand gesture recognition
Application:
• Sign language interpretation
• Medical environments
  - Robotic medical assistants [Purdue University]
  - CT & MRI review in sterile environments [Sunnybrook Hospital, Toronto]
Binary Decision Tree: Basics
[Figure: binary decision tree with numbered nodes 1–17; split nodes compare a feature value v against a threshold (≥ / <), leaf nodes assign a category c]
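A minimal sketch of how such a tree could be evaluated is given here. The `Node` class, its field names, and the convention that "<" goes left and "≥" goes right are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    # Split nodes carry a feature index and a threshold; leaves carry a category.
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # taken when v[feature] < threshold (assumed)
    right: Optional["Node"] = None   # taken when v[feature] >= threshold (assumed)
    category: Optional[str] = None   # set only on leaf nodes


def classify(node: Node, v: Sequence[float]) -> str:
    """Walk from the root to a leaf and return the leaf's category c."""
    while node.category is None:
        node = node.left if v[node.feature] < node.threshold else node.right
    return node.category


# Hand-built toy tree: two split nodes, three leaf nodes.
tree = Node(feature=0, threshold=0.5,
            left=Node(category="a"),
            right=Node(feature=1, threshold=2.0,
                       left=Node(category="b"),
                       right=Node(category="c")))
```

For instance, `classify(tree, [0.9, 1.0])` passes the first test on the "≥" side and the second on the "<" side, ending in the leaf with category "b".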
DT over depth images: Training
feature vector – pixel x = [x, y, z]^T of depth image I
split function – depth comparison features f_θ as function of x
d_I(x) – depth at pixel x
[Figure: two example features θ_1 and θ_2] [1] Shotton, 11
Combination of weak but computationally efficient features
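The depth comparison feature can be sketched as follows, using the form published in [1]: f_θ(I, x) = d_I(x + u/d_I(x)) − d_I(x + v/d_I(x)) with θ = (u, v). The helper names, the nearest-pixel rounding, and the `BACKGROUND` constant for probes outside the image are assumptions made for this illustration; depths are assumed to be strictly positive.

```python
from typing import List, Tuple

BACKGROUND = 1e6  # assumed large depth returned for probes outside the image

Depth = List[List[float]]      # depth image indexed as depth[row][col]
Offset = Tuple[float, float]   # (row offset, col offset), scaled by 1/depth below


def probe(depth: Depth, row: float, col: float) -> float:
    """Depth at the nearest pixel, or BACKGROUND when outside the image."""
    r, c = int(round(row)), int(round(col))
    if 0 <= r < len(depth) and 0 <= c < len(depth[0]):
        return depth[r][c]
    return BACKGROUND


def depth_feature(depth: Depth, x: Tuple[int, int], u: Offset, v: Offset) -> float:
    """f_theta(I, x) = d_I(x + u / d_I(x)) - d_I(x + v / d_I(x))."""
    r, c = x
    d = depth[r][c]  # dividing the offsets by d makes the feature depth invariant
    a = probe(depth, r + u[0] / d, c + u[1] / d)
    b = probe(depth, r + v[0] / d, c + v[1] / d)
    return a - b


# Toy 3x3 depth image: a near pixel (depth 1.0) surrounded by farther ones.
toy = [[2.0, 2.0, 2.0],
       [2.0, 1.0, 2.0],
       [2.0, 2.0, 2.0]]
```

Each feature costs only two image reads and one subtraction, which is what makes it weak on its own but cheap enough to evaluate at every pixel.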
Randomized DT: Training
1. Random selection of a set of split candidates ϕ = (θ, τ), where τ is drawn from a set of split thresholds for each θ, for tree t.
2. Definition of the set of training pixels Q = {(I, x)} over all training images for tree t. Q is the set of pixels at the root node.
3. Find the best split candidate ϕ* at node n – the one with the largest information gain from splitting Q into Q_left & Q_right
Randomized DT: Training
4. Recurse for Q_left(ϕ*) & Q_right(ϕ*) – until a stop condition is reached:
• Maximum depth
• Minimum information gain
• Minimum number of node pixels
5. Estimation of P_t(c | I, x) over body part labels c at each leaf node – use normalized histogram
Note:
• dependent on choice of parameters
• prone to over-fitting
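The split selection of step 3 and the leaf histogram of step 5 can be sketched as below. This assumes Shannon entropy as the impurity measure (as in [1]) and that feature responses have already been computed per pixel; the function names and the toy data layout are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of the label histogram."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(labels: Sequence[str], left: Sequence[str],
                     right: Sequence[str]) -> float:
    """H(Q) minus the size-weighted entropies of Q_left and Q_right."""
    n = len(labels)
    return entropy(labels) - sum(len(s) / n * entropy(s) for s in (left, right) if s)


def best_split(samples: List[Tuple[List[float], str]],
               candidates: List[Tuple[int, float]]
               ) -> Tuple[float, Optional[Tuple[int, float]]]:
    """Return (gain, (theta, tau)) for the candidate with the largest gain.

    Each sample is (feature responses, label); theta indexes into the responses.
    """
    labels = [c for _, c in samples]
    best: Tuple[float, Optional[Tuple[int, float]]] = (-1.0, None)
    for theta, tau in candidates:
        left = [c for f, c in samples if f[theta] < tau]
        right = [c for f, c in samples if f[theta] >= tau]
        gain = information_gain(labels, left, right)
        if gain > best[0]:
            best = (gain, (theta, tau))
    return best


def leaf_distribution(labels: Sequence[str]) -> Dict[str, float]:
    """Normalized label histogram stored at a leaf, i.e. P_t(c | I, x)."""
    n = len(labels)
    return {c: k / n for c, k in Counter(labels).items()}
```

Training would call `best_split` at each node and recurse on the two subsets until one of the stop conditions holds, then store `leaf_distribution` of the remaining pixels.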
Random Forest
Forest - ensemble of T decision trees [3] Breiman 01, [1] Shotton et al. 11
• Divide training (depth) images into T subsets – unique subset for each tree t
• Train each tree
Random Forest: Classification
[Figure: pixel x is passed down each tree t_1 … t_T; every tree returns a label c]
• classification is [equation not preserved in the transcript]
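The combination rule itself did not survive in the transcript. In [1] the forest output is the average of the per-tree leaf distributions, P(c | I, x) = (1/T) Σ_t P_t(c | I, x); the sketch below assumes that rule and, as a further assumption, takes the most probable label as the classification. Function names are illustrative.

```python
from typing import Dict, List


def forest_posterior(tree_posteriors: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the per-tree distributions P_t(c | I, x) over the T trees."""
    T = len(tree_posteriors)
    labels = set().union(*tree_posteriors)  # all labels seen by any tree
    return {c: sum(p.get(c, 0.0) for p in tree_posteriors) / T for c in labels}


def classify_pixel(tree_posteriors: List[Dict[str, float]]) -> str:
    """Label with the highest averaged probability (assumed decision rule)."""
    post = forest_posterior(tree_posteriors)
    return max(post, key=post.get)


# Leaf distributions reached by one pixel in T = 3 trees (made-up numbers).
votes = [{"hand": 0.7, "forearm": 0.3},
         {"hand": 0.4, "forearm": 0.6},
         {"hand": 0.6, "forearm": 0.4}]
```

Averaging lets a tree that is uncertain at its leaf contribute less than a confident one, which a plain majority vote over hard labels would not capture.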
Random Forest: Toy demo
[2] Shotton et al. 09
Random Forest: Summary
• Improves generalization to new data
• Ensemble of trees gives robustness
• Good for multi-class problems
• Resistant to over-fitting
• Fast training on large data sets
• Efficient classifier
RF: Experiments and results
- Ground truth: 500 (upper limb) labeled depth images (640x480)
- Number of trees: T = 3
- Tree depth: 15
- Split candidates: |θ| = 100, |τ| = 20 for each θ
- Random pixels per image: 1000
- 5-fold cross validation => 100 test images, 130 training images per tree
Table 1. Average per class accuracy with RF classification
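The table values are not preserved in the transcript, but the metric can be illustrated. The sketch below assumes "average per class accuracy" means the mean, over body part classes, of the fraction of that class's pixels that were labeled correctly; the function name and toy labels are hypothetical.

```python
from collections import defaultdict
from typing import Sequence


def average_per_class_accuracy(truth: Sequence[str], predicted: Sequence[str]) -> float:
    """Mean over classes of (correct pixels of class c) / (pixels of class c)."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(truth, predicted):
        total[t] += 1
        correct[t] += (t == p)  # True counts as 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy case: the small "hand" class is half wrong, the large "arm" class is perfect.
truth = ["hand"] * 2 + ["arm"] * 8
predicted = ["hand", "arm"] + ["arm"] * 8
```

In this toy case the plain pixel accuracy is 0.9, while the per-class average is 0.75, so small classes such as hands weigh as much as large ones.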
RF: Experiments and results
[Figure: ground truth & training images; per pixel classification]
Segmentation by Graph Cut: Motivation
RF classification results:
• Fuzzy body part boundaries
• Left/Right uncertainty
Subsequent hand sign recognition – requires cleaner hand region segmentation
Graph Cut framework:
• Energy minimization framework
• Binary and multi-label image segmentation
• Combines local and contextual information
Pixel labeling problem
Given
• Pixels
• Assignment cost – U (unary potential)
• Separation cost – B (boundary potential) – over pairs of neighboring pixels
Find
• Labels that minimize the total energy [4] Boykov et al. 01
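The energy being minimized can be sketched as follows, assuming the usual form from [4]: a weighted sum of assignment costs over pixels plus separation costs over neighboring pixel pairs that receive different labels. The 4-connected grid, the weight `lam`, and the data layout are assumptions for this illustration.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[int, int]


def energy(labels: List[List[int]],
           unary: List[List[List[float]]],
           boundary: Callable[[Pixel, Pixel], float],
           lam: float = 1.0) -> float:
    """E(L) = lam * sum_p U_p(L_p) + sum over neighbors (p, q) of B_pq * [L_p != L_q]."""
    rows, cols = len(labels), len(labels[0])
    e = 0.0
    for r in range(rows):
        for c in range(cols):
            e += lam * unary[r][c][labels[r][c]]      # assignment cost
            for rr, cc in ((r + 1, c), (r, c + 1)):   # visit each neighbor pair once
                if rr < rows and cc < cols and labels[r][c] != labels[rr][cc]:
                    e += boundary((r, c), (rr, cc))   # separation cost
    return e


# 1x3 image, two labels: unary[r][c][label] is the cost of giving that label.
toy_unary = [[[0.0, 2.0], [1.0, 1.0], [2.0, 0.0]]]


def constant_boundary(p: Pixel, q: Pixel) -> float:
    return 0.5
```

With these toy costs, the labeling `[[0, 0, 1]]` has energy 1.5 and `[[0, 1, 0]]` has energy 4.0, so the smoother labeling that also agrees with the unary costs is preferred.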
Graph Cut: Binary case
• Image as directed graph G(V, E)
• t-link – Assignment cost
• n-link – Separation cost
• Energy minimization problem = min s-t cut on G = max-flow
Theorem:
In a graph G, the maximum source-to-sink flow possible is equal to the capacity of the minimum cut in G.
[L. R. Foulds, Graph Theory Applications, 1992 Springer-Verlag New York Inc., 247-248]
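The theorem can be checked on a tiny graph with two pixel nodes p and q. The sketch below uses a plain Edmonds-Karp max-flow and compares it with a brute-force search over all s-t cuts; the capacities are made up for illustration, and this is not the specialized max-flow algorithm used in graph-cut libraries.

```python
from collections import deque
from itertools import combinations


def max_flow(cap, s, t):
    """Edmonds-Karp: augment along shortest residual paths until none is left."""
    # Residual capacities, with zero-capacity reverse edges added.
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:          # breadth-first search
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow                            # no augmenting path remains
        path, v = [], t
        while parent[v] is not None:               # recover the path t -> s
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)     # bottleneck capacity
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push


def min_cut_brute_force(cap, s, t):
    """Smallest capacity of edges leaving any node set S with s in S and t not in S."""
    nodes = set(cap) | {v for nbrs in cap.values() for v in nbrs}
    inner = sorted(nodes - {s, t})
    best = float("inf")
    for k in range(len(inner) + 1):
        for chosen in combinations(inner, k):
            S = {s, *chosen}
            cost = sum(c for u in S for v, c in cap.get(u, {}).items() if v not in S)
            best = min(best, cost)
    return best


# t-links connect s and t to the pixels; the n-link connects p and q both ways.
graph = {"s": {"p": 4, "q": 1},
         "p": {"q": 2, "t": 1},
         "q": {"p": 2, "t": 5}}
```

Both `max_flow(graph, "s", "t")` and `min_cut_brute_force(graph, "s", "t")` evaluate to 4 on this graph; the minimum cut keeps p on the source side and q on the sink side.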
Graph Cut: Multi-label case
Energy = cut cost: $\|C\| = \sum_{e_{ij} \in C} |w_{ij}|$
Suboptimal approximation of the minimum energy
Energy function
Graph Cut: Potentials
[Figure: energy function with annotated terms – importance weight; prob. by RF; unary potential; boundary potential; prior constraints] [5] Boykov et al. 06
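The exact potentials are not preserved in the transcript. As a sketch under stated assumptions, the unary potential below is the negative log of the RF probability for a label, and the boundary potential is a Gaussian of the depth difference between neighbors, both common choices in the graph-cut literature [4], [5]. The function names, `EPS`, and `sigma` are hypothetical.

```python
import math
from typing import Dict

EPS = 1e-9  # assumed floor that avoids log(0) for labels the forest never predicted


def unary_potential(rf_prob: Dict[str, float], label: str) -> float:
    """Assignment cost U: negative log of the RF probability of this label (assumed form)."""
    return -math.log(max(rf_prob.get(label, 0.0), EPS))


def boundary_potential(depth_p: float, depth_q: float, sigma: float = 0.05) -> float:
    """Separation cost B: close to 1 for similar depths, close to 0 across a depth edge."""
    return math.exp(-((depth_p - depth_q) ** 2) / (2.0 * sigma ** 2))
```

Under these assumptions a confident RF prediction makes the matching label cheap, and label changes are cheapest where the depth map itself has a discontinuity.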
Graph Cut: Results
Spatial Coherence:
[Figure: RF classifications]
Graph Cut: Results
[Figure: GC segmentation]
RF & GC for hands
[Figure panels: Ground truth, Random Forest, Graph Cut]
• Random Forest: 63 frames, 500 random pixels, |Omax| = 45 – 58.5% per class accuracy
• Graph Cut: 70.9% per class accuracy
Conclusion
• RF – strong classifier
• RF + GC over depth maps – good object segmentation
Future Work
• Increase available data
• Improve pixel label inference
• Estimate upper limb/hand joints
• Recognize finger configuration
References
[1] Shotton, J., A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, A. Blake. Real-time Human Pose Recognition in Parts from Single Depth Images. CVPR, 2011.
[2] Shotton, J. Boosting and Random Forest for Visual Recognition. ICCV Tutorial, 2009. http://www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial
[3] Breiman, L. Random forests. Mach. Learning, 45(1):5–32, 2001. http://www.stat.berkeley.edu/~breiman/RandomForests
[4] Boykov, Y., and M. P. Jolly. Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In Proc. IEEE Int. Conf. on Computer Vision, 2001.
[5] Boykov, Y., and G. Funka-Lea. Graph cuts and efficient N-D image segmentation. IJCV, 70:109–131, 2006.