Human-Machine Interface with KINECT


Random Forest and Graph Cut based segmentation of human limbs

Nadezhda Zlateva, IICT-BAS

7 Sept. 2011


Outline

• Human Pose Recognition
• Case Study
• Randomized Decision Tree
• Random Forest
• Experimental results with RF
• Graph Cut
• Experimental results with GC
• Application to hand classification
• Conclusion
• References


Human Pose Recognition

Recognition via:
• conventional intensity cameras
• depth cameras

Frame-to-frame point tracking: slow to re-initialize


Pose Recognition in parts:
• Body parts segmentation: per pixel classification
• 3D skeletal joints estimation

[1] Shotton et al., 11

Case Study

Upper limbs segmentation for hand gesture recognition



• Sign language interpretation
• Medical environments


Robots medical assistants [Purdue University]


CT & MRI review in sterile environments [Sunnybrook Hospital, Toronto]

Binary Decision Tree: Basics

[Figure: binary decision tree with numbered nodes; split nodes apply a ≥ / < test, leaf nodes hold a category c]

DT over depth images: Training

Feature vector: pixel x = [x, y, z]^T of depth image I

Split function: depth comparison features [1] Shotton, 11
• θ as a function of x
• d_I(x): depth at pixel x

[Figure: two example features, θ1 and θ2]

Combination of weak but computationally efficient features
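The depth comparison features above can be sketched in code. In Shotton et al. [1] the feature is f_θ(I, x) = d_I(x + u/d_I(x)) − d_I(x + v/d_I(x)) with θ = (u, v), where the offsets are divided by the depth at x. The sketch below follows that definition; the function names, the (row, col) array layout, the background constant, and the "feature < τ goes left" convention are assumptions made for illustration.

```python
import numpy as np

BACKGROUND_DEPTH = 1e6  # assumed large constant for probes that leave the image

def probe_depth(depth, row, col):
    """Depth at (row, col), or a large constant if the probe is off the image."""
    h, w = depth.shape
    if 0 <= row < h and 0 <= col < w:
        return float(depth[row, col])
    return BACKGROUND_DEPTH

def depth_feature(depth, pixel, u, v):
    """Depth comparison feature f_theta(I, x) with theta = (u, v), following [1].

    pixel, u and v are (row, col) pairs. The offsets are divided by the depth
    at the pixel, so the probes cover the same world-space distance whether
    the body is near or far from the camera.
    """
    r, c = pixel
    d = float(depth[r, c])
    r1, c1 = int(round(r + u[0] / d)), int(round(c + u[1] / d))
    r2, c2 = int(round(r + v[0] / d)), int(round(c + v[1] / d))
    return probe_depth(depth, r1, c1) - probe_depth(depth, r2, c2)

def goes_left(depth, pixel, theta, tau):
    """Split test for candidate (theta, tau): True if the feature is below tau."""
    u, v = theta
    return depth_feature(depth, pixel, u, v) < tau
```

Each feature reads at most three depth values, which is what makes a single feature weak but cheap to evaluate.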

Randomized DT: Training

1. Random selection of a set of split candidates ϕ = (θ, τ), where τ is the set of split thresholds for each θ, for tree t.

2. Definition of the set of training pixels Q = {(I, x)} over all training images for tree t. Q is the set of pixels at the root node.


3. Find the best split candidate ϕ* at node n: the one with the largest information gain from splitting Q into Q_left and Q_right.

Randomized DT: Training

4. Recurse for Q_left(ϕ*) and Q_right(ϕ*) until a stop condition is reached:
   • Maximum depth
   • Minimum information gain
   • Minimum number of node pixels

5. Estimation of P_t(c | I, x) at each leaf node over body part labels, using a normalized histogram.

• dependent on choice of parameters
• prone to over-fitting
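Steps 3 and 5 of the training procedure can be sketched as follows. The gain is the Shannon entropy of the labels in Q minus the size-weighted entropies of the two child sets, as in [1]. The sketch assumes the feature responses have already been computed for every pixel and every candidate θ; all function and variable names are illustrative.

```python
import numpy as np

def entropy(labels, n_classes):
    """Shannon entropy (in bits) of the label histogram of a pixel set."""
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels, minlength=n_classes) / len(labels)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def information_gain(labels, go_left, n_classes):
    """Entropy of Q minus the size-weighted entropies of Q_left and Q_right."""
    left, right = labels[go_left], labels[~go_left]
    n = len(labels)
    return (entropy(labels, n_classes)
            - len(left) / n * entropy(left, n_classes)
            - len(right) / n * entropy(right, n_classes))

def best_split(features, labels, thresholds, n_classes):
    """Step 3: return (candidate index, tau, gain) with the largest gain.

    features has shape (n_pixels, n_candidates): one precomputed feature
    response per pixel and per candidate theta.
    """
    best = (None, None, -np.inf)
    for k in range(features.shape[1]):
        for tau in thresholds:
            gain = information_gain(labels, features[:, k] < tau, n_classes)
            if gain > best[2]:
                best = (k, tau, gain)
    return best

def leaf_posterior(labels, n_classes):
    """Step 5: normalized histogram of the labels that reached a leaf."""
    hist = np.bincount(labels, minlength=n_classes).astype(float)
    return hist / hist.sum()
```

The stop conditions of step 4 would wrap `best_split` in a recursion that ends when the depth limit is hit, the best gain is too small, or too few pixels remain at the node.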

Random Forest

Forest: ensemble of T decision trees
• Divide training (depth) images into T subsets: unique subset for each tree t
• Train each tree


[3] Breiman 01, [1] Shotton et al. 11

Random Forest: Classification

[Figure: pixel x is passed to each tree t_1, …, t_T; the classification is a label c]
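A minimal sketch of forest classification, assuming the combination rule of [1]: the per-tree leaf distributions are averaged, P(c | I, x) = (1/T) Σ_t P_t(c | I, x), and the pixel receives the most probable label. The leaf distributions below are made-up numbers for illustration only.

```python
import numpy as np

def forest_posterior(tree_posteriors):
    """Average the per-tree leaf distributions P_t(c | I, x) over the T trees."""
    return np.mean(np.asarray(tree_posteriors, dtype=float), axis=0)

def classify(tree_posteriors):
    """Return the label c with the highest averaged probability."""
    return int(np.argmax(forest_posterior(tree_posteriors)))

# Hypothetical leaf distributions reached by one pixel in T = 3 trees, 3 labels.
leaves = [[0.6, 0.3, 0.1],
          [0.2, 0.5, 0.3],
          [0.5, 0.4, 0.1]]
posterior = forest_posterior(leaves)  # still a distribution: sums to 1
label = classify(leaves)
```

The second tree on its own would vote for label 1, but the average favors label 0. This is the sense in which the ensemble smooths out individual trees.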

Random Forest: Toy demo

[2] Shotton et al. 09

Random Forest: Summary

• Improves generalization to new data
• Ensemble of trees gives robustness
• Good for multi-class problems
• Resistant to over-fitting
• Fast training on large data sets
• Efficient classifier


RF: Experiments and results

- Ground truth: 500 labeled (upper limb) depth images (640x480)
- Number of trees: T = 3
- Tree depth: 15
- Split candidates: |θ|, |τ| = 20 for each θ
- Random pixels per image: 1000
- 5-fold cross validation => 100 test images, 130 training images per tree

Table 1. Average per class accuracy with RF classification

RF: Experiments and results

[Figure panels: Ground truth & training; Per pixel classification]


Segmentation by Graph Cut: Motivation

RF classification results:
• Fuzzy body part boundaries
• Left/Right uncertainty

Subsequent hand sign recognition requires cleaner hand region segmentation

Graph Cut framework:
• Energy minimization framework
• Binary and multi-label image segmentation
• Combines local and contextual information

Pixel labeling problem

Given:
• Pixels
• Assignment cost: U (unary potential)
• Separation cost: B (boundary potential), over pairs of neighboring pixels

Find: labels that minimize the total assignment and separation cost [4] Boykov et al. 01
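The quantity being minimized can be sketched as a sum of assignment costs over all pixels plus a sum of separation costs over neighboring pairs. The sketch assumes a 4-connected grid and a Potts-style separation cost (a constant penalty `lam` whenever two neighbors receive different labels); neither choice is stated on the slide.

```python
import numpy as np

def labeling_energy(unary, labels, lam=1.0):
    """Energy of a labeling on a 4-connected pixel grid.

    unary[r, c, l] is the assignment cost U of giving label l to pixel (r, c).
    The separation cost B is assumed to be a Potts penalty: lam for every pair
    of neighboring pixels with different labels, 0 otherwise.
    """
    rows, cols = np.indices(labels.shape)
    assignment = unary[rows, cols, labels].sum()
    vertical = (labels[1:, :] != labels[:-1, :]).sum()
    horizontal = (labels[:, 1:] != labels[:, :-1]).sum()
    return float(assignment + lam * (vertical + horizontal))
```

For example, on a 2x2 image where the top row prefers label 0 and the bottom row prefers label 1, the labeling that follows the unary costs pays only for the two label changes between the rows.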


Graph Cut: Binary case

• Image as directed graph G(V, E)
• t-link: assignment cost
• n-link: separation cost

Energy minimization problem = min s-t cut on G = max-flow


In a graph G, the maximum source-to-sink flow possible is equal to the capacity of the minimum cut in G.

[L. R. Foulds, Graph Theory Applications, 1992 Springer-Verlag New York Inc., 247-248]
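The max-flow/min-cut equivalence can be checked on a toy graph. The sketch below uses a plain Edmonds-Karp implementation (breadth-first augmenting paths), chosen for brevity; it is not the specialized solver normally used for image-sized graphs. The graph has two pixels `p` and `q`, links from the source `s` and to the sink `t` that carry assignment costs, and links between `p` and `q` that carry the separation cost. All capacities are made-up values.

```python
from collections import deque

def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp max-flow; returns (flow value, nodes on the source side)."""
    # Residual capacities, with missing reverse edges initialised to 0.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    def reachable():
        """BFS over edges with spare capacity; returns the parent map."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable()
        if sink not in parent:
            # No augmenting path left: the reached nodes form the source side.
            return flow, set(parent)
        # Bottleneck capacity along the path found by the BFS.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck along the path and update reverse edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

# Toy graph: terminal links carry assignment costs, p<->q the separation cost.
graph = {
    "s": {"p": 4, "q": 1},
    "p": {"t": 1, "q": 2},
    "q": {"t": 5, "p": 2},
}
flow, source_side = max_flow_min_cut(graph, "s", "t")
```

Here the maximum flow is 4, and the cut that separates {s, p} from {q, t} severs edges s→q (1), p→t (1) and p→q (2), which also total 4, as the theorem states.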


Graph Cut: Multi-label case

Energy = cut cost

[Figure: graph with edge weights w_ij]

Suboptimal approximation of the minimum energy

Energy function

Graph Cut: Potentials

Unary potential (annotations: importance weight, prob. by RF)

Boundary potential (annotation: prior constraints)

[5] Boykov et al. 06
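The slide's formulas did not survive extraction, so the following is only one plausible construction, not the talk's actual definition. It assumes the unary potential is a weighted negative log of the RF class probability, and that the boundary potential follows the contrast-sensitive form used in graph cut segmentation [4, 5], applied here to depth differences. The names `lam`, `eps` and `sigma` are illustrative.

```python
import numpy as np

def unary_from_rf(prob, lam=1.0, eps=1e-9):
    """Assumed assignment cost U = -lam * log P(c | I, x).

    prob has shape (H, W, n_labels) and holds the RF class probabilities.
    lam plays the role of an importance weight; eps avoids log(0).
    """
    return -lam * np.log(np.clip(prob, eps, 1.0))

def boundary_weights(depth, sigma=1.0):
    """Assumed separation cost for horizontally and vertically adjacent pixels.

    Uses exp(-(d_p - d_q)^2 / (2 sigma^2)): cutting between pixels of similar
    depth is expensive, cutting across a depth discontinuity is cheap.
    """
    dh = depth[:, 1:] - depth[:, :-1]
    dv = depth[1:, :] - depth[:-1, :]
    w_h = np.exp(-dh ** 2 / (2 * sigma ** 2))
    w_v = np.exp(-dv ** 2 / (2 * sigma ** 2))
    return w_h, w_v
```

With this choice a confident RF prediction makes the competing labels costly, while an uncertain pixel (probabilities near uniform) is decided mostly by its neighbors.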

Graph Cut: Results

Spatial coherence:

[Figure panel: RF classifications]

Graph Cut: Results

[Figure panels: GC segmentation; Ground truth; Random Forest]

RF & GC for hands

• 63 frames
• 500 random pixels
• |Omax| = 45
• 58.5% per class accuracy

Graph Cut: 70.9% per class accuracy
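The per class accuracy figures can be reproduced from labeled pixels with a computation along these lines. The sketch assumes the metric is the mean, over classes, of the fraction of each class's ground-truth pixels that receive the correct label, as in [1]; the exact definition used in the talk is not given.

```python
import numpy as np

def per_class_accuracy(true_labels, predicted_labels, n_classes):
    """Mean over classes of the fraction of each class's pixels labeled correctly.

    Classes with no ground-truth pixels are skipped.
    """
    true_labels = np.asarray(true_labels).ravel()
    predicted_labels = np.asarray(predicted_labels).ravel()
    accuracies = []
    for c in range(n_classes):
        mask = true_labels == c
        if mask.any():
            accuracies.append(float((predicted_labels[mask] == c).mean()))
    return float(np.mean(accuracies))
```

Unlike overall pixel accuracy, this average gives a small class such as a hand the same weight as a large one such as an upper arm.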


Conclusion

• RF: strong classifier
• RF + GC over depth maps: good object segmentation

Future Work

• Increase available data
• Improve pixel label inference
• Estimate upper limb/hand joints
• Recognize finger configuration



References

[1] Shotton, J., A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, A. Blake. Real-time Human Pose Recognition in Parts from a Single Depth Image. CVPR, 2011.

[2] Shotton, J. Boosting and Random Forest for Visual Recognition. ICCV Tutorial, 2009. http://www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial

[3] Breiman, L. Random forests. Mach. Learning, 45(1):5–32, 2001. http://www.stat.berkeley.edu/~breiman/RandomForests

[4] Boykov, Y., and M. P. Jolly. Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In Proc. IEEE Int. Conf. on Computer Vision, 2001.

[5] Boykov, Y., and G. Funka-Lea. Graph cuts and efficient N-D image segmentation. IJCV, 70:109–131, 2006.