Human-Machine Interface with KINECT



Random Forest and Graph Cut based segmentation of human limbs

Nadezhda Zlateva, IICT-BAS

7 Sept. 2011

Outline

• Human Pose Recognition
• Case Study
• Randomized Decision Tree
• Random Forest
• Experimental results with RF
• Graph Cut
• Experimental results with GC
• Application to hand classification
• Conclusion
• References


Human Pose Recognition

Recognition via:

• conventional intensity cameras
• depth cameras

Frame-to-frame point tracking: slow to re-initialize

Pose Recognition in parts:

• Body part segmentation: per-pixel classification
• 3D skeletal joint estimation

[1] Shotton et al., 11

Case Study

Upper limbs segmentation for hand gesture recognition


Applications:

• Sign language interpretation
• Medical environments
  - Robotic medical assistants [Purdue University]
  - CT & MRI review in sterile environments [Sunnybrook Hospital, Toronto]

Binary Decision Tree: Basics

[Figure: binary decision tree with numbered split nodes and leaf nodes; an input v is routed by < / ≥ tests at the split nodes down to a leaf node, which gives the category c]

DT over depth images: Training

• Feature vector: pixel $x = [x, y, z]^T$ of depth image $I$
• Split function: depth comparison features $f_\theta$ as a function of $x$, where $d_I(x)$ is the depth at pixel $x$

[Figure: two example features $\theta_1$ and $\theta_2$, from [1] Shotton, 11]

Combination of weak but computationally efficient features
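As a rough illustration of a depth comparison feature, the sketch below follows the form given in [1], $f_\theta(I, x) = d_I(x + u / d_I(x)) - d_I(x + v / d_I(x))$ with $\theta = (u, v)$. The function name, the (row, column) offset convention, and the large constant returned for probes that fall outside the image are assumptions made for this example.

```python
BACKGROUND_DEPTH = 1e6  # assumed depth for probes that leave the image


def depth_feature(depth, x, u, v):
    """Depth comparison feature in the form of [1]; theta = (u, v) are offsets."""
    rows, cols = len(depth), len(depth[0])
    d_x = depth[x[0]][x[1]]  # depth at the reference pixel

    def probe(offset):
        # Offsets are divided by the depth at x, so the probe pattern
        # shrinks for far pixels and grows for near ones.
        r = round(x[0] + offset[0] / d_x)
        c = round(x[1] + offset[1] / d_x)
        if 0 <= r < rows and 0 <= c < cols:
            return depth[r][c]
        return BACKGROUND_DEPTH

    return probe(u) - probe(v)


# Toy depth image: a flat surface at depth 2 with one farther pixel.
depth = [[2.0] * 5 for _ in range(5)]
depth[2][4] = 4.0
print(depth_feature(depth, (2, 2), u=(0, 4), v=(0, 0)))
```

Each feature costs only a few memory reads and one subtraction, which is why many such weak features can be evaluated per pixel.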

Randomized DT: Training

1. Randomly select a set of split candidates ϕ = (θ, τ) for tree t, where θ are the feature parameters and τ is a set of split thresholds for each θ.

2. Define the set of training pixels Q = {(I, x)} over all training images for tree t. Q is the set of pixels at the root node.

3. Find the best split candidate ϕ* at node n: the one with the largest information gain from splitting Q into $Q_{left}$ and $Q_{right}$.

Randomized DT: Training

4. Recurse for $Q_{left}(ϕ^*)$ and $Q_{right}(ϕ^*)$ until a stop condition is reached:
   • Maximum depth
   • Minimum information gain
   • Minimum number of node pixels

5. Estimate $P_t(c \mid I, x)$ over the body part labels c at each leaf node, using a normalized histogram.

Note:

• dependent on choice of parameters
• prone to over-fitting
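The split selection and leaf estimation steps above can be sketched as follows. This is a minimal example assuming Shannon entropy as the impurity measure (as in [1]) and a single fixed θ whose feature values are already computed; the function names and the toy samples are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the normalized label histogram."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(labels, left, right):
    """Entropy of the node minus the size-weighted entropy of its children."""
    n = len(labels)
    children = sum(len(s) / n * entropy(s) for s in (left, right) if s)
    return entropy(labels) - children


def best_threshold(samples, thresholds):
    """samples: (feature value, label) pairs for one fixed theta.
    Returns (gain, tau) for the candidate threshold with the largest gain."""
    labels = [c for _, c in samples]
    best = (-1.0, None)
    for tau in thresholds:
        left = [c for f, c in samples if f < tau]
        right = [c for f, c in samples if f >= tau]
        candidate = (information_gain(labels, left, right), tau)
        best = max(best, candidate, key=lambda g: g[0])
    return best


def leaf_histogram(labels):
    """Normalized histogram over labels, used as the leaf distribution."""
    n = len(labels)
    return {c: k / n for c, k in Counter(labels).items()}


samples = [(-1.0, "hand"), (-0.5, "hand"), (0.4, "arm"), (0.9, "arm")]
gain, tau = best_threshold(samples, thresholds=[-0.8, 0.0, 0.6])
print(gain, tau)
```

A full trainer would repeat this search over every candidate θ at each node and recurse on the two child sets until one of the stop conditions holds.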

Random Forest

Forest: ensemble of T decision trees

• Divide the training (depth) images into T subsets, a unique subset for each tree t
• Train each tree

[3] Breiman, 01; [1] Shotton et al., 11

Random Forest: Classification

[Figure: pixel x is passed down each tree $t_1, \ldots, t_T$; the classification is a label c]
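The slide's own classification formula was not recovered. The sketch below assumes the rule used in [1], where the leaf distributions of the T trees are averaged, $P(c \mid I, x) = \frac{1}{T} \sum_{t=1}^{T} P_t(c \mid I, x)$, and the most probable label is taken. The function names and the example distributions are hypothetical.

```python
def forest_posterior(tree_posteriors):
    """Average the leaf distributions P_t(c | I, x) reached in each tree."""
    T = len(tree_posteriors)
    labels = set().union(*tree_posteriors)  # every label seen by any tree
    return {c: sum(p.get(c, 0.0) for p in tree_posteriors) / T for c in labels}


def classify(tree_posteriors):
    """Return the label with the largest averaged probability."""
    posterior = forest_posterior(tree_posteriors)
    return max(posterior, key=posterior.get)


# Leaf distributions that one pixel reaches in T = 3 trees (made-up values).
trees = [
    {"hand": 0.7, "forearm": 0.3},
    {"hand": 0.4, "forearm": 0.6},
    {"hand": 0.8, "forearm": 0.2},
]
print(classify(trees))
```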

Random Forest: Toy demo

[2] Shotton et al. 09

Random Forest: Summary

• Improves generalization to new data
• Ensemble of trees gives robustness
• Good for multi-class problems
• Resistant to over-fitting
• Fast training on large data sets
• Efficient classifier


RF: Experiments and results

- Ground truth: 500 labeled (upper limb) depth images (640x480)
- Number of trees: T = 3
- Tree depth: 15
- Split candidates: |θ| = 100, |τ| = 20 for each θ
- Random pixels per image: 1000
- 5-fold cross validation => 100 test images, 130 training images per tree

Table 1. Average per class accuracy with RF classification
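For reference, the metric named in the table caption can be computed as below. This sketch assumes the usual definition, in which the accuracy is computed separately for each ground-truth class and then averaged so that large classes do not dominate; the slides do not spell the definition out, and the labels in the example are made up.

```python
from collections import defaultdict


def average_per_class_accuracy(true_labels, predicted_labels):
    """Mean over classes of (correct pixels of the class / pixels of the class)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(true_labels, predicted_labels):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


truth = ["hand", "hand", "hand", "hand", "arm", "arm"]
pred = ["hand", "hand", "hand", "arm", "arm", "arm"]
print(average_per_class_accuracy(truth, pred))
```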

RF: Experiments and results

[Figure panels: ground truth & training; per-pixel classification]

Segmentation by Graph Cut: Motivation

RF classification results:

• Fuzzy body part boundaries
• Left/right uncertainty

Subsequent hand sign recognition requires a cleaner segmentation of the hand region.

Graph Cut framework:

• Energy minimization framework
• Binary and multi-label image segmentation
• Combines local and contextual information

Pixel labeling problem

Given:

• Pixels
• Assignment cost: U (unary potential)
• Separation cost: B (boundary potential), defined over pairs of neighboring pixels

Find: labels that minimize the total assignment and separation cost [4] Boykov et al., 01
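The minimized quantity was not recovered from the slide. A common way to write it, following the general form in [4], is $E(L) = \sum_{p} U_p(l_p) + \sum_{(p,q) \in N} B_{pq}(l_p, l_q)$, where N is the set of neighboring pixel pairs. The sketch below evaluates this energy on a 4-connected grid; the function names, the toy costs, and the constant penalty for differing neighbor labels are assumptions for illustration.

```python
def labeling_energy(labels, unary, boundary):
    """labels[r][c]: label of pixel (r, c).
    unary(r, c, label): assignment cost U.
    boundary(label_p, label_q): separation cost B for 4-connected neighbors."""
    rows, cols = len(labels), len(labels[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            energy += unary(r, c, labels[r][c])
            # Visit each neighboring pair once: the pixel below and to the right.
            if r + 1 < rows:
                energy += boundary(labels[r][c], labels[r + 1][c])
            if c + 1 < cols:
                energy += boundary(labels[r][c], labels[r][c + 1])
    return energy


# Toy 2x2 image: cost[r][c][label] is the assignment cost of each label.
cost = [[{0: 0.1, 1: 0.9}, {0: 0.6, 1: 0.4}],
        [{0: 0.2, 1: 0.8}, {0: 0.7, 1: 0.3}]]


def unary(r, c, label):
    return cost[r][c][label]


def constant_penalty(a, b):
    return 0.0 if a == b else 0.5


print(labeling_energy([[0, 1], [0, 1]], unary, constant_penalty))
print(labeling_energy([[0, 0], [0, 0]], unary, constant_penalty))
```

In this toy case the per-pixel best labeling pays two boundary penalties, so the uniform labeling ends up with the lower total energy.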


Graph Cut: Binary case

• Image as directed graph G(V, E)
• t-link: assignment cost
• n-link: separation cost

Energy minimization problem = min s-t cut on G = max-flow

Theorem:

In a graph G, the maximum source-to-sink flow possible is equal to the capacity of the minimum cut in G.

[L. R. Foulds, Graph Theory Applications, 1992 Springer-Verlag New York Inc., 247-248]
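To make the binary construction concrete, the sketch below labels a single row of pixels by computing a minimum s-t cut. It uses a plain Edmonds-Karp max-flow for brevity; the slides do not say which max-flow algorithm was used, and the function names, node names, and toy costs are assumptions.

```python
from collections import defaultdict, deque


def max_flow_min_cut(capacity, s, t):
    """Edmonds-Karp max-flow. capacity: {u: {v: cap}}.
    Returns (flow value, set of nodes on the source side of a minimum cut)."""
    residual = defaultdict(lambda: defaultdict(float))
    for u in capacity:
        for v, cap in capacity[u].items():
            residual[u][v] += cap
            residual[v][u] += 0.0  # make sure the reverse edge exists

    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow, set(parent)  # nodes still reachable from s
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        flow += push


def segment_row(obj_cost, bkg_cost, smoothness):
    """Binary labeling of a row of pixels: 1 = object (source side), 0 = background."""
    n = len(obj_cost)
    cap = defaultdict(dict)
    for p in range(n):
        cap["s"][p] = bkg_cost[p]  # t-link, cut when p is labeled background
        cap[p]["t"] = obj_cost[p]  # t-link, cut when p is labeled object
        if p + 1 < n:              # n-links between neighbors, both directions
            cap[p][p + 1] = smoothness
            cap[p + 1][p] = smoothness
    flow, source_side = max_flow_min_cut(cap, "s", "t")
    return flow, [1 if p in source_side else 0 for p in range(n)]


flow, labels = segment_row(obj_cost=[1, 6, 2, 9], bkg_cost=[9, 4, 8, 1], smoothness=3)
print(flow, labels)
```

Pixel 1 on its own slightly prefers the background label, but the separation cost makes it cheaper to keep it with its object neighbors. The value of the flow equals the energy of the returned labeling, as the theorem states.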


Graph Cut: Multi-label case

Energy = cut cost: $\|C\| = \sum_{e_{ij} \in C} |w_{ij}|$

Suboptimal approximation of the minimum energy

Energy function

Graph Cut: Potentials

Terms: unary potential, boundary potential

[Formula annotations: importance weight; probability given by RF; prior constraints]

[5] Boykov et al., 06
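The formulas for the two potentials were not recovered, so the following is only a guess at a typical setup consistent with the annotations. It assumes a unary potential equal to a weighted negative log of the RF probability, and a contrast-sensitive boundary potential in the style of [4] computed from neighboring depths. The function names and the parameters `weight`, `eps`, and `sigma` are hypothetical.

```python
import math


def unary_potential(prob, weight=1.0, eps=1e-9):
    """Assumed form: weight * -log P(c | I, x), with P given by the RF.
    eps guards against log(0) for labels the forest never predicted."""
    return -weight * math.log(max(prob, eps))


def boundary_potential(label_p, label_q, depth_p, depth_q, sigma=0.1):
    """Assumed contrast-sensitive penalty: zero for equal labels, otherwise
    close to 1 when neighboring depths are similar and small across depth edges."""
    if label_p == label_q:
        return 0.0
    return math.exp(-((depth_p - depth_q) ** 2) / (2.0 * sigma ** 2))


print(unary_potential(0.9), unary_potential(0.1))
print(boundary_potential(0, 1, 1.00, 1.01), boundary_potential(0, 1, 1.00, 1.50))
```

With these choices, confident RF predictions are cheap to keep, and label changes are encouraged to fall on depth discontinuities.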

Graph Cut: Results

Spatial coherence:

[Figure panels: RF classifications; GC segmentation]

RF & GC for hands

[Figure panels: ground truth; Random Forest; Graph Cut]

• Random Forest: 63 frames, 500 random pixels, |Omax| = 45; 58.5% per class accuracy
• Graph Cut: 70.9% per class accuracy

Conclusion

• RF: strong classifier
• RF + GC over depth maps: good object segmentation

Future Work

• Increase available data
• Improve pixel label inference
• Estimate upper limb/hand joints
• Recognize finger configuration

References

[1] Shotton, J., A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, A. Blake. Real-time Human Pose Recognition in Parts from a Single Depth Image. CVPR, 2011.

[2] Shotton, J. Boosting and Random Forest for Visual Recognition. ICCV Tutorial, 2009. http://www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial

[3] Breiman, L. Random forests. Mach. Learning, 45(1):5–32, 2001. http://www.stat.berkeley.edu/~breiman/RandomForests

[4] Boykov, Y., and M. P. Jolly. Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In Proc. IEEE Int. Conf. on Computer Vision, 2001.

[5] Boykov, Y., and G. Funka-Lea. Graph cuts and efficient N-D image segmentation. IJCV, 70:109–131, 2006.