2/13/2012 12 Structured Light

How Does Kinect Work?

Po-Hsiang Chen Advisor: Sheng-Jyh Wang

2/13/2012

Major References

• Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation. CVPR 2011 Best Paper.
• Freedman, B., A. Shpunt, et al. (2008). Depth mapping using projected patterns, 2010/0118123A1. PrimeSense patent.

Outline

• What is Kinect?
• Kinect Architecture
• From IR to depth image
  • History of Structured Light
  • PrimeSense Invented Structured Light
• From depth image to joint positions
  • Body Part Inference
  • Joint Proposals
• Experiments and Results
• Conclusion
• References

What is Kinect?

• Motion-sensing input device by Microsoft
• Depth camera technology developed by PrimeSense
  • Invented in 2005
• Software technology developed by Rare
  • First announced at E3 2009 as "Project Natal"
• Windows SDK releases: http://www.microsoft.com/en-us/kinectforwindows/discover/features.aspx

Kinect IR Structured Light

Kinect Architecture

IR → (Structured Light) → Depth Image → (Random Decision Forest) → Body Parts → (Mean Shift) → Joint Position

3D Imaging of surface

Triangulation

• Main problem
  • To recover shape from multiple views, we need correspondences between the images
  • The matching (correspondence) problem is hard: occlusions, texture, colors, etc.
• Solution: structured light
  • Idea: simplify matching
  • Strategy: use illumination to create your own correspondences

Structured Light

• Basic principle
  • Use a projector to create unambiguous correspondences
• Light projection
  • If we project a single point, matching is unique

Structured Light

• Line projection (line scan)
  • For calibrated cameras, the epipolar geometry is known
  • Project a line instead of a single point

Structured Light

• Project multiple stripes or grids
• Which stripe matches which?
• The correspondence problem again

Structured Light

• Answer 1: Assume surface continuity
• Ordering constraint

Structured Light

• Answer 2: Coloured stripes (De Bruijn)
• Difficult to use on coloured surfaces
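The De Bruijn idea can be sketched in a few lines: with k colours and a window of n neighbouring stripes, a De Bruijn sequence guarantees that every window of n consecutive stripes is unique, so a stripe can be identified from its local neighbourhood. The function names and the colour list below are illustrative assumptions, not taken from the slides.

```python
def de_bruijn(k, n):
    """Return a cyclic De Bruijn sequence over the alphabet 0..k-1
    in which every length-n word appears exactly once."""
    a = [0] * (k * n)
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq


def stripe_colors(colors, window):
    """Map a De Bruijn sequence to a linear run of coloured stripes.

    The first window-1 symbols are appended so that the cyclic
    uniqueness property also holds on the linear stripe layout.
    """
    seq = de_bruijn(len(colors), window)
    linear = seq + seq[:window - 1]
    return [colors[i] for i in linear]


if __name__ == "__main__":
    stripes = stripe_colors(["R", "G", "B"], 3)
    print("".join(stripes))
```

With 3 colours and a window of 3, this gives 27 distinct windows, so any 3 adjacent stripes seen by the camera identify their position in the pattern, provided the surface colour does not corrupt the observed stripe colours.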

Structured Light

• Answer 2: Coloured dots (M-array)
• Difficult to use on coloured surfaces

Structured Light

• Answer 3: Pattern dots (M-array)
• Difficult for industrial manufacturing

Structured Light

• Answer 4: Time-coded light patterns (time multiplexing)
• Use a sequence of binary patterns → about log₂ N images for N stripes
• Each stripe has a unique binary illumination code
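A minimal sketch of time multiplexing, assuming plain binary codes (practical systems often use Gray codes instead): each projected image contributes one bit per stripe, and the bits observed at a pixel over time are decoded back into a stripe index. The function names are hypothetical.

```python
def binary_patterns(num_stripes):
    """Return a list of images; each image is a list with one 0/1 value per stripe.

    Image b carries bit b (most significant first) of every stripe index,
    so ceil(log2(num_stripes)) images are enough to label all stripes.
    """
    num_images = max(1, (num_stripes - 1).bit_length())
    return [
        [(stripe >> (num_images - 1 - b)) & 1 for stripe in range(num_stripes)]
        for b in range(num_images)
    ]


def decode_stripe(bits):
    """Turn the bit sequence observed at one pixel back into a stripe index."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return index


if __name__ == "__main__":
    patterns = binary_patterns(8)
    for image in patterns:
        print(image)
    # Bits seen over time at a pixel lit by stripe 5
    observed = [image[5] for image in patterns]
    print(decode_stripe(observed))
```

The price of this scheme is that the scene must stay still while all the patterns are projected.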

Structured Light

• All of the above are categorized as discrete methods
• There are many more continuous structured light methods, such as phase shifting

Salvi, J., S. Fernandez, et al. (2010). "A state of the art in structured light patterns for surface profilometry." Pattern Recognition 43(8): 2666-2680

Structured Light

• All of the above are human-designed patterns.
• Random speckle
  • Structured light using randomly generated patterns
  • May obtain denser depth information by solving the correspondence problem
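To illustrate why a random pattern helps with correspondence, here is a toy one-dimensional sketch: a window cut from a random reference row is compared against shifted positions in the observed row, and the shift with the lowest sum of absolute differences wins. This is a generic block-matching illustration under assumed names and parameters, not the matching method used inside the Kinect.

```python
import random


def make_speckle_row(length, seed=0):
    """Generate a reproducible random binary row standing in for a speckle pattern."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(length)]


def shift_row(row, shift):
    """Simulate the observed row: the pattern displaced to the right by `shift` pixels."""
    return [0] * shift + row[:len(row) - shift]


def find_shift(reference, observed, center, half_window, max_shift):
    """Find the shift whose observed window best matches the reference window.

    Assumes center - half_window >= 0. The cost is the sum of absolute differences.
    """
    ref_win = reference[center - half_window:center + half_window + 1]
    best_shift, best_cost = 0, None
    for s in range(max_shift + 1):
        start = center + s - half_window
        obs_win = observed[start:start + len(ref_win)]
        if len(obs_win) < len(ref_win):
            break  # window ran off the end of the row
        cost = sum(abs(a - b) for a, b in zip(ref_win, obs_win))
        if best_cost is None or cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift


if __name__ == "__main__":
    reference = make_speckle_row(200)
    observed = shift_row(reference, 7)
    print(find_shift(reference, observed, center=100, half_window=15, max_shift=20))
```

Because the pattern is random, a reasonably large window is very unlikely to match anywhere except at the true shift, which is what makes a dense per-pixel search feasible.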

What can we do better?

• A projector is just the inverse of a camera
• One projector and one camera are enough for triangulation
• Calibration is needed
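For a rectified projector-camera pair, triangulation reduces to the familiar relation Z = f · b / d, where f is the focal length in pixels, b the baseline and d the observed disparity. The sketch below assumes that simplified model; the numeric values are illustrative assumptions, not calibration data from the slides.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres for a rectified pair: Z = f * b / d.

    focal_px     -- focal length in pixels (assumed known from calibration)
    baseline_m   -- distance between projector and camera in metres
    disparity_px -- horizontal shift of the matched pattern in pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Illustrative values only: 580 px focal length, 7.5 cm baseline
    print(depth_from_disparity(580.0, 0.075, 29.0))
```

Since depth is inversely proportional to disparity, a one-pixel matching error costs more depth accuracy for far objects than for near ones.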

PrimeSense Patents

• 2010/0118123
• Projector-camera system
• Already calibrated structure
• A depth shift δZ results in a transverse shift δX in the image captured by assembly 32 (numbering from the patent figure)

PrimeSense Patents

• 2010/0118123
• Structured Light-1
  • Pseudo-random distribution
  • Local: random
  • Global: gray level decreases
  • A rough estimate can be made in a low-resolution image

PrimeSense Patents

• 2010/0118123
• Structured Light-2
  • Quasi-periodic pattern
  • Five-fold symmetry
  • Results in distinct peaks in the frequency domain
  • Contains no unit cell that repeats over the spatial domain
  • Used to reduce the effect of noise and ambient light in the environment

Kinect IR Structured Light

PrimeSense Patents

• 2010/0290698

PrimeSense Patents

• 2010/0290698
• Uses a special ("astigmatic") lens with different focal lengths in the x- and y-directions
• Orientation of the projected spot (a circle stretched into an ellipse) indicates depth

From depth to joints

• Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation
• Treat body segmentation as a per-pixel classification task (no pairwise term or CRF is used)
• The algorithm runs in 5 ms per frame on the Xbox GPU
• Novelty: intermediate body part representation

Body Part Inference

• Body part labeling
  • 31 body parts
  • Distinct parts for left and right allow the classifier to disambiguate the left and right sides of the body

Body Part Inference

• Depth image features
  • d_I(x) is the depth at pixel x in image I
  • θ = (u, v) describes offsets u and v
  • Each feature needs to read at most 3 image pixels and perform at most 5 arithmetic operations
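The feature described in Shotton et al. (2011) compares the depth at two probe pixels whose offsets are scaled by the depth at x: f_θ(I, x) = d_I(x + u / d_I(x)) − d_I(x + v / d_I(x)). The sketch below assumes the depth image is a list of rows with positive depths, and that probes falling outside the image read a large constant; the helper names and that constant are illustrative assumptions.

```python
BACKGROUND_DEPTH = 1e6  # assumed large value for probes outside the image


def depth_at(depth, x, y):
    """Read depth[y][x], or a large constant if the probe is out of bounds."""
    if 0 <= y < len(depth) and 0 <= x < len(depth[0]):
        return depth[y][x]
    return BACKGROUND_DEPTH


def depth_feature(depth, x, y, u, v):
    """Depth comparison feature with depth-normalised offsets u and v.

    Dividing the offsets by the depth at (x, y) makes the feature roughly
    invariant to how far the body is from the camera.
    Reads at most 3 pixels: the centre and the two probes.
    """
    d = depth_at(depth, x, y)
    ux, uy = x + int(round(u[0] / d)), y + int(round(u[1] / d))
    vx, vy = x + int(round(v[0] / d)), y + int(round(v[1] / d))
    return depth_at(depth, ux, uy) - depth_at(depth, vx, vy)


if __name__ == "__main__":
    image = [[2.0] * 5 for _ in range(5)]
    image[2][4] = 3.0
    print(depth_feature(image, 2, 2, u=(4, 0), v=(0, 0)))
```

Each feature alone is a weak signal; the slides' point is that many such cheap comparisons, combined in a decision forest, are enough for accurate labeling.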

Randomized Decision Forests

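• Fast and effective multi-class classifier
• Each split node consists of a feature f_θ and a threshold τ
• At the leaf node reached in tree t, a learned distribution P_t(c | I, x) over body part labels c is given
• Final classification: average the per-tree distributions over all trees in the forest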
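A minimal sketch of how a pixel might be classified by such a forest: each tree is walked from the root by comparing a feature value with the node threshold, and the leaf distributions are averaged, P(c | I, x) = (1/T) Σ_t P_t(c | I, x). The dictionary layout, the key names and the "go left if below the threshold" convention are assumptions made for this example.

```python
def classify_with_tree(node, feature_value_fn):
    """Walk one tree and return the class distribution stored at the leaf.

    node             -- nested dict: split nodes have 'theta', 'tau', 'left', 'right';
                        leaves have 'leaf' (a dict mapping class label -> probability)
    feature_value_fn -- callable computing the feature value for a given theta
    """
    while "leaf" not in node:
        branch = "left" if feature_value_fn(node["theta"]) < node["tau"] else "right"
        node = node[branch]
    return node["leaf"]


def classify_with_forest(trees, feature_value_fn):
    """Average the leaf distributions of all trees in the forest."""
    dists = [classify_with_tree(tree, feature_value_fn) for tree in trees]
    labels = set().union(*dists)
    return {c: sum(d.get(c, 0.0) for d in dists) / len(dists) for c in labels}


if __name__ == "__main__":
    tree_a = {"theta": 0.5, "tau": 1.0,
              "left": {"leaf": {"head": 0.8, "hand": 0.2}},
              "right": {"leaf": {"head": 0.1, "hand": 0.9}}}
    tree_b = {"theta": 2.0, "tau": 1.0,
              "left": {"leaf": {"head": 0.9, "hand": 0.1}},
              "right": {"leaf": {"head": 0.4, "hand": 0.6}}}
    # Toy feature: theta itself is used as the feature value
    print(classify_with_forest([tree_a, tree_b], lambda theta: theta))
```

In a real system `feature_value_fn` would evaluate the depth comparison feature at the pixel being classified.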

Combining Models

• Multiple classifiers work together
• Committees
  • E.g. averaging the predictions of a set of individual models
  • E.g. majority votes
• Boosting
  • Classifiers trained in sequence
  • E.g. AdaBoost
• Decision tree
  • Binary selection corresponding to the traversal of a tree

Decision Tree

• Three major aspects
  • A splitting criterion
  • A stop-splitting rule
  • A rule to assign each leaf to a specific class
• Decision forests
  • A committee of decision trees

Randomized Decision Forests

How to train?

Randomized Decision Forests

• Training
  • Each tree is trained on a different set of images
  • 2000 example pixels are picked from each image
• Algorithm

Randomized Decision Forests

• Algorithm (cont.)
  • Shannon entropy given Z on Y
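The split selection in Shotton et al. (2011) picks the candidate (θ, τ) with the largest information gain, G = H(Q) − Σ_{s ∈ {left, right}} (|Q_s| / |Q|) · H(Q_s), where H is the Shannon entropy of the label histogram. A minimal sketch of that criterion follows, assuming the examples are given as (feature value, label) pairs; the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the normalised label histogram."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(examples, threshold):
    """Entropy reduction achieved by splitting a non-empty example set at `threshold`.

    examples -- list of (feature_value, label) pairs
    """
    left = [label for value, label in examples if value < threshold]
    right = [label for value, label in examples if value >= threshold]
    everything = [label for _, label in examples]
    n = len(everything)
    children = sum(len(part) / n * shannon_entropy(part) for part in (left, right))
    return shannon_entropy(everything) - children


if __name__ == "__main__":
    examples = [(0.1, "head"), (0.2, "head"), (0.8, "hand"), (0.9, "hand")]
    for tau in (0.15, 0.5):
        print(tau, information_gain(examples, tau))
```

During training, this gain would be evaluated for many randomly proposed features and thresholds at every node, which is part of why training is so expensive.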

Randomized Decision Forests

• Algorithm (cont.)
  • Training takes a lot of effort
  • Training 3 trees to depth 20 from 1 million images takes about a day on a 1000-core cluster

Where does the training data come from?

Training Data

• Depth imaging
  • Simplifies the task of background subtraction
  • Most important: easy to synthesize!

Take real images → Learn synthesis parameters → Generate lots of training data

Joint Position Proposals

• From the previous section, we have per-pixel body part probabilities
• Use mean shift with a weighted Gaussian kernel

Mean Shift

• Kernel density estimator
  • Discrete points → continuous function
• Calculate the gradient at the initial point and shift
• Iterate until convergence
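A minimal sketch of weighted mean shift with a Gaussian kernel: starting from an initial guess, the estimate is repeatedly replaced by the kernel-weighted average of the points until it stops moving, which climbs to a mode of the density. In Shotton et al. (2011) the per-pixel weight combines the body part probability with the squared depth; here the weights are simply passed in, and the function name, bandwidth and stopping tolerance are assumptions.

```python
import math


def mean_shift_mode(points, weights, start, bandwidth, max_iters=100, tol=1e-9):
    """Climb from `start` to a nearby mode of the weighted Gaussian kernel density.

    points    -- list of equal-length coordinate tuples
    weights   -- one non-negative weight per point
    bandwidth -- standard deviation of the Gaussian kernel
    """
    current = tuple(start)
    for _ in range(max_iters):
        total = 0.0
        accum = [0.0] * len(current)
        for point, weight in zip(points, weights):
            sq_dist = sum((a - b) ** 2 for a, b in zip(point, current))
            k = weight * math.exp(-0.5 * sq_dist / bandwidth ** 2)
            total += k
            for i, coord in enumerate(point):
                accum[i] += k * coord
        if total == 0.0:
            break  # no point has any influence here
        updated = tuple(value / total for value in accum)
        moved = math.sqrt(sum((a - b) ** 2 for a, b in zip(updated, current)))
        current = updated
        if moved < tol:
            break
    return current


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.2, 0.0), (-0.2, 0.0), (10.0, 10.0)]
    print(mean_shift_mode(pts, [1.0] * 4, start=(0.5, 0.5), bandwidth=1.0))
```

Starting the procedure from several pixels with high probability for a given body part would yield several candidate modes, which can then serve as joint position proposals.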

Experiments and Results

• Synthetic
• Real

Experiments and Results

• Failure

Experiments and Results

• Training parameters vs. classification accuracy

Experiments and Results

• Comparisons

Conclusion

• Depth images may contain enough information to solve human pose problems
• Depth images are color- and texture-invariant, which greatly simplifies the corresponding problems
• A deep combining model with sufficient training data can become a good classifier even with simple features
• Buy a Kinect for the lab

References

• Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation.
• Freedman, B., A. Shpunt, et al. (2008). Depth mapping using projected patterns, 2010/0118123A1.
• Freedman, B., A. Shpunt, et al. (2008). Distance-Varying Illumination and Imaging Techniques for Depth Mapping, 2010/0290698A1.

References

• Salvi, J., S. Fernandez, et al. (2010). "A state of the art in structured light patterns for surface profilometry." Pattern Recognition 43(8): 2666-2680.
• Albitar, I., P. Graebling, et al. (2007). "Robust structured light coding for 3D reconstruction," IEEE.
• Scharstein, D. and R. Szeliski (2003). "High-accuracy stereo depth maps using structured light," IEEE.
• Breiman, L. (2001). "Random forests." Machine learning 45(1): 5-32.
• Amit, Y. and D. Geman (1997). "Shape quantization and recognition with randomized trees." Neural computation 9(7): 1545-1588.

References

• John MacCormick, "How does the Kinect work?", users.dickinson.edu/~jmac/selected-talks/kinect.pdf
• "Structured Light", www.igp.ethz.ch/photogrammetry/.../MV-SS2011 structured.pdf
• http://en.wikipedia.org/wiki/Kinect
• http://en.wikipedia.org/wiki/Structured-light_3D_scanner
• http://en.wikipedia.org/wiki/Triangulation
• http://dms.irb.hr/tutorial/tut_dtrees.php
• http://www.anandtech.com/show/4057/microsoft-kinect the-anandtech-review/2
• Chen, Y. S. and B. T. Chen (2003). "Measuring of a three dimensional surface by use of a spatial distance computation." Applied optics 42(11): 1958-1972.