How Does Kinect Work?
Po-Hsiang Chen Advisor: Sheng-Jyh Wang
2/13/2012
Major References
• Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation. CVPR 2011 Best Paper.
• Freedman, B., A. Shpunt, et al. (2008). Depth mapping using projected patterns, US 2010/0118123A1. PrimeSense patent.
Outline
• What is Kinect?
• Kinect Architecture
• From IR to depth image
  • History of Structured Light
  • PrimeSense Invented Structured Light
• From depth image to joint positions
  • Body Part Inference
  • Joint Proposals
• Experiments and Results
• Conclusion
• References
What is Kinect?
• Motion sensing input device by Microsoft
• Depth camera tech. developed by PrimeSense
  • Invented in 2005
• Software tech. developed by Rare
  • First announced at E3 2009 as “Project Natal”
• Windows SDK Releases
http://www.microsoft.com/en-us/kinectforwindows/discover/features.aspx
Kinect IR Structured Light
Kinect Architecture
IR Structured Light → Depth Image → Random Decision Forest → Body Parts → Mean Shift → Joint Position
3D Imaging of surface
Triangulation
• Main Problem
  • To recover shape from multiple views, we need CORRESPONDENCES between the images
  • The matching/correspondence problem is hard: occlusions, texture, colors, etc.
• Solution: Structured light
  • Idea: Simplify matching
  • Strategy: Use illumination to create your own correspondences
Structured Light
• Basic Principle
  • Use a projector to create unambiguous correspondences
• Light projection
  • If we project a single point, matching is unique
Structured Light
• Line projection (line scan)
  • For calibrated cameras, the epipolar geometry is known
  • Project a line instead of a single point
Structured Light
• Project multiple stripes or grids
• Which stripe matches which?
• The correspondence problem again
Structured Light
• Answer 1: Assume surface continuity
• Ordering constraint
Structured Light
• Answer 2: Coloured stripes (De Bruijn)
• Difficult to use for coloured surfaces
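A De Bruijn sequence B(k, n) over k colours contains every window of n consecutive symbols exactly once (cyclically), so a camera that sees n neighbouring stripes can tell which stripes they are. The following is a minimal sketch using the standard Lyndon-word construction; the three-colour palette, the window size of 3, and the helper names are illustrative assumptions, not taken from the slides.

```python
def de_bruijn(k, n):
    """Return a cyclic De Bruijn sequence B(k, n) as a list of ints in range(k)."""
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


def cyclic_windows(seq, n):
    """All length-n windows of seq, wrapping around at the end."""
    return [tuple(seq[(i + j) % len(seq)] for j in range(n))
            for i in range(len(seq))]


# Hypothetical palette: 3 colours, windows of 3 neighbouring stripes.
COLOURS = ["R", "G", "B"]
WINDOW = 3

stripes = [COLOURS[c] for c in de_bruijn(len(COLOURS), WINDOW)]

# Map each observed colour window back to the index of its first stripe.
lookup = {w: i for i, w in enumerate(cyclic_windows(stripes, WINDOW))}
```

Because every window is unique, `lookup` has as many entries as there are stripes. The slide's caveat still applies: a coloured surface can change the observed colours and break the lookup.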
Structured Light
• Answer 2: Coloured dots (M-array)
• Difficult to use for coloured surfaces
Structured Light
• Answer 3: Pattern dots (M-array)
• Difficult for industrial manufacturing
Structured Light
• Answer 4: Time-coded light patterns (time multiplexing)
• Use a sequence of binary patterns → about log2(N) images for N stripes
• Each stripe has a unique binary illumination code
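The time-multiplexing idea can be sketched as follows: stripe s is lit in image b exactly when bit b of s is 1, so the on/off sequence seen at a pixel spells out the stripe index. This is a toy illustration with plain binary codes and hypothetical function names; practical systems often prefer Gray codes, which are not shown here.

```python
def binary_patterns(num_stripes):
    """Return the pattern stack; patterns[b][s] is 1 if stripe s is lit in image b."""
    # Number of images needed: ceil(log2(num_stripes)), at least 1.
    num_images = max(1, (num_stripes - 1).bit_length())
    return [[(s >> (num_images - 1 - b)) & 1 for s in range(num_stripes)]
            for b in range(num_images)]


def decode(bits):
    """Recover the stripe index from the on/off sequence observed at one pixel."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return index


patterns = binary_patterns(8)          # 8 stripes need 3 images
observed = [p[5] for p in patterns]    # what a pixel lit by stripe 5 would see
stripe = decode(observed)
```

The price of the unambiguous code is that the scene must stay still while all images are captured.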
Structured Light
• All of the above are categorized as discrete methods
• There are many more continuous structured light methods, such as phase shifting
Salvi, J., S. Fernandez, et al. (2010). "A state of the art in structured light patterns for surface profilometry." Pattern Recognition 43(8): 2666-2680
Structured Light
• All of the above are human-designed patterns.
• Random Speckle
  • Structured light using randomly generated patterns
  • May obtain denser depth information by solving the correspondence problem
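To illustrate why a random pattern makes correspondence tractable, the sketch below matches a small window of an observed 1-D speckle row against a stored reference row and returns the shift with the most agreeing pixels. It is a toy example under strong assumptions (binary speckles, pure horizontal shift, no noise); the function name and parameters are hypothetical and this is not the matching procedure of any particular device.

```python
import random


def best_shift(reference, observed, start, window, max_shift):
    """Return the shift (in pixels) at which the observed window best matches the reference."""
    patch = observed[start:start + window]

    def score(shift):
        ref = reference[start - shift:start - shift + window]
        return sum(1 for a, b in zip(patch, ref) if a == b)

    return max(range(max_shift + 1), key=score)


rng = random.Random(0)
reference = [rng.randint(0, 1) for _ in range(300)]   # random speckle row

true_shift = 7
# The same speckles, displaced 7 pixels to the right.
observed = [0] * true_shift + reference[:-true_shift]

estimated = best_shift(reference, observed, start=100, window=31, max_shift=20)
```

A random window of 31 pixels is very unlikely to repeat elsewhere in the row, which is what makes the match unambiguous at every pixel.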
What can we do better?
• A projector is just the inverse of a camera
• One projector and one camera are enough for triangulation
• Calibration is needed
PrimeSense Patents
• US 2010/0118123
  • Projector-camera system
  • Already calibrated structure
• δZ results in δX in 32
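One common way to relate a pattern shift to depth in a calibrated projector-camera pair is triangulation against a reference plane. The sketch below assumes a pinhole camera with focal length f (in pixels), a baseline b between projector and camera, and a reference pattern recorded at distance Z0, giving 1/Z = 1/Z0 + d/(f·b) for an observed shift d. The symbols, function names, and numbers are illustrative assumptions, not values from the patent.

```python
def shift_from_depth(depth_m, focal_px, baseline_m, ref_depth_m):
    """Pattern shift (pixels) for a surface at depth_m, relative to the reference plane."""
    return focal_px * baseline_m * (1.0 / depth_m - 1.0 / ref_depth_m)


def depth_from_shift(shift_px, focal_px, baseline_m, ref_depth_m):
    """Invert the relation above: recover depth from the observed shift."""
    inv_depth = 1.0 / ref_depth_m + shift_px / (focal_px * baseline_m)
    return 1.0 / inv_depth


# Hypothetical calibration values, for illustration only.
FOCAL_PX = 580.0
BASELINE_M = 0.075
REF_DEPTH_M = 2.0

shift = shift_from_depth(1.5, FOCAL_PX, BASELINE_M, REF_DEPTH_M)
depth = depth_from_shift(shift, FOCAL_PX, BASELINE_M, REF_DEPTH_M)

# Sensitivity under this model: d(shift)/d(depth) = -f * b / depth**2,
# so a small depth change produces a small shift that shrinks with distance.
```

Under this model a surface on the reference plane produces zero shift, and a surface closer than the reference plane produces a positive one.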
PrimeSense Patents
• US 2010/0118123
• Structured Light-1
  • Pseudo-random distribution
  • Local: random
  • Global: gray level decreases
  • Can make a rough estimate in a low-resolution image
PrimeSense Patents
• US 2010/0118123
• Structured Light-2
  • Quasi-periodic pattern
  • Five-fold symmetry
  • Results in distinct peaks in the frequency domain
  • Contains no unit cell that repeats over the spatial domain
  • Used to reduce noise and ambient light in the environment
Kinect IR Structured Light
PrimeSense Patents
• US 2010/0290698
PrimeSense Patents
• US 2010/0290698
• Uses a special (“astigmatic”) lens with different focal lengths in the x- and y-directions
• Orientation of the projected circle (imaged as an ellipse) indicates depth
From depth to joints
• Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation
• Treat body segmentation as a per-pixel classification task (no pairwise term or CRF is used)
• Algorithm runs in 5 ms per frame on the Xbox GPU
• Novelty: intermediate body part representation
Body Part Inference
• Body part labeling
• 31 body parts
• Distinct parts for left and right allow the classifier to disambiguate the left and right sides of the body
Body Part Inference
• Depth image features
  • d_I(x) is the depth at pixel x in image I
  • θ = (u, v) describes the offsets u and v
  • Each feature needs to read at most 3 image pixels and perform at most 5 arithmetic operations
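The feature in the cited Shotton et al. paper is a depth difference between two probe pixels whose offsets are scaled by the depth at x: f_θ(I, x) = d_I(x + u/d_I(x)) − d_I(x + v/d_I(x)). The sketch below is a plain-Python illustration of that formula; the list-of-lists image layout, the rounding of probe positions, and the exact background constant are assumptions made here for the example.

```python
BACKGROUND = 1e6  # large depth returned for off-image probes (assumed value)


def depth_at(image, row, col):
    """Depth at (row, col), or a large constant if the probe falls outside the image."""
    if 0 <= row < len(image) and 0 <= col < len(image[0]):
        return image[row][col]
    return BACKGROUND


def depth_feature(image, pixel, u, v):
    """Depth-difference feature: reads the pixel itself plus two probe pixels."""
    row, col = pixel
    d = depth_at(image, row, col)  # assumed to be positive

    def probe(offset):
        # Dividing the offset by the depth makes the feature depth invariant:
        # a body twice as far away gets probes half as far apart in pixels.
        d_row, d_col = offset
        return depth_at(image, row + int(round(d_row / d)), col + int(round(d_col / d)))

    return probe(u) - probe(v)


# Toy 5x5 depth image at 2.0 m with one farther pixel.
image = [[2.0] * 5 for _ in range(5)]
image[2][4] = 3.0

value = depth_feature(image, (2, 2), u=(0, 4), v=(0, 0))
```

Here the offset u = (0, 4) is divided by the depth 2.0, so the first probe lands two columns to the right, on the 3.0 m pixel, while v = (0, 0) probes the pixel itself.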
Randomized Decision Forests
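• Fast and effective multi-class classifier
• Each split node consists of a feature f_θ and a threshold τ
• At the leaf node reached in tree t, a learned distribution P_t(c | I, x) over body part labels c is given
• Final classification: P(c | I, x) = (1/T) Σ_t P_t(c | I, x), the average over the T trees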
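A minimal sketch of how a forest classifies one pixel: each tree is walked from the root by comparing a feature value with the node threshold, and the leaf distributions are averaged over the trees. The dictionary-based tree layout, the "go left when the feature is below the threshold" convention, and all names and numbers are assumptions made for this illustration.

```python
def leaf_distribution(node, feature_fn):
    """Walk one tree to a leaf and return its stored class distribution."""
    while "dist" not in node:
        value = feature_fn(node["theta"])
        node = node["left"] if value < node["tau"] else node["right"]
    return node["dist"]


def forest_posterior(forest, feature_fn):
    """Average the leaf distributions of all trees in the forest."""
    dists = [leaf_distribution(tree, feature_fn) for tree in forest]
    return [sum(probs) / len(dists) for probs in zip(*dists)]


# Two toy trees over two classes (for example "hand" and "head").
tree_a = {
    "theta": "a", "tau": 0.0,
    "left": {"dist": [0.9, 0.1]},
    "right": {"dist": [0.2, 0.8]},
}
tree_b = {"dist": [0.5, 0.5]}

# Hypothetical feature responses for the pixel being classified.
responses = {"a": -1.0}
posterior = forest_posterior([tree_a, tree_b], lambda theta: responses[theta])
```

Each tree costs only one feature evaluation per level, which is why deep forests remain cheap to evaluate per pixel.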
Combining Models
• Multiple classifiers work together
• Committees
  • E.g. averaging the predictions of a set of individual models
  • E.g. majority votes
• Boosting
  • Classifiers trained in sequence
  • E.g. AdaBoost
• Decision Tree
  • Binary selection corresponding to the traversal of a tree
Decision Tree
• Three major aspects
  • A splitting criterion
  • A stop-splitting rule
  • A rule to assign each leaf to a specific class
• Decision Forests
  • A committee of decision trees
How to train?
Randomized Decision Forests
• Training
  • Each tree is trained on a different set of images
  • 2000 example pixels are picked from each image
• Algorithm
Randomized Decision Forests
• Algorithm (cont.)
• Shannon entropy given Z on Y
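Split selection by Shannon entropy can be sketched as follows: for each candidate threshold, the labelled example pixels are partitioned into a left and a right set, and the candidate with the largest information gain (parent entropy minus the size-weighted child entropies) is kept. The function names, the toy labels, and the candidate thresholds are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(examples, tau):
    """Gain of splitting (feature_value, label) pairs at threshold tau."""
    left = [label for value, label in examples if value < tau]
    right = [label for value, label in examples if value >= tau]
    parent = [label for _, label in examples]
    n = len(examples)
    return (entropy(parent)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))


def best_threshold(examples, candidates):
    """Pick the candidate threshold with the largest information gain."""
    return max(candidates, key=lambda tau: information_gain(examples, tau))


# Toy example pixels: (feature response, body part label).
examples = [(-2.0, "hand"), (-1.0, "hand"), (1.0, "head"), (2.0, "head")]
tau = best_threshold(examples, [-1.5, 0.0, 1.5])
```

In a full trainer the same search would also range over candidate features θ, and the procedure would recurse on the left and right subsets until a stopping rule is met.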
Randomized Decision Forests
• Algorithm (cont.)
• Training takes a lot of effort
• Training 3 trees to depth 20 from 1 million images takes about a day on a 1000-core cluster
Where does the training data come from?
Training Data
• Depth imaging
  • Simplifies the task of background subtraction
  • Most important: easy to synthesize!
Take real images → learn synthesis parameters → generate lots of training data
Joint Position Proposals
• Start from the per-pixel body part probabilities of the previous section
• Use mean shift with a weighted Gaussian kernel
Mean Shift
• Kernel density estimator
  • Discrete points -> continuous function
• Calculate the gradient at the initial point and shift
• Iterate until convergence
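The iteration can be sketched with a weighted Gaussian kernel: the current estimate is repeatedly replaced by the kernel-weighted mean of the points, which moves it uphill on the density estimate until it stops at a mode. This is a generic mean shift sketch; the bandwidth, tolerance, starting point, and uniform weights are illustrative assumptions (in the pose pipeline the weights would come from the per-pixel body part probabilities).

```python
import math


def mean_shift(points, weights, start, bandwidth, tol=1e-6, max_iter=100):
    """Find a mode of the weighted Gaussian kernel density estimate near start."""
    x = list(start)
    for _ in range(max_iter):
        numerator = [0.0] * len(x)
        denominator = 0.0
        for p, w in zip(points, weights):
            sq_dist = sum((pi - xi) ** 2 for pi, xi in zip(p, x))
            k = w * math.exp(-sq_dist / (2.0 * bandwidth ** 2))
            denominator += k
            numerator = [n + k * pi for n, pi in zip(numerator, p)]
        new_x = [n / denominator for n in numerator]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if shift < tol:   # stop once the estimate no longer moves
            break
    return x


# Toy data: a tight cluster around (1, 1) plus one distant outlier.
points = [(0.9, 1.0), (1.1, 1.0), (1.0, 0.9), (1.0, 1.1), (5.0, 5.0)]
weights = [1.0] * len(points)

mode = mean_shift(points, weights, start=(0.8, 0.8), bandwidth=0.5)
```

Unlike a plain weighted average, the result is barely affected by the outlier, which is why mode finding suits joint proposals from noisy per-pixel labels.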
Experiments and Results
• Synthetic
• Real
Experiments and Results
• Failure
Experiments and Results
• Training parameters vs. classification accuracy
Experiments and Results
• Comparisons
Conclusion
• Depth images may contain enough information to solve human pose problems
• Depth images are color and texture invariant, which greatly simplifies the correspondence problem
• A deep combined model with sufficient training data can become a good classifier even with simple features
• Buy a Kinect for the lab
References
• Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation.
• Freedman, B., A. Shpunt, et al. (2008). Depth mapping using projected patterns, US 2010/0118123A1.
• Freedman, B., A. Shpunt, et al. (2008). Distance-Varying Illumination and Imaging Techniques for Depth Mapping, US 2010/0290698A1.
• Salvi, J., S. Fernandez, et al. (2010). "A state of the art in structured light patterns for surface profilometry." Pattern Recognition 43(8): 2666-2680.
• Albitar, I., P. Graebling, et al. (2007). "Robust structured light coding for 3D reconstruction." IEEE.
• Scharstein, D. and R. Szeliski (2003). "High-accuracy stereo depth maps using structured light." IEEE.
• Breiman, L. (2001). "Random forests." Machine learning 45(1): 5-32.
• Amit, Y. and D. Geman (1997). "Shape quantization and recognition with randomized trees." Neural computation 9(7): 1545-1588.
• John MacCormick, "How does the Kinect work?" users.dickinson.edu/~jmac/selected-talks/kinect.pdf
• "Structured Light", www.igp.ethz.ch/photogrammetry/.../MV-SS2011 structured.pdf
• http://en.wikipedia.org/wiki/Kinect
• http://en.wikipedia.org/wiki/Structured-light_3D_scanner
• http://en.wikipedia.org/wiki/Triangulation
• http://dms.irb.hr/tutorial/tut_dtrees.php
• http://www.anandtech.com/show/4057/microsoft-kinect the-anandtech-review/2
• Chen, Y. S. and B. T. Chen (2003). "Measuring of a three dimensional surface by use of a spatial distance computation." Applied optics 42(11): 1958-1972.