
Consistent Visual Information Processing
Axel Pinz
EMT – Institute of Electrical Measurement and Measurement Signal Processing
TU Graz – Graz University of Technology
[email protected]
http://www.emt.tu-graz.ac.at/~pinz
“Consistency”
• Active vision systems / 4D data streams
• Multiple visual information
• Imprecision
• Ambiguity
• Contradiction
This Talk: Consistency in
• Active vision systems:
– Active fusion
– Active object recognition
• Immersive 3D HCI:
– Augmented reality
– Tracking in VR/AR
AR as Testbed
Consistent perception in 4D:
• Space
– Registration
– Tracking
• Time
– Lag-free
– Prediction
Agenda
• Active fusion
• Consistency
• Applications
– Active object recognition
– Tracking in VR/AR
• Conclusions
Active Fusion
[Diagram: decision-action-fusion loop, from the top level down to a simple scene – scene selection, interaction with the world, fusion and control]
Active Fusion (2)
• Fusion schemes
– Probabilistic
– Possibilistic (fuzzy)
– Evidence theoretic (Dempster & Shafer)
Probabilistic Active Fusion
N measurements, sensor inputs: m_i
M hypotheses: o_j, O = {o_1, …, o_M}
Bayes formula:
P(o_j | m_1, …, m_N) = P(m_1, …, m_N | o_j) · P(o_j) / P(m_1, …, m_N)
Use entropy H(O) to measure the quality of P(O):
H(O) = − Σ_{j=1}^{M} P(o_j) log P(o_j)
Probabilistic Active Fusion (2)
Flat distribution: P(o_j) = const. → H_max
Pronounced distribution: P(o_c) = 1; P(o_j) = 0 for j ≠ c → H = 0
• Measurements can be:
  – difficult,
  – expensive,
  – N can be prohibitively large, …
⇒ Find an iterative strategy to minimize H(O)
H(O) = − Σ_{j=1}^{M} P(o_j) log P(o_j)
Probabilistic Active Fusion (3)
Start with A ≥ 1 measurements: P(o_j | m_1, …, m_A), H_A
Iteratively take more measurements: m_{A+1}, …, m_B
Until: P(o_j | m_1, …, m_B), H_B ≤ threshold
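A minimal sketch of this decision-action-fusion loop in Python (the sensor model, measurement source, and entropy threshold here are illustrative placeholders, not the formalism of the original system): the posterior over the hypotheses is updated with Bayes' formula after each new measurement, and acquisition stops as soon as H(O) falls below the threshold.

```python
import numpy as np

def entropy(p):
    """H(O) = -sum_j P(o_j) log P(o_j), ignoring zero-probability terms."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def active_fusion(prior, likelihood_fn, next_measurement, h_threshold=0.5, max_steps=20):
    """Iteratively fuse measurements until H(O) <= h_threshold.

    prior            -- P(o_j) over the M hypotheses, shape (M,)
    likelihood_fn    -- hypothetical sensor model returning P(m | o_j), shape (M,)
    next_measurement -- hypothetical source of the next (cheapest / most informative) measurement
    """
    posterior = prior.copy()
    for _ in range(max_steps):
        if entropy(posterior) <= h_threshold:
            break                                  # distribution pronounced enough, stop measuring
        m = next_measurement()
        posterior = posterior * likelihood_fn(m)   # Bayes numerator P(m | o_j) P(o_j)
        posterior = posterior / posterior.sum()    # normalize by P(m_1, ..., m_N)
    return posterior
```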
Summary: Active Fusion
• Multiple (visual) information, many sensors, measurements, …
• Selection of information sources
• Maximize information content / quality
• Optimize effort (number / cost of measurements, …)
Information gain by entropy reduction
Summary: Active Fusion (2)
• Active systems (robots, mobile cameras)
– Sensor planning
– Control
– Interaction with the scene
• “Passive” systems (video, wearable,…)
– Filtering
– Selection of sensors / measurements
Consistency
• Consistency vs. Ambiguity
– Unimodal subsets O_k
• Representations
– Distance measures
Consistent Subsets
Hypotheses O = {o_1, …, o_M}
Ambiguity: P(O) is multimodal
Consistent unimodal subsets O_k ⊆ O
Benefits:
• Application domains
• Support of hypotheses
• Outlier rejection
Distance Measures
Depend on representations, e.g.:
• Pixel-level: SSD, correlation, rank
• Eigenspace: Euclidean
• 3D models: Euclidean
• Feature-based: Mahalanobis, …
• Symbolic: Mutual information
• Graphs: Subgraph isomorphism
Mutual Information
Shannon's measure of mutual information:
O = {o_1, …, o_M}
A ⊆ O, B ⊆ O
I(A, B) = H(A) + H(B) − H(A, B)
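A small self-contained illustration of this formula (the joint probability table below is made up for the example):

```python
import numpy as np

def H(p):
    """Shannon entropy of a probability vector."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# hypothetical joint distribution P(A, B) over two hypothesis subsets A, B
p_ab = np.array([[0.30, 0.10],
                 [0.05, 0.55]])
p_a = p_ab.sum(axis=1)                     # marginal P(A)
p_b = p_ab.sum(axis=0)                     # marginal P(B)

mi = H(p_a) + H(p_b) - H(p_ab.ravel())     # I(A,B) = H(A) + H(B) - H(A,B)
print(round(mi, 3))
```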
Applications
• Active object recognition
– Videos
– Details
• Tracking in VR / AR
– Landmark definition / acquisition
– Real-time tracking
Active vision laboratory
Active Object Recognition
Active Object Recognition in Parametric Eigenspace
• Classifier for a single view
• Pose estimation per view
• Fusion formalism
• View planning formalism
• Estimation of object appearance at unexplored viewing positions
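As a rough sketch of the single-view step, assuming an eigenspace basis has already been learned from reference views of the known objects (the function and variable names are illustrative, and a plain nearest-neighbour rule stands in for the full classifier and pose estimator):

```python
import numpy as np

def classify_view(image_vec, basis, mean, manifold_points, labels):
    """Project one view into the eigenspace and return the closest reference point.

    image_vec       -- flattened input view
    basis, mean     -- eigenvectors (k x d) and mean image from PCA on the reference views
    manifold_points -- projected reference views (n x k), sampling each object and pose
    labels          -- (object_id, pose) per reference point
    """
    g = basis @ (image_vec - mean)                    # projection into the eigenspace
    d = np.linalg.norm(manifold_points - g, axis=1)   # Euclidean distance (cf. Distance Measures)
    i = int(np.argmin(d))
    return labels[i], float(d[i])
```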
Applications
✓ Active object recognition
  – Videos
  – Details
✓ Control of active vision systems
• Tracking in VR / AR
  – Landmark definition / acquisition
  – Real-time tracking
⇒ Selection, combination, evaluation
⇒ Constraining of huge spaces
Landmark Definition / Acquisition
What is a “landmark”?
• corners
• blobs
• natural landmarks
Automatic Landmark Acquisition
• Capture a dataset of the scene:
– calibrated stereo rig
– trajectory (by magnetic tracking)
– n stereo pairs
• Process this dataset
– visually salient landmarks for tracking
Automatic Landmark Acquisition
visually salient landmarks for tracking
• salient points in 2D image
• 3D reconstruction
• clusters in 3D:
– compact, many points
– consistent feature descriptions
• cluster centers → landmarks
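A simplified sketch of the final grouping step, with a plain radius-based clustering standing in for the method actually used (the radius and minimum cluster size are invented parameters, and the consistency check on feature descriptions is omitted):

```python
import numpy as np

def cluster_landmarks(points_3d, radius=0.05, min_points=5):
    """Group reconstructed 3D salient points and return cluster centers as landmarks."""
    pts = np.asarray(points_3d, dtype=float)
    unassigned = np.ones(len(pts), dtype=bool)
    landmarks = []
    for i in range(len(pts)):
        if not unassigned[i]:
            continue
        near = unassigned & (np.linalg.norm(pts - pts[i], axis=1) < radius)
        if near.sum() >= min_points:                  # compact cluster with many points
            landmarks.append(pts[near].mean(axis=0))  # cluster center -> landmark
        unassigned &= ~near
    return np.array(landmarks)
```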
Processing Scheme
Office Scene
Office Scene - Reconstruction
Unknown Scene - Landmark Acquisition
Real-Time Tracking
• Measure position and orientation of object(s)
• Obtain trajectories of object(s)
• Stationary observer – “outside-in”
– Vision-based
• Moving observer, egomotion – “inside-out”
– Hybrid
• Degrees of Freedom – DoF
– 3 DoF (mobile robot)
– 6 DoF (head and device tracking in AR)
Outside-in Tracking (1)
[Setup: stereo rig with IR illumination tracking wireless devices]
• wireless
• 1 marker/device: 3 DoF
• 2 markers: 5 DoF
• 3 markers: 6 DoF
Epipolar Geometry
Outside-in Tracking (2)
[Tracking loop diagram: epipolar geometry, 3D correspondence, 3D objects and pose, 3D prediction, 2D backprojection, constraints]
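A bare-bones sketch of how a 3D prediction constrains the 2D search in such a loop (pinhole projection only; the camera matrix, prediction, and gate radius are assumed inputs, not the original implementation):

```python
import numpy as np

def backproject(P, X):
    """Project a 3D point X with a 3x4 camera matrix P to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def gated_candidates(P, X_pred, blobs_2d, gate_px=15.0):
    """Keep only detected 2D blobs that lie near the backprojected 3D prediction."""
    u_pred = backproject(P, X_pred)
    dist = np.linalg.norm(np.asarray(blobs_2d, dtype=float) - u_pred, axis=1)
    return [b for b, d in zip(blobs_2d, dist) if d < gate_px]
```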
Consistent Tracking (1)
• Complexity
– Many targets
– Exhaustive search vs. Real-time
• Occlusion
– Redundancy (targets | cameras)
• Ambiguity in 3D
– Constraints
Consistent Tracking (2)
• Dynamic interpretation tree
– Geometric / spatial consistency
• Local constraints
– Multiple interpretations can happen
– Global consistency is impossible
• Temporal consistency
– Filtering, prediction
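A toy version of the geometric-consistency idea behind such an interpretation tree: candidate assignments of measured markers to model points are accepted only if all pairwise 3D distances match. Brute-force enumeration stands in for the pruned depth-first search of a real dynamic interpretation tree, and the tolerance is an invented parameter.

```python
import numpy as np
from itertools import permutations

def consistent_assignments(measured, model, tol=0.01):
    """Return model-point orderings whose pairwise distances match the measured markers."""
    def pairwise(pts):
        return [np.linalg.norm(pts[i] - pts[j]) for i in range(len(pts)) for j in range(i)]
    dm = pairwise(measured)
    good = []
    for perm in permutations(range(len(model)), len(measured)):
        if all(abs(a - b) < tol for a, b in zip(dm, pairwise(model[list(perm)]))):
            good.append(perm)                 # geometrically consistent interpretation
    return good
```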
Consistent Tracking (3)
Hybrid Inside-Out Tracking (1)
Inertial Tracker
• 3 accelerometers
• 3 gyroscopes
• signal processing
• interface
Hybrid Inside-Out Tracking (2)
• complementary sensors
• fusion
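A minimal sketch of this kind of complementary fusion for a single rotation axis (the blend factor and the interface are made up): the fast inertial rate is integrated between vision updates, and the slower, drift-free vision measurement corrects the accumulated gyro drift whenever it is available.

```python
def fuse_orientation(theta, gyro_rate, dt, vision_theta=None, k=0.02):
    """Complementary fusion: fast gyro integration, slow vision correction."""
    theta = theta + gyro_rate * dt            # inertial: high rate, but drifts
    if vision_theta is not None:              # vision: lower rate, drift-free
        theta = (1.0 - k) * theta + k * vision_theta
    return theta
```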
Summary: Consistency in
• Active vision systems:
– Active fusion
– Active object recognition
• Immersive 3D HCI:
– Augmented reality
– Tracking in VR/AR
Conclusion
Consistent processing of visual information can significantly improve the performance of active and real-time vision systems.
Acknowledgement
Thomas Auer, Hermann Borotschnig, Markus Brandner, Harald Ganster, Peter Lang, Lucas Paletta, Manfred Prantl, Miguel Ribo, David Sinclair
Christian Doppler Gesellschaft, FFF, FWF, Kplus VRVis, EU TMR Virgo