Transcript slides - Machine Learning
Learning Human Pose and Motion Models for Animation
Aaron Hertzmann University of Toronto
Animation is maturing … … but it’s still hard to create
Keyframe animation
[Figure: keyframe poses q_1, q_2, q_3 and the interpolated trajectory q(t)]
http://www.cadtutor.net/dd/bryce/anim/anim.html
Characters are very complex
Woody:
- 200 facial controls
- 700 controls in his body
http://www.pbs.org/wgbh/nova/specialfx2/mcqueen.html
Motion capture
[Images from NYU and UW]
Motion capture
Mocap is not a panacea
Goal: model human motion
What motions are likely?
Applications: • Computer animation • Computer vision
Related work: physical models
• Accurate, in principle
• Too complex to work with (but see [Liu, Hertzmann, Popović 2005])
• Computationally expensive
Related work: motion graphs
Input: raw motion capture “Motion graph” (slide from J. Lee)
Approach: statistical models of motions
Learn a PDF over motions, and synthesize from this PDF [Brand and Hertzmann 1999].
What PDF do we use?
Style-Based Inverse Kinematics
with: Keith Grochow, Steve Martin, Zoran Popović
Motivation
Body parameterization
Pose at time t: q_t = root position/orientation (6 DOFs) + joint angles (29 DOFs)
Motion: X = [q_1, …, q_T]
Forward kinematics
Pose to 3D positions: q_t → FK → [x_i, y_i, z_i]_t
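To make the q_t → [x_i, y_i, z_i]_t mapping concrete, here is a minimal forward-kinematics sketch for a planar joint chain. It is an illustration only (the talk's skeleton has 29 joint angles plus a 6-DOF root); the `forward_kinematics` helper, the link lengths, and the 2-D setup are assumptions for the example.

```python
# A planar FK sketch (hypothetical helper; angles are relative to the parent link).
import numpy as np

def forward_kinematics(angles, lengths, root=(0.0, 0.0)):
    # Accumulate each link's rotation and offset down the chain,
    # mapping joint angles q_t to joint positions [x_i, y_i]_t.
    positions = [np.asarray(root, dtype=float)]
    heading = 0.0
    for theta, length in zip(angles, lengths):
        heading += theta  # relative joint angle
        offset = length * np.array([np.cos(heading), np.sin(heading)])
        positions.append(positions[-1] + offset)
    return np.array(positions)

# Toy usage: a 3-link chain at one time step.
joints = forward_kinematics(angles=[0.3, -0.2, 0.5], lengths=[1.0, 0.8, 0.5])
```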
Problem Statement
Generate a character pose based on a chosen style, subject to constraints.
[Figure: degrees of freedom (DOFs) q and constraints]
Approach
[Diagram: off-line learning (Motion → Learning → Style) feeding real-time pose synthesis (Style + Constraints → Synthesis → Pose)]
Features
y(q) = [q, orientation(q), velocity(q)] = [q_0, q_1, q_2, …, r_0, r_1, r_2, …, v_0, v_1, v_2, …]
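A small sketch of how such a feature vector might be assembled. The `features` helper, the finite-difference velocity, and the 120 Hz frame rate are assumptions for the example, not the paper's exact construction.

```python
# Assembling y(q) = [q, orientation(q), velocity(q)] (hypothetical helper).
import numpy as np

def features(q_t, q_prev, root_orientation, dt=1.0 / 120.0):
    velocity = (q_t - q_prev) / dt  # velocity(q) as a finite difference (assumption)
    return np.concatenate([q_t, root_orientation, velocity])

# Toy usage with a 29-DOF pose and a 3-D root orientation.
rng = np.random.default_rng(0)
q_prev, q_t = rng.standard_normal(29), rng.standard_normal(29)
y = features(q_t, q_prev, root_orientation=np.zeros(3))
```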
Goals for the PDF
• Learn PDF from any data
• Smooth and descriptive
• Minimal parameter tuning
• Real-time synthesis
Mixtures-of-Gaussians
GPLVM
Gaussian Process Latent Variable Model [Lawrence 2004]
[Figure: latent space (x_1, x_2) mapped by a GP to feature space (y_1, y_2, y_3)]
x ~ N(0, I), y ~ GP(x; θ)
Learning: arg max p(X, θ | Y) = arg max p(Y | X, θ) p(X)
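A minimal sketch of GPLVM-style learning: optimize the latent coordinates X to maximize p(Y | X) p(X) under an RBF-kernel Gaussian process. The kernel parameters are held fixed and gradients are numerical here for brevity; both are simplifications of the actual model.

```python
# A minimal GPLVM sketch (assumptions: fixed RBF kernel, numerical gradients via scipy).
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X, gamma=1.0, noise=1e-2):
    # k(x, x') = exp(-gamma/2 * ||x - x'||^2), plus noise on the diagonal
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return np.exp(-0.5 * gamma * sq) + noise * np.eye(len(X))

def neg_log_posterior(x_flat, Y, d, gamma=1.0):
    # -log p(Y | X) - log p(X), up to constants
    N, D = Y.shape
    X = x_flat.reshape(N, d)
    K = rbf_kernel(X, gamma)
    L = np.linalg.cholesky(K)
    log_det = 2 * np.sum(np.log(np.diag(L)))
    data_term = 0.5 * D * log_det + 0.5 * np.sum(Y * np.linalg.solve(K, Y))
    prior_term = 0.5 * np.sum(X**2)  # x ~ N(0, I)
    return data_term + prior_term

# Toy usage: Y is N poses x D features; learn a 2-D latent embedding.
rng = np.random.default_rng(0)
Y = rng.standard_normal((20, 5))
X0 = 0.1 * rng.standard_normal(20 * 2)
res = minimize(neg_log_posterior, X0, args=(Y, 2), method="L-BFGS-B")
X_learned = res.x.reshape(20, 2)
```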
Scaled Outputs
Different DOFs have different "importances". Solution: scale the RBF kernel function k(x, x') per DOF: k_i(x, x') = k(x, x') / w_i². Equivalently: learn x → Wy, where W = diag(w_1, w_2, …, w_D).
Precision in Latent Space
[Figure: predictive variance σ²(x) visualized over the latent space (x_1, x_2)]
SGPLVM Objective Function
[Figure: latent point x mapped by f(x; θ) to feature space (y_1, y_2, y_3), with scaling W and hyperparameters θ]
L_IK(x, y; θ) = ||W(y − f(x; θ))||² / (2σ²(x; θ)) + (D/2) ln σ²(x; θ)
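A sketch of evaluating an objective of this form, assuming the GP posterior mean f(x) and variance σ²(x) are supplied as callables; the toy f, σ², and W below are placeholders, not learned quantities.

```python
# Evaluating an SGPLVM-style IK objective (hypothetical inputs; numpy only).
import numpy as np

def L_ik(x, y, f, sigma2, W):
    # ||W (y - f(x))||^2 / (2 sigma^2(x)) + (D/2) ln sigma^2(x)
    D = len(y)
    r = W @ (y - f(x))
    s2 = sigma2(x)
    return r @ r / (2.0 * s2) + 0.5 * D * np.log(s2)

# Toy usage with a 1-D latent space and a 3-D feature space.
f = lambda x: np.array([np.sin(x[0]), np.cos(x[0]), x[0]])
sigma2 = lambda x: 0.1 + 0.05 * x[0] ** 2   # variance grows away from the data
W = np.diag([1.0, 1.0, 0.5])                # per-DOF importance weights
value = L_ik(np.array([0.3]), np.array([0.2, 0.9, 0.4]), f, sigma2, W)
```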
Baseball Pitch
Track Start
Jump Shot
Style interpolation
Given two styles θ_1 and θ_2, can we "interpolate" them?
p_1(y) ∝ exp(−L_IK(y; θ_1))
p_2(y) ∝ exp(−L_IK(y; θ_2))
Approach: interpolate in log-domain
Style interpolation
Interpolated density: exp(−L_IK(y; θ_1))^(1−s) · exp(−L_IK(y; θ_2))^s = p_1(y)^(1−s) p_2(y)^s
Style interpolation in log space
exp(−L_IK(y; θ_1))^(1−s) · exp(−L_IK(y; θ_2))^s = exp(−((1−s) L_IK(y; θ_1) + s L_IK(y; θ_2)))
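A sketch of the idea: blend two style objectives linearly in the log-domain and minimize the result. The quadratic L1 and L2 below are stand-ins for two learned L_IK objectives, not the actual models.

```python
# Log-domain style interpolation sketch (hypothetical objectives; scipy for the solve).
import numpy as np
from scipy.optimize import minimize

def interpolated_objective(y, L1, L2, s):
    # (1 - s) * L_IK(y; theta_1) + s * L_IK(y; theta_2),
    # i.e. the negative log of p_1(y)^(1-s) * p_2(y)^s.
    return (1.0 - s) * L1(y) + s * L2(y)

# Toy usage: two quadratic "styles" with different preferred poses.
L1 = lambda y: 0.5 * np.sum((y - 1.0) ** 2)
L2 = lambda y: 0.5 * np.sum((y + 1.0) ** 2)
y0 = np.zeros(3)
y_blend = minimize(lambda y: interpolated_objective(y, L1, L2, 0.25), y0).x
```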
Interactive Posing
Interactive Posing
Interactive Posing
Multiple motion styles
Realtime Motion Capture
Style Interpolation
Trajectory Keyframing
Posing from an Image
Modeling motion
• GPLVM doesn't model motions; velocity features are a hack
• How do we model and learn dynamics?
Gaussian Process Dynamical Models with: David Fleet, Jack Wang
Dynamical models
[Figure: latent state transition x_t → x_{t+1}]
Dynamical models:
• Hidden Markov Model (HMM)
• Linear Dynamical Systems (LDS) [van Overschee et al. '94; Doretto et al. '01]
• Switching LDS [Ghahramani and Hinton '98; Pavlovic et al. '00; Li et al. '02]
• Nonlinear Dynamical Systems [e.g., Ghahramani and Roweis '00]
Gaussian Process Dynamical Model (GPDM)
Latent dynamical model:
x_t = f(x_{t−1}; A) + n_{x,t}   (latent dynamics)
y_t = g(x_t; B) + n_{y,t}   (pose reconstruction)
Assume IID Gaussian noise, with Gaussian priors on A and B. Marginalize out A and B, then optimize the latent positions to simultaneously minimize pose reconstruction error and (dynamic) prediction error on the training data.
Dynamics
The latent dynamic process on X has a similar form: after marginalizing out the mapping weights, p(X | ᾱ) depends on X through a kernel matrix K_X, defined by a kernel function k_X(x, x') with hyperparameters ᾱ.
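A sketch of the resulting dynamics term, −ln p(X | ᾱ) up to constants, with K_X built from x_1, …, x_{N−1} and targets x_2, …, x_N. The RBF-plus-linear kernel and fixed hyperparameters are assumptions for the example.

```python
# A minimal sketch of the GPDM dynamics term (numpy only; fixed hyperparameters).
import numpy as np

def dynamics_kernel(Xin, gamma=1.0, a_lin=0.1, noise=1e-3):
    # k_X(x, x') = exp(-gamma/2 ||x - x'||^2) + a_lin * x^T x' + noise * delta
    sq = np.sum(Xin**2, 1)[:, None] + np.sum(Xin**2, 1)[None, :] - 2 * Xin @ Xin.T
    return np.exp(-0.5 * gamma * sq) + a_lin * Xin @ Xin.T + noise * np.eye(len(Xin))

def neg_log_dynamics(X, gamma=1.0):
    # -ln p(X | alpha) up to constants: (d/2) ln|K_X| + (1/2) tr(K_X^{-1} Xout Xout^T),
    # with K_X built from x_1..x_{N-1} and Xout = [x_2, ..., x_N].
    Xin, Xout = X[:-1], X[1:]
    K = dynamics_kernel(Xin, gamma)
    L = np.linalg.cholesky(K)
    d = X.shape[1]
    log_det = 2 * np.sum(np.log(np.diag(L)))
    return 0.5 * d * log_det + 0.5 * np.sum(Xout * np.linalg.solve(K, Xout))

# Toy usage on a short 2-D latent trajectory.
X = np.cumsum(0.1 * np.random.default_rng(0).standard_normal((50, 2)), axis=0)
value = neg_log_dynamics(X)
```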
Markov Property
Subspace dynamical model: x_t = f(x_{t−1}; A) + n_{x,t}. Remark: conditioned on A, the dynamical model is 1st-order Markov, but marginalizing over A introduces longer temporal dependence.
Learning
GPDM posterior: p(X, ᾱ, β̄ | Y) ∝ p(Y | X, β̄) p(X | ᾱ) p(ᾱ) p(β̄)
(Y: training motions; X: latent trajectories; ᾱ, β̄: hyperparameters; p(Y | X, β̄): reconstruction likelihood; p(X | ᾱ): dynamics likelihood; p(ᾱ), p(β̄): priors)
To estimate the latent coordinates and kernel parameters, we minimize the negative log posterior with respect to X, ᾱ, and β̄.
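Putting the two terms together, a sketch of joint learning: minimize the reconstruction plus dynamics negative log-likelihoods over the latent trajectory. Kernel hyperparameters are held fixed and gradients are numerical here, both simplifications of the full GPDM optimization.

```python
# A minimal joint-learning sketch for a GPDM-style objective (assumptions as noted above).
import numpy as np
from scipy.optimize import minimize

def rbf_gram(X, gamma, noise=1e-2):
    # Square RBF Gram matrix over the rows of X, with noise on the diagonal.
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return np.exp(-0.5 * gamma * sq) + noise * np.eye(len(X))

def gp_nll(K, T):
    # (cols(T)/2) ln|K| + (1/2) tr(K^{-1} T T^T), up to constants.
    L = np.linalg.cholesky(K)
    return T.shape[1] * np.sum(np.log(np.diag(L))) + 0.5 * np.sum(T * np.linalg.solve(K, T))

def neg_log_posterior(x_flat, Y, d, gy=1.0, gx=1.0):
    X = x_flat.reshape(len(Y), d)
    recon = gp_nll(rbf_gram(X, gy), Y)         # reconstruction likelihood term
    dyn = gp_nll(rbf_gram(X[:-1], gx), X[1:])  # dynamics likelihood term
    return recon + dyn

# Toy usage: learn a 2-D latent trajectory for 30 frames of 6-D "poses".
rng = np.random.default_rng(0)
Y = rng.standard_normal((30, 6))
res = minimize(neg_log_posterior, 0.1 * rng.standard_normal(30 * 2),
               args=(Y, 2), method="L-BFGS-B")
X_learned = res.x.reshape(30, 2)
```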
Motion Capture Data
• ~2.5 gait cycles (157 frames) from the CMU motion capture database
• 56 joint angles + 3 global translational velocity + 3 global orientation DOFs
• Learned latent coordinates (1st-order prediction, RBF kernel)
3D GPLVM Latent Coordinates
Large "jumps" in latent space
Reconstruction Variance
Volume visualization of the reconstruction variance σ²(x) (1st-order prediction, RBF kernel).
Motion Simulation
Random trajectories from MCMC starting from an initial state (~1 gait cycle, 60 steps); animation of mean motion (200-step sequence)
Simulation: 1st-Order Mean Prediction
Red: 200 steps of mean prediction; green: 60-step MCMC mean. [Animation]
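A sketch of 1st-order mean prediction: fit a GP regression from x_{t−1} to x_t on a learned latent trajectory, then roll the posterior mean forward. The RBF kernel, fixed hyperparameters, and the toy circular trajectory (standing in for a learned gait cycle) are assumptions for the example.

```python
# A minimal mean-prediction rollout with learned GP dynamics (numpy only).
import numpy as np

def rbf(A, B, gamma=1.0):
    # Pairwise RBF kernel between rows of A and B.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * gamma * sq)

def rollout(X, n_steps, gamma=1.0, noise=1e-4):
    # X: learned latent trajectory (N x d). GP regression from x_{t-1} to x_t,
    # then repeatedly apply the posterior mean starting from the last frame.
    X_in, X_out = X[:-1], X[1:]
    K = rbf(X_in, X_in, gamma) + noise * np.eye(len(X_in))
    alpha = np.linalg.solve(K, X_out)        # precompute K^{-1} X_out
    x = X[-1:]
    path = []
    for _ in range(n_steps):
        x = rbf(x, X_in, gamma) @ alpha      # posterior mean prediction
        path.append(x[0])
    return np.array(path)

# Toy usage: a noisy circle standing in for a learned gait cycle.
t = np.linspace(0, 4 * np.pi, 157)
X = np.c_[np.cos(t), np.sin(t)] + 0.01 * np.random.default_rng(0).standard_normal((157, 2))
mean_path = rollout(X, 200)
```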
Missing Data
50 of 147 frames dropped (almost a full gait cycle); baseline: spline interpolation
Missing Data: RBF Dynamics
Determining hyperparameters
Data: six distinct walkers
[Comparison: GPDM, Neil's parameters, MCEM]
Where do we go from here?
Let's look at some limitations of the model.
[Comparison: 60 Hz vs. 120 Hz data]
What do we want?
Phase variation
[Figure: a walk cycle in the latent space (x_1, x_2)]
Branching motions (walk vs. run)
Stylistic variation
Current work: manifold GPs
[Figure: latent space (x) mapped to data space (y)]
Summary
• GPLVM and GPDM provide priors from small data sets
• Dependence on initialization, hyperpriors, latent dimensionality
• Open problems: modeling data topology and stylistic variation