Transcript slides - Machine Learning
Learning Human Pose and Motion Models for Animation
Aaron Hertzmann University of Toronto
Animation is maturing … … but it’s still hard to create
Keyframe animation
[Figure: keyframe poses q_1, q_2, q_3 and the interpolated trajectory q(t)]
http://www.cadtutor.net/dd/bryce/anim/anim.html
Characters are very complex
Woody:
- 200 facial controls
- 700 controls in his body
http://www.pbs.org/wgbh/nova/specialfx2/mcqueen.html
Motion capture
[Images from NYU and UW]
Motion capture
Mocap is not a panacea
Goal: model human motion
What motions are likely?
Applications: • Computer animation • Computer vision
Related work: physical models
• Accurate, in principle
• Too complex to work with (but see [Liu, Hertzmann, Popović 2005])
• Computationally expensive
Related work: motion graphs
Input: raw motion capture “Motion graph” (slide from J. Lee)
Approach: statistical models of motions
Learn a PDF over motions, and synthesize from this PDF [Brand and Hertzmann 1999].
What PDF do we use?
Style-Based Inverse Kinematics
with: Keith Grochow, Steve Martin, Zoran Popović
Motivation
Body parameterization
Pose at time t: q_t = root position/orientation (6 DOFs) + joint angles (29 DOFs)
Motion: X = [q_1, …, q_T]
Forward kinematics
Pose to 3D positions: q_t → FK → [x_i, y_i, z_i]_t
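To make the q_t → [x_i, y_i, z_i]_t mapping concrete, here is a minimal forward-kinematics sketch for a planar joint chain. It is an illustration only (the talk's skeleton has 29 joint angles plus a 6-DOF root); the `forward_kinematics` helper, the link lengths, and the 2-D setup are assumptions for the example.

```python
# A planar FK sketch (hypothetical helper; angles are relative to the parent link).
import numpy as np

def forward_kinematics(angles, lengths, root=(0.0, 0.0)):
    # Accumulate each link's rotation and offset down the chain,
    # mapping joint angles q_t to joint positions [x_i, y_i]_t.
    positions = [np.asarray(root, dtype=float)]
    heading = 0.0
    for theta, length in zip(angles, lengths):
        heading += theta  # relative joint angle
        offset = length * np.array([np.cos(heading), np.sin(heading)])
        positions.append(positions[-1] + offset)
    return np.array(positions)

# Toy usage: a 3-link chain at one time step.
joints = forward_kinematics(angles=[0.3, -0.2, 0.5], lengths=[1.0, 0.8, 0.5])
```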
Problem Statement
Generate a character pose based on a chosen style, subject to constraints.
[Figure: degrees of freedom (DOFs) q and constraints]
Approach
[Diagram: off-line learning (Motion → Learning → Style) feeding real-time pose synthesis (Style + Constraints → Synthesis → Pose)]
Features
y(q) = [q, orientation(q), velocity(q)] = [q_0, q_1, q_2, …, r_0, r_1, r_2, …, v_0, v_1, v_2, …]
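A small sketch of how such a feature vector might be assembled. The `features` helper, the finite-difference velocity, and the 120 Hz frame rate are assumptions for the example, not the paper's exact construction.

```python
# Assembling y(q) = [q, orientation(q), velocity(q)] (hypothetical helper).
import numpy as np

def features(q_t, q_prev, root_orientation, dt=1.0 / 120.0):
    velocity = (q_t - q_prev) / dt  # velocity(q) as a finite difference (assumption)
    return np.concatenate([q_t, root_orientation, velocity])

# Toy usage with a 29-DOF pose and a 3-D root orientation.
rng = np.random.default_rng(0)
q_prev, q_t = rng.standard_normal(29), rng.standard_normal(29)
y = features(q_t, q_prev, root_orientation=np.zeros(3))
```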
Goals for the PDF
• Learn PDF from any data
• Smooth and descriptive
• Minimal parameter tuning
• Real-time synthesis
Mixtures-of-Gaussians
GPLVM
Gaussian Process Latent Variable Model [Lawrence 2004]
[Figure: latent space (x_1, x_2) mapped by a GP to feature space (y_1, y_2, y_3)]
x ~ N(0, I), y ~ GP(x; θ)
Learning: arg max p(X, θ | Y) = arg max p(Y | X, θ) p(X)
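A minimal sketch of GPLVM-style learning: optimize the latent coordinates X to maximize p(Y | X) p(X) under an RBF-kernel Gaussian process. The kernel parameters are held fixed and gradients are numerical here for brevity; both are simplifications of the actual model.

```python
# A minimal GPLVM sketch (assumptions: fixed RBF kernel, numerical gradients via scipy).
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X, gamma=1.0, noise=1e-2):
    # k(x, x') = exp(-gamma/2 * ||x - x'||^2), plus noise on the diagonal
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return np.exp(-0.5 * gamma * sq) + noise * np.eye(len(X))

def neg_log_posterior(x_flat, Y, d, gamma=1.0):
    # -log p(Y | X) - log p(X), up to constants
    N, D = Y.shape
    X = x_flat.reshape(N, d)
    K = rbf_kernel(X, gamma)
    L = np.linalg.cholesky(K)
    log_det = 2 * np.sum(np.log(np.diag(L)))
    data_term = 0.5 * D * log_det + 0.5 * np.sum(Y * np.linalg.solve(K, Y))
    prior_term = 0.5 * np.sum(X**2)  # x ~ N(0, I)
    return data_term + prior_term

# Toy usage: Y is N poses x D features; learn a 2-D latent embedding.
rng = np.random.default_rng(0)
Y = rng.standard_normal((20, 5))
X0 = 0.1 * rng.standard_normal(20 * 2)
res = minimize(neg_log_posterior, X0, args=(Y, 2), method="L-BFGS-B")
X_learned = res.x.reshape(20, 2)
```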
Scaled Outputs
Different DOFs have different "importances". Solution: scale the RBF kernel function k(x, x') per DOF: k_i(x, x') = k(x, x') / w_i². Equivalently: learn x → Wy, where W = diag(w_1, w_2, …, w_D).
Precision in Latent Space
[Figure: predictive variance σ²(x) visualized over the latent space (x_1, x_2)]
SGPLVM Objective Function
[Figure: latent point x mapped by f(x; θ) to feature space (y_1, y_2, y_3), with scaling W and hyperparameters θ]
L_IK(x, y; θ) = ||W(y − f(x; θ))||² / (2σ²(x; θ)) + (D/2) ln σ²(x; θ)
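A sketch of evaluating an objective of this form, assuming the GP posterior mean f(x) and variance σ²(x) are supplied as callables; the toy f, σ², and W below are placeholders, not learned quantities.

```python
# Evaluating an SGPLVM-style IK objective (hypothetical inputs; numpy only).
import numpy as np

def L_ik(x, y, f, sigma2, W):
    # ||W (y - f(x))||^2 / (2 sigma^2(x)) + (D/2) ln sigma^2(x)
    D = len(y)
    r = W @ (y - f(x))
    s2 = sigma2(x)
    return r @ r / (2.0 * s2) + 0.5 * D * np.log(s2)

# Toy usage with a 1-D latent space and a 3-D feature space.
f = lambda x: np.array([np.sin(x[0]), np.cos(x[0]), x[0]])
sigma2 = lambda x: 0.1 + 0.05 * x[0] ** 2   # variance grows away from the data
W = np.diag([1.0, 1.0, 0.5])                # per-DOF importance weights
value = L_ik(np.array([0.3]), np.array([0.2, 0.9, 0.4]), f, sigma2, W)
```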
Baseball Pitch
Track Start
Jump Shot
Style interpolation
Given two styles θ_1 and θ_2, can we "interpolate" them?
p_1(y) ∝ exp(−L_IK(y; θ_1))
p_2(y) ∝ exp(−L_IK(y; θ_2))
Approach: interpolate in log-domain
Style interpolation
Interpolated density: exp(−L_IK(y; θ_1))^(1−s) · exp(−L_IK(y; θ_2))^s = p_1(y)^(1−s) p_2(y)^s
Style interpolation in log space
exp(−L_IK(y; θ_1))^(1−s) · exp(−L_IK(y; θ_2))^s = exp(−((1−s) L_IK(y; θ_1) + s L_IK(y; θ_2)))
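A sketch of the idea: blend two style objectives linearly in the log-domain and minimize the result. The quadratic L1 and L2 below are stand-ins for two learned L_IK objectives, not the actual models.

```python
# Log-domain style interpolation sketch (hypothetical objectives; scipy for the solve).
import numpy as np
from scipy.optimize import minimize

def interpolated_objective(y, L1, L2, s):
    # (1 - s) * L_IK(y; theta_1) + s * L_IK(y; theta_2),
    # i.e. the negative log of p_1(y)^(1-s) * p_2(y)^s.
    return (1.0 - s) * L1(y) + s * L2(y)

# Toy usage: two quadratic "styles" with different preferred poses.
L1 = lambda y: 0.5 * np.sum((y - 1.0) ** 2)
L2 = lambda y: 0.5 * np.sum((y + 1.0) ** 2)
y0 = np.zeros(3)
y_blend = minimize(lambda y: interpolated_objective(y, L1, L2, 0.25), y0).x
```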
Interactive Posing
Interactive Posing
Interactive Posing
Multiple motion styles
Realtime Motion Capture
Style Interpolation
Trajectory Keyframing
Posing from an Image
Modeling motion
• GPLVM doesn't model motions; velocity features are a hack
• How do we model and learn dynamics?
Gaussian Process Dynamical Models with: David Fleet, Jack Wang
Dynamical models
[Figure: latent state transition x_t → x_{t+1}]
Dynamical models:
• Hidden Markov Model (HMM)
• Linear Dynamical Systems (LDS) [van Overschee et al. '94; Doretto et al. '01]
• Switching LDS [Ghahramani and Hinton '98; Pavlovic et al. '00; Li et al. '02]
• Nonlinear Dynamical Systems [e.g., Ghahramani and Roweis '00]
Gaussian Process Dynamical Model (GPDM)
Latent dynamical model:
x_t = f(x_{t−1}; A) + n_{x,t}   (latent dynamics)
y_t = g(x_t; B) + n_{y,t}   (pose reconstruction)
Assume IID Gaussian noise, with Gaussian priors on A and B. Marginalize out A and B, then optimize the latent positions to simultaneously minimize pose reconstruction error and (dynamic) prediction error on the training data.
Dynamics
The latent dynamic process on X has a similar form: after marginalizing out the mapping weights, p(X | ᾱ) depends on X through a kernel matrix K_X, defined by a kernel function k_X(x, x') with hyperparameters ᾱ.
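A sketch of the resulting dynamics term, −ln p(X | ᾱ) up to constants, with K_X built from x_1, …, x_{N−1} and targets x_2, …, x_N. The RBF-plus-linear kernel and fixed hyperparameters are assumptions for the example.

```python
# A minimal sketch of the GPDM dynamics term (numpy only; fixed hyperparameters).
import numpy as np

def dynamics_kernel(Xin, gamma=1.0, a_lin=0.1, noise=1e-3):
    # k_X(x, x') = exp(-gamma/2 ||x - x'||^2) + a_lin * x^T x' + noise * delta
    sq = np.sum(Xin**2, 1)[:, None] + np.sum(Xin**2, 1)[None, :] - 2 * Xin @ Xin.T
    return np.exp(-0.5 * gamma * sq) + a_lin * Xin @ Xin.T + noise * np.eye(len(Xin))

def neg_log_dynamics(X, gamma=1.0):
    # -ln p(X | alpha) up to constants: (d/2) ln|K_X| + (1/2) tr(K_X^{-1} Xout Xout^T),
    # with K_X built from x_1..x_{N-1} and Xout = [x_2, ..., x_N].
    Xin, Xout = X[:-1], X[1:]
    K = dynamics_kernel(Xin, gamma)
    L = np.linalg.cholesky(K)
    d = X.shape[1]
    log_det = 2 * np.sum(np.log(np.diag(L)))
    return 0.5 * d * log_det + 0.5 * np.sum(Xout * np.linalg.solve(K, Xout))

# Toy usage on a short 2-D latent trajectory.
X = np.cumsum(0.1 * np.random.default_rng(0).standard_normal((50, 2)), axis=0)
value = neg_log_dynamics(X)
```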
Markov Property
Subspace dynamical model: x_t = f(x_{t−1}; A) + n_{x,t}. Remark: conditioned on A, the dynamical model is 1st-order Markov, but marginalizing over A introduces longer temporal dependence.
Learning
GPDM posterior: p(X, ᾱ, β̄ | Y) ∝ p(Y | X, β̄) p(X | ᾱ) p(ᾱ) p(β̄)
(Y: training motions; X: latent trajectories; ᾱ, β̄: hyperparameters; p(Y | X, β̄): reconstruction likelihood; p(X | ᾱ): dynamics likelihood; p(ᾱ), p(β̄): priors)
To estimate the latent coordinates and kernel parameters, we minimize the negative log posterior with respect to X, ᾱ, and β̄.
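Putting the two terms together, a sketch of joint learning: minimize the reconstruction plus dynamics negative log-likelihoods over the latent trajectory. Kernel hyperparameters are held fixed and gradients are numerical here, both simplifications of the full GPDM optimization.

```python
# A minimal joint-learning sketch for a GPDM-style objective (assumptions as noted above).
import numpy as np
from scipy.optimize import minimize

def rbf_gram(X, gamma, noise=1e-2):
    # Square RBF Gram matrix over the rows of X, with noise on the diagonal.
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return np.exp(-0.5 * gamma * sq) + noise * np.eye(len(X))

def gp_nll(K, T):
    # (cols(T)/2) ln|K| + (1/2) tr(K^{-1} T T^T), up to constants.
    L = np.linalg.cholesky(K)
    return T.shape[1] * np.sum(np.log(np.diag(L))) + 0.5 * np.sum(T * np.linalg.solve(K, T))

def neg_log_posterior(x_flat, Y, d, gy=1.0, gx=1.0):
    X = x_flat.reshape(len(Y), d)
    recon = gp_nll(rbf_gram(X, gy), Y)         # reconstruction likelihood term
    dyn = gp_nll(rbf_gram(X[:-1], gx), X[1:])  # dynamics likelihood term
    return recon + dyn

# Toy usage: learn a 2-D latent trajectory for 30 frames of 6-D "poses".
rng = np.random.default_rng(0)
Y = rng.standard_normal((30, 6))
res = minimize(neg_log_posterior, 0.1 * rng.standard_normal(30 * 2),
               args=(Y, 2), method="L-BFGS-B")
X_learned = res.x.reshape(30, 2)
```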
Motion Capture Data
• ~2.5 gait cycles (157 frames) from the CMU motion capture database
• 56 joint angles + 3 global translational velocity + 3 global orientation DOFs
• Learned latent coordinates (1st-order prediction, RBF kernel)
3D GPLVM Latent Coordinates
Large "jumps" in latent space
Reconstruction Variance
Volume visualization of the reconstruction variance σ²(x) (1st-order prediction, RBF kernel).
Motion Simulation
Random trajectories from MCMC starting from an initial state (~1 gait cycle, 60 steps); animation of mean motion (200-step sequence)
Simulation: 1st-Order Mean Prediction
Red: 200 steps of mean prediction; green: 60-step MCMC mean. [Animation]
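A sketch of 1st-order mean prediction: fit a GP regression from x_{t−1} to x_t on a learned latent trajectory, then roll the posterior mean forward. The RBF kernel, fixed hyperparameters, and the toy circular trajectory (standing in for a learned gait cycle) are assumptions for the example.

```python
# A minimal mean-prediction rollout with learned GP dynamics (numpy only).
import numpy as np

def rbf(A, B, gamma=1.0):
    # Pairwise RBF kernel between rows of A and B.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * gamma * sq)

def rollout(X, n_steps, gamma=1.0, noise=1e-4):
    # X: learned latent trajectory (N x d). GP regression from x_{t-1} to x_t,
    # then repeatedly apply the posterior mean starting from the last frame.
    X_in, X_out = X[:-1], X[1:]
    K = rbf(X_in, X_in, gamma) + noise * np.eye(len(X_in))
    alpha = np.linalg.solve(K, X_out)        # precompute K^{-1} X_out
    x = X[-1:]
    path = []
    for _ in range(n_steps):
        x = rbf(x, X_in, gamma) @ alpha      # posterior mean prediction
        path.append(x[0])
    return np.array(path)

# Toy usage: a noisy circle standing in for a learned gait cycle.
t = np.linspace(0, 4 * np.pi, 157)
X = np.c_[np.cos(t), np.sin(t)] + 0.01 * np.random.default_rng(0).standard_normal((157, 2))
mean_path = rollout(X, 200)
```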
Missing Data
50 of 147 frames dropped (almost a full gait cycle); baseline: spline interpolation
Missing Data: RBF Dynamics
Determining hyperparameters
Data: six distinct walkers
[Comparison: GPDM, Neil's parameters, MCEM]
Where do we go from here?
Let's look at some limitations of the model.
[Comparison: 60 Hz vs. 120 Hz data]
What do we want?
Phase variation
[Figure: a walk cycle in the latent space (x_1, x_2)]
Branching motions (walk vs. run)
Stylistic variation
Current work: manifold GPs
[Figure: latent space (x) mapped to data space (y)]
Summary
• GPLVM and GPDM provide priors from small data sets
• Dependence on initialization, hyperpriors, latent dimensionality
• Open problems: modeling data topology and stylistic variation