Action and Gait Recognition From
Recovered 3-D Human Joints
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—
PART B: CYBERNETICS, VOL. 40, NO. 4, AUGUST 2010
Junxia Gu, Member, IEEE, Xiaoqing Ding, Senior Member, IEEE,
Shengjin Wang, Member, IEEE, and Youshou Wu
Adviser: Ming-Yuan Shieh
Student: Shun-Te Chuang
PPT preparation: 100%
Outline

ABSTRACT
INTRODUCTION
PREVIOUS WORK
FULL-BODY TRACKING METHOD FOR POSE RECOVERY
CLASSIFICATION
EXPERIMENTAL RESULTS AND ANALYSIS
CONCLUSION
Abstract

A common viewpoint-free framework that fuses pose
recovery and classification for action and gait
recognition is presented in this paper.

First, a markerless pose recovery method is adopted
to automatically capture the 3-D human joint and
pose parameter sequences from volume data.

Second, multiple configuration features (combination
of joints) and movement features (position,
orientation, and height of the body) are extracted
from the recovered 3-D human joint and pose
parameter sequences.
Abstract

A hidden Markov model (HMM) and an exemplar-based HMM are then used to model the movement features and configuration features, respectively.

Finally, actions are classified by a hierarchical
classifier that fuses the movement features and the
configuration features, and persons are recognized
from their gait sequences with the configuration
features.
INTRODUCTION

Video-based study of human motion has received increasing attention in recent decades.

This has been motivated by applications in intelligent video surveillance and human–computer interaction.

With increased awareness in security issues, motion
analysis is becoming increasingly important in
surveillance systems.
INTRODUCTION

Action recognition is a new requirement for understanding
what the person is doing.

Current intelligent surveillance systems are in urgent need
of noninvasive and viewpoint-free research on motion
analysis.

This paper focuses on the movement of main body
segments (arms, legs, and torso). A human gait is extracted
from a “walk” action.
INTRODUCTION

In this paper, a vision-based markerless pose recovery
approach is proposed to extract 3-D human joints.

The human joint sequence is one of the most effective and discriminative representations of human motion.

It contains rich information, including body position, orientation, and joint positions.

The information is categorized into two types:
movement features and configuration features.
INTRODUCTION

The changes of position, orientation, and height of the body,
which describe the global movement of the subject, are
defined as movement features.

The sequences of human joint positions, which describe the
change of relative configuration of body segments, are
defined as configuration features.

A hidden Markov model (HMM) and an exemplar-based
HMM (EHMM) are employed to characterize the
movement and configuration features, respectively.

Both the HMM and the EHMM have been used to recognize actions and gaits.
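As an illustration of this split, the minimal sketch below separates a recovered joint sequence into movement features and configuration features. It assumes a (T, J, 3) joint array with joint 0 as the body root and derives the orientation from frame-to-frame motion; the paper's exact feature definitions may differ.

import numpy as np

def split_features(joints):
    # joints: (T, J, 3) array of recovered 3-D joint positions per frame.
    # Joint indexing and the orientation definition are illustrative assumptions.
    root = joints[:, 0, :]                               # assume joint 0 is the body root
    position = root[:, :2]                               # ground-plane position of the body
    height = joints[:, :, 2].max(axis=1)                 # body height: highest joint per frame
    step = np.diff(root[:, :2], axis=0, prepend=root[:1, :2])
    orientation = np.arctan2(step[:, 1], step[:, 0])     # heading from frame-to-frame motion
    movement = np.column_stack([position, orientation, height])
    configuration = joints - root[:, None, :]            # joints expressed relative to the root
    return movement, configuration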
INTRODUCTION
Fig. 1. Flowchart of the video-based motion recognition.
PREVIOUS WORK
1. Appearance-Based Methods:
Appearance-based approaches are widely used in action and gait representation. They directly represent human motion using image information, such as a silhouette, an edge, and an optical flow.

2. Human Model-Based Methods:
Human model-based approaches represent an action or a gait with body segments, joint positions, or pose parameters.
PREVIOUS WORK

One previous approach combined stochastic search with a gradient descent for local pose refinement to recover complex whole-body motion.

The initialization of the model was automatic, with the subject starting upright with arms and legs spread in the “Da Vinci” pose.

Tracking took under 1 s per frame. In this paper, an adaptive particle filter method is proposed for pose recovery.
PREVIOUS WORK

First, the whole body of a subject in each frame is
segmented into several body segments.

A particle filter with an adaptive particle number is
then used to track each body segment.

This method decomposes the search space and reduces
the computational complexity.
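A minimal sketch of this per-segment tracking, assuming hypothetical propagate() and likelihood() functions for the motion model and the fit to the volume data; the rule that adapts the particle count is a simple stand-in based on the effective sample size, not the paper's actual scheme.

import numpy as np

def track_segment(propagate, likelihood, init_pose, n_frames, n_min=100, n_max=1000):
    # Track one body segment with a particle filter whose particle count adapts
    # to the spread of the weights (effective sample size).
    rng = np.random.default_rng(0)
    particles = np.tile(init_pose, (n_max, 1))           # start with the maximum count
    track = []
    for _ in range(n_frames):
        particles = propagate(particles)                 # predict with the motion model
        w = likelihood(particles)                        # score each pose against the volume data
        w = w / w.sum()
        track.append(w @ particles)                      # weighted-mean pose estimate
        ess = 1.0 / np.sum(w ** 2)                       # effective sample size
        n = int(np.clip(n_max * ess / len(w), n_min, n_max))    # crude count adaptation
        particles = particles[rng.choice(len(w), size=n, p=w)]  # resample to the new count
    return np.array(track)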
FULL-BODY TRACKING METHOD FOR POSE RECOVERY

Human Model

Human Model-Based Full-Body Tracking
Human Model
Human Model-Based Full-Body Tracking
CLASSIFICATION

A. HMM and EHMM Learning

B. Classifier for Gait Recognition

C. Classifier for Action Recognition
A. HMM and EHMM Learning
The EHMM is different from the HMM in the definition
of observation densities. For the HMM, the general
representation of observation densities is the Gaussian
mixture model (GMM) of the following form:
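The slide's equation did not survive extraction; the standard GMM observation density for state j is (the paper's exact symbols may differ):

b_j(o_t) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(o_t;\, \mu_{jm}, \Sigma_{jm}), \qquad \sum_{m=1}^{M} c_{jm} = 1,

where c_{jm}, \mu_{jm}, and \Sigma_{jm} are the weight, mean, and covariance of the m-th mixture component of state j.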
In the EHMM, the definition of the observation probability
is as follows:
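This equation is likewise missing; a common exemplar-based formulation, which may differ in detail from the paper's, ties each state j to an exemplar e_j and scores an observation by its distance to that exemplar:

b_j(o_t) \propto \exp\!\big( -D(o_t, e_j)^2 / (2\sigma^2) \big),

where D(·, ·) is a distance between an observation and an exemplar and σ² controls its sensitivity.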
B. Classifier for Gait Recognition
For each person c ∈ {1, . . . , C_p} in the database, we learn an EHMM gait model λ^(c)_Sgait with features Sgait and an EHMM gait model λ^(c)_Lgait with features Lgait. A testing gait sequence Y = {y_0, . . . , y_K} is then classified by MAP estimation over these two models, as sketched below.
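A plausible reconstruction of this decision rule (the slide's equation did not survive extraction), assuming the two per-person models are fused by multiplying their likelihoods:

\hat{c} = \arg\max_{c \in \{1,\dots,C_p\}} P\big(Y_{Sgait} \mid \lambda^{(c)}_{Sgait}\big)\, P\big(Y_{Lgait} \mid \lambda^{(c)}_{Lgait}\big)

where Y_{Sgait} and Y_{Lgait} denote the Sgait and Lgait feature sequences extracted from Y. The exact fusion (product of likelihoods versus a weighted sum of log-likelihoods) is an assumption.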
C. Classifier for Action Recognition
A testing action sequence Y = {y_0, . . . , y_K}, with features S_Y, R_Y, P_Y, O_Y, and H_Y, is classified with a two-layer classifier that fuses multiple features. The first layer is a weighted-MAP classifier that fuses the three movement features and the configuration feature of the whole body. If the decision c_1 of the first layer belongs to the single-arm actions, sequence Y is further recognized by a second MAP classifier using the arm features (a sketch of both decision rules follows).
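A sketch of the two decision rules (the slide's equations did not survive extraction). It assumes P_Y, O_Y, and H_Y are the movement features (position, orientation, height), S_Y the whole-body configuration feature, and R_Y the arm configuration feature; this symbol-to-feature mapping and the weights w are assumptions.

First layer, a weighted MAP over all action classes c ∈ {1, . . . , C_a}:

c_1 = \arg\max_{c} \big[\, w_S \log P(S_Y \mid \lambda^{(c)}_S) + w_P \log P(P_Y \mid \lambda^{(c)}_P) + w_O \log P(O_Y \mid \lambda^{(c)}_O) + w_H \log P(H_Y \mid \lambda^{(c)}_H) \,\big]

Second layer, applied only if c_1 is a single-arm action:

c_2 = \arg\max_{c \,\in\, \text{single-arm actions}} P(R_Y \mid \lambda^{(c)}_R)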
EXPERIMENTAL RESULTS AND ANALYSIS

There are 11 actions, and each subject performs each action three times. The samples contain approximately 29,100 frames in total.

These actions include “check watch,” “cross arm,” “scratch head,” “sit down,” “get up,” “turn around,” “walk in a circle,” “wave hand,” “punch,” “kick,” and “pick up.” To demonstrate view invariance, the subjects freely change their orientations. The acquisition is achieved using five standard FireWire cameras.

The image resolution is 390 × 291 pixels, and the volume
of interest is divided into 64 × 64 × 64 voxels.
Fig. 8. Samples of images and 3-D volume data.
Fig. 9. Annotation of the human joint. (a) Annotation of the knee joint. (b) Results of annotation.
Fig. 10. Average error of the joint positions.
Fig. 11. Average error of individual joint positions.
Fig. 12. Results of the pose recovery.
Fig. 13. Selected exemplars and recognition rate versus the number of exemplars.
Fig. 14. Comparison of convergence performance between the EHMM and the HMM.
Fig. 15. Average recognition rates of actions.
CONCLUSION

The main contribution of this paper is the fusion of pose recovery and
motion recognition.

Future work includes automatically segmenting temporal sequences, reducing computational complexity, analyzing more complex actions, and recognizing 2-D actions based on the 3-D EHMM.

The free-viewpoint 3-D human joint sequence contains a significant
amount of information for motion analysis.

In addition to representing the single actions used in this paper, it can
be used for more applications, such as analysis of complex actions.
CONCLUSION

The high number of DOFs and the large number of 3-D points make the human model-based pose recovery method very time consuming.

To address this problem, parallel computing, code optimization, and a GPU can be used to reduce the time cost. At present, it is difficult to obtain robust volume data of subjects in surveillance and content-analysis scenarios.

Actions and gaits are affected by various factors, including clothing, age, and gender. In the future, performance under these factors will be analyzed on larger databases.