Journal of Information Science and Engineering, Vol. 26, No. 2 (March 2010)
Chih-Chang Yu (余執彰); Ying-Nong Chen (陳映濃); Hsu-Yung Cheng (鄭旭詠);
Jenq-Neng Hwang (黃正能); Kuo-Chin Fan (范國清)
Advisor: Prof. 陳炳彰
Presenter: 李冠毅
M99F0206

Unlike many other real-world moving objects, humans have
arbitrary shapes and can perform many complicated motions,
which makes the task more challenging.

An automated 2D human body decomposition system is very
useful in this type of application, as well as in video
surveillance, video understanding and analysis, 3D human
model reconstruction, and medical rehabilitation.

To describe human behavior or motion in detail, body part
extraction becomes one of the most critical steps. A human
body can usually be approximated by several parts, such as
the head, torso, and limbs, together with joint points such
as the shoulders, elbows, hips, and knees.

Some researchers focus only on extracting body parts from a
single image, while others integrate multi-frame information
to obtain human body parts. Both approaches share a basic
assumption: in the walking motion, the human torso moves
less than the limbs.

However, in some human actions such as falling, jumping, or
sitting down, the orientation of the torso is not fixed. That is,
using the silhouette or centroid to align the silhouette may
not be adequate.

In this paper, we developed an innovative system for
automated human body tracking and modeling based on a
monocular camera. The head is extracted by analyzing
negative minimum curvature (NMC) points on a
parameterized silhouette and is tracked using a Kalman
filter.

We proposed a torso extraction mechanism that integrates
multi-frame information with the output of the Poisson
equation. Once the position of the torso is known, the
remaining joints can be obtained by using the proposed
connectivity energy and by analyzing the contour extremities.

Fig. 2 shows the overall framework of our automatic human
body parts modeling/recognition system, which consists of
moving object detection, head and torso acquisition,
hand/foot classification for limb modeling, and behavior
recognition using a discrete Hidden Markov Model.
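As a rough illustration of the last stage, a discrete HMM can score a sequence of quantized pose symbols with the forward algorithm, and the action whose model gives the highest likelihood is chosen. The sketch below is an assumption about how such a classifier is typically wired up, not the authors' implementation; the function names, the model layout (start, transition, emission) and the action labels are hypothetical.

```python
def sequence_likelihood(obs, start, trans, emit):
    """Forward algorithm: P(obs | model) for a discrete HMM.

    obs   -- list of observation symbol indices
    start -- start[s]    = P(first state is s)
    trans -- trans[p][s] = P(next state is s | current state is p)
    emit  -- emit[s][k]  = P(symbol k | state s)
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for symbol in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][symbol]
                 for s in range(n)]
    return sum(alpha)


def recognize(obs, models):
    """Return the label of the model that explains the sequence best."""
    return max(models, key=lambda name: sequence_likelihood(obs, *models[name]))


# Hypothetical one-state models, only to show the calling convention.
models = {
    "walking": ([1.0], [[1.0]], [[0.9, 0.1]]),
    "falling": ([1.0], [[1.0]], [[0.1, 0.9]]),
}
label = recognize([0, 0, 1], models)
```

For long sequences the raw product underflows, so a practical version would rescale alpha at each step or work with log probabilities.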

The 2D human body model used in our work is composed of
10 body parts, 6 joint points, and 5 endpoints, as illustrated
in Fig. 3. In our experiments, only one shoulder point and
one hip point are detected because, in most side-view videos,
the two shoulders and the two hips each appear to
degenerate into a single point.

Traditionally, a video object (VO) can be extracted by using a
background subtraction technique. However, in real-world
applications, video objects are usually accompanied by
shadows. In this paper, we employ a previously proposed
robust background subtraction and shadow removal
algorithm.
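For orientation, the simplest form of background subtraction is a thresholded per-pixel difference against a background image. The sketch below shows only that baseline; it is not the robust algorithm with shadow removal that the slides refer to, and the function name and threshold value are assumptions.

```python
def foreground_mask(frame, background, threshold=30):
    """Mark a pixel as foreground (1) when it differs enough from the background.

    frame, background -- 2D lists of grayscale intensities with the same shape
    threshold         -- assumed intensity difference; tune per camera and scene
    """
    return [[1 if abs(f - b) > threshold else 0 for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


background = [[100, 100, 100],
              [100, 100, 100]]
frame = [[100, 180, 100],
         [100, 175, 110]]
mask = foreground_mask(frame, background)
```

A plain difference like this labels cast shadows as foreground, which is the weakness a shadow removal step is meant to address.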

Of all human body parts, the head is the easiest one to
locate and segment. In this paper, we use a relaxed version
of the neck rule for head acquisition, which requires only
one contour point on the inscribed circle to be an NMC point.

After VO extraction, the human silhouette is obtained, and
the NMC points can be found by computing the second-order
partial derivatives along the extracted contour boundary.
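The slides do not spell out the NMC computation, so the following is a minimal sketch under stated assumptions: the contour is a closed list of (x, y) points in counter-clockwise order, derivatives are approximated with central differences, and an NMC point is taken to be a point whose signed curvature is negative and a local minimum. The function names are hypothetical.

```python
import math


def signed_curvature(contour):
    """Signed curvature at each point of a closed contour of (x, y) tuples."""
    n = len(contour)
    kappa = []
    for i in range(n):
        xp, yp = contour[i - 1]
        x, y = contour[i]
        xn, yn = contour[(i + 1) % n]
        dx, dy = (xn - xp) / 2.0, (yn - yp) / 2.0      # first derivatives
        ddx, ddy = xn - 2 * x + xp, yn - 2 * y + yp    # second derivatives
        speed = math.hypot(dx, dy)
        kappa.append((dx * ddy - dy * ddx) / speed ** 3 if speed > 0 else 0.0)
    return kappa


def nmc_points(contour):
    """Indices with negative curvature that is minimal among its two neighbours."""
    kappa = signed_curvature(contour)
    n = len(kappa)
    return [i for i in range(n)
            if kappa[i] < 0
            and kappa[i] <= kappa[i - 1]
            and kappa[i] <= kappa[(i + 1) % n]]


# Square outline with one concave notch at index 6, counter-clockwise.
contour = [(0, 0), (2, 0), (4, 0), (4, 2), (4, 4),
           (3, 4), (2, 2), (1, 4), (0, 4), (0, 2)]
notches = nmc_points(contour)
```

With image coordinates (y pointing down) or a clockwise contour the sign of the curvature flips, so the orientation convention has to be fixed before thresholding.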

The first case is that the entire head region is successfully
extracted. The second case is that the head region is not
extracted because none of the NMC points can generate a cut
that separates the head from the torso. The third case is that
the head region is divided into two or more sub-regions. To
solve the above problems, we adopt tracking via a Kalman
filter in our work.

To initialize the Kalman filter, we need a well-segmented
head region as the Kalman observation. However, we cannot
assume that a good head region is always available in the
first frame of the sequence.

Therefore, we do not employ the Kalman filter until a
well-segmented head region occurs. Once the head region is
detected in frame i, it is tracked by two Kalman filters. One
tracks the head region from frame i to the last frame, and
the other tracks it from frame i back to the first frame.
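The two-filter scheme can be sketched as follows. This is a simplified illustration, assuming a scalar random-walk Kalman filter applied to one head coordinate and `None` for frames without a usable head detection; the state model, the noise values `q` and `r`, and the function names are assumptions, since the slides do not give them.

```python
def kalman_track(observations, q=1.0, r=4.0):
    """Scalar random-walk Kalman filter.

    observations[0] must be a valid measurement (the well-segmented frame);
    later entries may be None when no head region was extracted.
    q, r are assumed process and measurement noise variances.
    """
    x, p = observations[0], r
    estimates = [x]
    for z in observations[1:]:
        p = p + q                  # predict: position kept, uncertainty grows
        if z is not None:          # update only when a measurement exists
            k = p / (p + r)
            x = x + k * (z - x)
            p = (1.0 - k) * p
        estimates.append(x)
    return estimates


def track_both_ways(observations, i):
    """Run one filter from frame i forward and one from frame i backward."""
    forward = kalman_track(observations[i:])        # frames i .. last
    backward = kalman_track(observations[i::-1])    # frames i, i-1, .. 0
    return backward[::-1][:-1] + forward            # frame i counted once


head_x = [None, None, 10.0, 11.0, None, 13.0]       # first good head at frame 2
track = track_both_ways(head_x, 2)
```

Running the second filter backward is what lets the frames before frame i receive an estimate even though the filter could not be initialized there.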

To extract the torso accurately, we propose a torso
estimation method that estimates both the orientation and
the size of the torso.

Since the torso region can only lie near the head region, we
consider only the pixels inside a specified circle centered at
the neck point with radius r.
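The restriction to a circle around the neck can be sketched as below, taking r as the Euclidean distance from the neck point to the VO centroid, which is how the slides define it. The pixel-list representation and the function name are assumptions made for illustration.

```python
import math


def reduce_vo(vo_pixels, neck):
    """Keep only the VO pixels within distance r of the neck point.

    vo_pixels -- list of (x, y) foreground pixel coordinates
    neck      -- (x, y) neck point
    Returns (r, reduced_pixels), with r = distance from neck to VO centroid.
    """
    cx = sum(x for x, _ in vo_pixels) / len(vo_pixels)
    cy = sum(y for _, y in vo_pixels) / len(vo_pixels)
    r = math.dist(neck, (cx, cy))
    reduced = [p for p in vo_pixels if math.dist(p, neck) <= r]
    return r, reduced


# A one-pixel-wide vertical strip standing in for a silhouette.
strip = [(0, y) for y in range(11)]
r, reduced = reduce_vo(strip, neck=(0, 2))
```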

In our work, r is defined as the Euclidean distance from the
neck point to the centroid of the VO. Afterwards, we apply
the Poisson solver to this reduced VO to get the estimated
torso region.
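The slides do not state which Poisson problem is solved. A common formulation in silhouette analysis is ΔU = -1 inside the shape with U = 0 outside, whose solution is largest deep inside thick regions such as the torso. The sketch below assumes that formulation and uses plain Jacobi iteration on a binary mask; the function name and iteration count are hypothetical.

```python
def poisson_interior(mask, iterations=200):
    """Approximate the solution of ΔU = -1 inside a binary mask, U = 0 outside.

    mask -- 2D list of 0/1 values (1 = silhouette pixel)
    Uses Jacobi iteration on the 4-neighbour discrete Laplacian.
    """
    h, w = len(mask), len(mask[0])
    u = [[0.0] * w for _ in range(h)]
    for _ in range(iterations):
        nxt = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                if not mask[y][x]:
                    continue
                total = 0.0
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx]:
                        total += u[ny][nx]
                # (sum of neighbours - 4 U) = -1  =>  U = (sum + 1) / 4
                nxt[y][x] = (total + 1.0) / 4.0
        u = nxt
    return u


square = [[1] * 5 for _ in range(5)]
u = poisson_interior(square)
```

Thresholding U, for example keeping pixels above a fraction of its maximum, would then give a candidate torso region; the threshold is another assumption not taken from the slides.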

The extracted contour extremities are usually the ends of
the limbs. However, we cannot tell directly whether an
extremity is a hand or a foot. Therefore, we propose a
nearest-neighbor tracking mechanism to classify these
extremities.
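A nearest-neighbor labeling step can be sketched as follows: each extremity found in the current frame inherits the label of the closest labeled extremity from the previous frame. This is a minimal reading of the idea; the data layout and the function name are assumptions, and how the labels are initialized in the first frame is not covered.

```python
import math


def classify_extremities(previous, current):
    """Label current extremities by their nearest labeled predecessor.

    previous -- dict mapping a label ("hand", "foot", ...) to its (x, y)
                position in the previous frame
    current  -- list of (x, y) extremities detected in the current frame
    Returns a dict mapping each current point to a label.
    """
    labels = {}
    for point in current:
        labels[point] = min(previous,
                            key=lambda name: math.dist(previous[name], point))
    return labels


previous = {"hand": (10, 5), "foot": (12, 40)}
current = [(11, 6), (13, 38)]
labels = classify_extremities(previous, current)
```

This version can assign the same label to two points; a fuller implementation would resolve such conflicts, for instance by matching closest pairs first.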

Human limbs can be defined as two disjoint sets, each with
one extremity and two joint points, one of them in between:
hand-elbow-shoulder and foot-knee-hip.

Due to human kinematic constraints, the elbow (the black
point) is restricted to move along the red line L⊥, which
intersects L at the midpoint between the shoulder and the
hand.


Therefore, by searching all possible points on L⊥, the elbow
point can be obtained by minimizing the equation below:

Etotal = argmin{E(hand, elbow) + E(elbow, shoulder)}, E ≠ 0

where E is the connectivity energy function. The same
operation is performed on the hip-knee-foot set so that the
complete human body model can be established.
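The search along L⊥ can be sketched as below. The connectivity energy E is not defined in the slides, so it is passed in as a function; the toy energy used in the example (one plus the number of sampled segment points that fall outside the silhouette) is purely hypothetical, as are the function names, and skipping zero-energy candidates is only one possible reading of the E ≠ 0 condition.

```python
import math


def find_elbow(shoulder, hand, energy, reach):
    """Search the perpendicular bisector of shoulder-hand for the elbow.

    energy -- callable energy(a, b) returning the connectivity energy
    reach  -- how many unit steps to search on each side of the midpoint
    """
    mx, my = (shoulder[0] + hand[0]) / 2.0, (shoulder[1] + hand[1]) / 2.0
    dx, dy = hand[0] - shoulder[0], hand[1] - shoulder[1]
    length = math.hypot(dx, dy)
    nx, ny = -dy / length, dx / length            # unit vector along L⊥
    best, best_energy = None, math.inf
    for t in range(-reach, reach + 1):
        candidate = (mx + t * nx, my + t * ny)
        e1, e2 = energy(hand, candidate), energy(candidate, shoulder)
        if e1 == 0 or e2 == 0:                    # assumed meaning of E != 0
            continue
        if e1 + e2 < best_energy:
            best, best_energy = candidate, e1 + e2
    return best


def make_toy_energy(silhouette):
    """Hypothetical energy: 1 + number of segment samples outside the silhouette."""
    def energy(a, b):
        steps = max(1, round(max(abs(b[0] - a[0]), abs(b[1] - a[1]))))
        outside = 0
        for k in range(steps + 1):
            x = a[0] + (b[0] - a[0]) * k / steps
            y = a[1] + (b[1] - a[1]) * k / steps
            if (round(x), round(y)) not in silhouette:
                outside += 1
        return 1 + outside
    return energy


# Bent arm: shoulder (0, 0), elbow (4, 4), hand (8, 0).
arm = {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 3), (6, 2), (7, 1), (8, 0)}
elbow = find_elbow((0, 0), (8, 0), make_toy_energy(arm), reach=6)
```

The same routine applies to the leg by passing the hip and foot in place of the shoulder and hand.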


In our experiments, a set of human actions recorded in both
indoor and outdoor environments is tested. Six different
types of action performed by eight people, including both
males and females, are modeled and analyzed. The six types
of motion are walking, sitting down, standing up, running,
jumping, and falling down.

The videos were taken at three different places and at
different times to ensure the robustness of the whole
system. The video resolutions are 320 × 240 (indoor) and
360 × 240 (outdoor).


In the first experiment, we evaluate the performance of our
human body modeling mechanism.
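
Performance of the proposed head region recognition algorithm

Environment   Total frames   Successfully tracked frames   Success rate (%)
Indoor        3287           3122                          94.98
Outdoor       2061           1795                          87.09

Precision and recall of the quantitative limb modeling results

Motion         Class   Detected   Observed   Correct   Precision   Recall
Walking        Hand    328        339        328       100%        96.8%
               Foot    355        379        355       100%        93.7%
Sitting down   Hand    420        304        272       64.8%       89.5%
               Foot    265        265        265       100%        100%
Standing up    Hand    329        216        215       65.3%       99.5%
               Foot    216        216        216       100%        100%
Falling down   Hand    299        269        205       68.6%       76.2%
               Foot    390        420        380       97.4%       90.5%
Running        Hand    35         47         29        82.9%       61.7%
               Foot    412        432        407       98.8%       94.2%
Jumping        Hand    160        78         68        42.5%       87.2%
               Foot    358        392        352       98.3%       89.8%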
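
The percentages in the two tables follow directly from the counts: success rate = tracked frames / total frames, precision = correct / detected, and recall = correct / observed. The short check below reproduces a few entries; the function names are only for illustration.

```python
def success_rate(tracked, total):
    """Percentage of frames in which the head was tracked successfully."""
    return round(100.0 * tracked / total, 2)


def precision_recall(detected, observed, correct):
    """Precision and recall in percent, rounded to one decimal place."""
    return (round(100.0 * correct / detected, 1),
            round(100.0 * correct / observed, 1))


indoor = success_rate(3122, 3287)
outdoor = success_rate(1795, 2061)
first_hand_row = precision_recall(328, 339, 328)
second_hand_row = precision_recall(420, 304, 272)
```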

In this paper, we propose an automated 2D human body
modeling system. The whole human body is modeled by 10
body parts, 6 joint points, and 5 endpoints, which can be
effectively identified and tracked.

The system has shown promising capability in modeling a
human with a single video camera.

Since our system depends heavily on the silhouette of the
extracted VO, the person must not carry any object, so as
to avoid unsatisfactory silhouette extraction.

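How long does it take from a person entering the frame until
modeling succeeds?

What are the modeling and tracking success rates when
multiple people enter the frame?

How does lighting affect human body modeling?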