Broadcast Court-Net Sports Video Analysis Using Fast 3-D Camera Modeling
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 18, NO. 11, NOVEMBER 2008
Adviser: Ten-Chuan Hsiao
Date: 2010/06/08
Speaker: Chin He Hsu
1
Outline
I. Introduction
II. Overview of the proposed sports-video analysis system
III. 3-D camera calibration
IV. Pixel- and object-level analysis
V. Scene-level analysis
VI. Experimental results
VII. Conclusions
2
I. Introduction
• Applications for consumer videos
– video indexing
– augmented-reality presentation of sports
– content-based sports-video compression
3
past research
• Past research can be roughly divided into four stages
– pixel- and/or object-level analysis
– highlight extraction
– event-based analysis
– increasing interest in constructing a generic framework
4
two problems
• There are still two problems that remain unsolved
– the system should be adaptable to more sports games
– it should provide a broad range of different analysis results
5
Our system
• Our system is original in three aspects
– automatic 3-D camera calibration
– several novel pixel- and object-level video
processing techniques
– build the entire framework upon the 3-D
camera calibration
6
II. Overview of the proposed sports-video analysis system
• sports-video analysis system modules
– Playing-frame detection
– Court-net detection and camera calibration
– Player segmentation and tracking in the image domain
– Visual feature extraction in the 3-D domain
– Scene-level content analysis
7
Architecture of the complete system
8
III. 3-D camera calibration
• consider the ground-plane homography
9
Camera Calibration Introduction
$$\mathbf{p}' = M\mathbf{p}$$

$$\begin{pmatrix} u \\ v \\ w \end{pmatrix} =
\begin{pmatrix}
m_{11} & m_{12} & m_{13} & m_{14} \\
m_{21} & m_{22} & m_{23} & m_{24} \\
m_{31} & m_{32} & m_{33} & m_{34}
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}$$
10
homography-matrix

$$H = \begin{pmatrix}
m_{11} & m_{12} & m_{14} \\
m_{21} & m_{22} & m_{24} \\
m_{31} & m_{32} & m_{34}
\end{pmatrix}$$
11
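Since every court point lies in the ground plane z = 0, the third column of M is never used for such points, which is why the 3x3 matrix H above suffices. A minimal NumPy sketch of applying a ground-plane homography in both directions; the matrix values below are placeholders rather than values from the paper:

import numpy as np

# Placeholder ground-plane homography (illustrative values only).
H = np.array([[ 50.0,  -5.0, 320.0],
              [  2.0, -40.0, 480.0],
              [  0.0,  -0.05,  1.0]])

def court_to_image(H, x, y):
    # Map a court-plane point (x, y, z = 0) to pixel coordinates (u, v).
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w                      # perspective division

def image_to_court(H, u, v):
    # Back-project a pixel (u, v) onto the court plane with the inverse homography.
    x, y, w = np.linalg.inv(H) @ np.array([u, v, 1.0])
    return x / w, y / w

print(court_to_image(H, 0.0, 0.0))           # court origin in the image
print(image_to_court(H, 320.0, 480.0))       # a pixel mapped back to court coordinates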
Computing the Ground-Plane Homography
• Court lines have the advantage that they are easy to detect simply by their color
• they can still be extracted even under partial occlusion
12
four stages
• The complete algorithm consists of
four stages
– Line-Pixel Detection (p16)
– Line-Parameter Estimation (p17 p18)
– Court Model Fitting (p19 p20)
– Model Tracking (p21)
13
Line-Pixel Detection
• Detection of white court-line pixels is
carried out in two steps
– luminance threshold and local structure
– initialization
14
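A minimal sketch of the first step, a luminance threshold combined with a local-structure test (a court-line pixel should be clearly brighter than the pixels a line-width away, horizontally or vertically). The threshold values and the assumed line width are placeholders, not the paper's settings:

import numpy as np

def detect_line_pixels(lum, theta_l=128, theta_d=20, tau=4):
    # lum: H x W luminance image; theta_l / theta_d / tau are placeholder parameters.
    lum = lum.astype(np.int32)
    h, w = lum.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(tau, h - tau):
        for x in range(tau, w - tau):
            p = lum[y, x]
            if p < theta_l:
                continue                      # not bright enough for a white line
            horiz = p - lum[y, x - tau] > theta_d and p - lum[y, x + tau] > theta_d
            vert  = p - lum[y - tau, x] > theta_d and p - lum[y + tau, x] > theta_d
            mask[y, x] = horiz or vert        # thin bright structure in one direction
    return mask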
Line-Parameter Estimation
$$s(g) = \sum_{(x', y') \in \Omega} \max\bigl(\tau - d(g, x', y'),\, 0\bigr)$$

d(g, x, y): distance of the pixel (x, y) from line g
Ω: the set of court-line pixels
τ: line width
15
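The score above translates directly into a few lines of NumPy. The parameterization of a candidate line g by a unit normal (nx, ny) and offset c is an assumption made for the sketch:

import numpy as np

def line_score(line, pixels, tau=4.0):
    # line = (nx, ny, c) with (nx, ny) a unit normal; pixels: N x 2 array of
    # detected court-line pixel coordinates; tau is the assumed line width.
    nx, ny, c = line
    d = np.abs(nx * pixels[:, 0] + ny * pixels[:, 1] + c)   # distance d(g, x', y')
    return np.sum(np.maximum(tau - d, 0.0))                  # s(g)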
Lines after parameter refinement
16
Court Model Fitting
17
model matching error E
$$E = \sum_{(p, q) \in \Lambda} \min\bigl(\lVert \hat{p}' - Hp \rVert^2 + \lVert \hat{q}' - Hq \rVert^2,\; e_m\bigr)$$

Λ: collection of line segments
(p, q): end-points of a model line segment
(p̂', q̂'): closest line segment in the image
e_m: the error for a line segment is bounded by a maximum value
18
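A sketch of evaluating the matching error E for a candidate homography H. It assumes the pairing of each model segment with its closest image segment has already been found; e_m is a placeholder bound:

import numpy as np

def project(H, pt):
    # Map a 2-D court-model point through the homography H.
    u, v, w = H @ np.array([pt[0], pt[1], 1.0])
    return np.array([u / w, v / w])

def model_matching_error(H, model_segments, image_segments, e_m=100.0):
    # model_segments: list of (p, q) end-point pairs of the court model;
    # image_segments: the closest image segment (p_hat, q_hat) for each of them.
    E = 0.0
    for (p, q), (p_hat, q_hat) in zip(model_segments, image_segments):
        err = np.sum((p_hat - project(H, p)) ** 2) + np.sum((q_hat - project(H, q)) ** 2)
        E += min(err, e_m)                     # error per segment is bounded by e_m
    return E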
Model Tracking
• Let H_t be the camera parameters (homography) for frame t.
• If we know the camera parameters for frames t-1 and t
– the transformation between them is H_t H_{t-1}^{-1}
• we can predict
$$\hat{H}_{t+1} = H_t H_{t-1}^{-1} H_t$$
19
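The prediction above is a one-liner in NumPy (a sketch; H_prev and H_curr are the homographies of frames t-1 and t):

import numpy as np

def predict_homography(H_prev, H_curr):
    # Constant-motion prediction: H_hat(t+1) = H_t * inv(H_{t-1}) * H_t
    return H_curr @ np.linalg.inv(H_prev) @ H_curr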
Upgrading the Homography to the Full
Camera Matrix
• implicitly assumes that the camera roll-angle is zero
• we can neglect the remaining entries in the z-direction, i.e., m13 = m33 = 0

$$M = \begin{pmatrix}
m_{11} & m_{12} & 0 & m_{14} \\
m_{21} & m_{22} & m_{23} & m_{24} \\
m_{31} & m_{32} & 0 & m_{34}
\end{pmatrix}$$

$$c_0 = (0, 0, 0, 1)^T, \qquad c_n = (0, 0, h, 1)^T$$
20
net plane
$$c_0' = M c_0 = (m_{14},\, m_{24},\, m_{34})^T$$
$$c_n' = M c_n = (m_{14},\, m_{24} + h\, m_{23},\, m_{34})^T$$

$$y_0 = \frac{m_{24}}{m_{34}} \quad \text{and} \quad y_n = \frac{m_{24} + h\, m_{23}}{m_{34}}$$

$$y_n - y_0 = h\, \frac{m_{23}}{m_{34}}$$

$$m_{23} = \frac{(y_n - y_0)\, m_{34}}{h}$$
21
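A sketch of completing the camera matrix along the derivation above. It assumes H is the 3x3 homography whose entries are the m_ij of the slides and, as in the derivation, that the measured net post stands at the court-coordinate origin; y0 and yn are the image y-coordinates of the bottom and top of that post, h its real height:

import numpy as np

def full_camera_matrix(H, y0, yn, h):
    # Zero roll angle implies m13 = m33 = 0; only m23 has to be recovered.
    m34 = H[2, 2]
    m23 = (yn - y0) * m34 / h                  # from y_n - y_0 = h * m23 / m34
    M = np.array([[H[0, 0], H[0, 1], 0.0, H[0, 2]],
                  [H[1, 0], H[1, 1], m23, H[1, 2]],
                  [H[2, 0], H[2, 1], 0.0, H[2, 2]]])
    return M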
IV. Pixel- and object-level analysis
• obtain the real-world positions of the player
– detection in the pixel-level
– trajectory computation in the object-level
22
Playing-Frame Detection
• A simple approach to playing-frame detection would be to just use the output of the court-detection algorithm
• Similar to the court-detection step, our playing-frame detection uses only the white pixels of the input frames
23
Our idea
• Count the number of white pixels within the court area during the court-tracking time period
24
detail
• Let "A" be the real-world area of the court
• A' = HA is the court area mapped into the image
• The number of white pixels within A' in frame t is denoted as F(t)
• If F(t) > μ_F − 2σ_F, we assume that a court is again visible in the image and the court-detection algorithm is executed
μ_F: mean number of white pixels
σ_F: variance of the number of white pixels
25
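A sketch of the rule above. white_counts holds the white-pixel counts F(t) collected while the court was being tracked; treating σ_F as a standard deviation in the μ − 2σ test is an assumption:

import numpy as np

def is_playing_frame(F_t, white_counts):
    # F_t: white-pixel count inside the projected court area A' for the current frame.
    mu = np.mean(white_counts)
    sigma = np.std(white_counts)
    return F_t > mu - 2.0 * sigma

history = [5200, 5100, 5350, 5280, 5150]       # made-up counts from tracked frames
print(is_playing_frame(5150, history))         # True  -> court likely visible again
print(is_playing_frame(1200, history))         # False -> probably a non-playing frame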
Moving-Player Segmentation
• Existing approaches
– subtraction of consecutive frames
– change-detection algorithms
26
we contribute a novel method
• build a background using our court model
• a playing frame mainly contains three parts
– the playing field inside the court lines
– the area surrounding the court
– the area of the audience
27
separately construct background models
• This has two advantages
– the background picture cannot be influenced by any camera motion
– only color and spatial information are considered
28
player segmentation algorithm
• formed by three steps
– Player Segmentation With a Synthetic Background
– EM-Based Background Subtraction
– Player Body Locating
29
Player Segmentation With a Synthetic
Background
30
• Use the RGB color space for modeling the background
31
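The paper subtracts a synthetic background built from the court model; as a stand-in, here is a minimal per-pixel RGB background-subtraction sketch (the simple mean/standard-deviation model and the threshold k are assumptions, not the paper's EM-based formulation):

import numpy as np

def background_model(frames):
    # frames: N x H x W x 3 float array of registered background frames.
    mean = frames.mean(axis=0)
    std = frames.std(axis=0) + 1e-6            # avoid division by zero
    return mean, std

def foreground_mask(frame, mean, std, k=3.0):
    # Mark pixels deviating from the background by more than k sigmas in any channel.
    diff = np.abs(frame - mean) / std
    return (diff > k).any(axis=-1)             # H x W boolean player mask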
EM-Based Background Subtraction
32
33
34
Multiple Players Tracking and
Occlusion Handling
• The frequently occurring occlusion in our application is caused by players of the same team.
• Our algorithm is also based on two steps
– split
– verify
35
most court-net games
• The occlusion caused by players from the same team is associated with two properties (see the sketch below)
– there is a peak in the vicinity of each head
– a player usually keeps moving along the same direction
36
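A minimal sketch of the splitting idea, assuming a binary mask of the merged blob: the silhouette-height profile usually shows one peak per head, so the blob can be cut at the valley between the two strongest peaks. The peak-picking details are assumptions, not the paper's exact procedure:

import numpy as np

def split_merged_blob(blob_mask):
    # blob_mask: H x W boolean mask of a blob suspected to contain two players.
    H, W = blob_mask.shape
    top_row = np.argmax(blob_mask, axis=0)                      # topmost foreground row per column
    height = np.where(blob_mask.any(axis=0), H - top_row, 0)    # silhouette height per column
    peaks = [x for x in range(1, W - 1)
             if height[x] >= height[x - 1] and height[x] >= height[x + 1] and height[x] > 0]
    if len(peaks) < 2:
        return [blob_mask]                                      # nothing to split
    p1, p2 = sorted(sorted(peaks, key=lambda x: height[x])[-2:])
    cut = p1 + int(np.argmin(height[p1:p2 + 1]))                # valley between the two head peaks
    left, right = blob_mask.copy(), blob_mask.copy()
    left[:, cut:] = False
    right[:, :cut] = False
    return [left, right]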
Blob Splitting
37
Player Tracking
38
Smoothing Player Motion in the 3-D
Domain
• Smooth the trajectory so that the player's position in the 3-D domain is obtained with high accuracy (see the sketch below)
39
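A minimal smoothing sketch. A plain moving average over the court-plane trajectory stands in for the paper's smoothing step; the window length is an assumption:

import numpy as np

def smooth_trajectory(positions, window=5):
    # positions: N x 2 array of (x, y) court-plane positions, one row per frame.
    kernel = np.ones(window) / window
    xs = np.convolve(positions[:, 0], kernel, mode="same")
    ys = np.convolve(positions[:, 1], kernel, mode="same")
    return np.stack([xs, ys], axis=1)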
V. Scene-level analysis
• generation of more concise and semantically meaningful analysis results
40
Behavior Representation
• some existing video analysis systems employ
two common visual features:
– position and speed of the player
• propose two novel features for event
identification
– speed change and temporal order of the event
41
$$f = [P_R, S_I, S_C, T_R]$$

P_R = (P_R1, P_R2)
P_R1: relative location between the two players
P_R2: horizontal relative relation of the two players
S_I: average speed of the two players

$$S_C = \begin{cases} 1, & \text{if both players are accelerating} \\ -1, & \text{if both players are decelerating} \\ 0, & \text{otherwise} \end{cases}$$

T_R = n, for frames within (3n−2) s ~ (3n) s
42
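A sketch of assembling f = [P_R, S_I, S_C, T_R] for one 3-second segment of two smoothed trajectories. The concrete encodings of P_R1 / P_R2 and the acceleration test are assumptions; only the overall structure follows the definition above:

import numpy as np

def behavior_features(traj_a, traj_b, fps=25, n=1):
    # traj_a, traj_b: N x 2 court-plane positions of the two players in this segment.
    delta = traj_b[-1] - traj_a[-1]
    p_r1 = np.hypot(delta[0], delta[1])                # relative location (distance)
    p_r2 = np.sign(delta[0])                           # horizontal relation (left/right)
    speeds = lambda t: np.linalg.norm(np.diff(t, axis=0), axis=1) * fps
    sa, sb = speeds(traj_a), speeds(traj_b)
    s_i = 0.5 * (sa.mean() + sb.mean())                # average speed of both players
    half = len(sa) // 2
    acc_a = sa[half:].mean() - sa[:half].mean()
    acc_b = sb[half:].mean() - sb[:half].mean()
    s_c = 1 if acc_a > 0 and acc_b > 0 else (-1 if acc_a < 0 and acc_b < 0 else 0)
    t_r = n                                            # temporal order of the segment
    return [(p_r1, p_r2), s_i, s_c, t_r]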
Event Classification
• We intend to model key events of each sports game
– Service in a singles game
– Both-net in a doubles game
43
VI. Experimental results
• evaluate the performance of the proposed
algorithms
44
3-D Camera Modeling Technique
45
TABLE I
46
Results for Pixel and Object-Level
Algorithms
• present the results of our playing-frame
detection algorithm, player segmentation and
player tracking algorithm
47
Playing-Frame Detection
48
Player Segmentation and Tracking in
Image Domain
49
Results for Scene-Level Analysis
Algorithm
50
System Efficiency
51
VII. Conclusions
• 3-D camera modeling enables us to establish a relation between the image domain and the real-world domain
• feature extraction
52
Thanks…
53