Beauty is Here!
Evaluating Aesthetics in Videos Using
Multimodal Features and Free Training Data
Yanran Wang, Qi Dai, Rui Feng, Yu-Gang Jiang
School of Computer Science, Fudan University, Shanghai, China
ACM MM, Barcelona, Catalunya, Spain, 2013
Overview
Task:
• Design a system to automatically identify aesthetically more appealing videos
Contribution:
• Propose to use free training data
• Use and evaluate various kinds of features
Result:
• Attain a Spearman's rank correlation coefficient of 0.41 on the NHK dataset
Free Training Data
• Construct two annotation-free training datasets by assuming images/videos on certain websites are mostly beautiful
[Diagram: DPChallenge images and Flickr videos serve as positive (+) examples; Dutch documentary videos serve as negative (−) examples]
Free Training Data
• The first training set
– uses images from DPChallenge as positive samples,
– and frames from the Dutch documentary videos as negative samples
• The second training set
– uses videos from Flickr as positive samples,
– and the Dutch documentary videos as negative samples
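As a rough illustration of the free-training-data idea, the sketch below assembles the two training sets purely from where each file comes from; the folder names and the helper function are hypothetical, not from the paper.

```python
import glob
import os

def build_training_set(positive_dir, negative_dir):
    """Label samples by their source website instead of manual annotation:
    +1 for assumed-beautiful material, -1 for assumed-ordinary material."""
    positives = [(path, +1) for path in glob.glob(os.path.join(positive_dir, "*"))]
    negatives = [(path, -1) for path in glob.glob(os.path.join(negative_dir, "*"))]
    return positives + negatives

# First set: DPChallenge images (+) vs. Dutch documentary video frames (-)
image_training_set = build_training_set("dpchallenge_images/", "documentary_frames/")
# Second set: Flickr videos (+) vs. Dutch documentary videos (-)
video_training_set = build_training_set("flickr_videos/", "documentary_videos/")
```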
Multimodal Features
• Traditional visual features: Color, LBP, SIFT, HOG
• Mid-level semantic attributes: Classemes [ECCV’10]
• Style descriptor
• Video motion feature: Dense Trajectory [CVPR’11]
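The slides do not give the exact descriptor settings, so the sketch below only illustrates how the low-level image features (color histogram, LBP, HOG) might be computed per frame with scikit-image; all bin counts and parameters are assumptions, and Classemes and Dense Trajectories come from the cited works rather than being reimplemented here.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import local_binary_pattern, hog

def color_histogram(frame, bins=8):
    """Joint RGB histogram (8x8x8 bins is an assumption), L1-normalised."""
    hist, _ = np.histogramdd(frame.reshape(-1, 3).astype(float),
                             bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist.ravel() / max(hist.sum(), 1.0)

def lbp_histogram(frame, points=8, radius=1):
    """Histogram of uniform LBP codes over the grayscale frame."""
    gray = (rgb2gray(frame) * 255).astype(np.uint8)
    codes = local_binary_pattern(gray, points, radius, method="uniform")
    hist, _ = np.histogram(codes, bins=points + 2, range=(0, points + 2))
    return hist / max(hist.sum(), 1.0)

def frame_features(frame):
    """Concatenate low-level descriptors for one frame; frames are assumed
    to share the same resolution so the HOG length stays constant."""
    gray = (rgb2gray(frame) * 255).astype(np.uint8)
    return np.concatenate([color_histogram(frame),
                           lbp_histogram(frame),
                           hog(gray)])
```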
Framework
[Framework diagram: input videos → feature extraction (image low-level features: Color, LBP, SIFT, HOG; mid-level semantic attributes: Classemes; video motion feature: Dense Trajectory; style descriptor classifiers) → SVM models trained on the image training data and on the video training data → ranking list]
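A minimal sketch of the SVM-and-ranking stage, assuming each video is already summarised by one feature vector per feature type; averaging the per-model decision values before sorting is an illustrative fusion choice, not necessarily the exact rule used in the system.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_aesthetic_model(train_features, train_labels):
    """Fit a linear SVM on the free training data (+1 beautiful, -1 not)."""
    return LinearSVC(C=1.0).fit(train_features, train_labels)

def rank_videos(models, per_feature_test_features):
    """Score every test video with every model, average the SVM decision
    values across models, and return indices sorted from most to least
    aesthetically appealing."""
    scores = np.mean([model.decision_function(feats)
                      for model, feats in zip(models, per_feature_test_features)],
                     axis=0)
    return np.argsort(-scores)
```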
Result
• Using training data from Flickr & Dutch documentary videos
• Evaluated on a subset labeled by ourselves
[Bar chart: Spearman's rank correlation per feature, with the best single feature highlighted]
Dense Trajectory, which is very powerful in human action recognition, performs poorly, indicating that motion is less related to beauty.
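The reported numbers are Spearman's rank correlation between the system ranking and the human ranking of the labelled subset; below is a small sketch of how that can be computed with SciPy (the two rankings are made-up toy data, not results from the paper).

```python
from scipy.stats import spearmanr

# Toy example: rank given to the same five videos by the system and by
# the human-labelled subset (rank 1 = most beautiful).
system_ranking = [1, 2, 3, 4, 5]
human_ranking = [2, 1, 3, 5, 4]

rho, _ = spearmanr(system_ranking, human_ranking)
print(f"Spearman's rank correlation: {rho:.2f}")  # 0.80 for this toy data
```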
Result
• Using training data from DPChallenge & Dutch documentary images/frames
• Evaluated on a subset labeled by ourselves
[Bar chart: Spearman's rank correlation per feature; best single feature: 0.41; best overall result: 0.43]
Image-based training is more suitable on the NHK dataset, because most NHK videos focus on scenes.
Result
• Official evaluation results from NHK, on the entire test set
• We submitted 5 runs
• Evaluated on NHK's official labels, which are not publicly available

Run                     Image training data   Video training data   Image+Video training data
Color+Classemes         0.41                  0.03                  0.39
Color+Classemes+SIFT    0.37                  0.19                  ---
• Observations
• Image training data is more effective, similar to observations on the small subset
• Color and Classemes are complementary; SIFT is not
• NOTE: These submitted runs were selected before annotating the subset, which was done later to provide more insights in the paper!
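The slides do not state how features are combined in runs such as Color+Classemes; one plausible reading, sketched here purely as an assumption, is early fusion: L2-normalise each feature block and concatenate the blocks before training a single SVM.

```python
import numpy as np

def early_fusion(*feature_blocks):
    """Concatenate per-video feature blocks (e.g. Color and Classemes),
    L2-normalising each block first so no single feature dominates."""
    normalised = []
    for block in feature_blocks:
        block = np.asarray(block, dtype=float)
        norms = np.linalg.norm(block, axis=1, keepdims=True)
        normalised.append(block / np.maximum(norms, 1e-12))
    return np.hstack(normalised)

# e.g. fused = early_fusion(color_features, classemes_features)
```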
Demo
A collection of clips from the top 10 videos identified by our system
Thank you!