VALSE 2014
Robust Scene Text Detection with
Adaptive Clustering
Xu-Cheng Yin (殷绪成)
PhD
http://prir.ustb.edu.cn/yin
Pattern Recognition and Information Retrieval Lab
Department of Computer Science and Technology
University of Science and Technology Beijing
2014-09
Text detection in natural scenes:
Background
Challenges with scene text detection
Complex background
Variations of font and size
Variations of text color
Variations of illumination
Variations of text orientation
http://prir.ustb.edu.cn/yin/
Text detection in natural scenes:
Review
Previous text detection technologies
Region-based (Sliding window-based)
K. Kim et al., “Texture-based approach for text detection in images using SVM …”, TPAMI 2003.
X. Chen, and A. Yuille, “Detecting and reading text in natural scenes”, CVPR 2004.
T. Wang, D.J. Wu, A. Coates, and A. Y. Ng, “End-to-end text recognition with CNN”, ICPR 2012.
Drawback: very slow (every pixel, at multiple scales)
Connected components-based
B. Epshtein et al., “Detecting text in natural scenes with stroke width transform (SWT)”, CVPR 2010.
C. Yao, X. Bai et al., Detecting texts of arbitrary orientations in natural images…, CVPR 2012, TIP 2014.
W. Huang et al., Text localization in natural images with Stroke Feature Transform …, ICCV 2013, ECCV 2014.
C. Yi and Y. Tian, Text string detection from natural scenes with boundary clustering, stroke segmentation,
structure modeling, …, TIP 2011, TIP 2012, CVIU 2013.
Y.-F. Pan, X. Hou and C.-L. Liu, “A hybrid approach to detect and localize texts in natural scene images”, TIP 2011
Drawback: fragility in CC (connected component) calculation
Text detection in natural scenes:
Review
Recent MSER/ER-based text detection technologies
Maximally Stable Extremal Region (MSER/ER)
Robust to color, size, illumination, resolution
MSER/ER-based detection
A specific category of CC-based methods;
Use MSERs/ERs as character candidates (have become the focus of recent
projects).
L. Neumann and J. Matas, (Realtime) Text localization and recognition in real-world images, ACCV 2010, ICDAR
2011/2013, CVPR 2012, ICCV 2013.
H.I. Koo and D.H. Kim, “Scene text detection via connected component clustering and nontext filtering”, TIP 2013.
C. Shi, C. Wang, B. Xiao, et al., Scene text detection using graph model, MSER, CRF, …, Pattern Recognition Letters
2013, CVPR 2013, ICDAR 2013, TCSVT 2014, PR 2014.
L. Sun, Q. Hou, et al., Robust text detection in natural scene images by Generalized Color enhanced contrasting extremal
region, … ICPR 2012, ICDAR 2013, ICPR 2014.
L. Kang, D. Doermann, et al., Orientation robust text line detection with HOCC…, CVPR 2014.
X.-C. Yin, et al., “Robust text detection in natural scenes,” TPAMI 2014.
Text detection in natural scenes:
Motivation
Main pitfalls of MSER/ER-based text detection methods
Most of the detected character candidates (MSERs/ERs) correspond to non-characters
(Addressed by: MSER pruning)
Insufficient text candidate construction, with time-consuming and error-prone pruning (parameter tuning with rule-based methods)
(Addressed by: adaptive hierarchical clustering with metric learning)
Text candidate classifier trained on unbalanced data
(Addressed by: eliminating most non-text candidates with the character classifier)
Text detection in natural scenes:
System overview
Text detection in natural scenes:
Highlights
An MSER pruning algorithm that minimizes regularized variations is proposed to remove most non-characters
Character candidates are clustered into text candidates by an adaptive single-link clustering algorithm, in which the distance weights and threshold are learned simultaneously using a self-training metric learning algorithm
The posterior probabilities of text candidates corresponding to non-text are estimated using the character classifier, and text candidates with a high non-text probability are removed efficiently
Text detection in natural scenes:
Key technologies
Character candidates extraction with MSER
pruning
Text candidates construction with adaptive
hierarchical clustering and distance metric
learning
Text candidates elimination with the
character classifier
Character Candidates Extraction
Character Candidates Extraction
(Variation regularization)
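As a rough illustration of variation regularization, the sketch below penalizes an MSER's variation when its aspect ratio falls outside a plausible character range, then keeps the region with the smallest regularized variation along one parent-child chain of the MSER tree. The penalty form, the function names, the dictionary keys, and every numeric constant are assumptions made for illustration; none of them come from the slides.

```python
def variation(area_minus, area, area_plus):
    """One common MSER variation: relative area change across a threshold band."""
    return (area_plus - area_minus) / area


def regularized_variation(var, aspect_ratio, a_min=0.7, a_max=1.2,
                          theta_hi=0.03, theta_lo=0.08):
    """Add a penalty when the aspect ratio leaves [a_min, a_max].

    All four constants are assumed values, chosen only for the example.
    """
    if aspect_ratio > a_max:
        return var + theta_hi * (aspect_ratio - a_max)
    if aspect_ratio < a_min:
        return var + theta_lo * (a_min - aspect_ratio)
    return var


def prune_chain(regions):
    """From one parent-child chain, keep the region with minimum regularized variation."""
    return min(regions,
               key=lambda r: regularized_variation(r["var"], r["aspect"]))


# Hypothetical chain: "b" is more stable but far too wide to be a single character.
chain = [
    {"name": "a", "var": 0.10, "aspect": 1.0},
    {"name": "b", "var": 0.05, "aspect": 3.0},
]
kept = prune_chain(chain)
```

Here the penalty lifts region "b" above region "a", so the narrower region survives even though its raw variation is higher.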
Text Candidates Construction
Clustering-based text candidates grouping
from character candidates (MSERs)
Clustering: single-link clustering (elongated clusters)
Similarity: weighted distance
Threshold: threshold for deciding the number of clusters
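The three ingredients above can be sketched in a few lines: cutting a single-link hierarchy at a distance threshold is equivalent to taking the connected components of the graph that links every pair closer than the threshold. The feature layout (horizontal center, height), the weights, and the threshold below are hypothetical values, not the ones used in the talk.

```python
def weighted_distance(u, v, weights):
    """Weighted sum of absolute feature differences between two candidates."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, u, v))


def single_link_clusters(points, weights, threshold):
    """Single-link clustering cut at `threshold`, implemented with union-find.

    Returns clusters as lists of point indices.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if weighted_distance(points[i], points[j], weights) <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical character candidates: (x_center, height)
points = [(0, 10), (12, 10), (24, 11), (100, 30), (113, 30)]
clusters = single_link_clusters(points, weights=(1.0, 2.0), threshold=15.0)
```

Candidates 0 and 2 are farther apart than the threshold, yet they end up in the same cluster through candidate 1. This chaining is what yields the elongated clusters that suit text lines.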
Adaptive single-link clustering with
distance metric learning
Feature
space
(similarity)
Adaptive single-link clustering with
distance metric learning
Weighted distance
Clusters
How to select weights and threshold?
Rule-based: time consuming and error-prone
Clustering-based: a separate two-stage learning
style (first weights, then threshold)
Adaptive (single-link) clustering where distance weights and
threshold are learned simultaneously using a self-training
metric learning algorithm.
Adaptive single-link clustering with
distance metric learning
(1) Sample selection
Focus on the hardest part (closest and farthest data)
Adaptive single-link clustering with
distance metric learning
(2) Weight conversion
Original:
Converted: (weights and threshold learned simultaneously)
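The slide's formulas did not survive extraction. One plausible reading of this step, offered only as a sketch: if the original decision compares a weighted distance with a threshold, appending a constant to the feature vector folds the threshold into the weight vector, so both become a single parameter to learn. The symbols $\mathbf{w}$, $\epsilon$, $\mathbf{x}_{uv}$, and $\boldsymbol{\theta}$ are assumed notation, not taken from the slides.

```latex
\begin{align*}
\text{Original:}\quad
  & d(u,v;\mathbf{w}) = \mathbf{w}^{\top}\mathbf{x}_{uv} \le \epsilon \\
\text{Converted:}\quad
  & \tilde{d}(u,v;\boldsymbol{\theta})
    = \boldsymbol{\theta}^{\top}\tilde{\mathbf{x}}_{uv} \le 0,
  \qquad
  \boldsymbol{\theta} = \begin{pmatrix}\mathbf{w}\\ \epsilon\end{pmatrix},
  \quad
  \tilde{\mathbf{x}}_{uv} = \begin{pmatrix}\mathbf{x}_{uv}\\ -1\end{pmatrix}
\end{align*}
```

The two conditions are equivalent because $\boldsymbol{\theta}^{\top}\tilde{\mathbf{x}}_{uv} = \mathbf{w}^{\top}\mathbf{x}_{uv} - \epsilon$.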
Adaptive single-link clustering with
distance metric learning
(3) Model determination
With the logistic regression loss, a discriminative
model is designed by
Distance metric learning:
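The model formula itself is missing from this transcript. As a minimal sketch of what learning a distance with a logistic regression loss could look like, the code below fits the weights and the threshold jointly from labeled pairs by plain gradient descent. The function names, the pair features, the learning rate, and the toy data are all assumptions; this is not the authors' formulation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_metric(pairs, labels, dim, lr=0.5, epochs=5000):
    """Learn (weights, threshold) jointly by minimizing the logistic loss.

    pairs:  per-pair feature-difference vectors of length `dim`
    labels: 1 for must-link (same text), 0 for cannot-link
    """
    theta = [0.0] * (dim + 1)  # last entry plays the role of the threshold
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in zip(pairs, labels):
            z = list(x) + [-1.0]                            # augmented features
            score = sum(t * zi for t, zi in zip(theta, z))  # w.x - threshold
            p_link = sigmoid(-score)                        # P(must-link)
            # Gradient of -[y log p + (1 - y) log(1 - p)] with p = sigmoid(-score)
            for k in range(dim + 1):
                grad[k] += (p_link - y) * (-z[k])
        theta = [t - lr * g / len(pairs) for t, g in zip(theta, grad)]
    return theta[:dim], theta[dim]


# Toy one-dimensional pair distances (hypothetical data)
must = [[0.1], [0.2], [0.3]]
cannot = [[1.0], [1.2], [1.5]]
weights, threshold = train_metric(must + cannot, [1] * 3 + [0] * 3, dim=1)
```

After training, a pair is linked when its weighted distance does not exceed the learned threshold, which is the same decision rule a thresholded single-link clustering applies.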
Adaptive single-link clustering with
distance metric learning
(4) Self-training algorithm
Text Candidates Elimination
Empirical results
Text candidates elimination
In the ICDAR 2011 competition training set, only 9% of the text candidates correspond to true text
It is hard to train an effective text classifier on such an unbalanced dataset
Most methods rely on rules and heuristics
Our discriminative method
Use a character classifier to estimate the posterior probability that a text candidate is non-text
Remove candidates with a high non-text probability
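One way to picture this elimination step is a naive Bayes update: start from the 9% text prior quoted above, treat each character-classifier decision inside a candidate as an independent observation, and drop the candidate when the resulting non-text posterior is very high. The independence assumption, the acceptance rates `tpr` and `fpr`, the cutoff `max_non_text`, and the function names are all hypothetical choices for this sketch.

```python
def non_text_posterior(char_flags, prior_text=0.09, tpr=0.9, fpr=0.2):
    """P(non-text | character classifier outputs) under naive independence.

    char_flags: booleans, True where the character classifier accepted a component
    prior_text: fraction of candidates that are true text (9% per the slide)
    tpr, fpr:   assumed acceptance rates on text / non-text components
    """
    like_text = 1.0
    like_non = 1.0
    for accepted in char_flags:
        like_text *= tpr if accepted else (1.0 - tpr)
        like_non *= fpr if accepted else (1.0 - fpr)
    num = (1.0 - prior_text) * like_non
    return num / (num + prior_text * like_text)


def eliminate(candidates, max_non_text=0.995):
    """Keep candidates whose non-text posterior stays below the (assumed) cutoff."""
    return [c for c in candidates if non_text_posterior(c) < max_non_text]


# Two hypothetical text candidates: one fully accepted, one fully rejected
candidates = [[True] * 5, [False] * 3]
kept = eliminate(candidates)
```

Because only confidently non-text candidates are removed, the step avoids training a separate text classifier on the unbalanced candidate set.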
Text Candidates Elimination
Experiments
On the ICDAR 2011 Robust Reading Competition Set
(Challenge 2: Reading Text in Scene Images) 1,2,3,4
1. http://robustreading.opendfki.de/wiki/SceneText
2. Top 4 winners of ICDAR 2011: Kim’s, Yi’s, TH-TextLoc System, and Neumann’s
3. Shi et al.’s (Pattern Recognition Letters, 2013(2))
4. Neumann and Matas’s (CVPR 2012)
Experiments
Speed on ICDAR 2011 data set
Method                 Time (s) per image   Remarks
Our method             0.43                 A Linux laptop with Intel(R) Core(TM)2 Duo 2.00 GHz CPU
Shi et al.’s           1.5                  A PC with Intel(R) Core(TM)2 Duo 2.33 GHz CPU
Neumann and Matas’s    1.8                  A “standard PC”
Experiments (ICDAR 2011 Samples)
Notice the robustness against low contrast, complex background and font variations.
Experiments
On a publicly available multilingual (Chinese and English) dataset 1,2,3
Scheme III: constructed on the ICDAR 2011 training set
Scheme IV: constructed on the multilingual training set
1. http://liama.ia.ac.cn/wiki/projects:pal:member:yfpan
2. Pan et al.’s method (Yifeng Pan, Xinwen Hou, and Cheng-Lin Liu, IEEE TIP 20(3), 2011)
3. The speed of Pan et al.'s method was measured on a PC with a Pentium D 3.4 GHz CPU
Experiments (Multilingual Samples)
Demos
Online Demos
http://prir.ustb.edu.cn/TexStar/scene-text-detection/
Demos
App demo on Android phones, iPhone, and tablets
Text recognition and retrieval in street
views
ICDAR 2013 Robust Reading
Competition Results
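Technology Awards
At the 2013 international document analysis and recognition competitions, our technology won first place in three tasks of the Robust Reading Competition, the most closely watched contest of the event: "natural scene text detection", "web image text detection", and "web image text extraction". The winning results for "web image text extraction" and "web image text detection" outperformed the runner-up by 19.36% and 8.37%, respectively.
In particular, since the "natural scene text detection" track was established at the International Conference on Document Analysis and Recognition in 2003, its technical difficulty and practical importance have attracted nearly thirty teams from more than ten countries, including the United States, Germany, China, France, Singapore, Russia, and Japan. Among them are world-leading research teams in document analysis and recognition, pattern recognition, computer vision, and artificial intelligence from institutions such as the University of California, the City University of New York, Tsinghua University, the Institute of Automation of the Chinese Academy of Sciences, and the National University of Singapore. The competition has become the most important international contest and benchmark for evaluating the latest research progress in text detection and recognition in natural scenes and images. This year, our technology achieved the best performance in the ten-year history of the competition, and it is the first time a Chinese research institution has won this title. At the conference, the technology drew strong interest from researchers at multinational companies including Samsung, Google, Microsoft, Amazon, Motorola, Canon, and Fujitsu.
Note: The International Conference on Document Analysis and Recognition (ICDAR), organized by the International Association for Pattern Recognition (IAPR), is one of the world's most important international academic conferences in document analysis and recognition and in pattern recognition. It is held every two years; since the first edition in 1991, twelve editions have been held as of this year (2013).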
ICDAR 2013 Robust Reading
Competition Results
Results for the ICDAR 2013 Robust Reading Competition
(Challenge 2: Text Localization in Real Scenes)
ICDAR 2013 Robust Reading
Competition Results
Results for the ICDAR 2013 Robust Reading Competition
(Challenge 1: Text Localization in Born-Digital Images (Web and Email))
Main References
[1] Xu-Cheng Yin, Xuwang Yin, Kaizhu Huang, and Hong-Wei Hao, “Robust
text detection in natural scene images,” IEEE Trans. Pattern Analysis and
Machine Intelligence (TPAMI), 36(5): 970-983, 2014.
[2] Xu-Cheng Yin, Wei-Yi Pei, Jun Zhang, and Hong-Wei Hao, “Multi-orientation
scene text detection with adaptive clustering”, IEEE TPAMI,
submitted (with revision), 2014.
[3] Xu-Cheng Yin, Xuwang Yin, Kaizhu Huang, and Hong-Wei Hao,
“Accurate and robust text detection: A step-in for text retrieval in natural
scene images”, ACM SIGIR’13.
[4] Xuwang Yin, Xu-Cheng Yin, et al., “Effective text localization in natural
scene images with MSER, geometry-based grouping and AdaBoost”, IAPR
ICPR’12.
Discussions and Questions
Industrial R&D
Multilingual text detection and recognition in
natural scenes, web images, ubiquitous
documents and videos
Academic Research
End-to-end text recognition and retrieval in natural
scenes and web images with Feedforward-Feedback