High-level Component Filtering for Robust Scene Text Detection

Weilin Huang (黄韡林)
Shenzhen Institutes of Advanced Technology (SIAT),
Chinese Academy of Sciences
Multimedia Laboratory, The Chinese University of Hong Kong
Outline
■ Introduction
♦ Connected Component and Sliding-Window Methods
♦ Stroke Width Transform (SWT)
♦ SWT-based Text Detection
■ Stroke Feature Transform (SFT)
♦ Colour Information in Text Stroke Detection
■ Text Covariance Descriptor (TCD)
♦ TCD for Component Filtering
♦ TCD for Text-line Filtering
■ Convolution Neural Network Induced MSER Trees
♦ Maximally Stable Extremal Regions (MSERs)
♦ CNN for Component Classification
♦ Component Splitting
I. Introduction: Text Detection Methods
■ Connected Component Methods
♦ Step 1: Separate text and non-text information at the pixel level
♦ Step 2: Group text pixels to construct character components
♦ Advantages: fast computation
♦ Limitations: not robust; erroneous components and many false alarms
♦ Examples: SWT, MSERs
■ Sliding-Window Methods
♦ Step 1: Train a text classifier
♦ Step 2: Scan a sliding sub-window through the image
♦ Advantages: high-level text classification
♦ Limitations: computationally costly; features are difficult to design
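The two sliding-window steps can be sketched as follows. This is a minimal single-scale illustration, not the method of any paper cited here: `toy_score` is a hypothetical stand-in for the trained text classifier of Step 1, and the window size, stride and threshold are arbitrary assumptions.

```python
import numpy as np

def sliding_window_detect(image, score_window, win=(32, 32), stride=8, threshold=0.5):
    """Step 2: scan a fixed-size window over a grayscale image and keep the
    windows whose classifier score (the model from Step 1) exceeds threshold.
    Returns (top, left, height, width, score) tuples. Single scale only."""
    h, w = image.shape
    wh, ww = win
    detections = []
    for top in range(0, h - wh + 1, stride):
        for left in range(0, w - ww + 1, stride):
            s = score_window(image[top:top + wh, left:left + ww])
            if s > threshold:
                detections.append((top, left, wh, ww, s))
    return detections

def toy_score(patch):
    """Hypothetical stand-in for a trained text/non-text classifier."""
    return float(patch.mean())

img = np.zeros((64, 64))
img[8:40, 16:48] = 1.0            # a bright block standing in for text
dets = sliding_window_detect(img, toy_score)
print(len(dets))
```

The nested loops make the cost visible: the number of classifier calls grows with the image area, and again with every additional scale or window shape a practical detector has to scan.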
I. Introduction: Stroke Width Transform (1)
■ Example: the SWT operator
■ Low-level operations:
♦ Canny edges
♦ Gradient orientation for ray tracking
♦ Pixel filter with the stroke width constraint: |Op - Oq| < λ
♦ Stroke width computed between paired pixels
♦ Output: SWT map
■ Problem 1: erroneous connections
♦ Connecting multiple characters
♦ Separating single characters
■ Problem 2: many non-text components
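The ray-tracking step can be sketched in NumPy. This is a simplified sketch of the standard SWT ray-casting idea under stated assumptions: the edge map is passed in (`boundary_edges` is a hypothetical stand-in for Canny on a clean binary image), the "opposite orientation" pairing test uses an angular tolerance `angle_tol` playing the role of λ, and `light_on_dark` chooses which way along the gradient to walk. Names and defaults are illustrative.

```python
import numpy as np

def boundary_edges(binary):
    """Hypothetical stand-in for a Canny edge map on a clean binary image:
    foreground pixels with at least one background 4-neighbour."""
    fg = binary.astype(bool)
    p = np.pad(fg, 1)
    all_nbrs_fg = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return fg & ~all_nbrs_fg

def stroke_width_transform(gray, edges, max_len=50, angle_tol=np.pi / 6,
                           light_on_dark=True):
    """Simplified SWT: from every edge pixel p, walk along the gradient
    direction until another edge pixel q is hit. If q's gradient is roughly
    opposite to p's, every pixel on the ray gets width |p - q| (keeping the
    minimum where rays overlap). Pixels never reached stay at inf."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    h, w = gray.shape
    swt = np.full((h, w), np.inf)
    sign = 1.0 if light_on_dark else -1.0   # bright strokes: walk up the gradient
    cos_tol = np.cos(angle_tol)
    for y0, x0 in zip(*np.nonzero(edges)):
        if mag[y0, x0] == 0:
            continue
        dy = sign * gy[y0, x0] / mag[y0, x0]
        dx = sign * gx[y0, x0] / mag[y0, x0]
        ray = [(y0, x0)]
        for n in range(1, max_len + 1):
            y, x = int(round(y0 + n * dy)), int(round(x0 + n * dx))
            if not (0 <= y < h and 0 <= x < w):
                break                       # ray left the image: discard it
            if (y, x) != ray[-1]:
                ray.append((y, x))
            if edges[y, x] and (y, x) != (y0, x0):
                m = mag[y, x]
                # accept only if the two gradient directions are within
                # angle_tol of being opposite (dot product near -1)
                if m > 0 and (dy * sign * gy[y, x] + dx * sign * gx[y, x]) / m <= -cos_tol:
                    width = float(np.hypot(y - y0, x - x0))
                    for ry, rx in ray:
                        swt[ry, rx] = min(swt[ry, rx], width)
                break
    return swt

img = np.zeros((20, 20))
img[2:18, 8:13] = 1.0                       # bright vertical stroke, 5 px wide
swt = stroke_width_transform(img, boundary_edges(img > 0.5))
print(swt[10, 8:13])                        # → [4. 4. 4. 4. 4.]
```

The width is measured between edge-pixel centers, so a 5-pixel-wide stroke reads as 4. Rays from the stroke's corners are rejected by the orientation test, which is the kind of low-level pixel filtering the slide refers to.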
I. Introduction: SWT-based Text Detection
■ Complete processing pipeline:
SWT → Comp. filtering → Text components → Grouping (GP) → Grouped text lines → TL filtering → Final text lines
♦ Baseline filtering: heuristic filtering and a Random Forest classifier (heuristic and geometric features)
■ Our improvements: more powerful high-level filters for the component (Comp.) and text-line (TL) filtering steps
C. Yao, X. Bai, W. Liu, Y. Ma, Z. Tu, Detecting texts of arbitrary orientations in natural images, CVPR, 2012.
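A heuristic component filter of the kind applied before a learned classifier can be sketched as below. The two rules (stroke-width variation and aspect ratio) and their thresholds are illustrative assumptions in the spirit of the heuristics commonly paired with SWT, not the exact rules of Yao et al.

```python
import numpy as np

def passes_heuristics(stroke_widths, bbox_h, bbox_w,
                      max_var_ratio=0.5, aspect_range=(0.1, 10.0)):
    """Hypothetical heuristic filter for one connected component.
    stroke_widths: SWT values of the component's pixels."""
    sw = np.asarray(stroke_widths, dtype=float)
    if sw.size == 0:
        return False
    # Text strokes have a fairly constant width: reject components whose
    # stroke-width variance is large relative to the mean width.
    if sw.var() > max_var_ratio * sw.mean():
        return False
    # Reject extremely elongated components (long lines, fences, ...).
    aspect = bbox_w / bbox_h
    lo, hi = aspect_range
    return lo <= aspect <= hi

print(passes_heuristics([4, 4, 4, 5, 4], bbox_h=30, bbox_w=20))   # → True
print(passes_heuristics([1, 10, 2, 12], bbox_h=30, bbox_w=20))    # → False
```

Hand-tuned rules like these are cheap but brittle, which is why the pipeline follows them with a Random Forest and why the later sections replace them with learned, high-level filters.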
II. Stroke Feature Transform (SFT) (1)
■ Stroke Feature Transform (SFT):
♦ Stroke width constraint: |Op - Oq| < λ1
♦ Stroke color constraint: |Cp - Cq| < λ2
♦ Neighborhood coherency constraint
♦ Output: stroke width map and stroke color map
■ SWT, for comparison:
♦ Stroke width constraint only: |Op - Oq| < λ
♦ Output: stroke width map
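How the two pairwise constraints could be combined for one traced ray p → q, producing an entry for each output map, is sketched below. The slide gives only the inequalities, so this is a hypothetical illustration: O is taken as a per-pixel scalar compared exactly as written, C as an RGB triple with |·| read as Euclidean distance, and the median color along the ray as the stroke color entry. The neighborhood coherency constraint is not modeled.

```python
import numpy as np

def sft_pair(p, q, o_p, o_q, c_p, c_q, ray_colors, lam1, lam2):
    """Hypothetical SFT-style pairing test for one traced ray p -> q.
    Returns (stroke_width, stroke_color) or None if the pair is rejected."""
    # Constraint 1, as written on the slide: |Op - Oq| < lambda1
    if abs(o_p - o_q) >= lam1:
        return None
    # Constraint 2: |Cp - Cq| < lambda2, with |.| taken as the Euclidean
    # distance between RGB triples (an assumption)
    if np.linalg.norm(np.subtract(c_p, c_q, dtype=float)) >= lam2:
        return None
    width = float(np.hypot(q[0] - p[0], q[1] - p[1]))
    # Stroke color map entry: per-channel median along the ray (assumption)
    color = np.median(np.asarray(ray_colors, dtype=float), axis=0)
    return width, color
```

Compared with a width-only test, the color check rejects pairings whose endpoints differ in color, which is one way pairings across two touching characters of different colors, or across background clutter, could be cut.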
II. Stroke Feature Transform (SFT) (2)
■ SFT vs. SWT
♦ Mitigates inter-component connections
♦ Enhances intra-component connections
♦ Better character candidate detection
♦ Higher recall
II. Stroke Feature Transform (SFT) (3)
■ Limitation: low-level operations are not robust
♦ Text-like outliers: bricks, windows, leaves, …
♦ Many false alarms → low precision
♦ Heuristic filters do not work well
♦ High-level, learning-based filtering is required
III. Text Covariance Descriptor (TCD) (1)
■ Text Covariance Descriptor
♦ Each pixel is represented by d features
♦ TCD is computed over a given region U
♦ Multiple features are incorporated in a single matrix
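The TCD formula itself did not survive extraction. For reference, the standard region covariance form that a covariance descriptor of this kind builds on is usually written as follows, with z_k ∈ R^d the feature vector of the k-th pixel of region U and μ their mean; the exact normalization used for TCD may differ.

```latex
C_U = \frac{1}{|U| - 1} \sum_{k=1}^{|U|} (\mathbf{z}_k - \boldsymbol{\mu})(\mathbf{z}_k - \boldsymbol{\mu})^{\top},
\qquad
\boldsymbol{\mu} = \frac{1}{|U|} \sum_{k=1}^{|U|} \mathbf{z}_k
```

C_U is a symmetric d x d matrix, so it has d(d + 1)/2 distinct entries; with d = 9 this gives 9 · 10 / 2 = 45.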
III. Text Covariance Descriptor (TCD) (2)
■ TCD for components (9 x 9 covariance features)
♦ Pixel coordinates along the X and Y axes: encode spatial information
♦ Pixel intensities and RGB values: color uniformity
♦ Stroke width and distance values: stroke width/distance consistency
♦ Edge information from the Canny detector: stroke spatial layout
■ In total, 9 features construct a 9 x 9 matrix
■ The matrix is transformed into a 45-dim feature vector
■ Component confidence maps are obtained by an RF classifier
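A sketch of the component-level descriptor: stack the nine per-pixel features listed above for the pixels of one component, take their 9 x 9 covariance, and keep the upper triangle as a 45-dim vector. The slide does not say how the distance value is computed, so `distance` is simply an input map; the feature order and the use of `np.cov` (which normalizes by n - 1) are assumptions.

```python
import numpy as np

def component_tcd(mask, gray, rgb, stroke_width, distance, edges):
    """Hypothetical 9-feature covariance descriptor for one component.
    mask: boolean component mask; the other inputs are per-pixel maps of the
    same height/width (rgb has a trailing channel axis)."""
    ys, xs = np.nonzero(mask)
    feats = np.column_stack([
        xs, ys,                                           # spatial information
        gray[ys, xs],                                     # intensity
        rgb[ys, xs, 0], rgb[ys, xs, 1], rgb[ys, xs, 2],   # color
        stroke_width[ys, xs],                             # stroke width
        distance[ys, xs],                                 # distance value
        edges[ys, xs],                                    # Canny edge indicator
    ]).astype(float)
    cov = np.cov(feats, rowvar=False)       # 9 x 9, symmetric
    iu = np.triu_indices(cov.shape[0])      # upper triangle incl. diagonal
    return cov[iu]                          # 9 * 10 / 2 = 45 values
```

Each component then yields one fixed-length vector regardless of its size, which is what makes it usable as input to a standard classifier such as a Random Forest (for example scikit-learn's `RandomForestClassifier`).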
III. Text Covariance Descriptor (TCD) (3)
■ TCD for text lines
♦ Mean properties of component features: uniformity
♦ Coordinates of component centers: spatial information
♦ Heights of components: consistency
♦ Horizontal distances between components: text spatial layout
♦ 16-bin HOG on edge pixels: oriented spatial features
■ 12 x 12 covariance features
■ 16 x 16 covariance features
■ Text-line confidence maps are obtained by an RF classifier
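The "16-bin HOG on edge pixels" feature can be approximated by a single orientation histogram over the edge pixels of a text line. This is a simplified HOG-style sketch, not full HOG (no cells or block normalization); magnitude weighting, the [0, 2π) range and L1 normalization are assumptions.

```python
import numpy as np

def edge_orientation_histogram(gray, edges, bins=16):
    """Histogram of gradient orientations at edge pixels, weighted by
    gradient magnitude and L1-normalized (assumptions)."""
    gy, gx = np.gradient(gray.astype(float))
    theta = np.mod(np.arctan2(gy, gx), 2 * np.pi)[edges]   # angles in [0, 2*pi)
    mag = np.hypot(gx, gy)[edges]
    hist, _ = np.histogram(theta, bins=bins, range=(0.0, 2 * np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist

ramp = np.tile(np.arange(10.0), (10, 1))   # intensity increases along x
print(edge_orientation_histogram(ramp, np.ones((10, 10), bool))[0])   # → 1.0
```

A horizontal intensity ramp puts all of its mass in bin 0; text lines, whose strokes mix vertical, horizontal and curved edges, tend to spread mass over several bins.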
III. Text Covariance Descriptor (TCD) (4)
■ Component and text-line confidence maps
III. Text Covariance Descriptor (TCD) (5)
■ Top: TCD for components; middle: TCD for text lines; bottom: detection
III. Text Covariance Descriptor (TCD) (6)
■ Results
■ Failure cases
W. Huang, Z. Lin, J. Yang and J. Wang, Text localization in natural images using stroke feature transform and text covariance descriptors, ICCV, 2013.
V. Convolution Neural Network Induced MSER Trees (1)
■ Maximally Stable Extremal Region (MSER) Tree
L. Neumann and J. Matas, Text localization in real-world images using efficiently pruned exhaustive search, ICDAR, 2011.
■ MSER vs. SWT
♦ Detects low-quality text → higher recall
♦ Generates more non-text components → lower precision
♦ Requires a more powerful classifier/filter
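The stability criterion behind MSERs can be illustrated with a deliberately naive sketch: threshold the image at every gray level and track the dark extremal region that contains one seed pixel. Because the regions grow monotonically with the threshold, they are nested, and this nesting is what forms the MSER tree. `delta` and the seed are illustrative parameters; practical implementations (for example OpenCV's `cv2.MSER_create`) use a much faster union-find algorithm.

```python
import numpy as np
from collections import deque

def region_area(binary, seed):
    """Area of the 4-connected component of `binary` containing `seed`
    (0 if the seed pixel is off)."""
    if not binary[seed]:
        return 0
    h, w = binary.shape
    seen = np.zeros(binary.shape, dtype=bool)
    seen[seed] = True
    queue, area = deque([seed]), 0
    while queue:
        y, x = queue.popleft()
        area += 1
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                seen[ny, nx] = True
                queue.append((ny, nx))
    return area

def stability_profile(gray, seed, delta=5, levels=range(256)):
    """q(t) = (A(t + delta) - A(t - delta)) / A(t), where A(t) is the area of
    the dark extremal region {gray <= t} containing the seed. Maximally
    stable regions sit at local minima of q(t)."""
    areas = {t: region_area(gray <= t, seed) for t in levels}
    return {t: (areas[t + delta] - areas[t - delta]) / areas[t]
            for t in levels
            if t - delta in areas and t + delta in areas and areas[t] > 0}

g = np.full((20, 20), 200, dtype=np.uint8)
g[5:11, 5:11] = 50                          # a dark 6 x 6 "character"
q = stability_profile(g, (7, 7))
print(q[100], q[198] > 0)                   # → 0.0 True
```

The dark square keeps the same area over a wide range of thresholds (q = 0) and only merges with the background near its gray level of 200, which is exactly the "stable" behavior MSER detection looks for.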
V. Convolution Neural Network Induced MSER Trees (2)
■ A two-layer Convolution Neural Network (CNN)
T. Wang, D. J. Wu, A. Coates and A. Y. Ng, End-to-end text recognition with convolutional neural networks, ICPR, 2012.
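The shape flow of a small two-layer convolutional network scoring a 32x32 patch as text or non-text can be sketched in NumPy. Everything here is a hypothetical choice for illustration: the filter counts and sizes, ReLU, 2x2 average pooling, the sigmoid output and the random, untrained weights are not the configuration of Wang et al. or of the ECCV 2014 paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_valid(x, w, b):
    """Multi-channel 'valid' convolution (cross-correlation).
    x: (C, H, W), w: (K, C, kh, kw), b: (K,) -> (K, H-kh+1, W-kw+1)."""
    kh, kw = w.shape[2:]
    win = sliding_window_view(x, (kh, kw), axis=(1, 2))   # (C, H', W', kh, kw)
    return np.einsum("chwij,kcij->khw", win, w) + b[:, None, None]

def avg_pool(x, s=2):
    """Non-overlapping s x s average pooling over each channel."""
    k, h, w = x.shape
    x = x[:, : h - h % s, : w - w % s]
    return x.reshape(k, h // s, s, w // s, s).mean(axis=(2, 4))

def text_score(patch, params):
    """conv -> ReLU -> pool -> conv -> ReLU -> pool -> linear -> sigmoid."""
    h = avg_pool(np.maximum(conv_valid(patch[None], params["w1"], params["b1"]), 0))
    h = avg_pool(np.maximum(conv_valid(h, params["w2"], params["b2"]), 0))
    z = h.ravel() @ params["w3"] + params["b3"]
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
params = {  # hypothetical sizes, random (untrained) weights
    "w1": rng.normal(0, 0.1, (4, 1, 5, 5)), "b1": np.zeros(4),   # 32 -> 28 -> 14
    "w2": rng.normal(0, 0.1, (8, 4, 5, 5)), "b2": np.zeros(8),   # 14 -> 10 -> 5
    "w3": rng.normal(0, 0.1, 8 * 5 * 5), "b3": 0.0,
}
print(text_score(rng.random((32, 32)), params))
```

After training, a score like this would be computed for every MSER candidate, which is what lets a learned, high-level classifier replace the hand-tuned heuristics.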
V. Convolution Neural Network Induced MSER Trees (3)
■ Training data: 15,000 synthetic samples
■ Data transformation
♦ Fixed size of 32x32
♦ Horizontal warp
♦ Include additional image context
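One possible reading of the "fixed size" and "additional image context" bullets is sketched below: enlarge a component's bounding box by a margin, clip it to the image, and warp the crop to 32x32 without preserving the aspect ratio. The `context` fraction and the nearest-neighbour resampling are assumptions; the slide does not specify the exact warping used.

```python
import numpy as np

def crop_with_context(image, bbox, context=0.1, size=32):
    """Crop a bounding box enlarged by a `context` fraction on each side and
    warp it to size x size with nearest-neighbour sampling.
    bbox = (top, left, height, width). The aspect ratio is not preserved."""
    top, left, h, w = bbox
    H, W = image.shape[:2]
    mh, mw = int(round(context * h)), int(round(context * w))
    y0, y1 = max(0, top - mh), min(H, top + h + mh)
    x0, x1 = max(0, left - mw), min(W, left + w + mw)
    crop = image[y0:y1, x0:x1]
    rows = (np.arange(size) * crop.shape[0] / size).astype(int)
    cols = (np.arange(size) * crop.shape[1] / size).astype(int)
    return crop[rows[:, None], cols[None, :]]
```

Warping every candidate to the same input size is what lets a single CNN score MSERs of arbitrary shape; the extra margin gives the network some surrounding pixels to judge whether the region is a whole character.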
V. Convolution Neural Network Induced MSER Trees (3)
■ CNN confidence scores
MSERs → CNN scores → Comp. splitting → Detection
V. Convolution Neural Network Induced MSER Trees (4)
■ Component splitting of erroneously connected components
■ High aspect ratio
■ Positive conf. score
■ Leaf of the MSER tree, or conf. score > all children
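One reading of the splitting rules is sketched below. A component with a high aspect ratio and a positive score is treated as erroneously connected and replaced by nodes from its MSER subtree; a node is kept if it is a leaf or its score exceeds the scores of all of its children, and otherwise the search descends. The `Node` fields and the `max_aspect` threshold are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """An MSER-tree node with its CNN confidence score (hypothetical fields)."""
    name: str
    score: float           # signed CNN confidence, > 0 means "text"
    aspect_ratio: float    # bounding-box width / height
    children: List["Node"] = field(default_factory=list)

def select(node):
    """Keep a node if it is a leaf or its score exceeds all of its
    children's scores; otherwise descend into the children."""
    if not node.children or node.score > max(c.score for c in node.children):
        return [node]
    return [n for c in node.children for n in select(c)]

def split_component(node, max_aspect=2.0):
    """Split an erroneously connected component (high aspect ratio and
    positive score) into sub-components chosen from its MSER subtree."""
    if node.aspect_ratio > max_aspect and node.score > 0 and node.children:
        return [n for c in node.children for n in select(c)]
    return [node]

b = Node("B", 0.5, 2.5, [Node("B1", 0.7, 0.8), Node("B2", 0.6, 0.9)])
root = Node("root", 0.8, 4.0, [Node("A", 0.9, 1.0), b])
print([n.name for n in split_component(root)])   # → ['A', 'B1', 'B2']
```

Node B is not kept because one of its children scores higher than it does, so the search continues down to B1 and B2, mirroring the "conf. score > all children" rule.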
V. Convolution Neural Network Induced MSER Trees (5)
■ Comparisons with SFT-TCD
V. Convolution Neural Network Induced MSER Trees (6)
■ Results
V. Convolution Neural Network Induced MSER Trees (7)
■ Results on the ICDAR 2011 database
W. Huang, Y. Qiao and X. Tang, Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees, ECCV, 2014.
The End
Thank You!