
Lukáš Neumann and Jiří Matas
Centre for Machine Perception, Department of Cybernetics
Czech Technical University, Prague
Problem Introduction
Contributions:
1. Text Fragments – Generalization of character detection
2. Stroke Support Pixels
3. Text-line Resegmentation
Experiments
Conclusion
2015.08.25 Neumann, Matas, ICDAR 2015

Text
◦ Anything that can be represented as a sequence of Unicode characters
Scene Text (Text in the Wild)


Image/video taken by a camera
Typically short snippet(s) of text, arbitrary script and orientation, non-standard fonts, out-of-vocabulary words, complex backgrounds
[Figure: examples of “Text in the wild” and “Other text”]

Region-based methods assume: one region (connected component) represents one character

We generalize this assumption by detecting arbitrary Text Fragments in a single pass

Text Fragment
◦ Part of a Character
◦ Character
◦ Group of Characters
◦ Word


Text Fragments in the majority of scripts and fonts share the “strokeness” property
This observation was popularized in the Stroke Width Transform [1] to detect individual characters
[1] B. Epshtein et al., “Detecting text in natural scenes with stroke width transform,” in CVPR 2010


Text Fragment candidates detected as MSERs over multiple scales and color projections
MSERs classified as either
◦ Character (character or a character part)
◦ Multi-character (group of characters or words)
◦ Background



Characters and multi-characters grouped into text lines with an efficient exhaustive search strategy [2]
Each text line is refined using a local text model
Character segmentations are recognized using an OCR module trained on synthetic data [3]
[2] L. Neumann, J. Matas, “Text localization in real-world images using efficiently pruned exhaustive search,” in ICDAR 2011
[3] L. Neumann, J. Matas, “On combining multiple segmentations in scene text recognition,” in ICDAR 2013
Area $A$ of a stroke is approximately equal to the product of the stroke axis length $s_l$ and the stroke width $s_w$: $A \approx s_l \cdot s_w$
Stroke area ratio $A_s / A$ is a very discriminative feature to eliminate non-text regions
A character can be “drawn” by a circular brush with a possibly changing diameter $d_i$, equal to the stroke width $s_w$, sweeping a curve $S$, the stroke axis
The non-constant diameter models characters made of …
[Figure: stroke axis $S$ of length $s_l$, brush diameter $d_i = s_w$]
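The brush model can be checked numerically. The sketch below is plain Python and not part of the presented method: the helper names `point_segment_distance` and `draw_stroke`, the image size, and the pixel-centre rasterization rule are all illustrative assumptions. It sweeps a circular brush of constant diameter along a straight axis and compares the resulting pixel count with the product of axis length and stroke width.

```python
import math

def point_segment_distance(px, py, x0, y0, x1, y1):
    """Euclidean distance from point (px, py) to the segment (x0, y0)-(x1, y1)."""
    dx, dy = x1 - x0, y1 - y0
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - x0, py - y0)
    # Project the point onto the segment and clamp to its end points.
    t = ((px - x0) * dx + (py - y0) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (x0 + t * dx), py - (y0 + t * dy))

def draw_stroke(axis, diameter, width, height):
    """Binary mask of all pixels whose centre lies within diameter/2 of the axis polyline."""
    radius = diameter / 2.0
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for (x0, y0), (x1, y1) in zip(axis, axis[1:]):
                if point_segment_distance(x + 0.5, y + 0.5, x0, y0, x1, y1) <= radius:
                    mask[y][x] = 1
                    break
    return mask

axis = [(10, 20), (50, 20)]   # straight stroke axis S (hypothetical example)
stroke_width = 4              # s_w, constant brush diameter
axis_length = sum(math.hypot(x1 - x0, y1 - y0)
                  for (x0, y0), (x1, y1) in zip(axis, axis[1:]))  # s_l

mask = draw_stroke(axis, stroke_width, width=60, height=40)
pixel_area = sum(map(sum, mask))          # A, number of region pixels
model_area = axis_length * stroke_width   # s_l * s_w
print(pixel_area, model_area, pixel_area / model_area)
```

The pixel area comes out slightly larger than the product because of the two rounded end caps of the brush, which is why the relation is stated as approximate.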




The stroke is “in the mind of the writer” (it could easily be found in an online handwriting setup)
The Stroke Support Pixels (SSP) are a subset of pixels that lie on the stroke (but unlike a skeleton, the subset does not have to be continuous)
The subset is found as local maxima in a region’s distance map
Stroke area discretization effects are compensated by weighting all SSPs in a 3x3 neighborhood
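A minimal sketch of how Stroke Support Pixels could be extracted and turned into an approximate stroke area, in plain Python. Several choices here are assumptions of the sketch rather than details given on the slides: the city-block distance map, the non-strict local-maximum test in the 8-neighborhood, the stroke width estimate 2·d at each SSP, and the weight 3/N (N being the number of SSPs in the 3x3 neighborhood, including the pixel itself). All function names are hypothetical.

```python
from collections import deque

def distance_map(mask):
    """City-block distance of each region pixel to the nearest background pixel.
    Pixels outside the image count as background; background pixels get 0."""
    h, w = len(mask), len(mask[0])
    dist = [[0] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    dist[y][x] = 1          # region boundary pixel
                    queue.append((y, x))
                    break
    while queue:                            # breadth-first propagation inwards
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and dist[ny][nx] == 0:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist

def neighbours8(y, x, h, w):
    """In-bounds 8-neighbours of pixel (y, x)."""
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if (dy or dx) and 0 <= y + dy < h and 0 <= x + dx < w:
                yield y + dy, x + dx

def stroke_support_pixels(mask):
    """Map (y, x) -> distance for every local maximum of the distance map."""
    dist = distance_map(mask)
    h, w = len(mask), len(mask[0])
    return {(y, x): dist[y][x]
            for y in range(h) for x in range(w)
            if dist[y][x] > 0
            and all(dist[ny][nx] <= dist[y][x] for ny, nx in neighbours8(y, x, h, w))}

def stroke_area_ratio(mask):
    """Approximate A_s / A, with each SSP weighted by 3 / N (assumed weighting)."""
    h, w = len(mask), len(mask[0])
    ssp = stroke_support_pixels(mask)
    area = sum(map(sum, mask))
    stroke_area = 0.0
    for (y, x), d in ssp.items():
        n = 1 + sum((ny, nx) in ssp for ny, nx in neighbours8(y, x, h, w))
        stroke_area += 2 * d * 3.0 / n
    return stroke_area / area

bar = [[1] * 40 for _ in range(4)]      # stroke-like region: 40 x 4 pixels
blob = [[1] * 20 for _ in range(20)]    # non-stroke region: 20 x 20 pixels
print(stroke_area_ratio(bar), stroke_area_ratio(blob))
```

Under these assumptions the elongated bar yields a ratio close to 1, while the filled square yields a much smaller value, which illustrates why the ratio separates stroke-like regions from compact clutter.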

Less sensitive to discretization effects and scale change than standard skeleton algorithms; detection is trivial

Polynomial SVM with features:
◦ Approx. Stroke Area Ratio $A_s / A$
◦ Aspect Ratio* $w / h$
◦ Compactness $\sqrt{A} / P$ (P .. perimeter)
◦ Approx. Convex Hull Area Ratio $A_C / A$
◦ Holes Area Ratio $A_H / A$
Each feature is computed in O(1) or O(N) (N .. number of pixels)
* the only feature that is not rotation invariant; replaced in current work to achieve full rotation invariance
Output classes: Character/Fragment, Multi-character, Background
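The geometric features named above can be illustrated on a small binary region. This is a sketch in plain Python under stated assumptions, not the authors' implementation: the perimeter is taken as the number of pixel edges bordering background, the convex hull is computed on pixel corner points, holes are background pixels not reachable from the image border, and the names `region_features`, `convex_hull` and `polygon_area` are hypothetical. The stroke area ratio and the SVM itself are not shown.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices without collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    """Shoelace formula."""
    return abs(sum(x0 * y1 - x1 * y0
                   for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]))) / 2.0

def region_features(mask):
    h, w = len(mask), len(mask[0])
    pixels = [(x, y) for y in range(h) for x in range(w) if mask[y][x]]
    area = len(pixels)                                   # A
    xs, ys = [x for x, _ in pixels], [y for _, y in pixels]
    box_w, box_h = max(xs) - min(xs) + 1, max(ys) - min(ys) + 1

    def is_background(x, y):
        return not (0 <= x < w and 0 <= y < h) or not mask[y][x]

    # P: pixel edges shared with background (assumed perimeter definition).
    perimeter = sum(is_background(x + dx, y + dy)
                    for x, y in pixels
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)))

    # A_H: background pixels that cannot be reached from the image border.
    outside = set()
    stack = [(x, y) for x in range(w) for y in (0, h - 1)]
    stack += [(x, y) for y in range(h) for x in (0, w - 1)]
    while stack:
        x, y = stack.pop()
        if (x, y) in outside or not (0 <= x < w and 0 <= y < h) or mask[y][x]:
            continue
        outside.add((x, y))
        stack.extend(((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)))
    holes_area = w * h - area - len(outside)

    # A_C: area of the convex hull of all pixel corner points.
    corners = [(x + dx, y + dy) for x, y in pixels for dx in (0, 1) for dy in (0, 1)]
    hull_area = polygon_area(convex_hull(corners))

    return {
        "aspect_ratio": box_w / box_h,
        "compactness": math.sqrt(area) / perimeter,
        "convex_hull_area_ratio": hull_area / area,
        "holes_area_ratio": holes_area / area,
    }

# Hypothetical example: a ring-shaped region, loosely resembling the letter "O".
ring = [[0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 1, 1, 0],
        [0, 1, 0, 0, 0, 1, 0],
        [0, 1, 0, 0, 0, 1, 0],
        [0, 1, 1, 1, 1, 1, 0],
        [0, 0, 0, 0, 0, 0, 0]]
features = region_features(ring)
print(features)
```

In this sketch the hole and convex hull computations visit every pixel of the region's image patch, whereas the bounding box only needs its four extreme coordinates.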

Key feature in the classification
Works for a wide variety of scripts and fonts

Example: MSERs 460
[Figure legend: Character, Multi-character, Non-character MSER]


Not all characters (even their fragments or groups) are detected as MSERs
Characters which are detected can have many different segmentations (over-complete representation)
The detected Text Fragments are used to initialize a hypotheses-verification iterative process
◦ For each text line, a local color model is iteratively updated using a standard graph cut framework
◦ The graph cut is initialized using the stroke support pixels
After every iteration:
• the text box position is re-estimated
• connected components are classified (character, multi, non-char)
[Figure panels: Source Image, MSER detection, Initialization in stroke support pixels, Iteration #1, Iteration #2, Final iteration (#6)]
[Figure columns: Source Image, Text Fragment detection, Final Segmentation. Examples: Latin (stencil) and Hebrew script]
[Figure columns: Source Image, Text Fragment detection, Final Segmentation. Examples: Indian (Kannada), “Latin”, and Armenian script]
ICDAR 2013 Dataset – Text Localization

pipeline                     recall (%)   precision (%)   f (%)
Proposed method              72.4         81.8            77.1
Yin et al. [4]               68.3         86.3            76.2
TexStar (ICDAR’13 winner)    66.4         88.5            75.9
our previous method [3]      64.8         87.5            74.5
Kim (ICDAR’11 winner)        62.5         83.0            71.3

[4] X.-C. Yin, X. Yin, K. Huang, and H.-W. Hao, “Robust text detection in natural scene images,” in TPAMI 2014
[Figure: example results with the text TAXI, iMac, CARLING, THE DOLLAR ARMS, D8LL, PANTENE PROV]





Arbitrary Text Fragments detected in a single pass
An efficiently calculated “strokeness” feature exploited to discriminate between Text Fragments and background clutter
Detected Text-lines are refined by resegmentation in a hypotheses-verification iterative process that exploits local text line properties
Competitive results with the state-of-the-art
Online demo available at http://www.textspotter.org/
Thank you for your attention!
http://www.TextSpotter.org/