Ralph Niels, Franc Grootjen & Louis Vuurpijl
Writer identification through
information retrieval
A search engine for forensic experts
Overview
• Forensic writer identification
• Prototypical shapes in handwriting
• Information retrieval (IR)
– Traditional
– Writer identification using prototypes
• Experiments
– Method
– Results
• Conclusions & future work
Forensic writer identification
Forensic information retrieval
• Web search: query of words to search in documents containing words
• Forensic search: query of characters to search in documents containing characters
• Previous work*: sub-character level, binary features
• Based on characters: improves justification possibilities
* A. Bensefia, T. Paquet, and L. Heutte. A writer identification and verification system. Pattern Recognition Letters, 26(13):2080–2092, 2005.
Forensic information retrieval
• Dictionary of character shapes: prototypes
– Experts use prototypes
– Describe query & documents by prototype usage
[Figure: prototypes and instances of a prototype]
Character to prototype matcher
• Find most similar prototype for each character
[Figure: prototypes for the letter "a", e.g. a5, a9, a16, a52 (…)]
Example matcher output: W48 h16 a9 t1 y2 o1 u23 d16 i25 d12 i6 s12 (…)
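Once every character has been mapped to a prototype label, a piece of handwriting can be described by how often each prototype occurs. A minimal sketch of that counting step, assuming the labels arrive as whitespace-separated strings; the function name `prototype_frequencies` is hypothetical and not taken from the slides:

```python
from collections import Counter

def prototype_frequencies(labels):
    """Count how often each prototype label occurs in a matched character sequence."""
    return Counter(labels)

# Matched sequence shown on the slide (the trailing "(…)" is left out).
matched = "W48 h16 a9 t1 y2 o1 u23 d16 i25 d12 i6 s12".split()
usage = prototype_frequencies(matched)
```

A `Counter` returns 0 for prototypes that were never used, which is convenient for sparse usage vectors.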
Prototypes
• Averaged shapes of real handwritten characters
• Dynamic Time Warping (DTW) distance to find most similar prototype
R. Niels, L. Vuurpijl & L. Schomaker. Automatic allograph matching in forensic writer identification. International Journal of Pattern Recognition and Artificial Intelligence, 21(1):61–81, February 2007.
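The nearest-prototype lookup can be sketched with a textbook DTW. This is an illustration only: it assumes characters and prototypes are lists of (x, y) pen coordinates and uses plain Euclidean point cost, whereas the DTW variant in the cited paper may use different features and constraints. The names `dtw_distance` and `nearest_prototype` and the toy prototypes are hypothetical.

```python
import math

def dtw_distance(a, b):
    """Textbook DTW between two sequences of (x, y) points, Euclidean point cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i points of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def nearest_prototype(character, prototypes):
    """Return the label of the prototype with the smallest DTW distance."""
    return min(prototypes, key=lambda label: dtw_distance(character, prototypes[label]))

# Toy data: two made-up prototype trajectories and one written character.
prototypes = {
    "a5": [(0, 0), (1, 1), (2, 0)],
    "a9": [(0, 0), (1, -1), (2, 0)],
}
character = [(0, 0), (0.5, 0.6), (1, 1.1), (2, 0.1)]
best = nearest_prototype(character, prototypes)
```

DTW tolerates differences in the number of sample points, which is why the character (four points) can be compared with three-point prototypes.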
The IR model for writer identification
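[Diagram: Writer input -> Character to prototype matcher -> af(w) -> Indexing -> aw(w) -> Matching. Query input -> Character to prototype matcher -> af(q) -> Matching. The Prototype list feeds the matcher; Matching produces the Ranked list and the Justification.]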
Indexing: create weighted vectors
• Vector of prototype usage for each writer: af(w)
• Adjust weight of prototypes in that vector:
– Prototypes used by many writers: not distinctive -> lower weight
– wf(p) = number of writers using prototype p
– $\mathrm{iwf}(p) = \log_2\left(\frac{n}{\mathrm{wf}(p)}\right)$, with n presumably the total number of writers
• Weighted vector of prototype use for each writer:
$\mathrm{aw}(w)_p = \mathrm{af}(w)_p \cdot \mathrm{iwf}(p)$
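The indexing step can be sketched as follows, assuming af(w) is stored as a `Counter` per writer and that n is the number of writers in the database (an assumption; the slide does not define n). The function name `index_writers` and the two toy writers are hypothetical.

```python
import math
from collections import Counter

def index_writers(af):
    """af maps writer -> Counter of prototype usage. Returns (iwf, aw)."""
    n = len(af)  # assumption: n is the number of writers in the database
    wf = Counter()
    for usage in af.values():
        # Each writer counts once per prototype, however often it is used.
        wf.update(p for p, count in usage.items() if count > 0)
    iwf = {p: math.log2(n / wf[p]) for p in wf}
    aw = {w: {p: count * iwf[p] for p, count in usage.items()}
          for w, usage in af.items()}
    return iwf, aw

# Toy database: both writers use t1, so it carries no weight.
af = {
    "w1": Counter(a5=3, t1=2),
    "w2": Counter(a9=4, t1=1),
}
iwf, aw = index_writers(af)
```

In this toy database t1 is used by every writer, so iwf(t1) = log2(2/2) = 0 and it drops out of both weighted vectors, while a5 and a9 keep weight log2(2/1) = 1.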
The IR model for writer identification
• af(q): prototype frequency in query
Matching
• Input
– ‘Database writers’: indexed writer vectors aw(w)
– ‘Query writer’: vector af(q)
• Match
– Calculate cosine of angle between af(q) and each aw(w)
• Output
– Ranked list of writers (similarity to query)
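The matching step is a cosine ranking over sparse vectors. A minimal sketch, assuming vectors are dictionaries keyed by prototype label; `cosine`, `rank_writers`, and the toy vectors are hypothetical names and data:

```python
import math

def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as matching nothing
    return dot / (norm_u * norm_v)

def rank_writers(af_q, aw):
    """Return (writer, similarity) pairs, most similar writer first."""
    scores = {w: cosine(af_q, vec) for w, vec in aw.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy indexed database and query.
aw = {
    "w1": {"a5": 3.0, "t1": 0.0},
    "w2": {"a9": 4.0, "t1": 0.0},
}
af_q = {"a5": 2, "t1": 1}
ranking = rank_writers(af_q, aw)
```

Because the cosine normalises both vectors, a short query and a long database sample can be compared directly.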
Justification
• Similarity value (cosine of angle)
• Prototype contribution to retrieval result
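One natural way to expose the contribution of each prototype is to split the cosine into its per-prototype terms, which sum back to the similarity value. The slides do not say how the contribution is actually computed, so this decomposition is an assumption, as are the name `prototype_contributions` and the toy vectors.

```python
import math

def prototype_contributions(af_q, aw_w):
    """Split cos(af_q, aw_w) into one term per shared prototype, largest first."""
    norm = (math.sqrt(sum(x * x for x in af_q.values()))
            * math.sqrt(sum(x * x for x in aw_w.values())))
    if norm == 0:
        return {}
    terms = {p: af_q[p] * aw_w[p] / norm for p in af_q if p in aw_w}
    return dict(sorted(terms.items(), key=lambda item: item[1], reverse=True))

# Toy query and one retrieved writer.
af_q = {"a5": 2, "t1": 1, "h16": 1}
aw_w = {"a5": 3.0, "h16": 1.0, "d12": 2.0}
contributions = prototype_contributions(af_q, aw_w)
similarity = sum(contributions.values())
```

Here a5 accounts for most of the similarity, which is the kind of statement an expert could then check against the actual character shapes.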
Justification
• Forensic expert can further inspect justification
Experiment
• 43 writers from the plucoll database
– Online data
– Segmented into characters
• How well does our technique perform given a certain amount of data (characters)?
– Number of characters in database (d)
– Number of characters in query (q)
Experiment
(Repeat 10 times for each combination of d and q)
• Pick d random letters from each database writer
• Pick q random other letters from one writer, and use those as query (repeat 10 times for each writer)
• Find most similar writer
– Prototypes
– iwf(p), aw(w)
– Matching
• Vary d and q
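The protocol can be sketched as a small simulation. This is a simplified illustration, not the experiment code: it assumes each writer's characters are already mapped to prototype labels, it redraws the database sample for every trial rather than once per outer repetition, and all names (`run_trial`, `accuracy`, the toy writers) are hypothetical.

```python
import math
import random
from collections import Counter

def cosine(u, v):
    dot = sum(x * v.get(p, 0.0) for p, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def run_trial(labels_by_writer, query_writer, d, q, rng):
    """One trial: True if the query writer is ranked first."""
    af, query = {}, None
    for w, labels in labels_by_writer.items():
        shuffled = rng.sample(labels, len(labels))
        af[w] = Counter(shuffled[:d])                 # d database letters
        if w == query_writer:
            query = Counter(shuffled[d:d + q])        # q other letters as query
    n = len(af)
    wf = Counter(p for usage in af.values() for p in usage)
    iwf = {p: math.log2(n / wf[p]) for p in wf}
    aw = {w: {p: c * iwf[p] for p, c in usage.items()} for w, usage in af.items()}
    best = max(aw, key=lambda w: cosine(query, aw[w]))
    return best == query_writer

def accuracy(labels_by_writer, d, q, repeats=10, seed=0):
    """Percentage of trials in which the query writer is ranked first."""
    rng = random.Random(seed)
    trials = [run_trial(labels_by_writer, w, d, q, rng)
              for _ in range(repeats) for w in labels_by_writer]
    return 100.0 * sum(trials) / len(trials)

# Toy writers: one distinctive prototype each, plus a little shared t1.
toy = {
    "w1": ["a5"] * 40 + ["t1"] * 5,
    "w2": ["a9"] * 40 + ["t1"] * 5,
    "w3": ["a16"] * 40 + ["t1"] * 5,
}
score = accuracy(toy, d=20, q=10)
```

The toy writers are deliberately easy to separate, so the score says nothing about real handwriting; the point is only the sampling and scoring loop.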
Results
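Identification performance (%) by number of characters in query (q, rows) and in database (d, columns)

q \ d    100    300    500   1000
10        59     79     83     88
30        86     97     99    100
50        94     99    100    100
70        96    100    100    100
100       98    100    100    100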
Conclusions & future work
• Needed for 100%: 70 chars (q), 300 chars (d)
– Average English sentence: 75-100 characters
• No black box: results are justified
• Online data: forensic practice?
– Extract semi-automatically with help from an expert
– Use offline matching technique
• Just 43 writers
– Larger experiments (more writers, more techniques) planned
• Promising results