Graph Analysis Matching Program
Burdette Pixton
Record Linkage
Object Identification Problem
Identifies possible links in pedigrees
Advantages
Compress search results
Merge pedigrees
Discover missing information
Record Linkage
Manual
Time consuming
Error prone
N^2
Blocking
n^2
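A minimal sketch of why blocking helps: comparing every record with every other grows quadratically, while comparing only records that share a blocking key keeps the candidate set small. The blocking key used here (surname plus birth year), the helper names, and Lucy Paul's birth year are assumptions made for illustration, not details from the slides.

```python
from collections import defaultdict
from itertools import combinations

def all_pairs(records):
    """Naive comparison: every record against every other record."""
    return list(combinations(range(len(records)), 2))

def blocked_pairs(records, key):
    """Compare only records that fall into the same block."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[key(rec)].append(idx)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs

# Hypothetical records; birth years are simplified to integers.
records = [
    {"name": "James Paul", "birth_year": 1804},
    {"name": "J. Paul", "birth_year": 1804},
    {"name": "Howard Paul", "birth_year": 1789},
    {"name": "H. Paul", "birth_year": 1789},
    {"name": "Lucy Paul", "birth_year": 1830},  # assumed year, illustration only
]

def surname_year(rec):
    """Assumed blocking key: lower-cased surname plus birth year."""
    return (rec["name"].split()[-1].lower(), rec["birth_year"])

print(len(all_pairs(records)))                    # 10 pairs for 5 records
print(len(blocked_pairs(records, surname_year)))  # 2 pairs
```

Within each block the comparison is still quadratic, so the saving depends on how small the blocks are.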
Record Linkage
Current Approaches
Naïve approach
Deterministic algorithms
Probabilistic Algorithms
Record Linkage
Field      Record 1                   Record 2
Name       James Paul                 J. Paul
Sex        M                          M
DOB        7/4/1804                   7/11/1804
POB        England                    England
DOD        8/15/1845                  8/15/1845
Parents    Howard Paul, Mary Jones    H. Paul, Mary Jones
Children   Lucy Paul                  Lucy Paul
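To make the example concrete, the sketch below compares the two records field by field with exact string equality, which is roughly what a strict deterministic rule would do. The dictionary layout and the function name `field_agreement` are assumptions for illustration.

```python
record_a = {
    "name": "James Paul", "sex": "M", "dob": "7/4/1804", "pob": "England",
    "dod": "8/15/1845", "parents": ("Howard Paul", "Mary Jones"),
    "children": ("Lucy Paul",),
}
record_b = {
    "name": "J. Paul", "sex": "M", "dob": "7/11/1804", "pob": "England",
    "dod": "8/15/1845", "parents": ("H. Paul", "Mary Jones"),
    "children": ("Lucy Paul",),
}

def field_agreement(a, b):
    """Map each field of a to True if b holds exactly the same value."""
    return {field: a[field] == b.get(field) for field in a}

agreement = field_agreement(record_a, record_b)
disagreeing = sorted(f for f, same in agreement.items() if not same)
print(disagreeing)  # ['dob', 'name', 'parents']
```

A rule that requires every field to agree would reject this pair, even though the disagreements are an abbreviation and a one-week date difference.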
Record Linkage
Problems with Current Standards
Uses Probabilistic Record Linkage Formula
Weights and thresholds are 10 years old
Depends on attributes of one record
Does not completely solve the missing-fields problem
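The slides do not give the formula itself. A common form of probabilistic record linkage (the Fellegi-Sunter style of log-likelihood weights) can be sketched as follows, assuming per-field probabilities m (agreement given a true match) and u (agreement given a non-match). All m and u values below are invented for illustration, and skipping missing fields with a zero weight is only one possible workaround.

```python
import math

# Hypothetical (m, u) probabilities per field; not taken from the slides.
M_U = {
    "name": (0.90, 0.01),
    "sex": (0.98, 0.50),
    "dob": (0.85, 0.001),
    "pob": (0.90, 0.20),
}

def field_weight(field, a_val, b_val):
    """Log-likelihood weight for one field; missing values contribute nothing."""
    if a_val is None or b_val is None:
        return 0.0
    m, u = M_U[field]
    if a_val == b_val:
        return math.log2(m / u)          # agreement weight (positive)
    return math.log2((1 - m) / (1 - u))  # disagreement weight (negative)

def match_score(a, b):
    """Sum of field weights using only the attributes of the two records."""
    return sum(field_weight(f, a.get(f), b.get(f)) for f in M_U)

record_a = {"name": "James Paul", "sex": "M", "dob": "7/4/1804", "pob": "England"}
record_b = {"name": "J. Paul", "sex": "M", "dob": "7/11/1804", "pob": "England"}

print(match_score(record_a, record_a))
print(match_score(record_a, record_b))
```

With these assumed values the pair scores well below an identical pair because name and DOB disagree, and the score uses nothing beyond the two records themselves.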
Record Linkage
Field      Record 1                   Record 2
Name       James Paul                 J. Paul
Sex        M                          M
DOB        7/4/1804                   7/11/1804
POB        England                    England
DOD        8/15/1845                  8/15/1845
Parents    Howard Paul, Mary Jones    H. Paul, Mary Jones
Children   Lucy Paul                  Lucy Paul
Thesis Statement
Graph-matching can enhance current record linkage techniques to find a smaller set of possible matches and have high precision.
GRAMP - Overview
Uses multiple records
Traverse two graphs in parallel
Continue until no more links can be followed
Compare related nodes to each other to get a similarity measurement
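The parallel traversal could be sketched as below, assuming each pedigree is a dictionary mapping a node id to its relationship links. Only parent links are followed here, because aligning multiple children needs an extra pairing step. The graph layout, node ids, and the function name `parallel_traverse` are assumptions, not GRAMP's actual data structures.

```python
from collections import deque

def parallel_traverse(graph_a, graph_b, start_a, start_b):
    """Collect node pairs reached by following the same relation in both graphs."""
    queue = deque([(start_a, start_b)])
    seen = set()
    pairs = []
    while queue:  # continue until no more links can be followed
        a, b = queue.popleft()
        if (a, b) in seen:
            continue
        seen.add((a, b))
        pairs.append((a, b))
        for relation in ("father", "mother"):
            next_a = graph_a[a].get(relation)
            next_b = graph_b[b].get(relation)
            # Follow a link only when it exists in both graphs.
            if next_a is not None and next_b is not None:
                queue.append((next_a, next_b))
    return pairs

# Hypothetical node ids built from the example records.
graph_a = {
    "james": {"father": "howard", "mother": "mary"},
    "howard": {"father": "louis"},  # mother unknown in this pedigree
    "mary": {},
    "louis": {},
}
graph_b = {
    "j_paul": {"father": "h_paul", "mother": "mary_j"},
    "h_paul": {"father": "louis_p", "mother": "michelle"},
    "mary_j": {},
    "louis_p": {},
    "michelle": {},
}

for pair in parallel_traverse(graph_a, graph_b, "james", "j_paul"):
    print(pair)
```

Each pair returned is a candidate for comparison; the unknown mother in the first pedigree simply ends that branch.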
Record Linkage
Field      Record 1                   Record 2
Name       James Paul                 J. Paul
Sex        M                          M
DOB        7/4/1804                   7/11/1804
POB        England                    England
DOD        8/15/1845                  8/15/1845
Parents    Howard Paul, Mary Jones    H. Paul, Mary Jones
Children   Lucy Paul                  Lucy Paul
Record Linkage
Field      Record 1          Record 2
Name       Howard Paul       H. Paul
Sex        M                 M
DOB        2/4/1789          2/4/1789
POB        England           England
DOD        1/13/1815         8/15/1845
Parents    Louis Paul, ??    Louis Paul, Michelle P.
Children   James Paul        J Paul
GRAMP
Determine probable matches
Traverse the graph
Weakness is decreased
Keep those with potential
Determine relationships
Compute similarity matches against both sets
Recursive calls
Combine Measurements
For each node in graph
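One way to read these steps is as a recursive score: compare the two nodes themselves, recurse into their related nodes, then combine the measurements. The sketch below assumes a simple exact-match attribute similarity and an equal-weight average with the relatives' scores. The weighting, the depth limit, and all names are assumptions, not the combination rule GRAMP actually uses.

```python
def attribute_similarity(a, b):
    """Fraction of fields present in both records that agree exactly."""
    shared = [f for f in a["fields"] if f in b["fields"]]
    if not shared:
        return 0.0
    return sum(a["fields"][f] == b["fields"][f] for f in shared) / len(shared)

def graph_similarity(graph_a, graph_b, a, b, depth=2, weight=0.5):
    """Combine a node pair's own similarity with that of its related nodes."""
    own = attribute_similarity(graph_a[a], graph_b[b])
    if depth == 0:
        return own
    relative_scores = []
    for relation in ("father", "mother"):
        next_a = graph_a[a].get(relation)
        next_b = graph_b[b].get(relation)
        if next_a is not None and next_b is not None:
            relative_scores.append(
                graph_similarity(graph_a, graph_b, next_a, next_b, depth - 1, weight)
            )
    if not relative_scores:
        return own
    return (1 - weight) * own + weight * sum(relative_scores) / len(relative_scores)

graph_a = {
    "james": {"fields": {"name": "James Paul", "sex": "M", "dob": "7/4/1804",
                         "pob": "England", "dod": "8/15/1845"},
              "father": "howard", "mother": "mary"},
    "howard": {"fields": {"name": "Howard Paul", "sex": "M", "dob": "2/4/1789",
                          "pob": "England", "dod": "1/13/1815"}},
    "mary": {"fields": {"name": "Mary Jones"}},
}
graph_b = {
    "j_paul": {"fields": {"name": "J. Paul", "sex": "M", "dob": "7/11/1804",
                          "pob": "England", "dod": "8/15/1845"},
               "father": "h_paul", "mother": "mary_j"},
    "h_paul": {"fields": {"name": "H. Paul", "sex": "M", "dob": "2/4/1789",
                          "pob": "England", "dod": "8/15/1845"}},
    "mary_j": {"fields": {"name": "Mary Jones"}},
}

own_only = attribute_similarity(graph_a["james"], graph_b["j_paul"])
combined = graph_similarity(graph_a, graph_b, "james", "j_paul")
print(own_only, combined)
```

In this toy data the pair's own similarity is 0.6, and the agreeing parents lift the combined score to roughly 0.7, which illustrates how surrounding nodes can support a match.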
GRAMP
Testing
Records with Errors
Records without Errors
Random set of Records
Expected Results
Perform similarly or better, with smaller blocks
Slow
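The expected results could be checked with the usual precision and recall measures over predicted versus true links. This is a generic evaluation sketch; the pair ids below are made up and are not results from the presentation.

```python
def precision_recall(predicted, actual):
    """Precision and recall of predicted links against the known true links."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical record-id pairs, for illustration only.
predicted_links = {(1, 2), (3, 4), (5, 6)}
true_links = {(1, 2), (3, 4), (7, 8)}

precision, recall = precision_recall(predicted_links, true_links)
print(len(predicted_links), precision, recall)
```

The size of the predicted set stands in for the block size, so a smaller set with equal or higher precision would support the thesis.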
Contributions
Provides a useful tool for genealogical, census, and statistical programs
An algorithm that matches objects using surrounding nodes
Offers a different approach to the object identity problem
Questions/Comments