Graph Analysis Matching Program

Burdette Pixton

Record Linkage

Object Identification Problem
Identifies possible links in pedigrees
Advantages:
Compress search results
Merge pedigrees
Discover missing information

Record Linkage

Manual
Time consuming
Error prone
n²
Blocking
n²
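
To make the n² figure concrete: comparing every record with every other record takes n(n-1)/2 comparisons, while blocking only compares records that share a blocking key. The sketch below counts both on a handful of made-up records. The records and the surname-plus-birth-year key are hypothetical choices for illustration, not the blocking scheme of any particular system.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (record id, surname, birth year).
RECORDS = [
    (1, "Paul", 1804), (2, "Paul", 1804), (3, "Paul", 1789),
    (4, "Paul", 1789), (5, "Jones", 1790), (6, "Smith", 1804),
]


def naive_pairs(records):
    """Compare every record with every other: n(n-1)/2 pairs, i.e. O(n^2)."""
    return list(combinations(records, 2))


def blocking_key(record):
    """Hypothetical blocking key: surname plus birth year."""
    _, surname, birth_year = record
    return (surname, birth_year)


def blocked_pairs(records, key=blocking_key):
    """Only pair up records that fall into the same block."""
    blocks = defaultdict(list)
    for record in records:
        blocks[key(record)].append(record)
    return [pair for block in blocks.values() for pair in combinations(block, 2)]


print(len(naive_pairs(RECORDS)))    # 15
print(len(blocked_pairs(RECORDS)))  # 2
```

Exact-key blocking is what keeps the second count small, but it also means a pair whose birth years were transcribed differently would never be compared at all.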

Record Linkage

Current Approaches
Naïve approach
Deterministic algorithms
Probabilistic algorithms

Record Linkage

Field     Record 1       Record 2
Name      James Paul     J. Paul
Sex       M              M
DOB       7/4/1804       7/11/1804
POB       England        England
DOD       8/15/1845      8/15/1845
Parents   Howard Paul    H. Paul
          Mary Jones     Mary Jones
Children  Lucy Paul      Lucy Paul
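
A deterministic comparison checks the two records above field by field. The sketch below is one hypothetical way to do that: the dictionary layout and the rule that an initial such as "J." is compatible with any given name starting with J are assumptions for illustration, not the comparators used in GRAMP.

```python
def names_compatible(a: str, b: str) -> bool:
    """Token-by-token name match where a single-letter initial matches any
    token with the same first letter (a hypothetical rule)."""
    ta = a.replace(".", "").lower().split()
    tb = b.replace(".", "").lower().split()
    if len(ta) != len(tb):
        return False
    for x, y in zip(ta, tb):
        if len(x) == 1 or len(y) == 1:
            if x[0] != y[0]:
                return False
        elif x != y:
            return False
    return True


def compare(a: dict, b: dict) -> dict:
    """Return {field: True/False} for every field of record a."""
    result = {}
    for field, x in a.items():
        y = b[field]
        if isinstance(x, list):  # parents / children: compare position by position
            result[field] = len(x) == len(y) and all(
                names_compatible(p, q) for p, q in zip(x, y))
        elif field == "name":
            result[field] = names_compatible(x, y)
        else:
            result[field] = x == y
    return result


record_1 = {"name": "James Paul", "sex": "M", "dob": "7/4/1804", "pob": "England",
            "dod": "8/15/1845", "parents": ["Howard Paul", "Mary Jones"],
            "children": ["Lucy Paul"]}
record_2 = {"name": "J. Paul", "sex": "M", "dob": "7/11/1804", "pob": "England",
            "dod": "8/15/1845", "parents": ["H. Paul", "Mary Jones"],
            "children": ["Lucy Paul"]}

print(compare(record_1, record_2))
# {'name': True, 'sex': True, 'dob': False, 'pob': True, 'dod': True,
#  'parents': True, 'children': True}
```

Under a rule that required every field to agree, this pair would be rejected over a one-week difference in birth date, which is the kind of brittleness that probabilistic approaches try to soften.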

Record Linkage

Problems with Current Standards
Uses the probabilistic record linkage formula
Weights and thresholds are 10 years old
Depends on the attributes of a single record
Does not fully solve the missing-fields problem
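
The slides do not name the probabilistic formula. The sketch below assumes the widely used Fellegi-Sunter framework, in which each field contributes log2(m/u) when it agrees and log2((1-m)/(1-u)) when it disagrees, and the summed weight is compared against an upper and a lower threshold. The m/u probabilities and thresholds here are made up, and scoring a missing field as 0 is also an assumption; it shows how a single blank field can drop an otherwise strong pair into the undecided zone.

```python
import math

# Hypothetical per-field probabilities (m, u):
#   m = P(field agrees | records refer to the same person)
#   u = P(field agrees | records refer to different people)
M_U = {
    "name": (0.95, 0.05),
    "sex": (0.98, 0.50),
    "dob": (0.90, 0.01),
    "pob": (0.95, 0.20),
    "dod": (0.90, 0.01),
}


def field_weight(field, agrees):
    """Fellegi-Sunter style log2 weight; None means the field is missing."""
    if agrees is None:
        return 0.0  # assumption: a missing field adds no evidence either way
    m, u = M_U[field]
    return math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))


def classify(agreement, upper=10.0, lower=0.0):
    """Sum the field weights and compare against two (hypothetical) thresholds."""
    total = sum(field_weight(f, a) for f, a in agreement.items())
    if total >= upper:
        return total, "match"
    if total <= lower:
        return total, "non-match"
    return total, "possible match"


# Agreement pattern for James Paul vs J. Paul: only the birth date differs.
pattern = {"name": True, "sex": True, "dob": False, "pob": True, "dod": True}
score, label = classify(pattern)
print(round(score, 2), label)   # 10.65 match

# The same pair with the death date missing on one record.
score, label = classify({**pattern, "dod": None})
print(round(score, 2), label)   # 4.16 possible match
```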

Thesis Statement

Graph matching can enhance current record linkage techniques, producing a smaller set of possible matches with high precision.

GRAMP - Overview

Uses multiple records
Traverses two graphs in parallel
Continues until no more links are found
Compares related nodes to each other to obtain a similarity measurement
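
A minimal sketch of walking two pedigree graphs in parallel, assuming each graph is a dictionary of nodes whose relationships (father, mother, children) point at other node ids. Starting from a seed pair, nodes reached through the same relationship in both graphs are paired and compared, and the walk stops when no unvisited pairs remain. The graph layout, the node ids, the positional pairing of children, and the `initial_and_surname` measure are all hypothetical, not GRAMP's actual data model.

```python
from collections import deque

# Hypothetical pedigree fragments built from the example records.
# Each node maps relationship names to node ids; absent keys mean "unknown".
G1 = {
    "james": {"name": "James Paul", "father": "howard", "mother": "mary", "children": ["lucy"]},
    "howard": {"name": "Howard Paul", "father": "louis", "children": ["james"]},
    "mary": {"name": "Mary Jones", "children": ["james"]},
    "lucy": {"name": "Lucy Paul", "father": "james"},
    "louis": {"name": "Louis Paul", "children": ["howard"]},
}
G2 = {
    "j": {"name": "J. Paul", "father": "h", "mother": "m", "children": ["l"]},
    "h": {"name": "H. Paul", "father": "lp", "mother": "mp", "children": ["j"]},
    "m": {"name": "Mary Jones", "children": ["j"]},
    "l": {"name": "Lucy Paul", "father": "j"},
    "lp": {"name": "Louis Paul", "children": ["h"]},
    "mp": {"name": "Michelle P.", "children": ["h"]},
}


def initial_and_surname(p, q):
    """Hypothetical node measure: 1.0 if first initial and surname agree."""
    a, b = p["name"].split(), q["name"].split()
    return 1.0 if a[0][0] == b[0][0] and a[-1] == b[-1] else 0.0


def traverse_in_parallel(g1, g2, seed1, seed2, similarity):
    """Walk both graphs from a seed pair, pairing nodes reached by the same
    relationship, until no unvisited pairs remain."""
    scores = {}
    queue = deque([(seed1, seed2)])
    while queue:
        a, b = queue.popleft()
        if (a, b) in scores:
            continue
        scores[(a, b)] = similarity(g1[a], g2[b])
        for rel in ("father", "mother"):
            x, y = g1[a].get(rel), g2[b].get(rel)
            if x is not None and y is not None:
                queue.append((x, y))
        # Children are paired by position, a simplifying assumption.
        queue.extend(zip(g1[a].get("children", []), g2[b].get("children", [])))
    return scores


for pair, score in traverse_in_parallel(G1, G2, "james", "j", initial_and_surname).items():
    print(pair, score)
```

Because the first tree has no mother recorded for Howard Paul, the "Michelle P." node in the second tree is never paired; that is exactly the kind of gap the graph can point out rather than resolve on its own.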

Record Linkage

Field     Record 1       Record 2
Name      James Paul     J. Paul
Sex       M              M
DOB       7/4/1804       7/11/1804
POB       England        England
DOD       8/15/1845      8/15/1845
Parents   Howard Paul    H. Paul
          Mary Jones     Mary Jones
Children  Lucy Paul      Lucy Paul

Record Linkage

Field     Record 1       Record 2
Name      Howard Paul    H. Paul
Sex       M              M
DOB       2/4/1789       2/4/1789
POB       England        England
DOD       1/13/1815      8/15/1845
Parents   Louis Paul     Louis Paul
          ??             Michelle P.
Children  James Paul     J Paul
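
Comparing the parent pair is where the graph can add information. In the sketch below, which assumes the first listed parent is the father and treats "??" as a missing value, the two Howard Paul records agree on most fields, conflict on date of death, and the second record could supply the missing mother. The name rule (first initial plus surname) and the helper names are hypothetical, not GRAMP's comparator.

```python
MISSING = {"??", "", None}
NAME_FIELDS = {"name", "father", "mother", "child"}


def same(field, x, y):
    """Names agree on first initial and surname (hypothetical rule); other fields must be equal."""
    if field in NAME_FIELDS:
        xs, ys = x.replace(".", "").split(), y.replace(".", "").split()
        return xs[0][0] == ys[0][0] and xs[-1] == ys[-1]
    return x == y


def reconcile(a, b):
    """Sort fields into agreements, conflicts, and gaps one record can fill from the other."""
    agree, conflict, fill = [], [], {}
    for field in sorted(a.keys() | b.keys()):
        x, y = a.get(field), b.get(field)
        if x in MISSING and y in MISSING:
            continue
        if x in MISSING:
            fill[field] = ("record_1", y)   # record 1 could borrow y
        elif y in MISSING:
            fill[field] = ("record_2", x)   # record 2 could borrow x
        elif same(field, x, y):
            agree.append(field)
        else:
            conflict.append(field)
    return agree, conflict, fill


howard = {"name": "Howard Paul", "sex": "M", "dob": "2/4/1789", "pob": "England",
          "dod": "1/13/1815", "father": "Louis Paul", "mother": "??", "child": "James Paul"}
h_paul = {"name": "H. Paul", "sex": "M", "dob": "2/4/1789", "pob": "England",
          "dod": "8/15/1845", "father": "Louis Paul", "mother": "Michelle P.", "child": "J Paul"}

agree, conflict, fill = reconcile(howard, h_paul)
print(agree)     # ['child', 'dob', 'father', 'name', 'pob', 'sex']
print(conflict)  # ['dod']
print(fill)      # {'mother': ('record_1', 'Michelle P.')}
```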

GRAMP

Determine probable matches
Weakness is decreased
Keep those with potential
Traverse the graph
For each node in the graph:
Determine relationships
Compute similarity matches against both sets
Recursive calls
Combine measurements
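
The outline above leaves the details open, so the following is only a sketch of one way the steps could fit together, not GRAMP itself: candidate pairs are first filtered by their own similarity ("keep those with potential"), and each surviving pair's score is then blended recursively with the scores of its parents' pairs ("combine measurements"). The field-agreement measure, the 0.5 cut-off, the equal-weight blend `alpha`, and the depth limit are all hypothetical.

```python
FIELDS = ("name", "dob", "dod")


def _key(field, value):
    """Reduce names to (first initial, surname); compare other fields as-is."""
    if field == "name":
        parts = value.replace(".", "").split()
        return parts[0][0], parts[-1]
    return value


def own_similarity(p, q):
    """Fraction of fields present in both nodes that agree (hypothetical measure)."""
    shared = [f for f in FIELDS if p.get(f) and q.get(f)]
    if not shared:
        return 0.0
    return sum(_key(f, p[f]) == _key(f, q[f]) for f in shared) / len(shared)


def graph_score(g1, g2, a, b, depth=2, alpha=0.5, seen=None):
    """Recursively blend a pair's own similarity with its parents' pair scores."""
    seen = set() if seen is None else seen
    seen.add((a, b))
    own = own_similarity(g1[a], g2[b])
    if depth == 0:
        return own
    related = []
    for rel in ("father", "mother"):
        x, y = g1[a].get(rel), g2[b].get(rel)
        if x is not None and y is not None and (x, y) not in seen:
            related.append(graph_score(g1, g2, x, y, depth - 1, alpha, seen))
    if not related:
        return own
    return alpha * own + (1 - alpha) * sum(related) / len(related)


def gramp_like(g1, g2, candidates, keep=0.5):
    """Keep candidate pairs with potential, then refine them using the graph."""
    kept = [(a, b) for a, b in candidates if own_similarity(g1[a], g2[b]) >= keep]
    return {(a, b): graph_score(g1, g2, a, b) for a, b in kept}


G1 = {
    "james": {"name": "James Paul", "dob": "7/4/1804", "dod": "8/15/1845",
              "father": "howard", "mother": "mary"},
    "howard": {"name": "Howard Paul", "dob": "2/4/1789", "dod": "1/13/1815"},
    "mary": {"name": "Mary Jones"},
}
G2 = {
    "j": {"name": "J. Paul", "dob": "7/11/1804", "dod": "8/15/1845",
          "father": "h", "mother": "m"},
    "h": {"name": "H. Paul", "dob": "2/4/1789", "dod": "8/15/1845"},
    "m": {"name": "Mary Jones"},
}

result = gramp_like(G1, G2, [("james", "j"), ("james", "h")])
print({pair: round(s, 2) for pair, s in result.items()})  # {('james', 'j'): 0.75}
```

In this toy run the birth-date discrepancy holds the James Paul pair at 2/3 on its own, agreement among the parents lifts it to 0.75, and the (James Paul, H. Paul) candidate never makes it past the filter.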

GRAMP

Testing
Records with errors
Records without errors
Random set of records

Expected Results
Similar or better results than current methods, with smaller blocks
Slow
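
One way to check these expected results is to measure the precision and recall of the predicted match pairs against known links, alongside the size of the candidate set each method produces. The helper below is a generic sketch; the pairs in the example are made up and are not results from this work.

```python
def evaluate(predicted, truth):
    """Return (precision, recall, number of predicted pairs)."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall, len(predicted)


# Made-up example: 3 known true links, 4 predicted pairs, 2 of them correct.
truth = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}
predicted = {("a1", "b1"), ("a2", "b2"), ("a1", "b2"), ("a4", "b4")}
print(evaluate(predicted, truth))  # (0.5, 0.6666666666666666, 4)
```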

Contributions

Provides a useful tool for genealogical, census, and statistical programs
An algorithm that matches objects using their surrounding nodes
Offers a different approach to the object identification problem

Questions/Comments