Retrieving Danish Genealogical Records on the Semantic Web


Simplifying Family History
Research for the Naïve User:
Building an Ontology and Expert Logic for Searching Danish
Genealogical Primary Records
By
Charla Woodbury
June 13, 2005
Real User Problem
A person decides to do family history research for the first time
on their Danish family lines.
• Where do they go?
• What records do they look for?
• How do they handle records in Danish?
• How can they tell when the records they have match their
search family?
Problem


Semantic web tools - Expanded to
specialized domain expertise
SMART websites

Automatically link to best information

Make the user an expert
• HELP
• ANTICIPATE
• GUIDE
• TRAIN
Solution

Use an ontology with lexicons and
description logic to:
• Extract the correct matching primary
records
• Compute calendar dates from feast dates and
birth dates from age at death
• Match names and families
Methods
• Preparing for the records extraction
• Producing results listing
• Evaluating the methodology
Preparing for Records Extraction
1. Ontology Building at the Entity Level
2. Annotating Primary Record Websites
3. Building Research Tools Inside the Ontology
4. Logic and Reasoning inside the Ontology
1 Ontology – Entity Level
ONTOLOGY ENTITIES
FIND and MARK UP relevant web
pages by:
• NAME <NAME>
• DATE <DATE>
• PLACE <PLACE>
• RELATIONSHIP <RELATION>
• OCCUPATION <OCCUPATION>
• RECORD_TYPE <RTYPE>
• SOURCE <SOURCE>
Danish GIVEN NAME LEXICON
Add synonyms and thesaurus

MALE
• Anders – And.
• Andreas
• Christen – Kristen
• Christian – Kristian
• Erik – Eric
• Gregers
• Hans
• Ib – Jep – Jeppe
• Jacob
• Jens
• Johan – Johannes – Joh.
• Jorgen – Jørgen
• Knud
• Lars – Laurs – Laurids – Lauritz
• Mads – Mats

FEMALE
• Ane – Anna – Anne
• Birthe – Birte
• Bodil
• Caroline
• Dorthe – Dorte
• Ellen – Helene – Elene
• Elisabeth – Elsbeth – Lisbeth
• Else – Ilse
• Ingeborg
• Inger
• Karen
• Kirsten – Christen – Kirstine – Christine – Chirstine
• Malene
• Maren
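One way to hold such a lexicon in code is to flatten each synonym group into a lookup from every spelling to one canonical form. The sketch below is illustrative only: the groups are an excerpt of the lists above, while the names `GIVEN_NAME_GROUPS`, `build_index` and `canonical_given_name`, the choice of the first-listed form as canonical, and the case-insensitive comparison are all assumptions rather than part of the original work.

```python
# Excerpt of the given-name lexicon above; each tuple is one synonym group.
# Assumption: the first form in a group is treated as the canonical one.
GIVEN_NAME_GROUPS = [
    ("Anders", "And."),
    ("Ib", "Jep", "Jeppe"),
    ("Johan", "Johannes", "Joh."),
    ("Jorgen", "Jørgen"),
    ("Lars", "Laurs", "Laurids", "Lauritz"),
    ("Ane", "Anna", "Anne"),
    ("Dorthe", "Dorte"),
]


def _key(name: str) -> str:
    """Normalize a spelling for lookup: lower-case, no trailing period."""
    return name.strip().rstrip(".").lower()


def build_index(groups):
    """Map every variant spelling to the canonical form of its group."""
    index = {}
    for group in groups:
        canonical = group[0]
        for variant in group:
            index[_key(variant)] = canonical
    return index


def canonical_given_name(name: str, index) -> str:
    """Return the canonical form, or the name unchanged if it is not listed."""
    return index.get(_key(name), name)


INDEX = build_index(GIVEN_NAME_GROUPS)
print(canonical_given_name("Lauritz", INDEX))  # Lars
print(canonical_given_name("Joh.", INDEX))     # Johan
print(canonical_given_name("Bodil", INDEX))    # Bodil (not in the excerpt)
```

Christen appears in both the male and the female list, so a fuller version would probably need one index per sex or a one-to-many mapping.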
DATE Lexicon
Adds Thesaurus of Synonyms
MONTHS
• January – Jan – Januar – 11br
• February – Feb – Februar – 12br
• March – Mar – Marts
• April – Apr – Apl
• May – Mai
• June – Jun – Juni
• July – Jul – Juli – 5br
• August – Aug – Augst – 6br
• September – Sep – Sept – 7br – Septembre
• October – Oct – 8br – Octobre
• November – Nov – 9br – Novembre
• December – Dec – 10br
FEAST DATES
• Easter – Paaske – Påske – Paasche – Påsche – P.
• Pentecost – Pent – Pinse – Pin
• Trinity – Tr – Trin – Trinitatis
DAYS OF WEEK
• Sunday – Sun – Dominico – Dom.
• Monday – Mon – Mondag – Mond.
• Tuesday – Tue – Tirsdag – Tirsd.
• Wednesday – Wed – Onsdag – Onsd.
• Thursday – Thur – Tørsdag – Tørsd.
• Friday – Fri – Fredag – Fred.
• Saturday – Sat – Lørsdag – Lørs
TIME
• Year – yr – aar – år
• Month – mo – maaned – m.
• Week – uge – ug.
• Day – dag – d.
• Hour – h. – hr.
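As a rough illustration of how the MONTHS part of this lexicon could be used, the sketch below turns any listed spelling into a month number. The variant lists are copied from the lexicon above; the names `MONTH_VARIANTS` and `month_number`, and the decision to ignore case and trailing punctuation, are assumptions made for the example.

```python
# Month synonyms as listed in the DATE lexicon above (lower-cased).
MONTH_VARIANTS = {
    1: ["january", "jan", "januar", "11br"],
    2: ["february", "feb", "februar", "12br"],
    3: ["march", "mar", "marts"],
    4: ["april", "apr", "apl"],
    5: ["may", "mai"],
    6: ["june", "jun", "juni"],
    7: ["july", "jul", "juli", "5br"],
    8: ["august", "aug", "augst", "6br"],
    9: ["september", "sep", "sept", "7br", "septembre"],
    10: ["october", "oct", "8br", "octobre"],
    11: ["november", "nov", "9br", "novembre"],
    12: ["december", "dec", "10br"],
}

# Flatten to a single lookup: spelling -> month number.
MONTH_INDEX = {
    variant: number
    for number, variants in MONTH_VARIANTS.items()
    for variant in variants
}


def month_number(token: str):
    """Return 1-12 for a recognized month spelling, else None."""
    return MONTH_INDEX.get(token.strip().rstrip(".:").lower())


print(month_number("9br"))    # 11
print(month_number("Marts"))  # 3
```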
2 Annotating Primary Record Websites
Colors are used to represent the mark-ups
Web Page
• SOURCE URL -Tvilum Sogne Kirkebog
• [PAGE HEADER] Fødde 1751 3
• [BODY] Truust Dom. 23 p: Trinit: laest
over Niels Baches SØREN fadd.
Johannes Michelsens og Niels Mollers
hustruer af Søebyevad, Peder
Rasmussen af Søebyevad, Jens Bachis
søn Peder og Niels Thylkes s. Peder af
Truust
Annotated Web Page
• SOURCE -Tvilum Parish Register
• [PAGE HEADER] Fødde 1751 3
• [BODY] Truust Dom. 23 p: Trinit: laest
over Niels Baches SØREN fadd.
Johannes Michelsens og Niels Mollers
hustruer af Søebyevad, Peder
Rasmussen af Søebyevad, Jens Bachis
søn Peder og Niels Thylkes s. Peder af
Truust
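The color mark-up of the slide can be approximated in text by wrapping each recognized phrase in its entity tag. The following is a minimal sketch under stated assumptions: the tiny `LEXICON` holds only phrases from the sample record, the closing-tag form `</TAG>` and the function name `annotate` are hypothetical, and a real system would draw on the full lexicons rather than a hand-made list.

```python
import re

# Hypothetical mini-lexicons drawn from the sample record; a real system
# would load the full NAME, DATE, PLACE and RELATION lexicons instead.
LEXICON = {
    "DATE": ["Dom. 23 p: Trinit:"],
    "PLACE": ["Truust", "Søebyevad"],
    "RELATION": ["fadd.", "hustruer", "søn", "s."],
    "NAME": ["Niels Baches", "SØREN", "Johannes Michelsens", "Niels Mollers",
             "Peder Rasmussen", "Jens Bachis", "Niels Thylkes", "Peder"],
}

PHRASE_TO_TAG = {p: tag for tag, phrases in LEXICON.items() for p in phrases}

# Longest phrases first so "Peder Rasmussen" wins over "Peder"; the
# lookarounds keep a phrase from matching inside a longer word.
PATTERN = re.compile(
    r"(?<!\w)(?:"
    + "|".join(re.escape(p) for p in sorted(PHRASE_TO_TAG, key=len, reverse=True))
    + r")(?!\w)"
)


def _wrap(match):
    """Wrap one matched phrase in its entity tag."""
    phrase = match.group(0)
    tag = PHRASE_TO_TAG[phrase]
    return f"<{tag}>{phrase}</{tag}>"


def annotate(text: str) -> str:
    """Return the text with every lexicon phrase wrapped in its tag."""
    return PATTERN.sub(_wrap, text)


print(annotate("Truust Dom. 23 p: Trinit: laest over Niels Baches SØREN fadd."))
```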
3 Building Research Tools Inside the Ontology

Conversion functions

Matching different name forms

Matching place names to appropriate
records
CONVERSION FUNCTIONS
inside the ontology
• Compute birthdate from age at death
Death – 22 Mar 1743
Age - 23 yr 2 m
-> BIRTH Jan 1720
• Compute dates from feast dates
Sunday 23rd after Trinity 1751
-> 14 Nov 1751
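Both conversions can be sketched in a few lines of Python. This is an illustration rather than the functions built into the ontology: it assumes the Gregorian Easter reckoning applies to the record, takes Trinity Sunday as 56 days after Easter, and reports the birth only to the month, as the slide does. The function names are hypothetical.

```python
from datetime import date, timedelta


def gregorian_easter(year: int) -> date:
    """Easter Sunday by the standard Gregorian computus."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    ell = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * ell) // 451
    month, day = divmod(h + ell - 7 * m + 114, 31)
    return date(year, month, day + 1)


def sunday_after_trinity(year: int, n: int) -> date:
    """Date of the n-th Sunday after Trinity (Trinity = Easter + 56 days)."""
    trinity = gregorian_easter(year) + timedelta(days=56)
    return trinity + timedelta(weeks=n)


def birth_month_from_age(death: date, years: int, months: int = 0):
    """Approximate birth (year, month) from a death date and age at death."""
    total = death.year * 12 + (death.month - 1) - (years * 12 + months)
    year, month0 = divmod(total, 12)
    return year, month0 + 1


print(sunday_after_trinity(1751, 23))                   # 1751-11-14
print(birth_month_from_age(date(1743, 3, 22), 23, 2))   # (1720, 1)
```

Both results agree with the worked examples on the slide. Records dated under a different Easter reckoning would need a different `gregorian_easter` rule.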
Match different name forms as
ONE PERSON
Uses lexicon to determine different
forms of the same name
• JENS PEDERSEN
• JENS PEDERSEN BACH
• JENS BACH
• JENS BACHIS
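A simple way to treat these forms as one person is to compare each form against the fullest known name. The sketch below is an assumption-laden illustration, not the ontology's actual rule: it canonicalizes the given name through a small hypothetical lexicon excerpt, requires every surname in the shorter form to appear in the full name, and accepts the possessive endings `s`, `is` and `es` (as in BACHIS) on a surname.

```python
# Hypothetical excerpt of the given-name lexicon: variant -> canonical form.
GIVEN_NAMES = {"jens": "jens", "johan": "johan", "johannes": "johan"}

# Assumption: possessive surname forms add one of these endings.
GENITIVE_ENDINGS = ("", "s", "is", "es")


def split_name(name: str):
    """Split into (canonical given name, list of surname parts)."""
    tokens = name.lower().split()
    given = GIVEN_NAMES.get(tokens[0], tokens[0])
    return given, tokens[1:]


def same_person(form: str, full_name: str) -> bool:
    """True if `form` could be a shorter or possessive form of `full_name`."""
    given, surnames = split_name(form)
    full_given, full_surnames = split_name(full_name)
    if given != full_given or not surnames:
        return False
    allowed = {s + end for s in full_surnames for end in GENITIVE_ENDINGS}
    return all(s in allowed for s in surnames)


FULL = "Jens Pedersen Bach"
for form in ["JENS PEDERSEN", "JENS PEDERSEN BACH", "JENS BACH", "JENS BACHIS"]:
    print(form, same_person(form, FULL))  # True for all four forms
```

Comparing against the full name matters: JENS PEDERSEN and JENS BACH share no surname with each other and are linked only through JENS PEDERSEN BACH.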
PLACES
[Map: County Map of DENMARK]
[Map: Parish and District Map of SKANDERBORG]
Matching Places to Records
Farm name | Parish | District | County | Record Links
Molgiaer | Tamdrup | Nim | Skanderborg | PARISH: Tamdrup 1684-1912; PROBATE: Nim Herred Provisti, Rask, Skanderborg Rytterdistrikt
- | Tamdrup | Nim | Skanderborg | List of URLs: includes Molgiaer URLs, adds parish-specific records
- | - | Nim | Skanderborg | List of URLs: includes Tamdrup URLs, adds district-specific records
- | - | - | Skanderborg | List of URLs: includes all district URLs, adds county-specific records
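The nesting described here, where each level's list includes everything below it and adds its own records, can be sketched with a parent map and a recursive collector. Only the Molgiaer record names come from the slide; the other entries are placeholders for records the slide does not name, and `PARENT`, `OWN_RECORDS` and `record_links` are hypothetical names.

```python
# Place hierarchy from the slide: child -> parent.
PARENT = {
    "Molgiaer": "Tamdrup",   # farm -> parish
    "Tamdrup": "Nim",        # parish -> district
    "Nim": "Skanderborg",    # district -> county
}

# Records attached directly to each level. The Molgiaer entries come from
# the slide; the other three are placeholders for the unnamed parish-,
# district- and county-specific records.
OWN_RECORDS = {
    "Molgiaer": [
        "PARISH: Tamdrup 1684-1912",
        "PROBATE: Nim Herred Provisti",
        "PROBATE: Rask",
        "PROBATE: Skanderborg Rytterdistrikt",
    ],
    "Tamdrup": ["(parish-specific records)"],
    "Nim": ["(district-specific records)"],
    "Skanderborg": ["(county-specific records)"],
}


def record_links(place: str) -> list:
    """Records of every place nested below `place`, then its own records."""
    links = []
    for child, parent in PARENT.items():
        if parent == place:
            links.extend(record_links(child))
    links.extend(OWN_RECORDS.get(place, []))
    return links


print(len(record_links("Molgiaer")))     # 4
print(len(record_links("Tamdrup")))      # 5
print(len(record_links("Skanderborg")))  # 7
```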
4 Logic and Reasoning
inside the Ontology

Correct family placement of primary
records: a logic and reasoning
knowledge base applies rules to
determine that:
• The names of the children follow common naming
practices
• A high percentage of the witnesses match
individuals in the family knowledge base
Naming Practices
Male children are named in this order:
• [occasional] Mother’s previous husband
• Father’s father
• Mother’s father
• Father
Knowledge Base
Points out deviations from naming practices
Father: LARS Andersen
FathersFather: ANDERS Pedersen
Mother: Maren Jensen
MothersFather: JENS Olesen
MothersPrevHusband: HENRICH Sorensen
Son1: HENRICH Larsen
Son2: ANDERS Larsen
Son3: JENS Larsen
Son4: LARS Larsen
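The naming-order rule and the deviation check can be sketched as follows, using the given names from the example family. This is a simplified illustration: the role keys mirror the slide, but `expected_son_names` and `deviations` are hypothetical names, and skipping a name that has already been used is an assumption the slides do not state.

```python
# Given names of the example family on the slide.
family = {
    "Father": "Lars",
    "FathersFather": "Anders",
    "MothersFather": "Jens",
    "MothersPrevHusband": "Henrich",  # None when there was no earlier husband
}
sons = ["Henrich", "Anders", "Jens", "Lars"]

# Order in which male children are named, per the Naming Practices slide.
NAMING_ORDER = ["MothersPrevHusband", "FathersFather", "MothersFather", "Father"]


def expected_son_names(family: dict) -> list:
    """Names the sons are expected to receive, in birth order."""
    names = []
    for role in NAMING_ORDER:
        name = family.get(role)
        if name and name not in names:  # assumption: a name is used once
            names.append(name)
    return names


def deviations(family: dict, sons: list) -> list:
    """List (son number, expected name, actual name) for each mismatch."""
    expected = expected_son_names(family)
    return [
        (i + 1, exp, act)
        for i, (exp, act) in enumerate(zip(expected, sons))
        if exp != act
    ]


print(deviations(family, sons))                                 # []
print(deviations(family, ["Anders", "Henrich", "Jens", "Lars"]))
# [(1, 'Henrich', 'Anders'), (2, 'Anders', 'Henrich')]
```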
Witness Match Knowledge Base

PURPOSE - Correct Family Placement

Description logic knowledge base
• CHILD
• PARENT
• SPOUSE
• SIBLING
Match christening record to the family where the
highest % of witnesses can be matched to
the knowledge base load
Sample Load
Niels Baches SØREN fadd. Johannes Michelsens og Niels Mollers hustruer af
Søebyevad, Peder Rasmussen af Søebyevad, Jens Bachis søn Peder og Niels
Thylkes s. Peder af Truust
Jens Pedersen Bach = Inger Nielsen Ibsen
Peder Jensen Bach
Michel Jensen = Anna
Anna Michelsen = Niels Thylke
Niels Jensen Bach = Abigael Michelsen
Peder Nielsen Thylke
Johannes Michelsen
Soren Nielsen Bach
Legend: "=" SPOUSE; [arrow] PARENT -> CHILD
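The witness-match percentage for this sample can be approximated as below. It is a deliberately simplified sketch: the knowledge base is a flat list of names as read from the Sample Load slide (line wraps there leave some surnames uncertain), each witness is identified through the person named in the record (the wife of, the son of), possessive endings are removed by hand, and the matching rule in `matches` is an assumption rather than the description logic itself.

```python
# People in the family knowledge base, as read from the Sample Load slide.
KNOWLEDGE_BASE = [
    "Jens Pedersen Bach", "Inger Nielsen Ibsen", "Peder Jensen Bach",
    "Michel Jensen", "Anna Michelsen", "Niels Thylke", "Niels Jensen Bach",
    "Abigael Michelsen", "Johannes Michelsen", "Soren Nielsen Bach",
]

# Persons named in the witness list of the sample record, with possessive
# endings already removed (Michelsens -> Michelsen, Bachis -> Bach, ...).
WITNESS_REFERENCES = [
    "Johannes Michelsen", "Niels Moller", "Peder Rasmussen",
    "Jens Bach", "Niels Thylke",
]


def matches(reference: str, person: str) -> bool:
    """Same first name, and every later name part occurs in the person's name."""
    ref, per = reference.lower().split(), person.lower().split()
    return ref[0] == per[0] and all(part in per[1:] for part in ref[1:])


def witness_match_rate(references, knowledge_base) -> float:
    """Fraction of witness references found in the knowledge base."""
    matched = sum(
        any(matches(r, p) for p in knowledge_base) for r in references
    )
    return matched / len(references)


print(witness_match_rate(WITNESS_REFERENCES, KNOWLEDGE_BASE))  # 0.6
```

Running the same function against the knowledge base of each candidate family and keeping the highest rate would give the family placement described on the previous slide.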
Producing Results Listing

Processing the Input
• Enough information?
• Do the names, dates, places, and
relationships correspond to lexicon
values?

Using ONTOS to extract records
RESULTS LISTING
TARGET – Jens Pedersen Bach
Truust, Tvilum Parish, Gjern District, Skanderborg
born 1693, died 1778
Name | Date | Place | Relation | Occupation | Record Type | Source (URL)
Jens Bachis | Dom. 23 p: Trinit: 1751 (14 Nov 1751) | Truust | fadd: | - | Fødde | Tvilum Parish Register

SOURCE - Tvilum Parish Register
[PAGE HEADER] Fødde 1751 3
[BODY] Truust Dom. 23 p: Trinit: laest over Niels Baches SØREN
fadd. Johannes Michelsens og Niels Mollers hustruer af Søebyevad,
Peder Rasmussen af Søebyevad, Jens Bachis søn Peder og
Niels Thylkes s. Peder af Truust
Evaluating the Methodology

Search Speed

User Relevance Feedback
• Accuracy of the results list
• Ease or difficulty of use

Precision and Recall
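For reference, precision and recall for one search can be computed from the set of records returned and the set of records that truly belong to the search family. The record identifiers below are made up for the example and do not come from the study.

```python
def precision_recall(retrieved: set, relevant: set):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical record identifiers for one search.
retrieved = {"r1", "r2", "r3", "r4"}   # records the extractor returned
relevant = {"r1", "r2", "r5"}          # records that belong to the family

precision, recall = precision_recall(retrieved, relevant)
print(precision)  # 0.5
print(recall)
```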
MAJOR CONTRIBUTIONS

A portal for family history research that could be
easily expanded with:
• Maps and gazetteers
• Look-ups
• Helps
• Training
• Other countries and states

The first genealogical primary record extractor
using semantic web tools which promises:
• Accuracy
• Fast response
• Ease of use

The first use of logic and reasoning inside an
ontology to add expert rules for family history

A practical demonstration of the superiority of
semantic web tools for future research