Conversion of cultural database records to XML

Download Report

Transcript Conversion of cultural database records to XML

Mapping Techniques:
Demonstration of automatic data
transformation
Maria Theodoridou
ICS-FORTH
April 2004
The Problem
Convert data of various formats into a
common XML format compatible with the
CIDOC Conceptual Reference Model
• Input data:
– Databases
– Text files
– Spreadsheet files
• Output files:
– XML files, compatible with the CIDOC Conceptual
Reference Model
April 21, 2004
Maria Theodoridou
2
Data Junction 7.5
An integration/migration tool designed to convert
structured data from one format to another. Also
designed to clean and restructure the data to fit the
new format. Market Leader in its Field
Map Designer: transfers data among a wide variety of
data file types. To transfer data, you design and run a
conversion. Each conversion contains all the
information needed to convert data from an existing
data file or table to a new Target data file or table.
Conversions are done in three Steps:
1. Define options for the source file
2. Define options for the target file
3. Define appropriate mapping from the source to the
target file using all the functions, expressions, actions
and events the tool offers
April 21, 2004
Maria Theodoridou
3
Step 1 (Source)
• Define Source Type
– ASCII, Access, Excel, SQL, Oracle etc.
• Define Source File
• Each text source file will be separated into
fields and records
– Field is a category of things, which have a
common characteristic
– Things of the same field are listed in
separate records below the field
– We use field and record delimiters
April 21, 2004
Maria Theodoridou
4
Step 1 (Source)
• We can sort our data descending or
ascending depending on one or more
fields
• By using the “Source Filters” we can keep
only the records we want
• We can have the field names by using a
header in the source file
• The parsing is done serially and every
record is listed below the other
• From the Data Browser we see all the
fields, records and the data
April 21, 2004
Maria Theodoridou
5
Step 2 (Target)
• Define Target Type
– ASCII, Access, Excel, Lotus, SQL, Oracle etc.
• Define Target File
• We can use an existing record layout
from:
– A previous conversion
– An XML file, an XML DTD, an XML Schema
April 21, 2004
Maria Theodoridou
6
General about a conversion
• Every conversion uses the Event Handlers,
the Actions, the Expressions and the
Functions
• Every Event, from a predefined set, contains
one or more actions
• Every action has either a predefined set of
parameters or uses an expression
• Every expression can use one or more of the
predefined or the user defined functions
• We can also declare global variables, which
can be used by every expression within a
conversion.
April 21, 2004
Maria Theodoridou
7
Events
• Predefined set of Events
• They occur at a specific moment with a
predefined order in the conversion cycle
• Events are separated into Source Generic
Events, Target Events, Conversion
Events, and Record Type Events
• Examples of Events:
– BeforeConversion, AfterConversion,
AfterEveryRecord, OnDataChanged,
BeforeMap, AfterMap,
BeforePutRecord, OnDiscard
April 21, 2004
Maria Theodoridou
8
Actions
Actions have their own format and parameters:
• Execute: executes an expression we specify
• Clear: Clears the contents of a target record
memory buffer
• Map: maps a specific target record type
• Put: writes the results of a target record type
to a specific position in the target file
• ClearMapPut: combines the functionality of
Clear, Map, and Put
• Abort: causes the conversion to abort
April 21, 2004
Maria Theodoridou
9
An Example
• We Map and then Put the target record types
“CRM_Entity”, “is_identified_by” and
“has_current_owner”
• The global variable “id” helps us map the
appropriate fields to the “is_identified_by” target
record type
April 21, 2004
Maria Theodoridou
10
Expressions
• They offer conditional mapping and data
modification
• Each expression is written for an
“Execute” Action or is a Target Field
Expression
• We use the same syntax in every
expression
• Every expression is written in the
expression builder window
April 21, 2004
Maria Theodoridou
11
An expression example
A target field expression:
If (IsNull(Fields("nr"))) Then
"NULL"
Else
"OID:LUPA."&Fields("nr")
End If
April 21, 2004
Maria Theodoridou
12
Functions
Set of predefined functions separated in categories:
• Conversion: typecast a string or a variable
• Date/Time: convert strings into several formats of
date and time
• Math: return the sign, absolute, logarithm etc.
• Parse: parse a string into sub strings by using
delimiters
April 21, 2004
Maria Theodoridou
13
Functions
• Text: replace, compare, count, trim strings
• File: read, copy, append and rename a file
or get the name of a directory or make a
new directory
• Flow Control: For … Next, Go To, If …
Then … Else, Return, Select Case … End
Select, While … Wend.
• We can also have user-defined functions,
save them as a Visual Basic Module File
(.bas) and call them within the
conversions we want
April 21, 2004
Maria Theodoridou
14
Step 3 (Map)
• Here we define the Events, Actions,
Expressions and Functions we use
• We define the order and type of the fields of
every target record type
• A target record type may contain a lot of other
record types
• When a field is selected to be a record, then it
automatically links to a record type, which has
the same name with the field
April 21, 2004
Maria Theodoridou
15
Lupa transformation
Lupa – Stone data base of 7000 archaeological
records, City of Vienna, Austria
Microsoft Access to XML conversion
April 21, 2004
Maria Theodoridou
16
Lupa transformation
Source
April 21, 2004
Maria Theodoridou
17
Lupa transformation
Target
XML file
April 21, 2004
Maria Theodoridou
18
Lupa transformation
Map
Field name
CIDOC CRM correspondence
Nr
CRM_Entity->P47F_is_identified_by->E42_Object_Identifier
Bilder
CRM_Entity->P129B_is_subject_of->E38_Image
Titel
CRM_Entity-> P102F_has_title-> E35 Title
Objekt
CRM_Entity ->P2F_has_type-> E55_Type
Fundort
CRM_Entity -> P12B_was_present_at-> E5_Event
Bezirk
Verwahrort
CRM_Entity->P12B_was_present_at->P7F_took_place_at>E53_Place
CRM_Entity->P55F_has_current_location->E53_Place
Literatur
CRM_Entity->P70B_is_documented_in->E31_Document
Inschrift
CRM_Entity->P65F_shows_visual_item->P142F_has_clear_text
Inschrift_Abfrage
CRM_Entity->P65F_shows_visual_item->P141F_has_transcription
April 21, 2004
Maria Theodoridou
19
CIL transformation
CIL – Corpus Inscriptionum Latinarum
25,150 roman inscriptions
text to XML conversion
April 21, 2004
Maria Theodoridou
20
CIL transformation
Source
CIL 13, 00029* = AE 1994, 1194.
Abelloni / deo / (A)elia(?) Mar/[6] / v(otum) s(olvit)
CIL 13, 00118* (AE 1989, 0519).
D(eo?) M(ercurio?) / pro sal(ute) Imp(eratoris)
M(arci) Aur(eli) Antonini / Fel(icis) Aug(usti)
Brit(annici) p(ontificis) m(aximi) tr(ibunicia)
p(otestate) XV imp(eratoris) / II co(n)s(ulis)
II des(ignati) III p(atris) p(atriae) C(aio)
Iul(io) / Drutedo et Balorice Taur(i) f(ilio) /
ex v(oto)
April 21, 2004
Maria Theodoridou
21
CIL transformation
Source
ContentExtractor is a report reader
designed to mine important data out of
unstructured report formats such as
invoices, bank statements, emails, html
docs, etc., and turn that information into a
database
We used Content Extractor to convert
unstructured text to Microsoft Excel file
format
April 21, 2004
Maria Theodoridou
22
CIL transformation
Target
XML file
April 21, 2004
Maria Theodoridou
23
CIL transformation
Map
Field name
CIDOC CRM correspondence
Field 1
CRM_Entity->E34 Inscription->P65F_shows_characters
Field 1
CRM_Entity->E34 Inscription->P141F_has_transcription
Field 2
CRM_Entity->P70B_is_documented_in->E31_Document
Field 2
CRM_Entity->P70B_is_documented_in->E31_Document
April 21, 2004
Maria Theodoridou
24
Thank You!