4. Miguel-Angel Sicilia (CVN).

Building CERIF
databases from CVN
Miguel-Angel Sicilia
University of Alcalá
EuroCRIS LOD TG Leader
[email protected]
What’s this?
CVN to CERIF is a tool that extracts
data from CVN-XML files to store it into
a CERIF compliant database
https://github.com/ieru/cvn2cerif
How does it work?
Fit in EuroCRIS specs
• CERIF extended ontologies (OWL)
• CERIF LOD (RDF)
• CVN to CERIF XML
• CERIF mappings
• CERIF base vocabularies/ontologies (RDF)
• CERIF REST interface (JSON)
• CERIF model (SQL + E/R)
• CERIF Conformance testing
• CERIF Interchange (XML+BODs)
How does it work?
• Requirements:
– XSLT processor → takes an XML file and an XSLT
file and transforms the input XML into another
XML file or other output. For example: saxonb-xslt
• XSLT files:
– cvn_to_cerif.xslt → transforms a CVN-XML file
into a CERIF-XML file
– cerif_xml_to_db.xslt → transforms a CERIF-XML file
into an SQL script with INSERT INTO statements.
Example
1. Convert the CVN XML file into a CERIF XML file
Input: CVN.xml
saxonb-xslt -ext:on CVN.xml cvn_to_cerif.xslt > CERIF.xml
Output: CERIF.xml
Example
2. Convert the CERIF XML file into an SQL script
Input: CERIF.xml
saxonb-xslt -ext:on CERIF.xml cerif_xml_to_db.xslt > CERIF_DB.sql
Output: CERIF_DB.sql
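The two steps above can be chained from a script. The following is a minimal sketch, assuming `saxonb-xslt` is on the PATH and both stylesheets sit in the working directory; the function names `saxon_command`, `run_xslt` and `cvn_to_sql` are hypothetical and not part of the cvn2cerif repository.

```python
import subprocess
from pathlib import Path
from typing import List


def saxon_command(source: str, stylesheet: str) -> List[str]:
    """Build the same call as on the slides: options, source file, stylesheet."""
    return ["saxonb-xslt", "-ext:on", source, stylesheet]


def run_xslt(source: str, stylesheet: str, output: str) -> None:
    """Run one transformation and save its stdout, like `> output` in the shell."""
    result = subprocess.run(
        saxon_command(source, stylesheet),
        check=True,           # raise CalledProcessError if the processor fails
        capture_output=True,  # collect stdout instead of printing it
    )
    Path(output).write_bytes(result.stdout)


def cvn_to_sql(cvn_file: str = "CVN.xml") -> None:
    """Step 1: CVN XML to CERIF XML. Step 2: CERIF XML to SQL script."""
    run_xslt(cvn_file, "cvn_to_cerif.xslt", "CERIF.xml")
    run_xslt("CERIF.xml", "cerif_xml_to_db.xslt", "CERIF_DB.sql")
```

Calling `cvn_to_sql()` would leave `CERIF.xml` and `CERIF_DB.sql` next to the input file. Because `check=True` is set, a failure in step 1 stops the script before step 2 runs.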
Example
CVN.xml (Person)
[…]
<GivenName code="000.010.000.020" multiplicity="false"
obligatory="true">
<Item>Miguel Ángel</Item>
</GivenName>
<FirstFamilyName code="000.010.000.010" obligatory="true">
<Item>Sicilia</Item>
</FirstFamilyName>
<SecondFamilyName code="000.010.000.010" obligatory="true">
<Item>Urbán</Item>
</SecondFamilyName>
[…]
<BirthDate code="000.010.000.050" obligatory="true">
<Item>1973-02-26</Item>
</BirthDate>
[…]
<Gender code="000.010.000.030" obligatory="true">
<Item>000</Item>
</Gender>
[…]
Example
CERIF.xml (Person)
[…]
<cfPers>
<cfPersId>4972b826-8ff1-4762-91ee-ab40c3f151ea</cfPersId>
<cfBirthdate>1973-02-26</cfBirthdate>
<cfGender>m</cfGender>
<cfuri />
</cfPers>
<cfPersName>
<cfPersId>4972b826-8ff1-4762-91ee-ab40c3f151ea</cfPersId>
<cfFirstNames>Miguel Ángel Sicilia Urbán</cfFirstNames>
</cfPersName>
[…]
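To make the person mapping concrete, here is a rough Python sketch of what the CVN to CERIF step does with the fields shown above. It is an illustration only, not the XSLT from the repository. The `<Person>` wrapper element, the function names and the `GENDER_CODES` table are assumptions; only the pair "000" → "m" can be read off the example.

```python
import uuid
import xml.etree.ElementTree as ET

# Only "000" -> "m" is visible in the slides; other CVN gender codes are not shown.
GENDER_CODES = {"000": "m"}

# Hypothetical wrapper element around the fragment from the CVN.xml example.
CVN_PERSON = """<Person>
  <GivenName code="000.010.000.020"><Item>Miguel Ángel</Item></GivenName>
  <FirstFamilyName code="000.010.000.010"><Item>Sicilia</Item></FirstFamilyName>
  <SecondFamilyName code="000.010.000.010"><Item>Urbán</Item></SecondFamilyName>
  <BirthDate code="000.010.000.050"><Item>1973-02-26</Item></BirthDate>
  <Gender code="000.010.000.030"><Item>000</Item></Gender>
</Person>"""


def item_text(root, tag):
    """Return the text of <tag>/<Item>, or '' when the element is missing."""
    return (root.findtext(f"{tag}/Item") or "").strip()


def person_to_cerif(root, pers_id=None):
    """Map CVN person fields to cfPers and cfPersName records (as dicts)."""
    pers_id = pers_id or str(uuid.uuid4())  # fresh identifier if none is given
    names = [item_text(root, tag)
             for tag in ("GivenName", "FirstFamilyName", "SecondFamilyName")]
    return {
        "cfPers": {
            "cfPersId": pers_id,
            "cfBirthdate": item_text(root, "BirthDate"),
            "cfGender": GENDER_CODES.get(item_text(root, "Gender"), ""),
            "cfuri": "",
        },
        "cfPersName": {
            "cfPersId": pers_id,
            # As in the example output, the full name goes into cfFirstNames.
            "cfFirstNames": " ".join(name for name in names if name),
        },
    }


record = person_to_cerif(ET.fromstring(CVN_PERSON),
                         pers_id="4972b826-8ff1-4762-91ee-ab40c3f151ea")
```

The same identifier is written to both records, which is what links cfPersName back to cfPers.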
Example
CERIF_DB.sql (Person)
[…]
INSERT INTO cfPers ( cfPersId, cfBirthdate, cfGender, cfuri ) VALUES (
'4972b826-8ff1-4762-91ee-ab40c3f151ea',
'1973-02-26',
'm',
'' );
INSERT INTO cfPersName ( cfPersId, cfFirstNames ) VALUES (
'4972b826-8ff1-4762-91ee-ab40c3f151ea',
'Miguel Ángel Sicilia Urbán'
);
[…]
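The CERIF XML to SQL step can be pictured as one INSERT statement per record element, with child element names as columns. The sketch below assumes exactly that flat layout. The `<CERIF>` root element and the function names are hypothetical, and doubling single quotes is an added safeguard that the slides do not mention.

```python
import xml.etree.ElementTree as ET

# Hypothetical root element wrapping the records from the CERIF.xml example.
CERIF_XML = """<CERIF>
  <cfPers>
    <cfPersId>4972b826-8ff1-4762-91ee-ab40c3f151ea</cfPersId>
    <cfBirthdate>1973-02-26</cfBirthdate>
    <cfGender>m</cfGender>
    <cfuri />
  </cfPers>
  <cfPersName>
    <cfPersId>4972b826-8ff1-4762-91ee-ab40c3f151ea</cfPersId>
    <cfFirstNames>Miguel Ángel Sicilia Urbán</cfFirstNames>
  </cfPersName>
</CERIF>"""


def sql_literal(value):
    """Quote a value as an SQL string, doubling any embedded single quote."""
    return "'" + (value or "").replace("'", "''") + "'"


def to_insert(record):
    """Turn one record element into an INSERT INTO statement."""
    columns = ", ".join(child.tag for child in record)
    values = ", ".join(sql_literal(child.text) for child in record)
    return f"INSERT INTO {record.tag} ( {columns} ) VALUES ( {values} );"


def cerif_to_sql(xml_text):
    """Return one INSERT statement per top-level record, in document order."""
    return [to_insert(record) for record in ET.fromstring(xml_text)]


statements = cerif_to_sql(CERIF_XML)
```

An empty element such as `<cfuri />` becomes the empty string `''`, matching the SQL example on the slides.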
Example
CVN.xml (Organization)
<CvnItem>
[...]
<Title>
<Name code="010.020.000.170"><Item>Ingeniero de I+D</Item></Name>
<Type>000</Type>
</Title>
[...]
<Entity>
<EntityName code="010.020.000.020">
<Item>Intelligent Software Components S.A.</Item>
</EntityName>
</Entity>
<Date>
[...]
<StartDate>
<DayMonthYear code="010.020.000.180">
<Item>2000-08-21</Item>
</DayMonthYear>
</StartDate>
<Duration code="010.020.000.190"><Item>P1Y3M9D</Item></Duration>
<DurationType>010</DurationType>
</Date>
[...]
</CvnItem>
Example
CERIF.xml (Organization)
[…]
<cfOrgUnit>
<cfOrgUnitId>d456e998-4a28-438b-a4da-0bc23f2137ce</cfOrgUnitId>
</cfOrgUnit>
<cfOrgUnitName>
<cfOrgUnitId>d456e998-4a28-438b-a4da-0bc23f2137ce</cfOrgUnitId>
<cfName>Intelligent Software Components S.A.</cfName>
</cfOrgUnitName>
[…]
<cfPers_OrgUnit>
<cfOrgUnitId>d456e998-4a28-438b-a4da-0bc23f2137ce</cfOrgUnitId>
<cfPersId>5b08333b-bcbc-4a9f-8e67-1d76bc16957b</cfPersId>
<cfClassId>Ingeniero de I+D</cfClassId>
<cfStartDate>2000-08-21</cfStartDate>
<cfEndDate>2001-11-30</cfEndDate>
</cfPers_OrgUnit>
[…]
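The CVN item carries a start date and a duration, while the CERIF record carries a start date and an end date. The end date in the example is consistent with adding the ISO 8601 duration P1Y3M9D to 2000-08-21. Below is a small sketch of that arithmetic, assuming durations only use the year, month and day parts; `add_duration` is a hypothetical helper, not a function of the tool.

```python
import calendar
import re
from datetime import date, timedelta

# Date-only ISO 8601 durations such as P1Y3M9D (no weeks, no time part).
DURATION = re.compile(r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?$")


def add_duration(start: date, duration: str) -> date:
    """Add years and months first, then days, clamping to the month length."""
    match = DURATION.match(duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    years, months, days = (int(group or 0) for group in match.groups())

    month_index = start.month - 1 + months           # zero-based month count
    year = start.year + years + month_index // 12    # carry overflow into years
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])  # e.g. Jan 31 + 1M
    return date(year, month, day) + timedelta(days=days)


end_date = add_duration(date(2000, 8, 21), "P1Y3M9D")
print(end_date.isoformat())  # 2001-11-30, the cfEndDate in the example
```

Adding the calendar units before the days keeps the result independent of how many days the intermediate months have.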
Example
CERIF_DB.sql (Organization)
[…]
INSERT INTO cfOrgUnit ( cfOrgUnitId ) VALUES (
'd456e998-4a28-438b-a4da-0bc23f2137ce'
);
INSERT INTO cfOrgUnitName ( cfOrgUnitId, cfName ) VALUES (
'd456e998-4a28-438b-a4da-0bc23f2137ce',
'Intelligent Software Components S.A.'
);
[…]
INSERT INTO cfPers_OrgUnit ( cfOrgUnitId, cfPersId, cfClassId,
cfStartDate, cfEndDate ) VALUES (
'd456e998-4a28-438b-a4da-0bc23f2137ce',
'5b08333b-bcbc-4a9f-8e67-1d76bc16957b',
'Ingeniero de I+D',
'2000-08-21',
'2001-11-30'
);
[…]
Mapping CVN - CERIF
The first prototype of the CVN to CERIF tool can extract the following data
from CVN:
• Person data [cfPers]
– Name [cfPersName.cfFirstNames]
– Birthdate [cfPers.cfBirthdate]
– Gender [cfPers.cfGender]
• Organization data [cfOrgUnit]
– Name [cfOrgUnitName.cfName]
• Person position [cfPers_OrgUnit]
– Position [cfPers_OrgUnit.cfClassId]
– Start date [cfPers_OrgUnit.cfStartDate]
– End date [cfPers_OrgUnit.cfEndDate]
• Project data [cfProj]
– Title [cfProjTitle.cfTitle]
• Person [cfProj_Pers]
– Start date [cfProj_Pers.cfStartDate]
– End date [cfProj_Pers.cfEndDate]
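For the person and organization fields, the mapping can be written down as a lookup table keyed by the CVN codes that appear in the examples. This is a sketch for illustration only. The dictionary and the helper `cerif_target` are hypothetical, and project codes are left out because the slides do not show them.

```python
# CVN field code -> CERIF "table.column", using only codes shown in the examples.
CVN_TO_CERIF = {
    "000.010.000.020": "cfPersName.cfFirstNames",     # GivenName
    "000.010.000.010": "cfPersName.cfFirstNames",     # family names (shared code on the slide)
    "000.010.000.050": "cfPers.cfBirthdate",          # BirthDate
    "000.010.000.030": "cfPers.cfGender",             # Gender
    "010.020.000.020": "cfOrgUnitName.cfName",        # EntityName
    "010.020.000.170": "cfPers_OrgUnit.cfClassId",    # position title
    "010.020.000.180": "cfPers_OrgUnit.cfStartDate",  # StartDate
    "010.020.000.190": "cfPers_OrgUnit.cfEndDate",    # Duration, combined with the start date
}


def cerif_target(cvn_code):
    """Return (table, column) for a CVN code, or None if it is not mapped."""
    target = CVN_TO_CERIF.get(cvn_code)
    return tuple(target.split(".", 1)) if target else None


print(cerif_target("000.010.000.050"))  # ('cfPers', 'cfBirthdate')
```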
Then expose as LOD…