Transcript ISO 16642

TMF - a tutorial
Part 3: Designing (schemas and) filters
TMF - Terminological Markup Framework
Laurent Romary - Laboratoire Loria
General principles
▪ Terminological information interchange
– Three components:
• Source TDB1
• Target TDB2
• Terminological interchange format
– A specific TML (DXLT, Geneter)
[Diagram: TDB1 → TML → TDB2]
Important notice
– GMT is not a TML
• Too abstract a format
– Uncontrolled recursion (‘struct’ element)
– Uncontrolled content (‘feat’ and ‘annot’)
• A schema is needed to check interchanged data
– Precise list of data categories
– Precise definition of the format
– GMT is here to provide conceptual simplicity
Designing filters
TML to GMT
General principles
▪ For information
– The creation of the filters can be automated
▪ Basic processes
– Reduction of expansion trees
– Mapping elements and attributes to the corresponding data categories
Reducing expansion trees
▪ Example
• DXLT (Martif) sub-tree
<ntig>
<!-- some general information associated
with the term -->
<termGrp>
<!-- term related information -->
</termGrp>
</ntig>
• GMT
<struct type="TS">
<!-- some features -->
</struct>
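The reduction above can also be sketched outside XSLT. The following is a minimal Python illustration, not the filter used in the tutorial: the DXLT fragment, its `<term>` and `<note>` children, and the function name `reduce_ntig` are assumptions made up for this example. It collapses the two nested levels (`ntig` and `termGrp`) into a single GMT `struct` of type TS.

```python
import xml.etree.ElementTree as ET

# Hypothetical DXLT fragment (invented for illustration): <ntig> carries
# general information, <termGrp> carries term related information.
DXLT = """<ntig>
  <termGrp>
    <term>key money</term>
  </termGrp>
  <note>illustrative note</note>
</ntig>"""

def reduce_ntig(ntig):
    """Collapse <ntig> and its nested <termGrp> into one <struct type="TS">."""
    struct = ET.Element("struct", {"type": "TS"})
    for child in ntig:
        # <termGrp> is only a grouping level: lift its children one level up.
        leaves = list(child) if child.tag == "termGrp" else [child]
        for leaf in leaves:
            # Assumption: the source element name is reused as the feature type.
            feat = ET.SubElement(struct, "feat", {"type": leaf.tag})
            feat.text = (leaf.text or "").strip()
    return struct

struct = reduce_ntig(ET.fromstring(DXLT))
print(ET.tostring(struct, encoding="unicode"))
```

The two source levels disappear and only one `struct` remains, which is the point of reducing an expansion tree.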
Element mapping
▪ Example
• DXLT (Martif)
<definition>Bla, bla, bla
etc.</definition>
• GMT
<feat type="definition">Bla, bla, bla
etc.</feat>
Structural elements
▪ Generating a GMT ‘struct’ element
<xsl:template match="termEntry">
<xsl:element name="struct">
<xsl:attribute
name="type">TE</xsl:attribute>
<xsl:apply-templates select="@*|node()"/>
</xsl:element>
</xsl:template>
Features
▪ Generating a GMT ‘feat’ element
» (style=Attribute)
<xsl:template match="@id">
<xsl:element name="feat">
<xsl:attribute name="type">iso12620-identifier</xsl:attribute>
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
Features
▪ Generating a GMT ‘feat’ element
» (style=Element)
<xsl:template match="term">
<xsl:element name="feat">
<xsl:attribute name="type">iso12620-term</xsl:attribute>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
Features
▪ Generating a GMT ‘feat’ element
» (style=TypedElement)
<xsl:template match="descrip[@type='subjectField']">
<xsl:element name="feat">
<xsl:attribute name="type">SubjectField</xsl:attribute>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
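The three styles (Attribute, Element, TypedElement) can be compared side by side in a short Python sketch. This is only an illustration under assumptions: the source fragment with its `<tig>` wrapper and the helper names `feat` and `map_features` are hypothetical, while the feature types are the ones used in the templates above.

```python
import xml.etree.ElementTree as ET

def feat(datcat, value):
    """Build one GMT <feat type="..."> element."""
    el = ET.Element("feat", {"type": datcat})
    el.text = value
    return el

def map_features(elem):
    """Apply the three mapping styles to one source element."""
    feats = []
    # style=Attribute: the value of @id becomes a feature.
    if elem.get("id") is not None:
        feats.append(feat("iso12620-identifier", elem.get("id")))
    for child in elem:
        if child.tag == "term":
            # style=Element: the element name selects the data category.
            feats.append(feat("iso12620-term", child.text))
        elif child.tag == "descrip" and child.get("type") == "subjectField":
            # style=TypedElement: the @type value selects the data category.
            feats.append(feat("SubjectField", child.text))
    return feats

# Hypothetical source fragment, invented for this illustration.
SAMPLE = ('<tig id="t1"><term>key money</term>'
          '<descrip type="subjectField">agriculture</descrip></tig>')

feats = map_features(ET.fromstring(SAMPLE))
for f in feats:
    print(ET.tostring(f, encoding="unicode"))
```

In each case the output is a `feat` element; only the place where the data category is read from (attribute name, element name, or `type` value) differs.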
XML Schemas for TMLs
…work ahead…
Analysing existing TDBs
Towards a generic methodology
General Architecture
[Diagram: TDB, Flat XML, GMT, TML]
A two-phase process
▪ List the various data categories used in the TDB
– Relate them to existing registries (e.g. ISO 12620), cf. http://salt.loria.fr/public/salt/DCQuery.html
▪ Identify the underlying organization of the TDB
– Relate it to the meta-model
– Anchor the data categories where they actually occur
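As a rough illustration of the two phases, the Python sketch below inventories the leaf elements of a flat XML export and records the parent under which each one occurs. It is a sketch under assumptions: the sample is a shortened version of the Eurodicautom entry analysed later in these slides, and the `REGISTRY` dictionary and the function name `inventory` are hypothetical stand-ins for a lookup in a real data category registry.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Shortened sample entry (abridged for illustration).
SAMPLE = """<entry>
  <BE>BTB</BE>
  <CM>AG1</CM>
  <CM>JUA</CM>
  <EN><VE>key money</VE></EN>
  <FR><VE>pas-de-porte</VE><NT type="NTE">droit rural</NT></FR>
</entry>"""

# Assumed hand-made mapping from source tags to registry entries.
REGISTRY = {
    "CM": "classificationCode-12620A.4.2",
    "VE": "term-12620A.1",
    "NT": "note-12620A.8",
}

def inventory(root):
    """Count leaf elements (phase 1) per parent element (phase 2)."""
    found = Counter()
    for parent in root.iter():
        for child in parent:
            if len(child) == 0:  # a leaf carries a data category value
                found[(parent.tag, child.tag)] += 1
    return found

inv = inventory(ET.fromstring(SAMPLE))
for (parent, tag), count in sorted(inv.items()):
    print(parent, tag, count, REGISTRY.get(tag, "unmapped"))
```

Tags reported as unmapped are the ones still to be related to a registry; the parent column hints at the level of the meta-model where each data category is anchored.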
Analysis of an existing TDB
Going through an example
Eurodicautom sample
<entry>
<BE>BTB</BE>
<TY>DAG77</TY>
<NI>398</NI>
<CF>3</CF>
<!-- classificationCode-12620A.4.2 (TE) -->
<CM>AG1</CM>
<CM>JUA</CM>
<!-- language-12620A.10.7 (LS) -->
<EN>
<!-- term-12620A.1 (TS) -->
<VE>key money</VE>
<RF>CILF,Dict.Agriculture,ACCT,1977</RF>
</EN>
<FR>
<VE>pas-de-porte</VE>
<!-- definition-12620A.5.1 (TS) -->
<DF>prix payé au précédent occupant pour le droit d'entrer dans une exploitation agricole</DF>
<RF target="DF">TNC(1997)</RF>
<RF>CILF,Dict.Agriculture,ACCT,1977</RF>
<!-- note-12620A.8 (TS) -->
<NT type="NTE">droit rural;pratique prohibée par la loi</NT>
</FR>
</entry>
Result in GMT (1/2)
<tmf>
<struct type="TE">
<feat type="entryIdentifier-12620A.10.15">BTB-TY-398</feat>
<feat type="originatingInstitution-12620A.10.22.2">BTB</feat>
<feat type="projectSubset">DAG77</feat>
<feat type="NI">398</feat>
<feat type="reliabilityCode">3</feat>
<feat type="classificationCode-12620A.4.2">AG1</feat>
<feat type="classificationCode-12620A.4.2">JUA</feat>
<struct type="LS">
<feat type="language-12620A.10.7">EN</feat>
<struct type="TS">
<feat type="term-12620A.1">key money</feat>
</struct>
<feat type="sourceIdentifier-12620A.10.20">CILF,Dict.Agriculture,ACCT,1977</feat>
</struct>
Result in GMT (2/2)
<struct type="LS">
<feat type="language-12620A.10.7">fr</feat>
<struct type="TS">
<feat type="term-12620A.1">pas-de-porte</feat>
</struct>
<brack>
<feat type="definition-12620A.5.1">prix payé au précédent occupant pour le droit d'entrer dans une exploitation agricole</feat>
<feat type="sourceIdentifier-12620A.10.20">TNC(1997)</feat>
</brack>
<feat type="sourceIdentifier-12620A.10.20">CILF,Dict.Agriculture,ACCT,1977</feat>
<feat type="note-12620A.8">droit rural;pratique prohibée par la loi</feat>
</struct>
</struct>
</tmf>
Simple rules
▪ Using XSL locality
<xsl:template match="CM">
<feat type="classificationCode-12620A.4.2">
<xsl:apply-templates/>
</feat>
</xsl:template>
Introducing specific levels
▪ Structure and content sometimes have to be generated together
<xsl:template match="VE">
<struct type="TS">
<feat type="term-12620A.1">
<xsl:apply-templates/>
</feat>
</struct>
</xsl:template>
Default rule
▪ Useful for keeping track of unmapped data categories
<xsl:template match="*">
<feat>
<xsl:attribute name="type">
<xsl:value-of select="name()"/>
</xsl:attribute>
<xsl:apply-templates/>
</feat>
</xsl:template>
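The three kinds of rule shown above (simple rule, specific level, default rule) can be put together in one small Python sketch. It is an illustration under assumptions, not a port of the stylesheet: the source fragment is simplified (in the real sample `VE` sits under a language element), only leaf elements are handled, whereas the XSLT default rule recurses through `xsl:apply-templates`, and the helper names `feat` and `convert` are hypothetical.

```python
import xml.etree.ElementTree as ET

def feat(datcat, text):
    """Build one GMT <feat type="..."> element."""
    el = ET.Element("feat", {"type": datcat})
    el.text = text
    return el

def convert(elem):
    """Return the GMT element for one leaf element of the source entry."""
    if elem.tag == "CM":
        # Simple rule: one source element, one feature.
        return feat("classificationCode-12620A.4.2", elem.text)
    if elem.tag == "VE":
        # Specific level: the term also opens a term section (TS).
        struct = ET.Element("struct", {"type": "TS"})
        struct.append(feat("term-12620A.1", elem.text))
        return struct
    # Default rule: keep the source element name as the feature type,
    # so that unmapped data categories stay visible in the output.
    return feat(elem.tag, elem.text)

# Simplified fragment, invented for this illustration.
SOURCE = "<entry><NI>398</NI><CM>AG1</CM><VE>key money</VE></entry>"

out = [convert(child) for child in ET.fromstring(SOURCE)]
for el in out:
    print(ET.tostring(el, encoding="unicode"))
```

The unmapped `NI` element comes out as a feature whose type is `NI`, matching the feature of that type in the GMT result shown earlier.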
Useful pointers
▪ TMF page:
– http://www.loria.fr/projets/TMF
▪ HLT/Salt project page:
– http://www.loria.fr/projets/SALT
▪ Data category query tool:
– http://salt.loria.fr/public/salt/DCQuery.html