Transcript ISO 16642
TMF - a tutorial
Part 3: Designing (schemas and) filters
TMF - Terminological Markup Framework
Laurent Romary - Laboratoire Loria

General principles
Terminological information interchange
– Three components:
  • Source TDB1
  • Target TDB2
  • Terminological interchange format
– The interchange format is a specific TML (DXLT, Geneter)
Diagram: TDB1 → TML → TDB2

Important notice
– GMT is not a TML
  • Too abstract a format
    – Uncontrolled recursion ('struct' element)
    – Uncontrolled content ('feat' and 'annot')
  • A schema is needed to check interchanged data
    – Precise list of data categories
    – Precise definition of the format
– GMT is here to provide conceptual simplicity

Designing filters
TML to GMT

General principles
Just for your information
– The creation of the filters can be automated
Basic processes
– Reduction of expansion trees
– Mapping elements and attributes to the corresponding data categories

Reducing expansion trees
Example
• DXLT (Martif) sub-tree

    <ntig>
      <!-- some general information associated with the term -->
      <termGrp>
        <!-- term related information -->
      </termGrp>
    </ntig>

• GMT

    <struct type="TS">
      <!-- some features -->
    </struct>

Element mapping
Example
• DXLT (Martif)

    <definition>Bla, bla, bla etc.</definition>

• GMT

    <feat type="definition">Bla, bla, bla etc.</feat>

Structural elements
Generating a GMT 'struct' element

    <xsl:template match="termEntry">
      <xsl:element name="struct">
        <xsl:attribute name="type">TE</xsl:attribute>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:element>
    </xsl:template>

Features
Generating a GMT 'feat' element (style=Attribute)

    <xsl:template match="@id">
      <xsl:element name="feat">
        <xsl:attribute name="type">iso12620-identifier</xsl:attribute>
        <xsl:value-of select="."/>
      </xsl:element>
    </xsl:template>

Features
Generating a GMT 'feat' element (style=Element)

    <xsl:template match="term">
      <xsl:element name="feat">
        <xsl:attribute name="type">iso12620-term</xsl:attribute>
        <xsl:apply-templates/>
      </xsl:element>
    </xsl:template>

Features
Generating a GMT 'feat' element (style=TypedElement)

    <xsl:template match="descrip[@type='subjectField']">
      <xsl:element name="feat">
        <xsl:attribute name="type">SubjectField</xsl:attribute>
        <xsl:apply-templates/>
      </xsl:element>
    </xsl:template>

XML Schemas for TMLs
…work ahead…

Analysing existing TDBs
Towards a generic methodology

General Architecture
Diagram: TDB → Flat XML → GMT → TML

A two-phase process
List the various data categories used in the TDB
– Relate them to existing registries (e.g. ISO 12620), cf. http://salt.loria.fr/public/salt/DCQuery.html
Identify the underlying organization of the TDB
– Relate it to the meta-model
– Anchor the data categories where they actually occur

Analysis of an existing TDB
Going through an example

Eurodicautom sample

    <entry>
      <BE>BTB</BE>
      <TY>DAG77</TY>
      <NI>398</NI>
      <CF>3</CF>
      <CM>AG1</CM>
      <CM>JUA</CM>
      <EN>
        <VE>key money</VE>
        <RF>CILF,Dict.Agriculture,ACCT,1977</RF>
      </EN>
      <FR>
        <VE>pas-de-porte</VE>
        <DF>prix payé au précédent occupant pour le droit d'entrer dans une exploitation agricole</DF>
        <RF target="DF">TNC(1997)</RF>
        <RF>CILF,Dict.Agriculture,ACCT,1977</RF>
        <NT type="NTE">droit rural;pratique prohibée par la loi</NT>
      </FR>
    </entry>

Data category annotations on the slide (meta-model level in parentheses):
– <CM>: classificationCode-12620A.4.2 (TE)
– <EN>, <FR>: language-12620A.10.7 (LS)
– <VE>: term-12620A.1 (TS)
– <DF>: definition-12620A.5.1 (TS)
– <NT>: note-12620A.8 (TS)

Result in GMT (1/2)

    <tmf>
      <struct type="TE">
        <feat type="entryIdentifier-12620A.10.15">BTB-TY-398</feat>
        <feat type="originatingInstitution-12620A.10.22.2">BTB</feat>
        <feat type="projectSubset">DAG77</feat>
        <feat type="NI">398</feat>
        <feat type="reliabilityCode">3</feat>
        <feat type="classificationCode-12620A.4.2">AG1</feat>
        <feat type="classificationCode-12620A.4.2">JUA</feat>
        <struct type="LS">
          <feat type="language-12620A.10.7">EN</feat>
          <struct type="TS">
            <feat type="term-12620A.1">key money</feat>
          </struct>
          <feat type="sourceIdentifier-12620A.10.20">CILF,Dict.Agriculture,ACCT,1977</feat>
        </struct>

Result in GMT (2/2)

        <struct type="LS">
          <feat type="language-12620A.10.7">fr</feat>
          <struct type="TS">
            <feat type="term-12620A.1">pas-de-porte</feat>
          </struct>
          <brack>
            <feat type="definition-12620A.5.1">prix payé au précédent occupant pour le droit d'entrer dans une exploitation agricole</feat>
            <feat type="sourceIdentifier-12620A.10.20">TNC(1997)</feat>
          </brack>
          <feat type="sourceIdentifier-12620A.10.20">CILF,Dict.Agriculture,ACCT,1977</feat>
          <feat type="note-12620A.8">droit rural;pratique prohibée par la loi</feat>
        </struct>
      </struct>
    </tmf>

Simple rules
Using XSL locality

    <xsl:template match="CM">
      <feat type="classificationCode-12620A.4.2">
        <xsl:apply-templates/>
      </feat>
    </xsl:template>

Introducing specific levels
Necessity to combine structure and content

    <xsl:template match="VE">
      <struct type="TS">
        <feat type="term-12620A.1">
          <xsl:apply-templates/>
        </feat>
      </struct>
    </xsl:template>

Default rule
Useful for keeping track of unmapped data categories

    <xsl:template match="*">
      <feat>
        <xsl:attribute name="type">
          <xsl:value-of select="name()"/>
        </xsl:attribute>
        <xsl:apply-templates/>
      </feat>
    </xsl:template>

Useful pointers
TMF page:
– http://www.loria.fr/projets/TMF
HLT/Salt project page:
– http://www.loria.fr/projets/SALT
Data category query tool:
– http://salt.loria.fr/public/salt/DCQuery.html
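
Worked sketch
The three Eurodicautom rules from the slides (the CM rule, the VE rule and the default rule) can be seen working together in the minimal sketch below. It is written in Python rather than XSLT, purely so it can be run with the standard library alone; it is not the tutorial's filter. The names to_gmt, convert, FEAT_TYPES, LANGUAGE_SECTIONS and SAMPLE are hypothetical. The handling of <entry> (a 'struct' of type TE) and of <EN>/<FR> (a 'struct' of type LS with a language 'feat') is an assumption read off the GMT result; the slides give no rule for those elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping table; the data category names come from the GMT result.
FEAT_TYPES = {
    "CM": "classificationCode-12620A.4.2",
    "DF": "definition-12620A.5.1",
    "NT": "note-12620A.8",
}
# Assumption: these source elements open a language section (LS).
LANGUAGE_SECTIONS = {"EN", "FR"}


def to_gmt(node):
    """Return the GMT element produced for one source element."""
    if node.tag == "entry":
        # Assumed rule: the entry becomes the terminological entry (TE).
        out = ET.Element("struct", {"type": "TE"})
    elif node.tag in LANGUAGE_SECTIONS:
        # Assumed rule: the element name itself carries the language.
        out = ET.Element("struct", {"type": "LS"})
        lang = ET.SubElement(out, "feat", {"type": "language-12620A.10.7"})
        lang.text = node.tag
    elif node.tag == "VE":
        # "Introducing specific levels": wrap the term in a TS struct.
        out = ET.Element("struct", {"type": "TS"})
        term = ET.SubElement(out, "feat", {"type": "term-12620A.1"})
        term.text = node.text
        return out
    else:
        # Simple rule when the tag is mapped; otherwise the default rule,
        # which keeps the source element name as the feature type.
        out = ET.Element("feat", {"type": FEAT_TYPES.get(node.tag, node.tag)})
        out.text = node.text
    for child in node:
        out.append(to_gmt(child))
    return out


def convert(source_xml):
    """Wrap the converted entry in a <tmf> root element."""
    tmf = ET.Element("tmf")
    tmf.append(to_gmt(ET.fromstring(source_xml)))
    return tmf


# Reduced version of the Eurodicautom sample from the slides.
SAMPLE = "<entry><CF>3</CF><CM>AG1</CM><EN><VE>key money</VE></EN></entry>"

if __name__ == "__main__":
    print(ET.tostring(convert(SAMPLE), encoding="unicode"))
```

In this sketch <CF> has no entry in the mapping table, so it falls through to the default rule and surfaces as a 'feat' whose type is "CF". That makes unmapped data categories easy to spot in the output, which is the purpose the slides give for the default rule.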