ISO 16642 - TermSciences

ISO 16642 - a tutorial
Part 2: Representing data categories
TMF - Terminological Markup Framework
Laurent Romary - Laboratoire Loria
Why formalize DatCats?
▪ Systematizing data category description:
– Notion of a Data Category Registry (DCR)
• I need a data category: is it there?
– Query by name, definition, etc.
▪ Automating processes:
– Format control of TMLs
– Filters from a TML to GMT
Which model for DatCats?
▪ Using XML:
– Coherence with TMF principles
– Using stylesheets to generate schemas and filters
▪ Using RDF (Resource Description Framework):
– Intended format for representing metadata:
• The description of a DatCat is metadata with regard to TMF
RDF - a quick presentation
Cf. other file
Data Categories
A Formal Description
Data Category Registry
[Diagram: the DCRegistry resource (rdf:about) has a Description with the properties dcsd:VersionNumber (VersionNumber) and dcsd:DataCategory (Data Category).]
Data Category description
[Diagram: a Data Category has the properties dcsd:DCIdentifier, dcsd:DCParent, dcsd:DCName, dcsd:DCDefinition, dcsd:DCExample, dcsd:DCType (S or C), dcsd:DCAdmin, dcsd:DCComment, dcsd:Level (Locus) and dcsd:Content. Diagram credit: Salt 2000-11-08/SEW.]
Simple and complex DatCats
▪ Complex data categories
– shall serve as field identifiers (not names) in databases and can have content. The datatype for this content shall be declared for each data category and can commonly take the form of different categories of text, defined data types (such as dates), and specified data domains, e.g., picklists comprising standardized permissible instances.
» Example: /Part of Speech/
▪ Simple data categories
– shall serve as the content of complex data categories.
» Example: /Noun/, /Verb/, /Adjective/, etc.
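The distinction between simple and complex data categories can be sketched as a small data structure. This is only an illustration, not part of ISO 16642 or the SALT tools: the class names SimpleDatCat and ComplexDatCat and the accepts method are hypothetical, and the only data types modelled are a picklist and free text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleDatCat:
    """A simple data category: only usable as the content of a complex one."""
    name: str


@dataclass
class ComplexDatCat:
    """A complex data category: a field identifier with a declared data type."""
    name: str
    data_type: str          # e.g. "plainText" or "picklist" (illustrative values)
    picklist: tuple = ()    # permitted simple data categories for a picklist

    def accepts(self, value) -> bool:
        """Return True if value is valid content for this data category."""
        if self.data_type == "picklist":
            return isinstance(value, SimpleDatCat) and value in self.picklist
        # Every other data type is treated as plain text in this sketch.
        return isinstance(value, str)


NOUN, VERB, ADJECTIVE = (SimpleDatCat(n) for n in ("Noun", "Verb", "Adjective"))
part_of_speech = ComplexDatCat("Part of Speech", "picklist", (NOUN, VERB, ADJECTIVE))
term = ComplexDatCat("term", "plainText")

print(part_of_speech.accepts(NOUN))                    # value is in the picklist
print(part_of_speech.accepts(SimpleDatCat("Adverb")))  # value is not in the picklist
print(term.accepts("radix"))                           # free text content
```

Modelling the picklist as a tuple of simple data categories mirrors the slide: /Noun/ is never a field on its own, only a permitted value of /Part of Speech/.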
Levels and content
[Diagram: dcsd:Content carries dcsd:DataType (DataType) and dcsd:TargetType (TargetType), alongside Level/Loci. The diagram shows three rdf:Alt lists of references, whose rdf:li items refer to other datcat(s).]
Administrative properties
[Diagram: the dcsd:DCAdmin node (DCAdmin) of a Data Category, with the properties dcsd:Source, dcsd:Status, dcsd:StatusDate, dcsd:EditionDate, dcsd:StatusNote and dcsd:VariantNames, plus dcsd:ShortForm, dcsd:AdmittedName and dcsd:ForbiddenName.]
RDF Representation
/term/ - RDF description (1)
<dcsd:DataCategory
    dcsd:DCIdentifier="ISO12620A01"
    dcsd:DCName="term"
    dcsd:position="A.01"
    dcsd:DCType="C">
  <dcsd:DCDefinition>A verbal designation of a general concept in a specific subject field</dcsd:DCDefinition>
  <dcsd:DCComment>
    <dcsd:sourceComment>For definition of related term, see ISO 1087-1, 3.4.3.</dcsd:sourceComment>
    <dcsd:conceptComment>Terms can consist of single words or be composed of multiword strings…</dcsd:conceptComment>
    <dcsd:Example>"radix" in annex C, figure C.1.</dcsd:Example>
    <dcsd:DictionnaryID>A.1</dcsd:DictionnaryID>
  </dcsd:DCComment>
/term/ - RDF description (2)
  <dcsd:Content dcsd:DataType="plainText"/>
  <dcsd:Level>
    <rdf:Alt>
      <rdf:li>TL</rdf:li>
      <rdf:li>TC</rdf:li>
    </rdf:Alt>
  </dcsd:Level>
  <dcsd:DCAdmin
      dcsd:OrgSource="ISO TC 37"
      dcsd:DocSource="ISO12620:1999"
      dcsd:subDate="2000-10-20 SEW"
      dcsd:registryComment="Prepared 2000-10-20"
      dcsd:Status="Accepted"/>
</dcsd:DataCategory>
/term type/ - RDF description (1)
<dcsd:DataCategory
    dcsd:DCIdentifier="ISO12620A0201"
    dcsd:DCName="term type"
    dcsd:position="A.02.01"
    dcsd:DCType="C">
  <dcsd:DCDefinition>An attribute assigned to a term</dcsd:DCDefinition>
  <dcsd:DCComment>
    <dcsd:DictionnaryID>A.2.1</dcsd:DictionnaryID>
  </dcsd:DCComment>
  <dcsd:Content dcsd:DataType="picklist">
    <rdf:Alt>
      <rdf:li>ISO12620A020101</rdf:li>
      <rdf:li>ISO12620A020102</rdf:li>
      <rdf:li>ISO12620A020119</rdf:li>
    </rdf:Alt>
  </dcsd:Content>
/term type/ - RDF description (2)
  <dcsd:Level>
    <rdf:Alt>
      <rdf:li>TL</rdf:li>
      <rdf:li>TC</rdf:li>
    </rdf:Alt>
  </dcsd:Level>
  <dcsd:DCAdmin
      dcsd:OrgSource="ISO TC 37"
      dcsd:DocSource="ISO12620:1999"
      dcsd:subDate="2000-10-20 SEW"
      dcsd:registryComment="Prepared 2000-10-20"
      dcsd:Status="Accepted"/>
</dcsd:DataCategory>
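A description such as /term type/ can be read back with a standard XML parser. The sketch below uses Python's xml.etree.ElementTree on a shortened copy of that description. The slides do not show namespace declarations, so the dcsd namespace URI used here is a placeholder assumption (the rdf URI is the standard one), and the function names read_datcat and alt_items are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCSD = "http://example.org/dcsd#"  # placeholder: the real dcsd URI is not given in the slides

SAMPLE = f"""
<dcsd:DataCategory xmlns:dcsd="{DCSD}" xmlns:rdf="{RDF}"
    dcsd:DCIdentifier="ISO12620A0201"
    dcsd:DCName="term type"
    dcsd:DCType="C">
  <dcsd:DCDefinition>An attribute assigned to a term</dcsd:DCDefinition>
  <dcsd:Content dcsd:DataType="picklist">
    <rdf:Alt>
      <rdf:li>ISO12620A020101</rdf:li>
      <rdf:li>ISO12620A020102</rdf:li>
      <rdf:li>ISO12620A020119</rdf:li>
    </rdf:Alt>
  </dcsd:Content>
  <dcsd:Level>
    <rdf:Alt>
      <rdf:li>TL</rdf:li>
      <rdf:li>TC</rdf:li>
    </rdf:Alt>
  </dcsd:Level>
</dcsd:DataCategory>
"""


def q(namespace, local):
    """Build the '{namespace}local' name that ElementTree uses internally."""
    return f"{{{namespace}}}{local}"


def alt_items(parent):
    """Return the text of every rdf:li found below parent (empty if absent)."""
    if parent is None:
        return []
    return [(li.text or "").strip() for li in parent.iter(q(RDF, "li"))]


def read_datcat(xml_text):
    """Extract the main properties of one dcsd:DataCategory description."""
    root = ET.fromstring(xml_text.strip())
    content = root.find(q(DCSD, "Content"))
    return {
        "identifier": root.get(q(DCSD, "DCIdentifier")),
        "name": root.get(q(DCSD, "DCName")),
        "type": root.get(q(DCSD, "DCType")),
        "definition": root.findtext(q(DCSD, "DCDefinition"), default="").strip(),
        "data_type": content.get(q(DCSD, "DataType")) if content is not None else None,
        "values": alt_items(content),
        "levels": alt_items(root.find(q(DCSD, "Level"))),
    }


datcat = read_datcat(SAMPLE)
print(datcat["name"], datcat["data_type"], datcat["values"], datcat["levels"])
```

A dictionary like this could feed the kind of registry query mentioned earlier (lookup by name or definition), though the slides do not describe how the SALT tools implement it.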
Actualizing a DatCat
TMF specific properties
Styling properties
[Diagram: a Data Category with dcsd:Anchor (Anchor; labels Level and AnchorInfo) and dcsd:Style (Style). A Style has a dcsd:StyleName (StyleName), one of Attribute, Element, TypedElement, ValuedElement, TVElement or Simple, and the properties dcsd:ElementName, dcsd:AttributeName, dcsd:TypeValue and, for 'Simple', dcsd:Value.]
Attribute style description
• dcsd:StyleName="Attribute"
– Conditions of use:
• Not valid for annotations
– Required properties
• dcsd:AttributeName
– Example:
• dcsd:AttributeName="id"
• <anchorElement id="xx54893">…</anchorElement>
Element style description
• dcsd:StyleName="Element"
– Required properties
• dcsd:ElementName
– Example:
• dcsd:ElementName="definition"
• <definition>…</definition>
TypedElement style description
• dcsd:StyleName="TypedElement"
– Required properties
• dcsd:ElementName, dcsd:TypeValue
– Example:
• dcsd:ElementName="termNote"
• dcsd:TypeValue="partOfSpeech"
• <termNote type="partOfSpeech">N</termNote>
ValuedElement style description
• dcsd:StyleName="ValuedElement"
– Conditions of use:
• Not valid for annotations
– Required properties
• dcsd:ElementName
– Example:
• dcsd:ElementName="pos"
• <pos value="noun"/>
TVElement style description
• dcsd:StyleName="TVElement"
– Conditions of use:
• Not valid for annotations
– Required properties
• dcsd:ElementName, dcsd:TypeValue
– Example:
• dcsd:ElementName="free"
• dcsd:TypeValue="pos"
• <free type="pos" value="noun"/>
Simple style description
• dcsd:StyleName="Simple"
– Conditions of use:
• Expresses the value of a simple data category
– Required properties:
• dcsd:Value
– Example:
• dcsd:Value="Nom"
• <pos>Nom</pos>
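The six style descriptions above can be read as a small rendering table: each StyleName, together with its required properties, determines how a data category is written out in a TML. The following is a minimal sketch of that mapping and not the SALT filter generator. The function name render and the dictionary layout are assumptions, and the Attribute style returns only the attribute text to be placed on the anchor element.

```python
from xml.sax.saxutils import escape, quoteattr


def render(style: dict, value: str = "") -> str:
    """Render one data category value according to its style description."""
    name = style["StyleName"]
    if name == "Attribute":
        # Attribute text only; the caller attaches it to the anchor element.
        return f'{style["AttributeName"]}={quoteattr(value)}'
    if name == "Element":
        element = style["ElementName"]
        return f"<{element}>{escape(value)}</{element}>"
    if name == "TypedElement":
        element = style["ElementName"]
        return f'<{element} type={quoteattr(style["TypeValue"])}>{escape(value)}</{element}>'
    if name == "ValuedElement":
        return f'<{style["ElementName"]} value={quoteattr(value)}/>'
    if name == "TVElement":
        return (f'<{style["ElementName"]} type={quoteattr(style["TypeValue"])} '
                f'value={quoteattr(value)}/>')
    if name == "Simple":
        # A simple data category is written as the fixed string in dcsd:Value.
        return escape(style["Value"])
    raise ValueError(f"unknown style: {name}")


typed = {"StyleName": "TypedElement", "ElementName": "termNote", "TypeValue": "partOfSpeech"}
tv = {"StyleName": "TVElement", "ElementName": "free", "TypeValue": "pos"}
pos = {"StyleName": "Element", "ElementName": "pos"}
nom = {"StyleName": "Simple", "Value": "Nom"}

print(render(typed, "N"))          # <termNote type="partOfSpeech">N</termNote>
print(render(tv, "noun"))          # <free type="pos" value="noun"/>
print(render(pos, render(nom)))    # <pos>Nom</pos>
```

The last call shows how the Simple style combines with a complex data category: the simple value supplies the content, and the complex category's style supplies the surrounding markup.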
Dealing with languages
Two types of languages
▪ Working language
• The language used at a given place in a document, inherited along the XML hierarchy
• Representation: xml:lang
▪ Object language
• The language being described at a given place in the terminological entry (e.g. the language that characterizes the Language Section level)
• Representation: as a data category "language", with a narrow scope
Example — DXLT
<langSet lang="en" xml:lang="fr">
  <descrip type="definition">Une valeur entre 0 et 1 utilisée...</descrip>
  <tig>
    <term xml:lang="en">alpha smoothing factor</term>
    <termNote type="termType">fullForm</termNote>
  </tig>
</langSet>
Example — GMT
<struct type="LS" xml:lang="fr">
  <feat type="language">en</feat>
  <feat type="definition">Une valeur entre 0 et 1 utilisée...</feat>
  <struct type="TL">
    <feat type="term" xml:lang="en">alpha smoothing factor</feat>
    <feat type="termType">fullForm</feat>
  </struct>
</struct>
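The difference between the two kinds of language can be made concrete by walking the GMT example. The working language is inherited through xml:lang from the nearest enclosing element, while the object language is read from the "language" feature. This is a minimal sketch using Python's xml.etree.ElementTree. The function name features is hypothetical, and the GMT fragment is the example from the slide with its outer element closed as a struct.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

GMT = """
<struct type="LS" xml:lang="fr">
  <feat type="language">en</feat>
  <feat type="definition">Une valeur entre 0 et 1 utilisée...</feat>
  <struct type="TL">
    <feat type="term" xml:lang="en">alpha smoothing factor</feat>
    <feat type="termType">fullForm</feat>
  </struct>
</struct>
"""


def features(node, working_lang=None):
    """Yield (feature type, working language, text) for every feat element."""
    # An xml:lang on this node overrides the value inherited from its ancestors.
    working_lang = node.get(XML_LANG, working_lang)
    if node.tag == "feat":
        yield node.get("type"), working_lang, (node.text or "").strip()
    for child in node:
        yield from features(child, working_lang)


rows = list(features(ET.fromstring(GMT.strip())))

# Object language: the value of the "language" data category.
object_lang = next(text for ftype, _, text in rows if ftype == "language")

for ftype, lang, text in rows:
    print(f"{ftype:<12} working={lang}  {text}")
print("object language:", object_lang)
```

Here the definition is written in French (working language fr) although the section describes English (object language en), and the term itself overrides the inherited working language with xml:lang="en".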
Conclusion
– A general model for analysing and representing terminological data collections
– An underlying formalism expressed in XML and RDF
– Associated tools (SALT project)
• DCSEditor
• DCSBrowser
• Automatic generation of XSLT filters and XML schemas from a given TML specification
Useful pointers
▪ SALT project
– http://www.loria.fr/projets/SALT
– http://www.ttt.org/
▪ The TMF site
– http://www.loria.fr/projets/TMF