CLARIN: an introduction

CLARIN-NL
ISOcat workshop 2011
part 2
Ineke Schuurman
Menzo Windhouwer
Part A
• Issues brought up by participants
– When (not) to adopt an existing DC
– What about (CLARIN) standards
– What with ‘flagged’ DCs
– Relation DCS – profile
– What should be included in ISOcat (level of
detail, abbreviations, …)
– What about TEI, metadata, webservice?
– How to deal with larger amounts of data
Part B
• ISOcat and CLARIN: Do’s and don’ts
(version 0.1)
– Introduction and discussion
• When (not) to adopt an existing DC
– It should ‘match’ with the way you use a
specific notion in your annotation
scheme, application, …
– It should come with the same profile
– It should handle the same phenomenon:
SpeakerID ≠ SingerID
Speaker vs Singer
String → Name → Person → Singer → Opera →
Opera singer → Tenor → Tenor in La Bohème
First: too generic; last: too specific.
The others are candidates.
Note that SingerID and SpeakerID are siblings,
whereas SingerID is a subclass of both Singer
and ID (RELcat!)
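The specificity chain and the sibling/subclass remark above can be sketched in a few lines of Python. This is only an illustration: the `SUBCLASS_OF` table is hand-written and hypothetical (it is not retrieved from RELcat or ISOcat), and the parents given for SpeakerID are an assumption made by analogy with SingerID.

```python
# Specificity chain from the slide, most generic first.
CHAIN = ["String", "Name", "Person", "Singer", "Opera",
         "Opera singer", "Tenor", "Tenor in La Bohème"]

# The first entry is too generic and the last too specific;
# everything in between is a candidate.
CANDIDATES = CHAIN[1:-1]

# Hypothetical, hand-written subclass relations (not retrieved from RELcat).
# SingerID -> {Singer, ID} is stated on the slide; SpeakerID -> {Speaker, ID}
# is an assumption made by analogy.
SUBCLASS_OF = {
    "SingerID": {"Singer", "ID"},
    "SpeakerID": {"Speaker", "ID"},
}

def shared_parents(a, b):
    """Return the direct parents that two categories have in common."""
    return SUBCLASS_OF.get(a, set()) & SUBCLASS_OF.get(b, set())

def are_siblings(a, b):
    """Two distinct categories are siblings if they share a direct parent."""
    return a != b and bool(shared_parents(a, b))
```

Under these assumptions SingerID and SpeakerID share only the parent ID, which is why one cannot simply be adopted in place of the other.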
Standards
• Within ISOcat there are currently few or
no standards.
Therefore:
• CLARIN NL and VL will set up their own
set of ‘standardized DCs’; Ineke will be in
charge (she will consult with others)
Flagged DCs
• Never link with ‘deprecated’ DCs!
(in case of doubt: consult with Ineke or
Menzo)
• In other cases the flags show whether
the DC specification is correct from a
technical point of view.
• Note that only DCs with a green marking
qualify for standardization
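The two rules above (never link with deprecated DCs; only green-marked DCs qualify for standardization) can be written down as simple checks. This is a minimal sketch: the `DataCategory` record, its field names, and the non-green flag value are hypothetical and do not mirror ISOcat's actual data model.

```python
from dataclasses import dataclass

@dataclass
class DataCategory:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    deprecated: bool = False
    flag: str = "green"  # assumed: "green" or some other marking

def may_link(dc):
    """Rule from the slide: never link with a deprecated DC."""
    return not dc.deprecated

def qualifies_for_standardization(dc):
    """Rule from the slide: only DCs with a green marking qualify."""
    return dc.flag == "green"
```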
DC/DCS and profile
• Profiles are not added automatically; a
DCS may contain elements with various
profiles
• If the profile you need is not yet
available, contact Menzo and Ineke
What to include?
• Cf. the slide on SingerID/SpeakerID
• In general: all linguistically meaningful
notions mentioned in your schema,
manual, definition (cf. part B)
• Abbreviations (PST for /past tense/)
are to be entered as Data Element
Names
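As an illustration of the last point, a tagset abbreviation can be attached to its DC as a Data Element Name and then used for lookup. This is a sketch under stated assumptions: the record layout and field names below are hypothetical and are not ISOcat's actual data model.

```python
# Hypothetical in-memory DC records; the field names are assumptions and
# do not mirror ISOcat's actual data model.
DATA_CATEGORIES = [
    {"name": "past tense", "data_element_names": ["PST"]},
]

def find_by_data_element_name(abbreviation, categories=DATA_CATEGORIES):
    """Return the names of all DCs listing the abbreviation as a Data Element Name."""
    return [dc["name"] for dc in categories
            if abbreviation in dc["data_element_names"]]
```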
TEI, metadata, webservice
• TEI: likely to be taken care of at a ‘higher
level’; if not, YOU are to insert the TEI
definitions you use.
• Metadata: new in CMDI? In that case a
definition is to be provided in ISOcat as well
• Webservice: to be taken care of in CMDI
Larger amounts?
In such a case:
contact Menzo Windhouwer
([email protected])
Part B: do’s & don’ts
Do’s:
• Create a DCS for your scheme (name:
project, annotation scheme, …)
• Provide a clear definition (short, to the point)
for your scheme, application, …
• Take care not to leave concepts used in your
definition undefined or vague
• Use appropriate vocabulary (per profile)
• Check ‘adopted’ DCs regularly until
standardization!
Do’s (continued)
When creating a DC, fill out
• Justification: used in XYZ, part of tagset N
• Language section
– Always an English language section
– Strong recommendation: sections for the object
language(s) and for the working language of the manual
– Sections in the various languages should
match (i.e. be more or less translations of each other)
Do’s (continued)
When creating a DC, fill out
• Example section
– Note that *negative* examples may be very
helpful! (jongens ‘boys’, mannen ‘men’, but not: gelovigen
‘believers’ (is a form of an ADJ))
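The items to fill out when creating a DC (justification, an English language section, an example section) lend themselves to a simple completeness check. The sketch below assumes a hypothetical dictionary layout for a draft DC; the keys are illustrative only and are not taken from ISOcat.

```python
def missing_parts(dc):
    """List the recommended parts that a draft DC (a plain dict) still lacks.

    The keys used here ("justification", "language_sections", "examples")
    are hypothetical names chosen for this sketch.
    """
    missing = []
    if not dc.get("justification"):
        missing.append("justification")
    if "en" not in dc.get("language_sections", {}):
        missing.append("English language section")
    if not dc.get("examples"):
        missing.append("example section")
    return missing
```

A draft that only has a Dutch language section, for instance, would be reported as lacking the justification, the English language section, and the example section.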
Example sections
Suppose you want to illustrate a German
phenomenon:
• Example section in the EN language section
– German example with translation in English
• Example section in the NL language section
– German example with translation in Dutch
• Example section in the EN linguistic section
– EN example
• Example section in the NL linguistic section
– NL example with translation in English
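The four cases above follow a pattern that can be captured in a small function. This is a sketch, not a rule stated in the slides: the two-letter language codes, the function name, and the generalization that no translation is needed when the example is already in the translation language are assumptions.

```python
def example_plan(section_kind, section_lang, object_lang):
    """Return (example_language, translation_language or None).

    section_kind: "language" or "linguistic" (the two section types).
    section_lang: language of the section, e.g. "en" or "nl".
    object_lang:  language of the phenomenon being illustrated, e.g. "de".
    """
    if section_kind == "language":
        # Example in the object language, translated into the section language.
        example_lang, translation_lang = object_lang, section_lang
    elif section_kind == "linguistic":
        # Example in the section's own language, translated into English.
        example_lang, translation_lang = section_lang, "en"
    else:
        raise ValueError(f"unknown section kind: {section_kind!r}")
    if translation_lang == example_lang:
        # Assumption: an example needs no translation into its own language.
        translation_lang = None
    return example_lang, translation_lang
```

Called with `("language", "nl", "de")` it yields a German example with a Dutch translation; with `("linguistic", "en", "de")` it yields an English example without translation, matching the list above.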
Don’ts
• Confuse the Language section and the Linguistic section
– The latter contains language-specific values for
closed domains
• Be (too) language specific in definition
• Mention scheme in definition
• Use several definitions in one DC
• Circular definitions
• Rely on authority
• Rely on standardized status
– Definition should fit YOUR scheme, etc.
-- End --