SIL FieldWorks Language Explorer:
The lexicon component
Gary Simons
SIL International
Lexicon Tools and Lexicon Standards
Nijmegen, 4–5 August 2010
SIL FieldWorks
- FieldWorks is:
  - a suite of integrated software tools to help field workers manage language and cultural data, with support for complex scripts.
  - http://fieldworks.sil.org/
- The Language Explorer tool is designed to:
  - manage a lexical database
  - produce dictionaries
  - interlinearize texts
  - analyze morphology
Quick Tour
- A short quick-tour screen movie demonstrates the look and feel
- It is the first of 55 narrated screen movies available at:
  - http://downloads.sil.org/FieldWorks/Movies/brief demo menu.html
Integration among areas
- The Lexicon, Texts, and Grammar areas all operate over the same database.
- In the Lexicon area, users enter lexical entries directly.
- In the Texts area, as new morphemes are glossed in text, new lexical entries are created behind the scenes.
- In the Grammar area, users describe the categories and features used in lexical description, plus the inflectional templates that guide automatic parsing in Texts.
Conceptual-modeling approach
- Lexicon, texts, and grammar are all stored in a single, normalized relational database.
- We began by working with domain experts to build a conceptual model of the areas and how they integrate.
- That model was expressed in UML and transformed into a SQL relational database schema.
- See the full model with over 100 classes at:
  - http://fieldworks.sil.org/ModelDoc/ModelDocumentation.chm
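As a rough illustration of what a single, normalized relational database means for lexical data, the sketch below uses Python's built-in sqlite3 module. The table and column names (entry, sense, lexeme_form, gloss) and the helper functions are hypothetical, and the schema is far simpler than the actual FieldWorks model with its 100+ classes.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a tiny, hypothetical lexicon schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE entry (
            id INTEGER PRIMARY KEY,
            lexeme_form TEXT NOT NULL
        );
        -- Senses live in their own table, so each gloss is stored exactly once
        -- and linked back to its entry by a foreign key.
        CREATE TABLE sense (
            id INTEGER PRIMARY KEY,
            entry_id INTEGER NOT NULL REFERENCES entry(id),
            gloss TEXT NOT NULL
        );
        """
    )
    return conn


def add_entry(conn, lexeme_form, glosses):
    """Insert one entry and one sense row per gloss; return the entry id."""
    cur = conn.execute("INSERT INTO entry (lexeme_form) VALUES (?)", (lexeme_form,))
    entry_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO sense (entry_id, gloss) VALUES (?, ?)",
        [(entry_id, gloss) for gloss in glosses],
    )
    return entry_id


def glosses_for(conn, lexeme_form):
    """Return the glosses of an entry, in insertion order, via a join."""
    rows = conn.execute(
        "SELECT sense.gloss FROM sense "
        "JOIN entry ON sense.entry_id = entry.id "
        "WHERE entry.lexeme_form = ? ORDER BY sense.id",
        (lexeme_form,),
    )
    return [row[0] for row in rows]
```

Because senses are rows rather than text embedded in an entry, other tables could refer to the same sense by id without copying it.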
Some key features
- Use automatic parsing to empirically verify morphological description within lexicon
- Build the word net via lexical relations
- Build richness into the lexicon by eliciting through semantic domains
- Use “bulk edit” for global cleanup
- Repurpose content by developing multiple presentation views
- Clean separation between stored data and presentation (see example in next two slides)
Root-based dictionary (Cherokee)
- Stem entries just cross-refer to root
- Root entries list stems as subentries
- Subentries give full description
Stem-based dictionary (Cherokee)
- Stem entries give full description
- Root entries cross-refer to stems
- No subentries
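The separation between stored data and presentation shown in these two dictionary layouts can be sketched as follows. This is only an illustration: the data structure, the function names, and the placeholder forms (root-1, stem-1a, stem-1b) are invented here, are not real Cherokee, and do not reflect how FLEx stores or configures views.

```python
# Hypothetical stored data: each stem records which root it derives from.
ROOTS = {"R1": {"form": "root-1", "gloss": "go"}}
STEMS = [
    {"form": "stem-1a", "root": "R1", "gloss": "is going"},
    {"form": "stem-1b", "root": "R1", "gloss": "went"},
]


def root_based_view(roots, stems):
    """Root entries carry stems as fully described subentries;
    stem entries only cross-refer to their root."""
    entries = []
    for root_id, root in roots.items():
        text = f"{root['form']} '{root['gloss']}'"
        for stem in stems:
            if stem["root"] == root_id:
                text += f"\n    {stem['form']} '{stem['gloss']}'"
        entries.append((root["form"], text))
    for stem in stems:
        root_form = roots[stem["root"]]["form"]
        entries.append((stem["form"], f"{stem['form']} see {root_form}"))
    # Sort by headword so both kinds of entry interleave alphabetically.
    return [text for _, text in sorted(entries)]


def stem_based_view(roots, stems):
    """Stem entries carry the full description;
    root entries only cross-refer to their stems. No subentries."""
    entries = []
    for root_id, root in roots.items():
        refs = ", ".join(s["form"] for s in stems if s["root"] == root_id)
        entries.append((root["form"], f"{root['form']} see {refs}"))
    for stem in stems:
        entries.append((stem["form"], f"{stem['form']} '{stem['gloss']}'"))
    return [text for _, text in sorted(entries)]
```

Both functions read the same ROOTS and STEMS; only the rendering differs, which is the point of keeping presentation out of the stored data.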
Pathways to publishing
- First create a “configured view” to display the lexical entries as desired
- Then use the Pathway plug-in to take this stream of configured content and lay it out onto pages for a publishable dictionary
  - http://code.google.com/p/pathway/
- Publishing tools supported so far:
  - Prince XML (to PDF)
  - OpenOffice (to ODF)
  - Adobe InDesign
Lexical interchange
- Supports two import formats:
  - From Shoebox / Toolbox via SFM
    - “Standard Format Markers” = backslash codes
    - User configures the mapping of markers to conceptual equivalents in the FLEx database
    - The default mapping is for MDF SFM
  - From WeSay / Lexique Pro via LIFT
    - Lexicon Interchange FormaT: an XML application for interchange of lexicons
    - http://code.google.com/p/lift-standard/
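To make the idea of mapping backslash-coded markers to conceptual fields concrete, here is a minimal sketch of an SFM reader. The markers \lx, \ps, \ge, and \de are common MDF markers; the target field names, the function parse_sfm, and the sample records are assumptions made for illustration and are not the FLEx importer or its actual mapping.

```python
# Assumed mapping from MDF-style markers to hypothetical field names.
DEFAULT_MAPPING = {
    "lx": "lexeme_form",
    "ps": "part_of_speech",
    "ge": "gloss_en",
    "de": "definition_en",
}

# Invented sample input; \zz stands in for a marker the mapping does not cover.
SAMPLE_SFM = (
    "\\lx casa\n"
    "\\ps n\n"
    "\\ge house\n"
    "\\zz ignored\n"
    "\n"
    "\\lx correr\n"
    "\\ps v\n"
    "\\ge run\n"
    "\\ge flow\n"
)


def parse_sfm(text, mapping=DEFAULT_MAPPING, record_marker="lx"):
    """Split SFM text into records, renaming markers via the mapping.

    Each record is a dict of field name -> list of values, because a
    marker such as \\ge may repeat within one record.
    """
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("\\"):
            continue  # this sketch skips blank and continuation lines
        parts = line[1:].split(None, 1)
        if not parts:
            continue  # a lone backslash carries no marker
        marker = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if marker == record_marker:
            current = {}  # the record marker starts a new entry
            records.append(current)
        if current is None or marker not in mapping:
            continue  # unmapped markers are dropped in this sketch
        current.setdefault(mapping[marker], []).append(value)
    return records
```

Passing a different mapping dictionary changes which fields each marker feeds, which is roughly the role the user-configured mapping plays during import.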
Lexicon export
- The entire database for a language project can be dumped to FieldWorks XML
  - http://fieldworks.sil.org/supportdocs/FieldWorks XML model.doc
- The complete lexical database (a subset of the whole project) can be exported to:
  - LIFT XML
  - MDF-based SFM (either root- or stem-based)
  - http://fieldworks.sil.org/supportdocs/Export options in Flex.doc
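For readers who have not seen LIFT, the sketch below reads a small LIFT-style fragment with Python's standard xml.etree.ElementTree. The fragment is a simplified, hand-written example with invented content, and the function read_lift is hypothetical; the LIFT specification remains the authority on the actual schema, and real FLEx exports contain much more.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written fragment in the general shape of a LIFT file.
SAMPLE_LIFT = """<lift version="0.13">
  <entry id="casa">
    <lexical-unit><form lang="es"><text>casa</text></form></lexical-unit>
    <sense>
      <grammatical-info value="Noun"/>
      <gloss lang="en"><text>house</text></gloss>
    </sense>
  </entry>
</lift>"""


def read_lift(xml_text):
    """Return a list of entries, each with its form and its senses."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter("entry"):
        form = entry.findtext("lexical-unit/form/text", default="")
        senses = []
        for sense in entry.findall("sense"):
            info = sense.find("grammatical-info")
            senses.append(
                {
                    # The part of speech is optional in this sketch.
                    "pos": info.get("value") if info is not None else None,
                    "glosses": [
                        gloss.findtext("text", default="")
                        for gloss in sense.findall("gloss")
                    ],
                }
            )
        entries.append({"form": form, "senses": senses})
    return entries
```

Calling read_lift(SAMPLE_LIFT) yields one entry for "casa" with a single noun sense glossed "house".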
More lexicon export
- Any configured view can be exported to:
  - A streamlined version of FieldWorks XML
  - MDF-based SFM
  - XHTML + CSS for presentation
- Furthermore, one can create a FieldWorks XML Template (FXT) to define a custom export format (XML, SFM, plain text)
  - http://fieldworks.sil.org/supportdocs/FXT export options.doc
Interoperation with GOLD
- FLEx is preloaded with a grammatical categories catalog that is based on an early version of GOLD
  - http://www.sil.org/computing/fieldworks/flex/categories.html
- Similarly, a Morphosyntactic Gloss Assistant is preloaded with morphosyntactic properties from an early version of GOLD; see p. 10 of:
  - http://www.sil.org/~simonsg/preprint/FLExParser Preprint.pdf
- Thus morphosyntactic information in lexicon and texts is implicitly aligned with GOLD
- The remaining step is for us to map to GOLD IDs when they are standardized; then we can easily export GOLD IDs in LIFT and other XML
Uptake
- October 2009: FLEx 3.0 released in FieldWorks 6.0. Free download from:
  - http://www.sil.org/computing/fieldworks/FW_downloads.htm
- 323 members of a reasonably active Google Group (~3,000 messages)
  - http://groups.google.com/group/flex-list
- 185 language projects have registered as users
- Over 30 people attended a 4-day FLEx workshop led by Beth Bryson at InField 2010. Beth will also lead a one-day FLEx workshop at ICLDC in February 2011.