Språkbanken i Finland
Kielipankki
Language Bank of Finland
Nordic Treebank Network
Fefor, September 17, 2003
AEB/Yleisesittely
7/18/2015
Vem, kuka, who?
• The Language Bank of Finland is a service provided by CSC
• CSC is owned by the Finnish Ministry of Education
– provides HPCN (high-performance computing and networking) services to all universities
– maintains scientific applications and databases
• CSC focuses on providing shared services
• Services are free of charge for universities and provided on a non-profit basis for companies
• The Language Bank serves the linguistic community in Finland
– Server: corpus.csc.fi (Linux)
– Text collections (Finnish and Finland-Swedish)
– Taggers
– Web-based corpus query tool
Varför, miksi, why?
• There is no Treebank of Finnish at present
• … and it is a shame, so
• The Language Bank wants to bring about its creation
– Infrastructure programme by the Academy of Finland in 2004
– The plan is to use Finnish Dependency Grammar by Connexor
• Without query and analysis tools the treebank is just a large heap of files
– We need information on tools and technology in order to create a nice service for linguists and language technology professionals
At present the Language Bank offers... (1)
• Text collection of Finnish
– 180 million words
– 60% with MSD (morphosyntactic description) tags (TextMorfo 2.0)
• Text collection of Finland-Swedish
– 32 million words
– 100% with MSD tags (SWECG)
• Swedish PAROLE
– 19 million words (courtesy of Språkbanken, Gothenburg)
• Other:
– Le Monde 1990, German PAROLE, FISC, Susanne, OTA, Middle French, Oulu
At present the Language Bank offers... (2)
• WWW Lemmie 2.0 (screenshot on next slide)
– Easy-to-use corpus query tool developed at CSC
• Taggers
– Fi-lite (Connexor)
– En-lite (Connexor)
– ENGCG (Lingsoft)
– SWECG (Lingsoft)
– FINTWOL (Lingsoft)
– TextMorfo (Kielikone)
– Morfo (Kielikone)
[Screenshot: WWW Lemmie 2.0]
In the past the Language Bank has been active in...
• Preparing ground for research programmes
– Preliminary survey on language technology 1998
– Preliminary survey on spoken language research 2001
• Participating in programmes with universities
– Enlargement of text collections 1999-2001
– Integrated resources for speech technology and spoken language research 2002-2004
In the future the Language Bank will offer...
• Spoken language data
– Academy of Finland funding
– Work is in progress
• Annotation editor for spoken language data (screenshot on next slide)
– Annotation interchange format in RDF
– Supports collaborative annotation
• Treebank of Finnish ;-)
– Just need some money…
• Better tools for querying and processing research data
[Screenshot: annotation editor for spoken language data]
More information
http://www.csc.fi/kielipankki/
[email protected]