A MULTILINGUAL STATISTICAL DICTIONARY IN THE BROADER …


Dictionaries for the Human Language Technologies virtual network
Dr Mariëtta Alberts
Focus Area Manager
Standardisation and Terminology Development
Pan South African Language Board
(PanSALB)
Outline of presentation
• Introduction
• Reviewing Human Language Technologies
– Scope of HLT
– Potential of HLT
– Multilingualism and HLT
• The South African HLT initiative
– History of South African HLT project
– National Facility
– South African HLT model
• Terminology Training initiative of PanSALB
• Conclusion
Afrilex, 13 - 15 July 2005, UFS, Bloemfontein
1. Introduction
• South Africa is on the verge of establishing a
Human Language Technology (HLT) Centre
• The Centre will probably be managed as a
national facility
• It will provide an appropriate and sustainable
virtual (or otherwise) infrastructure conducive
to the development and effective
management of reusable electronic text and
speech resources
2. Reviewing Human
Language Technologies (HLT)
• Human Language Technologies are
enabling technologies
• They enable human beings to interact
with computers by using human
language (text and speech)
Human Language Technologies range from:
• high-level parsing and machine translation
• applications in education and training
• public service (e-governance and
e-commerce applications)
• voice-operated educational systems
• voice-operated commercial systems that
can be used by illiterate people
Human Language Technologies:
• Provide interfaces that enable spoken
human-machine interaction (telephone-based
information systems, automated booking systems)
• Provide linguistic assistance (spelling and
grammar checking)
• Provide access to multilingual polythematic
information
• Empower people to actively participate in
the Information Society
2.1 The scope of HLT:
• Text-based language processing
– Text analysis (e.g. spellcheckers, term
extraction, search engines)
– Summarisation
– Text translation
• Speech processing
– Speech recognition (e.g. desktop or
telephony environment)
– Speech synthesis
2.2 Potential of HLT:
• Access for all to the information era
• Enhanced mother-tongue or first language
teaching
• Affordable multilingual documents
• Improved functionality and quality of
languages
• Contact with the developing-world
context
Potential of HLT...
• Availability of multilingual words and
polythematic terminology: indicator of
development
• Specialised communication has a central axle
or hub in terminology
• Standardised terminology contributes to
quality of translations, interpreting and
communication
• Streamlined translation and interpreting
services provide competitive advantages
2.3 Multilingualism and HLT: South
African situation
• South Africa has a high illiteracy rate
• Only 22% of the citizens can function
through the medium of English
• A small percentage of South Africans have
access to computers - fewer still are IT
literate
• The divide is even greater between rural
and urban areas
• Effective e-government is necessary (e.g. for birth
certificates, identity documents, marriage and death
certificates, telephone, electricity and water bills, traffic fines,
etc.)
• All citizens should have access to information
in the languages they understand best (e.g. 11
official languages; South African Sign Language; Khoe and
San languages)
• Government should communicate with citizens
in their own languages regarding key services
(e.g. health; safety and security; education; postal services;
justice (courts); banks (economy); media (electronic and
print); labour (jobs); social welfare (pensions); etc.)
Language Policy and Legislation
• Multilingual policy since 1994 - South African
Constitution of 1996 (Act 108 of 1996)
• Mechanisms for protecting and promoting
linguistic rights were put in place
• Section 6 of the South African Constitution
specifically sets out the principles of a
language policy that takes into
consideration the multilingual nature of
South African society
Establishment of PanSALB
• The Pan South African Language Board
(PanSALB) (Act 59 of 1995) was established:
– to develop, promote and ensure use of South
Africa’s eleven official languages, South African
Sign Language (SASL) and the Khoe and San
languages, and
– to promote respect for other languages used in the
country (e.g. heritage languages such as Dutch, French,
German, Hindi, KiSwahili, Portuguese, Tamil, etc.)
• PanSALB oversees the implementation of the
National Language Policy Framework (NLPF) to
ensure access to services for all citizens through:
• 9 Provincial Language Committees (PLCs)
– Assist Provinces with language policy formulation and
implementation
• 13 National Language Bodies (NLBs)
– Standardisation (e.g. spelling and orthography rules)
– Terminology development
– Dictionary needs (general vocabulary)
– Literacy and media
– Research and Education
• 11 National Lexicography Units (NLUs)
– Compilation of comprehensive monolingual and other
types of dictionaries
3. The South African HLT initiative
3.1 History
• Lexinet research programme of HSRC (1988)
(Wordnet, Termnet, Docnet, Transnet, Ailang, etc.)
• PanSALB and DACST (now DAC) initiated the
HLT project in 1999
• The former Minister of DACST appointed a panel
of experts to investigate the establishment of an
HLT virtual network
• The HLT task team concluded that an HLT National
Facility should be established
• The developers of the envisaged HLT National
Facility should ensure that HLT advances
multilingualism in different respects, i.e.:
• Key government documents in the languages the
citizens can understand best
• Electronic systems to connect lexicographers and
terminologists with other language practitioners
• Electronic systems to disseminate lexicographical
and terminological data
• Electronic systems to connect translators and other
language workers with word and term banks
• Central government assistance to meet
communication needs of all its citizens
• Local and provincial governments to serve as focal
points of information dissemination (e.g. multipurpose
community centres)
3.2 National Facility
• Purpose of HLT project:
– to fast-track the use and development of
indigenous languages
– to promote the SA government’s policy of
multilingualism
– to facilitate better service delivery, enabling citizens to
access or supply information in any of the
official languages
• Basic premises for the development of HLT:
– development and effective management of
reusable text and speech resources in all official
languages of SA;
– capacity building with respect to research and
development in the field of HLT; and
– stimulation of an HLT industry that will provide
language-based electronic products which, in
turn, will be applicable in all relevant sectors,
especially in the government sector.
3.3 SA Human Language Technologies Model
• The South African HLT model is based on a
model being implemented by the European
Union (EU)
• The EU model is effectively implemented in the
EU Framework Programmes (FP 3/4/5/6)
• The South African HLT model will grow
exponentially as expertise and resources are
developed
3.3.1 Aims of envisaged HLT virtual network
• An e-government process needs to provide citizens
with:
– Access to online facilities
– Required and necessary service delivery
– Infrastructure to make it work
• Two basic prerequisites are:
– A technical infrastructure (IT access; proven and
multipurpose IT systems; online language services)
– Human capital (capacity building e.g. trained and reskilled
language practitioners)
3.3.2 Identified needs:
• Low general awareness level regarding HLT
benefits
• Interdisciplinary curricula at tertiary level to
advance HLT development
• Systematic presentation of short dedicated HLT
courses
• Theoretical and practical training in the fields of
lexicography and terminology
• Job creation should be carefully planned
• Upgrade and maintain a knowledge base on HLT
3.3.3 Proposed three-step strategy for development of HLT model:
• Step 1: Applied research and capacity building,
production of language resources, development of
enabling technologies and of an HLT industry.
• Step 2: Development of a legal framework to ensure
systematic acquisition, administration and
conservation of electronic language resources.
• Step 3: Development of an infrastructure to manage
the implementation of the proposed HLT model.
The following diagram demonstrates the various relationships:
[Diagram: Centre for Human Language Technologies, linked to Media, SABC, Universities A, C and D, Govt Depts A and B, NLUs P and Z, and Companies A and B]
Functions shown for the Centre:
• Central planning, coordination & consultation
• Digital text and speech corpora: acquisition, enhancement, management
• NLP software development
• HLT training
Resources and expertise to feed into:
• National Lexicographic Units (NLUs)
• Government Departments: HLT products for e-governance, e-learning, e-commerce
• Private sector development: ICT (HLT) job creation, software dev., e-commerce
• Academic research and development
3.3.4 Role players
• Government services: national, provincial
and local (e.g. e-government, e-learning,
e-commerce, etc.)
• Parastatal institutions (e.g. PanSALB)
• Private sector
• Academia (tertiary education)
• Education (primary and secondary
education)
3.3.5 Progress
• Parsing (Zulu and other African languages)
by Special Interest Group (SiG), African
Languages Association of Southern Africa
(ALASA)
• Speech recognition (Tourism: pilot booking
service)
• Amalgamated Banks of South Africa (ABSA)
multilingual pilot project: ATM screen
prompts and telephone banking prompts in
African languages (Zulu, Xhosa and South Sotho)
Progress...
• TISSA (Telephone Interpreting Service of South
Africa) (all ports of entry; health services; police
charge offices; etc.)
• Spellcheckers: Afrikaans developed by North-West
University; African languages by
University of Pretoria/North-West University;
future development to be a combined effort
• Microsoft human/machine interface:
combined effort re terminology development
• Afrilingo: e-learning tool for language
acquisition (11 official SA languages)
Progress ...
• TshwaneLex: dedicated computer software
program for data capturing (lexicography)
• 11 National Lexicography Units (NLUs) of
PanSALB: Monolingual dictionaries for each
of the 11 official South African languages
• NLUs: Data collection and building of corpora
• NLUs: on-line dictionaries (e.g. Afrikaans, Northern
Sotho (Sesotho sa Leboa))
• TshwaneTerm: dedicated computer software
program for data capturing (terminology)??
Progress ...
• National term bank (multilingual, polythematic):
Terminology Coordination Section (TCS) of
the National Language Service (NLS),
Department of Arts and Culture (DAC)
• Latin terminology: interactive multilingual
e-learning project (PanSALB, CLTAL,
Trydian Interactive)
• Mathematics on-line dictionary project:
South African Multilingual Mathematical
Lexicon (SA MML)
Lexicographical and Terminological
information available on HLT virtual network
• SA Government has approved the development
of a human language technology (HLT) virtual
network
• All lexicography and terminology endeavours to
be part of HLT virtual network
• For multilingual words and terms to be available
on HLT virtual network to end-users (subject
specialists, students, language practitioners,
general public) - dictionaries are needed!!!
4. New terminology training initiative from PanSALB:
• Members of TCs, NLBs: Guidelines to verify
and authenticate terms
• Skills development: Language practitioners:
terminologists, lexicographers (e.g. NLUs),
translators, interpreters, linguists, teachers,
journalists, language students, etc.
• Skills development: subject specialists
• Reskilling: unemployed language workers
PanSALB skills development terminology training programme
[Diagram: participants in the training programme and their roles]
• TCS, NLS, DAC: multilingual polythematic national term bank
• National Lexicography Units: monolingual general dictionaries
• National Language Bodies: verify and authenticate terms (need terminographic guidelines)
• Provincial Language Committees
• Subject specialists
• PanSALB: re-skilling of unemployed and other language workers
[Diagram labels: School for Languages; Lexicography; Terminology; NLUs; NLBs; PLCs; TCS, NLS; LUs; subject fields: Statistics, Zoology, Psychology]
5. Conclusion:
– Development of skills
– Enhancement of South African languages
– Development of languages into functional languages
– Dissemination of multilingual polythematic (speech
and text) information within the South African
community
– Better communication among all citizens in different
spheres of life
– Improvement of computer literacy
“Utilising technology for the development of
the South African languages and developing
these languages for use with Human
Language Technology applications such as
spellcheckers, translation memories and
speech-recognition systems will enhance the
status of the indigenous languages and will
result in increased job opportunities in the
language field.”
Dr Ben Ngubane (former Minister of Arts, Culture, Science
and Technology), 2003