MT@EC
European Commission machine translation supporting e-government
Spyridon Pilos
Head of language applications
Directorate-General for Translation
MT@Work
Brussels, 5.12.2014
European Commission machine translation and public administrations
• MT@EC: a service for the EU
• The context of the free trial
• Implementation
• What next?
EU official languages over time
EU translation services
DGT
Why does the Commission need MT?
• The Commission…
• DGT has 1700 translators
• Over 2 M pages translated in 2013
• But… just to make europa.eu fully multilingual, almost 6.8 M documents would have to be translated, the equivalent of 8 500 translators a year!
The result:
Thousands of non-translated documents (and this does not include user-generated content)
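There are also interactions with and between actors in the Member States
[Diagram: in Member State X and Member State Y, administrations interact with businesses (administration-to-business, A2B), citizens (administration-to-citizen, A2C) and other administrations (administration-to-administration, A2A); administrations also interact across Member States and with EU administrations. The interactions are grouped into a first type and a second type.]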
Vision
Wouldn’t it be great if I could start using a public service in any Member State from any place and obtain the information in my mother tongue?
• EIF = European Interoperability Framework
• ISA = programme on interoperability solutions for public administrations
EIF*: 12 underlying principles
Need for EC action
• Subsidiarity and Proportionality
User needs and expectations
• User Centricity; Inclusion and Accessibility; Security and Privacy; Multilingualism; Administrative Simplification; Transparency; Preservation of Information
Collaboration
• Openness; Reusability; Technological Neutrality and Adaptability; Effectiveness and Efficiency
* European Interoperability Framework
http://ec.europa.eu/isa/documents/isa_annex_ii_eif_en.pdf
The role of machine translation
MT is the only viable solution for:
• quick and cheap access to information in foreign languages;
• understanding information received in a foreign language that otherwise could not be used or would require substantial time and cost to translate;
• making multilingual use of websites possible;
• facilitating cross-lingual information search and analytics.
That is why machine translation (MT) is a critically important technology for a multilingual Europe.
MT@EC: a European Commission product
• Released: 26 June 2013 (version 1.0); version 2.0 released on 3 July 2014
• Languages: all 24 EU official languages; 552 language pairs (62 direct)
• Technology: statistical machine translation using the open-source software Moses, co-funded by the EU Framework Programmes for research and innovation
• Development by DGT between 2010 and 2013, co-funded by the ISA* programme (action 2.8)
* Interoperability solutions for public administrations
http://ec.europa.eu/isa/actions/02-interoperability-architecture/2-8action_en.htm
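The figure of 552 language pairs follows from counting every ordered pair of distinct source and target languages: 24 × 23 = 552. Only 62 of these have a direct engine; the slide does not say how the remaining 490 are served. A common approach, assumed here purely for illustration, is to pivot through a bridge language such as English. In this hypothetical sketch the direct set contains only the 46 pairs to and from English, not MT@EC's actual 62 direct pairs, which are not listed.

```python
from itertools import permutations

# The 24 EU official languages as ISO 639-1 codes.
EU_LANGUAGES = [
    "bg", "cs", "da", "de", "el", "en", "es", "et", "fi", "fr", "ga", "hr",
    "hu", "it", "lt", "lv", "mt", "nl", "pl", "pt", "ro", "sk", "sl", "sv",
]

# Every ordered (source, target) pair of distinct languages: 24 * 23 = 552.
ALL_PAIRS = list(permutations(EU_LANGUAGES, 2))

# ASSUMPTION for illustration only: direct engines exist just to and from
# English. This gives 46 direct pairs, not MT@EC's actual 62.
PIVOT = "en"
DIRECT_PAIRS = {(src, tgt) for src, tgt in ALL_PAIRS if PIVOT in (src, tgt)}


def route(source: str, target: str) -> list[tuple[str, str]]:
    """Return the engine hops for source -> target (hypothetical pivot scheme)."""
    if (source, target) in DIRECT_PAIRS:
        return [(source, target)]
    # No direct engine: translate into the pivot language, then out of it.
    return [(source, PIVOT), (PIVOT, target)]


print(len(ALL_PAIRS), len(DIRECT_PAIRS))  # 552 46
print(route("fi", "el"))                  # [('fi', 'en'), ('en', 'el')]
```

Pivoting keeps the number of engines to build and maintain small, at the cost of two translation steps, and therefore compounded errors, for indirect pairs.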
MT@EC description
• Delivery:
  - web user interface (human to machine)
  - web services (machine to machine)
• Special features:
  - User interface in 24 languages
  - Source document format and formatting maintained [not for PDF]
  - Specific output formats for translation: TMX and XLIFF
  - Translation can also be returned by email
  - Can translate multiple documents into multiple languages
  - Indication of quality for language pairs (using BLEU scores)
  - Feedback mechanism (using EU Survey)
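MT@EC indicates quality per language pair with BLEU scores. BLEU compares a machine translation with a human reference by counting matching n-grams (each clipped to its count in the reference), combining the precisions for n = 1 to 4 as a geometric mean, and applying a brevity penalty to outputs shorter than the reference. The following is a minimal, unsmoothed sentence-level sketch with a single reference and whitespace tokenisation; it is not MT@EC's scoring code, and published BLEU scores are normally computed over a whole test corpus with standard tools.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU against a single reference."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram by how often it occurs in the reference.
        matches = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
        total = sum(cand_counts.values())
        if matches == 0:
            return 0.0  # any zero precision makes unsmoothed BLEU zero
        log_precision_sum += math.log(matches / total)
    # Brevity penalty: penalise candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision_sum / max_n)


print(bleu("the cat sat on the mat", "the cat sat on the mat"))               # 1.0
print(round(bleu("the cat sat on the red mat", "the cat sat on the mat"), 3))  # 0.643
print(bleu("the cat", "the cat sat on the mat"))                               # 0.0
```

A short or loosely worded sentence can score 0.0 even when it is usable, which is why BLEU is better read as an indication for a language pair across many documents than as a verdict on a single text.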
MT@EC security
• Secure hosting in the EC data centre
• Access through ECAS (EC Authentication Service)
• Secure document transfers:
  - over sTESTA*, a very secure private network between public administrations in the EU, separate from the internet
  - over the internet (through a secure https connection)
* You can check whether your organisation has access to sTESTA on:
https://portal.testa.eu/jetspeed/portal/homepage/about.psml
MT@EC is already available for…
• … the staff of European institutions and bodies:
  - Commission
  - Parliament
  - Council
  - Court of Justice
  - Court of Auditors
  - Economic and Social Committee
  - Committee of the Regions
  - European Central Bank
  - European Investment Bank
  - etc.
• … online services funded or supported by the EU
• … real-life trial and pilot projects with public administrations in the EU Member States
• … collaboration projects with EMT* universities
* European Masters in Translation
Online services connected to MT@EC: in production
• IMI (Internal Market Information System)
  http://ec.europa.eu/internal_market/imi-net/index_en.html
• SOLVIT: an online problem-solving network concerning the misapplication of Internal Market law by public authorities
  http://ec.europa.eu/solvit/
• N-Lex: a common gateway to national law
  http://eur-lex.europa.eu/n-lex/
Online services connected to MT@EC: in test
• e-Justice: the future electronic one-stop shop in the area of justice
  http://e-justice.europa.eu/
• ODR: platform to facilitate the out-of-court resolution of consumer disputes (Alternative Dispute Resolution)
  http://ec.europa.eu/consumers/redress_cons/adr_en.htm
• CircaBC: Communication and Information Resource Centre for Administrations, Businesses and Citizens
  https://circabc.europa.eu/
• EU Survey: tool for creating multilingual online surveys
  http://ec.europa.eu/eusurvey/
Online services to be connected to MT@EC: in preparation
• TED (Tenders Electronic Daily): the online version of the 'Supplement to the Official Journal of the European Union', dedicated to European public procurement
  http://ted.europa.eu/
• Joinup: an open collaborative platform supporting interoperability in Europe
  https://joinup.ec.europa.eu/
Online services interested in using MT@EC: discussions initiated (indicative list)
• EURES: the European employment services network (European Job Mobility portal)
  https://ec.europa.eu/eures/
• EQF: the portal supporting the implementation of the European Qualifications Framework for lifelong learning
  http://ec.europa.eu/eqf/home_en.htm
• ESCO: the multilingual classification of European Skills, Competences, Qualifications and Occupations, which identifies and categorises skills and competences, qualifications and occupations in all 22 European languages and supports EURES and other similar portals
  https://ec.europa.eu/esco/
• EPALE: the European Portal for Adult Learning
  http://ec.europa.eu/epale
MT@EC for Public Administrations
Context: MT@EC "pilot operation" phase until Q4/2014 (ISA)
Objective: develop and test, in real-life conditions, methods and structures for the most efficient use of MT@EC by different beneficiaries (including PAs); normal operation of the service.
Conditions
• PAs participate on a voluntary basis.
• No cost for PAs other than the use of internal resources.
• No commitment by DGT regarding use of the service after the end of the pilot.
Output
• Service delivery models (including pricing)
• Operational support structure and methods
MT@EC for Public Administrations: free real-life trial
• Staff members can have direct access to the standard MT@EC service [upon request by the individual PA staff member].
• The organisation can participate in a customisation pilot project, where DGT can also build specific engines with the organisation's own data [an Administrative Agreement between the PA and DGT is needed, to be signed by the end of June 2015].
Customisation pilots for PAs
• Pilot A: connect a PA information system to the standard MT@EC service.
• Pilot B: DGT builds custom engines with PA data, available through MT@EC to all.
• Pilot C: DGT builds custom engines with PA data, available through MT@EC only to the PA.
• Pilot D: DGT builds custom engines with PA data for the PA to run on its own premises.
• Pilot E: DGT assists the PA to build its own custom engines to run on its own premises.
If you are interested, email [email protected]
Ongoing pilots
• Finland: Prime Minister's Office (central translation service)
• Germany: Bundessprachenamt (translation service of the Armed Forces)
• Greece: Hellenic Quality Assurance and Accreditation Agency for Higher Education (education administration)
Pilot types in these pilots: C, A and E.
Discussions were held with other PAs but usually did not lead to the signature of pilot agreements because:
• there was no need for custom engines
• there were not enough data, or the data could not be shared
• resources could not be made available for the work to be performed on the PA side.
Special types of "pilots"
• Networks (Association des Conseils d’État et Cours administratives suprêmes de l'UE, Réseau des Présidents des Cours suprêmes judiciaires de l'UE, Legivoc project)
• New languages (Norwegian)
Staff access to MT@EC
• Get an individual ECAS user name and password (self-registration) using your work email address [go to https://webgate.ec.europa.eu/cas/eim/external/register.cgi and follow the instructions].
• Send an email to [email protected] asking for the activation of access to the service.
• DGT will activate your access and inform you by email.
Users - total
Country           Registered   Using
Austria                    3       3
Belgium                    5       3
Bulgaria                   1       1
Croatia                    0       0
Cyprus*                   77      46
Czech Republic*           25      15
Denmark                    0       0
Estonia                    3       3
Finland                    2       2
France*                   21      15
Germany*                  30      28
Greece*                   37      23
Hungary                    1       0
Ireland                    0       0
Italy                      2       1
Latvia                     0       0
Lithuania                  1       1
Luxembourg                 3       2
Malta                      0       0
Netherlands                8       8
Poland                     0       0
Portugal*                  7       5
Romania                    9       7
Slovakia*                 86      39
Slovenia*                 13       7
Spain*                     9       7
Sweden*                    3       3
UK                         1       1
TOTAL                    347     220 (63.4%)
* Countries where national events were organised
Requests per user: only one: 32%; 2 to 9: 54%; 10 or more: 14%
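The totals row can be checked against the per-country figures. The sketch below re-enters the (registered, using) pairs from the users table and recomputes the totals and the share of registered users who actually used the service. The UK "using" value of 1 is the one that makes the column add up to the stated total of 220.

```python
# (registered, using) per Member State, from the "Users - total" table.
USERS = {
    "Austria": (3, 3), "Belgium": (5, 3), "Bulgaria": (1, 1), "Croatia": (0, 0),
    "Cyprus": (77, 46), "Czech Republic": (25, 15), "Denmark": (0, 0),
    "Estonia": (3, 3), "Finland": (2, 2), "France": (21, 15), "Germany": (30, 28),
    "Greece": (37, 23), "Hungary": (1, 0), "Ireland": (0, 0), "Italy": (2, 1),
    "Latvia": (0, 0), "Lithuania": (1, 1), "Luxembourg": (3, 2), "Malta": (0, 0),
    "Netherlands": (8, 8), "Poland": (0, 0), "Portugal": (7, 5), "Romania": (9, 7),
    "Slovakia": (86, 39), "Slovenia": (13, 7), "Spain": (9, 7), "Sweden": (3, 3),
    "UK": (1, 1),
}

registered = sum(reg for reg, _ in USERS.values())
using = sum(active for _, active in USERS.values())
share = 100 * using / registered  # percentage of registered users who are active

print(f"{registered} registered, {using} using ({share:.1f}%)")
# 347 registered, 220 using (63.4%)
```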
Top 40 users: requests by country and by domain
Country            Requests
Germany                 633
Slovakia                313
France                  156
Greece                  125
Cyprus                   75
Portugal                 22
Finland                  15
Spain                    14
Slovenia                 12
Bulgaria                 10
Czech Republic           10
Lithuania                10

Domain                  Requests
Economy and finance          674
Agriculture                  218
Foreign affairs               92
European affairs              61
Health                        61
Modernisation                 55
Education                     48
Local government              48
Transport                     37
Telecom                       20
Statistical authority         14
Employment                    12
Interior                      11
Justice                       11
Police                        11
Implementation
• Usually individuals ask for their own translations.
• In some cases a translation service centralises requests (for example through a functional mailbox).
• DGT imposed no guidelines on feedback or evaluation. Quality is "fit for purpose" (compliance with user requirements). A feedback function is available in MT@EC.
• Translation to and from non-EU languages is very important in several cases.
• Translators will not use MT unless it is integrated into their translation workflow so that they can post-edit easily.
• The original is sometimes handwritten or "confidential".
Feedback
• Differs depending on whether it comes from translators or other users
• Little understanding of statistical MT technology and its constraints
• Several problems were pointed out:
  - document formats and formatting
  - national names and acronyms
  - non-translation of "common" words
  - omission of words
  - consistency
  - syntax, grammar, etc.
Hint: do not draw general conclusions from a test on only one document. Usefulness depends greatly on factors such as the type of document, the quality of the original, the domain and the language pair.
Intermediate conclusions (1)
On the pilots
• In most cases the generic engines were sufficient.
• It is difficult to find data of sufficient quality and quantity for building engines, and ownership and confidentiality are an issue.
• Lack of clarity on the status of the service after the end of the pilot discouraged investment by PAs.
• Translation services asked for guidelines for evaluation and structured feedback.
• Information for technicians should be provided in their own language.
• More clarity is needed on the scope of "public administration".
Intermediate conclusions (2)
On the service
• PAs do not need too much security: https over the internet rather than sTESTA
• The interface should be multilingual.
• A tool for translators and other users: different attitudes.
• Use depends on "fitness for purpose", not on some general measure of language quality.
On communication
• Difficult to find the right network for promotion (ISA, EUPAN, COTSOES, DGT Field Offices in the Member States, etc. were used).
• Promotion at national events in the language of the country (even by videoconference) worked best.
MT@EC for EMT universities
• Free use for teaching or research.
• Mutually beneficial project-based cooperation.
• The teacher/researcher may ask for access to see what the service looks like and check whether it is relevant for his/her work.
• If interested, s/he sends a short project description (title, duration, objectives, approach, expected volume of requests) and a list of further persons who need access.
• At the end of the project, s/he informs DGT of the outcome of the project or study, as well as any other feedback considered useful to improve the service and its use.
Status: on 30.11.2014 we had 103 registered users, of whom 75 were students, from 21 universities in 12 countries (11 EU Member States and Switzerland); 9 of them have communicated a research/teaching plan.
What next?
From MT@EC... to the CEF (Connecting Europe Facility) automated translation platform
CEF.AT will:
• build on the existing MT@EC service
• put emphasis on secure, quality, customisable MT
MT@EC outline
[Diagram. Components: users and services; customised interfaces; DISPATCHER managing MT requests by language, subject…; ENGINES HUB (MT engines); DATA HUB (MT data: language resources specific to each MT engine; language resources built around Euramis); DATA MODELLING; USER FEEDBACK]
CEF.AT platform outline
[Diagram. The service: DSIs; DISPATCHER managing MT requests by language, domain…; MT engines. Engines factory: from data to engines. Language resources (collect and clear): multilingual corpora, monolingual corpora, NLP tools, other. Platform qualities: SECURE (and performing), QUALITY, CUSTOMISABLE]
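Both outlines place a DISPATCHER between users and engines, managing MT requests by language and by subject or domain. The slides do not describe its logic, so the following is a hypothetical sketch: the names `Request`, `Dispatcher`, `register` and `route`, the engine identifiers and the ownership rule are illustrative assumptions. It prefers a domain-specific engine for the language pair, falls back to the generic engine, and treats an engine registered with an owner as visible only to that organisation, loosely mirroring the customisation pilots where custom engines are available either to all or only to the PA.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    source: str        # source language code, e.g. "en"
    target: str        # target language code, e.g. "fi"
    domain: str        # subject domain, e.g. "agriculture"
    organisation: str  # requesting organisation (hypothetical identifier)


class Dispatcher:
    """Hypothetical dispatcher choosing an MT engine by language pair and domain."""

    def __init__(self) -> None:
        # Maps (source, target, domain or None) -> (engine_id, owner or None).
        self._engines: dict[tuple[str, str, Optional[str]], tuple[str, Optional[str]]] = {}

    def register(self, source: str, target: str, engine_id: str,
                 domain: Optional[str] = None, owner: Optional[str] = None) -> None:
        # domain=None registers the generic engine for the language pair;
        # owner=None makes the engine available to every user.
        self._engines[(source, target, domain)] = (engine_id, owner)

    def route(self, request: Request) -> str:
        # Try the domain-specific engine first, then the generic one.
        for domain in (request.domain, None):
            entry = self._engines.get((request.source, request.target, domain))
            if entry is None:
                continue
            engine_id, owner = entry
            # An owned (custom) engine is only visible to its organisation.
            if owner is None or owner == request.organisation:
                return engine_id
        raise LookupError(f"no engine for {request.source}->{request.target}")


dispatcher = Dispatcher()
dispatcher.register("en", "fi", "generic-en-fi")
dispatcher.register("en", "fi", "custom-en-fi-agri", domain="agriculture", owner="PA-X")
print(dispatcher.route(Request("en", "fi", "agriculture", "PA-X")))  # custom-en-fi-agri
print(dispatcher.route(Request("en", "fi", "agriculture", "PA-Y")))  # generic-en-fi
```

Keeping routing separate from the engines lets new custom engines be added without changing the interfaces users and services call.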
Real-life trial and customisation pilots for Public Administrations
- There is still time for your organisation to participate in a pilot (sign an agreement by the end of June 2015).
- Any staff member of a public administration can ask for access at any time.
- Access will be free of charge until further notice.
- Service delivery models (including pricing) will be developed only under the Connecting Europe Facility.
- Lessons learned from the pilots will be used for developing the operational support structure and methods for the CEF.
Useful links
• DGT MT page on europa.eu
http://ec.europa.eu/dgs/translation/translationresources/machine_translation/index_en.htm
• ISA page on action 2.8 Machine translation
http://ec.europa.eu/isa/actions/02-interoperability-architecture/2-8action_en.htm
Includes:
• The ISA Work programme 2010-2014 for MT@EC
• Presentations for public administrations
• and more…
• CEF work programme for 2014 (section 3.1.7 covers the CEF.AT platform)
https://ec.europa.eu/digital-agenda/sites/digital-agenda/files/WP2014%20%20official%20published.pdf
• Language technologies (CEF, H2020,…)
http://ec.europa.eu/digital-agenda/language-technologies
• Language technology resources (DGT-TM, EuroVoc,…)
http://ec.europa.eu/jrc/en/language-technologies
Questions?
[email protected]
[email protected]