POST-EDITING –
PROFESSIONAL TRANSLATION
SERVICE REDEFINED
Darja Fišer
University of Ljubljana
MT@Work
5 December 2014
Brussels, Belgium
Presentation Outline
1. Basic concepts about post-editing
2. Quality in translation projects
3. Types of post-editing
4. Post-editing guidelines
MOTIVATION
Why should I care about post-editing?
Why PEMT?
• Increasing demand for PEMT in the market:
• increasing volume of short-lived documents
• different levels of text quality acceptable
• The industry perspective on MT:
• to lower prices and raise productivity
• to publish more content
• to publish in more languages
• to publish in less time
• The TAUS (2010) survey:
• 52% of companies in the US, EU & Asia provide PE services regularly
• 74% of the resources they used are freelance translators
THE BIG PICTURE
How does MT affect the translation process and the
translator?
Integrating MT in the translation process
Phase 1: Translation Memories
Source text (0% translated) → Translation Memory (TM) → Hybrid text (x% translated)
Phase 2: Machine Translation
Untranslated segments → Machine Translation (MT) → Hybrid text (100% translated but with MT errors)
Phase 3: Post-editing
Post-editor → Target text (100% translated)
The role of translators in PEMT
• The role of PEMT experts:
• edit the output
• select the adequate corpus
• clean up the data so the output is more suitable for the customer
• provide constant feedback to improve the system’s performance
• the role changes as MT improves
• The nature of PEMT projects:
• large volumes of highly repetitive, short-lived content for internal use
• pre-editing (at the SL level before MT to avoid ambiguous input to
the MT system)
• post-editing (at the TL level after MT to correct errors in the MT
output)
BASIC CONCEPTS IN PEMT
Is post-editing a translation task or a revision task?
Post-editing vs. Translation
• Post-editing:
• reviewing an MT text against an original text and correcting any errors in
order to comply with a set of quality criteria in as few edits as
possible
• the set of quality criteria ≠ a personal idea of translation quality
• as few edits as possible > to increase the productivity
• PE vs. Translation:
• Translation: 1 source
• PE: 2 sources (the original & the raw MT output):
1. reject the MT output & translate from scratch (PE closer to Trans than Rev)
2. correct many or a few errors (PE closer to Rev than Trans)
3. accept the proposed translation as is (PE closer to Rev than Trans)
• PE should be done by a translator, not a monolingual reviewer!
Post-editing vs. Revision
• PE:
• deals with recurring, predictable errors
• MT texts put a strain on the post-editing expert, so PE is more
cognitively demanding
• Revision:
• checks for random mistranslations or omissions
• human errors more difficult to spot but the texts are easier to read
• PE & Revision both require specific skills and should
be tackled by translators trained & experienced in the
task!
• > 100,000 words / 1 month of full-time post-editing
THE JOB PROFILE
What skills and qualities do I need to be a good post-editing expert?
Skills for post-editing (O’Brien 2002)
• Degree in translation or related subjects
• Expert in the subject area and target language
• Proficient in the source language and contrastive issues
• Experience in technical translation/localization
• Advanced word processing skills, full key proficiency (search&replace)
• Positive, tolerant and open-minded towards MT
• Confidence in abilities and technical expertise
• Recognition of typical and repetitive MT errors
• Ability to use macros and coded dictionaries
• Advanced terminology management skills
• Background MT knowledge, types of PE and levels of expected quality
• Pre-editing skills (controlled language & controlled authoring tools)
• Programming skills for automatically correcting errors
MT QUALITY
What can I expect from MT and what can clients expect
from PEMT?
Common MT errors
• What MT errors to expect:
• Depend on the MT system, the content and the language pair used!
• error analysis time-consuming but:
• crucial to improve the MT system
• crucial to raise awareness about the post-editing task
• Several error classifications exist (Schäffer, 2003):
1. Lexical errors (general vocabulary, terminology, polysemy, idioms)
2. Syntactic errors (sentence analysis, word order)
3. Grammatical mistakes (tense, number, gender, case, punctuation)
4. Errors due to defective input (mistakes in the source language)
Quality in technical translation and localization
• Functionalist approach to quality:
• the focus is on the customer’s needs and what they pay for
• quality is variable and is defined by clients, not the society in general
• Fit-for-purpose! (not what trained translators would consider the best)
• Quality of MT
• standard MT evaluation measures (BLEU, Meteor, NIST, TER):
• how close the MT output is to human quality, expressed as a single number
• not very reliable in most translation projects
• manual quality assessment needed!
• crucial for productivity savings & pricing
• random strings checked for grammar, terminology and format (grades 1-5)
• very specific client’s quality expectations needed! (rapid/full PE)
• Quality of PE
• MT is used to save costs, so revision of PE texts is usually not done
• Crucial to strike a balance between speed and the quality of PE
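As an illustration of how a single-number measure comes about, here is a minimal sketch of a simplified, TER-like score: the word-level edit distance between the MT output and a human reference, divided by the reference length. It assumes whitespace tokenization and leaves out the block shifts that the real TER also counts; the function names are hypothetical and not taken from any evaluation toolkit.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over word lists (insert, delete, substitute)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def simplified_ter(mt_output, reference):
    """Edits per reference word; 0.0 means the MT output matches the reference."""
    hyp = mt_output.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hyp, ref) / len(ref)


# One missing word against a six-word reference gives 1/6.
score = simplified_ter("the cat sat on mat", "the cat sat on the mat")
```

A low score only says the output is close to one particular reference, which is one reason why manual assessment is still needed.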
TYPES OF POST-EDITING
How much post-editing should I do?
Different levels of post-editing
1. No post-editing
• directly published on the internet, with disclaimer
2. Rapid post-editing
• suitable for short-lived documents needed for gisting & internal use
• min editing, shortest time possible, min no. of changes, to remove
blatant & significant errors, no stylistic changes
3. Full post-editing
• leading to human quality, required for texts for publication
• max editing, all errors and stylistic changes taken into account (but still
in less time than translating from scratch)
• Criteria:
• the MT system and language pair used
• the domain and structure of the text
• the use of the final text, the desired quality and the type of readers
• the volume of translation and the time available
POST-EDITING GUIDELINES
What exactly should I correct and how much?
General guidelines for PE
• Language- and project-specific guidelines needed for
each project!
• as short and precise as possible:
• a description of the MT system and the source text used
• a description of the quality of MT output and the expected quality of the
finished translation
• scenarios when to discard a useless segment
• typical types of errors that need to be corrected
• changes to be avoided
• terminology issues
Guidelines for rapid PE
1. Read the source segment first
2. Read the MT suggestion
3. Make the necessary changes:
• Make sure the content of the sentence is accurate
• If the terminology is incorrect, don’t spend too much time researching
• Don’t post-edit word-order if the sentence can be understood as is
• Don’t change style
• Don’t replace words with a synonym
• Don’t correct grammar mistakes unless the target sentence doesn’t
reflect the meaning of the source sentence
Guidelines for full PE
• Always very project-specific
• Use the MT suggestion if:
• a large piece of the sentence is correct
• the raw MT quality is very good with only minor corrections needed
• the raw MT quality is not so good but would still be faster to correct
it than to translate from scratch
• the MT has the correct meaning and is completely understandable
• Don’t use the MT suggestion if:
• the raw MT doesn’t make any sense and it would take longer to
correct it than to translate from scratch
• you need more than a few seconds to understand it
• there are errors that would require rearranging most of the text
Examples from the guidelines at Microsoft
• The 5-10 second evaluation:
• the maximum time you should spend evaluating the validity of the
MT suggestion
• if it is hard to understand already at the beginning, don’t even read
the whole sentence, just proceed to translate from scratch instead.
• The High-5 & Low-5 rule:
• When you detect a long sentence, do the following:
• Read the first 5 words. If it’s good, read on until it’s bad, then stop and
copy the correct part and continue to translate and forget about reading
on.
• If the first 5 or 6 words aren’t good, skip to read the last 5 or 6 words. If
the last part of the sentence is correct, use it; otherwise start the whole
thing from scratch.
• If both first 5 and last 5 words are incorrect, do not carry on reading
through the middle to try to identify correct MT segments. Just discard
the MT suggestion and proceed to translate from scratch.
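The High-5 & Low-5 rule reads like a small decision procedure, so it can be sketched in code. The sketch below is only an illustration: it assumes the post-editor's judgement is available as one True/False flag per MT word (in practice this judgement happens in the reader's head), and the function name and return labels are hypothetical.

```python
def high5_low5(word_ok, window=5):
    """Decide what to keep from a long MT suggestion.

    word_ok: list of booleans, one per MT word (True = the word is fine).
    Returns (action, n_words) where action is one of
    "keep_prefix", "keep_suffix" or "discard".
    """
    head = word_ok[:window]
    if head and all(head):
        # First words are good: read on until the first bad word, then stop.
        good = len(head)
        while good < len(word_ok) and word_ok[good]:
            good += 1
        return ("keep_prefix", good)

    tail = word_ok[-window:]
    if tail and all(tail):
        # Start is bad but the end is usable: keep only the last words.
        return ("keep_suffix", len(tail))

    # Both ends are bad: do not search the middle, translate from scratch.
    return ("discard", 0)


# Seven good words followed by three bad ones: keep the first seven.
decision = high5_low5([True] * 7 + [False] * 3)
```

Note that the middle of the sentence is never inspected once both ends fail, which is the point of the rule: the time spent hunting for reusable fragments would cost more than retranslating.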
POST-EDITING EFFORT
AND PRODUCTIVITY
How hard will post-editing be and how much will I gain?
Post-editing effort
• Key element to decide if the use of MT is worthwhile or not
(Krings, 2001):
• Temporal PE effort
• Does PEMT save time vs. human translation?
• Does PEMT save time vs. TM fuzzy matches?
• Depends on the quality of the raw MT output and type of errors!
• Cognitive PE effort
• How complex and cognitively demanding are the corrections?
• Obvious mistakes (gender) vs. ambiguous complex syntactic structures
• Technical PE effort
• Does PE require deleting, inserting, reordering or all 3?
• Measuring PE effort:
• temporal: the easiest to measure
• cognitive & technical PE: eye-trackers, Translog, Think Aloud Protocols
(useful in research, less so in the commercial world)
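Outside the lab, a rough proxy for technical effort can be computed after the fact by comparing the raw MT segment with its post-edited version. The sketch below uses Python's standard difflib for this; it assumes whitespace tokenization, the function name is hypothetical, and reordering is not detected as such (a moved word shows up as one deletion plus one insertion).

```python
from difflib import SequenceMatcher


def technical_effort(raw_mt, post_edited):
    """Count inserted, deleted and replaced words between two segments."""
    before = raw_mt.split()
    after = post_edited.split()
    counts = {"insert": 0, "delete": 0, "replace": 0}
    matcher = SequenceMatcher(None, before, after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            counts["insert"] += j2 - j1
        elif tag == "delete":
            counts["delete"] += i2 - i1
        elif tag == "replace":
            # Count the longer side so uneven replacements are not undercounted.
            counts["replace"] += max(i2 - i1, j2 - j1)
    return counts


effort = technical_effort("the cat sat on mat", "the cat sat on the mat")
```

Such counts say nothing about how hard each correction was to think of, so they cover the technical dimension only, not the cognitive one.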
Post-editing productivity
• One of the big unknown factors in PEMT projects
• new field, so no standard metrics exist
• productivity in PE estimated at 4,000-10,000 words/day
• many variables to consider:
• the quality of raw MT output?
• the productivity of translators in general?
• the experience of post-editors?
• the amount of effort to post-edit fuzzy matches?
• inconclusive results:
• early studies: show productivity gains up to 3 times compared to HT
(Vasconcellos and Leon 1985)
• recent studies: productivity gain not always achieved
(O’Brien 2006, Guerberof 2008)
• commercial users: many claim high productivity gains but don’t make their
methodology available
• Test before you commit!
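A pilot test can be as simple as timing the same translator on comparable samples with and without MT and comparing throughput. The arithmetic is sketched below; the function names, the sample figures and the 8-hour working day are assumptions for illustration, not measurements.

```python
def words_per_hour(words, hours):
    """Throughput for one timed sample."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return words / hours


def productivity_gain(words, pe_hours, ht_hours):
    """Ratio of post-editing throughput to human-translation throughput.

    A value above 1.0 means post-editing was faster on this sample.
    """
    return words_per_hour(words, pe_hours) / words_per_hour(words, ht_hours)


# Hypothetical pilot: 1,000 words post-edited in 2 h vs. translated in 4 h.
gain = productivity_gain(1000, pe_hours=2, ht_hours=4)
daily_pe_words = words_per_hour(1000, 2) * 8  # assuming an 8-hour day
```

With these made-up figures the gain is 2.0 and the daily output is 4,000 words, the lower end of the range quoted above; real samples should be large enough to smooth out differences in text difficulty.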
LET’S PEMT!
PEMT is here to stay
Learn & Teach PEMT
Ride the wave!