Translation Tools in the Information Age

The Role of Translation Tools in the Information Age

Feng Zhiwei, Institute of Applied Linguistics, MOE, China

First Sino-German Symposium on Knowledge Handling, 2007, Beijing

Abstract

We are in the information age. The snowballing growth of available information has drastically changed the way translators work. This paper introduces the translation tools of the information age: machine translation systems, translation resources on the Internet and on CD-ROM, computer-assisted terminology management systems, parallel corpora, translation memories and localization tools, and computer-aided machine translation systems. If translators use these tools properly, they can improve both the efficiency and the quality of their translations.

Keywords:

 multilingualism, machine translation system, Internet, CD-ROM, translation resources, computer-assisted terminology management system, parallel corpora, translation memories, localization tools, computer-aided machine translation system.

English – lingua franca?

    80 percent of all business transactions in Denmark are carried out in English.

Many large corporations have adopted English as their official language.

85 percent of international organizations use English as their working language.

In Europe, 99 percent of all international organizations have English as one of their official languages.

English – lingua franca? - cont

     98 percent of all German physicists and 83 percent of all German chemists publish their findings in English.

90 percent of all scientific publications are written in English.

The majority of Nobel Prizes go to laureates who are citizens of countries where English is the official language.

English is the default language for international scientific conferences, no matter where they take place or what their specific topics are.

Linguistic uniformity or multilingualism?

There are 6000 different languages in the world, each with its own cultural background.

In Wales, in the United Kingdom, people speak Welsh.

In the USA, some people speak Spanish.

Multilingualism is widespread and necessary.

Multilingualism in European Union

In EU institutions, the original 15 member states have the privilege of using their state languages to conduct official business.

This multilingualism is made possible by the work of about 4000 in-house translators, interpreters and terminologists and many more freelancers.

With 11 official languages (for 15 states) and 110 possible language-pair combinations, 2 billion euros were spent on translation in 1997.

This does not include the more than 200,000 pages translated by the EC-SYSTRAN MT system each year.

Multilingualism in European Union - cont

 Each additional official language increases the demands by 250 to 300 linguists.

With the expansion of the EU by as many as 12 new members and the integration of 10 new languages, the number of combinations would grow much faster than the number of languages (quadratically, not linearly): 21 languages yield 21 × 20 = 420 combinations.
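The figures quoted on these slides (110 combinations for 11 languages, 420 for 21) are consistent with counting every ordered source-target pair. A minimal sketch of that arithmetic, assuming each translation direction counts separately; the function name is illustrative only:

```python
def language_pairs(n_languages: int) -> int:
    """Number of ordered (source, target) language pairs.

    Assumption: each direction counts separately, so n languages
    give n * (n - 1) combinations.
    """
    return n_languages * (n_languages - 1)


if __name__ == "__main__":
    for n in (11, 21):
        print(n, "languages ->", language_pairs(n), "combinations")
```

Each additional language adds 2n new combinations to an existing set of n languages, which is why the translation workload grows so quickly with enlargement.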

Localization in business

Clients will only buy in their own language.

Sellers need to speak the language of the customer and to adapt their conduct and products to the specific characteristics of the local market.

Localization applies not only to products, but also to the methods of design, production, marketing and distribution.

Internet and translation

The Internet is becoming a multilingual network.

From 2000 to 2005, Internet growth for English was only 126.9%, while it was 664.5% for Russian, 327.3% for Portuguese, 309.6% for Chinese and 235.9% for French.

The growth rate in the number of Internet users in non-English-speaking countries is much higher than in English-speaking countries.

The dominant position of English has been broken.

Translation is becoming more and more important.

Information explosion

In the information age, the snowballing growth of available information has resulted in an information explosion.

The amount of knowledge to be processed within the next decade is larger than the amount of knowledge accumulated during the past 2500 years.

165,000 scientific journals are currently being published.

20,000 new scientific papers are produced every day.

Information explosion - cont

The amount of data circulating on the Internet on any given day is larger than all the information available throughout the 19th century (Der Spiegel, 1996).

The combined vocabulary of technical and scientific disciplines amounted to 30 million words in 1991 (Siemens, 1991).

Translation market

According to a study by Allied Business Intelligence, the global translation market was 10.4 billion in 1999 and 17.2 billion in 2003.

In 1997, the EU-funded ASSIM study estimated the total turnover of the translation markets of the 18 member states of the EU and EEA (European Economic Area) at 3.75 billion euros, with software, audio-visual and multimedia translation constituting 20 percent of the total turnover.

Translation market - cont

According to the ASSIM study, the total number of in-house and external translators in the EU and EEA exceeds 100,000.

The total turnover of the translation market in mainland China exceeded 10 billion yuan.

Electronic translation tools will be helpful in translation.

Using electronic translation tools

More than 50 percent of the translators interviewed for the 1997 ASSIM report were using electronic dictionaries, and about one-third were using translation memory systems.

Many translators in mainland China and Hong Kong were also using electronic tools in their translation work.

Electronic Translation Tools

– Machine Translation System

– Translation Resources on Internet

– Translation Resources on CD-ROM

– Computer-Assisted Terminology Management

– Parallel Corpus as Translation Tools

– Translation Memory

– Localization Tools

– Computer-Assisted Translation System

Machine Translation System

  The first attempts to mechanize translation were made as early as the 1930s.

Weaver memorandum (1949):

– “I have a text in front of me which is written in Russian, but I am going to pretend that it is really written in English and that it has been coded in some strange symbols. All I need to do is strip off the code in order to retrieve the information contained in the text.”

– In this view, MT is a decoding system.

Warren Weaver (1947)

Ciphertext: ingcmpnqsnwf cv fpn owoktvcv hu ihgzsnwfv rqcffnw cw owgcnwf kowazoanv ...

The text is deciphered step by step: first the letter “e” is guessed, then “the”, then candidate words such as “of” and “is” are tried and revised until the whole text resolves.

ingcmpnqsnwf cv fpn owoktvcv = decipherment is the analysis

hu ihgzsnwfv rqcffnw cw owgcnwf = of documents written in ancient

kowazoanv ... = languages ...

Warren Weaver (1947)

When I look at an article in Russian, I say to myself: This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.
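The decipherment shown on the preceding slides treats the text as a simple letter-substitution cipher. As a minimal sketch of that idea (not of how any real MT system works), the key can be derived from the ciphertext and plaintext given on the slides and then used to decode. The function names are illustrative only.

```python
# Ciphertext and plaintext are taken from the slides; the key is derived from them.
CIPHERTEXT = ("ingcmpnqsnwf cv fpn owoktvcv hu ihgzsnwfv "
              "rqcffnw cw owgcnwf kowazoanv")
PLAINTEXT = ("decipherment is the analysis of documents "
             "written in ancient languages")


def derive_key(cipher: str, plain: str) -> dict[str, str]:
    """Build a cipher-letter -> plain-letter table from aligned texts."""
    key: dict[str, str] = {}
    for c, p in zip(cipher, plain):
        if c == " ":
            continue
        # A substitution cipher maps each cipher letter to exactly one letter.
        if key.setdefault(c, p) != p:
            raise ValueError(f"inconsistent mapping for {c!r}")
    return key


def decode(cipher: str, key: dict[str, str]) -> str:
    """Replace every known cipher letter; leave everything else untouched."""
    return "".join(key.get(ch, ch) for ch in cipher)


if __name__ == "__main__":
    key = derive_key(CIPHERTEXT, PLAINTEXT)
    print(decode(CIPHERTEXT, key))
```

Natural languages differ in word order, morphology and meaning, not only in symbols, which is why the pure decoding view runs into the limits discussed on the following slides.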

FAHQT cannot be achieved

But Fully Automatic High-Quality Translation (FAHQT) cannot be achieved using today’s technology.

Bar-Hillel’s example (1959): “John was looking for his toy box. Finally he found it. The box was in the pen. John was very happy.”

How can a system decide that “pen” here means a playpen and not a writing tool? This is very difficult with today’s technology.

MT at SAP

SAP uses the mainframe-based Metal MT system for translations from German to English.

SAP sees MT as enhancing productivity and as being of growing importance in the company’s translation methodology.

SAP has found that using MT, under the best circumstances, can be two to four times faster than traditional translation methods.

Multilingual MT system -- Systran

 The Systran translation from English to Chinese is readable.

Systran website: http://www.systransoft.com

Multilingual intelligent mobile phone

‘Beijing City Guide’ is a multilingual translation mobile phone (Beijing Information Development Company, 2006-08).

A foreign visitor types in “I want to Beijing Hotel”, and the phone translates it as “我想去北京饭店”.

A taxi driver types in “欢迎你来北京”, and the phone translates it as “You are welcome to Beijing”.

Translation Resources on Internet

The Internet is a language resource for translation.

Finding data on the Internet is no problem at all. But finding reliable information is a rather difficult task.

Finding the information you really need can be very time-consuming and often frustrating.

Three strategies for Internet search

Institutional search through a URL (Uniform Resource Locator).

Thematic search via subject trees.

Keyword search via a search engine. A search engine basically consists of two components:

– A large index of words contained in web documents.

– Retrieval software that lets you search for words in the index and then displays the matching documents on the screen.
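The two components of a search engine named above can be illustrated with a toy example: an index that maps each word to the documents containing it, and a retrieval function that looks query words up in that index. This is only a sketch under simple assumptions (lower-cased whole-word matching, all query words must occur); the document texts and function names are invented for illustration.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Component 1: an index from each word to the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Component 2: return the documents that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


if __name__ == "__main__":
    docs = {
        "doc1": "Translation memory systems recycle existing translations.",
        "doc2": "Machine translation systems decode the source text.",
        "doc3": "Terminology databases store specialized vocabulary.",
    }
    index = build_index(docs)
    print(search(index, "translation systems"))
    print(search(index, "terminology"))
```

Real search engines add ranking, stemming and phrase search on top of this basic lookup, which is one reason why reliable results still require careful query formulation.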

Libraries online and virtual bookstores

In order to understand the source text, it may be necessary to access libraries and browse virtual bookstores.

Via OPAC (http://catalog.loc.gov), you can search major libraries.

Via Amazon (http://www.amazon.com), you can browse virtual bookstores on the web.

General encyclopedias

Via Britannica Online, you can search the Encyclopaedia Britannica online.

http://www.britannica.com

Specialized encyclopedias

PC Webopedia is an example of a specialized online encyclopedia. It is an English reference work for information and communication technology (ICT). It contains a multitude of ICT terms, including elaborate and easy-to-understand definitions.

Via PC Webopedia (http://www.pcwebopedia.com), you can search an ICT encyclopedia online.

General monolingual dictionaries

The Merriam-Webster Collegiate Dictionary is available online.

Via Merriam-Webster (http://www.m-w.com), you can search either the dictionary or the thesaurus.

Simply enter your search terms and click the search button.

General multilingual dictionaries

OneLook Dictionary is a search platform that lets you search about 600 word lists, glossaries, dictionaries and databases simultaneously.

Via OneLook (http://www.onelook.com), you can search online multilingual dictionaries.

Via “金山词霸” (http://cb.kingsoft.com), you can search Chinese-English or English-Chinese dictionaries online.

Multilingual Terminology databases

Via Termite (http://www.itu.int), you can search the online multilingual terminology database of the International Telecommunication Union (ITU).

Via Eurodicautom (http://europa.eu.int/eurodicautom), you can search the online EU multilingual terminology database. In 1999, it contained 5.5 million entries and 180,000 abbreviations in the EU’s 11 official languages.

Newspaper and magazine archives

Via the Spanish newspaper ABC (http://www.abc.es), the German newspaper Die Welt (http://www.welt.de) and the American magazine Newsweek (http://www.newsweek.com), you can search for background information related to your translation.

Translation Resources on CD-ROM

Translation resources on CD-ROM are offline language resources.

CD-ROMs can store vast amounts of data (in general around 650 MB). They are highly suitable for storing multimedia information or huge amounts of textual data. The contents of the 32-volume edition of the Encyclopaedia Britannica can be stored on a single CD-ROM.

CD-ROMs are fairly cheap to produce.

Multimedia ability: graphics, audio and video sequences are easily integrated.

The use of hyperlinks allows for effective networking of entries (for cross-references, synonyms, etc.).

Encyclopedia on CD-ROM

Encyclopedia of China on CD-ROM.

Britannica Concise Encyclopedia on CD-ROM.

Specialized encyclopedia on CD-ROM

Construction Installation Encyclopedia.

General dictionaries on CD-ROM

Oxford English Dictionary (OED) on CD-ROM.

Bibliorom Larousse on CD-ROM.

Handheld electronic dictionaries

– Children’s talking dictionary and spelling corrector

– Various English dictionaries

– Bilingual dictionaries

Computer-assisted terminology management

Professional translation is mostly technical translation.

A technical translator is forced to keep up with the many rapid changes taking place in the fields of information technology, manufacturing, business, medicine, biotechnology, etc.

It would be unrealistic to expect a translator to be a natural expert in all these fields. But the translator must be an expert in quickly finding the information that he or she lacks.

Searching for terminology can take up to 75% of a translator’s time.

Terminology management is a general term for the documentation, storage, manipulation and presentation of specialized vocabulary. It can help the translator to resolve terminology problems.
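The four tasks named above (documentation, storage, manipulation and presentation) can be illustrated with a very small term base. This is only a sketch under stated assumptions: the class names, field names and sample entries are invented for illustration and do not reflect the data model of any commercial terminology tool.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class TermEntry:
    """Documentation: one concept with its definition and terms per language."""
    concept: str
    definition: str
    terms: dict[str, str] = field(default_factory=dict)  # language code -> term


class TermBase:
    def __init__(self) -> None:
        self._entries: list[TermEntry] = []  # storage

    def add(self, entry: TermEntry) -> None:
        self._entries.append(entry)

    def lookup(self, term: str, lang: str) -> list[TermEntry]:
        """Manipulation: retrieve entries by term, ignoring case."""
        wanted = term.lower()
        return [e for e in self._entries
                if e.terms.get(lang, "").lower() == wanted]

    def export_glossary(self, source: str, target: str) -> str:
        """Presentation: a two-column CSV glossary sorted by source term."""
        rows = sorted((e.terms[source], e.terms[target])
                      for e in self._entries
                      if source in e.terms and target in e.terms)
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerows(rows)
        return buffer.getvalue()


if __name__ == "__main__":
    tb = TermBase()
    tb.add(TermEntry("TM", "Database of stored translation units",
                     {"en": "translation memory", "de": "Übersetzungsspeicher",
                      "zh": "翻译记忆"}))
    tb.add(TermEntry("TDB", "Database of specialized vocabulary",
                     {"en": "terminology database", "de": "Terminologiedatenbank"}))
    print(tb.lookup("Translation Memory", "en")[0].terms["zh"])
    print(tb.export_glossary("en", "de"))
```

Organizing entries by concept rather than by word lets one entry carry equivalents in any number of languages, which matters in a multilingual setting.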

Trados’ MultiTerm

Main functions of Trados’ MultiTerm (http://trados.com):

– Creating a new terminology database entry.

– Importing terminology data.

– Importing data from a word processor.

– Retrieving terminology data.

– Exporting terminology data.

– Creating word lists, glossaries or dictionaries.

– Distributing terminology data.

– Exporting data via WinWord.

– Exchanging data between a word processor and MultiTerm.

Corpora as translation tools

A corpus constitutes the raw textual material for various forms of linguistic analysis.

A parallel corpus can help the translator compare the source language and the target language.

Using corpora to check the acceptability of a translated text.

Using Internet documents to create a corpus.

Retrieving data from your corpus with WordSmith (http://www1.oup.com/elt/catalogue/Multimedia/Wordsmith):

– Creating the wordlist.

– Concordance: shows the occurrences of a given search term in its textual context.

– Finding the keywords of a short article.

Using Alta Vista Personal (http://altavista.com) to index and search local documents.
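The wordlist and concordance functions mentioned above can be sketched in a few lines. This is not how WordSmith is implemented; it is a minimal illustration that assumes simple lower-cased tokenization and a fixed context window measured in words. The sample text is taken from the King James verses quoted on the Bible corpus slide.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def wordlist(text: str) -> list[tuple[str, int]]:
    """Word frequency list, most frequent words first."""
    return Counter(tokenize(text)).most_common()


def concordance(text: str, term: str, width: int = 2) -> list[str]:
    """Keyword-in-context lines: `width` words on each side of the term."""
    tokens = tokenize(text)
    lines = []
    for i, token in enumerate(tokens):
        if token == term.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


if __name__ == "__main__":
    sample = ("And God said, Let there be light: and there was light. "
              "And God saw the light, that it was good: "
              "and God divided the light from the darkness.")
    for word, count in wordlist(sample)[:5]:
        print(word, count)
    for line in concordance(sample, "light"):
        print(line)
```

Reading the concordance lines side by side shows the typical neighbours of a word, which helps a translator judge whether a candidate phrasing is acceptable in the target language.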

Bible – bilingual parallel corpus

The following are segments of the Bible corpus (http://www.o-bible.com/b5/int.html):

1:1
[hb5] 起 初 神 創 造 天 地 。
[kjv] In the beginning God created the heaven and the earth.
[bbe] At the first God made the heaven and the earth.

1:2
[hb5] 地 是 空 虛 混 沌 . 淵 面 黑 暗 . 神 的 靈 運 行 在 水 面 上 。
[kjv] And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.
[bbe] And the earth was waste and without form; and it was dark on the face of the deep: and the Spirit of God was moving on the face of the waters.

1:3
[hb5] 神 說 、 要 有 光 、 就 有 了 光 。
[kjv] And God said, Let there be light: and there was light.
[bbe] And God said, Let there be light: and there was light.

1:4
[hb5] 神 看 光 是 好 的 、 就 把 光 暗 分 開 了 。
[kjv] And God saw the light, that it was good: and God divided the light from the darkness.
[bbe] And God, looking on the light, saw that it was good: and God made a division between the light and the dark,

1:5
[hb5] 神 稱 光 為 晝 、 稱 暗 為 夜 . 有 晚 上 、 有 早 晨 、 這 是 頭 一 日 。
[kjv] And God called the light Day, and the darkness he called Night. And the evening and the morning were the first day.
[bbe] Naming the light, Day, and the dark, Night. And there was evening and there was morning, the first day.
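Segments in this format can be turned into an aligned data structure with a short script. The sketch below assumes that every verse starts with a "chapter:verse" reference immediately followed by a version tag in square brackets, as in the segments above; the function name and the shortened sample are illustrative only.

```python
import re

# Two verses in the format shown above (versions: hb5, kjv, bbe).
SAMPLE = (
    "1:1[hb5] 起 初 神 創 造 天 地 。 "
    "[kjv] In the beginning God created the heaven and the earth. "
    "[bbe] At the first God made the heaven and the earth. "
    "1:3[hb5] 神 說 、 要 有 光 、 就 有 了 光 。 "
    "[kjv] And God said, Let there be light: and there was light. "
    "[bbe] And God said, Let there be light: and there was light."
)

VERSE_RE = re.compile(r"(\d+:\d+)(?=\[)")  # reference directly before a tag
TAG_RE = re.compile(r"\[(\w+)\]\s*")       # version tag such as [kjv]


def parse_parallel(text: str) -> dict[str, dict[str, str]]:
    """Return {verse reference: {version tag: segment text}}."""
    parts = VERSE_RE.split(text)
    corpus: dict[str, dict[str, str]] = {}
    # After splitting: parts[1::2] are references, parts[2::2] their bodies.
    for ref, body in zip(parts[1::2], parts[2::2]):
        pieces = TAG_RE.split(body)
        # pieces[1::2] are tags, pieces[2::2] the corresponding segments.
        corpus[ref] = {tag: segment.strip()
                       for tag, segment in zip(pieces[1::2], pieces[2::2])}
    return corpus


if __name__ == "__main__":
    corpus = parse_parallel(SAMPLE)
    for ref, versions in corpus.items():
        print(ref, "|", versions["hb5"], "|", versions["kjv"])
```

Because the verse reference aligns the versions, each dictionary entry gives the translator a ready-made translation unit for comparison across languages.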

Translation memories & localization tools

Since many products are based on previously existing products, the corresponding documentation is also based on prior documentation.

Research has shown that 50% or more of the elements in a text can be repeated in the same text. If those elements have been translated previously, it is useful for translators to be able to recycle that prior work.

Translation Memories (TMs) recycle existing translations so as to reduce time and costs as well as improve quality and consistency.
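How much of a text is repeated can be estimated before translation starts. The following is a rough sketch that assumes a segment is a sentence ending in ".", "!" or "?" and that only verbatim repetitions count; real TM tools use more refined segmentation rules, and the function names here are illustrative.

```python
import re
from collections import Counter


def split_segments(text: str) -> list[str]:
    """Split the text into sentence-like segments."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def repetition_rate(text: str) -> float:
    """Share of segments that repeat an earlier segment verbatim."""
    segments = split_segments(text)
    if not segments:
        return 0.0
    counts = Counter(segments)
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(segments)


if __name__ == "__main__":
    manual = ("Switch off the device. Remove the cover. "
              "Switch off the device. Replace the filter.")
    print(f"{repetition_rate(manual):.0%} of the segments are repetitions")
```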

Three categories of search results in TMs

Perfect or exact match: The translation unit found in the database corresponds exactly to the new source text element (100% match).

Full match: The new source text element is identical to a stored translation unit with the exception of variable elements such as dates, numbers, times, measurements, etc.

Fuzzy match: All other matches that do not match an existing segment exactly but lie above a user-defined minimum match value (e.g. 75%) are fuzzy matches. The sentence match with the highest degree of similarity is displayed first. All other matches with a lower degree of similarity are added to a match list which can be accessed by the user.

If no match is found, the sentence has to be translated manually. The new translation is then stored in the database.
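The match categories above can be sketched as a small lookup routine. This is an illustration under stated assumptions, not the algorithm of any commercial TM system: similarity is measured with Python's `difflib.SequenceMatcher`, variable elements are limited to numbers and dates, and the class name and sample sentences are invented.

```python
import re
from difflib import SequenceMatcher

NUMBER_RE = re.compile(r"\d+(?:[.,:/]\d+)*")  # numbers, dates, times


def normalize(segment: str) -> str:
    """Replace variable elements (here: numbers only) with a placeholder."""
    return NUMBER_RE.sub("<NUM>", segment)


class TranslationMemory:
    def __init__(self, min_match: float = 0.75) -> None:
        self.min_match = min_match        # user-defined minimum match value
        self.units: dict[str, str] = {}   # source segment -> target segment

    def store(self, source: str, target: str) -> None:
        self.units[source] = target

    def lookup(self, source: str):
        """Return (category, matches); matches are (score, source, target)."""
        if source in self.units:
            return "exact", [(1.0, source, self.units[source])]
        for stored, target in self.units.items():
            if normalize(stored) == normalize(source):
                return "full", [(1.0, stored, target)]
        scored = [(SequenceMatcher(None, source, stored).ratio(), stored, target)
                  for stored, target in self.units.items()]
        fuzzy = sorted((m for m in scored if m[0] >= self.min_match),
                       reverse=True)      # best match first
        if fuzzy:
            return "fuzzy", fuzzy
        return "none", []                 # translate manually, then store()


if __name__ == "__main__":
    tm = TranslationMemory()
    tm.store("Press the red button to start the machine.",
             "Drücken Sie die rote Taste, um die Maschine zu starten.")
    tm.store("The warranty expires on 12/31/2007.",
             "Die Garantie läuft am 12/31/2007 ab.")
    for query in ("Press the red button to start the machine.",
                  "The warranty expires on 01/15/2008.",
                  "Press the green button to start the machine.",
                  "Close the valve."):
        print(query, "->", tm.lookup(query)[0])
```

In the full-match case the stored target still contains the old variable element, so a real system (or the translator) has to substitute the new date or number before the translation is reused.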

Benefits of using TMs

With a translation memory system, the level of benefit is proportional to the degree of repetition in the document.

The use of TM can result in enormous savings, both for the client and the translator or translation agency.

Increase in income.

Elimination of repetitive translation tasks.

Consistency.

Translation memory of TRADOS

Translator’s Workbench (http://www.trados.com)

SDL-Trados 2006 Freelance

SDL-Trados 2006 Professional

Software localization tools

   Localization is the process of adapting a product to the specific situation of its target market. This includes not only translating the texts accompanying the product but also adapting to the cultural norms of the local market.

Corel Catalyst (http://alchemysoftware.ie).

Passolo (pass software localizer) (http://www.passolo.com).

Computer-Aided Translation (Computer-Assisted Translation)

Machine translation (MT): computerized systems responsible for the production of translations from one natural language into another, with or without human assistance.

The central core of MT itself is the automation of the full translation process (Hutchins & Somers, 1992).

According to the degree of automation, or the degree of human involvement in the translation process, we use the following terms:

– FAHQT: Fully Automatic High-Quality Translation

– FAMT: Fully Automatic Machine Translation

– HAMT: Human-Aided Machine Translation

– MAHT: Machine-Aided Human Translation

FAHQT & FAMT

FAHQT, based on the idea that MT systems were capable of producing translations of a quality comparable to that of human translators, was abandoned.

FAMT is possible, but its output is only a raw translation whose quality is not good.

HAMT

In HAMT, the source text is decoded and analyzed by the system, not by the human operator, whose task consists of assisting in the translation process.

Human involvement:

– Pre-editing

– Interaction

– Post-editing

Pre-editing

Preparing the source text in order to avoid problems from the outset:

Avoid idiomatic expressions.

Avoid omitting the pronoun before a verb.

Avoid omitting relative pronouns.

Break up long sentences into shorter ones.

Keep to standard, formal English in which grammatical connections are clearly expressed.
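Some pre-editing rules can be checked mechanically before a text is sent to an MT system. The sketch below covers only the rule about long sentences; the threshold of 25 words is a hypothetical value chosen for illustration, and the sentence splitting is deliberately naive.

```python
import re

MAX_WORDS = 25  # hypothetical threshold; tune it for the MT system in use


def long_sentences(text: str, max_words: int = MAX_WORDS) -> list[str]:
    """Return the sentences that exceed max_words and should be split."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(s.split()) > max_words]


if __name__ == "__main__":
    source = ("Open the cover. Before you switch on the device for the first "
              "time, make sure that the cable is connected and the cover "
              "is closed.")
    for sentence in long_sentences(source, max_words=20):
        print("Consider splitting:", sentence)
```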

Human-machine interaction

 The system pauses during the translation process.

For example:

– when the MT system cannot resolve syntactic or semantic ambiguities in source text analysis;

– when it cannot decide between one target-language equivalent and another.

 Errors can be avoided in the analysis stage.

Post-editing

Correction of the target text generated by the MT system.

 Whether post-editing is conducted, and to what extent, largely depends on the quality required by the user.

MAHT

MAHT includes the use of aids such as electronic dictionaries, terminology databases, translation memory systems and other electronic tools.

In contrast to FAMT and HAMT, in MAHT the decoding and analysis of the source text lies in the hands of the translator.

The term CAT is sometimes used to cover both HAMT and MAHT.

Thank you!
