Hidden Markov Model and its application in Pos Tagging


Machine Translation

Dai Xinyu 2006-10-27

Outline

• Introduction
• Architecture of MT
• Rule-Based MT vs. Data-Driven MT
• Evaluation of MT
• Development of MT
• MT problems in general
• Some Thinking about MT from recognition

Introduction

"I have a text in front of me which is written in Russian but I am going to pretend that it is really written in English and that it has been coded in some strange symbols. All I need do is strip off the code in order to retrieve the information contained in the text."

Machine translation - the use of computers to translate from one language to another

• The classic acid test for natural language processing.
• Requires capabilities in both interpretation and generation.
• About $10 billion spent annually on human translation.

http://www.google.com/language_tools?hl=en

Introduction - MT past and present

mid-1950's - 1965:
• Great expectations

The dark ages for MT:
• Academic research projects

1980's - 1990's:
• Successful specialized applications

1990's:
• Human-machine cooperative translation

1990's - now:
• Statistics-based MT
• Hybrid-strategy MT

Future prospects:
• ???

Interest in MT

• Commercial interest:
  - U.S. has invested in MT for intelligence purposes
  - MT is popular on the web; it is the most used of Google's special features
  - EU spends more than $1 billion on translation costs each year
  - (Semi-)automated translation could lead to huge savings

Interest in MT

• Academic interest:
  - One of the most challenging problems in NLP research
  - Requires knowledge from many NLP sub-areas, e.g., lexical semantics, parsing, morphological analysis, statistical modeling, …
  - Being able to establish links between two languages allows for transferring resources from one language to another

Areas Related to MT

• Linguistics
• Computer Science
  - AI
  - Compilers
  - Formal Semantics
  - …
• Mathematics
  - Probability
  - Statistics
  - …
• Informatics
• Recognition

Architecture of MT -- (Levels of Transfer)

Rule-Based MT vs. Data-Driven MT

• Rule-Based MT
• Data-Driven MT
  - Example-Based MT
  - Statistics-Based MT

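Rule-Based MT

[Figure: linguistics, semantics, cognitive science, and artificial intelligence feed into the writing of rules; the rules drive a translation system that turns natural-language input x into a translation result.]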


Man, this is so boring.

Hmm, every time he sees “banco”, he either types “bank” or “bench” … but if he sees “banco de…”, he always types “bank”, never “bench”…

Translated documents
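The observation in the cartoon is the core idea of data-driven MT: the right translation of a word can often be read off from its context in previously translated documents. A minimal sketch in Python, assuming a tiny hand-made list of observations; the data and the helper names `translation_counts` and `best_translation` are hypothetical illustrations, not taken from any real corpus or system:

```python
from collections import Counter, defaultdict

# Hypothetical observations: (word following "banco", how "banco" was translated).
observations = [
    ("de", "bank"), ("de", "bank"), ("de", "bank"),
    ("nuevo", "bench"), ("grande", "bank"), ("grande", "bench"),
]

def translation_counts(pairs):
    """Count the translations of "banco" separately for each following word."""
    counts = defaultdict(Counter)
    for next_word, translation in pairs:
        counts[next_word][translation] += 1
    return counts

counts = translation_counts(observations)

def best_translation(next_word):
    """Return the most frequent translation seen in this context, or None."""
    seen = counts.get(next_word)
    return seen.most_common(1)[0][0] if seen else None

print(best_translation("de"))  # bank
```

With real translated documents the same counting would be done over aligned sentence pairs rather than a hand-written list.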

Example-Based MT

• origins: Nagao (1981)
• first motivation: collocations, bilingual differences of syntactic structures
• basic idea:
  - human translators search for analogies (similar phrases) in previous translations
  - MT should seek matching fragments in a bilingual database and extract their translations
• aim to have less complex dictionaries, grammars, and procedures
• improved generation (using actual examples of TL sentences)

EBMT still going

• Bilingual corpus collection
• Storage
• Searching and matching
• …
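The "searching and matching" step can be sketched as a nearest-neighbour lookup in a stored example base. This is only an illustration under stated assumptions: the example sentences are invented, the helper names are hypothetical, and word-level similarity from Python's standard `difflib.SequenceMatcher` stands in for whatever matching measure a real EBMT system would use. The recombination step (swapping in the translation of the unmatched fragment) is left out.

```python
from difflib import SequenceMatcher

# Hypothetical bilingual example base: (source sentence, stored translation).
examples = [
    ("he buys a book", "il achète un livre"),
    ("she reads a newspaper", "elle lit un journal"),
    ("he sells a car", "il vend une voiture"),
]

def similarity(a, b):
    """Word-level similarity ratio between two sentences, in [0, 1]."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def closest_example(sentence, example_base):
    """Return (source, target, score) for the best-matching stored example."""
    source, target = max(example_base, key=lambda ex: similarity(sentence, ex[0]))
    return source, target, similarity(sentence, source)

source, target, score = closest_example("he buys a newspaper", examples)
print(source, "->", target, round(score, 2))
```

Here the input shares three of its four words, in order, with "he buys a book", so that example is retrieved; a full system would then replace the translation of "book" with that of "newspaper".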

Statistical MT Basics

• Based on the assumption that translations observe statistical regularities
• origins: Warren Weaver (1949), Shannon's information theory
• core process is the probabilistic 'translation model', taking SL words or phrases as input and producing TL words or phrases as output
• succeeding stage involves a probabilistic 'language model', which synthesizes TL words as 'meaningful' TL sentences

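Statistical MT

[Figure: statistical learning. A learning system builds a probabilistic model $\hat{p}$ from natural-language inputs $x_1, x_2, \ldots, x_n$; a prediction system then applies the model to a new natural-language input $x_{n+1}$ to produce a prediction.]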

Statistical MT schema


Statistical MT processes

• Bilingual corpora: original and translation
• little or no linguistic 'knowledge'; based on word co-occurrences in SL and TL texts (of a corpus), relative positions of words within sentences, length of sentences
• Alignment: sentences aligned statistically (according to sentence length and position)
• Decoding: compute the probability that a TL string is the translation of a SL string ('translation model'), based on:
  - frequency of co-occurrence in aligned texts of the corpus
  - position of SL words in the SL string
• Adjustment: compute the probability that a TL string is a valid TL sentence (based on a 'language model' of allowable bigrams and trigrams)
• search for the TL string that maximizes these probabilities:

$$\arg\max_e P(e \mid f) = \arg\max_e P(f \mid e)\, P(e)$$
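The maximization follows from Bayes' rule: P(e | f) = P(f | e) P(e) / P(f), and P(f) does not depend on e, so it can be dropped from the argmax. A minimal sketch of this search over a fixed candidate list, assuming toy probability tables; every number and the function name `decode` are made up for illustration, and a real decoder searches a far larger space of TL strings:

```python
import math

# Hypothetical toy models; every probability here is invented for illustration.
translation_model = {  # P(f | e), keyed by (f, e)
    ("banco", "bank"): 0.6,
    ("banco", "bench"): 0.4,
}
language_model = {  # P(e)
    "bank": 0.7,
    "bench": 0.3,
}

def decode(f, candidates):
    """Return the candidate e that maximizes log P(f | e) + log P(e)."""
    def score(e):
        p_f_given_e = translation_model.get((f, e), 0.0)
        p_e = language_model.get(e, 0.0)
        if p_f_given_e == 0.0 or p_e == 0.0:
            return -math.inf  # impossible under at least one of the models
        return math.log(p_f_given_e) + math.log(p_e)
    return max(candidates, key=score)

print(decode("banco", ["bank", "bench"]))  # bank
```

Working in log space avoids numerical underflow once the products run over whole sentences rather than single words.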

Language Modeling

• Determines the probability of some English word sequence $e_1^l$ of length $l$
• P(e) is normally approximated as:

$$P(e_1^l) \approx P(e_1)\, P(e_2 \mid e_1) \prod_{i=3}^{l} P(e_i \mid e_{i-m}^{i-1})$$

  where m is the number of previous words that are considered:
  - m=1, bi-gram language model
  - m=2, tri-gram language model
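As an illustration of the m=1 case, the following sketch estimates bigram probabilities by relative frequency from a toy corpus and multiplies them along a sentence. It makes two assumptions that are not in the slide: the corpus is invented, and the first word is conditioned on a start marker `<s>` instead of using a separate unigram term P(e_1). No smoothing is applied, so any unseen bigram gives probability zero.

```python
from collections import Counter

# Hypothetical toy corpus; "<s>" marks the start of each sentence.
corpus = [
    ["<s>", "the", "bank", "is", "open"],
    ["<s>", "the", "bench", "is", "green"],
    ["<s>", "the", "bank", "is", "closed"],
]

# Count each word as a history (every position except the last) and each bigram.
histories = Counter(w for sent in corpus for w in sent[:-1])
bigrams = Counter((a, b) for sent in corpus for a, b in zip(sent, sent[1:]))

def bigram_prob(prev, word):
    """Relative-frequency estimate of P(word | prev); 0.0 if prev is unseen."""
    if histories[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / histories[prev]

def sentence_prob(words):
    """Probability of a word list under the bigram model (m = 1)."""
    prob = 1.0
    for prev, word in zip(["<s>"] + words, words):
        prob *= bigram_prob(prev, word)
    return prob

print(sentence_prob(["the", "bank", "is", "open"]))
```

For this corpus the factors are 3/3, 2/3, 2/2 and 1/3, so the sentence receives probability 2/9.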

Translation Modeling

• Determines the probability that the foreign word f is a translation of the English word e
• How to compute P(f | e) from a parallel corpus?
• Statistical approaches rely on the co-occurrence of e and f in the parallel data: if e and f tend to co-occur in parallel sentence pairs, they are likely to be translations of one another
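One standard way to turn co-occurrence into P(f | e) is expectation-maximization in the style of IBM Model 1. The slide does not say which estimator is meant, so this is a swapped-in illustration: the three-sentence corpus is invented, the function name `train_model1` is hypothetical, and the NULL word of the full model is omitted for brevity.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus: (English words, foreign words).
corpus = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]

def train_model1(pairs, iterations=10):
    """Estimate t[(f, e)] = P(f | e) by EM over sentence-level co-occurrence."""
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of (f, e) links
        total = defaultdict(float)  # expected number of links from e
        for es, fs in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm  # E-step: share of f credited to e
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts for each English word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

t = train_model1(corpus)
print(round(t[("Haus", "house")], 3), round(t[("das", "house")], 3))
```

Although "das" and "Haus" each co-occur with "house" exactly once, "das" is better explained by "the", which appears with it twice, so the estimate for P(Haus | house) rises above P(das | house) as the iterations proceed.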


SMT issues

• ignores previous MT research (new start, new 'paradigm')
• basically a 'direct' approach:
  - replaces SL word by most probable TL word
  - reorders TL words
• decoding is effectively a kind of 'back translation'
• originally wholly word-based (IBM 'Candide' 1988); now predominantly phrase-based (i.e. alignment of word groups); some research on syntax-based approaches
• mathematically simple, but needs a huge amount of training (large databases)
• problems for SMT:
  - translation is not just selecting the most frequent 'equivalent' (wider context)
  - no quality control of corpora
  - lack of monolingual data for some languages
  - insufficient bilingual data (Internet as resource)
  - lack of structural information about language
• merit of SMT: evaluation as an integral process of system development

Rule-Based MT & SMT

• SMT is a black box: no way of finding how it works in particular cases, why it succeeds sometimes and not others
• RBMT: rules and procedures can be examined
• RBMT and SMT are apparent polar opposites, but 'rules' are gradually being incorporated in SMT models:
  - first, morphology (even in versions of the first IBM model)
  - then, 'phrases' (with some similarity to linguistic phrases)
  - now also, syntactic parsing

Rule-Based MT & SMT

• Comparison from the following perspectives:
  - Theoretical background
  - Knowledge representation
  - Knowledge discovery
  - Robustness
  - Extensibility
  - Development cycle

Evaluation of MT

• Manual:
  - Accuracy / fluency / completeness
  - Faithfulness, expressiveness, elegance (信 达 雅)
• Automatic evaluation:
  - BLEU: percentage of word sequences (n-grams) occurring in reference texts
  - NIST
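The n-gram idea behind BLEU can be sketched as a clipped n-gram precision against a single reference. This is not the full metric: BLEU combines the precisions for several n-gram orders as a geometric mean and applies a brevity penalty, both omitted here. The sentences and function names are illustrative assumptions.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference, with clipping."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Clip each n-gram's credit at the number of times the reference contains it.
    matched = sum(min(c, ref[g]) for g, c in cand.items())
    return matched / total

candidate = "the cat is on the mat".split()
reference = "the cat sat on the mat".split()
print(ngram_precision(candidate, reference, 1))
print(ngram_precision(candidate, reference, 2))
```

In this example 5 of the 6 candidate unigrams and 3 of the 5 candidate bigrams occur in the reference.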

Development of MT - MT System


MT Development - Research

[Figure: MT research approaches placed on two axes. Axes: Knowledge Acquisition Strategy (from "All manual" to "Fully automated") and Knowledge Representation Strategy (from "Shallow/Simple" to "Deep/Complex"). Labels on the chart: Electronic dictionaries; Hand-built by experts; Original direct approach; Hand-built by non-experts; Word-based only; Original statistical MT; Phrase tables; Example-based MT; Learn from annotated data; Learn from unannotated data; Syntactic Constituent Structure; Typical transfer system; Classic interlingual system; Semantic analysis; Interlingua; "New Research Goes Here!"]

MT problems in general

• Characteristics of language
  - Ambiguous
  - Dynamic
  - Flexible
• Knowledge
  - How to express it
  - How to discover it
  - How to use it

Some Thinking about MT from recognition

• The human brain
  - Memory
  - Progress - Learning
    - Model
    - Pattern
• Translation by human …
• Translation by machine …

Further Reading

• Arturo Trujillo, Translation Engines: Techniques for Machine Translation, Springer-Verlag London Limited, 1999
• P.F. Brown, et al., A Statistical Approach to Machine Translation, Computational Linguistics, 1990, 16(2)
• P.F. Brown, et al., The Mathematics of Statistical Machine Translation: Parameter Estimation, Computational Linguistics, 1993, 19(2)
• Bonnie J. Dorr, et al., Survey of Current Paradigms in Machine Translation
• Makoto Nagao, A Framework of a Mechanical Translation between Japanese and English by Analogy Principle, in A. Elithorn and R. Banerji (Eds.), Artificial and Human Intelligence, NATO Publications, 1984
• W.J. Hutchins, Machine Translation: Past, Present, Future, Chichester: Ellis Horwood, 1986
• Daniel Jurafsky & James H. Martin, Speech and Language Processing, Prentice-Hall, 2000
• Christopher D. Manning & Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999
• James Allen, Natural Language Understanding, The Benjamin/Cummings Publishing Company, Inc., 1987