Hidden Markov Model and its application in Pos Tagging

Machine Translation

Dai Xinyu 2006-10-27

Outline

• Introduction
• Architecture of MT
• Rule-Based MT vs. Data-Driven MT
• Evaluation of MT
• Development of MT
• MT problems in general
• Some Thinking about MT from recognition

Introduction

"I have a text in front of me which is written in Russian but I am going to pretend that it is really written in English and that it has been coded in some strange symbols. All I need do is strip off the code in order to retrieve the information contained in the text."

Machine translation: the use of computers to translate from one language to another

• The classic acid test for natural language processing.
• Requires capabilities in both interpretation and generation.
• About $10 billion spent annually on human translation.

http://www.google.com/language_tools?hl=en

Introduction - MT past and present

mid-1950's - 1965:
• Great expectations

The dark ages for MT:
• Academic research projects

1980's - 1990's:
• Successful specialized applications

1990's:
• Human-machine cooperative translation

1990's - now:
• Statistics-based MT
• Hybrid-strategies MT

Future prospects:
• ???

Interest in MT

• Commercial interest:
  - U.S. has invested in MT for intelligence purposes
  - MT is popular on the web: it is the most used of Google's special features
  - EU spends more than $1 billion on translation costs each year
• (Semi-)automated translation could lead to huge savings

Interest in MT

• Academic interest:
  - One of the most challenging problems in NLP research
  - Requires knowledge from many NLP sub-areas, e.g., lexical semantics, parsing, morphological analysis, statistical modeling, …
  - Being able to establish links between two languages allows for transferring resources from one language to another

Areas Related to MT

• Linguistics
• Computer Science
  - AI
  - Compilers
  - Formal Semantics
  - …
• Mathematics
  - Probability
  - Statistics
  - …
• Informatics
• Recognition

Architecture of MT -- (Levels of Transfer)

Rule-Based MT vs. Data-Driven MT

• Rule-Based MT
• Data-Driven MT
  - Example-Based MT
  - Statistics-Based MT

Rule-Based MT

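[Figure: experts drawing on linguistics, semantics, cognitive science, and artificial intelligence write rules; the rules drive a translation system that turns natural language input into a translation result.]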

Rule-Based MT

Man, this is so boring.

Hmm, every time he sees “banco”, he either types “bank” or “bench” … but if he sees “banco de…”, he always types “bank”, never “bench”…

Translated documents

Example-Based MT

• Origins: Nagao (1981)
• First motivation: collocations, bilingual differences of syntactic structures
• Basic idea:
  - human translators search for analogies (similar phrases) in previous translations
  - MT should seek matching fragments in a bilingual database and extract translations
• Aim: less complex dictionaries, grammars, and procedures
• Improved generation (using actual examples of target-language (TL) sentences)

EBMT still going

• Bilingual corpus collection
• Storage
• Searching and matching
• …
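The "searching and matching" step can be sketched as a nearest-example lookup. This is only a minimal illustration, not the method of any particular EBMT system: the example base, the word-level similarity measure (Python's `difflib.SequenceMatcher`), and the names `EXAMPLES`, `similarity`, and `best_match` are all assumptions made for the sketch.

```python
from difflib import SequenceMatcher

# Toy bilingual example base (Spanish -> English); pairs invented for illustration.
EXAMPLES = [
    ("el banco de madera", "the wooden bench"),
    ("el banco de españa", "the bank of spain"),
    ("una casa grande", "a big house"),
]

def similarity(a, b):
    """Word-level similarity ratio in [0, 1] between two sentences."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def best_match(source, examples=EXAMPLES):
    """Return (score, stored_source, stored_translation) for the closest example."""
    return max((similarity(source, src), src, tgt) for src, tgt in examples)

score, src, tgt = best_match("un banco de madera")
print(f"{score:.2f}  {src!r} -> {tgt!r}")
```

A real system would then adapt the retrieved translation (here, replacing the part that corresponds to the unmatched word) rather than return it unchanged.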

Statistical MT Basics

• Based on the assumption that translations exhibit statistical regularities
• Origins: Warren Weaver (1949), Shannon's information theory
• The core process is the probabilistic 'translation model', taking source-language (SL) words or phrases as input and producing TL words or phrases as output
• The succeeding stage involves a probabilistic 'language model', which synthesizes TL words as 'meaningful' TL sentences

Statistical MT

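[Figure: statistical learning. Natural language input x_1, x_2, …, x_n goes to a learning system, which builds a probability model p̂. A prediction system then takes new natural language input x_{n+1} and uses the model to produce the prediction p̂(x_{n+1}).]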

Statistical MT schema

Statistical MT processes

• Bilingual corpora: original and translation
• Little or no linguistic 'knowledge'; based on word co-occurrences in SL and TL texts (of a corpus), relative positions of words within sentences, length of sentences
• Alignment: sentences aligned statistically (according to sentence length and position)
• Decoding: compute the probability that a TL string is the translation of an SL string ('translation model'), based on:
  - frequency of co-occurrence in aligned texts of the corpus
  - position of SL words in the SL string
• Adjustment: compute the probability that a TL string is a valid TL sentence (based on a 'language model' of allowable bigrams and trigrams)
• Search for the TL string that maximizes these probabilities:

$$\arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\, P(e)$$
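The search in the last step can be sketched over a tiny, explicit candidate list. The probabilities below are invented for illustration, and the names `TRANSLATION_MODEL`, `LANGUAGE_MODEL`, and `decode` are hypothetical; a real decoder searches a very large hypothesis space rather than scoring two candidates.

```python
import math

# Hypothetical toy probabilities, invented for illustration only.
# Translation model P(f | e): probability of the foreign string given an English candidate.
TRANSLATION_MODEL = {
    ("banco de españa", "bank of spain"): 0.4,
    ("banco de españa", "bench of spain"): 0.6,
}
# Language model P(e): probability that the candidate is a valid English string.
LANGUAGE_MODEL = {
    "bank of spain": 0.01,
    "bench of spain": 0.0001,
}

def decode(f, candidates):
    """Return the candidate e that maximizes log P(f | e) + log P(e)."""
    def score(e):
        return math.log(TRANSLATION_MODEL[(f, e)]) + math.log(LANGUAGE_MODEL[e])
    return max(candidates, key=score)

best = decode("banco de españa", ["bank of spain", "bench of spain"])
print(best)
```

In this toy setup the translation model alone slightly prefers "bench of spain", and the language model term is what tips the product toward "bank of spain". Working in log space avoids numerical underflow when many small probabilities are multiplied.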

Language Modeling

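• Determines the probability of some English word sequence $e_1^l = e_1 \dots e_l$ of length $l$
• $P(e_1^l)$ is normally approximated as:

$$P(e_1^l) \approx P(e_1)\, P(e_2 \mid e_1) \prod_{i=3}^{l} P(e_i \mid e_{i-m}^{i-1})$$

where $m$ is the number of previous words that are considered: $m=1$ gives a bi-gram language model, $m=2$ a tri-gram language model.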
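A bi-gram model (m=1) can be estimated from counts as in the minimal sketch below. It makes two simplifying assumptions that are not in the slides: sentences are padded with `<s>` and `</s>` markers, so the first-word term becomes P(e_1 | <s>), and no smoothing is applied, so any unseen bi-gram gives probability zero. The function names and the three-sentence corpus are invented for illustration.

```python
from collections import Counter

def train_bigram(sentences):
    """Collect context-word and bi-gram counts, padding sentences with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])              # every token that serves as a context
        bigrams.update(zip(tokens, tokens[1:]))   # adjacent word pairs
    return unigrams, bigrams

def sentence_probability(sentence, unigrams, bigrams):
    """Multiply maximum-likelihood estimates P(e_i | e_{i-1}) over the sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if unigrams[prev] == 0:                   # context never seen in training
            return 0.0
        prob *= bigrams[(prev, cur)] / unigrams[prev]
    return prob

corpus = ["the bank is open", "the bench is old", "the bank is old"]
uni, bi = train_bigram(corpus)
print(sentence_probability("the bank is open", uni, bi))
print(sentence_probability("the bank is new", uni, bi))
```

Practical language models add smoothing or back-off so that unseen word sequences receive a small non-zero probability.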

Translation Modeling

• Determines the probability that the foreign word f is a translation of the English word e
• How to compute P(f | e) from a parallel corpus?
• Statistical approaches rely on the co-occurrence of e and f in the parallel data: if e and f tend to co-occur in parallel sentence pairs, they are likely to be translations of one another
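One standard way to turn co-occurrence into estimates of P(f | e) is expectation-maximization in the style of IBM Model 1. The slides do not name a specific estimator, so this is a swapped-in technique shown as a simplified sketch: it omits the NULL word, and the function name `ibm_model1` and the three sentence pairs are invented for illustration.

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """Estimate t(f | e) by EM from (foreign_tokens, english_tokens) sentence pairs."""
    f_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))    # uniform initial estimates
    for _ in range(iterations):
        count = defaultdict(float)                 # expected count of (f, e) links
        total = defaultdict(float)                 # expected count of e
        # E-step: distribute each foreign word over the English words in its pair.
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalize expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

pairs = [
    ("la casa".split(), "the house".split()),
    ("la mesa".split(), "the table".split()),
    ("casa".split(), "house".split()),
]
t = ibm_model1(pairs)
print(round(t[("casa", "house")], 3), round(t[("la", "house")], 3))
```

Because "casa" co-occurs with "house" more consistently than "la" does, the estimate for t(casa | house) ends up higher than t(la | house), which is the intuition stated in the last bullet.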


SMT issues

• Ignores previous MT research (new start, new 'paradigm')
• Basically a 'direct' approach:
  - replaces SL word by most probable TL word
  - reorders TL words
• Decoding is effectively a kind of 'back translation'
• Originally wholly word-based (IBM 'Candide' 1988); now predominantly phrase-based (i.e. alignment of word groups); some research on syntax-based models
• Mathematically simple, but needs a huge amount of training (large databases)
• Problems for SMT:
  - translation is not just selecting the most frequent 'equivalent' (wider context)
  - no quality control of corpora
  - lack of monolingual data for some languages
  - insufficient bilingual data (Internet as resource)
  - lack of structure information of language
• Merit of SMT: evaluation as an integral process of system development

Rule-Based MT & SMT

• SMT is a black box: no way of finding how it works in particular cases, or why it succeeds sometimes and not others
• RBMT: rules and procedures can be examined
• RBMT and SMT are apparent polar opposites, but 'rules' are gradually being incorporated in SMT models:
  - first, morphology (even in versions of the first IBM model)
  - then, 'phrases' (with some similarity to linguistic phrases)
  - now also, syntactic parsing

Rule-Based MT & SMT

• Comparison from the following perspectives:
  - Theory background
  - Knowledge expression
  - Knowledge discovery
  - Robustness
  - Extensibility
  - Development cycle

Evaluation of MT

• Manual:
  - Precision / fluency / completeness
  - 信达雅 (faithfulness, expressiveness, elegance)
• Automatic evaluation:
  - BLEU: percentage of word sequences (n-grams) occurring in reference texts
  - NIST
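The n-gram matching behind BLEU can be sketched as a clipped ("modified") n-gram precision. This is only one component: full BLEU combines these precisions for several n-gram lengths with a geometric mean and a brevity penalty, which the sketch leaves out. The function names and the example sentences are invented for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Fraction of candidate n-grams found in the references, with clipped counts."""
    cand_counts = Counter(ngrams(candidate.split(), n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count observed in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, c in Counter(ngrams(ref.split(), n)).items():
            max_ref[gram] = max(max_ref[gram], c)
    clipped = sum(min(c, max_ref[gram]) for gram, c in cand_counts.items())
    return clipped / sum(cand_counts.values())

refs = ["the cat is on the mat"]
print(modified_precision("the cat sat on the mat", refs, 1))
print(modified_precision("the cat sat on the mat", refs, 2))
```

Clipping stops a candidate from scoring well by repeating a common word: "the the the" gets credit for at most the two occurrences of "the" in the reference.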

Development of MT - MT System

MT Development - Research

[Figure: MT approaches placed on two axes.

Knowledge Acquisition Strategy, from "All manual" to "Fully automated": hand-built by experts; hand-built by non-experts; learn from annotated data; learn from unannotated data.

Knowledge Representation Strategy, from "Shallow/Simple" to "Deep/Complex": word-based only; electronic dictionaries; phrase tables; syntactic constituent structure; semantic analysis; interlingua.

Approaches shown on the chart: original direct approach; original statistical MT; example-based MT; typical transfer system; classic interlingual system. One region is labeled "New Research Goes Here!"]

MT problems in general

• Characteristics of language
  - Ambiguous
  - Dynamic
  - Flexible
• Knowledge
  - How to express
  - How to discover
  - How to use

Some Thinking about MT from recognition

• Human brain
  - Memory
  - Progress - Learning
    - Model
    - Pattern
• Translation by human …
• Translation by machine …
