Transcript Hidden Markov Model and its application in Pos Tagging
Machine Translation
Dai Xinyu 2006-10-27
Outline
•Introduction
•Architecture of MT
•Rule-Based MT vs. Data-Driven MT
•Evaluation of MT
•Development of MT
•MT problems in general
•Some Thinking about MT from recognition
Introduction
"I have a text in front of me which is written in Russian but I am going to pretend that it is really written in English and that it has been coded in some strange symbols. All I need do is strip off the code in order to retrieve the information contained in the text."
Machine translation: the use of computers to translate from one language to another.
•The classic acid test for natural language processing.
•Requires capabilities in both interpretation and generation.
•About $10 billion spent annually on human translation.
http://www.google.com/language_tools?hl=en
Introduction - MT past and present
•mid-1950's - 1965: Great expectations
•The dark ages for MT: Academic research projects
•1980's - 1990's: Successful specialized applications
•1990's: Human-machine cooperative translation
•1990's - now: Statistical-based MT, Hybrid-strategies MT
•Future prospects: ???
Interest in MT
Commercial interest:
•U.S. has invested in MT for intelligence purposes.
•MT is popular on the web: it is the most used of Google's special features.
•EU spends more than $1 billion on translation costs each year.
•(Semi-)automated translation could lead to huge savings.
Interest in MT
Academic interest:
•One of the most challenging problems in NLP research.
•Requires knowledge from many NLP sub-areas, e.g., lexical semantics, parsing, morphological analysis, statistical modeling, …
•Being able to establish links between two languages allows for transferring resources from one language to another.
Related Area to MT
•Linguistics
•Computer Science: AI, Compilers, Formal Semantics, …
•Mathematics: Probability, Statistics, …
•Informatics
•Recognition
Architecture of MT -- (Levels of Transfer)
Rule-Based MT vs. Data-Driven MT
•Rule-Based MT
•Data-Driven MT: Example-Based MT, Statistics-Based MT
Rule-Based MT
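[Figure: linguistics, semantics, cognitive science, and artificial intelligence inform the writing of rules; the rules drive a translation system that maps natural-language input x to a translation result.]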
Rule-Based MT
Man, this is so boring.
Hmm, every time he sees “banco”, he either types “bank” or “bench” … but if he sees “banco de…”, he always types “bank”, never “bench”…
Translated documents
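The observation in the cartoon (the choice between "bank" and "bench" depends on the word that follows "banco") can be sketched as simple counting over translated documents. This is only a minimal illustration: the observations below are invented toy data, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Invented toy observations: (word following "banco", English word chosen).
observations = [
    ("de", "bank"), ("de", "bank"), ("de", "bank"),
    ("nuevo", "bench"), ("nuevo", "bench"), ("nuevo", "bank"),
    ("roto", "bench"),
]

def translation_counts(obs):
    """Count how often each translation was chosen for each following word."""
    counts = defaultdict(Counter)
    for next_word, translation in obs:
        counts[next_word][translation] += 1
    return counts

def best_translation(counts, next_word):
    """Pick the most frequent translation for this context.

    Falls back to the overall most frequent translation for unseen contexts.
    """
    if next_word in counts:
        return counts[next_word].most_common(1)[0][0]
    overall = Counter()
    for context_counts in counts.values():
        overall.update(context_counts)
    return overall.most_common(1)[0][0]

counts = translation_counts(observations)
print(best_translation(counts, "de"))     # context seen only with "bank"
print(best_translation(counts, "nuevo"))  # context seen mostly with "bench"
```

With these toy counts, "banco de" is always rendered as "bank", which is the regularity the cartoon's observer notices.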
Example-Based MT
•Origins: Nagao (1981).
•First motivation: collocations, bilingual differences of syntactic structures.
•Basic idea: human translators search for analogies (similar phrases) in previous translations; MT should seek a matching fragment in a bilingual database and extract translations.
•Aim: to have less complex dictionaries, grammars, and procedures.
•Improved generation (using actual examples of TL sentences).
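The matching step of the basic idea above can be sketched as nearest-example retrieval. This is a minimal sketch, not a full EBMT system: the example base is invented, word-level similarity via `difflib.SequenceMatcher` is an assumed stand-in for whatever matching a real system uses, and the recombination of fragments is omitted.

```python
from difflib import SequenceMatcher

# Invented toy example base of (source sentence, stored translation) pairs.
EXAMPLES = [
    ("good morning", "buenos días"),
    ("thank you very much", "muchas gracias"),
    ("where is the station", "dónde está la estación"),
]

def similarity(a, b):
    """Word-level similarity between two sentences, in [0, 1]."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def retrieve(sentence, examples):
    """Return the stored example whose source side is most similar."""
    return max(examples, key=lambda pair: similarity(sentence, pair[0]))

source, translation = retrieve("where is the bank", EXAMPLES)
print(source, "->", translation)
```

A real system would then replace the mismatched fragment ("station" vs. "bank") using other examples or a dictionary; only the retrieval step is shown here.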
EBMT still going
•Bilingual corpus
•Collection
•Storage
•Searching and matching
•…
Statistical MT Basics
•Based on the assumption that translations observe statistical regularities.
•Origins: Warren Weaver (1949), Shannon's information theory.
•Core process is the probabilistic 'translation model', taking SL words or phrases as input and producing TL words or phrases as output.
•Succeeding stage involves a probabilistic 'language model', which synthesizes TL words as 'meaningful' TL sentences.
Statistical MT
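[Figure: statistical learning. Natural-language inputs x_1, x_2, …, x_n feed a learning system, which builds a probabilistic model p̂. A prediction system applies the model to a new natural-language input x_{n+1} to produce the prediction p̂(x_{n+1}).]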
Statistical MT schema
Statistical MT processes
•Bilingual corpora: original and translation.
•Little or no linguistic 'knowledge'; based on word co-occurrences in SL and TL texts (of a corpus), relative positions of words within sentences, and length of sentences.
•Alignment: sentences aligned statistically (according to sentence length and position).
•Decoding: compute the probability that a TL string is the translation of an SL string ('translation model'), based on the frequency of co-occurrence in aligned texts of the corpus and the position of SL words in the SL string.
•Adjustment: compute the probability that a TL string is a valid TL sentence (based on a 'language model' of allowable bigrams and trigrams).
•Search for the TL string that maximizes these probabilities:
$$\arg\max_e P(e \mid f) = \arg\max_e P(f \mid e)\, P(e)$$
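The final search step can be sketched as scoring each candidate TL string e by P(f | e) P(e) and keeping the best one. This is a minimal sketch under strong assumptions: the candidate list and every probability below are invented, and a real decoder searches a very large space instead of enumerating three strings.

```python
import math

# Invented toy scores for the SL string f = "el banco".
# translation_model[e] stands for P(f | e); language_model[e] stands for P(e).
translation_model = {"the bank": 0.45, "the bench": 0.40, "bank the": 0.45}
language_model = {"the bank": 0.02, "the bench": 0.01, "bank the": 0.0001}

def decode(candidates, tm, lm):
    """Return the candidate e maximizing log P(f | e) + log P(e)."""
    return max(candidates, key=lambda e: math.log(tm[e]) + math.log(lm[e]))

best = decode(list(translation_model), translation_model, language_model)
print(best)
```

Note how "bank the" ties with "the bank" under the translation model alone; it is the language model term P(e) that rules out the badly ordered string.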
Language Modeling
Determines the probability of some English word sequence $e_1^l$.
P(e) is normally approximated as:
$$P(e_1^l) \approx P(e_1)\, P(e_2 \mid e_1) \prod_{i=3}^{l} P(e_i \mid e_{i-m}^{i-1})$$
where m is the number of previous words that are considered:
•m=1, bi-gram language model
•m=2, tri-gram language model
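For the bi-gram case (m=1), the approximation can be sketched with relative-frequency estimates from a monolingual corpus. This is a minimal sketch with assumptions that are not in the slides: the three-sentence corpus is invented, sentence boundary markers `<s>` and `</s>` replace the separate P(e_1) term, and no smoothing is applied, so any unseen bigram gives probability zero.

```python
from collections import Counter

def train_bigram(sentences):
    """Count bigram contexts and bigrams, with sentence boundary markers."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return contexts, bigrams

def bigram_prob(prev, word, contexts, bigrams):
    """Unsmoothed relative-frequency estimate of P(word | prev)."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]

def sentence_prob(sentence, contexts, bigrams):
    """Product of bigram probabilities over the whole sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= bigram_prob(prev, word, contexts, bigrams)
    return prob

corpus = ["the bank is open", "the bench is broken", "the bank is closed"]
contexts, bigrams = train_bigram(corpus)
print(sentence_prob("the bank is open", contexts, bigrams))
print(sentence_prob("bank the is open", contexts, bigrams))
```

The well-ordered sentence receives a non-zero probability, while the scrambled one scores zero because the bigram ("<s>", "bank") never occurs in the toy corpus.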
Translation Modeling
•Determines the probability that the foreign word f is a translation of the English word e.
•How to compute P(f | e) from a parallel corpus?
•Statistical approaches rely on the co-occurrence of e and f in the parallel data: if e and f tend to co-occur in parallel sentence pairs, they are likely to be translations of one another.
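One way to turn co-occurrence into estimates of P(f | e) is expectation-maximization in the style of IBM Model 1 (Brown et al., 1993, listed under Further Reading). The slide does not name a specific algorithm, so this choice is an assumption. The sketch below is simplified: the three sentence pairs are invented toy data, there is no NULL word, and the number of iterations is fixed.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """Estimate t[(f, e)] = P(f | e) from (english_tokens, foreign_tokens) pairs."""
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e being aligned at all
        for es, fs in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm  # E-step: fractional alignment
                    count[(f, e)] += frac
                    total[e] += frac
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]         # M-step: renormalize per e
    return dict(t)

pairs = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]
t = train_ibm1(pairs)
print(round(t[("das", "the")], 3), round(t[("Haus", "the")], 3))
```

Because "das" co-occurs with "the" in two sentence pairs while "Haus" and "Buch" each co-occur with it only once, the estimate for P(das | the) grows at the expense of the others.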
SMT issues
•Ignores previous MT research (new start, new 'paradigm').
•Basically a 'direct' approach: replaces SL word by most probable TL word, reorders TL words.
•Decoding is effectively a kind of 'back translation'.
•Originally wholly word-based (IBM 'Candide' 1988); now predominantly phrase-based (i.e. alignment of word groups); some research on syntax-based models.
•Mathematically simple, but needs a huge amount of training (large databases).
•Problems for SMT:
  translation is not just selecting the most frequent 'equivalent' (wider context)
  no quality control of corpora
  lack of monolingual data for some languages
  insufficient bilingual data (Internet as resource)
  lack of structure information of language
•Merit of SMT: evaluation as an integral process of system development.
Rule-Based MT & SMT
•SMT is a black box: no way of finding how it works in particular cases, or why it succeeds sometimes and not others.
•RBMT: rules and procedures can be examined.
•RBMT and SMT are apparent polar opposites, but 'rules' are gradually being incorporated into SMT models:
  first, morphology (even in versions of the first IBM model)
  then, 'phrases' (with some similarity to linguistic phrases)
  now also, syntactic parsing
Rule-Based MT & SMT
Comparison from the following perspectives:
•Theory background
•Knowledge expression
•Knowledge discovery
•Robustness
•Extension
•Development cycle
Evaluation of MT
•Manual: precision / fluency / completeness (信 达 雅: faithfulness, expressiveness, elegance).
•Automatic evaluation:
  BLEU: percentage of word sequences (n-grams) occurring in reference texts
  NIST
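The n-gram matching at the heart of BLEU can be sketched as a clipped ("modified") n-gram precision. This is a minimal sketch of that one component only: full BLEU additionally combines several n-gram orders and applies a brevity penalty, both omitted here. The function names are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Share of candidate n-grams found in the references, with clipping.

    Each candidate n-gram is credited at most as many times as it occurs
    in any single reference.
    """
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, c in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], c)
    clipped = sum(min(c, max_ref[gram]) for gram, c in cand_counts.items())
    return clipped / sum(cand_counts.values())

reference = "the cat is on the mat".split()
degenerate = "the the the the the the the".split()
print(modified_precision(degenerate, [reference], 1))
print(modified_precision(reference, [reference], 2))
```

Clipping is what stops the degenerate candidate from scoring perfectly: "the" occurs seven times in the candidate but is credited only twice, its count in the reference.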
Development of MT - MT System
MT Development - Research
[Figure: MT research approaches plotted on two axes.]
•Knowledge Acquisition Strategy (all manual to fully automated): hand-built by experts; hand-built by non-experts; learn from annotated data; learn from unannotated data
•Knowledge Representation Strategy (shallow/simple to deep/complex): word-based only; phrase tables; syntactic constituent structure; semantic analysis; interlingua
•Systems plotted: electronic dictionaries; original direct approach; original statistical MT; example-based MT; typical transfer system; classic interlingual system; "New Research Goes Here!"
MT problems in general
•Characteristics of language: ambiguous, dynamic, flexible.
•Knowledge: how to express it, how to discover it, how to use it.
Some Thinking about MT from recognition
•Human cerebrum: memory, progress (learning)
•Model
•Pattern
•Translation by human …
•Translation by machine …
Further Reading
Arturo Trujillo, Translation Engines: Techniques for Machine Translation, Springer-Verlag London Limited, 1999
P.F. Brown, et al., A Statistical Approach to MT, Computational Linguistics, 1990, 16(2)
P.F. Brown, et al., The Mathematics of Statistical Machine Translation: Parameter Estimation, Computational Linguistics, 1993, 19(2)
Bonnie J. Dorr, et al., Survey of Current Paradigms in Machine Translation
Makoto Nagao, A Framework of a Mechanical Translation between Japanese and English by Analogy Principle, In A. Elithorn and R. Banerji (Eds.), Artificial and Human Intelligence, NATO Publications, 1984
W.J. Hutchins, Machine Translation: Past, Present, Future, Chichester: Ellis Horwood, 1986
Daniel Jurafsky & James H. Martin, Speech and Language Processing, Prentice-Hall, 2000
Christopher D. Manning & Hinrich Schütze, Foundations of Statistical Natural Language Processing, Massachusetts Institute of Technology, 1999
James Allen, Natural Language Understanding, The Benjamin/Cummings Publishing Company, Inc., 1987