MT-Seminar-S08.ppt

11-734
Advanced MT Seminar
Spring 2008
Instructors:
Alon Lavie and Stephan Vogel
Course Objectives
• Objective: Study and review in depth a selection of
important research topics in current state-of-the-art MT
• Main Focus: Data-driven search-based MT approaches
– MT resources are primarily acquired automatically from large
volumes of monolingual and bilingual corpora
– Translation process is framed as a computational search
optimization problem, driven by various statistical “models” and
ML-based features
• Other important or related topics may also be explored
Course Format
• Course Format: Graduate-level Seminar
– Stephan and Alon will present a few introductory
lectures
– Students will present and lead the remaining lectures
and discussions
– Individual Student Tasks:
• Select and define a specific research topic
• Identify 1-2 basic research papers (for everyone to read)
• Conduct a broad literature review of the topic
• Prepare a class presentation on the topic and lead class
discussion
• Write a 10-15 page literature review “white paper” on current
state of the research topic and on its future directions
Course Format
• Requirements and Expectations:
– 1-2 basic readings for each topic should be announced at least
one week in advance
– Everyone is expected to attend all class meetings, read the 1-2
basic papers before class, prepare questions
– Student Presentations: present an overview of the topic in class
(not just the basic papers) and lead the discussion about
important issues, open research questions, future directions, etc.
– White Papers will be due towards the end of the semester
• Grading:
– 40% Presentation
– 40% White Paper
– 20% Class Participation
Preliminary List of Topics
• Models and Approaches for Word, Phrase and
Structure Alignment:
– Hierarchical Alignment Models: ITG-style, Hiero-style, Syntax-based models: tree-to-string, string-to-tree, tree-to-tree
– Discriminative Alignment Models
– Constrained Alignment Models
– Methods for Phrase Extraction from word-aligned parallel data
– Methods for Rule Extraction from parsed and word aligned
parallel data
• Word Reordering Models:
– Word and phrase-based, POS-based, syntax-based
Preliminary List of Topics
• Search-based Decoding:
– Basic decoding algorithms, computational complexity and
efficiency issues
– Decoders for various “flavors” of data-driven search-based MT
– Optimization issues, monotonicity, pruning, hypothesis recombination
• Language Modeling for MT:
– Very large scale statistical LMs: technical challenges and
solutions
– Domain and Genre adaptation
– Syntactic LMs
– Discriminative LMs, Factored LMs, “unconventional” approaches
• Architecture and Design of Large-scale MT systems:
– Training methods and tools
– MERT and parameter tuning
– Runtime architectures, online vs. offline systems
Preliminary List of Topics
• Morphology and Word Segmentation and their
integration within MT:
– Morphological analysis and generation tools
– Integrating morphological processing within MT
– Input segmentation issues, ambiguity and confusion networks
• Multi-Engine MT and System Combination
Approaches
• MT Evaluation:
– Automatic metrics for MT evaluation; methods for assessing MT
eval metrics, strengths and weaknesses
– Human evaluation, Subjective and Objective metrics, Confidence
scores
– Evaluation campaigns and how they are conducted
• Online Translation Services and how they work:
– Google, Babelfish, MS Word tools, instant messaging
Tentative Schedule
• Jan 16: Organization + Stephan: Basic Word Alignment Models
• Jan 23: Stephan: Word Alignment Models, Phrase Extraction methods
• Jan 30: Stephan and/or Alon: TBD (Decoding basics? MT Evaluation?)
• Feb 6: Student #1
• Feb 13: Student #2
• Feb 20: NO CLASS (Stephan and Alon away)
• Feb 27: Student #3
• Mar 5: Student #4
• Mar 12: NO CLASS (Spring Break)
• Mar 19: Student #5
• Mar 26: Student #6
• Apr 2: Student #7
• Apr 9: NO CLASS (GALE PI Meeting)
• Apr 16: Student #8
• Apr 23: Student #9
• Apr 30: Student #10
Task #1
• By next week’s class meeting (Wed 1/23):
– Select a research topic
– Write a one-page description that outlines and scopes
your selected research topic, and lists 1-2 basic
readings on the topic
– Email Alon and Stephan your one-page description,
plus three preferred presentation dates
– Act Fast! We will coordinate topic selections and
presentation date preferences primarily by logical
order and by receipt time
Students and Topics
• Abhaya Agarwal: Discriminative Methods for Training Translation Models
• Aaron Phillips: Methods for Context Incorporation in MT 3/05 2/27 3/19
• Jason Adams: WSD and its Integration within MT 3/26 2/27 2/20
• Alok Parlikar: Phrase-based SMT and Solutions to ‘Out of Order’ Problem 3/19 3/05 2/27
• Amr Ahmed: Syntax-based Machine Translation Models 3/26 4/02 4/16
• Eric Davis: Morphology and Segmentation Issues in MT 2/06 2/13 2/27
• Greg Hanneman: Towards Syntactically-Constrained Statistical Word Alignment 4/16 4/23 4/30
• Linh Nguyen: Morphology and Word Segmentation and their integration within MT 3/05 or later
• Qin Gao: Large Scale Architecture for MT Systems 3/05 2/27 4/02
• Vamshi Ambati: Dependency Structures in Syntax-oriented Machine Translation 3/19 3/26 4/02
• Rashmi Gangadharaiah: Factored and Syntactic Language models 4/02 3/19 3/05
Proposed Schedule
• Jan 16: Organization + Stephan: Basic Word Alignment Models
• Jan 23: Stephan: Word Alignment Models, Phrase Extraction methods
• Jan 30: Stephan: Decoding basics
• Feb 6: Student #1: Eric Davis – Morphology and/or Segmentation
• Feb 13: Student #2: Linh Nguyen – Morphology and/or Segmentation
• Feb 20: NO CLASS (Stephan and Alon away)
• Feb 27: Student #3: Jason Adams – WSD in MT
• Mar 5: Student #4: Aaron Phillips – Incorporating Context in MT
• Mar 12: NO CLASS (Spring Break)
• Mar 19: Student #5: Alok Parlikar – Reordering in Phrase-based SMT
• Mar 26: Student #6: Amr Ahmed – Syntax-based Models and their training
• Apr 2: Student #7: Vamshi Ambati – Dependency Structures in MT
• Apr 9: NO CLASS (GALE PI Meeting)
• Apr 16: Student #8: Rashmi – Factored and Syntax-based LMs
• Apr 23: Student #9: Greg Hanneman – Syntactically-constrained WA
• Apr 30: Student #10: Qin Gao – Large-scale MT Architectures
• May 7: Student #11: Abhaya Agarwal – Discriminative Training Methods
MT Lunch Slots
• Currently held reservations (all on Tuesdays):
– Feb 19 (Alon and Stephan away)
– Mar 18 (12:30-2:00)
– Apr 22 (12:30-2:00)
– May 20
– Jun 17