Spoken Language Systems: The Unfinished Agenda

Spoken Language Systems:
The Unfinished Agenda
Raj Reddy
School of Computer Science
Carnegie Mellon University
Pittsburgh
September 21, 2006
The entire 67MB talk with video clips can be downloaded from
http://www.rr.cs.cmu.edu/icslp.zip
Spoken Language Systems
• Objective: Recognize, interpret, execute and
respond to spoken language input to computer
• Background:
– AT&T, CMU, IBM, and MIT working on the
problem for over 40 years
– Other Key Contributors: BBN, Dragon
Systems, Kurzweil, SRI, Japan Inc., Europe
Inc.
– Research and Development Level of Effort:
About $200 million/year worldwide
• Long-Term Goal: Make speech the preferred
mode of communication to computers
Why Speech Processing Has Been
Difficult?
• Too Many Sources of Variability
• Noise
• Microphones
• Speakers
• Different Speech Sounds
• Different Pronunciations
• Non Grammaticality
• Imprecision of Language
Why Speech Recognition Has Been Difficult?
(Cont)
• And Many Sources of Knowledge
– Acoustics
– Phonetics and Phonology
– Lexical Information
– Syntax
– Semantics
– Context
– Task Dependent Knowledge
Landmarks
• DragonDictate and NaturallySpeaking
• IBM ViaVoice dictation
• Nuance-based Tellme 800 services allow
voice query for directory information, stocks,
sports, news, weather, and horoscopes
• Microsoft Speech Server, e.g., voice dialing
Need for Interdisciplinary Teams
• Signal Processing
– Fourier Transforms, DFT, FFT
• Acoustics
– Physics of sounds & speech
– Vocal tract model
• Phonetics and Linguistics
– Sounds (Acoustic-Phonetics)
– Words (Lexicon)
– Grammar (Syntax)
– Meaning (Semantics)
• Statistics
– Probability Theory
– Hidden Markov Models
– Clustering
– Dynamic Programming
• AI and Pattern Recognition
– Knowledge Representation
and Search
– Approximate Matching
– Natural Language
Processing
• Human Computer
Interaction
– Cognitive Science
– Design
– Social Networks
• Computer Science
– Hardware, Parallel Systems
– Algorithms Optimization
The Unfinished Agenda
• Technical
• Application specific
• Societal
Technical Challenges
• Unrehearsed Spontaneous Speech
• Non-Native Speakers of English
• Dynamic Learning from Sparse Data
– New Words
– New Speakers
– New Grammatical Forms
– New Languages
• No Silver Bullet on the Horizon!
• 50 more years?
– Million times greater computational power,
memory and bandwidth?
One Application Specific Challenge:
The Million Book Digital Library Project
The Grand Challenge of Digital Libraries
Create Access to
• All published works online
• Instantly available
• In any language
• Anywhere in the world
• Searchable, browsable, navigable
• By humans and machines
One Step at a Time…
• Million Book DL
– Only about 1% of all the world’s books
• Harvard University: 12M
• Library of Congress: 30M
• OCLC catalog: 42M
• All Multilingual Books: ~100M
• At the rate of digitization of the last decade it
would take 100 years!
Million Book Project: Issues
• Time
– At one page per second (20,000 pages per day
shift), it will take 100 years (200 working days per
year) to scan a million books of 400 pages each
• Cost
– 100M books at US$100 per book would cost $10B
– Even in India and China the cost will be $1B
– The annual cost is currently expected to be close
to $10M per year with support from the US, India and
China.
• Selection
– Selection of appropriate books for scanning is time
consuming and expensive
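The time and cost figures above follow from simple arithmetic. A minimal Python sketch reproduces them using only the numbers on the slide. The function names are illustrative, and the $10-per-book rate for India and China is an assumption inferred from the $1B total rather than a figure stated on the slide.

```python
def years_to_scan(books, pages_per_book, pages_per_day, working_days_per_year):
    """Years needed for one scanner working one shift per working day."""
    total_pages = books * pages_per_book
    pages_per_year = pages_per_day * working_days_per_year
    return total_pages / pages_per_year


def total_cost(books, cost_per_book):
    """Total digitization cost in US dollars."""
    return books * cost_per_book


if __name__ == "__main__":
    # Slide figures: 1M books, 400 pages each, 20,000 pages per day shift,
    # 200 working days per year.
    print(years_to_scan(1_000_000, 400, 20_000, 200))  # 100.0

    # Slide figures: 100M books at US$100 per book.
    print(total_cost(100_000_000, 100))  # 10000000000, i.e. $10B

    # Assumption: $10 per book, inferred from the $1B total on the slide.
    print(total_cost(100_000_000, 10))  # 1000000000, i.e. $1B
```

At one page per second, a 20,000-page shift corresponds to roughly five and a half hours of continuous scanning.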
Million Book Project: Issues (cont)
• Logistics
– Each container holds 10,000 to 20,000 books.
Shipping and handling costs about $10,000
• Metadata
– Accessing and/or creating metadata requires
professionals trained in library science
• Optical Character Recognition Technology
– Essential for searching, translation and
summarization
– Many languages don’t have OCR
Million Book Project: Status
• 18 centers in India
• 22 centers in China
• 1 center in Egypt
• Planned: Australia and Europe
• Over 200,000 books scanned
– Over 50,000 accessible on the web
– Uses 4 TB of storage
– 10 TB server at CMU Library
– 500,000 books by the end of 2006
– Capacity to scan a million pages a day
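To relate the stated capacity of a million pages a day to the earlier single-scanner estimate, the sketch below assumes 400 pages per book and 20,000 pages per scanner shift. Both values come from the Issues slide, not from this one, and the function names are hypothetical.

```python
def books_per_day(pages_per_day, pages_per_book=400):
    """Books completed per day at a given page throughput."""
    return pages_per_day / pages_per_book


def days_to_scan(books, pages_per_day, pages_per_book=400):
    """Working days needed to scan the given number of books."""
    return books * pages_per_book / pages_per_day


def equivalent_scanner_shifts(pages_per_day, pages_per_shift=20_000):
    """Number of one-page-per-second scanner shifts matching the throughput."""
    return pages_per_day / pages_per_shift


if __name__ == "__main__":
    capacity = 1_000_000  # pages per day, from the Status slide
    print(books_per_day(capacity))              # 2500.0
    print(days_to_scan(1_000_000, capacity))    # 400.0
    print(equivalent_scanner_shifts(capacity))  # 50.0
```

Under these assumptions, a million pages a day is the output of about 50 scanner shifts, which brings a million books down from 100 years to about 400 working days.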
Title: Rig Veda
Author: Pandit Sriram Sharma Acharya
Language: Sanskrit
Subject: Philosophy
Publisher: Sanskriti Sansthan Bareli
Year: (not given)
Abstract: Rig Veda is the oldest of the Vedas. The Rig Veda is the oldest book in Sanskrit or any Indo-European language. Many great Yogis and scholars who have understood the astronomical references in the hymns, date the Rig Veda as before 4000 B.C., perhaps as early as 12,000. Modern western scholars date it around 1500 B.C., though recent archaeological finds in India (like Dwaraka) now appear to require a much earlier date.
Title: Elementary Treatise on the Wave-Theory of Light
Author: Humphrey Lloyd, D.D., D.C.L.
Language: English
Subject: Physics
Publisher: Longmans, Green & Co
Year: 1873
Abstract: This book deals with the various aspects of the wave theory of light. It is a critical work which contains an analytical discussion of the most recent researches in Optics. It presents a clear and connected view of the subject.
Title: Mudalayiram Mulamum
Author: Periya Jeeyar
Language: Tamil
Subject: Religion
Publisher: Sri Vaishnava Sampirathaya Sanjeevikiri Sabayai
Year: 1909
Abstract: This volume is written in Tamil. It provides a detailed account of the origin of Vaishnava and is written by Periya Jeeyar.
Title: Gulzar-A-Badesha
Author: Khader Badesha
Language: Urdu
Subject: Literature
Publisher: Namipress, Chennai
Year: 1919
Abstract: Literature
Title: Jawahar Ali Joyviyah
Author: Dr. Ilyas lomas
Language: Arabic
Subject: Metrology
Publisher: Bakri and Issa
Year: 1876
Abstract: It is a book on Metrology, a study of measurements
Title: Structure Des Molecules
Author: Victor Henri
Language: French
Subject: Chemistry
Publisher: Taylor and Francis
Year: 1925
Abstract: This is a unique book that explicates, in detail, the structure of molecules and touches upon certain specific characteristics of molecules with particular reference to Benzene
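The sample records above share the same seven fields. As a sketch of how such catalog metadata might be represented and filtered by language, the Python below assumes nothing about the project's actual schema or software. The class name, function name, and field types are hypothetical, and the abstracts are shortened for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BookRecord:
    """One catalog entry with the seven fields shown on the slides."""
    title: str
    author: str
    language: str
    subject: str
    publisher: str
    year: Optional[int]  # None when the record lists no year
    abstract: str


def filter_by_language(records: List[BookRecord], language: str) -> List[BookRecord]:
    """Return the records whose language matches, ignoring case."""
    wanted = language.casefold()
    return [r for r in records if r.language.casefold() == wanted]


# Three of the sample records from the slides (abstracts shortened).
SAMPLE = [
    BookRecord("Rig Veda", "Pandit Sriram Sharma Acharya", "Sanskrit",
               "Philosophy", "Sanskriti Sansthan Bareli", None,
               "Rig Veda is the oldest of the Vedas."),
    BookRecord("Mudalayiram Mulamum", "Periya Jeeyar", "Tamil",
               "Religion", "Sri Vaishnava Sampirathaya Sanjeevikiri Sabayai",
               1909, "This volume is written in Tamil."),
    BookRecord("Structure Des Molecules", "Victor Henri", "French",
               "Chemistry", "Taylor and Francis", 1925,
               "The structure of molecules, with reference to Benzene."),
]

if __name__ == "__main__":
    for record in filter_by_language(SAMPLE, "french"):
        print(record.title, record.year)
```

An exact-match filter like this is only a starting point; the multilingual retrieval described in the research challenges would need to match queries across languages and scripts.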
Million Book Project: Research Challenges
• Providing Access to Billions everyday
– Distributed Cached Servers in every country and
region
• Easy to use interfaces for Billions
• Multilingual Information Retrieval
• Translation
• Summarization
• Reading Assistant using Multilingual Speech
Synthesis and Translation (e.g. for newspaper
DL)
Bringing the World Closer:
Robust Communication among the People of the World
Vision
• Preservation of minority languages, cultures and heritage
• Study of Human Language including
– Translation
– Summarization
– Speech
– Search
• Facilitate the use of ICT in languages other than English
– In communication among uneducated people of the world
– In commerce
– Search and access to knowledge across all languages
• Globalization requires cross-border and cross-language
communication
• Eliminate cultural and social barriers
• Language barriers can significantly slow down economic growth
• Access to rare (and potentially beneficial) knowledge requires
eliminating the language divide
Research Agenda:
What we must do
• Create technologies and solutions for
overcoming the language barrier
• Create toolkits for rapid acquisition of new
language capabilities
– Character codes, optical character recognition,
speech recognition, speech synthesis, translation,
search engines, text mining, summarization,
language tutoring, etc.
• Capture data, information and knowledge
from the masses
• Make fundamental advances in language
processing algorithms, e.g.,
– Deal with 1000 times more data
– Conceptual advances in semantic information
retrieval
The Research Plan:
How we will do it
• Analogy to Human Genome Project
• Meticulous core-science based
fundamentals
• Researcher toolkits for known
methodologies
• Architecture supporting diversity of
methodologies
• Long planning horizon to support
development of novel and radical
approaches
• Quantitative evaluation against a standard
of steadily accumulating improvements in
performance