Introduction to Natural
Language Processing and
Speech
Computer Science
Research Practicum
Fall 2012
Andrew Rosenberg
Artificial Intelligence
• AI is no longer a single subdiscipline in
computer science
– Natural Language Processing
– Speech/Spoken Language Processing
– Robotics
– Logic/Planning
– “Cognitive Radio”
– Machine Learning
1
Artificial Intelligence
• What is intelligence?
• How does computer science make
“intelligent” tools, systems, algorithms?
• Does computer science theory contribute
to the definition of “intelligence”?
2
Language and Speech
• What is the relationship between language
and intelligence/thought/cognition?
3
Language and Speech
• Most people consider language to be the
most direct access to cognition and
thought.
• Language is core to Artificial Intelligence
4
Natural Language Processing
• Information Retrieval (search)
• Information Extraction
– Knowledge Base Population
• Summarization
• Question Answering
• Named Entity Recognition
– Named Entity Linking, Co-reference resolution
• Parsing
• Sentiment Analysis
5
Information Retrieval
• Input: Query
• Output: Relevant Documents
• Simplest approach:
– Identify every document that contains the word or
words in the query
• What about related words?
– “run” is related to “running”, “runs”, and “marathon”
• How do you rank for relevance?
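The simplest approach above can be sketched as a small inverted index. This is only an illustration: the documents, the `tokenize` rule, and the ranking by raw term count are assumptions, not a method described in the slides.

```python
from collections import Counter, defaultdict

PUNCT = ".,!?;:\"'()"

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(query, docs, index):
    """Return ids of documents containing every query term,
    ranked by how often the query terms occur in each document."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))

    def score(doc_id):
        counts = Counter(tokenize(docs[doc_id]))
        return sum(counts[t] for t in terms)

    # Highest score first; ties broken by document id for determinism.
    return sorted(matches, key=lambda d: (-score(d), d))

# Made-up documents for illustration.
docs = {
    "d1": "She runs every morning.",
    "d2": "The marathon run was long, and the run was hot.",
    "d3": "He will run tomorrow.",
}
index = build_index(docs)
print(search("run", docs, index))  # prints ['d2', 'd3']
```

Note that "d1" is missed because exact matching does not relate “run” to “runs”, which is the related-words problem raised above.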
6
Information Extraction
• Identify specific information from a single
document or set of documents.
– Who works for what organization
– Who was born when? died when?
– Who did what to whom.
• This is *very* complex.
– Domain specific systems are developed
– How many different ways are there to say the
same thing?
7
Named Entity Recognition and Linking
• Bo Obama is Fat. POTUS says so.
– The President called his dog fat. Mr. Obama,
speaking to an interviewer said that The White
House dog needs to go on a diet.
• Recognize that “Bo Obama”, “POTUS”,
“The President”, “Mr. Obama”, and “The White
House” are all ENTITIES.
• How do you recognize that “POTUS”, “The
President”, “Mr. Obama”, and “his” all refer to
the same person?
8
Parsing
• Understanding grammatical structure from text.
• Important step in some relation extraction,
question answering, etc.
9
Sentiment Analysis
• Can you tell the difference between a
positive review and a negative one?
– Some reviews come with labels
– Some labels have no reviews
– Some reviews have no “stars”
10
Spoken Language Processing
• Automatic Speech Recognition
– “Rich” Transcription
• Speaker Recognition
• Speech Synthesis
– Text Normalization
• Discourse and Dialog
– Turn taking
• Emotion Recognition
11
Speech Recognition
• Converting speech to text.
– Acoustic Modeling
• Speech to Phoneme
– Pronunciation Modeling
• How are words pronounced?
– Language Modeling
• What sequences of words are most common?
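One common way to answer the language modeling question is to count word pairs (bigrams). The sketch below is a minimal maximum-likelihood bigram model on a made-up three-sentence corpus; real recognizers use far larger corpora and smoothing, which are omitted here.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts

def bigram_prob(counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev is unseen."""
    total = sum(counts[prev].values()) if prev in counts else 0
    return counts[prev][word] / total if total else 0.0

# Made-up training sentences for illustration.
corpus = [
    "catch the orange line",
    "catch the red line",
    "the orange line runs by there",
]
counts = train_bigrams(corpus)
print(bigram_prob(counts, "the", "orange"))   # 2 of the 3 words after "the"
print(bigram_prob(counts, "orange", "line"))  # "line" always follows "orange"
```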
12
Rich Transcription
ALSO FROM NORTH STATION I THINK THE ORANGE LINE RUNS
BY THERE TOO SO YOU CAN ALSO CATCH THE ORANGE LINE
AND THEN INSTEAD OF TRANSFERRING UM I YOU KNOW THE
MAP IS REALLY OBVIOUS ABOUT THIS BUT INSTEAD OF
TRANSFERRING AT PARK STREET YOU CAN TRANSFER AT UH
WHAT’S THE STATION NAME DOWNTOWN CROSSING UM AND
THAT’LL GET YOU BACK TO THE RED LINE JUST AS EASILY
13
Rich Transcription
Also, from the North Station...
(I think the Orange Line runs by there too so you can also catch the
Orange Line... )
And then instead of transferring
(um I- you know, the map is really obvious about this but)
Instead of transferring at Park Street, you can transfer at (uh what’s the
station name) Downtown Crossing and (um) that’ll get you back to the
Red Line just as easily.
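One small piece of the difference between the two transcripts above, removing filled pauses, can be sketched as follows. The pause list and the input string (adapted from the example) are assumptions; real rich transcription also needs punctuation, sentence boundaries, and detection of restarts and parentheticals, none of which this handles.

```python
FILLED_PAUSES = {"UM", "UH"}  # assumed list, for illustration only

def strip_filled_pauses(raw):
    """Drop filled-pause tokens and sentence-case the remaining words."""
    words = [w for w in raw.split() if w not in FILLED_PAUSES]
    text = " ".join(words).lower()
    return text[:1].upper() + text[1:]

# Adapted from the raw recognizer output shown above.
raw = "YOU CAN TRANSFER AT UH DOWNTOWN CROSSING UM AND GET BACK TO THE RED LINE"
print(strip_filled_pauses(raw))
# prints: You can transfer at downtown crossing and get back to the red line
```

The output still lacks proper-noun capitalization, which shows how much more a full system has to recover.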
14
Speaker/Author Recognition
• What makes one speaker or author
distinguishable from another?
• Email hacks, Chat transcripts, Anonymous
authors.
• What acoustic properties distinguish
one speaker from another?
– Spectral Qualities
– Prosodic Qualities
• Lexical, syntactic and content usage
15
Speech Synthesis
• Generating Speech from Text
• There are tools like Festival, HTS and Mary TTS that
make this relatively easy
• Unit Selection
– Use a corpus of a single speaker and paste together small
slices of speech to make new words
– Watson http://www.youtube.com/watch?v=WFR3lOm_xhE
• Parametric Synthesis
– Learn the spectral shape of different speech sounds, and
synthesize them from oscillators and additive noise.
• Mary TTS Web client
– http://mary.dfki.de:59125/
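The "oscillators and additive noise" idea behind parametric synthesis can be illustrated with a toy signal generator. This is a sketch only: the frequencies, amplitudes, and the `synthesize` function are hypothetical, and it is not how Festival, HTS, or Mary TTS are implemented.

```python
import math
import random

def synthesize(freqs_amps, noise_amp, duration_s=0.01, sample_rate=16000, seed=0):
    """Sum sinusoidal oscillators and add uniform noise, sample by sample."""
    rng = random.Random(seed)
    n = round(duration_s * sample_rate)
    samples = []
    for i in range(n):
        t = i / sample_rate
        value = sum(a * math.sin(2 * math.pi * f * t) for f, a in freqs_amps)
        value += noise_amp * rng.uniform(-1.0, 1.0)
        samples.append(value)
    return samples

# Hypothetical parameters: three harmonics of a 120 Hz pitch (vowel-like).
vowel_like = synthesize([(120, 0.5), (240, 0.3), (360, 0.2)], noise_amp=0.01)
# No oscillators, only noise (fricative-like).
fricative_like = synthesize([], noise_amp=0.3)
print(len(vowel_like), len(fricative_like))
```

A real system would learn these parameters per speech sound from data and vary them over time.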
16
Discourse and Dialog
• How do you accomplish some task through
discourse?
– Understanding the semantics of a user turn
– Generating an appropriate prompt
– Dialog/Task planning.
– Semantic Frame filling.
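Frame filling and prompt generation can be sketched with a two-slot frame. Everything here is a hypothetical illustration: the transit task, the slot names, the station list, and the substring matching stand in for real language understanding.

```python
# Hypothetical task: transit directions with two slots to fill.
STATIONS = ["north station", "park street", "downtown crossing"]
PROMPTS = {
    "origin": "Where are you leaving from?",
    "destination": "Where do you want to go?",
}

def update_frame(frame, user_turn):
    """Fill slots from simple 'from X' / 'to X' phrases in the user's turn."""
    text = user_turn.lower()
    for station in STATIONS:
        if f"from {station}" in text:
            frame["origin"] = station
        if f"to {station}" in text:
            frame["destination"] = station
    return frame

def next_prompt(frame):
    """Ask for the first unfilled slot, or confirm when the frame is full."""
    for slot in PROMPTS:
        if slot not in frame:
            return PROMPTS[slot]
    return f"Routing from {frame['origin']} to {frame['destination']}."

frame = {}
update_frame(frame, "I want to get to Downtown Crossing")
print(next_prompt(frame))  # prints: Where are you leaving from?
update_frame(frame, "From North Station")
print(next_prompt(frame))  # prints: Routing from north station to downtown crossing.
```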
17
Emotion Recognition
• What are the acoustic properties of emotion expression?
– [Example utterances: “Three Hundred Twelve.” and “Three Thousand Twelve.”]
• Loudness, speaking rate, pitch, hesitation etc.
• This type of analysis can extend to other speaker states
– Intoxication
– Sleepiness
– Age
– Gender
– Personality Factors
– Deception
18
Corpus Analysis
• A corpus is a body of linguistic material
• Corpora (plural of corpus) are generally
shared across research groups
• Allow for reproducible findings
• Division of Labor
• Describing phenomena is an important first
step in most research.
– What is the distribution of ratings?
– What are the correlations between features and
labels?
– Are there errors in the annotation?
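The first two descriptive questions can be sketched on a toy corpus. The reviews and ratings below are made up, and counting the word "great" is just one assumed feature chosen for illustration.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up (review text, star rating) pairs standing in for a real corpus.
reviews = [
    ("great great product", 5),
    ("great value", 4),
    ("okay", 3),
    ("broke quickly", 1),
    ("great until it broke", 2),
]

# What is the distribution of ratings?
rating_distribution = Counter(rating for _, rating in reviews)
print(sorted(rating_distribution.items()))

# How does one feature (count of "great") correlate with the label?
great_counts = [text.split().count("great") for text, _ in reviews]
ratings = [rating for _, rating in reviews]
print(round(pearson(great_counts, ratings), 3))
```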
19
Some famous corpora
• Penn Treebank
– Parse trees and part of speech
• ACE and KBP
– Information Extraction
• Switchboard
– Conversational telephone speech
• TIMIT
– Phonetic Transcription
• Boston Radio News Corpus
– Prosodic Annotation
20
The “standard” approach
• Identify labeled training data
• Decide what to label
– What is a data point?
• Extract features based on the entity
• Train a supervised classifier
– Machine Learning
• Evaluate
– Cross-validation or a held-out test set.
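The steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions: the labeled reviews are made up, the features are bag-of-words counts, and Naive Bayes with add-one smoothing is one classifier among many, not one the slides prescribe.

```python
import math
from collections import Counter, defaultdict

def extract_features(text):
    """Bag-of-words features: lowercase token counts."""
    return Counter(text.lower().split())

def train_naive_bayes(examples):
    """examples: list of (feature Counter, label). Returns model parameters."""
    label_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for feats, label in examples:
        word_counts[label].update(feats)
        vocab.update(feats)
    return label_counts, word_counts, vocab

def predict(model, feats):
    """Pick the label with the highest add-one-smoothed log probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(label_counts[label] / total)
        for word, count in feats.items():
            if word in vocab:  # ignore words never seen in training
                score += count * math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def cross_validate(data, k):
    """Average accuracy over k folds; fold i holds out items i, i+k, i+2k, ..."""
    accuracies = []
    for i in range(k):
        test = data[i::k]
        train = [ex for j, ex in enumerate(data) if j % k != i]
        model = train_naive_bayes(train)
        correct = sum(predict(model, f) == y for f, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k

# Made-up labeled data: each data point is one review.
labeled = [
    ("great phone love it", "pos"),
    ("terrible battery hate it", "neg"),
    ("love the great screen", "pos"),
    ("hate the terrible screen", "neg"),
    ("great camera love the battery", "pos"),
    ("terrible camera hate the phone", "neg"),
]
data = [(extract_features(t), y) for t, y in labeled]
print("mean accuracy:", cross_validate(data, k=3))
```

With only six examples the accuracy figure means little; the point is the shape of the pipeline.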
21
How does machine learning fit in?
• Automatically identifying patterns in data
• Automatically making decisions based on
data
• Hypothesis:
(Data → Learning Algorithm → Behavior)
≥
(Data → Programmer or Expert → Behavior)
22
Challenges
• Conversational text
– Social Media: Facebook, Twitter, reddit
– Email
– Chat/IM
• Spoken Dialog Systems
– Text Dialog Systems
• Sentiment Analysis
– Reviews
• Collaborative Filtering
• Natural Language Generation
23
Publicly available web-data
• Social Media
– Twitter, Google Plus, forums, etc.
• Reviews
– Amazon, TripAdvisor, etc.
• Wikipedia
– Find missing links in Wikipedia
– Find potentially incorrect information in Wikipedia
• YouTube videos, SoundCloud songs.
– Can you classify topics?
– Music genres?
24
Use of web technologies
• The feedback loop.
– The use of the tool provides information that
can be used to improve the tool.
• The use of the product provides training
data.
– Which search results are best.
– Which ads are useful
– Which recommendations are correct
25
Feedback in Google
• Rank the top hits in response to a query
• When someone clicks on a link, boost its ranking/relevance
• Same for ads
• UI/UX experiments
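The click feedback loop can be sketched as a ranker whose scores grow with recorded clicks. This is a toy illustration, not Google's actual method: the class name, the fixed per-click boost, and the example URLs and scores are all assumptions.

```python
from collections import Counter

class ClickFeedbackRanker:
    """Rank results for one query by base relevance plus a boost per click."""

    def __init__(self, base_scores, boost=0.1):
        self.base_scores = dict(base_scores)  # url -> initial relevance
        self.clicks = Counter()               # url -> number of clicks seen
        self.boost = boost                    # assumed boost per click

    def record_click(self, url):
        """Use of the tool provides training signal: remember the click."""
        self.clicks[url] += 1

    def score(self, url):
        return self.base_scores[url] + self.boost * self.clicks[url]

    def rank(self):
        """Highest score first; ties broken alphabetically."""
        return sorted(self.base_scores, key=lambda u: (-self.score(u), u))

# Hypothetical results and scores for a single query.
ranker = ClickFeedbackRanker({"a.example": 0.9, "b.example": 0.8})
print(ranker.rank())  # prints ['a.example', 'b.example']
ranker.record_click("b.example")
ranker.record_click("b.example")
print(ranker.rank())  # prints ['b.example', 'a.example']
```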
26
Feedback in Amazon
• Try to give users an offer.
– If they take it, increase its value.
27
Feedback in Netflix
• Suggestions for people “like you”
– How do you group people?
– How do you group movies?
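One simple way to find people “like you” is to compare rating vectors with cosine similarity. The sketch below is an assumption for illustration, not Netflix's system; the users and ratings are made up.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating dicts (movie -> rating)."""
    shared = set(a) & set(b)
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar_user(target, ratings):
    """Return the other user whose ratings are closest to the target's."""
    others = [u for u in ratings if u != target]
    return max(others, key=lambda u: cosine_similarity(ratings[target], ratings[u]))

# Made-up users and ratings on a 1-5 scale.
ratings = {
    "ann": {"Alien": 5, "Up": 1, "Heat": 4},
    "bob": {"Alien": 4, "Up": 1, "Heat": 5},
    "cy": {"Alien": 1, "Up": 5},
}
print(most_similar_user("ann", ratings))  # prints bob
```

Swapping the roles of users and movies in the rating table gives an analogous similarity between movies.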
28
Project ideas
• Look at the most recent conferences in NLP
and Speech
– ICASSP, Interspeech, ASRU
– ACL, EMNLP, NAACL-HLT, COLING
• Also, Journals
– Computational Linguistics
– Computer Speech and Language
– IEEE Transactions on Audio, Speech, and
Language Processing
• Consider real-world problems and
applications
29