WORDS Lab CSC 9010: Special Topics. Natural Language Processing. Paula Matuszek, Mary-Angela Papalaskari Spring, 2005 Examples taken from the Bird, Klein and Loper: NLTK.


WORDS Lab
CSC 9010: Special Topics. Natural Language Processing.
Paula Matuszek, Mary-Angela Papalaskari
Spring, 2005
Examples taken from the Bird, Klein and Loper: NLTK Tutorial, Tagging,
nltk.sourceforge.net/tutorial/tagging/index.html
CSC 9010: Special Topics, Natural Language Processing. Spring, 2005. Matuszek & Papalaskari
Words, Words, Words
• So far we have covered methods that
largely operate on tokens.
– Tokenizing text
– Stemming words and determining lemmas
– POS-tagging
– Language models based on n-gram
frequencies
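As a reminder of what these token-level methods look like in practice, here is a minimal sketch in plain Python. It is not the NLTK tutorial code: `tokenize`, `crude_stem`, and `bigram_model` are hypothetical helper names, the suffix stripper is a deliberately naive stand-in for a real stemmer, and POS-tagging is left out because it needs a lexicon or a trained model.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and pull out alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(token):
    """Strip a few common English suffixes (naive; not a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def bigram_model(tokens):
    """Return a function giving the maximum-likelihood P(second | first)."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    # Count each token only where it starts a bigram.
    context_counts = Counter(tokens[:-1])

    def probability(first, second):
        if context_counts[first] == 0:
            return 0.0
        return pair_counts[(first, second)] / context_counts[first]

    return probability

text = "The dog barked. The dog chased the cat."
tokens = tokenize(text)
stems = [crude_stem(t) for t in tokens]
prob = bigram_model(tokens)

print(tokens)
print(stems)
print(prob("the", "dog"))
```

Note that the stemmer turns "chased" into "chas", which shows why rule-based stemming without any linguistic knowledge produces non-words.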
Every time I fire a linguist, my
performance goes up¹
• None of this has much of what could be
considered "linguistic" knowledge or
"understanding".
– No parsing
– Not much domain knowledge or "meaning"
• For the next two sections of the course we
will talk extensively about syntax and
semantics.
1. Hirschberg, Julia. 1998. "Every time I fire a linguist, my performance goes up," and other myths of the statistical natural
language processing revolution. Invited talk, Fifteenth National Conference on Artificial Intelligence (AAAI-98).
What's In a Word?
• For this lab, we will focus on some of the
things that can be done by applying the
techniques we have already studied.
• Format will be
– Try a demo
– Discuss what techniques were needed to
implement it
– Discuss some of what would be needed to
improve it
Gender Genie
• www.bookblog.net/gender/genie.html
• Techniques:
• How good is it? What might improve it?
• Reference:
– www.cs.biu.ac.il/~koppel/papers/male-female-textfinal.pdf
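As a starting point for the discussion, one technique a demo like this could plausibly use is a weighted word-list score: sum a weight for each indicative word in the text and pick a class from the sign of the total. The sketch below assumes that approach; it is not a description of how Gender Genie is actually implemented, and the words, weights, and class labels are invented purely for illustration (see the reference above for the real feature set).

```python
import re

# Hypothetical weights, made up for this example only. Positive values
# push the score toward "class A", negative values toward "class B".
WEIGHTS = {"the": 1.0, "a": 0.5, "with": -1.0, "she": -2.0}

def score(text, weights=WEIGHTS):
    """Sum the weights of all tokens that appear in the weight table."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(weights.get(token, 0.0) for token in tokens)

def classify(text, weights=WEIGHTS, labels=("class A", "class B")):
    """Return the first label for a positive score, otherwise the second."""
    return labels[0] if score(text, weights) > 0 else labels[1]

print(score("The cat sat with a dog"), classify("The cat sat with a dog"))
print(score("She went with the dog"), classify("She went with the dog"))
```

Only tokenizing and counting are needed here, which fits the lab's theme: no parsing and no model of meaning is involved.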
Pearson Knowledge Technologies
Text Classification Demo
• www.k-a-t.com:8080/classify/
• Techniques:
• How good is it? What might improve it?
• Reference: www.k-a-t.com/publications.shtml
Google Sets
• labs.google.com/sets
• Techniques:
• How good is it? What might improve it?
• Reference: if you find one, let me know. Possibly
something like this: www.arxiv.org/pdf/cs.CL/0412098
AT&T Text to Speech
• www.research.att.com/projects/tts/demo.html
• Techniques:
• How good is it? What might improve it?
• Reference: www.research.att.com/projects/tts/pubs.html