Stanford POS tagger - School of Liberal Arts

Stanford POS tagger
17th February 2011
System requirements
• Java 1.5+
– http://www.java.com/en/download/index.jsp
Part-of-speech name abbreviations
The Penn Treebank English POS tag set
1. CC     Coordinating conjunction
2. CD     Cardinal number
3. DT     Determiner
4. EX     Existential there
5. FW     Foreign word
6. IN     Preposition/subordinating conjunction
7. JJ     Adjective
8. JJR    Adjective, comparative
9. JJS    Adjective, superlative
10. LS    List item marker
11. MD    Modal
12. NN    Noun, singular or mass
13. NNS   Noun, plural
14. NNP   Proper noun, singular
15. NNPS  Proper noun, plural
16. PDT   Predeterminer
17. POS   Possessive ending
18. PRP   Personal pronoun
19. PRP$  Possessive pronoun
20. RB    Adverb
21. RBR   Adverb, comparative
22. RBS   Adverb, superlative
23. RP    Particle
24. SYM   Symbol (mathematical or scientific)
25. TO    to
26. UH    Interjection
27. VB    Verb, base form
28. VBD   Verb, past tense
29. VBG   Verb, gerund/present participle
30. VBN   Verb, past participle
31. VBP   Verb, non-3rd ps. sing. present
32. VBZ   Verb, 3rd ps. sing. present
33. WDT   wh-determiner
34. WP    wh-pronoun
35. WP$   Possessive wh-pronoun
36. WRB   wh-adverb
37. #     Pound sign
38. $     Dollar sign
39. .     Sentence-final punctuation
40. ,     Comma
41. :     Colon, semi-colon
42. (     Left bracket character
43. )     Right bracket character
44. "     Straight double quote
45. `     Left open single quote
46. ``    Left open double quote
47. '     Right close single quote
48. ''    Right close double quote
Download
• http://nlp.stanford.edu/software/stanford-postagger-2010-05-26.tgz
GUI
Command
1. Generate a default properties file.
2. Tag a file.
Generate a default properties file command
• java -classpath stanford-postagger.jar
edu.stanford.nlp.tagger.maxent.MaxentTagger
-genprops > myPropsFile.prop
Tag file command
• java -mx300m -classpath stanford-postagger.jar
edu.stanford.nlp.tagger.maxent.MaxentTagger
-model models/bidirectional-distsim-wsj-0-18.tagger -textFile sample-input.txt > sample-tagged.txt
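The tag file command can also be assembled and launched from a script, which helps when several input files need tagging. The following is only a sketch in Python: the helper names `build_tag_command` and `run_tagger` and the placeholder model path are illustrative assumptions, not part of the tagger distribution. The Java class name and the `-model` and `-textFile` options are taken from the command above.

```python
import shlex
import subprocess

# Main class of the tagger, as used in the command above.
TAGGER_CLASS = "edu.stanford.nlp.tagger.maxent.MaxentTagger"


def build_tag_command(text_file, model, jar="stanford-postagger.jar", max_heap="300m"):
    """Return the tagging command as an argument list (hypothetical helper)."""
    return [
        "java", f"-mx{max_heap}",
        "-classpath", jar,
        TAGGER_CLASS,
        "-model", model,
        "-textFile", text_file,
    ]


def run_tagger(text_file, model, output_file):
    """Run the tagger and write its standard output to output_file.

    Assumes java is on the PATH and the jar is in the working directory.
    """
    with open(output_file, "w", encoding="utf-8") as out:
        subprocess.run(build_tag_command(text_file, model), stdout=out, check=True)


# Placeholder path: substitute the model file name found in your models/ directory.
MODEL = "models/your-model.tagger"
cmd = build_tag_command("sample-input.txt", MODEL)
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces. `run_tagger` is defined but not called here, because it needs Java and the tagger files to be installed.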
Command result
• sample-input.txt
The School of Liberal Arts was originally the Department
of Language and Social Studies under the Faculty of
Industrial Education.
• sample-tagged.txt
The_DT School_NN of_IN Liberal_JJ Arts_NNS was_VBD
originally_RB the_DT Department_NNP of_IN
Language_NNP and_CC Social_NNP Studies_NNP
under_IN the_DT Faculty_NNP of_IN Industrial_NNP
Education_NNP ._.
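Each token in the tagged output has the form word_TAG, so the file is easy to post-process. The sketch below, in Python, splits the sample output into (word, tag) pairs and counts the tags. The function name `parse_tagged` is an illustrative assumption, and the code assumes the underscore separator shown in the output above.

```python
from collections import Counter

# The tagged sample from above, joined into one string.
SAMPLE_TAGGED = (
    "The_DT School_NN of_IN Liberal_JJ Arts_NNS was_VBD "
    "originally_RB the_DT Department_NNP of_IN "
    "Language_NNP and_CC Social_NNP Studies_NNP "
    "under_IN the_DT Faculty_NNP of_IN Industrial_NNP "
    "Education_NNP ._."
)


def parse_tagged(text, separator="_"):
    """Split tagger output into (word, tag) pairs.

    Splitting on the last separator keeps words that themselves
    contain an underscore intact.
    """
    pairs = []
    for token in text.split():
        word, tag = token.rsplit(separator, 1)
        pairs.append((word, tag))
    return pairs


pairs = parse_tagged(SAMPLE_TAGGED)
tag_counts = Counter(tag for _, tag in pairs)

# Print the tags from most to least frequent.
for tag, count in tag_counts.most_common():
    print(tag, count)
```

The counts can then be read against the Penn Treebank tag list, for example to see how many proper nouns (NNP) a text contains.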
Q&A