
Part-of-Speech Tagging using Neural Networks
Ankur Parikh
LTRC
IIIT Hyderabad
[email protected]
Outline
1. Introduction
2. Background and Motivation
3. Experimental Setup
4. Preprocessing
5. Representation
6. Single-neuro tagger
7. Experiments
8. Multi-neuro tagger
9. Results
10. Discussion
11. Future Work
Introduction

POS Tagging:
It is the process of assigning a part-of-speech tag to each word of natural-language text, based on both the word's definition and its context.
Uses:
Parsing of sentences, machine translation (MT), information retrieval (IR), word sense disambiguation, speech synthesis, etc.
Methods:
1. Statistical approaches
2. Rule-based approaches
Background: Previous Approaches

Much work on Hindi POS tagging has been done using various machine learning tools, such as:
- TNT
- CRF
Trade-off: performance versus training time
- Lower precision affects later stages of the pipeline.
- For a new domain or a new corpus, parameter tuning is a non-trivial task.
Background: Previous Approaches & Motivation

- Empirically chosen context.
- Effective handling of corpus-based features.
- Need of the hour:
  - Good performance
  - Less training time
  - Multiple contexts
  - Effective exploitation of corpus-based features
- Two approaches and their comparison with TNT and CRF.
- Word-level tagging.
Experimental Setup: Corpus Statistics

- Tag set of 25 tags

Corpus        Size (in words)   Unseen words (%)
Training      187,095           -
Development   23,565            5.33%
Testing       23,281            8.15%
Experimental Setup: Tools and Resources

- Tools
  - CRF++
  - TNT
  - Morfessor Categories-MAP
- Resources
  - Universal Word–Hindi Dictionary
  - Hindi WordNet
  - Morph Analyzer
Preprocessing

- The XC tag is removed (Gadde et al., 2008), leaving 24 tags.
- Lexicon
  - For each unique word w of the training corpus, build
    ENTRY(t1, ..., t24)
  - where tj = c(posj, w) / c(w), i.e. the fraction of occurrences of w tagged posj.
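A minimal sketch of this lexicon construction in Python, assuming the training corpus is available as (word, tag) pairs; `build_lexicon` and the abbreviated `TAGSET` are illustrative names, not from the slides.

```python
from collections import Counter, defaultdict

# Illustrative tag subset; the actual system uses 24 tags after removing XC.
TAGSET = ["NN", "NNP", "VM", "JJ"]

def build_lexicon(tagged_corpus):
    """For each unique word w, compute tj = c(posj, w) / c(w):
    the fraction of occurrences of w that carry tag posj."""
    word_count = Counter()
    word_tag_count = defaultdict(Counter)
    for word, tag in tagged_corpus:
        word_count[word] += 1
        word_tag_count[word][tag] += 1
    return {
        word: [word_tag_count[word][t] / total for t in TAGSET]
        for word, total in word_count.items()
    }

# Toy usage: a word seen twice as NN and once as VM.
lex = build_lexicon([("sona", "NN"), ("sona", "NN"), ("sona", "VM")])
# lex["sona"] -> [2/3, 0.0, 1/3, 0.0]
```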
Representation: Encoding & Decoding

- Each word w is encoded as an n-element vector INPUT(t1, t2, ..., tn), where n = size of the tag set.
- INPUT(t1, t2, ..., tn) comes from the lexicon if the training corpus contains w.
- If w is not in the training corpus:
  - N(w) = number of possible POS tags for w
  - tj = 1/N(w) if posj is a candidate tag for w
       = 0 otherwise
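A sketch of this input encoding, under the same assumptions as the lexicon example; `candidate_tags` is a hypothetical stand-in for the resource lookup (morph analyzer, dictionary) that would supply possible tags for unseen words.

```python
def encode_word(word, lexicon, tagset, candidate_tags=None):
    """Build INPUT(t1, ..., tn) for word w, with n = |tagset|.
    Seen words use their lexicon probabilities; unseen words get
    a uniform 1/N(w) over their candidate POS tags, 0 elsewhere."""
    if word in lexicon:
        return list(lexicon[word])
    candidates = set(candidate_tags) if candidate_tags else set(tagset)
    n_w = len(candidates)
    return [1.0 / n_w if t in candidates else 0.0 for t in tagset]
```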
Representation: Encoding & Decoding

- For each word w, the desired output is encoded as D = (d1, d2, ..., dn).
  - dj = 1 if posj is the desired output
       = 0 otherwise
- In testing, for each word w, an n-element vector OUTPUT(o1, ..., on) is returned.
  - Result = posj, where oj = max(OUTPUT)
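The one-hot target and the argmax decoding can be sketched in the same style (illustrative helper names):

```python
def encode_target(gold_tag, tagset):
    """Desired output D = (d1, ..., dn): dj = 1 only at the gold tag."""
    return [1.0 if t == gold_tag else 0.0 for t in tagset]

def decode_output(output, tagset):
    """Return posj such that oj = max(OUTPUT)."""
    best_j = max(range(len(output)), key=lambda j: output[j])
    return tagset[best_j]
```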
Single-neuro tagger: Structure
Single-neuro tagger: Training & Tagging

- Error back-propagation learning algorithm
- Weights are initialized with random values
- Sequential (per-pattern) mode
- Momentum term
- Eta (learning rate) = 0.4 and Alpha (momentum) = 0.1
- In tagging, it can give multiple outputs or a sorted list of all tags.
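A minimal sketch of the weight update this slide describes: sequential-mode back-propagation with a momentum term, using the slide's Eta = 0.4 and Alpha = 0.1. The gradient computation itself is omitted, and `init_weights` / `momentum_step` are illustrative names.

```python
import numpy as np

ETA, ALPHA = 0.4, 0.1  # learning rate and momentum from the slide

def init_weights(n_in, n_out, seed=0):
    """Initialize a weight matrix with small random values."""
    return np.random.default_rng(seed).uniform(-0.5, 0.5, (n_in, n_out))

def momentum_step(weights, grad, prev_delta):
    """One per-pattern update with momentum:
    delta(t) = -eta * dE/dW + alpha * delta(t-1)."""
    delta = -ETA * grad + ALPHA * prev_delta
    return weights + delta, delta
```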
Experiments: Development Data

Features                       Precision
Corpus-based and contextual    93.19%
Root of the word               93.38%
Length of the word             94.04%
Handling of unseen words       95.62%
  (Root -> Dictionary -> WordNet -> Morfessor;
   tj = (c(posj, s) + c(posj, p)) / (c(s) + c(p)))

[Figure: development of the system]
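The unseen-word formula above can be read as backing off to sub-word units. A sketch under that reading, assuming s and p are suffix/prefix segments (e.g. from Morfessor), `seg_count` is a Counter over segments, and `seg_tag_count` is a defaultdict of per-segment tag Counters; all of this naming is an assumption, since the slide leaves s and p implicit.

```python
def unseen_tag_vector(s, p, seg_count, seg_tag_count, tagset):
    """tj = (c(posj, s) + c(posj, p)) / (c(s) + c(p)),
    reading s and p as (assumed) suffix/prefix segments."""
    denom = seg_count[s] + seg_count[p]
    if denom == 0:  # neither segment seen: uniform fallback
        return [1.0 / len(tagset)] * len(tagset)
    return [(seg_tag_count[s][t] + seg_tag_count[p][t]) / denom
            for t in tagset]
```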
Multi-neuro tagger: Structure
Multi-neuro tagger: Training
Multi-neuro tagger: Learning curves
Multi-neuro tagger: Results

Structure   Context   Development   Test
97-48-24    3         95.44%        91.87%
121-48-24   4_prev    95.64%        92.05%
121-48-24   4_next    95.66%        91.95%
145-72-24   5         95.55%        92.15%
169-72-24   6_prev    95.56%        92.14%
169-72-24   6_next    95.54%        92.14%
193-96-24   7         95.46%        92.07%
Multi-neuro tagger: Comparison

- Precision after voting: 92.19%

Tagger               Development   Test     Training Time
TNT                  95.18%        91.58%   1-2 seconds
Multi-neuro tagger   95.78%        92.19%   13-14 minutes
CRF                  96.05%        92.92%   2-2.5 hours
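The slides do not spell out the voting scheme behind the 92.19% figure; a simple majority vote over the per-context taggers would look like this (illustrative only):

```python
from collections import Counter

def vote(per_tagger_tags):
    """Majority vote over tags proposed by taggers trained with
    different contexts; ties go to the first-listed tag."""
    return Counter(per_tagger_tags).most_common(1)[0][0]

# e.g. the 3-, 4_prev- and 5-context taggers voting on one word
print(vote(["NN", "NN", "JJ"]))  # -> NN
```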
Conclusion

- Single versus multi-neuro tagger
- Multi-neuro tagger versus TNT and CRF
- Corpus- and dictionary-based features
- More parameters need to be tuned
- 24^5 = 7,962,624 possible n-grams, versus only 250,560 weights
- Well suited for Indian languages
Future Work

- Better voting schemes (confidence-point based)
- Finding the right context (probability based)
- Various structures and algorithms
  - Sequential Neural Network
  - Convolutional Neural Network
  - Combination with SVM
Queries???
Thank You!!