Deep Processing for
Restricted Domain QA
Yi Zhang
Universität des Saarlandes
Why Deep?
Is Shallow Processing Enough?
For TREC-like QA evaluation
(in most cases) YES
However, for restricted domain QA
More complicated questions
Less information redundancy for data-intensive
approaches
Domain knowledge available
Deep Processing Provides
More fine-grained linguistic analysis
Long distance dependency
Agreements
…
Semantic Representation
MRS/RMRS
General Problems with Deep Processing
Robustness
Lexicon
Compound NP
Specificity
“John saw Mary”
Efficiency (not discussed here)
Deep Processing
MRS/RMRS
(Robust) Semantic representation with
underspecification.
HPSG Grammars
LinGO ERG Grammar
Other grammars (German, Japanese, Modern
Greek, Norwegian, Chinese, …)
HoG
Hybrid shallow & deep processing architecture
with a uniform semantic representation (RMRS).
QA in QUETAL (1)
Hybrid shallow & deep approach
Cross-lingual QA
QA on
Texts
Semi-structured documents
Database
QA in QUETAL (2)
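[Architecture diagram; component labels:]
NLQ (natural language question)
Syntactic analysis: dependency parser; TAG for English/German questions
Semantic analysis: semantic question analysis (Q-type, A-type, Q-focus)
IR schema, IR query planner, GetData, result merge
Information sources: texts, IE, fact DB
Answer planning & generation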
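As a rough illustration of what the semantic question analysis could hand on to the query planner, the sketch below bundles Q-type, A-type and Q-focus into one record. The class name, the wh-word table and the stopword-based focus heuristic are assumptions made for this example, not the QUETAL implementation.

```python
from dataclasses import dataclass

# Hypothetical mapping from wh-word to (Q-type, expected A-type).
WH_TABLE = {
    "where": ("LOCATION_Q", "LOCATION"),
    "who": ("PERSON_Q", "PERSON"),
    "when": ("TIME_Q", "DATE"),
}

# Assumed function words that never belong to the question focus.
STOPWORDS = {"is", "are", "was", "were", "the", "a", "an", "of", "in"}


@dataclass
class QuestionAnalysis:
    q_type: str   # kind of question being asked
    a_type: str   # kind of entity expected as the answer
    q_focus: list  # content words the answer must be about


def analyse_question(question: str) -> QuestionAnalysis:
    tokens = question.rstrip("?").split()
    wh_word = tokens[0].lower()
    q_type, a_type = WH_TABLE.get(wh_word, ("OTHER_Q", "UNKNOWN"))
    # Naive focus heuristic: every non-stopword after the wh-word.
    focus = [t for t in tokens[1:] if t.lower() not in STOPWORDS]
    return QuestionAnalysis(q_type, a_type, focus)


analysis = analyse_question("Where is the City Hall of Shanghai?")
print(analysis)
```

A real system would derive these fields from the parser output rather than from surface tokens; the record only shows which pieces of information travel together.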
QA in QUETAL (3)
Deep processing in QUETAL
HPSG grammar used for question
analysis.
Documents are processed with relatively
shallow methods.
Answer matching with RMRS.
Restricted Domain QA
More complicated questions
Fewer documents, of better quality
Domain specific ontology available
Restricted Domain QA – an Example
Question: Where is the City Hall of Shanghai?
Text: Shanghai City Planning Exhibition Hall [LOC_1] is located to the east of the City Hall [LOC_2], …, setting off with the crystal-like Grand Theatre [LOC_3] to the west.
Answer: Between Shanghai City Planning Exhibition Hall and the Grand Theatre.
Domain ontology
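The inference in this example can be sketched in a few lines: the passage never states where the City Hall is, but two spatial relations that mention it are enough to derive the answer. The fact triples, the `INVERSE` table (standing in for ontology axioms) and the function names are assumptions for illustration only.

```python
# Spatial relations read off the passage: (figure, relation, ground).
facts = [
    ("Shanghai City Planning Exhibition Hall", "east_of", "City Hall"),
    ("the Grand Theatre", "west_of", "City Hall"),
]

# Assumed ontology axioms: each relation and its inverse.
INVERSE = {"east_of": "west_of", "west_of": "east_of"}


def neighbours(target, facts):
    """Return {side: landmark} for landmarks lying on each side of target."""
    sides = {}
    for figure, relation, ground in facts:
        if ground == target:
            # figure lies on the named side of the target
            sides[relation.split("_")[0]] = figure
        elif figure == target:
            # target lies on the named side of ground, so ground is opposite
            sides[INVERSE[relation].split("_")[0]] = ground
    return sides


def locate(target, facts):
    sides = neighbours(target, facts)
    if "east" in sides and "west" in sides:
        return f"Between {sides['east']} and {sides['west']}."
    if sides:
        side, landmark = next(iter(sides.items()))
        opposite = INVERSE[side + "_of"].split("_")[0]
        return f"To the {opposite} of {landmark}."
    return None


print(locate("City Hall", facts))
```

The point of the sketch is that the answer is composed from relations about other entities, which is where domain knowledge earns its place.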
Open Topics
Grammar extension & automated
lexicon acquisition
Robust deep processing
Semantic answer matching
Cross-lingual
Grammar Extension
Tourism Domain
ERG extended for
“RONDANE”: tourism in a Norwegian mountain area
1.4K sentences
15 words/sentence
coverage > 74%
Shanghai tourist guide from
http://www.shanghai.gov.cn
1,600 sentences
18 words/sentence
Test on RONDANE corpus
Grammar Extension
ERG lexicon
Lexicon   Entry #   Top 10 leaf types: lexicon coverage
Verb      2891      77%
Noun      6873      96%
Adj.      2505      90%
Lexicon acquisition is comparatively easy to
automate for nouns
Automated Lexicon Acquisition
POS tagging
Named entity recognition
Statistical models to find the best
lexical type for an unknown noun.
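One way to picture the statistical step is a small naive Bayes classifier that guesses a lexical type from the POS tag, the named entity flag and the word's suffix. This is only a sketch under stated assumptions: the feature set, the toy training data and the type labels (`n_count`, `n_mass`, `n_proper`) are invented for the example and are not ERG type names or the model used in this work.

```python
import math
from collections import Counter, defaultdict


def features(word, pos_tag, is_named_entity):
    """Assumed feature set: 3-letter suffix, POS tag, named entity flag."""
    return [f"suffix={word[-3:].lower()}", f"pos={pos_tag}", f"ne={is_named_entity}"]


class LexTypeGuesser:
    """Naive Bayes with add-one smoothing over the features above."""

    def __init__(self):
        self.type_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        for word, pos, ne, lex_type in examples:
            self.type_counts[lex_type] += 1
            for f in features(word, pos, ne):
                self.feat_counts[lex_type][f] += 1
                self.vocab.add(f)

    def predict(self, word, pos, ne):
        total = sum(self.type_counts.values())

        def score(lex_type):
            s = math.log(self.type_counts[lex_type] / total)
            denom = sum(self.feat_counts[lex_type].values()) + len(self.vocab)
            for f in features(word, pos, ne):
                s += math.log((self.feat_counts[lex_type][f] + 1) / denom)
            return s

        return max(self.type_counts, key=score)


# Toy training data standing in for entries already in the lexicon.
known = [
    ("mountain", "NN", False, "n_count"),
    ("valley", "NN", False, "n_count"),
    ("water", "NN", False, "n_mass"),
    ("information", "NN", False, "n_mass"),
    ("Shanghai", "NNP", True, "n_proper"),
    ("Rondane", "NNP", True, "n_proper"),
]

guesser = LexTypeGuesser()
guesser.train(known)
print(guesser.predict("Oslo", "NNP", True))
```

With so little data the classifier is only reliable where the POS tag and named entity flag are decisive; a usable model would need many more entries and richer context features.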
Robust Deep Processing
Back-off to RMRS generated with
intermediate or shallow parsers (HoG
architecture).
Keep partial (non-full) parse charts and the
corresponding MRS fragments for
semantic answer matching.
Parse Disambiguation
Select the best parse with statistical models
(Toutanova et al. 2002)
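The control flow of back-off plus parse selection can be sketched as follows. Both parsers are toy stand-ins, the rule names and weights are hypothetical, and the linear scorer merely takes the place of a trained statistical model; none of this is the HoG or PET interface.

```python
# Hypothetical feature weights; a real model would be trained on a treebank.
WEIGHTS = {"head_complement": 1.2, "head_adjunct": 0.4, "fragment": -2.0}


def score(parse):
    """Stand-in linear score over the rules used in a parse."""
    return sum(WEIGHTS.get(rule, 0.0) for rule in parse["rules"])


def toy_deep_parser(sentence):
    """Pretend deep parser: fails when a word is missing from its lexicon."""
    lexicon = {"john", "saw", "mary"}
    if not all(word.lower() in lexicon for word in sentence.split()):
        return []
    return [
        {"rules": ["head_complement"], "rmrs": "see_v(e, john, mary)"},
        {"rules": ["head_adjunct", "fragment"], "rmrs": "see_v(e, john), mary"},
    ]


def toy_shallow_parser(sentence):
    """Pretend shallow analysis: one unconnected predicate per token."""
    preds = [f"{word.lower()}(x{i})" for i, word in enumerate(sentence.split())]
    return {"rules": [], "rmrs": " ".join(preds)}


def analyse(sentence):
    """Use the best-scoring deep parse if there is one, else back off."""
    parses = toy_deep_parser(sentence)
    if parses:
        return "deep", max(parses, key=score)
    return "shallow", toy_shallow_parser(sentence)


print(analyse("John saw Mary"))
print(analyse("John saw Rondane"))
```

Because both branches return the same kind of semantic structure, downstream answer matching does not need to know which parser produced it.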
Answer Matching with (R)MRS
Semantic answer matching
Create semantic patterns for each
question type.
where -> locate_v(e, x1, x2)
Semantic distance measurement.
pred1(x)&pred2(x) <-> pred1(x)&pred2(y)
Query expansion
Synonym substitution
Semantic structure replacement
give_v(e1, x1, x2, x3) => receive_v(e2, x2, x1, x3)
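The three ideas on this slide can be made concrete with elementary predications written as `(name, args)` tuples. This is a minimal sketch, not an MRS library: the pattern table, the link-based distance (one of many possible measures) and the name of the fresh event variable are assumptions; the argument reordering in the rewrite rule is copied from the slide.

```python
from itertools import combinations

# Assumed pattern table: question type -> predicate the answer must contain.
PATTERNS = {"where": "locate_v"}


def match(q_type, eps):
    """Return the argument tuples of predications that fit the question type."""
    wanted = PATTERNS[q_type]
    return [args for name, args in eps if name == wanted]


def items(eps):
    """Predicate names plus pairs of predicates that share a variable."""
    out = {("pred", name) for name, _ in eps}
    for (n1, a1), (n2, a2) in combinations(eps, 2):
        if set(a1) & set(a2):
            out.add(("link", tuple(sorted((n1, n2)))))
    return out


def distance(eps_a, eps_b):
    """Jaccard distance over predicates and variable-sharing links."""
    a, b = items(eps_a), items(eps_b)
    return 1 - len(a & b) / len(a | b)


def give_to_receive(ep, fresh_event="e2"):
    """give_v(e1, x1, x2, x3) => receive_v(e2, x2, x1, x3), as on the slide."""
    name, args = ep
    if name != "give_v":
        return ep
    _, x1, x2, x3 = args
    return ("receive_v", (fresh_event, x2, x1, x3))


same_var = [("pred1", ("x",)), ("pred2", ("x",))]
diff_var = [("pred1", ("x",)), ("pred2", ("y",))]
print(distance(same_var, diff_var))
print(give_to_receive(("give_v", ("e1", "x1", "x2", "x3"))))
print(match("where", [("locate_v", ("e", "x1", "x2")), ("hall_n", ("x1",))]))
```

The two structures in the distance example use the same predicates and differ only in whether they share a variable, so the distance is nonzero but small, which is the kind of near miss a semantic matcher should still rank highly.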
Work Plan
Narrow my focus to one of
the topics above.
Continue the Chinese HPSG grammar
development.
References
Baldwin, Timothy, Emily M. Bender, Dan Flickinger, Ara Kim and Stephan Oepen (to appear). Road-testing the
English Resource Grammar over the British National Corpus. In Proceedings of the Fourth International
Conference on Language Resources and Evaluation (LREC 2004), Lisbon, Portugal.
Ulrich Callmeier. 2002. PET – a platform for experimentation with efficient HPSG processing techniques. In
Collaborative Language Engineering. CSLI Publications, Stanford, USA.
Hans Uszkoreit. 2002. New chances for deep linguistic processing. In Proc. of the 19th International Conference
on Computational Linguistics (COLING 2002), Taipei, Taiwan.
Ann Copestake, Dan Flickinger, Ivan A. Sag, and Carl Pollard. 2003. Minimal recursion semantics: An introduction.
Under review.
Timothy Baldwin and Francis Bond. 2003. Learning the countability of English nouns from corpus data. In Proc. of
the 41st Annual Meeting of the ACL, pages 463–70, Sapporo, Japan.
Carroll, J. and Fang, A. 2004. Automatic Acquisition of Verb Subcategorisations and their Impact on the Performance of
an HPSG Parser. IJCNLP 2004.
Oepen, Stephan, Dan Flickinger, Kristina Toutanova, Christopher D. Manning. 2002. LinGO Redwoods: A Rich and
Dynamic Treebank for HPSG. In Proceedings of The First Workshop on Treebanks and Linguistic Theories
(TLT2002), Sozopol, Bulgaria.
Toutanova, Kristina, Christopher D. Manning, Stephan Oepen. 2002. Parse Ranking for a Rich HPSG Grammar. In
Proceedings of The First Workshop on Treebanks and Linguistic Theories (TLT2002), Sozopol, Bulgaria.
Stephan Oepen. [incr tsdb()] - Competence and Performance Laboratory. User Manual. Technical Report.
Computational Linguistics. Saarland University (in preparation).
Robert Malouf and Gertjan van Noord. 2004. "Wide coverage parsing with stochastic attribute value grammars."
In IJCNLP-04 Workshop: Beyond shallow analyses - Formalisms and statistical modeling for deep analyses.
Toutanova, Kristina, Christopher D. Manning, Stuart M. Shieber, Dan Flickinger, and Stephan Oepen. 2002. Parse
Disambiguation for a Rich HPSG Grammar. First Workshop on Treebanks and Linguistic Theories (TLT2002), pp. 253-263.
Sozopol, Bulgaria.