Deep Processing for Restricted Domain QA
Yi Zhang
Universität des Saarlandes
Why Deep?
Is Shallow Processing Enough?
• For TREC-like QA evaluation: (in most cases) YES
• However, for restricted domain QA:
  - More complicated questions
  - Less information redundancy for data-intensive approaches
  - Domain knowledge available
Deep Processing Provides
• More fine-grained linguistic analysis
  - Long-distance dependencies
  - Agreement
  - …
• Semantic representation
  - MRS/RMRS
General Problems with Deep Processing
• Robustness
  - Lexicon
  - Compound NPs
• Specificity
  - "John saw Mary"
• Efficiency (not discussed here)
Deep Processing
• MRS/RMRS (Minimal Recursion Semantics / Robust MRS)
  - (Robust) semantic representation with underspecification
• HPSG grammars
  - LinGO English Resource Grammar (ERG)
  - Other grammars (German, Japanese, Modern Greek, Norwegian, Chinese, …)
• HoG (Heart of Gold)
  - Hybrid shallow & deep processing architecture with a uniform semantic representation (RMRS)
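To make "underspecification" concrete, here is a minimal sketch (our own illustration, not code from the LinGO or HoG tools) of an MRS as a flat bag of elementary predications plus qeq handle constraints, for "Every dog chased a cat". Predicate names and handle numbers are hypothetical and only loosely follow ERG conventions; the point is that the quantifiers' BODY handles are left unconstrained, so the representation does not commit to a scope order.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class EP:
    """An elementary predication: a labelled predicate with (role, value) arguments."""
    label: str                          # handle labelling this EP, e.g. "h4"
    pred: str                           # predicate name (hypothetical), e.g. "_every_q"
    args: Tuple[Tuple[str, str], ...]   # values are variables (x6, e2) or handles (h5)


@dataclass
class MRS:
    top: str
    eps: List[EP]
    qeqs: List[Tuple[str, str]]         # (hole, label) means "hole =q label"

    def open_holes(self) -> Set[str]:
        """Scope holes that are neither an EP label nor qeq-constrained.

        In this toy, only quantifier RSTR/BODY roles take handle values.
        These unconstrained holes are where scope is left underspecified.
        """
        labels = {ep.label for ep in self.eps}
        constrained = {hole for hole, _ in self.qeqs}
        holes = {value for ep in self.eps for role, value in ep.args
                 if role in ("RSTR", "BODY")}
        return holes - labels - constrained


every_dog_chased_a_cat = MRS(
    top="h0",
    eps=[
        EP("h4", "_every_q", (("ARG0", "x6"), ("RSTR", "h5"), ("BODY", "h7"))),
        EP("h8", "_dog_n", (("ARG0", "x6"),)),
        EP("h1", "_chase_v", (("ARG0", "e2"), ("ARG1", "x6"), ("ARG2", "x9"))),
        EP("h10", "_a_q", (("ARG0", "x9"), ("RSTR", "h11"), ("BODY", "h12"))),
        EP("h13", "_cat_n", (("ARG0", "x9"),)),
    ],
    qeqs=[("h0", "h1"), ("h5", "h8"), ("h11", "h13")],
)

print(sorted(every_dog_chased_a_cat.open_holes()))  # ['h12', 'h7']
```

The two open BODY holes can be plugged in two ways (every > a, or a > every), which yields the two fully scoped readings from one underspecified structure.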
QA in QUETAL (1)
• Hybrid shallow & deep approach
• Cross-lingual QA
• QA on:
  - Texts
  - Semi-structured documents
  - Databases
QA in QUETAL (2)
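[Figure: QUETAL system architecture. Components: natural-language question (NLQ); syntactic analysis (dependency parser; TAG for English/German questions); semantic question analysis (Q-type, A-type, Q-focus); IR schema; IR query planner; GetData; result merge; answer planning & generation; information sources (texts, IE, fact DB).]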
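The semantic question analysis step produces a Q-type, an A-type and a Q-focus. The sketch below shows only what those three fields might look like for a question; it is a deliberately shallow keyword heuristic, not QUETAL's parser-based analysis, and the wh-word-to-answer-type table is a hypothetical example.

```python
import re
from dataclasses import dataclass

# Hypothetical mapping from question word to expected answer type (A-type).
A_TYPES = {"how many": "NUMBER", "where": "LOCATION", "when": "DATE",
           "who": "PERSON", "what": "ENTITY"}

_LEADING_COPULA = re.compile(r"^(is|are|was|were)\s+", re.IGNORECASE)


@dataclass
class QuestionAnalysis:
    q_type: str   # question word that triggered the analysis
    a_type: str   # expected answer type
    q_focus: str  # what the question is about


def analyze_question(question: str) -> QuestionAnalysis:
    text = question.strip().rstrip("?").strip()
    lowered = text.lower()
    # Check longer phrases first, so multi-word keys such as "how many"
    # would still win if a shorter key like "how" were added to the table.
    for wh in sorted(A_TYPES, key=len, reverse=True):
        if lowered.startswith(wh + " "):
            rest = text[len(wh):].strip()
            # Drop a leading copula: "is the City Hall ..." -> "the City Hall ..."
            return QuestionAnalysis(wh, A_TYPES[wh], _LEADING_COPULA.sub("", rest))
    return QuestionAnalysis("other", "UNKNOWN", text)


print(analyze_question("Where is the City Hall of Shanghai?"))
```

A deep analysis would instead read these fields off the parse or its RMRS, which is what makes non-trivial question forms tractable.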
QA in QUETAL (3)
Deep processing in QUETAL:
• HPSG grammar used for question analysis
• Documents are processed with relatively shallow methods
• Answer matching with RMRS
Restricted Domain QA
• More complicated questions
• Fewer documents, but of better quality
• Domain-specific ontology available
Restricted Domain QA – an Example
Question: Where is the City Hall of Shanghai?
Text: Shanghai City Planning Exhibition Hall[LOC_1] is located to the east of the City Hall[LOC_2], …, setting off with the crystal-like Grand Theatre[LOC_3] to the west.
Answer (derived with the domain ontology): Between Shanghai City Planning Exhibition Hall and the Grand Theatre.
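The answer is never stated in the text; it has to be inferred from two spatial relations plus ontology knowledge about directions. A minimal sketch of that inference follows, assuming the extracted facts read "to the west" as west of the City Hall (the reading the answer above relies on). The relation names and the inverse/opposite tables are hypothetical stand-ins for a domain ontology.

```python
from typing import Dict, List, Optional, Tuple

# A fact (figure, relation, ground) reads "figure is <relation> of ground".
Fact = Tuple[str, str, str]

# Hypothetical ontology knowledge: inverse relations and opposite sides.
INVERSE = {"east_of": "west_of", "west_of": "east_of",
           "north_of": "south_of", "south_of": "north_of"}
OPPOSITE_SIDES = [("east_of", "west_of"), ("north_of", "south_of")]


def with_inverses(facts: List[Fact]) -> List[Fact]:
    """Close the fact set under the inverse relations from the ontology."""
    closed = list(facts)
    for figure, rel, ground in facts:
        if rel in INVERSE:
            closed.append((ground, INVERSE[rel], figure))
    return closed


def locate_between(target: str, facts: List[Fact]) -> Optional[Tuple[str, str]]:
    """Return two landmarks lying on opposite sides of `target`, if the facts imply any."""
    sides: Dict[str, List[str]] = {}
    for figure, rel, ground in with_inverses(facts):
        if ground == target:
            sides.setdefault(rel, []).append(figure)
    for a, b in OPPOSITE_SIDES:
        if a in sides and b in sides:
            return sides[a][0], sides[b][0]
    return None


FACTS = [
    ("Shanghai City Planning Exhibition Hall", "east_of", "City Hall"),  # LOC_1 vs LOC_2
    ("Grand Theatre", "west_of", "City Hall"),                           # LOC_3 vs LOC_2
]
print(locate_between("City Hall", FACTS))
```

The inverse closure matters when a fact is stated from the other object's point of view (e.g. "the City Hall is west of X"), which is where the ontology earns its keep.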
Open Topics
• Grammar extension & automated lexicon acquisition
• Robust deep processing
• Semantic answer matching
• Cross-lingual QA
Grammar Extension
Tourism Domain
• ERG extended for:
  - "RONDANE": tourism in a Norwegian mountain area
    1.4K sentences, 15 words/sentence, coverage > 74%
  - Shanghai tourist guide from http://www.shanghai.gov.cn
    1,600 sentences, 18 words/sentence
[Figure: Test on RONDANE Corpus]
Grammar Extension
• ERG lexicon

  Lexicon   Entries   Coverage by top 10 leaf types
  Verb      2891      77%
  Noun      6873      96%
  Adj.      2505      90%

• Automating lexicon acquisition is therefore relatively easier for nouns: the top 10 leaf types cover 96% of noun entries, versus 77% of verb entries.
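Reading the table's last column as "share of lexical entries whose leaf type is one of the 10 most frequent types", the quantity can be computed as below. This is a minimal sketch with hypothetical type names and counts; a higher value means an acquisition model has fewer plausible types to choose from.

```python
from collections import Counter
from typing import Mapping


def top_k_coverage(entries_per_type: Mapping[str, int], k: int = 10) -> float:
    """Share of lexical entries whose leaf type is among the k most frequent types."""
    total = sum(entries_per_type.values())
    if total == 0:
        return 0.0
    top = Counter(entries_per_type).most_common(k)
    return sum(count for _, count in top) / total


# Hypothetical counts of lexical entries per leaf type.
example = {"type_a": 600, "type_b": 300, "type_c": 100}
print(top_k_coverage(example, k=2))
```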
Automated Lexicon Acquisition
• POS tagging
• Named entity recognition
• Statistical models that find the best lexical type for an unknown noun
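The slide does not say which statistical model is used. As one simple stand-in, the sketch below is a multinomial naive Bayes classifier (add-one smoothing) that picks a lexical type for an unknown noun from features produced by the POS tagger and the named entity recognizer. The feature strings and type labels ("proper_name", "count_noun", "mass_noun") are hypothetical placeholders, not ERG type names.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple


class NaiveBayesTypePredictor:
    """Multinomial naive Bayes over word features, with add-one smoothing."""

    def __init__(self) -> None:
        self.type_counts: Counter = Counter()
        self.feature_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocabulary: Set[str] = set()

    def train(self, examples: Iterable[Tuple[List[str], str]]) -> None:
        for features, lex_type in examples:
            self.type_counts[lex_type] += 1
            for feature in features:
                self.feature_counts[lex_type][feature] += 1
                self.vocabulary.add(feature)

    def predict(self, features: List[str]) -> Optional[str]:
        """Most probable lexical type, or None if the model is untrained."""
        total = sum(self.type_counts.values())
        best_type, best_score = None, -math.inf
        for lex_type, n in self.type_counts.items():
            score = math.log(n / total)  # log prior
            denominator = sum(self.feature_counts[lex_type].values()) + len(self.vocabulary)
            for feature in features:     # smoothed log likelihoods
                score += math.log((self.feature_counts[lex_type][feature] + 1) / denominator)
            if score > best_score:
                best_type, best_score = lex_type, score
        return best_type


# Hypothetical training data: features from a POS tagger / NE recognizer.
TRAINING = [
    (["pos=NNP", "ne=LOC"], "proper_name"),
    (["pos=NNP", "ne=ORG"], "proper_name"),
    (["pos=NNS", "ne=O", "suffix=s"], "count_noun"),
    (["pos=NN", "ne=O", "suffix=ion"], "count_noun"),
    (["pos=NN", "ne=O", "suffix=ity"], "mass_noun"),
]
predictor = NaiveBayesTypePredictor()
predictor.train(TRAINING)
print(predictor.predict(["pos=NNP", "ne=LOC"]))
```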
Robust Deep Processing
• Back off to RMRS generated by intermediate or shallow parsers (HoG architecture)
• Keep the charts of incomplete parses and the corresponding MRS fragments for semantic answer matching
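Both ideas can be sketched as one loop: try the parsers from deepest to shallowest, return the first full RMRS, and keep whatever fragments the failed deeper parsers left in their charts. This is a hypothetical control flow, not HoG's actual interface; RMRSs are plain strings and the two parsers are toy stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class ParseResult:
    rmrs: Optional[str]                                  # full-sentence RMRS, or None on failure
    fragments: List[str] = field(default_factory=list)   # fragments read off a partial chart


Parser = Callable[[str], ParseResult]


def parse_with_backoff(sentence: str, parsers: Sequence[Tuple[str, Parser]]):
    """Try parsers from deepest to shallowest.

    Returns (parser name, RMRS, fragments kept from the deeper parsers that failed).
    """
    kept: List[str] = []
    for name, parse in parsers:
        result = parse(sentence)
        if result.rmrs is not None:
            return name, result.rmrs, kept
        kept.extend(result.fragments)
    return None, None, kept


# Toy stand-ins: the "deep" parser fails but leaves a fragment; the "shallow" one succeeds.
def toy_deep(sentence: str) -> ParseResult:
    return ParseResult(None, ["_locate_v(e, x1, x2)"])


def toy_shallow(sentence: str) -> ParseResult:
    return ParseResult("rmrs-from-chunker(" + sentence + ")")


name, rmrs, fragments = parse_with_backoff(
    "Shanghai City Planning Exhibition Hall is located to the east of the City Hall.",
    [("deep", toy_deep), ("shallow", toy_shallow)],
)
print(name, fragments)
```

Because every layer emits RMRS, the answer matcher can consume the shallow result and the deep fragments with the same code.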
Parse Disambiguation
• Select the best parse with statistical models (Toutanova et al. 2002)
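One model family used for HPSG parse selection is the conditional log-linear model. Below is a minimal sketch of ranking the candidate parses of one sentence with such a model. The feature names and weights are hypothetical; in practice the weights are estimated from a treebank such as Redwoods.

```python
import math
from typing import Dict, List, Tuple

Features = Dict[str, float]


def rank_parses(parses: List[Features], weights: Dict[str, float]) -> List[Tuple[int, float]]:
    """Rank candidate parses of one sentence by conditional log-linear probability.

    P(parse_i | sentence) = exp(w . f_i) / sum_j exp(w . f_j)
    Assumes at least one candidate. Returns (parse index, probability), best first.
    """
    scores = [sum(weights.get(name, 0.0) * value for name, value in feats.items())
              for feats in parses]
    shift = max(scores)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    z = sum(exps)
    return sorted(((i, e / z) for i, e in enumerate(exps)),
                  key=lambda pair: pair[1], reverse=True)


# Hypothetical features for two readings of a PP-attachment ambiguity.
candidates = [
    {"rule:hd-cmp": 1.0, "attach:np": 1.0},   # PP attached to the noun
    {"rule:hd-adj": 1.0, "attach:vp": 1.0},   # PP attached to the verb
]
weights = {"attach:np": 0.5, "attach:vp": -0.2}
ranking = rank_parses(candidates, weights)
print(ranking)
```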
Answer Matching with (R)MRS
• Semantic answer matching
  - Create semantic patterns for each question type, e.g. where -> locate_v(e, x1, x2)
  - Semantic distance measurement, e.g. between pred1(x)&pred2(x) and pred1(x)&pred2(y)
• Query expansion
  - Synonym substitution
  - Semantic structure replacement, e.g. give_v(e1, x1, x2, x3) => receive_v(e2, x2, x1, x3)
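The matching ideas above can be sketched over a simplified (R)MRS: a list of (predicate, argument variables) pairs. The distance here is a hypothetical metric, not the author's: it counts question predicates with no counterpart in the passage, plus variable sharings that the passage breaks, so pred1(x)&pred2(x) versus pred1(x)&pred2(y) scores 1. Query expansion applies the slide's give_v => receive_v argument permutation and an illustrative synonym table, and a question matches a passage at the smallest distance over all its variants.

```python
from typing import Dict, List, Set, Tuple

# An elementary predication: (predicate, argument variables), event variable first.
EP = Tuple[str, Tuple[str, ...]]

# Question-type pattern from the slide: "where" questions look for locate_v(e, x1, x2).
QUESTION_PATTERNS: Dict[str, List[EP]] = {"where": [("locate_v", ("e", "x1", "x2"))]}

# Query-expansion rules; the synonym entry is a hypothetical example.
SYNONYMS: Dict[str, List[str]] = {"locate_v": ["situate_v"]}
# give_v(e, x1, x2, x3) => receive_v(e, x2, x1, x3): permutation of the non-event arguments.
STRUCTURE_RULES = [("give_v", "receive_v", (1, 0, 2))]


def semantic_distance(question: List[EP], passage: List[EP]) -> int:
    """Unmatched question predicates plus broken variable sharing (hypothetical metric)."""
    bindings: Dict[str, Set[str]] = {}
    used: Set[int] = set()
    unmatched = 0
    for pred, args in question:
        # Greedy: take the first unused passage EP with the same predicate and arity.
        match = next((i for i, (p, a) in enumerate(passage)
                      if i not in used and p == pred and len(a) == len(args)), None)
        if match is None:
            unmatched += 1
            continue
        used.add(match)
        for q_var, p_var in zip(args, passage[match][1]):
            bindings.setdefault(q_var, set()).add(p_var)
    # A question variable bound to k distinct passage variables breaks k - 1 sharings.
    broken = sum(len(p_vars) - 1 for p_vars in bindings.values())
    return unmatched + broken


def expand(question: List[EP]) -> List[List[EP]]:
    """The question itself plus one variant per applicable synonym or structure rule."""
    variants = [question]
    for i, (pred, args) in enumerate(question):
        replacements = [(syn, args) for syn in SYNONYMS.get(pred, [])]
        for src, tgt, perm in STRUCTURE_RULES:
            if pred == src and len(args) == len(perm) + 1:
                # The event variable is kept as-is; renaming it (e1 -> e2) does not affect matching.
                replacements.append((tgt, (args[0],) + tuple(args[1:][j] for j in perm)))
        for new_ep in replacements:
            variants.append(question[:i] + [new_ep] + question[i + 1:])
    return variants


def match_score(question: List[EP], passage: List[EP]) -> int:
    return min(semantic_distance(q, passage) for q in expand(question))
```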
Work Plan
• Narrow my focus to one of the topics above
• Continue development of the Chinese HPSG grammar
References
Baldwin, Timothy, Emily M. Bender, Dan Flickinger, Ara Kim, and Stephan Oepen. To appear. Road-testing the English Resource Grammar over the British National Corpus. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004), Lisbon, Portugal.
Baldwin, Timothy and Francis Bond. 2003. Learning the countability of English nouns from corpus data. In Proceedings of the 41st Annual Meeting of the ACL, pages 463–470, Sapporo, Japan.
Callmeier, Ulrich. 2002. PET: a platform for experimentation with efficient HPSG processing techniques. In Collaborative Language Engineering. CSLI Publications, Stanford, USA.
Carroll, J. and A. Fang. 2004. Automatic acquisition of verb subcategorisations and their impact on the performance of an HPSG parser. In Proceedings of IJCNLP 2004.
Copestake, Ann, Dan Flickinger, Ivan A. Sag, and Carl Pollard. 2003. Minimal recursion semantics: An introduction. Under review.
Malouf, Robert and Gertjan van Noord. 2004. Wide coverage parsing with stochastic attribute value grammars. In IJCNLP-04 Workshop: Beyond Shallow Analyses - Formalisms and Statistical Modeling for Deep Analyses.
Oepen, Stephan. In preparation. [incr tsdb()]: Competence and Performance Laboratory. User Manual. Technical Report, Computational Linguistics, Saarland University.
Oepen, Stephan, Dan Flickinger, Kristina Toutanova, and Christopher D. Manning. 2002. LinGO Redwoods: A rich and dynamic treebank for HPSG. In Proceedings of the First Workshop on Treebanks and Linguistic Theories (TLT2002), Sozopol, Bulgaria.
Toutanova, Kristina, Christopher D. Manning, and Stephan Oepen. 2002. Parse ranking for a rich HPSG grammar. In Proceedings of the First Workshop on Treebanks and Linguistic Theories (TLT2002), Sozopol, Bulgaria.
Toutanova, Kristina, Christopher D. Manning, Stuart M. Shieber, Dan Flickinger, and Stephan Oepen. 2002. Parse disambiguation for a rich HPSG grammar. In Proceedings of the First Workshop on Treebanks and Linguistic Theories (TLT2002), pages 253–263, Sozopol, Bulgaria.
Uszkoreit, Hans. 2002. New chances for deep linguistic processing. In Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan.