
RMRS: some background and current work
Talk overview
- RMRS: integrating processors via semantics
- Underspecified semantics from shallow processing
- Integration experiments with broad-coverage systems/grammars (LinGO ERG and RASP)
- Planned work

Integrating processing
- No single system can do everything: deep and shallow processing have inherent strengths and weaknesses
- Domain-dependent and domain-independent processing must be linked
- Parsers and generators
- Common representation for processing 'above sentence level' (e.g., anaphora)

Compositional semantics as a common representation
- Need a common representation language for systems: pairwise compatibility between systems is too limiting
- Syntax is theory-specific and unnecessarily language-specific
- Eventual goal should be semantics
- Core idea: shallow processing gives an underspecified semantic representation, so deep and shallow systems can be integrated
- A full interlingua / common lexical semantics is too difficult (certainly at present), but predicates can be linked to ontologies, etc.
Shallow processing and underspecified semantics
- Integrated parsing: shallow-parsed phrases incorporated into deep-parsed structures
- Deep parsing invoked incrementally in response to information needs
- Reuse of knowledge sources:
  - domain knowledge, recognition of named entities, transfer rules in MT
- Integrated generation
- Formal properties clearer, representations more generally usable
- Deep semantics taken as normative
RMRS approach: current and planned applications
- Question answering:
  - Cambridge CSTIT: deep parse questions, shallow parse answers
  - QA from structured knowledge: Frank et al.
- Information extraction:
  - Deep Thought
  - Chemistry texts (SciBorg?)
- Dictionary definition parsing for Japanese and English: Bond and Flickinger
- Rhetorical structure, multi-document summarization, email response ...
- Also LOGON: semantic transfer; MRSs from LFG used in an HPSG generator
RMRS: extreme underspecification
- Goal is to split up the semantic representation into minimal components (cf. Verbmobil VITs)
  - Scope underspecification (MRS)
  - Splitting up predicate-argument structure
  - Explicit equalities
  - Hierarchies for predicates and sorts
- Compatibility with deep grammars:
  - Sorts and (some) closed-class word information in SEM-I (API for grammar, more later)
  - No lexicon for shallow processing (apart from POS tags and possibly closed-class words)
RMRS principles
- Split up information content as much as possible
- Accumulate information monotonically by simple operations
- Don't represent what you don't know, but preserve everything you do know
- Use a flat representation to allow pieces to be accessed individually

Separating arguments
lb1:every(x,h9,h6), lb2:cat(x), lb5:dog1(y),
lb4:some(y,h8,h7), lb3:chase(e,x,y),
h9=lb2,h8=lb5
goes to:
lb1:every(x), RSTR(lb1,h9), BODY(lb1,h6),
lb2:cat(x), lb5:dog1(y), lb4:some(y),
RSTR(lb4,h8), BODY(lb4,h7),
lb3:chase(e),ARG1(lb3,x),ARG2(lb3,y),
h9=lb2,h8=lb5
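The argument-splitting step can be sketched in a few lines of Python. This is a minimal illustration, not the actual MRS -> RMRS converter. It makes three assumptions:
- An MRS EP is a (label, predicate, argument list) tuple whose first argument is the characteristic variable.
- Quantifiers take RSTR and BODY.
- Every other predicate takes ARG1 to ARG3, in order.

The names `split_args`, `show`, `QUANTIFIERS` and the role lists are hypothetical. The hcons (h9=lb2, h8=lb5) are untouched by splitting and are left out.

```python
from typing import List, Tuple

# An MRS elementary predication: (label, predicate, [arg0, arg1, ...]).
MrsEP = Tuple[str, str, List[str]]

# Hypothetical role assignment for this sketch only: quantifiers take
# RSTR and BODY; everything else takes ARG1, ARG2, ARG3 in order.
QUANTIFIERS = {"every", "some"}
QUANT_ROLES = ["RSTR", "BODY"]
OTHER_ROLES = ["ARG1", "ARG2", "ARG3"]


def split_args(eps: List[MrsEP]):
    """Keep only the characteristic variable on each EP and move every
    other argument into a separate (role, label, value) relation."""
    rmrs_eps, rargs = [], []
    for label, pred, args in eps:
        rmrs_eps.append((label, pred, args[0]))
        roles = QUANT_ROLES if pred in QUANTIFIERS else OTHER_ROLES
        # zip stops at the shorter list: arguments beyond the roles are dropped
        for role, value in zip(roles, args[1:]):
            rargs.append((role, label, value))
    return rmrs_eps, rargs


def show(rmrs_eps, rargs) -> str:
    """Render each EP followed by its argument relations, slide-style."""
    parts = []
    for label, pred, arg0 in rmrs_eps:
        parts.append(f"{label}:{pred}({arg0})")
        parts.extend(f"{role}({lbl},{val})" for role, lbl, val in rargs if lbl == label)
    return ", ".join(parts)


mrs = [
    ("lb1", "every", ["x", "h9", "h6"]),
    ("lb2", "cat", ["x"]),
    ("lb5", "dog1", ["y"]),
    ("lb4", "some", ["y", "h8", "h7"]),
    ("lb3", "chase", ["e", "x", "y"]),
]
print(show(*split_args(mrs)))
```

The printed line matches the split form given above, minus the equalities. Each EP keeps a distinct label, which is what lets the argument relations point back at it.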
Naming conventions: predicate names without a lexicon
lb1:_every_q(x1sg),RSTR(lb1,h9),BODY(lb1,h6),
lb2:_cat_n(x2sg),
lb5:_dog_n_1(x4sg),
lb4:_some_q(x3sg),RSTR(lb4,h8),BODY(lb4,h7),
lb3:_chase_v(esp),ARG1(lb3,x2sg),ARG2(lb3,x4sg),
h9=lb2,h8=lb5, x1sg=x2sg,x3sg=x4sg
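Because the name is built only from the lemma, a coarse part of speech and an optional sense, a shallow processor can produce it without a lexicon. The Python sketch below shows this. Everything in it is hypothetical and for illustration only:
- the tag-prefix table `TAG_PREFIX_TO_POS`;
- the tiny `CLOSED_CLASS` list;
- the "u" fallback for unknown tags;
- the example CLAWS-style tags.

Only a deep grammar would supply the sense field (as in `_dog_n_1`).

```python
from typing import Optional

# Hypothetical tag lexicon for this sketch: the first two characters of a
# CLAWS-style tag select a coarse part of speech.
TAG_PREFIX_TO_POS = {"NN": "n", "VV": "v", "JJ": "a"}

# Hypothetical closed-class entries (the slides put such information in SEM-I).
CLOSED_CLASS = {"every": "q", "some": "q"}


def pred_name(lemma: str, tag: str, sense: Optional[str] = None) -> str:
    """Build _lemma_pos[_sense]; only a deep grammar supplies the sense."""
    if lemma in CLOSED_CLASS:
        pos = CLOSED_CLASS[lemma]
    else:
        # 'u' (unknown) is an assumed fallback for tags not in the table
        pos = TAG_PREFIX_TO_POS.get(tag[:2], "u")
    name = f"_{lemma}_{pos}"
    return name if sense is None else f"{name}_{sense}"


print(pred_name("cat", "NN1"))       # _cat_n
print(pred_name("chase", "VVD"))     # _chase_v
print(pred_name("every", "AT1"))     # _every_q
print(pred_name("dog", "NN1", "1"))  # _dog_n_1 (sense from a deep grammar)
```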
POS output as underspecification
DEEP –
lb1:_every_q(x1sg), RSTR(lb1,h9), BODY(lb1,h6),
lb2:_cat_n(x2sg), lb5:_dog_n_1(x4sg),
lb4:_some_q(x3sg), RSTR(lb4,h8),
BODY(lb4,h7),lb3:_chase_v(esp),
ARG1(lb3,x2sg),ARG2(lb3,x4sg), h9=lb2,h8=lb5,
x1sg=x2sg,x3sg=x4sg
POS –
lb1:_every_q(x1), lb2:_cat_n(x2sg),
lb3:_chase_v(epast), lb4:_some_q(x3),
lb5:_dog_n(x4sg)
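The POS-derived predicates are less specific than the deep ones (`_dog_n` versus `_dog_n_1`), so the two outputs can be checked for compatibility. The Python sketch below makes three assumptions:
- EPs from the two sources are already aligned by label, as in the example above.
- A more specific predicate name only adds trailing `_`-separated fields.
- Lemmas contain no underscores.

The function names are hypothetical, and variable sorts (for example `x1` versus `x1sg`) are not checked.

```python
def pred_fields(name: str) -> list:
    """'_dog_n_1' -> ['dog', 'n', '1'] (assumes lemmas contain no '_')."""
    return name.lstrip("_").split("_")


def pred_compatible(general: str, specific: str) -> bool:
    """True if `specific` equals `general` or adds trailing fields (e.g. a sense)."""
    g, s = pred_fields(general), pred_fields(specific)
    return len(g) <= len(s) and s[: len(g)] == g


def pos_compatible_with_deep(pos_eps: dict, deep_eps: dict) -> bool:
    """Every POS-derived EP must have a compatible deep EP with the same label."""
    return all(
        lbl in deep_eps and pred_compatible(pred, deep_eps[lbl])
        for lbl, pred in pos_eps.items()
    )


pos = {"lb1": "_every_q", "lb2": "_cat_n", "lb3": "_chase_v",
       "lb4": "_some_q", "lb5": "_dog_n"}
deep = {"lb1": "_every_q", "lb2": "_cat_n", "lb3": "_chase_v",
        "lb4": "_some_q", "lb5": "_dog_n_1"}
print(pos_compatible_with_deep(pos, deep))  # True
```

The check is deliberately one-directional. The deep output is compatible with the POS output, but not vice versa, since `_dog_n_1` is not a generalization of `_dog_n`.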
Semantics from RASP
- RASP: robust, domain-independent, statistical parsing (Briscoe and Carroll)
- can't produce conventional semantics because there is no subcategorization
- can often identify arguments:
  - S -> NP VP: NP supplies ARG1 for V
- potential for partial identification:
  - VP -> V NP
  - S -> NP S
  - NP might be ARG2 or ARG3
Underspecification of arguments
[Figure: argument hierarchy. ARGN is above ARG1or2 and ARG2or3; ARG1or2 is above ARG1 and ARG2; ARG2or3 is above ARG2 and ARG3]
RASP arguments can be specified as ARGN, ARG2or3 etc
Also useful for Japanese deep parsing?
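One way to compute with these labels is to model each one as the set of fully specific argument positions it allows. The hierarchy's meet (most general common specialization) then becomes set intersection. The Python sketch below assumes that reading: ARGN covers ARG1 to ARG3, ARG1or2 covers ARG1 and ARG2, and ARG2or3 covers ARG2 and ARG3. Restricting ARGN to three positions is a simplification, and `ARG_SETS`, `subsumes` and `meet` are hypothetical names.

```python
from typing import Optional

# Assumed reading of the argument hierarchy: each label stands for the set
# of fully specific argument positions it is compatible with. ARGN is
# restricted to ARG1-ARG3 here to keep the example small.
ARG_SETS = {
    "ARGN": frozenset({"ARG1", "ARG2", "ARG3"}),
    "ARG1or2": frozenset({"ARG1", "ARG2"}),
    "ARG2or3": frozenset({"ARG2", "ARG3"}),
    "ARG1": frozenset({"ARG1"}),
    "ARG2": frozenset({"ARG2"}),
    "ARG3": frozenset({"ARG3"}),
}
LABEL_OF = {positions: label for label, positions in ARG_SETS.items()}


def subsumes(general: str, specific: str) -> bool:
    """True if `specific` is at least as specific as `general`."""
    return ARG_SETS[specific] <= ARG_SETS[general]


def meet(a: str, b: str) -> Optional[str]:
    """Most general label compatible with both a and b, or None on a clash."""
    return LABEL_OF.get(ARG_SETS[a] & ARG_SETS[b])


print(meet("ARG1or2", "ARG2or3"))  # ARG2
print(meet("ARGN", "ARG2or3"))     # ARG2or3
print(meet("ARG1", "ARG3"))        # None
```

A RASP-RMRS ARG2or3 for the object of a ditransitive is compatible with a deep ARG3 (`subsumes("ARG2or3", "ARG3")`), but would clash with a deep ARG1.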
RMRS construction
- ERG etc.: uses MRS -> RMRS converter
  - argument splitting etc.
  - also RMRS -> MRS conversion
- POS-RMRS: tag lexicon
- RASP-RMRS: tag lexicon plus semantic rules associated with RASP rules to match ERG
  - defaults when no rule RMRS is specified
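RMRS -> MRS conversion is essentially argument splitting run backwards: each separate argument relation is reattached to the EP that carries its label. The Python sketch below rests on stated assumptions, and `join_args` and `SPECIFIC_ROLES` are hypothetical names:
- Only fully specific roles can be converted. An underspecified role such as ARG2or3 raises an error, since it has no single MRS counterpart.
- Two values for the same role on one label are treated as a conflict.
- Argument relations whose label has no EP are silently dropped.

```python
from typing import Dict, List, Tuple

# Fully specific roles an MRS EP can carry in this sketch (an assumption).
SPECIFIC_ROLES = {"RSTR", "BODY", "ARG1", "ARG2", "ARG3"}


def join_args(rmrs_eps: List[Tuple[str, str, str]],
              rargs: List[Tuple[str, str, str]]):
    """Reattach (role, label, value) relations to the EP with that label.
    Relations whose label has no EP are ignored."""
    roles: Dict[str, Dict[str, str]] = {}
    for role, label, value in rargs:
        if role not in SPECIFIC_ROLES:
            # e.g. ARG2or3 from RASP-RMRS has no single MRS counterpart
            raise ValueError(f"underspecified role {role} on {label}")
        if role in roles.setdefault(label, {}):
            raise ValueError(f"two values for {role} on {label}")
        roles[label][role] = value
    return [(label, pred, {"ARG0": arg0, **roles.get(label, {})})
            for label, pred, arg0 in rmrs_eps]


rmrs_eps = [("lb3", "_chase_v", "e")]
rargs = [("ARG1", "lb3", "x"), ("ARG2", "lb3", "y")]
print(join_args(rmrs_eps, rargs))
# [('lb3', '_chase_v', {'ARG0': 'e', 'ARG1': 'x', 'ARG2': 'y'})]
```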
RMRS composition with non-lexicalized grammars
- MRS composition assumes a lexicalized approach: algebra defined in Copestake, Lascarides and Flickinger (2001)
- RMRS with non-lexicalized grammars has a similar basic algebra:
  - without lexical subcategorization, rely on grammar rules to provide the ARGs
  - 'anchors' rather than slots, to ground the ARGs (single anchor for RASP)
- developed on the basis of a semantic test suite
- most rules written by Anna Ritchie
Some cat sleeps (in RASP)
sleeps: [h3,e], <h3>, {h3:_sleep(e)}
some cat: [h,x], <h1>, {h1:_some(x),RSTR(h1,h2),h2:_cat(x)}
S->NP VP: Head=VP, ARG1(<VP anchor>,<NP hook.index>)
some cat sleeps: [h3,e], <h3>, {h3:_sleep(e), ARG1(h3,x), h1:_some(x),RSTR(h1,h2),h2:_cat(x)}
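The S -> NP VP step can be mimicked in Python. Each phrase carries a hook (label, index), a single anchor (as for RASP) and a flat list of facts. The rule copies the head's hook and adds one ARG1 relation from the VP anchor to the NP index. This is a minimal sketch: facts are plain strings, and the qeq constraint and PRPSTN_M_REL EP that the real rule also contributes are ignored. `Sem` and `s_np_vp` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Sem:
    hook: Tuple[str, str]          # (label, index)
    anchor: str                    # single anchor, as for RASP
    facts: List[str] = field(default_factory=list)


def s_np_vp(np: Sem, vp: Sem) -> Sem:
    """S -> NP VP: head is the VP; add ARG1(<VP anchor>, <NP hook.index>)."""
    arg1 = f"ARG1({vp.anchor},{np.hook[1]})"
    return Sem(hook=vp.hook, anchor=vp.anchor, facts=vp.facts + [arg1] + np.facts)


sleeps = Sem(("h3", "e"), "h3", ["h3:_sleep(e)"])
some_cat = Sem(("h", "x"), "h1", ["h1:_some(x)", "RSTR(h1,h2)", "h2:_cat(x)"])
result = s_np_vp(some_cat, sleeps)
print(result.hook, result.facts)
# ('h3', 'e') ['h3:_sleep(e)', 'ARG1(h3,x)', 'h1:_some(x)', 'RSTR(h1,h2)', 'h2:_cat(x)']
```

Composition here only appends facts and never rewrites them, in line with the principle of accumulating information monotonically.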
Real rule ...
<rule>
  <name>S/np_vp</name>
  <dtrs><dtr>NP</dtr><dtr>VP</dtr></dtrs>
  <head>RULE</head>
  <semstruct>
    <hook><index>E</index><label>H1</label></hook>
    <slots><noanchor/></slots>
    <ep><gpred>PRPSTN_M_REL</gpred><label>H1</label><var>H2</var></ep>
    <rarg><rargname>ARG1</rargname><label>H3</label><var>X</var></rarg>
    <hcons hreln='qeq'><hi><var>H2</var></hi><lo><var>H</var></lo></hcons>
  </semstruct>
  <equalities><rv>X</rv><dh><dtr>NP</dtr><he>INDEX</he></dh></equalities>
  <equalities><rv>H</rv><dh><dtr>VP</dtr><he>LABEL</he></dh></equalities>
  <equalities><rv>H3</rv><dh><dtr>VP</dtr><he>ANCHOR</he></dh></equalities>
  <equalities><rv>E</rv><dh><dtr>VP</dtr><he>INDEX</he></dh></equalities>
</rule>
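Rule files in this format can be read with Python's standard `xml.etree.ElementTree`. The sketch below makes several assumptions:
- The rule is a well-formed `<rule>` element.
- The `<semstruct>` part is omitted for brevity.
- `<rv>` is read as a rule variable, `<dh>` as a daughter hook and `<he>` as a hook element. This is an interpretation of the element names, not a documented schema.

`read_rule` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The equalities of the rule above, wrapped in a <rule> element
# (the <semstruct> part is omitted to keep the example short).
RULE_XML = """<rule>
<name>S/np_vp</name>
<dtrs><dtr>NP</dtr><dtr>VP</dtr></dtrs>
<head>RULE</head>
<equalities><rv>X</rv><dh><dtr>NP</dtr><he>INDEX</he></dh></equalities>
<equalities><rv>H</rv><dh><dtr>VP</dtr><he>LABEL</he></dh></equalities>
<equalities><rv>H3</rv><dh><dtr>VP</dtr><he>ANCHOR</he></dh></equalities>
<equalities><rv>E</rv><dh><dtr>VP</dtr><he>INDEX</he></dh></equalities>
</rule>"""


def read_rule(xml_text: str):
    """Return the rule name, its daughters, and a map from each rule
    variable to the (daughter, hook element) it is equated with."""
    root = ET.fromstring(xml_text)
    dtrs = [d.text for d in root.findall("dtrs/dtr")]
    equalities = {
        eq.findtext("rv"): (eq.findtext("dh/dtr"), eq.findtext("dh/he"))
        for eq in root.findall("equalities")
    }
    return root.findtext("name"), dtrs, equalities


name, dtrs, eqs = read_rule(RULE_XML)
print(name, dtrs)  # S/np_vp ['NP', 'VP']
print(eqs["X"])    # ('NP', 'INDEX')
```

Read this way, the equalities restate the informal rule: X is the NP's index and H3 is the VP's anchor, so ARG1(H3, X) links the VP anchor to the NP index.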
ERG-RMRS / RASP-RMRS
- Inchoative
- Infinitival subject (unbound in RASP-RMRS)
- Ditransitive: missing ARG3
- Mismatch: expletive 'it'
- Mismatch: larger numbers
Comments on RASP-RMRS
- Fast enough: construction time is insignificant compared to RASP processing time, because there is no ambiguity
- Too many RASP rules! Need to generalise over classes.
- Requires SEM-I: API for MRS/RMRS from deep grammar
- RASP and ERG may change:
  - compatible test suites: semi-automatic rule update?
  - alternative technique for composition?
- Parse selection: need to generalise over RMRSs
  - weighted intersections of RMRSs (cf. RASP grammatical relations)
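A weighted intersection over n-best RMRSs can be sketched in the spirit of RASP's weighted grammatical relations. Each RMRS fact gets the normalized score mass of the parses that contain it, and facts below a threshold are discarded. This Python sketch is an assumption about how such a scheme might look, not a description of an existing implementation. `weighted_facts`, the threshold of 0.5 and the example parses and scores are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple


def weighted_facts(parses: List[Tuple[float, FrozenSet[str]]],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Give each RMRS fact the normalised score mass of the parses that
    contain it; keep facts whose mass reaches `threshold`."""
    total = sum(score for score, _ in parses)
    mass: Dict[str, float] = defaultdict(float)
    for score, facts in parses:
        for fact in facts:
            mass[fact] += score / total
    return {fact: w for fact, w in mass.items() if w >= threshold}


# Two hypothetical analyses that differ only in PP attachment.
shared = {"lb1:_see_v(e)", "lb2:_man_n(x)", "lb3:_with_p(e2)"}
verb_attach = frozenset(shared | {"ARG1(lb3,e)"})
noun_attach = frozenset(shared | {"ARG1(lb3,x)"})
kept = weighted_facts([(3.0, verb_attach), (1.0, noun_attach)])
print(sorted(kept.items()))
```

Working at the level of individual facts, rather than whole RMRSs, is what the flat representation makes possible. The shared facts get full weight, and only the attachment relation is actually in doubt.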
SEM-I: semantic interface
- Meta-level: manually specified 'grammar' relations (constructions and closed-class)
- Object-level: linked to lexical database for deep grammars
  - Object-level SEM-I auto-generated from expanded lexical entries in deep grammars (because the lexical type can contribute relations)
  - Validation of other lexicons
- Need closed-class items for RMRS construction from shallow processing
Alignment and XML
- Comparing RMRSs for the same text efficiently uses characterization:
  - labels RMRSs according to their source in the text
  - currently character offsets, but byte offsets? Japanese etc.?
- RMRS-XML
  - RMRS seen as levels of mark-up: standoff annotation
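Characterization makes alignment a lookup on source spans. The choice between character and byte offsets matters as soon as the text is not ASCII. The Python sketch below assumes each EP is characterized by a (cfrom, cto) character span. It shows that the two kinds of offset coincide for English but not for UTF-8 Japanese, then pairs EPs from a deep and a POS-based RMRS by identical spans. `char_to_byte_span` and `align` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]


def char_to_byte_span(text: str, span: Span, encoding: str = "utf-8") -> Span:
    """Convert a character span [cfrom, cto) of `text` into a byte span."""
    cfrom, cto = span
    bfrom = len(text[:cfrom].encode(encoding))
    return bfrom, bfrom + len(text[cfrom:cto].encode(encoding))


def align(eps_a: List[Tuple[str, Span]], eps_b: List[Tuple[str, Span]]):
    """Pair EPs from two RMRSs whose characterization spans coincide."""
    by_span: Dict[Span, List[str]] = {}
    for pred, span in eps_b:
        by_span.setdefault(span, []).append(pred)
    return [(pred_a, pred_b) for pred_a, span in eps_a
            for pred_b in by_span.get(span, [])]


# English: character and byte offsets coincide for ASCII text.
print(char_to_byte_span("every dog", (6, 9)))     # (6, 9)
# Japanese: each of these characters takes 3 bytes in UTF-8.
print(char_to_byte_span("犬が猫を追う", (2, 3)))  # (6, 9)

deep = [("_every_q", (0, 5)), ("_dog_n_1", (6, 9))]
pos = [("_every_q", (0, 5)), ("_dog_n", (6, 9))]
print(align(pos, deep))  # [('_every_q', '_every_q'), ('_dog_n', '_dog_n_1')]
```

In the Japanese example the span of 猫 is (2, 3) in characters but (6, 9) in bytes. Two processors must therefore agree on the offset unit before their RMRSs can be aligned.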

SciBorg: Chemistry texts
- eScience project starting in October at Cambridge
  - Computer Laboratory (Copestake, Teufel), Chemistry (Murray-Rust), CeSC (Parker)
- Aims:
  - Develop an NL markup language which will act as a platform for extraction of information. Link to semantic web languages.
  - Develop IE technology and core ontologies for use by publishers, researchers, readers, vendors and regulatory organisations.
  - Model scientific argumentation and citation purpose in order to support novel modes of information access.
  - Demonstrate the applicability of this infrastructure in a real-world eScience environment.
Research markup


Chemistry: The primary aims of the present study
are (i) the synthesis of an amino acid derivative that
can be incorporated into proteins /via/ standard
solid-phase synthesis methods, and (ii) a test of the
ability of the derivative to function as a photoswitch
in a biological environment.
Computational Linguistics: The goal of the work
reported here is to develop a method that can
automatically refine the Hidden Markov Models to
produce a more accurate language model.
RMRS and research markup
- Specify cues in RMRS
- Process cues deeply: feasible because the cues are domain-independent
  - more general and reliable than shallow techniques
  - allows for complex interrelationships
- Use zones for advanced citation maps and other enhancements to repositories
Conclusions
- RMRS: semantic representation language allowing linking of deep and shallower processors
- RMRS construction: phrase-level compatibility between processors
- Many potential applications
