Transcript Slides

Learning to Transform Natural to Formal Languages
Rohit J. Kate, Yuk Wah Wong, Raymond J. Mooney
Machine Learning Group
Department of Computer Sciences
University of Texas at Austin
July 13, 2005
Introduction
• Semantic Parsing: Transforming natural language sentences into
  complete, executable formal representations
• Different from Semantic Role Labeling which
involves only shallow semantic analysis
• Two application domains:
– CLang: RoboCup Coach Language
– GeoQuery: A Database Query Application
2
CLang: RoboCup Coach Language
• In the RoboCup Coach competition, teams compete to coach simulated
  players
• The coaching instructions are given in a formal
language called CLang
Coach: "If the ball is in our penalty area, then all our players
except player 4 should stay in our half."

[Figure: simulated soccer field]

  ↓ Semantic Parsing

CLang:
((bpos (penalty-area our))
 (do (player-except our {4}) (pos (half our))))
3
GeoQuery: A Database Query Application
• Query application for U.S. geography database
containing about 800 facts [Zelle & Mooney, 1996]
User: "How many cities are there in the US?"

  ↓ Semantic Parsing

Query:
answer(A, count(B, (city(B), loc(B, C), const(C, countryid(USA))), A))
4
Outline
• Semantic Parsing using Transformation Rules
• Learning Transformation Rules
• Experiments
• Conclusions
5
Semantic Parsing using Transformation Rules
• SILT (Semantic Interpretation by Learning
Transformations)
• Uses pattern-based transformation rules which
map natural language phrases to formal language
constructs
• Transformation rules are repeatedly applied to the
sentence to construct its formal language
expression
6
Formal Language Grammar
NL: If our player 4 has the ball, our player 4 should shoot.
CLang: ((bowner our {4}) (do our {4} shoot))
CLang parse:

  RULE
  ├─ CONDITION
  │  ├─ bowner
  │  ├─ TEAM: our
  │  └─ UNUM: 4
  └─ DIRECTIVE
     ├─ do
     ├─ TEAM: our
     ├─ UNUM: 4
     └─ ACTION: shoot

• Non-terminals: RULE, CONDITION, ACTION…
• Terminals: bowner, our, 4…
• Productions: RULE → CONDITION DIRECTIVE
               DIRECTIVE → do TEAM UNUM ACTION
               ACTION → shoot
7
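The grammar fragment above is small enough to encode directly as data.
A minimal sketch, assuming a simple list-of-productions layout (the
names and representation are mine, not SILT's implementation):

# Each production: (LHS non-terminal, RHS sequence of terminals and
# non-terminals), covering the CLang fragment on this slide.
CLANG_PRODUCTIONS = [
    ("RULE",      ["CONDITION", "DIRECTIVE"]),
    ("CONDITION", ["bowner", "TEAM", "UNUM"]),
    ("DIRECTIVE", ["do", "TEAM", "UNUM", "ACTION"]),
    ("TEAM",      ["our"]),
    ("UNUM",      ["4"]),
    ("ACTION",    ["shoot"]),
]

NON_TERMINALS = {lhs for lhs, _ in CLANG_PRODUCTIONS}
TERMINALS = {sym for _, rhs in CLANG_PRODUCTIONS
             for sym in rhs if sym not in NON_TERMINALS}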
Transformation Rule Representation
• Rule has two components: a natural language pattern and an
associated formal language template
• Two versions of SILT:
– String-based rules: used to convert natural language
word gap
sentence directly to formal language
String-pattern
TEAM
[1] ball tree to formal
– Tree-based
rules: used
to UNUM
converthas
syntactic
language
Template
CONDITION  (bowner TEAM {UNUM})
S
Treepattern
VP
NP
TEAM
Template
UNUM
NP
VBZ
has
DT
NN
the
ball
CONDITION  (bowner TEAM {UNUM})
8
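One plausible way to realize string patterns with word gaps is to
compile them to regular expressions. A rough sketch; reading "[n]" as
"up to n intervening words" is my assumption about the notation:

import re

def gap_pattern_to_regex(pattern):
    # Compile a SILT-style string pattern such as "TEAM UNUM has [1] ball".
    parts = []
    for tok in pattern.split():
        gap = re.fullmatch(r"\[(\d+)\]", tok)
        if gap:
            # word gap: up to n whole words may intervene
            parts.append(r"(?:\S+ ){0,%s}" % gap.group(1))
        else:
            parts.append(re.escape(tok) + " ")
    return re.compile("".join(parts)[:-1])  # drop the trailing space

pat = gap_pattern_to_regex("TEAM UNUM has [1] ball")
print(bool(pat.search("TEAM UNUM has the ball")))  # True: "the" fills the gap
print(bool(pat.search("TEAM UNUM has ball")))      # True: gap of zero words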
Example of Semantic Parsing
(slides 9–20 step through this derivation one rule application at a time)

Sentence: If our player 4 has the ball, our player 4 should shoot.

Rules available:
  our                        ⇒ TEAM → our
  player 4                   ⇒ UNUM → 4
  shoot                      ⇒ ACTION → shoot
  TEAM UNUM has [1] ball     ⇒ CONDITION → (bowner TEAM {UNUM})
  TEAM UNUM should ACTION    ⇒ DIRECTIVE → (do TEAM {UNUM} ACTION)
  If CONDITION, DIRECTIVE.   ⇒ RULE → (CONDITION DIRECTIVE)

Derivation (the formal expression built so far is shown on the right):

  If our player 4 has the ball, our player 4 should shoot.
  If TEAM player 4 has the ball, TEAM player 4 should shoot.   TEAM = our
  If TEAM UNUM has the ball, TEAM UNUM should shoot.           UNUM = 4
  If TEAM UNUM has the ball, TEAM UNUM should ACTION.          ACTION = shoot
  If CONDITION, TEAM UNUM should ACTION.                       CONDITION = (bowner our {4})
  If CONDITION, DIRECTIVE.                                     DIRECTIVE = (do our {4} shoot)
  RULE                                                         ((bowner our {4}) (do our {4} shoot))
9–20
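A runnable sketch of the derivation above. The token-list
representation and template encoding are my own; the slides do not show
SILT's actual implementation, and the word gap in the bowner pattern is
simplified to the literal words "the ball":

# Working sentence = list of tokens; a token is a plain word or a
# (NONTERM, formal_expr) pair introduced by an earlier rule application.
RULES = [
    # (NL pattern, produced non-terminal, formal-language template)
    (["our"],                                    "TEAM",      "our"),
    (["player", "4"],                            "UNUM",      "4"),
    (["shoot"],                                  "ACTION",    "shoot"),
    (["TEAM", "UNUM", "has", "the", "ball"],     "CONDITION", "(bowner {TEAM} {{{UNUM}}})"),
    (["TEAM", "UNUM", "should", "ACTION"],       "DIRECTIVE", "(do {TEAM} {{{UNUM}}} {ACTION})"),
    (["If", "CONDITION", ",", "DIRECTIVE", "."], "RULE",      "({CONDITION} {DIRECTIVE})"),
]

def match_at(tokens, i, pattern):
    # Return {non-terminal: formal_expr} bindings if pattern matches at i.
    if i + len(pattern) > len(tokens):
        return None
    bind = {}
    for p, tok in zip(pattern, tokens[i:i + len(pattern)]):
        if isinstance(tok, tuple):      # already-reduced non-terminal
            if tok[0] != p:
                return None
            bind[tok[0]] = tok[1]
        elif tok != p:                  # plain word must match exactly
            return None
    return bind

def parse(tokens, rules):
    # Repeatedly apply rules until none fires (as on slides 9-20).
    changed = True
    while changed:
        changed = False
        for pattern, nonterm, template in rules:
            for i in range(len(tokens)):
                bind = match_at(tokens, i, pattern)
                if bind is not None:
                    expr = template.format(**bind)
                    tokens = tokens[:i] + [(nonterm, expr)] + tokens[i + len(pattern):]
                    changed = True
                    break
    return tokens

sent = "If our player 4 has the ball , our player 4 should shoot .".split()
print(parse(sent, RULES))
# [('RULE', '((bowner our {4}) (do our {4} shoot))')]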
Learning Transformation Rules
• SILT induces rules from a corpus of NL sentences paired with their
  formal representations
• Patterns are learned for each production by bottom-up rule learning
• For every production (sketched below):
  – Call those sentences positives whose formal representations'
    parses use that production
  – Call the remaining sentences negatives
21
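A minimal sketch of this positives/negatives split. The data layout is
assumed; the parse of each formal representation is available because
the formal-language grammar is known:

def split_corpus(corpus, production, productions_used):
    # corpus: list of (sentence, formal_representation) pairs.
    # productions_used[f]: set of grammar productions in the parse of f.
    positives = [s for s, f in corpus if production in productions_used[f]]
    negatives = [s for s, f in corpus if production not in productions_used[f]]
    return positives, negatives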
Rule Learning for a Production
CONDITION → (bpos REGION)

Positives (sentences whose formal representations use the production):
• The ball is in REGION , our player 7 is in REGION and no opponent is
  around our player 7 within 1.5 distance.
• If the ball is in REGION and not in REGION then player 3 should
  intercept the ball.
• During normal play if the ball is in the REGION then player 7 , 9
  and 11 should dribble the ball to the REGION .
• When the play mode is normal and the ball is in the REGION then our
  player 2 should pass the ball to the REGION .
• All players except the goalie should pass the ball to REGION if it
  is in RP18.
• If the ball is inside rectangle ( -54 , -36 , 0 , 36 ) then player 10
  should position itself at REGION with a ball attraction of REGION .
• Player 2 should pass the ball to REGION if it is in REGION .

Negatives (all remaining sentences):
• If our player 6 has the ball then he should take a shot on goal.
• If player 4 has the ball , it should pass the ball to player 2 or 10.
• If the condition DR5C3 is true , then player 2 , 3 , 7 and 8 should
  pass the ball to player 3.
• During play on , if players 6 , 7 or 8 is in REGION , they should
  pass the ball to players 9 , 10 or 11.
• If "Clear_Condition" , players 2 , 3 , 7 or 5 should clear the ball
  REGION .
• If it is before the kick off , after our goal or after the opponent's
  goal , position player 3 at REGION .
• If the condition MDR4C9 is met , then players 4-6 should pass the
  ball to player 9.
• If Pass_11 then player 11 should pass to player 9 and no one else.

• SILT applies a greedy-covering, bottom-up rule-induction method that
  repeatedly generalizes positives until they start covering negatives
22
Generalization of String Patterns
ACTION → (pos REGION)

Pattern 1: Always position player UNUM at REGION .
Pattern 2: Whenever the ball is in REGION , position player UNUM near
the REGION .

Find the highest scoring common subsequence:

  score(c) = length(c) − α × (sum of word gaps)

Generalization: position player UNUM [2] REGION .
24
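A rough sketch of finding the highest-scoring common subsequence under
this score. The dynamic program, the value of α, and recording a gap as
the larger of the two patterns' gaps are all my assumptions; the slide
gives only the scoring function:

def best_common_subsequence(a, b, alpha=0.25):   # alpha is a guess
    # best[(i, j)] = (score, tokens) for common subsequences whose last
    # match aligns a[i] with b[j]; score = length - alpha * sum of gaps.
    best = {}
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            score, seq = 1.0, [a[i]]
            for (pi, pj), (ps, pseq) in best.items():
                if pi < i and pj < j:
                    gap = max(i - pi - 1, j - pj - 1)
                    if ps + 1.0 - alpha * gap > score:
                        score = ps + 1.0 - alpha * gap
                        seq = pseq + (["[%d]" % gap] if gap else []) + [a[i]]
            best[(i, j)] = (score, seq)
    return max(best.values()) if best else (0.0, [])

p1 = "Always position player UNUM at REGION .".split()
p2 = "Whenever the ball is in REGION , position player UNUM near the REGION .".split()
print(best_common_subsequence(p1, p2))
# (4.5, ['position', 'player', 'UNUM', '[2]', 'REGION', '.'])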
Generalization of Tree Patterns
REGION → (penalty-area TEAM)

Pattern 1:                     Pattern 2:
  NP                             NP
  ├─ PRP$: TEAM                  ├─ NP: TEAM POS('s)
  ├─ NN: penalty                 ├─ NN: penalty
  └─ NN: area                    └─ NN: box

Find common subgraphs.

Generalization:
  NP
  ├─ *: TEAM
  ├─ NN: penalty
  └─ NN: *
26
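A toy sketch of generalizing two tree patterns. The node layout is
mine, and aligning children positionally is a simplification: SILT
searches for common subgraphs more generally than this:

def generalize(t1, t2):
    # A node is (label, [children]); '*' marks a disagreement.
    (l1, c1), (l2, c2) = t1, t2
    label = l1 if l1 == l2 else "*"
    kids = [generalize(a, b) for a, b in zip(c1, c2)]
    return (label, kids)

area = ("NP", [("PRP$", [("TEAM", [])]),
               ("NN", [("penalty", [])]),
               ("NN", [("area", [])])])
box = ("NP", [("NP", [("TEAM", []), ("POS", [("'s", [])])]),
              ("NN", [("penalty", [])]),
              ("NN", [("box", [])])])
print(generalize(area, box))
# ('NP', [('*', [('TEAM', [])]), ('NN', [('penalty', [])]), ('NN', [('*', [])])])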
Rule Learning for a Production
CONDITION → (bpos REGION)

Positives (region expressions now shown as REGION):
• The ball is in REGION , our player 7 is in REGION and no opponent is
  around our player 7 within 1.5 distance.
• If the ball is in REGION and not in REGION then player 3 should
  intercept the ball.
• During normal play if the ball is in the REGION then player 7 , 9
  and 11 should dribble the ball to the REGION .
• When the play mode is normal and the ball is in the REGION then our
  player 2 should pass the ball to the REGION .
• All players except the goalie should pass the ball to REGION if it
  is in REGION.
• If the ball is inside REGION then player 10 should position itself
  at REGION with a ball attraction of REGION .
• Player 2 should pass the ball to REGION if it is in REGION .

Negatives: (same as slide 22)

Bottom-up Rule Learner output:
  ball is [2] REGION   ⇒ CONDITION → (bpos REGION)
  it is in REGION      ⇒ CONDITION → (bpos REGION)
27
Rule Learning for a Production
CONDITION → (bpos REGION)

Positives after applying the learned rules (covered phrases replaced by
CONDITION):
• The CONDITION , our player 7 is in REGION and no opponent is around
  our player 7 within 1.5 distance.
• If the CONDITION and not in REGION then player 3 should intercept
  the ball.
• During normal play if the CONDITION then player 7 , 9 and 11 should
  dribble the ball to the REGION .
• When the play mode is normal and the CONDITION then our player 2
  should pass the ball to the REGION .
• All players except the goalie should pass the ball to REGION if
  CONDITION.
• If the CONDITION then player 10 should position itself at REGION
  with a ball attraction of REGION .
• Player 2 should pass the ball to REGION if CONDITION .

Negatives: (same as slide 22)

Bottom-up Rule Learner output:
  ball is [2] REGION   ⇒ CONDITION → (bpos REGION)
  it is in REGION      ⇒ CONDITION → (bpos REGION)
28
Rule Learning for All Productions
• Transformation rules for productions should cooperate globally to
  generate complete semantic parses
• Redundantly cover every positive example by the β = 5 best rules,
  scored as coverage times accuracy (a direct transcription follows):

  goodness(r) = pos(r) × pos(r) / (pos(r) + neg(r))

• Find the subset of these rules which best cooperate to generate
  complete semantic parses on the training data
29
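The goodness score as code; rule.matches is a hypothetical
pattern-match predicate standing in for the machinery above:

def goodness(rule, positives, negatives):
    # coverage * accuracy = pos(r) * pos(r) / (pos(r) + neg(r))
    pos = sum(1 for s in positives if rule.matches(s))
    neg = sum(1 for s in negatives if rule.matches(s))
    return pos * pos / (pos + neg) if pos + neg else 0.0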
Experimental Corpora
• CLang
– 300 randomly selected pieces of coaching advice from
the log files of the 2003 RoboCup Coach Competition
– 22.52 words on average in NL sentences
– 14.24 tokens on average in formal expressions
• GeoQuery [Zelle & Mooney, 1996]
– 250 queries for the U.S. geography database
– 6.87 words on average in NL sentences
– 5.32 tokens on average in formal expressions
30
Experimental Methodology
• Evaluated using standard 10-fold cross validation
• Syntactic parses needed by the tree-based version were obtained by
  training Collins' parser [Bikel, 2004] on the WSJ treebank and
  gold-standard parses of the training sentences
• Correctness
  – CLang: output exactly matches the correct representation
  – GeoQuery: the resulting query retrieves the same answer as the
    correct representation
• Metrics (transcribed as code below)

  Precision = |Correct Completed Parses| / |Completed Parses|

  Recall = |Correct Completed Parses| / |Sentences|
31
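The two metrics as a minimal sketch; the correctness predicate is
abstract, per the slide (exact match for CLang, same retrieved answer
for GeoQuery):

def precision_recall(outputs, golds, correct):
    # outputs[i]: parser output for sentence i, or None if no complete
    # parse was produced; golds[i]: the correct representation.
    completed = [(o, g) for o, g in zip(outputs, golds) if o is not None]
    n_correct = sum(1 for o, g in completed if correct(o, g))
    precision = n_correct / len(completed) if completed else 0.0
    recall = n_correct / len(golds) if golds else 0.0
    return precision, recall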
Compared Systems
• CHILL
– Learns control rules for shift-reduce parsing using
Inductive Logic Programming (ILP)
– CHILLIN [Zelle & Mooney, 1996]
– COCKTAIL [Tang & Mooney, 2001]
• GEOBASE
– Hand-built parser for GeoQuery [Borland International,
1988]
32
Precision Learning Curves for CLang
33
Recall Learning Curves for CLang
34
Precision Learning Curves for GeoQuery
35
Recall Learning Curves for GeoQuery
36
Related Work
• SCISSOR [Ge & Mooney, 2005]
– Integrates semantic and syntactic statistical parsing
– Requires extensive annotations but gives better results
• PRECISE [Popescu et al., 2003]
– Designed specifically for NL database interfaces
• Speech Recognition Community [Zue & Glass, 2000]
– Simpler queries in ATIS corpus
37
Conclusions
• A new approach to semantic parsing, SILT, which uses transformation
  rules
• SILT learns transformation rules by bottom-up rule induction that
  exploits the target-language grammar
• Tested on two very different domains; performs better than previous
  ILP-based approaches
38
Thank You!
Our corpora can be downloaded from:
http://www.cs.utexas.edu/~ml/nldata.html
Questions??
39
F-measure Learning Curves for CLang
40
F-measure Learning Curves for GeoQuery
41
Extra Slide: Average Training Time in Minutes

              CLang   GeoQuery
SILT-string     3.2       0.35
CHILLIN        10.4       6.3
SILT-tree      81.4      21.5
COCKTAIL         —       39.6
42
Extra Slide: Variations of Rule Representation
• Context in the patterns:
  in REGION  ⇒  CONDITION → (bpos REGION)
43
Extra Slide: Variations of Rule Representation
• Context in the patterns:
  the ball in REGION  ⇒  CONDITION → (bpos REGION)
  Without the context words, applying in REGION ⇒ CONDITION → (bpos REGION)
  to "TEAM UNUM has the ball in REGION" yields "TEAM UNUM has the ball
  CONDITION", blocking TEAM UNUM has [1] ball ⇒ CONDITION → (bowner TEAM {UNUM})
44
Extra Slide: Variations of Rule Representation
• Templates with multiple productions:
  TEAM UNUM has the ball in REGION
  ⇒ CONDITION → (and (bowner TEAM {UNUM}) (bpos REGION))
45
Extra Slide: Experimental Methodology
• Correctness
  – CLang: output exactly matches the correct representation
  – GeoQuery: the resulting query retrieves the same answer as the
    correct representation

If the ball is in our penalty area, all our players except player 4
should stay in our half.

Correct:
((bpos (penalty-area our))
 (do (player-except our {4}) (pos (half our))))

Output:
((bpos (penalty-area opp))
 (do (player-except our {4}) (pos (half our))))
46
Extra Slide: Future Work
• Hard-matching symbolic patterns are sometimes too brittle; exploit
  string and tree kernels as classifiers instead [Lodhi et al., 2002]
• Unified implementation of string and tree-based
versions for direct comparisons
47