ITS 015: Compiler Construction


Compiler Construction
Syntax Analysis
Top-down parsing
Syntax Analysis, continued
Syntax analysis
Last week we covered:
  The goal of syntax analysis
  Context-free grammars
  Top-down parsing (a simple but weak parsing method)
Today, we will:
  Wrap up top-down parsing, including LL(1)
  Start on bottom-up parsing:
    Shift-reduce parsers
    LR parsers: SLR(1), LR(1), LALR(1)
Top-Down Parsing
Recursive descent (Last Week)
Recursive descent parsers simply try to build a parse
tree, top-down, and BACKTRACK on failure.
Recursion and backtracking are inefficient.
It would be better if we always knew the correct action
to take.
It would be better if we could avoid recursive procedure
calls during parsing.
PREDICTIVE PARSERS can solve both problems.
Predictive parsers
A predictive parser always knows which production to
use, so backtracking is not necessary.
Example: for the productions
stmt -> if ( expr ) stmt else stmt
| while ( expr ) stmt
| for ( stmt expr stmt ) stmt
a recursive descent parser would always know which
production to use, depending on the input token.
Transition diagrams
Transition diagrams can describe recursive parsers, just
like they can describe lexical analyzers, but the
diagrams are slightly different.
Construction:
1. Eliminate left recursion from G
2. Left factor G
3. For each non-terminal A, do
   1. Create an initial and final (return) state
   2. For each production A -> X1 X2 … Xn, create a path from
      the initial to the final state with edges X1 X2 … Xn.
Using transition diagrams
Begin in the start state for the start symbol
When we are in state s with edge labeled by terminal a
to state t, if the next input symbol is a, move to state
t and advance the input pointer.
For an edge to state t labeled with non-terminal A,
jump to the transition diagram for A, and when
finished, return to state t
For an edge labeled ε, move immediately to t.
Example (4.15 in text): parse the string “id + id * id”
Example transition diagrams
An expression grammar with left recursion and ambiguity removed:
E -> T E’
E’ -> + T E’ | ε
T -> F T’
T’ -> * F T’ | ε
F -> ( E ) | id
Corresponding transition diagrams: (figure not reproduced in this transcript)
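Each transition diagram can be read as one procedure: terminal edges become token matches, nonterminal edges become calls, and ε edges become "do nothing". The following is a minimal sketch under that reading, not the course's reference code. The class name `Parser`, the helper `accepts`, and the token spelling ("id", "+", and so on) are assumptions made for illustration.

```python
class ParseError(Exception):
    """Raised when the next token does not fit any edge of the current diagram."""


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # $ marks the end of the input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        # Follow an edge labeled with a terminal: check it and advance the input.
        if self.peek() != terminal:
            raise ParseError(f"expected {terminal!r}, saw {self.peek()!r}")
        self.pos += 1

    def E(self):            # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):      # E' -> + T E' | ε
        if self.peek() == "+":
            self.match("+")
            self.T()
            self.E_prime()
        # otherwise take the ε edge and return

    def T(self):            # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):      # T' -> * F T' | ε
        if self.peek() == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):            # F -> ( E ) | id
        if self.peek() == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")


def accepts(tokens):
    """Return True if the token list is a sentence of the expression grammar."""
    parser = Parser(tokens)
    try:
        parser.E()
        parser.match("$")
        return True
    except ParseError:
        return False
```

Calling `accepts("id + id * id".split())` walks the diagrams exactly as described for the example string. One token of lookahead is enough to pick an edge, so the sketch never backtracks.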
Predictive parsing without recursion
To get rid of the recursive procedure calls, we maintain
our own stack.
The parsing table and parsing program
The table is a 2D array M[A,a] where A is a nonterminal
symbol and a is a terminal or $.
At each step, the parser considers the top-of-stack
symbol X and input symbol a:
  If both are $, accept.
  If they are the same terminal, pop X and advance the input.
  If X is a terminal that differs from a, report a syntax error.
  If X is a nonterminal, consult M[X,a]. If M[X,a] is “ERROR”,
    call an error recovery routine. Otherwise, if M[X,a] is a
    production of the grammar X -> UVW, replace X on the stack
    with WVU (U on top).
Example
Use the table-driven predictive parser to parse
id + id * id
Assuming the parsing table M for the expression grammar
Initial stack is $E
Initial input is id + id * id $
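The table-driven algorithm can be sketched in a few lines. The slide's parsing table is not reproduced in this transcript, so the `TABLE` below is an assumption: it is the table one obtains for the expression grammar E -> T E’ and so on, written out by hand, with an empty list standing for an ε production. The names `TABLE`, `NONTERMINALS`, and `ll1_parse` are illustrative.

```python
# M[A, a] for the expression grammar; a missing key means ERROR.
TABLE = {
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, start="E"):
    """Return the productions applied, in order (a leftmost derivation)."""
    stack = ["$", start]            # top of stack is the end of the list
    inp = list(tokens) + ["$"]
    i = 0
    used = []
    while True:
        X, a = stack[-1], inp[i]
        if X == "$" and a == "$":
            return used             # accept
        if X not in NONTERMINALS:   # X is a terminal (or $)
            if X != a:
                raise SyntaxError(f"expected {X!r}, saw {a!r}")
            stack.pop()
            i += 1
        else:
            rhs = TABLE.get((X, a))
            if rhs is None:
                raise SyntaxError(f"no entry for M[{X}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))   # push RHS so its first symbol is on top
            used.append((X, rhs))


if __name__ == "__main__":
    for lhs, rhs in ll1_parse("id + id * id".split()):
        print(lhs, "->", " ".join(rhs) or "ε")
```

The explicit stack replaces the procedure-call stack of the recursive version; no recursion is involved.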
Building a predictive parse table
We still don’t know how to create M, the parse table.
The construction requires two functions: FIRST and
FOLLOW.
For a string of grammar symbols α, FIRST(α) is the set
of terminals that can begin a string derived
from α. If α =*> ε, then ε is also in FIRST(α).
FOLLOW(A) for nonterminal A is the set of terminals
that can appear immediately to the right of A in some
sentential form. If A can be the last symbol in a
sentential form, then $ is also in FOLLOW(A).
How to compute FIRST(α)
1. If X is a terminal, FIRST(X) = {X}.
2. Otherwise (X is a nonterminal),
   1. If X -> ε is a production, add ε to FIRST(X)
   2. If X -> Y1 … Yk is a production, then place a in FIRST(X)
      if for some i, a is in FIRST(Yi) and Y1…Yi-1 =*> ε.
      If all of Y1 … Yk derive ε, add ε to FIRST(X).
Given FIRST(X) for all single symbols X,
  Let FIRST(X1…Xn) start as FIRST(X1) without ε
  If ε ∈ FIRST(X1), then add FIRST(X2) without ε, and so on…
  If ε is in every FIRST(Xi), add ε to FIRST(X1…Xn)
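The rules above amount to a fixed-point computation: keep applying them until no FIRST set grows. The following is a minimal sketch, assuming a grammar is stored as a dict from nonterminal to a list of right-hand sides (an empty list stands for ε) and that any symbol that is not a key of the dict is a terminal. The names `GRAMMAR`, `first_of_string`, and `compute_first` are illustrative.

```python
EPS = "ε"

# The expression grammar used throughout these slides.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a string X1 ... Xn, given FIRST sets for the nonterminals."""
    result = set()
    for X in symbols:
        fx = first[X] if X in first else {X}   # FIRST(terminal) = {terminal}
        result |= fx - {EPS}
        if EPS not in fx:
            return result                      # X cannot vanish, so stop here
    result.add(EPS)                            # every symbol can derive ε
    return result


def compute_first(grammar):
    first = {A: set() for A in grammar}
    changed = True
    while changed:                             # repeat until nothing is added
        changed = False
        for A, productions in grammar.items():
            for rhs in productions:
                new = first_of_string(rhs, first)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first


if __name__ == "__main__":
    for A, symbols in compute_first(GRAMMAR).items():
        print(f"FIRST({A}) =", sorted(symbols))
```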
How to compute FOLLOW(A)
1. Place $ in FOLLOW(S) (for S the start symbol).
2. If A -> α B β, then everything in FIRST(β) except ε is placed in FOLLOW(B).
3. If there is a production A -> α B, or a production A -> α B β
   where β =*> ε, then everything in FOLLOW(A) is in FOLLOW(B).
Repeatedly apply these rules until no FOLLOW set changes.
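The FOLLOW rules can be sketched the same way, as a loop that runs until no set changes. To keep the example self-contained, the FIRST sets of the expression grammar are written in by hand here rather than computed; that table, the grammar encoding (dict of right-hand sides, empty list for ε), and the function names are assumptions for illustration.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

# FIRST sets for the nonterminals, entered by hand for this grammar.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}


def first_of_string(symbols):
    result = set()
    for X in symbols:
        fx = FIRST.get(X, {X})          # FIRST(terminal) = {terminal}
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)                     # the whole string can derive ε
    return result


def compute_follow(grammar, start):
    follow = {A: set() for A in grammar}
    follow[start].add(END)                              # rule 1
    changed = True
    while changed:
        changed = False
        for A, productions in grammar.items():
            for rhs in productions:
                for i, B in enumerate(rhs):
                    if B not in grammar:                # only nonterminals have FOLLOW
                        continue
                    beta_first = first_of_string(rhs[i + 1:])
                    new = beta_first - {EPS}            # rule 2
                    if EPS in beta_first:               # rule 3: β is empty or derives ε
                        new |= follow[A]
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow


if __name__ == "__main__":
    for A, symbols in compute_follow(GRAMMAR, "E").items():
        print(f"FOLLOW({A}) =", sorted(symbols))
```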
Example FIRST and FOLLOW
For our favorite grammar:
E -> TE’
E’ -> +TE’ | ε
T -> FT’
T’ -> *FT’ | ε
F -> (E) | id
What are FIRST() and FOLLOW() for all nonterminals?
Parse table construction with FIRST/FOLLOW
Basic idea: if A -> α and a is in FIRST(α), then we expand A to α
any time the current input is a and the top of stack is A.
Algorithm:
For each production A -> α in G, do:
  For each terminal a in FIRST(α), add A -> α to M[A,a]
  If ε ∈ FIRST(α), for each terminal b in FOLLOW(A),
    add A -> α to M[A,b]
  If ε ∈ FIRST(α) and $ is in FOLLOW(A), add A -> α to M[A,$]
Make each undefined entry in M[ ] an ERROR
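The table-filling algorithm can be sketched as follows. FIRST and FOLLOW for the expression grammar are entered by hand so that the block stands alone; those sets, the dict encoding of the grammar, and the name `build_table` are assumptions for illustration. Each table cell holds a list of productions so that multiply defined entries would be visible, and a missing key plays the role of ERROR.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] stands for the ε production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of_string(symbols):
    result = set()
    for X in symbols:
        fx = FIRST.get(X, {X})         # FIRST(terminal) = {terminal}
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)
    return result


def build_table(grammar):
    table = {}
    for A, productions in grammar.items():
        for rhs in productions:
            fs = first_of_string(rhs)
            columns = fs - {EPS}               # every terminal in FIRST(α)
            if EPS in fs:
                columns |= FOLLOW[A]           # includes $ when $ ∈ FOLLOW(A)
            for a in columns:
                table.setdefault((A, a), []).append(rhs)
    return table


if __name__ == "__main__":
    for (A, a), entries in sorted(build_table(GRAMMAR).items()):
        for rhs in entries:
            print(f"M[{A}, {a}] = {A} -> {' '.join(rhs) or EPS}")
```

For this grammar every cell ends up with exactly one production, which is what makes it usable by the table-driven parser.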
Example predictive parse table construction
For our favorite grammar:
E -> TE’
E’ -> +TE’ | ε
T -> FT’
T’ -> *FT’ | ε
F -> (E) | id
What is the predictive parsing table?
LL(1) grammars
The predictive parse table construction can be applied to ANY
grammar.
But sometimes, M[ ] might have multiply defined
entries.
Example: if-else statements after left factoring:
stmt -> if ( expr ) stmt optelse
optelse -> else stmt | ε
When we have “optelse” on the stack and “else” in the
input, we have a choice of how to expand optelse
(“else” is in FOLLOW(optelse) so either rule is
possible)
LL(1) grammars
If the predictive parsing construction for G leads to a
parse table M[ ] WITHOUT multiply defined entries,
we say “G is LL(1)”:
  First “L”: left-to-right scan of the input
  Second “L”: leftmost derivation
  “1”: 1 symbol of lookahead
LL(1) grammars
Necessary and sufficient conditions for G to be LL(1):
For every pair of productions A -> α | β:
1. There does not exist a terminal a such that
   a ∈ FIRST(α) and a ∈ FIRST(β)
2. At most one of α and β derives ε
3. If β =*> ε, then FIRST(α) does not intersect
   with FOLLOW(A).
This is the same as saying the
predictive parser always
knows what to do!
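The three conditions can be checked mechanically once FIRST and FOLLOW are known. The following is a minimal sketch for a single pair of alternatives; the function name `ll1_problems` is illustrative, and the sets passed in the usage lines are worked out by hand. For optelse they assume, as on the earlier slide, that “else” is in FOLLOW(optelse).

```python
EPS = "ε"


def ll1_problems(first_alpha, first_beta, follow_a):
    """Return a list of violated LL(1) conditions for A -> α | β.

    first_alpha, first_beta: FIRST(α) and FIRST(β) as sets (ε allowed).
    follow_a: FOLLOW(A) as a set of terminals (and possibly $).
    """
    problems = []
    common = (first_alpha & first_beta) - {EPS}
    if common:
        problems.append(f"condition 1: both alternatives can start with {sorted(common)}")
    if EPS in first_alpha and EPS in first_beta:
        problems.append("condition 2: both alternatives derive ε")
    # Condition 3 is applied in both directions, since either side may be
    # the one that derives ε.
    for nullable, other in ((first_beta, first_alpha), (first_alpha, first_beta)):
        if EPS in nullable:
            clash = (other - {EPS}) & follow_a
            if clash:
                problems.append(f"condition 3: {sorted(clash)} is in FIRST and in FOLLOW(A)")
    return problems


if __name__ == "__main__":
    # optelse -> else stmt | ε, with FOLLOW(optelse) assumed to be {else, $}
    print(ll1_problems({"else"}, {EPS}, {"else", "$"}))
    # E' -> + T E' | ε, with FOLLOW(E') = {), $}
    print(ll1_problems({"+"}, {EPS}, {")", "$"}))
```

The optelse pair fails condition 3, which is the same fact as the multiply defined entry M[optelse, else]; the E’ pair passes all three.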
Top-down parsing summary
RECURSIVE DESCENT parsers are easy to build, but
inefficient, and might require backtracking.
TRANSITION DIAGRAMS help us build recursive descent
parsers.
For LL(1) grammars, it is possible to build PREDICTIVE
PARSERS with no recursion automatically:
  Compute FIRST() and FOLLOW() for all nonterminals
  Fill in the predictive parsing table
  Use the table-driven predictive parsing algorithm
Bottom-Up Parsing
Bottom-up parsing
Now, instead of starting with the start symbol and
working our way down, we will start at the bottom of
the parse tree and work our way up.
This style of parsing is called SHIFT-REDUCE.
SHIFT refers to pushing input symbols onto a stack.
REDUCE refers to “reduction steps” during a parse:
  We take a substring matching the RHS of a rule
  Then replace it with the symbol on the LHS of the rule
If you can reduce until you have just the start symbol,
you have succeeded in parsing the input string.
Reduction example
Grammar:
S -> aABe
A -> Abc | b
B -> d
Input: abbcbcde
Reduction steps:
abbcbcde
aAbcbcde
aAbcde
aAde
aABe
S    <-- SUCCESS!
In reverse, the reduction traces out a rightmost derivation.
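The reduction steps above can be replayed mechanically. The sketch below does not find handles; it only applies a hand-written list of (position, LHS, RHS) reductions and checks that each RHS really occurs at the stated position. The names `reduce_at`, `STEPS`, and `replay` are illustrative.

```python
def reduce_at(form, pos, lhs, rhs):
    """Replace the occurrence of rhs that starts at index pos with lhs."""
    if form[pos:pos + len(rhs)] != rhs:
        raise ValueError(f"{rhs!r} does not occur at position {pos} of {form!r}")
    return form[:pos] + lhs + form[pos + len(rhs):]


# Hand-chosen handles for the input abbcbcde (grammar: S -> aABe, A -> Abc | b, B -> d).
STEPS = [
    (1, "A", "b"),      # abbcbcde -> aAbcbcde
    (1, "A", "Abc"),    # aAbcbcde -> aAbcde
    (1, "A", "Abc"),    # aAbcde   -> aAde
    (2, "B", "d"),      # aAde     -> aABe
    (0, "S", "aABe"),   # aABe     -> S
]


def replay(sentence, steps=STEPS):
    forms = [sentence]
    for pos, lhs, rhs in steps:
        sentence = reduce_at(sentence, pos, lhs, rhs)
        forms.append(sentence)
    return forms


if __name__ == "__main__":
    print("\n".join(replay("abbcbcde")))
```

Reading the printed list from bottom to top gives the rightmost derivation S => aABe => aAde => aAbcde => aAbcbcde => abbcbcde.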
Handles
The HANDLE is the part of a sentential form that gets
reduced in a backwards rightmost derivation.
Sometimes part of a sentential form will match a RHS in
G, but if that string is NOT reduced in the backwards
rightmost derivation, it is NOT a handle.
Shift-reduce parsing, then, is really all about finding the
handle at each step then reducing the handle.
If we can always find the handle, we never have to
backtrack.
Repeatedly finding and reducing the handle is called HANDLE PRUNING.
Shift-reduce parsing with a stack
A stack helps us find the handle for each reduction step.
The stack holds grammar symbols.
An input buffer holds the input string.
$ marks the bottom of the stack and the end of input.
Algorithm:
1. Shift 0 or more input symbols onto the stack, until a
handle β is on top of the stack.
2. Reduce β to the LHS of the appropriate production.
3. Repeat until we see $S on stack and $ in input.
Shift-reduce example
Grammar:
1. E -> E + E
2. E -> E * E
3. E -> ( E )
4. E -> id
w = id + id * id
STACK    INPUT        ACTION
$        id+id*id$    shift
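The stack mechanics of this example can be sketched with a small driver. Because the grammar is ambiguous, the driver does not decide between shifting and reducing; it executes a hand-written script of actions and checks that each reduction matches the top of the stack. The script below is one valid sequence for id + id * id (it gives * higher precedence than +); the names `shift_reduce` and `SCRIPT` are illustrative.

```python
def shift_reduce(tokens, actions, start="E"):
    """Run a scripted shift-reduce parse and return (stack, input, action) rows."""
    stack, inp, rows = ["$"], list(tokens) + ["$"], []
    for act in actions:
        label = "shift" if act == "shift" else f"reduce {act[0]} -> {' '.join(act[1])}"
        rows.append(("".join(stack), "".join(inp), label))
        if act == "shift":
            if inp[0] == "$":
                raise ValueError("nothing left to shift")
            stack.append(inp.pop(0))           # push the next input symbol
        else:
            lhs, rhs = act
            if stack[-len(rhs):] != list(rhs):
                raise ValueError(f"{rhs} is not on top of the stack")
            del stack[-len(rhs):]              # pop the handle ...
            stack.append(lhs)                  # ... and push the LHS
    if stack != ["$", start] or inp != ["$"]:
        raise ValueError("script did not end in the accepting configuration")
    rows.append(("".join(stack), "".join(inp), "accept"))
    return rows


ID = ("E", ["id"])
SCRIPT = [
    "shift", ID,                    # $E          +id*id$
    "shift", "shift", ID,           # $E+E        *id$
    "shift", "shift", ID,           # $E+E*E      $
    ("E", ["E", "*", "E"]),         # $E+E        $
    ("E", ["E", "+", "E"]),         # $E          $
]

if __name__ == "__main__":
    for stack, inp, action in shift_reduce(["id", "+", "id", "*", "id"], SCRIPT):
        print(f"{stack:<10}{inp:<12}{action}")
```

After the second reduction to E the stack holds E+E with * in the input: the script shifts, but reducing by E -> E + E would also have been legal. That choice is exactly the kind of decision a real shift-reduce parser must make on its own.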
Shift-reduce parsing actions
SHIFT: The next input symbol is pushed onto the
stack.
REDUCE: When the parser knows the right end of a
handle is on the stack, the handle is
replaced with the corresponding LHS.
ACCEPT: Announce success (input is $, stack is $S)
ERROR: The input contained a syntax error; call an
error recovery routine.
Conflicts during shift/reduce parsing
Like predictive parsers, sometimes a shift-reduce
parser won’t know what to do.
A SHIFT/REDUCE conflict occurs when the parser can’t
decide whether to shift the input symbol or reduce
the current top of stack.
A REDUCE/REDUCE conflict occurs when the parser
doesn’t know which of two or more rules to use for
reduction.
A grammar whose shift-reduce parser contains such conflicts is
said to be “not LR”.
Example shift/reduce conflict
Ambiguous grammars are NEVER LR.
stmt -> if ( expr ) stmt
| if ( expr ) stmt else stmt
| other
If we have a shift-reduce parser in configuration
STACK                   INPUT
… if ( expr ) stmt      else … $
what to do?
  We could reduce “if ( expr ) stmt” to “stmt” (assuming the else is
    part of a different surrounding if-else statement)
  We could also shift the “else” (assuming this else goes with the
    current if)
Example reduce/reduce conflict
Some languages use () for function calls AND array refs.
stmt -> id ( parameter_list )
stmt -> expr := expr
parameter_list -> parameter_list , parameter
parameter_list -> parameter
parameter -> id
expr -> id ( expr_list )
expr -> id
expr_list -> expr_list , expr
expr_list -> expr
Example reduce/reduce conflict
For input A(I,J) we would get token stream id(id,id)
The first three tokens would certainly be shifted:
STACK         INPUT
… id ( id     , id ) …
The id on top of the stack needs to be reduced, but we have two
choices: parameter -> id OR expr -> id
The stack gives no clues. To know which rule to use, we need to
look up the first id in the symbol table to see if it is a procedure
name or an array name.
One solution is to have the lexer return “procid” for procedure
names. Then the shift-reduce parser can look into the stack to
decide which reduction to use.
LR (Bottom-Up) Parsers
Relationship between parser types
LR parsing
A major type of shift-reduce parsing is called LR(k).
“L” means left-to-right scanning of the input
“R” means rightmost derivation
“k” means lookahead of k input symbols (if omitted, assume k=1)
LR parsers have very nice properties:
  They can recognize almost all programming language constructs
    for which we can write a CFG
  They are the most powerful type of shift-reduce parser, but they
    never backtrack, and are very efficient
  They can parse a proper superset of the languages parsable by
    predictive parsers
  They tell you as soon as possible when there’s a syntax error.
DISADVANTAGE: hard to build by hand (we need something like
yacc)
LR parsing
The parser’s structure is similar to predictive parsing.
The STACK now stores pairs (Xi, si):
  Xi is a grammar symbol.
  si is a STATE.
The parse table now has two parts: ACTION and GOTO.
The action table specifies whether to SHIFT, REDUCE,
ACCEPT, or flag an ERROR given the state on the
stack and the current input.
The goto table specifies what state to go to after a
reduction is performed.
Parser configurations
A CONFIGURATION of the LR parser is a pair (STACK,
INPUT): ( s0 X1 s1 … Xm sm, ai ai+1 … an $ )
The stack configuration is just a list of the states and
grammar symbols currently on the stack.
The input configuration is the list of unprocessed input
symbols.
Together, the configuration represents a right-sentential
form X1 … Xm ai ai+1 … an (some intermediate step in
a rightmost derivation of the input from the start symbol)
The LR parsing algorithm
At each step, the parser is in some configuration.
The next move depends on reading ai from the input
and sm from the top of the stack.
  If action[sm,ai] = shift s, we execute a SHIFT move, entering
    the configuration ( s0 X1 s1 … Xm sm ai s, ai+1 … an $ ).
  If action[sm,ai] = reduce A -> β, then we enter the
    configuration ( s0 X1 s1 … Xm-r sm-r A s, ai ai+1 … an $ ), where r
    = | β | and s = goto[sm-r,A]. A reduce move does not consume
    any input, so ai is still the next input symbol.
  If action[sm,ai] = accept, we’re done.
  If action[sm,ai] = error, we call an error recovery routine.
LR parsing example
Grammar:
1. E -> E + T
2. E -> T
3. T -> T * F
4. T -> F
5. F -> ( E )
6. F -> id
LR parsing example
CONFIGURATIONS
STACK    INPUT             ACTION
0        id * id + id $    shift 5
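The LR parsing algorithm itself is a short loop once the tables exist. The slide's ACTION/GOTO table is not reproduced in this transcript, so the tables below are an assumption: they are the usual 12-state SLR table for this grammar as given in the standard textbook treatment, and they agree with the first move shown above (state 0 on id is “shift 5”). The sketch keeps only states on the stack, since the grammar symbols are implied by them. The names `ACTION`, `GOTO`, `PRODUCTIONS`, and `lr_parse` are illustrative.

```python
# Production number -> (LHS, length of RHS), numbered as on the grammar slide.
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
               4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

# "sN" = shift and go to state N, "rN" = reduce by production N, "acc" = accept.
ACTION = {
    0:  {"id": "s5", "(": "s4"},
    1:  {"+": "s6", "$": "acc"},
    2:  {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3:  {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4:  {"id": "s5", "(": "s4"},
    5:  {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6:  {"id": "s5", "(": "s4"},
    7:  {"id": "s5", "(": "s4"},
    8:  {"+": "s6", ")": "s11"},
    9:  {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return the list of moves made; raise SyntaxError on an empty table entry."""
    states = [0]
    inp = list(tokens) + ["$"]
    i = 0
    moves = []
    while True:
        act = ACTION[states[-1]].get(inp[i])
        if act is None:
            raise SyntaxError(f"unexpected {inp[i]!r} in state {states[-1]}")
        moves.append(act)
        if act == "acc":
            return moves
        if act[0] == "s":                       # shift: push state, consume input
            states.append(int(act[1:]))
            i += 1
        else:                                   # reduce A -> β: pop |β| states,
            lhs, r = PRODUCTIONS[int(act[1:])]  # then push goto[exposed state, A]
            del states[-r:]
            states.append(GOTO[states[-1]][lhs])


if __name__ == "__main__":
    print(" ".join(lr_parse("id * id + id".split())))
```

Note that a reduce leaves the input index `i` unchanged, matching the configuration rule for reductions.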
LR grammars
If it is possible to construct an LR parse table for G, we
say “G is an LR grammar”.
LR parsers DO NOT need to scan the entire stack to
decide what to do (other shift-reduce parsers might).
Instead, the STATE symbol summarizes all the
information needed to make the decision of what to
do next.
The GOTO function corresponds to a DFA that knows
how to find the HANDLE by reading the top of the
stack downwards.
In the example, we only looked at 1 input symbol at a
time. This means the grammar is LR(1).
How to construct an LR parse table?
We will look at 3 methods:
  Simple LR (SLR): simple but not very powerful
  Canonical LR: very powerful but too many states
  LALR: almost as powerful with many fewer states
yacc uses the LALR algorithm.