Programs without Bugs in the 21st Century? (or

LANGUAGE TRANSLATORS:
WEEK 18
LECTURE:
“Shift-Reduce” Parsers: the JavaCup parser generator
CREATES “Shift-Reduce” bottom-up parsers. These are
very commonly used and are more powerful than top-down LL
parsers: they handle a larger class of grammars, including left-recursive ones.
TUTORIAL (after reading week):
How to create a Shift-Reduce Parser
NEXT WEEK is READING WEEK -- WORK:
Do the “Formative Assessment” on the website
Rashid is running “feedback” sessions on the coursework
To Create an SR Parser
This method is embedded in JavaCup.
The Method:
INPUTS a BNF grammar, G (with numbered
rules, tokens and non-terminals as usual)
OUTPUTS an SR parsing table, which can be
used as a parser as shown last week.
Jargon 1: ITEM
An ITEM is a grammar rule with a “DOT”
somewhere in its Right Hand Side.
The DOT represents a notional parsing position.
E.g. for Grammar 3.20 on the handout, here are 3
distinct items:
S ::= ( . L )
L ::= L . , S
S ::= . ( L )
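To make the idea of an item concrete, here is a minimal Java sketch (Java 16+ for records). It assumes Grammar 3.20 is S' ::= S $, S ::= ( L ) | x, L ::= S | L , S, which is consistent with the items on these slides, and numbers the rules 0 to 4 with S' ::= S $ as rule 0; that numbering, the class name `LrItems` and the method `render` are illustrative choices, not JavaCup's. An item is just a rule number plus the DOT position.

```java
import java.util.List;

public class LrItems {
    /** A production: lhs ::= rhs[0] rhs[1] ... */
    record Rule(String lhs, List<String> rhs) {}

    /** An ITEM: a rule number plus the DOT position (0 = before the first RHS symbol). */
    record Item(int rule, int dot) {}

    // Grammar 3.20 as assumed for this sketch; the rule numbering 0..4 is our own choice.
    static final List<Rule> G = List.of(
        new Rule("S'", List.of("S", "$")),       // 0
        new Rule("S", List.of("(", "L", ")")),   // 1
        new Rule("S", List.of("x")),             // 2
        new Rule("L", List.of("S")),             // 3
        new Rule("L", List.of("L", ",", "S")));  // 4

    /** Renders an item with the DOT written into its RHS, e.g. "S ::= ( . L )". */
    static String render(Item it) {
        Rule r = G.get(it.rule());
        StringBuilder sb = new StringBuilder(r.lhs()).append(" ::=");
        for (int i = 0; i <= r.rhs().size(); i++) {
            if (i == it.dot()) sb.append(" .");
            if (i < r.rhs().size()) sb.append(' ').append(r.rhs().get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(new Item(1, 1))); // S ::= ( . L )
        System.out.println(render(new Item(4, 1))); // L ::= L . , S
        System.out.println(render(new Item(1, 0))); // S ::= . ( L )
    }
}
```

Storing the DOT as an integer index keeps items cheap to compare and hash, which matters once sets of items become FSM nodes.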
Jargon 2: Closure of an Item
The CLOSURE of an item R (or set of items R) is the smallest set C
of items such that
(1) C contains R
AND
(2) IF there is a member of C of the form
X ::= w . Y z
where Y is a non-terminal and w, z are any strings, then ALL the
defining production rules of Y must appear in C with the
DOT at the start of their RHS.
E.g. for Grammar 3.20, closure{ S’ ::= . S $ } =
{ S’ ::= . S $, S ::= . ( L ), S ::= . x }
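The closure definition can be computed with a worklist: keep adding "Y ::= . ..." items until nothing new appears. This is a minimal Java sketch under the same assumptions as before (Grammar 3.20 encoded as S' ::= S $, S ::= ( L ) | x, L ::= S | L , S, rules numbered 0 to 4; `LrClosure`, `afterDot` and `closure` are hypothetical names).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LrClosure {
    record Rule(String lhs, List<String> rhs) {}
    record Item(int rule, int dot) {}

    // Grammar 3.20 as assumed for this sketch; rule 0 is S' ::= S $.
    static final List<Rule> G = List.of(
        new Rule("S'", List.of("S", "$")),
        new Rule("S", List.of("(", "L", ")")),
        new Rule("S", List.of("x")),
        new Rule("L", List.of("S")),
        new Rule("L", List.of("L", ",", "S")));

    /** The symbol just right of the DOT, or null if the DOT is at the end. */
    static String afterDot(Item it) {
        List<String> rhs = G.get(it.rule()).rhs();
        return it.dot() < rhs.size() ? rhs.get(it.dot()) : null;
    }

    /** CLOSURE of a set of items R. */
    static Set<Item> closure(Set<Item> items) {
        Set<Item> c = new LinkedHashSet<>(items);   // condition (1): C contains R
        Deque<Item> work = new ArrayDeque<>(items);
        while (!work.isEmpty()) {
            String y = afterDot(work.pop());
            // Condition (2): every rule defining Y joins C with the DOT at the start.
            // Only non-terminals appear as a LHS, so tokens (and null) add nothing.
            for (int k = 0; k < G.size(); k++) {
                Item fresh = new Item(k, 0);
                if (G.get(k).lhs().equals(y) && c.add(fresh)) work.push(fresh);
            }
        }
        return c;
    }

    public static void main(String[] args) {
        // closure{ S' ::= . S $ } = { S' ::= . S $, S ::= . ( L ), S ::= . x },
        // i.e. the items (0,0), (1,0) and (2,0).
        System.out.println(closure(Set.of(new Item(0, 0))));
    }
}
```

Because `c.add` returns false for an item already present, each item is processed once and the loop always terminates.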
Jargon 3: “.” consuming symbols!
The DOT represents a notional parsing position:
S ::= ( . L )
L ::= L . , S
S ::= . ( L )
“The DOT consumes a symbol” means it moves one step to
the right, past that symbol. Only the symbol immediately
to the right of the DOT can be consumed:
L ::= L . , S consumes “,”: this becomes L ::= L , . S
S ::= ( . L ) consumes L: this becomes S ::= ( L . )
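Consuming a symbol is a one-step move of the DOT, allowed only when that symbol is next. A minimal Java sketch, assuming the same encoding of Grammar 3.20 (rules numbered 0 to 4) and a hypothetical `consume` method that returns an empty `Optional` when the symbol cannot be consumed:

```java
import java.util.List;
import java.util.Optional;

public class LrConsume {
    record Rule(String lhs, List<String> rhs) {}
    record Item(int rule, int dot) {}

    // Grammar 3.20 as assumed for this sketch; rule 0 is S' ::= S $.
    static final List<Rule> G = List.of(
        new Rule("S'", List.of("S", "$")),
        new Rule("S", List.of("(", "L", ")")),
        new Rule("S", List.of("x")),
        new Rule("L", List.of("S")),
        new Rule("L", List.of("L", ",", "S")));

    /** The DOT consumes x only if x is the symbol immediately to its right. */
    static Optional<Item> consume(Item it, String x) {
        List<String> rhs = G.get(it.rule()).rhs();
        if (it.dot() < rhs.size() && rhs.get(it.dot()).equals(x))
            return Optional.of(new Item(it.rule(), it.dot() + 1));
        return Optional.empty();   // x is not next, or the DOT is already at the end
    }

    public static void main(String[] args) {
        // L ::= L . , S consumes "," giving L ::= L , . S, i.e. Item(4, 2)
        System.out.println(consume(new Item(4, 1), ","));
        // S ::= ( . L ) consumes L giving S ::= ( L . ), i.e. Item(1, 2)
        System.out.println(consume(new Item(1, 1), "L"));
        // "," is not right of the DOT in S ::= ( . L ), so nothing is consumed
        System.out.println(consume(new Item(1, 1), ","));
    }
}
```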
Generation of SR Parsers –
2 stage process:
1: CREATE A FSM
2: CREATE A PARSING TABLE FROM THE MACHINE
[Figure: layout of the parsing table. Rows = STATE NUMBERS (1, 2, 3, ...); columns = TOKENS AND NON-TERMINALS (w, x, z, ..., W, X, Z, ... etc.)]
Stage 1:
1: CREATE A FSM WITH
EACH NODE = A CLOSURE OF ITEMS
AN ARC COMES OUT OF A NODE FOR EVERY
SYMBOL THAT CAN BE “CONSUMED” BY THE “.”. THE
ARC IS ANNOTATED WITH THE CONSUMED SYMBOL
(NON-TERMINAL OR TOKEN).
Stage 1: HOW TO CREATE THE FSM
1. THE FIRST NODE IS THE CLOSURE OF ALL RULES WITH
THE GENERATING (START) SYMBOL AT THEIR LHS, WITH THE “.” AT THE START OF EACH RHS.
2. TO GENERATE NEW NODES:
(a) EACH ARC FROM A NODE IS GENERATED BY
ASSUMING THAT A “SYMBOL” IS CONSUMED
(one step from left to right) BY THE “.” IN ITEM(S) OF
THE NODE.
(b) CREATE A NEW NODE as the CLOSURE of the item(s)
where the “.” has consumed the symbol annotating the
connecting arc. If an identical node already exists, the arc points to that node instead.
3. END when no new nodes can be created: every symbol that can be consumed in every node already has its arc.
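The FSM construction above can be sketched in Java as a loop over a growing list of nodes. This is an illustration under stated assumptions, not JavaCup's code: Grammar 3.20 is encoded as S' ::= S $, S ::= ( L ) | x, L ::= S | L , S with rules 0 to 4, and, following the usual textbook convention (not stated on the slide), `$` is never consumed, so the node holding S' ::= S . $ becomes the accept state. All class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Lr0Fsm {
    record Rule(String lhs, List<String> rhs) {}
    record Item(int rule, int dot) {}
    record Arc(int from, String symbol, int to) {}   // n ---- symbol ----> m

    // Grammar 3.20 as assumed for this sketch; rule 0 is S' ::= S $.
    static final List<Rule> G = List.of(
        new Rule("S'", List.of("S", "$")),
        new Rule("S", List.of("(", "L", ")")),
        new Rule("S", List.of("x")),
        new Rule("L", List.of("S")),
        new Rule("L", List.of("L", ",", "S")));

    static final List<Set<Item>> states = new ArrayList<>();  // node k+1 is states.get(k)
    static final List<Arc> arcs = new ArrayList<>();

    static String afterDot(Item it) {
        List<String> rhs = G.get(it.rule()).rhs();
        return it.dot() < rhs.size() ? rhs.get(it.dot()) : null;
    }

    static Set<Item> closure(Set<Item> items) {
        Set<Item> c = new LinkedHashSet<>(items);
        Deque<Item> work = new ArrayDeque<>(items);
        while (!work.isEmpty()) {
            String y = afterDot(work.pop());
            for (int k = 0; k < G.size(); k++) {      // only non-terminals match a LHS
                Item fresh = new Item(k, 0);
                if (G.get(k).lhs().equals(y) && c.add(fresh)) work.push(fresh);
            }
        }
        return c;
    }

    static void build() {
        states.clear();
        arcs.clear();
        // 1. First node: closure of the start rule, DOT at the start of its RHS.
        states.add(closure(Set.of(new Item(0, 0))));
        for (int n = 0; n < states.size(); n++) {     // the list grows as nodes are found
            // 2(a). For each symbol the DOT can consume, collect the moved items.
            Map<String, Set<Item>> moved = new LinkedHashMap<>();
            for (Item it : states.get(n)) {
                String x = afterDot(it);
                if (x == null || x.equals("$")) continue;  // DOT at end, or $ (accept)
                moved.computeIfAbsent(x, k -> new LinkedHashSet<>())
                     .add(new Item(it.rule(), it.dot() + 1));
            }
            // 2(b). The target node is the closure of the moved items.
            for (Map.Entry<String, Set<Item>> e : moved.entrySet()) {
                Set<Item> target = closure(e.getValue());
                int m = states.indexOf(target);       // reuse an identical node if one exists
                if (m < 0) {
                    states.add(target);
                    m = states.size() - 1;
                }
                arcs.add(new Arc(n + 1, e.getKey(), m + 1));  // states numbered from 1
            }
        }
        // 3. The loop ends when every node has been processed and no new node appeared.
    }

    public static void main(String[] args) {
        build();
        for (int k = 0; k < states.size(); k++) System.out.println((k + 1) + ": " + states.get(k));
        arcs.forEach(System.out::println);
    }
}
```

For this grammar the sketch finds 9 nodes and 12 arcs; the reuse check in step 2(b) is what stops, for example, the "(" arc from node 3 from spawning endless copies of node 3.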
Stage 2: CREATE THE TABLE FROM THE FSM
1. NUMBER THE STATES 1, 2, 3, ...
2. For a transition n ---- x ----> m where m contains
an item of the form Z ::= w . (the “.” at the far right),
put ‘reduce k’ all along row m, under every token
column, where k is the number of the rule Z ::= w.
3. For a transition n ---- x ----> m where x is a
token, put ‘shift m’ in row n, column x.
4. For a transition n ---- Y ----> m where Y is a non-terminal, put ‘goto m’ in row n, column Y.
If any cell receives two different entries, the grammar has a conflict and this method (LR(0)) cannot parse it.
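Stage 2 is mechanical once the FSM is known. The Java sketch below hard-codes the FSM for Grammar 3.20, worked out by hand with the Stage 1 construction (the state numbering is our own; rules are numbered 0 to 4 with S' ::= S $ as rule 0). It also adds an ‘accept’ entry in column $ of the state holding S' ::= S . $, a common textbook convention that the slide does not state. Entries are written s = shift, r = reduce, g = goto, a = accept; `Lr0Table`, `build` and `fill` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class Lr0Table {
    record Arc(int from, String symbol, int to) {}   // n ---- symbol ----> m

    static final List<String> TOKENS = List.of("(", ")", "x", ",", "$");
    static final List<String> NONTERMINALS = List.of("S", "L");

    // FSM for Grammar 3.20, worked out by hand (state numbering is ours).
    static final List<Arc> ARCS = List.of(
        new Arc(1, "S", 2), new Arc(1, "(", 3), new Arc(1, "x", 4),
        new Arc(3, "L", 5), new Arc(3, "S", 6), new Arc(3, "(", 3), new Arc(3, "x", 4),
        new Arc(5, ")", 7), new Arc(5, ",", 8),
        new Arc(8, "S", 9), new Arc(8, "(", 3), new Arc(8, "x", 4));
    // State m -> number k of the rule Z ::= w . that is complete in m.
    static final Map<Integer, Integer> COMPLETED = Map.of(4, 2, 6, 3, 7, 1, 9, 4);
    static final int ACCEPT_STATE = 2;   // holds S' ::= S . $
    static final int N_STATES = 9;

    static Map<Integer, Map<String, String>> build() {
        Map<Integer, Map<String, String>> t = new TreeMap<>();
        for (int s = 1; s <= N_STATES; s++) t.put(s, new HashMap<>());
        // Rule 2: 'reduce k' all along row m, under every token column.
        COMPLETED.forEach((m, k) -> TOKENS.forEach(tok -> fill(t, m, tok, "r" + k)));
        // Rules 3 and 4: shift on a token arc, goto on a non-terminal arc (both in row n).
        for (Arc a : ARCS)
            fill(t, a.from(), a.symbol(), (TOKENS.contains(a.symbol()) ? "s" : "g") + a.to());
        fill(t, ACCEPT_STATE, "$", "a");   // accept (convention, not on the slide)
        return t;
    }

    /** Fills one cell; a second, different entry means a conflict (not LR(0)). */
    static void fill(Map<Integer, Map<String, String>> t, int row, String col, String action) {
        String old = t.get(row).put(col, action);
        if (old != null && !old.equals(action))
            throw new IllegalStateException("conflict in row " + row + ", column " + col);
    }

    public static void main(String[] args) {
        List<String> cols = new ArrayList<>(TOKENS);
        cols.addAll(NONTERMINALS);
        System.out.println("state\t" + String.join("\t", cols));
        build().forEach((s, row) -> {
            StringBuilder sb = new StringBuilder().append(s);
            for (String c : cols) sb.append('\t').append(row.getOrDefault(c, ""));
            System.out.println(sb);
        });
    }
}
```

Note that a transition into a reduce state (for example 1 ---- x ----> 4) still produces ‘shift 4’ in row 1; rule 2 fills row 4, a different row.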
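LR Parsers - Summary
In this lecture we have seen HOW SR parsers
work and HOW they can be automatically
created from a grammar specification.
NB
The SR parser is an “LR” parser: it reads the
string from Left to right, and its reductions trace a Rightmost
derivation in reverse, so the parse tree is built bottom-up, from the leaves.
The table built in this lecture is LR(0): its reduce entries fill every token column without looking ahead.
Most practical SR parsers are “LR(1)” or a variant of it (JavaCup generates LALR(1) parsers); the “1” means they
look at the 1 next token in the string before deciding what to do.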
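To see the generated table working as a parser, here is a minimal Java sketch of the shift-reduce driving loop. It assumes the LR(0) table for Grammar 3.20 with our own state numbering (rules 1 to 4: S ::= ( L ), S ::= x, L ::= S, L ::= L , S), entries written s = shift, r = reduce, g = goto, a = accept, and input already split into tokens ending in "$". The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SrDriver {
    static final List<String> TOKENS = List.of("(", ")", "x", ",", "$");
    // Rule k: LHS and RHS length (rule 0, S' ::= S $, is never reduced).
    static final String[] LHS = {"S'", "S", "S", "L", "L"};
    static final int[] RHS_LEN = {2, 3, 1, 1, 3};
    static final Map<Integer, Map<String, String>> TABLE = new HashMap<>();
    static {
        TABLE.put(1, Map.of("(", "s3", "x", "s4", "S", "g2"));
        TABLE.put(2, Map.of("$", "a"));
        TABLE.put(3, Map.of("(", "s3", "x", "s4", "S", "g6", "L", "g5"));
        TABLE.put(5, Map.of(")", "s7", ",", "s8"));
        TABLE.put(8, Map.of("(", "s3", "x", "s4", "S", "g9"));
        int[][] reduceRows = {{4, 2}, {6, 3}, {7, 1}, {9, 4}};   // {state, rule}
        for (int[] sr : reduceRows) {
            Map<String, String> row = new HashMap<>();
            for (String tok : TOKENS) row.put(tok, "r" + sr[1]);  // reduce in every token column
            TABLE.put(sr[0], row);
        }
    }

    /** Returns true if the tokens (which must end in "$") are accepted. */
    static boolean parse(List<String> input) {
        Deque<Integer> stack = new ArrayDeque<>(List.of(1));   // start in state 1
        int pos = 0;
        while (true) {
            String act = TABLE.get(stack.peek()).get(input.get(pos));
            if (act == null) return false;                  // empty cell: syntax error
            if (act.equals("a")) return true;               // accept
            int n = Integer.parseInt(act.substring(1));
            if (act.charAt(0) == 's') {                      // shift: push state n, next token
                stack.push(n);
                pos++;
            } else {                                         // reduce by rule n
                for (int i = 0; i < RHS_LEN[n]; i++) stack.pop();
                String g = TABLE.get(stack.peek()).get(LHS[n]);  // goto on the rule's LHS
                stack.push(Integer.parseInt(g.substring(1)));
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("(", "x", ",", "x", ")", "$")));
        System.out.println(parse(List.of("(", "x", ",", ")", "$")));
    }
}
```

On "( x , x ) $" the reductions come out in the order r2, r3, r2, r4, r1, which is the rightmost derivation S ⇒ ( L ) ⇒ ( L , S ) ⇒ ( L , x ) ⇒ ( S , x ) ⇒ ( x , x ) read backwards.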