Grammars – Chapter 2


Programming Languages
2nd edition
Tucker and Noonan
Chapter 2
Syntax
A language that is simple to parse for the compiler is also simple to
parse for the human programmer.
N. Wirth
Copyright © 2006 The McGraw-Hill Companies, Inc.
Contents
2.1 Grammars
2.1.1 Backus-Naur Form
2.1.2 Derivations
2.1.3 Parse Trees
2.1.4 Associativity and Precedence
2.1.5 Ambiguous Grammars
2.2 Extended BNF
2.3 Syntax of a Small Language: Clite
2.3.1 Lexical Syntax
2.3.2 Concrete Syntax
2.4 Compilers and Interpreters
2.5 Linking Syntax and Semantics
2.5.1 Abstract Syntax
2.5.2 Abstract Syntax Trees
2.5.3 Abstract Syntax of Clite
Translation/Execution – Compiler review
Thinking about Syntax
The syntax of a programming language is a precise
description of all its structurally correct programs.
Grammar rules are a common technique for
describing language syntax precisely.
A precise (formal) definition of syntax was first used with Algol 60.
Three levels:
– Lexical syntax
– Concrete syntax
– Abstract syntax
Levels of Syntax
Lexical syntax: describes the basic symbols of the
language (names, values, operators, etc.)
Concrete syntax: rules for writing expressions,
statements and programs.
Abstract syntax: describes an internal representation
of the program, emphasizes content over form.
The authors define Clite, a mini-language, to use as a
teaching tool in the study of syntax and semantics.
2.1 Grammars
A metalanguage is a language used to define other
languages.
(cf. metaknowledge)
A grammar is a set of rules, written in a
metalanguage, and used to define the syntax of a
language.
2.1.1 Backus-Naur Form (BNF)
Notation for describing a context-free grammar
(see Chomsky hierarchy)
Sometimes called Backus Normal Form
First used to define syntax of Algol 60
Now used to define syntax of most major languages
Elements of a Context-Free Grammar
Set of productions: P
terminal symbols: T
nonterminal symbols: N
start symbol: S ∈ N
A production has the form
A → ω
where A ∈ N and ω is a string of symbols from N ∪ T.
Example: Binary Digits
Consider the grammar:
binaryDigit → 0
binaryDigit → 1
or equivalently:
binaryDigit → 0 | 1
Here, | and → are metacharacters (metasymbols).
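The four elements P, T, N, and S can be sketched as a small Python data structure. This is a minimal sketch, assuming each production is stored as a left-hand side plus a tuple of right-hand-side symbols; the names `Production`, `Grammar`, and `check` are hypothetical, not from the text. `check` tests the constraints stated above: S ∈ N, and every production A → ω has A ∈ N and ω drawn from N ∪ T.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Production:
    """A production A → ω (hypothetical representation)."""
    lhs: str               # A, must be a nonterminal
    rhs: tuple[str, ...]   # ω, a string of symbols from N ∪ T


@dataclass
class Grammar:
    productions: list[Production]  # P
    terminals: set[str]            # T
    nonterminals: set[str]         # N
    start: str                     # S

    def check(self) -> list[str]:
        """Return a list of violations; an empty list means the grammar is well formed."""
        problems = []
        if self.start not in self.nonterminals:
            problems.append(f"start symbol {self.start!r} is not in N")
        symbols = self.terminals | self.nonterminals
        for p in self.productions:
            if p.lhs not in self.nonterminals:
                problems.append(f"LHS {p.lhs!r} is not in N")
            for sym in p.rhs:
                if sym not in symbols:
                    problems.append(f"RHS symbol {sym!r} is not in N ∪ T")
        return problems


# binaryDigit → 0 | 1  (two productions sharing one left-hand side)
binary = Grammar(
    productions=[Production("binaryDigit", ("0",)), Production("binaryDigit", ("1",))],
    terminals={"0", "1"},
    nonterminals={"binaryDigit"},
    start="binaryDigit",
)
print(binary.check())  # []
```

In this encoding the metasymbol | simply corresponds to listing several `Production` objects with the same `lhs`.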
2.1.2 Derivations
Consider the grammar:
Integer → Digit | Integer Digit
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
We can derive any unsigned integer, like 352, from
this grammar.
(Derivations can (1) produce all legal integers or (2)
show that a particular integer is correctly formed)
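Point (1) can be illustrated by enumerating short members of the language. This is a minimal sketch, assuming a dictionary encoding of the grammar (the names `GRAMMAR` and `generate` are hypothetical): starting from Integer, it repeatedly expands the leftmost nonterminal, breadth first, and discards sentential forms longer than the length limit. The pruning is safe here only because every symbol of this grammar derives at least one digit.

```python
from collections import deque

# Integer → Digit | Integer Digit ;  Digit → 0 | 1 | ... | 9
GRAMMAR = {
    "Integer": [["Digit"], ["Integer", "Digit"]],
    "Digit": [[d] for d in "0123456789"],
}


def generate(start: str, max_len: int) -> set[str]:
    """All terminal strings of length <= max_len derivable from start."""
    results = set()
    seen = set()
    queue = deque([[start]])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None if the form is all terminals.
        i = next((k for k, s in enumerate(form) if s in GRAMMAR), None)
        if i is None:
            results.add("".join(form))
            continue
        for rhs in GRAMMAR[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            key = tuple(new)
            # Each symbol yields at least one digit, so longer forms can never fit.
            if len(new) <= max_len and key not in seen:
                seen.add(key)
                queue.append(new)
    return results


print(len(generate("Integer", 2)))  # 110
```

The count 110 is the 10 one-digit strings plus the 100 two-digit strings; note that this grammar permits leading zeros such as 07.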
Derivation of 352 as an Integer
A 6-step process that begins with the start symbol. At each step, replace a nonterminal by the RHS of one of its rules:
Step 1: Integer ⇒ Integer Digit
Step 2: ⇒ Integer Digit Digit
Step 3: ⇒ Digit Digit Digit
Step 4: ⇒ 3 Digit Digit
Step 5: ⇒ 3 5 Digit
Step 6: ⇒ 3 5 2
The derivation is finished when the string contains only terminals.
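The six steps above can be replayed mechanically. This minimal sketch (the names `NONTERMINALS`, `leftmost_derivation`, and the `(lhs, rhs)` step format are hypothetical) applies each chosen rule to the leftmost nonterminal and records every sentential form, rejecting a rule that does not match that nonterminal:

```python
NONTERMINALS = {"Integer", "Digit"}


def leftmost_derivation(start: str, steps: list[tuple[str, list[str]]]) -> list[str]:
    """Apply each (lhs, rhs) rule to the leftmost nonterminal and record every sentential form."""
    form = [start]
    forms = [" ".join(form)]
    for lhs, rhs in steps:
        i = next((k for k, s in enumerate(form) if s in NONTERMINALS), None)
        if i is None or form[i] != lhs:
            raise ValueError(f"rule {lhs} → {' '.join(rhs)} does not apply to the leftmost nonterminal")
        form = form[:i] + rhs + form[i + 1:]
        forms.append(" ".join(form))
    return forms


# The six rule choices from the derivation above, in order.
steps = [
    ("Integer", ["Integer", "Digit"]),
    ("Integer", ["Integer", "Digit"]),
    ("Integer", ["Digit"]),
    ("Digit", ["3"]),
    ("Digit", ["5"]),
    ("Digit", ["2"]),
]
for f in leftmost_derivation("Integer", steps):
    print(f)
```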
A Different Derivation of 352
Integer ⇒ Integer Digit
⇒ Integer 2
⇒ Integer Digit 2
⇒ Integer 5 2
⇒ Digit 5 2
⇒ 3 5 2
This is called a rightmost derivation, since at each
step the rightmost nonterminal is replaced.
(The first one was a leftmost derivation.)
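Because Integer → Integer Digit is left recursive, a rightmost derivation produces the digits from right to left. This minimal sketch (the function name `rightmost_derivation` is hypothetical) builds that derivation for any unsigned integer and reproduces the forms listed above for 352:

```python
DIGITS = set("0123456789")


def rightmost_derivation(number: str) -> list[str]:
    """Sentential forms of the rightmost derivation of `number` from Integer, using
    Integer → Digit | Integer Digit and Digit → 0 | ... | 9."""
    if not number or not set(number) <= DIGITS:
        raise ValueError(f"{number!r} is not an unsigned integer")
    forms = ["Integer"]
    suffix: list[str] = []  # digits already produced at the right end
    for k in range(len(number) - 1, -1, -1):
        # Rewrite Integer: keep the recursion going while digits remain to its left.
        head = ["Integer", "Digit"] if k > 0 else ["Digit"]
        forms.append(" ".join(head + suffix))
        # The rightmost nonterminal is now Digit; rewrite it as the k-th digit.
        suffix = [number[k]] + suffix
        forms.append(" ".join(head[:-1] + suffix))
    return forms


for f in rightmost_derivation("352"):
    print(f)
```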
Notation for Derivations
Integer ⇒* 352
Means that 352 can be derived in a finite number of steps
using the grammar for Integer.
352 ∈ L(G)
Means that 352 is a member of the language defined by
grammar G.
Definition: the language L defined by a BNF grammar
G is the set of all terminal strings that can be derived
from the start symbol.
2.1.3 Parse Trees
A parse tree is a graphical representation of a
derivation.
The root node of the tree is the start symbol.
Each internal node of the tree corresponds to a nonterminal.
The child(ren) of a node represent a right-hand side of a
production for which the node is the left-hand side.
Each leaf node represents a terminal symbol of the derived
string, reading from left to right.
For example, the step Integer ⇒ Integer Digit
appears in the parse tree as:
        Integer
       /       \
   Integer     Digit
Parse Tree for 352 as an Integer (Figure 2.1)
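A parse tree can be represented as nested nodes. This minimal sketch (the `Node` class and the `leaves`, `show`, and `digit` helpers are hypothetical) builds the tree for 352 under the Integer grammar, prints it with children indented under their parent, and reads the leaves from left to right to recover the derived string:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str                                    # nonterminal for internal nodes, terminal for leaves
    children: list[Node] = field(default_factory=list)


def leaves(node: Node) -> str:
    """Concatenate the leaf symbols from left to right (the derived string)."""
    if not node.children:
        return node.symbol
    return "".join(leaves(child) for child in node.children)


def show(node: Node, depth: int = 0) -> None:
    """Print one node per line, children indented under their parent."""
    print("  " * depth + node.symbol)
    for child in node.children:
        show(child, depth + 1)


def digit(d: str) -> Node:
    return Node("Digit", [Node(d)])   # Digit → d


# Integer → Integer Digit, applied twice, then Integer → Digit.
tree = Node("Integer", [
    Node("Integer", [Node("Integer", [digit("3")]), digit("5")]),
    digit("2"),
])
show(tree)
print(leaves(tree))  # 352
```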
Arithmetic Expression Grammar
The following grammar defines the language of
arithmetic expressions with 1-digit integers, addition,
and subtraction.
Expr → Expr + Term | Expr - Term | Term
Term → 0 | ... | 9 | ( Expr )
Parse of the String 5-4+3 (Figure 2.2)
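The left-recursive rule Expr → Expr + Term implies that the last + or - outside parentheses separates an Expr on its left from a single Term on its right. This minimal sketch follows that reading directly (the names `parse_expr`, `parse_term`, and `evaluate` are hypothetical, and the input is assumed to contain no spaces); the resulting tree groups 5-4+3 as (5-4)+3:

```python
def parse_expr(s: str) -> tuple:
    """Expr → Expr + Term | Expr - Term | Term.
    The last + or - outside parentheses splits the Expr on its left from the Term on its right."""
    depth = 0
    for i in range(len(s) - 1, -1, -1):   # scan right to left
        ch = s[i]
        if ch == ")":
            depth += 1
        elif ch == "(":
            depth -= 1
        elif ch in "+-" and depth == 0:
            return ("Expr", parse_expr(s[:i]), ch, parse_term(s[i + 1:]))
    return ("Expr", parse_term(s))            # Expr → Term


def parse_term(s: str) -> tuple:
    """Term → 0 | ... | 9 | ( Expr )"""
    if len(s) == 1 and s in "0123456789":
        return ("Term", s)
    if len(s) >= 2 and s[0] == "(" and s[-1] == ")":
        return ("Term", "(", parse_expr(s[1:-1]), ")")
    raise SyntaxError(f"not a Term: {s!r}")


def evaluate(tree: tuple) -> int:
    if tree[0] == "Term":
        return int(tree[1]) if len(tree) == 2 else evaluate(tree[2])
    if len(tree) == 2:                        # Expr → Term
        return evaluate(tree[1])
    _, left, op, right = tree
    return evaluate(left) + evaluate(right) if op == "+" else evaluate(left) - evaluate(right)


tree = parse_expr("5-4+3")
print(tree)
print(evaluate(tree))  # 4
```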
2.1.4 Associativity and Precedence
Grammars define associativity and precedence among
the operators in an expression.
Precedence: which operator is evaluated first; e.g., in the expression “a + b / c”, the division b / c is evaluated before the addition.
Associativity: evaluation order for adjacent operators of equal precedence; e.g., the expression “a - b + c” is evaluated as (a - b) + c.
Grammar G1
Consider the more interesting grammar G1:
Expr → Expr + Term | Expr - Term | Term
Term → Term * Factor | Term / Factor | Term % Factor | Factor
Factor → Primary ** Factor | Primary
Primary → 0 | ... | 9 | ( Expr )
Parse of 4**2**3+5*6+7 for Grammar G1 (Figure 2.3)
Associativity and Precedence for Grammar G1 (Table 2.1)
Precedence   Associativity   Operators
3            right           **
2            left            * / %
1            left            + -
The structure of the parse tree shows operator precedence & associativity: operators lower in the tree are evaluated first, because an operation can't be performed until its operands are evaluated.
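Table 2.1 can also drive a parser directly. The sketch below uses precedence climbing, a standard technique swapped in here rather than taken from the text; the names `TABLE`, `tokenize`, `parse`, and `group` are hypothetical. It fully parenthesizes an expression so that the grouping implied by the table becomes visible:

```python
import re

# Table 2.1: operator → (precedence, associativity)
TABLE = {
    "**": (3, "right"),
    "*": (2, "left"), "/": (2, "left"), "%": (2, "left"),
    "+": (1, "left"), "-": (1, "left"),
}


def tokenize(text: str) -> list[str]:
    text = text.replace(" ", "")
    tokens = re.findall(r"\*\*|[-+*/%()0-9]", text)   # '**' is tried before '*'
    if "".join(tokens) != text:
        raise SyntaxError(f"unexpected character in {text!r}")
    return tokens


def parse_primary(tokens: list[str]) -> str:
    """Primary → 0 | ... | 9 | ( Expr )"""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == "(":
        inner = parse(tokens, 1)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("missing )")
        return inner
    if tok.isdigit():
        return tok
    raise SyntaxError(f"unexpected token {tok!r}")


def parse(tokens: list[str], min_prec: int) -> str:
    left = parse_primary(tokens)
    while tokens and tokens[0] in TABLE and TABLE[tokens[0]][0] >= min_prec:
        op = tokens.pop(0)
        prec, assoc = TABLE[op]
        # Left-associative: only strictly higher precedence may bind to the right operand.
        # Right-associative: the same precedence level may continue to the right.
        right = parse(tokens, prec + 1 if assoc == "left" else prec)
        left = f"({left} {op} {right})"
    return left


def group(text: str) -> str:
    tokens = tokenize(text)
    result = parse(tokens, 1)
    if tokens:
        raise SyntaxError(f"unexpected token {tokens[0]!r}")
    return result


print(group("4**2**3+5*6+7"))  # (((4 ** (2 ** 3)) + (5 * 6)) + 7)
```

Python's own ** operator is also right-associative, so 4**2**3+5*6+7 evaluates to 65573 in Python, consistent with this grouping.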
Precedence & Associativity in Grammars
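An operator’s precedence is determined by the length of the shortest derivation from the start symbol to the operator (see Figure 2.3): the longer that derivation, the higher the precedence.
Left- or right-associativity is determined by left or right recursion: compare the operators ** (right-recursive rule Factor → Primary ** Factor) and + (left-recursive rule Expr → Expr + Term) in Figure 2.3.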
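The effect of left versus right recursion can be mimicked with folds: a left-recursive rule such as Expr → Expr + Term groups operands from the left, while a right-recursive rule such as Factor → Primary ** Factor groups them from the right. A minimal sketch using `functools.reduce` (the helper names `group_left` and `group_right` are hypothetical):

```python
from functools import reduce


def group_left(op: str, operands: list[str]) -> str:
    """Left recursion: ((a op b) op c) ..."""
    return reduce(lambda acc, x: f"({acc} {op} {x})", operands)


def group_right(op: str, operands: list[str]) -> str:
    """Right recursion: a op (b op (c ...))"""
    return reduce(lambda acc, x: f"({x} {op} {acc})", reversed(operands))


print(group_left("+", ["a", "b", "c"]))    # ((a + b) + c)
print(group_right("**", ["a", "b", "c"]))  # (a ** (b ** c))

# The choice matters for a non-associative operator such as subtraction:
print(reduce(lambda acc, x: acc - x, [5, 4, 3]))  # (5 - 4) - 3 = -2
print(reduce(lambda acc, x: x - acc, [3, 4, 5]))  # 5 - (4 - 3) = 4
```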
2.1.5 Ambiguous Grammars
A grammar is ambiguous if one of its strings has two or
more different parse trees.
C, C++, and Java have a large number of
– operators and
– precedence levels
Instead of using a large grammar, we can:
– Write a smaller ambiguous grammar, and
– Give separate precedence and associativity rules (e.g.,
Table 2.1)
An Ambiguous Expression Grammar G2
Expr → Expr Op Expr | ( Expr ) | Integer
Op → + | - | * | / | % | **
Notes:
– G2 is equivalent to G1; i.e., its language is the same.
– G2 has fewer productions and nonterminals than G1.
– However, G2 is ambiguous.
Ambiguous Parse of 5-4+3 Using Grammar G2 (Figure 2.4)
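Ambiguity can be demonstrated mechanically for short inputs by enumerating every parse tree. This is a minimal sketch, assuming G2's Integer is an unsigned decimal integer and rendering each tree fully parenthesized; the names `tokenize` and `parses` are hypothetical. For 5-4+3 it finds exactly two trees:

```python
import re
from functools import lru_cache

OPS = {"+", "-", "*", "/", "%", "**"}


def tokenize(text: str) -> tuple[str, ...]:
    text = text.replace(" ", "")
    tokens = tuple(re.findall(r"\d+|\*\*|[-+*/%()]", text))
    if "".join(tokens) != text:
        raise SyntaxError(f"unexpected character in {text!r}")
    return tokens


@lru_cache(maxsize=None)
def parses(tokens: tuple[str, ...]) -> tuple[str, ...]:
    """Every parse of tokens as Expr → Expr Op Expr | ( Expr ) | Integer, fully parenthesized."""
    results: list[str] = []
    if len(tokens) == 1 and tokens[0].isdigit():          # Expr → Integer
        results.append(tokens[0])
    if len(tokens) >= 3 and tokens[0] == "(" and tokens[-1] == ")":
        results.extend(parses(tokens[1:-1]))              # Expr → ( Expr ); grouping already shown
    for i in range(1, len(tokens) - 1):                    # Expr → Expr Op Expr: try every Op
        if tokens[i] in OPS:
            for left in parses(tokens[:i]):
                for right in parses(tokens[i + 1:]):
                    results.append(f"({left} {tokens[i]} {right})")
    return tuple(results)


for tree in parses(tokenize("5-4+3")):
    print(tree)
# (5 - (4 + 3))
# ((5 - 4) + 3)
```

The two trees give different values: (5 - 4) + 3 = 4, while 5 - (4 + 3) = -2. This is why G2 needs the separate precedence and associativity rules of Table 2.1.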
The Dangling Else
IfStatement → if ( Expression ) Statement
| if ( Expression ) Statement else Statement
where
Statement → Assignment | IfStatement | Block
What happens when one of these Statements is itself an IfStatement?
Example
With which ‘if’ does the following ‘else’ associate?
if (x < 0)
if (y < 0) y = y - 1;
else y = 0;
Answer: either one!
The Dangling Else Ambiguity (Figure 2.5)
Solving The Dangling Else Ambiguity
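Algol 60, C, C++: associate each else with the closest if; use {} or begin…end to override.
Algol 68, Modula, Ada: use an explicit delimiter to end every conditional (e.g., if…fi).
In the first fragment below, the else belongs to the inner if; in the second, the first fi closes the inner if, so the else belongs to the outer if.
    if (x < 0)
        if (y < 0)
            y = y - 1;
        else
            y = x / y;
        fi;
    fi;

    if (x < 0)
        if (y < 0)
            y = y - 1;
        fi;
    else
        y = x / y;
    fi;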
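The "closest if" rule arises naturally in a recursive-descent parser that checks for else immediately after parsing a then-branch, because the innermost unfinished if sees the else first. This is a minimal sketch over a hypothetical token format (conditions and simple statements are single words, and there are no parentheses or fi delimiters); the name `parse_stmt` is illustrative only:

```python
def parse_stmt(tokens: list[str], pos: int = 0) -> tuple[str, int]:
    """Statement → if Cond Statement [ else Statement ] | Simple.
    Returns a bracketed rendering of the statement and the next token position."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "else":
        raise SyntaxError("else without a matching if")
    if tokens[pos] != "if":
        return tokens[pos], pos + 1                    # a simple statement
    if pos + 1 >= len(tokens):
        raise SyntaxError("missing condition")
    cond = tokens[pos + 1]
    then_part, pos = parse_stmt(tokens, pos + 2)
    # Greedy check: an 'else' is claimed by the innermost 'if' still being parsed.
    if pos < len(tokens) and tokens[pos] == "else":
        else_part, pos = parse_stmt(tokens, pos + 1)
        return f"[if {cond} {then_part} else {else_part}]", pos
    return f"[if {cond} {then_part}]", pos


tokens = "if x<0 if y<0 dec_y else zero_y".split()
tree, _ = parse_stmt(tokens)
print(tree)  # [if x<0 [if y<0 dec_y else zero_y]]
```

Overriding this default requires an explicit delimiter, such as {} in C or fi in the fragments above.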
Solving The Dangling Else Ambiguity
Java: rewrite the grammar to limit what can appear in a
conditional:
IfThenStatement → if ( Expression ) Statement
IfThenElseStatement → if ( Expression ) StatementNoShortIf else Statement
The category StatementNoShortIf includes all
statement types except IfThenStatement.
2.2 Extended BNF (EBNF)
BNF: recursion to represent iteration
EBNF: additional metacharacters represent iteration
– { } braces: show a series of zero or more occurrences
– ( ) parens: pick exactly one from the enclosed list
– [ ] brackets: pick zero or one from the enclosed list
How are metacharacters distinguished from terminal
symbols?
Compare BNF/EBNF Examples
BNF
Expr → Term | Expr + Term | Expr - Term
IfStatement → if ( Expr ) Statement | if ( Expr ) Statement else Statement
EBNF
Expr → Term { ( + | - ) Term }
IfStatement → if ( Expr ) Statement [ else Statement ]
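The EBNF braces map directly onto a loop in a recursive-descent parser, and the ( + | - ) group onto a single choice inside it. This is a minimal sketch that evaluates as it parses; it assumes Term → 0 | ... | 9 | ( Expr ) as in the earlier expression grammar, and the names `parse_expr`, `parse_term`, and `evaluate` are hypothetical:

```python
DIGITS = set("0123456789")


def parse_expr(tokens: list[str], pos: int = 0) -> tuple[int, int]:
    """Expr → Term { ( + | - ) Term }, evaluated; returns (value, next position)."""
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):   # { ... }: zero or more repetitions
        op = tokens[pos]                                       # ( + | - ): exactly one alternative
        rhs, pos = parse_term(tokens, pos + 1)
        value = value + rhs if op == "+" else value - rhs     # folding left to right: left associativity
    return value, pos


def parse_term(tokens: list[str], pos: int) -> tuple[int, int]:
    """Term → 0 | ... | 9 | ( Expr )"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] in DIGITS:
        return int(tokens[pos]), pos + 1
    if tokens[pos] == "(":
        value, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("missing )")
        return value, pos + 1
    raise SyntaxError(f"unexpected token {tokens[pos]!r}")


def evaluate(text: str) -> int:
    tokens = list(text.replace(" ", ""))   # every token is one character in this grammar
    value, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return value


print(evaluate("5-4+3"))    # 4
print(evaluate("5-(4+3)"))  # -2
```

The optional part [ else Statement ] would translate the same way, into a single if test for the else token.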
C-style EBNF
C-style EBNF lists alternatives on separate lines
and uses opt (written as a subscript) to signify optional parts, e.g.,
IfStatement:
    if ( Expression ) Statement ElsePart_opt
ElsePart:
    else Statement
EBNF to BNF
We can always rewrite an EBNF grammar as a BNF
grammar. e.g.,
A → x { y } z
can be rewritten:
A → x A' z
A' → ε | y A'
(Rewriting EBNF rules with ( ), [ ] is left as an exercise.)
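A finite check of this rewrite: the sketch below enumerates strings of the BNF version, bounding the number of times the recursive alternative is used, and confirms each one matches the EBNF rule's meaning, x followed by zero or more y's followed by z. It assumes x, y, and z are single-character terminals, so the EBNF rule can be written as the regular expression xy*z; the names `a_prime`, `a_bnf`, and `EBNF_A` are hypothetical.

```python
import re


def a_prime(max_reps: int) -> list[str]:
    """Strings from A' → ε | y A', using the recursive alternative at most max_reps times."""
    if max_reps == 0:
        return [""]                                                  # only A' → ε
    return [""] + ["y" + rest for rest in a_prime(max_reps - 1)]    # ε, or y followed by A'


def a_bnf(max_reps: int) -> list[str]:
    """Strings from A → x A' z."""
    return ["x" + middle + "z" for middle in a_prime(max_reps)]


EBNF_A = re.compile(r"xy*z")   # A → x { y } z with single-character terminals x, y, z

strings = a_bnf(4)
print(strings)  # ['xz', 'xyz', 'xyyz', 'xyyyz', 'xyyyyz']
print(all(EBNF_A.fullmatch(s) for s in strings))  # True
```

Conversely, every string xyᵏz with k ≤ max_reps appears in the output, so within the bound the two grammars generate the same strings.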
Syntax Diagram for Expressions with Addition (Figure 2.6)
Syntax diagrams are another way to describe grammar rules. They were popularized when they were used to describe the Pascal grammar.
All Three are Equally Powerful
BNF is equivalent to context-free grammars: every BNF grammar is a context-free grammar, and every context-free grammar can be written in BNF.
EBNF is no more (or less) powerful or expressive
than BNF. Its virtue is compactness.
Syntax diagrams are equally expressive.
Summary & Preview
• Grammars
– BNF notation
– Grammars & parse trees
– Grammars, parse trees, associativity & precedence
– Ambiguity in grammars
• Next up:
– Clite syntax
– Lexical and concrete syntax
– Compilers & interpreters
– Abstract syntax