Grammars – Chapter 2

Programming Languages
2nd edition
Tucker and Noonan
Chapter 2
A language that is simple to parse for the compiler is also simple to
parse for the human programmer.
N. Wirth
2.1 Grammars
2.1.1 Backus-Naur Form
2.1.2 Derivations
2.1.3 Parse Trees
2.1.4 Associativity and Precedence
2.1.5 Ambiguous Grammars
2.2 Extended BNF
2.3 Syntax of a Small Language: Clite
2.3.1 Lexical Syntax
2.3.2 Concrete Syntax
2.4 Compilers and Interpreters
2.5 Linking Syntax and Semantics
2.5.1 Abstract Syntax
2.5.2 Abstract Syntax Trees
2.5.3 Abstract Syntax of Clite
Translation/Execution – Compiler
Thinking about Syntax
The syntax of a programming language is a precise
description of all its structurally correct programs.
Grammar rules are a common technique for
describing language syntax precisely.
Precise syntax was first used with Algol 60.
Three levels:
– Lexical syntax
– Concrete syntax
– Abstract syntax
Levels of Syntax
Lexical syntax: describes the basic symbols of the
language (names, values, operators, etc.)
Concrete syntax: rules for writing expressions,
statements and programs.
Abstract syntax: describes an internal representation
of the program, emphasizes content over form.
The authors define Clite, a mini-language, to use as a
teaching tool in the study of syntax and semantics.
2.1 Grammars
A metalanguage is a language used to define other
languages (cf. metaknowledge).
A grammar is a set of rules, written in a
metalanguage, and used to define the syntax of a
language.
2.1.1 Backus-Naur Form (BNF)
Notation for describing a context-free grammar
(see Chomsky hierarchy)
Sometimes called Backus Normal Form
First used to define syntax of Algol 60
Now used to define syntax of most major languages
Elements of a Context-Free Grammar
Set of productions: P
Terminal symbols: T
Nonterminal symbols: N
Start symbol: S ∈ N
A production has the form
A → ω
where A ∈ N and ω is a string of symbols from N ∪ T.
Example: Binary Digits
Consider the grammar:
binaryDigit → 0
binaryDigit → 1
or equivalently:
binaryDigit → 0 | 1
Here, | and → are metacharacters (metasymbols).
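A minimal sketch of how the four elements (P, T, N, S) might be written down as data. The dictionary layout and the variable names are assumptions made for illustration, not notation from the text.

```python
# A context-free grammar as plain Python data (hypothetical representation).
# Each nonterminal maps to its alternatives; each alternative is a tuple of
# symbols drawn from N and T.
productions = {
    "binaryDigit": [("0",), ("1",)],   # binaryDigit -> 0 | 1
}

nonterminals = set(productions)        # N: every left-hand side

terminals = {                          # T: symbols that never appear on a left-hand side
    symbol
    for alternatives in productions.values()
    for alternative in alternatives
    for symbol in alternative
    if symbol not in productions
}

start = "binaryDigit"                  # S, which must be a member of N

print(sorted(nonterminals))            # ['binaryDigit']
print(sorted(terminals))               # ['0', '1']
```

Note that | and → do not appear in the data at all: they belong to the metalanguage, not to the language being defined.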
2.1.2 Derivations
Consider the grammar:
Integer → Digit | Integer Digit
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
We can derive any unsigned integer, like 352, from
this grammar.
(Derivations can (1) produce all legal integers or (2)
show that a particular integer is correctly formed)
Derivation of 352 as an Integer
A 6-step process that begins with the start symbol.
At each step, replace a nonterminal by the right-hand side (RHS) of one of its rules:
Step 1: Integer ⇒ Integer Digit
Step 2: ⇒ Integer Digit Digit
Step 3: ⇒ Digit Digit Digit
Step 4: ⇒ 3 Digit Digit
Step 5: ⇒ 3 5 Digit
Step 6: ⇒ 3 5 2
The derivation is finished when only terminals remain.
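The derivation of 352 can be reproduced mechanically. The sketch below is specific to the Integer grammar, and the function name `leftmost_derivation` is a hypothetical choice. It always rewrites the leftmost nonterminal and records every sentential form along the way.

```python
def leftmost_derivation(digits):
    """Return the sentential forms of a leftmost derivation of `digits`
    under Integer -> Digit | Integer Digit."""
    if not digits.isdigit():
        raise ValueError("expected a non-empty string of digits")
    form = ["Integer"]
    steps = [list(form)]
    # Expand the leading Integer until there is one slot per digit.
    while len(form) < len(digits):
        form = ["Integer", "Digit"] + form[1:]   # Integer -> Integer Digit
        steps.append(list(form))
    form = ["Digit"] + form[1:]                  # Integer -> Digit
    steps.append(list(form))
    # Replace each Digit, leftmost first, by the matching terminal.
    for i, d in enumerate(digits):
        form = form[:i] + [d] + form[i + 1:]     # Digit -> d
        steps.append(list(form))
    return steps


for form in leftmost_derivation("352"):
    print(" ".join(form))
```

For a string of n digits the function records 2n derivation steps, which gives the six steps for 352.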
A Different Derivation of 352
Integer ⇒ Integer Digit
⇒ Integer 2
⇒ Integer Digit 2
⇒ Integer 5 2
⇒ Digit 5 2
⇒ 3 5 2
This is called a rightmost derivation, since at each
step the rightmost nonterminal is replaced.
(The first one was a leftmost derivation.)
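A rightmost derivation can also be generated by a short program. This sketch is specific to the Integer grammar, and the function name `rightmost_derivation` is a hypothetical choice. It fixes the digits from the last one back to the first, always rewriting the rightmost nonterminal.

```python
def rightmost_derivation(digits):
    """Return the sentential forms of a rightmost derivation of `digits`
    under Integer -> Digit | Integer Digit."""
    if not digits.isdigit():
        raise ValueError("expected a non-empty string of digits")
    form = ["Integer"]
    steps = [list(form)]
    # Handle every digit except the first, starting from the right.
    for d in reversed(digits[1:]):
        form = ["Integer", "Digit"] + form[1:]   # Integer -> Integer Digit
        steps.append(list(form))
        form = ["Integer", d] + form[2:]         # Digit -> d (rightmost nonterminal)
        steps.append(list(form))
    form = ["Digit"] + form[1:]                  # Integer -> Digit
    steps.append(list(form))
    form = [digits[0]] + form[1:]                # Digit -> first digit
    steps.append(list(form))
    return steps


for form in rightmost_derivation("352"):
    print(" ".join(form))
```

The derivation has the same number of steps as a leftmost one because both apply the same set of productions, only in a different order.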
Notation for Derivations
Integer ⇒* 352
Means that 352 can be derived in a finite number of steps
using the grammar for Integer.
352 ∈ L(G)
Means that 352 is a member of the language defined by
grammar G.
Definition: the language L defined by a BNF grammar
G is the set of all terminal strings that can be derived
from the start symbol.
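The definition can be turned into a small enumeration. The sketch below lists every terminal string of bounded length derivable from the start symbol by searching breadth-first over sentential forms. The names `GRAMMAR` and `language` are hypothetical, and the length bound is an assumption needed because L(G) is infinite for the Integer grammar.

```python
from collections import deque

GRAMMAR = {
    "Integer": [["Digit"], ["Integer", "Digit"]],
    "Digit": [[d] for d in "0123456789"],
}


def language(grammar, start, max_len):
    """Terminal strings of length <= max_len derivable from `start`.

    Assumes no production has an empty right-hand side, so sentential
    forms never shrink and over-long forms can be discarded.
    """
    seen = {(start,)}
    queue = deque(seen)
    result = set()
    while queue:
        form = queue.popleft()
        # Locate the leftmost nonterminal, if there is one.
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:                      # only terminals left
            result.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return result


words = language(GRAMMAR, "Integer", 2)
print(len(words))        # 110: ten 1-digit strings plus one hundred 2-digit strings
print("35" in words)     # True
```

Expanding only the leftmost nonterminal is enough here, since every derivable terminal string has a leftmost derivation.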
2.1.3 Parse Trees
A parse tree is a graphical representation of a derivation.
The root node of the tree is the start symbol.
Each internal node of the tree corresponds to a non-terminal
The child(ren) of a node represent a right-hand side of a
production for which the node is the left-hand side.
Each leaf node represents a terminal symbol of the derived
string, reading from left to right.
E.g., the step Integer ⇒ Integer Digit appears in the parse tree as an
Integer node whose children, from left to right, are Integer and Digit.
Figure 2.1: Parse Tree for 352 as an Integer
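Since the figure itself is not reproduced here, a sketch of the same tree as nested tuples may help. The tuple encoding (label first, then children) and the function names are assumptions made for illustration. Reading the leaves from left to right recovers the derived string.

```python
def integer_tree(digits):
    """Parse tree for `digits` under Integer -> Digit | Integer Digit.

    Internal nodes are tuples (label, child, ...); leaves are plain strings.
    """
    if not digits.isdigit():
        raise ValueError("expected a non-empty string of digits")
    tree = ("Integer", ("Digit", digits[0]))          # Integer -> Digit
    for d in digits[1:]:
        tree = ("Integer", tree, ("Digit", d))        # Integer -> Integer Digit
    return tree


def leaves(tree):
    """Collect the terminal symbols of `tree` from left to right."""
    if isinstance(tree, str):
        return [tree]
    result = []
    for child in tree[1:]:
        result.extend(leaves(child))
    return result


tree = integer_tree("352")
print(tree[0])                 # Integer (the root is the start symbol)
print("".join(leaves(tree)))   # 352
```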
Arithmetic Expression Grammar
The following grammar defines the language of
arithmetic expressions with 1-digit integers, addition,
and subtraction.
Expr → Expr + Term | Expr - Term | Term
Term → 0 | ... | 9 | ( Expr )
Figure 2.2: Parse of the String 5-4+3
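The shape of this parse can be reproduced with a small hand-written parser. This is only a sketch: the function names and the tuple encoding of tree nodes are assumptions, and whitespace is not handled. The left-recursive rule is implemented as a loop that keeps wrapping the tree built so far, which yields the same left-leaning tree that the grammar describes.

```python
DIGITS = set("0123456789")


def parse_expr(text):
    """Parse an Expr at the start of `text`; return (tree, remaining text)."""
    tree, rest = parse_term(text)
    tree = ("Expr", tree)                          # Expr -> Term
    while rest[:1] in ("+", "-"):
        op = rest[0]
        term, rest = parse_term(rest[1:])
        tree = ("Expr", tree, op, term)            # Expr -> Expr op Term
    return tree, rest


def parse_term(text):
    """Parse a Term at the start of `text`; return (tree, remaining text)."""
    if text[:1] in DIGITS:
        return ("Term", text[0]), text[1:]         # Term -> 0 | ... | 9
    if text[:1] == "(":
        inner, rest = parse_expr(text[1:])
        if rest[:1] != ")":
            raise SyntaxError("expected ')'")
        return ("Term", "(", inner, ")"), rest[1:]  # Term -> ( Expr )
    raise SyntaxError(f"unexpected input: {text!r}")


tree, rest = parse_expr("5-4+3")
print(tree[2])       # + : the root applies the addition
print(tree[1][2])    # - : the subtraction sits lower, so 5-4 is grouped first
```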
2.1.4 Associativity and Precedence
Grammars define associativity and precedence among
the operators in an expression.
Precedence: which operator is evaluated first; e.g., in the
expression “a + b / c”
Associativity: evaluation order for equal precedence
(adjacent) operators; e.g., in the expression “a - b + c”
Grammar G1:
Consider the more interesting grammar G1:
Expr → Expr + Term | Expr - Term | Term
Term → Term * Factor | Term / Factor |
Term % Factor | Factor
Factor → Primary ** Factor | Primary
Primary → 0 | ... | 9 | ( Expr )
Figure 2.3: Parse of 4**2**3+5*6+7 for Grammar G1
Expr → Expr + Term
 | Expr - Term
 | Term
Term → Term * Factor
 | Term / Factor
 | Term % Factor
 | Factor
Factor → Primary ** Factor
 | Primary
Primary → 0 | ... | 9
 | ( Expr )
Table 2.1: Associativity and Precedence for Grammar G1
Precedence   Associativity   Operators
3            right           **
2            left            * / %
1            left            + -
The structure of the parse tree shows operator
precedence & associativity: operators lower in the tree
are evaluated first.
An operation can’t be performed until its operands have been evaluated.
Precedence & Associativity in Grammars
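An operator’s precedence is determined by the length
of the shortest derivation from the start symbol to
the operator (see Figure 2.3): the longer the derivation,
the higher the precedence.
Left- or right-associativity is determined by left- or
right-recursion in the operator's production.
Compare the operators ** and + in Figure 2.3: ** comes from a
right-recursive rule and groups to the right; + comes from a
left-recursive rule and groups to the left.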
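To see how the grammar's shape fixes precedence and associativity, here is a sketch of an evaluator with one function per nonterminal of G1. The function names, the tokenizer, and the choice of integer (floor) division for / are assumptions, not part of the text. Left-recursive rules become loops (left-associative) and the right-recursive rule for ** becomes a recursive call (right-associative).

```python
import re


def evaluate(text):
    """Evaluate `text` according to grammar G1 (single-digit operands)."""
    source = text.replace(" ", "")
    tokens = re.findall(r"\*\*|[-+*/%()]|[0-9]", source)
    if "".join(tokens) != source:
        raise SyntaxError("illegal character in input")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():                          # Expr -> Expr (+|-) Term | Term
        value = term()
        while peek() in ("+", "-"):      # loop: groups to the left
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():                          # Term -> Term (*|/|%) Factor | Factor
        value = factor()
        while peek() in ("*", "/", "%"):
            op = advance()
            rhs = factor()
            if op == "*":
                value *= rhs
            elif op == "/":
                value //= rhs            # assumption: integer division
            else:
                value %= rhs
        return value

    def factor():                        # Factor -> Primary ** Factor | Primary
        base = primary()
        if peek() == "**":
            advance()
            return base ** factor()      # recursion: groups to the right
        return base

    def primary():                       # Primary -> 0 | ... | 9 | ( Expr )
        tok = peek()
        if tok is not None and tok.isdigit():
            return int(advance())
        if tok == "(":
            advance()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        raise SyntaxError(f"unexpected token: {tok!r}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token: {peek()!r}")
    return result


print(evaluate("4**2**3+5*6+7"))   # 65573, i.e. (4**(2**3)) + (5*6) + 7
print(evaluate("5-4+3"))           # 4, i.e. (5-4)+3
```

Operators handled deeper in the call chain (`factor`, then `term`, then `expr`) bind more tightly, which mirrors the rule that operators lower in the parse tree are evaluated first.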
2.1.5 Ambiguous Grammars
A grammar is ambiguous if one of its strings has two or
more different parse trees.
C, C++, and Java have a large number of
– operators and
– precedence levels
Instead of using a large grammar, we can:
– Write a smaller ambiguous grammar, and
– Give separate precedence and associativity rules (e.g.,
Table 2.1)
An Ambiguous Expression Grammar G2
Expr → Expr Op Expr | ( Expr ) | Integer
Op → + | - | * | / | % | **
– G2 is equivalent to G1, i.e., its language is the same.
– G2 has fewer productions and nonterminals than G1.
– However, G2 is ambiguous.
Figure 2.4: Ambiguous Parse of 5-4+3 Using Grammar G2
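The ambiguity can be checked by brute force. The sketch below enumerates every parse tree that G2 allows for a flat token list and evaluates each one. The function names are hypothetical, parenthesized expressions are left out for brevity, and / is assumed to mean integer division.

```python
import operator

OPS = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "/": operator.floordiv, "%": operator.mod, "**": operator.pow,
}


def parses(tokens):
    """Yield every parse tree of `tokens` under Expr -> Expr Op Expr | Integer.

    Assumes a well-formed token list: integers and operators alternating.
    """
    if len(tokens) == 1:
        yield int(tokens[0])                       # Expr -> Integer
        return
    for i in range(1, len(tokens), 2):             # any operator may be the root
        for left in parses(tokens[:i]):
            for right in parses(tokens[i + 1:]):
                yield (left, tokens[i], right)     # Expr -> Expr Op Expr


def value(tree):
    """Evaluate a parse tree bottom-up."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    return OPS[op](value(left), value(right))


trees = list(parses(["5", "-", "4", "+", "3"]))
for tree in trees:
    print(tree, "=", value(tree))
# (5, '-', (4, '+', 3)) = -2
# ((5, '-', 4), '+', 3) = 4
```

Two distinct trees for one string is exactly what makes G2 ambiguous, and here the two trees also give different values, which is why separate precedence and associativity rules are needed.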
The Dangling Else
IfStatement → if ( Expression ) Statement |
if ( Expression ) Statement else Statement
Statement → Assignment | IfStatement | Block
What if one of the statements is itself an IfStatement?
Copyright © 2006 The McGraw-Hill Companies, Inc.
With which ‘if’ does the following ‘else’ associate?
if (x < 0)
if (y < 0) y = y - 1;
else y = 0;
Answer: either one!
Figure 2.5: The Dangling Else Ambiguity
Solving The Dangling Else Ambiguity
Algol 60, C, C++: associate each else with
closest if; use {} or begin…end to override.
Algol 68, Modula, Ada: use explicit delimiter
to end every conditional (e.g., if…fi)
if (x < 0)
if (y<0)
y = y - 1;
y = x / y;
if (x < 0)
if (y<0)
y = y - 1;
y = x / y;
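The closest-if rule falls out naturally from a recursive-descent parser that consumes an else as soon as it sees one. The sketch below uses a deliberately simplified, hypothetical token format: each condition and each simple statement is a single token, and parentheses, braces, and semicolons are omitted.

```python
def parse_statement(tokens, pos=0):
    """Parse one statement starting at `pos`; return (tree, next position).

    An 'if' token is followed by a one-token condition; any other token
    is treated as a simple statement.
    """
    if tokens[pos] != "if":
        return tokens[pos], pos + 1
    cond = tokens[pos + 1]
    then_part, pos = parse_statement(tokens, pos + 2)
    else_part = None
    # Greedy rule: an 'else' seen here belongs to this, the innermost open 'if'.
    if pos < len(tokens) and tokens[pos] == "else":
        else_part, pos = parse_statement(tokens, pos + 1)
    return ("if", cond, then_part, else_part), pos


tokens = ["if", "x<0", "if", "y<0", "y=y-1", "else", "y=0"]
tree, end = parse_statement(tokens)
print(tree)
# ('if', 'x<0', ('if', 'y<0', 'y=y-1', 'y=0'), None)
```

The else ends up inside the inner if, and the outer if has no else part.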
Solving The Dangling Else Ambiguity
Java: rewrite the grammar to limit what can appear in a conditional:
IfThenStatement → if ( Expression ) Statement
IfThenElseStatement → if ( Expression ) StatementNoShortIf
else Statement
The category StatementNoShortIf includes all
statement types except IfThenStatement.
2.2 Extended BNF (EBNF)
BNF: recursion to represent iteration
EBNF: additional metacharacters represent iteration
– { } braces: show a series of zero or more occurrences
– ( ) parens: pick exactly one from the enclosed list
– [ ] brackets: pick zero or one from the enclosed list
How are metacharacters distinguished from terminal symbols?
Compare BNF/EBNF Examples
BNF:
Expr → Term | Expr + Term | Expr - Term
IfStatement → if ( Expr ) Statement |
 if ( Expr ) Statement else Statement
EBNF:
Expr → Term { ( + | - ) Term }
IfStatement → if ( Expr ) Statement
 [ else Statement ]
C-style EBNF
C-style EBNF lists alternatives on separate lines
and uses the subscript opt to signify optional parts. e.g.,
IfStatement:
 if ( Expression ) Statement ElsePart_opt
ElsePart:
 else Statement
We can always rewrite an EBNF grammar as a BNF
grammar. e.g.,
A → x { y } z
can be rewritten:
A → x A' z
A' → ε | y A'
(Rewriting EBNF rules with ( ), [ ] is left as an exercise.)
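A quick sanity check of the braces rewrite: the sketch below generates the strings of the EBNF rule A → x { y } z and of the BNF pair A → x A' z, A' → ε | y A', each up to a bounded number of repetitions, and compares them. The function names and the repetition bound are assumptions made for illustration.

```python
def ebnf_strings(max_repeats):
    """Strings of A -> x { y } z with at most `max_repeats` copies of y."""
    return {"x" + "y" * n + "z" for n in range(max_repeats + 1)}


def bnf_a_prime(max_repeats):
    """Strings of A' -> ε | y A', applying A' -> y A' at most `max_repeats` times."""
    if max_repeats == 0:
        return {""}                                              # A' -> ε
    shorter = bnf_a_prime(max_repeats - 1)
    return {""} | {"y" + rest for rest in shorter}               # A' -> y A'


def bnf_strings(max_repeats):
    """Strings of A -> x A' z."""
    return {"x" + middle + "z" for middle in bnf_a_prime(max_repeats)}


print(sorted(bnf_strings(2)))                 # ['xyyz', 'xyz', 'xz']
print(ebnf_strings(5) == bnf_strings(5))      # True
```

The iteration expressed by the braces in EBNF is expressed by the recursion in A' in BNF, and both describe the same strings.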
Figure 2.6: Syntax Diagram for Expressions with Addition
Syntax diagrams are another way to describe grammar rules.
They were used to describe the Pascal grammar.
All Three are Equally Powerful
BNF is considered equivalent to context-free
grammars because it can express any context-free
production rule.
EBNF is no more (or less) powerful or expressive
than BNF. Its virtue is compactness.
Syntax diagrams are equally expressive.
Summary & Preview
• Grammars
– BNF notation
– Grammars & parse trees
– Grammars, parse trees, associativity & precedence
– Ambiguity in grammars
• Next up:
– Clite syntax
– Lexical and concrete syntax
– Compilers & interpreters
– Abstract syntax
