lectures from week 1, 2, 3
Describing Syntax
CS 3360
Spring 2012
Sec 3.1-3.4
Adapted from Addison Wesley’s lecture notes (Copyright © 2004 Pearson Addison Wesley)
Outline
Introduction
Formal description of syntax
Backus-Naur Form (BNF)
Attribute grammars (probably next time)
Introduction
Who must use language definitions?
Implementers
Programmers (the users of the language)
Syntax - the form or structure of the expressions, statements, and program units
Semantics - the meaning of the expressions, statements, and program units
Introduction (cont.)
Example
Syntax of the Java while statement:
while (<boolean-expr>) <statement>
Semantics?
Describing Syntax – Vocabulary
A sentence is a string of characters over
some alphabet
A language is a set of sentences
A lexeme is the lowest level syntactic unit
of a language (e.g., *, sum, while)
A token is a category of lexemes (e.g.,
identifier)
Example
index = 2 * count + 17;
Lexemes    Tokens
index      identifier
=          equal_sign
2          int_literal
*          mult_op
count      identifier
+          plus_op
17         int_literal
;          semicolon
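The lexeme-to-token mapping above can be sketched as a small scanner. This is only an illustration: the token names come from the slide, but the regular expressions and the function name tokenize are assumptions made for this sketch.

```python
import re

# Token categories from the slide; the patterns are assumptions for this sketch.
TOKEN_SPEC = [
    ("identifier",  r"[A-Za-z_]\w*"),
    ("int_literal", r"\d+"),
    ("equal_sign",  r"="),
    ("mult_op",     r"\*"),
    ("plus_op",     r"\+"),
    ("semicolon",   r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return (lexeme, token) pairs; whitespace between lexemes is skipped."""
    pairs = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        # m.lastgroup is the name of the alternative that matched
        pairs.append((m.group(), m.lastgroup))
        pos = m.end()
    return pairs

for lexeme, token in tokenize("index = 2 * count + 17;"):
    print(f"{lexeme:<6} {token}")
```

Each lexeme is the matched text and each token is the category it falls into, which is why index and count share the token identifier.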
Describing Syntax
Formal approaches to describing syntax:
Recognizers (once you have code)
Can tell whether a given string is in a language or not
Used in compilers, where the recognizer is called a parser
Generators (in order to build code)
Generate the sentences of a language
Used to describe the syntax of a language
Formal Methods of Describing Syntax
Context-Free Grammars (CFG; see automata course)
Developed by Noam Chomsky in the mid-1950s
Language generators, meant to describe the syntax of natural languages
Define a class of languages called context-free languages
Formal Methods of Describing Syntax
Backus-Naur Form
Invented by John Backus to describe Algol 58
Extended by Peter Naur to describe Algol 60
BNF is equivalent to context-free grammars
A metalanguage is a language used to describe another language.
In BNF, abstractions are used to represent classes of syntactic structures; they act like syntactic variables (also called nonterminal symbols)
Backus-Naur Form
<while_stmt> -> while ( <logic_expr> ) <stmt>
This is a rule (also called a production rule); it describes the structure of a while statement
Backus-Naur Form
A rule has a left-hand side (LHS) and a right-hand side (RHS), and consists of terminal and non-terminal symbols
A grammar is a finite non-empty set of rules
An abstraction (or non-terminal symbol) can have more than one RHS
<stmt> -> <single_stmt>
| { <stmt_list> }
Backus-Naur Form
Syntactic lists are described using recursion
<ident_list> -> ident
| ident , <ident_list>
Example sentences:
ident
ident , ident
ident , ident , ident
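The recursive rule can be read as a generator of sentences. The following is a minimal sketch, assuming a hypothetical helper named ident_list with a max_idents bound (the bound is not part of the grammar; it only keeps the enumeration finite).

```python
def ident_list(max_idents):
    """Yield the sentences of <ident_list> with at most max_idents identifiers."""
    # <ident_list> -> ident
    yield "ident"
    # <ident_list> -> ident , <ident_list>
    if max_idents > 1:
        for rest in ident_list(max_idents - 1):
            yield "ident , " + rest

for sentence in ident_list(3):
    print(sentence)
```

The recursive call in the second alternative mirrors the recursive use of <ident_list> on the right-hand side of the rule.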
Example
A grammar for a small language:
<program> -> <stmts>
<stmts> -> <stmt> | <stmt> ; <stmts>
<stmt> -> <var> = <expr>
<var> -> a | b | c | d
<expr> -> <term> + <term> | <term> - <term>
<term> -> <var> | 5
Sample program
a = b + 5
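A recognizer for this small language can be sketched with one function per nonterminal. This is an illustrative sketch, not the course's implementation; the name recognize and the assumption that lexemes are separated by spaces are introduced here.

```python
def recognize(text):
    """Return True if text is a sentence of the small language."""
    tokens = text.split()          # assumption: lexemes are separated by spaces
    pos = 0

    def expect(options):
        """Consume the next token if it is one of options."""
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in options:
            pos += 1
            return True
        return False

    def term():                    # <term> -> <var> | 5
        return expect({"a", "b", "c", "d", "5"})

    def expr():                    # <expr> -> <term> + <term> | <term> - <term>
        return term() and expect({"+", "-"}) and term()

    def stmt():                    # <stmt> -> <var> = <expr>
        return expect({"a", "b", "c", "d"}) and expect({"="}) and expr()

    def stmts():                   # <stmts> -> <stmt> | <stmt> ; <stmts>
        if not stmt():
            return False
        if expect({";"}):
            return stmts()
        return True

    # <program> -> <stmts>, and all input must be consumed
    return stmts() and pos == len(tokens)

print(recognize("a = b + 5"))      # the sample program
print(recognize("a = b"))          # not a sentence: <expr> needs two terms
```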
Exercise
Define a grammar to generate all sentences of the form:
subject verb object .
where subject is “i” or “we”, and verb is “love” or “like”,
and object is “exercises” or “programming”.
Exercise
Define the syntax of Java Boolean expressions
consisting of:
Constants: false and true
Operators: !, &&, and ||
Derivation
A derivation is a repeated application of rules,
starting with the start symbol and ending with a
sentence (all terminal symbols)
Example:
<ident_list> -> ident | ident , <ident_list>
<ident_list> => ident , <ident_list>
=> ident , ident , <ident_list>
=> ident , ident , ident
Another Example
<program> -> <stmts>
<stmts> -> <stmt> | <stmt> ; <stmts>
<stmt> -> <var> = <expr>
<var> -> a | b | c | d
<expr> -> <term> + <term>
| <term> - <term>
<term> -> <var> | 5
a = b + 5
<program> => <stmts>
=> <stmt>
=> <var> = <expr>
=> a = <expr>
=> a = <term> + <term>
=> a = <var> + <term>
=> a = b + <term>
=> a = b + 5
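The derivation of a = b + 5 can be replayed mechanically: at each step, the leftmost nonterminal of the current sentential form is replaced by one of its right-hand sides. The sketch below assumes a dictionary encoding of the grammar and a hypothetical function leftmost_derivation that takes the index of the alternative to use at each step.

```python
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>":   [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>":    [["<var>", "=", "<expr>"]],
    "<var>":     [["a"], ["b"], ["c"], ["d"]],
    "<expr>":    [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>":    [["<var>"], ["5"]],
}

def leftmost_derivation(choices, start="<program>"):
    """Apply the chosen alternative to the leftmost nonterminal at each step."""
    form = [start]
    forms = [form]
    for choice in choices:
        # index of the leftmost nonterminal in the current sentential form
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        forms.append(form)
    return [" ".join(f) for f in forms]

# Alternatives chosen in the derivation of a = b + 5
steps = leftmost_derivation([0, 0, 0, 0, 0, 0, 1, 1])
print("\n=> ".join(steps))
```

Every string in steps is a sentential form; the last one contains only terminal symbols.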
Derivation
Every string of symbols in the derivation is a
sentential form
A sentence is a sentential form that has only
terminal symbols
A leftmost derivation is one in which the leftmost
nonterminal in each sentential form is the one
that is expanded
A derivation may be neither leftmost nor
rightmost
Exercise
<program> -> <stmts>
<stmts> -> <stmt> | <stmt> ; <stmts>
<stmt> -> <var> = <expr>
<var> -> a | b | c | d
<expr> -> <term> + <term>
| <term> - <term>
<term> -> <var> | 5
Derive
a = b + 5
by using a rightmost derivation.
Parse Tree
A hierarchical representation of a derivation
            <program>
                |
             <stmts>
                |
             <stmt>
           /    |    \
      <var>     =     <expr>
        |            /   |   \
        a      <term>    +    <term>
                 |               |
               <var>             5
                 |
                 b
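One way to hold such a tree in a program is as nested tuples. This is a sketch under that assumed encoding (symbol first, children after, terminals as plain strings); the names TREE, leaves, and show are introduced here for illustration.

```python
# Parse tree for "a = b + 5" as nested tuples: (symbol, child, child, ...).
# Leaves are plain strings (terminal symbols).
TREE = ("<program>",
        ("<stmts>",
         ("<stmt>",
          ("<var>", "a"),
          "=",
          ("<expr>",
           ("<term>", ("<var>", "b")),
           "+",
           ("<term>", "5")))))

def leaves(node):
    """Collect the terminal symbols of the tree from left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node[1:]:
        result.extend(leaves(child))
    return result

def show(node, depth=0):
    """Return the tree as indented lines, one symbol per line."""
    if isinstance(node, str):
        return ["  " * depth + node]
    lines = ["  " * depth + node[0]]
    for child in node[1:]:
        lines.extend(show(child, depth + 1))
    return lines

print("\n".join(show(TREE)))
print(" ".join(leaves(TREE)))
```

Reading the leaves from left to right gives back the derived sentence, while the interior nodes record which rules were applied.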
Ambiguity of Grammars
A grammar is ambiguous if and only if it
generates a sentential form that has two or
more distinct parse trees.
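An Ambiguous Expression Grammar
<expr> -> <expr> <op> <expr> | 5
<op> -> / | -
Two distinct parse trees for 5 - 5 / 5:
Tree 1, grouping (5 - 5) / 5:
                      <expr>
                  /      |     \
        <expr>         <op>    <expr>
     /    |    \         |       |
 <expr> <op>  <expr>     /       5
   |      |      |
   5      -      5
Tree 2, grouping 5 - (5 / 5):
        <expr>
     /    |        \
 <expr> <op>           <expr>
   |      |          /    |   \
   5      -      <expr> <op> <expr>
                   |      |     |
                   5      /     5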
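The ambiguity can be checked by counting parse trees. The sketch below is an assumption-laden illustration (the function name count_parse_trees and the token-list input are introduced here): it counts, for every span of the input, how many trees the ambiguous grammar allows.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count distinct parse trees for tokens under
    <expr> -> <expr> <op> <expr> | 5   and   <op> -> / | -"""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def expr(i, j):
        """Number of trees in which <expr> derives tokens[i:j]."""
        if j - i == 1:
            return 1 if tokens[i] == "5" else 0
        total = 0
        # tokens[k] plays the role of the top-level <op>
        for k in range(i + 1, j - 1):
            if tokens[k] in ("/", "-"):
                total += expr(i, k) * expr(k + 1, j)
        return total

    return expr(0, len(tokens))

print(count_parse_trees("5 - 5 / 5".split()))
```

A count greater than one for any sentence is enough to show that the grammar is ambiguous.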
An Unambiguous Expression Grammar
If we use the parse tree to indicate the precedence levels of the operators, we can remove this ambiguity
<expr> -> <expr> - <term> | <term>
<term> -> <term> / 5 | 5
The only parse tree for 5 - 5 / 5:
         <expr>
      /     |    \
 <expr>     -     <term>
   |             /   |   \
 <term>     <term>   /    5
   |           |
   5           5
Exercise
Prove or disprove the ambiguity of the following grammar
<stmt> -> <if-stmt>
<if-stmt> -> if <expr> then <stmt>
| if <expr> then <stmt> else <stmt>
Operator Precedence
<expr> -> <expr> - <term> | <term>
<term> -> <term> / 5 | 5
Derivation:
<expr> => <expr> - <term>
=> <term> - <term>
=> 5 - <term>
=> 5 - <term> / 5
=> 5 - 5 / 5
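Because / is introduced lower in the grammar than -, it binds tighter. A small evaluator that follows the two rules makes this visible. This is a sketch: the name evaluate is hypothetical, tokens are assumed to be space-separated, and each left-recursive rule is realized as a loop, which yields the same left-to-right grouping.

```python
def evaluate(text):
    """Evaluate a sentence of
    <expr> -> <expr> - <term> | <term>
    <term> -> <term> / 5 | 5
    using the grouping that the grammar's parse tree implies."""
    tokens = text.split()
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "5":
            raise SyntaxError("expected 5")
        pos += 1
        value = 5.0
        # <term> -> <term> / 5 : left recursion realized as a loop
        while pos + 1 < len(tokens) and tokens[pos] == "/" and tokens[pos + 1] == "5":
            pos += 2
            value = value / 5.0
        return value

    def expr():
        nonlocal pos
        value = term()
        # <expr> -> <expr> - <term> : left recursion realized as a loop
        while pos < len(tokens) and tokens[pos] == "-":
            pos += 1
            value = value - term()
        return value

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result

print(evaluate("5 - 5 / 5"))   # division is grouped first: 5 - (5 / 5)
```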
Operator Associativity
Can we describe operator associativity correctly?
A = A + B + C
(A + B) + C or A + (B + C)?
Does it matter?
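A short experiment suggests that the grouping can matter. The values below are chosen only for illustration: floating-point addition is not associative, and subtraction depends on grouping even for integers.

```python
left = (0.1 + 0.2) + 0.3     # grouping of a left-associative +
right = 0.1 + (0.2 + 0.3)    # grouping of a right-associative +
print(left == right)         # False with IEEE 754 double-precision floats

print((8 - 3) - 2)           # 3
print(8 - (3 - 2))           # 7
```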
Operator Associativity
Operator associativity can also be indicated by a grammar
<expr> -> <expr> + <expr> | 5 (ambiguous)
<expr> -> <expr> + 5 | 5 (unambiguous)
Parse tree for 5 + 5 + 5 under the unambiguous grammar:
            <expr>
          /   |   \
     <expr>   +    5
    /  |  \
<expr> +   5
   |
   5
Left vs. Right Recursion
A rule is left recursive if its LHS also appears at
the beginning (left end) of its RHS.
A rule is right recursive if its LHS also appears
at the right end of its RHS.
<factor> -> <expr> ** <factor> | <expr>
<expr> -> c
Example: c ** c ** c interpreted as c ** (c ** c)
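The effect of right recursion can be made explicit by parsing into nested tuples. This is a minimal sketch; the function name parse_factor and the tuple encoding are assumptions made for illustration.

```python
def parse_factor(tokens):
    """Parse <factor> -> <expr> ** <factor> | <expr>, with <expr> -> c.
    Returns a nested tuple that makes the grouping explicit."""
    if not tokens or tokens[0] != "c":
        raise SyntaxError("expected c")
    if len(tokens) == 1:
        return "c"                       # <factor> -> <expr>
    if tokens[1] != "**":
        raise SyntaxError("expected **")
    # <factor> -> <expr> ** <factor>: the recursion is on the right operand
    return ("c", "**", parse_factor(tokens[2:]))

print(parse_factor("c ** c ** c".split()))
```

The nesting appears on the right operand, which corresponds to the interpretation c ** (c ** c).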
Exercise
Define a BNF grammar for expressions consisting of +, *, and ** (exponentiation). The operator ** has precedence over *, and * has precedence over +. Both + and * are left associative while ** is right associative.
Using the above grammar, draw a parse tree for the sentence:
7 + 6 + 5 * 4 * 3 ** 2 ** 1
Exercise to do in groups at the end of lecture
Extended BNF (EBNF)
Extended BNF (just abbreviations):
Optional parts are placed in brackets ([ ])
<meth_call> -> ident ( [<expr_list>] )
Put alternative parts of RHSs in parentheses and separate them with vertical bars
<term> -> <term> (+ | -) const
Put repetitions (0 or more) in braces ({ })
<ident> -> letter {letter | digit}
Example
BNF:
<expr> -> <expr> + <term>
| <expr> - <term>
| <term>
<term> -> <term> * <factor>
| <term> / <factor>
| <factor>
EBNF:
<expr> -> <term> {(+ | -) <term>}
<term> -> <factor> {(* | /) <factor>}
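The EBNF form maps naturally onto code: each pair of braces becomes a loop. The sketch below is illustrative only. The slide does not define <factor>, so an integer literal is assumed, and the function names expr, term, and factor are introduced here.

```python
def factor(tokens):
    # <factor> is not defined on the slide; an integer literal is assumed here.
    if not tokens or not tokens[0].isdigit():
        raise SyntaxError("expected an integer literal")
    return int(tokens[0]), tokens[1:]

def term(tokens):
    """<term> -> <factor> {(* | /) <factor>} ; returns (value, remaining tokens)."""
    value, rest = factor(tokens)
    while rest and rest[0] in ("*", "/"):      # braces {...} become a loop
        op = rest[0]
        rhs, rest = factor(rest[1:])
        value = value * rhs if op == "*" else value / rhs
    return value, rest

def expr(tokens):
    """<expr> -> <term> {(+ | -) <term>} ; returns (value, remaining tokens)."""
    value, rest = term(tokens)
    while rest and rest[0] in ("+", "-"):      # braces {...} become a loop
        op = rest[0]
        rhs, rest = term(rest[1:])
        value = value + rhs if op == "+" else value - rhs
    return value, rest

print(expr("8 - 4 / 2 * 3".split()))
```

Accumulating the value inside each loop groups operators of the same level from left to right, matching the left recursion of the BNF version.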
Exercise / Homework
Write BNF rules for the following EBNF rules:
1. <meth_call> -> <ident> “(” [<expr_list>] “)”
2. <term> -> <term> (+ | -) const
3. <ident> -> letter {letter | digit}
Due on Tuesday at the start of the session!
Outline
Introduction
Describing syntax formally
Backus-Naur Form (BNF)
Attribute grammars
Attribute Grammars
CFGs cannot describe all of the syntax of
programming languages
Additions to CFGs to carry some semantic
info along through parse trees
Primary value of attribute grammars:
Static semantics specification
Compiler design (static semantics checking)
Basic Idea
Add attributes, attribute computation functions, and predicates to CFGs
Attributes
  Associated with grammar symbols
  Can have values assigned to them
Attribute computation functions
  Associated with grammar rules
  Specify how to compute attribute values
  Are often called semantic functions
Predicate functions
  Associated with grammar rules
  State some of the syntax and static semantic rules of the language
Example
BNF
<meth_def> -> meth <meth_name> <meth_body> end <meth_name>
<meth_name> -> <identifier>
<meth_body> -> …
AG
1. Syntax rule: <meth_def> -> meth <meth_name>[1]
<meth_body>
end <meth_name>[2]
Predicate: <meth_name>[1].string == <meth_name>[2].string
2. Syntax rule: <meth_name> -> <identifier>
Semantic rule: <meth_name>.string <- <identifier>.string
Attribute Grammars Defined
An attribute grammar is a CFG with the following additions:
A set of attributes A(X) for each grammar symbol X
A(X) consists of two disjoint sets S(X) and I(X)
S(X): synthesized attributes
I(X): inherited attributes
Each rule has a set of functions that define certain attributes of the non-terminals in the rule
Each rule has a (possibly empty) set of predicates to check for attribute consistency
Attribute Functions
Let X0 -> X1 ... Xn be a rule
Functions of the form S(X0) = f(A(X1), ... , A(Xn)) define synthesized attributes
Functions of the form I(Xj) = f(A(X0), ... , A(Xn)), for 1 <= j <= n, define inherited attributes.
Often of the form: I(Xj) = f(A(X0), ... , A(Xj-1))
Initially, there are intrinsic attributes on the leaves.
Intrinsic attributes are synthesized attributes whose values are determined outside the parse tree.
Example - Type Checking Rules
BNF
<assign> -> <var> = <expr>
<expr> -> <var> | <var> + <var>
<var> -> A | B | C
Rule
A variable is either int or float.
If the two operands of + have the same type, the type of the expression is that of the operands; otherwise, it is float.
The type of the left side of the assignment must match the type of the right side.
Attributes
actual_type: synthesized for <var> and <expr>
expected_type: inherited for <expr>
string: intrinsic for <var>
Example – Attribute Grammar
1. Syntax rule: <assign> -> <var> = <expr>
Semantic rule: <expr>.expected_type <- <var>.actual_type
2. Syntax rule: <expr> -> <var>[1] + <var>[2]
Semantic rule: <expr>.actual_type <- (<var>[1].actual_type == int
&& <var>[2].actual_type == int) ? int : float
Predicate: <expr>.actual_type == <expr>.expected_type
3. Syntax rule: <expr> -> <var>
Semantic rule: <expr>.actual_type <- <var>.actual_type
Predicate: <expr>.actual_type == <expr>.expected_type
4. Syntax rule: <var> -> A | B | C
Semantic rule: <var>.actual_type <- lookup(<var>.string)
Example – Parse Tree
A = A + B
          <assign>
        /    |     \
   <var>     =      <expr>
     |            /    |    \
     A     <var>[1]    +    <var>[2]
              |                |
              A                B
Example – Flow of Attributes
A = A + B
<expr>.expected_type <- <var>.actual_type
<expr>.actual_type <- (<var>[1].actual_type == int
&& <var>[2].actual_type == int) ? int : float
[Figure: parse tree of A = A + B annotated with attributes. <var>.actual_type flows into <expr>.expected_type; <var>[1].actual_type and <var>[2].actual_type flow into <expr>.actual_type.]
Example – Calculating Attributes
<expr>.expected_type <- <var>.actual_type
<expr>.actual_type <- (<var>[1].actual_type == int
&& <var>[2].actual_type == int) ? int : float
A = A + B, with A of type float and B of type int
<var>.actual_type = float
<expr>.expected_type = float
<var>[1].actual_type = float
<var>[2].actual_type = int
<expr>.actual_type = float
The predicate <expr>.actual_type == <expr>.expected_type holds (float == float)
Example – Calculating Attributes
<expr>.expected_type <- <var>.actual_type
<expr>.actual_type <- (<var>[1].actual_type == int
&& <var>[2].actual_type == int) ? int : float
A = A + B, with A of type int and B of type float
<var>.actual_type = int
<expr>.expected_type = int
<var>[1].actual_type = int
<var>[2].actual_type = float
<expr>.actual_type = float
The predicate <expr>.actual_type == <expr>.expected_type fails (float != int)
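The attribute calculations for A = A + B can be reproduced in a few lines. This is a sketch, not a general attribute evaluator: the function name check_assignment, the dictionary used as the symbol table behind lookup, and the string encoding of types are all assumptions made for illustration.

```python
def check_assignment(target, operands, symbol_table):
    """Decorate <assign> -> <var> = <expr> and evaluate the predicate.
    operands holds one name (<expr> -> <var>) or two (<var>[1] + <var>[2])."""
    lookup = symbol_table.__getitem__           # rule 4: actual_type <- lookup(string)
    expected_type = lookup(target)              # rule 1: inherited by <expr>
    if len(operands) == 2:                      # rule 2: synthesized from both operands
        t1, t2 = lookup(operands[0]), lookup(operands[1])
        actual_type = "int" if t1 == "int" and t2 == "int" else "float"
    else:                                       # rule 3: synthesized from the operand
        actual_type = lookup(operands[0])
    return {"expected_type": expected_type,
            "actual_type": actual_type,
            "predicate": actual_type == expected_type}

# A = A + B with A float, B int, and then with A int, B float
print(check_assignment("A", ["A", "B"], {"A": "float", "B": "int"}))
print(check_assignment("A", ["A", "B"], {"A": "int", "B": "float"}))
```

The inherited attribute is computed from the left side of the assignment before the synthesized one is compared against it, which is the mix of top-down and bottom-up evaluation that attribute grammars typically need.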
Attribute Grammars
How are attribute values computed?
If all attributes were inherited, the tree could be decorated in top-down order.
If all attributes were synthesized, the tree could be decorated in bottom-up order.
In many cases, both kinds of attributes are used, and it is some combination of top-down and bottom-up that must be used.
Group Exercise: homework due Tuesday, February 7 at the start of class
BNF
<cond_expr> -> <expr> ? <expr> : <expr>
<expr> -> <var> | <expr> + <expr>
<var> -> id
Rule
id's type can be bool, int, or float.
Operands of + must be numeric and of the same type.
The type of + is the type of its operands.
The first operand of ?: must be of type bool, and the second and third must be of the same type.
The type of ?: is the type of its second and third operands.
Given the above BNF and rule:
1. Define an attribute grammar
2. Draw a decorated parse tree for “id ? id : id + id” assuming that the first id is of type bool and the rest are of type int.