Transcript Chapter 1

Chapter 3: Describing Syntax and
Semantics
• Introduction
• Terminology
• Formal Methods of Describing Syntax
• Attribute Grammars – Static Semantics
• Describing the Meanings of Programs: Dynamic Semantics
3-1
Introduction
• Syntax: the form or structure of the expressions,
statements, and program units, e.g., a date written as
DD/DD/DDDD
– lexical specification
– grammar
• Semantics: the meaning of the expressions,
statements, and program units, e.g., in DD/DD/DDDD the
month comes before the day
• Syntax and semantics provide a language’s
definition
– Users of a language definition
• Other language designers
• Implementers
• Programmers (the users of the language)
3-2
Terminology
• A sentence is a string of characters over
some alphabet
• A language is a set of sentences
• A lexeme is the lowest level syntactic unit
of a language (e.g., *, sum, x), given by the
lexical specification
• A token is a category of lexemes (e.g.,
identifier)
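To make the lexeme/token distinction concrete, here is a minimal sketch of a lexical analyzer. The token names (identifier, int_literal, and so on) and the regular expressions are assumptions chosen for illustration; they are not a specification from these slides.

```python
import re

# Hypothetical token categories, chosen only for illustration.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("mult_op", r"\*"),
    ("plus_op", r"\+"),
    ("assign_op", r"="),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token, lexeme) pairs for the input string."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":  # whitespace separates lexemes but is not one
            pairs.append((match.lastgroup, match.group()))
        pos = match.end()
    return pairs

print(tokenize("sum = x * 2"))
```

Here sum and x are two different lexemes that belong to the same token, identifier.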
3-3
Formal Methods of Describing
Syntax
• Context-Free Grammars
– Developed by Noam Chomsky in the mid-1950s
– meant to describe the syntax of natural languages
– Define a class of languages called context-free
languages
• Backus-Naur Form (1959)
– Invented by John Backus to describe Algol 58
– BNF is equivalent to context-free grammars
– The most widely known metalanguage (a language used
to describe another language)
• Extended BNF
– Improves readability and writability of BNF
3-4
Four parts of a Context-Free Grammar
• a set of terminals:
lexemes and tokens, the atomic symbols in the language
• a set of nonterminals:
abstractions used to represent constructs in the language;
they act like syntactic variables
• a set of rules (also called productions):
– each rule identifies the components of a construct
– A rule has a nonterminal as its left-hand side (LHS); its
right-hand side (RHS) may consist of terminal and
nonterminal symbols
– Example of a BNF rule:
<if_stmt> → if <logic_expr> then <stmt>
• a start symbol: a nonterminal chosen as the starting nonterminal
3-5
BNF Rules
• Nonterminals are enclosed between the symbols
“<” and “>”.
• An abstraction (or nonterminal symbol) can have
more than one RHS. Each alternative separated
by “|” is a distinct rule.
• “→” is read as “can be”. “|” is read as “or”.
• Example:
<stmt> → <single_stmt>
| begin <stmt_list> end
• Subscripts, like [1], are sometimes used on the
right side to distinguish between occurrences of
the same construct.
3-6
Describing Lists
• Syntactic lists are described using recursion
<ident_list> → ident
| ident, <ident_list>
• Example: BNF rules for real numbers
<real-number> → <integer-part> . <fraction>
<integer-part> → <digit> | <integer-part><digit>
<fraction> → <digit> | <digit><fraction>
<digit> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
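The recursive rules for real numbers can be mirrored one-for-one by recursive functions. The following is a minimal sketch, not a general parsing technique; the function names are made up for this example.

```python
DIGITS = set("0123456789")

def is_integer_part(s):
    # <integer-part> -> <digit> | <integer-part><digit>
    # Left recursive: the last character is a digit, and whatever
    # precedes it (if anything) is again an integer part.
    if len(s) == 0 or s[-1] not in DIGITS:
        return False
    return len(s) == 1 or is_integer_part(s[:-1])

def is_fraction(s):
    # <fraction> -> <digit> | <digit><fraction>
    # Right recursive: the first character is a digit, and whatever
    # follows it (if anything) is again a fraction.
    if len(s) == 0 or s[0] not in DIGITS:
        return False
    return len(s) == 1 or is_fraction(s[1:])

def is_real_number(s):
    # <real-number> -> <integer-part> . <fraction>
    integer_part, dot, fraction = s.partition(".")
    return dot == "." and is_integer_part(integer_part) and is_fraction(fraction)

for text in ["3.14", "42", "3.", ".5"]:
    print(text, is_real_number(text))
```

Note that the grammar requires at least one digit on each side of the point, so 3. and .5 are rejected.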
3-7
Derivation
• A derivation is a repeated application of rules,
starting with the start symbol and ending with a
sentence (all terminal symbols)
• Every string of symbols in the derivation is a
sentential form, which may contain nonterminals
as well as terminals.
• A leftmost derivation is one in which the
leftmost nonterminal in each sentential form is
the one that is expanded
• A derivation may be neither leftmost nor
rightmost
3-8
Example
• An Example Grammar
<program> → <stmts>
<stmts> → <stmt> | <stmt> ; <stmts>
<stmt> → <var> = <expr>
<var> → a | b | c | d
<expr> → <term> + <term> | <term> - <term>
<term> → <var> | const
• Example of a derivation:
<program> => <stmts> => <stmt>
=> <var> = <expr>
=> a = <expr>
=> a = <term> + <term>
=> a = <var> + <term>
=> a = b + <term>
=> a = b + const
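The derivation of a = b + const is a leftmost derivation, and it can be reproduced mechanically. In this sketch the grammar is stored as a dictionary and each step is given as the index of the alternative to apply to the leftmost nonterminal; that encoding is an assumption made for the example.

```python
# The example grammar; each nonterminal maps to its list of alternatives.
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>": [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"], ["c"], ["d"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
}

def leftmost_derivation(start, choices):
    """Apply the chosen alternative to the leftmost nonterminal at each step."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost nonterminal in the sentential form.
        index = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        form = form[:index] + GRAMMAR[form[index]][choice] + form[index + 1:]
        steps.append(" ".join(form))
    return steps

steps = leftmost_derivation("<program>", [0, 0, 0, 0, 0, 0, 1, 1])
print("\n=> ".join(steps))
```

Every entry of steps except the last is a sentential form that still contains a nonterminal; the last one is the sentence.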
3-9
Parse Tree
• A hierarchical representation of a derivation
Parse tree for the derivation of a = b + const
(children are indented under their parent):
<program>
    <stmts>
        <stmt>
            <var>
                a
            =
            <expr>
                <term>
                    <var>
                        b
                +
                <term>
                    const
3-10
Parse Tree
(cont)
• Each leaf is labeled with a terminal.
• Each non-leaf node is labeled with a nonterminal.
• The label of a non-leaf node is the left side of
some rule, and the labels of the children of the
node, from left to right, form the right side of that
rule.
• The root is labeled with the starting nonterminal.
• A parse tree generates the sentence formed by
reading the terminals at its leaves from left to
right.
• The construction of a parse tree is called parsing.
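The properties above can be checked on a small data structure. The sketch below encodes the parse tree of a = b + const as nested tuples (an encoding assumed for this example) and reads the terminals at the leaves from left to right.

```python
# A non-leaf node is (label, children); a leaf is a plain string (a terminal).
TREE = ("<program>", [
    ("<stmts>", [
        ("<stmt>", [
            ("<var>", ["a"]),
            "=",
            ("<expr>", [
                ("<term>", [("<var>", ["b"])]),
                "+",
                ("<term>", ["const"]),
            ]),
        ]),
    ]),
])

def tree_yield(node):
    """Collect the terminals at the leaves, left to right."""
    if isinstance(node, str):
        return [node]
    _label, children = node
    leaves = []
    for child in children:
        leaves.extend(tree_yield(child))
    return leaves

print(" ".join(tree_yield(TREE)))
```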
3-11
Parser
• top-down parser:
builds the parse tree from the root toward the
leaves
• bottom-up parser:
builds the parse tree from the leaves toward the root
3-12
Ambiguity in Grammars
• A grammar is ambiguous if and only if it
generates a sentential form that has two or
more distinct parse trees
• Ambiguity can be resolved by establishing
conventions.
• Example: dangling-else ambiguity
Consider the following grammar:
<S> → if <E> then <S>
<S> → if <E> then <S> else <S>
Consider the sentential form:
if E1 then if E2 then S1 else S2
3-13
Dangling-else ambiguity
(a) corresponds to: if E1 then (if E2 then S1 else S2)
(b) corresponds to: if E1 then (if E2 then S1) else S2
It is resolved by matching an else with the nearest
unmatched if.
3-14
Expression notations
• prefix notation:
op E1 E2, e.g., * + 20 30 60 = * 50 60 = 3000;
easy to decode during a left-to-right scan of an
expression.
• postfix notation:
E1 E2 op, e.g., 20 30 + 60 * = 50 60 * = 3000;
can be mechanically evaluated with a stack data
structure.
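As a sketch of the stack-based evaluation of postfix notation: operands are pushed, and each operator pops its two operands and pushes the result. The function name and the restriction to integer operands with +, - and * are assumptions made to keep the example short.

```python
import operator

OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_postfix(tokens):
    """Evaluate a postfix expression given as a list of string tokens."""
    stack = []
    for token in tokens:
        if token in OPERATORS:
            right = stack.pop()   # E2 was pushed last
            left = stack.pop()    # E1 was pushed before it
            stack.append(OPERATORS[token](left, right))
        else:
            stack.append(int(token))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]

print(eval_postfix("20 30 + 60 *".split()))
```

No precedence or associativity rules are needed here: the order of the tokens alone determines the operands of each operator.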
3-15
Expression notations
(cont)
• infix notation: E1 op E2
– familiar and easy to read;
– without rules specifying the relative “precedence” of
operators, parentheses would be needed in expressions
to make explicit the operands of an infix operator,
e.g., precedence lets a+b*c stand for a+(b*c).
• An operator is “left associative” if subexpressions
containing multiple occurrences of the operator
are grouped from left to right,
– e.g., 4-2-1 is grouped as (4-2)-1
• An operator is “right associative” if subexpressions
containing multiple occurrences of the operator
are grouped from right to left,
– e.g., x=y=3 is grouped as x=(y=3)
3-16
An Ambiguous Expression Grammar
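<expr> → <expr> <op> <expr> | const
<op> → / | -
Two distinct parse trees for const - const / const
(children are indented under their parent):
Tree with “-” at the root:
<expr>
    <expr>
        const
    <op>
        -
    <expr>
        <expr>
            const
        <op>
            /
        <expr>
            const
Tree with “/” at the root:
<expr>
    <expr>
        <expr>
            const
        <op>
            -
        <expr>
            const
    <op>
        /
    <expr>
        const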
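To see why the ambiguity matters, the sketch below enumerates every parse tree that the grammar <expr> → <expr> <op> <expr> | const allows for a token list and evaluates each one. Two simplifications are assumed for the example: concrete numbers stand in for const, and a tree is stored as a (left, operator, right) tuple without the <expr> and <op> labels.

```python
import operator

OPERATORS = {"-": operator.sub, "/": operator.truediv}

def parse_trees(tokens):
    """All trees for tokens under <expr> -> <expr> <op> <expr> | const."""
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    for i in range(1, len(tokens), 2):  # every operator can be the root
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees

def evaluate(tree):
    if isinstance(tree, str):
        return float(tree)
    left, op, right = tree
    return OPERATORS[op](evaluate(left), evaluate(right))

trees = parse_trees(["8", "-", "4", "/", "2"])
for tree in trees:
    print(tree, "=", evaluate(tree))
```

The same sentence has two trees with two different values, which is why a language definition must not leave the choice open.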
3-17
An Unambiguous Expression Grammar
• If the grammar uses the shape of the parse tree to
indicate the precedence levels of the operators, the
ambiguity disappears
<expr> → <expr> - <term> | <term>
<term> → <term> / const | const
Parse tree for const - const / const
(children are indented under their parent):
<expr>
    <expr>
        <term>
            const
    -
    <term>
        <term>
            const
        /
        const
3-18
Associativity of Operators
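• Operator associativity can also be indicated by a
grammar
• <expr> -> <expr> + <expr> | const (ambiguous)
• <expr> -> <expr> + const | const (unambiguous)
Parse tree for const + const + const under the unambiguous
grammar (children are indented under their parent):
<expr>
    <expr>
        <expr>
            const
        +
        const
    +
    const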
3-19
Associativity of Operators
(cont)
(1) <L> -> <L> + number
| <L> - number
| number
(2) <R> -> number + <R>
| number - <R>
| number
Although both grammars are unambiguous, (1) is more
suitable for left-associative operators, because its parse tree
grows down and to the left, which matches the left-to-right
grouping of the operands.
Parse trees for 4 - 2 - 1 (children are indented under their parent):
Grammar (1):
L
    L
        L
            number 4
        -
        number 2
    -
    number 1
Grammar (2):
R
    number 4
    -
    R
        number 2
        -
        R
            number 1
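The two tree shapes for 4 - 2 - 1 give different results if each tree is evaluated bottom-up. A minimal sketch, assuming subtraction only and a plain list of numbers as input:

```python
from functools import reduce

def eval_left(numbers):
    # Shape of grammar (1): ((4 - 2) - 1), grouped from left to right.
    return reduce(lambda acc, n: acc - n, numbers)

def eval_right(numbers):
    # Shape of grammar (2): (4 - (2 - 1)), grouped from right to left.
    if len(numbers) == 1:
        return numbers[0]
    return numbers[0] - eval_right(numbers[1:])

print(eval_left([4, 2, 1]))   # (4 - 2) - 1
print(eval_right([4, 2, 1]))  # 4 - (2 - 1)
```

Only the left grouping agrees with the usual meaning of subtraction, which is why grammar (1) is the better fit for a left-associative operator.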
3-20
Handling Associativity and Precedence
• The syntax of expressions in a language can be
characterized by a table giving the associativity
and precedence of operators.
• Suppose we have a table, where all operators on
the same line have the same associativity and
precedence. (see the next slide)
• A grammar for expressions can be designed by
choosing a nonterminal for each precedence level,
and an additional nonterminal for the smallest
subexpression (factors).
3-21
Handling Associativity and Precedence
(cont)
• Example with three precedence levels of operators,
listed from lowest to highest precedence:
[A]  =      right associative
[E]  + -    left associative
[T]  * /    left associative
[F]  factors
• The grammar is:
<A> -> <E> = <A> | <E>
<E> -> <E> + <T> | <E> - <T> | <T>
<T> -> <T> * <F> | <T> / <F> | <F>
<F> -> ( <E> ) | name | number
3-22
Extended BNF
• Optional parts are placed in brackets [ ]
<proc_call> -> ident [(<expr_list>)]
• Alternative parts of RHSs are placed
inside parentheses and separated via
vertical bars
<term> → <term> (+|-) const
• Repetitions (0 or more) are placed inside
braces { }
<ident> → letter {letter|digit}
3-23
BNF and EBNF
• BNF
<expr> → <expr> + <term>
| <expr> - <term>
| <term>
<term> → <term> * <factor>
| <term> / <factor>
| <factor>
• EBNF
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
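The EBNF form maps almost directly onto code: each nonterminal becomes a function and each { ... } repetition becomes a while loop. The following is a sketch of a recursive-descent evaluator under one assumption not stated on this slide, namely <factor> → number | ( <expr> ); the class and function names are made up for the example.

```python
import re

def split_tokens(text):
    # Numbers, operators and parentheses; any other character is ignored.
    return re.findall(r"\d+|[-+*/()]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # <expr> -> <term> {(+ | -) <term>}
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        # <term> -> <factor> {(* | /) <factor>}
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            right = self.factor()
            value = value * right if op == "*" else value / right
        return value

    def factor(self):
        # Assumed rule: <factor> -> number | ( <expr> )
        token = self.advance()
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

def evaluate(text):
    parser = Parser(split_tokens(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("unexpected trailing input")
    return value

print(evaluate("4 - 2 - 1"), evaluate("2 + 3 * 4"), evaluate("(2 + 3) * 4"))
```

Because each loop folds its operands into value as it goes, the operators come out left associative, and because expr calls term, * and / bind more tightly than + and -.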
3-24
Attribute Grammars
• Context-free grammars (CFGs) cannot describe all of
the syntax of programming languages. For example,
a CFG cannot express the rule that all variables must
be declared before they are referenced.
• Attribute grammars (AGs): additions to CFGs to carry
some static semantic information along parse trees
• Static semantics are related to the legal form of a
program, not directly related to the meaning of
programs during execution. Many static semantic
rules state the type constraints of a language.
• Primary value of attribute grammars (AGs)
– Static semantics specification
– Compiler design (static semantics checking)
3-25
Attribute Grammars : Definition
• An attribute grammar is a context-free
grammar with the following additions:
– For each grammar symbol x there is a set A(x)
of attribute values
– Each rule has a set of semantic functions that
define certain attributes of the nonterminals in
the rule
– Each rule has a (possibly empty) set of
predicates to check for attribute consistency
3-26
Attribute Grammars: Definition
(cont)
• Let X0 → X1 ... Xn be a rule
• Functions of the form S(X0) = f(A(X1), ... , A(Xn))
define synthesized attributes, which are used to
pass semantic information up a parse tree.
• Functions of the form I(Xj) = f(A(X0), ... , A(Xn)), for
1<= j <= n, define inherited attributes, which pass
semantic information down and across a tree.
• Initially, there are intrinsic attributes on the leaves,
whose values are determined outside the parse
tree. For example, the types of variables come
from the symbol table.
3-27
Attribute Grammars: An Example
• Syntax
<assign> -> <var> = <expr>
<expr> -> <var> + <var> | <var>
<var> -> A | B | C
• Attributes
– actual_type: synthesized for <var> and
<expr>
– expected_type: inherited for <expr>
• We assume the variables can be one of two types:
int or real.
• On the next slide, the lookup function looks up a
given variable name in the symbol table and
returns its type.
3-28
Example of an Attribute Grammar
(cont)
1. Syntax rule: <assign> → <var> = <expr>
Semantic rule: <expr>.expected_type ←
<var>.actual_type
2. Syntax rule: <expr> → <var>[2] + <var>[3]
Semantic rule:
<expr>.actual_type ← if (<var>[2].actual_type ==
int) and (<var>[3].actual_type == int) then int else
real end if
Predicate:
<expr>.actual_type == <expr>.expected_type
3. Syntax rule: <expr> → <var>
Semantic rule:
<expr>.actual_type ← <var>.actual_type
Predicate:
<expr>.actual_type == <expr>.expected_type
4. Syntax rule: <var> → A | B | C
Semantic rule:
<var>.actual_type ← lookup(<var>.string)
3-29
Computing Attribute Values
• How are attribute values computed?
– If all attributes were inherited, the tree could be
decorated in top-down order.
– If all attributes were synthesized, the tree could
be decorated in bottom-up order.
– In many cases, both kinds of attributes are used,
and it is some combination of top-down and
bottom-up that must be used.
3-30
Example of Computing Attribute Values
• For the sentence: A = A + B
1. <var>.actual_type ← lookup(A) (Rule 4)
2. <expr>.expected_type ← <var>.actual_type
(Rule 1)
3. <var>[2].actual_type ← lookup(A) (Rule 4)
<var>[3].actual_type ← lookup(B) (Rule 4)
4. <expr>.actual_type ← either int or real
(Rule 2)
5. <expr>.expected_type ==
<expr>.actual_type is either TRUE or
FALSE (Rule 2)
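The five steps can be traced in code. This is a sketch under stated assumptions: the symbol table contents (A is real, B and C are int) are invented for the example, and the parse tree is flattened into a target name plus a list of operand names instead of being built explicitly.

```python
# Hypothetical symbol table; the slides do not say how A, B and C are declared.
SYMBOL_TABLE = {"A": "real", "B": "int", "C": "int"}

def lookup(name):
    """Intrinsic attribute: the declared type of a variable."""
    return SYMBOL_TABLE[name]

def check_assign(target, operands):
    """Decorate <assign> -> <var> = <expr>; operands holds one or two names."""
    # Rule 4 on the left-hand <var>, then Rule 1 passes the type down (inherited).
    expected_type = lookup(target)
    # Rule 4 on each <var> inside <expr>.
    operand_types = [lookup(name) for name in operands]
    if len(operand_types) == 2:
        # Rule 2: int only if both operands are int, otherwise real (synthesized).
        actual_type = "int" if operand_types == ["int", "int"] else "real"
    else:
        # Rule 3: the single operand's type is passed up (synthesized).
        actual_type = operand_types[0]
    # Predicate of Rule 2 or Rule 3.
    return {
        "expected_type": expected_type,
        "actual_type": actual_type,
        "predicate": actual_type == expected_type,
    }

print(check_assign("A", ["A", "B"]))  # A = A + B
print(check_assign("B", ["A", "B"]))  # B = A + B
```

With these assumed declarations the predicate holds for A = A + B and fails for B = A + B, because a real result would be assigned to an int variable.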
3-31
Semantics
• There is no single widely accepted
notation or formalism for describing
semantics
• Axiomatic Semantics
– Based on formal logic (predicate calculus)
– Axioms or inference rules are defined for each
statement type in the language, to state the
meaning of statements and programs.
– The main purpose is for formal program
verification. We will talk about this in Chapter 8.
3-33