Lecture Data Structures and Practise

Lecture 3
Concepts of Programming
Languages
Arne Kutzner
Hanyang University / Seoul Korea
Topics
• Formal Methods of Describing Syntax /
BNF and Context free grammars
• Attribute Grammars
• Describing the Meanings of Programs /
Semantics
Concepts of Programming Languages
L2.2
Introduction
• Syntax: form and structure of the program
code
• Semantics: meaning of the program code
(meaning of expressions, statements,
control structures, …)
• Syntax and semantics provide a language’s
definition
– Users of a language definition
• Implementers
• Programmers (the users of the language)
General Problem of Describing
Syntax: Terminology
• A sentence is a string of characters
over some alphabet
• A language is a set of sentences
• A lexeme is the lowest level syntactic
unit of a language (e.g., *, sum,
begin)
• A token is a category of lexemes (e.g.,
identifier)
Formal Definition of Languages
• Generators (Grammars)
– A device that generates sentences of a language
– One can determine whether a particular sentence is
syntactically correct by comparing it to the structure
of the generator
• Recognizers (Parsers)
– A recognition device reads input strings over the
alphabet of the language and decides whether the
input strings belong to the language
– Example: syntax analysis part of a compiler
• We will first look at generators and study
recognizers later
BNF / Context-Free Grammars
• Context-Free Grammars (CFG)
– Developed by Noam Chomsky in the mid-1950s
– Language generators, meant to describe the
syntax of natural languages
– Define a class of languages called context-free
languages
• Backus-Naur Form (1959)
– Invented by John Backus to describe Algol 58
• BNF and context-free grammars are
equivalent formalisms
BNF (CFG) Fundamentals
• In BNF, abstractions are used to
represent classes of syntactic
structures--they act like syntactic
variables (also called nonterminal
symbols, or just nonterminals)
• Terminals are lexemes or tokens
• A rule has a left-hand side (LHS),
which is a nonterminal, and a right-hand
side (RHS), which is a string of
terminals and/or nonterminals
BNF / CFG Grammar Rules
• An abstraction (or nonterminal symbol)
can have more than one RHS
<stmt> → <single_stmt>
| begin <stmt_list> end
BNF (CFG) Fundamentals
• Nonterminals are often enclosed in angle
brackets
– Examples of BNF rules:
<ident_list> → identifier | identifier,
<ident_list>
<if_stmt> → if <logic_expr> then <stmt>
• Grammar: a finite non-empty set of rules
• A start symbol is a special element of the
nonterminals of a grammar
CFG - Formal Definition
• A context-free grammar G is defined by the 4-tuple
G = (V, Σ, R, S) where
1. V is a finite set; each element v in V is called a nonterminal character or a variable. Each variable
represents a different type of phrase or clause in the
sentence.
2. Σ is a finite set of terminals, disjoint from V, which make
up the actual content of the sentence. The set of
terminals is the alphabet of the language defined by the
grammar G .
3. R is a finite relation from V to (V ∪ Σ)*, where the asterisk
represents the Kleene star operation. The members of R
are called the (rewrite) rules or productions of the
grammar. (R is also commonly symbolized by P.)
4. S is the start variable (or start symbol), used to
represent the whole sentence (or program). It must be an
element of V.
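The 4-tuple can be written down directly as a data structure. The following is a minimal sketch, not part of the lecture: the class name Grammar, its field names, and the method is_well_formed are assumptions chosen for illustration. It checks the three conditions stated above (Σ disjoint from V, S in V, every rule relating an element of V to a string over V ∪ Σ).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A context-free grammar G = (V, Sigma, R, S). Names are illustrative."""
    nonterminals: frozenset   # V
    terminals: frozenset      # Sigma
    rules: tuple              # R: pairs (lhs, rhs), rhs is a tuple of symbols
    start: str                # S

    def is_well_formed(self) -> bool:
        symbols = self.nonterminals | self.terminals
        return (
            # Sigma is disjoint from V
            self.nonterminals.isdisjoint(self.terminals)
            # S is an element of V
            and self.start in self.nonterminals
            # every rule maps an element of V to a string over V and Sigma
            and all(
                lhs in self.nonterminals and all(sym in symbols for sym in rhs)
                for lhs, rhs in self.rules
            )
        )


# The identifier-list grammar written as a 4-tuple.
ident_list_grammar = Grammar(
    nonterminals=frozenset({"<ident_list>"}),
    terminals=frozenset({"ident", ","}),
    rules=(
        ("<ident_list>", ("ident",)),
        ("<ident_list>", ("ident", ",", "<ident_list>")),
    ),
    start="<ident_list>",
)

print(ident_list_grammar.is_well_formed())
```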
Example: Describing Lists
• Syntactic lists are described using recursion
<ident_list> → ident
| ident ',' <ident_list>
Derivation
• A derivation is a repeated application
of rules, starting with the start symbol
and ending with a sentence (all terminal
symbols)
An Example Grammar
<program> → <stmts>
<stmts> → <stmt> | <stmt> ';' <stmts>
<stmt> → <var> '=' <expr>
<var> → 'a' | 'b' | 'c' | 'd'
<expr> → <term> '+' <term> | <term> '-' <term>
<term> → <var> | 'const'
An Example Derivation
<program> => <stmts> => <stmt>
=> <var> = <expr>
=> a = <expr>
=> a = <term> + <term>
=> a = <var> + <term>
=> a = b + <term>
=> a = b + const
Derivations – Some Notions
• Every string of symbols in a derivation is a
sentential form
• A sentence is a sentential form that has only
terminal symbols
• A leftmost derivation is one in which the
leftmost nonterminal in each sentential form
is the one that is expanded. Accordingly, a
rightmost derivation always expands the
rightmost nonterminal.
– A derivation may be neither leftmost nor rightmost
Parse Tree
• A parse tree is a hierarchical
representation of a derivation.
Example (parse tree for a = b + const):
<program>
  <stmts>
    <stmt>
      <var>
        a
      =
      <expr>
        <term>
          <var>
            b
        +
        <term>
          const
Ambiguity in Grammars
• A grammar is ambiguous if and only if
it generates a sentential form that has
two or more distinct parse trees (that is,
the language generated by the grammar
comprises at least one word with more
than one parse tree)
Example for Ambiguous
Grammar
<expr> → <expr> <op> <expr> | const
<op> → / | -
Example word: const - const / const
Parse tree 1, grouping (const - const) / const:
<expr>
  <expr>
    <expr>
      const
    <op>
      -
    <expr>
      const
  <op>
    /
  <expr>
    const
Parse tree 2, grouping const - (const / const):
<expr>
  <expr>
    const
  <op>
    -
  <expr>
    <expr>
      const
    <op>
      /
    <expr>
      const
Unambiguous Grammar
• The grammar from the previous slide in
unambiguous form:
– The rule structure yields a form of operator
precedence (/ binds tighter than -)
<expr> → <expr> - <term> | <term>
<term> → <term> / const | const
Parse tree for const - const / const:
<expr>
  <expr>
    <term>
      const
  -
  <term>
    <term>
      const
    /
    const
Associativity of Operators
• Operator associativity can also be indicated by
a grammar. Example:
<expr> -> <expr> + <expr> | const
(ambiguous)
<expr> -> <expr> + const | const
(unambiguous)
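(Figure: parse tree for const + const + const)
With the left-recursive rule, const + const + const is grouped as
(const + const) + const, so + is left-associative.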
The empty word ε
• Grammars can comprise rules of the
form: <A> → ε
• ε is the empty word; the above rule
indicates that the nonterminal <A> can
be deleted without any replacement
• We will use the empty word later in the
context of parsers.
Extended BNF
• Optional parts are placed in brackets [ ]
<proc_call> -> ident [(<expr_list>)]
• Alternative parts of RHSs are placed
inside parentheses and separated via
vertical bars
<term> -> <term> (+|-) const
• Repetitions (0 or more) are placed inside
braces { }
<ident> -> letter {letter|digit}
BNF and EBNF - Examples
• BNF
<expr> → <expr> + <term>
| <expr> - <term>
| <term>
<term> → <term> * <factor>
| <term> / <factor>
| <factor>
• EBNF
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
Variations in EBNF
• Use of a colon instead of ->
• Use of opt for optional parts
• Use of oneof for choices
Static Semantics
• Nothing to do with meaning
• Context-free grammars (CFGs) cannot
describe all of the syntax of programming
languages
• Categories of constructs that cause trouble:
– Still context-free, but cumbersome (e.g., types of
operands in expressions)
– Non-context-free (e.g., variables must be declared
before they are used)
Attribute Grammars
• Attribute grammars (AGs) have
additions to CFGs to carry some
semantic info on parse tree nodes
• Primary value of AGs:
– Static semantics specification for static
semantics checks
Attribute Grammars : Definition
• Def: An attribute grammar is a context-free
grammar with the following additions:
– For each grammar symbol x there is a set A(x) of
attribute values
– Each rule has a set of functions that define
certain attributes of the nonterminals in the rule
– Each rule has a (possibly empty) set of
predicates to check for attribute consistency
Attribute Grammars: Definition
• Let X0 → X1 ... Xn be a rule
• Functions of the form S(X0) = f(A(X1), ... ,
A(Xn)) define synthesized attributes
• Functions of the form I(Xj) = f(A(X0), ... ,
A(Xn)), for 1 <= j <= n, define inherited
attributes
• Initially, there are intrinsic attributes on the
leaves
Attribute Grammars: An Example
• Syntax
<assign> -> <var> = <expr>
<expr> -> <var> + <var> | <var>
<var> -> id
• Attributes
– actual_type: synthesized with the nonterminals <var> and <expr>
– expected_type: inherited with the nonterminal <expr>
Attribute Grammar (Example)
• Syntax rule: <assign> → <var> = <expr>
Semantic rule:
<expr>.expected_type ← <var>.actual_type
• Syntax rule: <var> → id
Semantic rule:
<var>.actual_type ← lookup (id.type)
• Syntax rule: <expr> → <var>
Semantic rule:
<expr>.actual_type ← <var>.actual_type
Predicate:
<expr>.expected_type == <expr>.actual_type
• Syntax rule: <expr> → <var>[1] + <var>[2]
Semantic rule:
<expr>.actual_type ← <var>[1].actual_type
Predicates:
<var>[1].actual_type == <var>[2].actual_type
<expr>.expected_type == <expr>.actual_type
Attribute Grammars (continued)
• How are attribute values computed?
– If all attributes were synthesized, the tree
could be decorated in bottom-up order.
– If all attributes were inherited, the tree
could be decorated in top-down order.
– In many cases, both kinds of attributes are
used, and some combination of top-down
and bottom-up order must be used.
Semantics
• There is no single widely accepted
notation or formalism for describing program
semantics
• We have several general approaches:
– Operational Semantics
(Behavior oriented)
– Axiomatic Semantics
(Formal Logic, Predicate calculus oriented)
– Denotational Semantics
(Domain oriented)
Operational Semantics
• Approach 1:
Describe the meaning of the language
constructs using clear and unambiguous
natural language mixed with formal elements
Example:
Current ISO C++ standard (ISO/IEC 14882:2012(E))
Operational Semantics
(continued)
• Approach 2: Use some form of formal
method/machinery as description tool.
Examples:
– Vienna Definition Language (VDL)
http://en.wikipedia.org/wiki/Vienna_Development_Method
– Formal (state) machine approach. You get the meaning of
some code by executing it on a formal machine
• May require a translation scheme for getting formal machine
code out of language code.
– Use a mathematical formal system like a calculus as
replacement for the formal machine in the above approach
Operational Semantics
(continued)
• Uses of operational semantics:
– Language definition itself for compiler
constructors as well as language users.
– Proving the correctness of automatic
program transformations/optimizations.
• Later we will meet some form of
operational semantics ...
Axiomatic Semantics
• Based on formal logic (predicate calculus)
• Original purpose:
Formal program verification
• Axioms or inference rules are defined for
each statement type in the language (to allow
transformations of logic expressions into
more formal logic expressions)
• The logic expressions are called assertions
Logic Implications
• P implies Q (symbolized as P⇒ Q)
Meaning:
– If condition P is true, then condition Q must be true
– If condition P is false, the implication is true
independent of the value of Q
As logic table:
P   Q   P⇒Q
T   T   T
T   F   F
F   T   T
F   F   T
Logic Implications
• Example: b>10 ⇒ b>5
Proof: (Range check for b)
b in [11..infinity]: b>10 is true and b>5 is true
b in [6..10]: b>10 is false and b>5 is true
b in [-infinity..5]: b>10 is false and b>5 is false
• Intuition: Over the range of all integers,
b>10 is false “more often” than b>5.
So the condition b>10 is less general and
more restrictive; we say it is a stronger
condition.
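Both the logic table and the range check can be reproduced in a few lines. This is a sketch for illustration: the function name implies is an assumption, and the range check runs over a finite sample of integers, so it illustrates the claim rather than proving it.

```python
from itertools import product


def implies(p, q):
    """Material implication: false only if p is true and q is false."""
    return (not p) or q


# Logic table for P => Q
for p, q in product([True, False], repeat=2):
    print(p, q, implies(p, q))

# Range check for b > 10 => b > 5 on a finite sample of integers.
print(all(implies(b > 10, b > 5) for b in range(-1000, 1001)))

# The converse b > 5 => b > 10 fails, e.g. for b = 6.
print(implies(6 > 5, 6 > 10))
```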
Logic Implications
• Weaker versus stronger conditions:
P ⇒ Q
(P: stronger condition, Q: weaker condition)
• P is a stronger condition than Q and
Q is a weaker condition than P
Axiomatic Semantics Form
{P} statement {Q}
(P: precondition, Q: postcondition)
• Preconditions and postconditions are called assertions and
evaluate to either true or false
• The pair of a precondition and a postcondition is called a
specification.
• A statement is correct with respect to its specification if,
whenever the precondition is true before execution, the
postcondition is true afterwards
Axiomatic Semantics Form
An example
– a = b + 1 {a > 1}
– One possible precondition: {b > 10}
– Weakest precondition:
{b > 0}
• A weakest precondition is the least
restrictive precondition that will
guarantee the postcondition
Inference rules
• H1, H2, ..., Hn
----------------
H
• This notation can be interpreted as:
If H1, H2, ..., Hn have all been verified,
we may conclude that H is valid.
Rule of consequence
{P} S {Q}, P' ⇒ P, Q ⇒ Q'
--------------------------
{P'} S {Q'}
Meaning: We can strengthen the precondition
and we can weaken the postcondition in order
to get a new valid specification
Example:
{b > 22} a = b / 2 - 1 {a > 10}, b>30 ⇒ b>22, a>10 ⇒ a>5
--------------------------------------------------------
{b > 30} a = b / 2 - 1 {a > 5}
Axiomatic Semantics: Axioms
• An axiom for assignment statements
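(x = E): {Q[x->E]} x = E {Q}
where Q[x->E] denotes Q with every occurrence of x replaced by E
Example: {b > 22} a = b / 2 - 1 {a > 10}
Here x is a, E is b / 2 - 1, and Q is a > 10
Q[x->E] is b / 2 - 1 > 10, which simplifies to b > 22
• Delivers the weakest precondition for Q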
Sequences
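• {P1} S1 {Q}, {Q} S2 {P2}
-------------------------
{P1} S1; S2 {P2}
Example, worked backwards from the postcondition {x < 10}:
{x < 2}
y = 3 * x + 1;
{y < 7}
x = y + 3;
{x < 10}
Second statement: y + 3 < 10 => y < 7, giving {y < 7} x = y + 3; {x < 10}
First statement: 3 * x + 1 < 7 => x < 2, giving {x < 2} y = 3 * x + 1; {y < 7}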
Selection Statements
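• {B and P} S1 {Q}, {(not B) and P} S2 {Q}
------------------------------------------
{P} if B then S1 else S2 {Q}
Example:
{y > 1}
if x > 0 then
a = y - 1
else
a = y + 1
{a > 0}
Then branch: {y > 1} a = y - 1 {a > 0}; by the rule of consequence,
{y > 1 and x > 0} a = y - 1 {a > 0}
Else branch: {y > -1} a = y + 1 {a > 0}; by the rule of consequence (y > 1 ⇒ y > -1),
{y > 1 and not (x > 0)} a = y + 1 {a > 0}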
Axiomatic Semantics: while
• Inference rule for logical pretest loops
{I and B} S {I}
------------------------------------------
{I} while B do S end while {I and (not B)}
where I is the loop invariant (the
inductive hypothesis)
• Axiomatic description for while loops:
{P} while B do S end {Q}
Loop Invariant Requirements
• The loop invariant must satisfy the following
requirements in order to be useful:
1. P => I (precondition implies invariant)
2. {I and B} S {I} (axiomatic semantics during loop
iteration / the condition B is true)
3. {I and (not B)} => Q (invariant at termination
implies postcondition)
4. loop terminates
Finding a loop invariant …
• Example:
while y != x do y = y + 1 end {y = x}
• One possible technique: unroll the loop..
Zero iterations: {y = x}
One iteration: {y = x - 1} y = y + 1 {y = x}
Two iterations: {y = x - 2} y = y + 1 {y = x - 1}
Three iterations: {y = x - 3} y = y + 1 {y = x - 2}
Assumed loop invariant: {y <= x}
Check whether the loop invariant
is a (weakest) precondition
• We select P as {y <= x} and I as {y <= x}:
1. P = I directly implies P => I
2. {I and B} S {I}
We have:
{y <= x and y != x} y = y + 1 {y <= x}
Verification: The assignment axiom gives {y + 1 <= x}, which is equivalent to
{y < x}, which is equivalent to {y <= x and y != x}
3. {I and (not B)} => Q
We have:
{(y <= x) and not (y != x)} => {y = x}
This is trivially true
4. The loop terminates: starting from y <= x, every iteration
increases y by 1, so y reaches x after x - y iterations
• So, the loop invariant can be used as
precondition.
Furthermore, it is a weakest precondition
Loop invariant as weakest
precondition
• Unrolling and using the invariant as
precondition may not deliver a weakest
precondition
• Example:
while s > 1 do s = s / 2 end {s = 1}
By unrolling we get the loop invariant:
{s is a nonnegative power of 2}
However, a much broader correct
precondition is {s > 1}
Evaluation of Axiomatic
Semantics
• Developing axioms or inference rules for all of
the statements in a language is difficult
• It is a good tool for correctness proofs and an
excellent framework for reasoning about
programs
• Its usefulness in describing the meaning of a
programming language is limited for language
users and compiler writers
Denotational Semantics
• Based on recursive function theory
• The most abstract semantics
description method
• Originally developed by Scott and
Strachey (1970)
Denotational Semantics
• Characteristics of denotational semantics:
– Defined on the foundation of a mathematical
structure called a (Scott) domain
– Domain elements represent program outcomes as
single mathematical objects
– Maps sentences of the language to domain
elements
– Difficult point: Non-termination. A domain
comprises an entity ⊥ (pronounced bot or bottom)
that represents all non-terminating sentences
(programs)
• The problem of whether an arbitrary program terminates
is undecidable
Example Semantics
• Operational semantics with denotational
elements for a small language with:
– Numbers,
– Assignments,
– Boolean evaluation,
– while construct
Central Idea: Program State
• The state of a program is the values of all its
current variables
s = {<i1, v1>, <i2, v2>, …, <in, vn>}
• Let VARMAP be a function that, when given a
variable name and a state, returns the current
value of the variable
VARMAP(ij, s) = vj
Decimal Numbers
Syntax:
<dec_num> → '0' | '1' | '2' | '3' | '4' | '5' |
'6' | '7' | '8' | '9' |
<dec_num> ('0' | '1' | '2' | '3' |
'4' | '5' | '6' | '7' |
'8' | '9')
Semantic interpretation:
Mdec('0') = 0, Mdec('1') = 1, …, Mdec('9') = 9
Mdec(<dec_num> '0') = 10 * Mdec(<dec_num>)
Mdec(<dec_num> '1') = 10 * Mdec(<dec_num>) + 1
…
Mdec(<dec_num> '9') = 10 * Mdec(<dec_num>) + 9
Expressions
• Map expressions onto Z ∪ {error}
• We assume expressions are decimal
numbers, variables, or binary
expressions having one arithmetic
operator and two operands, each of
which can be an expression
Expressions
Me(<expr>, s) =
case <expr> of
<dec_num> => Mdec(<dec_num>, s)
<var> =>
if VARMAP(<var>, s) == undef
then error
else VARMAP(<var>, s)
<binary_expr> =>
if (Me(<binary_expr>.<left_expr>, s) == undef
OR Me(<binary_expr>.<right_expr>, s) =
undef)
then error
else
if (<binary_expr>.<operator> == '+' then
Me(<binary_expr>.<left_expr>, s) +
Me(<binary_expr>.<right_expr>, s)
else Me(<binary_expr>.<left_expr>, s) *
Me(<binary_expr>.<right_expr>, s)
...
Assignment Statements
• Maps state sets to state sets
Ma(x := E, s) =
if Me(E, s) == error
then error
else s’ =
{<i1’,v1’>,<i2’,v2’>,...,<in’,vn’>},
where for j = 1, 2, ..., n,
if ij == x
then vj’ = Me(E, s)
else vj’ = VARMAP(ij, s)
Logical Pretest Loops
• Maps state sets to state sets
Ml(while B do L, s) =
if Mb(B, s) == undef
then error
else if Mb(B, s) == false
then s
else if Msl(L, s) == error
then error
else Ml(while B do L, Msl(L, s))
Recursive unfolding of the iteration
Loop Meaning
• The meaning of the loop is the value of the program
variables after the statements in the loop have been
executed the prescribed number of times, assuming
there have been no errors
– In the case of non-termination of the loop the example
semantics will never deliver any value.
• In essence, the loop has been converted from
iteration to recursion, where the recursive control is
mathematically defined by other recursive state
mapping functions