Transcript Powerpoint

EBNF:
A Notation for Describing Syntax
Languages and Syntax
EBNF Descriptions and Rules
More Examples of EBNF
Syntax and Semantics
EBNF Description of Sets
Advanced EBNF (recursion)
Quote of the Day
“When teaching a rapidly changing technology,
perspective is more important than content.”
15-200
2
Why Study EBNF
EBNF is a notation for formally describing syntax:
how to write symbols in a language. We will use EBNF
to describe the syntax of Java. But there is a more
compelling reason to begin our study of programming
with EBNF: it is a microcosm of programming. There
is a strong similarity between the control forms of
EBNF and the control structures of Java: sequence,
decision, repetition, recursion, and the ability to name
descriptions. There is also a strong similarity between
the process of writing EBNF descriptions and writing
Java programs. Finally, studying EBNF introduces a
level of formality that will continue throughout the
semester.
Languages and Syntax
EBNF: Extended Backus-Naur Form
John Backus (IBM) invented a notation called BNF
  He used it to describe the syntax of ALGOL 58 (1959)
Peter Naur popularized BNF
  He used it to describe ALGOL 60's syntax (1960)
Niklaus Wirth used an extended form of BNF (called EBNF) to
describe the syntax of his Pascal programming language (1976)
Noam Chomsky (MIT linguist and philosopher)
  Invented a Hierarchy of Notations for Natural Languages
  4 levels: 0-3 with 0 being the most powerful
  BNF is at level 2; programming languages are at level 0
Formal Languages and Computability
  is the study of different families of notations and their power
EBNF Descriptions and Rules
Each Description is a list of Rules
Rule Form: LHS ⇐ RHS (read ⇐ as “is defined as”)
Rule Names (LHS) are italicized, hyphenated words
Control Forms in RHS

Sequence
Items appear left to right; order is important

Choice
Alternatives separated by | (stroke); exactly
one item is chosen from the alternatives

Option
Optional item enclosed between [ and ];
it can be included or discarded

Repetition
Repeatable item enclosed between { and };
it can be repeated 0 or more times
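The four control forms can be sketched as small matcher-building functions. This is only an illustration, not part of the EBNF notation itself: the names lit, seq, choice, opt, rep, and is_legal are hypothetical, and each matcher is assumed to return the set of positions where a match could end.

```python
# Each matcher takes (text, pos) and returns the set of positions
# at which a match that starts at pos could end.

def lit(s):
    """Match the literal characters s."""
    return lambda text, pos: {pos + len(s)} if text.startswith(s, pos) else set()

def seq(*items):
    """Sequence: items appear left to right; order is important."""
    def match(text, pos):
        ends = {pos}
        for item in items:
            ends = {e2 for e in ends for e2 in item(text, e)}
        return ends
    return match

def choice(*alts):
    """Choice: exactly one item is chosen from the alternatives."""
    return lambda text, pos: set().union(*(alt(text, pos) for alt in alts))

def opt(item):
    """Option: the item can be included or discarded."""
    return lambda text, pos: {pos} | item(text, pos)

def rep(item):
    """Repetition: the item can be repeated 0 or more times."""
    def match(text, pos):
        ends, frontier = {pos}, {pos}
        while frontier:
            frontier = {e2 for e in frontier for e2 in item(text, e)} - ends
            ends |= frontier
        return ends
    return match

def is_legal(rule, symbol):
    """A symbol is legal if the rule can process all of its characters."""
    return len(symbol) in rule(symbol, 0)

# The RHS A[B]{C|D} written with the combinators above.
rule = seq(lit("A"), opt(lit("B")), rep(choice(lit("C"), lit("D"))))
print(is_legal(rule, "ABCDC"))  # True
print(is_legal(rule, "A"))      # True
print(is_legal(rule, "BA"))     # False
```

Returning a set of end positions, rather than a single one, lets an option or repetition try every way of being included or discarded.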
An EBNF Description of Integers
A symbol (sequence of characters) is classified legal
by an EBNF rule if we can process all the characters
in the symbol when we reach the end of the right
hand side of the EBNF rule.
digit ⇐ 0|1|2|3|4|5|6|7|8|9
integer ⇐ [+|-]digit{digit}
digit is defined as any of the alternatives 0 through 9
integer is defined as a sequence of three items: (1) an
optional sign (if it is included, it must be the
alternative + or -), followed by (2) any digit,
followed by (3) a repetition of zero or more digits.
The integer RHS combines and illustrates all EBNF
control forms: sequence, choice, option, repetition.
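The integer rule can be transcribed almost item by item into a recognizer. The following is a minimal sketch in Python; the function name is_integer and the constant DIGITS are illustrative choices, not part of the description.

```python
DIGITS = "0123456789"  # the alternatives of the digit rule

def is_integer(symbol):
    """Return True if symbol is legal under integer <= [+|-]digit{digit}."""
    pos = 0
    # Option [+|-]: include the sign if one is present, otherwise discard it.
    if pos < len(symbol) and symbol[pos] in "+-":
        pos += 1
    # Next item in the sequence: exactly one digit is required.
    if pos >= len(symbol) or symbol[pos] not in DIGITS:
        return False
    pos += 1
    # Repetition {digit}: zero or more further digits.
    while pos < len(symbol) and symbol[pos] in DIGITS:
        pos += 1
    # Legal only if every character of the symbol has been processed.
    return pos == len(symbol)

print(is_integer("7"))     # True
print(is_integer("+127"))  # True
print(is_integer("+"))     # False
```

The final comparison mirrors the definition of legality: the end of the rule and the end of the symbol must be reached together.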
Proofs In English
Is the symbol 7 an integer? Yes, the proof:
In the integer EBNF rule, start with the optional sign;
discard the option. Next in the sequence is a digit: choose the
7 alternative. Next in the sequence is a repetition; choose 0
repetitions. End of symbol & integer reached.
Is the symbol +127 an integer? Yes, the proof:
In the integer EBNF rule, start with the optional sign;
include the option; choose the + alternative. Next in the
sequence is a digit: choose the 1 alternative. Next in the
sequence is a repetition; choose 2 repetitions; choose the 2
alternative for the first; choose the 7 alternative for the
second. End of symbol & integer reached.
Are the following symbols integers?  1,024   A5   15-   1+2
Tabular Proof
Tabular Proof Replacement Rules
(1) Replace a name (LHS) by its definition (RHS)
(2) Choose an alternative
(3) Include or Discard an Option
(4) Choose the number of repetitions
Status               Reason
integer              Given
[+|-]digit{digit}    Replace LHS by RHS (1)
[+]digit{digit}      Choose + alternative (2)
+digit{digit}        Include option (3)
+1{digit}            Replace digit by 1 alternative (1&2)
+1digit digit        Choose two repetitions (4)
+12digit             Replace digit by 2 alternative (1&2)
+127                 Replace digit by 7 alternative (1&2)
Graphical Proof
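[Figure: derivation tree for +127. The root integer expands to [+|-], digit,
and {digit}; [+|-] becomes [+] and then +; digit becomes 1; {digit} becomes
digit digit, which become 2 and 7.]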
A graphical proof replaces multiple (equivalent) tabular proofs,
since the order of rule application (which is unimportant) is
often absent in graphical proofs.
Identical vs Equivalent Descriptions
sign ⇐ +|-
digit ⇐ 0|1|2|3|4|5|6|7|8|9
integer ⇐ [sign]digit{digit}

x ⇐ +|-
y ⇐ 0|1|2|3|4|5|6|7|8|9
z ⇐ [x]y{y}
These two descriptions are not identical but they are
equivalent: Although they use different EBNF rule
names (consistently), asking whether a symbol is an
integer is the same as asking whether the symbol is a z.
Two Problematical Descriptions
A “simplified but equivalent” definition of integer?
sign ⇐ +|-
digit ⇐ 0|1|2|3|4|5|6|7|8|9
integer ⇐ [sign]{digit}
A “good” definition of integers with commas (1,024)?
sign ⇐ +|-
comma-digit ⇐ 0|1|2|3|4|5|6|7|8|9|,
comma-integer ⇐ [sign]comma-digit{comma-digit}
Both definitions classify “non-obvious” symbols as
legal integer or comma-integer. Find such symbols.
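One way to hunt for such symbols is to transcribe both rules and test candidates against them. The sketch below uses Python regular expressions as a stand-in for the EBNF, which is an assumption of this example; the function names are illustrative.

```python
import re

# integer <= [sign]{digit}
SIMPLIFIED_INTEGER = re.compile(r"[+-]?[0-9]*")
# comma-integer <= [sign]comma-digit{comma-digit}
COMMA_INTEGER = re.compile(r"[+-]?[0-9,][0-9,]*")

def is_simplified_integer(symbol):
    """True if the whole symbol is legal under the simplified integer rule."""
    return SIMPLIFIED_INTEGER.fullmatch(symbol) is not None

def is_comma_integer(symbol):
    """True if the whole symbol is legal under the comma-integer rule."""
    return COMMA_INTEGER.fullmatch(symbol) is not None

print(is_simplified_integer("+127"))  # True
print(is_comma_integer("1,024"))      # True
```

Trying very short symbols, and symbols with commas in unusual places, is a quick way to expose what each rule fails to rule out.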
Syntax and Semantics
Syntax = Form
Semantics = Meaning
Key Questions
  Can two different symbols have the same meaning?
  Can a symbol have many meanings (depending on context)?
Do the following symbols have the same meaning?
  1 and +1, 000193 and 193
  9.000 and 9.0
  Rich and rich
EBNF specifies syntax, not semantics
  Semantics is supplied informally: English, examples, ...
  Formal semantics is a research area in CS, AI, Linguistics, ...
Structured Integers
Allow non-adjacent embedded underscores to add a
special structure to a number
2_10_54
1_800_555_1212
1_000_000 (compared to 1000000; figure each value fast)
Define structured-integer
digit ⇐ 0|1|2|3|4|5|6|7|8|9
structured-integer ⇐ [sign]digit{[_]digit}
Semantically, the underscore is ignored
1_2 has the same meaning as 12
How can we fix the date problem? 12_5_1987 and 1_25_1987
are different dates, but with underscores ignored both mean 1251987
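The split between syntax and semantics for structured integers can be sketched in a few lines of Python. The regular expression is assumed to be a faithful transcription of the structured-integer rule, and the names is_structured_integer and meaning are illustrative.

```python
import re

# structured-integer <= [sign]digit{[_]digit}
STRUCTURED_INTEGER = re.compile(r"[+-]?[0-9](?:_?[0-9])*")

def is_structured_integer(symbol):
    """Syntax: is the whole symbol legal under the rule?"""
    return STRUCTURED_INTEGER.fullmatch(symbol) is not None

def meaning(symbol):
    """Semantics: the underscore is ignored."""
    if not is_structured_integer(symbol):
        raise ValueError(f"not a structured-integer: {symbol!r}")
    return int(symbol.replace("_", ""))

print(is_structured_integer("1_800_555_1212"))  # True
print(is_structured_integer("1__2"))            # False: adjacent underscores
print(meaning("1_2") == meaning("12"))          # True
print(meaning("12_5_1987") == meaning("1_25_1987"))  # True: the date problem
```

The last line shows why ignoring underscores is too coarse for dates: two syntactically different symbols collapse to one meaning.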
Syntax Charts
[Figure: syntax charts for the four control forms]
Sequence: ABCD
Choice: A|B|C|D
Option: [A]
Repetition: {A}
Syntax Charts for integer and digit
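[Figure: syntax chart for digit, a choice among 0 through 9, and syntax
chart for integer, an optional + or - followed by digit and then a loop
that repeats digit]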
A Syntax Chart with no other names
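[Figure: syntax chart for integer drawn without the name digit; the choice
among 0 through 9 is written out in full, once for the required digit and
once inside the repetition loop]
Which syntax chart for integer is simpler? The previous one
(because it is smaller) or this one (because it doesn’t need
another name for digit)?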
Interesting Rules & Their Charts
{A|B}
AB|C
{A}|{B}
[Figure: the syntax chart for each of the three rules above]
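The difference between these right-hand sides is easiest to see by testing a few symbols against each. The sketch below uses Python regular expressions as a stand-in for the EBNF and assumes that sequence binds more tightly than choice, so AB|C means "AB, or C"; the helper name legal_under is illustrative.

```python
import re

# Each EBNF right-hand side paired with an equivalent regular expression.
RULES = {
    "{A|B}": re.compile(r"(?:A|B)*"),   # any mixture of As and Bs
    "AB|C": re.compile(r"AB|C"),        # the sequence AB, or a single C
    "{A}|{B}": re.compile(r"A*|B*"),    # all As, or all Bs
}

def legal_under(symbol):
    """Return the right-hand sides that classify symbol as legal."""
    return [name for name, pattern in RULES.items() if pattern.fullmatch(symbol)]

print(legal_under("ABBA"))  # ['{A|B}']
print(legal_under("AA"))    # ['{A|B}', '{A}|{B}']
print(legal_under("AB"))    # ['{A|B}', 'AB|C']
print(legal_under("C"))     # ['AB|C']
```

Every symbol legal under {A}|{B} is also legal under {A|B}, but not the reverse: the choice inside the repetition is made anew on each repetition.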
Description of Sets
Set syntax
  Sets start with ( and end with )
  Sets contain 0 or more integers
  A comma appears between every pair of integers
integer-list ⇐ integer{,integer}
integer-set ⇐ ([integer-list])
Set semantics
  Order is unimportant
    (1,3,5) is equivalent to (5,1,3) and any other permutation
  Duplicate elements are unimportant
    (1,3,5,1,3,3,5) is equivalent to (1,3,5)
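Both halves of this slide, the syntax and the semantics, can be sketched in Python. The regular expression is assumed to transcribe integer-set faithfully, and Python's frozenset is used as a convenient model of the stated semantics; the function names are illustrative.

```python
import re

INTEGER = r"[+-]?[0-9]+"  # integer <= [+|-]digit{digit}
# integer-set <= ([integer-list]), integer-list <= integer{,integer}
INTEGER_SET = re.compile(rf"\((?:{INTEGER}(?:,{INTEGER})*)?\)")

def is_integer_set(symbol):
    """Syntax: is the whole symbol a legal integer-set?"""
    return INTEGER_SET.fullmatch(symbol) is not None

def meaning(symbol):
    """Semantics: order and duplicates are unimportant."""
    if not is_integer_set(symbol):
        raise ValueError(f"not an integer-set: {symbol!r}")
    body = symbol[1:-1]  # drop the surrounding parentheses
    return frozenset(int(item) for item in body.split(",")) if body else frozenset()

print(is_integer_set("(5,-2,11)"))                    # True
print(is_integer_set("(1,,3)"))                       # False
print(meaning("(1,3,5)") == meaning("(5,1,3)"))       # True
print(meaning("(1,3,5,1,3,3,5)") == meaning("(1,3,5)"))  # True
```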
Proof: (5,-2,11) is an integer-set
Status                 Reason
integer-set            Given
([integer-list])       Replace integer-set by its RHS
(integer-list)         Include option
(integer{,integer})    Replace integer-list by its RHS
(5{,integer})          Lemma: 5 is an integer
(5,integer,integer)    Choose two repetitions
(5,-2,integer)         Lemma: -2 is an integer
(5,-2,11)              Lemma: 11 is an integer
Description of Sets with Ranges
Ranges syntax
  A range is a single integer or a pair separated by ..
integer-range ⇐ integer[..integer]
integer-list ⇐ integer-range{,integer-range}
integer-set ⇐ ([integer-list])
Range semantics X..Y
  X≤Y: all integers from X up to Y (inclusive)
    1..5 is equivalent to 1,2,3,4,5; 5..5 is equivalent just to 5
  X>Y: a null range; it contains no values
    (1..4,10,5..4,11..13) is equivalent to (1,2,3,4,10,11,12,13)
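The range semantics can be made concrete with a short expansion routine. This is a sketch that assumes its input is already a syntactically legal integer-set; the names expand_range and expand_set are illustrative.

```python
def expand_range(item):
    """Expand 'X' or 'X..Y' into the list of integers it denotes."""
    if ".." in item:
        low, high = item.split("..")
        # range() is empty when X > Y, which matches the null range.
        return list(range(int(low), int(high) + 1))
    return [int(item)]

def expand_set(symbol):
    """Expand every range in a legal integer-set such as '(1..4,10)'."""
    body = symbol[1:-1]  # drop the surrounding parentheses
    values = []
    for item in (body.split(",") if body else []):
        values.extend(expand_range(item))
    return values

print(expand_range("1..5"))  # [1, 2, 3, 4, 5]
print(expand_range("5..5"))  # [5]
print(expand_range("5..4"))  # []
print(expand_set("(1..4,10,5..4,11..13)"))  # [1, 2, 3, 4, 10, 11, 12, 13]
```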
Recursive Descriptions
A directly recursive EBNF rule has its LHS in its RHS
r1 ⇐ | Ar1
We read this as r1 is defined as the choice of nothing
or an A followed by an r1. The symbols recognized as
an r1 are of the form A^n, n ≥ 0. Proof that AAA is an r1
r1       Given
Ar1      Replace r1 by the second alternative in its RHS
AAr1     Replace r1 by the second alternative in its RHS
AAAr1    Replace r1 by the second alternative in its RHS
AAA      Replace r1 by the first (empty) alternative in its RHS
This rule is equivalent to r1 ⇐ {A}
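A directly recursive rule maps naturally onto a recursive function. The sketch below places the recursive and the repetition forms side by side; the function names are illustrative.

```python
def is_r1(symbol):
    """Recursive form: nothing, or an A followed by an r1."""
    if symbol == "":
        return True  # first (empty) alternative
    # Second alternative: one A, then the rest must again be an r1.
    return symbol[0] == "A" and is_r1(symbol[1:])

def is_repeated_a(symbol):
    """Repetition form {A}: zero or more As."""
    return all(ch == "A" for ch in symbol)

for symbol in ["", "A", "AAA", "AAB", "B"]:
    print(repr(symbol), is_r1(symbol), is_repeated_a(symbol))
```

The two functions agree on every symbol in the loop, which is what equivalence of the two rules means in practice.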
The Power of Recursion
To recognize symbols of the form A^n B^n, n ≥ 0, we
cannot write r1 ⇐ {A}{B}, because nothing prevents us from
choosing different numbers of repetitions for A and B: AAB
The recursive rule r1 ⇐ | Ar1B works, because each
choice of the second alternative uses exactly one A and
one B. Proof that AAABBB is an r1
r1          Given
Ar1B        Replace r1 by the second alternative in its RHS
AAr1BB      Replace r1 by the second alternative in its RHS
AAAr1BBB    Replace r1 by the second alternative in its RHS
AAABBB      Replace r1 by the first (empty) alternative in its RHS
The symbols recognized are exactly those of the form A^n B^n, n ≥ 0
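The contrast between the two rules can be checked directly. In the sketch below, the recursive rule is written as a recursive function, and the non-recursive {A}{B} is approximated with a Python regular expression; both names are illustrative.

```python
import re

def is_anbn(symbol):
    """Recursive rule: nothing, or an A, then an r1, then a B."""
    if symbol == "":
        return True  # first (empty) alternative
    # Second alternative: peel one A off the front and one B off the back.
    return (len(symbol) >= 2 and symbol[0] == "A" and symbol[-1] == "B"
            and is_anbn(symbol[1:-1]))

def is_as_then_bs(symbol):
    """Non-recursive {A}{B}: the two repetition counts are unrelated."""
    return re.fullmatch(r"A*B*", symbol) is not None

print(is_anbn("AAABBB"))       # True
print(is_anbn("AAB"))          # False
print(is_as_then_bs("AAB"))    # True: {A}{B} accepts unequal counts
```

Each recursive call removes exactly one A and one B, which is the same bookkeeping the tabular proof performs one line at a time.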
Problems
Read the EBNF Handout (all but Section 2.7)
Study and Understand the Review Questions
  2 (page 10), 2&3 (page 12), 1 (page 16), 2 (page 18)
Be prepared to discuss in class solutions to the
following Exercises (starting on page 23)
  1, 2, 4, and especially 8
See next slide for more problems
Problems (continued)
Translate the following RHS of an EBNF rule into
its equivalent syntax chart. Then, classify each of the
examples below as legal or illegal according to this
rule (or its equivalent chart).

A{BA}Z
  AZ  BZ  ABZ  ABAZ  ABABZ  ABA  AAAZ  ABABBZ
A{B[C]}Z
  BZ  ABC  ABBBZ  ACCZ  ABCZ  ABCBCZ  ABBCBBZ  ABCZBCZ
A{B|C}Z
  AB  ABC  ABBBZ  BBZ  ABBCCZ  ACCBBZ  ACBBCZ  ABCZBCZ