A Big Test Result - Knowledge Systems Institute

Programming Languages
Language Syntax
This lecture introduces the lexical structure of
programming languages; context-free grammars and
their description in BNF; the representation of
syntactic structure using trees; the issues that
arise in constructing BNFs for a programming
language; and EBNFs and syntax diagrams.
Language Syntax
• Syntax is the structure of a language.
• One of the great advances in programming
languages has been the development of a formal
system for describing syntax that is now almost
universally in use.
• In the 1950s Noam Chomsky developed the idea
of context-free grammars; and John Backus, with
contributions by Peter Naur, developed the
Backus-Naur form (BNF) notational system for
describing these grammars.
Lexical Structures
• The lexical structure of a programming language is
the structure of its words, or tokens.
• Typically, the scanning phase of a translator
collects sequences of characters from the input
program into tokens, which are then processed by a
parsing phase that determines the syntactic
structure.
Tokens
• Typical token categories include the following:
• Reserved words, sometimes called keywords, such
as "begin", "if", and "while".
• Constants or literals, such as 42 (a numeric
constant) or "hello" (a string constant).
• Special symbols, such as ";", "<=", or "+".
• Identifiers, such as x24, monthly_balance, or write.
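The token categories above can be illustrated with a minimal scanner sketch in Python. The regular expression, the keyword set, and the function name `tokenize` are assumptions made for this example and do not describe any particular language's lexical rules.

```python
import re

# Hypothetical keyword set, taken from the examples in the list above.
KEYWORDS = {"begin", "if", "while"}

# One named group per token category; "<=" is listed before "<" so the
# longer symbol wins.
TOKEN_RE = re.compile(r"""
    (?P<number>\d+)
  | (?P<string>"[^"]*")
  | (?P<name>[A-Za-z_][A-Za-z_0-9]*)
  | (?P<symbol><=|[;+<])
  | (?P<space>\s+)
""", re.VERBOSE)

def tokenize(text):
    """Collect characters of `text` into (category, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "space":
            continue  # white space separates tokens but is not a token
        if kind == "name":
            # A name is a reserved word only if it is in the keyword set.
            kind = "keyword" if lexeme in KEYWORDS else "identifier"
        tokens.append((kind, lexeme))
    return tokens
```

For instance, `tokenize("if x24 <= 42 ;")` yields a keyword, an identifier, a special symbol, a numeric constant, and another special symbol, in that order.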
Context-Free Grammars And BNFs
• We begin the description of grammars and BNFs
with an example:
• In English, we can express sentences as:
1. <sentence> ::= <noun-phrase> <verb-phrase> .
2. <noun-phrase> ::= <article> <noun>
3. <article> ::= a | the
4. <noun> ::= girl | dog
5. <verb-phrase> ::= <verb> <noun-phrase>
6. <verb> ::= sees | pets
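Because this small grammar has no recursive rules, the language it describes is finite and can be enumerated. The following Python sketch does so; the dictionary encoding of the rules and the name `expand` are assumptions made for illustration only.

```python
from itertools import product

# Each nonterminal maps to its list of alternatives (rules 1-6).
GRAMMAR = {
    "<sentence>": [["<noun-phrase>", "<verb-phrase>", "."]],
    "<noun-phrase>": [["<article>", "<noun>"]],
    "<article>": [["a"], ["the"]],
    "<noun>": [["girl"], ["dog"]],
    "<verb-phrase>": [["<verb>", "<noun-phrase>"]],
    "<verb>": [["sees"], ["pets"]],
}

def expand(symbol):
    """Return every terminal string (a list of words) derivable from symbol.

    This terminates only because the grammar is not recursive.
    """
    if symbol not in GRAMMAR:
        return [[symbol]]  # a terminal stands for itself
    results = []
    for alternative in GRAMMAR[symbol]:
        parts = [expand(item) for item in alternative]
        # Combine the expansions of the items in every possible way.
        for combo in product(*parts):
            results.append([word for piece in combo for word in piece])
    return results

sentences = [" ".join(words) for words in expand("<sentence>")]
```

The list `sentences` holds 32 strings (2 articles x 2 nouns x 2 verbs x 2 articles x 2 nouns), one of which is "the girl sees a dog .".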
Context-Free Grammars And BNFs
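• Thus we could construct, or derive, the sentence
"the girl sees a dog." as follows:
<sentence> -> <noun-phrase> <verb-phrase> .   (rule 1)
  -> <article> <noun> <verb-phrase> .   (rule 2)
  -> the <noun> <verb-phrase> .   (rule 3)
  -> the girl <verb-phrase> .   (rule 4)
  -> the girl <verb> <noun-phrase> .   (rule 5)
  -> the girl sees <noun-phrase> .   (rule 6)
  -> the girl sees <article> <noun> .   (rule 2)
  -> the girl sees a <noun> .   (rule 3)
  -> the girl sees a dog .   (rule 4)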
Context-Free Grammars And BNFs
• A context-free grammar consists of a series of
grammar rules as described; each rule consists of a
left-hand side that is a single structure name,
followed by a right-hand side consisting of a
sequence of items that can be symbols or other
structure names.
• The names for structures (like <sentence>) are
called nonterminals, since they are broken down
into further structures.
Productions
• The words or token symbols are also called
terminals, since they are never broken down.
• Grammar rules are also called productions, since
they "produce" the strings of the language using
derivations.
• Productions are in Backus-Naur form if they are
given using only the metasymbols "::=", "|", "<",
and ">".
• (Sometimes parentheses are also allowed to
group things together.)
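To illustrate how productions "produce" a string, here is a minimal Python sketch of a leftmost derivation over the English grammar given earlier. The dictionary encoding, the function name `leftmost_derivation`, and the idea of passing the chosen alternative as an index are assumptions made for this example.

```python
# Productions of the English grammar: nonterminal -> list of alternatives.
GRAMMAR = {
    "<sentence>": [["<noun-phrase>", "<verb-phrase>", "."]],
    "<noun-phrase>": [["<article>", "<noun>"]],
    "<article>": [["a"], ["the"]],
    "<noun>": [["girl"], ["dog"]],
    "<verb-phrase>": [["<verb>", "<noun-phrase>"]],
    "<verb>": [["sees"], ["pets"]],
}

def leftmost_derivation(start, choices):
    """Apply one production per step to the leftmost nonterminal.

    choices[i] is the index of the alternative to use at step i.
    Returns the list of sentential forms, starting with [start].
    """
    form = [start]
    steps = [form[:]]
    for choice in choices:
        # Find the leftmost symbol that is still a nonterminal.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(form[:])
    return steps

# Alternatives chosen to derive "the girl sees a dog ."
steps = leftmost_derivation("<sentence>", [0, 0, 1, 0, 0, 0, 0, 0, 1])
for form in steps:
    print(" ".join(form))
```

Each printed line is one sentential form; the last line consists only of terminals, which is why no further production applies.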
Context-free?
• Why is such a grammar context-free?
• The simple reason is that the nonterminals appear
singly on the left-hand sides of productions.
• This means that each nonterminal can be replaced
by any right-hand side alternative, no matter
where the nonterminal might appear.
• In other words, there is no context under which
only certain replacements can occur.
• We shall adopt the view that anything not
expressible using context-free grammars is a
semantic, not a syntactic, issue.
Context-sensitivity
• As an example of context sensitivity, note that
articles that appear at the beginning of sentences
in the preceding grammar should be capitalized.
• One way of doing this is to rewrite the first rule as:
<sentence> ::= <beginning> <noun-phrase> <verb-phrase> '.'
• and then add the context-sensitive rule:
<beginning> <article> ::= The | A
Context-sensitivity (2)
• Now the derivation would look as follows:
<sentence> -> <beginning> <noun-phrase> <verb-phrase> .   (new rule 1)
  -> <beginning> <article> <noun> <verb-phrase> .   (rule 2)
  -> The <noun> <verb-phrase> .   (new context-sensitive rule)
  -> ...
BNF form
• Context-free grammars have been studied
extensively by formal language theorists and are
now so well understood that it is natural to
express the syntax of any programming language
in BNF form.
• Doing so makes it easier to write translators for
the language, since the parsing stage can be
automated.
Syntax-directed Semantics
• Syntax establishes structure, not meaning.
• But the meaning of a sentence (or program) must
be related to its syntax.
• To make use of the syntactic structure of a
program to determine its semantics we must have
a way of expressing this structure as determined
by a derivation.
• A standard method for doing this is with a parse
tree.
Parse Tree
• The parse tree describes graphically the
replacement process in a derivation.
• For example, the parse tree for the sentence "the
girl sees a dog." is as follows:
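As a rough illustration of how such a tree records the replacement process, the Python sketch below builds a parse tree for the English grammar given earlier as nested tuples and prints it with indentation. The tuple representation and the names `parse_sentence` and `show` are assumptions made for this example.

```python
ARTICLES = {"a", "the"}
NOUNS = {"girl", "dog"}
VERBS = {"sees", "pets"}

def parse_noun_phrase(words):
    """<noun-phrase> ::= <article> <noun>; returns (tree, remaining words)."""
    if len(words) < 2 or words[0] not in ARTICLES or words[1] not in NOUNS:
        raise ValueError("expected an article followed by a noun")
    tree = ("noun-phrase", ("article", words[0]), ("noun", words[1]))
    return tree, words[2:]

def parse_verb_phrase(words):
    """<verb-phrase> ::= <verb> <noun-phrase>; returns (tree, remaining words)."""
    if not words or words[0] not in VERBS:
        raise ValueError("expected a verb")
    noun_phrase, rest = parse_noun_phrase(words[1:])
    return ("verb-phrase", ("verb", words[0]), noun_phrase), rest

def parse_sentence(words):
    """<sentence> ::= <noun-phrase> <verb-phrase> ."""
    noun_phrase, rest = parse_noun_phrase(words)
    verb_phrase, rest = parse_verb_phrase(rest)
    if rest != ["."]:
        raise ValueError("expected '.' at the end of the sentence")
    return ("sentence", noun_phrase, verb_phrase, ".")

def show(tree, depth=0):
    """Return the tree as indented lines: nonterminals in <>, terminals bare."""
    if isinstance(tree, str):
        return ["  " * depth + tree]
    lines = ["  " * depth + "<" + tree[0] + ">"]
    for child in tree[1:]:
        lines.extend(show(child, depth + 1))
    return lines

tree = parse_sentence("the girl sees a dog .".split())
print("\n".join(show(tree)))
```

For this sentence the printed tree is:

    <sentence>
      <noun-phrase>
        <article>
          the
        <noun>
          girl
      <verb-phrase>
        <verb>
          sees
        <noun-phrase>
          <article>
            a
          <noun>
            dog
      .

Interior nodes are nonterminals and leaves are terminals; reading the leaves from top to bottom gives back the sentence.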
A Simple Arithmetic Expression Grammar
• A typical simple example of the use of a
context-free grammar in programming languages is the
description of simple integer arithmetic expressions
with addition and multiplication:
<exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | <number>
<number> ::= <number> <digit> | <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Ambiguity
• A grammar for which two distinct parse trees are
possible for the same string is ambiguous.
• For example, two distinct parse trees can be
constructed for the string 3+4*5.
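The ambiguity can be checked mechanically by counting parse trees. The Python sketch below does this for the grammar <exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | <number>; the function name `count_parse_trees` and the assumption that each number arrives as a single pre-scanned token are choices made for this example.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count the parse trees that derive `tokens` from <exp>.

    Assumes every number is already a single token, so the (unambiguous)
    <number> and <digit> rules do not affect the count.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of trees deriving tokens[i:j] from <exp>.
        n = 0
        if j - i == 1 and tokens[i].isdigit():
            n += 1  # <exp> ::= <number>
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            n += count(i + 1, j - 1)  # <exp> ::= ( <exp> )
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # <exp> ::= <exp> + <exp>  or  <exp> * <exp>,
                # with the operator at position k as the root.
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(tokens))

print(count_parse_trees(["3", "+", "4", "*", "5"]))  # prints 2
```

The two trees correspond to the groupings (3+4)*5 and 3+(4*5); a count above 1 for any string is enough to show that the grammar is ambiguous.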
Precedence and Associativity
• The revised, disambiguating grammar for simple
arithmetic expressions that expresses both
precedence and associativity is given as:
<exp> ::= <exp> + <term> | <term>
<term> ::= <term> * <factor> | <factor>
<factor> ::= ( <exp> ) | <number>
<number> ::= <number> <digit> | <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• These rules give * higher precedence than +, and
the left-recursive form of the rules makes both
operators left-associative.
EBNFs
• Extended BNF (EBNF) adopts a special notation for
grammar rules that expresses more clearly the
repetitive nature of their structures; curly
brackets enclose items that may be repeated zero or
more times:
<exp> ::= <term> { + <term> }
<term> ::= <factor> { * <factor> }
<factor> ::= ( <exp> ) | <number>
<number> ::= <digit> { <digit> }
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• We assume that any operator involved in a curly
bracket repetition is left-associative.
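One reason the EBNF form is convenient is that each curly-bracket repetition maps directly onto a loop in a hand-written parser. The Python sketch below is a recursive-descent evaluator with one function per EBNF rule; the name `evaluate`, the character-level input handling, and the decision to compute a value rather than build a tree are assumptions made for illustration.

```python
DIGITS = set("0123456789")

def evaluate(text):
    """Evaluate an expression by following the EBNF rules one function each.

    Simplification: white space is dropped before parsing, so "1 2" is
    read as the number 12.
    """
    chars = [c for c in text if not c.isspace()]
    pos = 0

    def peek():
        return chars[pos] if pos < len(chars) else None

    def advance():
        nonlocal pos
        pos += 1

    def exp():  # <exp> ::= <term> { + <term> }
        value = term()
        while peek() == "+":  # the loop implements { ... }
            advance()
            value = value + term()  # accumulating leftwards: left-associative
        return value

    def term():  # <term> ::= <factor> { * <factor> }
        value = factor()
        while peek() == "*":
            advance()
            value = value * factor()
        return value

    def factor():  # <factor> ::= ( <exp> ) | <number>
        if peek() == "(":
            advance()
            value = exp()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        return number()

    def number():  # <number> ::= <digit> { <digit> }
        if peek() not in DIGITS:
            raise SyntaxError("expected a digit")
        value = 0
        while peek() in DIGITS:
            value = value * 10 + int(peek())
            advance()
        return value

    result = exp()
    if peek() is not None:
        raise SyntaxError("unexpected input after expression")
    return result

print(evaluate("3+4*5"))    # prints 23
print(evaluate("(3+4)*5"))  # prints 35
```

Because `term` is called from inside `exp`, multiplication binds more tightly than addition, which matches the precedence expressed by the grammar.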
Syntax Diagrams
• A useful graphical representation for a grammar
rule is the syntax diagram.