CPS 506 Comparative Programming Languages

Syntax Specification
Compiling Process Steps
• Program → Lexical Analysis
– Converts characters into a stream of tokens
• Lexical Analysis → Syntactic Analysis
– Uses the tokens to build an abstract representation, or parse tree
Compiling Process Steps (cont'd)
• Syntactic Analysis → Semantic Analysis
– Checks the parse tree for semantic consistency and transforms it to run efficiently on the target architecture (optimization)
• Semantic Analysis → Machine Code
– Converts the abstract representation to executable machine code (code generation)
Formal Methods and Language Processing
• Meta-Language
– A language used to define other languages
• BNF (Backus-Naur Form)
– A set of rewriting rules ρ
– A set of terminal symbols Σ
– A set of non-terminal symbols N
– A start symbol S ∈ N
– ρ: A → ω
– A ∈ N and ω ∈ (N ∪ Σ)*
– Left-hand side: a non-terminal symbol
– Right-hand side: a sequence of terminal and non-terminal symbols
BNF (cont'd)
• The words in N: grammatical categories
– Identifier, Expression, Loop, Program, …
• S: the principal grammatical category
• Symbols in Σ: the basic alphabet
• Example 1:
binaryDigit → 0
binaryDigit → 1
or
binaryDigit → 0 | 1
• Example 2:
Integer → Digit | Integer Digit
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
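The four parts of a BNF grammar (N, Σ, ρ, S) can be written down directly as data. The following is a minimal sketch, assuming a hypothetical Grammar class whose names are chosen only for illustration; it encodes the Integer grammar of Example 2 and checks that every rule has a non-terminal on the left and only known symbols on the right.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: class and method names are assumptions, not from the slides.
class Grammar {
    final Set<String> nonTerminals;               // N
    final Set<String> terminals;                  // Sigma
    final Map<String, List<List<String>>> rules;  // rho: each A maps to its alternatives
    final String start;                           // S

    Grammar(Set<String> nonTerminals, Set<String> terminals,
            Map<String, List<List<String>>> rules, String start) {
        this.nonTerminals = nonTerminals;
        this.terminals = terminals;
        this.rules = rules;
        this.start = start;
    }

    // True when S is in N, every left-hand side is in N, and every
    // right-hand side symbol is in N or Sigma.
    boolean isWellFormed() {
        if (!nonTerminals.contains(start)) {
            return false;
        }
        for (Map.Entry<String, List<List<String>>> rule : rules.entrySet()) {
            if (!nonTerminals.contains(rule.getKey())) {
                return false;
            }
            for (List<String> alternative : rule.getValue()) {
                for (String symbol : alternative) {
                    if (!nonTerminals.contains(symbol) && !terminals.contains(symbol)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    // Integer -> Digit | Integer Digit ;  Digit -> 0 | 1 | ... | 9
    static Grammar integerGrammar() {
        Set<String> n = new HashSet<>(Arrays.asList("Integer", "Digit"));
        Set<String> sigma = new HashSet<>(
                Arrays.asList("0", "1", "2", "3", "4", "5", "6", "7", "8", "9"));

        List<List<String>> integerAlts = new ArrayList<>();
        integerAlts.add(Arrays.asList("Digit"));
        integerAlts.add(Arrays.asList("Integer", "Digit"));

        List<List<String>> digitAlts = new ArrayList<>();
        for (String d : sigma) {
            digitAlts.add(Collections.singletonList(d));
        }

        Map<String, List<List<String>>> rho = new HashMap<>();
        rho.put("Integer", integerAlts);
        rho.put("Digit", digitAlts);
        return new Grammar(n, sigma, rho, "Integer");
    }

    public static void main(String[] args) {
        System.out.println(integerGrammar().isWellFormed());
    }
}
```

Storing each alternative as a list of symbols keeps the rule shape A → ω visible in the data; other representations (for example one object per rule) would work equally well.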
BNF (cont'd)
• Parse tree for 281
Integer
├─ Integer
│  ├─ Integer
│  │  └─ Digit
│  │     └─ 2
│  └─ Digit
│     └─ 8
└─ Digit
   └─ 1
• Derivation
Integer ⇒ Integer Digit ⇒ Integer Digit Digit ⇒ Digit Digit Digit ⇒ 2 Digit Digit ⇒ 28 Digit ⇒ 281
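The derivation of 281 follows a fixed pattern: grow the string with Integer → Integer Digit, close it with Integer → Digit, then replace each Digit from the left. The sketch below reproduces those steps for any digit string; the class IntegerDerivation and its methods are hypothetical names introduced here for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: lists the sentential forms of a leftmost derivation under
//   Integer -> Digit | Integer Digit
class IntegerDerivation {

    static List<String> leftmost(String digits) {
        if (!digits.matches("[0-9]+")) {
            throw new IllegalArgumentException("not an Integer: " + digits);
        }
        int n = digits.length();
        List<String> steps = new ArrayList<>();

        // Apply Integer -> Integer Digit until there is one symbol per digit.
        for (int k = 0; k < n; k++) {
            steps.add("Integer" + repeat(" Digit", k));
        }
        // Apply Integer -> Digit to the remaining leftmost Integer.
        steps.add("Digit" + repeat(" Digit", n - 1));

        // Replace the leftmost Digit with the next character of the input.
        for (int i = 1; i <= n; i++) {
            steps.add(digits.substring(0, i) + repeat(" Digit", n - i));
        }
        return steps;
    }

    private static String repeat(String s, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(String.join(" => ", leftmost("281")));
    }
}
```

For the input 281 this yields seven sentential forms, starting at Integer and ending at 281.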
BNF (cont'd)
• Lexeme: the lowest-level syntactic unit
• Tokens: the set of grammatical categories that define strings of non-blank characters (lexical syntax)
– Identifier (variable names, function names, …)
– Literal (integer and decimal numbers, …)
– Operator (+, -, *, /, …)
– Separator (;, ., (, ), {, }, …)
– Keyword (int, if, for, where, …)
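BNF (cont'd)
• Example: token categories in a short program
// comments …
void main ( ) {
    float p;
    p = 3.14 ;
}
– Comment: // comments …
– Keyword: void, float
– Identifier: main, p
– Literal: 3.14
– Operator: =
– Separator: ( ) { } ;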
Regular Expressions
• An alternative to BNF for defining a language's lexical rules
– x : a character
– "abc" : a literal string
– A | B : A or B
– A B : concatenation of A and B
– A* : zero or more occurrences of A
– A+ : one or more occurrences of A
– A? : zero or one occurrence of A
– [a-zA-Z] : any alphabetic character
– [0-9] : any digit
– . : any single character
• Example
Integer : [0-9]+
Identifier : [a-zA-Z][a-zA-Z0-9]*
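Regular expressions like these are enough to drive a simple lexical analyzer. The following is a minimal sketch using java.util.regex, assuming a hypothetical SimpleLexer class, a small illustrative keyword list, and decimal literals of the form digits.digits; it classifies each lexeme of the short program shown earlier into one of the token categories.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a tiny lexer built from one regular expression per token category.
class SimpleLexer {
    // Alternatives are tried in order, so Keyword is listed before Identifier.
    private static final Pattern TOKEN = Pattern.compile(
          "(?<Comment>//[^\\n]*)"
        + "|(?<Keyword>\\b(?:void|int|float|if|for)\\b)"
        + "|(?<Identifier>[a-zA-Z][a-zA-Z0-9]*)"
        + "|(?<Literal>[0-9]+(?:\\.[0-9]+)?)"
        + "|(?<Operator>[-+*/=])"
        + "|(?<Separator>[;.(){}])");

    private static final String[] KINDS =
        {"Comment", "Keyword", "Identifier", "Literal", "Operator", "Separator"};

    // Returns tokens as "Category:lexeme" strings, skipping whitespace.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        int pos = 0;
        while (pos < source.length()) {
            if (Character.isWhitespace(source.charAt(pos))) {
                pos++;
                continue;
            }
            m.region(pos, source.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at " + pos);
            }
            for (String kind : KINDS) {
                if (m.group(kind) != null) {
                    tokens.add(kind + ":" + m.group(kind));
                    break;
                }
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        String program = "// comments\nvoid main ( ) {\nfloat p;\np = 3.14 ;\n}";
        for (String token : tokenize(program)) {
            System.out.println(token);
        }
    }
}
```

A production lexer would usually return token objects with positions rather than strings; strings are used here only to keep the sketch short.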
Syntactic Analysis
• Primary tool: BNF
• Input: tokens from lexical analysis
• Output: parse tree
• Syntactic categories
– Program: Declaration, Assignment, Expression, Loop, Function definition
Syntactic Analysis (cont'd)
• Example
Arithmetic Expression → Term | Arithmetic Expression + Term
| Arithmetic Expression - Term
Term → Factor | Term * Factor
| Term / Factor
Factor → Identifier | Literal
| ( Arithmetic Expression )
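Syntactic Analysis (cont'd)
• Example: parse tree for 2 * a - 3
Arithmetic Expression
├─ Arithmetic Expression
│  └─ Term
│     ├─ Term
│     │  └─ Factor
│     │     └─ Literal
│     │        └─ Integer
│     │           └─ 2
│     ├─ *
│     └─ Factor
│        └─ Identifier
│           └─ Letter
│              └─ a
├─ -
└─ Term
   └─ Factor
      └─ Literal
         └─ Integer
            └─ 3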
Syntactic Analysis (cont'd)
• BNF limitations: a grammar alone cannot answer
– Has an identifier been declared?
– Has an identifier been given an initial value?
• In statically typed languages
– The type system addresses the first problem
– Such errors are detected at compile time or at run time
Ambiguous Grammar
• A grammar is ambiguous if some string can be parsed into two or more different trees
• Example
Exp → Identifier | Literal | Exp - Exp
Input: A - B - C
Output:
1. A - (B - C)
2. (A - B) - C
• Another example is the "dangling else", which can be resolved
– Using BNF rules
– Using extra-grammatical rules
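The two readings of A - B - C are exactly the two parse trees that the rule Exp → Exp - Exp allows. As a rough sketch (the class AmbiguityDemo and its method are hypothetical names), the code below enumerates every parse of a chain of identifiers under that rule by trying each position where the top-level minus could sit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: enumerates all parse trees of  id - id - ... - id
// under the ambiguous rule  Exp -> Identifier | Exp - Exp.
class AmbiguityDemo {

    // Each result is one parse tree, written as a fully parenthesized string.
    static List<String> parses(List<String> ids) {
        if (ids.size() == 1) {
            return Collections.singletonList(ids.get(0)); // Exp -> Identifier
        }
        List<String> result = new ArrayList<>();
        // Exp -> Exp - Exp: choose which minus is the root of the tree.
        for (int split = 1; split < ids.size(); split++) {
            for (String left : parses(ids.subList(0, split))) {
                for (String right : parses(ids.subList(split, ids.size()))) {
                    result.add("(" + left + " - " + right + ")");
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String tree : parses(Arrays.asList("A", "B", "C"))) {
            System.out.println(tree);
        }
    }
}
```

With three identifiers there are two trees; with four there are five, so the number of readings grows quickly unless the grammar fixes associativity.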
Operator Precedence
<expr> → <id> + <expr> | <id> * <expr>
| ( <expr> ) | <id>
A = B + C * A is parsed as A = B + (C * A)
A = B * C + A is parsed as A = B * (C + A)
Solution
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>
A = B + C * A is parsed as A = B + (C * A)
A = B * C + A is parsed as A = (B * C) + A
Associativity of Operators
• Left associativity: A + B + C, A * B * C, A / B / C, …
– Left recursive: in a grammar rule, the LHS also appears at the beginning of its RHS
<expr> → <expr> + <term> | <term>
A + B + C is parsed as (A + B) + C
• Right associativity
– Right recursive: in a grammar rule, the LHS also appears at the end of its RHS
<factor> → <exp> ** <factor> | <exp>
<exp> → ( <expr> ) | <id>
A + B ** C is parsed as A + (B ** C)
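The effect of these rules on grouping can be checked mechanically. The sketch below is a hand-written parser for the expr/term/factor/exp grammar, assuming single-letter identifiers and a hypothetical class name Grouping; it returns the input with the parentheses implied by the grammar. Left-recursive rules are coded as loops, which group to the left, and the right-recursive rule as a recursive call, which groups to the right.

```java
// Illustrative only: shows how the grammar's shape fixes precedence and associativity.
class Grouping {
    private final String src; // input with whitespace removed
    private int pos = 0;

    private Grouping(String input) {
        this.src = input.replaceAll("\\s+", "");
    }

    static String group(String input) {
        Grouping g = new Grouping(input);
        String result = g.expr();
        if (g.pos != g.src.length()) {
            throw new IllegalArgumentException("Unexpected input at " + g.pos);
        }
        return result;
    }

    // <expr> -> <expr> + <term> | <term>   (left recursive: + groups to the left)
    private String expr() {
        String left = term();
        while (peek("+")) {
            pos++;
            left = "(" + left + " + " + term() + ")";
        }
        return left;
    }

    // <term> -> <term> * <factor> | <factor>   (left recursive: * groups to the left)
    private String term() {
        String left = factor();
        while (peek("*")) {
            pos++;
            left = "(" + left + " * " + factor() + ")";
        }
        return left;
    }

    // <factor> -> <exp> ** <factor> | <exp>   (right recursive: ** groups to the right)
    private String factor() {
        String base = exp();
        if (peek("**")) {
            pos += 2;
            return "(" + base + " ** " + factor() + ")";
        }
        return base;
    }

    // <exp> -> ( <expr> ) | <id>   (identifiers are single letters in this sketch)
    private String exp() {
        if (peek("(")) {
            pos++;
            String inner = expr();
            if (!peek(")")) {
                throw new IllegalArgumentException("Expected ) at " + pos);
            }
            pos++;
            return inner;
        }
        if (pos < src.length() && Character.isLetter(src.charAt(pos))) {
            return String.valueOf(src.charAt(pos++));
        }
        throw new IllegalArgumentException("Expected identifier at " + pos);
    }

    private boolean peek(String s) {
        return src.startsWith(s, pos);
    }

    public static void main(String[] args) {
        System.out.println(group("A + B + C"));
        System.out.println(group("B * C + A"));
        System.out.println(group("A ** B ** C"));
    }
}
```

Because term() is only called from inside expr(), every * is grouped before the surrounding +, which is how the grammar gives * the higher precedence.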
Extended BNF (EBNF)
• Optional part of an RHS
<if_stmt> → if ( <expr> ) <statement> [ else <statement> ]
• Repetition (in place of recursion) in an RHS
<id_list> → <id> { , <id> }
• Multiple-choice options in an RHS
<term> → <term> ( * | / | % ) <factor>
• Optional use of * and +
<id_list> → <id> { , <id> }*
<integer> → { 0 | … | 9 }+
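EBNF repetition maps naturally onto a loop in a hand-written parser. The following sketch parses an identifier list under the rule id_list → id { , id }, with the braces becoming a while loop; the class IdListParser and its helper are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: the EBNF braces { ... } become a while loop.
class IdListParser {

    // <id_list> -> <id> { , <id> }
    static List<String> parse(String input) {
        String s = input.replaceAll("\\s+", "");
        List<String> ids = new ArrayList<>();
        int[] pos = {0};                  // one-element array so the helper can advance it
        ids.add(id(s, pos));              // the mandatory first <id>
        while (pos[0] < s.length() && s.charAt(pos[0]) == ',') {
            pos[0]++;                     // consume the comma
            ids.add(id(s, pos));          // each repetition needs another <id>
        }
        if (pos[0] != s.length()) {
            throw new IllegalArgumentException("Unexpected input at " + pos[0]);
        }
        return ids;
    }

    // <id>: a letter followed by letters or digits
    private static String id(String s, int[] pos) {
        int start = pos[0];
        if (start >= s.length() || !Character.isLetter(s.charAt(start))) {
            throw new IllegalArgumentException("Expected identifier at " + start);
        }
        while (pos[0] < s.length() && Character.isLetterOrDigit(s.charAt(pos[0]))) {
            pos[0]++;
        }
        return s.substring(start, pos[0]);
    }

    public static void main(String[] args) {
        System.out.println(parse("a, b, c"));
    }
}
```

An optional part written with [ ... ] would be handled the same way with an if statement instead of a while loop.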
Extended BNF (EBNF) (cont'd)
• An "opt" subscript marks an optional part
Conditional Statement → if ( Expr ) Statement { else Statement }opt
• Syntax diagram
[Figure: syntax diagram for Term, built from Factor and the operators * | /]
Case Study
• A BNF or EBNF for one construct, such as expressions, the different literals, or the if statement, in Java, C, C++, or Pascal
• A BNF or EBNF for floating-point numbers in Java, C, or C++
• A BNF or EBNF for the loop statements of one language
Abstract Syntax
• Consider the following code:
• Pascal
while i < 10 do
begin
    i := i + 1;
end;
• C or Java
while (i < 10) {
    i = i + 1;
}
• Although the syntax differs, the two loops are essentially equivalent
• Abstract syntax captures the essential elements of a language construct, independent of its concrete syntax
Abstract Syntax (cont'd)
• General form
Abstract Syntax Class = list of essential components
• Example
Loop = Expression test; Statement body
– Callouts on the slide: Member, Element
• A Java class for the abstract syntax of Loop
class Loop extends Statement {
    Expression test;
    Statement body;
}
Abstract Syntax (cont'd)
• More examples
Assignment = Variable target; Expression source
– Callouts on the slide: Member, Element
• A Java class for the abstract syntax of Assignment
class Assignment extends Statement {
    Variable target;
    Expression source;
}
Abstract Syntax Tree
• A tree that shows the abstract syntax of a construct
• Example: x = 2; (C or Java) or x := 2; (Pascal)
Assignment = Variable target; Expression source
Statement
└─ Assignment
   ├─ Variable
   │  └─ x
   └─ Expression
      └─ Value
         └─ 2
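The abstract syntax tree for the assignment can be built directly from classes like the ones above. This is a minimal sketch: the constructors, the Value class holding an int, the choice to make Variable and Value subclasses of Expression, and the toString methods are all assumptions added for illustration.

```java
// Illustrative only: abstract syntax classes for  x = 2  (or  x := 2).
abstract class Statement { }

abstract class Expression { }

class Variable extends Expression {
    final String id;

    Variable(String id) {
        this.id = id;
    }

    @Override
    public String toString() {
        return "Variable(" + id + ")";
    }
}

class Value extends Expression {
    final int intValue;

    Value(int intValue) {
        this.intValue = intValue;
    }

    @Override
    public String toString() {
        return "Value(" + intValue + ")";
    }
}

// Assignment = Variable target; Expression source
class Assignment extends Statement {
    final Variable target;
    final Expression source;

    Assignment(Variable target, Expression source) {
        this.target = target;
        this.source = source;
    }

    @Override
    public String toString() {
        return "Assignment(target=" + target + ", source=" + source + ")";
    }
}

class AstDemo {
    public static void main(String[] args) {
        // The same tree results from the C/Java and the Pascal spelling.
        Statement s = new Assignment(new Variable("x"), new Value(2));
        System.out.println(s);
    }
}
```

The concrete symbols =, := and ; do not appear anywhere in the tree, which is the point of abstract syntax.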
Recursive Descent Parser
• A top-down parser that verifies the syntax of a stream of text, reading from left to right
• It consists of several recursive methods, each of which implements one rule of the grammar
• More details and parsing algorithms are covered in a compiler course
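As a rough illustration, the sketch below is a recursive descent recognizer for the Arithmetic Expression grammar given earlier, with one method per rule. The class name ExprRecognizer and the character-level handling of identifiers and literals are assumptions made to keep it self-contained. The left-recursive rules are coded as loops, because a method that began by calling itself would never terminate.

```java
// Illustrative only: accepts strings generated by
//   Arithmetic Expression -> Term | Arithmetic Expression + Term | Arithmetic Expression - Term
//   Term                  -> Factor | Term * Factor | Term / Factor
//   Factor                -> Identifier | Literal | ( Arithmetic Expression )
class ExprRecognizer {
    private final String src;
    private int pos = 0;

    private ExprRecognizer(String src) {
        this.src = src;
    }

    static boolean accepts(String input) {
        ExprRecognizer r = new ExprRecognizer(input);
        boolean ok = r.arithmeticExpression();
        r.skipSpaces();
        return ok && r.pos == r.src.length();
    }

    private boolean arithmeticExpression() {
        if (!term()) {
            return false;
        }
        while (at('+') || at('-')) {
            pos++;
            if (!term()) {
                return false;
            }
        }
        return true;
    }

    private boolean term() {
        if (!factor()) {
            return false;
        }
        while (at('*') || at('/')) {
            pos++;
            if (!factor()) {
                return false;
            }
        }
        return true;
    }

    private boolean factor() {
        if (at('(')) {
            pos++;
            if (!arithmeticExpression() || !at(')')) {
                return false;
            }
            pos++;
            return true;
        }
        int start = pos; // at() has already skipped leading spaces
        if (pos < src.length() && Character.isLetter(src.charAt(pos))) {
            // Identifier: a letter followed by letters or digits
            while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) {
                pos++;
            }
        } else {
            // Literal: one or more digits
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) {
                pos++;
            }
        }
        return pos > start;
    }

    // Skips spaces, then tests the next character without consuming it.
    private boolean at(char c) {
        skipSpaces();
        return pos < src.length() && src.charAt(pos) == c;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
            pos++;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("2 * a - 3"));
        System.out.println(accepts("2 * * 3"));
    }
}
```

A full parser would work on tokens from the lexical analyzer and build a tree instead of returning a boolean; the method-per-rule structure stays the same.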
Exercises
1. Modify the following grammar to add a unary minus operator that has higher precedence than either + or *.
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>
2. Consider the following grammar:
<S> → <A> a <B> b
<A> → <A> b | b
<B> → a <B> | a
Which of the following sentences are in the language generated by this grammar?
1. baab
2. bbbab
3. bbaaaaa
4. bbaab
3. Convert the following EBNF to BNF:
S → A { bA }
A → a [b]A
4. Using the grammar in question 1, add the ++ and -- unary operators of Java.
5. Using the grammar in question 1, show a parse tree and a leftmost derivation for each of the following statements:
a) A = (A + B) * C
b) A = B * (C * (A + B))
6. Rewrite the BNF in question 1 to give + precedence over *, and force + to be right associative.
7. Using BNF, write a grammar for the language consisting of strings aⁿbⁿ, where n > 0, such as ab, aabb, … . Can you write this using regular expressions?