CS 2130 Lecture 13 Formal Language Concepts

Where are we going?
Compilers
or
How does a computer program translate one
program into another?
Compilers
Parts of Compilers
Front End (Analysis)
(0. Preprocessor) (not part of all compilers)
1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis
Back End (Synthesis)
4. Code Generation
5. Optimization
Sidebar...HTML
• What is an HTML file?
• What does a browser do?
• Not all analysis leads to code
For today...
• a = b + c;
Token
• Assume that we have broken a program up into
pieces called tokens
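A minimal sketch of how a statement such as a = b + c; might be broken into tokens. The token category names (IDENT, NUMBER, OP, SEMI) are assumptions made for this illustration and are not taken from the lecture; a real compiler defines its own.

```python
import re

# Hypothetical token categories for this illustration only.
TOKEN_SPEC = [
    ("IDENT", r"[A-Za-z_]\w*"),
    ("NUMBER", r"\d+"),
    ("OP", r"[=+*()]"),
    ("SEMI", r";"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split text into (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for kind, lexeme in tokenize("a = b + c;"):
    print(kind, lexeme)
```

The tokenizer only splits the text into pieces. Whether those pieces appear in a legal order is a separate question, answered by the grammar.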
Tokens
• How does the compiler determine if this statement is
legal or not?
• To answer we need to first answer another question...
What is a grammar?
Or to put it another way...
What grammar a is?
Speech
• How does natural language work when two persons
communicate?
• One person speaks a language using known words
arranged in a certain order
• The other person knows the meaning of the words
and the rules of the arrangements and derives
meaning
A Sentence
The quick brown fox jumps over the lazy dog.
A Sentence
The quick brown fox jumps over the lazy dog.
subject: The quick brown fox
predicate: jumps over the lazy dog
prepositional phrase: over the lazy dog
Diagrammatically
fox
jumps
dog
Remember in the 3rd grade when you said, "We'll never see diagramming sentences again?"
Natural Language
• Rules of grammar specify legal syntax
• Rules are typically very complex with numerous
exceptions and special cases
Computer Language
• Still use a grammar to specify syntax
• Grammar is much simpler than natural language
Grammar
• Sentential forms
• Noam Chomsky MIT 60’s and 70’s
• Chomsky Type 2 Grammar
• Backus-Naur Form (BNF)
– Algol
Example
<sentence>    ::= <noun-phrase><verb-phrase>
<noun-phrase> ::= <cmplx-noun>
                | <cmplx-noun><prep-phrase>
<verb-phrase> ::= <cmplx-verb>
                | <cmplx-verb><prep-phrase>
<prep-phrase> ::= <prep><cmplx-noun>
<cmplx-noun>  ::= <article><noun>
<cmplx-verb>  ::= <verb> | <verb><noun-phrase>
<article>     ::= a | the
<noun>        ::= boy | girl | flower
<verb>        ::= touches | likes | sees
<prep>        ::= with
Parts of the Example Grammar
Terminal Symbols: a, the, boy, girl, flower, touches, likes, sees, with
Non-terminal Symbols: every name written in <...>, such as <sentence> and <noun-phrase>
Start Symbol: <sentence>
Rules: each definition of the form <non-terminal> ::= ...
Note
<noun> ::= boy | girl | flower
Equivalent to:
<noun> ::= boy
<noun> ::= girl
<noun> ::= flower
Typical Derivation
<sentence> ::= <noun-phrase><verb-phrase>
           ::= <cmplx-noun><verb-phrase>
           ::= <article><noun><verb-phrase>
           ::= a <noun><verb-phrase>
           ::= a boy <verb-phrase>
           ::= a boy <cmplx-verb>
           ::= a boy <verb>
           ::= a boy sees
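A derivation like this one can be replayed mechanically. The sketch below stores the sentence grammar as a Python dictionary and always rewrites the leftmost non-terminal. The names GRAMMAR and derive, and the idea of choosing each alternative by index, are assumptions made for illustration.

```python
# The sentence grammar from the slides: each non-terminal maps to its alternatives.
GRAMMAR = {
    "<sentence>": [["<noun-phrase>", "<verb-phrase>"]],
    "<noun-phrase>": [["<cmplx-noun>"], ["<cmplx-noun>", "<prep-phrase>"]],
    "<verb-phrase>": [["<cmplx-verb>"], ["<cmplx-verb>", "<prep-phrase>"]],
    "<prep-phrase>": [["<prep>", "<cmplx-noun>"]],
    "<cmplx-noun>": [["<article>", "<noun>"]],
    "<cmplx-verb>": [["<verb>"], ["<verb>", "<noun-phrase>"]],
    "<article>": [["a"], ["the"]],
    "<noun>": [["boy"], ["girl"], ["flower"]],
    "<verb>": [["touches"], ["likes"], ["sees"]],
    "<prep>": [["with"]],
}


def derive(choices, start="<sentence>"):
    """Leftmost derivation: at each step, replace the leftmost non-terminal
    with the alternative whose index is the next entry of `choices`."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost symbol that still has a rule.
        i = next(k for k, symbol in enumerate(form) if symbol in GRAMMAR)
        form[i:i + 1] = GRAMMAR[form[i]][choice]
        steps.append(" ".join(form))
    return steps


# Alternative indices that lead to the sentence "a boy sees".
for step in derive([0, 0, 0, 0, 0, 0, 0, 2]):
    print(step)
```

Each printed line is a sentential form. The last one contains only terminal symbols, so the derivation is finished.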
BNF Grammar
• Terminal Symbols
• Non-Terminal Symbols
• Start Symbol (non-terminal)
• Rules
• Use grammar to parse statements into n-ary tree
• Each statement becomes a tree
• Terminal symbols correspond to leaf nodes
• Non-terminal symbols correspond to internal nodes
• Start symbol corresponds to root node
Example: Expression Grammar
<expr>   ::= <expr> + <term> | <term>
<term>   ::= <term> * <factor> | <factor>
<factor> ::= '(' <expr> ')' | <num>
<num>    ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Non-terminal symbols: <...>
Terminal symbols: + * ( ) 0 1 2 3 4 5 6 7 8 9
Rules: the four ::= definitions above
Start symbol: a non-terminal symbol that appears only on the left side of rules or, failing that, the left-hand side of the first rule (here <expr>)
Grammars
<expr> ::= <expr> + <term> | <term>
<term> ::= <term> * <factor> | <factor>
<factor> ::= '(' <expr> ')' | <num>
<num> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• This grammar will parse arithmetic expressions
involving + and * which use parentheses for grouping
and numbers from 0 to 9
• What if a given "sentence" can't be parsed?
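One way to see what happens when a "sentence" can't be parsed is a small recognizer for this grammar. The sketch below is a recursive-descent recognizer. As an assumption of this technique (not something the slides state), the left-recursive rules for <expr> and <term> are rewritten as loops, which accept the same sentences. The function name is_legal is hypothetical.

```python
def is_legal(text):
    """Return True if text is a sentence of the expression grammar."""
    tokens = [ch for ch in text if not ch.isspace()]  # every symbol is one character
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def expr():
        # <expr> ::= <expr> + <term> | <term>, written as a loop
        term()
        while peek() == "+":
            advance()
            term()

    def term():
        # <term> ::= <term> * <factor> | <factor>, written as a loop
        factor()
        while peek() == "*":
            advance()
            factor()

    def factor():
        # <factor> ::= '(' <expr> ')' | <num>
        if peek() == "(":
            advance()
            expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
        elif peek() is not None and peek() in "0123456789":
            advance()
        else:
            raise SyntaxError(f"unexpected symbol {peek()!r}")

    try:
        expr()
    except SyntaxError:
        return False
    return pos == len(tokens)  # a successful parse uses every symbol


for sentence in ["1 + 2 * 3", "(1 + 2) * 3", "1 + * 3", "1 2"]:
    print(repr(sentence), is_legal(sentence))
```

A sentence that can't be parsed is simply not in the language. A compiler would report it as a syntax error.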
<expr> ::= <expr> + <term> | <term>
<term> ::= <term> * <factor> | <factor>
<factor> ::= '(' <expr> ')' | <num>
<num> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Parse Tree (Syntax Tree)
1 + 2 * 3
Indentation shows each node's children:
<expr>
    <expr>
        <term>
            <factor>
                <num>
                    1
    +
    <term>
        <term>
            <factor>
                <num>
                    2
        *
        <factor>
            <num>
                3
Note
• All symbols were used
• All leaf nodes of the tree are valid terminal symbols
• Said to be a "Successful parse"
• Sentence "1 + 2 * 3" was syntactically correct
Parse Tree (Syntax Tree)
1 + 2 * 3
Binary Tree!
    +
   / \
  1   *
     / \
    2   3
Traversals?
1 + 2 * 3
Preorder
In order
Post order
Breadth-first
Depth-first
Traversals?
1 + 2 * 3
Preorder: + 1 * 2 3
In order: 1 + 2 * 3
Post order: 1 2 3 * +
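The three traversal results can be checked with a short sketch. The Node class and function names below are assumptions made for illustration. The tree is the binary tree for 1 + 2 * 3, with + at the root.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def preorder(node):
    """Visit the node, then its left subtree, then its right subtree."""
    if node is None:
        return []
    return [node.value] + preorder(node.left) + preorder(node.right)


def inorder(node):
    """Visit the left subtree, then the node, then the right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)


def postorder(node):
    """Visit the left subtree, then the right subtree, then the node."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.value]


# Binary tree for 1 + 2 * 3: '+' at the root, '*' as its right child.
tree = Node("+", Node("1"), Node("*", Node("2"), Node("3")))

print("Preorder:  ", " ".join(preorder(tree)))
print("In order:  ", " ".join(inorder(tree)))
print("Post order:", " ".join(postorder(tree)))
```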
Preorder returns prefix notation similar to that used in Lisp or Scheme: (+ 1 (* 2 3))
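A minimal sketch of producing the Lisp-style form with a preorder walk. Here the tree is written as nested tuples of the form (operator, left, right), and the function name to_lisp is hypothetical. Both are assumptions chosen to keep the example short.

```python
def to_lisp(tree):
    """Preorder walk: a leaf is a string, an internal node is
    a tuple (operator, left, right)."""
    if isinstance(tree, str):
        return tree
    operator, left, right = tree
    return f"({operator} {to_lisp(left)} {to_lisp(right)})"


print(to_lisp(("+", "1", ("*", "2", "3"))))
```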
Preorder can also be used to reproduce the original tree, since every operator here takes exactly two operands
In order returns original expression.
What is the value of
1 + 2 * 3
a.) 1 + 2 * 3 = 9
b.) 1 + 2 * 3 = 7
• Why?
Post order returns postfix notation, otherwise known as Reverse Polish Notation or RPN
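Postfix notation is convenient because it can be evaluated with nothing more than a stack. A minimal sketch, assuming space-separated tokens and only the + and * operators (the function name eval_postfix is hypothetical):

```python
def eval_postfix(text):
    """Evaluate a space-separated postfix expression using a stack."""
    stack = []
    for token in text.split():
        if token in ("+", "*"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if token == "+" else left * right)
        else:
            stack.append(int(token))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


print(eval_postfix("1 2 3 * +"))
print(eval_postfix("1 2 + 3 *"))
```

No parentheses or precedence rules are needed: the order of the operators in the postfix string already fixes the order of evaluation.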
Note
• Both the preorder and postorder traversals (which
generated prefix and postfix notation) followed the
"normal" rules of precedence because the grammar
was set up to do so.
Historical Note
• Hewlett-Packard Calculators typically use the RPN or
postfix notation system
• The following sequence of keystrokes calculates
1 + 2 * 3
1 <enter>
2 <enter>
3
*
+
RPN Stack Calculators
After pressing:   1   <ENTER>   2   <ENTER>   3   *   +
stack             0      0      0      0      0   0   0
                  0      0      0      1      1   0   0
                  0      1      1      2      2   1   0
(Display)         1      1      2      2      3   6   7
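The stack contents shown on these slides can be reproduced with a small simulation. This is a sketch under stated assumptions, not a description of any particular calculator model: the stack has four levels with the display at the bottom, ENTER copies the display up one level, a number typed right after ENTER overwrites the display, and each number is a single key press. The class name RpnCalculator is hypothetical.

```python
class RpnCalculator:
    """Four-level stack [T, Z, Y, X]; X is the display."""

    def __init__(self):
        self.stack = [0, 0, 0, 0]
        self.lift = False  # True when the next number should push the stack up

    def press(self, key):
        t, z, y, x = self.stack
        if key == "ENTER":
            # Copy the display up one level; the next number replaces X.
            self.stack = [z, y, x, x]
            self.lift = False
        elif key in ("+", "*"):
            result = y + x if key == "+" else y * x
            # The stack drops; the top level is duplicated downward.
            self.stack = [t, t, z, result]
            self.lift = True
        else:
            number = int(key)
            self.stack = [z, y, x, number] if self.lift else [t, z, y, number]
            self.lift = True
        return list(self.stack)


calc = RpnCalculator()
for key in ["1", "ENTER", "2", "ENTER", "3", "*", "+"]:
    print(f"After pressing {key}: {calc.press(key)}")
```

The last element of each printed list is the display, so the final line ends in 7.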
The preceding sequence brought to you by
Hewlett Packard
Parse Tree
• Introduces
– Syntax
– Semantics
• Order of Operations
– Must be built into the grammar
Bad Grammar!
<expr> ::= <expr> + <expr>
         | <expr> * <expr>
         | '(' <expr> ')'
         | 0 | 1 | 2 | 3 | 4
         | 5 | 6 | 7 | 8 | 9
Problem?
<expr>
    <expr>
        1
    +
    <expr>
        <expr>
            2
        *
        <expr>
            3
Result: 7
What about?
<expr>
    <expr>
        <expr>
            1
        +
        <expr>
            2
    *
    <expr>
        3
Result: 9
Ambiguous!
Same grammar, same sentence, two parse trees:
1 + (2 * 3) = 7
(1 + 2) * 3 = 9
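The ambiguity can be demonstrated by enumerating every parse tree the bad grammar allows for 1 + 2 * 3. The sketch below tries each operator as the root of the tree. As a simplifying assumption it ignores the parenthesis rule and expects well-formed, space-separated input. The function name parses is hypothetical.

```python
def parses(tokens):
    """Return (tree, value) for every parse tree of `tokens` under
    <expr> ::= <expr> + <expr> | <expr> * <expr> | digit."""
    if len(tokens) == 1:
        return [(tokens[0], int(tokens[0]))]
    results = []
    for i, token in enumerate(tokens):
        if token in ("+", "*"):
            # Treat this operator as the root; parse both sides recursively.
            for left_tree, left_value in parses(tokens[:i]):
                for right_tree, right_value in parses(tokens[i + 1:]):
                    if token == "+":
                        value = left_value + right_value
                    else:
                        value = left_value * right_value
                    results.append((f"({left_tree} {token} {right_tree})", value))
    return results


for tree, value in parses("1 + 2 * 3".split()):
    print(tree, "=", value)
```

Two trees come back, with two different values. The earlier grammar with <term> and <factor> allows only the tree that gives 7.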
Ambiguous grammars have
their place.
Metasymbols
| means OR (alternation)
::= means "is defined as"
<...> means a non-terminal
(...) means grouping
'...' means the enclosed metasymbol is a terminal
There are others...
...and there is no "standard" set of
symbols.
Metasymbol Examples
<num> ::= 0|1|2|3|4|5|6|7|8|9
<signed num> ::= + <num> | - <num>
<signed num> ::= (+|-) <num>
Using pipe in Unix:
foo '|' bar '|' baz
Questions?