Transcript Chapter 2: Introduction to Microprocessor
by: Er. Sukhwinder kaur
Facts:
1. Each nonterminal symbol can derive many different strings.
2. Every string in a derivation is called a sentential form.
3. Every sentential form containing no nonterminal symbols is called a sentence.
4. The language L(G) generated by a CFG G is the set of sentences derivable from a distinguished nonterminal called the start symbol of G. (eg.
A CFG is a quadruple G = (N, Σ, P, S) where
N is a finite set (of nonterminal symbols)
Σ is a finite set (of terminal symbols) disjoint from N.
S ∈ N is the start symbol.
P is a finite subset of N × (N ∪ Σ)* (the productions)
Conventions:
Nonterminals: A, B, C, …
Terminals: a, b, c, …
Strings in (N ∪ Σ)*: α, β, γ, …
Each (A, α) ∈ P is called a production rule and is usually written as: A → α
A set of rules with the same LHS: A → α1, A → α2, A → α3 can be abbreviated as A → α1 | α2 | α3.
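The quadruple definition can be written down directly as a data structure. The following is only a minimal sketch: the class name `CFG`, its field names, and the example grammar S → S S | ( S ) | ε are assumptions chosen for illustration, not something the slides prescribe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset  # N
    terminals: frozenset     # Σ, must be disjoint from N
    productions: frozenset   # P: pairs (A, alpha), alpha a tuple of symbols
    start: str               # S, must belong to N

    def validate(self):
        """Check the conditions stated in the definition of a CFG."""
        if self.nonterminals & self.terminals:
            return False
        if self.start not in self.nonterminals:
            return False
        symbols = self.nonterminals | self.terminals
        return all(
            lhs in self.nonterminals and all(s in symbols for s in rhs)
            for lhs, rhs in self.productions
        )

    def alternatives(self, lhs):
        """Group the rules with the same LHS, as in A -> a1 | a2 | a3."""
        return sorted(rhs for l, rhs in self.productions if l == lhs)


# Hypothetical example grammar: S -> S S | ( S ) | ε (ε is the empty tuple).
PAREN = CFG(
    nonterminals=frozenset({"S"}),
    terminals=frozenset({"(", ")"}),
    productions=frozenset({("S", ("S", "S")), ("S", ("(", "S", ")")), ("S", ())}),
    start="S",
)
```

Calling `PAREN.validate()` checks disjointness, the start symbol, and that every rule uses only known symbols; `PAREN.alternatives("S")` lists the right-hand sides that the `|` abbreviation would group together.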
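[Figure: parse tree for the input string ( ) ( ). The root S has two children S S; each of them expands to ( S ), and each inner S derives ε.]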
Features of the parse tree:
1. The root node is labeled by the start symbol: S.
2. The left-to-right traversal of all leaves corresponds to the input string: ( ) ( ).
3. If X is an internal node and Y1 Y2 … Yk is a left-to-right listing of all its children in the tree, then X → Y1 Y2 … Yk is a rule of G.
4. Every step of derivation corresponds to one-level growth of an internal node.
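Features 2 and 3 can be checked mechanically on a small tree. This is a sketch under assumptions: the grammar S → S S | ( S ) | ε is inferred from the parse tree for ( ) ( ), and the node encoding `(label, children)` and the helper names are made up for this example.

```python
EPS = "ε"  # label used for an ε leaf

# Assumed grammar for the tree: S -> S S | ( S ) | ε
RULES = {("S", ("S", "S")), ("S", ("(", "S", ")")), ("S", ())}
NONTERMINALS = {"S"}


def yield_of(node):
    """Feature 2: read the leaves from left to right."""
    label, children = node
    if not children:
        return "" if label == EPS else label
    return "".join(yield_of(child) for child in children)


def obeys_grammar(node, rules, nonterminals):
    """Feature 3: every internal node X with children Y1..Yk uses a rule X -> Y1..Yk."""
    label, children = node
    if not children:
        # A leaf of a complete parse tree is a terminal or ε.
        return label not in nonterminals
    rhs = tuple(child[0] for child in children if child[0] != EPS)
    return (label, rhs) in rules and all(
        obeys_grammar(child, rules, nonterminals) for child in children
    )


inner = ("S", [(EPS, [])])                  # S -> ε
pair = ("S", [("(", []), inner, (")", [])])  # S -> ( S )
TREE = ("S", [pair, pair])                   # S -> S S
```

Here `yield_of(TREE)` gives back the input string and `obeys_grammar(TREE, RULES, NONTERMINALS)` confirms that each internal node matches a rule.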
Definition. A left-most derivation of a sentential form is one in which rules transforming the left-most nonterminal are always applied.
Definition. A right-most derivation of a sentential form is one in which rules transforming the right-most nonterminal are always applied.
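To see the two definitions side by side, the same sequence of rules can be applied once to the left-most and once to the right-most nonterminal. This is an illustrative sketch only: the function `derive`, the rule sequence `STEPS`, and the grammar S → S S | ( S ) | ε are assumptions made for the example.

```python
def derive(start, steps, nonterminals, leftmost=True):
    """Apply each (lhs, rhs) step to the left-most or right-most nonterminal.

    Returns every sentential form of the derivation as a string.
    """
    current = [start]
    forms = ["".join(current)]
    for lhs, rhs in steps:
        positions = [i for i, sym in enumerate(current) if sym in nonterminals]
        if not positions:
            raise ValueError("no nonterminal left to rewrite")
        i = positions[0] if leftmost else positions[-1]
        if current[i] != lhs:
            raise ValueError(f"rule for {lhs} does not apply to {current[i]}")
        current[i:i + 1] = rhs  # replace the chosen nonterminal by the rule's RHS
        forms.append("".join(current))
    return forms


# Assumed grammar: S -> S S | ( S ) | ε (ε is the empty list).
STEPS = [
    ("S", ["S", "S"]),
    ("S", ["(", "S", ")"]),
    ("S", []),
    ("S", ["(", "S", ")"]),
    ("S", []),
]
LEFT = derive("S", STEPS, {"S"}, leftmost=True)
RIGHT = derive("S", STEPS, {"S"}, leftmost=False)
```

Both derivations end in the sentence `()()`, but the intermediate sentential forms differ: `LEFT` passes through `(S)S` while `RIGHT` passes through `S(S)`.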
Left recursion: A → A α
Right recursion: A → α A
Most algorithms have trouble with one of them. In recursive descent, avoid left recursion.
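One common way to avoid left recursion is the textbook rewrite of A → A α | β into A → β A', A' → α A' | ε. The slides do not spell this out, so the sketch below is an assumption about how one might do it; the function name and the fresh nonterminal `A'` are hypothetical, and the code assumes there is no rule A → A.

```python
def remove_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b A', A' -> a A' | ε.

    `alternatives` is a list of tuples of symbols; ε is the empty tuple.
    Assumes no alternative is exactly (A,), i.e. there is no rule A -> A.
    """
    recursive = [alt[1:] for alt in alternatives if alt[:1] == (nonterminal,)]
    other = [alt for alt in alternatives if alt[:1] != (nonterminal,)]
    if not recursive:
        return {nonterminal: list(alternatives)}  # nothing to do
    fresh = nonterminal + "'"  # hypothetical name for the new nonterminal
    return {
        nonterminal: [beta + (fresh,) for beta in other],
        fresh: [alpha + (fresh,) for alpha in recursive] + [()],
    }


# Example: E -> E + T | T  becomes  E -> T E',  E' -> + T E' | ε
RESULT = remove_immediate_left_recursion("E", [("E", "+", "T"), ("T",)])
```

The rewritten grammar is right-recursive, which a recursive descent parser can follow without calling itself forever on the same input position.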
Let G be a CFG for some L − {ε}.
Definition: G is said to be in Chomsky Normal Form if all its productions are in one of the following two forms:
• A → BC, where A, B, C are variables, or
• A → a, where a is a terminal
and, in addition:
• G has no useless symbols
• G has no unit productions
• G has no ε-productions
• Is this grammar in CNF?
G1:
1. E → E+T | T*F | (E) | Ia | Ib | I0 | I1
2. T → T*F | (E) | Ia | Ib | I0 | I1
3. F → (E) | Ia | Ib | I0 | I1
4. I → a | b | Ia | Ib | I0 | I1
Checklist:
• G has no ε-productions
• G has no unit productions
• G has no useless symbols
• But… the normal form for productions is violated
So, the grammar is not in CNF.
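The last item of the checklist can be checked rule by rule. The sketch below tests only the shape of the productions (A → BC or A → a), not useless symbols, unit productions, or ε-productions; the function name and the encoding of G1 as strings of one-character symbols are assumptions, and the rules are transcribed as they read on the slide.

```python
def cnf_violations(productions, variables):
    """Return the rules that are neither A -> BC nor A -> a."""
    bad = []
    for lhs, rhs in productions:
        two_variables = len(rhs) == 2 and all(s in variables for s in rhs)
        one_terminal = len(rhs) == 1 and rhs[0] not in variables
        if not (two_variables or one_terminal):
            bad.append((lhs, rhs))
    return bad


VARIABLES = {"E", "T", "F", "I"}
# G1 with every symbol written as a single character.
G1 = {
    "E": ["E+T", "T*F", "(E)", "Ia", "Ib", "I0", "I1"],
    "T": ["T*F", "(E)", "Ia", "Ib", "I0", "I1"],
    "F": ["(E)", "Ia", "Ib", "I0", "I1"],
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],
}
PRODUCTIONS = [(lhs, tuple(rhs)) for lhs, alts in G1.items() for rhs in alts]
VIOLATIONS = cnf_violations(PRODUCTIONS, VARIABLES)
```

Only I → a and I → b already have a permitted shape. A rule such as I → Ia mixes a variable with a terminal, and E → E+T has three symbols, so both appear in `VIOLATIONS`.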
A CFG is in Greibach normal form if each rule has one of these forms:
i. A → aA1A2…An
ii. A → a
iii. S → ε
where a ∈ Σ and Ai ∈ V − {S} for i = 1, 2, …, n.
Removing ε-Productions
Remove all ε-productions:
(1) If there is a rule P → αQβ and Q is nullable, then: add the rule P → αβ.
(2) Delete all rules Q → ε.
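The two steps can be sketched in code as follows. This is one possible reading, not the slides' own implementation: step (1) is repeated until no new rule appears, the helper names are hypothetical, and the example grammar S → A B, A → a | ε, B → b | ε is made up for illustration.

```python
def nullable_symbols(productions):
    """Symbols that can derive ε (a rule with an empty RHS makes its LHS nullable)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


def remove_epsilon(productions):
    rules = set(productions)
    nullable = nullable_symbols(rules)
    changed = True
    while changed:  # step (1), repeated until nothing new is added
        changed = False
        for lhs, rhs in list(rules):
            for i, sym in enumerate(rhs):
                if sym in nullable:
                    new_rule = (lhs, rhs[:i] + rhs[i + 1:])  # P -> alpha beta
                    if new_rule not in rules:
                        rules.add(new_rule)
                        changed = True
    # step (2): delete every rule whose right-hand side is ε
    return {(lhs, rhs) for lhs, rhs in rules if rhs}


# Hypothetical grammar: S -> A B, A -> a | ε, B -> b | ε
GRAMMAR = {("S", ("A", "B")), ("A", ("a",)), ("A", ()), ("B", ("b",)), ("B", ())}
CLEANED = remove_epsilon(GRAMMAR)
```

The result is S → A B | A | B, A → a, B → b. The new grammar generates L − {ε}, and it now contains the unit productions S → A and S → B, which is why unit-production removal is usually the next step.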
A unit production is a rule whose right-hand side consists of a single nonterminal symbol.
Example:
S → X Y
X → A
A → B | a
B → b
Y → T
T → Y | c
Removing Unit Productions
removeUnits(G) =
1. Let G′ = G.
2. Until no unit productions remain in G′ do:
2.1 Choose some unit production X → Y.
2.2 Remove it from G′.
2.3 Consider only rules that still remain. For every rule Y → β, where β ∈ V*, do: Add to G′ the rule X → β unless it is a rule that has already been removed once.
3. Return G′.
Example:
S → X Y
X → A
A → B | a
B → b
Y → T
T → Y | c
After removing the unit productions:
S → X Y
A → a | b
B → b
T → c
X → a | b
Y → c
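The removeUnits procedure translates almost line by line into code. The sketch below is an assumption about one reasonable implementation: the rule encoding as `(lhs, rhs_tuple)` pairs and the choice of "some unit production" as the first one in sorted order are mine, not the slides'.

```python
def remove_units(productions, nonterminals):
    rules = set(productions)  # step 1: G' = G
    removed = set()           # unit productions that were already removed once

    def is_unit(rule):
        lhs, rhs = rule
        return len(rhs) == 1 and rhs[0] in nonterminals

    while True:  # step 2
        units = sorted(rule for rule in rules if is_unit(rule))
        if not units:
            return rules  # step 3
        unit = units[0]   # 2.1: choose some unit production X -> Y
        x, (y,) = unit
        rules.remove(unit)  # 2.2
        removed.add(unit)
        for lhs, beta in list(rules):  # 2.3: only rules that still remain
            if lhs == y and (x, beta) not in removed:
                rules.add((x, beta))


NONTERMINALS = {"S", "X", "Y", "A", "B", "T"}
EXAMPLE = {
    ("S", ("X", "Y")),
    ("X", ("A",)),
    ("A", ("B",)), ("A", ("a",)),
    ("B", ("b",)),
    ("Y", ("T",)),
    ("T", ("Y",)), ("T", ("c",)),
}
RESULT = remove_units(EXAMPLE, NONTERMINALS)
```

On the example grammar this yields S → X Y, A → a | b, B → b, T → c, X → a | b, Y → c. The `removed` set is what guarantees termination: the cycle Y → T, T → Y would otherwise keep reintroducing unit productions.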
• FAs recognize regular languages.
• What kinds of machines recognize CFLs?
===> Pushdown automata (PDAs)
PDA:
• Like FAs but with an additional stack as working memory.
• Actions of a PDA:
1. Move right one tape cell (as usual FAs)
2. Push a symbol onto the stack
3. Pop a symbol from the stack.
• Actions of a PDA depend on:
1. current state
2. currently scanned I/P symbol
3. current top stack symbol.
• A string x is accepted by a PDA if it can enter a final state (or clear all stack symbols).
• More details are deferred to later chapters.
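A small deterministic simulation shows how the three actions and the three inputs to each decision fit together. Everything named here is an assumption for illustration: the function `run_pda`, the bottom-of-stack marker `$`, the single state `q`, and the choice to accept only when the machine is in a final state and the stack is back to the marker. General PDAs are nondeterministic, which this sketch does not model.

```python
def run_pda(transitions, start, finals, text, bottom="$"):
    """Run a deterministic PDA; each move depends on (state, input symbol, stack top)."""
    state, stack = start, [bottom]
    for ch in text:  # moving right one tape cell per step
        if not stack:
            return False
        move = transitions.get((state, ch, stack[-1]))
        if move is None:
            return False  # no applicable move: reject
        state, action = move
        if action[0] == "push":
            stack.append(action[1])
        elif action[0] == "pop":
            stack.pop()
    # Accept in a final state with nothing but the bottom marker on the stack.
    return state in finals and stack == [bottom]


# Balanced parentheses: push on "(", pop a matching "(" on ")".
BALANCED = {
    ("q", "(", "$"): ("q", ("push", "(")),
    ("q", "(", "("): ("q", ("push", "(")),
    ("q", ")", "("): ("q", ("pop",)),
}
```

For instance, `run_pda(BALANCED, "q", {"q"}, "()()")` accepts, while `"(()"` is rejected because a `(` is left on the stack and `")("` is rejected because there is no move for `)` on the bottom marker.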
If a language L is accepted by a DFA M with m states, then any string x in L with |x| > m can be written as x = uvw such that (1) v ≠ ε, and (2) uv*w is a subset of L (i.e., for any n ≥ 0, uv^n w in L).
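Consider the path associated with x (|x| > m).
Since |x| > m, the number of nodes on the path is at least m+1. Therefore, there is a state appearing twice.
[Figure: the path for x splits into u (up to the first visit of the repeated state), v (the loop back to that state), and w (the rest of the path, ending in a final state).]
• v ≠ ε because M is a DFA.
• uw in L because there is a path associated with uw from the initial state to a final state.
• uv^n w in L due to the same reason as above.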
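The argument is constructive: following the path of x and stopping at the first repeated state gives u, v, and w directly. The sketch below assumes a DFA given as a transition dictionary; the function names and the example DFA (strings over {a, b} with an even number of a's, so m = 2) are hypothetical.

```python
def pump_split(delta, start, x):
    """Split x into (u, v, w) where v labels a loop on the first repeated state."""
    state = start
    seen = {state: 0}  # state -> number of symbols read when it was first visited
    for i, ch in enumerate(x, start=1):
        state = delta[(state, ch)]
        if state in seen:
            j = seen[state]
            return x[:j], x[j:i], x[i:]
        seen[state] = i
    return None  # no state repeats, so x is too short to pump


def accepts(delta, start, finals, x):
    state = start
    for ch in x:
        state = delta[(state, ch)]
    return state in finals


# Hypothetical DFA with m = 2 states: even number of a's.
DELTA = {
    ("even", "a"): "odd", ("even", "b"): "even",
    ("odd", "a"): "even", ("odd", "b"): "odd",
}
U, V, W = pump_split(DELTA, "even", "aba")
```

For x = aba the path is even, odd, odd, even, so the state odd repeats and the split is u = a, v = b, w = a. Every string a b^n a, including the case n = 0, still has two a's and is accepted.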