Announcements
HW1 due on Monday, February 8th
Name and date your submission
Submit electronically in Homework Server AND on paper at the beginning of class
Make sure you have an account in HW Server!
You may submit late on Thursday, February 11th for 50% credit
No other submissions accepted
Email questions to [email protected]
Spring 16 CSCI 4430, A Milanova

Last Class
Top-down parsing vs. bottom-up parsing
Top-down parsing
  Introduction
  A backtracking parser
  Recursive descent predictive parser
  Table-driven top-down parser
  LL(1) parsing tables, FIRST and FOLLOW sets

Today's Lecture Outline
Top-down (also called LL) parsing
  LL(1) parsing tables, FIRST, FOLLOW and PREDICT sets
  Writing an LL(1) grammar
Bottom-up (also called LR) parsing
  Model of a bottom-up (LR) parser

Programming Language Syntax: Parsing
Read: finish Chapter 2.3.2 and start Chapter 2.3.3

LL(1) Parsing Tables
One dimension is the nonterminal to expand; the other dimension is the lookahead token.
E.g., the entry for nonterminal A on terminal a contains production A → α.
This means that when the parser is at nonterminal A and the lookahead token in the stream is a, the parser must expand A by production A → α.

LL(1) Parsing Table
start → expr $$
expr → term term_tail
term → id factor_tail
term_tail → + term term_tail | ε
factor_tail → * id factor_tail | ε

              id               +                  *                  $$
start         expr $$          -                  -                  -
expr          term term_tail   -                  -                  -
term_tail     -                + term term_tail   -                  ε
term          id factor_tail   -                  -                  -
factor_tail   -                ε                  * id factor_tail   ε

LL(1) Parsing Tables
We can construct an LL(1) parsing table for any context-free grammar.
In general, the table may contain multiply-defined entries; that is, for some nonterminal and lookahead token, more than one production applies.
A grammar whose LL(1) parsing table has no multiply-defined entries is said to be an LL(1) grammar.
LL(1) grammars are a very special subclass of context-free grammars. Why?
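To make the table concrete, here is a minimal sketch of a table-driven LL(1) parser for the expression grammar above, written in Python (our choice; the slides use no programming language). The dictionary TABLE transcribes the LL(1) parsing table, with an empty list standing for an ε production; the names TABLE, NONTERMINALS and ll1_parse are illustrative, not from the course materials. The parser keeps a stack of grammar symbols: when a nonterminal is on top it consults the table with the lookahead token, and when a terminal is on top it must match the input.

```python
# LL(1) parsing table for the expression grammar:
# (nonterminal, lookahead) -> right-hand side; [] encodes an ε production.
TABLE = {
    ("start", "id"): ["expr", "$$"],
    ("expr", "id"): ["term", "term_tail"],
    ("term_tail", "+"): ["+", "term", "term_tail"],
    ("term_tail", "$$"): [],
    ("term", "id"): ["id", "factor_tail"],
    ("factor_tail", "+"): [],
    ("factor_tail", "*"): ["*", "id", "factor_tail"],
    ("factor_tail", "$$"): [],
}
NONTERMINALS = {"start", "expr", "term_tail", "term", "factor_tail"}


def ll1_parse(tokens):
    """Return the productions predicted, in order (a leftmost derivation).

    Raises SyntaxError on an empty ("-") table entry or a terminal mismatch.
    """
    tokens = list(tokens) + ["$$"]  # append the end-of-input marker
    stack = ["start"]
    used = []
    i = 0
    while stack:
        top = stack.pop()
        lookahead = tokens[i]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, lookahead))
            if rhs is None:
                raise SyntaxError(f"no table entry for ({top}, {lookahead})")
            used.append((top, rhs))
            stack.extend(reversed(rhs))  # leftmost symbol of rhs ends up on top
        elif top == lookahead:
            i += 1  # terminal on top matches the input: consume it
        else:
            raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
    return used


for lhs, rhs in ll1_parse(["id", "+", "id", "*", "id"]):
    print(lhs, "→", " ".join(rhs) or "ε")
```

Blank ("-") entries are simply absent from the dictionary, so looking one up raises a syntax error, which is exactly what an empty table entry means.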
FIRST and FOLLOW Sets
Let α be any sequence of nonterminals and terminals.
FIRST(α) is the set of terminals a that begin the strings derived from α.
If there is a derivation α ⇒* ε, then ε is in FIRST(α).
Let A be a nonterminal.
FOLLOW(A) is the set of terminals b (including the special end-of-input marker $$) that can appear immediately to the right of A in some sentential form: start ⇒* … A b … ⇒* …

Computing FIRST
Notation: α is an arbitrary sequence of terminals and nonterminals.
Apply these rules until no more terminals or ε can be added to any FIRST(α) set:
(1) If α starts with a terminal a, then FIRST(α) = {a}.
(2) If α is a nonterminal X, where X → ε, then add ε to FIRST(α).
(3) If α is a nonterminal X, where X → Y1 Y2 … Yk, then place a in FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), …, FIRST(Yi-1). If ε is in all of FIRST(Y1), …, FIRST(Yk), add ε to FIRST(X).
    Everything in FIRST(Y1) except ε is surely in FIRST(X).
    If Y1 does not derive ε, then we add nothing more; otherwise, we add FIRST(Y2), and so on.
Similarly, if α is Y1 Y2 … Yk, we repeat the above.

Warm-up Exercise
start → expr $$
expr → term term_tail
term → id factor_tail
term_tail → + term term_tail | ε
factor_tail → * id factor_tail | ε
FIRST(term) = { id }
FIRST(expr) =
FIRST(start) =
FIRST(term_tail) =
FIRST(+ term term_tail) =
FIRST(factor_tail) =

Exercise
start → S $$
S → x S | A y
A → B C D | ε
B → z S | ε
C → v S | ε
D → w S
Compute FIRST sets:
FIRST(x S) =
FIRST(A y) =
FIRST(B C D) =
FIRST(z S) =
FIRST(v S) =
FIRST(w S) =
FIRST(S) =
FIRST(A) =
FIRST(B) =
FIRST(C) =
FIRST(D) =

Computing FOLLOW
Notation: A, B, S are nonterminals; α, β are arbitrary sequences of terminals and nonterminals.
Apply these rules until nothing can be added to any FOLLOW(A) set:
(1) If there is a production A → α B β, then everything in FIRST(β) except ε should be added to FOLLOW(B).
(2) If there is a production A → α B, or a production A → α B β where FIRST(β) contains ε, then everything in FOLLOW(A) should be added to FOLLOW(B).

Warm-up
start → expr $$
expr → term term_tail
term → id factor_tail
term_tail → + term term_tail | ε
factor_tail → * id factor_tail | ε
FOLLOW(expr) = { $$ }
FOLLOW(term_tail) =
FOLLOW(term) =
FOLLOW(factor_tail) =

Exercise
start → S $$
S → x S | A y
A → B C D | ε
B → z S | ε
C → v S | ε
D → w S
Compute FOLLOW sets:
FOLLOW(A) =
FOLLOW(B) =
FOLLOW(C) =
FOLLOW(D) =
FOLLOW(S) =

PREDICT Sets
\[
\mathrm{PREDICT}(A \to \alpha) =
\begin{cases}
\mathrm{FIRST}(\alpha) & \text{if } \alpha \text{ does not derive } \varepsilon \\
(\mathrm{FIRST}(\alpha) - \{\varepsilon\}) \cup \mathrm{FOLLOW}(A) & \text{if } \alpha \text{ derives } \varepsilon
\end{cases}
\]

Constructing the LL(1) Parsing Table
The algorithm uses PREDICT sets:
foreach production A → α in grammar G
    foreach symbol c in PREDICT(A → α)
        add A → α to entry parse_table[A, c]
If every entry in parse_table contains at most one production, then G is said to be LL(1).

Exercise
start → S $$
S → x S | A y
A → B C D | ε
B → z S | ε
C → v S | ε
D → w S
Compute PREDICT sets:
PREDICT(S → x S) =
PREDICT(S → A y) =
PREDICT(A → B C D) =
PREDICT(A → ε) =
… etc.

Writing an LL(1) Grammar
Most context-free grammars are not LL(1) grammars.
Obstacles to LL(1)-ness:
expr → expr + term | term
term → term * id | id
Left recursion is an obstacle. Why?
Common prefixes are an obstacle. Why?
stmt → if b then stmt else stmt |
       if b then stmt | a

Removal of Left Recursion
Left recursion can be removed from a grammar mechanically.
We start from this left-recursive expression grammar:
expr → expr + term | term
term → term * id | id
After removal of left recursion we obtain this equivalent grammar, which is LL(1):
expr → term term_tail
term_tail → + term term_tail | ε
term → id factor_tail
factor_tail → * id factor_tail | ε

Removal of Common Prefixes
Common prefixes can be removed mechanically as well, by using left-factoring.
Original if-then-else grammar:
stmt → if b then stmt else stmt | if b then stmt | a
After left-factoring:
stmt → if b then stmt else_part | a
else_part → else stmt | ε

Exercise
start → stmt $$
stmt → if b then stmt else_part | a
else_part → else stmt | ε
Compute FIRSTs: FIRST(stmt $$), FIRST(if b then stmt else_part), FIRST(a), FIRST(else stmt)
Compute FOLLOW: FOLLOW(else_part)
Compute PREDICT sets for all 5 productions.
Construct the LL(1) parsing table. Is this grammar an LL(1) grammar?
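Since left recursion can be removed mechanically, here is a minimal Python sketch of the standard transformation for immediate left recursion: A → A α1 | … | A αm | β1 | … | βn becomes A → β1 A_tail | … | βn A_tail and A_tail → α1 A_tail | … | αm A_tail | ε. The function name and the "_tail" naming scheme are assumptions for illustration: the slides name the new nonterminals term_tail (for expr) and factor_tail (for term), whereas this sketch produces expr_tail and term_tail. It does not handle indirect left recursion or cycles such as A → A.

```python
def remove_immediate_left_recursion(A, rhss):
    """Rewrite  A -> A a1 | ... | A am | b1 | ... | bn
    as          A -> b1 A_tail | ... | bn A_tail
                A_tail -> a1 A_tail | ... | am A_tail | ε
    Right-hand sides are lists of symbols; [] stands for ε.
    """
    tail = A + "_tail"  # hypothetical naming scheme for the new nonterminal
    alphas = [rhs[1:] for rhs in rhss if rhs and rhs[0] == A]  # left-recursive
    betas = [rhs for rhs in rhss if not rhs or rhs[0] != A]     # all others
    if not alphas:  # no immediate left recursion: nothing to do
        return {A: rhss}
    return {
        A: [beta + [tail] for beta in betas],
        tail: [alpha + [tail] for alpha in alphas] + [[]],
    }


print(remove_immediate_left_recursion("expr", [["expr", "+", "term"], ["term"]]))
# {'expr': [['term', 'expr_tail']], 'expr_tail': [['+', 'term', 'expr_tail'], []]}
```

Up to the names of the new nonterminals, applying it to expr and then to term yields the LL(1) grammar shown on the slide.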
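Left-factoring can be sketched the same way. This Python sketch performs one round: alternatives of A that begin with the same symbol are replaced by their longest common prefix followed by a new nonterminal, which derives the leftover suffixes. The function names and the "_restN" naming scheme are hypothetical (the slides call the new nonterminal else_part), and a grammar may need several rounds before no common prefixes remain.

```python
def common_prefix(seqs):
    """Longest sequence of symbols that every sequence in seqs starts with."""
    prefix = []
    for column in zip(*seqs):  # zip stops at the shortest sequence
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


def left_factor(A, rhss):
    """One round of left-factoring for nonterminal A ([] stands for ε)."""
    groups = {}
    for rhs in rhss:
        key = rhs[0] if rhs else None  # None groups the ε alternatives
        groups.setdefault(key, []).append(rhs)
    new_rhss, new_rules = [], {}
    for key, group in groups.items():
        if key is None or len(group) == 1:
            new_rhss.extend(group)  # no shared first symbol: keep as-is
            continue
        prefix = common_prefix(group)
        rest = f"{A}_rest{len(new_rules) + 1}"  # hypothetical naming scheme
        new_rhss.append(prefix + [rest])
        new_rules[rest] = [rhs[len(prefix):] for rhs in group]  # suffixes
    return {A: new_rhss, **new_rules}


print(left_factor("stmt", [
    ["if", "b", "then", "stmt", "else", "stmt"],
    ["if", "b", "then", "stmt"],
    ["a"],
]))
```

On the if-then-else grammar this produces stmt → if b then stmt stmt_rest1 | a and stmt_rest1 → else stmt | ε, the slide's result with else_part renamed.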
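The FIRST and FOLLOW rules are fixpoint computations, and the table construction only needs PREDICT sets, so the whole procedure can be run mechanically. The Python sketch below applies the rules to the left-factored if-then-else grammar from the exercise; GRAMMAR, first_of_seq, compute_first, compute_follow and build_table are hypothetical names, and the end marker $$ is treated as an ordinary terminal because the grammar mentions it explicitly. Each table entry is a list of right-hand sides, so a multiply-defined entry shows up as a list with more than one element.

```python
from collections import defaultdict

EPS = "ε"
# nonterminal -> list of right-hand sides; [] stands for ε
GRAMMAR = {
    "start": [["stmt", "$$"]],
    "stmt": [["if", "b", "then", "stmt", "else_part"], ["a"]],
    "else_part": [["else", "stmt"], []],
}


def first_of_seq(seq, first):
    """FIRST of a symbol sequence, given the current FIRST sets of nonterminals."""
    result = set()
    for sym in seq:
        if sym not in GRAMMAR:  # rule (1): a terminal begins the string
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:  # sym cannot vanish, so stop here
            return result
    result.add(EPS)  # every symbol can derive ε (or seq is empty)
    return result


def compute_first():
    first = {A: set() for A in GRAMMAR}
    changed = True
    while changed:  # apply the rules until nothing more can be added
        changed = False
        for A, rhss in GRAMMAR.items():
            for rhs in rhss:
                new = first_of_seq(rhs, first) - first[A]
                if new:
                    first[A] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {A: set() for A in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for A, rhss in GRAMMAR.items():
            for rhs in rhss:
                for i, B in enumerate(rhs):
                    if B not in GRAMMAR:
                        continue
                    beta_first = first_of_seq(rhs[i + 1:], first)
                    new = beta_first - {EPS}  # rule (1)
                    if EPS in beta_first:  # rule (2): beta is empty or derives ε
                        new |= follow[A]
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow


def build_table():
    first = compute_first()
    follow = compute_follow(first)
    table = defaultdict(list)
    for A, rhss in GRAMMAR.items():
        for rhs in rhss:
            f = first_of_seq(rhs, first)
            predict = (f - {EPS}) | (follow[A] if EPS in f else set())
            for c in predict:
                table[(A, c)].append(rhs)
    return first, follow, table


first, follow, table = build_table()
for (A, c), entries in sorted(table.items()):
    print(A, c, [" ".join(rhs) or EPS for rhs in entries])
```

Inspecting the printed entry for (else_part, else) answers the last question of the exercise; to try the other exercise grammars, replace GRAMMAR.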
Lecture Outline
Top-down (also called LL) parsing (continued)
  LL(1) parsing table, FIRST, FOLLOW and PREDICT sets
  Writing an LL(1) grammar
Bottom-up (also called LR) parsing
  Model of the bottom-up (LR) parser

Bottom-up Parsing
list → id list_tail
list_tail → , id list_tail | ;
Terminals are seen in the order of appearance in the token stream: id , id , id ;
The parse tree is constructed
  from the leaves to the top
  as a right-most derivation in reverse
(Figure: the parse tree for id , id , id ; with list at the root and nested list_tail nodes.)

Bottom-up Parsing
list → id list_tail
list_tail → , id list_tail | ;
Stack              Input        Action
                   id,id,id;    shift
id                 ,id,id;      shift
id,                id,id;       shift
id,id              ,id;         shift
id,id,             id;          shift
id,id,id           ;            shift
id,id,id;                       reduce by list_tail → ;
id,id,id list_tail              reduce by list_tail → , id list_tail
id,id list_tail                 reduce by list_tail → , id list_tail
id list_tail                    reduce by list → id list_tail
list                            ACCEPT

Bottom-up Parsing
Also called LR parsing
LR parsing is better than LL parsing:
  LR parsers work with LR(k) grammars
  They accept a larger class of languages
  Just as efficient!
  L stands for "left-to-right" scan of the input
  R stands for "rightmost" derivation
  k stands for "need k tokens of lookahead"
We are interested in LR(0) and LR(1) and variants in between.

LR Parsing
The parsing method used in practice
LR parsers recognize virtually all PL constructs
LR parsers recognize a much larger set of grammars than predictive parsers
LR parsing is efficient
LR parsing variants:
  SLR (or Simple LR)
  LALR (or Lookahead LR): yacc/bison generate LALR parsers
  LR (Canonical LR)
  SLR < LALR < LR, in terms of the grammars each can handle

Main Idea
Stack: holds the part of the input seen so far, a string of both terminals and nonterminals
Input: holds the remaining part of the input, a string of terminals
The parser performs two actions:
  Reduce: the parser pops a "suitable" production right-hand side off the stack and pushes the production left-hand side onto the stack
  Shift: the parser pushes the next terminal from the input onto the top of the stack

Example
Recall the grammar
expr → expr + term | term
term → term * id | id
This grammar is not LL(1) because it is left recursive.
LR parsers can handle left recursion!
Consider the string id + id * id.

id + id*id
expr → expr + term | term
term → term * id | id
Stack          Input       Action
               id+id*id    shift id
id             +id*id      reduce by term → id
term           +id*id      reduce by expr → term
expr           +id*id      shift +
expr+          id*id       shift id
expr+id        *id         reduce by term → id
expr+term      *id         shift *
expr+term*     id          shift id
expr+term*id               reduce by term → term * id
expr+term                  reduce by expr → expr + term
expr                       ACCEPT, SUCCESS

Sequence of reductions performed by the parser:
id+id*id
term+id*id
expr+id*id
expr+term*id
expr+term
expr
• This is a right-most derivation in reverse.
• The stack (e.g., expr) concatenated with the remaining input (e.g., +id*id) gives a sentential form (expr+id*id) in the right-most derivation.

Handle
expr → expr + term | term
term → term * id | id
A handle:
Formally, if we have a right-most derivation S ⇒* αAw ⇒ αβw, then we say that A → β at position α is a handle of αβw.
Notation: S and A are nonterminals, w is a sequence of terminals, α and β are arbitrary sequences (of both terminals and nonterminals).
Recall our example id+id*id:
Stack          Input
expr+term      *id
expr+term*id
Is expr → expr+term a handle?
Is term → id a handle?

Model of an LR Parser
(Figure: the LR parsing program reads the input a1 … ai … an $$, maintains a stack of states and symbols s0 … Xm-1 sm-1 Xm sm with sm on top, and consults a parsing table with two parts, action and goto.)
action[s, a]: do we shift or reduce?
goto[s, A]: after reduction to nonterminal A, what state is pushed on top of the stack?
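To check the claim that the parser's reductions form a right-most derivation in reverse, this Python sketch replays the reductions from the trace above as a derivation, always expanding the rightmost nonterminal, and then prints the sentential forms backwards. STEPS is transcribed from the trace, read from the last reduction to the first; the function and variable names are illustrative.

```python
NONTERMINALS = {"expr", "term"}

# Productions of a right-most derivation of id+id*id: the parser's
# reductions from the trace, read from last to first.
STEPS = [
    ("expr", ["expr", "+", "term"]),
    ("term", ["term", "*", "id"]),
    ("term", ["id"]),
    ("expr", ["term"]),
    ("term", ["id"]),
]


def rightmost_derivation(start, steps):
    """Return the sentential forms of a right-most derivation from start."""
    form = [start]
    forms = [form]
    for lhs, rhs in steps:
        # position of the rightmost nonterminal in the current sentential form
        i = max(k for k, sym in enumerate(form) if sym in NONTERMINALS)
        if form[i] != lhs:
            raise ValueError(f"rightmost nonterminal is {form[i]}, not {lhs}")
        form = form[:i] + rhs + form[i + 1:]
        forms.append(form)
    return forms


forms = rightmost_derivation("expr", STEPS)
for f in reversed(forms):  # reversed: the order in which the parser reduces
    print("".join(f))
# id+id*id
# term+id*id
# expr+id*id
# expr+term*id
# expr+term
# expr
```

The printed lines coincide with the sequence of reductions listed above, and each one is a stack concatenated with the remaining input at some point in the trace.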
Model of an LR Parser
The stack is (s0, X1, s1, …, Xm, sm) and the input pointer is at ai.
If action[sm, ai] is shift s:
  Push ai and state s on the stack: (s0, X1, s1, …, Xm, sm, ai, s)
If action[sm, ai] is reduce by A → β:
  Pop β, i.e., pop 2*|β| items off the stack (all symbols in β plus their corresponding states): (s0, X1, s1, …, Xm-|β|, sm-|β|)
  Push A and s = goto[sm-|β|, A] on top of the stack: (s0, X1, s1, …, Xm-|β|, sm-|β|, A, s)

LR Parsing Table
1. expr → expr + term
2. expr → term
3. term → term * id
4. term → id

Columns id, +, *, $$ form the action table; columns expr, term form the goto table.
sN: shift and go to state N; rN: reduce by production N; acc: accept; blank: error.

state   id    +     *     $$    expr  term
0       s3                      1     2
1             s4          acc
2             r2    s5    r2
3             r4    r4    r4
4       s3                            6
5       s7
6             r1    s5    r1
7             r3    r3    r3

Summary
Top-down (also called LL) parsing (continued)
  LL(1) parsing table and PREDICT sets
  Writing an LL(1) grammar
Bottom-up (also called LR) parsing
  Model of the bottom-up (LR) parser
  LR parsing table

Next Class
We will continue with bottom-up parsing.
Keep reading Chapter 2.3.3.
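The stack discipline of the LR parser model (push a symbol and a state on shift; pop 2*|β| entries, then push A and goto[sm-|β|, A] on reduce) can be sketched in Python using the LR parsing table for the expression grammar. ACTION and GOTO transcribe that table, ("s", n) means shift and go to state n, ("r", n) means reduce by production n, and all names are illustrative rather than taken from any parser generator. The function returns the production numbers in the order they are reduced.

```python
# production number -> (left-hand side, length of right-hand side)
PRODUCTIONS = {
    1: ("expr", 3),  # expr -> expr + term
    2: ("expr", 1),  # expr -> term
    3: ("term", 3),  # term -> term * id
    4: ("term", 1),  # term -> id
}
ACTION = {
    (0, "id"): ("s", 3),
    (1, "+"): ("s", 4), (1, "$$"): ("acc", None),
    (2, "+"): ("r", 2), (2, "*"): ("s", 5), (2, "$$"): ("r", 2),
    (3, "+"): ("r", 4), (3, "*"): ("r", 4), (3, "$$"): ("r", 4),
    (4, "id"): ("s", 3),
    (5, "id"): ("s", 7),
    (6, "+"): ("r", 1), (6, "*"): ("s", 5), (6, "$$"): ("r", 1),
    (7, "+"): ("r", 3), (7, "*"): ("r", 3), (7, "$$"): ("r", 3),
}
GOTO = {(0, "expr"): 1, (0, "term"): 2, (4, "term"): 6}


def lr_parse(tokens):
    """Return production numbers in reduction order, or raise SyntaxError."""
    tokens = list(tokens) + ["$$"]
    stack = [0]  # (s0, X1, s1, ..., Xm, sm): states sit at even positions
    reductions = []
    i = 0
    while True:
        state, lookahead = stack[-1], tokens[i]
        entry = ACTION.get((state, lookahead))
        if entry is None:  # blank table entry
            raise SyntaxError(f"unexpected {lookahead!r} in state {state}")
        kind, arg = entry
        if kind == "s":  # shift: push the terminal and the new state
            stack += [lookahead, arg]
            i += 1
        elif kind == "r":  # reduce: pop 2*|beta| items, then push A and goto
            lhs, rhs_len = PRODUCTIONS[arg]
            del stack[len(stack) - 2 * rhs_len:]
            stack += [lhs, GOTO[(stack[-1], lhs)]]
            reductions.append(arg)
        else:  # accept
            return reductions


print(lr_parse(["id", "+", "id", "*", "id"]))  # [4, 2, 4, 3, 1]
```

For id + id * id the result [4, 2, 4, 3, 1] is the same sequence of reductions as the hand trace earlier in the lecture: term → id, expr → term, term → id, term → term * id, expr → expr + term.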