Transcript Ch03Part1

Chapter 3 Describing Syntax and Semantics

ISBN 0-321-33025-0

Chapter 3 Topics

• 3.1 Introduction
• 3.2 The General Problem of Describing Syntax
• 3.3 Formal Methods of Describing Syntax
• 3.4 Attribute Grammars
• 3.5 Describing the Meanings of Programs: Dynamic Semantics

Copyright © 2006 Addison-Wesley. All rights reserved.

3.1 Introduction

• Syntax and semantics provide a language’s definition
• Syntax: the form or structure of the expressions, statements, and program units
• Semantics: the meaning of the expressions, statements, and program units
• E.g., while (x > 20) { sum = sum + x; x = x + 1; }

3.2 The General Problem of Describing Syntax: Terminology

• A language is a set of sentences
• A sentence/statement is a string of characters over some alphabet
• A token is a category of lexemes (e.g., identifier)
• A lexeme is the lowest-level syntactic unit of a language (e.g., *, sum, begin)

3.2 The General Problem of Describing Syntax: Terminology

• E.g. language: { int index, count; … index = 2 * count + 17; }
  – statement: e.g. int index, count;
  – token: identifier (e.g. index, count)
  – token: int literal (e.g. 2, 17)
  – lexeme: each individual unit of the statement (e.g. index, =, 2, *, count, +, 17, ;)
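The lexeme/token distinction above can be sketched as a tiny scanner. This is only an illustration, not part of the slides: the token names (`equal_sign`, `mult_op`, and so on) and the `tokenize` function are assumptions chosen for the example, and reserved words such as `int` are not handled.

```python
import re

# Token categories (illustrative names) paired with the pattern of their lexemes.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier",  r"[A-Za-z_]\w*"),
    ("equal_sign",  r"="),
    ("mult_op",     r"\*"),
    ("plus_op",     r"\+"),
    ("comma",       r","),
    ("semicolon",   r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return (lexeme, token) pairs; raise ValueError on an unexpected character."""
    pairs = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():          # whitespace separates lexemes
            pos += 1
            continue
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        # match.group() is the lexeme; match.lastgroup is its token category
        pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs

for lexeme, token in tokenize("index = 2 * count + 17;"):
    print(f"{lexeme:6} {token}")
```

Each distinct string (index, count) is a lexeme, while both fall into the single token category identifier.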

Formal Definition of Languages

• Recognizers
  – A recognition device reads input strings over the alphabet of the language and decides whether they belong to the language
  – Example: syntax analysis part of a compiler
  – Detailed discussion in Chapter 4
• Generators
  – A device that generates sentences of a language
  – One can determine if the syntax of a particular sentence is correct by comparing it to the structure of the generator

3.3 Formal Methods of Describing Syntax

• 3.3.1 Backus-Naur Form (BNF) and Context-Free Grammars
  – Most widely known method for describing programming language syntax
• 3.3.2 Extended BNF
  – Improves readability and writability of BNF

BNF and Context-Free Grammars

• Context-Free Grammars
  – Developed by Noam Chomsky, a natural-language linguist, in the mid-1950s
  – Described four classes of grammars that define four classes of languages
  – Two classes (context-free and regular) turned out to be useful for describing the syntax of programming languages

Backus-Naur Form (BNF)

• Backus-Naur Form (1959)
  – Invented by John Backus to describe Algol 58, later modified by Peter Naur
  – BNF is equivalent to context-free grammars
  – BNF is a metalanguage: a language used to describe another language
  – In BNF, abstractions are used to represent classes of syntactic structures; they act like syntactic variables (also called nonterminal symbols)
  • E.g. <assign> → <var> = <expression>

BNF Fundamentals

• Non-terminals: BNF abstractions
• Terminals: lexemes and tokens
• Grammar: a collection of rules
  – Examples of BNF rules:
    <ident_list> → identifier | identifier, <ident_list>
    <if_stmt> → if <logic_expr> then <stmt>

BNF Rules

• A rule has a left-hand side (LHS) and a right-hand side (RHS), and consists of terminal and nonterminal symbols
• A grammar is a finite nonempty set of rules
• An abstraction (or nonterminal symbol) can have more than one RHS

Qs:

<stmt> → <single_stmt> | begin <stmt_list> end

? ? ? ?

Specific Rule for Describing Lists

• Syntactic lists are described using recursion
  <ident_list> → ident | ident, <ident_list>
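The recursive list rule maps directly onto a recursive recognizer. The following is a minimal sketch, assuming the input has already been split into a list of token strings; the function name `is_ident_list` is made up for this example, and Python's `str.isidentifier` stands in for the `ident` terminal.

```python
def is_ident_list(tokens):
    """Recognize <ident_list> -> ident | ident , <ident_list> over a token list."""
    if not tokens or not tokens[0].isidentifier():
        return False                      # every alternative starts with ident
    if len(tokens) == 1:
        return True                       # first alternative: a single ident
    # second alternative: ident , <ident_list> (the recursion mirrors the rule)
    return tokens[1] == "," and is_ident_list(tokens[2:])

print(is_ident_list(["index", ",", "count"]))   # a well-formed list
print(is_ident_list(["index", ","]))            # a dangling comma
```

The recursive call on `tokens[2:]` plays the role of the nonterminal on the right-hand side of the rule.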

Grammars and Derivations

• A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)

An Example Grammar And Derivation

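Grammar
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var> | <var> - <var> | <var>

Derivation of

begin A = B + C; B = C end

<program> => begin <stmt_list> end
          => begin <stmt> ; <stmt_list> end
          => begin <var> = <expression> ; <stmt_list> end
          => begin A = <expression> ; <stmt_list> end
          => begin A = <var> + <var> ; <stmt_list> end
          => begin A = B + <var> ; <stmt_list> end
          => begin A = B + C ; <stmt_list> end
          => begin A = B + C ; <stmt> end
          => begin A = B + C ; <var> = <expression> end
          => begin A = B + C ; B = <expression> end
          => begin A = B + C ; B = <var> end
          => begin A = B + C ; B = C end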
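A derivation like the one above can be replayed mechanically. The sketch below is an illustration under stated assumptions, not the slides' method: the grammar is stored as a Python dictionary, and `leftmost_derivation` and the `CHOICES` list (the index of the alternative picked at each step) are hypothetical names introduced for the example.

```python
GRAMMAR = {
    "<program>":    [["begin", "<stmt_list>", "end"]],
    "<stmt_list>":  [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>":       [["<var>", "=", "<expression>"]],
    "<var>":        [["A"], ["B"], ["C"]],
    "<expression>": [["<var>", "+", "<var>"], ["<var>", "-", "<var>"], ["<var>"]],
}

def leftmost_derivation(start, choices):
    """Expand the leftmost nonterminal once per entry in `choices`.

    Each entry is the index of the RHS alternative to apply.
    Returns the list of sentential forms, beginning with [start].
    """
    form = [start]
    steps = [form]
    for choice in choices:
        # position of the leftmost nonterminal in the current sentential form
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(form)
    return steps

# Alternative indices that derive: begin A = B + C ; B = C end
CHOICES = [0, 1, 0, 0, 0, 1, 2, 0, 0, 1, 2, 2]
for step in leftmost_derivation("<program>", CHOICES):
    print("=>", " ".join(step))
```

Every printed line is a sentential form; only the last one, which contains no nonterminals, is a sentence.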

Derivation

• Every string of symbols in the derivation is a sentential form
• A sentence is a sentential form that has only terminal symbols
• A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded
• A derivation can be rightmost or neither leftmost nor rightmost
• Derivation order has no effect on the language generated by a grammar

Another Example

Grammar
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>

Derivation of “A = B * (A + C)”

<assign> => <id> = <expr>
         => A = <expr>
         => A = <id> * <expr>
         => A = B * <expr>
         => A = B * ( <expr> )
         => A = B * ( <id> + <expr> )
         => A = B * ( A + <expr> )
         => A = B * ( A + <id> )
         => A = B * ( A + C )

Parse Tree

• A hierarchical representation of a derivation

[Figure: parse tree for A = B * (A + C)]

Ambiguity in Grammars

• A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees

An Ambiguous Expression Grammar

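<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>

Two distinct parse trees for A = B + C * A:

a. [Figure: parse tree with + at the root of the expression and * below it]

b. [Figure: parse tree with * at the root of the expression and + below it]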
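One way to check ambiguity for a particular string is to count its parse trees. The sketch below does this for the `<expr>` rule of the grammar above only; the function name `count_expr_trees` and the whitespace-separated token format are assumptions made for the example. Since `<assign>` and `<id>` each expand in only one way for a given sentence, the count for the expression is also the count for the whole assignment.

```python
from functools import lru_cache

def count_expr_trees(tokens):
    """Count distinct parse trees for `tokens` under
    <expr> -> <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):                       # number of trees deriving tokens[lo:hi]
        if hi - lo == 1:
            return 1 if tokens[lo] in ("A", "B", "C") else 0   # <id>
        total = 0
        if hi - lo >= 3 and tokens[lo] == "(" and tokens[hi - 1] == ")":
            total += count(lo + 1, hi - 1)   # ( <expr> )
        for k in range(lo + 1, hi - 1):      # try each operator as the root
            if tokens[k] in ("+", "*"):
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))

print(count_expr_trees("B + C * A".split()))        # more than one tree: ambiguous
print(count_expr_trees("( B + C ) * A".split()))    # parentheses force one grouping
```

A count greater than one for any string is enough to show that the grammar is ambiguous.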

Operator Precedence

• The fact that an operator in an arithmetic expression is generated lower in the parse tree (and therefore must be evaluated first) can be used to indicate that it has precedence over an operator produced higher up in the tree
• As in previous slides
  – Tree a: A = B + (C * A)
  – Tree b: A = (B + C) * A

An Unambiguous Expression Grammar

<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>

[Figure: parse tree for A = B + C * A, with + above *]

Leftmost and rightmost derivations

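leftmost:
<assign> => <id> = <expr>
         => A = <expr>
         => A = <expr> + <term>
         => A = <term> + <term>
         => A = <factor> + <term>
         => A = <id> + <term>
         => A = B + <term>
         => A = B + <term> * <factor>
         => A = B + <factor> * <factor>
         => A = B + <id> * <factor>
         => A = B + C * <factor>
         => A = B + C * <id>
         => A = B + C * A

rightmost:
<assign> => <id> = <expr>
         => <id> = <expr> + <term>
         => <id> = <expr> + <term> * <factor>
         => <id> = <expr> + <term> * <id>
         => <id> = <expr> + <term> * A
         => <id> = <expr> + <factor> * A
         => <id> = <expr> + <id> * A
         => <id> = <expr> + C * A
         => <id> = <term> + C * A
         => <id> = <factor> + C * A
         => <id> = <id> + C * A
         => <id> = B + C * A
         => A = B + C * A

Every derivation with an unambiguous grammar has a unique parse tree, although that tree can be represented by different derivations.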
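The point that one parse tree corresponds to several derivations can be illustrated by reading both derivations off the same tree. This is a sketch under assumptions: the tree encoding (a nonterminal is a `(label, children)` tuple, a terminal is a string) and the names `TREE`, `id_node`, `render`, and `derivation` are invented for the example.

```python
def id_node(name):
    """Build the subtree <id> -> name."""
    return ("<id>", [name])

# Parse tree for A = B + C * A under the unambiguous grammar.
TREE = ("<assign>", [
    id_node("A"), "=",
    ("<expr>", [
        ("<expr>", [("<term>", [("<factor>", [id_node("B")])])]),
        "+",
        ("<term>", [
            ("<term>", [("<factor>", [id_node("C")])]),
            "*",
            ("<factor>", [id_node("A")]),
        ]),
    ]),
])

def render(form):
    """Show unexpanded nodes by their label and terminals as themselves."""
    return " ".join(n[0] if isinstance(n, tuple) else n for n in form)

def derivation(tree, rightmost=False):
    """List the sentential forms obtained by always expanding the leftmost
    (or rightmost) unexpanded nonterminal of the given parse tree."""
    form = [tree]
    steps = [render(form)]
    while any(isinstance(n, tuple) for n in form):
        pending = [i for i, n in enumerate(form) if isinstance(n, tuple)]
        i = pending[-1] if rightmost else pending[0]
        form = form[:i] + list(form[i][1]) + form[i + 1:]   # replace node by its children
        steps.append(render(form))
    return steps

print("\n=> ".join(derivation(TREE)))
print("\n=> ".join(derivation(TREE, rightmost=True)))
```

Both calls walk the same tree and end in the same sentence; only the order in which the nonterminals are expanded differs.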

3.3.2 Extended BNF

• Optional parts are placed in brackets [ ]
  <if_stmt> → if (<expression>) <statement> [else <statement>]
• Repetitions (0 or more) are placed inside braces { }
  <ident_list> → <identifier> {, <identifier>}
• When a single element must be chosen from a group, the options are placed in parentheses and separated by the OR operator, |
  <term> → <term> (* | / | %) <factor>

BNF and EBNF

• BNF
  <expr> → <expr> + <term> | <expr> - <term> | <term>
  <term> → <term> * <factor> | <term> / <factor> | <factor>
• EBNF
  <expr> → <term> {(+ | -) <term>}
  <term> → <factor> {(* | /) <factor>}
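The EBNF form translates almost line for line into a recursive-descent parser, with each `{ ... }` repetition becoming a loop. The sketch below is one possible rendering, not a definitive implementation. It assumes whitespace-separated tokens and, because the slide does not define `<factor>`, it borrows `<factor> → ( <expr> ) | <id>` from the earlier unambiguous grammar. The function names and the tuple-based tree format are also assumptions.

```python
def parse_expr(tokens, pos=0):
    """<expr> -> <term> {(+ | -) <term>}; returns (tree, next position)."""
    tree, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):   # the { ... } repetition
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        tree = (op, tree, right)                             # groups to the left
    return tree, pos

def parse_term(tokens, pos):
    """<term> -> <factor> {(* | /) <factor>}"""
    tree, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        tree = (op, tree, right)
    return tree, pos

def parse_factor(tokens, pos):
    """Assumed rule: <factor> -> ( <expr> ) | <id>"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "(":
        tree, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return tree, pos + 1
    if tokens[pos].isidentifier():
        return tokens[pos], pos + 1
    raise SyntaxError(f"unexpected token {tokens[pos]!r}")

def parse(text):
    """Parse a whole expression and reject trailing tokens."""
    tokens = text.split()
    tree, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse("B + C * A"))       # * ends up nested below +
print(parse("A - B - C"))       # repeated - groups to the left
```

Because `parse_term` is called from inside `parse_expr`, the multiplicative operators sit lower in the resulting tree, which is how the grammar encodes their higher precedence.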

EBNF variations

• In place of the arrow, a colon is used and the RHS is placed on the next line
• Instead of a vertical bar to separate alternative RHSs, they are simply placed on separate lines
• In place of square brackets to indicate something being optional, the subscript opt is used. E.g.
  – ConstructorDeclarator -> SimpleName(FormalParameterList opt)
• Rather than using the | symbol in a parenthesized list of elements to indicate a choice, the words “one of” are used. E.g.
  – AssignmentOperator -> one of = *= /= %= += -= <<= >>= &= |=