Compilation - ort.org.il


Lexical Analyzer

Lecturer: Esti Stein

winter 2006-7

brd4.ort.org.il/~esti2

61102 Compilers Software Eng. Dept. – Ort Braude

What is a lexical analyzer?

Source program -> lexical analyzer -> stream of (token, value) pairs (the lexical analyzer also works with the symbol table).

Read in characters and group them into tokens.

[most of the compilation time is spent on lexical analysis].


Why use a lexical analyzer?

• Modular design – partitioning the compiler into independent parts.

• The parser deals with words (tokens), not characters.

• Isolate character set dependencies:
  – ASCII versus EBCDIC

• Isolate representation of symbols:
  – <> versus !=, { } versus begin..end


A token is:

A placeholder for a logical entity:
• keywords
• constants
• operators
• punctuation
• identifiers

White space and comments are not tokens.


Example of tokenizing

if( val1 + val2 >= 6.5) todo = false;

token    token#   value    comment
if       10                keyword
(        20                left parenthesis
val1     50       val1     identifier
+        1        +        add op.
val2     50       val2     identifier
>=       2        >=       relational op.
6.5      51       6.5      float const.
)        21                right parenthesis
todo     50       todo     identifier
=        3                 assign op.
false    50       false    identifier
;        4                 separator

Example [program]:

    token Getoken( )
    {
        SkipWhiteSpace( );
        c = getchar( );
        /* the first character decides which kind of token to scan */
        if( isletter( c)) return( ScanForIdentifier( ));
        if( isdigit( c))  return( ScanForConstant( ));
        switch( c) {
            case '(': return( LEFT_PAREN);
            case ')': return( RIGHT_PAREN);
            /* these characters may start a one- or two-character token */
            case '+': return( ScanForAddOrIncrement( ));
            case '=': return( ScanForAssignOrEqual( ));
            case '/': return( ScanForCommentOrDivide( ));
            …
            default:  return( ERROR);
        }
    }


Automating:

Most tokens can be easily defined by a regular grammar:
• the user defines tokens in a form equivalent to a regular grammar
• the system converts the grammar into code.

A variety of tools exist, such as lex and flex.


Regular Expressions & Automata

See the "Technion" tutorial on automata.


Exercise 1:

A real number consists of two parts:

• The integer part, consisting of one or more digits. A number may not begin with a zero, unless the integer part is just zero.

• The decimal part, consisting of a decimal point followed by one or more digits.

Construct a regular expression for real numbers.


Converting an NDFA to a DFA

State   a       b
S       S,A     S
A       C       error
B       error   D,F
C       C       B,C
D       D,F     D,F

Convert to DFA…


Converting an NDFA to a DFA[2]

State   a       b
S       SA      S
SA      SAC     S
SAC     SAC     SBC
SBC     SAC     SBCDF = F


The Code

    S:   c = getchar( );
         if( c == 'a') goto SA;
         if( c == 'b') goto S;
         error( );
    SA:  c = getchar( );
         if( c == 'a') goto SAC;
         if( c == 'b') goto S;
         error( );
    SAC: c = getchar( );
         if( c == 'a') goto SAC;
         if( c == 'b') goto SBC;
         error( );
    …


The Code[2]

    token LexicalDriver( LexTable)
    {
        state = laststate;
        for(;;) {
            c = NextChar( );
            state = LexTable[ state][ c];    /* look up the next state */
            if( state != error && state != finalstate) {
                AddToToken( c);
                AdvanceInput( );
            }
            else
                break;
        }
        if( state != finalstate)
            return( ERROR);
        else
            return( Token[ finalstate]);
    }


Output Lexical Errors

• A compiler produces a listing of the compiled program plus error messages, placed near the locations of the errors.

• The errors are queued and printed once a new-line is reached.

• Two ways to recover:
  – Ignore the erroneous token and start a new token.
  – Delete the first character read and start re-reading the input (complicated!).

• Be careful not to propagate error messages!


LEX – the Lexical Analyzer

See the "Technion" tutorial on Lex.
