Transcript Compilation - ort.org.il
Lexical Analyzer
Lecturer: Esti Stein
winter 2006-7
brd4.ort.org.il/~esti2
61102 Compilers Software Eng. Dept. – Ort Braude
What is a lexical analyzer?
Source program -> lexical analyzer -> stream of (token, value) pairs; the analyzer also uses the symbol table.
Read in characters and group them into tokens.
[A large share of the compilation time is spent on lexical analysis, since it is the only phase that reads the source character by character.]
Why use a lexical analyzer?
• Modular design – partitioning the compiler into independent parts.
• The parser deals with words (tokens), not characters.
• Isolate character set dependencies:
  – ASCII versus EBCDIC
• Isolate representation of symbols:
  – <> versus !=, { } versus begin..end
A token is:
A placeholder for a logical entity:
• keywords
• constants
• operators
• punctuation
• identifiers
White space and comments are not tokens.
Example of tokenizing
Input:
    if( val1 + val2 >= 6.5)
        todo = false;

token   token#  value   comment
if      10              keyword
(       20              left parenthesis
val1    50      val1    identifier
+       1       +       add op.
val2    50      val2    identifier
>=      2       >=      relational op.
6.5     51      6.5     float const.
)       21              right parenthesis
todo    50      todo    identifier
=       3               assign op.
false   50      false   identifier
;       4               separator
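The table above can be reproduced with a small hand-written scanner. The following is only a sketch under stated assumptions: the token numbers are the ones in the table, while the names `Token`, `next_token`, `END_OF_INPUT` and `BAD_CHAR` are hypothetical, `if` is the only keyword recognized, and number scanning is deliberately simplistic (any run of digits and dots).

```c
#include <ctype.h>
#include <string.h>

/* Token numbers taken from the table above. */
enum {
    END_OF_INPUT = 0, BAD_CHAR = -1,   /* hypothetical, not on the slide */
    ADD_OP = 1, REL_OP = 2, ASSIGN_OP = 3, SEPARATOR = 4,
    KEYWORD = 10, LEFT_PAREN = 20, RIGHT_PAREN = 21,
    IDENTIFIER = 50, FLOAT_CONST = 51
};

typedef struct {
    int  code;        /* token number */
    char value[32];   /* lexeme, or "" when the token carries no value */
} Token;

/* Scans one token starting at *src and advances *src past it. */
Token next_token(const char **src)
{
    Token t = { END_OF_INPUT, "" };
    const char *p = *src;
    size_t n = 0;

    while (isspace((unsigned char)*p))      /* white space is not a token */
        p++;

    if (isalpha((unsigned char)*p)) {       /* keyword or identifier */
        while (isalnum((unsigned char)*p) && n < sizeof t.value - 1)
            t.value[n++] = *p++;
        t.value[n] = '\0';
        if (strcmp(t.value, "if") == 0) {
            t.code = KEYWORD;
            t.value[0] = '\0';
        } else {
            t.code = IDENTIFIER;
        }
    } else if (isdigit((unsigned char)*p)) { /* simplified numeric constant */
        while ((isdigit((unsigned char)*p) || *p == '.') && n < sizeof t.value - 1)
            t.value[n++] = *p++;
        t.value[n] = '\0';
        t.code = FLOAT_CONST;
    } else if (p[0] == '>' && p[1] == '=') {
        strcpy(t.value, ">=");
        t.code = REL_OP;
        p += 2;
    } else if (*p != '\0') {                /* single-character tokens */
        switch (*p) {
        case '(': t.code = LEFT_PAREN;  break;
        case ')': t.code = RIGHT_PAREN; break;
        case '+': t.code = ADD_OP; strcpy(t.value, "+"); break;
        case '=': t.code = ASSIGN_OP;   break;
        case ';': t.code = SEPARATOR;   break;
        default:  t.code = BAD_CHAR;    break;
        }
        p++;
    }
    *src = p;
    return t;
}
```

Calling `next_token` repeatedly on the example input yields one (token#, value) pair per call until `END_OF_INPUT` is returned.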
Example [program]:
    token Getoken( )
    {
        SkipWhiteSpace( );
        c = getchar( );
        if( isletter( c)) return( ScanForIdentifier( ));
        if( isdigit( c)) return( ScanForConstant( ));
        switch( c) {
            case '(': return( LEFT_PAREN);
            case ')': return( RIGHT_PAREN);
            case '+': return( ScanForAddOrIncrement( ));
            case '=': return( ScanForAssignOrEqual( ));
            case '/': return( ScanForCommentOrDivide( ));
            …
            default: return( ERROR);
        }
    }
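Routines such as ScanForAssignOrEqual need one character of lookahead to decide between two tokens. The slide does not show their bodies, so the following is only a sketch of one common approach using the standard `getc`/`ungetc` pair; the function name `scan_assign_or_equal` and the token codes `TOK_ASSIGN` and `TOK_EQUAL` are hypothetical.

```c
#include <stdio.h>

enum { TOK_ASSIGN, TOK_EQUAL };   /* hypothetical token codes */

/* Called after a '=' has already been consumed from `in`.
 * Peeks at the next character: "==" is the equality operator,
 * anything else means a plain assignment. */
int scan_assign_or_equal(FILE *in)
{
    int next = getc(in);

    if (next == '=')
        return TOK_EQUAL;         /* second '=' is part of the token */
    if (next != EOF)
        ungetc(next, in);         /* not ours: push it back for the next token */
    return TOK_ASSIGN;
}
```

Pushing the character back keeps the main scanning routine simple: every call starts from the first unread character of the next token.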
Automating:
Most tokens can be easily defined by a regular grammar:
• the user defines tokens in a form equivalent to a regular grammar
• the system converts the grammar into code.
A variety of tools exist, e.g. lex and flex.
Regular Expressions & Automata
See the “Technion” tutorial on automata.
Exercise 1:
A real number consists of two parts: • The integer part, consisting of one or more digits. A number may not begin with a zero, unless the integer part is just zero.
• The decimal part, consisting of a decimal point followed by one or more digits.
Construct a regular expression for real numbers.
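One possible answer, offered as a sketch and not as the official solution, is (0 | [1-9][0-9]*) . [0-9]+ : the integer part is either a lone zero or a nonzero digit followed by any digits, and the decimal part is a point followed by one or more digits. The check below expresses this with the POSIX `regcomp`/`regexec` functions; the function name `is_real` is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if s is a real number as defined in the exercise,
 * 0 if it is not, and -1 if the pattern could not be compiled. */
int is_real(const char *s)
{
    regex_t re;
    int matched;

    /* ^ and $ anchor the match to the whole string;
     * [.] is a literal decimal point. */
    if (regcomp(&re, "^(0|[1-9][0-9]*)[.][0-9]+$",
                REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    matched = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

Under this definition `0.5` and `12.034` are accepted, while `007.1`, `5`, `1.` and `.5` are rejected.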
Converting an NDFA to a DFA
State   a       b
S       S,A     S
A       C       error
B       error   D,F
C       C       B,C
D       D,F     D,F
Convert to DFA…
Converting an NDFA to a DFA[2]
State   a       b
S       SA      S
SA      SAC     S
SAC     SAC     SBC
SBC     SAC     SBCDF=F
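The conversion can be mechanized with the subset construction: each DFA state is a set of NDFA states, and new sets are discovered by following every input symbol from the sets found so far. The sketch below encodes the NDFA table from the previous slide as bitmasks. It rests on assumptions: F is taken to have no outgoing transitions (the slide shows no row for it), "error" is modeled as the empty set, and the names `move_set` and `build_dfa` are hypothetical.

```c
enum { S, A, B, C, D, F, NSTATES };   /* NDFA states */
enum { SYM_A, SYM_B, NSYMS };         /* input symbols a and b */

#define BIT(q) (1u << (q))

/* NDFA transition table: nfa[state][symbol] is a set of states (bitmask);
 * 0 is the empty set, i.e. "error". */
static const unsigned nfa[NSTATES][NSYMS] = {
    [S] = { BIT(S) | BIT(A), BIT(S)          },
    [A] = { BIT(C),          0               },
    [B] = { 0,               BIT(D) | BIT(F) },
    [C] = { BIT(C),          BIT(B) | BIT(C) },
    [D] = { BIT(D) | BIT(F), BIT(D) | BIT(F) },
    [F] = { 0,               0               },  /* assumption: no moves */
};

/* Union of the NDFA moves on `sym` from every state in `set`. */
unsigned move_set(unsigned set, int sym)
{
    unsigned out = 0;
    for (int q = 0; q < NSTATES; q++)
        if (set & BIT(q))
            out |= nfa[q][sym];
    return out;
}

/* Subset construction. Fills dstates[] with the reachable state sets and
 * dtrans[i][sym] with the index of the target set (-1 for error).
 * Returns the number of DFA states, or -1 if more than `max` are needed. */
int build_dfa(unsigned dstates[], int dtrans[][NSYMS], int max)
{
    int n = 0;

    if (max < 1)
        return -1;
    dstates[n++] = BIT(S);                    /* start state is {S} */

    for (int i = 0; i < n; i++) {             /* n grows as sets are found */
        for (int sym = 0; sym < NSYMS; sym++) {
            unsigned target = move_set(dstates[i], sym);
            int j;

            if (target == 0) {
                dtrans[i][sym] = -1;
                continue;
            }
            for (j = 0; j < n; j++)           /* already known? */
                if (dstates[j] == target)
                    break;
            if (j == n) {
                if (n == max)
                    return -1;
                dstates[n++] = target;
            }
            dtrans[i][sym] = j;
        }
    }
    return n;
}
```

With six NDFA states, an array of 64 entries is always large enough. Under the assumptions above, the first sets discovered are S, SA, SAC, SBC and SBCDF, matching the rows of the DFA table; any set containing F is a final state.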
The Code
    S:   c = getchar( );
         if( c == 'a') goto SA;
         if( c == 'b') goto S;
         error( );
    SA:  c = getchar( );
         if( c == 'a') goto SAC;
         if( c == 'b') goto S;
         error( );
    SAC: c = getchar( );
         if( c == 'a') goto SAC;
         if( c == 'b') goto SBC;
         error( );
         …
The Code[2]
    token LexicalDriver( LexTable)
    {
        state = laststate;                  /* state in which scanning begins */
        for(;;) {
            c = NextChar( );
            state = LexTable[ state][ c];   /* one table lookup per character */
            if( state != error && state != finalstate) {
                AddToToken( c);
                AdvanceInput( );
            }
            else break;
        }
        if( state != finalstate) return( ERROR);
        else return( Token[ finalstate]);
    }
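To make the table-driven idea concrete, here is a minimal runnable sketch that drives the DFA of the earlier slides (states S, SA, SAC, SBC and the final state) from a transition table. It is an illustration under assumptions, not the driver above: the names `lex_table` and `run_dfa` are hypothetical, the input comes from a string, and scanning stops as soon as the final state is reached.

```c
enum State { ST_S, ST_SA, ST_SAC, ST_SBC, ST_FINAL };

/* Rows: non-final DFA states. Columns: input 'a', input 'b'. */
static const enum State lex_table[4][2] = {
    /* S   */ { ST_SA,  ST_S     },
    /* SA  */ { ST_SAC, ST_S     },
    /* SAC */ { ST_SAC, ST_SBC   },
    /* SBC */ { ST_SAC, ST_FINAL },
};

/* Runs the DFA over `input`. Returns the number of characters consumed
 * when the final state is reached, or -1 if the input ends first or
 * contains a character other than 'a' or 'b'. */
int run_dfa(const char *input)
{
    enum State state = ST_S;

    for (int i = 0; input[i] != '\0'; i++) {
        int column;

        if (input[i] == 'a')
            column = 0;
        else if (input[i] == 'b')
            column = 1;
        else
            return -1;                      /* no transition: error */

        state = lex_table[state][column];   /* the only per-character work */
        if (state == ST_FINAL)
            return i + 1;
    }
    return -1;                              /* ran out of input */
}
```

The same loop works for any DFA: only the table changes, which is what makes it practical to generate the table automatically.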
Output Lexical Errors
• A compiler produces a listing of the compiled program plus error messages, placed near the locations of the errors.
• The errors are queued and printed once a new-line is reached.
• Two ways to recover:
  – Ignore the erroneous token and start a new token.
  – Delete the first character read and restart scanning from the remaining input (complicated!).
• Be careful not to propagate error messages!
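The queuing described in the second bullet could look roughly like the sketch below: errors found while scanning a line are stored, and the whole batch is printed under the source line when the new-line is reached. All names (`LexError`, `queue_error`, `flush_errors`), the queue size and the listing layout are assumptions made for illustration.

```c
#include <stdio.h>

#define MAX_PENDING 16

typedef struct {
    int  column;      /* where on the line the error was found */
    char text[64];    /* message to print */
} LexError;

static LexError pending[MAX_PENDING];
static int n_pending = 0;

/* Remembers an error for the current line.
 * Returns 1 on success, 0 if the queue is full. */
int queue_error(int column, const char *msg)
{
    if (n_pending == MAX_PENDING)
        return 0;
    pending[n_pending].column = column;
    snprintf(pending[n_pending].text, sizeof pending[n_pending].text,
             "%s", msg);
    n_pending++;
    return 1;
}

/* Called when a new-line is reached: prints the source line followed by
 * the errors queued for it, then empties the queue.
 * Returns the number of errors printed. */
int flush_errors(FILE *out, int line_no, const char *line)
{
    int printed = n_pending;

    fprintf(out, "%4d  %s\n", line_no, line);
    for (int i = 0; i < n_pending; i++)
        fprintf(out, "      col %d: %s\n",
                pending[i].column, pending[i].text);
    n_pending = 0;
    return printed;
}
```

Printing after the line keeps each message next to the code it refers to, as the first bullet asks.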
LEX – the Lexical Analyzer
See the “Technion” tutorial on Lex.