Transcript Scan Gen

ScanGen
ScanGen accepts descriptions of tokens written as regular expressions and produces tables for a finite automaton driver program.
It was written by Gary Sevitsky in spring 1979 and modified and enhanced by Robert Gray in spring 1980.
Later changes were made by Charles Fischer in 1981 and 1982.
The user defines the input to ScanGen as a file with three sections:
 Options,
 Character Classes, and
 Token Definitions, each of the form:
 TOKEN name {major,minor} = regular expression;
Regular expressions can include EXCEPT clauses and {TOSS} attributes.
Example of ScanGen input: textbook page 61 (extended Micro).
Options Section

The Options section is optional.
It consists of the keyword OPTIONS followed by one or more option names (which are not reserved words), such as tables and list.
The option names may appear in any order, separated by blanks or commas.
Class Definitions

The class definitions, introduced by the keyword CLASS, specify the character classes that make up the alphabet used by the regular expressions.
The character classes are sets of ASCII characters. As in the examples, they are defined by single characters within quotes, by ranges of characters, or by numeric ASCII codes (e.g. Linefeed = 10).
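Because each class is a set of ASCII characters, a scanner can map every input character to a class number with a 128-entry table, presumably the kind of information the Character Class Mapping section of the output tables records. The sketch below builds such a map by hand for the classes of Example 1. The class numbers and the helpers build_class_map and char_class are hypothetical, and the sketch assumes, as both examples suggest, that no character belongs to two classes.

```c
#include <assert.h>

/* Hypothetical class numbers for the classes of Example 1. */
enum { CLASS_NONE = -1, CLASS_LETTER = 0, CLASS_DIGIT = 1, CLASS_BLANK = 2 };

/* Fill map[c] with the class of ASCII character c, or CLASS_NONE. */
static void build_class_map(int map[128])
{
    for (int c = 0; c < 128; c++)
        map[c] = CLASS_NONE;
    for (int c = 'A'; c <= 'Z'; c++)   /* letter = 'A'..'Z', 'a'..'z'; */
        map[c] = CLASS_LETTER;
    for (int c = 'a'; c <= 'z'; c++)
        map[c] = CLASS_LETTER;
    for (int c = '0'; c <= '9'; c++)   /* digit = '0'..'9'; */
        map[c] = CLASS_DIGIT;
    map[' '] = CLASS_BLANK;            /* blank = ' '; */
}

/* Look up the class of character c in a map built above. */
static int char_class(const int map[128], int c)
{
    assert(c >= 0 && c < 128);         /* only 7-bit ASCII is mapped */
    return map[c];
}
```

A character that maps to CLASS_NONE belongs to no class, so no token can contain it.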
Regular Expression Definitions

Regular expression definitions, introduced by the keyword DEFINITION, specify the tokens. They are built from the character classes using the following operations:
positive closure ("+")
Kleene closure ("*")
concatenation (".")
union (",")
Precedence can be overridden with parentheses.
Token Number

Major token number
 Defines a token class.
Minor token number
 Specifies the member of that class.
 If not specified, it defaults to 0.
 For example, in Example 2, IntConst {2,1}, RealConst {2,2} and StrConst {2,3} are all members of class 2.
Token numbers
 Must be non-negative integers.
 The same token number may be used for different tokens.
Tokens that are to be deleted (comments, spaces, etc.)
 Are assigned a major token number of 0.
"NOT" operation

Used in the definitions of "StrConst" and "RunOnString" (Example 2).
Can only be used to complement a union of character classes.
The complement is taken relative to the classes specified in the class definitions.
The character class "EPSILON" is never included in a complement.
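A complement over character classes can be modelled as a bit set with one bit per declared class: NOT(...) keeps every declared class that is not listed. The sketch below uses the nine classes of Example 2, numbered in declaration order (an assumption; class_set, CLASS_BIT and class_not are hypothetical names). EPSILON is simply not one of the bits, so it can never appear in a complement, and a character outside every declared class is not matched by NOT(Quote, Linefeed) either.

```c
#include <assert.h>

/* Hypothetical numbering of the classes declared in Example 2. */
enum class_id {
    C_E, C_OTHERLETTER, C_DIGIT, C_BLANK, C_DOT,
    C_PLUS, C_MINUS, C_QUOTE, C_LINEFEED,
    NUM_CLASSES                       /* number of declared classes (9) */
};

typedef unsigned int class_set;       /* bit i set <=> class i is in the set */

#define CLASS_BIT(c) (1u << (c))

/* Every declared class. EPSILON is not a declared class, so it has no bit. */
static const class_set ALL_CLASSES = (1u << NUM_CLASSES) - 1u;

/* NOT(...): complement relative to the declared classes only. */
static class_set class_not(class_set s)
{
    assert((s & ~ALL_CLASSES) == 0);  /* only declared classes may be listed */
    return ALL_CLASSES & ~s;
}
```

For example, class_not(CLASS_BIT(C_QUOTE) | CLASS_BIT(C_LINEFEED)) is the set of the other seven classes.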
"TOSS" feature

Tells the scanner whether or not to append a character to the token string it is building.
If a character is not to be appended, put a "{TOSS}" after the name of its character class in the token definition.
A "{TOSS}" may only appear after the name of a character class or after "NOT(...)".
Careless use of the TOSS feature can lead to a toss/save conflict.
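For example, a toss/save conflict would occur if "StrConst" were defined by:

Quote{TOSS} . (NOT(Quote, Linefeed),
Quote.Quote{TOSS})*. Quote{TOSS}

This conflict can be seen by comparing the scanner's actions on the strings 'a' and 'a''b'.
After reading 'a, the next quote must be tossed if it closes the string ('a') but saved if it is the first of a doubled quote ('a''b'), and the scanner cannot tell which until it has seen the following character.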
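The StrConst definition of Example 2 avoids the conflict by tossing the first quote of a doubled pair instead: every quote after the opening one is tossed when it is read, and only the following character decides whether the string ends or a single quote is saved. The hand-written function below mimics that behaviour. The name scan_strconst is hypothetical, and for simplicity it accepts any character other than a quote or linefeed inside the string, whereas NOT(Quote, Linefeed) only matches characters in the declared classes.

```c
#include <assert.h>
#include <stddef.h>

/* Scan a StrConst at the start of `in`, following Example 2's definition.
   The token text (tossed characters omitted, truncated if `out` is too
   small) is written to `out`. Returns the number of input characters
   consumed, or -1 if `in` does not start with a quote or the string runs
   into a linefeed or end of input (a RunOnString in Example 2). */
static int scan_strconst(const char *in, char *out, size_t outsz)
{
    size_t n = 0;                     /* characters written to out */
    int i = 0;

    assert(outsz > 0);
    out[0] = '\0';
    if (in[i] != '\'')
        return -1;                    /* not a string constant */
    i++;                              /* opening Quote{TOSS} */

    for (;;) {
        char c = in[i];
        if (c == '\'') {
            /* Quote{TOSS}: this quote is tossed whether it closes the
               string or starts a doubled quote, so the toss decision does
               not depend on the next character. */
            if (in[i + 1] == '\'') {
                if (n + 1 < outsz)
                    out[n++] = '\'';  /* second quote of the pair is saved */
                i += 2;
                continue;
            }
            out[n] = '\0';
            return i + 1;             /* closing quote consumed and tossed */
        }
        if (c == '\n' || c == '\0') {
            out[n] = '\0';
            return -1;                /* run-on string */
        }
        if (n + 1 < outsz)
            out[n++] = c;             /* NOT(Quote, Linefeed): saved */
        i++;
    }
}
```

On 'a''b' the function consumes six characters and produces the token text a'b; on 'a' it consumes three and produces a.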
Example 1

OPTIONS
tables,list
CLASS
letter = 'A'..'Z', 'a'..'z';
digit = '0'..'9';
blank = ' ';
DEFINITION
TOKEN emptyspace {0} = blank+;
TOKEN identifier {1} = letter.(letter, digit)*;
TOKEN number {2} = digit+;
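For these three definitions, a deterministic automaton with a start state and one state per token can be built by hand. The sketch below encodes it as a next_state table over character classes and runs it over a complete lexeme. The state names, class numbers, final_major and classify_lexeme are hypothetical, and ScanGen's generated minimal DFA may number its states differently. The sketch only classifies a whole string; splitting a longer input into tokens is the driver's job.

```c
#include <assert.h>

/* Hypothetical class and state numbers for Example 1. */
enum { LETTER, DIGIT, BLANK, NUMCLASSES };
enum { START, IN_BLANKS, IN_IDENT, IN_NUMBER, NUMSTATES };
#define NO_STATE (-1)                 /* no transition: the lexeme is rejected */

/* Map a character to its Example 1 class, or -1 if it is in no class. */
static int class_of(char ch)
{
    if ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z')) return LETTER;
    if (ch >= '0' && ch <= '9') return DIGIT;
    if (ch == ' ') return BLANK;
    return -1;
}

/* next_state[state][class], built by hand from the three TOKEN definitions. */
static const int next_state[NUMSTATES][NUMCLASSES] = {
    /*               LETTER    DIGIT      BLANK     */
    /* START     */ { IN_IDENT, IN_NUMBER, IN_BLANKS },
    /* IN_BLANKS */ { NO_STATE, NO_STATE,  IN_BLANKS },  /* blank+                 */
    /* IN_IDENT  */ { IN_IDENT, IN_IDENT,  NO_STATE  },  /* letter.(letter,digit)* */
    /* IN_NUMBER */ { NO_STATE, IN_NUMBER, NO_STATE  },  /* digit+                 */
};

/* Major token number accepted in each state; -1 marks a non-final state. */
static const int final_major[NUMSTATES] = { -1, 0, 1, 2 };

/* Run the DFA over a whole lexeme; return its major token number or -1. */
static int classify_lexeme(const char *s)
{
    int state = START;
    for (; *s != '\0'; s++) {
        int cls = class_of(*s);
        if (cls < 0)
            return -1;                /* character outside every class */
        state = next_state[state][cls];
        if (state == NO_STATE)
            return -1;                /* no transition on this class */
        assert(state >= 0 && state < NUMSTATES);
    }
    return final_major[state];
}
```

For example, classify_lexeme("x25") yields 1 (identifier) and classify_lexeme("2x") yields -1.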
Example 2

OPTIONS
List, tables
CLASS
E = 'E', 'e';
OtherLetter = 'A'..'D','F'..'Z','a'..'d','f'..'z';
Digit = '0'..'9';
Blank = ' ';
Dot = '.';
Plus = '+';
Minus = '-';
Quote = '''';
Linefeed = 10;
DEFINITION
TOKEN EmptySpace {0} = (Blank, Linefeed)+;
Letter = E, OtherLetter;
TOKEN Identifier {1} = Letter.(Letter,Digit)*
EXCEPT
'BEGIN' {4},
'END' {5};
TOKEN IntConst {2,1} = Digit+;
TOKEN RealConst {2,2} = IntConst.Dot.IntConst.
(EPSILON, E.(EPSILON, Plus, Minus).IntConst);
TOKEN StrConst {2,3} = Quote{TOSS}.
(NOT(Quote, Linefeed),Quote{TOSS}.Quote)*.
Quote{TOSS};
TOKEN RunOnString {3} = Quote{TOSS}.
(NOT(Quote, Linefeed), Quote{TOSS}.Quote)*.
Linefeed{TOSS};
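The EXCEPT clause can be read as: once a lexeme has matched Identifier, check it against the listed reserved words and substitute their token numbers. A minimal sketch of that lookup follows, assuming exact (case-sensitive) string comparison, which the slides do not specify; token_code and identifier_code are hypothetical names. The Reserved Word sections of the output tables presumably carry the same information.

```c
#include <assert.h>
#include <string.h>

typedef struct { int major; int minor; } token_code;

/* Reserved words from the EXCEPT clause of Identifier in Example 2. */
static const struct { const char *text; token_code code; } reserved[] = {
    { "BEGIN", { 4, 0 } },            /* 'BEGIN' {4}: minor defaults to 0 */
    { "END",   { 5, 0 } },            /* 'END' {5} */
};

/* Token code for a lexeme that matched Identifier: a reserved word's code
   if the lexeme equals an EXCEPT entry, otherwise Identifier {1}. */
static token_code identifier_code(const char *lexeme)
{
    assert(lexeme != NULL && lexeme[0] != '\0');
    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++)
        if (strcmp(lexeme, reserved[i].text) == 0)
            return reserved[i].code;
    token_code id = { 1, 0 };         /* Identifier {1} */
    return id;
}
```

Only whole-lexeme matches count, so ENDING is still an ordinary identifier.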
ScanGen Driver

The driver routine provides the actual scanner routine, which is called by the parser:
void scanner(codes *major, codes *minor, char *token_text)

It reads the input character stream, drives the finite automaton using the tables generated by ScanGen, and returns the token found.
ScanGen Tables

The finite automaton table has the form
next_state[NUMSTATES][NUMCHARS]

In addition, an action table tells the driver
when a complete token is recognized and what
to do with the “lookahead” character:
action[NUMSTATES][NUMCHARS]
Action Table

The action table has six possible values:
ERROR: scan error.
MOVEAPPEND: current_token += ch, and continue.
MOVENOAPPEND: discard ch, and continue.
HALTAPPEND: current_token += ch; token found, return it.
HALTNOAPPEND: discard ch; token found, return it.
HALTREUSE: save ch for later reuse; token found, return it.
The driver program is listed on textbook pages 65-66.
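The driver loop implied by the action table can be sketched as follows. It hand-codes next_state and action tables for Example 1, indexed by character class rather than by raw character (an assumption), and restarts whenever it recognizes a token with major number 0, so emptyspace never reaches the parser. The codes typedef, the string-based input pointer, EOF_MAJOR, SCAN_ERROR, the per-state final_major table and the policy of skipping an unclassifiable character are all hypothetical; this is not the textbook's driver.

```c
#include <assert.h>
#include <stddef.h>

typedef int codes;                /* assumption: the slides do not define codes */

enum scan_action { ERROR, MOVEAPPEND, MOVENOAPPEND,
                   HALTAPPEND, HALTNOAPPEND, HALTREUSE };

/* Example 1's classes, plus OTHER for unclassified chars and end of input. */
enum { LETTER, DIGIT, BLANK, OTHER, NUMCHARS };
enum { START, IN_BLANKS, IN_IDENT, IN_NUMBER, NUMSTATES };

#define EOF_MAJOR  (-1)           /* hypothetical end-of-input code */
#define SCAN_ERROR (-2)           /* hypothetical: a bad character was skipped */
#define MAXTOKEN   64

/* Hand-built tables for Example 1; next_state is unused on HALT/ERROR. */
static const int next_state[NUMSTATES][NUMCHARS] = {
    /*               LETTER    DIGIT      BLANK      OTHER */
    /* START     */ { IN_IDENT, IN_NUMBER, IN_BLANKS, START },
    /* IN_BLANKS */ { START,    START,     IN_BLANKS, START },
    /* IN_IDENT  */ { IN_IDENT, IN_IDENT,  START,     START },
    /* IN_NUMBER */ { START,    IN_NUMBER, START,     START },
};
static const enum scan_action action[NUMSTATES][NUMCHARS] = {
    /* START     */ { MOVEAPPEND, MOVEAPPEND, MOVEAPPEND, ERROR     },
    /* IN_BLANKS */ { HALTREUSE,  HALTREUSE,  MOVEAPPEND, HALTREUSE },
    /* IN_IDENT  */ { MOVEAPPEND, MOVEAPPEND, HALTREUSE,  HALTREUSE },
    /* IN_NUMBER */ { HALTREUSE,  MOVEAPPEND, HALTREUSE,  HALTREUSE },
};
/* Major code of the token ending in each state (START never halts). */
static const codes final_major[NUMSTATES] = { -1, 0, 1, 2 };

static const char *input;         /* hypothetical: source text held in a string */

static int class_of(char ch)
{
    if ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z')) return LETTER;
    if (ch >= '0' && ch <= '9') return DIGIT;
    if (ch == ' ') return BLANK;
    return OTHER;                 /* includes '\0' (end of input) */
}

/* Scan one token; return its major code, or SCAN_ERROR. */
static codes scan_token(char *token_text)
{
    size_t len = 0;
    int state = START;

    for (;;) {
        char ch = *input;
        int cls = class_of(ch);
        enum scan_action act = action[state][cls];

        if (act == ERROR) {       /* only at START on a non-'\0' char here */
            input++;              /* assumption: skip the offending character */
            token_text[0] = '\0';
            return SCAN_ERROR;
        }
        if ((act == MOVEAPPEND || act == HALTAPPEND) && len + 1 < MAXTOKEN)
            token_text[len++] = ch;                 /* current_token += ch */
        if (act != HALTREUSE)
            input++;              /* HALTREUSE leaves ch for the next token */
        if (act == MOVEAPPEND || act == MOVENOAPPEND) {
            state = next_state[state][cls];
            continue;
        }
        token_text[len] = '\0';   /* a HALT action: the token is complete */
        return final_major[state];/* assumption: code chosen by halting state */
    }
}

void scanner(codes *major, codes *minor, char *token_text)
{
    assert(input != NULL);
    for (;;) {
        if (*input == '\0') {     /* end of input */
            token_text[0] = '\0';
            *major = EOF_MAJOR;
            *minor = 0;
            return;
        }
        codes m = scan_token(token_text);
        if (m == 0 || m == SCAN_ERROR)
            continue;             /* deleted token (major 0) or skipped char */
        *major = m;
        *minor = 0;               /* Example 1 declares no minor numbers */
        return;
    }
}
```

With input set to "count 42  x9", successive calls return count (1), 42 (2), x9 (1) and then EOF_MAJOR; the blanks are recognized as emptyspace and silently dropped.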
Output tables

The output file consists of the following five sections:
Section 1: Parameters for the Scanner
Section 2: Character Class Mapping
Section 3: Reserved Word to Token Mapping
Section 4: Reserved Word List
Section 5: Transition Table of the Minimal Deterministic Finite Automaton
Execute ScanGen
1. Download SCANGEN.zip and expand it into the Cygwin directory /usr/src/scangen.
2. Run:
./scangen.exe < adacs.scan
3. Type Tables when prompted for the output file. A sample run:
s c a n g e n -- automatic lexical analyzer generator version 2.0 (12/82)
options used for this run: tables, optimize
construction of finite automaton completed
Output file `Tables': Tables
Compile and Test Scanner

1. Download scanner.example.rar and expand it into the Cygwin directory /usr/src/scanner.example.
2. Copy the Tables file from /usr/src/scangen into /usr/src/scanner.example.
3. Compile with the makefile:
make
4. Run a.exe:
./a
5. Type the source file name, for example:
test