Compiler Construction - Northwestern University

Compiler Construction
Vana Doufexi
[email protected]
office #317 @ CS dept
1
Administrative info
• class webpage
– http://www.cs.northwestern.edu/academics/courses/322
– contains:
• news
• staff information
• lecture notes & other handouts
• homeworks & manuals
• policies, grades
• newsgroup portal
• useful links
2
What is a compiler
• A program that reads a program written in
some language and translates it into a
program written in some other language
– Modula-2 to C
– Java to bytecodes
– COOL to MIPS code
3
Why study compilers?
• Application of a wide range of theoretical
techniques
• Good SW engineering experience
• Better understand languages
4
Features of compilers
• Correctness
– preserve the meaning of the code
• Speed of target code
– vs. speed of compilation?
• Good use of resources (size, power)
• Good error reporting/handling
5
Compiler structure
source code → Front End → IR → Back End → target code
• Use intermediate representation
– Why?
6
Compiler Structure
• Front end
– Recognize legal/illegal programs
• report/handle errors
– Generate IR
– The process can be automated
• Back end
– Translate IR into target code
• instruction selection
• register allocation
• instruction scheduling
• many of these problems are NP-complete, so use approximations
7
Compiler Structure
• Optimization: Middle stage
– goals
• improve running time of generated code
• improve space, power consumption, etc.
– how?
• perform a number of transformations on the IR
• multiple passes
– important: preserve meaning of code
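As one illustration of a meaning-preserving transformation on the IR, the sketch below performs constant folding on a toy expression tree. The tuple-based IR and the names `evaluate` and `fold` are assumptions made for this example; they are not taken from the course material.

```python
# Hypothetical IR: an int constant, a variable name (str),
# or a tuple (op, left, right) with op in {"+", "*"}.

def evaluate(node, env):
    """Reference semantics: compute the value of an expression tree."""
    if isinstance(node, int):
        return node
    if isinstance(node, str):
        return env[node]
    op, left, right = node
    a, b = evaluate(left, env), evaluate(right, env)
    return a + b if op == "+" else a * b

def fold(node):
    """One optimization pass: replace constant subexpressions by their value."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return left + right if op == "+" else left * right
    return (op, left, right)

tree = ("+", "x", ("*", 2, 3))        # x + 2 * 3
optimized = fold(tree)                # ("+", "x", 6)
```

The optimized tree does less work at run time, and `evaluate` gives the same result for both trees under any value of `x`, which is the sense in which the pass preserves meaning.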
8
The Front End
• Scanning (a.k.a. lexical analysis)
– recognize "words"
• Parsing (a.k.a. syntax analysis)
– check syntax
• Semantic analysis
– examine meaning (e.g. type checking)
• Other issues:
– symbol table (to keep track of identifiers)
– error detection/reporting/recovery
9
The Scanner
• Its job:
– given a character stream, recognize words
(tokens)
• e.g. x = 1 becomes IDENTIFIER EQUAL INTEGER
– collect identifier information
• e.g. IDENTIFIER corresponds to a lexeme (the actual
word x) and its type (acquired from the declaration of
x).
– ignore white space and comments
– report errors
• Good news
– the process can be automated
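The scanner's job can be sketched by hand with Python's `re` module. This is a minimal illustration, not a generated scanner: the token names follow the example above, while `TOKEN_SPEC`, `MASTER`, and `scan` are hypothetical names, and comment handling is left out.

```python
import re

# Hypothetical token specification: (token name, regular expression).
TOKEN_SPEC = [
    ("INTEGER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("EQUAL", r"="),
    ("SKIP", r"\s+"),          # white space is recognized, then ignored
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(text):
    """Turn a character stream into a list of (token, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            # Error reporting: no pattern matches at this position.
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(scan("x = 1"))
# [('IDENTIFIER', 'x'), ('EQUAL', '='), ('INTEGER', '1')]
```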
10
The Parser
• Its job:
– Check and verify syntax based on specified
syntax rules
• e.g. IDENTIFIER LPAREN RPAREN make up an
EXPRESSION.
• Coming soon: how context-free grammars specify
syntax
– Report errors
– Build IR
• often a syntax tree
• Good news
– the process can be automated
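A hand-written recursive-descent parser gives a feel for what a generated parser does. The grammar below is an assumption made for this sketch (statement → IDENTIFIER EQUAL expression; expression → INTEGER | IDENTIFIER | IDENTIFIER LPAREN RPAREN), as are the function name `parse_statement` and the tuple-shaped syntax tree. Tokens are assumed to arrive as (kind, lexeme) pairs.

```python
def parse_statement(tokens):
    """Check the token list against the toy grammar and build a syntax tree."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            found = peek() or "end of input"
            raise SyntaxError(f"expected {kind}, found {found}")
        lexeme = tokens[pos][1]
        pos += 1
        return lexeme

    def expression():
        if peek() == "INTEGER":
            return ("int", int(expect("INTEGER")))
        name = expect("IDENTIFIER")
        if peek() == "LPAREN":          # IDENTIFIER LPAREN RPAREN
            expect("LPAREN")
            expect("RPAREN")
            return ("call", name)
        return ("var", name)

    target = expect("IDENTIFIER")
    expect("EQUAL")
    value = expression()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {peek()} after statement")
    return ("assign", target, value)

tokens = [("IDENTIFIER", "x"), ("EQUAL", "="),
          ("IDENTIFIER", "f"), ("LPAREN", "("), ("RPAREN", ")")]
tree = parse_statement(tokens)          # ("assign", "x", ("call", "f"))
```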
11
Semantic analysis
• Its job:
– Check the meaning of the program
• e.g. In x=y, is y defined before being used? Are x and
y declared?
• e.g. In x=y, are the types of x and y such that you can
assign one to the other?
– Meaning may depend on context
– Report errors
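The two checks for x=y can be sketched with a symbol table that maps each declared name to its type. The type names and the assignability rule (an int may be stored in a float, but not the reverse) are assumptions chosen for illustration; a real language defines its own rules.

```python
# Hypothetical assignability rule: (target type, source type) pairs that are allowed.
ASSIGNABLE = {("int", "int"), ("float", "float"), ("float", "int")}

def check_assignment(symbols, target, source):
    """Return a list of semantic errors for the statement target = source."""
    errors = []
    for name in (target, source):
        if name not in symbols:
            errors.append(f"{name} is not declared")
    if errors:
        return errors               # types are unknown, so stop here
    if (symbols[target], symbols[source]) not in ASSIGNABLE:
        errors.append(f"cannot assign {symbols[source]} to {symbols[target]}")
    return errors

symbols = {"x": "float", "y": "int"}       # built from the declarations
print(check_assignment(symbols, "x", "y"))  # []
print(check_assignment(symbols, "y", "x"))  # ['cannot assign float to int']
print(check_assignment(symbols, "x", "z"))  # ['z is not declared']
```

The same statement x=y is accepted or rejected depending on the declarations in scope, which is what "meaning may depend on context" refers to.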
12
IRs
• Graphical
– e.g. parse tree, DAG
• Linear
– e.g. three-address code
• Hybrid
– e.g. linear for blocks of straight-line code, a
graph to connect blocks
• Low-level or high-level
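To make the graphical/linear distinction concrete, the sketch below lowers a small expression tree into three-address code. The tuple tree format, the temporary names t1, t2, ... and the function `to_three_address` are assumptions for this example.

```python
def to_three_address(tree):
    """Flatten a tree of (op, left, right) tuples into three-address instructions."""
    code = []
    counter = 0

    def emit(node):
        nonlocal counter
        if not isinstance(node, tuple):
            return str(node)            # a variable name or a constant
        op, left, right = node
        l = emit(left)
        r = emit(right)
        counter += 1
        temp = f"t{counter}"            # fresh temporary for this result
        code.append(f"{temp} = {l} {op} {r}")
        return temp

    result = emit(tree)
    return code, result

code, result = to_three_address(("+", "a", ("*", "b", "c")))
print("\n".join(code))
# t1 = b * c
# t2 = a + t1
```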
13
The scanning process
• Main goal: recognize words
• How? by recognizing patterns
– e.g. an identifier is a sequence of letters or
digits that starts with a letter.
• Lexical patterns form a regular language
• Regular languages are described using
regular expressions (REs)
• Can we create an automatic RE recognizer?
– Yes! (Hold that thought)
14
The scanning process
• Definition: Regular expressions (over alphabet Σ)
– ε is an RE denoting {ε}
– If a ∈ Σ, then a is an RE denoting {a}
– If r and s are REs, then
• (r) is an RE denoting L(r)
• r|s is an RE denoting L(r) ∪ L(s)
• rs is an RE denoting L(r)L(s)
• r* is an RE denoting the Kleene closure of L(r)
• Property: regular languages are closed under many operations (e.g. union, concatenation, Kleene closure)
– This allows us to build complex REs from simpler ones.
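The inductive definition above translates almost directly into a recursive membership test. This is a naive sketch for intuition only, not how scanners are implemented; the tuple encoding of REs and the names `ends` and `matches` are assumptions.

```python
# Hypothetical encoding: ("eps",), ("sym", a), ("alt", r, s), ("cat", r, s), ("star", r)

def ends(r, w, i):
    """Set of positions j such that w[i:j] is in L(r)."""
    kind = r[0]
    if kind == "eps":
        return {i}
    if kind == "sym":
        return {i + 1} if i < len(w) and w[i] == r[1] else set()
    if kind == "alt":                   # union of the two languages
        return ends(r[1], w, i) | ends(r[2], w, i)
    if kind == "cat":                   # a string from L(r) followed by one from L(s)
        return {k for j in ends(r[1], w, i) for k in ends(r[2], w, j)}
    if kind == "star":                  # zero or more repetitions
        reached = {i}
        frontier = {i}
        while frontier:
            new = {k for j in frontier for k in ends(r[1], w, j)} - reached
            reached |= new
            frontier = new
        return reached
    raise ValueError(f"unknown RE node {kind!r}")

def matches(r, w):
    return len(w) in ends(r, w, 0)

# letter (letter | digit)* over the tiny alphabet {a, 1}
letter, digit = ("sym", "a"), ("sym", "1")
identifier = ("cat", letter, ("star", ("alt", letter, digit)))
```

With this encoding, `matches(identifier, "a1a")` is true, while `matches(identifier, "1a")` and `matches(identifier, "")` are false.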
15
The scanning process
• Definition: Deterministic Finite Automaton
– a five-tuple (Σ, S, δ, s0, F) where
• Σ is the alphabet
• S is the set of states
• δ is the transition function (S × Σ → S)
• s0 is the starting state
• F is the set of final states (F ⊆ S)
• Notation:
– Use a transition diagram to describe a DFA
• DFAs are equivalent to REs
– Hey! We just came up with a recognizer!
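A DFA can be run with a simple table lookup. The sketch below encodes a two-state DFA for identifiers (a letter followed by letters or digits). The state names, the grouping of characters into classes, and the convention that a missing table entry means "reject" (standing in for an explicit error state) are all assumptions of this example.

```python
def char_class(c):
    """Map a character to the input class used by the transition table."""
    if c.isascii() and c.isalpha():
        return "letter"
    if c.isascii() and c.isdigit():
        return "digit"
    return "other"

START = "start"                          # s0
FINAL = {"ident"}                        # F
DELTA = {                                # transition function as a table
    ("start", "letter"): "ident",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
}

def accepts(word):
    """Run the DFA on word; accept if it ends in a final state."""
    state = START
    for c in word:
        state = DELTA.get((state, char_class(c)))
        if state is None:                # no transition: implicit error state
            return False
    return state in FINAL

print(accepts("x1"), accepts("1x"), accepts(""))
# True False False
```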
16
The scanning process
• Goal: automate the process
• Idea:
– Start with an RE
– Build a DFA
• How?
– We can build a non-deterministic finite automaton
(Thompson's construction)
– Convert that to a deterministic one
(Subset construction)
– Minimize the DFA
(Hopcroft's algorithm)
– Implement it
• Existing scanner generator: flex
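The middle step of this pipeline, subset construction, can be sketched as follows. The NFA here is written by hand for the RE a*b rather than produced by Thompson's construction, minimization is omitted, and the dictionary representation (with `None` standing for an ε-move) and all function names are assumptions.

```python
from collections import deque

def epsilon_closure(states, delta):
    """All NFA states reachable from `states` using only ε-moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, None), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(start, finals, delta, alphabet):
    """Build a DFA whose states are sets of NFA states."""
    d_start = epsilon_closure({start}, delta)
    d_delta = {}
    seen = {d_start}
    work = deque([d_start])
    while work:
        current = work.popleft()
        for a in alphabet:
            moved = {t for s in current for t in delta.get((s, a), ())}
            if not moved:
                continue                 # no transition on a: implicit error state
            target = epsilon_closure(moved, delta)
            d_delta[(current, a)] = target
            if target not in seen:
                seen.add(target)
                work.append(target)
    d_finals = {S for S in seen if S & finals}
    return d_start, d_finals, d_delta

def dfa_accepts(dfa, word):
    state, finals, delta = dfa
    for c in word:
        state = delta.get((state, c))
        if state is None:
            return False
    return state in finals

# Hand-built NFA for a*b: 0 -ε-> 1, 0 -ε-> 2, 1 -a-> 0, 2 -b-> 3 (final)
nfa_delta = {(0, None): {1, 2}, (1, "a"): {0}, (2, "b"): {3}}
dfa = subset_construction(0, {3}, nfa_delta, "ab")
```

For this NFA the resulting DFA has two states, {0, 1, 2} and {3}, and `dfa_accepts(dfa, "aab")` is true while `dfa_accepts(dfa, "ba")` is false.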
17