Course Overview
Mooly Sagiv
[email protected]
Schreiber 317
03-640-7606
Wed 10:00-12:00
http://www.math.tau.ac.il/~msagiv/courses/wcc.html
Textbook: Modern Compiler Implementation in C
Andrew Appel
Outline
• High level programming languages
• Interpreter vs. Compiler
• Abstract Machines
• Why study compilers?
• Main Compiler Phases
High Level Programming Languages
• Imperative
– Algol, PL1, Fortran, Pascal, Ada, Modula, and C
– Closely related to "von Neumann" computers
• Object-oriented
– Simula, Smalltalk, Modula-3, C++, Java
– Data abstraction and an "evolutionary" form of program development
• Class: an implementation of an abstract data type (data + code)
• Objects: instances of a class
• Fields: data (structure fields)
• Methods: code (procedures with overloading)
• Inheritance: refining the functionality of a class with different fields and methods
• Functional
• Logic Programming
Other Languages
• Hardware description languages
– VHDL
– The program describes Hardware components
– The compiler generates hardware layouts
• Shell languages: Shell, C-shell, REXX
– Include primitive constructs from the current software environment
• Graphics and text processing
– TeX, LaTeX, PostScript
– The compiler generates page layouts
• Web/Internet
– HTML, MAWL, Telescript, Java
• Intermediate languages
– P-Code, Java bytecode, IDL
Interpreter
• Input
– A program
– An input for the program
• Output
– The required output
source program + program's input → interpreter → program's output
Example
scanf("%d", &x);
x = x + 1;
printf("%d", x);
5 → C interpreter → 6
Compiler
• Input
– A program
• Output
– An object program that reads the input and writes the output
source program → compiler → object program
program's input → object program → program's output
Example
scanf("%d", &x);
x = x + 1;
printf("%d", x);
Sparc-cc-compiler
add %fp,-8, %l1
mov %l1, %o1
call scanf
ld [%fp-8],%l0
add %l0,1,%l0
st %l0,[%fp-8]
ld [%fp-8], %l1
mov %l1, %o1
call printf
assembler/linker
5 → object-program → 6
Interpreter vs. Compiler
• Conceptually simpler (the definition of the programming language)
• Easier to port
• Can provide more specific error reports
• Normally faster
• More efficient
– Compilation is done once for all the inputs, so many computations can be performed at compile time
– Sometimes even compile-time + execution-time < interpretation-time
• Can report errors before input is given
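Interpreters provide specific error reports
• Input program
scanf("%d", &y);
if (y < 0)
x = 5;
...
if (y <= 0)
z = x + 1;
• Input data: y = 0
• With this input x is never assigned before z = x + 1 runs, so the interpreter can report the use of an unset x at that exact statement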
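The run-time check behind such a report can be sketched in C as follows. This is only an illustration under assumptions: the `Var` record, `assign`, `read_var`, `run_example`, and the line numbers passed to `read_var` are hypothetical names and values introduced here, and the program above is interpreted by hand rather than parsed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical variable cell: the interpreter tracks whether it was ever set. */
typedef struct { const char *name; int value; int is_set; } Var;

void assign(Var *v, int value) { v->value = value; v->is_set = 1; }

/* Returns 1 and stores the value in *out, or returns 0 after reporting
   the exact variable and line (line numbers here are illustrative). */
int read_var(const Var *v, int line, int *out) {
    assert(v != NULL && out != NULL);
    if (!v->is_set) {
        fprintf(stderr, "line %d: %s is used before it is set\n", line, v->name);
        return 0;
    }
    *out = v->value;
    return 1;
}

/* Hand-written interpretation of the example program for one input y.
   Returns 1 if execution completes, 0 if a run-time error was reported. */
int run_example(int input_y) {
    Var x = {"x", 0, 0}, y = {"y", 0, 0}, z = {"z", 0, 0};
    int yv, xv;
    assign(&y, input_y);                      /* scanf("%d", &y);         */
    if (!read_var(&y, 2, &yv)) return 0;
    if (yv < 0) assign(&x, 5);                /* if (y < 0) x = 5;        */
    if (!read_var(&y, 4, &yv)) return 0;
    if (yv <= 0) {                            /* if (y <= 0) z = x + 1;   */
        if (!read_var(&x, 5, &xv)) return 0;  /* fails only when x is unset */
        assign(&z, xv + 1);
    }
    return 1;
}
```

Calling `run_example(0)` reports the unset `x`, while `run_example(-1)` and `run_example(3)` complete: the error depends on the input, which is why only an interpreter (or a run-time check) can report it this precisely.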
Compilers are usually more efficient
scanf("%d", &x);
y = 5;
z = 7;
x = x + y * z;
printf("%d", x);
Sparc-cc-compiler
add %fp,-8, %l1
mov %l1, %o1
call scanf
mov 5, %l0
st %l0,[%fp-12]
mov 7,%l0
st %l0,[%fp-16]
ld [%fp-8], %l0
add %l0, 35 ,%l0
st %l0,[%fp-8]
ld [%fp-8], %l1
mov %l1, %o1
call printf
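The replacement of y*z by the constant 35 in the listing above is an instance of constant folding. A minimal C sketch of the idea follows; it is not the method of the compiler shown on the slide. The `Expr` type and the functions `constant`, `variable`, `binary`, and `fold` are hypothetical names, only integer `+` and `*` are handled, overflow is ignored, and the sketch assumes that y and z have already been replaced by their known values 5 and 7.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical expression tree: a constant, a variable, or a binary operation. */
typedef enum { E_CONST, E_VAR, E_ADD, E_MUL } Kind;

typedef struct Expr {
    Kind kind;
    int value;                 /* used when kind == E_CONST */
    const char *name;          /* used when kind == E_VAR   */
    struct Expr *left, *right; /* used for E_ADD and E_MUL  */
} Expr;

static Expr *mk(Kind kind, int value, const char *name, Expr *l, Expr *r) {
    Expr *e = malloc(sizeof *e);
    assert(e != NULL);
    e->kind = kind; e->value = value; e->name = name;
    e->left = l; e->right = r;
    return e;
}

Expr *constant(int v)                  { return mk(E_CONST, v, NULL, NULL, NULL); }
Expr *variable(const char *n)          { return mk(E_VAR, 0, n, NULL, NULL); }
Expr *binary(Kind k, Expr *l, Expr *r) { return mk(k, 0, NULL, l, r); }

/* Fold bottom-up: an E_ADD or E_MUL node whose operands are both constants
   is rewritten in place into a single E_CONST node. */
Expr *fold(Expr *e) {
    if (e->kind != E_ADD && e->kind != E_MUL) return e;
    e->left = fold(e->left);
    e->right = fold(e->right);
    if (e->left->kind == E_CONST && e->right->kind == E_CONST) {
        int l = e->left->value, r = e->right->value;
        int v = (e->kind == E_ADD) ? l + r : l * r;
        free(e->left);
        free(e->right);
        e->kind = E_CONST;
        e->value = v;
        e->left = e->right = NULL;
    }
    return e;
}
```

For example, `fold(binary(E_ADD, variable("x"), binary(E_MUL, constant(5), constant(7))))` leaves an addition of `x` and the constant 35, so the multiplication never has to be executed at run time.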
Compilers provide errors before actual input is given
• Input program
int a[100], x, y;
scanf("%d", &y);
if (y < 0)
/* line 4 */
y = a;
• Compiler output
"line 4: improper pointer/integer combination: op ="
Compilers provide errors before actual input is given
• Input program
scanf("%d", &y);
if (y < 0)
x = 5;
...
if (y <= 0)
/* line 88 */ z = x + 1;
• Compiler output
"line 88: x may be used before set"
Abstract Machines
• A compromise between compilers and interpreters
• An intermediate program representation
• The intermediate representation is interpreted
• Example: Zurich P4 Pascal Compiler (1981)
Pascal program → Pascal compiler → P-code
P-code + program's input → interpreter → program's output
• Other examples, Algol object code, Java bytecode
• The intermediate code can be compiled
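An interpreter for an intermediate representation can be very small. The C sketch below runs a stack-machine code that is only loosely in the spirit of P-code; the instruction set (`OP_PUSH`, `OP_LOAD`, `OP_STORE`, `OP_ADD`, `OP_MUL`, `OP_HALT`), the `Instr` type, and the `run` function are assumptions made for illustration and do not reproduce the real P4 instruction set.

```c
#include <assert.h>

/* Hypothetical stack-machine instructions. */
typedef enum { OP_PUSH, OP_LOAD, OP_STORE, OP_ADD, OP_MUL, OP_HALT } Opcode;

typedef struct { Opcode op; int arg; } Instr;

enum { STACK_SIZE = 64, NUM_VARS = 8 };

/* Interpret `code` until OP_HALT; `vars` holds the program's variables. */
void run(const Instr *code, int vars[NUM_VARS]) {
    int stack[STACK_SIZE];
    int sp = 0;                              /* index of the next free slot */
    for (int pc = 0; code[pc].op != OP_HALT; pc++) {
        Instr in = code[pc];
        switch (in.op) {
        case OP_PUSH:                        /* push a constant */
            assert(sp < STACK_SIZE);
            stack[sp++] = in.arg;
            break;
        case OP_LOAD:                        /* push the value of variable arg */
            assert(sp < STACK_SIZE && in.arg >= 0 && in.arg < NUM_VARS);
            stack[sp++] = vars[in.arg];
            break;
        case OP_STORE:                       /* pop into variable arg */
            assert(sp >= 1 && in.arg >= 0 && in.arg < NUM_VARS);
            vars[in.arg] = stack[--sp];
            break;
        case OP_ADD:                         /* replace top two values by their sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:                         /* replace top two values by their product */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:                        /* handled by the loop condition */
            break;
        }
    }
}
```

With variable 0 standing for x, the statement x = x + 1 becomes the sequence `OP_LOAD 0`, `OP_PUSH 1`, `OP_ADD`, `OP_STORE 0`, `OP_HALT`. A compiler front end emits such code once, and the same code then runs on any machine that has the interpreter.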
Why Study Compilers
• Become a compiler writer
– New programming languages
– New machines
– New compilation modes: "just-in-time", "run-time code generation", "binary translation"
• Use some of the techniques in other contexts
• Design a very big software program with reasonable effort
• Learn applications of many CS results (formal languages, decidability, graph algorithms, dynamic programming, ...)
• Better understanding of programming languages and machine architectures
• Become a better programmer
Course Requirements
• Theoretical assignments 10%
– 3 assignments
• Compiler Project 50%
– Develop a Tiger compiler in teams
• Final Exam 40%
Compiler Phases
• The compiler program is usually written as a sequence of well-defined phases
• The interfaces between the phases are well defined (another language)
• It is sometimes convenient to use auxiliary global information (e.g., a symbol table)
• Advantages of the phase separation:
– Modularity
– Simplicity
– Reusability
Basic Compiler Phases
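Source program (string)
→ lexical analysis
→ Tokens
→ syntax analysis
→ Abstract syntax tree
→ semantic analysis
→ Translate
→ Intermediate representation
→ Instruction selection
→ Assembly
→ Register allocation
→ Fin. Assembly
Related techniques shown beside the phases: finite automata (lexical analysis), pushdown automata (syntax analysis), memory organization, dynamic programming (instruction selection), graph algorithms (register allocation)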
Example: straight-line programming
Stm ::= Stm ; Stm          // (CompoundStm)
Stm ::= id := Exp          // (AssignStm)
Stm ::= print (ExpList)    // (PrintStm)
Exp ::= id                 // (IdExp)
Exp ::= num                // (NumExp)
Exp ::= Exp Binop Exp      // (OpExp)
Exp ::= (Stm, Exp)         // (EseqExp)
ExpList ::= Exp, ExpList   // (PairExpList)
ExpList ::= Exp            // (LastExpList)
Binop ::= +                // (Plus)
Binop ::= -                // (Minus)
Binop ::= *                // (Times)
Binop ::= /                // (Div)
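Each production name above can become a constructor of a tree node, and the tree can then be interpreted directly. The C sketch below is one possible encoding, not the textbook's code: the struct layouts, the `Env` table, `lookup`, `interpStm`, and `interpExp` are assumptions made here, and reading a variable that was never assigned simply yields 0.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { PLUS, MINUS, TIMES, DIV } Binop;
typedef enum { COMPOUND_STM, ASSIGN_STM, PRINT_STM } StmKind;
typedef enum { ID_EXP, NUM_EXP, OP_EXP, ESEQ_EXP } ExpKind;
typedef struct Stm Stm;
typedef struct Exp Exp;
typedef struct ExpList ExpList;

struct Stm {
    StmKind kind;
    Stm *first, *second;   /* CompoundStm: first ; second */
    const char *id;        /* AssignStm: id := exp        */
    Exp *exp;
    ExpList *args;         /* PrintStm: print(args)       */
};
struct Exp {
    ExpKind kind;
    const char *id;        /* IdExp                        */
    int num;               /* NumExp                       */
    Binop op;              /* OpExp: left op right         */
    Exp *left, *right;
    Stm *stm;              /* EseqExp: (stm, right)        */
};
struct ExpList { Exp *head; ExpList *tail; };  /* tail == NULL: LastExpList */

static void *new_node(size_t size) {
    void *p = calloc(1, size);
    assert(p != NULL);
    return p;
}
Stm *CompoundStm(Stm *a, Stm *b) { Stm *s = new_node(sizeof *s); s->kind = COMPOUND_STM; s->first = a; s->second = b; return s; }
Stm *AssignStm(const char *id, Exp *e) { Stm *s = new_node(sizeof *s); s->kind = ASSIGN_STM; s->id = id; s->exp = e; return s; }
Stm *PrintStm(ExpList *args) { Stm *s = new_node(sizeof *s); s->kind = PRINT_STM; s->args = args; return s; }
Exp *IdExp(const char *id) { Exp *e = new_node(sizeof *e); e->kind = ID_EXP; e->id = id; return e; }
Exp *NumExp(int n) { Exp *e = new_node(sizeof *e); e->kind = NUM_EXP; e->num = n; return e; }
Exp *OpExp(Exp *l, Binop op, Exp *r) { Exp *e = new_node(sizeof *e); e->kind = OP_EXP; e->left = l; e->op = op; e->right = r; return e; }
Exp *EseqExp(Stm *s, Exp *r) { Exp *e = new_node(sizeof *e); e->kind = ESEQ_EXP; e->stm = s; e->right = r; return e; }
ExpList *PairExpList(Exp *h, ExpList *t) { ExpList *l = new_node(sizeof *l); l->head = h; l->tail = t; return l; }
ExpList *LastExpList(Exp *e) { return PairExpList(e, NULL); }

/* Hypothetical variable table: a small fixed-size array searched by name. */
enum { MAX_VARS = 16 };
typedef struct { const char *names[MAX_VARS]; int values[MAX_VARS]; int count; } Env;

/* Returns the cell for id, creating it with value 0 if it does not exist yet. */
int *lookup(Env *env, const char *id) {
    for (int i = 0; i < env->count; i++)
        if (strcmp(env->names[i], id) == 0) return &env->values[i];
    assert(env->count < MAX_VARS);
    env->names[env->count] = id;
    env->values[env->count] = 0;
    return &env->values[env->count++];
}

void interpStm(Stm *s, Env *env);

int interpExp(Exp *e, Env *env) {
    switch (e->kind) {
    case ID_EXP:   return *lookup(env, e->id);
    case NUM_EXP:  return e->num;
    case ESEQ_EXP: interpStm(e->stm, env); return interpExp(e->right, env);
    case OP_EXP: {
        int l = interpExp(e->left, env), r = interpExp(e->right, env);
        switch (e->op) {
        case PLUS:  return l + r;
        case MINUS: return l - r;
        case TIMES: return l * r;
        case DIV:   assert(r != 0); return l / r;
        }
        break;
    }
    }
    return 0;  /* not reached for well-formed trees */
}

void interpStm(Stm *s, Env *env) {
    switch (s->kind) {
    case COMPOUND_STM: interpStm(s->first, env); interpStm(s->second, env); break;
    case ASSIGN_STM: { int v = interpExp(s->exp, env); *lookup(env, s->id) = v; break; }
    case PRINT_STM:  /* values separated by spaces, newline after the last one */
        for (ExpList *l = s->args; l != NULL; l = l->tail)
            printf("%d%c", interpExp(l->head, env), l->tail ? ' ' : '\n');
        break;
    }
}
```

The program a := 5 + 3; b := (print(a, a-1), 10 * a); print(b) is built by nesting the constructors, for instance `AssignStm("a", OpExp(NumExp(5), PLUS, NumExp(3)))` for its first statement. Passing the whole tree and a zero-initialized `Env` to `interpStm` leaves a equal to 8 and b equal to 80.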
Lexical Analysis
• Input string
a := 5 + 3 ;\nb := (print(a, a-1), 10 * a) ;\nprint(b)
• Tokens
id("a") assign num(5) + num(3) ;
id("b") assign ( print ( id("a") , id("a") - num(1) ) , num(10) * id("a") ) ;
print ( id("b") )
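A hand-written scanner for this token set can be sketched in C as below. It is a simplified illustration rather than a generated finite automaton: the `Token` type, the `TokenKind` names, and `next_token` are hypothetical, all punctuation shares one token kind, and identifiers are limited to 31 characters.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef enum { T_ID, T_NUM, T_ASSIGN, T_PRINT, T_PUNCT, T_EOF } TokenKind;

typedef struct {
    TokenKind kind;
    char text[32];   /* identifier name or the punctuation character */
    int value;       /* numeric value when kind == T_NUM              */
} Token;

/* Read one token starting at *src and advance *src past it. */
Token next_token(const char **src) {
    Token t = { T_EOF, "", 0 };
    const char *p = *src;
    while (isspace((unsigned char)*p)) p++;           /* skip blanks and newlines */
    if (*p == '\0') {
        /* t already describes the end of the input */
    } else if (isdigit((unsigned char)*p)) {
        t.kind = T_NUM;
        while (isdigit((unsigned char)*p))
            t.value = t.value * 10 + (*p++ - '0');
    } else if (isalpha((unsigned char)*p)) {
        size_t n = 0;
        while (isalnum((unsigned char)*p)) {
            assert(n + 1 < sizeof t.text);            /* identifier too long */
            t.text[n++] = *p++;
        }
        t.text[n] = '\0';
        t.kind = strcmp(t.text, "print") == 0 ? T_PRINT : T_ID;
    } else if (p[0] == ':' && p[1] == '=') {
        t.kind = T_ASSIGN;
        p += 2;
    } else {
        assert(strchr("+-*/(),;", *p) != NULL);       /* reject unknown characters */
        t.kind = T_PUNCT;
        t.text[0] = *p++;
        t.text[1] = '\0';
    }
    *src = p;
    return t;
}
```

Calling `next_token` repeatedly on the input string above yields id("a"), assign, num(5), and so on until `T_EOF`; the keyword print is recognized by comparing a scanned identifier against the reserved word.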
Syntax Analysis
• Tokens
id("a") assign num(5) + num(3) ;
id("b") assign ( print ( id("a") , id("a") - num(1) ) , num(10) * id("a") ) ;
print ( id("b") )
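• Abstract syntax tree
(Figure: the root is a CompoundStm. Its first child is the AssignStm for a, whose value is an OpExp over NumExp 5, Plus, NumExp 3. Its second child is a nested CompoundStm containing the AssignStm for b, whose value is an EseqExp made of a PrintStm and an OpExp.)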
Summary
• Phases drastically simplify the problem of writing a good compiler
• The textbook offers a reasonable partition into phases with interface definitions (in C)
• In the next meeting we will learn the details of the rest of the phases