Course Overview
Mooly Sagiv
Schreiber 317, 03-640-7606
Wed 10:00-12:00
html://www.math.tau.ac.il/~msagiv/courses/wcc.html
Textbook: Modern Compiler Implementation in C, Andrew Appel
Outline
• High level programming languages
• Interpreter vs. Compiler
• Abstract Machines
• Why study compilers?
• Main Compiler Phases

High Level Programming Languages
• Imperative
  – Algol, PL1, Fortran, Pascal, Ada, Modula, and C
  – Closely related to "von Neumann" computers
• Object-oriented
  – Simula, Smalltalk, Modula3, C++, Java
  – Data abstraction and 'evolutionary' form of program development
    • Class: an implementation of an abstract data type (data + code)
    • Objects: instances of a class
    • Fields: data (structure fields)
    • Methods: code (procedures with overloading)
    • Inheritance: refining the functionality of a class with different fields and methods
• Functional
• Logic Programming

Other Languages
• Hardware description languages
  – VHDL
  – The program describes hardware components
  – The compiler generates hardware layouts
• Shell languages: Shell, C-shell, REXX
  – Include primitive constructs from the current software environment
• Graphics and text processing: TeX, LaTeX, PostScript
  – The compiler generates page layouts
• Web/Internet
  – HTML, MAWL, Telescript, Java
• Intermediate languages
  – P-Code, Java bytecode, IDL

Interpreter
• Input
  – A program
  – An input for the program
• Output
  – The required output

    source-program, program's input → interpreter → program's output

Example
    scanf("%d", &x);
    x = x + 1;
    printf("%d", x);

    5 → C interpreter → 6

Compiler
• Input
  – A program
• Output
  – An object program that reads the input and writes the output

    source-program → compiler → object-program
    program's input → object-program → program's output

Example
    scanf("%d", &x);
    x = x + 1;
    printf("%d", x);

  Sparc-cc-compiler
    add %fp,-8, %l1
    mov %l1, %o1
    call scanf
    ld [%fp-8],%l0
    add %l0,1,%l0
    st %l0,[%fp-8]
    ld [%fp-8], %l1
    mov %l1, %o1
    call printf

  assembler/linker
    5 → object-program → 6

Interpreter vs. Compiler
• Conceptually simpler (the definition of the programming language)
• Easier to port
• Can provide more specific error report
• Normally faster
• More efficient
  – Compilation is done once for all the inputs, so many computations can be performed at compile-time
  – Sometimes even compile-time + execution-time < interpretation-time
• Can report errors before input is given

Interpreters provide specific error report
• Input-program
    scanf("%d", &y);
    if (y < 0)
        x = 5;
    ...
    if (y <= 0)
        z = x + 1;
• Input data
    y = 0

Compilers are usually more efficient
    scanf("%d", &x);
    y = 5;
    z = 7;
    x = x + y*z;
    printf("%d", x);

  Sparc-cc-compiler
    add %fp,-8, %l1
    mov %l1, %o1
    call scanf
    mov 5, %l0
    st %l0,[%fp-12]
    mov 7,%l0
    st %l0,[%fp-16]
    ld [%fp-8], %l0
    add %l0, 35 ,%l0
    st %l0,[%fp-8]
    ld [%fp-8], %l1
    mov %l1, %o1
    call printf

Compilers provide errors before actual input is given
• Input-program
    int a[100], x, y;
    scanf("%d", &y);
    if (y < 0)
        /* line 4 */ y = a;
• Compiler-Output
    "line 4: improper pointer/integer combination: op ="

Compilers provide errors before actual input is given
• Input-program
    scanf("%d", &y);
    if (y < 0)
        x = 5;
    ...
    if (y <= 0) /* line 88 */
        z = x + 1;
• Compiler-Output
    "line 88: x may be used before set"

Abstract Machines
• A compromise between compilers and interpreters
• An intermediate program representation
• The intermediate representation is interpreted
• Example: Zurich P4 Pascal Compiler (1981)

    Pascal program → Pascal compiler → P-code
    P-code, program's input → interpreter → program's output

• Other examples: Algol object code, Java bytecode
• The intermediate code can be compiled

Why Study Compilers
• Become a compiler writer
  – New programming languages
  – New machines
  – New compilation modes: "just-in-time", "run-time code generation", "binary translation"
• Using some of the techniques in other contexts
• Design a very big software program using a reasonable effort
• Learn applications of many CS results (formal languages, decidability, graph algorithms, dynamic programming, ...)
• Better understanding of programming languages and machine architectures
• Become a better programmer

Course Requirements
• Theoretical assignments 10%
  – 3 assignments
• Compiler Project 50%
  – Develop a Tiger compiler in teams
• Final Exam 40%

Compiler Phases
• The compiler program is usually written as a sequence of well-defined phases
• The interfaces between the phases are well defined (another language)
• It is sometimes convenient to use auxiliary global information (e.g., symbol table)
• Advantages of the phase separation:
  – Modularity
  – Simplicity
  – Reusability

Basic Compiler Phases
    Source program (string)
        ↓ lexical analysis (Finite automata)
    Tokens
        ↓ syntax analysis (Pushdown automata)
    Abstract syntax tree
        ↓ semantic analysis
        ↓ Translate (Memory organization)
    Intermediate representation
        ↓ Instruction selection (Dynamic programming)
    Assembly
        ↓ Register Allocation (graph algorithms)
    Fin. Assembly

Example: straight-line programming
    Stm ::= Stm ; Stm            // (CompoundStm)
    Stm ::= id := Exp            // (AssignStm)
    Stm ::= print (ExpList)      // (PrintStm)
    Exp ::= id                   // (IdExp)
    Exp ::= num                  // (NumExp)
    Exp ::= Exp Binop Exp        // (OpExp)
    Exp ::= (Stm, Exp)           // (EseqExp)
    ExpList ::= Exp, ExpList     // (PairExpList)
    ExpList ::= Exp              // (LastExpList)
    Binop ::= +                  // (Plus)
    Binop ::= -                  // (Minus)
    Binop ::= *                  // (Times)
    Binop ::= /                  // (Div)

Lexical Analysis
• Input string
    a\b := 5 + 3 ;\nb := (print(a, a-1), 10 * a) ;\nprint(b)
• Tokens
    id("a") assign num(5) + num(3) ;
    id("b") assign (print(id("a") , id("a") - num(1)), num(10) * id("a")) ;
    print(id("b"))

Syntax Analysis
• Tokens
    id("a") assign num(5) + num(3) ;
    id("b") assign (print(id("a") , id("a") - num(1)), num(10) * id("a")) ;
    print(id("b"))
• Abstract syntax tree (upper levels only)
    CompoundStm
        AssignStm
            id a
            opExp
                numExp 5
                Plus
                numExp 3
        CompoundStm
            AssignStm
                id b
                eseqExp
                    PrintStm
                    opExp

Summary
• Phases drastically simplify the problem of writing a good compiler
• The textbook offers a reasonable partition into phases with interface definitions (in C)
• In the next meeting we will learn the details of the rest of the phases
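
The straight-line grammar from the example can be sketched in C as a tree of tagged unions, one variant per production, together with a tiny interpreter that walks the tree. This is only an illustrative sketch, not the textbook's own interface: the names mk_compound, mk_assign, mk_print, mk_id, mk_num, mk_op, mk_eseq, mk_list, Env, slot, exec_stm, eval_exp and example_program are all hypothetical, the variable table is a fixed array of 16 entries, and reading an unset variable is assumed to yield 0.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Stm Stm;
typedef struct Exp Exp;
typedef struct ExpList ExpList;
typedef enum { Plus, Minus, Times, Div } Binop;

struct Stm {                                   /* one variant per Stm production */
    enum { CompoundStm, AssignStm, PrintStm } kind;
    union {
        struct { Stm *first, *second; } compound;
        struct { const char *id; Exp *exp; } assign;
        ExpList *print;
    } u;
};
struct Exp {                                   /* one variant per Exp production */
    enum { IdExp, NumExp, OpExp, EseqExp } kind;
    union {
        const char *id;
        int num;
        struct { Exp *left; Binop op; Exp *right; } op;
        struct { Stm *stm; Exp *exp; } eseq;
    } u;
};
struct ExpList { Exp *head; ExpList *tail; };  /* tail == NULL: LastExpList */

/* Heap-allocate a copy of a node; nodes are never freed in this sketch. */
static Stm *new_stm(Stm v) { Stm *s = malloc(sizeof *s); assert(s); *s = v; return s; }
static Exp *new_exp(Exp v) { Exp *e = malloc(sizeof *e); assert(e); *e = v; return e; }

Stm *mk_compound(Stm *a, Stm *b) { return new_stm((Stm){ .kind = CompoundStm, .u.compound = { a, b } }); }
Stm *mk_assign(const char *id, Exp *e) { return new_stm((Stm){ .kind = AssignStm, .u.assign = { id, e } }); }
Stm *mk_print(ExpList *l) { return new_stm((Stm){ .kind = PrintStm, .u.print = l }); }
Exp *mk_id(const char *id) { return new_exp((Exp){ .kind = IdExp, .u.id = id }); }
Exp *mk_num(int n) { return new_exp((Exp){ .kind = NumExp, .u.num = n }); }
Exp *mk_op(Exp *l, Binop op, Exp *r) { return new_exp((Exp){ .kind = OpExp, .u.op = { l, op, r } }); }
Exp *mk_eseq(Stm *s, Exp *e) { return new_exp((Exp){ .kind = EseqExp, .u.eseq = { s, e } }); }
ExpList *mk_list(Exp *head, ExpList *tail) {
    ExpList *l = malloc(sizeof *l); assert(l); l->head = head; l->tail = tail; return l;
}

#define MAX_VARS 16
typedef struct { const char *name[MAX_VARS]; int value[MAX_VARS]; int n; } Env;

/* Return the storage cell for id, creating it (initialized to 0) on first use. */
static int *slot(Env *env, const char *id) {
    for (int i = 0; i < env->n; i++)
        if (strcmp(env->name[i], id) == 0) return &env->value[i];
    assert(env->n < MAX_VARS);
    env->name[env->n] = id;
    env->value[env->n] = 0;
    return &env->value[env->n++];
}

int eval_exp(const Exp *e, Env *env);

/* Execute a statement, updating env and printing to stdout. */
void exec_stm(const Stm *s, Env *env) {
    switch (s->kind) {
    case CompoundStm:
        exec_stm(s->u.compound.first, env);
        exec_stm(s->u.compound.second, env);
        break;
    case AssignStm: {
        int v = eval_exp(s->u.assign.exp, env);
        *slot(env, s->u.assign.id) = v;
        break;
    }
    case PrintStm:                             /* values separated by spaces, then newline */
        for (const ExpList *l = s->u.print; l; l = l->tail)
            printf("%d%c", eval_exp(l->head, env), l->tail ? ' ' : '\n');
        break;
    }
}

/* Evaluate an expression; division by zero is not checked. */
int eval_exp(const Exp *e, Env *env) {
    switch (e->kind) {
    case IdExp:  return *slot(env, e->u.id);
    case NumExp: return e->u.num;
    case OpExp: {
        int l = eval_exp(e->u.op.left, env);
        int r = eval_exp(e->u.op.right, env);
        switch (e->u.op.op) {
        case Plus:  return l + r;
        case Minus: return l - r;
        case Times: return l * r;
        case Div:   return l / r;
        }
        break;
    }
    case EseqExp:                              /* run the statement, then yield the expression */
        exec_stm(e->u.eseq.stm, env);
        return eval_exp(e->u.eseq.exp, env);
    }
    return 0;
}

/* a := 5 + 3 ; b := (print(a, a-1), 10 * a) ; print(b) */
Stm *example_program(void) {
    ExpList *args = mk_list(mk_id("a"), mk_list(mk_op(mk_id("a"), Minus, mk_num(1)), NULL));
    return mk_compound(
        mk_assign("a", mk_op(mk_num(5), Plus, mk_num(3))),
        mk_compound(
            mk_assign("b", mk_eseq(mk_print(args), mk_op(mk_num(10), Times, mk_id("a")))),
            mk_print(mk_list(mk_id("b"), NULL))));
}
```

Starting from a zero-initialized Env (Env env = {0};), calling exec_stm(example_program(), &env) prints the line 8 7 followed by the line 80, and leaves a = 8 and b = 80 in the table. A tagged union is used here because each grammar rule maps to exactly one variant, which keeps the tree walk a plain switch per node kind.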