Chapter 1: Introduction to Compiling
CSE244
Aggelos Kiayias
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-1155
Storrs, CT 06269-3155
[email protected]
http://www.cse.uconn.edu/~akiayias
Notes Credits:
Steven A. Demurjian
CSE, UCONN
Robert LaBarre
United Technologies Research Center
CH1.1
Introduction to Compilers
As a Discipline, Involves Multiple CS&E Areas
Programming Languages and Algorithms
Theory of Computing & Software Engineering
Computer Architecture & Operating Systems
Has a Deceptively Simple Intent:
Source program -> Compiler -> Target program
The Compiler Also Reports Error messages
Diverse & Varied
CH1.2
Classifications of Compilers
Compilers Viewed from Many Perspectives
By Construction: Single Pass, Multiple Pass, Load & Go
By Function: Debugging, Optimizing
However, All Perform the Same Basic Tasks
CH1.3
The Model
The TWO Fundamental Parts:
Analysis: Decompose Source into an
intermediate representation
Synthesis: Target program generation
from representation
We Will Discuss Both in This Class, and
FOCUS on analysis.
CH1.4
Important Notes
Today: Many Software Tools Help with the Analysis Part.
This Wasn’t the Case in the Early Days.
(Some) analysis is also important in:
Structure / Syntax directed editors: Force
“syntactically” correct code to be entered
Pretty Printers: Standardized version for program
structure (i.e., blank space, indenting, etc.)
Static Checkers: A “quick” compilation to detect
rudimentary errors
Interpreters: “real” time execution of code a “line-at-a-time”
CH1.5
Important Notes
Compilation Is Not Limited to Programming Language
Applications
Text Formatters
LATEX & TROFF Are Languages Whose Commands Format Text
Silicon Compilers
Textual / Graphical: Take Input and Generate Circuit Design
Database Query Processors
Database Query Languages Are Also Programming Languages
Input is compiled Into a Set of Operations for Accessing the
Database
CH1.6
The Many Phases of a Compiler
Source Program
1. Lexical Analyzer
2. Syntax Analyzer
3. Semantic Analyzer
4. Intermediate Code Generator
5. Code Optimizer
6. Code Generator
Target Program
Symbol-table Manager and Error Handler: Interact with All Six Phases
1, 2, 3 : Analysis - Our Focus
4, 5, 6 : Synthesis
CH1.7
Language-Processing System
Source Program
1. Pre-Processor
2. Compiler
3. Assembler
4. Relocatable Machine Code
5. Loader / Link-Editor (also reads library and relocatable object files)
Executable
CH1.8
The Analysis Task For Compilation
Three Phases:
Linear / Lexical Analysis:
Left-to-Right Scan to Identify Tokens
token: sequence of chars having a collective meaning
Hierarchical Analysis:
Grouping of Tokens Into Meaningful Collection
Semantic Analysis:
Checking to ensure Correctness of Components
CH1.9
Phase 1. Lexical Analysis
Easiest Analysis - Identify tokens, which are the basic building blocks
For Example: position := initial + rate * 60 ;
Tokens: position   :=   initial   +   rate   *   60   ;
All eight are tokens
Blanks, Line breaks, etc. are scanned out
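The scan described on this slide can be sketched with regular expressions. This is a minimal illustration, not the course's scanner: the token class names (ID, ASSIGN, OP, SEMI, NUMBER) and the function name tokenize are assumptions made for the example.

```python
import re

# Token classes for the slide's example statement (names are illustrative).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP",     r"[+*]"),
    ("SEMI",   r";"),
    ("SKIP",   r"\s+"),   # blanks and line breaks are scanned out
    ("ERROR",  r"."),     # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Scan left to right and return a list of (kind, text) tokens."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r} at {match.start()}")
        tokens.append((kind, text))
    return tokens

print(tokenize("position := initial + rate * 60 ;"))
```

The scanner is purely linear: it never needs to look at how tokens nest, which is why this phase can be kept separate from parsing.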
CH1.10
Phase 2. Hierarchical Analysis
aka Parsing or Syntax Analysis
For previous example, we would have Parse Tree:
assignment statement
    identifier: position
    :=
    expression
        expression
            identifier: initial
        +
        expression
            expression
                identifier: rate
            *
            expression
                number: 60
Nodes of tree are constructed using a grammar for the language
CH1.11
What is a Grammar?
Grammar is a Set of Rules Which Govern the
Interdependencies & Structure Among the Tokens
statement is an
    assignment statement, or while statement, or if statement, or ...
assignment statement is an
    identifier := expression ;
expression is an
    (expression), or expression + expression, or expression * expression, or number, or identifier, or ...
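The rules for assignment statement and expression can be turned into a small recursive-descent parser. This is a sketch under stated assumptions: the rules as written do not say which operator binds tighter, so the code assumes * binds tighter than + (which matches the parse tree shown for the running example), and the names Parser, tokenize, term, and factor are hypothetical helpers, not part of the course material.

```python
import re

def tokenize(source):
    # Minimal scanner: identifiers, numbers, ':=', and single-character symbols.
    return re.findall(r"[A-Za-z_]\w*|\d+|:=|[+*();]", source)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, wanted):
        if self.peek() != wanted:
            raise SyntaxError(f"expected {wanted!r}, found {self.peek()!r}")
        self.pos += 1

    def assignment(self):
        # assignment statement is an: identifier := expression ;
        target = self.identifier()
        self.expect(":=")
        value = self.expression()
        self.expect(";")
        return (":=", target, value)

    def expression(self):
        # expression + expression (assumed lowest precedence, left-associative)
        node = self.term()
        while self.peek() == "+":
            self.pos += 1
            node = ("+", node, self.term())
        return node

    def term(self):
        # expression * expression (assumed to bind tighter than +)
        node = self.factor()
        while self.peek() == "*":
            self.pos += 1
            node = ("*", node, self.factor())
        return node

    def factor(self):
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            node = self.expression()  # recursive call handles nested expressions
            self.expect(")")
            return node
        if tok is not None and tok.isdigit():
            self.pos += 1
            return ("number", tok)
        return self.identifier()

    def identifier(self):
        tok = self.peek()
        if tok is None or not re.fullmatch(r"[A-Za-z_]\w*", tok):
            raise SyntaxError(f"expected identifier, found {tok!r}")
        self.pos += 1
        return ("identifier", tok)

tree = Parser(tokenize("position := initial + rate * 60 ;")).assignment()
print(tree)
```

The result is a nested tuple with := at the root, + below it, and * below +, i.e. the same shape as the parse tree for the example with the single-child expression nodes collapsed.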
CH1.12
Why Have We Divided Analysis
in This Manner?
Lexical Analysis - Scans Input; Its Linear Actions
Are Not Recursive
Identify Only Individual “words” that Are
the Tokens of the Language
Recursion Is Required to Identify Structure of an
Expression, As Indicated in Parse Tree
Verify that the “words” are Correctly
Assembled into “sentences”
What Is the Third Phase?
Determine Whether the Sentences have One
and Only One Unambiguous Interpretation
… and do something about it!
e.g. “John Took Picture of Mary Out on the
Patio”
CH1.13
Phase 3. Semantic Analysis
Find More Complicated Semantic Errors and
Support Code Generation
Parse Tree Is Augmented With Semantic Actions
Compressed Tree:
:=
    position
    +
        initial
        *
            rate
            60
Tree with Conversion Action:
:=
    position
    +
        initial
        *
            rate
            inttoreal
                60
CH1.14
Phase 3. Semantic Analysis
Most Important Activity in This Phase:
Type Checking - Legality of Operands
Many Different Situations:
Real := int + char ;
A[int] := A[real] + int ;
while char <> int do ...
Etc.
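A type-checking pass over the compressed tree might look like the sketch below. It is an illustration only: the declared types in SYMBOL_TYPES are an assumption (the slides do not state them, but the inttoreal conversion implies rate is real), and the rule "widen int to real, reject anything else" is one simple policy among many.

```python
# Declared types for the running example (assumed: all three variables are real).
SYMBOL_TYPES = {"position": "real", "initial": "real", "rate": "real"}

def check(node):
    """Return (typed_node, type), inserting inttoreal where int meets real."""
    kind = node[0]
    if kind == "number":
        return node, ("real" if "." in node[1] else "int")
    if kind == "identifier":
        if node[1] not in SYMBOL_TYPES:
            raise TypeError(f"undeclared identifier {node[1]!r}")
        return node, SYMBOL_TYPES[node[1]]
    # Binary nodes have the form (':=' | '+' | '*', left, right).
    left, left_type = check(node[1])
    right, right_type = check(node[2])
    if kind == ":=":
        if left_type == "real" and right_type == "int":
            right = ("inttoreal", right)
        elif left_type != right_type:
            raise TypeError(f"cannot assign {right_type} to {left_type}")
        return (kind, left, right), left_type
    for operand_type in (left_type, right_type):
        if operand_type not in ("int", "real"):
            raise TypeError(f"operator {kind!r} does not accept {operand_type}")
    if left_type != right_type:
        # Mixed int/real arithmetic: widen the int operand.
        if left_type == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
    result_type = "real" if "real" in (left_type, right_type) else "int"
    return (kind, left, right), result_type

tree = (":=", ("identifier", "position"),
        ("+", ("identifier", "initial"),
         ("*", ("identifier", "rate"), ("number", "60"))))
typed_tree, _ = check(tree)
print(typed_tree)
```

With these assumed declarations, the only change to the tree is that the number 60 is wrapped in an inttoreal node, as in the tree with the conversion action. An operand of type char in an arithmetic expression, as in Real := int + char, is rejected with a TypeError.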
CH1.15
Supporting Phases/
Activities for Analysis
Symbol Table Creation / Maintenance
Contains Info (storage, type, scope, args) on
Each “Meaningful” Token, Typically Identifiers
Data Structure Created / Initialized During
Lexical Analysis
Utilized / Updated During Later Analysis &
Synthesis
Error Handling
Detection of Different Errors Which
Correspond to All Phases
What Kinds of Errors Are Found During the
Analysis Phase?
What Happens When an Error Is Found?
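A symbol table of the kind described above can be sketched as a stack of dictionaries, one per scope. The class name SymbolTable, its method names, and the attribute values in the usage lines are all hypothetical; real compilers differ in how they represent scopes and storage.

```python
class SymbolTable:
    """Maps each identifier to a record of attributes, with nested scopes."""

    def __init__(self):
        self.scopes = [{}]  # the innermost scope is the last element

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def insert(self, name, **attrs):
        # Typically called when the lexical analyzer first sees an identifier.
        return self.scopes[-1].setdefault(name, {"name": name, **attrs})

    def lookup(self, name):
        # Search from the innermost scope outward.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

    def update(self, name, **attrs):
        # Later phases fill in type, storage, and other attributes.
        entry = self.lookup(name)
        if entry is None:
            raise KeyError(name)
        entry.update(attrs)

table = SymbolTable()
for name in ("position", "initial", "rate"):
    table.insert(name)                        # created during lexical analysis
table.update("rate", type="real", storage=8)  # refined during later analysis
print(table.lookup("rate"))
```

A lookup that returns None is one way an "undeclared identifier" error can surface during semantic analysis.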
CH1.17
The Many Phases of a Compiler
Source Program
1. Lexical Analyzer
2. Syntax Analyzer
3. Semantic Analyzer
4. Intermediate Code Generator
5. Code Optimizer
6. Code Generator
Target Program
Symbol-table Manager and Error Handler: Interact with All Six Phases
1, 2, 3 : Analysis - Our Focus
4, 5, 6 : Synthesis
CH1.18
The Synthesis Task For Compilation
Intermediate Code Generation
Abstract Machine Version of Code, Independent of Architecture
Easy to Produce and Easy to Translate Into Final,
Machine-Dependent Code
Code Optimization
Find More Efficient Ways to Execute Code
Replace Code With More Efficient Statements
Two Approaches: High-level Language &
“Peephole” Optimization
Final Code Generation
Generate Relocatable Machine Dependent Code
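As a rough illustration of the code optimization step, the sketch below works on three-address instructions for the running example position := initial + rate * 60. The tuple form (dest, op, arg1, arg2), the "copy" operator name, and the function optimize are assumptions made for this example. It applies two simple rewrites: folding inttoreal of an integer constant at compile time, and removing a final copy from a temporary, which is only safe under the assumption that the temporary is not used again.

```python
def optimize(code):
    """Fold inttoreal(constant), drop a trailing copy, then renumber temporaries."""
    constants = {}
    out = []
    for dest, op, a, b in code:
        a = constants.get(a, a)
        b = constants.get(b, b)
        if op == "inttoreal" and a.isdigit():
            constants[dest] = a + ".0"   # conversion done at compile time
            continue
        if op == "copy" and out and out[-1][0] == a and a.startswith("temp"):
            # dest := tempN right after tempN was computed: write dest directly.
            previous = out.pop()
            out.append((dest,) + previous[1:])
            continue
        out.append((dest, op, a, b))

    names = {}   # old temporary name -> new, densely numbered name
    result = []
    for dest, op, a, b in out:
        a, b = names.get(a, a), names.get(b, b)
        if dest.startswith("temp"):
            names[dest] = f"temp{len(names) + 1}"
            dest = names[dest]
        result.append((dest, op, a, b))
    return result

unoptimized = [
    ("temp1", "inttoreal", "60", None),
    ("temp2", "*", "id3", "temp1"),
    ("temp3", "+", "id2", "temp2"),
    ("id1", "copy", "temp3", None),
]
for instruction in optimize(unoptimized):
    print(instruction)
```

Four instructions shrink to two: temp1 := id3 * 60.0 followed by id1 := id2 + temp1.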
CH1.19
Reviewing the Entire Process
position := initial + rate * 60
lexical analyzer
id1 := id2 + id3 * 60
syntax analyzer
:=
    id1
    +
        id2
        *
            id3
            60
semantic analyzer
:=
    id1
    +
        id2
        *
            id3
            inttoreal
                60
intermediate code generator
Symbol Table: position ...., initial …., rate….
Errors
CH1.20
Reviewing the Entire Process
Symbol Table: position ...., initial …., rate….
Errors
intermediate code generator
3 address code:
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
code optimizer
temp1 := id3 * 60.0
id1 := id2 + temp1
final code generator
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
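The intermediate code generation step in this walk-through can be sketched as a post-order traversal of the annotated tree. The tuple representation of the tree and the function name generate are assumptions for illustration; temporaries are numbered in the order they are created.

```python
from itertools import count

def generate(tree):
    """Emit three-address code for an assignment tree as a list of strings."""
    code = []
    temps = count(1)

    def walk(node):
        kind = node[0]
        if kind in ("identifier", "number"):
            return node[1]                 # leaves need no instruction
        if kind == "inttoreal":
            operand = walk(node[1])
            temp = f"temp{next(temps)}"
            code.append(f"{temp} := inttoreal({operand})")
            return temp
        left = walk(node[1])               # operands first, then the operator
        right = walk(node[2])
        temp = f"temp{next(temps)}"
        code.append(f"{temp} := {left} {kind} {right}")
        return temp

    _, target, value = tree
    code.append(f"{target[1]} := {walk(value)}")
    return code

annotated = (":=", ("identifier", "id1"),
             ("+", ("identifier", "id2"),
              ("*", ("identifier", "id3"),
               ("inttoreal", ("number", "60")))))
print("\n".join(generate(annotated)))
```

For the annotated tree of the running example this yields the same four instructions as the three-address code listed on the slide, with at most one operator per instruction.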
CH1.21
Assemblers
Assembly code: names are used for instructions,
and names are used for memory addresses.
MOV a, R1
ADD #2, R1
MOV R1, b
Two-pass Assembly:
First Pass: all identifiers are assigned to
memory addresses (0-offset)
e.g. substitute 0 for a, and 4 for b
Second Pass: produce relocatable machine
code:
0001 01 00 00000000 *
0011 01 10 00000010
0010 01 00 00000100 *
(* = relocation bit)
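The two passes can be sketched as follows. The instruction format is inferred from the bit patterns on this slide and is an assumption: a 4-bit opcode (0001 load, 0010 store, 0011 add), a 2-bit register number, a 2-bit tag (00 for a memory address, 10 for an immediate value), and an 8-bit address or value. The 4-byte word size matches the offsets 0 and 4 used above; all function names are hypothetical.

```python
WORD_SIZE = 4  # bytes per variable, matching the offsets 0 and 4 on the slide

PROGRAM = [("MOV", "a", "R1"), ("ADD", "#2", "R1"), ("MOV", "R1", "b")]

def is_register(operand):
    return operand.startswith("R") and operand[1:].isdigit()

def first_pass(program):
    """Assign each memory identifier an offset, starting from 0."""
    addresses = {}
    for _, src, dst in program:
        for operand in (src, dst):
            if not is_register(operand) and not operand.startswith("#"):
                addresses.setdefault(operand, len(addresses) * WORD_SIZE)
    return addresses

def second_pass(program, addresses):
    """Translate each instruction; '*' marks an address that needs relocation."""
    lines = []
    for op, src, dst in program:
        if op == "MOV" and is_register(dst):   # load: memory -> register
            opcode, reg, operand = "0001", dst, src
        elif op == "MOV":                      # store: register -> memory
            opcode, reg, operand = "0010", src, dst
        else:                                  # ADD: operand added into register
            opcode, reg, operand = "0011", dst, src
        if operand.startswith("#"):
            tag, value, reloc = "10", int(operand[1:]), ""
        else:
            tag, value, reloc = "00", addresses[operand], " *"
        lines.append(f"{opcode} {int(reg[1:]):02b} {tag} {value:08b}{reloc}")
    return lines

symbols = first_pass(PROGRAM)
print(symbols)
print("\n".join(second_pass(PROGRAM, symbols)))
```

Immediate operands such as #2 carry no relocation mark because their value does not depend on where the program is loaded.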
CH1.22
Loaders and Link-Editors
Loader: takes relocatable machine code, alters
the addresses, and places the altered instructions
into memory.
Link-editor: takes many (relocatable) machine
code programs (with cross-references) and produces
a single file.
Needs to keep track of the correspondence between
variable names and corresponding addresses in
each piece of code.
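The loader's address adjustment can be sketched as adding the load address to every word whose relocation bit is set. The tuple layout (opcode, register, tag, address, relocatable) and the load address 15 are assumptions chosen for illustration; the three words are the relocatable code for MOV a, R1 / ADD #2, R1 / MOV R1, b.

```python
def relocate(words, load_address):
    """Add load_address to every address whose relocation bit is set."""
    loaded = []
    for opcode, reg, tag, address, relocatable in words:
        if relocatable:
            address += load_address
        loaded.append(f"{opcode} {reg} {tag} {address:08b}")
    return loaded

RELOCATABLE = [
    ("0001", "01", "00", 0, True),    # load a   (address 0, relocatable)
    ("0011", "01", "10", 2, False),   # add #2   (immediate, left unchanged)
    ("0010", "01", "00", 4, True),    # store b  (address 4, relocatable)
]
print("\n".join(relocate(RELOCATABLE, 15)))
```

With a load address of 15, a moves to address 15 (00001111) and b to address 19 (00010011), while the immediate value 2 is untouched.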
CH1.23
Compiler Cousins: Preprocessors
Provide Input to Compilers
1. Macro Processing
#define in C: does text substitution before
compiling
#define X 3
#define Y A*B+C
#define Z getchar()
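The text substitution performed for the three macros above can be imitated in a few lines. This is a deliberately simplified sketch, not how the C preprocessor is implemented: it ignores string literals, comments, macro parameters, and rescanning of the replacement text, and the function name expand is hypothetical.

```python
import re

def expand(line, macros):
    """Replace each identifier that names a macro with its body (plain text)."""
    def substitute(match):
        word = match.group()
        return macros.get(word, word)
    return re.sub(r"[A-Za-z_]\w*", substitute, line)

MACROS = {"X": "3", "Y": "A*B+C", "Z": "getchar()"}

print(expand("c = Z; n = X + 1;", MACROS))
print(expand("k = 2 * Y;", MACROS))
```

The second line becomes k = 2 * A*B+C; which the compiler then parses as (2*A*B)+C. Because the substitution is purely textual, macro bodies containing operators are usually written with parentheses, e.g. (A*B+C).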
CH1.24
2. File Inclusion
#include in C - bring in another file before compiling
defs.h:
//////
//////
//////
main.c:
#include "defs.h"
…---…---…--…---…---…--…---…---…---
main.c after preprocessing:
//////
//////
//////
…---…---…--…---…---…--…---…---…---
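File inclusion can be sketched as a recursive splice of one file's lines into another. To keep the example self-contained, the "files" are entries in a dictionary rather than files on disk; the file contents, the function name preprocess, and the circular-include check are assumptions for illustration, and only the quoted form of #include is handled.

```python
import re

INCLUDE = re.compile(r'\s*#include\s+"([^"]+)"\s*$')

def preprocess(name, files, stack=()):
    """Return the lines of files[name] with every #include "..." spliced in."""
    if name in stack:
        raise ValueError(f"circular include of {name!r}")
    output = []
    for line in files[name].splitlines():
        match = INCLUDE.match(line)
        if match:
            # Replace the directive with the (preprocessed) contents of the file.
            output.extend(preprocess(match.group(1), files, stack + (name,)))
        else:
            output.append(line)
    return output

FILES = {
    "defs.h": "int limit;\nint total;",
    "main.c": '#include "defs.h"\nint main(void) { return 0; }',
}
print("\n".join(preprocess("main.c", FILES)))
```

The compiler proper never sees the #include line, only the combined text.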
CH1.25
3. Rational Preprocessors
Augment “Old” Languages With Modern
Constructs
Add Macros for If - Then, While, Etc.
#define Can Make C Code More Pascal-like
#define begin {
#define end }
#define then
CH1.26
4. Language Extensions for a
Database System
EQUEL - Database query language embedded in C
## Retrieve (DN=Department.Dnum) where
## Department.Dname = ‘Research’
is Preprocessed into:
ingres_system(“Retr…..Research’”,____,____);
a procedure call in a programming language.
CH1.27
The Grouping of Phases
Front End : Analysis + Intermediate Code Generation
vs.
Back End : Code Generation + Optimization
Number of Passes:
A pass: reads and writes intermediate files
Fewer passes: more efficiency.
However: fewer passes require more
sophisticated memory management and
compiler phase interaction.
Tradeoffs ……..
CH1.28
Compiler Construction Tools
Parser Generators : Produce Syntax
Analyzers
Scanner Generators : Produce Lexical
Analyzers
Syntax-directed Translation Engines :
Generate Intermediate Code
Automatic Code Generators : Generate
Actual Code
Data-Flow Engines : Support Optimization
CH1.29