Chapter 1: Introduction to Compiling
CSE244
Aggelos Kiayias
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-1155
Storrs, CT 06269-3155
http://www.cse.uconn.edu/~akiayias
Notes Credits:
Steven A. Demurjian
CSE, UCONN
Robert LaBarre
United Technologies Research Center
CH1.1
Introduction to Compilers

As a Discipline, Involves Multiple CS&E Areas
- Programming Languages and Algorithms
- Theory of Computing & Software Engineering
- Computer Architecture & Operating Systems
Has Deceptively Simple Intent:
Source program -> Compiler -> Target program
Compiler -> Error messages
Diverse & Varied
CH1.2
Classifications of Compilers

Compilers Viewed from Many Perspectives
Construction: Single Pass, Multiple Pass
Functional: Load & Go, Debugging, Optimizing

However, all perform the same basic tasks to accomplish their actions.
CH1.3
The Model

The TWO Fundamental Parts:
Analysis: Decompose the source into an intermediate representation
Synthesis: Generate the target program from that representation

We Will Discuss Both in This Class, and FOCUS on Analysis.
CH1.4
Important Notes

Today: Many software tools help with the analysis part. This wasn’t the case in the early days. (Some) analysis is also important in:

Structure / Syntax-directed editors: Force “syntactically” correct code to be entered

Pretty Printers: Produce a standardized layout of program structure (e.g., blank space, indenting, etc.)

Static Checkers: A “quick” compilation to detect rudimentary errors

Interpreters: “Real”-time execution of code, a “line at a time”
CH1.5
Important Notes

Compilation Is Not Limited to Programming Language Applications

Text Formatters
- LaTeX & TROFF Are Languages Whose Commands Format Text

Silicon Compilers
- Textual / Graphical: Take Input and Generate Circuit Design

Database Query Processors
- Database Query Languages Are Also Programming Languages
- Input Is Compiled Into a Set of Operations for Accessing the Database
CH1.6
The Many Phases of a Compiler
Source Program
1. Lexical Analyzer
2. Syntax Analyzer
3. Semantic Analyzer
4. Intermediate Code Generator
5. Code Optimizer
6. Code Generator
Target Program

Symbol-table Manager and Error Handler: interact with all six phases
1, 2, 3 : Analysis - Our Focus
4, 5, 6 : Synthesis
CH1.7
Language-Processing System
Source Program -> Pre-Processor -> Compiler -> Assembler -> Relocatable Machine Code -> Loader / Link-Editor -> Executable
Library, relocatable object files -> Loader / Link-Editor
CH1.8
The Analysis Task For Compilation

Three Phases:

Linear / Lexical Analysis:
- Left-to-right Scan to Identify Tokens
  token: sequence of chars having a collective meaning

Hierarchical Analysis:
- Grouping of Tokens Into Meaningful Collections

Semantic Analysis:
- Checking to Ensure Correctness of Components
CH1.9
Phase 1. Lexical Analysis
Easiest Analysis - Identify tokens, which are the basic building blocks
For Example: Position := initial + rate * 60 ;
Tokens: Position   :=   initial   +   rate   *   60   ;
All are tokens
Blanks, Line breaks, etc. are scanned out
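
The scan described above can be sketched in a few lines of Python. This is only an illustration, not the course's scanner: the token class names (NUMBER, ID, ASSIGN, OP, SEMI) and the regular expressions are assumptions chosen to cover the example statement.

```python
import re

# Hypothetical token classes; the slide only says the statement splits into tokens.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP", r"[+*]"),
    ("SEMI", r";"),
    ("SKIP", r"\s+"),  # blanks and line breaks are scanned out
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Left-to-right scan that returns a list of (token class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling `tokenize("Position := initial + rate * 60 ;")` yields eight tokens, one per underlined group in the example. Note that the scan is a plain loop with no recursion.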
CH1.10
Phase 2. Hierarchical Analysis
aka Parsing or Syntax Analysis
For previous example, we would have Parse Tree:

assignment statement
  identifier: position
  :=
  expression
    expression
      identifier: initial
    +
    expression
      expression
        identifier: rate
      *
      expression
        number: 60
Nodes of tree are constructed using a grammar for the language
CH1.11
What is a Grammar?

Grammar is a Set of Rules Which Govern the
Interdependencies & Structure Among the Tokens
statement is an
  assignment statement, or while statement, or if statement, or ...
assignment statement is an
  identifier := expression ;
expression is an
  (expression), or expression + expression, or expression * expression, or number, or identifier, or ...
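
A minimal recursive-descent sketch of these rules follows. It is an assumption-laden illustration rather than the course's parser: the rules as written do not say whether + or * binds tighter, so the sketch splits expression into the hypothetical helper levels `term` and `factor` so that * groups first, which matches the parse tree shown earlier. The tuple-based tree format and the `Parser` class are also invented for the example, and the tiny scanner silently ignores characters it does not recognize.

```python
import re


def tokenize(text):
    # Minimal scanner: identifiers, numbers, ':=', and single-character symbols.
    return re.findall(r"[A-Za-z_]\w*|\d+|:=|[+*();]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def assignment(self):
        # assignment statement is an: identifier := expression ;
        target = self.identifier()
        self.expect(":=")
        value = self.expression()
        self.expect(";")
        return (":=", target, value)

    def identifier(self):
        token = self.peek()
        if token is None or not re.fullmatch(r"[A-Za-z_]\w*", token):
            raise SyntaxError(f"expected identifier, found {token!r}")
        self.pos += 1
        return ("id", token)

    def expression(self):
        # expression + expression, with * handled one level down
        node = self.term()
        while self.peek() == "+":
            self.pos += 1
            node = ("+", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "*":
            self.pos += 1
            node = ("*", node, self.factor())
        return node

    def factor(self):
        # (expression), or number, or identifier
        token = self.peek()
        if token == "(":
            self.pos += 1
            node = self.expression()  # recursion: an expression inside an expression
            self.expect(")")
            return node
        if token is not None and token.isdigit():
            self.pos += 1
            return ("num", int(token))
        return self.identifier()
```

`Parser(tokenize("position := initial + rate * 60 ;")).assignment()` returns a nested tuple with `+` at the top of the expression and `*` beneath it.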
CH1.12
Why Have We Divided Analysis in This Manner?

Lexical Analysis - Scans Input; Its Linear Actions Are Not Recursive
- Identify Only Individual “words” That Are the Tokens of the Language
Recursion Is Required to Identify Structure of an Expression, As Indicated in Parse Tree
- Verify That the “words” Are Correctly Assembled into “sentences”
What Is the Third Phase?
- Determine Whether the Sentences Have One and Only One Unambiguous Interpretation
- … and do something about it!
- e.g. “John Took Picture of Mary Out on the Patio”
CH1.13
Phase 3. Semantic Analysis

Find More Complicated Semantic Errors and Support Code Generation
Parse Tree Is Augmented With Semantic Actions

Compressed Tree:
:=
  position
  +
    initial
    *
      rate
      60

Conversion Action:
:=
  position
  +
    initial
    *
      rate
      inttoreal
        60
CH1.14
Phase 3. Semantic Analysis
Most Important Activity in This Phase:

Type Checking - Legality of Operands

Many Different Situations:
Real := int + char ;
A[int] := A[real] + int ;
while char <> int do
... etc.
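
One way to picture type checking together with the inttoreal conversion action is the sketch below. It rests on several assumptions that are not in the slides: the tuple tree format, the declared types in `SYMBOL_TYPES` (position, initial and rate are taken to be real, since 60 needs a conversion), and the rule that char operands are rejected in arithmetic. A real language may define these rules differently.

```python
# Hypothetical declared types for the running example.
SYMBOL_TYPES = {"position": "real", "initial": "real", "rate": "real"}


def check(node, types):
    """Return (annotated node, type); raise TypeError on an illegal operand."""
    kind = node[0]
    if kind == "num":
        return node, "int"
    if kind == "id":
        if node[1] not in types:
            raise TypeError(f"undeclared identifier {node[1]!r}")
        return node, types[node[1]]
    if kind in ("+", "*"):
        left, left_type = check(node[1], types)
        right, right_type = check(node[2], types)
        for operand_type in (left_type, right_type):
            if operand_type not in ("int", "real"):
                raise TypeError(f"operator {kind!r} does not accept {operand_type}")
        if left_type == right_type:
            return (kind, left, right), left_type
        # Mixed int/real operands: widen the int side with a conversion action.
        if left_type == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return (kind, left, right), "real"
    if kind == ":=":
        target, target_type = check(node[1], types)
        value, value_type = check(node[2], types)
        if target_type == "real" and value_type == "int":
            value = ("inttoreal", value)
        elif target_type != value_type:
            raise TypeError(f"cannot assign {value_type} to {target_type}")
        return (":=", target, value), target_type
    raise ValueError(f"unknown node kind {kind!r}")
```

Applied to the tree for `position := initial + rate * 60`, the checker wraps the `60` leaf in an `inttoreal` node and leaves the rest of the tree unchanged.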
CH1.15
Supporting Phases / Activities for Analysis

Symbol Table Creation / Maintenance
- Contains Info (storage, type, scope, args) on Each “Meaningful” Token, Typically Identifiers
- Data Structure Created / Initialized During Lexical Analysis
- Utilized / Updated During Later Analysis & Synthesis
Error Handling
- Detection of Different Errors Which Correspond to All Phases
- What Kinds of Errors Are Found During the Analysis Phase?
- What Happens When an Error Is Found?
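
A symbol table of the kind described above is often sketched as a stack of dictionaries, one per scope. The class and method names below (`SymbolTable`, `declare`, `lookup`, `update`) are hypothetical, and the attribute names simply echo the slide's list (storage, type, scope, args); this is an illustration, not a prescribed design.

```python
class SymbolTable:
    """Scoped symbol table: a stack of dictionaries, innermost scope last."""

    def __init__(self):
        self.scopes = [{}]  # start with a single global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot exit the global scope")
        self.scopes.pop()

    def declare(self, name, **info):
        """Create an entry in the current scope, e.g. when an identifier is first seen."""
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name!r} already declared in this scope")
        scope[name] = dict(info)

    def lookup(self, name):
        """Search from the innermost scope outward; return None if absent."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

    def update(self, name, **info):
        """Later phases add facts (type, storage, ...) to an existing entry."""
        entry = self.lookup(name)
        if entry is None:
            raise KeyError(f"{name!r} is not declared")
        entry.update(info)
```

A typical use: the scanner calls `declare("rate")` when it first meets the identifier, the semantic analyzer later calls `update("rate", type="real")`, and the code generator adds `storage` once an address is assigned.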
CH1.17
The Synthesis Task For Compilation

Intermediate Code Generation
- Abstract Machine Version of Code, Independent of Architecture
- Easy to Produce, and Easy to Translate into Final, Machine-Dependent Code
Code Optimization
- Find More Efficient Ways to Execute Code
- Replace Code With More Efficient Statements
- 2 Approaches: High-level Language & “Peephole” Optimization
Final Code Generation
- Generate Relocatable Machine-Dependent Code
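
As a rough illustration of "replace code with more efficient statements", the sketch below applies two small rewrites to a list of three-address instructions: it performs an inttoreal conversion of an integer constant at compile time, and it removes a final copy out of a temporary. The tuple format `(dest, op, arg1, arg2)`, the `copy` operation name, and the assumption that each temporary is used exactly once are all inventions for this example, not the method of any particular compiler.

```python
def optimize(instructions):
    """Tiny local optimizer over (dest, op, arg1, arg2) tuples."""
    constants = {}  # temporaries known to hold a compile-time constant
    result = []
    for dest, op, arg1, arg2 in instructions:
        # Substitute any temporary that was folded into a constant.
        arg1 = constants.get(arg1, arg1)
        arg2 = constants.get(arg2, arg2)
        if op == "inttoreal" and arg1.isdigit():
            # Do the conversion now instead of at run time.
            constants[dest] = f"{int(arg1)}.0"
            continue
        if (op == "copy" and result and result[-1][0] == arg1
                and arg1.startswith("temp")):
            # Assumes the temporary is used only here: write straight to dest.
            _, prev_op, prev1, prev2 = result.pop()
            result.append((dest, prev_op, prev1, prev2))
            continue
        result.append((dest, op, arg1, arg2))
    return result
```

Given the four instructions for `id1 := id2 + id3 * inttoreal(60)`, the function returns two: a multiplication by the constant `60.0` and an addition that writes directly into `id1`. Temporaries are not renumbered in this sketch.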
CH1.19
Reviewing the Entire Process
position := initial + rate * 60
-> lexical analyzer ->
id1 := id2 + id3 * 60
-> syntax analyzer ->
:=
  id1
  +
    id2
    *
      id3
      60
-> semantic analyzer ->
:=
  id1
  +
    id2
    *
      id3
      inttoreal
        60
-> intermediate code generator ->

Symbol Table: position ...., initial ...., rate ....
Errors
CH1.20
Reviewing the Entire Process
Symbol Table: position ...., initial ...., rate ....
Errors

-> intermediate code generator ->
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
(3 address code)
-> code optimizer ->
temp1 := id3 * 60.0
id1 := id2 + temp1
-> final code generator ->
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
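
The three-address code in this review can be produced by a post-order walk of the annotated tree: generate code for the operands first, then emit one instruction for the operator. The sketch below assumes a tuple-based tree and the hypothetical function names `generate` and `three_address_code`; it is one simple way to do it, not necessarily how the course's compiler works.

```python
import itertools


def generate(node, code, counter):
    """Append instructions for node to code; return the name that holds its value."""
    kind = node[0]
    if kind == "id":
        return node[1]
    if kind == "num":
        return str(node[1])
    if kind == "inttoreal":
        operand = generate(node[1], code, counter)
        temp = f"temp{next(counter)}"
        code.append(f"{temp} := inttoreal({operand})")
        return temp
    if kind in ("+", "*"):
        left = generate(node[1], code, counter)
        right = generate(node[2], code, counter)
        temp = f"temp{next(counter)}"
        code.append(f"{temp} := {left} {kind} {right}")
        return temp
    if kind == ":=":
        value = generate(node[2], code, counter)
        target = node[1][1]
        code.append(f"{target} := {value}")
        return target
    raise ValueError(f"unknown node kind {kind!r}")


def three_address_code(tree):
    code = []
    generate(tree, code, itertools.count(1))  # temporaries are numbered from 1
    return code
```

For the tree of `id1 := id2 + id3 * inttoreal(60)`, `three_address_code` returns the same four lines listed under the intermediate code generator above.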
CH1.21
Assemblers

Assembly code: names are used for instructions, and names are used for memory addresses.
MOV a, R1
ADD #2, R1
MOV R1, b

Two-pass Assembly:
- First Pass: all identifiers are assigned to memory addresses (0-offset)
  e.g. substitute 0 for a, and 4 for b
- Second Pass: produce relocatable machine code:
  0001 01 00 00000000 *
  0011 01 10 00000010
  0010 01 00 00000100 *
  (* = relocation bit)
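
The two passes can be sketched as follows. The instruction encoding is an assumption read off the listing above, not something the slide states: the first field is taken to be an opcode (0001 load, 0010 store, 0011 add), the second the register (01 for R1), the third an addressing tag (00 for a memory address, 10 for an immediate), and the last an 8-bit address or value. The 4-byte spacing between identifiers is likewise inferred from "0 for a, and 4 for b".

```python
OPCODES = {"load": "0001", "store": "0010", "add": "0011"}  # inferred from the listing
REGISTERS = {"R1": "01"}
WORD_SIZE = 4  # assumed size of each identifier, giving a -> 0 and b -> 4


def parse(line):
    mnemonic, operands = line.split(None, 1)
    source, dest = [part.strip() for part in operands.split(",")]
    return mnemonic, source, dest


def first_pass(lines):
    """Assign each identifier the next free offset, starting at 0."""
    symbols = {}
    for line in lines:
        _, source, dest = parse(line)
        for operand in (source, dest):
            if operand in REGISTERS or operand.startswith("#"):
                continue
            if operand not in symbols:
                symbols[operand] = len(symbols) * WORD_SIZE
    return symbols


def second_pass(lines, symbols):
    """Emit one word per instruction; ' *' marks an address the loader must relocate."""
    words = []
    for line in lines:
        mnemonic, source, dest = parse(line)
        if mnemonic == "MOV" and dest in REGISTERS:  # memory -> register
            op, reg, tag, value, reloc = "load", dest, "00", symbols[source], True
        elif mnemonic == "MOV":  # register -> memory
            op, reg, tag, value, reloc = "store", source, "00", symbols[dest], True
        elif mnemonic == "ADD" and source.startswith("#"):  # immediate operand
            op, reg, tag, value, reloc = "add", dest, "10", int(source[1:]), False
        else:
            raise ValueError(f"unsupported instruction: {line}")
        word = f"{OPCODES[op]} {REGISTERS[reg]} {tag} {value:08b}"
        words.append(word + (" *" if reloc else ""))
    return words
```

Running both passes on the three-line program above reproduces the listing, including the relocation marks on the two instructions that refer to `a` and `b`.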
CH1.22
Loaders and Link-Editors

Loader: takes relocatable machine code, alters the addresses, and places the altered instructions into memory.

Link-editor: takes many (relocatable) machine code programs (with cross-references) and produces a single file.
- Needs to keep track of the correspondence between variable names and corresponding addresses in each piece of code.
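
The loader's address alteration can be illustrated with a short sketch. It assumes the word format from the assembler listing (four space-separated bit fields, with a trailing `*` marking a relocatable address) and a hypothetical `load` function that returns the relocated words instead of writing to real memory.

```python
def load(words, load_address):
    """Add load_address to every relocatable address; leave other words unchanged."""
    memory = []
    for word in words:
        relocatable = word.endswith("*")
        opcode, register, tag, value = word.rstrip(" *").split()
        address = int(value, 2)
        if relocatable:
            address += load_address
        memory.append(f"{opcode} {register} {tag} {address:08b}")
    return memory
```

With a load address of 15 (an arbitrary choice for the example), the addresses of `a` and `b` become 15 and 19, while the immediate operand `2` in the ADD instruction is left alone.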
CH1.23
Compiler Cousins: Preprocessors
Provide Input to Compilers
1. Macro Processing
#define in C: does text substitution before
compiling
#define X 3
#define Y A*B+C
#define Z getchar()
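
Text substitution of this kind can be imitated in a few lines. The sketch below is a deliberately simplified stand-in for the C preprocessor, not a description of it: it handles only object-like macros, substitutes in a single pass without rescanning, and does not protect string literals or comments. The function name `preprocess` is hypothetical.

```python
import re


def preprocess(source):
    """Collect #define lines, then replace whole identifiers in the remaining lines."""
    macros = {}
    output = []
    for line in source.splitlines():
        match = re.match(r"\s*#define\s+(\w+)\s*(.*)", line)
        if match:
            macros[match.group(1)] = match.group(2).strip()
            continue
        # Replace whole identifiers only, so X does not match inside MAX.
        output.append(re.sub(r"\b[A-Za-z_]\w*\b",
                             lambda m: macros.get(m.group(), m.group()),
                             line))
    return "\n".join(output)
```

Because the substitution is purely textual, `2*Y` with `#define Y A*B+C` becomes `2*A*B+C`, which the compiler then parses as `(2*A*B)+C`. This is why macro bodies are usually parenthesized.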
CH1.24
2. File Inclusion
#include in C - bring in another file before compiling
defs.h
//////
//////
//////

main.c
#include "defs.h"
…---…---…--…---…---…--…---…---…---

main.c after inclusion
//////
//////
//////
…---…---…--…---…---…--…---…---…---
CH1.25
3. Rational Preprocessors
Augment “Old” Languages With Modern Constructs

Add Macros for If - Then, While, Etc.

#define Can Make C Code More Pascal-like
#define begin {
#define end }
#define then
CH1.26
4. Language Extensions for a Database System
EQUEL - Database query language embedded in C
## Retrieve (DN=Department.Dnum) where
## Department.Dname = ‘Research’
is preprocessed into:
ingres_system(“Retr…..Research’”,____,____);
a procedure call in a programming language.
CH1.27
The Grouping of Phases
Front End : Analysis + Intermediate Code Generation
vs.
Back End : Code Generation + Optimization
Number of Passes:
A pass: requires reading/writing intermediate files
Fewer passes: more efficiency.
However: fewer passes require more sophisticated memory management and compiler phase interaction.
Tradeoffs ...
CH1.28
Compiler Construction Tools
Parser Generators : Produce Syntax Analyzers
Scanner Generators : Produce Lexical Analyzers
Syntax-directed Translation Engines : Generate Intermediate Code
Automatic Code Generators : Generate Actual Code
Data-Flow Engines : Support Optimization
CH1.29