Chapter 10: Compilers and Language Translation
Compilers and Language Translation
Gordon College
What’s a compiler?
Computers understand only machine language
This is a program:
10000010010110100100101……
Therefore, high-level language instructions must be
translated into machine language prior to execution
2
What’s a compiler?
Compiler
A piece of system software that translates high-level
languages into machine language
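#include <stdio.h>

int main(void)
{
    int c;
    /* Read characters until 'x' or end of input. */
    while ((c = getchar()) != EOF && c != 'x')
    {
        if (c == 'a' || c == 'e' || c == 'i')
            printf("Congrats!");
        else
            printf("You Loser!");
    }
    return 0;
}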
program.c ---> Compiler (gcc -o prog program.c) ---> prog: 10000010010110100100101……
Output when prog runs: Congrats!
3
Assembler (a kind of compiler)
Assembly: LOAD X
LOAD ---> 0101 (opcode table)
X ---> 0000 0000 1001 (symbol table)
Machine Language: 0101 0000 0000 1001
One-to-one translation
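The table lookups above can be sketched in C. This is only an illustration, not how any particular assembler is written: the `Entry` type and the functions `lookup` and `assemble_line` are hypothetical names, the opcode bit patterns are the ones that appear on the slides, and the symbol table holds only the single entry for X shown here.

```c
#include <stdio.h>
#include <string.h>

/* A name paired with its bit pattern (used for both tables). */
typedef struct {
    const char *name;
    const char *bits;
} Entry;

/* Opcode table: bit patterns as they appear on the slides. */
static const Entry OPCODES[] = {
    {"LOAD", "0101"}, {"STORE", "0100"},
    {"SUBTRACT", "0110"}, {"ADD", "0111"}
};

/* Symbol table: only X is given on the slide. */
static const Entry SYMBOLS[] = {
    {"X", "000000001001"}
};

static const char *lookup(const Entry *table, size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].bits;
    return NULL;
}

/* Translate one assembly instruction into one machine instruction.
   Returns 1 on success, 0 if a name is unknown or out is too small. */
int assemble_line(const char *mnemonic, const char *operand,
                  char *out, size_t cap)
{
    const char *op = lookup(OPCODES, sizeof OPCODES / sizeof OPCODES[0], mnemonic);
    const char *addr = lookup(SYMBOLS, sizeof SYMBOLS / sizeof SYMBOLS[0], operand);
    if (op == NULL || addr == NULL)
        return 0;
    int written = snprintf(out, cap, "%s%s", op, addr);
    return written > 0 && (size_t)written < cap;
}
```

Calling `assemble_line("LOAD", "X", out, sizeof out)` fills `out` with the opcode bits followed by the address bits, which is the one-to-one mapping the slide describes.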
4
Compiler (high-level language translator)
a = b + c - d;
Assembly        Machine language
LOAD B          0101 00001110001
ADD C           0111 00001110010
SUBTRACT D      0110 00001110011
STORE A         0100 00001110100
0101 00001110001 0111 00001110010…….
One-to-many translation
5
Goals of a compiler
Code produced must be correct
A = (B+C)-(D+E);
Possible translation:
LOAD B
ADD C
STORE B
LOAD D
ADD E
STORE D
LOAD B
SUBTRACT D
STORE A
Is this correct?
No. STORE B and STORE D change the values of variables
B and D, which the high-level language statement
does not intend
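To make the problem concrete, here is a minimal C sketch that mimics the accumulator instructions one at a time. The local variable `R` stands for the single register, and `T1` and `T2` are hypothetical temporary locations. The second function is one possible correct translation, not the only one.

```c
/* Faulty translation from the slide: STORE B and STORE D clobber B and D. */
int faulty_translation(int *B, int C, int *D, int E)
{
    int R;
    R = *B;        /* LOAD B                    */
    R = R + C;     /* ADD C                     */
    *B = R;        /* STORE B  (overwrites B)   */
    R = *D;        /* LOAD D                    */
    R = R + E;     /* ADD E                     */
    *D = R;        /* STORE D  (overwrites D)   */
    R = *B;        /* LOAD B                    */
    R = R - *D;    /* SUBTRACT D                */
    return R;      /* STORE A                   */
}

/* One possible correct translation: partial sums go into the
   temporaries T1 and T2 instead of back into B and D. */
int correct_translation(const int *B, int C, const int *D, int E)
{
    int R, T1, T2;
    R = *B;        /* LOAD B      */
    R = R + C;     /* ADD C       */
    T1 = R;        /* STORE T1    */
    R = *D;        /* LOAD D      */
    R = R + E;     /* ADD E       */
    T2 = R;        /* STORE T2    */
    R = T1;        /* LOAD T1     */
    R = R - T2;    /* SUBTRACT T2 */
    return R;      /* STORE A     */
}
```

Both versions compute the same value for A, but after the faulty one runs, B holds B+C and D holds D+E, so any later statement that reads B or D gets the wrong value.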
6
Goals of a compiler
Code produced should be reasonably efficient
and concise
Compute the sum 2*x1 + 2*x2 + 2*x3 + 2*x4 + … + 2*x50000
sum = 0.0;
for (i = 0; i < 50000; i++) {
    sum = sum + (2.0 * x[i]);
}
Optimizing compiler:
sum = 0.0;
for (i = 0; i < 50000; i++) {
    sum = sum + x[i];
}
sum = sum * 2.0;
49,999 fewer multiplications
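The two loops can be wrapped in functions and compared directly. This is a small sketch; the function names are hypothetical and the array length is passed as a parameter rather than fixed at 50000.

```c
#include <stddef.h>

/* Direct translation: one multiplication per element. */
double sum_doubled_naive(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum = sum + (2.0 * x[i]);
    return sum;
}

/* Optimized form: the multiplication is moved out of the loop,
   so it is performed once instead of n times. */
double sum_doubled_optimized(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum = sum + x[i];
    return sum * 2.0;
}
```

For n elements the first version performs n multiplications and the second performs one, which is where the saving of n - 1 multiplications comes from.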
7
General Structure of a Compiler
8
The Compilation Process
Phase I: Lexical analysis
Compiler examines the individual characters in the
source program and groups them into syntactical
units called tokens
Source code ---> Scanner ---> Groups of tokens
Phase II: Parsing
The sequence of tokens formed by the scanner is
checked to see whether it is syntactically correct
Groups of tokens ---> Parser ---> correct / not correct
9
The Compilation Process
Phase III: Semantic analysis and code generation
The compiler analyzes the meaning of the
high-level language statement and generates
the machine language instructions to carry
out these actions
Groups of tokens ---> Code Generator ---> Machine language
10
The Compilation Process
Phase IV: Code optimization
The compiler takes the generated code and
sees whether it can be made more efficient
Machine language ---> Code Optimizer ---> Machine language
11
Overall Execution Sequence on a High-Level Language Program
12
The Compilation Process
Source program
Original high-level language program
Object program
Machine language translation of the source
program
13
Phase I: Lexical Analysis
Lexical analyzer
The program that performs lexical analysis
More commonly called a scanner
Job of lexical analyzer
Group input characters into tokens
• Tokens: Syntactical units that are treated as single,
indivisible entities for the purposes of translation
Classify tokens according to their type
14
Phase I: Lexical Analysis
Program statement
sum = sum + a[i];
Digital perspective:
tab,s,u,m,blank,=,blank,s,u,m,blank,+,blank,a,[,i,],;
Tokenized:
sum,=,sum,+,a,[,i,],;
15
Phase I: Lexical Analysis
Typical Token Classifications
TOKEN TYPE      CLASSIFICATION NUMBER
Symbol          1
Number          2
=               3
+               4
-               5
;               6
==              7
if              8
else            9
(               10
)               11
[               12
]               13
…
16
Phase I: Lexical Analysis
Lexical Analysis Process
1. Discard blanks, tabs, etc. - look for beginning of token.
2. Put characters together
3. Repeat step 2 until end of token
4. Classify and save token
5. Repeat steps 1-4 until end of statement
6. Repeat steps 1-5 until end of source code
Scanner input: sum=sum+a[i];
Scanner output (token, classification number):
sum   1
=     3
sum   1
+     4
a     1
[     12
i     1
]     13
;     6
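The scanning steps above can be sketched as a small C function. This is an illustrative sketch, not a production scanner: the `Token` type and the function names are hypothetical. The classification numbers for symbols, `=`, `+`, `;`, `[` and `]` are the ones shown in the scanner output; the numbers used for `==`, `if`, `else`, `(` and `)` are assumptions chosen to be consistent with `;` being 6 and `[` being 12. Unknown characters are given the class -1.

```c
#include <ctype.h>
#include <string.h>

#define MAX_LEXEME 32

/* One token: its text and its classification number. */
typedef struct {
    char text[MAX_LEXEME];
    int  class_num;
} Token;

static int classify_word(const char *w)
{
    if (strcmp(w, "if") == 0)   return 8;   /* assumed number */
    if (strcmp(w, "else") == 0) return 9;   /* assumed number */
    return 1;                               /* any other name is a symbol */
}

static int classify_punct(char ch)
{
    switch (ch) {
    case '=': return 3;
    case '+': return 4;
    case ';': return 6;
    case '(': return 10;                    /* assumed number */
    case ')': return 11;                    /* assumed number */
    case '[': return 12;
    case ']': return 13;
    default:  return -1;                    /* not handled by this sketch */
    }
}

/* Scan one statement; returns the number of tokens stored in out[]. */
int scan(const char *src, Token out[], int max_tokens)
{
    int n = 0;
    size_t i = 0;
    while (src[i] != '\0' && n < max_tokens) {
        /* Step 1: discard blanks, tabs, and newlines. */
        if (isspace((unsigned char)src[i])) { i++; continue; }

        Token *t = &out[n];
        size_t len = 0;
        if (isalpha((unsigned char)src[i])) {
            /* Steps 2-3: gather letters and digits until the token ends. */
            while (isalnum((unsigned char)src[i])) {
                if (len < MAX_LEXEME - 1) t->text[len++] = src[i];
                i++;
            }
            t->text[len] = '\0';
            t->class_num = classify_word(t->text);
        } else if (isdigit((unsigned char)src[i])) {
            while (isdigit((unsigned char)src[i])) {
                if (len < MAX_LEXEME - 1) t->text[len++] = src[i];
                i++;
            }
            t->text[len] = '\0';
            t->class_num = 2;               /* number */
        } else if (src[i] == '=' && src[i + 1] == '=') {
            strcpy(t->text, "==");
            t->class_num = 7;               /* assumed number */
            i += 2;
        } else {
            t->text[0] = src[i];
            t->text[1] = '\0';
            t->class_num = classify_punct(src[i]);
            i++;
        }
        n++;                                /* Step 4: token classified and saved. */
    }
    return n;
}
```

Calling `scan("sum=sum+a[i];", tokens, 16)` yields nine tokens, with `sum`, `a` and `i` classified as symbols and each punctuation mark given its own number.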
17
Phase I: Lexical Analysis
Input to a scanner
- A high-level language statement from the source
program
Scanner’s output
- A list of all the tokens in that statement
- The classification number of each token found
18
Phase II: Parsing
Parsing phase
A compiler determines whether the tokens
recognized by the scanner are a syntactically
legal statement
Performed by a parser
19
Phase II: Parsing
Output of a parser
A parse tree, if such a tree exists
An error message, if a parse tree cannot be
constructed
Successful construction of a parse tree is proof that
the statement is correctly formed
20
Example
High-level language statement: a = b + c
21
Grammars, Languages, and BNF
Syntax
The grammatical structure of the language
The parser must be given the syntax of the
language
BNF (Backus-Naur Form)
Most widely used notation for representing the syntax of a programming language
literal_expression ::= integer_literal | float_literal
| string | character
22
Grammars, Languages, and BNF
In BNF
The syntax of a language is specified as a set of
rules (also called productions)
A grammar
• The entire collection of rules for a language
Structure of an individual BNF rule
left-hand side ::= “definition”
23
Grammars, Languages, and BNF
BNF rules use two types of objects on the right-hand side of a production
Terminals
• The actual tokens of the language
• Never appear on the left-hand side of a BNF rule
Nonterminals
• Intermediate grammatical categories used to help
explain and organize the language
• Must appear on the left-hand side of one or more rules
24
Grammars, Languages, and BNF
Goal symbol
The highest-level nonterminal
The nonterminal object that the parser is
trying to produce as it builds the parse tree
All nonterminals are written inside angle
brackets
Java BNF
25
BNF Example
<postal-address> ::= <name-part> <street-address> <zip-part>
<name-part> ::= <personal-part> <last-name> <opt-jr-part> <EOL>
| <personal-part> <name-part>
<personal-part> ::= <first-name> | <initial> "."
<street-address> ::= <opt-apt-num> <house-num> <street-name> <EOL>
<zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>
<opt-jr-part> ::= "Sr." | "Jr." | <roman-numeral> | ""
Identify the following:
Goal symbol, terminals, nonterminals, an individual rule
Is this a legal postal address?
Steve Moses Sr.
215 Rose Ave.
Everywhere, NC 43563
26
Parsing Concepts and Techniques
Fundamental rule of parsing:
If, by repeated applications of the rules of the grammar,
the parser can convert the sequence of input tokens into the goal symbol
    the sequence of tokens is a syntactically valid
    statement of the language
else
    the sequence of tokens is not a syntactically
    valid statement of the language
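The rule can be tried out on a very small grammar. The sketch below is a recursive-descent recognizer in C, which works top-down from the goal symbol rather than building it up from the tokens, but it accepts exactly the statements that can be derived from the goal symbol. The grammar is an assumption made for this example and is not taken from the slides:

<assignment> ::= <variable> = <expression> ;
<expression> ::= <variable> | <variable> + <expression>
<variable> ::= a single letter

```c
#include <ctype.h>
#include <stddef.h>

/* Skip blanks and tabs between tokens. */
static const char *skip_blanks(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/* <variable>: returns the position after the variable, or NULL. */
static const char *parse_variable(const char *p)
{
    p = skip_blanks(p);
    return isalpha((unsigned char)*p) ? p + 1 : NULL;
}

/* <expression> ::= <variable> | <variable> + <expression> */
static const char *parse_expression(const char *p)
{
    p = parse_variable(p);
    if (p == NULL)
        return NULL;
    p = skip_blanks(p);
    if (*p == '+')
        return parse_expression(p + 1);
    return p;
}

/* <assignment> ::= <variable> = <expression> ;
   Returns 1 if the whole statement matches the goal symbol, else 0. */
int is_valid_assignment(const char *stmt)
{
    const char *p = parse_variable(stmt);
    if (p == NULL)
        return 0;
    p = skip_blanks(p);
    if (*p != '=')
        return 0;
    p = parse_expression(p + 1);
    if (p == NULL)
        return 0;
    p = skip_blanks(p);
    if (*p != ';')
        return 0;
    p = skip_blanks(p + 1);
    return *p == '\0';
}
```

With this grammar, `is_valid_assignment("a = b + c;")` returns 1, while `"a = b + ;"` and `"a b = c;"` return 0 because no sequence of rule applications leads to `<assignment>`.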
27
Parsing Example
Is the following HTTP address legal?
http://www.csm.astate.edu/~rossa/cs3543/bnf.html
<httpaddress> ::= http:// <hostport> [ / <path> ] [ ? <search> ]
<hostport>    ::= <host> [ : <port> ]
<host>        ::= <hostname> | <hostnumber>
<hostname>    ::= <ialpha> [ . <hostname> ]
<hostnumber>  ::= <digits> . <digits> . <digits> . <digits>
<port>        ::= <digits>
<path>        ::= <void> | <xpalphas> [ / <path> ]
<search>      ::= <xalphas> [ + <search> ]
<xalpha>      ::= <alpha> | <digit> | <safe> | <extra> | <escape>
<xalphas>     ::= <xalpha> [ <xalphas> ]
<xpalpha>     ::= <xalpha> | +
<xpalphas>    ::= <xpalpha> [ <xpalphas> ]
<ialpha>      ::= <alpha> [ <xalphas> ]
<alpha>       ::= a | b | … | z | A | B | … | Z
<digit>       ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<safe>        ::= $ | - | _ | @ | . | & | ~
<extra>       ::= ! | * | " | ' | ( | ) | : | ; | , | <space>
<escape>      ::= % <hex> <hex>
<hex>         ::= <digit> | a | b | c | d | e | f | A | B | C | D | E | F
<digits>      ::= <digit> [ <digits> ]
<void>        ::=
28
Parsing Concepts and Techniques
Look-ahead parsing algorithms - intelligent parsers
One of the biggest problems in building a compiler
is designing a grammar that:
Includes every valid statement that we want to be in
the language
Excludes every invalid statement that we do not want
to be in the language
29
Parsing Concepts and Techniques
Another problem in constructing a compiler:
Designing a grammar that is not ambiguous
An ambiguous grammar allows the
construction of two or more distinct parse
trees for the same statement
NOT GOOD - multiple interpretations
30
Phase III: Semantics and Code Generation
Semantic analysis
The compiler makes a first pass over the parse tree
to determine whether all branches of the tree are
semantically valid
• If they are valid
the compiler can generate machine language
instructions
else
there is a semantic error; machine language
instructions are not generated
31
Phase III: Semantics and Code Generation
Semantic analysis
Syntactically correct, but semantically incorrect
example (in a language with no implicit type conversions):
int a;
double sum;
char b;
sum = a + b;
Semantic records
a: integer
sum: double
b: char
Data type mismatch
32
Phase III: Semantics and Code Generation
Semantic analysis
Parse tree: <expression> + <expression> reduces to <expression>
Semantic record for a: integer
Semantic record for b: char
Semantic record for temp: ?
33
Phase III: Semantics and Code Generation
Semantic analysis
Parse tree: <expression> + <expression> reduces to <expression>
Semantic record for a: integer
Semantic record for b: integer
Semantic record for temp: integer
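A semantic record can be pictured as a name paired with a data type. The following C sketch shows one way the check on `a + b` might work. The type and function names are hypothetical, and the rule used here (both operands must have exactly the same type) is a simplifying assumption; real languages often allow some implicit conversions.

```c
typedef enum { TYPE_INTEGER, TYPE_DOUBLE, TYPE_CHAR, TYPE_ERROR } DataType;

/* Semantic record: a name plus the data type attached to it. */
typedef struct {
    const char *name;
    DataType    type;
} SemanticRecord;

/* Build the record for  left + right.
   Assumed strict rule: both operands must have the same type,
   otherwise the result type is TYPE_ERROR (a semantic error). */
SemanticRecord check_addition(SemanticRecord left, SemanticRecord right)
{
    SemanticRecord temp = { "temp", TYPE_ERROR };
    if (left.type != TYPE_ERROR && left.type == right.type)
        temp.type = left.type;
    return temp;
}
```

With `a` as an integer and `b` as a char, the record for `temp` comes back as `TYPE_ERROR`; with both as integers, `temp` is an integer and code generation can proceed.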
34
Phase III: Semantics and Code Generation
Code generation
Compiler makes a second pass over the
parse tree to produce the translated code
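As a rough illustration of code generation, the C sketch below emits LOAD / ADD / SUBTRACT / STORE instructions for a simple assignment such as A = B + C - D. To stay short it takes a flat list of operands and operators in place of a parse tree, which is a simplification; the function name and its parameters are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Generate accumulator-style code for
       target = operands[0] ops[0] operands[1] ops[1] operands[2] ...
   Variables are single letters; ops holds '+' or '-' characters.
   Returns 1 on success, 0 on bad input or a full buffer. */
int generate_assignment(char target, const char *operands, const char *ops,
                        char *out, size_t cap)
{
    size_t used = 0;
    size_t count = strlen(operands);
    if (count == 0 || strlen(ops) != count - 1)
        return 0;

    for (size_t i = 0; i <= count; i++) {
        const char *mnemonic;
        char name;
        if (i == 0) {                       /* first operand goes into the register */
            mnemonic = "LOAD";  name = operands[0];
        } else if (i == count) {            /* finally, save the result */
            mnemonic = "STORE"; name = target;
        } else if (ops[i - 1] == '+') {
            mnemonic = "ADD";   name = operands[i];
        } else if (ops[i - 1] == '-') {
            mnemonic = "SUBTRACT"; name = operands[i];
        } else {
            return 0;                       /* operator not handled by this sketch */
        }

        int n = snprintf(out + used, cap - used, "%s %c\n", mnemonic, name);
        if (n < 0 || (size_t)n >= cap - used)
            return 0;
        used += (size_t)n;
    }
    return 1;
}
```

Calling `generate_assignment('A', "BCD", "+-", buf, sizeof buf)` writes the four lines LOAD B, ADD C, SUBTRACT D, STORE A into `buf`.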
35
Phase IV: Code Optimization
Two types of optimization
Local
Global
Local optimization
The compiler looks at a very small block of
instructions and tries to determine how it can
improve the efficiency of this local code block
Relatively easy; included as part of most compilers
36
Phase IV: Code Optimization
Examples of possible local optimizations
Constant evaluation
x = 1 + 1 ---> x = 2
Strength reduction
x = x * 2 ---> x = x + x
Eliminating unnecessary operations
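The first two optimizations can be sketched in C on a tiny expression representation. The `Operand` and `BinaryExpr` types and both function names are hypothetical and exist only for this example; a real optimizer works on the compiler's own intermediate form.

```c
typedef enum { OPERAND_CONST, OPERAND_VAR } OperandKind;

typedef struct {
    OperandKind kind;
    int  value;   /* used when kind == OPERAND_CONST */
    char name;    /* used when kind == OPERAND_VAR   */
} Operand;

typedef struct {
    char    op;   /* '+' or '*' */
    Operand lhs, rhs;
} BinaryExpr;

/* Constant evaluation: 1 + 1 ---> 2.
   Returns 1 and stores the value in *result when both operands are constants. */
int fold_constants(const BinaryExpr *e, int *result)
{
    if (e->lhs.kind != OPERAND_CONST || e->rhs.kind != OPERAND_CONST)
        return 0;
    if (e->op == '+') { *result = e->lhs.value + e->rhs.value; return 1; }
    if (e->op == '*') { *result = e->lhs.value * e->rhs.value; return 1; }
    return 0;
}

/* Strength reduction: x * 2 ---> x + x.
   Returns 1 if the expression was rewritten in place. */
int reduce_strength(BinaryExpr *e)
{
    if (e->op == '*' && e->lhs.kind == OPERAND_VAR &&
        e->rhs.kind == OPERAND_CONST && e->rhs.value == 2) {
        e->op = '+';
        e->rhs = e->lhs;
        return 1;
    }
    return 0;
}
```

Both functions look only at a single expression, which is what makes them local optimizations: no knowledge of the rest of the program is needed.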
37
Phase IV: Code Optimization
Global optimization
The compiler looks at large segments of the program
to decide how to improve performance
Much more difficult; usually omitted from all but the
most sophisticated and expensive production-level
“optimizing compilers”
Optimization cannot make an inefficient algorithm
efficient - “only makes an efficient algorithm more
efficient”
38
Summary
A compiler is a piece of system software that
translates high-level languages into machine
language
Goals of a compiler: Correctness and the production
of efficient and concise code
Source program: High-level language program
39
Summary
Object program: The machine language translation
of the source program
Phases of the compilation process
Phase I: Lexical analysis
Phase II: Parsing
Phase III: Semantic analysis and code generation
Phase IV: Code optimization
40