Chapter 1 Background


Chapter 2
Assemblers
System Software
Chih-Shun Hsu
Basic Assembler Functions
- Convert mnemonic operation codes to their machine language equivalents
- Convert symbolic operands to their equivalent machine addresses
- Build the machine instructions in the proper format
- Convert the data constants specified in the source program into their machine representations
- Write the object program and the assembly listing
Two Pass Assembler(2/1)
- Forward reference: a reference to a label that is defined later in the program
- Because of forward references, most assemblers make two passes over the source program
- The first pass does little more than scan the source program for label definitions and assign addresses
- The second pass performs most of the actual translation
- Assembler directives (or pseudo-instructions) provide instructions to the assembler itself
Two Pass Assembler(2/2)
Pass 1 (define symbols)
- Assign addresses to all statements in the program
- Save the values (addresses) assigned to all labels
- Perform some processing of assembler directives
Pass 2 (assemble instructions and generate object program)
- Assemble instructions (translating operation codes and looking up addresses)
- Generate data values defined by BYTE, WORD, etc.
- Perform processing of assembler directives not done during Pass 1
- Write the object program and the assembly listing
Assembler Data Structures and Variables
Two major data structures:
- Operation Code Table (OPTAB): used to look up mnemonic operation codes and translate them to their machine language equivalents
- Symbol Table (SYMTAB): used to store values (addresses) assigned to labels
Variable:
- Location Counter (LOCCTR): used to help in the assignment of addresses
- LOCCTR is initialized to the beginning address specified in the START statement
- After each source statement is processed, the length of the assembled instruction or data area to be generated is added to LOCCTR
OPTAB and SYMTAB
- OPTAB must contain the mnemonic operation code and its machine language equivalent
- In more complex assemblers, it also contains information about instruction format and length
- For a machine that has instructions of different lengths, we must search OPTAB in the first pass to find the instruction length for incrementing LOCCTR
- SYMTAB includes the name and value (address) for each label, together with flags to indicate error conditions
- OPTAB and SYMTAB are usually organized as hash tables, with the mnemonic operation code or label name as the key, for efficient retrieval
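The two tables can be sketched with Python dictionaries, which are hash tables keyed by mnemonic or label. This is only an illustration, not the layout of any particular assembler: the `insert_symbol` helper and the `"duplicate"` flag name are hypothetical, and only a few SIC opcodes are listed.

```python
# OPTAB: mnemonic -> (machine opcode, instruction length in bytes).
# Only a handful of SIC opcodes are shown; every SIC instruction is 3 bytes.
OPTAB = {
    "LDA": (0x00, 3),
    "STA": (0x0C, 3),
    "STL": (0x14, 3),
    "JSUB": (0x48, 3),
    "STCH": (0x54, 3),
}

# SYMTAB: label -> {"value": address, "flags": set of error flags}.
SYMTAB = {}

def insert_symbol(label, address):
    """Enter a label; flag a duplicate definition instead of overwriting it."""
    if label in SYMTAB:
        SYMTAB[label]["flags"].add("duplicate")
        return False
    SYMTAB[label] = {"value": address, "flags": set()}
    return True

insert_symbol("FIRST", 0x1000)
insert_symbol("CLOOP", 0x1003)
insert_symbol("FIRST", 0x2000)   # second definition is rejected and flagged

print(f"{OPTAB['STL'][0]:02X}")            # opcode for STL
print(f"{SYMTAB['FIRST']['value']:04X}")   # FIRST keeps its first address
```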
Example of a SIC Assembler
Language Program (3/1)
Example of a SIC Assembler
Language Program (3/2)
int i;
for (i = 0; i < 4096; i++)
{
    scanf("%c", &BUFFER[i]);
    if (BUFFER[i] == 0) break;   /* a null character ends the record */
}
LENGTH = i;
Example of a SIC Assembler
Language Program (3/3)
for (int i = 0; i < LENGTH; i++)
{
    printf("%c", BUFFER[i]);
}
Program with Object Code (3/1)
STL RETADR: opcode 14, address 1033, object code 141033
Program with Object Code (3/2)
STCH BUFFER,X: opcode 54, address 1039 + 8000 (index bit x=1) = 9039, object code 549039
Program with Object Code (3/3)
SYMTAB
Symbol   Value   Flags
FIRST    1000
CLOOP    1003
ENDFIL   1015
EOF      102A
THREE    102D
ZERO     1030
RETADR   1033
LENGTH   1036
BUFFER   1039
RDREC    2039
RLOOP    203F
EXIT     2057
INPUT    205D
MAXLEN   205E
WRREC    2061
WLOOP    2064
OUTPUT   2079
Object Program Format
Header record (H)
- Col. 2-7 Program name
- Col. 8-13 Starting address of object program (Hex)
- Col. 14-19 Length of object program in bytes (Hex)
Text record (T)
- Col. 2-7 Starting address for object code in this record (Hex)
- Col. 8-9 Length of object code in this record (Hex)
- Col. 10-69 Object code, represented in Hex
End record (E)
- Col. 2-7 Address of first executable instruction in object program (Hex)
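The column layouts above translate directly into fixed-width string formatting. The following is a minimal sketch; the function names are hypothetical, the record letter is assumed to occupy column 1, and the program name is assumed to be padded with blanks to six characters.

```python
def header_record(name, start, length):
    """H, program name (cols 2-7), start address (cols 8-13), length (cols 14-19)."""
    return f"H{name:<6.6}{start:06X}{length:06X}"

def text_record(start, object_codes):
    """T, start address (cols 2-7), byte count (cols 8-9), object code (cols 10-69)."""
    body = "".join(object_codes)
    if len(body) > 60:
        raise ValueError("a Text record holds at most 60 hex digits of object code")
    return f"T{start:06X}{len(body) // 2:02X}{body}"

def end_record(first_instruction):
    """E, address of the first executable instruction (cols 2-7)."""
    return f"E{first_instruction:06X}"

print(header_record("COPY", 0x1000, 0x107A))
print(text_record(0x1000, ["141033", "482039", "001036"]))
print(end_record(0x1000))
```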
Object Program
Algorithm for Pass 1 of
Assembler(3/1)
read first input line
if OPCODE = 'START' then
    begin
        save #[OPERAND] as starting address
        initialize LOCCTR to starting address
        write line to intermediate file
        read next input line
    end {if START}
else
    initialize LOCCTR to 0
while OPCODE ≠ 'END' do
    begin
        if this is not a comment line then
            begin
                if there is a symbol in the LABEL field then
Algorithm for Pass 1 of
Assembler(3/2)
                    begin
                        search SYMTAB for LABEL
                        if found then
                            set error flag (duplicate symbol)
                        else
                            insert (LABEL, LOCCTR) into SYMTAB
                    end {if symbol}
                search OPTAB for OPCODE
                if found then
                    add 3 {instruction length} to LOCCTR
                else if OPCODE = 'WORD' then
                    add 3 to LOCCTR
                else if OPCODE = 'RESW' then
                    add 3 * #[OPERAND] to LOCCTR
Algorithm for Pass 1 of
Assembler(3/3)
                else if OPCODE = 'RESB' then
                    add #[OPERAND] to LOCCTR
                else if OPCODE = 'BYTE' then
                    begin
                        find length of constant in bytes
                        add length to LOCCTR
                    end {if BYTE}
                else
                    set error flag (invalid operation code)
            end {if not a comment}
        write line to intermediate file
        read next input line
    end {while not END}
write last line to intermediate file
save (LOCCTR - starting address) as program length
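The Pass 1 algorithm can be sketched in Python as follows. This is a simplified illustration under several assumptions: source lines are already parsed into (label, opcode, operand) tuples with comment lines removed, the intermediate file is omitted, OPTAB holds only a few SIC opcodes, and the sample program is made up for the example.

```python
# Subset of the SIC operation code table; enough for the sample below.
OPTAB = {"LDA": 0x00, "STA": 0x0C, "STL": 0x14, "JSUB": 0x48, "RSUB": 0x4C}

def byte_constant_length(operand):
    """Length in bytes of a BYTE operand such as C'EOF' or X'F1'."""
    body = operand[2:-1]
    if operand.startswith("C'"):
        return len(body)
    if operand.startswith("X'"):
        return len(body) // 2
    raise ValueError(f"bad BYTE operand: {operand}")

def pass1(lines):
    """Return (symtab, program length, errors) for parsed source lines."""
    symtab, errors = {}, []
    start = 0
    if lines and lines[0][1] == "START":
        start = int(lines[0][2], 16)   # the START operand is a hex address
        lines = lines[1:]
    locctr = start
    for label, opcode, operand in lines:
        if opcode == "END":
            break
        if label:
            if label in symtab:
                errors.append(f"duplicate symbol {label}")
            else:
                symtab[label] = locctr
        if opcode in OPTAB or opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            locctr += byte_constant_length(operand)
        else:
            errors.append(f"invalid operation code {opcode}")
    return symtab, locctr - start, errors

program = [
    ("COPY", "START", "1000"),
    ("FIRST", "STL", "RETADR"),
    ("", "LDA", "FIVE"),
    ("EOF", "BYTE", "C'EOF'"),
    ("FIVE", "WORD", "5"),
    ("RETADR", "RESW", "1"),
    ("BUFFER", "RESB", "4096"),
    ("", "END", "FIRST"),
]
symtab, length, errors = pass1(program)
for name, address in symtab.items():
    print(f"{name:<8}{address:04X}")
print(f"program length = {length:04X}")
```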
Algorithm for Pass 2 of
Assembler(3/1)
read first input line (from intermediate file)
if OPCODE = 'START' then
    begin
        write listing line
        read next input line
    end {if START}
write Header record to object program
initialize first Text record
while OPCODE ≠ 'END' do
    begin
        if this is not a comment line then
            begin
                search OPTAB for OPCODE
                if found then
                    begin
Algorithm for Pass 2 of
Assembler(3/2)
                        if there is a symbol in OPERAND field then
                            begin
                                search SYMTAB for OPERAND
                                if found then
                                    store symbol value as operand address
                                else
                                    begin
                                        store 0 as operand address
                                        set error flag (undefined symbol)
                                    end
                            end {if symbol}
                        else
                            store 0 as operand address
                        assemble the object code instruction
                    end {if opcode found}
Algorithm for Pass 2 of
Assembler(3/3)
                else if OPCODE = 'BYTE' or 'WORD' then
                    convert constant to object code
                if object code will not fit into the current Text record then
                    begin
                        write Text record to object program
                        initialize new Text record
                    end
                add object code to Text record
            end {if not comment}
        write listing line
        read next input line
    end {while not END}
write last Text record to object program
write End record to object program
write last listing line
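The "assemble the object code instruction" step of Pass 2 can be sketched for the standard SIC format (8-bit opcode, index bit, 15-bit address). This is an illustration only: `assemble_sic` is a hypothetical helper, the tables contain just the entries needed here, and the symbol values are the ones from the SYMTAB of the example program.

```python
OPTAB = {"STL": 0x14, "JSUB": 0x48, "RSUB": 0x4C, "STCH": 0x54}
SYMTAB = {"RETADR": 0x1033, "BUFFER": 0x1039, "RDREC": 0x2039}

def assemble_sic(opcode, operand, errors):
    """Return the 6-hex-digit object code for one SIC instruction."""
    indexed = operand.endswith(",X")
    symbol = operand[:-2] if indexed else operand
    address = 0                       # 0 is stored when there is no operand
    if symbol:
        if symbol in SYMTAB:
            address = SYMTAB[symbol]
        else:
            errors.append(f"undefined symbol {symbol}")
    if indexed:
        address |= 0x8000             # the x bit is the top bit of the address field
    return f"{OPTAB[opcode]:02X}{address:04X}"

errors = []
print(assemble_sic("STL", "RETADR", errors))     # opcode 14, address 1033
print(assemble_sic("STCH", "BUFFER,X", errors))  # 1039 + 8000 = 9039
print(assemble_sic("JSUB", "NOSUCH", errors), errors)
```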
Machine-Dependent Assembler Features
- Indirect addressing is indicated by adding the prefix @ to the operand
- Immediate operands are denoted with the prefix #
- The assembler directive BASE is used in conjunction with base relative addressing
- The extended instruction format is specified with the prefix + added to the operation code
- Register-to-register instructions are faster than the corresponding register-to-memory operations because they are shorter and because they do not require another memory reference
Example of SIC/XE Program(3/1)
Example of SIC/XE Program(3/2)
Example of SIC/XE Program(3/3)
Program with Object Code (3/1)
Object Code Translation
Format 3: op(6) | n i x b p e | disp(12)
Format 4: op(6) | n i x b p e | address(20)
Line 10: STL=14, n=1, i=1 => ni=3, op+ni=14+3=17, RETADR=0030, x=0, b=0, p=1, e=0 => xbpe=2, PC=0003, disp=RETADR-PC=030-003=02D, xbpe+disp=202D, obj=17202D
Line 12: LDB=68, n=0, i=1 => ni=1, op+ni=68+1=69, LENGTH=0033, x=0, b=0, p=1, e=0 => xbpe=2, PC=0006, disp=LENGTH-PC=033-006=02D, xbpe+disp=202D, obj=69202D
Line 15: JSUB=48, n=1, i=1 => ni=3, op+ni=48+3=4B, RDREC=01036, x=0, b=0, p=0, e=1 => xbpe=1, xbpe+RDREC=101036, obj=4B101036
Line 40: J=3C, n=1, i=1 => ni=3, op+ni=3C+3=3F, CLOOP=0006, x=0, b=0, p=1, e=0 => xbpe=2, PC=001A, disp=CLOOP-PC=0006-001A=-14=FEC (2's complement), xbpe+disp=2FEC, obj=3F2FEC
Line 55: LDA=00, n=0, i=1 => ni=1, op+ni=00+1=01, disp=#3 => 003, x=0, b=0, p=0, e=0 => xbpe=0, xbpe+disp=0003, obj=010003
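The bit packing used in these translations can be checked with a short Python sketch. The helper names `format3` and `format4` are hypothetical; the opcode is assumed to be given as its full 8-bit value with the two low bits clear, so the n and i flags can be OR-ed into them.

```python
def format3(opcode, n, i, x, b, p, disp):
    """op(6) n i | x b p e | disp(12), with e = 0."""
    xbpe = (x << 3) | (b << 2) | (p << 1)
    word = ((opcode | (n << 1) | i) << 16) | (xbpe << 12) | (disp & 0xFFF)
    return f"{word:06X}"

def format4(opcode, n, i, x, address):
    """op(6) n i | x b p e | address(20), with b = p = 0 and e = 1."""
    xbpe = (x << 3) | 1
    word = ((opcode | (n << 1) | i) << 24) | (xbpe << 20) | (address & 0xFFFFF)
    return f"{word:08X}"

# Line 10: STL RETADR, PC-relative
print(format3(0x14, 1, 1, 0, 0, 1, 0x0030 - 0x0003))
# Line 15: +JSUB RDREC, extended format
print(format4(0x48, 1, 1, 0, 0x01036))
# Line 40: J CLOOP, negative displacement stored in 12-bit 2's complement
print(format3(0x3C, 1, 1, 0, 0, 1, 0x0006 - 0x001A))
```

Masking with `0xFFF` is what turns the negative displacement of Line 40 into FEC.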
Program with Object Code (3/2)
Object Code Translation
Format 2: op(8) | r1(4) | r2(4)
Line 125: CLEAR=B4, r1=X=1, r2=0, obj=B410
Line 133: LDT=74, n=0, i=1 => ni=1, op+ni=74+1=75, x=0, b=0, p=0, e=1 => xbpe=1, #4096=01000, xbpe+address=101000, obj=75101000
Line 160: STCH=54, n=1, i=1 => ni=3, op+ni=54+3=57, BUFFER=0036, B=0033, disp=BUFFER-B=003, x=1, b=1, p=0, e=0 => xbpe=C, xbpe+disp=C003, obj=57C003
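Line 160 uses base relative addressing because BUFFER is out of reach of the program counter. One way to sketch that choice is to try PC-relative first and fall back to base relative; `choose_addressing` is a hypothetical helper, and the PC value 1051 used for Line 160 is an assumption for illustration, since the listing itself is not reproduced here.

```python
def choose_addressing(target, pc, base=None):
    """Return (b, p, disp) for a format 3 instruction, or None if neither mode fits."""
    disp = target - pc
    if -2048 <= disp <= 2047:                  # PC-relative: signed 12-bit range
        return 0, 1, disp & 0xFFF
    if base is not None and 0 <= target - base <= 4095:
        return 1, 0, target - base             # base relative: unsigned 12-bit range
    return None                                # would need format 4 (or is an error)

print(choose_addressing(0x0030, 0x0003))               # Line 10: PC-relative
print(choose_addressing(0x0036, 0x1051, base=0x0033))  # Line 160: base relative
print(choose_addressing(0x1036, 0x0006))               # out of range without a base
```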
Program with Object Code (3/3)
SYMTAB
Symbol   Value   Flags
FIRST    0000
CLOOP    0006
ENDFIL   001A
EOF      002D
RETADR   0030
LENGTH   0033
BUFFER   0036
RDREC    1036
RLOOP    1040
EXIT     1056
INPUT    105C
WRREC    105D
WLOOP    1062
OUTPUT   1076
Program Relocation
- The actual starting address of the program is not known until load time
- An object program that contains the information necessary to perform this kind of modification is called a relocatable program
- No modification is needed when the operand uses program-counter relative or base relative addressing
- The only parts of the program that require modification at load time are those that specify direct (as opposed to relative) addresses
Modification record (M)
- Col. 2-7 Starting location of the address field to be modified, relative to the beginning of the program (Hex)
- Col. 8-9 Length of the address field to be modified, in half-bytes (Hex)
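How a loader might apply a Modification record can be sketched as follows. This is an assumption-laden illustration: `apply_modification` is a hypothetical helper, the program image is a `bytearray`, and a field with an odd number of half-bytes is taken to start in the low half of its first byte. The sample instruction is the +JSUB from Line 15 (4B101036), assumed to sit at relative address 0006, with an arbitrary load address of 5000.

```python
def apply_modification(code, start, half_bytes, load_address):
    """Add load_address to the address field described by one M record."""
    nbytes = (half_bytes + 1) // 2
    field = int.from_bytes(code[start:start + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1        # covers only the address field
    updated = (field & ~mask) | ((field + load_address) & mask)
    code[start:start + nbytes] = updated.to_bytes(nbytes, "big")

# Six filler bytes, then +JSUB RDREC at relative address 0006.
code = bytearray.fromhex("000000000000" + "4B101036")
# The 20-bit address field starts at relative address 0007 and is 5 half-bytes long.
apply_modification(code, 0x000007, 5, 0x5000)
print(code[6:10].hex().upper())
```

The xbpe half-byte that shares a byte with the address field is preserved by the mask.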
Examples of Program Relocation
Object Program
Machine-Independent Assembler Features
- Literals
- Symbol-defining statements
- Expressions
- Program blocks
- Control sections and program linking
Program with Additional Assembler
Features(3/1)
Program with Additional Assembler
Features(3/2)
Program with Additional Assembler
Features(3/3)
Literals(2/1)
- Write the value of a constant operand as a part of the instruction that uses it
- Such an operand is called a literal
- This avoids having to define the constant elsewhere in the program and make up a label for it
- A literal is identified with the prefix =, which is followed by a specification of the literal value
- Examples of literals in the statements:

Line  Loc   Label   Opcode  Operand   Object code
45    001A  ENDFIL  LDA     =C'EOF'   032010
215   1062  WLOOP   TD      =X'05'    E32011
Literals(2/2)
- With a literal, the assembler generates the specified value as a constant at some other memory location
- The address of this generated constant is used as the target address for the machine instruction
- All of the literal operands used in the program are gathered together into one or more literal pools
- Normally literals are placed into a pool at the end of the program
- A LTORG statement creates a literal pool that contains all of the literal operands used since the previous LTORG
- Most assemblers recognize duplicate literals (the same literal used in more than one place) and store only one copy of the specified data value
- LITTAB (literal table): contains the literal name, the operand value and length, and the address assigned to the operand when it is placed in a literal pool
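A literal table with duplicate detection and pool placement can be sketched like this. The class and method names are hypothetical, duplicates are recognized simply by comparing the literal text, and the pool address 002D is chosen only for the example.

```python
def literal_bytes(name):
    """Convert a literal such as =C'EOF' or =X'05' to its byte value."""
    kind, body = name[1], name[3:-1]
    if kind == "C":
        return body.encode("ascii")
    if kind == "X":
        return bytes.fromhex(body)
    raise ValueError(f"unsupported literal: {name}")

class LiteralTable:
    def __init__(self):
        self.entries = {}   # literal name -> {"value", "length", "address"}

    def add(self, name):
        """Record a literal operand; a duplicate name is stored only once."""
        if name not in self.entries:
            value = literal_bytes(name)
            self.entries[name] = {"value": value, "length": len(value), "address": None}

    def place_pool(self, locctr):
        """Assign addresses to all unplaced literals (LTORG or end of program)."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr       # the location counter after the pool

littab = LiteralTable()
for operand in ["=C'EOF'", "=X'05'", "=C'EOF'"]:
    littab.add(operand)
next_locctr = littab.place_pool(0x002D)
for name, entry in littab.entries.items():
    print(name, entry["value"].hex().upper(), entry["length"], f"{entry['address']:04X}")
```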
Symbol-Defining Statements
- EQU: assembler directive that allows the programmer to define symbols and specify their values
- General form: symbol EQU value
- Line 133, +LDT #4096, can be rewritten as:

MAXLEN  EQU   4096
        +LDT  #MAXLEN

- It is much easier to find and change the value of MAXLEN
- ORG: assembler directive that indirectly assigns values to symbols

Symbol table layout defined with EQU:

STAB    RESB  1100
SYMBOL  EQU   STAB
VALUE   EQU   STAB+6
FLAGS   EQU   STAB+9

The same layout defined with ORG:

STAB    RESB  1100
        ORG   STAB
SYMBOL  RESB  6
VALUE   RESW  1
FLAGS   RESB  2
        ORG   STAB+1100
Expressions
- Assemblers allow arithmetic expressions formed according to the normal rules using the operators +, -, *, and /
- Individual terms in the expression may be constants, user-defined symbols, or special terms
- The most common such special term is the current value of the location counter (designated by *)
- Expressions are classified as either absolute expressions or relative expressions

Symbol   Type  Value
RETADR   R     0030
BUFFER   R     0036
BUFFEND  R     1036
MAXLEN   A     1000
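The absolute/relative classification can be sketched for expressions that use only + and -. This is a simplification: the `evaluate` helper is hypothetical, it handles no multiplication or division, and it counts signed relative terms rather than pairing them explicitly (net count 0 gives an absolute value, net count +1 a relative one, anything else is rejected).

```python
# Symbol table with a type for each entry: R = relative, A = absolute.
SYMTAB = {
    "RETADR": ("R", 0x0030),
    "BUFFER": ("R", 0x0036),
    "BUFFEND": ("R", 0x1036),
}

def evaluate(terms):
    """terms: list of (sign, operand); sign is +1 or -1, operand a symbol or an int."""
    value = 0
    relative_count = 0
    for sign, operand in terms:
        if isinstance(operand, int):
            kind, term_value = "A", operand      # constants are absolute
        else:
            kind, term_value = SYMTAB[operand]
        value += sign * term_value
        if kind == "R":
            relative_count += sign
    if relative_count == 0:
        return "A", value
    if relative_count == 1:
        return "R", value
    raise ValueError("expression is neither absolute nor relative")

# MAXLEN EQU BUFFEND-BUFFER: the two relative terms cancel.
kind, value = evaluate([(+1, "BUFFEND"), (-1, "BUFFER")])
print(kind, f"{value:04X}")
# BUFFER-1 stays relative.
kind, value = evaluate([(+1, "BUFFER"), (-1, 1)])
print(kind, f"{value:04X}")
```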
Program Block(2/1)
- Program blocks: segments of code that are rearranged within a single object program unit
- Control sections: segments that are translated into independent object program units
- USE indicates which portions of the source program belong to the various blocks

Block name  Block number  Address  Length
(default)   0             0000     0066
CDATA       1             0066     000B
CBLKS       2             0071     1000
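The Address column of the block table follows from the block lengths: each block starts where the previous one ends. A minimal sketch, with a hypothetical function name and a made-up relative address 0003 inside CDATA for the final lookup:

```python
def assign_block_addresses(blocks):
    """blocks: list of (name, length) in block-number order -> {name: start address}."""
    starts, address = {}, 0
    for name, length in blocks:
        starts[name] = address
        address += length
    return starts

blocks = [("(default)", 0x0066), ("CDATA", 0x000B), ("CBLKS", 0x1000)]
starts = assign_block_addresses(blocks)
for name, start in starts.items():
    print(f"{name:<10}{start:04X}")

# A symbol's final address is its block's start plus its address within the block.
print(f"{starts['CDATA'] + 0x0003:04X}")
```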
Program Block(2/2)
- Because the large buffer area is moved to the end of the object program, we no longer need to use extended format instructions
- Program readability is improved if the definitions of data areas are placed in the source program close to the statements that reference them
- It does not matter that the Text records of the object program are not in sequence by address; the loader will simply load the object code from each record at the indicated address
Example Program with Multiple
Program Blocks(3/1)
Example Program with Multiple
Program Blocks(3/2)
Example Program with Multiple
Program Blocks(3/3)
Program Blocks Traced Through
Assembly and Loading Processes
Object Program
Control sections(3/1)
- References between control sections are called external references
- The assembler generates information for each external reference that will allow the loader to perform the required linking
- The EXTDEF (external definition) statement in a control section names symbols, called external symbols, that are defined in this section and may be used by other sections
- The EXTREF (external reference) statement names symbols that are used in this control section and are defined elsewhere
Control sections(3/2)
Define record (D)
- Col. 2-7 Name of external symbol defined in this control section
- Col. 8-13 Relative address of symbol within this control section (Hex)
- Col. 14-73 Repeat information in Col. 2-13 for other external symbols
Refer record (R)
- Col. 2-7 Name of external symbol referred to in this control section
- Col. 8-73 Names of other external reference symbols
Control sections(3/3)
Modification record (revised: M)
- Col. 2-7 Starting address of the field to be modified, relative to the beginning of the control section (Hex)
- Col. 8-9 Length of the field to be modified, in half-bytes (Hex)
- Col. 10 Modification flag (+ or -)
- Col. 11-16 External symbol whose value is to be added to or subtracted from the indicated field
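The Define, Refer, and revised Modification records can be formatted with the same fixed-width approach as the other record types. The function names are hypothetical, symbol names are assumed to be padded with blanks to six characters, and the symbols and addresses in the example are illustrative values only.

```python
def define_record(symbols):
    """symbols: list of (name, relative address) for EXTDEF symbols."""
    return "D" + "".join(f"{name:<6.6}{address:06X}" for name, address in symbols)

def refer_record(names):
    """names: list of EXTREF symbol names."""
    return "R" + "".join(f"{name:<6.6}" for name in names)

def modification_record(start, half_bytes, flag, symbol):
    """Revised M record: field location, length, +/- flag, external symbol."""
    if flag not in "+-" or len(flag) != 1:
        raise ValueError("modification flag must be + or -")
    return f"M{start:06X}{half_bytes:02X}{flag}{symbol:<6.6}"

print(define_record([("BUFFER", 0x33), ("BUFEND", 0x1033), ("LENGTH", 0x2D)]))
print(refer_record(["RDREC", "WRREC"]))
print(modification_record(0x000004, 5, "+", "RDREC"))
```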
Example Program with Control
Sections(3/1)
Example Program with Control
Sections(3/2)
Example Program with Control
Sections(3/3)
Object Program(2/1)
Object Program(2/2)
One-Pass Assemblers
- Eliminate forward references to data items: require that all data areas be defined in the source program before they are referenced
- One-pass (load-and-go) assembler:
  - Generates object code in memory for immediate execution
  - A load-and-go assembler is useful in a system that is oriented toward program development and testing
Handle Forward Reference
- The symbol used as an operand is entered into the symbol table
- This entry is flagged to indicate that the symbol is undefined
- The address of the operand field of the instruction that refers to the undefined symbol is added to a list of forward references associated with the symbol table entry
- When the definition for a symbol is encountered, the forward reference list for that symbol is scanned, and the proper address is inserted into any instructions previously generated
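The forward-reference list can be sketched with a symbol table that patches object code held in memory. This is a simplified illustration: the class and method names are hypothetical, memory is a `bytearray`, and the operand field is treated as the last two bytes of a 3-byte SIC instruction with no indexing.

```python
class OnePassSymtab:
    def __init__(self, memory):
        self.memory = memory   # object code being generated in memory
        self.values = {}       # defined symbols: name -> address
        self.pending = {}      # undefined symbols: name -> operand-field addresses

    def reference(self, symbol, field_address):
        """Called when an instruction uses symbol as its operand."""
        if symbol in self.values:
            self._patch(field_address, self.values[symbol])
        else:
            self.pending.setdefault(symbol, []).append(field_address)

    def define(self, symbol, address):
        """Called when the label is encountered; fixes up earlier references."""
        self.values[symbol] = address
        for field_address in self.pending.pop(symbol, []):
            self._patch(field_address, address)

    def undefined(self):
        """Symbols still undefined at the end of the program (errors)."""
        return sorted(self.pending)

    def _patch(self, field_address, value):
        self.memory[field_address:field_address + 2] = value.to_bytes(2, "big")

memory = bytearray.fromhex("480000" "480000" "000000" "000000")
symtab = OnePassSymtab(memory)
symtab.reference("RDREC", 1)   # JSUB RDREC at address 0: operand field at 1
symtab.reference("RDREC", 4)   # second forward reference
symtab.reference("WRREC", 10)  # never defined in this sample
symtab.define("RDREC", 0x0006)
print(memory.hex().upper(), symtab.undefined())
```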
Sample Program for One-Pass
assembler(3/1)
Sample Program for One-Pass
assembler(3/2)
Sample Program for One-Pass
assembler(3/3)
Example of Handling Forward
Reference(2/1)
Example of Handling Forward
Reference(2/2)
Multi-Pass Assemblers(6/1)
HALFSZ   EQU   MAXLEN/2
MAXLEN   EQU   BUFFEND-BUFFER
PREVBT   EQU   BUFFER-1
...
BUFFER   RESB  4096
BUFFEND  EQU   *
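The definitions above cannot be resolved in one or two passes because HALFSZ depends on MAXLEN, which in turn depends on symbols defined later. One simple way to sketch the idea is to sweep repeatedly over the unresolved definitions until no more progress is made. This repeated sweep is a stand-in for illustration, not the dependency-list bookkeeping a real multi-pass assembler would use; the `resolve` helper is hypothetical, and the addresses of BUFFER and BUFFEND are assumed values.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.floordiv}

def resolve(known, pending):
    """known: name -> value; pending: name -> (op, left, right) with symbol or int operands."""
    symtab = dict(known)
    pending = dict(pending)
    while pending:
        progress = False
        for name, (op, left, right) in list(pending.items()):
            operands = [symtab.get(t) if isinstance(t, str) else t for t in (left, right)]
            if None not in operands:          # every operand now has a value
                symtab[name] = OPS[op](*operands)
                del pending[name]
                progress = True
        if not progress:
            raise ValueError(f"unresolvable symbols: {sorted(pending)}")
    return symtab

# Assumed addresses: BUFFER at 1034, so BUFFEND (the * after RESB 4096) is 2034.
known = {"BUFFER": 0x1034, "BUFFEND": 0x2034}
pending = {
    "HALFSZ": ("/", "MAXLEN", 2),
    "MAXLEN": ("-", "BUFFEND", "BUFFER"),
    "PREVBT": ("-", "BUFFER", 1),
}
result = resolve(known, pending)
for name in ("MAXLEN", "HALFSZ", "PREVBT"):
    print(f"{name:<8}{result[name]:04X}")
```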
Multi-Pass Assemblers(6/2)
Multi-Pass Assemblers(6/3)
Multi-Pass Assemblers(6/4)
Multi-Pass Assemblers(6/5)
Multi-Pass Assemblers(6/6)
MASM Assembler
- A MASM assembler language program is written as a collection of segments
- Each segment belongs to a class; commonly used classes are CODE, DATA, CONST, and STACK
- During program execution, segments are addressed via the x86 segment registers
- ASSUME tells MASM the contents of a segment register; the programmer must provide instructions to load this register when the program is executed
- A near jump is a jump to a target in the same code segment; a far jump is a jump to a target in a different code segment
SPARC Assembler
- A SPARC assembler language program is divided into units called sections

Section   Contents
.TEXT     Executable instructions
.DATA     Initialized read/write data
.RODATA   Read-only data
.BSS      Uninitialized data areas

- A global symbol is either a symbol that is defined in the program and made accessible to others, or a symbol that is referenced in the program and defined externally
- A weak symbol is similar to a global symbol, but the definition of a weak symbol may be overridden by a global symbol with the same name
- SPARC branch instructions are delayed branches: the instruction immediately following a branch instruction is actually executed before the branch is taken
- Programmers often place NOP (no-operation) instructions in delay slots