Transcript Slide 1

OBJECT MODULE FORMATS
The object module format we have employed as
an educational device is called OMF
(Relocatable Object Module Format).
It’s one of the earliest formats, but all the
subsequent formats contain the basic elements
that are present in OMF.
Here is a depiction of the main formats that followed:
[Diagram: a.out, OMF, COFF, ELF, PE/COFF, PE/COFF+, Mach-O, and Mach-O for Mac OS X 10.6]
All of them contain separate sections for data,
code, and relocation information (i.e. fixups).
All of them, incidentally, were designed by
committees with the objective of making them
machine and language independent to varying
degrees.
So the committees included a wealth of fields
that they thought might possibly be helpful, but
which are in fact never used in practice.
So why didn’t we pick on one of these later
formats to employ for our Project 4?
It just would not have been possible to do this in
a one-semester compiler course.
Even in a two-semester course, the amount of
extra detail required would be out of proportion
to the gain in education value.
OMF was devised by Intel,
and at roughly the same time AT&T
released a.out for use with Unix systems.
In order to provide for debugging information
and shared libraries,
COFF (common object file format) was released
by AT&T
together with the introduction of Unix System V.
The object module formats in use today by
Linux, Unix, and Microsoft, are basically
variants of COFF
COFF supported symbolic debugging by in
effect including a symbol table which specified
not only the offset of variables,
but also the offset of code corresponding to the
line number of the source - so as to aid e.g. in
the setting of breakpoints.
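As a rough illustration of how such line number information helps a debugger, here is a minimal Python sketch. The table contents and the name breakpoint_offset are invented for the example; they are not taken from the COFF specification.

```python
import bisect

# Hypothetical line number table: (source line, code offset) pairs,
# sorted by source line. The values are made up for illustration.
LINE_TABLE = [(10, 0x00), (11, 0x04), (13, 0x0A), (14, 0x12)]

def breakpoint_offset(line_table, source_line):
    """Return the code offset of the first entry at or after source_line."""
    lines = [line for line, _ in line_table]
    i = bisect.bisect_left(lines, source_line)
    if i == len(lines):
        raise ValueError("no code at or after line %d" % source_line)
    return line_table[i][1]
```

A request for a breakpoint on a line that generated no code (line 12 here) falls through to the next line that did.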
Limitations of COFF include:
It places a limit on section names (which
correspond to our segment names)
and on the number of sections allowed,
and its symbolic debugging information is
insufficient for supporting some of the features
of languages such as C++.
In response, AT&T released ELF,
a successor to COFF,
with the introduction of System V, version 4.
Microsoft created its own version of COFF.
For the sake of concreteness let’s examine its
main features
- as described in the Microsoft document
“Microsoft Portable Executable and Common
Object File Format Specification”, September
21, 2010.
The name of the specification is abbreviated as
PE / COFF
while the version released to accommodate 64
bit machines is called PE / COFF+ (PE32+ in Microsoft's specification).
PE is the format of the output of the linker and
loader,
in which the various modules that make up the
program are linked,
all external references resolved,
all relocations (fixups) completed,
and the image obtained finally written into
memory.
The COFF component of PE / COFF is the
format of the object module that serves as
input to the linker
It closely follows that of the original COFF
specification.
The main difference is that the Microsoft version
does not make use of the debugging facilities
supplied by the original COFF,
such as the line number information.
It relies instead on Visual C++ debug information.
As a compiler writer, your responsibility in
writing a compiler for Windows is the production
of an object module for input to the linker.
The PE formatted output of the linker, and the
operating system, are the responsibility of
Microsoft.
MICROSOFT’S COFF FORMAT
Here is an illustration of the COFF structure
SECTIONS
The sections correspond to our segments.
Except for the segment associated with
uninitialized data, each segment consists of a
header,
the raw data,
and a relocation component.
The .text section is the code section
and the relocation information corresponds to
our fixups.
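To make the correspondence with fixups concrete, here is a minimal Python sketch that applies relocation entries to a section's raw data. It assumes the 10-byte Microsoft COFF relocation record (offset within the section, symbol table index, type) and handles only the 32-bit direct type; the symbol addresses passed in are hypothetical values that a linker would have assigned.

```python
import struct

# Assumed layout of one relocation entry: offset in section (4 bytes),
# symbol table index (4 bytes), relocation type (2 bytes).
RELOCATION = struct.Struct("<IIH")
REL_I386_DIR32 = 0x0006  # 32-bit direct address of the target

def apply_fixups(raw, relocations, symbol_addresses):
    """Return a copy of raw with each DIR32 fixup applied."""
    patched = bytearray(raw)
    for i in range(0, len(relocations), RELOCATION.size):
        offset, sym_index, rtype = RELOCATION.unpack_from(relocations, i)
        if rtype != REL_I386_DIR32:
            raise NotImplementedError("only DIR32 is sketched here")
        (addend,) = struct.unpack_from("<I", patched, offset)
        target = (addend + symbol_addresses[sym_index]) & 0xFFFFFFFF
        struct.pack_into("<I", patched, offset, target)
    return bytes(patched)

# mov eax, [num] with a 4-byte address field still to be filled in
raw_text = b"\xA1" + struct.pack("<I", 0)
relocs = RELOCATION.pack(1, 0, REL_I386_DIR32)
fixed = apply_fixups(raw_text, relocs, {0: 0x00403000})
```

The dictionary of symbol addresses stands in for the work the linker does once it has decided where every section will live.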
There are two data sections.
One is for initialized data,
e.g. to contain the initial value of variables, as in:
num dw 23
The other data section, called .bss above,
is for uninitialized data,
as in:
array2 dw 1000 dup(?)
The .bss section consists only of a header that
specifies how much space is to be reserved at
execution time.
The “named sections”, if present, may be used
for purposes such as functions that the program
employs.
The name of the section would then normally be
the same as that of the function.
SECTION HEADERS
The fields involved in the section headers include:
the section name. If the name has 8 characters or
fewer, it is contained in the header, otherwise it is
included in the String table (which corresponds to
our ID_S), and the name field of the section
header then contains a pointer to its offset there.
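the section’s virtual address (i.e. the offset from
the start of the program that it will have at
execution time; in an object module it is normally zero)
the section’s physical address (a field inherited from
the original COFF; Microsoft's version reuses it to hold
the size the section will have in memory)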
the size of the section
a pointer to the section’s raw data
a pointer to the corresponding relocation
entries
a specification of whether the section contains
executable code, initialized data, or uninitialized
data
a specification of whether the section may or
may not be read, written, or executed
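The fields just listed can be sketched in Python as follows. The layout (a 40-byte little-endian record) and the flag values follow the Microsoft specification as we understand it; the names SectionHeader, parse_section_header and describe, and the sample values, are our own.

```python
import struct
from collections import namedtuple

# Assumed layout: name (8 bytes), six 32-bit fields, two 16-bit counts,
# and a 32-bit flags word. 40 bytes in all.
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")

SectionHeader = namedtuple("SectionHeader", [
    "name", "virtual_size", "virtual_address", "size_of_raw_data",
    "pointer_to_raw_data", "pointer_to_relocations",
    "pointer_to_linenumbers", "number_of_relocations",
    "number_of_linenumbers", "characteristics",
])

CONTENT_FLAGS = [(0x00000020, "code"),
                 (0x00000040, "initialized data"),
                 (0x00000080, "uninitialized data")]
ACCESS_FLAGS = [(0x40000000, "r"), (0x80000000, "w"), (0x20000000, "x")]

def parse_section_header(data, offset=0):
    """Unpack one section header; short names are stored inline."""
    fields = SECTION_HEADER.unpack_from(data, offset)
    name = fields[0].rstrip(b"\0").decode("ascii")
    return SectionHeader(name, *fields[1:])

def describe(header):
    """Return (list of content kinds, access string such as 'rx')."""
    flags = header.characteristics
    contents = [label for bit, label in CONTENT_FLAGS if flags & bit]
    access = "".join(ch for bit, ch in ACCESS_FLAGS if flags & bit)
    return contents, access

# A made-up .text header: 0x30 bytes of code at file offset 0x64,
# two relocation entries at file offset 0x94, readable and executable.
sample = SECTION_HEADER.pack(b".text", 0, 0, 0x30, 0x64, 0x94, 0, 2, 0,
                             0x60000020)
text = parse_section_header(sample)
```

A name longer than 8 characters is not handled here; it would have to be looked up in the string table.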
THE FILE HEADER
The fields involved in the file header include:
a number identifying the target machine
e.g. those employing the 386 or later Pentium, or various machines
produced by Hitachi, Mitsubishi, etc.
a time and date stamp, indicating when the file
was created
the number of section headers
a pointer to the symbol table’s starting
address
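A minimal Python sketch of reading these fields follows. It assumes the 20-byte little-endian file header of the Microsoft specification; the function name, the dictionary keys and the sample values are ours, and only two machine numbers are listed.

```python
import struct
from datetime import datetime, timezone

# Assumed layout: machine, number of sections (16 bits each), time stamp,
# symbol table pointer, number of symbols (32 bits each), then the
# optional header size and flags (16 bits each). 20 bytes in all.
FILE_HEADER = struct.Struct("<HHIIIHH")

MACHINES = {0x014C: "Intel 386 or later", 0x8664: "x64"}

def parse_file_header(data):
    (machine, n_sections, stamp, symtab, n_symbols,
     opt_size, flags) = FILE_HEADER.unpack_from(data, 0)
    return {
        "machine": MACHINES.get(machine, hex(machine)),
        "sections": n_sections,
        # the stamp counts seconds since the start of 1970 (UTC)
        "created": datetime.fromtimestamp(stamp, tz=timezone.utc),
        "symbol_table_offset": symtab,
        "symbols": n_symbols,
    }

# Made-up header: 3 sections, symbol table at file offset 0x200
sample = FILE_HEADER.pack(0x014C, 3, 1285027200, 0x200, 12, 0, 0)
header = parse_file_header(sample)
```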
THE SYMBOL TABLE
The symbol table entries are each 18 bytes
long, and include:
the name of the symbol. The same scheme is
employed as described above for section
header names, i.e. if the name is longer than
8 bytes it is stored in the string table, and a
pointer to it employed instead
the section the item is defined in
its offset within that section
its storage class, e.g. whether it is external,
static, or is a function
Some of the entries, such as those for
functions, require more than the 18 bytes an
entry provides for its information.
In such cases, the main entry for the name is
followed by an additional entry (referred to as
an auxiliary entry).
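Here is a minimal Python sketch of walking the symbol table. It assumes the Microsoft layout of an entry (8-byte name, 4-byte value, 2-byte section number, 2-byte type, 1-byte storage class, 1-byte count of auxiliary entries) and the convention that a name field whose first 4 bytes are zero holds a string table offset in its last 4 bytes. The function name and the sample symbols are invented.

```python
import struct

SYMBOL = struct.Struct("<8sIhHBB")  # 18 bytes per entry
CLASS_EXTERNAL = 2
CLASS_STATIC = 3

def read_symbols(table, string_table):
    """Yield (name, value, section, storage_class), skipping auxiliary entries."""
    i = 0
    count = len(table) // SYMBOL.size
    while i < count:
        raw, value, section, _type, storage_class, aux = SYMBOL.unpack_from(
            table, i * SYMBOL.size)
        if raw[:4] == b"\0\0\0\0":
            # long name: the last 4 bytes give its offset in the string table
            (offset,) = struct.unpack_from("<I", raw, 4)
            end = string_table.index(b"\0", offset)
            name = string_table[offset:end].decode("ascii")
        else:
            name = raw.rstrip(b"\0").decode("ascii")
        yield name, value, section, storage_class
        i += 1 + aux  # step over any auxiliary entries

names = b"a_rather_long_name\0"
string_table = struct.pack("<I", 4 + len(names)) + names
symbol_table = (
    SYMBOL.pack(b"num", 0, 2, 0, CLASS_STATIC, 0)
    + SYMBOL.pack(struct.pack("<II", 0, 4), 0x10, 1, 0x20, CLASS_EXTERNAL, 1)
    + bytes(SYMBOL.size)  # one auxiliary entry, contents ignored here
)
symbols = list(read_symbols(symbol_table, string_table))
```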
THE STRING TABLE
As mentioned, this corresponds to our ID_S.
It starts off with 4 bytes specifying its length.
This is followed by null-terminated strings, in
general representing names.
Note that the segdef, pubdef, and extdef
records we have been using
are replaced by entries in the symbol table and
the string table.
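The layout of the string table can be sketched in Python as follows. We assume, as in the Microsoft specification, that the 4-byte length counts the length field itself; the two function names and the sample names are ours.

```python
import struct

def build_string_table(names):
    """Pack names as null-terminated strings behind a 4-byte length."""
    body = b"".join(n.encode("ascii") + b"\0" for n in names)
    # assumption: the length includes the 4 bytes of the length field
    return struct.pack("<I", 4 + len(body)) + body

def read_string_table(data):
    """Return the list of names stored in a string table."""
    (length,) = struct.unpack_from("<I", data, 0)
    body = data[4:length]
    return [s.decode("ascii") for s in body.split(b"\0") if s]

table = build_string_table(["compute_total", ".debug_info"])
```

An offset stored in a symbol or section name field is measured from the start of the table, so the first name sits at offset 4.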
THE PE MODULE FORMAT
As mentioned, the compiler writer, in the case
where the target is not an intermediate language, is
concerned with producing the object module
input to the linker.
He or she is not directly involved with the PE
module that the linker produces. Let us
however look at the main features of the PE
format.
Here is a diagram of its structure
The components the linker has added to the
COFF format are:
(a) the DOS stub
(b) the optional file header
(c) the data directories
THE DOS STUB
The purpose of the DOS stub is to detect when
an attempt is made to execute the program
under DOS, and then issue an error message
such as:
This program can only be run under Windows
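Because the stub comes first in the file, a tool that reads a PE module has to step over it. A minimal Python sketch follows, assuming the usual convention that the file starts with the letters MZ and that the 4 bytes at offset 0x3C give the file offset of the signature "PE\0\0". The function name and the sample image are invented.

```python
import struct

def pe_header_offset(image):
    """Return the file offset of the PE signature that follows the DOS stub."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[offset:offset + 4] != b"PE\0\0":
        raise ValueError("no PE signature")
    return offset

# A made-up, otherwise empty image with the signature at offset 0x40
buf = bytearray(0x80)
buf[:2] = b"MZ"
struct.pack_into("<I", buf, 0x3C, 0x40)
buf[0x40:0x44] = b"PE\0\0"
sample_image = bytes(buf)
```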
THE OPTIONAL FILE HEADER
The loader needs to be able to relocate the
program in the case where it is unable to load it
into the base location employed by the linker.
Some of the items listed on the next slide are
included for this purpose
The information the optional file header
contains includes:
(a) the amount of memory space that will be
occupied by executable code, initialized
data, and uninitialized data
(b) the offsets from the beginning of the
program where the above items will be
located in memory
(c) the offset from the beginning of the program
of its entry point
(d) the amount of space needed for the stack
(e) the amount of space needed for the heap
(f) the alignment of the sections within the file. The default is
at an address divisible by 512, but any
power of 2 from 512 up to 64K can be used.
(g) the offsets within the module of the data
directories and their sizes.
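The alignment in item (f) amounts to rounding each section's offset up to the next multiple of a power of 2. A minimal Python sketch, with a function name of our own choosing:

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of 2)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of 2")
    return (value + alignment - 1) & ~(alignment - 1)
```

With the default of 512, a section that would otherwise start at offset 1000 is placed at align_up(1000, 512), i.e. 1024.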
THE DATA DIRECTORIES
These include:
(a) the Export Table
(b) the Import Table
(c) the Resource Table
(d) the Base Relocation Table
The Export Table is employed mainly by DLLs to
supply the entry points of the various functions they
provide.
The Import Table is used by programs to supply the
external references that the linker was unable
to resolve, usually those to DLL functions.
Note that the location of the DLL functions may
change from one load and execution of the program
to another.
The unresolved calls in the memory image to
such external routines are not directly fixed up.
They are instead replaced by the linker with calls
through a table of external addresses which the
loader fills in.
The Pentium has an indirect call instruction for
this purpose.
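The idea of calling through a table can be mimicked in Python, with functions standing in for addresses. Everything here is a toy model under that assumption: the routine name beep, the table, and the two "loads" are invented, and no real loader data structure is being reproduced.

```python
# One slot per imported routine, empty until the loader fills it in
import_table = {"beep": None}

def loader_fill(table, exports):
    """Play the loader: store the current location of each import."""
    for name in table:
        table[name] = exports[name]

def call_indirect(table, name, *args):
    """Play the indirect call: fetch the target from the table, then call it."""
    return table[name](*args)

# First load: the DLL's routine is found at one location
loader_fill(import_table, {"beep": lambda n: "beep " * n})
first = call_indirect(import_table, "beep", 2)

# Second load: the routine is elsewhere, yet the call site is unchanged
loader_fill(import_table, {"beep": lambda n: "BEEP " * n})
second = call_indirect(import_table, "beep", 2)
```

Only the table is rewritten between loads; the code that makes the call is never patched.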
The Resource Table contains information
about resources the program employs, such as
dialog boxes, menus, icons, etc.
The Base Relocation Table replaces the COFF
version, as much of the relocation and linking
involved has already been carried out by the
linker.
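What remains for the loader is to adjust absolute addresses when the program cannot be loaded at the base the linker assumed. A minimal Python sketch follows, assuming 32-bit addresses and a plain list of offsets in place of the real table encoding; the function name and the sample values are invented.

```python
import struct

def rebase(image, reloc_offsets, preferred_base, actual_base):
    """Add the load displacement to each 32-bit address listed for relocation."""
    delta = actual_base - preferred_base
    patched = bytearray(image)
    for off in reloc_offsets:
        (addr,) = struct.unpack_from("<I", patched, off)
        struct.pack_into("<I", patched, off, (addr + delta) & 0xFFFFFFFF)
    return bytes(patched)

# A made-up 16-byte image holding one absolute address at offset 4
buf = bytearray(16)
struct.pack_into("<I", buf, 4, 0x00401000)
moved = rebase(bytes(buf), [4], 0x00400000, 0x00500000)
```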
SOURCES
1. Microsoft Portable Executable and Common
Object File Format Specification, Revision
8.2, Sept. 2010.
2. Application Report spraa08-April 2009,
Texas Instruments.