2 Representing executable files
2.1 Three Standard Sections
At a minimum, any executable file format will need to specify where the code and data are in the binary file. These are the two primary sections within an executable file.
One additional component we have not mentioned until now is storage space for uninitialised global variables. If we declare a variable and give it an initial value, this value needs to be stored in the executable file so that at program start the variable can be initialised to the correct value. However, many variables are uninitialised (or zero) when the program is first executed. Making space for these in the executable and then simply storing zero or NULL values is a waste of space, needlessly bloating the executable's file size on disk. Thus most binary formats define the concept of an additional BSS section as a placeholder size for zeroed, uninitialised data. On program load, the extra memory described by the BSS can be allocated (and set to zero!).
BSS probably stands for Block Started by Symbol, an assembly command for an old IBM computer; the exact derivation is probably lost to history.
2.2 Binary Format
The executable is created by the toolchain from the source code. This file needs to be in an explicitly defined format so that the compiler can create it and the operating system can identify it and load it into memory, turning it into a running process that the operating system can manage. This executable file format can be specific to the operating system, as we would not normally expect a program compiled for one system to execute on another (for example, you don't expect your Windows programs to run on Linux, or your Linux programs to run on OS X).
However, the common thread between all executable file formats is that they include a predefined, standardised header which describes how program code and data are stored in the rest of the file. Informally, it might say something like "the program code starts 20 bytes into this file, and is 50 kilobytes long. The program data follows it and is 20 kilobytes long".
In recent times one particular format has become the de facto standard for executable representation on modern UNIX-type systems. It is called the Executable and Linkable Format, or ELF for short; we'll be looking at it in more detail soon.
2.3 Binary Format History
2.3.1 a.out
ELF was not always the standard; original UNIX systems used a file format called a.out. We can see the vestiges of this if we compile a program without the -o option to specify an output file name; the executable will be created with a default name of a.out[1]. a.out is a very simple header format that only allows a single data, code and BSS section. As you will come to see, this is insufficient for modern systems with dynamic libraries.
2.3.2 COFF
The Common Object File Format, or COFF, was the precursor to ELF. Its header format was more flexible, allowing more sections in the file, though still only a limited number.
COFF also struggles to support shared libraries elegantly, and ELF was selected as the alternative on Linux.
However, COFF lives on in Microsoft Windows as the Portable Executable, or PE, format. PE is to Windows as ELF is to Linux.