Representing executable files

Three Standard Sections

At a minimum, any executable file format will need to specify where the code and data are in the binary file. These are the two primary sections within an executable file.

One additional component we have not mentioned until now is storage space for uninitialised global variables. If we declare a variable and give it an initial value, this value needs to be stored in the executable file so that the variable can be initialised to the correct value at program start. However, many variables are uninitialised (or zero) when the program is first executed. Making space for these in the executable and then simply storing zero or NULL values is a waste of space, needlessly bloating the size of the executable file on disk. Thus most binary formats define an additional BSS section, which records only the size of the zeroed, uninitialised data rather than its contents. On program load, the extra memory described by the BSS can be allocated (and set to zero!). BSS probably stands for Block Started by Symbol, an assembler directive for an old IBM computer; the exact derivation is probably lost to history.
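The difference between initialised data and the BSS can be sketched in a few lines of C. This is only an illustration: the variable and function names are made up for the example, and exactly where each variable ends up is decided by the toolchain, although typical compilers place zeroed static storage in the BSS.

```c
#include <assert.h>
#include <stddef.h>

/* Initialised data: the value 42 has to be stored in the executable
 * file so that it is present when the program starts. */
int initialised_counter = 42;

/* Uninitialised data: the C language guarantees this starts out as all
 * zeros. On typical toolchains only its size (1 MiB here) is recorded,
 * in the BSS, rather than a megabyte of zero bytes in the file. */
static char scratch_buffer[1024 * 1024];

/* Returns 1 if every byte of scratch_buffer is zero, 0 otherwise. */
int buffer_is_zeroed(void)
{
    for (size_t i = 0; i < sizeof scratch_buffer; i++) {
        if (scratch_buffer[i] != 0)
            return 0;
    }
    return 1;
}

/* Checks the state we expect before the program has modified anything. */
void check_startup_state(void)
{
    assert(initialised_counter == 42);
    assert(buffer_is_zeroed());
}
```

If this is compiled into a program, a tool such as size from binutils should report the megabyte under its bss column, while the file on disk grows by far less than a megabyte.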

Binary Format

The executable is created by the toolchain from the source code. This file needs to be in an explicitly defined format so that the compiler can create it and the operating system can identify it and load it into memory, turning it into a running process that the operating system can manage. This executable file format can be specific to the operating system, as we would not normally expect a program compiled for one system to execute on another (for example, you don't expect your Windows programs to run on Linux, or your Linux programs to run on OS X).

However, the common thread between all executable file formats is that they include a predefined, standardised header which describes how program code and data are stored in the rest of the file. In other words, the header would generally say something like "the program code starts 20 bytes into this file, and is 50 kilobytes long. The program data follows it and is 20 kilobytes long".
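To make the idea of a header concrete, here is a minimal sketch of a made-up format. The struct toy_header and the function toy_read_header are hypothetical and do not correspond to any real executable format; the sketch also assumes the header is stored in the byte order of the machine reading it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header for illustration only; not a real file format. */
struct toy_header {
    uint32_t code_offset; /* where the program code starts in the file */
    uint32_t code_size;   /* length of the program code in bytes */
    uint32_t data_offset; /* where the program data starts in the file */
    uint32_t data_size;   /* length of the program data in bytes */
};

/* Four 32-bit fields, so the header occupies 16 bytes. */
static_assert(sizeof(struct toy_header) == 16, "unexpected header size");

/* Reads the header from the start of a file image held in memory.
 * Returns 0 on success, or -1 if the image is too short to hold the
 * header or a described section does not fit inside the image. */
int toy_read_header(const unsigned char *image, size_t image_len,
                    struct toy_header *out)
{
    if (image_len < sizeof *out)
        return -1;

    /* Assumption: fields are stored in the host's byte order. */
    memcpy(out, image, sizeof *out);

    if (out->code_offset > image_len ||
        out->code_size > image_len - out->code_offset)
        return -1;
    if (out->data_offset > image_len ||
        out->data_size > image_len - out->data_offset)
        return -1;

    return 0;
}
```

With the figures from the text, the header would hold a code offset of 20 and a code size of 50 * 1024, followed by a data offset of 20 + 50 * 1024 and a data size of 20 * 1024. A real loader does the same kind of bounds checking before trusting anything the header says.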

In recent times one particular format has become the de facto standard for executable representation on modern UNIX-like systems. It is called the Executable and Linking Format, or ELF for short; we'll be looking at it in more detail soon.
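As a small taste of how an operating system can identify a file format, every ELF file begins with the same four bytes: 0x7f followed by the characters 'E', 'L' and 'F'. The sketch below checks only for that signature; the function name looks_like_elf is invented for this example, and a real loader validates far more of the header than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The four identifying bytes at the very start of an ELF file. */
static const unsigned char ELF_MAGIC[4] = { 0x7f, 'E', 'L', 'F' };

/* Returns 1 if the buffer starts with the ELF signature, 0 otherwise.
 * This says nothing about whether the rest of the file is valid. */
int looks_like_elf(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    if (len < sizeof ELF_MAGIC)
        return 0;
    return memcmp(buf, ELF_MAGIC, sizeof ELF_MAGIC) == 0;
}
```

Passing in the first few bytes read from a file is enough; a buffer starting with 'M', 'Z', as Windows executables do, would be rejected.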

Binary Format History

a.out

ELF was not always the standard; original UNIX systems used a file format called a.out. You can still see a vestige of this if you compile a program without the -o option to specify an output file name; the executable will be created with a default name of a.out[23].

a.out is a very simple header format that allows only one code section, one data section and one BSS section. As you will come to see, this is insufficient for modern systems with dynamic libraries.

COFF

The Common Object File Format, or COFF, was the precursor to ELF. Its header format was more flexible, allowing more (but limited) sections in the file.

COFF also had difficulty supporting shared libraries elegantly, and ELF was selected as the alternative on Linux.

However, COFF lives on in Microsoft Windows as the Portable Executable or PE format. PE is to Windows as ELF is to Linux.



[23] In fact, a.out is the default output filename from the linker. The compiler generally uses randomly generated file names for its intermediate assembly and object files.