6 Extending ELF concepts

6.1 Debugging

Traditionally, the primary method of post-mortem debugging is the core dump. The term core comes from the physical characteristics of magnetic core memory, which used the orientation of small magnetic rings to store state.

Thus a core dump is simply a complete snapshot of the program as it was running at a particular time. A debugger can then be used to examine this dump and reconstruct the program state. Example 6.1.1, Example of creating a core dump and using it with gdb, shows a sample program that writes to an arbitrary memory address in order to force a crash. At this point the process is halted and a dump of its current state is recorded.

$ cat coredump.c
int main(void) {
	char *foo = (char*)0x12345;
	*foo = 'a';

	return 0;
}

$ gcc -Wall -g -o coredump coredump.c

$ ./coredump
Segmentation fault (core dumped)

$ file ./core
./core: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, from './coredump'

$ gdb ./coredump
...
(gdb) core core
[New LWP 31614]
Core was generated by `./coredump'.
Program terminated with signal 11, Segmentation fault.
#0  0x080483c4 in main () at coredump.c:3
3		*foo = 'a';
(gdb)
Example 6.1.1 Example of creating a core dump and using it with gdb

Thus a core dump is just another ELF file, with a range of segments and notes understood by the debugger to represent parts of the running program.

6.1.1 Symbols and Debugging Information

As Example 6.1.1, Example of creating a core dump and using it with gdb, shows, the debugger gdb requires both the original executable and the core dump to reconstruct the environment for the debugging session. Note that the original executable was built with the -g flag, which instructs the compiler to include debugging information. This extra debugging information is kept in special sections of the ELF file. It describes in detail things like which registers currently hold which variables used in the code, the size of variables, the length of arrays, and so on. It is generally in the standard DWARF format (a pun on the almost-synonym ELF).

Including debugging information can make executable files and libraries very large; although this data does not need to be resident in memory to run the program, it can still take up considerable disk space. Thus the usual process is to strip this information from the ELF file. While it is possible to ship both stripped and unstripped files, almost all current binary distribution methods provide the debugging information in separate files. The objcopy tool can be used to extract the debugging information (--only-keep-debug) and then add a link in the original executable to this separated information (--add-gnu-debuglink). After this is done, a special section called .gnu_debuglink will be present in the original executable. It contains the name of the debug file and a CRC checksum of its contents, so that when a debugging session starts the debugger can be sure it associates the right debugging information with the right executable.

$ gcc -g -shared -o libtest.so libtest.c
$ objcopy --only-keep-debug libtest.so libtest.debug
$ objcopy --add-gnu-debuglink=libtest.debug libtest.so
$ objdump -s -j .gnu_debuglink libtest.so

libtest.so:     file format elf32-i386

Contents of section .gnu_debuglink:
 0000 6c696274 6573742e 64656275 67000000  libtest.debug...
 0010 52a7fd0a                             R... 
Example 6.1.1.1 Example of stripping debugging information into separate files using objcopy

Symbols take up much less space, but are also targets for removal from final output. Once the individual object files of an executable are linked into the single final image there is generally no need for most symbols to remain. As discussed in Section 3.2, Symbols and Relocations, symbols are required to fix up relocation entries, but once this is done the symbols are not strictly necessary for running the final program. On Linux the GNU toolchain strip program provides options to remove symbols. Note that some symbols must be resolved at run-time (for dynamic linking, the focus of Chapter 9, Dynamic Linking), but these are kept in separate dynamic symbol tables so that removing the ordinary symbols does not render the final output useless.

6.1.2 Inside coredumps

A coredump is really just another ELF file; this illustrates the flexibility of ELF as a binary format.

$ readelf --all ./core
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              CORE (Core file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          52 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         15
  Size of section headers:           0 (bytes)
  Number of section headers:         0
  Section header string table index: 0

There are no sections in this file.

There are no sections to group in this file.

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  NOTE           0x000214 0x00000000 0x00000000 0x0022c 0x00000     0
  LOAD           0x001000 0x08048000 0x00000000 0x01000 0x01000 R E 0x1000
  LOAD           0x002000 0x08049000 0x00000000 0x01000 0x01000 RW  0x1000
  LOAD           0x003000 0x489fc000 0x00000000 0x01000 0x1b000 R E 0x1000
  LOAD           0x004000 0x48a17000 0x00000000 0x01000 0x01000 R   0x1000
  LOAD           0x005000 0x48a18000 0x00000000 0x01000 0x01000 RW  0x1000
  LOAD           0x006000 0x48a1f000 0x00000000 0x01000 0x153000 R E 0x1000
  LOAD           0x007000 0x48b72000 0x00000000 0x00000 0x01000     0x1000
  LOAD           0x007000 0x48b73000 0x00000000 0x02000 0x02000 R   0x1000
  LOAD           0x009000 0x48b75000 0x00000000 0x01000 0x01000 RW  0x1000
  LOAD           0x00a000 0x48b76000 0x00000000 0x03000 0x03000 RW  0x1000
  LOAD           0x00d000 0xb771c000 0x00000000 0x01000 0x01000 RW  0x1000
  LOAD           0x00e000 0xb774d000 0x00000000 0x02000 0x02000 RW  0x1000
  LOAD           0x010000 0xb774f000 0x00000000 0x01000 0x01000 R E 0x1000
  LOAD           0x011000 0xbfeac000 0x00000000 0x22000 0x22000 RW  0x1000

There is no dynamic section in this file.

There are no relocations in this file.

There are no unwind sections in this file.

No version information found in this file.

Notes at offset 0x00000214 with length 0x0000022c:
  Owner                 Data size	Description
  CORE                 0x00000090	NT_PRSTATUS (prstatus structure)
  CORE                 0x0000007c	NT_PRPSINFO (prpsinfo structure)
  CORE                 0x000000a0	NT_AUXV (auxiliary vector)
  LINUX                0x00000030	Unknown note type: (0x00000200)

$ eu-readelf -n ./core

Note segment of 556 bytes at offset 0x214:
  Owner          Data size  Type
  CORE                 144  PRSTATUS
    info.si_signo: 11, info.si_code: 0, info.si_errno: 0, cursig: 11
    sigpend: <>
    sighold: <>
    pid: 31614, ppid: 31544, pgrp: 31614, sid: 31544
    utime: 0.000000, stime: 0.000000, cutime: 0.000000, cstime: 0.000000
    orig_eax: -1, fpvalid: 0
    ebx:     1219973108  ecx:     1243440144  edx:              1
    esi:              0  edi:              0  ebp:     0xbfecb828
    eax:          74565  eip:     0x080483c4  eflags:  0x00010286
    esp:     0xbfecb818
    ds: 0x007b  es: 0x007b  fs: 0x0000  gs: 0x0033  cs: 0x0073  ss: 0x007b
  CORE                 124  PRPSINFO
    state: 0, sname: R, zomb: 0, nice: 0, flag: 0x00400400
    uid: 1000, gid: 1000, pid: 31614, ppid: 31544, pgrp: 31614, sid: 31544
    fname: coredump, psargs: ./coredump 
  CORE                 160  AUXV
    SYSINFO: 0xb774f414
    SYSINFO_EHDR: 0xb774f000
    HWCAP: 0xafe8fbff  <fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss tm pbe>
    PAGESZ: 4096
    CLKTCK: 100
    PHDR: 0x8048034
    PHENT: 32
    PHNUM: 8
    BASE: 0
    FLAGS: 0
    ENTRY: 0x8048300
    UID: 1000
    EUID: 1000
    GID: 1000
    EGID: 1000
    SECURE: 0
    RANDOM: 0xbfecba1b
    EXECFN: 0xbfecdff1
    PLATFORM: 0xbfecba2b
    NULL
  LINUX                 48  386_TLS
    index: 6, base: 0xb771c8d0, limit: 0x000fffff, flags: 0x00000051
    index: 7, base: 0x00000000, limit: 0x00000000, flags: 0x00000028
    index: 8, base: 0x00000000, limit: 0x00000000, flags: 0x00000028
Example 6.1.2.1 Example of using readelf and eu-readelf to examine a coredump.

In Example 6.1.2.1, Example of using readelf and eu-readelf to examine a coredump, we can see an examination of the core file produced by Example 6.1.1, Example of creating a core dump and using it with gdb, firstly using the readelf tool. There are no sections, relocations or other extraneous information in the file that may be required for loading an executable or library; apart from a single NOTE entry, it simply consists of a series of program headers describing LOAD segments. These segments are raw data dumps, created by the kernel, of the current memory allocations.

The other component of the core dump is the NOTE segment, which contains data necessary for debugging but not necessarily captured in a straight snapshot of the memory allocations. The eu-readelf program used in the second part of the example provides a more complete view of this data by decoding it.

The PRSTATUS note gives a range of interesting information about the process as it was running; for example we can see from cursig that the program received a signal 11, or segmentation fault, as we would expect. Along with process number information, it also includes a dump of all the current registers. Given the register values, the debugger can reconstruct the stack state and hence provide a backtrace; combined with the symbol and debugging information from the original binary the debugger can show exactly how you reached the current point of execution.

Another interesting output is the current auxiliary vector (AUXV), discussed in Section 8.1, Kernel communication to programs. The 386_TLS note describes the global descriptor table entries used for the x86 implementation of thread-local storage (see Section 4.1.1.3, Fast System Calls for more information on the use of segmentation, and Section 4.3.1.1, Threads for information on threads).

The kernel creates the core dump file within the bounds of the current ulimit settings. Since a program using a lot of memory could produce a very large dump, potentially filling up the disk and making problems even worse, the limit is generally set low or even to zero; most non-developers have little use for a core dump file. However, the core dump remains the single most useful way to debug an unexpected situation in a post-mortem fashion.

6.2 Custom sections

For the most part, organisation of code, data and symbols is something a programmer can leave up to the toolchain defaults. However, there are times when it makes sense to extend or customise sections and their contents. One common example of this is Linux kernel modules, which are used to dynamically load drivers and other features into the running kernel. Because these modules are not portable, inasmuch as they only work with one fixed kernel build version, the interface between modules and the kernel can be flexible and is not bound to particular standards. This means the methods of storing things like license information, authorship, dependencies and parameters for the module can be uniquely and wholly defined by the kernel.

The modinfo tool can inspect this information within a module and present it to the user. Below we use the example of the FUSE Linux kernel module, which allows user-space libraries to provide file-system implementations to the kernel.

$ cd /lib/modules/$(uname -r)

$ sudo modinfo ./kernel/fs/fuse/fuse.ko 
filename:       /lib/modules/3.2.0-4-amd64/./kernel/fs/fuse/fuse.ko
alias:          devname:fuse
alias:          char-major-10-229
license:        GPL
description:    Filesystem in Userspace
author:         Miklos Szeredi <miklos@szeredi.hu>
depends:        
intree:         Y
vermagic:       3.2.0-4-amd64 SMP mod_unload modversions 
parm:           max_user_bgreq:Global limit for the maximum number of backgrounded requests an unprivileged user can set (uint)
parm:           max_user_congthresh:Global limit for the maximum congestion threshold an unprivileged user can set (uint)

$ objdump -s -j .modinfo ./kernel/fs/fuse/fuse.ko 

./kernel/fs/fuse/fuse.ko:     file format elf64-x86-64

Contents of section .modinfo:
 0000 616c6961 733d6465 766e616d 653a6675  alias=devname:fu
 0010 73650061 6c696173 3d636861 722d6d61  se.alias=char-ma
 0020 6a6f722d 31302d32 32390070 61726d3d  jor-10-229.parm=
 0030 6d61785f 75736572 5f636f6e 67746872  max_user_congthr
 0040 6573683a 476c6f62 616c206c 696d6974  esh:Global limit
 0050 20666f72 20746865 206d6178 696d756d   for the maximum
 0060 20636f6e 67657374 696f6e20 74687265   congestion thre
 0070 73686f6c 6420616e 20756e70 72697669  shold an unprivi
 0080 6c656765 64207573 65722063 616e2073  leged user can s
 0090 65740070 61726d74 7970653d 6d61785f  et.parmtype=max_
 00a0 75736572 5f636f6e 67746872 6573683a  user_congthresh:
 00b0 75696e74 00706172 6d3d6d61 785f7573  uint.parm=max_us
 00c0 65725f62 67726571 3a476c6f 62616c20  er_bgreq:Global 
 00d0 6c696d69 7420666f 72207468 65206d61  limit for the ma
 00e0 78696d75 6d206e75 6d626572 206f6620  ximum number of 
 00f0 6261636b 67726f75 6e646564 20726571  backgrounded req
 0100 75657374 7320616e 20756e70 72697669  uests an unprivi
 0110 6c656765 64207573 65722063 616e2073  leged user can s
 0120 65740070 61726d74 7970653d 6d61785f  et.parmtype=max_
 0130 75736572 5f626772 65713a75 696e7400  user_bgreq:uint.
 0140 6c696365 6e73653d 47504c00 64657363  license=GPL.desc
 0150 72697074 696f6e3d 46696c65 73797374  ription=Filesyst
 0160 656d2069 6e205573 65727370 61636500  em in Userspace.
 0170 61757468 6f723d4d 696b6c6f 7320537a  author=Miklos Sz
 0180 65726564 69203c6d 696b6c6f 7340737a  eredi <miklos@sz
 0190 65726564 692e6875 3e000000 00000000  eredi.hu>.......
 01a0 64657065 6e64733d 00696e74 7265653d  depends=.intree=
 01b0 59007665 726d6167 69633d33 2e322e30  Y.vermagic=3.2.0
 01c0 2d342d61 6d643634 20534d50 206d6f64  -4-amd64 SMP mod
 01d0 5f756e6c 6f616420 6d6f6476 65727369  _unload modversi
 01e0 6f6e7320 00                          ons .           
Example 6.2.1 Example of modinfo output

As you can see above, modinfo parses the .modinfo section embedded within the module file to present the details of the module. Example 6.2.2, Putting module info into sections, shows how one field, "author", is put into the module. The code mostly comes from include/linux/module.h.

/*
 * Start at the bottom, and work your way up!
 */

/* Indirect macros required for expanded argument pasting, eg. __LINE__. */
#define ___PASTE(a,b) a##b
#define __PASTE(a,b) ___PASTE(a,b)


#define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)

/* Indirect stringification.  Doing two levels allows the parameter to be a
 * macro itself.  For example, compile with -DFOO=bar, __stringify(FOO)
 * converts to "bar".
 */

#define __stringify_1(x...)     #x
#define __stringify(x...)       __stringify_1(x)

#define __MODULE_INFO(tag, name, info)                                    \
static const char __UNIQUE_ID(name)[]                                     \
  __used __attribute__((section(".modinfo"), unused, aligned(1)))         \
  = __stringify(tag) "=" info

/* Generic info of form tag = "info" */
#define MODULE_INFO(tag, info) __MODULE_INFO(tag, tag, info)

/*
 * Author(s), use "Name <email>" or just "Name", for multiple
 * authors use multiple MODULE_AUTHOR() statements/lines.
 */
#define MODULE_AUTHOR(_author) MODULE_INFO(author, _author)

/* ---- */

MODULE_AUTHOR("Your Name <your@name.com>");
Example 6.2.2 Putting module info into sections

At first, this looks like a macro nightmare, but it can be unravelled step by step. Starting at the bottom, we see that MODULE_AUTHOR is a wrapper around the generic MODULE_INFO macro, which in turn expands to __MODULE_INFO, where most of the magic happens. There, we can see that we are building up a static const char [] variable to hold the string "author=Your Name <your@name.com>". The interesting thing to note is that the variable carries an extra attribute, __attribute__((section(".modinfo"))), which tells the compiler not to put it in the data section with all the other variables, but to stash it in its own ELF section called .modinfo. The other attributes stop the variable being optimised away because it looks unused, and ensure the variables are packed in next to each other by specifying the alignment.

There is extensive use of stringification macros, which are rather arcane tricks used within the C pre-processor to ensure that strings and definitions can live together. The only other trick is the use of the __COUNTER__ special define provided by gcc, which expands to a unique, incrementing value each time it is used; this allows multiple MODULE_AUTHOR calls to appear in the one file without ending up with the same variable name.

We can inspect the symbols placed in the final module to see the end result:

$ objdump --syms ./fuse.ko | grep modinfo

0000000000000000 l    d  .modinfo	0000000000000000 .modinfo
0000000000000000 l     O .modinfo	0000000000000013 __UNIQUE_ID_alias1
0000000000000013 l     O .modinfo	0000000000000018 __UNIQUE_ID_alias0
000000000000002b l     O .modinfo	0000000000000011 __UNIQUE_ID_alias8
000000000000003c l     O .modinfo	000000000000000e __UNIQUE_ID_alias7
000000000000004a l     O .modinfo	0000000000000068 __UNIQUE_ID_max_user_congthresh6
00000000000000b2 l     O .modinfo	0000000000000022 __UNIQUE_ID_max_user_congthreshtype5
00000000000000d4 l     O .modinfo	000000000000006e __UNIQUE_ID_max_user_bgreq4
0000000000000142 l     O .modinfo	000000000000001d __UNIQUE_ID_max_user_bgreqtype3
000000000000015f l     O .modinfo	000000000000000c __UNIQUE_ID_license2
000000000000016b l     O .modinfo	0000000000000024 __UNIQUE_ID_description1
000000000000018f l     O .modinfo	000000000000002a __UNIQUE_ID_author0
00000000000001b9 l     O .modinfo	0000000000000011 __UNIQUE_ID_alias0
00000000000001d0 l     O .modinfo	0000000000000009 __module_depends
00000000000001d9 l     O .modinfo	0000000000000009 __UNIQUE_ID_intree1
00000000000001e2 l     O .modinfo	000000000000002f __UNIQUE_ID_vermagic0
Example 6.2.3 Module symbols in .modinfo sections

6.3 Linker Scripts

In Example 3.3.2.2, Sections, we described how sections make up segments in the final output. It is the job of the linker to build these sections into segments; to achieve this it uses a linker script, which describes where segments start, what sections go into them and various other parameters.

Example 6.3.1, The default linker script, shows an extract of the default linker script, which the linker will print when given its verbose flag by specifying -Wl,--verbose to gcc. The default script is built into the linker and is based on the standard ABI definitions needed to create working user-space programs for the platform being built for.

$ gcc -Wl,--verbose -o test test.c
GNU ld (GNU Binutils for Debian) 2.26
...
using internal linker script:
==================================================
OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64",
	      "elf64-x86-64")
OUTPUT_ARCH(i386:x86-64)
ENTRY(_start)
SEARCH_DIR("=/usr/local/lib/x86_64-linux-gnu"); ...
SECTIONS
{
  /* Read-only sections, merged into text segment: */
  PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
  .interp         : { *(.interp) }
  .note.gnu.build-id : { *(.note.gnu.build-id) }
  .hash           : { *(.hash) }
  .gnu.hash       : { *(.gnu.hash) }
  .dynsym         : { *(.dynsym) }
  .dynstr         : { *(.dynstr) }
  .gnu.version    : { *(.gnu.version) }
  .gnu.version_d  : { *(.gnu.version_d) }
  .gnu.version_r  : { *(.gnu.version_r) }
  .rela.dyn       :
    {
    ...
    }
  PROVIDE (etext = .);
  .rodata         : { *(.rodata .rodata.* .gnu.linkonce.r.*) }
  .rodata1        : { *(.rodata1) }
...
Example 6.3.1 The default linker script

You can roughly see how the linker script specifies things like starting locations and which sections to group into various segments. In the same way that -Wl is used to pass --verbose to the linker via gcc, a customised linker script can be provided with a flag (the linker's -T option). Regular user-space developers are unlikely to need to override the default linker script. However, very customised applications such as kernel builds often require customised linker scripts.