5 Working with libraries and the linker

The presence of the dynamic linker provides both some advantages we can utilise and some extra issues that need to be resolved to get a functional system.

5.1 Library versions

One potential issue is different versions of libraries. With only static libraries there is much less potential for problems, as all library code is built directly into the binary of the application. If you want to use a new version of the library you need to recompile the application against it, producing a new binary that replaces the old one.

This is obviously fairly impractical for common libraries, the most common of course being libc, which is included in almost all applications. If it were only available as a static library, any change would require every single application in the system to be rebuilt.

However, changes in the way the dynamic library works could cause multiple problems. In the best case, the modifications are completely compatible and nothing externally visible is changed. On the other hand the changes might cause the application to crash; for example if a function that used to take an int changes to take an int *. Worse, the new library version could have changed semantics and suddenly start silently returning different, possibly wrong values. This can be a very nasty bug to track down; when an application crashes you can use a debugger to isolate where the error occurs, whilst data corruption or modification may only show up in seemingly unrelated parts of the application.

The dynamic linker requires a way to determine the version of libraries within the system so that newer revisions can be identified. There are a number of schemes a modern dynamic linker can use to find the right versions of libraries.

5.1.1  sonames

Using sonames we can add some extra information to a library to help identify versions.

As we have seen previously, an application lists the libraries it requires in DT_NEEDED fields in the dynamic section of the binary. The actual library is held in a file on disc, usually in /lib for core system libraries or /usr/lib for optional libraries.

To allow multiple versions of the library to exist on disk, they obviously require differing file names. The soname scheme uses a combination of names and file system links to build a hierarchy of libraries.

This is done by introducing the concept of major and minor library revisions. A minor revision is one wholly backwards compatible with a previous version of the library; this usually consists of only bug fixes. A major revision is therefore any revision that is not compatible; e.g. changes the inputs to functions or the way a function behaves.

As each library revision, major or minor, will need to be kept in a separate file on disk, this forms the basis of the library hierarchy. The library name is by convention libNAME.so.MAJOR.MINOR. However, if every application were directly linked against this file we would have the same issue as with a static library; every time a minor change happened we would need to rebuild the application to point to the new library.

What we really want to refer to is the major number of the library. If this changes, we can reasonably be required to recompile our application, since we need to make sure our program is still compatible with the new library.

Thus the soname is libNAME.so.MAJOR. The soname should be set in the DT_SONAME field of the dynamic section in a shared library; the library author can specify this version when they build the library.

Thus each minor version library file on disc can specify the same major version number in its DT_SONAME field, allowing the dynamic linker to know that this particular library file implements a particular major revision of the library API and ABI.

To keep track of this, an application called ldconfig is commonly run to create symbolic links named for the major version to the latest minor version on the system. ldconfig works by running through all the libraries that implement a particular major revision number, and then picks out the one with the highest minor revision. It then creates a symbolic link from libNAME.so.MAJOR to the actual library file on disc, i.e. libNAME.so.MAJOR.MINOR.
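As a rough illustration of the selection step, the sketch below picks the file a soname link should point at from a list of file names. It is not how ldconfig is implemented: matching on the file name rather than reading DT_SONAME from each file is a simplifying assumption, and the names pick_link_target and minor_for_soname are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: if `file` looks like "<soname>.<MINOR>", return the
 * minor number, otherwise return -1. Matching on the file name is an
 * assumption of this sketch; the text describes grouping by DT_SONAME.
 */
static long minor_for_soname(const char *file, const char *soname)
{
    size_t len = strlen(soname);
    char *end;
    long minor;

    if (strncmp(file, soname, len) != 0 || file[len] != '.')
        return -1;
    if (file[len + 1] < '0' || file[len + 1] > '9')
        return -1;

    minor = strtol(file + len + 1, &end, 10);
    return (*end == '\0') ? minor : -1;
}

/*
 * Return the entry of files[] with the highest minor revision for the given
 * soname, i.e. the file the soname symbolic link should point at, or NULL
 * if no file implements that soname.
 */
const char *pick_link_target(const char *const files[], size_t count,
                             const char *soname)
{
    const char *best = NULL;
    long best_minor = -1;

    assert(files != NULL && soname != NULL);

    for (size_t i = 0; i < count; i++) {
        long minor = minor_for_soname(files[i], soname);
        if (minor > best_minor) {
            best_minor = minor;
            best = files[i];
        }
    }
    return best;
}
```

Given libfoo.so.1.2 and libfoo.so.1.10, asking for the soname libfoo.so.1 selects libfoo.so.1.10, since the minor numbers are compared as numbers rather than as strings.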


The final piece of the hierarchy is the compile name for the library. When you compile your program, to link against a library you use the -lNAME flag, which goes off searching for the libNAME.so file in the library search path. Notice however, we have not specified any version number; we just want to link against the latest library on the system. It is up to the installation procedure for the library to create the symbolic link between the compile libNAME.so name and the latest library code on the system. Usually this is handled by your package management system (dpkg or rpm). This is not an automated process because it is possible that the latest library on the system may not be the one you wish to always compile against; for example if the latest installed library were a development version not appropriate for general use.

The general process is illustrated below.

Figure 5.1.1.1 sonames: describing the soname system

5.1.1.1 How the dynamic linker looks up libraries

When the application starts, the dynamic linker looks at the DT_NEEDED field to find the required libraries. This field contains the soname of the library, so the next step is for the dynamic linker to walk through all the libraries in its search path looking for it.

This process conceptually involves two steps. Firstly the dynamic linker needs to search through all the libraries to find those that implement the given soname. Secondly the file names for the minor revisions need to be compared to find the latest version, which is then ready to be loaded.

We mentioned previously that there is a symbolic link set up by ldconfig between the library soname and the latest minor revision. Thus the dynamic linker should only need to follow that link to find the correct file to load, rather than having to open all possible libraries and decide which one to go with each time the application is run.

Since file system access is so slow, ldconfig also creates a cache of libraries installed in the system. This cache is simply a list of sonames of libraries available to the dynamic linker and a pointer to the major version link on disk, saving the dynamic linker having to read entire directories full of files to locate the correct link. You can inspect this with /sbin/ldconfig -p; it actually lives in the file /etc/ld.so.cache. If the library is not found in the cache the dynamic linker will fall back to the slower option of walking the file system, so it is important to re-run ldconfig when new libraries are installed.
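The idea of the cache can be sketched as a sorted table searched by soname. This is only a toy model; the real cache is a binary file with its own format, and the table contents, struct cache_entry and cache_lookup are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One cache record: a soname and the path of its major version link. */
struct cache_entry {
    const char *soname;
    const char *path;
};

/*
 * Toy stand-in for the library cache, kept sorted by soname so it can be
 * binary searched. The entries and paths are invented for illustration.
 */
static const struct cache_entry toy_cache[] = {
    { "libbar.so.2", "/usr/lib/libbar.so.2" },
    { "libc.so.6",   "/lib/libc.so.6" },
    { "libfoo.so.1", "/usr/lib/libfoo.so.1" },
};

/* bsearch comparator: the key is a soname string, the element a record. */
static int compare_entry(const void *key, const void *elem)
{
    const struct cache_entry *entry = elem;
    return strcmp((const char *)key, entry->soname);
}

/*
 * Return the cached path for a soname, or NULL on a miss. On a miss a
 * dynamic linker would fall back to walking the library directories.
 */
const char *cache_lookup(const char *soname)
{
    const struct cache_entry *hit;

    assert(soname != NULL);
    hit = bsearch(soname, toy_cache,
                  sizeof(toy_cache) / sizeof(toy_cache[0]),
                  sizeof(toy_cache[0]), compare_entry);
    return hit ? hit->path : NULL;
}
```

The point of the design is that a lookup touches one small table instead of listing every directory in the search path.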

5.2 Finding symbols

We've already discussed how the dynamic linker gets the address of a library function and puts it in the PLT for the program to use. But so far we haven't discussed just how the dynamic linker finds the address of the function. The whole process is called binding, because the symbol name is bound to the address it represents.

The dynamic linker has a few pieces of information; firstly the symbol that it is searching for, and secondly a list of libraries that that symbol might be in, as defined by the DT_NEEDED fields in the binary.

Each shared object library has a section, marked SHT_DYNSYM and called .dynsym which is the minimal set of symbols required for dynamic linking -- that is any symbol in the library that may be called by an external program.

5.2.1 Dynamic Symbol Table

In fact, there are three sections that all play a part in describing the dynamic symbols. Firstly, let us look at the definition of a symbol from the ELF specification:

typedef struct {
          Elf32_Word    st_name;
          Elf32_Addr    st_value;
          Elf32_Word    st_size;
          unsigned char st_info;
          unsigned char st_other;
          Elf32_Half    st_shndx;
} Elf32_Sym;
Example 5.2.1.1 Symbol definition from ELF

Table 5.2.1.1 ELF symbol fields
Field     Value
st_name   An index to the string table
st_value  Value - in a relocatable shared object this holds the offset from the section of index given in st_shndx
st_size   Any associated size of the symbol
st_info   Information on the binding of the symbol (described below) and what type of symbol this is (a function, object, etc).
st_other  Not currently used
st_shndx  Index of the section this symbol resides in (see st_value)

As you can see, the actual string of the symbol name is held in a separate section (.dynstr); the entry in the .dynsym section only holds an index into the string section. This creates some level of overhead for the dynamic linker; the dynamic linker must read all of the symbol entries in the .dynsym section and then follow the index pointer to find the symbol name for comparison.
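A minimal sketch of that lookup without any hash table follows. The struct mirrors the Elf32_Sym layout shown above using fixed-width types; struct toy_sym and find_symbol_linear are hypothetical names, and a real dynamic linker would read these tables from the mapped library rather than take them as arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as Elf32_Sym, written with fixed-width types. */
struct toy_sym {
    uint32_t st_name;        /* offset of the name within the string table */
    uint32_t st_value;
    uint32_t st_size;
    unsigned char st_info;
    unsigned char st_other;
    uint16_t st_shndx;
};

/*
 * Scan every symbol entry, following st_name into the string table to
 * compare names. Returns the matching entry, or NULL if there is none.
 */
const struct toy_sym *find_symbol_linear(const struct toy_sym *symtab,
                                         size_t nsyms, const char *strtab,
                                         const char *name)
{
    assert(symtab != NULL && strtab != NULL && name != NULL);

    for (size_t i = 0; i < nsyms; i++) {
        if (strcmp(strtab + symtab[i].st_name, name) == 0)
            return &symtab[i];
    }
    return NULL;
}
```

Every miss costs one string comparison per symbol in the table, which is the overhead the paragraph above describes.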

To speed this process up, a third section called .hash is introduced, containing a hash table of symbol names to symbol table entries. This hash table is pre-computed when the library is built and allows the dynamic linker to find the symbol entry much faster, generally with only one or two lookups.
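The hash function itself is short. The version below follows the function given in the ELF specification for the .hash section; the helper elf_hash_bucket is an assumed name added here to show how the hash value selects a bucket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * The hash function used for the .hash section, following the ELF
 * specification. The top four bits are folded back in and cleared on each
 * step, so only the low 28 bits of the result are ever set.
 */
uint32_t elf_hash(const char *name)
{
    uint32_t h = 0;

    assert(name != NULL);

    while (*name != '\0') {
        uint32_t g;

        h = (h << 4) + (unsigned char)*name++;
        g = h & 0xf0000000u;
        if (g != 0)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* The hash picks a bucket; the bucket holds the first candidate symbol. */
uint32_t elf_hash_bucket(const char *name, uint32_t nbucket)
{
    assert(nbucket != 0);
    return elf_hash(name) % nbucket;
}
```

For example, elf_hash("exit") evaluates to 0x0006cf04. The dynamic linker only needs to compare names for the few symbols that share a bucket, rather than for every entry in .dynsym.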

5.2.2 Symbol Binding

Whilst we usually refer to the process of finding the address of a symbol as binding that symbol, the term symbol binding has a separate meaning.

The binding of a symbol dictates its external visibility during the dynamic linking process. A local symbol is not visible outside the object file it is defined in. A global symbol is visible to other object files, and can satisfy undefined references in other objects.

A weak reference is a special type of lower priority global reference. This means it is designed to be overridden, as we will see shortly.
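The binding is stored alongside the symbol type in the st_info field. A small sketch of how the two are packed, following the ELF32_ST_BIND and ELF32_ST_TYPE macros of the ELF specification, is given here; the enum and function names are our own rather than those of any system header.

```c
#include <assert.h>

/* Binding and type codes as numbered in the ELF specification. */
enum sym_binding { BIND_LOCAL = 0, BIND_GLOBAL = 1, BIND_WEAK = 2 };
enum sym_kind    { KIND_NOTYPE = 0, KIND_OBJECT = 1, KIND_FUNC = 2 };

/* The binding lives in the high four bits of st_info, the type in the low. */
unsigned sym_bind(unsigned char st_info) { return st_info >> 4; }
unsigned sym_type(unsigned char st_info) { return st_info & 0xf; }

/* Pack a binding and a type back into a single st_info byte. */
unsigned char sym_make_info(unsigned bind, unsigned type)
{
    assert(bind <= 0xf && type <= 0xf);
    return (unsigned char)((bind << 4) | type);
}

/* Human-readable name for the binding part of st_info. */
const char *bind_name(unsigned char st_info)
{
    switch (sym_bind(st_info)) {
    case BIND_LOCAL:  return "local";
    case BIND_GLOBAL: return "global";
    case BIND_WEAK:   return "weak";
    default:          return "other";
    }
}
```

A weak function therefore has an st_info of 0x22: binding 2 in the high nibble and type 2 in the low nibble.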

Below we have an example C program which we analyse to inspect the symbol bindings.

$ cat test.c
static int static_variable;

extern int extern_variable;

int external_function(void);

int function(void)
{
        return external_function();
}

static int static_function(void)
{
        return 10;
}

#pragma weak weak_function
int weak_function(void)
{
        return 10;
}

$ gcc -c test.c
$ objdump --syms test.o

test.o:     file format elf32-powerpc

SYMBOL TABLE:
00000000 l    df *ABS*  00000000 test.c
00000000 l    d  .text  00000000 .text
00000000 l    d  .data  00000000 .data
00000000 l    d  .bss   00000000 .bss
00000038 l     F .text  00000024 static_function
00000000 l    d  .sbss  00000000 .sbss
00000000 l     O .sbss  00000004 static_variable
00000000 l    d  .note.GNU-stack        00000000 .note.GNU-stack
00000000 l    d  .comment       00000000 .comment
00000000 g     F .text  00000038 function
00000000         *UND*  00000000 external_function
0000005c  w    F .text  00000024 weak_function

$ nm test.o
         U external_function
00000000 T function
00000038 t static_function
00000000 s static_variable
0000005c W weak_function

Example 5.2.2.1 Examples of symbol bindings

Notice the use of #pragma to define the weak symbol. A pragma is a way of communicating extra information to the compiler; its use is not common but occasionally is required to get the compiler to do out-of-the-ordinary operations.

We inspect the symbols with two different tools; in both cases the binding is shown in the second column. The codes should be quite straightforward (and are documented in each tool's man page).

5.2.2.1 Overriding symbols

It is often very useful for a programmer to be able to override a symbol in a library; that is to subvert the normal symbol with a different definition.

The order in which libraries are searched follows the order of the DT_NEEDED fields within the binary. However, it is possible to insert libraries ahead of these so that they are searched first; this means that any symbols within them will be found in preference to definitions of the same name in the libraries that follow.

This is done via an environment variable called LD_PRELOAD which specifies libraries that the dynamic linker should load before all others.

$ cat override.c
#define _GNU_SOURCE 1
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <dlfcn.h>

pid_t getpid(void)
{
        pid_t (*orig_getpid)(void) = dlsym(RTLD_NEXT, "getpid");
        printf("Calling GETPID\n");

        return orig_getpid();
}

$ cat test.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
        printf("%d\n", getpid());
}

$ gcc -shared -fPIC -o liboverride.so override.c -ldl
$ gcc -o test test.c
$ LD_PRELOAD=./liboverride.so ./test
Calling GETPID
15187
Example 5.2.2.1.1 Example of LD_PRELOAD

In the above example we override the getpid function to print out a small statement when it is called. We use the dlsym function provided by glibc, passing RTLD_NEXT to tell it to continue on and find the next symbol called getpid.

5.2.2.1.1 Weak symbols over time

The concept of the weak symbol is that the symbol is marked as a lower priority and can be overridden by another symbol. Only if no other implementation is found will the weak symbol be the one that is used.
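With GCC-compatible compilers on ELF systems, the same idea can be written with the weak attribute. The sketch below assumes such a toolchain; optional_hook, default_value and run_hook_or_default are hypothetical names. It shows a weak reference, which may legitimately be left undefined, and a weak definition acting as a fallback.

```c
#include <stddef.h>

/*
 * A weak reference: if nothing in the final link defines optional_hook,
 * its address is simply NULL rather than causing a link error.
 * optional_hook is a hypothetical name used only for this sketch.
 */
extern int optional_hook(void) __attribute__((weak));

/* A weak definition: used only if no normal (strong) definition exists. */
__attribute__((weak)) int default_value(void)
{
    return 10;
}

/* Call the hook if some object supplied one, else use the fallback. */
int run_hook_or_default(void)
{
    if (optional_hook != NULL)
        return optional_hook();
    return default_value();
}
```

Linking in another object file that defines optional_hook or default_value as ordinary global functions would change which code runs without any change to this file.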

The logical extension of this for the dynamic loader is that all libraries should be loaded, and any weak symbols in those libraries should be ignored in favour of normal symbols in any other library. This was indeed how weak symbol handling was originally implemented in Linux by glibc.

However, this was actually incorrect to the letter of the Unix standard at the time (SysVr4). The standard actually dictates that weak symbols should only be handled by the static linker; they should remain irrelevant to the dynamic linker (see the section on binding order below).

At the time, the Linux implementation of making the dynamic linker override weak symbols matched SGI's IRIX platform, but differed from others such as Solaris and AIX. When the developers realised this behaviour violated the standard it was reversed, and the old behaviour relegated to requiring a special environment flag (LD_DYNAMIC_WEAK) be set.

5.2.2.2 Specifying binding order

We have seen how we can override a function in another library by preloading another shared library with the same symbol defined. The symbol that gets resolved is the first definition found in the order that the dynamic loader searches the loaded objects.

Libraries are loaded in the order they are specified in the DT_NEEDED entries of the binary, after any preloaded libraries. This in turn is decided from the order that libraries are passed in on the command line when the object is built. When symbols are to be located, the dynamic linker starts at the executable itself and works forwards through the libraries in load order until the symbol is found.

Some shared libraries, however, need a way to override this behaviour. They need to say to the dynamic linker "look first inside me for these symbols, rather than starting from the front of the search order". Libraries can set the DT_SYMBOLIC flag in their dynamic section to get this behaviour (this is usually set by passing the -Bsymbolic flag on the static linker's command line when building the shared library).

What this flag is doing is controlling symbol visibility. References the library makes to its own symbols cannot be overridden, so those symbols could be considered private to the library that is being loaded.

However, this loses a lot of granularity since the library is either flagged for this behaviour, or it is not. A better system would allow us to make some symbols private and some symbols public.
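The effect of the flag can be modelled with a toy search routine. Here the search order is simply handed in as an array, in whatever order the dynamic linker would consult the loaded objects, and everything (struct toy_object, resolve, the library contents) is an assumption made for illustration rather than how any real dynamic linker is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 4

/* A toy loaded object: a name, a DT_SYMBOLIC-style flag and its exports. */
struct toy_object {
    const char *name;
    bool symbolic;
    const char *exports[MAX_SYMS];   /* NULL-terminated unless full */
};

/* Does this object export a symbol of the given name? */
static bool defines(const struct toy_object *obj, const char *symbol)
{
    for (size_t i = 0; i < MAX_SYMS && obj->exports[i] != NULL; i++) {
        if (strcmp(obj->exports[i], symbol) == 0)
            return true;
    }
    return false;
}

/*
 * Resolve `symbol` for a reference made from inside `from`. A symbolic
 * object is asked first for its own references; otherwise the first
 * object in the search order that defines the symbol wins.
 */
const struct toy_object *resolve(const struct toy_object *const order[],
                                 size_t count, const struct toy_object *from,
                                 const char *symbol)
{
    assert(order != NULL && from != NULL && symbol != NULL);

    if (from->symbolic && defines(from, symbol))
        return from;

    for (size_t i = 0; i < count; i++) {
        if (defines(order[i], symbol))
            return order[i];
    }
    return NULL;
}
```

In this model a call to foo from inside a library that is not flagged can be captured by an earlier object in the search order, whereas the same call from a flagged library always stays inside that library.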

5.2.2.3 Symbol Versioning

That better system comes from symbol versioning. With symbol versioning we specify some extra input to the static linker to give it some more information about the symbols in our shared library.

$ cat Makefile
all: test testsym

clean:
        rm -f *.so test testsym

liboverride.so : override.c
        $(CC) -shared -fPIC -o liboverride.so override.c

libtest.so : libtest.c
        $(CC) -shared -fPIC -o libtest.so libtest.c

libtestsym.so : libtest.c
        $(CC) -shared -fPIC -Wl,-Bsymbolic -o libtestsym.so libtest.c

test : test.c libtest.so liboverride.so
        $(CC) -L. -o test test.c -ltest

testsym : test.c libtestsym.so liboverride.so
        $(CC) -L. -o testsym test.c -ltestsym

$ cat libtest.c
#include <stdio.h>

int foo(void) {
        printf("libtest foo called\n");
        return 1;
}

int test_foo(void)
{
        return foo();
}

$ cat override.c
#include <stdio.h>

int foo(void)
{
        printf("override foo called\n");
        return 0;
}

$ cat test.c
#include <stdio.h>

/* Provided by the shared library we link against. */
int test_foo(void);

int main(void)
{
        printf("%d\n", test_foo());
        return 0;
}

$ cat Versions
{global: test_foo;  local: *; };

$ gcc -shared -fPIC -Wl,-version-script=Versions -o libtestver.so libtest.c

$ gcc -L. -o testver test.c -ltestver

$ LD_LIBRARY_PATH=. LD_PRELOAD=./liboverride.so ./testver
libtest foo called
1

$ objdump --syms libtestver.so | grep foo
00000574 l     F .text	00000054              foo
000005c8 g     F .text	00000038              test_foo
Example 5.2.2.3.1 Example of symbol versioning

In the simplest case as above, we simply state whether each symbol is global or local. Thus in the case above the foo function is most likely a support function for test_foo; whilst we are happy for the overall functionality of the test_foo function to be overridden, if we do use the shared library version it needs unaltered access to its support function, which nobody should be able to modify.

This allows us to keep our namespace better organised. Many libraries might want to implement something with a common name such as read or write; however, if they all exported it, the version actually given to the program might be completely wrong. By specifying symbols as local, the developer can be sure that nothing will conflict with that internal name, and conversely that the name they chose will not influence any other program.

An extension of this scheme is symbol versioning. With this you can specify multiple versions of the same symbol in the same library. The static linker appends some version information after the symbol name (something like @VER) describing what version the symbol is given.

If the developer implements a function that has the same name but possibly a binary or programmatically different implementation, they can increase the version number. When new applications are built against the shared library, they will pick up the latest version of the symbol. However, applications built against earlier versions of the same library will be requesting older versions (e.g. will have older @VER strings in the symbol name they request) and thus get the original implementation.
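A toy model of this selection is sketched below. It is not how the dynamic linker stores or matches versions; the table, the version names VER_1 and VER_2 and the function lookup_versioned are all invented to show the behaviour described above, where old requests keep the original implementation and new, unversioned requests get the latest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef int (*impl_fn)(void);

static int foo_v1(void) { return 1; }   /* original behaviour */
static int foo_v2(void) { return 2; }   /* later, incompatible behaviour */

/* One exported definition: name, version tag and whether it is the default. */
struct versioned_symbol {
    const char *name;
    const char *version;
    bool is_default;     /* the version new applications link against */
    impl_fn impl;
};

/* VER_1 and VER_2 are hypothetical version names for this sketch. */
static const struct versioned_symbol exports[] = {
    { "foo", "VER_1", false, foo_v1 },
    { "foo", "VER_2", true,  foo_v2 },
};

/*
 * Look up name@version. A NULL version models linking a new application,
 * which picks up the default (latest) version of the symbol.
 */
impl_fn lookup_versioned(const char *name, const char *version)
{
    assert(name != NULL);

    for (size_t i = 0; i < sizeof(exports) / sizeof(exports[0]); i++) {
        const struct versioned_symbol *sym = &exports[i];

        if (strcmp(sym->name, name) != 0)
            continue;
        if (version == NULL ? sym->is_default
                            : strcmp(sym->version, version) == 0)
            return sym->impl;
    }
    return NULL;
}
```

An application that recorded foo@VER_1 when it was built keeps calling foo_v1 even after the library gains foo_v2, which is what lets both generations of application share one library file.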