Symbol collisions

Another problem is that of symbol collisions. The semantics of ELF are unfortunately inherited from the old static linking days. When a program is executed, the dependency tree of the binary is walked by /lib/ld-linux.so (the ELF dynamic linker). If a program "foo" depends on libbar.so, which in turn is linked against libpng.so, then foo, libbar.so and libpng.so will all be mapped into memory. Semantically, all the symbols from these objects are dumped into one big pot, and this is the crux of the problem. When performing symbol fixup, the glibc ELF interpreter will always choose the first symbol that matches, regardless of what the object being processed is linked against.
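To make the "one big pot" idea concrete, here is a rough sketch in C of how the search order could be built. It is a toy model, not glibc's code: struct object, MAX_DEPS and build_global_scope() are names invented for this illustration, and the breadth-first order is my assumption about how the tree is walked.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPS 4

/* A toy stand-in for an ELF object: a name plus its direct dependencies. */
struct object {
    const char *name;
    const struct object *needed[MAX_DEPS];   /* NULL-terminated */
};

/* Returns 1 if obj is already in scope[0..n). */
static int in_scope(const struct object *scope[], size_t n,
                    const struct object *obj)
{
    for (size_t i = 0; i < n; i++)
        if (scope[i] == obj)
            return 1;
    return 0;
}

/*
 * Walks the dependency tree breadth-first from root and writes every
 * object reached, once, into scope. Returns the number of objects written.
 * (Hypothetical helper; the real loader is far more involved.)
 */
size_t build_global_scope(const struct object *root,
                          const struct object *scope[], size_t cap)
{
    size_t n = 0;

    assert(root != NULL && cap > 0);
    scope[n++] = root;

    /* scope doubles as the queue: i is the object being expanded. */
    for (size_t i = 0; i < n; i++) {
        for (size_t d = 0; d < MAX_DEPS && scope[i]->needed[d] != NULL; d++) {
            const struct object *dep = scope[i]->needed[d];
            if (!in_scope(scope, n, dep) && n < cap)
                scope[n++] = dep;
        }
    }
    return n;
}
```

Given foo, which needs libbar.so, which needs libpng.so, this model produces the order foo, libbar.so, libpng.so. Every later symbol search runs over that single list, no matter which object asked for the symbol.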

For example, let's take our foo binary and link it against two libraries, libA and libB. libA is in turn linked against libA1, and libB is linked against libB1. Now libA1 and libB1 are different libraries, BUT they both define a symbol called someFunction(). The two functions have the same name, but do completely different things. You would expect libA to be linked to the definition in libA1, and libB to the definition in libB1. That is what makes intuitive sense (and is what happens on Windows). But that's not what happens on Linux. They will BOTH be linked to the symbol in libA1, because that's the one that came first. D'oh. This usually results in a nearly instant segfault on startup.
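The libA/libB case can be sketched as a small C model of the first-match rule. Again this is only an illustration: struct object and lookup_flat() are made-up names, each toy library exports at most one symbol, and the load order is assumed to be foo, libA, libB, libA1, libB1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A toy stand-in for a loaded library that exports at most one symbol. */
struct object {
    const char *name;
    const char *exported;   /* name of the exported symbol, or NULL */
};

/*
 * Models the flat lookup: scan every loaded object in load order and
 * return the first one that exports the symbol. Which object is asking
 * is not even a parameter, because in this model it makes no difference.
 */
const struct object *lookup_flat(const struct object *scope[], size_t n,
                                 const char *symbol)
{
    assert(symbol != NULL);
    for (size_t i = 0; i < n; i++)
        if (scope[i]->exported != NULL &&
            strcmp(scope[i]->exported, symbol) == 0)
            return scope[i];
    return NULL;
}
```

With the assumed load order, lookup_flat(scope, 5, "someFunction") returns libA1 whether the caller is libA or libB. libB1's definition is never reached, which is exactly the collision described above.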

OK, so why does this cause problems with binary portability? Well, although having two libraries that define a function with the same name is unusual, having two different versions of the same library in use at once is a lot more common. Libpng has 2 major versions in wide usage, libpng.so.2 and libpng.so.3. They are source compatible, but not binary compatible. If I compile on a Linux distro that uses libpng.so.3, then my program will also be linked against libpng.so.3. If a user then wishes to run it on an older distro, say one which was built against libpng.so.2, they'll need to install the newer version for my app to work. Normally we would say, so what? Unfortunately, my app (let's pretend it's a game) doesn't just link against libpng.so.3, it also links against libSDL.

Now libSDL links against libSDL_image, which in turn links against libpng.so.2 because it was compiled by the distro vendor. So when my app is loaded, 2 different versions of libpng, libpng.so.2 and libpng.so.3, will be linked in together, and things go boom. Not good.

Note that the two versions are source compatible, but not ABI compatible. That means the user can fix the problem by recompiling my app against libpng.so.2, at least this time. It's not always that easy.

As a result, binaries can occasionally end up tied, often without the developer realising it, to the set of libraries the developer used when compiling. Running them on another distro might work, but there are no guarantees.

Luckily, there is a solution to this problem in the form of an extension to the ELF symbol fixup rules, originally implemented by Sun in Solaris. Direct and grouped fixup allow the scope of symbols to be restricted, preventing such collisions. Unluckily, glibc doesn't implement them. Volunteers? The problem is big enough that at some point, we (the autopackage hackers) may have to down tools and go work on glibc for a few months.
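For contrast, here is a loose C sketch of what a scope-restricted lookup might look like. It is not Sun's algorithm and not a proposal for glibc: struct object, MAX_DEPS and lookup_direct() are hypothetical, and the model simply assumes that an object only searches the libraries it was itself linked against.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPS 4

/* A toy library: one optional exported symbol plus direct dependencies. */
struct object {
    const char *name;
    const char *exported;                    /* exported symbol, or NULL */
    const struct object *needed[MAX_DEPS];   /* NULL-terminated */
};

/*
 * Models a scoped lookup: only the direct dependencies of the requesting
 * object are searched, so the answer depends on who is asking.
 */
const struct object *lookup_direct(const struct object *requester,
                                   const char *symbol)
{
    assert(requester != NULL && symbol != NULL);
    for (size_t d = 0; d < MAX_DEPS && requester->needed[d] != NULL; d++) {
        const struct object *dep = requester->needed[d];
        if (dep->exported != NULL && strcmp(dep->exported, symbol) == 0)
            return dep;
    }
    return NULL;
}
```

In this model, a request for someFunction from libA resolves to libA1 and the same request from libB resolves to libB1, which is the behaviour you would intuitively expect from the earlier example.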