In C, a translation unit is the unit of input that a C compiler “translates” into an object file. It is roughly a C source file after all preprocessing directives such as #include and #define have been processed.

Lately I found out that the translation unit matters not only for compilation concepts like the one definition rule, but also for linking. When we link against a static library comprising several object files, the linking happens not at the library level but at the object file level, so the division into translation units matters.

We are going to do some tests to check whether this is the case.

Construction of tests

The test consists of building two programs with different translation unit divisions but the same code, and thus the same behavior. There is a common part that is identical for both programs, and a part where the two programs differ.

Common

We first generate two shared libraries, of which one contains a function we will use, and the other contains a function we will never use.

shared.h

#include <stdio.h>

void shared_in_use(void);
void shared_not_in_use(void);

shared_in_use.c

#include "shared.h"

void shared_in_use(void) {
    puts("shared_in_use called");
}

shared_not_in_use.c

#include "shared.h"

void shared_not_in_use(void) {
    puts("shared_not_in_use called");
}

Let’s generate two shared objects

gcc -fpic -c shared_in_use.c
gcc -fpic -c shared_not_in_use.c
gcc -shared -o libshared_in_use.so shared_in_use.o
gcc -shared -o libshared_not_in_use.so shared_not_in_use.o

The two test programs do the same thing by calling the same function, so we only need one main.c. The difference between the two test programs is how the static library is divided and linked, not the code itself.

main.c

#include "static.h"

int main(void) {
    static_in_use();
    return 0;
}

where

static.h

void static_in_use(void);
void static_not_in_use(void);

Difference

The first static library libstatic_not_combined.a is generated from two source files.

static_in_use.c

#include "static.h"
#include "shared.h"

void static_in_use(void) {
    shared_in_use();
}

static_not_in_use.c

#include "static.h"
#include "shared.h"

void static_not_in_use(void) {
    shared_not_in_use();
}

gcc -c static_in_use.c
gcc -c static_not_in_use.c
ar rcs libstatic_not_combined.a static_in_use.o static_not_in_use.o

The second static library libstatic_combined.a is generated from one combined source file instead of the two separate files above.

static_combined.c

#include "shared.h"
#include "static.h"

void static_in_use(void) {
    shared_in_use();
}

void static_not_in_use(void) {
    shared_not_in_use();
}

gcc -c static_combined.c
ar rcs libstatic_combined.a static_combined.o

Then we compile and link two executables

gcc -o main_not_combined main.c -L. -lstatic_not_combined -lshared_in_use -lshared_not_in_use

and

gcc -o main_combined main.c -L. -lstatic_combined -lshared_in_use -lshared_not_in_use

Finally, let’s add the current directory to the dynamic library search path

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:.

Result

Let’s run these two executables. We get

$ ./main_not_combined
shared_in_use called
$ ./main_combined
shared_in_use called

OK, the outputs are the same. What about their run-time dependencies? Checking the shared library dependencies of the two binaries, we get

$ ldd ./main_not_combined
    linux-vdso.so.1 =>  (0x00007fff5fbfe000)
    libshared_in_use.so => ./libshared_in_use.so (0x00007faecc326000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007faecbf42000)
    /lib64/ld-linux-x86-64.so.2 (0x00007faecc52a000)
$ ldd ./main_combined
    linux-vdso.so.1 =>  (0x00007fffaeaba000)
    libshared_in_use.so => ./libshared_in_use.so (0x00007f13733ed000)
    libshared_not_in_use.so => ./libshared_not_in_use.so (0x00007f13731eb000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1372e06000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f13735f1000)

It turns out that even though main_combined does not actually use anything in libshared_not_in_use.so, it still links to it and loads the shared library at run time. A further question is: what about static_not_in_use? Listing the symbols of the two binaries, we get

$ nm ./main_not_combined | grep _in_use
                 U shared_in_use
00000000004006ad T static_in_use
$ nm ./main_combined | grep _in_use
                 U shared_in_use
                 U shared_not_in_use
000000000040071d T static_in_use
0000000000400728 T static_not_in_use

We do see static_not_in_use included in main_combined, which is where the reference to shared_not_in_use comes from.

So it seems to be a good idea to split code into separate source files when the pieces are more or less decoupled. This can make the final executable more compact, and it avoids dependencies and “undefined reference” errors that come only from code the program never uses.

The test source can be downloaded here