I recently gave a short talk at a local meetup about static compilation in Rust. I'd like to drop some thoughts here, since the topic is really interesting.
The slides are available here (org-mode file exportable to HTML via revealJS).
It all started when I began deploying a Rust application server on remote servers. One of the problems of compiled languages (such as Rust) is that the binary you compile sometimes does not work in production, because it was linked against a different set of libraries (either system libraries or project dependencies) than the ones available there. With interpreted languages, or stuff sitting on a bytecode grinder (such as the Java virtual machine), you usually don't have this problem.
§ Quick preamble to brush off some basics
Here's a dumbed-down summary of the problem. A compiled executable often uses external libraries to do its job. For example: why would I want to write my own JSON parser when there's a great library written by someone else? I use that library and call it a day.
Libraries are "linked" to an executable by means of a lookup table that says: "the function getJsonFromString in this external library with specific version X.Y is found at a certain memory address (available when the library is loaded)". This way my executable knows where to look when it needs to use that function. I ship my executable with a mapping of all the external functions I need and I'm confident that this magic will work.
When I deploy on another machine, my executable expects to find the function getJsonFromString in a library with version X.Y at the same known address. If this check fails, I get a runtime error saying that the function could not be found, and my executable crashes.
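The lookup described above can be sketched as a toy model. This is only an illustration of the idea, not how a real dynamic loader works: the `Library`, `Import`, `library` and `resolve` names are made up for this example, and the exact-match version check is a deliberate simplification.

```rust
use std::collections::HashMap;

/// A toy "shared library": a name, a version, and the symbols it exports
/// (symbol name -> address the symbol ended up at once loaded).
struct Library {
    name: String,
    version: String,
    symbols: HashMap<String, usize>,
}

/// What the executable recorded at build time:
/// "I need this symbol, from this library, at this version".
struct Import {
    library: String,
    version: String,
    symbol: String,
}

/// Hypothetical helper to build a `Library` from a list of (symbol, address).
fn library(name: &str, version: &str, symbols: &[(&str, usize)]) -> Library {
    Library {
        name: name.to_string(),
        version: version.to_string(),
        symbols: symbols.iter().map(|(s, a)| (s.to_string(), *a)).collect(),
    }
}

/// Resolve one import against the libraries installed on the host.
/// Simplification: versions must match exactly, which is stricter than
/// what real loaders do.
fn resolve(import: &Import, installed: &[Library]) -> Result<usize, String> {
    let lib = installed
        .iter()
        .find(|l| l.name == import.library)
        .ok_or_else(|| format!("{} => not found", import.library))?;
    if lib.version != import.version {
        return Err(format!(
            "{}: version {} not found (host has {})",
            lib.name, import.version, lib.version
        ));
    }
    lib.symbols
        .get(&import.symbol)
        .copied()
        .ok_or_else(|| format!("undefined symbol: {}", import.symbol))
}
```

Calling `resolve` with an import for a library that is not in the `installed` slice returns an `Err`, which is the toy equivalent of the crash described above.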
§ Why this happens
On our workstations we typically run a much more recent Linux distribution than on staging/production, where our code will actually run. This is one of the reasons why people don't deploy directly from their workstation but use a CI/CD service to build their stuff in an environment virtually identical to the target machine and then deploy the artifacts.
As such, if you have a CI/CD pipeline, you can stop reading, as you're probably already doing the right thing.
However, static compiling has some interesting use cases. In my case, as mentioned before, I was exploring a different way to deploy services. Another interesting case is building a tool I use for running database migrations ([diesel_cli], more details below). Others might want to build slim Docker containers with Alpine. It may also be useful when cross-compiling.
And here is where musl enters the game.
§ What is musl?
musl (all in lowercase) is an alternative libc implementation, with an accompanying toolchain, that can be used to build static and dynamic binaries for a number of architectures (namely x86/x86_64, ARM32/64 and a few others). For more information, see the homepage of the project. The goal of musl is to provide small and fast binaries, see a comparison on their wiki.
The Rust programming language supports some target platforms built against musl, see the platform support page. I will underline that while musl is suitable for both static and dynamic linking, in Rust it is currently used almost exclusively for static linking (the musl targets link statically by default). This can be a source of confusion (see the last section for some more context).
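As a small illustration of the musl and static/dynamic distinction, a Rust program can report how it was built by looking at compile-time configuration. This is a minimal sketch: `libc_linkage` is a name made up for this example, while `target_env` and the `crt-static` target feature are standard `cfg` keys.

```rust
/// Describe how the current binary was built with respect to libc.
/// `cfg!` is evaluated at compile time, so the answer reflects the target
/// and flags passed to rustc, not the host the binary later runs on.
fn libc_linkage() -> &'static str {
    let musl = cfg!(target_env = "musl");
    let static_crt = cfg!(target_feature = "crt-static");
    match (musl, static_crt) {
        (true, true) => "musl, libc statically linked",
        (true, false) => "musl, libc dynamically linked",
        (false, true) => "non-musl libc, statically linked",
        (false, false) => "non-musl libc, dynamically linked",
    }
}
```

Built for the `x86_64-unknown-linux-musl` target with default settings, I would expect this to return the first variant. Passing `-C target-feature=-crt-static` to rustc is the usual way to ask for a dynamically linked musl binary instead.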
The intended use cases for musl (as stated on the project website) are multimedia appliances, routers/firewalls, VoIP devices, rescue disks, mobile phones, kiosks and light desktop systems. It's also the libc of choice for the Alpine Linux distribution.
§ Dynamic linking (the usual way)
In order to illustrate the difference between static and dynamic linking, I'll be using an example project that uses Diesel cli, a command line tool to handle database migrations.
One of my motivations was to try to statically link the Diesel cli. Here is why you might want that:
- The Diesel cli has a few dependencies: three different database clients (Postgres, MySQL/MariaDB and sqlite3) and OpenSSL (required by the Postgres client).
- It's a tool that you use all around and you want to keep it self-contained from the host: if I only have Postgres on the server, why do I also need to install the MySQL client package? I'd rather have a "fat" binary embedding all the clients and keep the host system clean of unneeded dependencies.
- I don't care about timely updates to the dependencies of this tool, because it is under my direct control and is not used very often.
Let's start with our usual workflow. I will use the usual Rust build tool cargo (a sort of make for the rustc compiler).
$ cargo build
Compiling cc v1.0.66
Compiling pkg-config v0.3.19
Compiling proc-macro2 v1.0.24
Compiling autocfg v1.0.1
Compiling unicode-xid v0.2.1
Compiling libc v0.2.81
Compiling bitflags v1.2.1
Compiling syn v1.0.57
Compiling pq-sys v0.4.6
Compiling byteorder v1.3.4
Compiling openssl v0.10.32
Compiling foreign-types-shared v0.1.1
Compiling lazy_static v1.4.0
Compiling cfg-if v1.0.0
Compiling openssl-probe v0.1.2
Compiling foreign-types v0.3.2
Compiling openssl-sys v0.9.60
Compiling libsqlite3-sys v0.18.0
Compiling quote v1.0.8
Compiling diesel_derives v1.4.1
Compiling diesel v1.4.5
Compiling using-diesel v0.1.0 (./examples/using-diesel)
error: linking with `cc` failed: exit code: 1
|
= note: "cc" "-Wl,--as-needed" "-Wl,-z,noexecstack" "-m64" "-Wl,--eh-frame-hdr" "-L"
... "-Wl,-Bdynamic" "-lpq" "-lssl" "-lcrypto" "-lgcc_s" "-lutil" "-lrt" "-lpthread"
"-lm" "-ldl" "-lc"
= note: /usr/bin/ld: cannot find -lpq
collect2: error: ld returned 1 exit status
Hmm, a great start. It won't compile because the linker doesn't find -lpq (translated: I cannot link to the pq library, because I cannot find the Postgres client library on this system). The other dependencies in that list of -lXX flags are satisfied, for example the OpenSSL library (-lssl).
Let's install the missing dependencies (in my case an apt install libpq-dev), recompile and observe the resulting binary:
$ du -h target/release/using-diesel
5.0M target/release/using-diesel
$ strip target/release/using-diesel
$ du -h target/release/using-diesel
1.9M target/release/using-diesel
OK, we have a binary of around 5 MB which, once stripped of its symbols, goes down to 1.9 MB.
Let's check the dependencies:
$ ldd target/release/using-diesel
linux-vdso.so.1 (0x00007ffdc0fab000)
libpq.so.5 => /lib/x86_64-linux-gnu/libpq.so.5 (0x00007f37bd3e7000)
libssl.so.1.1 => /lib/x86_64-linux-gnu/libssl.so.1.1 (0x00007f37bd354000)
libcrypto.so.1.1 => /lib/x86_64-linux-gnu/libcrypto.so.1.1 (0x00007f37bd060000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f37bd046000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f37bd024000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f37bcee0000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f37bced8000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f37bcd13000)
/lib64/ld-linux-x86-64.so.2 (0x00007f37bd660000)
libgssapi_krb5.so.2 => /lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007f37bccc0000)
libldap_r-2.4.so.2 => /lib/x86_64-linux-gnu/libldap_r-2.4.so.2 (0x00007f37bcc6a000)
libkrb5.so.3 => /lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007f37bcb90000)
libk5crypto.so.3 => /lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007f37bcb5e000)
libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007f37bcb58000)
libkrb5support.so.0 => /lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007f37bcb49000)
liblber-2.4.so.2 => /lib/x86_64-linux-gnu/liblber-2.4.so.2 (0x00007f37bcb38000)
libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007f37bcb1e000)
libsasl2.so.2 => /lib/x86_64-linux-gnu/libsasl2.so.2 (0x00007f37bcb01000)
libgnutls.so.30 => /lib/x86_64-linux-gnu/libgnutls.so.30 (0x00007f37bc900000)
libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007f37bc8f9000)
libp11-kit.so.0 => /lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007f37bc7c5000)
libidn2.so.0 => /lib/x86_64-linux-gnu/libidn2.so.0 (0x00007f37bc7a4000)
libunistring.so.2 => /lib/x86_64-linux-gnu/libunistring.so.2 (0x00007f37bc622000)
libtasn1.so.6 => /lib/x86_64-linux-gnu/libtasn1.so.6 (0x00007f37bc60a000)
libnettle.so.8 => /lib/x86_64-linux-gnu/libnettle.so.8 (0x00007f37bc5c2000)
libhogweed.so.6 => /lib/x86_64-linux-gnu/libhogweed.so.6 (0x00007f37bc579000)
libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f37bc4f8000)
libffi.so.7 => /lib/x86_64-linux-gnu/libffi.so.7 (0x00007f37bc4ec000)
Whoa, quite a few dependencies in there. However, we notice that they are all satisfied (there is no "not found" anywhere in that list).
Let's also run readelf. This should give a clearer view of which shared libraries our binary directly depends on:
$ readelf -d target/release/using-diesel | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libpq.so.5]
0x0000000000000001 (NEEDED) Shared library: [libssl.so.1.1]
0x0000000000000001 (NEEDED) Shared library: [libcrypto.so.1.1]
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [libdl.so.2]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x0000000000000001 (NEEDED) Shared library: [ld-linux-x86-64.so.2]
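If you want to run this check from a script or a test, the NEEDED entries can be pulled out of the readelf text with a few lines of Rust. This is a minimal sketch that assumes the output format shown above. `needed_libraries` is a name made up for this example, and it parses the text printed by readelf rather than the ELF file itself.

```rust
/// Extract the shared library names from the text printed by
/// `readelf -d <binary>`, keeping only the `(NEEDED)` entries.
fn needed_libraries(readelf_output: &str) -> Vec<String> {
    readelf_output
        .lines()
        .filter(|line| line.contains("(NEEDED)"))
        .filter_map(|line| {
            // The library name sits between the square brackets.
            let start = line.find('[')? + 1;
            let end = line.rfind(']')?;
            if start <= end {
                Some(line[start..end].to_string())
            } else {
                None
            }
        })
        .collect()
}
```

Feeding it the output above would yield the nine library names in the same order. For a binary with no dynamic section it returns an empty list.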
Great! Now we have our application, time to deploy it! I'll simulate a remote server with a Debian Buster virtual machine (the stable Debian distribution at the time of writing).
$ scp -C target/release/using-diesel vm-debian10:
$ ssh vm-debian10
$ ldd ./using-diesel
./using-diesel: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found
(required by ./using-diesel)
linux-vdso.so.1 (0x00007ffe41582000)
libpq.so.5 => not found
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f00c8c9f000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f00c8c7e000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f00c8afb000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f00c8af6000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f00c8935000)
/lib64/ld-linux-x86-64.so.2 (0x00007f00c8ea2000)
Sigh, we have a couple of problems here. We check the shared dependencies again with ldd and we discover that the Postgres library is missing ("libpq.so.5 => not found") but, more importantly, that the server has an older version of GLIBC than the one the binary was linked against! This is a huge pain in the ass: I certainly cannot upgrade it.
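Both failures can be spotted mechanically from the ldd output. Here is a hedged sketch: the function names are made up for this example, the parsing assumes the ldd format shown above, and the host glibc version used in the usage note is an assumption of mine, not something ldd printed.

```rust
/// Libraries that `ldd` reported as `=> not found`.
fn missing_libraries(ldd_output: &str) -> Vec<String> {
    ldd_output
        .lines()
        .filter_map(|line| {
            let (name, rest) = line.split_once("=>")?;
            if rest.trim() == "not found" {
                Some(name.trim().to_string())
            } else {
                None
            }
        })
        .collect()
}

/// Parse a symbol version tag such as `GLIBC_2.29` into `(2, 29)`.
/// Tags with three components (e.g. `GLIBC_2.2.5`) are not handled
/// by this sketch and yield `None`.
fn parse_glibc_version(tag: &str) -> Option<(u32, u32)> {
    let (major, minor) = tag.strip_prefix("GLIBC_")?.split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}

/// True when the host glibc is at least as new as the version required
/// by the binary. Tuples compare lexicographically: major first, then minor.
fn glibc_satisfies(host: (u32, u32), required: (u32, u32)) -> bool {
    host >= required
}
```

For example, assuming the server ships glibc 2.28, `glibc_satisfies((2, 28), (2, 29))` is false, which matches the "version GLIBC_2.29 not found" error above.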
§ Let's build a static binary
This time we will use the musl toolchain to statically link all dependencies and bake them into our binary. The musl toolchain can be installed on your workstation and live side-by-side with the gcc one, but since I am lazy and don't want to pollute my workstation, I will use a Docker container for that.
Namely I will download the image of this nice project rust-musl-builder because it already has all the dependencies I need (postgres and openssl). If more dependencies are needed, one can always rebuild the image adding what is needed.
As per the project instructions I only need to run this:
$ alias rust-musl-builder='docker run --rm -it -v "$(pwd)":/home/rust/src ekidd/rust-musl-builder'
$ rust-musl-builder cargo build --release
Let's inspect the resulting binary (notice that it is now under x86_64-unknown-linux-musl):
$ du -h target/x86_64-unknown-linux-musl/release/using-diesel
8.8M target/x86_64-unknown-linux-musl/release/using-diesel
$ strip target/x86_64-unknown-linux-musl/release/using-diesel
$ du -h target/x86_64-unknown-linux-musl/release/using-diesel
5.0M target/x86_64-unknown-linux-musl/release/using-diesel
$ ldd target/x86_64-unknown-linux-musl/release/using-diesel
not a dynamic executable
$ readelf -d target/x86_64-unknown-linux-musl/release/using-diesel
There is no dynamic section in this file.
The binary is definitely fat (the stripped size went from 1.9 MB to 5.0 MB) but it's a completely independent ELF 64-bit LSB executable. Let's upload it:
$ scp -C target/x86_64-unknown-linux-musl/release/using-diesel vm-debian10:using-diesel.musl
$ ssh vm-debian10
$ ./using-diesel.musl
Hello, world!
No DATABASE_URL set, so doing nothing
yay, it works! And I didn't need to install any additional package on the server.
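The ldd and readelf checks above can also be approximated in Rust by looking at the ELF program headers directly. This is a rough sketch under stated assumptions: it only handles 64-bit little-endian ELF files, `DynamicInfo` and `inspect_elf` are names made up for this example, and it is not a replacement for the real tools.

```rust
use std::convert::TryInto;

/// Program header types from the ELF specification.
const PT_DYNAMIC: u32 = 2;
const PT_INTERP: u32 = 3;

#[derive(Debug, PartialEq)]
struct DynamicInfo {
    /// The file asks for a dynamic loader (e.g. ld-linux-x86-64.so.2).
    has_interp: bool,
    /// The file carries a dynamic section (what `readelf -d` prints).
    has_dynamic: bool,
}

fn read_u16(b: &[u8], at: usize) -> Option<u16> {
    Some(u16::from_le_bytes(b.get(at..at.checked_add(2)?)?.try_into().ok()?))
}

fn read_u32(b: &[u8], at: usize) -> Option<u32> {
    Some(u32::from_le_bytes(b.get(at..at.checked_add(4)?)?.try_into().ok()?))
}

fn read_u64(b: &[u8], at: usize) -> Option<u64> {
    Some(u64::from_le_bytes(b.get(at..at.checked_add(8)?)?.try_into().ok()?))
}

/// Scan the program headers of an ELF64 little-endian file.
/// Returns `None` if the bytes are not such a file or are truncated.
fn inspect_elf(elf: &[u8]) -> Option<DynamicInfo> {
    // Magic number, then EI_CLASS == 2 (64-bit) and EI_DATA == 1 (little-endian).
    if !elf.starts_with(b"\x7fELF") || *elf.get(4)? != 2 || *elf.get(5)? != 1 {
        return None;
    }
    let phoff: usize = read_u64(elf, 0x20)?.try_into().ok()?; // e_phoff
    let phentsize = read_u16(elf, 0x36)? as usize; // e_phentsize
    let phnum = read_u16(elf, 0x38)? as usize; // e_phnum

    let mut info = DynamicInfo { has_interp: false, has_dynamic: false };
    for i in 0..phnum {
        // p_type is the first 4-byte field of every program header.
        let at = phoff.checked_add(i.checked_mul(phentsize)?)?;
        match read_u32(elf, at)? {
            PT_INTERP => info.has_interp = true,
            PT_DYNAMIC => info.has_dynamic = true,
            _ => {}
        }
    }
    Some(info)
}
```

Reading the file with `std::fs::read` and passing the bytes to `inspect_elf` should report both fields as false for a fully static binary like the one built here. I would expect both to be true for the dynamically linked build from the previous section.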
§ The point of static linking
At this point you might be asking whether you should link statically or not. The answer is "it depends". I kept this part for last because it is contentious: there are various opinions in favor and against. Recently Linus Torvalds also expressed an opinion against dynamic linking, and the usual Hacker News crowd provided comments that don't help to get the general picture, because...
Static linking is not a cure-all for portability. Here are some things to consider when all dependencies are baked into your binary:
- You are pulling the rug from under the feet of distro maintainers. They can no longer fix a CVE (a vulnerability) affecting one or more libraries used by your application. You are now personally responsible for rebuilding your application and its dependencies.
- On the other hand, you are also shielded from issues affecting a library installed on the OS (for example, I need to link to the MySQL client, which I don't use, but then I'm also affected by a vulnerability affecting said library).
- There is a point about statically linked binaries taking more space and resources. You may have some libraries in memory twice, for example the OpenSSL installed in the OS and the one embedded in your binary. Though how much of a problem this is in 2021 is up for debate.
More interesting opinions come from a Gentoo maintainer, who links to the Gentoo wiki. Not everyone agrees on these points, though.
All this is to say that static linking should be carefully evaluated. There are lots of arguments in favor and against and I'm certainly not going to lay out all of them here, but I hope these references will help you make a decision, regardless of how authoritative the source is.
§ The future of musl in Rust
Lately there has been an ongoing discussion in the Rust core team about the musl target. An RFC has been opened and is gathering comments, and it seems that it will be pushed forward.
The gist of the proposal (to the best of my understanding) is to fix the assumption that "musl == static linking". This means that the default for the musl target will become dynamic linking, and the user can optionally decide to link statically.
The Rust team would like to correct the misconception, which Rust itself helped create by getting this wrong from the beginning, about how musl is meant to be used. A musl maintainer explains the musl POV, which I agree with.
However, right or wrong, people do associate musl with static linking, and if the default target switches to dynamic linking, all hell breaks loose. CIs and Docker containers relying on this assumption, such as the rust-musl I mentioned before, will stop working.
There must be a slow, well-thought-out and clear migration path for this change, in my opinion even tied to an edition-grade upgrade. Later this year Rust will release the "Edition 2021" (the next opt-in language edition), and that could be a good time to set a "watershed line" for this to happen.