Hi,
How can I create a static rust binary (without glibc) like for a scratch image? I can only get distroless or a slim debian to work so far.
You'll need to statically link to libc. The MUSL-based targets will do that.
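Roughly, a minimal sketch of what that looks like outside of Docker (assuming an x86_64 Linux host with musl-tools installed; swap in the aarch64 musl target if you build for ARM):
rustup target add x86_64-unknown-linux-musl
RUSTFLAGS='-C target-feature=+crt-static' cargo build --release --target x86_64-unknown-linux-musl
# ldd on the result should report "statically linked" / "not a dynamic executable"
ldd target/x86_64-unknown-linux-musl/release/<your-binary>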
But how? As far as I understand it, installing the musl library, setting
ENV CC=musl-gcc
ENV RUSTFLAGS='-C target-feature=+crt-static'
and specifying the target should be enough - but it does not work.
FROM clux/muslrust:1.78.0-stable AS chef
USER root
RUN cargo install cargo-chef
WORKDIR /app
RUN apt-get update && apt-get install lld clang musl-dev musl-tools -y
FROM chef as planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json
FROM chef as builder
COPY --from=planner /app/recipe.json recipe.json
RUN rustup target add aarch64-unknown-linux-musl
RUN cargo chef cook --release --target aarch64-unknown-linux-musl --recipe-path recipe.json
COPY . .
ENV SQLX_OFFLINE true
ENV CC=musl-gcc
ENV RUSTFLAGS='-C target-feature=+crt-static'
RUN cargo build --frozen --release --bin zero2prod --target aarch64-unknown-linux-musl
FROM scratch AS runtime
USER 1000
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /app/target/release/zero2prod /zero2prod
COPY configuration /configuration
ENV APP_ENVIRONMENT production
ENV RUST_LOG="error,$BINARY_NAME=info"
ENTRYPOINT ["/zero2prod"]
This line was missing:
COPY --from=builder /app/target/aarch64-unknown-linux-musl/release/zero2prod /zero2prod
whereas I was using:
COPY --from=builder /app/target/release/zero2prod /zero2prod
Alright,
a few considerations about MUSL & static linking.
The standard way you are describing works fine for single-threaded applications that are not performance sensitive. However, the moment you use anything async, concurrent, or even remotely performance sensitive, you will measure between a 10x and 40x drop in performance across all metrics - number of requests, latency, you name it.
The TWEAG Blog explains the details quite well. Also, I added more links to background information and benchmarks in the repo linked at the end of this post:
SUPERCHARGING YOUR RUST STATIC EXECUTABLES WITH MIMALLOC
The root cause isn't MUSL per se, but the default memory allocator used by MUSL, which suffers severely from thread contention. The new mallocng allocator doesn't solve this yet, unfortunately.
What can you do to have a single static binary in a scratch image while still preserving good performance?
In a nutshell, you have to patch the memory allocator in MUSL. I know this sounds crazy, but I actually did that and stuffed everything into an easy-to-use multi-arch build container.
The way you use this is very similar to any other build container:
1. Build the build container and publish it in the registry of your choice. I've added an example for GCP Cloud Build and a GH Action for that.
2. Use that build container to build your Rust code as a single static binary.
3. Stuff this single static binary into a scratch image and you're done.
This is identical to any other multi-stage build, with the exception that the patched memory allocator makes sure your static binary runs as fast as, and in some cases even faster than, the glibc-linked version.
The thing with scratch images is, you have to add SSL certificates and a user / group manually to ensure you can handle secure networking and run in unprivileged mode. Again, I made a simple example with a Dockerfile for that.
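Roughly, the relevant runtime stage could look like this (a sketch, not the exact Dockerfile from my repo; the target path and UID are illustrative):
FROM scratch AS runtime
# CA bundle so TLS certificate verification works inside the scratch image
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
# minimal passwd / group entries so the container can run unprivileged
COPY --from=builder /etc/passwd /etc/passwd
COPY --from=builder /etc/group /etc/group
USER 1000:1000
COPY --from=builder /app/target/aarch64-unknown-linux-musl/release/zero2prod /zero2prod
ENTRYPOINT ["/zero2prod"]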
Hope that helps.
Links:
Build container with patched memory allocator. Feel free to fork & customize:
Rust scratch image with SSL certificates and unprivileged user.
Sample Dockerfile to build a static binary with patched memory allocator and stuff it into a custom scratch image with SSL certs.
Thanks for the great answer!
This brings me to some follow-up question though.
# Strip the binary to reduce its size
RUN strip -s /cmdb
Isn't this no longer required when setting the following in Cargo's release profile:
[profile.release]
strip = true
Furthermore: How to support multi-arch neatly?
You are executing:
RUN cargo build -p cmdb --release --target aarch64-unknown-linux-musl
whereas I would be required to run:
RUN cargo chef cook --release --target x86_64-unknown-linux-musl --recipe-path recipe.json
RUN cargo build --frozen --release --target x86_64-unknown-linux-musl
COPY --from=builder /app/target/x86_64-unknown-linux-musl/release/zero2prod /zero2prod
RUN cargo chef cook --release --target aarch64-unknown-linux-musl --recipe-path recipe.json
RUN cargo build --frozen --release --target aarch64-unknown-linux-musl --bin zero2prod
COPY --from=builder /app/target/aarch64-unknown-linux-musl/release/zero2prod /zero2prod
This branching logic is rather inconvenient, especially if other targets such as Windows, macOS, or embedded are also needed.
What is a good way of structuring this to support multiple targets? Hand-coding a multi-step Dockerfile for each target seems rather inefficient, but loops inside the Dockerfile are also not really practical. How would this look in, for example, GitHub Actions?
What I mean is: how would this fit into such a workflow, which previously neatly produced a multi-arch image for each platform:
- name: Build and push Docker image
uses: docker/build-push-action@v5.0.0
with:
context: .
file: ./Dockerfile
platforms: linux/amd64,linux/arm64
I would assume that the initial stage would be required to run without QEMU - otherwise the build process is painfully slow. Then the cross-compiled binary must be copied into a platform-specific scratch image. However, I cannot yet determine how this would work in a loop/GHA matrix build (such as Cross Compiling Rust Projects in GitHub Actions - House Absolute(ly Pointless)).
Oh man, @geoheil, you just opened the worst possible can of worms on planet GitHub: Multi-arch Docker builds.
True, the strip isn't needed when you set the strip cargo flag. However, a lot of folks don't configure strip in cargo, so that's why it's there. As I said, feel free to fork and customize according to your requirements.
Now, to the elephant: Multi-Arch in Docker.
In short, it's broken beyond repair.
Also, full disclosure: I meanwhile build all my images without Docker using rules_oci in Bazel, so the files & repos I shared are all from my legacy build system that was decommissioned about 3 months ago.
Multi-Arch; I don't even know where to start... So let's go back to the very beginning.
The actual problem is, as you correctly pointed out, QEMU. The moment you build an ARM image on an Intel box, your 10-minute build becomes 2-3 hours, if it even finishes. It's bad beyond words.
A) The official way with the GH build matrix, yada yada. I did this only once and the build time exploded to multiple hours. I suppose the Docker company is quite desperate to upsell its pointless cloud build.
B) Build ARM native on ARM boxes.
This one is more reasonable. Basically, you take the template I've shared, replace x86 with your ARM architecture, and run the entire build on a native ARM box. There are a small handful of Docker build companies that officially offer ARM boxes for native ARM builds. It costs a bit of money, but it's money well spent compared to the official solution, which is just borderline stupid.
That is more or less the solution I was referring to when I said the repo supports multi-arch "nicely". I apologize, I should have made this explicit and elaborated in more detail.
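To make option B concrete for your GitHub Actions question, a hedged sketch (runner labels, registry name, and tags are illustrative; registry login and buildx setup steps are omitted):
jobs:
  build:
    strategy:
      matrix:
        include:
          - runner: ubuntu-latest      # amd64
            arch: amd64
          - runner: ubuntu-24.04-arm   # a native arm64 runner, hosted or self-hosted
            arch: arm64
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
      - name: Build and push the per-arch image natively (no QEMU)
        uses: docker/build-push-action@v5.0.0
        with:
          context: .
          file: ./Dockerfile
          platforms: linux/${{ matrix.arch }}
          push: true
          tags: my-registry/zero2prod:${{ matrix.arch }}
  manifest:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - name: Stitch the per-arch tags into one multi-arch tag
        run: docker buildx imagetools create -t my-registry/zero2prod:latest my-registry/zero2prod:amd64 my-registry/zero2prod:arm64
Inside the Dockerfile, the automatic TARGETARCH build arg (declare it with ARG TARGETARCH) lets you pick the matching musl target, so you don't have to hand-write one Dockerfile per architecture.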
No matter how you turn it, the more images you build, the more the ancient architecture, broken caching, and generally broken design of Docker become self-evident.
Because all these things are broken, Docker always falls back to rebuilding the entire container. There are only very few corner cases where incremental container image builds actually work, and you'd better thank God every day when it does work for you.
There was a time when builds were flaky by default and OS dependencies were prevalent, so containerizing the entire thing was the way to go, and Docker solves this problem quite well to this day. I think you still see this kind of operation in larger enterprises, meaning Docker has a reason for existence for enterprises and everyone else who can't escape those legacy builds.
In the current day and age, with contemporary languages such as Go, Rust, Zig or similar, the only question you have to answer is whether you really need a standard C library (glibc). More often than not, the answer is no. There are still perfectly valid cases where you need a C library, for example everything related to files or databases, and then you just use a minimal distroless or minimal Chainguard image. Due to various unpatched glibc security issues in distroless, I actually switched to the more secure Chainguard base image for those cases where a C library is needed. Anyway, I digress.
Given that a Docker / OCI image is just a bunch of tar files lumped together with some metadata files, it's incomprehensible to me why anyone would use the ancient Docker engine to build them. Obviously, if you just build one or two images, the discussion is moot. However, if you build containers in bulk, this is a really thorny issue that slows down your entire CI pipeline.
Because my repo was growing at a staggering rate month over month, I switched to Bazel to speed up builds. Cargo is an excellent tool and I still use it in parallel to Bazel, mostly to lean on its code and dependency analysis tools, but at some point you just hit a wall with Cargo. That's normal when projects grow, and there is a good discussion in this forum related to scaling Rust repos.
The transition to Bazel was driven only by rapid code growth. Otherwise, I would still use Cargo as my primary build tool to this day.
Anyway, with Bazel there are two ways to build containers: one is the legacy rules_docker, which requires a Docker installation and comes with all the aforementioned problems, the other is the newer rules_oci, which is Docker-free. Obviously, I opted for the second option. Because rules_oci leans heavily on Bazel's caching and parallel execution, you only build containers where the code actually changed, and you build them all in parallel. Also, you can offload the entire build to a cloud build farm, which I do with BuildBuddy.
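As a rough illustration (not my exact setup; load paths and the base image name depend on your workspace configuration), a Docker-free image target with rules_rust + rules_oci can be as small as this:
load("@rules_rust//rust:defs.bzl", "rust_binary")
load("@rules_pkg//pkg:tar.bzl", "pkg_tar")
load("@rules_oci//oci:defs.bzl", "oci_image")

rust_binary(
    name = "zero2prod",
    srcs = ["src/main.rs"],
    deps = [],  # external crates declared in the workspace
)

# Wrap the binary in a layer and stuff it into an OCI image - no Docker daemon involved.
pkg_tar(
    name = "app_layer",
    srcs = [":zero2prod"],
)

oci_image(
    name = "image",
    base = "@distroless_cc",  # assumption: pulled elsewhere with oci_pull
    entrypoint = ["/zero2prod"],
    tars = [":app_layer"],
)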
You can build either static Rust binaries with MUSL as you described initially, or "normal" Rust binaries that run on a distroless / Chainguard base image with glibc. The patched allocator variation for static binaries is theoretically possible, but it's really hard to configure with Bazel. Multi-stage Docker builds are futile with Bazel because they would nullify Bazel's caching and parallel execution.
Fundamentally, the architecture of the build box must be the same as the production / target box to prevent needless cross compilation. Cross compilation actually works really well in Bazel; I've configured it for remote build execution because my dev box is an M3 MacBook (ARM) whereas the RBE build server is AMD, with the target box also being AMD. The good thing about BuildBuddy is, if you really need ARM builds, you just add another worker node with ARM and configure BuildBuddy to run all ARM builds on only that node. I never did this, but the documentation is quite clear that you can.
Long story short: previously, a single Docker build for a single container took between 10 and 12 minutes. That's actually okay for my taste, except I have so many of those builds. After the migration to Bazel & rules_oci, all container builds complete within 2-3 minutes, no matter how many containers are built, thanks to the BuildBuddy cloud. Their free tier gives you up to 80 CPU cores, so that's a lot to burn through.
What does it all mean?
Thank you! Really interesting.
Are there some good (hopefully simple) end-to-end examples showcasing how a cross build with Bazel and rules_oci would look?
@geoheil Sure
rules_oci is easy.
You have to pay more attention to rules_rust:
The docs are generally good:
An E2E walkthrough of how to build Rust with Bazel:
In a nutshell, there are two ways to handle Rust dependencies in Bazel:
A) Generate Bazel deps from Cargo.toml.
That is used in the Tweag Example.
The problem is, whenever your Cargo.toml changes, you trigger a full rebuild. Don't do this.
B) Native Bazel dependencies ("Direct Packages"):
This way, you only trigger a rebuild when changing the Rust version; otherwise, everything gets cached and incrementally compiled.
This basically means you convert the deps from Cargo.toml into deps declared in the Bazel WORKSPACE.
See the docs:
See this example:
If you go with direct packages declared in the WORKSPACE from day one, you save yourself a lot of pain and trouble.
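To make B) concrete, a hedged sketch of direct packages with crate_universe in the WORKSPACE (crate names, versions, and attribute details are examples; check the rules_rust docs for your version):
load("@rules_rust//crate_universe:defs.bzl", "crate", "crates_repository")

crates_repository(
    name = "crate_index",
    cargo_lockfile = "//:Cargo.Bazel.lock",
    packages = {
        "serde": crate.spec(version = "1.0", features = ["derive"]),
        "tokio": crate.spec(version = "1", features = ["macros", "rt-multi-thread"]),
    },
)
# A BUILD file then depends on e.g. "@crate_index//:tokio" - no Cargo.toml involved.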
You sent me down a rabbit hole with this information and the links. Thank you!
I have found the GitHub repo marvin-hansen/fluvio-examples (Fluvio DeepCausality Example Project) - many thanks for this great example!
Is there anything remotely similar available for a Bazel-based setup?
Also:
You can build either static Rust binaries with MUSL as you described initially, or "normal" Rust binaries that run on a distroless / Chainguard base image with glibc. The patched allocator variation for static binaries is theoretically possible, but it's really hard to configure with Bazel.
How would this be handled? Or is it not recommended to patch? Or should one use the patched build image as outlined above and then feed it into Bazel (which would seem strange)?
In the article you linked (Building a Rust workspace with Bazel) they write: "Cargo makes it easy to specify the dependencies and build your Rust project on top of them. In Bazel we also need to explicitly declare all the external dependencies and it would be extremely painful to manually write BUILD files for every Rust crate our project depends on."
However, you recommend:
B) Native Bazel dependencies ("Direct Packages"):
Is there a good way to set this up (less painful)?
Ok, a few things.
The article was referring to an older Bazel version and almost certainly an old rules_rust version. In the meantime, whatever configuration you choose, rules_rust generates the bulk of those Bazel files for you using some clever macros.
I have quite some experience with Bazel from previous Golang projects, so for me the transition, the less structured documentation, and the missing bits you only find in GH issues weren't an obstacle.
For now, the Tweag post is as good as it gets, so please work your way through bazelifying a single crate and then switch to direct dependencies. You will see that nothing bad happens and, more importantly, that adding or removing external crates no longer triggers the entire bootstrapping and recompile madness. That really kills your CI more than anything else.
Let me come back in one or two days once I've returned to the office, as I'm currently traveling in Thailand.
Not yet. Sorry, I don't have the time right now to add an example repo.
I was thinking about this issue over the past two days and did some online research. Here is my takeaway:
The alloc patch might not be needed at all, because Rust allows swapping the memory allocator in and out as long as the allocator implements the GlobalAlloc trait from std. I found at least four different allocators with such Rust implementations on GH. The way this works is: you add another crate to your Cargo.toml or Bazel WORKSPACE, override the global allocator in your main or lib file, and recompile. However, you do need a C compiler on the host and CI machine to compile the allocator dependency of the crate. This should work with standard Cargo, Docker, and Bazel builds regardless of the target, meaning you can cross compile to MUSL and get a custom allocator without patching anything.
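For illustration, the allocator swap itself is tiny; a sketch assuming the mimalloc crate (jemalloc and others work the same way through the GlobalAlloc mechanism):
// Cargo.toml (or the Bazel workspace) gains one dependency, e.g. mimalloc = "0.1".
use mimalloc::MiMalloc;

// Route every heap allocation in the binary through mimalloc instead of the
// target libc's default allocator; this also applies to static MUSL builds.
#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;

fn main() {
    let v: Vec<u64> = (0..1_000).collect(); // allocates via mimalloc
    println!("allocated {} elements", v.len());
}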
The big idea is discussed in the GH issue of the MUSL build image:
Why not use mimalloc?
That said, I have to run some tests and benchmarks to confirm that this indeed works for MUSL static binaries. It's funny because the same conclusion was drawn in the issue linked above, so it's worth sharing some results.
Also, it's important to add that once you run fully static binaries, you cannot access files (use the network or a DB instead) and, most importantly, default DNS support from the std lib is gone. Default DNS is gone because, for legacy reasons, the default OS DNS resolver requires a hosts file, which you cannot access because, well, files are not supported in statically linked binaries. This might not be the full truth in all details, but that's roughly what I've encountered in practice. In Golang, I solved this with an external Go module, which was then configured to use the internal (k8s) DNS server via an env variable. In Rust, I basically do the same thing.
The topic of static binaries with the same or similar performance as normal binaries is very close to my internal security requirement to eliminate glibc in production as much as possible, meaning once I've figured this out, I'll share the demo repo. Hard to tell how long this will take because I have some other things on my plate, but it's on my list.
Here you go. Feel free to fork this repo.
It's my playground to investigate building static Rust binaries with different memory allocators with Bazel. Currently, it just builds a simple dynamically linked binary stored in a distroless image, but let's see how far custom static binaries can be built with Bazel.
Wow, thanks! I will explore it very soon.