Treeshaking wasm?

I have a Rust/wasm32 binary output of size 5.3M.

This is not too bad compared to modern JS SPA web apps. However, it is a bit surprising.

Besides modifying Cargo.toml to optimize for size, is there anything else I can do to reduce the size? [Perhaps tree shaking unused code?]

Thanks!

There are many other ways to minimize the size of Rust binaries in general (enabling LTO, etc.). See this repository for a comprehensive and up-to-date list of techniques: GitHub - johnthagen/min-sized-rust: 🦀 How to minimize Rust binary size 📦
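As a rough sketch of what "enabling LTO, etc." usually means in practice, a size-oriented release profile might look like the fragment below. These are standard Cargo profile keys, but the particular combination is only a starting point and an assumption about what suits your project; measure the output after each change.

```toml
# Cargo.toml: size-oriented release profile (a sketch, not a definitive setup)
[profile.release]
opt-level = "z"     # optimize for size; "s" is sometimes smaller, so try both
lto = true          # link-time optimization across all crates
codegen-units = 1   # fewer codegen units give the optimizer more to work with
panic = "abort"     # drop the unwinding machinery
strip = true        # strip symbols from the final artifact
```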

For WASM specifically, you may check the Rust and WebAssembly Book: Shrinking .wasm Size - Rust and WebAssembly
This one hasn't been updated since 2021, however. I especially recommend that you not use wee_alloc as it suggests: the crate is no longer maintained, has an unbounded memory leak bug, and possibly has other bad things lurking in it (you can check the issue tracker for more). An alternative is lol_alloc, but keep in mind that it isn't considered "production-grade" either; use it at your own risk.

3 Likes

“Tree shaking” isn't a step you can take here because that's a concept applying to programs distributed in a theoretically editable form (originally Lisp memory images, then JavaScript source code), where you have to choose to discard unused things because the easiest path is to not perform the analysis and keep all of them.

With compiled programs, the linker or the compiler will automatically leave out unused items (unless they are marked as to be kept regardless, such as with #[used] or #[no_mangle]).
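To make that concrete, here is a minimal sketch (all names are made up for illustration). The first function is never referenced, so the toolchain is free to drop it; the static is marked #[used], which asks the compiler to emit it even though nothing reads it. #[no_mangle] on an exported function has a similar keep-alive effect (it is spelled #[unsafe(no_mangle)] in the 2024 edition), so it is left out here to keep the example edition-independent.

```rust
// Hypothetical names throughout; this only illustrates reachability.

// Never referenced from anything reachable, so it is eligible for removal
// from the final binary.
#[allow(dead_code)]
fn never_called() -> u32 {
    1
}

// #[used] asks the compiler to keep this static in the object file even
// though nothing in the program reads it.
#[used]
static BUILD_TAG: [u8; 4] = *b"v1.0";

// Reachable from main, so it (or its inlined body) has to stay.
fn reachable() -> u32 {
    2
}

fn main() {
    println!("{}", reachable());
}
```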

3 Likes

Is there a "size profiler" of sorts? To output things like

crate foo; file bar.rs is responsible for 100kB of final output size

etc ...

Yes, twiggy (I still recommend you read the resources I posted above, twiggy was mentioned there)

1 Like

wasm-opt, either the CLI program or the Rust library, does shrink Rust-generated WASM most(?) of the time.
But if you didn't get to 5MB because of a debug build, the result will still be unusably big.

1 Like

This is a release build. :frowning:

Sounds like you need to look at your dependency tree and start looking at what you do or don't need. The best way to make your binary smaller is to just not include as much code.

Does anything here look particularly offensive?

anyhow = "1.0.51"
anymap = "0.12.1"
async-channel = "1.9.0"
async-mutex = "1.4.0"
base64 = "0.13.0"
byteorder = "1.4.3"
console_error_panic_hook = "0.1.7"
field-offset = "0.3.4"
flate2 = "1.0.23"
futures = "0.3.5"
futures-lite = "1.12.0"
futures-signals = { version = "0.3.20", default-features = false, features = [] }
getrandom = { version = "0.2", features = ["js"] }
image = { version = "0.24.7", default-features = false, features = ["png", "jpeg"] }
instant = { version = "0.1.11", features = ["wasm-bindgen"] }
js-sys = "0.3.58"
json = "0.12.4"
jsonrpc-core = "18.0.0"
lazy_static = "1.4.0"
lz4_flex = { version = "0.9.3" }
memoffset = "0.8.0"
nanoserde = "0.1.33"
nanoserde-derive = "0.1.20"
once_cell = "1.7.2"
regex = "1.5.4"
smallvec = "1.9.0"
tar = "0.4.38"
wasm-bindgen = "^0.2.81"
wasm-bindgen-futures = "0.4.31"
wgpu = { version = "0.16.1", features = ["webgl"] }

web-sys = { version = "^0.3.61", features = [
    "AbortController",
    "AbortSignal",
    "AudioContext",
    "AudioDestinationNode",
    "AudioNode",
    "AudioParam",
    "BinaryType",
    "Blob",
    "BlobPropertyBag",
    "CanvasRenderingContext2d",
    "console",
    "CssStyleDeclaration",
    "DedicatedWorkerGlobalScope",
    "Document",
    "DragEvent",
    "DynamicsCompressorNode",
    "DynamicsCompressorOptions",
    "Element",
    "ErrorEvent",
    "Event",
    "EventTarget",
    "FileReader",
    "GainNode",
    "Headers",
    "HtmlBodyElement",
    "HtmlButtonElement",
    "HtmlCanvasElement",
    "HtmlDivElement",
    "HtmlIFrameElement",
    "HtmlPreElement",
    "HtmlSelectElement",
    "HtmlSpanElement",
    "HtmlStyleElement",
    "ImageData",
    "IirFilterNode",
    "KeyboardEvent",
    "Location",
    "MessageChannel",
    "MessageEvent",
    "MessagePort",
    "MouseEvent",
    "Navigator",
    "NodeList",
    "OscillatorNode",
    "OscillatorType",
    "PannerNode",
    "PeriodicWave",
    "ProgressEvent",
    "Request",
    "RequestInit",
    "RequestMode",
    "Response",
    "ScriptProcessorNode",
    "StereoPannerNode",
    "Storage",
    "UiEvent",
    "Url",
    "WaveShaperNode",
    "WebGl2RenderingContext",
    "WebGlBuffer",
    "WebglDrawBuffers",
    "WebGlFramebuffer",
    "WebGlProgram",
    "WebGlRenderbuffer",
    "WebGlShader",
    "WebGlTexture",
    "WebGlUniformLocation",
    "WebSocket",
    "WheelEvent",
    "Window",
    "Worker",
    "WorkerOptions",
    "WorkerType"
]}



[target.'cfg(target_arch = "wasm32")'.dependencies]
winit = { version = "0.28.6", default-features = false }

[still figuring out how to read twiggy wasm output]

Dead code would be caught by wasm-opt. But it shouldn't have been included by the LLVM codegen to begin with.

cargo build: ~9 MB
after wasm-bindgen CLI tools: ~5 MB
after wasm-opt: ~3 MB

Yep. What I'm suggesting is to remove existing code and replace it with something simpler.

Just because it's not detected as dead by the optimiser doesn't mean it isn't overkill for what you need. The dead code analysis passes also aren't perfect: for example, if I instantiate a trait object but only ever use one method, the other methods will still be marked as used because they need to be put in the vtable. You also have code paths which are logically impossible but which the compiler can't prove are unreachable, though that's more of a micro-optimisation.
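A minimal sketch of the vtable case, with made-up types: only area() is ever called through the trait object, but describe() (and the formatting code it pulls in) is referenced from Square's vtable, so it will typically survive dead code elimination.

```rust
// Hypothetical trait and type, for illustration only.
trait Shape {
    fn area(&self) -> f64;
    // Never called anywhere below, but every `dyn Shape` vtable has a slot
    // for it, so the implementation counts as "used".
    fn describe(&self) -> String;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }

    fn describe(&self) -> String {
        format!("square with side {}", self.0)
    }
}

// Dynamic dispatch: the only method reached through the vtable is area().
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> = vec![Box::new(Square(2.0)), Box::new(Square(3.0))];
    println!("total area: {}", total_area(&shapes));
}
```

If the set of types is small and known up front, an enum or a generic function avoids the vtable and gives the optimiser a chance to drop describe() entirely.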

5mb of WebAssembly is pretty chunky, and if you want to cut that down you'll need to start making architecture changes and cutting dependencies.

It might be better to look at the output from cargo tree because transitive dependencies are a prime candidate for bloat. For example, if jsonrpc-core pulls in tokio, that'll add up quickly.

Similarly, look out for duplicate dependencies, both in versions and functionality.

For example, I can practically guarantee you've already got serde_json in your dependency tree somewhere, so using the json crate means you'll be including two JSON parsers in your project. Similarly, you'll have both nanoserde and serde in your dependency tree, so that's multiple serialisation frameworks.

4 Likes

Yes, but I hesitate to suggest "rewrite it in JavaScript, or, if you really don't want to use JS, in a language that compiles to JS, like Purescript or Rescript. And only after profiling maybe use WASM (not necessarily written in Rust) for the long running, performance critical parts." in a Rust forum :wink:

Is 5MB even unreasonable for ~50k lines of code [not counting dependencies; as counted by cloc] ?

This is what, 100 bytes per ";"? This actually seems within the limits of reasonableness to me, especially factoring in the blowup from monomorphization and external libraries.
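For what it's worth, the monomorphization part of that blowup can sometimes be reduced by hand. A common pattern, sketched here with a made-up function, is to keep the generic signature as a thin shim and move the real work into a non-generic inner function, so the body is compiled once rather than once per caller type:

```rust
use std::path::{Path, PathBuf};

// Hypothetical helper. The outer function is instantiated once per `P`,
// but each copy only does the cheap `as_ref()` conversion.
fn file_name_len<P: AsRef<Path>>(path: P) -> usize {
    // Non-generic, so the actual logic exists only once in the binary.
    fn inner(path: &Path) -> usize {
        path.file_name().map_or(0, |name| name.len())
    }
    inner(path.as_ref())
}

fn main() {
    // Three different argument types, three small shims, one shared body.
    println!("{}", file_name_len("assets/level1.bin"));
    println!("{}", file_name_len(String::from("assets/level2.bin")));
    println!("{}", file_name_len(PathBuf::from("assets/level3.bin")));
}
```

Whether this is worth doing depends on how large the generic bodies are and how many types they are instantiated with; a size profiler is the way to find out.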

The question isn't whether 5MB is unreasonable for a 50k LoC executable (which it is for a size-optimised version without embedded "resources"), but whether you really need that much code for what you are trying to accomplish.
And having megabytes of code (without any other files) is (almost) always way too much for the web. Don't forget that that gigantic WASM blob isn't something like a "native executable": the browser has to download and compile it before it can run. Don't get me wrong, it is nice and interesting as a way to see what is possible using WASM (and Rust), but for "real" use that's not what you want to do.

"People" rightfully complain about the "usual" number of JS's package dependencies, but Rust's package dependency bloat is actually worse by size (maybe not number), it just doesn't matter most of the time, except when it does ;). You have to be really, really careful which crates to use, to keep the number of dependencies small (by size and number).

2 Likes

I'm building a 3D game that runs in the browser. All the code I have written is necessary.

My motivation for optimizing output size is not "how does the end user feel about a 5MB blob" but rather "how much is my CDN going to charge me" :slight_smile:

That's a bit of information that is quite important :wink: For a game to load the user is prepared to wait quite some time (not too much, of course).

2 Likes

You have a few duplications in there:

  • futures and futures-lite provide mostly the same functionality
  • async-channel provides mostly the same functionality as futures-channel, which you might pull in through futures (cargo tree -d is your friend)
  • async-mutex' functionality should also be provided by futures (futures-util)
  • once_cell provides all the functionality of lazy_static and more
  • you are pulling in multiple compression libs, can you avoid that? (flate2, lz4_flex)

You could also try replacing futures-signals with eyeball / eyeball-im (my crates), though I only built them to be easier to use, the artifact size might only be marginally better, or worse than futures-signals.

3 Likes

Also try setting default-features = false for all your dependencies and adding explicit features as needed. In most cases this will have no effect on size since it will just remove some already-dead code (though it would decrease compilation time), but in some cases it will allow the library to simplify its code to handle fewer situations.

1 Like

If I'm reading the output of twiggy correctly, the thing that is killing me is naga from wgpu. However, I already have default-features = false on that crate, and both webgl and wgsl are necessary features. [Why the ???? am I bundling a wgsl -> glsl transpiler?]