How does wasm "string formatting machinery" bloat work?

When reading about optimizing a Rust program for wasm w.r.t. binary size I often see the advice to avoid the "string formatting machinery": format!(), .to_string() and so on.

Now the question I have is: does each instance of format!() add an equal amount of bloat, i.e. having 10 instances of the same format string bloats the app 10x compared to just having one, OR does "the string formatting machinery" amortize the more you use it?

Advice about avoiding string formatting here: Shrinking .wasm Size - Rust and WebAssembly

As far as I know, the formatting machinery in the standard library takes great care to use dyn Trait internally as much as possible, to avoid bloat from monomorphization (which could otherwise generate code for every usage). So it should be more of a “you include all the formatting machinery once, if you use it at all” kind of situation; once you have paid that cost, you can use it as much as you like. Of course, every type you format also contributes at least a little code for its Display implementation. But individual invocations of format! with different format strings only differ in the data structure that describes them, not in new specialized code (beyond the code that builds that data structure).
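To make that concrete, here is a minimal sketch. The Point type and the render helper are made up for illustration; only format_args!, fmt::Arguments, fmt::Display and fmt::Write come from the standard library. Each call site builds a fmt::Arguments value and hands it to the same non-generic function, and the only per-type code is the Display impl.

```rust
use std::fmt::{self, Write};

// Hypothetical type, used only for illustration.
struct Point {
    x: i32,
    y: i32,
}

// Every type that gets formatted contributes its own Display impl.
impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

// Hypothetical non-generic helper: every call site shares this one body,
// whatever format string produced the `fmt::Arguments` value.
fn render(args: fmt::Arguments<'_>) -> String {
    let mut out = String::new();
    out.write_fmt(args).expect("writing to a String does not fail");
    out
}

fn main() {
    let p = Point { x: 1, y: 2 };
    // Two different format strings, one shared `render` function.
    let a = render(format_args!("p = {}", p));
    let b = render(format_args!("{} and {}", p, 42));
    println!("{a}");
    println!("{b}");
}
```

How much this saves in a real wasm binary depends on the rest of the program, so measuring with a size profiler is still the way to find out.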

On the decision to use trait objects in the formatting machinery: this trade-off is presumably worth it, as user-readable output via formatting, especially once you get to printing stuff, should be sufficiently cold code that the possible performance impact of less monomorphization isn’t bad. It can however mean that for simple value-to-string conversions (e.g. just turning integers into String), formatting via format! and friends is slightly less performant than doing it with code specific to this job[1]. Also, if that’s all you use the “formatting machinery” for (i.e. all the code that a call to format! depends on), then that’s probably more bloat (although, as explained above, only once) than if you had some specific integer-to-string conversion.


  1. So, as an extreme example: don’t use format! in a hot loop for logic like counting the number of decimal digits in a number. Use algorithms on integers for that instead; the mod operator is our friend ^^
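A small sketch of what the footnote means. Both function names are made up for illustration, and this version uses integer division rather than the mod operator. The first function stays entirely in integer arithmetic; the second goes through to_string() and therefore through the formatting machinery, plus a heap allocation.

```rust
// Hypothetical helper: count decimal digits with integer arithmetic only.
fn count_decimal_digits(mut n: u64) -> u32 {
    let mut digits = 1; // zero still has one digit
    while n >= 10 {
        n /= 10;
        digits += 1;
    }
    digits
}

// Same result via string formatting, shown only for comparison.
fn count_decimal_digits_via_fmt(n: u64) -> u32 {
    n.to_string().len() as u32
}

fn main() {
    for n in [0u64, 9, 10, 12345, u64::MAX] {
        // Both approaches agree; only the first avoids fmt and allocation.
        assert_eq!(count_decimal_digits(n), count_decimal_digits_via_fmt(n));
    }
    println!("{}", count_decimal_digits(12345));
}
```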


One concern to keep in mind is that even if your own code doesn't use string formatting, some transitive dependency might (especially if the log or tracing crates appear in the tree, which is highly likely). So no matter what you do, you may not be able to get rid of it completely.

And this might not be a big deal, because string formatting is actually pretty small. It shouldn't stand out except for degenerate cases where the WASM binary does basically nothing useful because it has very little code to begin with.

In other words, if string formatting is useful, there isn't really any reason to try to get rid of it. You might instead consider enabling the feature flags in log and tracing that set a maximum logging level at compile time (for example release_max_level_info), which gets rid of a bunch of trace! and debug! logs that you will never look at anyway. These are the kinds of things you should be looking to optimize in applications.


Is it really motivated by performance, though? I think it would be absolutely impossible to implement something like fmt::Arguments using generics, because it needs to refer to a variable number of things to be formatted. Maybe possible if the language had variadic generics, but even in that case, it would probably be very annoying to use.

In my efforts to create a type-safe embedded DSL for SQL-like database operations, I encountered a problem very similar to string formatting, which was the interpolation of various, statically-typed arguments into a SQL query (except that they only generate placeholders in the SQL string and store the serialized value elsewhere). It turned out to be the only place where I simply had to use trait objects as an internal representation, even though the API was still statically typed, because there was no other way to express deferred conversion of an arbitrary number of Rust types into SQL values.

I think the std::fmt mechanism is probably constrained by the same problem.
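A rough sketch of the shape of that problem, not the actual DSL: ToSqlValue, Query and bind are hypothetical names invented here, and the "serialization" is just to_string(). The call sites are statically typed, but the stored argument list has to erase the types to hold an arbitrary number of them.

```rust
// Hypothetical trait standing in for "can be converted to a SQL value later".
trait ToSqlValue {
    fn to_sql_value(&self) -> String;
}

impl ToSqlValue for i64 {
    fn to_sql_value(&self) -> String {
        self.to_string()
    }
}

impl ToSqlValue for &str {
    fn to_sql_value(&self) -> String {
        self.to_string()
    }
}

// Hypothetical query type: the SQL text only contains `?` placeholders,
// and the arguments are kept as trait objects until they are needed.
struct Query<'a> {
    sql: &'a str,
    args: Vec<&'a dyn ToSqlValue>,
}

impl<'a> Query<'a> {
    fn new(sql: &'a str) -> Self {
        Query { sql, args: Vec::new() }
    }

    // Accepts any argument type, but stores it type-erased.
    fn bind(mut self, arg: &'a dyn ToSqlValue) -> Self {
        self.args.push(arg);
        self
    }

    // The deferred conversion happens here, through dynamic dispatch.
    fn serialized_params(&self) -> Vec<String> {
        self.args.iter().map(|a| a.to_sql_value()).collect()
    }
}

fn main() {
    let id = 42i64;
    let name = "alice";
    let query = Query::new("SELECT * FROM users WHERE id = ? AND name = ?")
        .bind(&id)
        .bind(&name);
    println!("{} with {:?}", query.sql, query.serialized_params());
}
```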


Maybe some parts of the decision in favor of trait objects help with a simpler implementation, too. However, details such as the fact that fmt::write (and thus also fmt::Write::write_fmt, which calls it) works with a &mut dyn fmt::Write instead of a generic seem to be motivated by nothing other than a performance vs. code bloat trade-off.
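For reference, a small sketch of what that looks like from the outside. ByteCounter and format_into are hypothetical names; fmt::write and its &mut dyn fmt::Write parameter are the real standard-library API. Two very different sinks go through the same fmt::write, and only the vtable behind the reference differs.

```rust
use std::fmt::{self, Write};

// Hypothetical sink that only counts the bytes written to it.
struct ByteCounter {
    bytes: usize,
}

impl Write for ByteCounter {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.bytes += s.len();
        Ok(())
    }
}

// Hypothetical helper: the sink is a trait object, so this function is
// compiled once rather than once per sink type.
fn format_into(sink: &mut dyn Write, a: u32, b: &str) -> fmt::Result {
    fmt::write(sink, format_args!("{}-{}", a, b))
}

fn main() {
    let mut text = String::new();
    let mut counter = ByteCounter { bytes: 0 };

    format_into(&mut text, 12, "ab").unwrap();
    format_into(&mut counter, 12, "ab").unwrap();

    println!("{} ({} bytes)", text, counter.bytes);
}
```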


Regarding fmt::Arguments, I wouldn't be surprised if there was a way to design something like that without trait objects, e.g. through something like nested tuples (S, (T, (U, ()))) to work around the lack of varargs. But of course I haven't actually tried anything like this before.
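An untested-in-anger sketch of the nested-tuple idea, just to show it type-checks. WriteAll and render are hypothetical names, and the sketch only covers the "variable number of arguments" part: each element is still written with write!, and there is no format string at all. Note that every distinct tuple shape gets its own monomorphized copy, which is exactly the kind of code growth the trait-object design avoids.

```rust
use std::fmt::{self, Display, Write};

// Hypothetical trait: "write every element of this argument list".
trait WriteAll {
    fn write_all<W: Write>(&self, out: &mut W) -> fmt::Result;
}

// Base case: the empty list writes nothing.
impl WriteAll for () {
    fn write_all<W: Write>(&self, _out: &mut W) -> fmt::Result {
        Ok(())
    }
}

// Recursive case: write the head, then the rest of the list.
impl<T: Display, Rest: WriteAll> WriteAll for (T, Rest) {
    fn write_all<W: Write>(&self, out: &mut W) -> fmt::Result {
        write!(out, "{} ", self.0)?;
        self.1.write_all(out)
    }
}

// Hypothetical helper: generic over the whole argument list, so it is
// instantiated once per distinct tuple type.
fn render<A: WriteAll>(args: &A) -> String {
    let mut s = String::new();
    args.write_all(&mut s).expect("writing to a String does not fail");
    s
}

fn main() {
    // Three arguments of three different types, no trait objects in the list.
    let args = ("a", (1, (2.5, ())));
    println!("{}", render(&args));
}
```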