What is the overhead with the format! macro?

It pretty much says it in the title. What is the overhead of the format! macro when compared to String::add, building a string from a Vec<char>, etc.?

The format! macro calls the format function, which pre-allocates a String based on the "estimated capacity" of the format args, to try to avoid reallocations during formatting:

https://doc.rust-lang.org/1.41.1/src/alloc/fmt.rs.html#568-573

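The estimate is based on the literal text of the format string (the pieces between the {} placeholders), not on the arguments. Depending on the number and position of the arguments, it is either the total length of the literal text (when there are no arguments), 0 (when the format string starts with an argument and the literal text is shorter than 16 bytes), or twice the length of the literal text: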

https://doc.rust-lang.org/1.41.1/src/core/fmt/mod.rs.html#334
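To make the heuristic concrete, here is a standalone sketch that mirrors the logic of Arguments::estimated_capacity as it appears in the 1.41 source linked above. It is a re-implementation for illustration only: the free function and its (pieces, num_args) signature are hypothetical, since the real method is internal and works on fmt::Arguments directly.

```rust
/// Hypothetical re-implementation of the capacity heuristic used by `format!`
/// (mirrors `Arguments::estimated_capacity` in Rust 1.41; not a public API).
/// `pieces` are the literal text segments between placeholders,
/// `num_args` is the number of `{}` arguments.
fn estimated_capacity(pieces: &[&str], num_args: usize) -> usize {
    // Total length of the literal text; the arguments are not inspected at all.
    let pieces_length: usize = pieces.iter().map(|p| p.len()).sum();

    if num_args == 0 {
        // No arguments: the output is exactly the literal text.
        pieces_length
    } else if pieces.first().map_or(true, |p| p.is_empty()) && pieces_length < 16 {
        // The string starts with an argument and the literal text is short:
        // don't pre-allocate anything. (An empty `pieces` list is treated the same way.)
        0
    } else {
        // Arguments will add more bytes, so "pre-double" the capacity.
        pieces_length.checked_mul(2).unwrap_or(0)
    }
}

fn main() {
    // format!("hello {}!", name): pieces ["hello ", "!"], 1 argument
    println!("{}", estimated_capacity(&["hello ", "!"], 1)); // 14
    // format!("{} world", x): starts with an argument, short literal text
    println!("{}", estimated_capacity(&["", " world"], 1)); // 0
    // format!("no arguments"): just the literal length
    println!("{}", estimated_capacity(&["no arguments"], 0)); // 12
}
```

The piece lists in the comments are how the 1.41 macro expansion splits those format strings; the numbers only describe the initial reservation, and the String still grows normally if the arguments turn out to be longer.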

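In terms of allocations, the overhead should generally be the same as or lower than using String::add repeatedly, since each + appends to the left-hand String and may reallocate as it grows, while format! starts from the estimated capacity above.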
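For comparison, here are three ways to build the same string. The function names are made up for illustration. The + chain may reallocate as each piece is appended, format! starts from its estimated capacity, and an explicit String::with_capacity followed by push_str allocates once up front and never goes through std::fmt at all.

```rust
/// Builds "Hello, <name>!" with `format!`.
fn with_format(name: &str) -> String {
    format!("Hello, {}!", name)
}

/// Builds the same string with `String + &str` (`String::add`).
/// Each `+` calls `push_str` on the left-hand String, which may reallocate.
fn with_add(name: &str) -> String {
    String::from("Hello, ") + name + "!"
}

/// Builds it with a single up-front reservation and plain `push_str`,
/// bypassing `std::fmt` entirely.
fn with_capacity(name: &str) -> String {
    let mut s = String::with_capacity("Hello, ".len() + name.len() + "!".len());
    s.push_str("Hello, ");
    s.push_str(name);
    s.push_str("!");
    s
}

fn main() {
    let name = "Ferris";
    assert_eq!(with_format(name), with_add(name));
    assert_eq!(with_add(name), with_capacity(name));
    println!("{}", with_capacity(name)); // Hello, Ferris!
}
```

The manual version is only worth the extra lines when all the pieces are already strings and their lengths are known; as soon as numbers or other Display types are involved, format! (or write!) is the simpler choice.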

Building a String from a Vec<char> (e.g. using vec.into_iter().collect()) ends up calling the Extend<char> implementation for String, which pre-allocates one byte per char:

https://doc.rust-lang.org/1.41.1/src/alloc/string.rs.html#1808-1815
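If the exact byte length matters, it can be computed up front with char::len_utf8 and reserved before extending. A minimal sketch, where chars_to_string is a hypothetical helper name:

```rust
/// Hypothetical helper: converts a slice of chars into a String,
/// reserving the exact UTF-8 byte length first so no reallocation is needed.
fn chars_to_string(chars: &[char]) -> String {
    // Sum the encoded size of every char (1 to 4 bytes each).
    let byte_len: usize = chars.iter().map(|c| c.len_utf8()).sum();
    let mut s = String::with_capacity(byte_len);
    // `Extend<&char>` for String pushes each char; the capacity is already sufficient.
    s.extend(chars);
    s
}

fn main() {
    let chars = vec!['h', 'é', 'l', 'l', 'o'];
    // Plain `collect()` reserves only chars.len() == 5 bytes up front
    // (per the Extend<char> implementation linked above).
    let collected: String = chars.iter().collect();
    let presized = chars_to_string(&chars);
    assert_eq!(collected, presized);
    println!("{} ({} bytes)", presized, presized.len()); // héllo (6 bytes)
}
```

The trade-off is a second pass over the chars to compute the length, which is usually cheap compared to a reallocation and copy.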

For a string of all ASCII characters, this is optimal. For a string containing any non-ASCII characters, this will under-allocate initially and will require up to two reallocations as the string grows (because the actual length of the UTF-8 string can be up to four bytes per char).
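The per-char byte counts behind that estimate can be checked with char::len_utf8. ASCII takes one byte, while characters outside the Basic Multilingual Plane (such as most emoji) take four:

```rust
fn main() {
    // One example for each possible UTF-8 encoded length.
    let samples = ['a', 'é', '€', '😀'];
    for c in samples.iter() {
        println!("{:?} (U+{:04X}) -> {} byte(s)", c, *c as u32, c.len_utf8());
    }

    // `collect()` reserves n bytes for n chars, but the encoded
    // string may need up to 4 * n bytes.
    let emoji = vec!['😀'; 3];
    let s: String = emoji.into_iter().collect();
    println!("3 chars -> {} bytes", s.len()); // 3 chars -> 12 bytes
}
```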


If you're calling format! lots of times, it's also worth noting that the Display::fmt or Debug::fmt calls themselves are virtual (indirect) calls, which are slightly more expensive than regular function calls and are much less likely to be inlined.
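When formatting in a loop, one general mitigation (a common technique, not something taken from the crates mentioned in this thread) is to write into a single reused buffer with write! instead of calling format! each time, and to use push_str for pieces that are already strings. The Display::fmt call for the number is still an indirect call, but the per-call String allocation goes away. A sketch with a hypothetical render_lines function:

```rust
use std::fmt::Write;

/// Hypothetical example: renders "item <i>: <name>" lines into one buffer.
fn render_lines(names: &[&str]) -> String {
    let mut out = String::new();
    for (i, name) in names.iter().enumerate() {
        // `write!` on a String still goes through `std::fmt` (so `i` is formatted
        // via `Display::fmt`), but it appends to `out` instead of allocating a new String.
        // Formatting a usize into a String does not fail, so `unwrap` will not panic here.
        write!(out, "item {}: ", i).unwrap();
        // `name` is already a &str, so `push_str` skips the formatting machinery.
        out.push_str(name);
        out.push('\n');
    }
    out
}

fn main() {
    let text = render_lines(&["alpha", "beta"]);
    print!("{}", text);
    // item 0: alpha
    // item 1: beta
}
```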

fast_fmt is a crate which demonstrates some of the inefficiencies of std::fmt by reimplementing it, but unfortunately it hasn't gotten a lot of love, and it doesn't have a format! equivalent (it only supports formatting individual arguments manually, akin to fmt::Display).


Also note that Vec<char> is not what you want in most cases. A char in Rust is backed by a 4-byte integer, since it represents a Unicode scalar value: any code point from 0 to 0x10FFFF, excluding the surrogate range (0xD800 to 0xDFFF) in the middle. So ['h', 'e', 'l', 'l', 'o'] takes 20 bytes in memory, versus 5 bytes for the equivalent String. More importantly, a Vec<char> cannot be converted into a String by reusing its buffer or by a plain memcpy; every char has to be UTF-8 encoded, which can be a lot more expensive than a memcpy.
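Both points are easy to check directly. String::from_utf8 takes ownership of a Vec<u8> and only validates it, reusing the same heap buffer, whereas a Vec<char> has to be re-encoded into a fresh allocation:

```rust
use std::mem::size_of;

fn main() {
    // A char is always 4 bytes, regardless of which character it holds.
    assert_eq!(size_of::<char>(), 4);

    let chars: Vec<char> = vec!['h', 'e', 'l', 'l', 'o'];
    let bytes: Vec<u8> = b"hello".to_vec();
    println!("Vec<char> payload: {} bytes", chars.len() * size_of::<char>()); // 20
    println!("Vec<u8> payload: {} bytes", bytes.len()); // 5

    // Vec<u8> -> String: validates UTF-8 and keeps the same heap buffer.
    let ptr_before = bytes.as_ptr();
    let from_bytes = String::from_utf8(bytes).expect("valid UTF-8");
    assert_eq!(from_bytes.as_ptr(), ptr_before);

    // Vec<char> -> String: every char is UTF-8 encoded into a new buffer.
    let from_chars: String = chars.into_iter().collect();
    assert_eq!(from_bytes, from_chars);
}
```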

