Can Rust be fast, if String manipulation all happens on the heap? Granted, slices and GC-lessness mitigate this somewhat, but string-processing can still be a painful lot of allocating!
I did a (beginners) proof of concept of strings on the stack. Naturally they can't grow beyond the allocated memory, but that's often fine. I have 2 types, FixedString
for formatting stuff like e.g. ISO-dates or UUIDs. And SizedString
of any length within a fixed buffer, for known-max-size things like numbers or natural dates (depending on longest month- and day-name).
Any short string fully fitting about into the 24 bytes, which String
needs for its metadata alone, is a natural winner. The rest of what Bencher found is startling. The times within each type grow fairly plausibly with size, so I hope I got black_box right.
I run 9 string types, except incompatible flexstr, kstring & smol_str (all immutable), through the same 2 test scenarios:
- Base instantiates (copying in a &str) and modifies one byte
- Move does the same, but moves it out of a block once (which for my types means also copying the full data)
These 2 tests are run for an input of a round length, and again for the same length + 1 (just overflowing a round memory size.) This gives 4 related tests per type. The factors from one type to another are stunning. String only gradually starts to compensate the heap overhead around 4 kibibytes. And the types optimized to have an advantage up to 24 bytes… don't. I have reformatted the Bencher output, putting 4 times ns/iter onto 1 row:
Len Type Base Move Base + 1 Move + 1
2 ArrayString<2> 3 ns ±0 10 ns ±1 3 ns ±0 10 ns ±0
ArrayString<4> 3 ns ±0 10 ns ±1 7 ns ±0 10 ns ±0
FixedString<2> 12 ns ±0 7 ns ±1 7 ns ±0 10 ns ±1
SizedString<4> 7 ns ±0 10 ns ±0 6 ns ±0 65 ns ±17
SizedString<2> 7 ns ±0 68 ns ±2 7 ns ±0 10 ns ±3
String 158 ns ±12 196 ns ±131 157 ns ±9 340 ns ±275
CompactString<2> 181 ns ±10 253 ns ±11 180 ns ±10 256 ns ±23
SmartString<LazyCompact> 201 ns ±11 319 ns ±19 201 ns ±42 326 ns ±113
SmartString<Compact> 202 ns ±101 317 ns ±27 201 ns ±9 393 ns ±102
4 FixedString<4> 4 ns ±1 7 ns ±0 7 ns ±1 10 ns ±1
ArrayString<4> 4 ns ±0 10 ns ±2 7 ns ±0 10 ns ±3
ArrayString<8> 8 ns ±1 10 ns ±0 7 ns ±0 10 ns ±0
SizedString<8> 17 ns ±0 23 ns ±2 17 ns ±3 23 ns ±11
SizedString<4> 7 ns ±0 10 ns ±0 5 ns ±0 65 ns ±13
String 161 ns ±16 186 ns ±9 153 ns ±13 183 ns ±11
CompactString<4> 182 ns ±22 252 ns ±10 180 ns ±11 252 ns ±10
SmartString<Compact> 202 ns ±12 321 ns ±20 201 ns ±34 318 ns ±10
SmartString<LazyCompact> 201 ns ±10 318 ns ±11 201 ns ±38 324 ns ±56
8 FixedString<8> 3 ns ±0 8 ns ±1 7 ns ±0 10 ns ±0
ArrayString<8> 7 ns ±0 10 ns ±0 10 ns ±1 14 ns ±0
ArrayString<16> 14 ns ±2 17 ns ±0 14 ns ±1 17 ns ±0
SizedString<16> 19 ns ±0 23 ns ±2 14 ns ±0 24 ns ±2
SizedString<8> 19 ns ±1 22 ns ±1 19 ns ±2 23 ns ±6
String 159 ns ±25 188 ns ±10 164 ns ±136 195 ns ±10
CompactString<8> 167 ns ±34 234 ns ±10 174 ns ±6 245 ns ±11
SmartString<LazyCompact> 208 ns ±9 336 ns ±126 208 ns ±10 317 ns ±14
SmartString<Compact> 255 ns ±126 337 ns ±53 208 ns ±6 318 ns ±40
16 FixedString<16> 3 ns ±0 7 ns ±0 7 ns ±0 10 ns ±0
ArrayString<32> 14 ns ±1 17 ns ±0 14 ns ±3 18 ns ±2
ArrayString<16> 14 ns ±0 17 ns ±1 21 ns ±6 21 ns ±2
SizedString<16> 11 ns ±0 22 ns ±2 19 ns ±0 27 ns ±11
SizedString<32> 18 ns ±0 23 ns ±5 19 ns ±1 24 ns ±4
String 159 ns ±137 190 ns ±11 166 ns ±43 193 ns ±15
CompactString<16> 163 ns ±9 234 ns ±14 181 ns ±8 253 ns ±16
SmartString<Compact> 212 ns ±26 321 ns ±42 209 ns ±10 320 ns ±24
SmartString<LazyCompact> 214 ns ±42 320 ns ±15 208 ns ±7 321 ns ±22
24 FixedString<24> 7 ns ±0 10 ns ±0 7 ns ±0 10 ns ±1
SizedString<24> 20 ns ±5 22 ns ±2 17 ns ±1 22 ns ±2
SizedString<48> 21 ns ±1 26 ns ±3 19 ns ±4 24 ns ±1
ArrayString<48> 17 ns ±0 21 ns ±6 86 ns ±5 50 ns ±4
ArrayString<24> 17 ns ±1 21 ns ±2 76 ns ±12 65 ns ±6
String 170 ns ±46 182 ns ±42 167 ns ±117 212 ns ±143
CompactString<24> 130 ns ±6 280 ns ±505 291 ns ±48 332 ns ±59
SmartString<LazyCompact> 444 ns ±40 526 ns ±33 440 ns ±228 541 ns ±203
SmartString<Compact> 443 ns ±37 527 ns ±124 438 ns ±124 702 ns ±533
32 FixedString<32> 7 ns ±0 10 ns ±1 10 ns ±0 14 ns ±1
SizedString<32> 19 ns ±2 25 ns ±3 20 ns ±2 24 ns ±5
SizedString<64> 23 ns ±2 25 ns ±3 17 ns ±2 23 ns ±1
ArrayString<32> 25 ns ±2 28 ns ±6 79 ns ±16 65 ns ±6
ArrayString<64> 89 ns ±5 95 ns ±14 86 ns ±10 54 ns ±3
String 167 ns ±29 201 ns ±39 167 ns ±39 199 ns ±63
CompactString<32> 288 ns ±50 333 ns ±58 297 ns ±106 330 ns ±55
SmartString<Compact> 440 ns ±84 546 ns ±120 442 ns ±89 537 ns ±41
SmartString<LazyCompact> 441 ns ±124 539 ns ±177 668 ns ±492 540 ns ±136
64 FixedString<64> 14 ns ±1 18 ns ±1 18 ns ±1 21 ns ±7
SizedString<64> 23 ns ±2 25 ns ±2 25 ns ±2 28 ns ±1
SizedString<128> 32 ns ±3 37 ns ±7 27 ns ±9 28 ns ±2
ArrayString<64> 29 ns ±3 32 ns ±16 85 ns ±6 71 ns ±7
ArrayString<128> 108 ns ±6 111 ns ±8 119 ns ±7 79 ns ±8
String 173 ns ±69 200 ns ±158 169 ns ±83 200 ns ±29
CompactString<64> 285 ns ±51 328 ns ±70 303 ns ±59 338 ns ±37
SmartString<Compact> 440 ns ±89 553 ns ±118 456 ns ±56 570 ns ±370
SmartString<LazyCompact> 436 ns ±43 552 ns ±60 457 ns ±126 623 ns ±302
128 FixedString<128> 28 ns ±2 32 ns ±3 46 ns ±3 47 ns ±13
SizedString<128> 35 ns ±23 36 ns ±4 58 ns ±12 66 ns ±39
SizedString<256> 61 ns ±15 59 ns ±9 59 ns ±10 66 ns ±10
ArrayString<128> 46 ns ±3 50 ns ±6 104 ns ±8 86 ns ±8
ArrayString<256> 223 ns ±21 225 ns ±61 217 ns ±20 127 ns ±15
String 184 ns ±35 202 ns ±19 211 ns ±47 236 ns ±21
CompactString<128> 322 ns ±47 356 ns ±78 332 ns ±49 362 ns ±59
SmartString<Compact> 468 ns ±72 557 ns ±106 479 ns ±77 576 ns ±111
SmartString<LazyCompact> 470 ns ±112 557 ns ±114 482 ns ±136 586 ns ±94
256 FixedString<256> 46 ns ±4 50 ns ±3 49 ns ±4 50 ns ±4
SizedString<256> 61 ns ±12 60 ns ±6 55 ns ±7 57 ns ±11
SizedString<512> 79 ns ±6 82 ns ±17 56 ns ±37 58 ns ±47
ArrayString<256> 54 ns ±4 65 ns ±45 220 ns ±20 138 ns ±29
String 215 ns ±83 232 ns ±49 217 ns ±29 241 ns ±46
ArrayString<512> 302 ns ±27 309 ns ±42 279 ns ±29 155 ns ±12
CompactString<256> 338 ns ±40 373 ns ±82 342 ns ±41 376 ns ±88
SmartString<LazyCompact> 490 ns ±93 583 ns ±66 516 ns ±82 610 ns ±119
SmartString<Compact> 493 ns ±159 592 ns ±117 541 ns ±287 599 ns ±59
512 FixedString<512> 74 ns ±2 79 ns ±5 75 ns ±25 72 ns ±6
SizedString<512> 79 ns ±7 80 ns ±13 81 ns ±43 81 ns ±16
SizedString<1024> 134 ns ±21 136 ns ±35 79 ns ±12 81 ns ±12
ArrayString<512> 90 ns ±7 95 ns ±29 319 ns ±76 203 ns ±36
String 241 ns ±89 265 ns ±50 242 ns ±30 263 ns ±64
CompactString<512> 366 ns ±53 397 ns ±32 368 ns ±102 402 ns ±41
ArrayString<1024> 466 ns ±37 448 ns ±62 433 ns ±53 238 ns ±25
SmartString<Compact> 595 ns ±89 668 ns ±56 587 ns ±99 683 ns ±154
SmartString<LazyCompact> 593 ns ±75 668 ns ±134 606 ns ±344 709 ns ±479
1024 FixedString<1024> 134 ns ±34 136 ns ±28 135 ns ±22 164 ns ±165
SizedString<1024> 132 ns ±15 136 ns ±15 133 ns ±20 179 ns ±54
SizedString<2048> 270 ns ±152 261 ns ±64 152 ns ±9 202 ns ±56
ArrayString<1024> 149 ns ±23 151 ns ±27 491 ns ±56 325 ns ±77
String 293 ns ±107 320 ns ±82 299 ns ±245 451 ns ±351
CompactString<1024> 425 ns ±88 466 ns ±87 448 ns ±142 475 ns ±88
ArrayString<2048> 753 ns ±99 730 ns ±137 727 ns ±81 446 ns ±90
SmartString<Compact> 685 ns ±130 767 ns ±125 687 ns ±129 996 ns ±271
SmartString<LazyCompact> 680 ns ±170 777 ns ±101 688 ns ±151 1077 ns ±848
2048 SizedString<2048> 249 ns ±98 260 ns ±50 480 ns ±63 478 ns ±24
FixedString<2048> 343 ns ±51 275 ns ±46 465 ns ±58 457 ns ±53
ArrayString<2048> 342 ns ±78 267 ns ±39 866 ns ±102 560 ns ±183
SizedString<4096> 686 ns ±123 670 ns ±97 499 ns ±41 499 ns ±171
String 830 ns ±107 808 ns ±130 1002 ns ±236 1022 ns ±252
CompactString<2048> 1234 ns ±320 918 ns ±74 1069 ns ±195 1135 ns ±235
SmartString<Compact> 1741 ns ±210 1851 ns ±427 2164 ns ±377 2164 ns ±2
ArrayString<4096> 2839 ns ±646 2249 ns ±216 2162 ns ±185 1201 ns ±126
SmartString<LazyCompact> 2206 ns ±526 1885 ns ±394 2361 ns ±279 2165 ns ±598
4096 SizedString<4096> 663 ns ±125 667 ns ±94 919 ns ±111 931 ns ±94
FixedString<4096> 808 ns ±126 804 ns ±77 822 ns ±272 816 ns ±106
SizedString<8192> 1152 ns ±270 1131 ns ±129 922 ns ±96 919 ns ±69
String 1011 ns ±119 1212 ns ±154 1057 ns ±438 1369 ns ±225
CompactString<4096> 1136 ns ±304 1362 ns ±226 1329 ns ±139 1185 ns ±101
ArrayString<4096> 923 ns ±103 919 ns ±91 2522 ns ±476 1826 ns ±773
SmartString<LazyCompact> 2440 ns ±384 2802 ns ±1 2797 ns ±229 2538 ns ±441
SmartString<Compact> 2415 ns ±323 2707 ns ±362 2454 ns ±426 3152 ns ±236
ArrayString<8192> 5932 ns ±3 4559 ns ±691 4251 ns ±382 2344 ns ±828
8192 SizedString<8192> 1132 ns ±217 1128 ns ±112 1630 ns ±509 1595 ns ±128
FixedString<8192> 1575 ns ±110 1589 ns ±262 1580 ns ±140 1592 ns ±143
String 1887 ns ±477 1900 ns ±299 1964 ns ±187 2094 ns ±360
CompactString<8192> 2082 ns ±157 2121 ns ±500 2103 ns ±213 2646 ns ±1
ArrayString<8192> 1587 ns ±124 1603 ns ±137 4731 ns ±760 2833 ns ±388
SmartString<LazyCompact> 4337 ns ±584 4423 ns ±633 4381 ns ±959 4282 ns ±309
SmartString<Compact> 6373 ns ±4 4422 ns ±850 4364 ns ±482 4290 ns ±907
SizedString<16384> 12003 ns ±1 6921 ns ±786 1883 ns ±747 1598 ns ±174
ArrayString<16384> 12786 ns ±2 11793 ns ±1 12616 ns ±1 7089 ns ±660
Currently my types can mostly "xxx".into()
, Display and modify as unsafe { s.as_bytes_mut()[0] = b'-' }
. Ideally they would share a common trait with String, doing everything they can, i.e. except ::with_capacity()
, .reserve()
or .as_mut_vec()
. For .push()/+=
SizedString would need variants that silently ignore overflowing (???) or panic. Where the compiler can determine that it would under- or overflow, it should give an error (e.g. let s: FixedString<4> = "ab".into();
or let mut s: SizedString<4> = "ab".into(); s += "cde";
)
Most importantly, I'd love an untyped let s = format!()
to directly write into FixedString if size can be calculated at compile-time (e.g. "{year:04/}-{month:02/}-{day:02/}" suggested truncate-if-bigger syntax), SizedString if max-size can be determined, else String. Else context type (e.g. let s: SizedString<20> = format!()
) would be used.
I have a Playground, but sadly it doesn't offer cargo bench
, so you should copy it to your own nightly (only coz of Bencher, my classes are stable) compiler.
NB: It turns out the byte buffer grows in chunks of the size of the length, so I used u16 instead of a more wasteful usize=u64 (which would use 16 bytes for length < 9 and then 24 bytes for < 17). I tried <T=u8>
to optimize space even further while giving control. But since u*
share no trait Unsigned
some of my ops wouldn't compile. Not sure if Rust can currently express this, but ideally the smallest u*
that can hold <const N: usize>
would be used automatically.