Statically sized strings


#1

Isn’t there a more compact way of creating fixed-size (potentially stack-allocated) strings than this:

```rust
use std::str;
fn main() {
    let x: &[u8] = &[b'a', b'b'];
    let stack_str: &str = str::from_utf8(x).unwrap();
    println!("{}", stack_str);
}
```

Isn’t a macro warranted here? Such a macro would check that the characters passed as literals are all valid Unicode and statically infer the string’s size from the UTF-8 encoding of those characters.
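A minimal sketch of such a macro, assuming Rust 1.39 or newer (where `str::len` and `str::as_bytes` are `const fn`). The name `fixed_str!` is hypothetical, not an existing crate. Since the compiler already guarantees that a string literal is valid UTF-8, the macro only has to infer the byte length; it does not check that the characters are letters.

```rust
/// Hypothetical macro: copies a string literal into a `[u8; N]` whose
/// length `N` is the literal's UTF-8 byte count, computed at compile time.
macro_rules! fixed_str {
    ($s:literal) => {{
        const S: &str = $s;
        // `str::len` is a `const fn`, so `N` is a compile-time constant.
        const N: usize = S.len();
        let mut buf = [0u8; N];
        buf.copy_from_slice(S.as_bytes());
        buf
    }};
}

fn main() {
    let word = fixed_str!("héllo"); // 'é' takes 2 bytes in UTF-8
    assert_eq!(word.len(), 6);
    // Viewing the bytes as &str again still re-validates UTF-8 at run time.
    let s: &str = std::str::from_utf8(&word).unwrap();
    println!("{} ({} bytes)", s, word.len());
}
```

The array type carries the size, so a `Vec` of such arrays needs no per-string heap allocation.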

A crate for this, anyone?

BTW: Why isn’t the block of code above syntax highlighted?


#2

Why do you want it on the stack? Do you need it to be mutable, or have dynamic contents?


#3

I want it to be immutable and fixed-sized.

Interestingly, it could also provide an append operation that doesn’t invalidate iterators (as long as there is spare capacity).


#4

OK, asked another way, why don’t you want a normal &'static str literal?


#5

I need it for efficient allocation and compact storage of a large number (millions) of small strings (words). Kind of like just the inline part of a small-size-optimized vector/string.

Google Chrome (and probably Firefox as well) makes heavy use of such strings/vectors because they both save memory and improve performance.

At the cost of code complexity, of course.


#6

Does this look like it has what you’re looking for? It’s a string API backed by a statically-sized array. It’s like a small-string-optimized string, but without the ability to switch to heap storage.
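For illustration, a stripped-down version of the idea might look like this. It is a hypothetical `ArrayStr<N>` written with const generics (Rust 1.51+), not the API of the crate being suggested; it stores up to `N` bytes inline and rejects appends that don’t fit instead of spilling to the heap.

```rust
/// Hypothetical fixed-capacity string: up to `N` bytes stored inline.
#[derive(Clone, Copy)]
pub struct ArrayStr<const N: usize> {
    len: usize,
    buf: [u8; N],
}

impl<const N: usize> ArrayStr<N> {
    pub const fn new() -> Self {
        ArrayStr { len: 0, buf: [0; N] }
    }

    /// Appends `s` if it fits; otherwise leaves `self` unchanged and returns `Err`.
    pub fn try_push_str(&mut self, s: &str) -> Result<(), ()> {
        let end = self.len + s.len();
        if end > N {
            return Err(());
        }
        self.buf[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }

    pub fn as_str(&self) -> &str {
        // Only whole `&str` values are ever copied in, so the prefix is valid UTF-8.
        std::str::from_utf8(&self.buf[..self.len]).expect("always valid UTF-8")
    }
}

fn main() {
    let mut s = ArrayStr::<8>::new();
    s.try_push_str("foo").unwrap();
    s.try_push_str("bar").unwrap();
    assert!(s.try_push_str("baz").is_err()); // 9 bytes would exceed capacity 8
    assert_eq!(s.as_str(), "foobar");
}
```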


#7

You probably shouldn’t put millions of anything on the stack though.

If you write millions of literal "foo"/"bar"/"baz" strings, they should all be “allocated” compactly in the binary’s .rodata section (or platform equivalent), so I guess I still don’t understand what you’re after.


#8

I’m gonna store them in a Vec of such fixed-length strings. The Vec must, of course, be heap-allocated.

So the goal here is to have only one level of heap-allocation.


#9

Yep, that’s what I had in mind. Thanks!


#10

The call to unwrap here

```rust
let mut string = ArrayString::<[_; 3]>::from("foo").unwrap();
```

shouldn’t be needed, assuming Rust can figure out the byte count of UTF-8 string literals at compile time.


#11

Storing a Vec<ArrayString<[u8; SIZE]>> will waste the extra space if you have a lot of strings that don’t need the entire size. If you’re making a read-only collection of strings, you can get more space efficiency with an arena allocator. That way, instead of laying out your strings in an ArrayString<[u8; 8]> like

first...second..third...fourth.. (. = unused space), you can store them as

firstsecondthirdfourth, with a Vec<&str> storing the offset and length of each string.
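A minimal sketch of the packed layout using only the standard library: every word is appended to one `String`, its byte range is recorded, and `&str` views are built once the buffer is complete. The `pack` helper is hypothetical; an arena crate would manage the buffer for you.

```rust
use std::ops::Range;

/// Hypothetical helper: concatenates `words` into one buffer and records each word's byte range.
fn pack(words: &[&str]) -> (String, Vec<Range<usize>>) {
    let mut buf = String::new();
    let mut spans = Vec::with_capacity(words.len());
    for w in words {
        let start = buf.len();
        buf.push_str(w);
        spans.push(start..buf.len());
    }
    (buf, spans)
}

fn main() {
    let (buf, spans) = pack(&["first", "second", "third", "fourth"]);
    assert_eq!(buf, "firstsecondthirdfourth");

    // Once `buf` is no longer mutated, borrowed views into it can be built.
    let views: Vec<&str> = spans.iter().map(|r| &buf[r.clone()]).collect();
    assert_eq!(views, ["first", "second", "third", "fourth"]);
}
```

Storing the ranges rather than the `&str` views avoids borrowing the buffer while it is still growing.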


#12

Rust doesn’t have great support for compile-time calculations at the moment. There’s a decent chance the optimizer will do constant propagation and see that the error branch of the unwrap is dead code, though.
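With newer compilers (const generics, `while` loops in `const fn`, and panics in const contexts, the last stable since Rust 1.57), the check can be forced to happen at compile time instead of relying on the optimizer. A sketch with a hypothetical `to_array` helper: when the result is assigned to a `const`, a length mismatch is a compile error and no `unwrap` runs at run time.

```rust
/// Hypothetical helper: copies `s` into a `[u8; N]`, panicking if the lengths differ.
/// In a `const` item, that panic is reported as a compile-time error.
const fn to_array<const N: usize>(s: &str) -> [u8; N] {
    let bytes = s.as_bytes();
    assert!(bytes.len() == N, "string length does not match N");
    let mut out = [0u8; N];
    let mut i = 0;
    while i < N {
        out[i] = bytes[i];
        i += 1;
    }
    out
}

// Evaluated by the compiler; `to_array::<4>("foo")` here would fail to build.
const FOO: [u8; 3] = to_array("foo");

fn main() {
    assert_eq!(&FOO, b"foo");
}
```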


#13

That is interesting. Can I use a specific allocator “locally” for this type?


#14

I’m aware of that, but most words in human languages fit in the 3 machine words needed by a Vec. On a 64-bit system that is 24 bytes, which can hold 23 English letters plus one byte for the length. Most English words are shorter than that.
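The arithmetic can be checked directly. A minimal sketch with a hypothetical `Word` type that keeps one length byte next to 23 bytes of inline UTF-8 data; the comparison with `String` assumes a 64-bit target.

```rust
use std::mem::size_of;

/// Hypothetical inline word: 1 length byte + 23 bytes of UTF-8 data = 24 bytes.
#[derive(Clone, Copy)]
struct Word {
    len: u8,
    bytes: [u8; 23],
}

impl Word {
    /// Returns `None` if `s` is longer than 23 bytes of UTF-8.
    fn new(s: &str) -> Option<Word> {
        if s.len() > 23 {
            return None;
        }
        let mut bytes = [0u8; 23];
        bytes[..s.len()].copy_from_slice(s.as_bytes());
        Some(Word { len: s.len() as u8, bytes })
    }

    fn as_str(&self) -> &str {
        // `new` only copies complete `&str` values, so this prefix is valid UTF-8.
        std::str::from_utf8(&self.bytes[..self.len as usize]).unwrap()
    }
}

fn main() {
    // On a 64-bit target this equals the 3 machine words of a `String`/`Vec` header.
    assert_eq!(size_of::<Word>(), 24);

    // All words live contiguously in the Vec's single heap allocation.
    let words: Vec<Word> = ["apple", "banana"].iter().filter_map(|w| Word::new(w)).collect();
    assert_eq!(words[1].as_str(), "banana");
    assert!(Word::new("a-word-longer-than-23-bytes").is_none());
}
```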


#15

I think it’s because you used four-space indents:

let x = 5;

vs triple backticks (```):

let x = 5;

#16

Thanks.


#17

If there are some strings that don’t fit in the fixed-size storage, then ArrayString alone won’t work. You’ll need either a reimplementation of C++-style small-string-optimized strings, an enum storing both a String and an ArrayString (which wastes space for the discriminant), or two separate arrays, one Vec<String> for the long ones and one Vec<ArrayString<[u8; 24]>> for the rest.
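A sketch of the enum option, using a 23-byte inline buffer. `SmallString` is a hypothetical, std-only type (a plain array stands in for ArrayString). Its exact size depends on how the compiler lays out the discriminant, so it is printed rather than asserted.

```rust
use std::mem::size_of;

/// Hypothetical small-string type: short strings inline, long ones on the heap.
enum SmallString {
    Inline { len: u8, bytes: [u8; 23] },
    Heap(String),
}

impl SmallString {
    fn new(s: &str) -> Self {
        if s.len() <= 23 {
            let mut bytes = [0u8; 23];
            bytes[..s.len()].copy_from_slice(s.as_bytes());
            SmallString::Inline { len: s.len() as u8, bytes }
        } else {
            SmallString::Heap(s.to_owned())
        }
    }

    fn as_str(&self) -> &str {
        match self {
            // Only whole `&str` values are copied in, so the prefix is valid UTF-8.
            SmallString::Inline { len, bytes } => {
                std::str::from_utf8(&bytes[..*len as usize]).unwrap()
            }
            SmallString::Heap(s) => s.as_str(),
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, SmallString::Inline { .. })
    }
}

fn main() {
    let short = SmallString::new("word");
    let long = SmallString::new(&"x".repeat(30));
    assert!(short.is_inline() && !long.is_inline());
    assert_eq!(long.as_str().len(), 30);
    // Layout-dependent: shows how much the discriminant costs on this target.
    println!("size_of::<SmallString>() = {}", size_of::<SmallString>());
}
```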

You could also consider storing Vec<Id>, where Id is a type that stores the offset and length of a given slice in your backing storage, but uses fewer than 64 bits for each field.
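A sketch of that idea, assuming the backing storage is a single `String` smaller than 4 GiB so that `u32` offsets suffice. `Id` and `WordStore` are hypothetical names; each `Id` takes 8 bytes, versus 16 bytes for a `&str` on a 64-bit target.

```rust
use std::convert::TryFrom;
use std::mem::size_of;

/// Hypothetical handle: byte offset and length into the shared buffer.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Id {
    offset: u32,
    len: u32,
}

/// Hypothetical store: all words concatenated in one `String`.
#[derive(Default)]
struct WordStore {
    buf: String,
}

impl WordStore {
    fn push(&mut self, s: &str) -> Id {
        // Panics instead of silently truncating if the buffer outgrows u32.
        let offset = u32::try_from(self.buf.len()).expect("buffer exceeds u32 range");
        let len = u32::try_from(s.len()).expect("word exceeds u32 range");
        self.buf.push_str(s);
        Id { offset, len }
    }

    fn get(&self, id: Id) -> &str {
        let start = id.offset as usize;
        &self.buf[start..start + id.len as usize]
    }
}

fn main() {
    let mut store = WordStore::default();
    let ids: Vec<Id> = ["first", "second", "third"].iter().map(|w| store.push(w)).collect();
    assert_eq!(store.get(ids[1]), "second");
    assert_eq!(size_of::<Id>(), 8);
}
```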


#18

Hi nordlow

I had a similar issue some time ago, but for byte strings. The solution was to write a macro that evaluates the byte size of the string at compile time.
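For byte strings in particular, the size is already part of the literal’s type: `b"..."` has type `&[u8; N]`, so a generic function can recover `N` at compile time. A small sketch with a hypothetical `byte_len` helper (not the bytestool API):

```rust
/// Hypothetical helper: the length comes from the array type, not from runtime data.
const fn byte_len<const N: usize>(_: &[u8; N]) -> usize {
    N
}

// A byte-string literal dereferences to a fixed-size array, copied by value here.
const TAG: [u8; 7] = *b"release";
const TAG_LEN: usize = byte_len(b"release");

fn main() {
    assert_eq!(TAG_LEN, 7);
    assert_eq!(TAG.len(), 7);
    let sized: [u8; 3] = *b"abc"; // `[u8; 4]` here would be a type error
    assert_eq!(&sized, b"abc");
}
```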

https://crates.io/crates/bytestool

I used it for
https://crates.io/crates/releasetag
https://crates.io/crates/sizedbytes