Isn't there a more compact way of creating fixed-size (potentially stack-allocated) strings than this:
use std::str;
let x: &[u8] = &[b'a', b'b'];
let stack_str: &str = str::from_utf8(x).unwrap();
Isn't a macro motivated here? Such a macro should check that the characters given as literals are valid Unicode and statically infer the string's size from the UTF-8 encoding of those characters.
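On recent Rust you can get most of the way there without a macro, using a `const fn` with a const generic length. This is my own sketch, not an existing crate API; the length check happens at compile time when the result is assigned to a `const`:

```rust
// Sketch: copy a string literal's bytes into a fixed-size array.
// The length N must match the literal's byte length exactly.
const fn to_array<const N: usize>(s: &str) -> [u8; N] {
    let bytes = s.as_bytes();
    assert!(bytes.len() == N); // evaluated at compile time in const context
    let mut out = [0u8; N];
    let mut i = 0;
    while i < N {
        out[i] = bytes[i];
        i += 1;
    }
    out
}

fn main() {
    const WORD: [u8; 2] = to_array::<2>("ab"); // lives in the binary, not the heap
    let stack_str: &str = std::str::from_utf8(&WORD).unwrap();
    assert_eq!(stack_str, "ab");
}
```

Because the input is a `&str`, it is already guaranteed to be valid UTF-8, so the only thing left to check is the length.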
A crate for this, anyone?
BTW: Why isn't the block of code above syntax highlighted?
I need it for efficient allocation and compact storage of a large number (millions) of small strings (words). Think of it as keeping only the inline part of a small-string-optimized vector/string.
Google Chrome (and probably Firefox as well) makes heavy use of such strings/vectors because it saves memory and gains performance.
Does this look like it has what you're looking for? It's a string API backed by a statically-sized array. It's like a small-string-optimized string, but without the ability to switch to heap storage.
You probably shouldn't put millions of anything on the stack though.
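To make the idea concrete, here is a minimal sketch of what such a type looks like internally: a fixed-size byte array plus a length, with no heap fallback. The type and its names are mine for illustration; the crate's ArrayString is a full-featured version of the same idea:

```rust
// A string backed by an inline, statically-sized buffer.
struct StackString<const N: usize> {
    buf: [u8; N],
    len: u8,
}

impl<const N: usize> StackString<N> {
    // Fails instead of spilling to the heap when the input is too long.
    fn new(s: &str) -> Option<Self> {
        if s.len() > N {
            return None;
        }
        let mut buf = [0u8; N];
        buf[..s.len()].copy_from_slice(s.as_bytes());
        Some(Self { buf, len: s.len() as u8 })
    }

    fn as_str(&self) -> &str {
        // The prefix is always a whole &str we copied in, so it is valid UTF-8.
        std::str::from_utf8(&self.buf[..self.len as usize]).unwrap()
    }
}

fn main() {
    let s = StackString::<24>::new("hello").unwrap();
    assert_eq!(s.as_str(), "hello");
    assert!(StackString::<4>::new("too long").is_none());
}
```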
If you write millions of literal "foo"/"bar"/"baz" strings, they should all be "allocated" compactly in the binary's .rodata section (or platform equivalent), so I guess I still don't understand what you're after.
Storing a Vec<ArrayString<[u8; SIZE]>> will waste the unused capacity if many of your strings don't need the entire size. If you're making a read-only collection of strings, you can get more space efficiency with an arena allocator. That way, instead of laying out your strings in an ArrayString<[u8; 8]> like
first...second..third...fourth.. (. = unused space), you can store them as
firstsecondthirdfourth, with a Vec<&str> storing the offset and length of each string.
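The arena idea above can be sketched in a few lines (a minimal illustration, using (offset, length) pairs instead of &str to sidestep self-referential borrows):

```rust
fn main() {
    let words = ["first", "second", "third", "fourth"];

    // All words concatenated back-to-back in one allocation.
    let mut arena = String::new();
    // One (offset, length) pair per word, indexing into the arena.
    let mut index: Vec<(usize, usize)> = Vec::new();

    for w in &words {
        index.push((arena.len(), w.len()));
        arena.push_str(w);
    }

    assert_eq!(arena, "firstsecondthirdfourth"); // no padding between words

    let (off, len) = index[1];
    assert_eq!(&arena[off..off + len], "second");
}
```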
Rust doesn't have great support for compile-time calculations at the moment. There's a decent chance the optimizer will do constant propagation and see that the error branch of the unwrap is dead code, though.
I'm aware of that, but most words in human languages fit in the three machine words needed by a Vec. On a 64-bit system that means 24 bytes, which can hold 23 letters plus one byte for the length. Most English words are shorter than that.
If there are some strings that don't fit in the fixed-size storage, then ArrayString alone won't work. You'll need either a reimplementation of C++-style small-string-optimized strings, an enum storing both a String and an ArrayString (which wastes space for the discriminant), or two separate arrays: one Vec<String> for the long ones and one Vec<ArrayString<[u8; 24]>> for the rest.
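The discriminant overhead of the enum approach is easy to measure. A sketch (the type and its names are mine): the enum must be large enough for its biggest variant plus alignment padding, so each entry ends up bigger than a bare String:

```rust
#[allow(dead_code)]
enum Word {
    Short([u8; 24], u8), // inline bytes + length
    Long(String),        // heap-allocated fallback
}

fn main() {
    // On 64-bit, String is 24 bytes (ptr, cap, len); the Short payload
    // alone needs 25 bytes, rounded up to the enum's 8-byte alignment.
    println!("String: {} bytes", std::mem::size_of::<String>());
    println!("Word:   {} bytes", std::mem::size_of::<Word>());
    assert!(std::mem::size_of::<Word>() > std::mem::size_of::<String>());
}
```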
You could also consider storing Vec<Id>, where Id is a type that stores the offset and length of a given slice in your backing storage, but uses less than 64 bits for each.
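A possible packing for such an Id, under assumptions I'm making up for illustration (arena smaller than 16 MiB, words shorter than 256 bytes), fits in a single u32:

```rust
// 24-bit offset + 8-bit length packed into one u32.
#[derive(Copy, Clone)]
struct Id(u32);

impl Id {
    fn new(offset: usize, len: usize) -> Id {
        assert!(offset < (1 << 24), "arena offset too large for 24 bits");
        assert!(len < (1 << 8), "word too long for 8 bits");
        Id(((offset as u32) << 8) | len as u32)
    }
    fn offset(self) -> usize {
        (self.0 >> 8) as usize
    }
    fn len(self) -> usize {
        (self.0 & 0xFF) as usize
    }
}

fn main() {
    let id = Id::new(1234, 6);
    assert_eq!((id.offset(), id.len()), (1234, 6));
    // 4 bytes per entry instead of 16 for a (usize, usize) pair.
    assert_eq!(std::mem::size_of::<Id>(), 4);
}
```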
I had a similar issue some time ago, but for byte-strings. The solution was to write a macro that evaluates the byte size of the string at compile time.
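For byte-string literals this is fairly direct on current Rust, since a byte-string literal already has type &[u8; N] and its length is usable in const context. A sketch of such a macro (my own, not the one referred to above):

```rust
// Turn a byte-string literal into an owned fixed-size array,
// with the length inferred at compile time.
macro_rules! fixed_bytes {
    ($s:expr) => {{
        const LEN: usize = $s.len(); // evaluated at compile time
        let arr: [u8; LEN] = *$s;    // $s is a &[u8; LEN] literal
        arr
    }};
}

fn main() {
    let a = fixed_bytes!(b"hello");
    assert_eq!(a.len(), 5);
    assert_eq!(&a, b"hello");
}
```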