The idea behind this is to have a third String class that owns its content like a String but has an immutable length.
This means the capacity field can be dropped from the String, lowering its memory footprint while still allowing mutation inside the allocated area.
It should also be easy to support a short-string optimization that avoids heap allocation.
Internal Layout:
struct OwnedStr {
buffer: Unique<u8>, // pointer to memory for the immutable String
len: usize // the length of the allocated buffer
}
This type would significantly reduce the memory footprint of storing massive amounts of Strings which have a static length.
For example, compare the following collections and their memory footprint with tens of thousands of elements:
Vec<String>
Vec<OwnedStr>
sizeof(usize) * n // their difference in size with n elements (one capacity field saved per element)
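A rough way to check the per-element saving is to compare `size_of` for both types. This is only a sketch: `OwnedStr` here is the hypothetical layout from above, `NonNull<u8>` stands in for the unstable `Unique<u8>`, and `saved_bytes` is a helper invented for illustration. The three-word size of `String` is what current standard library implementations use, not a language guarantee.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

// Hypothetical layout from the post; NonNull<u8> is an assumption
// standing in for the unstable Unique<u8>.
#[allow(dead_code)]
struct OwnedStr {
    buffer: NonNull<u8>, // pointer to the string bytes
    len: usize,          // length of the allocated buffer
}

/// Bytes saved in the element storage of a collection with `n` entries
/// (the Vec header itself is the same size in both cases).
fn saved_bytes(n: usize) -> usize {
    n * (size_of::<String>() - size_of::<OwnedStr>())
}

fn main() {
    println!("String:   {} bytes", size_of::<String>());
    println!("OwnedStr: {} bytes", size_of::<OwnedStr>());
    println!("saved for 10_000 elements: {} bytes", saved_bytes(10_000));
}
```

This only counts the inline part of each element; the heap buffers holding the actual characters are the same size either way, provided the String has no spare capacity.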
Their instantiation:
let a = "Hello, World!".to_owned_str();
let b = OwnedStr::new("Hello, World!");
// etc ...
This concept of an array with a static length at runtime could be generalized with a type of DynArray:
No, Box<str> is exactly what you proposed. The Box points to the string data itself. A Box<str> is really just a Box<[u8; N]> (for some unknown N) in disguise.
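To make that concrete, here is a small sketch of a `Box<str>` whose length is only decided at runtime and whose contents can still be mutated in place. The function name `shout` is made up for this example.

```rust
/// Builds a boxed string of whatever length the input has, then
/// mutates it in place without changing its length.
fn shout(input: &str) -> Box<str> {
    // into_boxed_str drops any spare capacity and yields Box<str>.
    let mut boxed: Box<str> = String::from(input).into_boxed_str();
    // In-place mutation is fine as long as the length stays the same.
    boxed.make_ascii_uppercase();
    boxed
}

fn main() {
    let a = shout("Hello, World!");
    let b = shout("hi");
    println!("{} ({} bytes)", a, a.len());
    println!("{} ({} bytes)", b, b.len());
}
```

The same function returns boxes of different lengths because the length travels with the pointer rather than being part of the type.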
This sounds pretty cool, but I cannot really imagine the data layout for Box<str> as you have described it: [T; N] stores the length N (which is a compile-time constant!) in its type, but with the into_boxed_str or into_boxed_slice methods the length is decided at runtime, not at compile time.
So there is a need to store the length somewhere in [u8;N] as a field in case the size can only be determined at runtime.
Or is it possible to return arrays of different sizes from the same function?
Can you point me to a place where I can read more about how this behaves?
str, [T] and Trait are all "dynamically sized types", in that the compiler doesn't know how big they are at compile time. Each has an extra bit of information that was erased from the type, and must be carried around with them in order to work (for str and [T], that's the length; for Trait, it's the vtable implementation).
As such, any time you have *Dst or &Dst, the pointer becomes fat; the erased data gets appended to the pointer itself. Box<str> contains Unique<str> which contains *mut str, which means the *mut str is actually a (data, len) pair, which means Unique<str> and Box<str> are actually two words long, not one.
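You can observe the thin versus fat pointer sizes directly with `size_of`. This is just an illustrative check; `Debug` is picked arbitrarily as the example trait.

```rust
use std::fmt::Debug;
use std::mem::size_of;

fn main() {
    let word = size_of::<usize>();

    // Pointers to sized types are thin: one word.
    assert_eq!(size_of::<&u8>(), word);
    assert_eq!(size_of::<Box<u8>>(), word);

    // Pointers to str and [T] carry the length: two words.
    assert_eq!(size_of::<&str>(), 2 * word);
    assert_eq!(size_of::<*mut str>(), 2 * word);
    assert_eq!(size_of::<Box<str>>(), 2 * word);
    assert_eq!(size_of::<&[u32]>(), 2 * word);

    // Pointers to trait objects carry a vtable pointer: two words.
    assert_eq!(size_of::<&dyn Debug>(), 2 * word);

    println!("all pointer size checks passed");
}
```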
You can use these types, and even create instances of them (in some cases), but you can't create new kinds of dynamically sized types. Unless a particular bug in the compiler has been fixed, you can't create instances of them by hand in all cases, either (alignment issues).
A library can certainly implement a type OwnedStr like in your post that acts just like a string (e.g. all the same methods, borrows to a str etc.). But it is true that you can't have custom DSTs, only traits and the built in str and [T].
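As a minimal sketch of what such a library type might look like, assuming it simply wraps `Box<str>` (one possible design, not the only one), implementing `Deref` and `DerefMut` to `str` gives it all the `str` methods for free:

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical fixed-length owned string, implemented as a thin
/// wrapper around Box<str> (an assumption for this sketch).
pub struct OwnedStr(Box<str>);

impl OwnedStr {
    pub fn new(s: &str) -> Self {
        // From<&str> for Box<str> allocates exactly s.len() bytes.
        OwnedStr(s.into())
    }
}

impl Deref for OwnedStr {
    type Target = str;
    fn deref(&self) -> &str {
        &self.0
    }
}

impl DerefMut for OwnedStr {
    fn deref_mut(&mut self) -> &mut str {
        &mut self.0
    }
}

fn main() {
    let mut b = OwnedStr::new("Hello, World!");
    // str methods are available through Deref / DerefMut.
    b.make_ascii_lowercase();
    println!("{} ({} bytes)", &*b, b.len());
}
```

Because the wrapper has a single field, it occupies the same two words as the `Box<str>` inside it.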
From what I remember (I believe it was spelled out in the Rustonomicon), the compiler basically doesn't correctly align DST fields embedded in structures. So if the DST field has alignment requirements and it's generic, you basically can't guarantee it'll be laid out in a way that makes any sense. Possibly not even where it is, exactly.