Static length String: OwnedStr


#1

How about a String class with a static length?

The idea behind this is to have a third String class that owns its content like a String but has an immutable length.
This means that it is possible to drop the capacity field off the String lowering its internal memory footprint while allowing mutation inside the allocated area.

It should also be easily possible to allow short-string-optimization to avoid heap allocation.

Internal Layout:

struct OwnedStr {
    buffer: Unique<u8>, // pointer to memory for the immutable String
    len: usize          // the length of the allocated buffer
}

This type would significantly reduce the memory footprint of storing massive amounts of Strings which have a static length.

For example, compare the following collections and their memory footprint with ten-tousands of elements:

Vec<String>
Vec<OwnedStr>

sizeof(usize) * (n+1) // their difference in size with n elements

Their instanciation:

let a = "Hello, World!".to_owned_str();
let b = OwnedStr::new("Hello, World!");
// etc ...

This concept of an array with a static length at runtime could be generalized with a type of DynArray:

struct DynArray<T> {
    buffer: Unique<T>,
    len: usize
}

struct OwnedStr {
    buffer: DynArray<u8>
}

Please tell me what you like or dislike about this idea?

Why would you think that such a data structure would be good or bad for rust - I am excited to hear your opinion!


#2

Sounds like Box<str> and Box<[T]> :wink:

See String::into_boxed_str and Vec::into_boxed_slice.


#3

Thank you for your response and for hinting me to these nice methods.

While what they offer sounds very similar, they are not identical to what I was proposing since they require an additional indirection: the box.

So to access the Strings content (the buffer) two pointer-tracers would be needed:
from Box<str> -> str -> buffer.

With OwnedStr it would still be only OwnedStr -> buffer which also saves a lot of memory footprint for unnessecary pointers.


#4

No, Box<str> is exactly what you proposed. The Box points to the string data itself. A Box<str> is really just a Box<[u8; N]> (for some unknown N) in disguise.


#5

This sounds pretty cool but I cannot really imagine the data layout for Box as you have described since [T;N] stores the length N (which is a compile-time constant!) in its data type but with the into_boxed_str or into_boxed_slice method these things are decided at runtime and not at compiletime.

So there is a need to store the length somewhere in [u8;N] as a field in case the size can only be determined at runtime.
Or is it possible to return arrays of different sizes from the same function?

Can you point me to a place where I can read more about how this behaves?


#6

Slice representation is documented here: http://doc.rust-lang.org/std/raw/struct.Slice.html


#7

I think this was the information which I was missing:

pub struct Slice<T> {
    pub data: *const T,
    pub len: usize,
}

With the following note:

This struct is guaranteed to have the layout of types like &[T], &str, and Box<[T]>

To me this sounds like a language feature and not like something that any Rust user can reimplement as a library solution, is that correct?


#8

str, [T] and Trait are all “dynamically sized types”, in that the compiler doesn’t know how big they are at compile time. Each has an extra bit of information that was erased from the type, and must be carried around with them in order to work (for str and [T], that’s the length; for Trait, it’s the vtable implementation).

As such, any time you have *Dst or &Dst, the pointer becomes fat; the erased data gets appended to the pointer itself. Box<str> contains Unique<str> which contains *mut str, which means the *mut str is actually a (data, len) pair, which means Unique<str> and Box<str> are actually two words long, not one.

You can use these types, and even create instances of them (in some cases), but you can’t create new kinds of dynamically sized types. Unless a particular bug in the compiler was fixed, you can’t create instances of them by hand in all cases, either (alignment issues).


#9

A library can certainly implement a type OwnedStr like in your post that acts just like a string (e.g. all the same methods, borrows to a str etc.). But it is true that you can’t have custom DSTs, only traits and the built in str and [T].


#10

Thank you very much for your responses! That really cleared some things up for me. =)


#11

Could you elaborate more about that? Or given the alignment issues numbers?

I need to implement Rc::make_mut for Rc<[T]>. My first attempt is ugly and probably does not work with all T (alignment issues).


#12

From what I remember (I believe it was spelled out in the Rustonomicon), the compiler basically doesn’t correctly align DST fields embedded in structures. So if the DST field has alignment requirements and it’s generic, you basically can’t guarantee it’ll be laid out in a way that makes any sense. Possibly not even where it is, exactly.

Which is inconvenient, to say the least.


#13

Is this the issue? If so, it got fixed.


#14

Looks like the one, though that doesn’t mean there are no other problems with DSTs I don’t know about. :stuck_out_tongue: