What are String and str?

This, imho, should be the starting point: what are Vec<u8> and [u8]?

  • [u8] represents the type for sequences of bytes of any length (that is, the length is a runtime property). In Rust parlance, this is called a slice.

    • Since the length is only known at runtime (!Sized), a slice cannot be stored directly on the stack, because stack memory is laid out with compile-time (and thus fixed) sizes. This is what prevents us from using !Sized stuff directly.

      We can circumvent this restriction with indirection: any sequence of bytes, whatever its length may be, once in memory, can be referred to by a reference / pointer to the first element plus a second field holding the number of elements (we call this a fat pointer). This is the case for, for instance:

      • shared reference to a slice, &[u8] (or more generally, &[T]),

      • unique reference to a slice, &mut [u8] (or more generally, &mut [T]),

      • and owning references / pointers, such as Box<[u8]>, Rc<[u8]>, Arc<[u8]>.

  • One way to create a value with a variable length (dynamic allocation) is by using the heap. This works in multiple steps:

    1. We ask the heap allocator for a chunk of memory able to hold capacity elements;

    2. If the allocator succeeds, we get back a pointer into the heap, to the beginning of the allocated (but uninitialised) memory;

    3. We can then initialise any number len of elements, as long as len <= capacity (otherwise a reallocation is needed).

  • That's why such a heap-managed structure must have at least these three fields (ptr, len, capacity), and this is exactly what a Vec<u8> (or more generally, a Vec<T>) is.

    • A corollary is that from a (ptr, len, capacity) triple we can choose to keep only ptr and len. ptr... len... This rings a bell... Oh, right, we have successfully managed to get a reference to a slice!
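A minimal sketch of the points above, assuming only the standard library. The size checks compare against `size_of::<usize>()` rather than hard-coding a word size, so they should hold on both 32- and 64-bit targets. The helper `manual_heap_bytes` is a hypothetical name made up for this illustration: it walks through the three allocation steps with the raw `std::alloc` API, which you would normally never do by hand, since `Vec` does exactly this for you.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::mem::size_of;

/// Hypothetical helper (illustration only): performs steps 1-3 by hand,
/// writes `len` bytes b'a', b'b', ... into a buffer of `capacity` bytes,
/// and returns a copy of the initialised part.
fn manual_heap_bytes(capacity: usize, len: usize) -> Vec<u8> {
    // Zero-sized allocations are not allowed with the raw allocator.
    assert!(capacity > 0 && len <= capacity, "need 0 < capacity and len <= capacity");
    let layout = Layout::array::<u8>(capacity).expect("capacity too large");
    unsafe {
        // 1. Ask the allocator for room for `capacity` bytes.
        let ptr = alloc(layout);
        // 2. A null pointer means the allocation failed.
        assert!(!ptr.is_null(), "allocation failed");
        // 3. Initialise the first `len` bytes; the rest stays uninitialised.
        for i in 0..len {
            ptr.add(i).write(b'a' + (i % 26) as u8);
        }
        // Keeping only (ptr, len) gives us a slice over the initialised part.
        let initialised: &[u8] = std::slice::from_raw_parts(ptr, len);
        let copy = initialised.to_vec();
        // Give the memory back with the same layout it was allocated with.
        dealloc(ptr, layout);
        copy
    }
}

fn main() {
    // A reference to a Sized type is a single word...
    assert_eq!(size_of::<&u8>(), size_of::<usize>());
    // ...while references / owning pointers to a slice are fat: (ptr, len).
    assert_eq!(size_of::<&[u8]>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<&mut [u8]>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<Box<[u8]>>(), 2 * size_of::<usize>());

    assert_eq!(manual_heap_bytes(8, 3), b"abc");

    // Vec<u8> packages the same three fields: (ptr, len, capacity).
    let mut v: Vec<u8> = Vec::with_capacity(8);
    v.extend_from_slice(b"abc");
    assert_eq!(v.len(), 3);
    assert!(v.capacity() >= 8);
    assert_eq!(size_of::<Vec<u8>>(), 3 * size_of::<usize>());

    // Dropping `capacity` leaves (ptr, len): a reference to a slice.
    let s: &[u8] = &v;
    assert_eq!(s, b"abc");
    println!("all checks passed");
}
```

The by-hand version is only there to make the three steps visible; the `Vec` part at the end is what real code looks like.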

Ok, ok, but the OP asked about String and str; what does any of this have to do with them?

Very simple:

String / str is exactly like Vec<u8> / [u8], except that the sequence of bytes must uphold a property / invariant: it must be valid UTF-8.

That's why there are trivial conversions (casts) from the former to the latter (<Vec<u8> as From<String>>::from and str::as_bytes), whereas going the other way requires a runtime-checked cast (e.g. String::from_utf8 or std::str::from_utf8), since the bytes have to be validated first.
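A small sketch of both directions, using only the standard library; `bytes_to_string` is a hypothetical helper name for illustration. The cheap direction just reinterprets the same bytes, while the checked direction has to scan them for valid UTF-8 and can therefore fail.

```rust
use std::mem::size_of;

/// Hypothetical helper: checked conversion from raw bytes back to a String.
/// Returns None if the bytes are not valid UTF-8.
fn bytes_to_string(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

fn main() {
    // Same shapes as their byte counterparts.
    assert_eq!(size_of::<String>(), size_of::<Vec<u8>>());
    assert_eq!(size_of::<&str>(), size_of::<&[u8]>());

    // Free direction: no validation, no copy.
    let s = String::from("héllo");
    let view: &[u8] = s.as_bytes(); // String derefs to str, so this is str::as_bytes
    assert_eq!(view.len(), 6); // 'é' takes two bytes in UTF-8
    let bytes: Vec<u8> = Vec::from(s); // <Vec<u8> as From<String>>::from
    assert_eq!(bytes.len(), 6);

    // Checked direction: the UTF-8 invariant is verified at runtime.
    assert_eq!(std::str::from_utf8(&bytes), Ok("héllo"));
    assert_eq!(bytes_to_string(bytes).as_deref(), Some("héllo"));

    // 0xFF can never appear in UTF-8, so both checks reject it.
    assert!(std::str::from_utf8(&[0x68, 0xFF]).is_err());
    assert!(bytes_to_string(vec![0x68, 0xFF]).is_none());
    println!("all checks passed");
}
```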


cough lazy_static cough :stuck_out_tongue:
