What exactly is [u8]?

Normally when we talk about "slices" we mean &[T], also called a "shared slice".

But when I try to get a range of a Vec, I (theoretically) get something different, [T]:

let v: Vec<u8> = vec![1,2,3,4,5];
let s = v[1..3];

However, this [T] can't even exist, at least not as a variable:

error[E0277]: the size for values of type `[u8]` cannot be known at compilation time
 --> src/main.rs:3:9
  |
3 |     let s = v[1..3];
  |         ^   ------- help: consider borrowing here: `&v[1..3]`
  |         |
  |         doesn't have a size known at compile-time
  |

What is it?
Why is the size not known, shouldn't it be 2 machine words (pointer+length)?
Is there any actual use for this type?

It's the value pointed at by the pointer with a length only known at runtime. Saving a value to the stack with (at compile-time) unknown size is currently impossible. You either have to borrow a slice, i.e. you have a (pointer, length) tuple with constant size on the stack or you operate on a boxed slice, which also boils down to the same tuple with the additional property of having ownership over the pointed-at memory region.

The type that consists of a pointer and length is &[u8].
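To make the two options above concrete, here is a minimal sketch of holding the range from the question either by borrowing it or by boxing it. The helper names borrow_range and box_range are made up for this illustration and are not part of any library.

```rust
/// Borrow the range: the result is a (pointer, length) pair pointing into `v`.
/// (Hypothetical helper; panics if `v` has fewer than 3 elements.)
fn borrow_range(v: &[u8]) -> &[u8] {
    &v[1..3]
}

/// Copy the range into its own heap allocation and take ownership of it.
/// (Hypothetical helper; panics if `v` has fewer than 3 elements.)
fn box_range(v: &[u8]) -> Box<[u8]> {
    Box::from(&v[1..3])
}

fn main() {
    let v: Vec<u8> = vec![1, 2, 3, 4, 5];

    // `let s = v[1..3];` is rejected, but both handles below have a fixed size.
    let s: &[u8] = borrow_range(&v);
    let b: Box<[u8]> = box_range(&v);

    assert_eq!(s, [2u8, 3]);
    assert_eq!(&b[..], [2u8, 3]);

    // The boxed copy stays valid after the Vec is gone; the borrow would not.
    drop(v);
    assert_eq!(b.len(), 2);
}
```

In both cases the variable holds a pointer plus a length; the bare [u8] is only ever the thing being pointed at.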

What is a type? I think of types as a property a region of memory may or may not have, and that these properties are something you can prove to be true at compile time.

If you know that a region of memory has type u32, you know that the region has length 4. You also know that you can read the memory, i.e. it can't be deallocated.

If you know that a region of memory has type &u32, you know that it has size 8 (assuming a 64-bit target). You also know that if you interpret the bytes as a pointer and dereference it, you will find a region of memory of type u32. This means that, for example, the 8 bytes are not all zeros, because then you could not dereference the pointer and find a u32, as the zero address is not allocated.

If you know that a region of memory has type [u8], you know that it contains a sequence of values of type u8 of some length. This type does not enforce any length on the region you make the claim about. Since the u8 type requires the memory to be allocated, you in turn know this about the entire region.

If a region of memory has type &[u8], you know that this region is 16 bytes long (again on a 64-bit target), and that it consists of a pointer and a usize length. Additionally, you know that the other region starting at the pointer and having the specified length must have the type [u8], which, for example, means that the memory is allocated.
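These size claims can be checked with std::mem. The sketch below expresses everything in terms of the machine word size rather than hard-coding 8 and 16, so it should also hold on 32-bit targets; pointee_bytes is a hypothetical helper written for this example.

```rust
use std::mem::{size_of, size_of_val};

/// Number of bytes occupied by the elements a slice reference points at.
/// (Hypothetical helper for this example.)
fn pointee_bytes(s: &[u8]) -> usize {
    size_of_val(s)
}

fn main() {
    let word = size_of::<usize>(); // 8 on a typical 64-bit target

    assert_eq!(size_of::<u32>(), 4);
    assert_eq!(size_of::<&u32>(), word);

    // A reference is never null, so Option can use the all-zeros pattern
    // for None and needs no extra space.
    assert_eq!(size_of::<Option<&u32>>(), word);

    // A slice reference carries a length next to the pointer.
    assert_eq!(size_of::<&[u8]>(), 2 * word);

    // The bare [u8] behind the reference has whatever size the length says.
    let data = [10u8, 20, 30];
    assert_eq!(pointee_bytes(&data), 3);
    assert_eq!(pointee_bytes(&data[..1]), 1);
}
```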

Similarly for the str type. If you know that a region has type str, then you know that the region is allocated, and you know that the data in this region is valid UTF-8. The claim "this region has type str" does not provide any guarantees about its length.

Similarly, the &str type guarantees that a region has size 16 (on a 64-bit target), consisting of a pointer and a usize, and that the region the pointer points to, with the specified length, must have type str.
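As a small illustration of the UTF-8 guarantee, the sketch below goes from bytes to str through the checked constructor std::str::from_utf8. The function name view_as_str is made up for this example.

```rust
use std::mem::size_of;
use std::str;

/// Try to view raw bytes as `str`; this succeeds only for valid UTF-8.
/// (Hypothetical helper for this example.)
fn view_as_str(bytes: &[u8]) -> Option<&str> {
    str::from_utf8(bytes).ok()
}

fn main() {
    // &str has the same pointer + length shape as &[u8].
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());

    // Valid UTF-8 can be reinterpreted in place, without copying.
    assert_eq!(view_as_str(b"hello"), Some("hello"));

    // 0xff never appears in valid UTF-8, so this region cannot have type str.
    assert_eq!(view_as_str(&[0xff, 0xfe]), None);

    // The length stored in the reference counts bytes, not characters.
    let s = "héllo";
    assert_eq!(s.len(), 6);
    assert_eq!(s.chars().count(), 5);
}
```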

One standalone use is in type parameters, with an explicit : ?Sized bound. Any mention of the type parameter is then restricted to positions that can hold unsized types, such as behind a reference or inside a Box.
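A minimal sketch of what that looks like in practice, assuming a made-up function byte_len and a made-up struct Tagged (neither comes from the standard library):

```rust
/// `T` may be unsized here, so it can only appear behind a pointer (`&T`).
/// (Hypothetical helper for this example.)
fn byte_len<T: ?Sized + AsRef<[u8]>>(value: &T) -> usize {
    value.as_ref().len()
}

/// A generic struct whose last field is allowed to be unsized.
/// (Hypothetical type for this example.)
struct Tagged<T: ?Sized> {
    tag: u8,
    payload: T,
}

fn main() {
    let v: Vec<u8> = vec![1, 2, 3, 4, 5];

    assert_eq!(byte_len("abc"), 3);      // T = str
    assert_eq!(byte_len(&v[1..3]), 2);   // T = [u8]
    assert_eq!(byte_len(&v), 5);         // T = Vec<u8>, a sized type still works

    // A sized Tagged<[u8; 3]> can be viewed as the unsized Tagged<[u8]>.
    let sized = Tagged { tag: 7, payload: [1u8, 2, 3] };
    let unsized_view: &Tagged<[u8]> = &sized;
    assert_eq!(unsized_view.tag, 7);
    assert_eq!(unsized_view.payload.len(), 3);
}
```

Without the ?Sized bound, the first two calls would be rejected because type parameters are Sized by default.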

I call types like [u8] and str bare slices to distinguish them from shared slices. (There are also bare trait objects, which are the other kind of unsized types.)

The useful thing about these types is that even though they don't have a compile-time-known size, even though you can't hold one directly, they can still be composed with other types in the same way Sized types are.

&[u8] is a shared (reference to a) slice, but you can put [u8] in other places as well, like &mut [u8], Box<[u8]>, Arc<[u8]>, Mutex<[u8]>, etc. If there were no [u8] type but just a &[u8], you couldn't express these non-shared types -- Arc<&[u8]> is a different thing than Arc<[u8]>. Note that Box, Arc, Mutex, etc. mean the same thing with [u8] that they do with [u8; 10]; they don't treat unsized types specially. Unsized types are a relatively small language feature that composes well with the rest of the language and the standard library.
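A short sketch of that composition, using only standard library types. The sum function is a made-up helper; the point is that one function taking &[u8] can be fed from every one of these containers.

```rust
use std::sync::{Arc, Mutex};

/// Add up the bytes of any slice, however it is owned.
/// (Hypothetical helper for this example.)
fn sum(bytes: &[u8]) -> u32 {
    bytes.iter().map(|&b| u32::from(b)).sum()
}

fn main() {
    // &mut [u8]: exclusive access to someone else's elements.
    let mut array = [1u8, 2, 3];
    let exclusive: &mut [u8] = &mut array;
    exclusive[0] = 10;
    assert_eq!(sum(exclusive), 15);

    // Box<[u8]>: unique ownership of heap-allocated elements.
    let boxed: Box<[u8]> = vec![1u8, 2, 3].into_boxed_slice();
    assert_eq!(sum(&boxed), 6);

    // Arc<[u8]>: shared ownership; cloning copies the handle, not the bytes.
    let shared: Arc<[u8]> = Arc::from(&boxed[..]);
    let clone = Arc::clone(&shared);
    assert_eq!(sum(&clone), 6);

    // Mutex<[u8]>: created with a known length, then viewed as unsized.
    let sized = Mutex::new([0u8; 4]);
    let locked: &Mutex<[u8]> = &sized;
    {
        let mut guard = locked.lock().unwrap();
        guard[2] = 7;
    }
    assert_eq!(sum(&locked.lock().unwrap()), 7);
}
```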

This flexibility is what allows you to do cool stuff like temporarily opt in to shared mutation using Cell<[T]>. You still have to put the Cell<[T]> behind a pointer to actually use it, but the fact that you can express [T] as a type means that the pointer doesn't have to be inside the Cell.
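Here is a minimal sketch of that opt-in, using Cell::from_mut and as_slice_of_cells from the standard library. The bump_all function is a made-up helper for this example.

```rust
use std::cell::Cell;

/// Add `delta` to every element through a shared (non-mut) view.
/// (Hypothetical helper for this example.)
fn bump_all(cells: &[Cell<u8>], delta: u8) {
    for c in cells {
        c.set(c.get().wrapping_add(delta));
    }
}

fn main() {
    let mut data = [1u8, 2, 3];

    // Temporarily trade the exclusive borrow for a shared, mutable-via-Cell view.
    let whole: &Cell<[u8]> = Cell::from_mut(&mut data[..]);
    let cells: &[Cell<u8>] = whole.as_slice_of_cells();

    // Shared references are Copy, so several handles can write to the same data.
    let alias = cells;
    bump_all(cells, 10);
    alias[0].set(0);

    // Once the Cell view is no longer used, `data` is an ordinary array again.
    assert_eq!(data, [0, 12, 13]);
}
```

Note that the Cell wraps the [u8] itself and sits behind a plain reference, which is only expressible because [u8] is a type in its own right.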

Not sure if this is correct. There should be some place to store the runtime length of this boxed (Arced, Mutexed, etc.) slice, right?

In case of Box<[u8]> of .len()==10 and Box<[u8; 10]> the heap layout of both is exactly the same.

For unsized types the length is stored in the pointer, not in the data.

I think it works like this:

&[T] is a fat pointer (composed of a pointer and a length) which is a borrowed slice.

Box<[T]> (and other owning pointer types) is a fat pointer (again composed of a pointer and a length) which owns its slice.

[T; N] are arrays which have their size known at compile time, so can be stored in place.

Since the size of Box<[T; N]> is also known at compile time, the pointer does not need to store any additional data and is not fat.

In all cases, [T] and [T; N] are some way of describing contiguous memory. The main distinction between a slice and an array is whether the size is known at compile time or not.

This playground seems to confirm this understanding.
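The linked playground is not reproduced here, but a check along the same lines might look like the following sketch. The handle_words function is a made-up helper that reports a type's size in machine words.

```rust
use std::mem::{size_of, size_of_val};

/// Size of a type measured in machine words.
/// (Hypothetical helper for this example.)
fn handle_words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn main() {
    // The length of [u8; 10] is part of the type, so a thin pointer is enough.
    assert_eq!(handle_words::<Box<[u8; 10]>>(), 1);
    // The length of [u8] is only known at runtime, so it travels with the pointer.
    assert_eq!(handle_words::<Box<[u8]>>(), 2);

    let array_box: Box<[u8; 10]> = Box::new([7u8; 10]);
    let slice_box: Box<[u8]> = vec![7u8; 10].into_boxed_slice();

    // The heap data behind both boxes is the same 10 bytes.
    assert_eq!(size_of_val(&*array_box), 10);
    assert_eq!(size_of_val(&*slice_box), 10);
    assert_eq!(array_box[..], slice_box[..]);

    // Going from the array box to the slice box only changes the handle.
    let unsized_box: Box<[u8]> = array_box;
    assert_eq!(unsized_box.len(), 10);
}
```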

Sorry. Somehow the quoted text made me think that Box<[T]> itself is equal to Box<[T; N]> and not the heap data it contains.

Remember a slice is just a reference to some other data.

In Rust, a value can't be stored in a variable unless its size is known at compile time. So line 3 simply yields a [u8], which describes the elements starting at the first value without recording how many there are. Adding a & makes the result a valid slice, i.e. a fat pointer: a two-word object where the first word is a pointer to the data and the second is the number of elements pointed to.

Just when I thought I had understood it, I encountered this:

use std::convert::TryInto;

fn main() {
    let v = vec![1, 2, 3, 4, 5];
    takes_array(v[..4].try_into().unwrap());
}

fn takes_array(a: [u8; 4]) {
    dbg!(a);
}

So, if v[..4] is [u8] and that doesn't contain a size, how does try_into() decide whether it fails or succeeds? Or am I misunderstanding what is happening here?

It works because of this impl:

impl<T, const N: usize> TryFrom<&[T]> for [T; N]
where
    T: Copy,
    [T; N]: LengthAtMost32,

You're calling try_into() not on the bare [u8] but on a(n automatic) reference to it, and &[u8] implements TryInto<[u8; 4]> because u8 is Copy. The only magic here is how . can automatically reference its left side in order to find the right method.
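To see that the length check happens at runtime on the borrowed slice, here is a sketch with the borrow written out explicitly through TryFrom. The function first_n_as_array is a made-up helper for this example.

```rust
use std::convert::TryFrom;

/// Copy the first `n` bytes of `v` into a `[u8; 4]`; fails unless `n` is 4.
/// (Hypothetical helper; panics if `n > v.len()`, like any out-of-range slice.)
fn first_n_as_array(v: &[u8], n: usize) -> Option<[u8; 4]> {
    // Same conversion as `v[..n].try_into()`, but with the `&` spelled out.
    <[u8; 4]>::try_from(&v[..n]).ok()
}

fn main() {
    let v: Vec<u8> = vec![1, 2, 3, 4, 5];

    // The &[u8] carries its length, so the conversion can compare it with 4.
    assert_eq!(first_n_as_array(&v, 4), Some([1, 2, 3, 4]));

    // Too short or too long: the conversion returns an error instead.
    assert_eq!(first_n_as_array(&v, 3), None);
    assert_eq!(first_n_as_array(&v, 5), None);
}
```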
