Difference of `&[u8;N]` and `&[u8]`

A byte string literal such as b"abc" has type &[u8; 3]. Take Regex::replace, for example:

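       use regex::bytes::Regex; // assumed: the byte-oriented Regex, since `text` is bytes
       let text = b"this is abckkkkk";
       let re = Regex::new(r"abc").unwrap();
       // The replacement is sliced to `&[u8]`; a bare `b""` would be `&[u8; 0]`.
       let result = re.replace(text, &b""[..]);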

Why can't we just pass b"" as the second argument to replace? Does this mean that &[u8; N] and &[u8] are different things?

They're different types: [u8; N] always has a fixed size and doesn't store a length at runtime, while &[u8] is the same type for any length and carries its length with it.
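To make that concrete, here is a small sketch (the function name slice_lengths is made up for illustration). Two literals of different lengths have two different array types, but both can be viewed through the single type &[u8], which then reports its length at runtime:

```rust
// Returns the lengths of two byte string literals once they are viewed as slices.
fn slice_lengths() -> Vec<usize> {
    let three: &[u8; 3] = b"abc"; // the length 3 is part of the type
    let five: &[u8; 5] = b"hello"; // a different type from `&[u8; 3]`

    // Both references fit in one array only as `&[u8]`;
    // each slice reference carries its own length at runtime.
    let slices: [&[u8]; 2] = [three, five];
    slices.iter().map(|s| s.len()).collect()
}

fn main() {
    println!("{:?}", slice_lengths()); // prints [3, 5]
}
```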

This usually isn't a problem, because &[u8; N] can always be turned into &[u8], and this is usually done automatically by a built-in coercion (an unsized coercion, not the Deref trait). For instance, [u8; N] has very few methods itself - most are actually methods on [u8], which the compiler finds by treating the array as a slice during method lookup.

However, as you've discovered, this doesn't always apply. Regex::replace is a generic function (it takes a Replacer). Because it's generic, any number of different types could be passed in. The compiler then doesn't assume you want a &[u8] - and without that assumption, it won't automatically turn &[u8; N] into &[u8].
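Here is a rough sketch of that situation without pulling in the regex crate. The trait Replacement and the function replace_byte are hypothetical stand-ins I made up, not regex's real Replacer API; the only thing they share with it is that the trait is implemented for &[u8] and not for &[u8; N]:

```rust
// Hypothetical stand-in for a trait like regex's `Replacer`; not the real API.
trait Replacement {
    fn bytes(&self) -> &[u8];
}

// Implemented for slices only, so `&[u8; N]` does not satisfy the bound.
impl<'a> Replacement for &'a [u8] {
    fn bytes(&self) -> &[u8] {
        *self
    }
}

// Generic over the replacement, like `replace`; replaces every `needle` byte.
fn replace_byte<R: Replacement>(haystack: &[u8], needle: u8, rep: R) -> Vec<u8> {
    let mut out = Vec::new();
    for &b in haystack {
        if b == needle {
            out.extend_from_slice(rep.bytes());
        } else {
            out.push(b);
        }
    }
    out
}

fn main() {
    let text = b"a-b-c";
    // `text` coerces to `&[u8]` because that parameter is not generic.
    // replace_byte(text, b'-', b""); // does not compile: `&[u8; 0]` is not a `Replacement`
    let result = replace_byte(text, b'-', &b""[..]);
    assert_eq!(result, b"abc".to_vec());
}
```

Note that the first argument needs no slicing, because its parameter type is spelled out as &[u8]; only the generic argument has to be a slice already.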

Passing &[u8; N] to this function will automatically turn it into &[u8]:

fn takes_slice(slice: &[u8])

But it won't for this:

fn takes_anything<T>(thing: T)
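
If you want to see what the compiler picks for T, something like the following should do. The function bodies are my own additions for illustration, and the exact text returned by std::any::type_name is not guaranteed to be stable, so treat the printed names as indicative only:

```rust
use std::any::type_name;

// Non-generic: the parameter type is known, so `&[u8; N]` is coerced to `&[u8]`.
fn takes_slice(slice: &[u8]) -> usize {
    slice.len()
}

// Generic: `T` is inferred from the argument as written, with no coercion.
fn takes_anything<T>(_thing: T) -> &'static str {
    type_name::<T>()
}

fn main() {
    let empty = b"";
    println!("{}", takes_slice(empty)); // coerced to a slice of length 0
    println!("{}", takes_anything(empty)); // T is the array reference type
    println!("{}", takes_anything(&empty[..])); // T is the slice reference type
    // Naming T explicitly gives the compiler a target type, so coercion applies again.
    println!("{}", takes_anything::<&[u8]>(empty));
}
```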

@daboross So a byte string is actually a reference to an array of fixed size N, and it is not a slice, since a slice has type &[T]. (The book Programming Rust says a byte string is a slice of u8 values.) Am I right?


With "slice," you've almost got it. Slice refers to the [T] in &[T] - &[T] is a reference to a slice.

Could you clarify which you mean by "byte string" though? I haven't read Programming Rust, and that term could refer to many different things. &[u8] could be described as a byte string, but so could Vec<u8>, or [u8; N]. All of these either store or refer to a bunch of bytes in contiguous memory.


You are right; they must have overlooked that detail. A string literal is (a reference to) a slice, str, but a byte string literal is (a reference to) a fixed-size array of bytes, [u8; N].
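
A quick way to check this is to write the types out as annotations (the function name literal_types is just for illustration); the program only compiles if the annotations match what the literals really are:

```rust
// Spells out the types of the two kinds of literal.
fn literal_types() -> (&'static str, &'static [u8; 3], &'static [u8]) {
    let text: &'static str = "abc"; // string literal: reference to a `str` slice
    let bytes: &'static [u8; 3] = b"abc"; // byte string literal: reference to an array
    let as_slice: &'static [u8] = text.as_bytes(); // `str` exposes its bytes as a slice
    (text, bytes, as_slice)
}

fn main() {
    let (text, bytes, as_slice) = literal_types();
    // Same contents, even though the literal types differ.
    assert_eq!(text.len(), bytes.len());
    assert_eq!(as_slice, &bytes[..]);
}
```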

As @daboross pointed out, this does not often matter, as deref coercions usually make a (slim) reference to a fixed-size array implicitly coerce to a (fat) reference to a slice:

  • when N > 0, at_array: &[T; N] is structurally equivalent to at_first = &at_array[0]: &T

    • N is just a type-level property (zero-cost abstraction).

    • "just" one address ⇒ slim pointer

  • at_slice: &[T] is structurally equivalent to (at_first, len) = (&at_slice[0], at_slice.len()) : (&T, usize)

    • an address with another field ⇒ fat pointer
  • thus the deref coercion of at_array as &[T] is just transforming at_first into (at_first, N): the length moves from being a type-level property to being a runtime property.
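
The slim versus fat distinction in the list above can be checked with std::mem::size_of. This is only a sketch, and the function name pointer_sizes is made up:

```rust
use std::mem::size_of;

// Returns (size of a reference to an array, size of a reference to a slice), in bytes.
fn pointer_sizes() -> (usize, usize) {
    (size_of::<&[u8; 4]>(), size_of::<&[u8]>())
}

fn main() {
    let at_array: &[u8; 4] = b"abcd";
    let at_slice: &[u8] = at_array; // coercion: the length 4 becomes a runtime field

    let (slim, fat) = pointer_sizes();
    assert_eq!(slim, size_of::<usize>()); // just one address
    assert_eq!(fat, 2 * size_of::<usize>()); // address plus length

    // Both references point at the same first byte.
    assert_eq!(at_array.as_ptr(), at_slice.as_ptr());
    assert_eq!(at_slice.len(), 4);
}
```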


Edit: a minor nitpick on my own wording: the coercions are currently not due to Deref, at least not when N > 32 (we'd need const generics in stable for that). It's really a custom, built-in form of coercion (an unsized coercion), not that it makes any difference in practice.


Thanks! Here "byte string" means a string literal with the b prefix, such as b"some_bytes".


Yeah, I generally think it was a mistake to have this difference between byte string literals and normal Unicode string literals. But my view is biased since I work in a domain where this tends to happen a lot, so I'm probably more annoyed by it than most. See also: https://docs.rs/bstr/0.2.6/bstr/fn.B.html
