Slices and &str

Hello friends. I am working with the differ crate and wasm-bindgen's JsValue type.

The docs for Differ::new say I need to pass a slice, &'a [T] for the strings to diff.

I thought I understood slices, until I tried:

fn get_strings(text_a: JsValue, text_b: JsValue) {
    let str_a = text_a.as_string().unwrap();
    let str_b = text_b.as_string().unwrap();

    // Borrow each String as a string slice (&str).
    let string_slice_a = &str_a[..];
    let string_slice_b = &str_b[..];

    Differ::new(string_slice_a, string_slice_b); // mismatched types, expected slice, found &str
}

I thought you could generate a string slice from a String. In the differ docs, they point to taking &'a [T]. Isn't that a slice? Is a string slice not the same as &[T]?

This is how I understand them:

    let s = "hello"; // &str
    let slice_s = &s[..]; // string slice, which is also just a &str
    let string_s = String::from("hello"); // String
    let slice_string = &string_s[..]; // also a string slice

Any help on distinguishing slices and &str would be very helpful. Especially in this Differ context.

Thank you.

I haven't tested this yet, but I believe calling as_bytes() on the string slice will give you a &[u8], which should work. Test it out and let me know.
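
A minimal sketch of that suggestion, using only the standard library. `common_prefix_len` is a hypothetical stand-in for any function that takes `&[T]` the way Differ::new is described as doing; it is not part of the differ crate.

```rust
/// Hypothetical stand-in for an API that takes `&[T]`.
/// Returns how many leading elements the two slices have in common.
fn common_prefix_len<T: PartialEq>(a: &[T], b: &[T]) -> usize {
    a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count()
}

fn main() {
    let str_a = String::from("hello world");
    let str_b = String::from("hello there");

    // `as_bytes` borrows the contents as `&[u8]`, i.e. a `&[T]` with `T = u8`.
    let bytes_a: &[u8] = str_a.as_bytes();
    let bytes_b: &[u8] = str_b.as_bytes();

    println!("shared prefix: {} bytes", common_prefix_len(bytes_a, bytes_b));
}
```

Passing `&str_a[..]` directly would not compile here, because `&str` is not `&[T]` for any `T`.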

str: string slice; it's a [u8] with checks in place to guarantee that the content is valid UTF-8.

[u8]: regular slice containing u8s; slice types are unsized and cannot be operated on directly, because they don't store any information about their size/length. You need either a &[u8], &mut [u8] or Box<[u8]>, which are represented as a (pointer, length) tuple internally, to operate on a slice.

You can get a reference to [u8] from str either through the trait method AsRef::<[u8]>::as_ref or by calling str::as_bytes. The only differences between the two are that the latter is a const fn while the former isn't (const fns do not yet work with trait methods), and that the former is a trait method, i.e. you can use it in a generic context.
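
To illustrate the difference, here is a small sketch. `byte_len` and `GREETING` are names made up for this example.

```rust
/// Generic over anything that can be viewed as bytes:
/// `&str`, `String`, `Vec<u8>`, `&[u8]`, and so on.
fn byte_len<B: AsRef<[u8]>>(input: B) -> usize {
    input.as_ref().len()
}

// `str::as_bytes` is a `const fn`, so it can appear in a constant expression.
const GREETING: &[u8] = "hello".as_bytes();

fn main() {
    // The same generic function accepts several input types via `AsRef<[u8]>`.
    println!("{}", byte_len("héllo"));
    println!("{}", byte_len(String::from("abc")));
    println!("{}", byte_len(vec![1u8, 2]));
    println!("{}", GREETING.len());
}
```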

Note that two [u8] can differ while representing the same Unicode text. You have to normalize the text first (for example to NFC) to be able to compare them properly.
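
A short sketch of why the bytes can differ: "é" can be written as one code point or as "e" followed by a combining accent. `same_bytes` is a hypothetical helper. The standard library does not normalize text, so this only demonstrates the mismatch and leaves normalization to a dedicated crate.

```rust
/// Compares two strings byte for byte, with no Unicode normalization.
fn same_bytes(a: &str, b: &str) -> bool {
    a.as_bytes() == b.as_bytes()
}

fn main() {
    let precomposed = "\u{00E9}"; // é as a single code point
    let decomposed = "e\u{0301}"; // e followed by a combining acute accent

    // Both render as the same character, but the underlying bytes differ.
    println!("{} vs {}", precomposed, decomposed);
    println!("byte lengths: {} and {}", precomposed.len(), decomposed.len());
    println!("same bytes: {}", same_bytes(precomposed, decomposed));
}
```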

Thanks for the idea! I tested this, and yes, calling as_bytes() on the String does return a slice of bytes. Wouldn't I have to convert the diffs back to their string representations?

Also, can [T] represent a string slice? Or is it just an array slice?

[T] is a slice of Ts, so it's what you called an array slice. str is a string slice.

str is essentially [u8] with the guarantee of being valid UTF-8 tacked on.
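
As a rough sketch of what that means in practice: a `&str` cannot be passed where `&[T]` is expected, but a `Vec<char>` collected from the string can. `first_difference` is a hypothetical generic function standing in for an API that takes `&[T]`.

```rust
/// Hypothetical function taking `&[T]`.
/// Returns the index of the first differing element within the overlapping
/// length of the two slices, or `None` if there is none.
fn first_difference<T: PartialEq>(a: &[T], b: &[T]) -> Option<usize> {
    a.iter().zip(b.iter()).position(|(x, y)| x != y)
}

fn main() {
    // first_difference("naïve", "naive"); // would not compile: `&str` is not `&[T]`

    // Collecting the chars gives a Vec<char>, which can be borrowed as `&[char]`.
    let a: Vec<char> = "naïve".chars().collect();
    let b: Vec<char> = "naive".chars().collect();

    println!("{:?}", first_difference(&a[..], &b[..]));
}
```

Diffing `char`s avoids splitting a multi-byte character, at the cost of building a new `Vec` for each input.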

A bytewise diff of two UTF-8 strings does not necessarily consist of only valid UTF-8 strings, because the spans that differ may start or end in the middle of a multi-byte character. So I'm not sure how you'd do that. Depending on why you're doing this, it might be smarter to split on whitespace or at grapheme cluster boundaries, and diff those sequences, rather than bytewise.
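
Here is a small sketch of both points, using only the standard library. `is_valid_utf8` and `words` are hypothetical helpers. Grapheme cluster splitting needs an external crate and is not shown.

```rust
use std::str;

/// Returns true if the bytes form valid UTF-8 on their own.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    str::from_utf8(bytes).is_ok()
}

/// Splits text on whitespace; the resulting Vec can be borrowed as `&[&str]`.
fn words(text: &str) -> Vec<&str> {
    text.split_whitespace().collect()
}

fn main() {
    let bytes = "naïve".as_bytes(); // 'ï' takes two bytes (indices 2 and 3)

    // Cutting after three bytes lands in the middle of 'ï'.
    println!("{}", is_valid_utf8(&bytes[..3]));
    // Cutting after four bytes lands on a character boundary.
    println!("{}", is_valid_utf8(&bytes[..4]));

    // A sequence of words is a `&[T]` with `T = &str`, and every element stays valid text.
    let tokens = words("the quick brown fox");
    let token_slice: &[&str] = &tokens;
    println!("{:?}", token_slice);
}
```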

@trentj I am trying to build a text differ. I hadn't heard about grapheme cluster boundaries. After looking at this crate, I may want to go that route.

@OptimisticPeach Thanks for your response. Is str really a slice? After reading this and this, it seems you'd need to add a reference (to avoid the DST), and I guess that makes it a slice? But then, why can't I pass a &str into a function that takes a slice, like what I mentioned in the question? Maybe [T] is really an array slice, specifically, and not just "a slice". Does my confusion make sense? I think I just get lost when I read this, as it only mentions [T] array slices, but there are also string slices that aren't [T], ya know?

Some terms have different meanings across languages. In Rust, we call [T] a "slice" and str a "string slice".

Sometimes we call a reference to a slice just a "slice", since slices can't be used without some kind of indirection and the reference is the most common one.

From a technical point of view str is the string, which corresponds to a slice of bytes ([u8]), while String is a string buffer. They use a better naming scheme for Path (slice) and PathBuf (slice + capacity).

A slice is a sequence of elements whose length is only known at runtime, which is why you cannot hold a bare [T] directly in a stack variable. The stack takes advantage of statically known sizes to embed constant offsets into the binary. It cannot do that if you deal with runtime sizes. This is why the slice data lives behind a pointer (often on the heap, though it can also be static data or part of a fixed-size array), and only a fat pointer is stored on the stack. The fat pointer has a statically known size of 2 usizes: the pointer to the slice and the length of the slice.

Box<[T]> is an owned slice, &[T] is a shared borrowed slice, &mut [T] is an exclusively borrowed slice, and *const [T] and *mut [T] are raw pointers to a slice. They all have a size of 2 usizes.
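
The sizes can be checked with `std::mem::size_of`. This is a sketch that reflects how current Rust compilers lay out these pointers; `fat_pointer_sizes` is a hypothetical helper name.

```rust
use std::mem::size_of;

/// Sizes of several pointer types that point to a slice or a str.
fn fat_pointer_sizes() -> [usize; 5] {
    [
        size_of::<&[u8]>(),
        size_of::<&mut [u8]>(),
        size_of::<Box<[u8]>>(),
        size_of::<*const [u8]>(),
        size_of::<&str>(),
    ]
}

fn main() {
    // Each fat pointer is a (pointer, length) pair, so two machine words.
    println!("usize: {} bytes", size_of::<usize>());
    println!("fat pointers: {:?}", fat_pointer_sizes());

    // A borrowed slice can also point into a fixed-size array.
    let array = [1u8, 2, 3, 4];
    let slice: &[u8] = &array[1..3];
    println!("{:?} has length {}", slice, slice.len());
}
```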
