Question about string slices and vectors

Hi! I've just registered here, and I want to say that it's really nice to be part of the Rust community. :slight_smile:

I'm currently working through The Book and doing some of the exercises suggested at the end of some chapters.
This is the specific one I'm referring to:

Given a list of integers, use a vector and return the median (when sorted, the value in the middle position) and mode (the value that occurs most often; a hash map will be helpful here) of the list.

I'm doing it with user input: I read the input into a String called user_input and then shadow it with a vector of string slices, so I can parse them into integers later.

let user_input: Vec<&str> = user_input.split_whitespace().collect();

My question is: as I understand it, &str values are put on the stack, and Strings on the heap because their size is unknown at compile time. But why does this vector work perfectly fine, even though the user input (the amount of numbers the user enters) is unknown at compile time?

Thank you!


&str is a reference (the &) pointing to a string slice; that slice can be on the heap or on the stack. In your case, you have a Vec of references into the heap.
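To make this concrete, here is a small sketch that checks where the slices point. The helper `points_into` and the variable `parts` are made up for illustration, and the hard-coded string stands in for the line read from stdin. It compares raw addresses only to show that each &str borrows from the String's buffer instead of copying it.

```rust
/// Hypothetical helper: true if `slice` lies entirely inside the buffer of `owner`.
fn points_into(owner: &str, slice: &str) -> bool {
    let start = owner.as_ptr() as usize;
    let end = start + owner.len();
    let s = slice.as_ptr() as usize;
    s >= start && s + slice.len() <= end
}

fn main() {
    // Stand-in for the line read from stdin.
    let user_input = String::from("3 1 4 1 5");

    // The number of elements is only known at run time; the Vec stores
    // its (pointer, length) pairs in a heap buffer that grows as needed.
    let parts: Vec<&str> = user_input.split_whitespace().collect();
    println!("{} slices", parts.len());

    for part in &parts {
        println!("{part:?} borrows from user_input: {}", points_into(&user_input, part));
    }
}
```

Each element of the Vec has a fixed size (a pointer plus a length), which is why the element type is fine even though the number of elements and the length of each piece are not known at compile time.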


Thank you! That's probably what I missed: string slices can also be on the heap. So the only difference in that case is that they're fixed-size, and I wouldn't be able to grow any specific element of this vector later?

There's a more important difference: this vector is borrowing, not owning, and is therefore semantically temporary.

When you have a reference to something, this "something" must be statically proven to be alive for as long as the reference is alive. Therefore, in your case the compiler will not allow you to pass this Vec anywhere it could be used after the original string has been dropped; otherwise it would be a use-after-free.
The thing you might be thinking of is Rc<str>: a reference-counted string slice of fixed size, which is owned in a shared fashion and so is no longer tied to the original string.
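A rough sketch of the difference, assuming you wanted the tokens to outlive the input string. The function name `owned_tokens` is hypothetical; it copies every token into its own Rc<str>, so the result no longer borrows from the input.

```rust
use std::rc::Rc;

/// Hypothetical helper: copies each whitespace-separated token into
/// its own reference-counted `str`, so nothing borrows from `input`.
fn owned_tokens(input: &str) -> Vec<Rc<str>> {
    input
        .split_whitespace()
        .map(|token| Rc::<str>::from(token))
        .collect()
}

fn main() {
    let tokens = {
        let user_input = String::from("10 20 30");
        // A Vec<&str> built here could not leave this block: it would
        // borrow from `user_input`, which is dropped at the closing brace.
        owned_tokens(&user_input)
    };

    // The Rc<str> values own their data, so they are still valid here.
    for token in &tokens {
        println!("{token}");
    }
}
```

For the exercise itself this is more than you need: a Vec<&str> that is consumed before the String goes away is perfectly fine.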


Thanks!

The idea is that it's temporary: as soon as I convert the user input to integers, I don't need that data anymore. Here is how I've handled it. It compiles and works, and then I can sort the new array and everything is fine, but I was wondering whether it could bite me later or whether it's perfectly fine. :slight_smile:

    // Prepare to convert the strings in the 1st array into integers in the 2nd
    let user_input: Vec<&str> = user_input.split_whitespace().collect();
    let mut array_of_integers: Vec<i32> = Vec::new();

    // Convert values 
    for number in &user_input {
        let number_as_int: i32 = number
            .trim_matches('\n')
            .parse()
            .expect("\nYou should enter integer values separated by whitespaces\n");
        &mut array_of_integers.push(number_as_int);
    }

Note that the word "slice" sometimes refers to the pointed-to values and sometimes to the reference. The usage is a bit fuzzy.

&str is a Sized type. It is also a (shared) reference. This reference can be either stored on the stack or on the heap. In case of Vec<&str>, the &str is on the heap (a vector stores its elements on the heap).

str, in contrast, is !Sized. You can't directly create such values or pass them by value. Yet they may reside on the stack, as in this example:

fn main() {
    let bytes = [65, 66, 67];
    let s: &str = std::str::from_utf8(&bytes).unwrap();
    println!("{s}");
}

(Playground)

I think most of the time, str will be on the heap though.


Note that you don't need &mut here - method call syntax will auto-(de)reference the value as needed.
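For illustration, here is the same loop without the stray &mut, wrapped in a hypothetical function `parse_all` so it can be run on its own. In the original line, `&mut array_of_integers.push(number_as_int);` is parsed as `&mut (array_of_integers.push(number_as_int))`, so it only takes a mutable reference to the `()` that push returns and then throws it away. The `trim_matches('\n')` call is also left out here, since split_whitespace already drops newlines.

```rust
/// Hypothetical wrapper around the loop from the post above.
fn parse_all(tokens: &[&str]) -> Vec<i32> {
    let mut array_of_integers: Vec<i32> = Vec::new();

    for number in tokens {
        let number_as_int: i32 = number
            .parse()
            .expect("\nYou should enter integer values separated by whitespaces\n");
        // `push` takes `&mut self`; method call syntax borrows mutably on its own.
        array_of_integers.push(number_as_int);
    }

    array_of_integers
}

fn main() {
    let user_input = String::from("3 1 4 1 5\n");
    let tokens: Vec<&str> = user_input.split_whitespace().collect();
    println!("{:?}", parse_all(&tokens));
}
```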


Instead of String and str, I'm going to talk about Vec<u8> and [u8] (slices). A String is basically a Vec<u8> with additional validity constraints (being UTF8), and a str is basically also a [u8].

A [u8] (note the lack of a & there) is unsized: its length can be anything and is only known at runtime. A &[u8] is a wide pointer: it contains both the address of the slice and the length of the slice (in this case, the number of bytes). A Vec<u8> is quite similar: it's a pointer to some allocated data, the number of initialized elements (the length of the Vec), and the capacity -- how much memory was allocated. Going from a Vec<u8> to a &[u8] is pretty simple: a wide pointer &[u8] is created from the data pointer and the length.
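You can observe this with std::mem::size_of. The sketch below just prints the sizes; on current compilers a &[u8] or &str is usually two machine words and a Vec<u8> or String three, but the exact layout is not something the language promises. The helper `raw_parts` is a made-up name that pulls a slice reference apart into its pointer and length.

```rust
use std::mem::size_of;

/// Hypothetical helper: splits a slice reference into its two components.
fn raw_parts(slice: &[u8]) -> (*const u8, usize) {
    (slice.as_ptr(), slice.len())
}

fn main() {
    println!("usize   : {} bytes", size_of::<usize>());
    println!("&u8     : {} bytes", size_of::<&u8>());
    println!("&[u8]   : {} bytes", size_of::<&[u8]>());
    println!("&str    : {} bytes", size_of::<&str>());
    println!("Vec<u8> : {} bytes", size_of::<Vec<u8>>());
    println!("String  : {} bytes", size_of::<String>());

    // Borrowing a Vec as a slice reuses the Vec's data pointer and length.
    let v: Vec<u8> = vec![65, 66, 67];
    let s: &[u8] = &v;
    let (ptr, len) = raw_parts(s);
    println!("same data pointer: {}", ptr == v.as_ptr());
    println!("length {len}, capacity {}", v.capacity());
}
```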

A locally created Vec<u8> and &[u8], or String and &str, are both on the stack -- that is, the pointer to the data and the length and (in the case of Vec) the capacity values are all on the stack. Where the data pointer points to could be in the heap, or on the stack, or in static memory, etc. [1]

Here's a diagram of some Rust containers that may be illuminating. Note that the layouts of these types are not guaranteed and the diagram is 5 years old -- I don't think Mutex works like that anymore, for example -- but it's still useful as a guide. Most things haven't changed [2].

It's not going to segfault or something, if that's what you mean. One of the main benefits of Rust is that programs without unsafe have memory safety -- no use-after-free, data races, etc.

Literals will probably be in static memory [3].


  1. The Vec will always point to allocated memory, but the language doesn't know or care about that per se. ↩

  2. and Vec in particular will always be a pointer, length, and capacity (but the specific layout, like their order, is still unspecified) ↩

  3. if not promoted, they'd be on the stack, but I suspect they're usually promoted -- however I did no actual investigation ↩


Some more playing with &str, str, and the stack and the heap:

fn main() {
    let hello: String = "Hello World!".to_string();
    let stack_heap: &str = &hello; // `&str` on stack, `str` on heap
    println!("{stack_heap}");
    let mut buf: [u8; 1024] = [0; 1024];
    assert!(stack_heap.len() <= buf.len(), "buffer size too small");
    let buf_part: &mut [u8] = &mut buf[0..stack_heap.len()];
    buf_part.copy_from_slice(stack_heap.as_bytes());
    let stack_stack: &str = std::str::from_utf8(&buf_part).unwrap(); // `&str` on stack, `str` on stack
    println!("{stack_stack}");
    let heap_stack: Box<&str> = Box::new(stack_stack); // `&str` on heap, `str` on stack
    println!("{heap_stack}");
    let heap_heap: Box<&str> = Box::new(stack_heap); // `str` on heap, `&str` on heap
    println!("{heap_heap}");
}

(Playground)

Ohhh, I see! Thanks for pointing that out; I considered that in my above example and avoided &'static str literals for clarity (except to create hello).

Edit: Though after reading your post, I guess there is no guarantee that String keeps the data heap allocated. I think it could make an exception for short strings, for example (but don't think it does).




Your code seems mainly fine, but it could optionally do without a temporary array for the slices:

    let user_input = user_input
        .split_whitespace()
        .map(|item| item.trim().parse())
        .collect::<Result<Vec<i32>, _>>()
        .expect("\nYou should enter integer values separated by whitespaces\n");

The "magic" here is that collect can convert an iterator of results into a result of a collection: if any of the calls to .parse() returns an error, .collect() will return that error, but if all the numbers are parsed Ok, then collect will return Ok with the collected values.

(In this case, since you use expect to panic on any error anyway, you could just do the expect inside the .map(...). But if you want to handle errors some other way later this is a good step towards that.)
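As a sketch of what "handling errors some other way" could look like: the function name `parse_numbers` is made up, and the two hard-coded lines stand in for real user input. The function hands the Result back to the caller instead of panicking.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: parses every whitespace-separated token,
/// stopping at the first one that is not a valid i32.
fn parse_numbers(input: &str) -> Result<Vec<i32>, ParseIntError> {
    input
        .split_whitespace()
        .map(|item| item.parse::<i32>())
        .collect()
}

fn main() {
    for line in ["3 1 4 1 5", "3 one 4"] {
        match parse_numbers(line) {
            Ok(numbers) => println!("parsed {numbers:?}"),
            Err(err) => println!("could not parse {line:?}: {err}"),
        }
    }
}
```

With this shape, the caller can decide whether to ask the user again, skip the line, or still call expect.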


Thanks everyone for very clear and deep explanations about how this str and String things differ from each other! I feel (a bit) more confident about using both now. :slight_smile:

If you want to go more in depth:

It's a classic article around here for beginners.

