When should structs use `&str` fields

Hello all. I'm pretty new to Rust, having come originally from C++ and Fortran and having mostly used Julia for the past 5 or 6 years.

I have a question about the use of strings in structs. It seems to me that, roughly speaking, structs should have reference fields when the data is owned by something else. This is consistent with my intuition from C++. Something strange seems to happen with strings, however. As far as I can tell, String is necessarily heap allocated and it is preferable to use &str for strings which will never be modified. The problem I run into is that, if I have a struct with &str fields, my struct does not own the underlying data, and I therefore have to guarantee that the data lives at least as long as my struct. It's easy to come up with lots of cases in which this works fine, but I have been finding that it's very dubious as a general practice, as I occasionally seem to land in situations where I can't guarantee the lifetime of the underlying data and there don't seem to be many good options. Having been burned by this a few times, I'm finding myself only ever using String for struct fields. My understanding is that this is inefficient because of the resulting heap allocations.

Is there something else I should be doing here? I am tempted to store a [u8; n] in my struct and refer to it as a &str, but the design of Rust really doesn't seem to want me to do that (the fact that I'd have to use u8 rather than str is one hint).

On a related note, I also find myself needing to use String (worse, copied Strings) in cases where I have returned a String and I can't tell Rust that I want it to use the String in one part of the return value and a reference to that string in another part. Consider the following:


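use std::collections::HashMap;

struct A {
    name: String,
    x: f64,
}

struct B<'a> {
    h: HashMap<&'a str, A>,
}

impl<'a> B<'a> {
    // Does not compile: the keys would borrow from `a`,
    // which is dropped when from_vec returns.
    fn from_vec(a: Vec<A>) -> Self {
        B {h: HashMap::from_iter(a.iter().map(|x| (&x.name, x)))}
    }
}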

This fails because I cannot convince Rust that I promise not to de-allocate the x's before I de-allocate the hash map. The only ways I can think of getting around this are pretty elaborate, so I wind up using a HashMap<String, A>, which requires me to also call (gasp) String::clone. This isn't quite the same as the general issue I explained above, but it is part of my more general frustration with strings.

That's what happens in C++ when you assign a std::string field of some class from another field of the same type. On modern machines and allocators it's pretty fast, so you don't need to care much before getting used to the language itself. Just spam .clone() while learning the language; you can always optimize it later with proper benchmarks.
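
For reference, the clone-based version of the original example might look like the sketch below. It assumes the map should own both keys and values; the type and function names follow the question, and the field values in main are made up for illustration.

```rust
use std::collections::HashMap;

#[derive(Debug)]
struct A {
    name: String,
    x: f64,
}

#[derive(Debug)]
struct B {
    // Owned keys: the map does not borrow from the values it stores,
    // so B needs no lifetime parameter.
    h: HashMap<String, A>,
}

impl B {
    fn from_vec(items: Vec<A>) -> Self {
        // One clone per entry: the key is a copy of the name,
        // and the value keeps its own String.
        let h = items
            .into_iter()
            .map(|a| (a.name.clone(), a))
            .collect();
        B { h }
    }
}

fn main() {
    let b = B::from_vec(vec![
        A { name: "alpha".to_string(), x: 1.0 },
        A { name: "beta".to_string(), x: 2.0 },
    ]);
    // A HashMap<String, _> can be queried with a plain &str.
    println!("{:?}", b.h.get("alpha"));
}
```

The cost is one extra allocation per entry at construction time; lookups are unaffected.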

That's correct. You can't have an owned (i.e. indefinitely living) string anywhere else, after all.

Not sure where you got that. There's a special case of &'static str for string literals, but for runtime-generated strings, String is almost unavoidable in most cases. Of course, you can create a &str from a byte array on the stack, but it would be tied to the stack frame it was created in and can't be used in a returned value, since, well, the stack frame is destroyed on return.
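
A minimal sketch of the byte-array idea, assuming a hypothetical ShortName type with a 16-byte inline capacity (neither the name nor the capacity comes from the thread). The struct owns its bytes and hands out a &str that borrows from it, so the borrow is tied to wherever the struct lives.

```rust
use std::str;

// Hypothetical fixed-capacity name: the bytes live inline, no heap allocation.
struct ShortName {
    buf: [u8; 16],
    len: usize,
}

impl ShortName {
    // Returns None if `s` does not fit in the inline buffer.
    fn new(s: &str) -> Option<Self> {
        if s.len() > 16 {
            return None;
        }
        let mut buf = [0u8; 16];
        buf[..s.len()].copy_from_slice(s.as_bytes());
        Some(ShortName { buf, len: s.len() })
    }

    // The returned &str borrows from `self`, so it cannot outlive the struct.
    fn as_str(&self) -> &str {
        // The bytes were copied whole from a valid &str, so this cannot fail.
        str::from_utf8(&self.buf[..self.len]).expect("copied from valid UTF-8")
    }
}

fn main() {
    let name = ShortName::new("alpha").expect("fits in 16 bytes");
    println!("{}", name.as_str());
}
```

Moving the struct copies all 16 bytes regardless of the string's length, which is part of the trade-off against a heap-allocated String.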

...and in most such cases the data must be owned, because it must survive going up the stack.

Self-referential structs are a known problem of Rust's ownership system, yes. It's usually possible to use reference counting with Rc/Arc instead, however - unless, of course, you have profiled your code and found this to be a bottleneck.

If there are just a few strings that you need to reference in many structs, just leak the string (e.g. with Box::leak(s.into_boxed_str()); mem::forget alone does not give you a reference) and use &'static str. If you have many identical strings from different sources, use Intern<String> from the internment crate. And probably the best advice for most cases is just to use String and clone when needed. Or, if that turns out to be too expensive because you end up cloning frequently, use Arc<String>.
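
A minimal sketch of the leaking approach, assuming Box::leak as the mechanism for getting a safe &'static str out of a runtime String. The leak_name helper and the Config struct are hypothetical names used only for illustration.

```rust
// Leaks a runtime-built String to get a &'static str.
// The memory is never freed, so this only makes sense for a small,
// bounded set of strings that live for the whole program anyway.
fn leak_name(s: String) -> &'static str {
    Box::leak(s.into_boxed_str())
}

// Hypothetical struct: no lifetime parameter is needed for a 'static field.
struct Config {
    name: &'static str,
}

fn main() {
    let name = leak_name(format!("node-{}", 3));
    // &'static str is Copy, so many structs can share the same leaked string.
    let a = Config { name };
    let b = Config { name };
    println!("{} {}", a.name, b.name);
}
```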

The only structs that should hold (non-static) references are structs whose purpose is to borrow from something else, e.g. iterators.

Or Arc<str>, to strip one level of indirection. Arc<String> is non-growable anyway.
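
As a sketch of how Arc<str> could apply to the original example (assuming the name field is changed from String to Arc<str>, and using a hypothetical index_by_name function), the key and the value can share one allocation:

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Debug)]
struct A {
    name: Arc<str>,
    x: f64,
}

fn index_by_name(items: Vec<A>) -> HashMap<Arc<str>, A> {
    items
        .into_iter()
        // Arc::clone bumps a reference count; the string bytes are not copied.
        .map(|a| (Arc::clone(&a.name), a))
        .collect()
}

fn main() {
    let items = vec![
        A { name: Arc::from("alpha"), x: 1.0 },
        A { name: Arc::from("beta"), x: 2.0 },
    ];
    let h = index_by_name(items);
    // Arc<str> implements Borrow<str>, so lookups by &str work.
    println!("{:?}", h.get("alpha"));
}
```

If the data never crosses threads, Rc<str> would work the same way without atomic reference counting.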

When passing slices around, shouldn't [..] and/or .into() work for those cases?

Example (in particular how internal _edges references contain &'_ str struct field references):
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=1825a9b741a03c344986cf95fb70abe3

Edit: Example above updated for tests to pass in rust playground (run example from local environment with (cargo test) ... -- --show-output to see struct with slice references).

Edit: Example link above updated to working link.

Forgot to mention: the example I posted above uses slices, though its data isn't highly dynamic; i.e. it is expected to be populated at the application level (and the strings that are set there aren't expected to change either).

Here's a better example (@ExpandingMan's original example (updated to use only slices)):

use std::collections::HashMap;

#[derive(Debug)]
struct A<'a> {
    name: &'a str,
    x: f64,
}

#[derive(Debug)]
struct B<'a> {
    h: HashMap<&'a str, A<'a>>,
}

impl<'a> B<'a> {
    pub fn from_vec(ayes: Vec<A<'a>>) -> Self {
        B {
            h: HashMap::from_iter(
                ayes.into_iter().map(|a| (a.name, a))
            )
        }
    }
}

fn main() {
    let ayes: Vec<A> = "a e i o u"
        .split_ascii_whitespace()
        .map(|name| A { name, x: 0.0 })
        .collect();

    println!("{:#?}", B::from_vec(ayes));
}

Rust Playground (scroll past the warnings when running the example)

Rust has caught a use-after-free bug for you here. The a arg in from_vec is dropped at the end of the function, so you'd be returning a hashmap of dangling pointers.

fn from_slice(a: &'a [A]) -> Self {

should work.
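
A fuller sketch of that suggestion, under the assumption that the map stores references to the A values (&'a A) rather than owned copies, since the slice only lends them out:

```rust
use std::collections::HashMap;

#[derive(Debug)]
struct A {
    name: String,
    x: f64,
}

// A borrowed lookup index: it owns nothing and cannot outlive the slice.
#[derive(Debug)]
struct B<'a> {
    h: HashMap<&'a str, &'a A>,
}

impl<'a> B<'a> {
    fn from_slice(items: &'a [A]) -> Self {
        // Both key and value borrow from the caller's slice for 'a.
        let h = items.iter().map(|a| (a.name.as_str(), a)).collect();
        B { h }
    }
}

fn main() {
    // The Vec owns the data and stays alive in the caller...
    let items = vec![
        A { name: "alpha".to_string(), x: 1.0 },
        A { name: "beta".to_string(), x: 2.0 },
    ];
    // ...while the index only borrows from it.
    let index = B::from_slice(&items);
    println!("{:?}", index.h.get("beta").map(|a| a.x));
}
```

The borrow checker then rejects any attempt to drop or mutate items while index is still in use.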

But to answer the question from the title: almost never.

The only exception is when the whole struct itself is temporary in nature, used in a single scope. Basically, only when the struct is a throw-away alternative view into another owning structure (and even that is very limited in Rust due to the inability to hold self-referential structs).

So a lookup index in your case is a good use of a temporary struct.

Even then there are a couple of alternatives: Arc<str> and string interning. If you need to index and look up lots of things by their name, then a separate global-ish str-to-u32 index can help, because then you work with direct 32-bit values rather than 128-bit values and indirection.
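
A minimal sketch of such a str-to-u32 index, using a hypothetical hand-rolled Interner (an interning crate would be the more realistic choice in practice). Each distinct string is stored once and structs carry only the 32-bit id.

```rust
use std::collections::HashMap;

// Hypothetical minimal interner: each distinct string gets a u32 id.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, u32>,
    names: Vec<String>,
}

impl Interner {
    // Returns the existing id for `s`, or assigns the next free one.
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.names.len() as u32;
        self.ids.insert(s.to_string(), id);
        self.names.push(s.to_string());
        id
    }

    // Maps an id back to its string, if the id was handed out by this interner.
    fn resolve(&self, id: u32) -> Option<&str> {
        self.names.get(id as usize).map(|s| s.as_str())
    }
}

struct A {
    name: u32, // interned id instead of String or &str
    x: f64,
}

fn main() {
    let mut interner = Interner::default();
    let a = A { name: interner.intern("alpha"), x: 1.0 };
    let b = A { name: interner.intern("alpha"), x: 2.0 };
    // Equal names compare as equal integers, with no string comparison.
    assert_eq!(a.name, b.name);
    println!("{:?} {} {}", interner.resolve(a.name), a.x, b.x);
}
```

This sketch stores each string twice (once as a map key, once in the Vec) to keep the code simple; a real interner would avoid that duplication.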

That's a very strange intuition. C++ has a full analogue of &str in the form of std::string_view, but it's usually used as a function parameter and not as a member of a structure.

No, &str is for strings owned by someone else. Precisely the same as std::string_view in C++.

A tiny bit less efficient than in C++. But usually good enough.

The fact that you would need to pin that structure is another.

That's good. Strings are expensive. They are expensive in all languages, including C++, of course.

Only C++ hides that inefficiency from you.

In Rust you can see it clearly, and there is a temptation to try to remove it.

Don't do that unless you know you need that. On a modern CPU it's often faster to copy 20 or 100 bytes rather than try to invent elaborate schemes which would allow you to reuse these 20 or 100 bytes using pointers and references.

When your strings are huge, like megabytes in size… then it's time to start using &str and do other tricks.

P.S. My experience with C++ intuition and Rust was that if they disagree, then I just need to go and read about the changes in the latest C++ standard. Most of the time I find out that C++ intuition is changing in the direction Rust is going, and not the other way around.
