Guidelines for heap data

mvolkmann · January 17, 2021, 5:14pm

Am I over-generalizing to say the following? These seem like good guidelines to me based on my limited experience so far in writing Rust code. While I know there are situations in which these do not apply, it seems that following these typically reduces ownership issues in my code.

1. Collections should own their heap data rather than hold references to heap data owned elsewhere.
   For example, struct fields that are strings should use String instead of &str.
2. Pass references to heap data to functions rather than transferring ownership.
   For example, a parameter that accepts string data should have the type &str instead of String.
3. In functions that create and return heap data, transfer ownership to the caller.
   For example, return String rather than &str.
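
As a rough sketch of what the three guidelines look like together (the `User` type and both function names are made up for illustration, not taken from any real code):

```rust
// Hypothetical types and names, for illustration only.

// Guideline 1: the struct owns its string data.
struct User {
    name: String,
}

// Guideline 2: borrow string data that is only read.
fn greeting_len(name: &str) -> usize {
    name.len()
}

// Guideline 3: a function that creates heap data returns it by value.
fn make_greeting(name: &str) -> String {
    format!("Hello, {}!", name)
}

fn main() {
    let user = User { name: String::from("Ferris") };
    // `&user.name` is a &String, which coerces to &str at the call site.
    let greeting = make_greeting(&user.name);
    println!("{} ({} bytes in name)", greeting, greeting_len(&user.name));
}
```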

2e71828 · January 17, 2021, 5:31pm

As long as you allow yourself to deviate from these when you have a good reason to make an exception, this seems like a good default position.

ZiCog · January 17, 2021, 5:42pm

It's my understanding that structs are not generally referred to as collections. They are structs. The term 'collection' mostly refers to Vectors, HashMaps, Sets, etc.

I see no reason why a collection should not hold references. As long as the lifetimes add up.

    let one = Box::new(1);
    let two = Box::new(2);
    let three = Box::new(3);
    let v = vec![&one, &two, &three];
    println!("{}", v[1]);

Perhaps. But what if you want to transfer ownership? Have the called function consume the thing, use it in a thread, whatever, dropping it when done?

Probably.

H2CO3 · January 17, 2021, 5:43pm

Not every struct represents a collection. Furthermore, sometimes you want a borrowed string in a struct field, e.g. when you have related pieces of data in different fields of a struct and you will only use this to send it through the network (e.g. this is how I usually design my REST/HTTP API wrappers). In these cases, owning fields would lead to unnecessary cloning/allocation, so "all string fields should have type String rather than &str" is not really true.

That's not great advice, either. If you need to take ownership, take the parameter by value – this also reduces unnecessary clones. For example, HashMap::insert() doesn't take the key and the value by reference and clone them immediately – it takes them by value instead.
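
A small sketch of that point, assuming a hypothetical `register` helper that stores what it is given. Because it forwards to `HashMap::insert`, which takes the key and value by value, the helper takes them by value too, and the caller decides whether a clone is needed:

```rust
use std::collections::HashMap;

// Hypothetical wrapper: it stores the key and value, so it takes them by value
// and forwards them to HashMap::insert, which also takes them by value.
fn register(map: &mut HashMap<String, String>, key: String, value: String) -> Option<String> {
    map.insert(key, value)
}

fn main() {
    let mut map = HashMap::new();
    let key = String::from("lang");
    let value = String::from("Rust");
    // `key` and `value` are moved into the map; no clone is needed.
    register(&mut map, key, value);

    // A caller that still needs its own copy clones explicitly at the call site.
    let other = String::from("kind");
    register(&mut map, other.clone(), String::from("systems"));
    println!("{} -> {:?}", other, map.get(&other));
}
```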

That's sometimes true except that it's redundant – if you have a newly-allocated String in a local variable in a function, there's no way you can safely return a &str, that's exactly the point of Rust's lifetime checking.

However, if you have a data structure with something like insert-and-return-reference semantics (e.g. a string interning pool), that's fine too, and you don't have to return an owned copy, just because "someone said so".
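
To make the interning case concrete, here is a deliberately minimal sketch. The `Interner` type is hypothetical, and this simple design ties the returned `&str` to the mutable borrow of the pool, so only one interned reference can be held at a time; practical interners usually hand out indices or use an arena to get around that.

```rust
use std::collections::HashSet;

// Hypothetical minimal interning pool, for illustration only.
struct Interner {
    pool: HashSet<String>,
}

impl Interner {
    fn new() -> Self {
        Interner { pool: HashSet::new() }
    }

    // Inserts `s` if it is not present and returns a reference to the pooled copy.
    // The returned &str borrows from `self`, so no owned copy is handed out.
    fn intern(&mut self, s: &str) -> &str {
        if !self.pool.contains(s) {
            self.pool.insert(s.to_owned());
        }
        self.pool.get(s).expect("just inserted").as_str()
    }
}

fn main() {
    let mut interner = Interner::new();
    // Raw pointers are kept only to compare addresses; they do not hold a borrow.
    let first = interner.intern("hello").as_ptr();
    let second = interner.intern("hello").as_ptr();
    println!("same allocation: {}", first == second);
}
```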

This sort of advice is often bad to take literally and strictly. Design your interfaces and types after thinking about how you want others to use them and how they work the most naturally. Those common-sense considerations should not be overridden by oversimplified rules of thumb.

mvolkmann · January 17, 2021, 6:53pm

Yeah, perhaps I should have used the term "compound type" instead of collection so it would encompass structs.

I realize that it's okay for collections to hold references, but I'm wondering if that is common. I don't have enough experience yet with Rust to say. Could you take a stab at estimating the percentage of collections in your code that hold references? Likewise, what percentage of your functions want to consume the thing that is passed to them?

mvolkmann · January 17, 2021, 6:57pm

I think I left room for all the cases you identified by saying "While I know there are situations in which these do not apply, it seems that following these typically reduces ownership issues in my code." with the emphasis on "following these typically". In your experience, do you think 80% or more of your compound types and functions follow these guidelines? I don't have enough experience yet to say, but in my limited exposure this seems to be the case.

scottmcm · January 17, 2021, 7:04pm

Absolutely. These are a great place to start since they'll always work, just potentially require a few more .to_owned()s than might be optimal.

The one I'd first add some extra nuance to, though, is number 2: "... unless that would mean the first thing the function always does with the whole value is .clone()/.to_string()/.to_owned() it". That's very easy to apply, and avoids the silliness of "I had a string, but had to pass it as a &str, only for the function to immediately copy it into a string again".
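
One way to see the difference, using a made-up `Label` type: the borrowing constructor has to allocate a fresh copy, while the owning constructor just moves the caller's `String` in and keeps its existing buffer. Comparing buffer addresses is only a demonstration trick here, not something you would do in real code.

```rust
// Hypothetical struct, for illustration only.
struct Label {
    text: String,
}

impl Label {
    // Borrowing constructor: must copy, because the struct stores an owned String.
    fn from_borrowed(text: &str) -> Self {
        Label { text: text.to_owned() }
    }

    // Owning constructor: the caller's String is moved in and its buffer is reused.
    fn from_owned(text: String) -> Self {
        Label { text }
    }
}

fn main() {
    let s = String::from("heap data");
    let original = s.as_ptr();

    let copied = Label::from_borrowed(&s);
    let moved = Label::from_owned(s);

    println!("copy reuses buffer: {}", copied.text.as_ptr() == original);
    println!("move reuses buffer: {}", moved.text.as_ptr() == original);
}
```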

ZiCog · January 17, 2021, 7:12pm

I would not like to put a percentage on it but I suspect you might be right.

It's just that I don't feel that it is useful to be striving for the "common case" as one develops code.

I'm Rust naive enough not to be able to articulate this well, but I think it is more useful to think "What's the best thing to do in my code just here, now?"

There is some kind of mental algorithm going on here.

The first question is "Does this data item need to be on the heap?". If it is expected to outlive the function call stack it is used in then probably yes, else no.

Then, we know that we can pass parameters by value, transferring ownership, or by mutable or immutable borrow. So the question is which of those makes sense in the case at hand?

Similarly for return values.

In fact I'm still Rust naive enough that I delegate a lot of that decision making to the compiler and clippy. I write what I think I want, then I have a long chat with the compiler about it. Together we arrive at something that works.

I guess my style there probably leaves some performance on the table, with excessive clones and the like, but that is not often a worry, things are more than fast enough until proved otherwise.

H2CO3 · January 17, 2021, 7:18pm

I think the exact numbers greatly depend on individual coding style and preference, but in my case, probably not.

As for structs containing or not containing references, almost no struct that I write for myself is a collection, for example. I did however implement a serialization format for Serde, and when writing the binary deserializer, the majority of the types I created were borrowing data from the input byte buffer.

I also tend to write typesafe HTTP API wrappers and data abstraction layers, where the input/request/query is typically borrowed (for performance), and the output/response/result set is owned (out of necessity).
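
A stripped-down sketch of that shape, with no real HTTP library involved. `SearchRequest`, `SearchResponse`, and `parse_response` are all hypothetical names, and the "response body" is just a comma-separated string standing in for whatever would come back over the network:

```rust
// Hypothetical request/response types; no real HTTP library is involved.

// The request only borrows its fields: it is built, serialized, and dropped.
struct SearchRequest<'a> {
    user: &'a str,
    query: &'a str,
}

// The response owns its data, because it outlives the buffer it was parsed from.
struct SearchResponse {
    hits: Vec<String>,
}

impl<'a> SearchRequest<'a> {
    fn to_query_string(&self) -> String {
        format!("user={}&q={}", self.user, self.query)
    }
}

// Stand-in for "send and parse": splits a comma-separated body into owned hits.
fn parse_response(body: &str) -> SearchResponse {
    SearchResponse {
        hits: body.split(',').map(|s| s.trim().to_owned()).collect(),
    }
}

fn main() {
    let user = String::from("alice");
    let query = String::from("rust");
    // No cloning: the request just points at strings the caller already has.
    let request = SearchRequest { user: &user, query: &query };
    println!("{}", request.to_query_string());

    let response = {
        let body = String::from("book, forum, reference");
        let parsed = parse_response(&body);
        parsed // `body` is dropped at the end of this block; the hits live on
    };
    println!("{:?}", response.hits);
}
```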

That is usually the case. Continuing the above example, you can write code with all-owned contents – if your code is correct, it is correct, regardless of whether it might be a bit less efficient than possible. While you are learning the language, you should simplify matters for yourself until you understand the big picture. My comments above mostly apply to production code or code that you want to show to others.

scottmcm · January 17, 2021, 7:40pm

It's different for the different guidelines.

For #3 it's essentially 100% because of the "create and return" part. There's basically no reasonable way to have a fn foo<'a>() -> &'a str function -- either it leaks memory (obviously bad) or it can only return a string literal (so limited as to be largely irrelevant). (The exception here is if "create" doesn't apply. If it's actually returning subparts of a parameter -- substrings are the usual example -- in such a way that lifetime elision applies, that's pretty common.)
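
Roughly, the three situations look like this (function names are made up for the example):

```rust
// Returns a subpart of the parameter; lifetime elision ties the result to `s`.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// With no reference parameter, a returned &str can realistically only be a literal.
fn default_name() -> &'static str {
    "anonymous"
}

// Creating new heap data means returning it by value.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let text = String::from("hello world");
    println!("{}", first_word(&text));
    println!("{}", default_name());
    println!("{}", shout(&text));
}
```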

For #1 it's probably 95%+, but depends greatly on the domain. As the saying goes, references in structs are "the evil-hardmode of Rust that will ruin your day." Sometimes they're needed for particular tricky kinds of memory usage optimizations or handle types in libraries, but they're definitely to be avoided. (One exception here: something like a Vec<&str> inside a function can be useful, and not too problematic because it's not escaping the function so the lifetime logic might be easy -- such as if all those &strs are substrings of a parameter &str.)
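
A sketch of that exception, using a hypothetical `longest_word` function. The `Vec<&str>` only holds substrings of the parameter and never escapes the function, so no lifetime annotations are needed anywhere:

```rust
// Returns the longest word of `text`; ties go to the earliest word.
fn longest_word(text: &str) -> &str {
    // The Vec<&str> holds substrings of `text` and never leaves this function.
    let words: Vec<&str> = text.split_whitespace().collect();
    let mut longest = "";
    for word in words {
        if word.len() > longest.len() {
            longest = word;
        }
    }
    longest
}

fn main() {
    let text = String::from("a quick brown fox");
    println!("{}", longest_word(&text));
}
```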

For #2 it varies wildly. As you get more used to thinking in terms of ownership, you'll start writing the type that best fits the ownership mode you're going for with the intent of the function, before even thinking about how it'll be implemented. None of Vec<f32>, &mut [f32], or &[f32] are "best" in any way; it all depends what one plans to do with them. (You do this outside of programming for real-life objects without thinking about it -- you don't invite a friend to your house by giving them ownership of your house, just its address. But if you're moving to a new city and thus sell your house, you don't do it by giving someone the address so they can make their own copy, followed by you destroying yours. Now, obviously that analogy is imperfect in many ways, but hopefully it gets across the intent.)
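
As an illustration of picking the parameter type from the intent, here are three hypothetical functions, one per ownership mode:

```rust
// Read-only access: borrow an immutable slice.
fn mean(values: &[f32]) -> f32 {
    if values.is_empty() {
        return 0.0;
    }
    values.iter().sum::<f32>() / values.len() as f32
}

// In-place modification: borrow a mutable slice.
fn scale(values: &mut [f32], factor: f32) {
    for v in values.iter_mut() {
        *v *= factor;
    }
}

// Needs to grow and hand back the buffer: take the Vec by value.
fn with_appended(mut values: Vec<f32>, extra: f32) -> Vec<f32> {
    values.push(extra);
    values
}

fn main() {
    let mut data = vec![1.0, 2.0, 3.0];
    println!("mean = {}", mean(&data));
    scale(&mut data, 2.0);
    // `data` is moved in and the result is bound to a new `data`.
    let data = with_appended(data, 8.0);
    println!("{:?}", data);
}
```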

system · April 17, 2021, 7:40pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
