Newbie question: Do Rust's lifetime rules cause more memory copying in practice?

Coming from a C background, where memory is explicitly cleaned up, it's not uncommon to have the same struct used in two different ways: akin to Rust's borrowing vs taking ownership. This doesn't just apply to the struct but the values it holds.

So in C, the same struct may be used to pass "borrowed" references or pass ownership:

struct Foo {
    char * a;
    char * b;
};

But in Rust I need to declare whether strings in a struct are owned by the struct or not:

// Pass ownership
struct Foo {
    a: String,
    b: String,
}
// Just pass a borrowed reference to the data
struct Foo<'a> {
    a: &'a str,
    b: &'a str,
}

Since parsing functions need to pass ownership, I assume the "normal" thing to do is have a struct that passes ownership:

struct Foo {
    bar: String,
    baz: String,
}

fn parse_foo(source_data: &[u8]) -> Result<Foo, SomeErrorType> {
    // ...
}

fn format_foo(target: &mut Vec<u8>, value: &Foo) -> Result<(), SomeErrorType> {
    // ...
}

The "problem" I'm puzzling over:

In the above example, when code wants to call format_foo() it must assemble a Foo {...} and so pass ownership of the strings into it. This only happens because the same Foo is also used for parse_foo(), which MUST pass ownership. On its own, format_foo() could be written to accept only borrowed references to everything.

Does this typically lead to a little more memory copying in Rust? Is that copying just a normal fact of Rust compared with a language like C? Or is there some other pattern I've missed (like having two similar structs)?


Just to be very clear, this isn't a criticism of Rust, I'm just trying to get to grips with what "normal" patterns look like in Rust.

3 Likes

I don't think there is a definitive answer.

It's true that sometimes Rust requires you to do more copies because it cannot verify the lifetimes, or because the programmer doesn't want to make a borrowed variant of a type (like in your case).

However in C/C++ it also happens that sometimes you make a defensive copy of something because you don't trust (the future) yourself or your team to not create lifetime issues with borrowed pointers/references, since there's nothing in those languages preventing these issues. Meanwhile in Rust if you can write an API using references then you can trust it to be sound and won't have these issues.

18 Likes

Yes, you can do this.

struct Foo {
    bar: String,
    baz: String,
}
struct FooRef<'a> {
    bar: &'a str,
    baz: &'a str,
}
impl Foo {
    fn as_ref(&self) -> FooRef<'_> { ... }
}
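A minimal sketch of how the two structs could fit together. The body of as_ref() and the "bar=baz" output format of format_foo() are assumptions made for illustration, not something from the original post:

```rust
/// Owned form: what a parser would hand back to its caller.
struct Foo {
    bar: String,
    baz: String,
}

/// Borrowed view with the same shape; the string data lives elsewhere.
#[derive(Clone, Copy)]
struct FooRef<'a> {
    bar: &'a str,
    baz: &'a str,
}

impl Foo {
    /// Borrow every field. Only two (pointer, length) pairs are produced;
    /// no string data is copied.
    fn as_ref(&self) -> FooRef<'_> {
        FooRef { bar: &self.bar, baz: &self.baz }
    }
}

/// Hypothetical formatter: appends "bar=baz" to `target`.
fn format_foo(target: &mut Vec<u8>, value: FooRef<'_>) {
    target.extend_from_slice(value.bar.as_bytes());
    target.push(b'=');
    target.extend_from_slice(value.baz.as_bytes());
}
```

With this shape, a caller that already has a Foo passes foo.as_ref(), and a caller that only has string slices builds a FooRef directly, so neither has to allocate.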

There are many other possibilities. You can have a single struct that keeps track of whether it has ownership with a run-time flag, using Cow; use Foo<'static> when borrowing is unwanted:

use std::borrow::Cow;
struct Foo<'a> {
    bar: Cow<'a, str>,
    baz: Cow<'a, str>,
}
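As a rough sketch of how the Cow version might be used: the constructor borrowed() and the method into_owned() below are hypothetical helpers, not part of the original suggestion.

```rust
use std::borrow::Cow;

struct Foo<'a> {
    bar: Cow<'a, str>,
    baz: Cow<'a, str>,
}

impl<'a> Foo<'a> {
    /// Hypothetical constructor: wraps two slices without allocating.
    fn borrowed(bar: &'a str, baz: &'a str) -> Self {
        Foo { bar: Cow::Borrowed(bar), baz: Cow::Borrowed(baz) }
    }

    /// Hypothetical helper: detach from the borrowed data. A field that is
    /// still borrowed is copied here; an already-owned field is just moved.
    fn into_owned(self) -> Foo<'static> {
        Foo {
            bar: Cow::Owned(self.bar.into_owned()),
            baz: Cow::Owned(self.baz.into_owned()),
        }
    }
}
```

The copy only happens at the point where the caller asks for a Foo<'static>, which is the "run-time flag" trade-off: one type, with the ownership decision made per value.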

You can make Foo generic over whether it borrows; this is like Cow but with only compile-time checks:

struct Foo<S> {
    bar: S,
    baz: S,
}
impl<S: AsRef<str>> Foo<S> {
    ...
}
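To illustrate what could go inside such an impl block, here is a small sketch. The render() method and its "bar=baz" output are made up for the example:

```rust
struct Foo<S> {
    bar: S,
    baz: S,
}

impl<S: AsRef<str>> Foo<S> {
    /// Hypothetical helper: works the same whether S is &str, String,
    /// or anything else that can be viewed as a str.
    fn render(&self) -> String {
        format!("{}={}", self.bar.as_ref(), self.baz.as_ref())
    }
}
```

The same render() is then available on Foo<&str>, Foo<String> and Foo<Arc<str>>, and the choice between them is fixed at compile time by the caller.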

Or you can use reference-counted strings. The previous Foo<S> would also allow this, as Foo<Arc<str>>, but this version is not generic.

use std::sync::Arc;
struct Foo {
    bar: Arc<str>,
    baz: Arc<str>,
}
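A short sketch of why this avoids copying: cloning an Arc<str> only increments a reference count, so both clones point at the same string data. The shares_storage() helper is hypothetical and exists only to make that visible:

```rust
use std::sync::Arc;

#[derive(Clone)]
struct Foo {
    bar: Arc<str>,
    baz: Arc<str>,
}

/// Hypothetical check: true if both values point at the same allocations.
fn shares_storage(a: &Foo, b: &Foo) -> bool {
    Arc::ptr_eq(&a.bar, &b.bar) && Arc::ptr_eq(&a.baz, &b.baz)
}
```

After let b = a.clone(), shares_storage(&a, &b) holds; the cost moves from copying bytes to maintaining an atomic counter.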

All of these introduce some amount of complexity in usage over just &Foo; it's up to you to pick which one is most suitable for your application. And in many applications, “I want a Foo made from borrowed strings, without copying” never comes up, so you don't need to do anything extra.

If you want to write the most flexible code to allow the caller to pick what costs they want to pay for what benefit, the generic Foo<S> is the best choice.

12 Likes

The usual case is exactly the opposite. Why would parsing a string need ownership of the string?

In general, Rust does not make you copy memory gratuitously. If you find yourself making many clones/copies, you are likely writing highly non-idiomatic code.

I don't think I understand your example. Yes, if you want to format Foo, you need to have an instance of Foo. How would you do this otherwise? In specific cases, you may have something like format_foo_from_separate_parts, but that's not a general solution, and not something I would bother with without a good reason.

Are you saying that you make a choice of Foo containing a String, even though in this specific case it could contain a &str? You can handle it using Cow:

use std::borrow::Cow;
struct Foo<'a> {
    bar: Cow<'a, str>,
    baz: &'a str,
}

It's not like there are infinite possibilities of whether you want an owned or borrowed string, and it's in the interest of your sanity to reduce ownership options to the minimum, lest you'll end up mixing them up. Even in Rust, where Cow fully tracks ownership, I'd still usually make a choice of embedding either String or &str, depending on the use case. Use &str when you can obviously borrow it from something long-living (e.g. if you're writing a parser, you can borrow from the source text and just keep it around until the end of processing). Use String otherwise.

Note that Cow isn't in any way special or a language primitive. It's just an ordinary enum with two variants, and you can declare similar enums yourself based on specific needs.
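For instance, a hand-rolled string-only equivalent might look like the sketch below. The name MaybeOwned and its methods are invented for illustration; the standard library's Cow is more general than this.

```rust
/// A Cow-like enum specialised to strings.
enum MaybeOwned<'a> {
    Borrowed(&'a str),
    Owned(String),
}

impl<'a> MaybeOwned<'a> {
    /// View the contents as a slice, whichever variant this is.
    fn as_str(&self) -> &str {
        match self {
            MaybeOwned::Borrowed(s) => *s,
            MaybeOwned::Owned(s) => s.as_str(),
        }
    }

    /// Convert to an owned String, allocating only in the borrowed case.
    fn into_string(self) -> String {
        match self {
            MaybeOwned::Borrowed(s) => s.to_owned(),
            MaybeOwned::Owned(s) => s,
        }
    }
}
```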

You can also declare separate owning and borrowing versions of your struct, although most of the time it's too much boilerplate to be worth it.

4 Likes

It’s sort of that. I’m saying that in other languages such as C it’s the job of the function to determine if it is passing ownership (or being passed ownership), it’s never the job of the data structure. But in Rust, even if the format function accepts a borrowed reference, I can’t easily compose Foo without giving ownership of all of the strings.

I’m not saying this is a problem, I’m highlighting the natural consequence is more copying to duplicate ownership. I’m asking if this is normal or if folks usually use some pattern to avoid it.

Since the question seems to be causing more confusion than anything, I’m going to assume that people don’t generally try to avoid this extra copying much.

It turns out that practically, people don't do more copying - things get passed by reference, even though they own their contents, and it's unusual to copy things explicitly if you can pass references.

Things where ownership is a bit fuzzy get modelled by Rc<T> and Arc<T>, which don't copy the underlying data when you clone them (a clone only bumps a reference count), and that's a very common pattern, but actual copying is not.
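To make the first point concrete, here is a small sketch with hypothetical function names. Borrowing a struct that owns its strings copies nothing, and handing ownership to another function is a move of the (pointer, length, capacity) triple, not a copy of the heap data:

```rust
struct Foo {
    bar: String,
    baz: String,
}

/// Takes ownership of `foo` and hands it back: a move, not a deep copy.
fn pass_through(foo: Foo) -> Foo {
    foo
}

/// Borrows `foo`: the strings stay where they are.
fn total_len(foo: &Foo) -> usize {
    foo.bar.len() + foo.baz.len()
}
```

One way to check this is to compare foo.bar.as_ptr() before and after calling pass_through(); the heap address of the string data stays the same.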

2 Likes

That's not true. They just use generics or a different struct with references, as @kpreid described.


Maybe the confusion is because you're focused on avoiding copying when passing structs around, whereas avoiding copying is something that comes up when passing parameters in general. Deciding when to use owned values vs references is a central issue in Rust.

2 Likes

@jumpnbrownweasel that's great. Are you able to offer me a reference or two to libraries using such patterns in practice? That would help me a lot.

As a long time C and C++ user and now Rust exclusively for some years, without getting into technicalities, my take on all this is that:

C has exactly the same lifetime/ownership rules as Rust. At least in principle. The difference being that Rust will tell you at compile time when you try to misuse lifetimes, whereas C will not let you know until run-time when you get erroneous results or a segfault or whatever.

Bottom line is that a thing has to exist longer than any reference to it, else the reference is invalid, multiple mutable references to a thing are going to get you into trouble, and so on. This has to be taken into account by the programmer in both C and Rust, not to mention other compiled languages. Only Rust helps the programmer with it, the others generally do not.

That's in principle. In practice Rust's rules may be more imposing than absolutely necessary sometimes. Thus leading to coding copies or whatever more than strictly necessary. I'm not about to worry about that given the advantages of compile time checking.

I'm a bit grumpy about this as I've been chasing memory misuse errors in the C code of libssh2 for over a week that cause me all kinds of failures on rare occasions, in my Rust program that has no unsafe in it.

7 Likes

I think you meant to say "compile time" here, rather than "run time."

2 Likes

you meant at "compile" time? :thinking:

edit: damn, few seconds too late... :smiley:

2 Likes

Oops. Well spotted. I fixed it.

1 Like

I started writing an explanation of how references, ownership, and moves work, along with the examples you asked for, and then realized it would take many pages and more time than I have. In order to give you a shorter, more specific answer, can you say more about where you are in learning Rust? Have you gone through the book?

3 Likes

In my experience, Rust's lifetimes allow writing code that doesn't have a bunch of extra memory copying, whereas in C/C++ it's common to defensively copy things so you know they're still valid when you need them later, as opposed to it being much harder to know if stuff is valid still when you just have a pile of pointers everywhere. Rust makes it easy to know if stuff is valid or not because you'll get a compile time error if you mess up.

3 Likes

@jumpnbrownweasel I don’t need you to write explanations, I mentioned the idea of having two separate struct definitions in my first post. But it didn’t seem very practical to me.

I’m really happy to be proven wrong: What I’m looking for is examples of this pattern being used in anger. Eg references to API documentation or links to existing source code in GitHub.

That’ll help me understand the context where experienced Rust developers have used the pattern “in anger”.

I think you are referring to the pattern where you have an owned and a borrowed version of some sort of data type? Not sure what you mean exactly by "in anger," but ndarray makes liberal use of owned vs borrowed representations of multidimensional arrays, see e.g. Array vs ArrayView vs CowArray vs ArcArray, all different representations of the same conceptual data type (a multidimensional array) with respect to ownership of the underlying data.

5 Likes

Jumping a bit too late on the train I guess, but IMHO the nom library is a good example of how parsing functions do not necessarily need to pass owned data.
If nothing else, then in how it passes the "rest" of the input data via its IResult<&str, ...>.
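To show the idea without depending on nom itself, here is a hand-written sketch. take_word() is a hypothetical function, not nom's API; it only mirrors the (remaining input, output) ordering, and both halves of the result are slices of the original input:

```rust
/// Split off the leading run of ASCII alphanumerics.
/// Returns (rest, word), or None if the input does not start with a word.
fn take_word(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| !c.is_ascii_alphanumeric())
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    let (word, rest) = input.split_at(end);
    Some((rest, word))
}
```

Nothing is allocated here: the parser result borrows from the source text, so it can live as long as that text does.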

3 Likes

In addition to the examples in the reply from @jofas , @afetisov described using Cow which is a generic way to defer cloning until necessary when the callee may or may not need an owned object. When the caller and the callee both need an owned object (unconditionally), you have to clone -- this is true in any language.

3 Likes

@RustyJoeM nom seems to be a toolkit for building a parser and not a parser in its own right. You’re right that in that layer below the parser (often known as a tokeniser) you don’t need to pass ownership because all that’s happening is slicing up the input. Any structure is easily replaced with iteration, and there’s rarely a need to handle character escaping at that level.

The reason I say parsers typically must pass ownership is because they are typically content generators. Compiled parsing code must be able to allocate RAM for the result because calling code cannot know how much to allocate. But compiled calling code must be responsible for freeing it because the parsing code has completed when the result is still needed.

AFAIK that’s idiomatic ownership passing.

—————

Example: you want to parse an XML subtree:

<Node attribute="hello&gt;&lt;world" />

To parse this, the string hello><world must exist in the result and so must be allocated somewhere in memory.

But that string does not exist anywhere in memory before calling the parser, and the exact length of the string is not known at compile time. Unless the parser destructively consumes the input, the lifetime of the string starts inside the parser and must survive after the parser completes. So the string must be created in the parser and ownership returned to calling code.

Many other examples exist for many other formats and protocols. But in my experience, most protocols and formats need the parser to create vectors and strings.
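For the attribute example above, one possible sketch of the Cow approach mentioned earlier in the thread: the parser allocates only when an escape actually has to be decoded, and otherwise returns a slice of the input. The unescape() function is hypothetical and handles just three entities; a real XML parser has to deal with many more cases.

```rust
use std::borrow::Cow;

/// Hypothetical helper: decodes only `&gt;`, `&lt;` and `&amp;`.
fn unescape(raw: &str) -> Cow<'_, str> {
    // Fast path: nothing to decode, so borrow from the input.
    if !raw.contains('&') {
        return Cow::Borrowed(raw);
    }
    // Slow path: build a new string, owned by the caller afterwards.
    let mut out = String::with_capacity(raw.len());
    let mut rest = raw;
    while let Some(pos) = rest.find('&') {
        out.push_str(&rest[..pos]);
        let tail = &rest[pos..];
        let (decoded, consumed) = if tail.starts_with("&gt;") {
            ('>', 4)
        } else if tail.starts_with("&lt;") {
            ('<', 4)
        } else if tail.starts_with("&amp;") {
            ('&', 5)
        } else {
            ('&', 1) // unknown entity: keep the ampersand literally
        };
        out.push(decoded);
        rest = &tail[consumed..];
    }
    out.push_str(rest);
    Cow::Owned(out)
}
```

With this shape, unescape("hello&gt;&lt;world") returns an owned "hello><world", while an attribute value with no escapes is returned as a borrowed slice with no allocation.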