Pass both referenced and referencing value

This looks like a common problem but I could not see the solution myself.

I have a parsed value (a struct) that, for efficiency reasons, internally refers to parts of the original input byte stream (represented as a Vec). The parser and the parsed structure are part of an external library (GitHub - negamartin/midly: A feature-complete MIDI parser and writer focused on speed.), so I'd prefer not to change that.
The suggested use of the library looks like

let raw_data = std::fs::read("input.mid").unwrap();
let parsed_midi_source = parse(&raw_data);
// Do something with the parsed value here

There is no problem here: the input data is always available while the parsed value is in use.
But in my case I'd like to send the parsed value around (e.g. to another thread). As I understand it, for that I have to bundle the source data and the parsed value together so that the references into the raw data stay valid.

The minimal example of what I tried is

fn parse(raw: &[u8]) -> &u8 {
    &raw[1]
}

#[derive(Debug)]
struct Holder<'a> {
    source: Vec<u8>,
    n: &'a u8,
}

impl Holder<'_> {
    fn init(raw: Vec<u8>) -> Holder<'static> {
        Holder { source: raw, n: parse(&raw) }
    }
}

fn main() {
    let file_data = vec![3u8, 4, 5];
    let holder = Holder::init(file_data);
    println!("Holder {:#?}", holder);
}

The immediate problem is that the raw data stays in the function (I thought that it could be moved into the new struct):

       Holder { source: raw, n: parse(&raw) }
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^----^^^
         |                              |
         |                              `raw` is borrowed here
         returns a value referencing data owned by the current function

OK, I understand that a move may actually copy the value to another place, so I tried to keep the data in an Arc so that it stays put somewhere on the heap. But borrowing the value in an Arc does not work as I expected: it seems that the Arc itself is borrowed instead of the contained value, so that did not work either. Is bundling everything in the same struct the right idea, and if so, how do I tell the compiler that the data is still available after the struct is created?

Another question is the declaration of the Holder struct itself. I tried to hint to the compiler that the parsed value has the same lifetime as the containing structure by adding a lifetime annotation that would match the lifetime of the parsed structure. But then the Holder structure itself has an explicit lifetime that is not really needed anywhere else, as the Holder structure is self-contained (has no dependencies). If that is necessary, how do I hide it from users of the Holder structure?

You can't usefully construct such a self-referential struct in safe Rust.

There exist some crates that could help.


You can't borrow a value inside an Arc and then move the Arc, otherwise you could drop the Arc, potentially dropping the inner value and leaving a dangling reference.

#[derive(Debug)]
struct Holder<'a> {
    source: Vec<u8>,
    n: &'a u8,
}

means that n borrows a u8 from outside the struct. There's no way to say "this field borrows from another field" (ignoring macro-based third-party crates like ouroboros).


What I would try is make a struct like this

#[derive(Debug)]
struct Holder {
    source: Vec<u8>,
    header: Header,
    unparsed_bytes: usize,
}

which stores the Header from the initial parse call and the number of bytes that haven't been parsed yet, which can be read from TrackIter::unread. Afterwards I'd create a method for parsing some of the remaining bytes on demand with TrackIter::new, using (and updating) unparsed_bytes to let you skip already parsed bytes.
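
The same idea can be sketched on the toy example from the question. This is only an illustration, not the midly API: `parse_index`, `n_index` and `n` are hypothetical names, and the "parser" returns a position in the input rather than a reference into it, so the struct needs no lifetime parameter.

```rust
/// Hypothetical stand-in for the parser: returns a position in `raw`
/// instead of a reference into it.
fn parse_index(raw: &[u8]) -> Option<usize> {
    if raw.len() > 1 { Some(1) } else { None }
}

#[derive(Debug)]
struct Holder {
    source: Vec<u8>,
    n_index: usize,
}

impl Holder {
    fn init(raw: Vec<u8>) -> Option<Holder> {
        // The borrow of `raw` ends with this call, so `raw` can be moved below.
        let n_index = parse_index(&raw)?;
        Some(Holder { source: raw, n_index })
    }

    /// Re-derive the reference on demand; it borrows from `self`.
    fn n(&self) -> &u8 {
        &self.source[self.n_index]
    }
}

fn main() {
    let holder = Holder::init(vec![3u8, 4, 5]).expect("input too short");
    // No lifetime parameter, so the value can be moved to another thread.
    let handle = std::thread::spawn(move || *holder.n());
    println!("n = {}", handle.join().unwrap());
}
```

Whether this is practical for a real parser depends on how cheaply the borrowed parts can be re-derived from the stored positions.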


You seem to expect the compiler to understand that an Arc heap-allocates and that its contents have a stable address. It doesn't. There's no difference between pointing to the stack, the heap, the static segment, or anywhere else, when it comes to Rust's pointer-like types. As far as the compiler is concerned, Arc could be implemented by storing the value inline. Borrowing from the Arc is thus equivalent to borrowing the inner value itself, and when you move the Arc, the compiler has to assume that the inner value is moved, too.

As often as it might seem to come up, this is not a problem to be solved. Self-referential structs are actually a code smell most of the time. You should redesign your data structure in the following way:

  • split up the owning and view types into separate parts;
  • pass the owned data around by default;
  • only construct the view temporarily, when needed.
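
A minimal sketch of that split, assuming a toy "parse" that just separates the first byte from the rest. `Owned`, `View` and `consume` are hypothetical names used for illustration only.

```rust
/// Owning type: holds the raw bytes and nothing borrowed.
struct Owned {
    bytes: Vec<u8>,
}

/// View type: borrows from an `Owned` and lives only as long as needed.
struct View<'a> {
    first: &'a u8,
    rest: &'a [u8],
}

impl Owned {
    /// Build the view on demand; it cannot outlive `self`.
    fn view(&self) -> Option<View<'_>> {
        let (first, rest) = self.bytes.split_first()?;
        Some(View { first, rest })
    }
}

/// Receives the owned data by value and constructs the view locally.
fn consume(data: Owned) -> usize {
    let total = match data.view() {
        Some(v) => usize::from(*v.first) + v.rest.len(),
        None => 0,
    };
    total
}

fn main() {
    let data = Owned { bytes: vec![3, 4, 5] };
    // Pass the owned data around by value; the view is built on the other side.
    let handle = std::thread::spawn(move || consume(data));
    println!("{}", handle.join().unwrap());
}
```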

Thanks! I do not like the "code smell" wording, as it appeals to intuition and is context-independent. Even if that is normally a problem, I would prefer to understand why it is a problem in this particular case. Namely, I am puzzled why people call it a self-reference. "Self" normally implies a cycle, which should indeed be tricky to handle. However, I do not see any cyclic references in this case.
Given Struct, Data, and View:

  • Struct contains/owns Data.
  • Struct contains/owns View.
  • View refers to Data.

So this is a DAG with no cycles. I understand that the compiler cannot handle the situation, but it seems sound to me, as Data cannot go away on its own while Struct is still available. That does not seem to hold when Struct's fields are mutable, though. If the drop order of struct fields is predictable (say, the first field goes away last), then the compiler could infer that Data is always available for View if it comes before the view in the field list.

As I mentioned above, in my case the whole parsing is done by an external library, so I do not control that part. Besides, the parsed structure is somewhat expensive to build while the app is meant to be real-time, so I cannot construct the view every time I need it. From what I have read so far, ouroboros, suggested by quinedot and Heliozoa, seems to be a bearable workaround, although macros add too much magic for my taste.

I can call it a "design error", if you prefer that. The reason is that creating (or trying to create) a self-referential type is a sign of misunderstanding the ownership system. Such types are impossible for a good reason, namely that they would cause dangling pointers.

This is your misconception – since the view refers to the owned data, it necessarily also refers to whatever the owned data is owned by. You can't take a reference to a struct field without considering that the field is inside the struct.

This is not true, either. This is not the compiler's fault, it is a fundamental logical limitation. If you move the struct, then the value at the original location is invalidated. The self-reference won't be updated automatically, so it will still point to the invalidated old place, causing it to be a dangling pointer.


It can be sound with enough care and abstraction, and that's what libraries like ouroboros attempt to do. However, a self-referential struct [1] is impossible to use "normally" for a number of reasons, of which data going away (or moving) is only one.


Another is aliasing: a cornerstone of Rust's model is that data behind a &mut can only be observed through that &mut. A usable alias to data behind a &mut is instant UB (even if never used). So consider

struct Snek<'a, T> {
    innards: T,
    ptr: &'a T,
}

fn foo<T>(mut snek: Snek<'_, T>) {
    let observer = snek.ptr; // Makes a copy of the &T
    let exclusive = &mut snek; // or &mut snek.innards
    drop(observer);
}

If snek.ptr points to snek.innards, this program is UB. But it's a perfectly valid and safe Rust program. Therefore, it must not be UB. Therefore, it must be impossible for snek.ptr to point to snek.innards.

So unless the language grows the ability to recognize some self-referential relationship between fields, it's impossible to have a &mut to a self-referential struct, which means it's impossible to move one either (or you could give away ownership to a function like foo who could then take &muts).


Drop glue order is stable, but you can destructure a struct and drop in a different order manually. Also, the compiler would again need to understand the relationship between fields to enforce the drop order (which is currently just syntactically determined).

You can also implement Drop... and Drop::drop takes a &mut self, so the same aliasing considerations come into play.


There are also problems with lifetimes more generally, as those are the result of a static analysis. But if a pointer into one's self had a lifetime, it would have to be at least as long as the value liveness, which is a dynamic property generally. It is, however, statically known for values that don't move.

(But a more common approach is to use non-lifetime carrying pointers of some sort. However then all of Rust's lifetime-related guarantees (like aliasing) need to be upheld in some other way.)

Also the typical pattern for construction goes something like:

  • Create the struct by placing the field to be referred to and using a dummy reference
  • Borrow that field and set the reference, overwriting the dummy value

For normal references, the second step looks something like

// Snek<'a, _>
//         v Must have lifetime 'a
self.ptr = &self.innards
//  ^ mutable access to this field

Which means we must be doing borrow splitting on a &'a mut Snek<'a, _>. This is generally a red flag pattern, as it means Snek<'a, _> is exclusively borrowed for the rest of its validity. You can only use the Snek via that &'a mut Snek in some way after it is created.

That means you can't create new borrows of it, you can't move it, and you can't call Drop::drop on it either.


Perhaps all that made it more clear why the typical response is "you can't do that (in safe Rust)". Without further language support, is a safe lifetime-based self-referential struct impossible? Actually no, but it has to be very restricted: You can't implement Drop, you can't move or reborrow it after it's created, and this means it's pinned to the stack and all uses of it have to be downstream in the call stack.

But here you go, a self-referential struct in safe Rust.
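
A minimal sketch of what such a struct can look like, under the restrictions just listed. The names `owned`, `borrowed`, `bite` and `use_downstream` are illustrative; the key point is that `bite` takes `&'a mut Snek<'a>`, so afterwards the value can only be reached through the reference it returns.

```rust
struct Snek<'a> {
    owned: String,
    borrowed: &'a str,
}

impl<'a> Snek<'a> {
    /// Takes `&'a mut Snek<'a>`: the struct stays exclusively borrowed for
    /// the rest of its validity, so only the returned reference is usable.
    fn bite(&'a mut self) -> &'a Self {
        self.borrowed = &self.owned;
        self
    }
}

/// Everything that uses the tied value has to sit downstream in the call stack.
fn use_downstream(snek: &Snek<'_>) -> usize {
    snek.borrowed.len()
}

fn demo() -> usize {
    // Step 1: create the struct with a dummy reference.
    let mut snek = Snek { owned: String::from("hiss"), borrowed: "" };
    // Step 2: overwrite the dummy with a borrow of the struct's own field.
    let tied = snek.bite();
    // From here on `snek` cannot be moved, mutated, or borrowed again;
    // all access goes through `tied`.
    use_downstream(tied)
}

fn main() {
    println!("{}", demo());
}
```

Adding a `Drop` implementation to `Snek`, or trying to return it from `demo`, should make this stop compiling, which matches the restrictions described above.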

Does anyone do this in practice? I don't think so. You could instead just pass references to both the raw_data and the parsed_midi_source down the stack, without sticking them both in the same struct. The result is the same: a scoped scenario where you do everything downstream in the call stack.

I.e. it doesn't help with your original goal: Something you can pass around, e.g. to (I presumed) spawned threads. That's where lifetimeless approaches like ouroboros come in. But alternatively, you could adapt to the situation midly has put you in, and try to work within the scoped scenario.

One approach that is compatible with the scenario is taking higher-ranked closures.

fn do_work<F>(raw_data: Whatever, f: F)
where
    F: FnOnce(&Whatever, DerivedFromWhatever<'_>)
{
    let derived = raw_data.derive();
    f(&raw_data, derived);
}
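
A concrete version of that shape, as a sketch only: `Parsed`, `parse` and `with_parsed` are hypothetical stand-ins (the toy parser just skips the first byte), not anything from midly.

```rust
/// Hypothetical stand-in for a borrowing parse result.
struct Parsed<'a> {
    body: &'a [u8],
}

/// Toy parser: everything after the first byte is the "body".
fn parse(raw: &[u8]) -> Parsed<'_> {
    Parsed { body: raw.get(1..).unwrap_or(&[]) }
}

/// Owns the raw data, derives the view, and lends both to `f`.
/// The closure must work for any lifetime, so the view cannot escape it.
fn with_parsed<F, R>(raw: Vec<u8>, f: F) -> R
where
    F: for<'a> FnOnce(&'a [u8], Parsed<'a>) -> R,
{
    let parsed = parse(&raw);
    f(&raw, parsed)
}

fn main() {
    let total = with_parsed(vec![3, 4, 5], |raw, parsed| {
        raw.len() + parsed.body.len()
    });
    println!("{total}");
}
```

The caller never names the lifetime; it only gets to compute a result `R` that does not borrow from the raw data.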

Another is to use scoped threads instead of spawned threads.
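
A minimal sketch with `std::thread::scope` (stable since Rust 1.63). `Parsed`, `parse` and `sum_in_scoped_thread` are hypothetical names; the point is that the worker may borrow the parsed view because the scope guarantees it is joined before the borrow ends.

```rust
use std::thread;

/// Hypothetical stand-in for a borrowing parse result.
struct Parsed<'a> {
    body: &'a [u8],
}

/// Toy parser: everything after the first byte is the "body".
fn parse(raw: &[u8]) -> Parsed<'_> {
    Parsed { body: raw.get(1..).unwrap_or(&[]) }
}

fn sum_in_scoped_thread(raw: &[u8]) -> u32 {
    let parsed = parse(raw);
    thread::scope(|s| {
        // The spawned closure borrows `parsed`; no `'static` bound is needed.
        let worker = s.spawn(|| parsed.body.iter().map(|&b| u32::from(b)).sum::<u32>());
        worker.join().unwrap()
    })
}

fn main() {
    let raw = vec![3u8, 4, 5];
    println!("{}", sum_in_scoped_thread(&raw));
}
```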


  1. a struct which owns a reference (&, &mut) or other lifetime-defined pointer to something else it also owns ↩︎


Love the snake reference :sweat_smile:


Thanks! I had expected a lot more from the compiler :slight_smile: which is likely a good thing, since it is easier to see what is happening...

Where can I read more about this? Does this declaration mean that _ is effective by default and 'a only when explicitly used?

In my case the data will likely be on the order of megabytes, so if it cannot move, that is even a good thing. Also, both the View and Data parts are immutable once loaded, and for immutable data most of the described problems go away.

I have considered putting it into a closure but that seemed a bit bizarre (having a function to only be a data container). Maybe I should have another look.

Yes, the suggested library usage does indeed allocate both on the stack, and passing them to a function would have been easy. In my case there is a thread that processes events from different sources (implemented as Iterators of events) that can be added dynamically. Some of the sources are backed by the parsed file. The processor thread is expected to run indefinitely, so scoped threads would not help. I do not want to do any I/O or heavy processing in that thread, to avoid introducing occasional delays. So the event source should preferably be constructed elsewhere and then passed to that thread. After reading all of the above, I now think that just converting all the events from the source into self-contained structs probably makes sense, as it does not require any trickery. I did not want to do that initially since I already have the data and do not really need any conversions.
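
A rough sketch of that conversion approach, with made-up types: `BorrowedEvent` stands in for whatever a zero-copy parser yields, `OwnedEvent` is the self-contained copy, and the toy `parse` turns every 2-byte chunk into one event. None of these names come from midly.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical borrowed event, standing in for a zero-copy parser's output.
struct BorrowedEvent<'a> {
    delta: u32,
    data: &'a [u8],
}

/// Self-contained copy with no lifetime, so it can be sent to any thread.
#[derive(Debug, PartialEq)]
struct OwnedEvent {
    delta: u32,
    data: Vec<u8>,
}

impl From<BorrowedEvent<'_>> for OwnedEvent {
    fn from(e: BorrowedEvent<'_>) -> Self {
        OwnedEvent { delta: e.delta, data: e.data.to_vec() }
    }
}

/// Toy parser: every 2-byte chunk becomes one event.
fn parse(raw: &[u8]) -> Vec<BorrowedEvent<'_>> {
    raw.chunks_exact(2)
        .map(|c| BorrowedEvent { delta: u32::from(c[0]), data: &c[1..] })
        .collect()
}

fn load_and_send(raw: Vec<u8>) -> Vec<OwnedEvent> {
    let (tx, rx) = mpsc::channel::<Vec<OwnedEvent>>();
    // Stand-in for the long-running processor: it only receives ready-made events.
    let processor = thread::spawn(move || rx.recv().unwrap());
    // Parsing and copying happen outside the processor thread.
    let owned: Vec<OwnedEvent> = parse(&raw).into_iter().map(OwnedEvent::from).collect();
    tx.send(owned).unwrap();
    processor.join().unwrap()
}

fn main() {
    println!("{:?}", load_and_send(vec![1, 10, 2, 20]));
}
```

The cost is one copy of the event payloads at load time; in exchange the event source has no lifetime and needs no special handling.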

_ was just a placeholder for the generic type T here, and I'm not sure how to interpret your question. Oh wait -- maybe you meant the difference between these?

fn briefly_set<'a>(x: &mut Thing<'a>) { /* ... */ } // aka &'_ mut Thing<'a>
fn forever_set<'a>(x: &'a mut Thing<'a>) { /* ... */ }

In which case, the difference is that in the first the &mut borrow lifetime can be anywhere from as brief as the function call to as long as 'a, whereas in the second the &mut borrow lifetime must be exactly 'a.
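
To make the difference concrete, here is a small sketch in which `Thing` is a hypothetical struct holding a `&'a str` and both functions take an extra `name` argument for illustration. The short borrow can be repeated; after the `&'a mut Thing<'a>` call the value cannot be touched again.

```rust
struct Thing<'a> {
    name: &'a str,
}

/// The `&mut` borrow may be as short as this call.
fn briefly_set<'a>(x: &mut Thing<'a>, name: &'a str) {
    x.name = name;
}

/// The `&mut` borrow must last for all of `'a`.
fn forever_set<'a>(x: &'a mut Thing<'a>, name: &'a str) {
    x.name = name;
}

fn demo() -> String {
    let first = String::from("first");
    let second = String::from("second");
    let mut thing = Thing { name: "" };

    briefly_set(&mut thing, &first);
    briefly_set(&mut thing, &second); // fine: the earlier borrow has ended
    let seen = thing.name.to_string();

    forever_set(&mut thing, &first);
    // Any further use of `thing` here (even reading `thing.name`) is rejected
    // by the borrow checker: it is still exclusively borrowed.
    seen
}

fn main() {
    println!("{}", demo());
}
```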

Anyway, more about the antipattern in general. The antipattern of &'a mut Thing<'a> is an outer &'a mut containing something else with the same lifetime 'a. The inner lifetime is invariant (cannot be coerced to something longer or shorter) for soundness reasons (to avoid UB), so this means the Thing<'a> is mutably -- exclusively -- borrowed for the entirety of its remaining validity (leading to all those "can't move it, can't Drop::drop it, can't borrow it again, can only use it via that &'a mut somehow" restrictions).

I wish I had a more official citation, but I don't. If you search for more threads about self-referential structs, it comes up for the same reason [1]; sometimes it also comes up when people create lifetime-carrying structs more generally and write something like

impl<'a> Foo<'a> {
    fn foo(&'a mut self) { /* ... */ }
    // Self is `Foo<'a>` so that's `&'a mut Foo<'a>`
}

But the former conversations tend to go like this or even shorter [2], and the latter tend to just go like "remove that 'a on the &mut" (unless they also want something impossible), so I'm not sure how much you'd gain by reading them.


  1. it's the only way to do the construction in safe Rust ↩︎

  2. i.e. there's no straightforward way to do what they actually want ↩︎


A related RFC is the one for Pin; I'm just keeping it here for reference.
See also std::pin.
