What really counts as self referential vs DAG?

I've heard a few times people say "Rust prohibits self referential structs" and "Safe Rust requires your object graph to be a DAG" (assuming you don't want leaks, so ignoring that Rc allows cycles) as sort of equivalent off hand comments but I'm wondering if that's really correct.

The classic example of an obviously self referential struct would be:

struct Foo(i32, &i32);

Where the self.1 would point at self.0 -- the compiler will stop you because there's no way to write the lifetime you need for self.1.

Now consider a Box holding an i32 and a reference pointing into the Box:

struct Foo(Box<i32>, &i32);

This still won't compile because there is still no way to write the lifetime for the reference. However if you draw a graph with a node for each allocation and directed edges for each pointer, you can see this data structure is a DAG. So AFAICT it's incorrect to say that Safe Rust forces you to express your data structures as DAGs, it's actually enforcing something stricter, but I don't know how to succinctly describe it.

You might then say that an object and all of its suballocations belong to same "node" in the graph, therefore the struct above is really cyclic and so not a DAG. However, this is allowed:

struct Foo(Rc<i32>, Rc<i32>);

Where both Rc point to the same i32. If all suballocations count as belonging to the same node, then this is cyclic, yet it is allowed. And really if you take it to an extreme it's nonsensical because everything is a suballocation of something else, so all objects would be the same node and all pointers would be self-referential.

1 Like

As far as borrow checking is concerned, i32 and Box<i32> are essentially the same, so (rightfully) ruling out (i32, &i32) will prohibit (Box<i32>, i32), too.

If you consider the former not to be a DAG, maybe it helps the intuition if you would - for borrows (i. e. references) - not necessarily only consider the target they point to, but also the original thing that was borrowed in the first place: I you assume thst you can only get a &i32 by borrowing the whole Box<i32> and then transforming this &Box<i32> to &i32, then - as far as lifetimes are concerned - the &i32 would not literally point to the whole Box<i32> field, but at least it's derived from such a borrow, which has almost the same effect.

3 Likes

So, in other words, it's the ownership/borrowing graph which is the DAG that Rust requires, not just the physical memory layout. The raw memory layout is a transitive-subset of the ownership graph, but the ownership graph can and often does point to ancestors of where the physical memory references point.

4 Likes

Could you elaborate on what you mean by transitive-subset? Both an allocation graph and an ownership graph would necessarily involve all objects, but I would expect what we consider a node to not be the same in both graphs so I'm not sure how one can subset the other.

In graphviz notation, here is how I would picture each graph. Ownership graph for struct Foo(Box<i32>, &i32):

"Foo" -> "Box<i32>"
"Foo" -> "&i32"
"Box<i32>" -> "i32"

Memory allocation graph for struct Foo(Box<i32>, &i32):

"Foo.0" -> "i32"
"Foo.1" -> "i32"

Ownership graph for struct Foo(Rc<i32>, Rc<i32>):

"Foo" -> "Rc<i32> (1)"
"Foo" -> "Rc<i32> (2)"
"Rc<i32> (1)" -> "i32"
"Rc<i32> (2)" -> "i32"

Memory allocation graph for struct Foo(Rc<i32>, Rc<i32>):

"Foo.0" -> "i32"
"Foo.1" -> "i32"

Hmm neither of my graphs have ancestor connections, but I am assuming owners point to owned. Do you mean a graph where let b = Box::new(10); let c = &*b; results in:

"c" -> "b"

?

If you're not using Rc, then the ownership graph must be a tree, not just acyclic.

Basically, the rule is that for non-Rc pointers, they can only have one parent (incoming edge) in the ownership graph.

6 Likes

I mean that in the Foo(Box<i32>, &i32) example, the physical memory layout is

"Foo.0" -> "i32"
"Foo.1" -> "i32"

but the ownership/borrowing graph is

"Foo.0" => "i32"
"Foo.1" -> "Foo.0"

The physical memory graph is a "transitive-subset" in that `"Foo.1" has edges to the subgraph it points to in the ownership/borrowing graph.

The Foo.0 is an "ancestor" of the i32 in the memory graph, because it can reach the i32. A direct parent is an ancestor, just a simple one.

2 Likes

The approximation I recently used over here is

  • a self-referential struct is one that owns a reference (&, &mut) or other lifetime-defined pointer to something else it owns

Which doesn't have anything to do with Rc or other lifetimeless models. It's the approximation I use for the "I want the borrower and the borrowee to live together" discussions (not "my Rc cycle leaks memory" discussions... of which I've seen barely any on this forum, vs. dozens of the other). I don't consider those two to be related really, and memory leaks aren't part of Rust's safety guarantees.

The other topic mentioned DAGs and cycles too. If you combine your ownership and "points at" graph (Foo.1 :arrow_right: *Foo.0), the key part with the above definition is being able to reach the same memory through different paths, as that implies I can split an exclusive borrow at some point (&mut Foo to &mut Foo.0 and &mut Foo.1) but observe the same memory.

That's a cycle if you erase the directions, but I think of it more as "owning a reference (&) into one's self".

But if you don't have alias-free guarantees (via &mut), most the problems tend to go away: it's ok to observe the same memory. (Note that Rc and other "shared ownership" types don't offer &mut unless they can guarantee a single owner.)

1 Like

I see, this makes sense if you consider what the lifetime annotation for the reference would be if you could write it, it would be a lifetime corresponding to the box.

The point is that the borrow checker doesn't know about things like "the heap". If you use an operation that's equivalent with calling a function of the signature fn(&T) -> &U, i.e. a projection, then the compiler will unconditionally assume that the &U points "inside" (to a component of) the T.

This is for two reasons, one technical and one design-related:

  1. What specific memory segment a value lives in is a dynamic property (you can move stuff around freely at basically any time), so it's not really expressible using lifetimes, which are a compile-time concept. For example, even an Rc could be implemented by first storing the value inline, and only moving it to the heap when the Rc is cloned.
  2. Code that uses references shouldn't need to care where the reference comes from. It would be annoying if e.g. HashMap::get() took the key by "reference-to-heap" (if such a type existed), just because it heap-allocates keys. You couldn't then use a stack-allocated key to look up values in the map.

These two together mean that for all the compiler is concerned, it must conservatively act as if a reference derived from another one points inside the allocation of the pointee of the original reference. Thus, your struct Foo(Box<i32>, &i32); is not a DAG: self.1 borrowing from the Box borrows from self.0, hence a cycle.

A good mental model to understand this is to temporarily pretend that the value you get from dereferencing a smart pointer (or any other type) is always a field directly stored inside the Box/Rc/Vec/etc.

2 Likes