Indexing into Vec with reference

I am trying to make a graph structure with elements stored in a Vec for something like this.

For the most of my algorithm, the nodes only need read references and managing dataflow through the Tree structure is cumbersome. I'm trying to refractor so that the nodes use references and then when the tree grow from a node, the Tree converts the &Node to an &mut Node (which should be valid since the tree has ownership).

In a C language, you could get the index by (node.ptr - ver.ptr)/node.size. Is there something similar I can do in Rust

Function line: fn mut_node(&mut self, node: &N) -> &mut N;

However cumbersome it is, I can promise that using references directly is going to be much more so due to the need to juggle lifetimes. And as for tricks like pointer arithmetic – you'd have to use raw pointers, which means unsafe. There's no way Rust can allow something like node_ptr - vec.ptr in safe code, because the result of that operation isn't even defined if node_ptr doesn't point into one of the vector's elements.

2 Likes

Correct me if I'm wrong, but I don't think lifetimes will be a major issue since all nodes have the same lifetime (that of the tree).

Are there any additional memory risks to doing the pointer arithmetic and then check if the index is less than the length of the array and greater than 0? If not, how you you get the pointer value from a ref?

Here's the code I've made. Please let me know if you think it could have any unexpected behavior.

pub(crate) fn mut_node(&mut self, node: &N) -> &mut N {
        unsafe {
            let node_ptr = node as *const i32;
            let vec_ptr = &self.nodes as *const i32;
            let diff = (node_ptr as isize - vec_ptr as isize) / size_of::<N>() as isize;
            if let Ok(usize_index) = diff.try_into() {
                if usize_index < self.nodes.len() {
                    self.nodes.get_mut(usize_index)
                } else {
                    panic!("Attempted to get a Node not in the tree");
                }
            }
            panic!("Attempted to get a Node not in the tree");
        }
    }

If the tree owns the nodes and you've borrowed &N from it, then it's impossible to call any &mut self method on the tree. The borrow checker won't allow it.

Even setting aside &mut self, this giving aliased &mut N and &N is immediate Undefined Behavior. Remember that the caller's node still exists, and may have copies in any number of places too.

2 Likes

You could possibly break it into two calls:

impl Tree {
    fn get_index_of(&self, node: &N) -> usize;
    fn get_mut(&mut self, index: usize) -> &mut N;
}

The borrow checker will only let you call the latter when you are free and clear of all shared references.

It's possible you're conflating Rust lifetimes (those '_ things) with the liveness of values / the entire data structure. Even though it's common to also refer to the latter as a value's "lifetime", they are not the same thing.

Rust references are generally for short-term borrows, not long-lived data structures. Their lifetime is part of a compile-time analysis of how long the borrow lasts. In particular, the lifetime is not "until the referent value is no longer alive". Lifetimes are used during borrow checking to ensure memory safety. Additionally, reference types themselves have rules which, if violated, are UB (as already noted in this thread). Overriding the compiler by using unsafe to bypass borrow checker errors often leads to UB, especially if you don't have a solid understanding of the rules. The Rust rules, which are not the same as the C rules.

It's a lot different from working with pointers in C.

(I say this as someone who, when learning Rust, applied knowledge of how things would work in C to reason about how to use unsafe to get around the borrow checker in what I thought was a sensible manner, only to later learn I didn't know Rust's rules and had walked straight into UB.)

For example, "&muts cannot alias" is a fundamental rule, and it doesn't matter who has ownership. To quote the nomicon,

  • Transmuting an & to &mut is always Undefined Behavior.
  • No you can't do it.
  • No you're not special.
4 Likes

I appreciate the help. The way I currently have the datastructure set up is that all nodes and the Tree share the same lifetime ('a). Is there something fundamentally flawed with this approach? I would like the Node to be physically in a Vec for cache performance but still have the flexibility of arbitrary referencing.

Would it still be considered transmuting if I separate as suggested by @cuviper an use unsafe to get the NodeId, and then get the mutable ref?

If you want something where the Nodes have references (&, &mut) into a Vec, and the either

  • the Nodes themselves are in the Vec, or
  • the Vec and the Nodes are in the same containing data structure

then you're talking about a self-referencial struct, which isn't going to be useful.

For example.

#[derive(Default, Copy, Clone)]
struct Node<'a> {
    // `&'a Node<'a>` is already a red flag.  It's shared-borrowed
    // forever so you can never get a `&mut Node<'_>`.
    left: Option<&'a Node<'a>>,
    right: Option<&'a Node<'a>>,
}

struct Tree<'a> {
    nodes: Vec<Node<'a>>,
}

fn example() {
    // This works on its own, but...
    let mut nodes = vec![Node::default(); 2];
    let (root, child) = nodes.split_at_mut(1);
    root[0].left = Some(&child[0]);

    // ...now the `Vec` is exclusively borrowed forever
    let _ = &nodes[0];
    // error[E0502]: cannot borrow `nodes` as immutable
    // because it is also borrowed as mutable
}

You can work around some issues by keeping the data structures separate, but it still has downsides -- your owning data structure would still be borrowed for as long as the tree exists, which also pins it to the stack. (You generally can't move borrowed things, so if you have a reference-based data structure, all work has to happen before the owning structure gets destructed or moved -- including returning it from a function.)

Using references for this use case in general has a lot of downsides because the language has strong invariants they must uphold and really wants to check all those invariants at compile time.


So perhaps what you want is shared ownership in place of references, and interior mutability to so you can still mutate things. (Shared ownership types can only offer & access directly, or they would violate &muts exclusivity requirements.) Two common patterns are Rc<RefCell<_>> for single-threaded and Arc<Mutex<_>> for multi-threaded.

Example.

Or just indices...

#[derive(Default, Clone, Debug, Copy)]
struct Node {
    left: Option<usize>,
    right: Option<usize>,
}

fn example() {
    let nodes = vec![Node { left: Some(1), right: None }, Node::default()];
}

Or atomic indices and perhaps interior mutability for actual data...

use atomic::Ordering::Release;
use std::sync::atomic::AtomicUsize;

// n.b. 0 represents `None` (there is sadly no `NonZeroAtomicUsize`)
#[derive(Default, Debug)]
struct Node {
    left: AtomicUsize,
    right: AtomicUsize,
}

fn main() {
    let nodes = vec![Node::default(), Node::default()];
    nodes[0].left.store(1, Release);
    let _ = &nodes;
}

Or maybe you have a completely owned tree and only the data payload lives in a Vec somewhere, and you use indices for that.

Or perhaps something else, depending on your use case.


You may want to read this book about implementing linked lists in Rust. There's a number of overlapping considerations with implementing a tree.

Sorry there's no good simple answer (or maybe I'm just bad at answering very high-level design questions).

There was a recent thread on the same topic here. Like this thread, people are saying in 10 different ways that self-referential structs should not be used in Rust. :slight_smile:

The post in that thread that seemed like the best resolution to me was the one recommending PetGraph.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.