Raw pointer ergonomics

I'm currently working on a VM inspired by interaction nets (a graph-based model of computation), and I need to use a lot of cyclic data structures with manual memory management, i.e. a lot of unsafe code and raw pointers. I'm finding Rust extremely painful for this use case. I think a lot of the pain would be alleviated by an ergonomic pointer-offset mechanism, which could be as simple as a field access. For example:

use std::ptr;

#[derive(Debug, Clone, Copy)]
struct Node(*mut Node, *mut Node);

fn main() {
    let mut node = Node(ptr::null_mut(), ptr::null_mut());
    let node_ptr = &mut node as *mut Node;
    
    // let node_0_ptr = node_ptr.0; // Why not let this be pointer offset?
    
    // Instead, I have to do this:
    let node_0_ptr = unsafe { &mut (*node_ptr).0 as *mut _ };
}

When I want to get an interior pointer, I would like to be able to just do node_ptr.0. Is there a reason this can't be supported?

It would also be nice if you could directly take a pointer using *mut node rather than needing to do &mut node as *mut Node, but I'm guessing supporting *mut node would considerably complicate parsing (if not make it ambiguous). Luckily, that pain is at least alleviated by automatic coercion from &mut _ to *mut _, and can be further alleviated by a user-defined macro.

Go for https://doc.rust-lang.org/stable/std/ptr/macro.addr_of.html.

That gives you a raw pointer, and will help you avoid stacked borrows issues from mutable references keeping you from using stuff.
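
As a minimal sketch of how that could look with the `Node` type from the question (the helper name `field_ptrs` is made up for illustration): `ptr::addr_of_mut!` yields a raw pointer to a field without ever creating an intermediate `&mut` reference. The projection through `*node_ptr` still has to sit inside `unsafe`.

```rust
use std::ptr;

#[derive(Debug, Clone, Copy)]
struct Node(*mut Node, *mut Node);

/// Hypothetical helper: returns raw pointers to both fields of `*node_ptr`.
///
/// # Safety
/// `node_ptr` must point to a live, properly aligned `Node`.
unsafe fn field_ptrs(node_ptr: *mut Node) -> (*mut *mut Node, *mut *mut Node) {
    // Only raw pointers are produced here; no `&mut Node` is ever created.
    unsafe {
        (
            ptr::addr_of_mut!((*node_ptr).0),
            ptr::addr_of_mut!((*node_ptr).1),
        )
    }
}
```

The returned pointers can then be written through (`*p0 = other_node_ptr`) inside an `unsafe` block, exactly like the pointer obtained with `&mut (*node_ptr).0 as *mut _`.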

The question is not whether it could, but whether it should be allowed. IMO, it should not. Raw pointers were never designed to be "ergonomic"; they were designed to be unambiguous and explicit so as to avoid memory safety problems.

(Anyway, I deeply despise conflating ergonomics with arbitrarily cooked-up syntactic sugar. That's what people most often actually mean, but there is so much more to ergonomics than that. Having your raw pointers be offset explicitly is probably more ergonomic in the bigger picture of things, because it will pre-empt many kinds of hard-to-debug abuse.)

Back to the topic: if you are trying to describe arbitrary graphs, use indices/node IDs/adjacency matrices/etc., basically any other representation instead of raw pointers. Or just use an existing graph library that abstracts away the boilerplate for you.
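
To make the index-based option concrete, here is a rough sketch. All names (`NodeId`, `Graph`, `add_node`, `link`) are invented for the example, and it assumes uniformly sized nodes stored in one `Vec`:

```rust
/// Index of a node inside `Graph::nodes` (hypothetical name).
type NodeId = usize;

#[derive(Debug)]
struct Node {
    /// Outgoing links, stored as indices rather than raw pointers.
    links: [Option<NodeId>; 2],
}

#[derive(Debug, Default)]
struct Graph {
    nodes: Vec<Node>,
}

impl Graph {
    /// Appends an unlinked node and returns its index.
    fn add_node(&mut self) -> NodeId {
        self.nodes.push(Node { links: [None, None] });
        self.nodes.len() - 1
    }

    /// Points link `slot` (0 or 1) of node `from` at node `to`.
    /// Panics on an out-of-range index instead of causing undefined behavior.
    fn link(&mut self, from: NodeId, slot: usize, to: NodeId) {
        self.nodes[from].links[slot] = Some(to);
    }
}
```

Cycles are unproblematic here because a link is just a number; the cost is a bounds check on each access and no direct way to point into the interior of another node.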

That does look handy, but if I'm not mistaken, it doesn't help with the case where you only have a pointer and you want to just offset that pointer. Am I mistaken?

This is something that is better discussed on IRLO as it’s an idea for improving Rust, rather than asking for help using Rust as it stands today.

Okay, thanks, I'll do that. I suppose it's possible that I'm just missing something, though, so perhaps someone will point out a more ergonomic way to do this here.

In most instances I would, but this is a case where that would make the code even more opaque and less type safe. I need graphs of nodes with non-uniform size, with pointers to the interiors of other nodes.

"Or just use an existing graph library that abstracts away the boilerplate for you."

I'm writing a low level VM with strict performance considerations. I use inlined functions to abstract away the pointer offset boilerplate, but I end up needing a separate function for each node and field.
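
For illustration, one possible way to avoid writing a function per node and field is a small projection macro. This is only a sketch: `field_ptr!` and `link` are hypothetical names, and the macro deliberately contains no `unsafe` of its own, so every call site still has to be inside an `unsafe` block.

```rust
/// Hypothetical macro: turns a raw pointer to a struct into a raw pointer to
/// one of its fields. It must be invoked inside an `unsafe` block, because
/// `$ptr` has to point to a live value of the struct type.
macro_rules! field_ptr {
    ($ptr:expr, $field:tt) => {
        ::std::ptr::addr_of_mut!((*$ptr).$field)
    };
}

#[derive(Debug, Clone, Copy)]
struct Node(*mut Node, *mut Node);

/// Sets `a.0` to `b` and `b.1` to `a` using only raw pointers.
///
/// # Safety
/// Both pointers must point to live, properly aligned `Node`s.
unsafe fn link(a: *mut Node, b: *mut Node) {
    unsafe {
        *field_ptr!(a, 0) = b;
        *field_ptr!(b, 1) = a;
    }
}
```

Because `$field` is matched as a token tree, the same macro works for tuple indices (`0`, `1`) and for named fields on other node types.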

Also perhaps worth noting that the current situation requires sprinkling unsafe everywhere, even though the operation, i.e. pointer offset, is not actually unsafe. This makes it more difficult to minimize the use of the unsafe keyword, which will make future auditing of the unsafe code more difficult.

Pointer offsets are unsafe. Same thing as in C, too; it's just not obvious there.

Can you describe a bit more about your memory management strategy, such as how you're enforcing Rust's no-aliasing for &mut rules and deciding when to deallocate Nodes?

Some existing unsafe-based abstractions, like qcell::TCell, might reduce or eliminate the need for you to write your own unsafe code. Without more information, though, it's hard to make a good recommendation.

That's only because the offset function is unconstrained. If you have a valid node_ptr: *mut Node, there would be no way for node_ptr.0 or node_ptr.1 (i.e. offsetting to the given field) to wrap.

Note that wrapping_offset is not marked unsafe.
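
A small sketch of that distinction, using made-up helper names and a plain `u32` slice rather than the VM's nodes: `wrapping_add` (the unsigned sibling of `wrapping_offset`) is a safe method, while `add` is an `unsafe fn`. Dereferencing the result needs `unsafe` either way.

```rust
/// Computes a pointer `i` elements past `base` without any `unsafe`.
/// `wrapping_add` is a safe method: computing the pointer is never undefined
/// behavior by itself, even if the result is out of bounds.
fn element_ptr(base: *const u32, i: usize) -> *const u32 {
    base.wrapping_add(i)
}

/// The same computation with `add`, which is an `unsafe fn`.
///
/// # Safety
/// `base` and the resulting pointer must lie within the same allocation.
unsafe fn element_ptr_in_bounds(base: *const u32, i: usize) -> *const u32 {
    unsafe { base.add(i) }
}
```

The safe version accepts any index; the obligation to stay in bounds only shows up when the pointer is finally dereferenced.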

Nodes are never exposed to safe Rust directly; only a safe, owning TermGraph wrapper is. Nodes are manually allocated and deallocated based on the semantics of interaction nets. Basically, interaction nets are linear, with explicit dup nodes, and all interactions between nodes are local. When a node will no longer be reachable, it is manually deallocated. Deallocation has to be dynamic, based on the links in the graph.

Some links in the graph require bi-directional pointers to enable local updates, a bit like a doubly linked list.

My first instinct here is to use something like Rc<QCell<Node>> for the forward links, Weak<QCell<Node>> for the backlinks, and store the QCellOwner inside TermGraph.

Once that's working, if it's not performant enough in practice, look to replacing the Rcs and Weaks with your own unsafe abstraction that can take advantage of other invariants in your system.
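
Roughly, the shape I have in mind looks like the sketch below. To keep it standard-library only, it swaps `std::cell::RefCell` in for `QCell`, so it is not the qcell API; with qcell, borrowing would go through the owner stored in TermGraph instead of a per-cell runtime borrow flag. The `Node` fields and the `link` function are placeholders.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

#[derive(Debug, Default)]
struct Node {
    /// Forward link: owning, keeps the target alive.
    next: Option<Rc<RefCell<Node>>>,
    /// Back link: non-owning, so the two links do not form a leak cycle.
    prev: Option<Weak<RefCell<Node>>>,
}

/// Connects `a -> b` with a strong forward link and a weak back link.
fn link(a: &Rc<RefCell<Node>>, b: &Rc<RefCell<Node>>) {
    a.borrow_mut().next = Some(Rc::clone(b));
    b.borrow_mut().prev = Some(Rc::downgrade(a));
}
```

Following a back link means calling `upgrade()` on the `Weak`, which returns `None` once the target has been dropped.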

Thanks for the suggestion! The overhead of Rc (or really Arc in this case) would be unacceptable for this application, but I'll keep these options in mind in the future!

I got some helpful feedback from the internals forum: Raw pointer ergonomics - #10 by scottjmaddox - Rust Internals
