An approach to inner pointers and parent pointers - is it safe?


#1

After @pcwalton reminded me that HRTBs exist yesterday, I realized that something I’ve wanted to do for a while seems to be possible. Maybe it’s obvious :0, but I haven’t seen it done before.

The idea is to have a structure which can contain pointers into itself, including, say, child structures which have a pointer back to their parent - preferably without the overhead of reference counting. To do this on the stack is often easy:

struct Test<'a> {
    ptr_into_self: Cell<&'a u32>,
    val: u32,
}
// 'ptr_into_self: &t.val' doesn't work, and we can't mutably borrow it to stick it in after
// the fact, so we must use Cell
let t = Box::new(Test { ptr_into_self: Cell::new(five), val: 5 });
t.ptr_into_self.set(&t.val);

However, t doesn’t feel like a first-class value: it cannot be moved, returned, stored in a previously allocated structure, etc., which is fairly limiting. This makes sense, since in general the inner pointers will become invalid if t is moved; but if it’s boxed, the pointer will remain stable and this should be safe, just like it’s done in umpteen C and C++ programs. (That’s the biggest motivation: even if there are various workarounds and different designs that would work in Rust, having such an ubiquitous coding pattern from other languages be flat-out rejected in Rust feels constraining.) The language doesn’t have any special support for this, but it should be possible to implement in a library.

The idea is to have a special wrapper around Box, which the user can call a borrow method on to get an &'a Test<'a>, which asserts that borrows within the Test and the Test itself live at least as long as each other. This allows t.ptr_into_self.set(&t.val); like above.

  • The user cannot get an &'a mut Test<'a>, because that would be illegal aliasing (since there are always existing immutable borrows). Another active thread is discussing how evil Cells are stylistically, and it probably isn’t wrong, but we need them to do anything interesting. Oh well.

  • borrow<'a>(&'a self) -> &'a Test<'a> would not work, because then the user could store any stack data that their reference to the box outlives into it, only for the pointer to become dangling later. There has to be a new, separate lifetime.

There is no way to do this in a return type, but with HRTB it can be done with a callback:

        fn apply<F, R>(&self, f: F) -> R
            where F: for<'a> FnOnce(&'a Test<'a>) -> R;

This is my full implementation. It has a slight complication to deal with the “Drop-Check Rule”:

use std::mem::transmute;
use std::mem::uninitialized;
macro_rules! declare_container {
    ($container_name:ident, $base_ty:ident) => {

        struct $container_name {
            _raw: Box<$base_ty<'static>>,
        }
        impl $container_name {
            fn new(base: $base_ty<'static>) -> $container_name {
                $container_name { _raw: Box::new(base) }
            }
            fn apply<F, R>(&self, f: F) -> R
                where F: for<'a> FnOnce(&'a $base_ty<'a>) -> R {
                unsafe {
                    if true {
                        f(transmute(&*self._raw))
                    } else {
                        // This dead code asserts that $base_ty does not trigger the
                        // sound-generic-drop dropcheck rule - that is, that its type parameter is
                        // not required to strictly outlive itself.  Requiring that would defeat
                        // the whole purpose.
                        let a: $base_ty = uninitialized();
                        f(&a)
                    }
                }
            }
        }
    }
}

And a more practical example of usage:

struct Test<'a> {
    children: RefCell<Vec<Child<'a>>>,
}
struct Child<'a> {
    parent: &'a Test<'a>,
}
impl<'a> Child<'a> {
    fn get_num_siblings(&self) -> usize {
        self.parent.children.borrow().len() - 1
    }
}

declare_container!(TestContainer, Test);

fn get_tc() -> TestContainer {
    let tc = TestContainer::new(Test { children: RefCell::new(Vec::new()) });
    for _ in 0..5 {
        tc.apply(|t| {
            t.children.borrow_mut().push(Child { parent: t });
        });
    }
    tc
}

fn main() {
    let tc = get_tc();
    println!("{}", tc.apply(|t| t.children.borrow()[2].get_num_siblings()));
}

I am pretty sure there is no way to do this as a generic rather than a macro. Something to look at for HKT… or if it can be done today, I’d love to know how.

I think this is memory safe, other than in the sense that _raw isn’t really private, but I’m not sure. Is there a way to sneak pointers into the wrong places I haven’t thought of?


#2

Hah, I’ve spent some time today playing with the similar idea :wink: I don’t know if it is safe or not, but I’ve arrived at the following code:

use std::marker::PhantomData;

// We would like to store a string and a token referencing it in a single structure
#[derive(Clone, Copy)]
struct Token<'a> {
    start: usize,
    end: usize,
    text: &'a str,
}

// At some point, we will need to write a function
// fn borrow<'a>(&'a self) -> Token<'a>
// except that we will not know the type on the RHS, and won't be able to feed 'a into it.
// So, we will ask user to provide a type function, which can crate instances of types from a lifetime
trait TypeMaker<'a> {
    type Output;
}

enum TokenMaker { }

impl<'a> TypeMaker<'a> for TokenMaker {
    type Output = Token<'a>;
}

// Here is a struct, which holds a borrower and a borrowed. Borrowed is stored on the heap
// and is not moved.
struct Imbox<D: 'static, M>
    where for<'a> M: TypeMaker<'a>
{
    data: Box<D>,
    borrower: <M as TypeMaker<'static>>::Output,
}

impl<D, M> Imbox<D, M>
    where for<'a> M: TypeMaker<'a>
{
    fn new<F>(data: D, f: F) -> Self
        where F: FnOnce(&'static String) -> <M as TypeMaker<'static>>::Output
    {
        let data = Box::new(data);
        // I have no idea if this cast is safe. It probably isn't, but I feel that
        // some variation of it should be safe, precisely because we guarantee that we won't move
        // data after it is placed in a box.
        let r: &'static String = {
            let r: &D = &data;
            unsafe { std::mem::transmute(r) }
        };
        let borrower = f(r);
        Imbox {
            data: data,
            borrower: borrower,
        }
    }

    fn borrow<'a>(&'a self) -> &'a <M as TypeMaker<'a>>::Output {
        // Likewise, I don't know if this is safe, but I hope it should be possible to make it safe.
        unsafe { std::mem::transmute(&self.borrower) }
    }
}

// We probably should implement drop to make sure that borrower is dropped before the data.
// impl Drop for Imbox

fn main() {
    // This compiles
    let imbox = Imbox::<String, TokenMaker>::new("Hello, World".to_string(), |s|{
       Token {
            start: 0,
            end: 5,
            text: s
        }
    });

    let t = imbox.borrow();
}

// And this does not compile
//fn imbox_does_not_live_long_enough() {
//    let t = {
//        let imbox = Imbox::<String, TokenMaker>::new("Hello, World".to_string(), |s|{
//            Token {
//                start: 0,
//                end: 5,
//                text: s
//            }
//        });
//
//        *imbox.borrow()
//    };
//}