Make Compiler Understand that Vec::clear() removes references; Or: Reuse allocation to store temporary references

#1

I need a temporary buffer to store messages in before serialization. I don’t want to reallocate each time I run the function, so I have to keep the buffer around between calls.

What I have looks like

struct SharedState<'a> {
  data: HashMap<KeyType, ValueType>,
  buf: Vec<Message<'a>>, // Message borrows from data
}

I only need to use it in one function body

fn send(&mut self) {
   self.buf.clear();
   // fill self.buf with messages, taking references to self.data...
   handle_msgs(&self.buf);
   self.buf.clear(); // no more references held to data
}

Obviously this doesn’t compile, as the compiler doesn’t understand that the vector is empty and sees that any other method could mutate self.data and invalidate the references. Passing in a vector to be used doesn’t help either, as fundamentally the data inside self.buf is only valid as long as we have the unique reference to self.data, which is only the function scope.

Since this is performance critical, here is my current solution with unsafe

struct SharedState {
    data: HashMap<KeyType, ValueType>,
    msg_buf: (*mut DashboardMessage<'static>, usize, usize),
}

fn send<'a>(&'a mut self) {
    // Rebuild the Vec from the stored (pointer, length, capacity) triple.
    let mut msg_buf: Vec<DashboardMessage<'a>> = unsafe {
        Vec::from_raw_parts(
            std::mem::transmute(self.msg_buf.0),
            self.msg_buf.1,
            self.msg_buf.2,
        )
    };
    // fill msg_buf with messages, taking references to self.data...
    handle_msgs(&msg_buf);
    msg_buf.clear(); // no more references held to data
    // Store the raw parts again and skip the Vec's destructor.
    self.msg_buf.0 = unsafe { std::mem::transmute(msg_buf.as_mut_ptr()) };
    self.msg_buf.1 = msg_buf.len();
    self.msg_buf.2 = msg_buf.capacity();
    std::mem::forget(msg_buf);
}

Or another way I thought of:

struct SharedState {
    data: HashMap<KeyType, ValueType>,
    msg_buf: Vec<DashboardMessage<'static>>,
}

fn send<'a>(&'a mut self) {
    let msg_buf: &mut Vec<DashboardMessage<'a>> =
        unsafe { std::mem::transmute(&mut self.msg_buf) };
    // fill msg_buf with messages, taking references to self.data...
    handle_msgs(msg_buf);
    msg_buf.clear(); // no more references held to data
}

Is there a better way? Which of the two above is better? Are they safe?

#2

If handle_msgs panics, then msg_buf will still contain those messages as if they were 'static!

#3

So I suppose I should make a wrapper type that clears the vector it references in Drop?

EDIT: I implemented one: https://gist.github.com/Lytigas/dc26f5c7a5f48676c46eca7d639b1690
If someone could take a look it would help me out a lot. This is my first time using lifetimes for real. It’s pretty amazing how far you can get without anything too complex.
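For anyone reading along, here is a minimal sketch of what such a wrapper could look like. It is not the code from the gist: `BufGuard`, the `String` key and value types, and the byte-counting `handle_msgs` are all stand-ins I made up for illustration. The guard hands out the stored `Vec<Message<'static>>` as a `Vec<Message<'a>>` and clears it in `Drop`, so a panic inside `handle_msgs` still empties the buffer during unwinding. It is only sound as long as the guard really is dropped; leaking it with `std::mem::forget` would leave stale references behind.

```rust
use std::collections::HashMap;
use std::ops::{Deref, DerefMut};

#[derive(Debug)]
struct Message<'a>(&'a str);

// Hypothetical stand-in for the real handler: returns the total payload size.
fn handle_msgs(msgs: &[Message<'_>]) -> usize {
    msgs.iter().map(|m| m.0.len()).sum()
}

// Hypothetical guard type (an assumption, not the gist's code).
struct BufGuard<'buf, 'a> {
    buf: &'buf mut Vec<Message<'a>>,
}

impl<'buf, 'a> BufGuard<'buf, 'a> {
    fn new(storage: &'buf mut Vec<Message<'static>>) -> Self {
        storage.clear();
        // SAFETY (assumed): the vector is empty here and is cleared again in
        // `Drop`, so no `Message<'a>` outlives the guard unless it is leaked.
        let buf: &'buf mut Vec<Message<'a>> = unsafe { std::mem::transmute(storage) };
        BufGuard { buf }
    }
}

impl<'buf, 'a> Deref for BufGuard<'buf, 'a> {
    type Target = Vec<Message<'a>>;
    fn deref(&self) -> &Self::Target {
        &*self.buf
    }
}

impl<'buf, 'a> DerefMut for BufGuard<'buf, 'a> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut *self.buf
    }
}

impl Drop for BufGuard<'_, '_> {
    fn drop(&mut self) {
        // Runs on normal exit and during unwinding after a panic.
        self.buf.clear();
    }
}

struct SharedState {
    data: HashMap<String, String>,
    msg_buf: Vec<Message<'static>>,
}

impl SharedState {
    fn send(&mut self) -> usize {
        let mut buf = BufGuard::new(&mut self.msg_buf);
        for v in self.data.values() {
            buf.push(Message(v.as_str()));
        }
        let sent = handle_msgs(&buf);
        sent
    }
}
```

`Vec::clear` keeps the allocated capacity, so the second and later calls to `send` reuse the same allocation.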

#4

Would reallocating a vector each time work, perhaps with a preset capacity? Or, if you really want to get rid of the lifetimes, you could use a reference-counting type or a Box, but that seems like a bit too much overhead for performance-critical code. You could maybe use a private mutable static value as a 'static buffer, but that wouldn’t fix anything. Hmm, this one is puzzling.
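To make the first suggestion concrete, here is a rough sketch of the "allocate a fresh vector per call" version. The `String` map, the tuple-struct `Message`, and the byte-counting `handle_msgs` are placeholders I assumed for illustration, not the original types. Because the buffer is a local, its lifetime never leaves the function and no unsafe code is needed; the cost is one allocation per call.

```rust
use std::collections::HashMap;

struct Message<'a>(&'a str);

// Hypothetical stand-in for the real handler: returns the total payload size.
fn handle_msgs(msgs: &[Message<'_>]) -> usize {
    msgs.iter().map(|m| m.0.len()).sum()
}

struct SharedState {
    data: HashMap<String, String>,
}

impl SharedState {
    fn send(&mut self) -> usize {
        // One allocation up front, sized to the number of messages expected.
        let mut buf: Vec<Message<'_>> = Vec::with_capacity(self.data.len());
        buf.extend(self.data.values().map(|v| Message(v.as_str())));
        handle_msgs(&buf)
        // `buf` is dropped here, releasing the borrow of `self.data`.
    }
}
```

Whether the per-call allocation actually matters is something only a benchmark of the real workload can answer.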

#5

Can you store the vec in a thread local in your function instead of in the shared state?

#6

Does handle_msgs() need a vec (or a slice)? Can it work off an Iterator? Perhaps you can stream the msgs through without needing any buffer at all, materializing the required data directly from the HashMap. This may not work for you, but since you didn’t mention it, I thought I’d ask.
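Roughly what I have in mind, as a sketch only: the `Message` fields, the `String` map, and this iterator-taking `handle_msgs` are my own assumptions, since I don’t know what your real types look like. The messages are built lazily from the map entries, so nothing is buffered and the borrow of `self.data` ends when the iterator is consumed.

```rust
use std::collections::HashMap;

struct Message<'a> {
    key: &'a str,
    value: &'a str,
}

// Hypothetical handler that consumes messages one at a time instead of
// taking a slice; here it just adds up the payload sizes.
fn handle_msgs<'a>(msgs: impl Iterator<Item = Message<'a>>) -> usize {
    let mut total = 0;
    for m in msgs {
        total += m.key.len() + m.value.len();
    }
    total
}

struct SharedState {
    data: HashMap<String, String>,
}

impl SharedState {
    fn send(&mut self) -> usize {
        // Each message is created on demand from a map entry; no Vec involved.
        let msgs = self.data.iter().map(|(k, v)| Message {
            key: k.as_str(),
            value: v.as_str(),
        });
        handle_msgs(msgs)
    }
}
```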

#7

Don’t think thread local helps, really - the Vec wants to hold a Message<'temp_borrow>, but thread local wants T: 'static, of course.

Fundamentally, there’s no good way to store a shorter borrow in a container wanting a longer borrow under the pretense of “I promise I’ll clean up” - that promise will need to come in the form of unsafe code.

#8

It’s a long shot, but wouldn’t an arena allocator help you in your goal? https://crates.io/crates/typed-arena

#9

Unfortunately this wouldn’t work, as the data is being serialized with serde, which doesn’t support iterators. It does work with a reference to a vector though.

I do have some example code in an SO question along these lines, but I’m not sure how to implement that myself.

#10

I think the arena allocator is designed to give disparate objects the same lifetime. My problem is a little different. I want to serialize using references to a mutable data structure. Even if the message buffer had the same lifetime as the data, we couldn’t ever mutate data itself later, as this would invalidate the references in the vector.

#11

The only issue I can see on casual inspection is that it’s technically perfectly safe for a program to never drop a value. That means that you can have safe code trigger undefined behavior by using the unsafe code incorrectly, which is, at least from a strict safety standpoint, a bad thing.

For improved safety, though, you just need a combination of your original approach and your newer wrapper:

struct VecHolder {
    ptr: *mut Message<'static>,
    capacity: usize,
}

impl VecHolder {
    pub fn new() -> Self {
        Self {
            ptr: std::ptr::null_mut(),
            capacity: 0,
        }
    }

    pub fn take<'a>(&mut self) -> Vec<Message<'a>> {
        if self.capacity == 0 {
            return Vec::new();
        }
        let ptr: *mut Message<'a> = unsafe { std::mem::transmute(self.ptr) };
        let capacity = self.capacity;
        self.capacity = 0;
        self.ptr = std::ptr::null_mut();
        unsafe { Vec::from_raw_parts(ptr, 0, capacity) }
    }

    pub fn give<'a>(&mut self, mut data: Vec<Message<'a>>) {
        data.clear();
        let ptr: *mut Message<'static> = unsafe { std::mem::transmute(data.as_mut_ptr()) };
        let capacity = data.capacity();
        std::mem::forget(data);
        self.clear();
        self.ptr = ptr;
        self.capacity = capacity;
    }

    pub fn clear(&mut self) {
        if self.capacity > 0 {
            std::mem::drop(self.take());
        }
    }
}

impl Drop for VecHolder {
    fn drop(&mut self) {
        self.clear()
    }
}

#[derive(Debug)]
struct Message<'a>(&'a str);

struct SharedState {
    msg_buf: VecHolder,
}

impl SharedState {
    pub fn dispatch_events(&mut self, data: &[&str]) {
        let mut msg_buf = self.msg_buf.take();
        // push a bunch of messages
        for m in data.iter() {
            msg_buf.push(Message(m));
        }
        println!("Messages: {:?}", &msg_buf);
        self.msg_buf.give(msg_buf);
    }
}

fn main() {
    let mut state = SharedState {
        msg_buf: VecHolder::new(),
    };
    state.dispatch_events(&["a", "b", "c", "d"]);
    state.dispatch_events(&["e", "f", "g", "h"]);
}

Disclaimer: I’m not a seasoned Rust expert. I put this code through only very minimal testing, and I’m also writing while tired. This could very easily be even more unsafe than your gist if it doesn’t work the way I think it does.

That being said, if this actually works as intended, it would be interesting to generalize it to support arbitrary types. That’ll probably be non-trivial to do safely, though.

#12

Have you looked into/considered https://docs.serde.rs/serde/ser/trait.Serializer.html#method.collect_seq?