Force lifetime or borrow to end early?


#1

I’m working on a networking stack, and a general pattern that we use for performance is to use a single mutable buffer for both receiving and sending packets. If a packet comes in, it’s placed in a buffer, and when the stack is done with the packet and wants to send out a new packet in response, it simply re-uses the buffer to serialize the new packet.

This leads to a problem: It would be great if non-lexical lifetimes worked across function boundaries (obviously, they do not). In particular, it would be great if we could take a buffer, and parse a packet out of it. Consider the following code, in which we take a &'a mut [u8] and parse it into a TcpSegment<'a> (note the lifetimes - TcpSegment borrows the buffer). We would like to be able to pass both the segment and the buffer into another function call, and, once that callee is done with the segment, we would like the callee to be able to stop using the segment and get the original buffer back. It would be as if NLL extended from the caller to the callee. Here’s a sketch of what would work in an ideal world:

fn receive_segment(src_ip: Ipv4Addr, dst_ip: Ipv4Addr, buffer: &mut [u8]) {
    let segment = TcpSegment::parse(buffer);
    if segment.syn() {
        ...
    } else {
        receive_data_segment(src_ip, dst_ip, segment, buffer);
    }
}

fn receive_data_segment(src_ip: Ipv4Addr, dst_ip: Ipv4Addr, segment: TcpSegment, buffer: &mut [u8]) {
    // Use the TcpSegment
    let conn = get_conn(src_ip, dst_ip, segment.src_port(), segment.dst_port());
    // get other various parameters based on the data in segment.
    let (a, b, c) = conn.receive_data_segment(segment);

    // Now that we no longer use segment, its lifetime ends,
    // and we can use the original buffer to serialize a new segment.
    conn.write_ack(buffer, a, b, c);
    ...
}

Of course, we don’t have cross-function NLL, so this doesn’t work. My question is: How can I make something like this work, possibly using unsafe code? My first attempt was the following. The idea was to wrap the borrowing object (TcpSegment in the previous example) in an object - OwnedBorrow - which could be consumed to get the original reference back.

use std::marker::PhantomData;
use std::ops::{Deref, DerefMut};

struct OwnedBorrow<'a, T: 'a + ?Sized, U: 'a> {
    t: *mut T,
    u: U,
    _marker: PhantomData<&'a mut T>,
}

impl<'a, T: 'a + ?Sized, U: 'a> OwnedBorrow<'a, T, U> {
    fn new<F: FnOnce(&'a mut T) -> U>(t: &'a mut T, init: F) -> OwnedBorrow<'a, T, U> {
        let t_ptr = t as *mut T;
        OwnedBorrow {
            t: t_ptr,
            u: init(t),
            _marker: PhantomData,
        }
    }

    fn consume(this: OwnedBorrow<'a, T, U>) -> &'a mut T {
        let OwnedBorrow { t, u, _marker } = this;
        ::std::mem::drop(u);
        unsafe { &mut *t }
    }
}

impl<'a, T: 'a + ?Sized, U: 'a> Deref for OwnedBorrow<'a, T, U> {
    type Target = U;

    fn deref(&self) -> &U {
        &self.u
    }
}

impl<'a, T: 'a + ?Sized, U: 'a> DerefMut for OwnedBorrow<'a, T, U> {
    fn deref_mut(&mut self) -> &mut U {
        &mut self.u
    }
}

Unfortunately, the wise @cramertj pointed out a soundness hole in this design, which you can see in action here:

let mut x = [0; 16];
let y = &mut x;
let z: &mut [u8];
let mut ob = OwnedBorrow::<[u8], Option<&mut [u8]>>::new(y, |x| Some(x));
// z has the lifetime of y, and so outlives ob
z = ob.take().unwrap();
let y = OwnedBorrow::consume(ob);
// z and y are still alive at the same time!
z[0] = 5;
println!("{:?}", y[0]);

So is there a design, similar to this one or entirely different, which will allow me to simulate cross-function NLL in this way?

PS: Yes, I know about owning_ref, and no, it doesn’t do what I need.


#2

I must be missing something, but why couldn’t you do the following:

fn receive_data_segment(src_ip: Ipv4Addr, dst_ip: Ipv4Addr, segment: TcpSegment) {
    // Use the TcpSegment
    let conn = get_conn(src_ip, dst_ip, segment.src_port(), segment.dst_port());
    // get other various parameters based on the data in segment.
    // Pass the segment by reference so that it can still be consumed below.
    let (a, b, c) = conn.receive_data_segment(&segment);
    // Now that we're done with segment, consume it and take the buffer back out
    let buffer = segment.take();
    conn.write_ack(buffer, a, b, c);
    ...
}

struct TcpSegment<'a> {
    ...
    buffer: &'a mut [u8],
}

impl<'a> TcpSegment<'a> {
    // Consumes the segment and hands back the buffer it was borrowing.
    fn take(self) -> &'a mut [u8] {
        self.buffer
    }
}

#3

That’s a good question. The answer is that I didn’t give the full context in my example. In fact, TcpSegment is parsed from a subset of the buffer, and so it is incapable of giving back ownership of the entire buffer. It also carves the buffer up internally (into a header, options, and body). You can see the definition of TcpSegment here.


#4

Ok, it’s a bit hard to see how it all fits together without seeing the full flow, although I think I get what you’re saying.

So what is the buffer that you’d like to release? The original “full” buffer, from which only a subset was borrowed by TcpSegment? And you’re saying when TcpSegment is finished, somehow the entire buffer is released even though it borrowed only a subset of it?

A cheap solution might be to put the whole buffer into a RefCell, and borrow it once TcpSegment is dropped. You’ll get dynamic borrow flag checks per borrow call, but there won’t be any heap allocation.
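Here is a minimal sketch of what I mean, using only std. The `receive` function and the "first byte plus one" ack are made up for illustration, and it assumes a non-empty buffer. Whether a parser type like TcpSegment can hold the `Ref` guard is a separate question; this only shows the borrow pattern.

```rust
use std::cell::RefCell;

// Hypothetical receive path: `cell` wraps the caller's borrowed buffer.
// Assumes the buffer is non-empty; indexing panics otherwise.
fn receive(cell: &RefCell<&mut [u8]>) -> u8 {
    // Read phase: a shared borrow that is released at the end of this block.
    let ack = {
        let view = cell.borrow();
        let first = view[0];
        first.wrapping_add(1) // stand-in for values derived from the segment
    };
    // Write phase: the whole buffer is available again. `borrow_mut` would
    // panic at runtime if `view` were still alive at this point.
    let mut whole = cell.borrow_mut();
    whole[0] = ack;
    ack
}
```

The cost is one borrow-flag check per `borrow`/`borrow_mut` call, and a mistake shows up as a runtime panic rather than a compile error.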

Sorry, I’m exploring non-unsafe alternatives before going in that direction :)

Also, any reason you don’t use smoltcp? I don’t know if it solves your issues, but I’m pretty sure it has TCP segment parsing and other parts of that protocol.


#5

I’m pretty sure this doesn’t work. The issue is that TcpSegment would need to own a Ref rather than an actual reference, and it’s not capable of doing that.

I prefer these anyway if possible :slight_smile:

Yes, this is a general-purpose networking stack that is going to be very tailored to our needs. It will grow far beyond just TCP/IP. smoltcp is very cool, though :slight_smile:


#6

Why? Is it not possible to implement ByteSlice for Ref<'a, [u8]>, or whatever the buffer is?


#7

Hmmm, I’ll have to think about that. I’m honestly not sure. I’ll look into it.


#8

OK, I don’t think it will work. Here’s the definition of ByteSlice from the source:

/// `&[u8]` or `&mut [u8]`
///
/// `ByteSlice` abstracts over the mutability of a byte slice reference. It is
/// guaranteed to only be implemented for `&[u8]` and `&mut [u8]`.
pub unsafe trait ByteSlice: Deref<Target = [u8]> + Sized + self::sealed::Sealed {
    fn as_ptr(&self) -> *const u8;
    fn split_at(self, mid: usize) -> (Self, Self);
}

I challenge you to implement split_at for RefMut<[u8]>. I don’t believe it can be done.


#9

Yeah, so that’s a no-go.

So who’s borrowing other parts of the buffer while TcpSegment has a slice? If TcpSegment could borrow the whole buffer (sub-segments can be maintained internally with offsets and slices materialized as needed) then you could release the whole thing in the end.
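A rough sketch of that shape, with a made-up `Segment` type standing in for TcpSegment and an assumed fixed 2-byte header (both are illustrative, not taken from your code):

```rust
// Hypothetical stand-in for TcpSegment: it borrows the *entire* buffer and
// records where the body starts, instead of holding sub-slices.
struct Segment<'a> {
    buffer: &'a mut [u8],
    body_start: usize,
}

const HEADER_LEN: usize = 2; // assumed fixed header size for this sketch

impl<'a> Segment<'a> {
    // Validate once, up front. Requires at least one body byte.
    fn parse(buffer: &'a mut [u8]) -> Option<Segment<'a>> {
        if buffer.len() <= HEADER_LEN {
            return None;
        }
        Some(Segment { buffer, body_start: HEADER_LEN })
    }

    // Materialize the body slice on demand from the stored offset.
    fn body(&self) -> &[u8] {
        &self.buffer[self.body_start..]
    }

    // Consume the segment and release the whole buffer.
    fn into_buffer(self) -> &'a mut [u8] {
        self.buffer
    }
}

fn receive_data_segment(segment: Segment<'_>) {
    let first = segment.body()[0];
    // Done with the segment: take the full buffer back and reuse it.
    let buffer = segment.into_buffer();
    buffer[0] = first;
}
```

Note that each call to `body` re-slices the buffer, so there is still a bounds check on access.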

To your original question of extending NLL across functions, perhaps you can accomplish that with a macro, where the callee is instead a macro, thus allowing the compiler to see borrows within a single fn.
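Something like the following is what I have in mind. The `Segment` type, the 2-byte header, and the "copy the first body byte" step are all invented for the sketch. It relies on `Segment` having no `Drop` impl, so NLL can end its borrow after the last use.

```rust
// Hypothetical stand-in for TcpSegment: borrows part of the buffer.
struct Segment<'a> {
    body: &'a mut [u8],
}

impl<'a> Segment<'a> {
    // Skips a pretend 2-byte header; panics if the buffer is shorter.
    fn parse(buf: &'a mut [u8]) -> Segment<'a> {
        Segment { body: &mut buf[2..] }
    }

    // Panics if the body is empty.
    fn first_body_byte(&self) -> u8 {
        self.body[0]
    }
}

// The "callee" is a macro, so its body expands inside the caller and the
// borrow checker sees the segment's last use and the buffer reuse together.
macro_rules! receive_data_segment {
    ($segment:ident, $buffer:ident) => {{
        let seg = $segment;
        let ack = seg.first_body_byte();
        // `seg` is not used past this point, so its borrow of the buffer ends.
        $buffer[0] = ack;
    }};
}

fn receive(buffer: &mut [u8]) {
    let segment = Segment::parse(buffer);
    receive_data_segment!(segment, buffer);
}
```

This only works while everything expands into one function body, which is why it may not scale to deeper call chains.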


#10

It’s doable, but it’d require re-doing a bunch of runtime checks on every single access. You’d need to reconstruct the header (which would require a length check), reconstruct the options if necessary (which would require re-parsing the entire options section of the header), etc. It would make the parsing code itself much uglier and it would push any bugs to runtime. Right now, all errors surface in the parse method, and once you have a TcpSegment, you’re guaranteed that everything will just work because all of the necessary runtime checks yielded witnesses that everything is valid (e.g., the LayoutVerified object, which is a witness that the byte slice that it owns has the right size and alignment).

That might be possible in certain cases, but I’d definitely prefer a more general solution. There will likely be cases in which a bunch of nested macros will be really unwieldy. But that might work if I can’t figure out anything else that does.


#11

Hmm, not sure I see why it would require redoing safety/validity checks. Couldn’t you perform the validation upfront and then hand out ranges/indices for the sections? The presence of some type wrapping a range can be the witness that the range in the buffer is valid.
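As a sketch of the idea (the `SegmentLayout` name and the 4-byte header are assumptions for illustration): the layout value holds no borrow, so the buffer stays free between accesses.

```rust
use std::ops::Range;

const HEADER_LEN: usize = 4; // assumed header size for this sketch

// Hypothetical witness type: it only exists if `parse` validated the lengths.
struct SegmentLayout {
    header: Range<usize>,
    body: Range<usize>,
}

impl SegmentLayout {
    // All validation happens here, once.
    fn parse(buf: &[u8]) -> Option<SegmentLayout> {
        if buf.len() < HEADER_LEN {
            return None;
        }
        Some(SegmentLayout {
            header: 0..HEADER_LEN,
            body: HEADER_LEN..buf.len(),
        })
    }

    // Slices are materialized on demand from whatever buffer is passed in.
    fn header<'b>(&self, buf: &'b [u8]) -> &'b [u8] {
        &buf[self.header.clone()]
    }

    fn body<'b>(&self, buf: &'b [u8]) -> &'b [u8] {
        &buf[self.body.clone()]
    }
}
```

Two caveats with this sketch: slicing still does a bounds check on each access, and nothing in the types ties a `SegmentLayout` to the particular buffer it was parsed from.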

But I admit that I’m throwing out ideas without a thorough understanding of the current code flow.

Speaking of ideas, another option might be to return “instructions” from receive_data_segment in your original example. So instead of trying to send data there, instead return the data needed to do a send back up the call stack, and let it “unwind” to the point where TcpSegment is gone; then do the send from there.
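Roughly like this, where `Segment`, `Reply`, and the "sequence number plus one" logic are all placeholders I made up for the sketch:

```rust
// Hypothetical stand-in for TcpSegment: borrows the buffer read-only.
struct Segment<'a> {
    bytes: &'a [u8],
}

// "Instructions" handed back up the call stack instead of sending in place.
enum Reply {
    Nothing,
    Ack { seq: u8 },
}

// The callee only decides what should be sent; it never touches the buffer
// mutably, so it needs nothing beyond the segment.
fn receive_data_segment(segment: Segment<'_>) -> Reply {
    match segment.bytes.first() {
        Some(&seq) => Reply::Ack { seq: seq.wrapping_add(1) },
        None => Reply::Nothing,
    }
}

fn receive_segment(buffer: &mut [u8]) {
    // The segment lives only inside this block.
    let reply = {
        let segment = Segment { bytes: &*buffer };
        receive_data_segment(segment)
    };
    // The segment is gone, so the buffer can be reused for the response.
    if let Reply::Ack { seq } = reply {
        if let Some(first) = buffer.first_mut() {
            *first = seq;
        }
    }
}
```

The caller has to understand every `Reply` variant, which is the part that may get unwieldy with more complicated control flow.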


#12

This is definitely an option, but one that gets pretty unwieldy pretty quickly. It’s sort of like returning a continuation, but worse because now the caller needs to understand the internals of what the callee might have wanted to do. It makes for pretty unergonomic code. I’m also worried that this would quickly get out of hand with complicated or deeply-nested control flow.

Also, re: RefCell: https://internals.rust-lang.org/t/make-refcell-support-slice-splitting/7707?u=joshlf
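For reference, newer versions of the standard library do provide `RefMut::map_split`, which covers the slice-splitting case. A minimal sketch of how it could be used (the `split_ref_mut` and `demo` helpers are hypothetical names, and the backing array is just for illustration):

```rust
use std::cell::{RefCell, RefMut};

// Split one RefMut over a byte slice into two disjoint RefMuts.
// Panics if `mid` is past the end of the slice, like `split_at_mut`.
fn split_ref_mut<'b>(
    whole: RefMut<'b, [u8]>,
    mid: usize,
) -> (RefMut<'b, [u8]>, RefMut<'b, [u8]>) {
    RefMut::map_split(whole, |bytes| bytes.split_at_mut(mid))
}

fn demo() -> [u8; 4] {
    let cell = RefCell::new([0u8; 4]);
    {
        // Narrow the guard from the array to a slice, then split it.
        let whole: RefMut<'_, [u8]> = RefMut::map(cell.borrow_mut(), |a| &mut a[..]);
        let (mut header, mut body) = split_ref_mut(whole, 2);
        header[0] = 1;
        body[0] = 2;
    } // both halves are dropped here, releasing the mutable borrow
    let out = *cell.borrow();
    out
}
```

The cell stays mutably borrowed until both halves have been dropped.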


#13

Maybe. If the caller and callee are in the same module, and so are semantically cohesive, then some enum returned back (or a continuation closure passed in) doesn’t seem too bad on the surface. But indeed, one would need to look at the resulting code to assess. The nice property would be no unsafe code (presumably).

If it was me, and all the other ideas don’t work for one reason or another and the scope/control flow where aliasing would be present is fairly limited, I’d just go with unsafe code (raw ptrs).

As for RefCell, it sounds like you’re trying to reinvent the bytes crate a bit :slight_smile:


#14

I wish I could use bytes, but all of their types are owned :confused: . I need these to be references.


#15

I’m sure you know, but them being owned is very much intentional. In fact, that’s probably what you could use here. Maybe you can make use of Bytes::try_mut() to reclaim the BytesMut covering the entire buffer once all the (readonly) Bytes are dropped. I’ve not fully thought this through but maybe it’s possible and will act sort of like a splittable RefCell.

And you should be able to impl ByteSlice for it, although not sure about that sealed::Sealed marker.


#16

Yeah, the issue for us is that we’re trying to be agnostic about where the memory came from. Especially because we literally don’t know. We haven’t designed that part of the system yet, and it might come from some fancy inter-process shared memory buffer thing.

Yeah, should be doable.


#17

Oh, I see. So you’d kind of want a BytesMut-like construct, but one that receives a &mut [u8] and only manages the slices, without being able to shrink or expand the backing storage.


#18

Correct.