Design problem: minimizing allocation/copying while parsing for a Tokio Decoder


#1

I’m working on an IMAP implementation based on Tokio. However, I cannot figure out a good solution to this problem. I have a tokio_io::codec::Decoder implementation, like this:

impl Decoder for ImapCodec {
    type Item = ResponseData;
    type Error = io::Error;
    fn decode(&mut self, buf: &mut BytesMut)
             -> Result<Option<Self::Item>, io::Error> {
        ...
    }
}

I also have a parser that takes a byte slice and returns a Response<'a> representation that keeps a bunch of pointers into the byte slice, so that I can minimize allocation and copying:

fn parse<'a>(msg: &'a [u8]) -> Response<'a> {
    ...
}

The parser needs to run before I can figure out how long the underlying message is. After the parser is done, I’d like to put the raw contents and the Response representation together into a single object that I can return to higher layers of the protocol implementation.

The BytesMut can be split, but I cannot explain to the compiler that the bytes kept alive by the new BytesMut are the same bytes in the old BytesMut that the Response points to. And of course I cannot split the BytesMut before parsing, because I don’t yet know which part of it I need. After parsing I could split and then parse again, but that seems inefficient/wasteful.

Any suggestions?


#2

Not sure I fully grok this. BytesMut is essentially a ref counted pointer into some storage. It doesn’t have any lifetime parameters associated with it. So, it’s unclear (to me at least) what the “explanation” is that you’re trying to achieve. Could you expand on that a bit?


#3

No, but the Response struct I have is lifetime-bound to the BytesMut because it has pointers into the BytesMut's storage. If I then split the BytesMut, rustc doesn’t understand that the old BytesMut's storage is the same as the new BytesMut's storage, so it thinks that the Response should no longer be allowed to live.

Does that make more sense?


#4

Does the Response need references into the BytesMut storage? I’d imagine storing just the BytesMut value would be easier.


#5

The Response structure is a somewhat-nested enum which can have several different pointers into the storage. And even if I stored the BytesMut directly, it wouldn’t solve my problem, as I’d still want to split the BytesMut after parsing.


#6

Is it possible for a parser to return just the length of the message and whatever other positional info needed for a real parse (and split) to occur later? Sorry, it’s a bit hard to put the entire picture together based on the info you’ve provided. It might help if you’d include the full code in question, or some minimal concrete code that demonstrates the issues.
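To make that suggestion concrete, here is a minimal sketch of an offset-based parse, under some loud assumptions: `ResponseOffsets`, `parse_offsets`, and the free function `decode` are hypothetical names, the "grammar" is a toy (one CRLF-terminated line whose first token is the tag), and a plain `Vec<u8>` with `drain` stands in for `BytesMut` and its split so the example needs only the standard library. The point is only that the parse result holds positions rather than borrows, so the buffer can be split afterwards without a second parse.

```rust
use std::ops::Range;

/// Hypothetical parse result: positions only, no borrows into the buffer.
#[derive(Debug, PartialEq)]
struct ResponseOffsets {
    /// Total message length, including the trailing CRLF.
    len: usize,
    /// Byte range of the tag (the first space-delimited token).
    tag: Range<usize>,
    /// Byte range of the remainder of the line, excluding the CRLF.
    rest: Range<usize>,
}

/// Owned message: the raw bytes plus the offsets that index into them.
struct ResponseData {
    raw: Vec<u8>,
    offsets: ResponseOffsets,
}

impl ResponseData {
    /// Borrowed views are resolved on demand from the owned bytes.
    fn tag(&self) -> &[u8] {
        &self.raw[self.offsets.tag.clone()]
    }

    fn rest(&self) -> &[u8] {
        &self.raw[self.offsets.rest.clone()]
    }
}

/// Toy parser (not real IMAP): returns None until a full line is buffered.
fn parse_offsets(buf: &[u8]) -> Option<ResponseOffsets> {
    let end = buf.windows(2).position(|w| w == b"\r\n")?;
    // If the line has no space, the whole line is treated as the tag.
    let space = buf[..end].iter().position(|&b| b == b' ').unwrap_or(end);
    let rest_start = (space + 1).min(end);
    Some(ResponseOffsets {
        len: end + 2,
        tag: 0..space,
        rest: rest_start..end,
    })
}

/// Stand-in for Decoder::decode: parse once, then split off the message.
fn decode(buf: &mut Vec<u8>) -> Option<ResponseData> {
    // The shared borrow taken for parsing ends with this statement,
    // because ResponseOffsets carries no lifetime.
    let offsets = parse_offsets(&buf[..])?;
    // With a Vec this copies the bytes out; a reference-counted buffer
    // could hand over the same storage instead.
    let raw: Vec<u8> = buf.drain(..offsets.len).collect();
    Some(ResponseData { raw, offsets })
}
```

Feeding `decode` a buffer containing `a1 OK done\r\n* BYE` would yield one message and leave `* BYE` in the buffer for the next read. The trade-off is that a nested Response enum would need every slice field turned into a range (or an owned sub-buffer), which is more invasive than the two fields shown here.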


#7

You can see an attempt at getting this working here:

https://github.com/djc/tokio-imap/commit/7e0fe600d92525bbd1a6f7f9f1cf91a8b95428a2