[ NEWBIE ] unconstrained lifetime parameter

Hi everyone,

I am learning Tokio, and I am trying to write a decoder using tokio_util::codec::Decoder. This is just an example I use for learning, not a real project; mostly I am trying to get an in-depth understanding of lifetimes and how references work in Rust.

use tokio_util::codec::Decoder;
use bytes::BytesMut;

#[derive(Debug)]
pub struct Greeting<'a> {
    code: &'a BytesMut,
    text: &'a BytesMut,
}

pub struct ClientInitiationDecoder;

impl<'a> Decoder for ClientInitiationDecoder {
    type Item = Greeting<'a>;
    type Error = std::io::Error;

    fn decode(&mut self, buf: &mut BytesMut) -> Result<Option<Self::Item>, Self::Error> {
        if buf.ends_with(b"\r\n") {
            let text = buf.split_off(3);

            return Ok(Some(Greeting {
                code: buf,
                text,
            }))
        }

        Ok(None)
    }
}

I tried all the suggestions proposed by the compiler, but I don't understand this error. Can anyone please help?

   |
12 | impl<'a> Decoder for ClientInitiationDecoder {
   |      ^^ unconstrained lifetime parameter

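In an impl, a lifetime parameter must appear in the Self type or in the trait being implemented; here 'a appears only in the associated type Item, so nothing pins it down. The compiler has no way to know what Decoder::Item is constrained by in code such as: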

fn get_item<D: Decoder>(d: D) -> D::Item {
    ...
}

so it asks you to add the lifetime to the trait itself:

trait Decoder<'a> {
    type Item: 'a;
    type Error;

    fn decode(&mut self, buf: &'a mut BytesMut) -> Result<Option<Self::Item>, Self::Error>;
}

Then you would be able to implement it like this:

impl<'a> Decoder<'a> for ClientInitiationDecoder {
    type Item = Greeting<'a>;
    type Error = std::io::Error;

    fn decode(&mut self, buf: &'a mut BytesMut) -> Result<Option<Self::Item>, Self::Error> {
        if buf.ends_with(b"\r\n") {
            // note: split_off returns an owned BytesMut, so Greeting::text would have
            // to be an owned BytesMut (not &'a BytesMut) for this line to type-check
            let text = buf.split_off(3);

            return Ok(Some(Greeting {
                code: buf,
                text,
            }))
        }

        Ok(None)
    }
}
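
For a trait you define yourself, this pattern does compile. Below is a minimal std-only sketch, assuming a hypothetical trait named BorrowingDecoder and using &[u8] slices in place of BytesMut so that it has no dependencies. Note that tokio_util's Decoder has no such lifetime parameter, so this cannot be plugged into Framed:

```rust
/// Hypothetical trait (NOT tokio_util's Decoder): the lifetime is a parameter of the
/// trait itself, so an impl may tie `Item` to the borrowed input buffer.
pub trait BorrowingDecoder<'a> {
    type Item: 'a;
    type Error;

    fn decode(&mut self, buf: &'a [u8]) -> Result<Option<Self::Item>, Self::Error>;
}

/// Borrowed view into the caller's buffer: no allocation, no copy.
#[derive(Debug, PartialEq)]
pub struct Greeting<'a> {
    pub code: &'a [u8],
    pub text: &'a [u8],
}

pub struct ClientInitiationDecoder;

// 'a now appears in the trait reference, so it is constrained and E0207 goes away.
impl<'a> BorrowingDecoder<'a> for ClientInitiationDecoder {
    type Item = Greeting<'a>;
    type Error = std::io::Error;

    fn decode(&mut self, buf: &'a [u8]) -> Result<Option<Self::Item>, Self::Error> {
        // Assumes a 3-byte reply code followed by text terminated by "\r\n".
        if buf.len() >= 3 && buf.ends_with(b"\r\n") {
            // split_at only creates two slices pointing into `buf`.
            let (code, text) = buf.split_at(3);
            return Ok(Some(Greeting { code, text }));
        }
        Ok(None)
    }
}
```

Because Item borrows from the input slice, each Greeting keeps the caller's buffer borrowed for as long as it is alive; that coupling is exactly what the real Decoder trait rules out.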

@AngelicosPhosphoros I get another error:

12 | impl<'a> Decoder<'a> for ClientInitiationDecoder {
   |          ^^^^^^^---- help: remove these generics
   |          |
   |          expected 0 lifetime arguments

The Decoder trait expects you to implement

    fn decode(
        &mut self, 
        src: &mut BytesMut
    ) -> Result<Option<Self::Item>, Self::Error>;

which, with the elided lifetimes made explicit, has a signature like this:

    fn decode<'a, 'b>(
        &'a mut self, 
        src: &'b mut BytesMut
    ) -> Result<Option<Self::Item>, Self::Error>;

This method is expected to be generic over two lifetime parameters and to always return a result containing Self::Item, where the Self::Item type cannot incorporate either of the lifetimes 'a or 'b.

Your current implementation tries to create a &BytesMut reference from the &mut BytesMut argument in order to place it into a Greeting<'_> to be used as Item; this could only work if the lifetime argument of Greeting could be tied to the lifetime of the buf: &mut BytesMut argument. The Decoder trait does not allow this, by design, which is why your implementation cannot possibly work.
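
To illustrate what does fit the trait's shape, here is a minimal std-only sketch of the owned alternative. OwnedDecoder is a hypothetical stand-in for tokio_util's Decoder that takes &mut Vec<u8> instead of &mut BytesMut; like the real trait, it has no lifetime parameter, so Item must not borrow from src:

```rust
/// Hypothetical stand-in for tokio_util's Decoder (Vec<u8> instead of BytesMut).
/// No lifetime parameter: `Item` is one fixed type per impl.
pub trait OwnedDecoder {
    type Item;
    type Error;

    // Elided form of: fn decode<'a, 'b>(&'a mut self, src: &'b mut Vec<u8>) -> ...
    // `Item` cannot name 'a or 'b, so it cannot borrow from `src`.
    fn decode(&mut self, src: &mut Vec<u8>) -> Result<Option<Self::Item>, Self::Error>;
}

/// Owned fields: the Greeting no longer borrows from the decoder's buffer.
#[derive(Debug, PartialEq)]
pub struct Greeting {
    pub code: Vec<u8>,
    pub text: Vec<u8>,
}

pub struct ClientInitiationDecoder;

impl OwnedDecoder for ClientInitiationDecoder {
    type Item = Greeting;
    type Error = std::io::Error;

    fn decode(&mut self, src: &mut Vec<u8>) -> Result<Option<Self::Item>, Self::Error> {
        if src.len() >= 3 && src.ends_with(b"\r\n") {
            // Move the whole buffer out, leaving an empty Vec behind for the caller.
            let mut code = std::mem::take(src);
            // Vec::split_off copies the tail into a new allocation;
            // BytesMut::split_off does the same split in O(1) without copying.
            let text = code.split_off(3);
            return Ok(Some(Greeting { code, text }));
        }
        Ok(None)
    }
}
```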

If you do want a working Decoder implementation, you'd have to take ownership of the bytes you want to consume. If you're new to Rust, as a rule of thumb, try to avoid defining structs with lifetimes anyway, unless you know you need them. It's hard to tell how exactly your ClientInitiationDecoder is supposed to operate in the first place, since Decoder assumes, according to its documentation:

This method is called by FramedRead whenever bytes are ready to be parsed. The provided buffer of bytes is what’s been read so far, and this instance of Decode can determine whether an entire frame is in the buffer and is ready to be returned.

If an entire frame is available, then this instance will remove those bytes from the buffer provided and return them as a decoded frame. Note that removing bytes from the provided buffer doesn’t always necessarily copy the bytes, so this should be an efficient operation in most circumstances.

If the bytes look valid, but a frame isn’t fully available yet, then Ok(None) is returned. This indicates to the Framed instance that it needs to read some more bytes before calling this method again.

Note that the bytes provided may be empty. If a previous call to decode consumed all the bytes in the buffer then decode will be called again until it returns Ok(None) , indicating that more bytes need to be read.

Your implementation is expected to:

  • figure out whether enough data is already present to decode a full item
  • if so, consume the relevant prefix from the BytesMut buffer (e.g. using split_to)

whereas what you've written above seems to (try to) always use the entire buffer, however long or short it currently is, provided it just happens to end with \r\n at the moment, and otherwise returns Ok(None) to indicate more data is needed.

Edit: I missed the ends_with(b"\r\n") initially, changed the last paragraph accordingly.
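
The two steps listed above (check for a complete frame, then consume only that prefix) could look roughly like this. It is a minimal std-only sketch: the function name decode_line is hypothetical, and a Vec<u8> stands in for the BytesMut buffer. With BytesMut, src.split_to(pos + 2) would replace the drain and avoid shifting the remaining bytes:

```rust
/// Returns the first complete "\r\n"-terminated frame (without the delimiter),
/// removing it from `src`; returns None if no complete frame has arrived yet.
pub fn decode_line(src: &mut Vec<u8>) -> Option<Vec<u8>> {
    // 1. Is a full frame present? Search for the delimiter anywhere in the buffer,
    //    not just at the end.
    let pos = src.windows(2).position(|w| w == b"\r\n")?;
    // 2. Consume only that prefix; any bytes after it stay for the next call.
    //    (Vec::drain shifts the remaining bytes to the front.)
    let mut frame: Vec<u8> = src.drain(..pos + 2).collect();
    frame.truncate(pos); // drop the trailing "\r\n"
    Some(frame)
}
```

Called repeatedly on a buffer holding "220 hi\r\n331 next\r\n33", this yields "220 hi", then "331 next", then None, leaving the incomplete "33" in the buffer until more data arrives.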

@steffahn Assuming that I expect to receive a large amount of data, is the current implementation good for reading bytes from a performance perspective? If not, what would be the ideal way of simply handing the caller a pointer to the data after the third byte?

    fn decode(&mut self, buf: &mut BytesMut) -> Result<Option<Self::Item>, Self::Error> {
        if buf.ends_with(b"\r\n") {
            let code = buf.split_to(3);
            let code = String::from_utf8(code.to_vec()).unwrap().parse::<usize>().unwrap();

            return Ok(Some(Greeting {
                code,
                text: String::from_utf8(buf.to_vec()).unwrap(),
            }))
        }

        Ok(None)
    }

How expensive are to_vec and from_utf8?

Before considering efficiency, let's discuss correctness. Assuming you want to split up the input at the first newline, you'll have to search for it. The check buf.ends_with(b"\r\n") might never become true, since the rather arbitrary position where the buffer happens to end on each decode call might never coincide with a newline. You'd have to do something similar to how the implementation of LinesCodec operates (that implementation is slightly complicated by the fact that it implements a length limit for lines, to guard against malicious messages consuming an unbounded amount of memory).
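
A rough sketch of such a search that resumes where the previous call stopped and enforces a length limit, similar in spirit to LinesCodec. LineDecoder, next_index and max_length are names chosen for illustration in a std-only example, with Vec<u8> standing in for BytesMut; this is not LinesCodec's actual code:

```rust
use std::io;

/// Hypothetical line decoder state (names chosen for illustration).
pub struct LineDecoder {
    /// How many bytes were already searched without finding "\r\n".
    next_index: usize,
    /// Upper bound on buffered bytes without a complete line.
    max_length: usize,
}

impl LineDecoder {
    pub fn new(max_length: usize) -> Self {
        LineDecoder { next_index: 0, max_length }
    }

    /// Returns the next line without its "\r\n", or Ok(None) if more bytes are needed.
    pub fn decode(&mut self, src: &mut Vec<u8>) -> io::Result<Option<Vec<u8>>> {
        // Resume where the previous call stopped, one byte early, so a "\r\n"
        // split across two reads ("...\r" then "\n...") is still found.
        let start = self.next_index.saturating_sub(1).min(src.len());
        let found = src[start..].windows(2).position(|w| w == b"\r\n");

        match found {
            Some(offset) => {
                let end = start + offset; // index of '\r' in src
                self.next_index = 0;
                let mut line: Vec<u8> = src.drain(..end + 2).collect();
                line.truncate(end); // strip the "\r\n"
                Ok(Some(line))
            }
            // No complete line yet and already too much buffered: give up.
            None if src.len() > self.max_length => Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "line longer than max_length",
            )),
            None => {
                // Everything up to here has been scanned; skip it next time.
                self.next_index = src.len();
                Ok(None)
            }
        }
    }
}
```

Without next_index, every call would rescan the whole buffer from the start, which becomes quadratic when a long line arrives in many small reads.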

@steffahn Yes, that would be true for protocols like HTTP, but in my case I am serving files that are always guaranteed to end with "\r\n" (files generated by me). I am trying to learn about async and performance in Rust without focusing on algorithms, so I can start programming with it at work.

Unrelated to the correctness discussion: the to_vec call allocates a new Vec and copies all the data over; String::from_utf8 does not allocate or copy any data, but it needs to scan through the whole data to check whether it's valid UTF-8.
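
A small std-only sketch of avoiding the to_vec copy: std::str::from_utf8 validates a borrowed slice in place, and String::from_utf8 reuses the Vec it is given. The helper names parse_code, text_of and into_string are hypothetical, and the 3-digit reply code layout is assumed from the thread:

```rust
/// Parses the 3-digit reply code straight from the received bytes:
/// std::str::from_utf8 borrows the slice (validation scan only, no allocation),
/// and str::parse reads it without building a String.
pub fn parse_code(frame: &[u8]) -> Option<u16> {
    let digits = frame.get(..3)?;
    std::str::from_utf8(digits).ok()?.parse().ok()
}

/// Borrowed &str view of the text after the code: again a scan, no copy.
pub fn text_of(frame: &[u8]) -> Option<&str> {
    std::str::from_utf8(frame.get(3..)?).ok()
}

/// String::from_utf8 takes ownership of the Vec and reuses its heap buffer;
/// in code like `String::from_utf8(buf.to_vec())` the copy comes from `to_vec`.
pub fn into_string(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}
```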

This is the part I am more worried about: I am trying to avoid the allocation and the copying without using unsafe.

Assuming that you use Framed or FramedRead with the Decoder implementor, what's the type implementing AsyncRead that you use it with? I'd assume that, usually, when reading a file, that might happen in chunks, too.

This is the client; I am using StreamExt.

My goal is to stream the file straight to disk (not sure how to do that yet), but for now I am stuck on the decoder. I know I could skip many of these parts in a production app, but this is only for learning, so I am adding pieces one at a time to understand the flow.

use std::io::Result;
use tokio::net::{ TcpStream, ToSocketAddrs };
use tokio_util::codec::Framed;
use tokio_stream::StreamExt;

use crate::ClientInitiationDecoder;

pub struct Client;

impl Client {
    pub async fn connect<T: ToSocketAddrs>(addr: T) -> Result<Self> {
        match TcpStream::connect(addr).await {
            Ok(stream) => {
                let mut framed = Framed::new(stream, ClientInitiationDecoder);

                while let Some(response) = framed.next().await {
                    dbg!(response.unwrap());
                }

                Ok(Self { })
            },
            Err(err) => Err(err),
        }
    }

    pub fn is_service_ready(&self) -> bool {
        self.last_reply_code() == 220
    }

    pub fn last_reply_code(&self) -> usize {
        todo!()
    }
}

My ultimate goal would be to have response.unwrap() resolve to a pointer to the data without allocation or copying.

I'm not all that familiar with solving these kinds of things, and I'm not quite sure I really understand what's going on here. Does the TcpStream produce multiple newline-delimited messages or just one? Do you want to receive one or more newline-delimited pieces of data and write them directly to a file, or is the data short enough that you want to hold a full copy in memory, to make sure it's received properly (and possibly simplify the implementation), before starting to write it to a file?

What do you mean by "pointer to the data without allocation or copying"? The data has to be somewhere. Of course, unnecessary additional allocations/copying can always be avoided (but that's usually less crucial and more of an optimization question, I guess?). Do you expect the data in memory or in a file (in the latter case, we could discuss what kind of "pointer" you mean)?

Once you know (and have expressed) exactly what you want, if you don't get an answer here, you could consider opening a new topic dedicated to that question, so that it has a more relevant title and people who can help with it can find it. It's quite late here, and I'm going offline now anyway :wink:

This is my fifth day with Rust, so I am still really new. I am trying to grasp the concepts and how data flows in Rust, primarily how I can move data around using pointers.

I designed a small example that looks like this:

  • A web server that contains some files
  • Fetch the files asynchronously
  • Remove the letter 'a' from each file (removing a single byte, using iterators)
  • Remove "abcd" from each file (removing a string, i.e. a sequence of bytes, using iterators)
  • Save each file to disk
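
The two removal steps in the list above might be sketched on byte slices like this (std only; remove_byte and remove_seq are hypothetical names). The single-byte case is a plain iterator filter; the sequence case uses a simple loop, since matching a multi-byte pattern is awkward with a single iterator adapter. It does one left-to-right pass, so a match formed by an earlier removal is not removed again:

```rust
/// Step "remove letter 'a'": keep every byte except `unwanted`.
pub fn remove_byte(data: &[u8], unwanted: u8) -> Vec<u8> {
    data.iter().copied().filter(|&b| b != unwanted).collect()
}

/// Step "remove \"abcd\"": copy bytes across, skipping each non-overlapping
/// occurrence of `pattern`, scanning left to right.
pub fn remove_seq(data: &[u8], pattern: &[u8]) -> Vec<u8> {
    if pattern.is_empty() {
        return data.to_vec();
    }
    let mut out = Vec::with_capacity(data.len());
    let mut i = 0;
    while i < data.len() {
        if data[i..].starts_with(pattern) {
            i += pattern.len(); // skip the whole match
        } else {
            out.push(data[i]);
            i += 1;
        }
    }
    out
}
```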

I created this small Bash script to generate the files:

#!/bin/bash

for i in `seq 1 10`; do
    echo -n `pwgen 1000 1`$'\r\n' > ${i}.txt
done

Later I will serve all these files via HTTP and try to process them asynchronously. For now I am only trying to process one file, so instead of HTTP I simply serve one file using socat -u file:1.txt tcp-listen:1028,fork,reuseaddr


By pointer I mean a variable that holds the address of the 4th byte of the data I received. I understand that I can do that via slices, e.g. buf[3..], and for BytesMut I can use split_to, which is O(1) according to the docs, so I can return a struct that holds that pointer.


I receive the data via the network, so it's in memory. Since the data is already available to the app, I am trying to avoid new allocations or copying the data, and simply get back a pointer to the exact byte that I want to feed to an iterator later. Of course this could be skipped, but my goal is to learn how to pass back a pointer to data in Rust.
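
Plain slices already give this kind of pointer. A minimal std-only sketch, where after_code and split_greeting are hypothetical helpers: each returned slice is just an address plus a length into the original buffer, so nothing is allocated or copied, and the borrow checker keeps the buffer alive and unmodified while the slices are in use:

```rust
/// Returns the bytes after the 3-byte reply code, or None if the buffer is shorter.
/// Elided form of: fn after_code<'a>(buf: &'a [u8]) -> Option<&'a [u8]>
pub fn after_code(buf: &[u8]) -> Option<&[u8]> {
    buf.get(3..) // a (pointer, length) pair into `buf` itself
}

/// Both halves at once; both slices point into the same buffer.
pub fn split_greeting(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    if buf.len() < 3 {
        return None;
    }
    Some(buf.split_at(3))
}
```

Comparing rest.as_ptr() with buf.as_ptr().wrapping_add(3) shows that the slice starts exactly at the 4th byte of the original data.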

I'm not sure if I'm understanding properly, but if you just change both &'a BytesMut to BytesMut in the Greeting struct, then it should be what you want, without any extra allocations or copying:

#[derive(Debug)]
pub struct Greeting {
    code: BytesMut,
    text: BytesMut,
}

I'm guessing what may be throwing you off is that BytesMut is already a pointer in disguise: a smart, reference-counting one that can be "split" as many times as needed at O(1) cost. While BytesMut owns the data it points to (i.e. cleans up the backing memory when it's dropped), &'a BytesMut or a slice reference &'a [u8] is a statically checked borrow of data owned elsewhere. Since the Decoder trait you're working with isn't flexible enough to return such statically checked borrows (i.e. it doesn't have a lifetime parameter), you will need to return owned data, and, as mentioned previously, BytesMut happens to let you do that at O(1) cost.
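
To make the "pointer in disguise" idea concrete without the bytes crate, here is a simplified, read-only analog built on Rc<Vec<u8>> plus an index range. SharedBytes is a hypothetical type for illustration, not how BytesMut is actually implemented; it only shows why splitting can be O(1) and why the resulting Greeting needs no lifetime parameter:

```rust
use std::ops::Range;
use std::rc::Rc;

/// Hypothetical, read-only analog of the idea behind BytesMut: a shared,
/// reference-counted buffer plus a window into it. Splitting bumps the
/// reference count and adjusts indices; the bytes themselves are never copied.
#[derive(Clone, Debug)]
pub struct SharedBytes {
    buf: Rc<Vec<u8>>,
    range: Range<usize>,
}

impl SharedBytes {
    pub fn new(data: Vec<u8>) -> Self {
        let len = data.len();
        SharedBytes { buf: Rc::new(data), range: 0..len }
    }

    /// Like BytesMut::split_to: returns the first `at` bytes, keeps the rest in `self`.
    pub fn split_to(&mut self, at: usize) -> SharedBytes {
        assert!(at <= self.range.end - self.range.start, "split_to out of bounds");
        let mid = self.range.start + at;
        let head = SharedBytes { buf: Rc::clone(&self.buf), range: self.range.start..mid };
        self.range.start = mid;
        head
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.buf[self.range.clone()]
    }
}

/// Owned fields: no lifetime parameter, and no bytes were copied to build it.
#[derive(Debug)]
pub struct Greeting {
    pub code: SharedBytes,
    pub text: SharedBytes,
}
```

The backing Vec is freed only when the last SharedBytes pointing at it is dropped, which is the "owns the data it points to" part of the explanation above.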

Sorry about my bad English; I use Google Translate for most of what I write :pensive:

@jessa0 can you please explain in more detail what you mean by the sentence above? I am lost here; I believe it's the part where I start losing track of references.


From what I have read about lifetimes, I guess the Decoder trait would be expanded to something like

trait Decoder<'a, 'b> ...
    type Item = &'a...
    type Error = &'b...

If this is correct, why can't I simply assign these lifetimes to the data?

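The Decoder trait is defined without any lifetime parameters, so it's not going to get any lifetimes annotated on it: elision only fills in lifetimes on function signatures such as decode, never on a trait or its associated types.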
