Cloning a Reader - non-idiomatic?

feefladder · October 15, 2024, 9:23pm

I am implementing a TIFF reader. I've made my own trait for the reader to implement:


#[async_trait::async_trait]
pub trait RangeReader {
    async fn read_range(
        &mut self,
        bytes_start: u64,
        bytes_end: u64,
    ) -> futures::io::Result<Vec<u8>>;
}

Now, if I want to make multiple, concurrent requests from my Image, e.g.

let image = Image::from_reader(range_reader)

/// IFD, not necessarily holding an Image
struct IFD {
  sub_ifds: Vec<IFD>,
  data: BTreeMap<Tag, Entry>
}

struct Image {
  ifd: BTreeMap<Tag, Value>,
  param: u16,
  other_param: u64,
  chunk_offsets: Vec<u64>,
  chunk_lengths: Vec<u32>,
}

impl Image {
  fn from_reader<R: AsyncRead + AsyncSeek + RangeReader + Clone>(reader: R, ifd: BTreeMap<Tag, Entry>) {
    // Futures for values that do not fit inline in their entry
    let mut need_fetch = BTreeMap::new();
    // Values that could be decoded directly from the entry
    let mut values = BTreeMap::new();
    for (tag, entry) in ifd {
      if let Some(val) = entry.get_val_if_fits() {
        values.insert(tag, val);
      } else {
        // Each future owns its own clone of the reader
        let mut reader = reader.clone();
        let tag_future = async move {
          reader.read_range(entry.offset, entry.offset + entry.byte_count()).await
        };
        need_fetch.insert(tag, tag_future);
      }
    }
    // await all need_fetch here, or put them in the struct, to be used later
  }
}

Now, my question is whether cloning this reader is particularly ugly, mainly because such a clone will not clone the current state of the reader, e.g. here. Is that too ugly? Are there other options out there that I've overlooked?

An alternative I could think of is to collect all ranges needed for the current processing step and then make a single get_ranges request with all of them. However, the issue remains: I wouldn't want to wait for that read to complete before starting another, independent read (e.g. reading the next IFD, or image data from another IFD). And, if I understand correctly, having a single reader, even an async one, still means that I cannot make concurrent requests/reads from within the same context. So cloning the reader would create a new context?
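For the batching alternative, here is a minimal sketch of merging the byte ranges needed in one step before issuing a single batched request. The function name `coalesce_ranges` and the `max_gap` parameter are hypothetical, chosen for illustration; they are not taken from the post or from any particular crate.

```rust
use std::ops::Range;

/// Merge byte ranges that overlap or lie within `max_gap` bytes of each
/// other, so that one batched request needs fewer separate reads.
/// (`coalesce_ranges` and `max_gap` are illustrative names.)
pub fn coalesce_ranges(mut ranges: Vec<Range<u64>>, max_gap: u64) -> Vec<Range<u64>> {
    ranges.sort_by_key(|r| r.start);
    let mut merged: Vec<Range<u64>> = Vec::new();
    for r in ranges {
        if let Some(last) = merged.last_mut() {
            // Close enough to the previous range: extend it instead of pushing.
            if r.start <= last.end.saturating_add(max_gap) {
                last.end = last.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }
    merged
}
```

With `max_gap = 0` only overlapping or touching ranges are merged; a larger gap trades a few wasted bytes for fewer requests. Batching alone does not make independent reads concurrent, which is the remaining concern raised above.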

Many thanks!

kornel · October 16, 2024, 6:27pm

Definitely unidiomatic, and could be difficult to use for the callers.

&mut R can be a reader, but can't be cloned (you can make another R, but can't make &mut R directly without storing the R somewhere first, and the Clone interface doesn't give a chance to do that).

And if the reader is wrapping Vec<u8>, it would copy all the data, not just the range you want.
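To illustrate the copying concern: cloning a reader that owns a `Vec<u8>` duplicates the whole buffer, whereas a reader holding an `Arc<[u8]>` only bumps a reference count. `OwnedReader` and `SharedReader` are hypothetical types made up for this sketch.

```rust
use std::sync::Arc;

/// Owns its bytes: `clone()` allocates and copies the entire buffer.
#[derive(Clone)]
pub struct OwnedReader {
    pub data: Vec<u8>,
}

/// Shares its bytes: `clone()` only increments the `Arc` reference count.
#[derive(Clone)]
pub struct SharedReader {
    pub data: Arc<[u8]>,
}

impl SharedReader {
    pub fn new(bytes: Vec<u8>) -> Self {
        Self { data: Arc::from(bytes) }
    }

    /// Copy out only the requested range; `None` if it is out of bounds.
    pub fn read_range(&self, start: usize, end: usize) -> Option<Vec<u8>> {
        self.data.get(start..end).map(|s| s.to_vec())
    }
}
```

A generic `R: Clone` bound cannot tell these two cases apart, which is part of why it is awkward for callers.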

Maybe take a function that gives you a dedicated reader for a range? It's still difficult to provide that, because many readers will be used in parallel, so whatever source they're reading from will need synchronization.
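A minimal blocking sketch of that idea, assuming a seekable source guarded by a `Mutex`: a closure hands out one dedicated reader per range, and every read locks the source, seeks, and reads. `RangeHandle` and `range_opener` are hypothetical names, and an async version would need an async-aware lock instead.

```rust
use std::io::{self, Read, Seek, SeekFrom};
use std::sync::{Arc, Mutex};

/// A reader limited to one byte range of a shared, seekable source.
pub struct RangeHandle<S> {
    source: Arc<Mutex<S>>,
    pos: u64,
    end: u64,
}

impl<S: Read + Seek> Read for RangeHandle<S> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Never read past the end of this handle's range.
        let want = (buf.len() as u64).min(self.end.saturating_sub(self.pos)) as usize;
        if want == 0 {
            return Ok(0);
        }
        // The lock serializes seek + read, so handles cannot disturb each other.
        let mut src = self
            .source
            .lock()
            .map_err(|_| io::Error::new(io::ErrorKind::Other, "source lock poisoned"))?;
        src.seek(SeekFrom::Start(self.pos))?;
        let n = src.read(&mut buf[..want])?;
        self.pos += n as u64;
        Ok(n)
    }
}

/// Returns a function that gives out a dedicated reader for `start..end`.
pub fn range_opener<S: Read + Seek>(source: S) -> impl Fn(u64, u64) -> RangeHandle<S> {
    let shared = Arc::new(Mutex::new(source));
    move |start, end| RangeHandle {
        source: Arc::clone(&shared),
        pos: start,
        end,
    }
}
```

Each handle tracks its own position, so the shared source's cursor state never leaks between readers; the cost is that reads are serialized by the lock.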

feefladder · October 22, 2024, 4:34pm

Thanks! Indeed, I think a factory pattern, as used here, would fit:

#[async_trait]
pub trait MyReader {
    /// Read range from source into buffer. buffer length should match the required length
    async fn read_range(&mut self, range_start: u64, buf: &mut [u8]);
}

pub trait ReaderFactory {
     fn build_reader(&self) -> impl MyReader;
}

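impl Image {
  fn from_reader<R: ReaderFactory>(reader_factory: R, ifd: BTreeMap<Tag, Entry>) {
    // Futures for values that do not fit inline in their entry
    let mut need_fetch = BTreeMap::new();
    // Values that could be decoded directly from the entry
    let mut values = BTreeMap::new();
    for (tag, entry) in ifd {
      if let Some(val) = entry.get_val_if_fits() {
        values.insert(tag, val);
      } else {
        // Each future owns a dedicated reader built by the factory
        let mut reader = reader_factory.build_reader();
        let tag_future = async move {
          // Assumes byte_count() returns an integer that fits in usize
          let mut buf = vec![0u8; entry.byte_count() as usize];
          reader.read_range(entry.offset, &mut buf).await;
          buf
        };
        need_fetch.insert(tag, tag_future);
      }
    }
    // await all need_fetch here, or put them in the struct, to be used later
  }
}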

But then there is also the other option of having a Sync reader with non-mut read operations, similar to object_store, like:

/// Reader trait. 
#[async_trait]
pub trait MyReader: Sync {
    /// build the reader, fetching the header(s)
    async fn new<R>(opts: R) -> Self;
    /// Read range from source into buffer. buffer length should match the required length
    async fn read_range(&self, range_start: u64, buf: &mut [u8]);
}
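A rough sketch of why `&self` reads help, using a blocking, thread-based stand-in for the async trait so that it needs no external crates. `RangeSource`, `MemSource`, and `read_all` are hypothetical names, and unlike the trait above this `read_range` returns an `io::Result`.

```rust
use std::io;
use std::thread;

/// Hypothetical blocking counterpart of a `&self` range-reading trait.
pub trait RangeSource: Sync {
    /// Fill `buf` with the bytes starting at `range_start`.
    fn read_range(&self, range_start: u64, buf: &mut [u8]) -> io::Result<()>;
}

/// In-memory source, used here only to keep the sketch self-contained.
pub struct MemSource(pub Vec<u8>);

impl RangeSource for MemSource {
    fn read_range(&self, range_start: u64, buf: &mut [u8]) -> io::Result<()> {
        let len = self.0.len() as u64;
        let end = range_start
            .checked_add(buf.len() as u64)
            .filter(|&e| e <= len)
            .ok_or_else(|| io::Error::from(io::ErrorKind::UnexpectedEof))?;
        buf.copy_from_slice(&self.0[range_start as usize..end as usize]);
        Ok(())
    }
}

/// Read several `(offset, length)` ranges at once; every thread borrows the
/// same source, with no cloning and no `&mut` access.
pub fn read_all<S: RangeSource>(source: &S, ranges: &[(u64, usize)]) -> io::Result<Vec<Vec<u8>>> {
    thread::scope(|scope| {
        let handles: Vec<_> = ranges
            .iter()
            .map(|&(offset, len)| {
                scope.spawn(move || {
                    let mut buf = vec![0u8; len];
                    source.read_range(offset, &mut buf).map(|_| buf)
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("reader thread panicked"))
            .collect()
    })
}
```

Because `read_range` takes `&self`, any number of in-flight reads can share one source; whatever synchronization is needed moves inside the implementation rather than into the caller.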

simonbuchan · October 22, 2024, 8:31pm

For some of the container formats I've handled, images included, I've found it convenient to split an initial metadata parse that gives you a read-only, shared source, from readers that share said source - though there's still a lot of room to play in that area.
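One possible shape of that split, sketched with a made-up toy format (a count byte, then one length byte per chunk, then the chunk data) rather than TIFF. `Parsed`, `ChunkReader`, and `chunk_reader` are hypothetical names: the parse runs once and produces immutable shared state, and each reader only carries its own position.

```rust
use std::io::{self, Read};
use std::sync::Arc;

fn eof() -> io::Error {
    io::Error::from(io::ErrorKind::UnexpectedEof)
}

/// Immutable result of the initial parse: chunk table plus the shared bytes.
pub struct Parsed {
    chunks: Vec<(usize, usize)>, // (offset, length) of each chunk
    bytes: Arc<[u8]>,
}

impl Parsed {
    /// Parse the toy header once and wrap everything in an `Arc`.
    pub fn open(bytes: Vec<u8>) -> io::Result<Arc<Parsed>> {
        let count = *bytes.first().ok_or_else(eof)? as usize;
        let lengths = bytes.get(1..1 + count).ok_or_else(eof)?;
        let mut offset = 1 + count;
        let mut chunks = Vec::with_capacity(count);
        for &len in lengths {
            chunks.push((offset, len as usize));
            offset += len as usize;
        }
        if offset > bytes.len() {
            return Err(eof());
        }
        Ok(Arc::new(Parsed { chunks, bytes: Arc::from(bytes) }))
    }
}

/// Independent reader over one chunk; it shares the parsed source.
pub struct ChunkReader {
    parsed: Arc<Parsed>,
    pos: usize,
    end: usize,
}

impl Read for ChunkReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let src = &self.parsed.bytes[self.pos..self.end];
        let n = src.len().min(buf.len());
        buf[..n].copy_from_slice(&src[..n]);
        self.pos += n;
        Ok(n)
    }
}

/// Create a reader for chunk `index`, or `None` if there is no such chunk.
pub fn chunk_reader(parsed: &Arc<Parsed>, index: usize) -> Option<ChunkReader> {
    let &(offset, len) = parsed.chunks.get(index)?;
    Some(ChunkReader { parsed: Arc::clone(parsed), pos: offset, end: offset + len })
}
```

Readers created this way can be used in any order or moved to other threads, since none of them mutates the shared state.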

