Cloning a Reader - non-idiomatic?

I am implementing a TIFF reader and have defined my own trait for the underlying reader to implement:


#[async_trait::async_trait]
pub trait RangeReader {
    async fn read_range(
        &mut self,
        bytes_start: u64,
        bytes_end: u64,
    ) -> futures::io::Result<Vec<u8>>;
}
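To make the trait concrete, here is a minimal sketch of one implementation. It rests on several assumptions: bytes_end is exclusive, the trait is written with native async fn in traits (Rust 1.75+) instead of async_trait, it returns std::io::Result (which futures::io re-exports), and CursorRangeReader and block_on are hypothetical helpers so the example runs without an async runtime. The cursor position is exactly the kind of mutable state that makes read_range take &mut self, and that a clone would not share.

```rust
use std::future::Future;
use std::io::{self, Cursor, Read, Seek, SeekFrom};
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Same shape as the RangeReader trait above, but using native `async fn`
// in traits (assumption: Rust 1.75+) instead of the async_trait macro.
trait RangeReader {
    async fn read_range(&mut self, bytes_start: u64, bytes_end: u64) -> io::Result<Vec<u8>>;
}

// Hypothetical in-memory reader. The cursor position is mutable state,
// which is why read_range needs &mut self.
struct CursorRangeReader<T> {
    inner: Cursor<T>,
}

impl<T: AsRef<[u8]>> RangeReader for CursorRangeReader<T> {
    async fn read_range(&mut self, bytes_start: u64, bytes_end: u64) -> io::Result<Vec<u8>> {
        // Assumes a half-open range: bytes_start..bytes_end.
        let len = bytes_end.checked_sub(bytes_start).ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidInput, "bytes_end < bytes_start")
        })?;
        self.inner.seek(SeekFrom::Start(bytes_start))?;
        let mut buf = vec![0u8; len as usize];
        self.inner.read_exact(&mut buf)?;
        Ok(buf)
    }
}

fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

// Hypothetical minimal executor: busy-polls with a no-op waker. Only suitable
// for futures that complete without needing a wake-up, like the one above.
fn block_on<F: Future>(fut: F) -> F::Output {
    // SAFETY: every vtable function is a no-op and ignores the data pointer.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

After reading 2..5, the reader's cursor sits at 5; a clone taken at that point would copy that position but then move independently.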

Now, say I want to make multiple concurrent requests from my Image, e.g.:

let image = Image::from_reader(range_reader, ifd.data).await;

/// IFD, not necessarily holding an Image
struct IFD {
  sub_ifds: Vec<IFD>,
  data: BTreeMap<Tag, Entry>,
}

struct Image {
  ifd: BTreeMap<Tag, Value>,
  param: u16,
  other_param: u64,
  chunk_offsets: Vec<u64>,
  chunk_lengths: Vec<u32>,
}

impl Image {
  async fn from_reader<R: AsyncRead + AsyncSeek + RangeReader + Clone>(reader: R, ifd_entries: BTreeMap<Tag, Entry>) -> Self {
    let mut ifd = BTreeMap::new();
    let mut need_fetch = BTreeMap::new();
    for (tag, entry) in ifd_entries {
      if let Some(val) = entry.get_val_if_fits() {
        ifd.insert(tag, val);
      } else {
        // Each future owns its own clone, so pending reads don't compete for one &mut borrow.
        let mut reader_clone = reader.clone();
        let (start, end) = (entry.offset, entry.offset + entry.byte_count());
        need_fetch.insert(tag, async move { reader_clone.read_range(start, end).await });
      }
    }
    // await all need_fetch here, or put them in the struct, to be used later
    todo!()
  }
}

My question is whether cloning the reader like this is particularly ugly, mainly because such a clone will not carry over the reader's current state (e.g. here). Is that too ugly? Are there other options I've overlooked?

An alternative I could think of is to collect all the ranges needed for the current processing step and then make one get_ranges request with all of them. However, the issue remains: I wouldn't want to wait for that read to complete before issuing another independent read (e.g. reading the next IFD, or image data from another IFD). And, if I understand correctly, having a single reader, even though it is async, still means I cannot make concurrent requests from within the same context, since read_range takes &mut self and only one pending read can hold that borrow. So would cloning the reader effectively create a new context?
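A blocking sketch of that batched alternative, assuming a seekable in-memory source and plain std::io instead of async for brevity; read_ranges is a hypothetical helper, not an existing API. It also shows the limitation described above: the batch holds the only &mut borrow of the reader until every range has been read.

```rust
use std::io::{self, Read, Seek, SeekFrom};
use std::ops::Range;

/// Hypothetical batched read: one call covers every range needed for the
/// current processing step; results come back in the same order as `ranges`.
/// The reader stays mutably borrowed for the whole batch, so an unrelated
/// read (e.g. the next IFD) has to wait until the batch finishes.
fn read_ranges<R: Read + Seek>(reader: &mut R, ranges: &[Range<u64>]) -> io::Result<Vec<Vec<u8>>> {
    ranges
        .iter()
        .map(|r| -> io::Result<Vec<u8>> {
            reader.seek(SeekFrom::Start(r.start))?;
            let mut buf = vec![0u8; r.end.saturating_sub(r.start) as usize];
            reader.read_exact(&mut buf)?;
            Ok(buf)
        })
        .collect()
}
```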

Many thanks!

Definitely unidiomatic, and it could be awkward for callers to use.

A &mut R can itself be a reader, but it can't be cloned: you can make another R, but you can't create a &mut R directly without first storing that R somewhere, and the Clone interface gives you no chance to do that.

And if the reader wraps a Vec<u8>, cloning it copies all the data, not just the range you want.
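A small std-only comparison of that cost: cloning a Cursor<Vec<u8>> allocates a fresh copy of the whole buffer, whereas a reader over Arc<[u8]> clones only the pointer. SharedBytesReader is a hypothetical name; each of its clones shares the bytes but keeps its own read position.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};
use std::sync::Arc;

/// Hypothetical in-memory reader that is cheap to clone: clones share the
/// bytes through an Arc, but each clone keeps its own read position.
#[derive(Clone)]
struct SharedBytesReader {
    inner: Cursor<Arc<[u8]>>,
}

impl SharedBytesReader {
    fn new(data: Vec<u8>) -> Self {
        Self { inner: Cursor::new(Arc::from(data)) }
    }

    /// Reads the half-open byte range start..end.
    fn read_range(&mut self, start: u64, end: u64) -> io::Result<Vec<u8>> {
        self.inner.seek(SeekFrom::Start(start))?;
        let mut buf = vec![0u8; end.saturating_sub(start) as usize];
        self.inner.read_exact(&mut buf)?;
        Ok(buf)
    }
}
```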

Maybe take a function that gives you a dedicated reader for a range? It's still difficult to provide that, because many readers will be used in parallel, so whatever source they're reading from will need synchronization.
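A minimal sketch of that idea, using blocking std I/O and threads instead of async for brevity (an assumption; the same shape applies to async readers). RangeHandle and reader_factory are hypothetical names. Because every handle shares one seekable source, the seek and the read must happen under a single lock, which is the synchronization cost mentioned above; it also means the reads end up serialized.

```rust
use std::io::{self, Read, Seek, SeekFrom};
use std::ops::Range;
use std::sync::{Arc, Mutex};

/// Hypothetical dedicated reader for one byte range over a shared source.
struct RangeHandle<S> {
    source: Arc<Mutex<S>>,
    range: Range<u64>,
}

impl<S: Read + Seek> RangeHandle<S> {
    fn read_all(&self) -> io::Result<Vec<u8>> {
        // Holding the lock across seek + read keeps other handles from moving
        // the shared cursor in between; it also serializes the reads.
        let mut src = self.source.lock().expect("source mutex poisoned");
        src.seek(SeekFrom::Start(self.range.start))?;
        let mut buf = vec![0u8; self.range.end.saturating_sub(self.range.start) as usize];
        src.read_exact(&mut buf)?;
        Ok(buf)
    }
}

/// Hypothetical factory: wraps the source once, then hands out one reader per range.
fn reader_factory<S>(source: S) -> impl Fn(Range<u64>) -> RangeHandle<S> {
    let source = Arc::new(Mutex::new(source));
    move |range| RangeHandle { source: Arc::clone(&source), range }
}
```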

Thanks! Indeed, I think a factory pattern, as used here, could work:

#[async_trait]
pub trait MyReader {
    /// Read a range from the source into buf. buf.len() determines how many bytes are read.
    async fn read_range(&mut self, range_start: u64, buf: &mut [u8]) -> std::io::Result<()>;
}

pub trait ReaderFactory {
     fn build_reader(&self) -> impl MyReader;
}

impl Image {
  async fn from_reader<R: ReaderFactory>(reader_factory: R, ifd_entries: BTreeMap<Tag, Entry>) -> Self {
    let mut ifd = BTreeMap::new();
    let mut need_fetch = BTreeMap::new();
    for (tag, entry) in ifd_entries {
      if let Some(val) = entry.get_val_if_fits() {
        ifd.insert(tag, val);
      } else {
        // Each future owns a freshly built reader, so no &mut borrow is shared between them.
        let mut reader = reader_factory.build_reader();
        need_fetch.insert(tag, async move {
          let mut buf = vec![0u8; entry.byte_count() as usize];
          let res = reader.read_range(entry.offset, &mut buf).await;
          res.map(|()| buf)
        });
      }
    }
    // await all need_fetch here, or put them in the struct, to be used later
    todo!()
  }
}
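A runnable sketch of the factory approach, with two assumptions: native async fn in traits (Rust 1.75+) replaces async_trait, and ReaderFactory uses an associated type instead of returning impl MyReader. BytesFactory, BytesReader, and block_on are hypothetical. Because each reader is built fresh over a shared Arc<[u8]>, two reads can be pending at the same time, each borrowing only its own reader.

```rust
use std::future::Future;
use std::io::{self, Cursor, Read, Seek, SeekFrom};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

trait MyReader {
    /// Reads buf.len() bytes starting at range_start into buf.
    async fn read_range(&mut self, range_start: u64, buf: &mut [u8]) -> io::Result<()>;
}

trait ReaderFactory {
    type Reader: MyReader;
    fn build_reader(&self) -> Self::Reader;
}

/// Hypothetical factory over in-memory bytes; building a reader only clones an Arc.
struct BytesFactory {
    data: Arc<[u8]>,
}

/// Hypothetical reader with its own cursor over the shared bytes.
struct BytesReader {
    inner: Cursor<Arc<[u8]>>,
}

impl MyReader for BytesReader {
    async fn read_range(&mut self, range_start: u64, buf: &mut [u8]) -> io::Result<()> {
        self.inner.seek(SeekFrom::Start(range_start))?;
        self.inner.read_exact(buf)
    }
}

impl ReaderFactory for BytesFactory {
    type Reader = BytesReader;
    fn build_reader(&self) -> BytesReader {
        BytesReader { inner: Cursor::new(Arc::clone(&self.data)) }
    }
}

fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

/// Hypothetical minimal executor: busy-polls with a no-op waker. Only suitable
/// for futures that complete without needing a wake-up, like the ones here.
fn block_on<F: Future>(fut: F) -> F::Output {
    // SAFETY: every vtable function is a no-op and ignores the data pointer.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

An associated type works on compilers older than 1.75 and lets callers name the reader type; returning impl MyReader from the trait is shorter but requires Rust 1.75 or newer.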

But there is also the option of a Sync reader with non-mut (&self) read operations, similar to object_store:

/// Reader trait.
#[async_trait]
pub trait MyReader: Sync {
    /// build the reader, fetching the header(s)
    async fn new<R>(opts: R) -> Self;
    /// Read a range from the source into buf. buf.len() determines how many bytes are read.
    async fn read_range(&self, range_start: u64, buf: &mut [u8]) -> std::io::Result<()>;
}
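A blocking sketch of such a &self reader, assuming the data is already in memory as Arc<[u8]>; SharedReader is a hypothetical name. Because reads take &self and the source is immutable, the reader is Sync and can be shared across threads with no lock and no shared cursor. For files on Unix, std::os::unix::fs::FileExt::read_at offers a similar positional read through &File.

```rust
use std::io;
use std::sync::Arc;

/// Hypothetical object_store-style reader: reads take &self and the source is
/// immutable shared bytes, so no lock is needed and the type is Sync.
struct SharedReader {
    data: Arc<[u8]>,
}

impl SharedReader {
    /// Fills buf starting at range_start; errors if the range runs past the end.
    fn read_range(&self, range_start: u64, buf: &mut [u8]) -> io::Result<()> {
        let start = usize::try_from(range_start)
            .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "offset too large"))?;
        let end = start
            .checked_add(buf.len())
            .filter(|&end| end <= self.data.len())
            .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "range past end of data"))?;
        buf.copy_from_slice(&self.data[start..end]);
        Ok(())
    }
}
```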

For some of the container formats I've handled, images included, I've found it convenient to separate an initial metadata parse, which gives you a read-only, shared source, from the readers that share that source, though there's still a lot of room to play in that area.
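A sketch of that split, assuming an in-memory source and a made-up toy header layout (not TIFF) purely for illustration: parse builds an immutable Arc<Parsed> once, and any number of ChunkReader values share it, each keeping only its own position. Parsed, parse, u32_at, and ChunkReader are all hypothetical names.

```rust
use std::io;
use std::sync::Arc;

/// Read-only result of the one-time metadata parse. Hypothetical toy layout
/// (not TIFF): bytes 0..4 hold the chunk count N as a little-endian u32,
/// followed by N pairs of (offset: u32 LE, length: u32 LE).
struct Parsed {
    data: Arc<[u8]>,
    chunks: Vec<(usize, usize)>,
}

fn u32_at(data: &[u8], pos: usize) -> io::Result<u32> {
    data.get(pos..pos + 4)
        .map(|b| u32::from_le_bytes(b.try_into().expect("slice has length 4")))
        .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "truncated header"))
}

/// Parses the metadata once and returns it as a shared, read-only handle.
fn parse(data: Arc<[u8]>) -> io::Result<Arc<Parsed>> {
    let count = u32_at(&data, 0)? as usize;
    let mut chunks = Vec::new();
    for i in 0..count {
        let base = 4 + i * 8;
        chunks.push((u32_at(&data, base)? as usize, u32_at(&data, base + 4)? as usize));
    }
    Ok(Arc::new(Parsed { data, chunks }))
}

/// A reader shares the parsed source; its only private state is the next chunk index.
struct ChunkReader {
    parsed: Arc<Parsed>,
    next: usize,
}

impl Iterator for ChunkReader {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let &(offset, len) = self.parsed.chunks.get(self.next)?;
        self.next += 1;
        let chunk = self.parsed.data.get(offset..offset + len).map(|s| s.to_vec());
        Some(chunk.ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "chunk out of bounds")))
    }
}
```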
