Distance `Vec` to slice

Might be a bad title, could not think of anything better :smiley:

I roughly have this kind of code:

fn main() {
  let mut v: Vec<u8> = ...;

  let mut s: &[u8] = &v;
  let mut r: &mut &[u8] = &mut s;
  run_fun(r);

  let i = ... ; // What do I need here?
  v.drain(..i); // want to drain all the elements up to idx below
}

fn run_fun(s: &mut &[u8]) {
  let idx = ...; // computed at runtime
  *s = &s[idx ..];
}

The question is already in there: I want to drain the elements from the beginning of the Vec, without returning idx from run_fun. I still have that slice in s, so I should be able to get this offset, right? I kinda found pointer - Rust, but I'd rather have a safe solution.

Use case: the Vec<u8> is repeatedly filled from a file, and I want to parse it and discard what I've already parsed. That might not be the whole file, so I'd keep the unparsed part in the Vec.

Thanks for any pointers :slight_smile:

Please share more information about your real code/use case.

  • Could run_fun do the draining?
  • Why can run_fun not return i? Can it return something else?

No, your s in run_fun does not change the s in main. To change it, you would need to accept a &mut &[u8] in run_fun, but doing that would probably lead you to a dead end.


Yes, you're right, my real code takes &mut &[u8], I'll fix the code above! In my code, s is an &[u8] element of a struct, and I'm passing &mut struct around, but in simplifying I kinda neglected that, sorry!

Could run_fun do the draining?

Huh, I guess... maybe? It's a parser, so it calls quite a few subfunctions, and I was hoping to pass &mut &[u8] to them all and have them move the slice start, since they need to do it anyway. Draining inside run_fun would also be a performance hazard, since I'd be draining quite often, as opposed to just doing it once.

Why can run_fun not return i? Can it return something else?

Well, it could (it's what I'm doing right now), but it's quite some bookkeeping I have to do, pretty error prone, and it feels like the info is there already. I can return anything I want though, I'm the one writing the function after all and it's not bound to some external API or something :slight_smile:

You might want to use VecDeque<u8> in place of Vec<u8> so that draining from the front doesn't actually move the data around in memory. You can then also make your parsing functions take &mut impl BufRead to conveniently read data from the front (and also support parsing from a file directly, if that becomes necessary).
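A rough sketch of what that could look like, assuming a Rust version where VecDeque<u8> implements BufRead (1.75 or later, as far as I know). The names `take_line` and `demo` are made up for illustration, and a newline-delimited format is only an assumption. One caveat: `fill_buf` on a VecDeque exposes only its first contiguous chunk, so the sketch calls `make_contiguous()` before parsing.

```rust
use std::collections::VecDeque;
use std::io::{self, BufRead};

/// Hypothetical parser step: take one complete line from the front of `input`.
/// Returns Ok(None) and consumes nothing when no full line is visible yet.
fn take_line(input: &mut impl BufRead) -> io::Result<Option<Vec<u8>>> {
    // Peek at the buffered bytes without consuming them.
    let buf = input.fill_buf()?;
    let Some(pos) = buf.iter().position(|&b| b == b'\n') else {
        return Ok(None);
    };
    let line = buf[..pos].to_vec();
    // Drop the line and its trailing newline from the front.
    input.consume(pos + 1);
    Ok(Some(line))
}

/// Parses all complete lines and returns them together with the leftover bytes.
fn demo() -> io::Result<(Vec<Vec<u8>>, Vec<u8>)> {
    let mut queue: VecDeque<u8> = VecDeque::new();
    queue.extend(b"first\nsecond\npart");
    // Make sure all buffered bytes are visible through a single fill_buf call.
    queue.make_contiguous();

    let mut lines = Vec::new();
    while let Some(line) = take_line(&mut queue)? {
        lines.push(line);
    }
    // Whatever is left is the unparsed tail (here, a partial line).
    Ok((lines, queue.into_iter().collect()))
}
```

Because `take_line` only asks for `impl BufRead`, the same function should also accept a `BufReader` around a file.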


If you want to keep what you have now, though, you can do this:

fn main() {
  let mut v: Vec<u8> = ...;

  let mut s: &[u8] = &v;
  run_fun(&mut s);

  // Verify slice is the tail of v
  assert_eq!(s.as_ptr_range().end, v.as_ptr_range().end);
  let i = v.len() - s.len();
  v.drain(..i);
}

fn run_fun(s: &mut &[u8]) {
  let idx = ...; // computed at runtime
  *s = &s[idx ..];
}
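For reference, here is a compilable variant of the same idea. It is only a sketch: `idx` is passed in as a parameter to stand in for the elided runtime computation, and `drain_parsed` is a hypothetical helper name.

```rust
/// Stand-in for the parser: advances `s` past the first `idx` bytes.
/// (`idx` is a parameter here only so the example is self-contained.)
fn run_fun(s: &mut &[u8], idx: usize) {
    *s = &s[idx..];
}

/// Runs the parser on `v`, then drains everything it moved past.
/// Returns the number of bytes removed from the front.
fn drain_parsed(v: &mut Vec<u8>, idx: usize) -> usize {
    let mut s: &[u8] = v.as_slice();
    run_fun(&mut s, idx);

    // Verify the slice is still the tail of v.
    assert_eq!(s.as_ptr_range().end, v.as_ptr_range().end);

    // The slice only ever shrinks from the front, so the length
    // difference is exactly the number of consumed bytes.
    let consumed = v.len() - s.len();
    v.drain(..consumed);
    consumed
}
```

The borrow of `v` held by `s` ends after `s.len()` is read, which is why the later `drain` call is accepted.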

I’m curious about the details here. Are you implying that Vec acts as a buffer for some part of the file? Or is it populated in a more complex manner? Also, if it’s a partial view, what does run_fun do if it has too little data loaded?

If I had to guess the use case from what it sounds like, it might be something where usage of std::io::Write could work, and then the buffering could be handled e.g. by using std::io::BufReader<std::fs::File>


It's not complicated: I'm watching a log file (via inotify), for which I keep an open file descriptor. Whenever I get a change notification, I read everything I have not yet read into the buffer, then try to parse as much as I can. I don't want to rely on full writes, so it might well be that there's stuff at the end I can't really parse (e.g. a partial line). That's the point where I want to remove everything I parsed from the Vec, but keep everything I didn't, because I won't read it again from the file (and it will probably be completed on the next read).
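To make that flow concrete, here is a minimal sketch that leaves out the inotify and file-reading parts. It assumes a newline-delimited log, and `parse_lines` and `on_new_data` are hypothetical names, not the real parser.

```rust
/// Hypothetical line parser: consumes every complete line from the front
/// of `s` and leaves a trailing partial line (if any) in place.
fn parse_lines<'a>(s: &mut &'a [u8]) -> Vec<&'a [u8]> {
    let mut lines = Vec::new();
    while let Some(pos) = s.iter().position(|&b| b == b'\n') {
        let cur: &'a [u8] = *s; // copy the inner shared slice out
        lines.push(&cur[..pos]);
        *s = &cur[pos + 1..]; // move the slice start past the newline
    }
    lines
}

/// Appends freshly read bytes, parses what is complete, and keeps the rest
/// in `buf` for the next notification.
fn on_new_data(buf: &mut Vec<u8>, fresh: &[u8]) -> Vec<String> {
    buf.extend_from_slice(fresh);

    let mut s: &[u8] = buf.as_slice();
    let lines: Vec<String> = parse_lines(&mut s)
        .into_iter()
        .map(|l| String::from_utf8_lossy(l).into_owned())
        .collect();

    // Drain once per notification, not once per parsed item.
    let consumed = buf.len() - s.len();
    buf.drain(..consumed);
    lines
}
```

Feeding it `b"a=1\nb="` and then `b"2\n"` yields `a=1` on the first call and `b=2` on the second, with the partial `b=` carried over in between.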

I don't see how Write could do any work here, though. But maybe I'm not thinking right about all this somehow.

... don't I feel stupid now, that does precisely what I need. Thanks!

Some relevant unstable API:

impl<T> [T] {
    pub fn subslice_range(&self, subslice: &[T]) -> Option<Range<usize>>;
}
🔬This is a nightly-only experimental API. (substr_range #126769)
Returns the range of indices that a subslice points to.

Returns None if subslice does not point within the slice or if it is not aligned with the elements in the slice.

This method does not compare elements. Instead, this method finds the location in the slice that subslice was obtained from. To find the index of a subslice via comparison, instead use .windows().position().

This method is useful for extending slice iterators like slice::split.

Note that this may return a false positive (either Some(0..0) or Some(self.len()..self.len())) if subslice has a length of zero and points to the beginning or end of another, separate, slice.

Panics

Panics if T is zero-sized.

Examples

Basic usage:

#![feature(substr_range)]

let nums = &[0, 5, 10, 0, 0, 5];

let mut iter = nums
    .split(|t| *t == 0)
    .map(|n| nums.subslice_range(n).unwrap());

assert_eq!(iter.next(), Some(0..0));
assert_eq!(iter.next(), Some(1..3));
assert_eq!(iter.next(), Some(4..4));
assert_eq!(iter.next(), Some(5..6));
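Until that is stabilized, something similar can be approximated in safe code on stable by comparing addresses. This is only a sketch for byte slices (element size 1, so address differences are indices); `subslice_range_u8` is a made-up name, not a std API, and it has the same false-positive caveat for empty slices as the nightly method.

```rust
use std::ops::Range;

/// Hypothetical stable stand-in for the nightly method, restricted to u8.
/// Returns the index range of `inner` within `outer`, or None if `inner`
/// does not lie inside `outer`. Elements are never compared.
fn subslice_range_u8(outer: &[u8], inner: &[u8]) -> Option<Range<usize>> {
    let outer_start = outer.as_ptr() as usize;
    let inner_start = inner.as_ptr() as usize;

    // None if `inner` starts before `outer`.
    let start = inner_start.checked_sub(outer_start)?;
    let end = start.checked_add(inner.len())?;

    // None if `inner` extends past the end of `outer`.
    (end <= outer.len()).then_some(start..end)
}
```

For the draining question above, `subslice_range_u8(&v, s)?.start` would be the number of bytes to drain.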

Dang, I've wanted precisely this a bunch of times for parser tokens...

I have also written the str equivalent about 417 times.