Blending Vectors of U8 Values

Bear with me as I'm an experienced JavaScript coder but brand new to Rust so I'm having trouble with this task.

Here is the basic problem. Let's say I have a group of vectors with u8 values. They would look like this:

let vec1 = vec![0,0,2,2,4,4];
let vec2 = vec![1,1,0,0,0,0];
let vec3 = vec![0,0,0,0,3,3];

Then let's say I have a second, smaller Vec that represents a "null" slice, such that I want to find the first matching slice of the collection and return that. In this example, the null slice would be:

let null_slice = vec![0,0];

Finally, the result I want is a single vector that matches [1,1,2,2,3,3]
That is, jumping by two (the length of the null slice) and going last to first, I want to return the first slice that doesn't match null. So slice one is 1,1, then slice two is 2, 2, then slice 3 is 3,3, then I put them all into a single vector. Note that it should not use 4,4 in the last block as 3,3 would be found first and therefore it will stop searching.

This is easy in JavaScript, but in Rust I keep creating this mangled heap of iterators and I know there's a better way. Any clues here?

EDIT: One more note is that this would be an unknown number of vectors to merge. Anything from 1 to 20 vectors in most use cases.

Can you show us how you would implement it in JavaScript? Often it's easy enough to mechanically translate a for(var i = 0; i < vec1.length; i++) loop into Rust. You can even do something similar to JavaScript's map() and filter() with iterators if you collect() into intermediate lists.

I'm not sure I understand what you want, but if I do, then it's pretty simple to do naïvely:

fn blend<T>(haystacks: &[&[T]], needle: &[T]) -> Vec<T>
where
    T: PartialEq + Clone
{
    if haystacks.is_empty() { return Vec::new() };

    let mut iters: Vec<_> = haystacks.iter().rev().map(|hs| hs.chunks(needle.len())).collect();
    let mut result = Vec::with_capacity(haystacks[0].len());

    loop {
        let mut found = false;

        for iter in &mut iters {
            let Some(slice) = iter.next() else { return result };

            if slice != needle && !found {
                result.extend_from_slice(slice);
                found = true;
            }
        }
    }
}
4 Likes

Absolutely. Let me demonstrate (not tested here, but it should work as an example).

const int1 = new Uint8Array([0,0,2,2,4,4]);
const int2 = new Uint8Array([1,1,0,0,0,0]);
const int3= new Uint8Array([0,0,0,0,3,3]);

const blank1 = new Uint8Array([0,0]); 
const blank2 = new Uint8Array([0,0]); 
const blank3 = new Uint8Array([0,0]); 

const matches = (slice, blank)=> {
  const matchCount = 0;
  for(let i =0; i< blank.length; i++) {
     if(blank[i] == slice[i]) {
        matchCount ++;
     }
  }
  return matchCount == blank.length;
}

const merge = (blanks, numbers) => {
   const byteLength = numbers[0.]length;
   const blockLength = blanks[0].length;
   const result = new Uint8Array(byteLength);
   for(let i = blanks.length-1; i >=0; i--) {
      for(let j = 0; j < byteLength; j += blockLength) {
          const slice = numbers[i].slice(j, blockLength);
          if(matches[blanks[i], slice) {
             result.set(slice, j);
          }
      }
   }
   return result;
}

Sorry if this posted a few times. Turns out that forum software isn't a great coding experience :wink:

This code looks like magic and I'm not sure I totally understand it, but thank you! What if, as in my example below, each haystack has a corresponding needle, and so the idea is to go row by row and only match if the needle matches for that row, else go to the next row and compare that needle to that row?

Here’s some modified version of the code.

use std::iter;

fn blend<T>(haystacks: &[&[T]], needles: &[&[T]]) -> Vec<T>
where
    T: PartialEq + Clone
{

    if haystacks.is_empty() { return Vec::new() };

    // some validity checks for my sanity 😅
    assert!(haystacks.len() == needles.len());
    assert!(haystacks.iter().all(|hs| hs.len() == haystacks[0].len()));
    assert!(needles.iter().all(|n| n.len() == needles[0].len()));

    let mut needles_and_iters: Vec<_> = iter::zip(haystacks, needles).rev().map(|(hs, n)| (*n, hs.chunks(n.len()))).collect();
    let mut result = Vec::with_capacity(haystacks[0].len());

    loop {
        let mut found = false;

        for (needle, iter) in &mut needles_and_iters {
            let Some(slice) = iter.next() else { return result };

            if slice != *needle && !found {
                result.extend_from_slice(slice);
                found = true;
            }
        }
    }
}

Steffahn gave you the answer to that, but I still wanted to ask which part in my original you didn't understand.

2 Likes

I think Rust's syntax is still screwy in my head. I don't understand how the loop exits without going infinitely, and some of the methods like extend_from_slice I had to look up. I'm slowly looking at the docs to piece it together, but can you add some comments to your playground to explain how you got to this solution? I really appreciate you taking the time with me. I'm already blown away by Rust but it definitely has a learning curve for my feeble brain :wink:

1 Like

In short, it's because somewhere in the loop body is a diverging statement, like the return keyword. [1]

Yeah, this is a very common situation. I am basically useless without documentation, no matter what language I'm using.

I can never remember if I should use String.prototype.substr() or String.protottype.substring() in JavaScript (or if they have an underscore or not, apparently). What's the order of the haystack and needle function parameters for that search function, again? How do I print with a specific amount of padding on the left or right side? (Oh, there's a whole DSL for this, I'm never going to remember any of that, lol.) Uh, how do I write a regular expression that can parse floating point numbers? Welp, I'm going to want to test this in Debuggex... Oh, I can simplify it by just using string splitting. But how I do make it stop splitting after n occurrences? Was it an optional argument, or a different method name?

Ok, I'll stop. You get the idea.


  1. Other examples: break, ?, and a million ways to panic. ↩︎

1 Like

That's what return result does.

As I mentioned, it's the naïve solution that does nothing more than your description. It obtains iterators to subslices ("chunks") into each slice, advances them in lockstep, and when it finds a subslice that matches, its elements are appended to the result vector.

I think I'm starting to grok it. I added in a fallback check (in case it is not found anywhere and some assertions at the end.

You can just copy this code 1 to 1 to rust. You don't have to use iterators. Or to be more exact you can use indexing and replicate the for loops from your javascript example.

The for loop for example would look like this

for i in (0..blanks.len()).rev() {
    for j in (0..byte_length).step_by(block_length) {
        let slice = &numbers[i][j..block_length];
        if matches(blanks[i], slice) {
             result[j] = slice;
        }
    }
}

Also I don't think that this is realy about rust vs javascript but more about index based for loops vs for-each loops. And the thing that probably confuses you the most are the functional style iterator adapters like map(), chunks() etc. But you can skip them for now and learn them once you feel more comfortable with rust. The only ones you need to replicate index based for loops are the range start..end, rev() to go in reverse (like i--) and step_by() to change the step size (like j+=blockLength).

2 Likes

I tried to implement this and got stuck on some compile errors related to usize vs u32 types. I haven't gotten to the matches function yet!

It looks like the algorithm could use some error handling, after all. Also, taking separate haystacks and needles is non-idiomatic if their lengths must match. You could take a slice of pairs, in order to use the type system instead of run-time assertions for ensuring that every haystack has a corresponding needle.

I made your code more idiomatic here:

fn blend<T>(haystacks_and_needles: &[(&[T], &[T])]) -> anyhow::Result<Vec<T>>
where
    T: PartialEq + Clone
{
    let Some((hs_fst, nd_fst)) = haystacks_and_needles.get(0) else {
        anyhow::bail!("empty input")
    };

    anyhow::ensure!(haystacks_and_needles.iter().all(|(hs, nd)| {
        hs.len() == hs_fst.len() && nd.len() == nd_fst.len() && !nd.is_empty()
    }));

    let mut iters_and_needles: Vec<_> = haystacks_and_needles
        .iter()
        .rev()
        .copied()
        .map(|(hs, nd)| (hs.chunks(nd.len()), nd))
        .collect();

    let mut result = Vec::with_capacity(hs_fst.len());

    loop {
        let mut found = false;

        for (iter, needle) in &mut iters_and_needles {
            let Some(slice) = iter.next() else { return Ok(result) };

            if slice != *needle && !found {
                result.extend_from_slice(slice);
                found = true;
            }
        }

        anyhow::ensure!(found, "not found");
    }
}

Why are you converting the index to u32? You are making your own life harder. Just leave every index as usize, which is the natural choice.

Anyway, there's a lot more wrong with that code:

  • There are several other type mismatches (e.g. the return type should be a vector of slices)
  • vec!(byte_length) constructs a vector of length 1 with the single element byte_length. That's not what you want. It looks like you wanted to make a vector of length byte_length, but that's also not what you want, because if you return a vector of slices, then its length will only be byte_length / block_length.
  • Similarly, for indexing into result, you need the index j / block_length, not j.
  • Your loops are nested the wrong way around.
  • If there's a match, you don't break out of the inner loop, which makes the result incorrect (it will contain the last match instead of the first one).
  • slice[j..block_length] slices from index j to index block_length, which is wrong (the end should be j + block_length)

It doesn't look to me like any of this (except the syntax for constructing a vector) is specific to Rust. The algorithm presented here would have been equally wrong in JavaScript.

In any case, here is the algorithmically fixed code. It's still non-idiomatic, doesn't contain any error checking, can panic, contains divisions (which are expensive but fixable with enumerate()), and probably generates a lot more bounds checks than warranted.

I typed this solution out as well now^^
Well, I guess there's no point in posting almost the same code again.
The j..j+block_length is my bad.

I have one further point.
It's a bit unclear in the whole discussion whether the 'needle' should be found or not. In your description of the problem you say

whitch would make @H2CO3 's metaphor of haystack and needle a bit reversed, because you try to not find the needle in the haystack.
On the other hand your code example contains the matches function which seems to test for equality and then you include the slice in your result.

The expected results suggest that it should not be found. The matches name can arguably be interpreted more abstractly than equality; one could define a "match" as not equal to the null in this case. (Not that I think it's a good naming convention, though.)

Yeah, this was not meant to demonstrate production code so names are screwy and it wasn't tested. I just wanted to understand how I could operate on vectors, which was totally confusing me. For what it's worth, I was able to combine these examples with my actual task and got results that passed assertions, so I think I'm on my way here, at least until I hit my next stumbling block which should happen tomorrow right on schedule :wink:

Thanks, everyone!

1 Like