How to find element positions of a vector

I have two Vector<i32>:

let origin_vec = vec![1i32, 2, 3, 4];
let target_vec = vec![0i32, 3, 5, 2];

I am trying to define a function, which can find the position (locate in origin_vec) of each element in target_vec. So I can get this

indexin(&origin_vec, &target_vec)
Output: vec![None, Option(2), None, Option(1)]

I define the function indexin like this:

fn indexin<T: PartialOrd>(ori: &[T], target: &[T]) -> Vec<Option<usize>> {
    target
        .iter()
        .map(|x| ori.iter()
                        .position(|y| x == y))
        .collect()
}

But it is too slow, I thought it may be because of that for each element of target, I call ori.iter().
What can I do to make it right?

The larger your arrays are, the worse your performance will be. Your first attempt has to search for the whole array for every single target element, which is obviously going to be slow unless all of the target elements are at the beginning of the array.

One way to fix this is to just generate a full list of all the elements and their locations ahead of time

fn index_by_hash<T: Hash + Eq>(ori: &[T], target: &[T]) -> Vec<Option<usize>> {
    let map: HashMap<&T, usize> = ori.iter().enumerate().map(|(i, t)| (t, i)).collect();

    let mut results = Vec::with_capacity(target.len());
    for t in target {
        results.push(map.get(t).copied());
    }

    results
}

Then every time you have to locate one of the target elements you just look it up in the map. You could probably improve the performance of this version by only initializing the map lazily so you don't pay the penalty of computing all the locations if you only need the first couple.

2 Likes

Thanks for your reply.
The function you mentioned above can work on any type of vector, but in my situation, the ori and target are both sorted. Maybe I can make a use of it. So I define the function like this:

pub fn index_by_sort<T: PartialOrd>(ori: &[T], target: &[T]) -> Vec<Option<usize>> {
    let mut res = Vec::with_capacity(target.len());
    let mut ori_index = 0usize;
    let mut ori_value = &ori[ori_index];
    let mut tar_index = 0usize;
    let mut tar_value = &target[tar_index];
    let mut posi;
    loop {
        if tar_value < ori_value {
            posi = None;
        } else if tar_value == ori_value {
            posi = Some(ori_index);
        } else {
            loop {
                if ori_index >= ori.len() - 1 {
                    posi = None; 
                    break;
                }
                ori_index += 1;
                ori_value = &ori[ori_index];
                if ori_value < tar_value {
                    continue;
                } else if ori_value == tar_value {
                    posi = Some(ori_index);
                    break;
                } else {
                    posi = None;
                    break;
                }
            }
        }
        res.push(posi);
        tar_index += 1;
        if tar_index >= target.len() { break; }
        tar_value = &target[tar_index];  
    }
    res
}

And I made a playground to test the performance on both function.

If they're sorted you can use binary_search to do the minimum number of comparisons.

It's possible for some types that some sort of hashmap lookup would still be faster but you'd have to benchmark your code to know for sure. I think for things that are ~64 bits binary search will probably be fastest.

4 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.