Find elements in vector

ljh · September 25, 2022, 1:58pm

Hand code for loop is better solution than the following : )

I tried with sort then binary_search, but binary_search returns match in random order. How can I improve the task.

Thanks

fn main() {
    let mut vec = vec![
        Foo{name:"aaa".to_string(), num: 10, }, //dup
        Foo{name:"bbb".to_string(), num: 20, },
        Foo{name:"ccc".to_string(), num: 30, },
        Foo{name:"aaa".to_string(), num: 10, }, //dup
    ];

    // vec.sort();
    let mut index: usize = 0;

    while index != vec.len() {
        let ret = vec[index..]
            .iter()
            .position(|x| x == &Foo{name:"aaa".to_string(), num: 10,});

        if let None = ret {
            break;
        }

        if let Some(i) = ret {
            index += i;
            println!("{}:{}: {:?}", file!(), line!(), index);
            index += 1;
        }
    }
}

#[derive(Debug, Eq, Ord, PartialEq, PartialOrd)]
struct Foo {
    name: String,
    num: i32,
}

H2CO3 · September 25, 2022, 2:00pm

What are you trying to do, exactly?

erelde · September 25, 2022, 3:00pm

If what you want is to find the indices of all the duplicates (which is my best guess at what your code does), then simply looping will give you that:

let foos = vec![
    Foo { name:"aaa".to_string(), num: 10 },
    Foo { name:"bbb".to_string(), num: 20 },
    Foo { name:"ccc".to_string(), num: 30 },
    Foo { name:"aaa".to_string(), num: 10 },
];
let indices: Vec<_> = foos
    .iter()
    .enumerate()
    .filter_map(|(i, foo)| if foo == &foos[0] { Some(i) } else { None })
    .collect();
println!("{indices:?}");

But, if you want to actually do something with those indices, maybe you'd be better served by not using indices at all and just using filter or other methods directly.

ljh · September 25, 2022, 3:47pm

There're duplicate elements in the vector

    //vec.sort();
    Foo { name:"aaa".to_string(), num: 10 }, //dup1
    Foo { name:"bbb".to_string(), num: 20 },
    Foo { name:"ccc".to_string(), num: 30 },
    Foo { name:"aaa".to_string(), num: 10 }, //dup2

I sorted the vector and tried to use binary_search to speedup finding the duplicates. But binary_search won't return the matches in order.

vec[0..].binary_search() may return dup2
vec[1..].binary_search() may return dup2, still
vec[2..].binary_search() returns Err

So, binary_search cannot find dup1

    Foo { name:"aaa".to_string(), num: 10 }, //dup 1
    Foo { name:"aaa".to_string(), num: 10 }, //dup 2
    Foo { name:"bbb".to_string(), num: 20 },
    Foo { name:"ccc".to_string(), num: 30 },

Thanks,

erelde · September 25, 2022, 4:22pm

I still don't quite understand what you want. But a binary search is probably not what you want if you don't have an already sorted list.

H2CO3 · September 25, 2022, 4:32pm

If you want to find all occurrences of all duplicates, then sorting the vector is almost the entire solution. In a sorted list, duplicates are necessarily consecutive, so you can just iterate over the vector in a single pass and check if a value is the same as the following value. There is no need to binary search the vector at all.

ljh · September 25, 2022, 4:33pm

Hi, there're two duplicated elements. It finds out only one. How can I find out both with binary search.

Thanks

fn main() {
    let mut vec = vec![
        Foo{name:"aaa".to_string(), num: 10, }, //dup
        Foo{name:"bbb".to_string(), num: 20, },
        Foo{name:"ccc".to_string(), num: 30, },
        Foo{name:"aaa".to_string(), num: 10, }, //dup
    ];

    vec.sort();
    let mut index: usize = 0;
    let x = Foo{name:"aaa".to_string(), num: 10, };

    while index != vec.len() {
        let ret = vec[index..].binary_search(&x);
        if let Err(_) = ret {
            break;
        }
        if let Ok(i) = ret {
            index += i;
            println!("{}", index);
            index += 1;
        }

    }
}

#[derive(Debug, Eq, Ord, PartialEq, PartialOrd)]
struct Foo {
    name: String,
    num: i32,
}

H2CO3 · September 25, 2022, 4:38pm

You can't, that's the point. If there are equal elements in the vector, which of them exactly is returned is intentionally undefined by the documentation of binary_search().

All you need is a fully sequential loop:

fn main() {
    let mut vec = vec![
        Foo{name:"aaa".to_string(), num: 10, }, //dup
        Foo{name:"bbb".to_string(), num: 20, },
        Foo{name:"ccc".to_string(), num: 30, },
        Foo{name:"aaa".to_string(), num: 10, }, //dup
    ];

    vec.sort();
    
    for (i, w) in vec.windows(2).enumerate() {
        let &[ref prev, ref next]: &[Foo; 2] = w.try_into().unwrap();
        
        if prev == next {
            println!("duplicates found at indexes {} and {}: {:?} and {:?}", i, i + 1, prev, next);
        }
    }
}

ljh · September 25, 2022, 4:53pm

Hi, if there're a dozen of duplicates, it keeps repeating the same entries in between.

H2CO3 · September 25, 2022, 4:57pm

Well, yes, I assumed that's good enough for the requirement of "all duplicates". But the above loop is trivial to modify in a way that it only reports one duplicate per group.

ljh · September 25, 2022, 5:09pm

Thanks,

I tried for_each(). It seems it doesn't repeat the same entries.

And I tried other numbers other than 2 with windows(2). But those other numbers don't fit : )

fn main() {
    let mut vec = vec![
        Foo{name:"aaa".to_string(), num: 10, }, //dup
        Foo{name:"bbb".to_string(), num: 20, },
        Foo{name:"aaa".to_string(), num: 10, }, //dup
        Foo{name:"aaa".to_string(), num: 10, }, //dup
    ];

    vec.sort();

    vec.windows(2).enumerate().for_each(|(i, x)| println!("{}: {:?}", i, x));

}

#[derive(Debug, Eq, Ord, PartialEq, PartialOrd)]
struct Foo {
    name: String,
    num: i32,
}

ljh · September 25, 2022, 6:48pm

Hi, is this as fast as binary_search on sorted vector?

I want a faster solution than the following for loop version.

fn main() {
    let mut haystack = vec![
        Foo{name:"aaa".to_string(), num: 10, }, //same
        Foo{name:"bbb".to_string(), num: 20, },
        Foo{name:"ccc".to_string(), num: 30, },
        Foo{name:"aaa".to_string(), num: 10, }, //same
    ];

    haystack.sort();

    let needle = Foo{name:"aaa".to_string(), num: 10, };
    let len = haystack.len();

    for i in 0..len {
        if haystack[i] == needle {
            println!("{}: {:?}", i, haystack[i])
        }
    }
}

#[derive(Debug, Eq, Ord, PartialEq, PartialOrd)]
struct Foo {
    name: String,
    num: i32,
}

H2CO3 · September 25, 2022, 6:58pm

for_each() is completely equivalent to a for loop. It doesn't do anything differently. You are changing the behavior by changing the body of the loop (which is the callback closure in this case). You are simply printing every pair of consecutive elements, you are not actually doing anything to check duplicates.

It's always windows(2) because you want to check if the previous and the next element are the same. I don't see how any other number would make sense.

ljh · September 25, 2022, 7:04pm

Hi, thanks a lot.

I'm poor in English language too. My original intent is to find any specified element in vector. If there're duplicated ones, find out them all too. I'll sort the vector, binary search is preferred than a simple for loop.

// poor for loop version. needed a binary search version.
fn main() {
    let mut haystack = vec![
        Foo{name:"aaa".to_string(), num: 10, }, //same
        Foo{name:"bbb".to_string(), num: 20, },
        Foo{name:"ccc".to_string(), num: 30, },
        Foo{name:"aaa".to_string(), num: 10, }, //same
    ];

    haystack.sort();

    let needle = Foo{name:"aaa".to_string(), num: 10, };
    let len = haystack.len();

    for i in 0..len {
        if haystack[i] == needle {
            println!("{}: {:?}", i, haystack[i])
        }
    }
}

#[derive(Debug, Eq, Ord, PartialEq, PartialOrd)]
struct Foo {
    name: String,
    num: i32,
}

H2CO3 · September 25, 2022, 7:33pm

OK, so your goal is not to find all duplicates, but to find all occurrences of a single element using binary search.

That is possible: find an arbitrary element via binary search, and then look before and after it until you find elements that differ in order to discover the whole cluster of repeated elements:

fn main() {
    let mut haystack = vec![
        Foo { name:"aaa".to_string(), num: 10, }, // same
        Foo { name:"bbb".to_string(), num: 20, },
        Foo { name:"ccc".to_string(), num: 30, },
        Foo { name:"eee".to_string(), num: 50, },
        Foo { name:"aaa".to_string(), num: 10, }, // same
        Foo { name:"aaa".to_string(), num: 10, }, // same
    ];

    haystack.sort();

    let needle = Foo { name:"aaa".to_string(), num: 10, };

    if let Ok(idx) = haystack.binary_search(&needle) {
        let (before, after) = haystack.split_at(idx);
        
        for (i, x) in (idx..).zip(after) {
            if *x != needle { break }
            println!("{}: {:?}", i, x);
        }
        for (i, x) in (0..idx).zip(before).rev() {
            if *x != needle { break }
            println!("{}: {:?}", i, x);
        }
    } else {
        println!("not found");
    }
}

Playground

ljh · September 25, 2022, 8:37pm

Thanks a lot

ljh · September 25, 2022, 8:55pm

C++ has lower_bound and upper_bound. Swift has firstIndex. Rust binary_search won't return the first match if there're multiple ones. The zip function looks like python : )

quinedot · September 25, 2022, 9:15pm

Sounds like you want partition_point.

H2CO3 · September 25, 2022, 9:17pm

if you want to emulate lower_bound and upper_bound, you can always use binary_search_by_key() and provide a tuple (element, index) as the sorting key; this ensures uniqueness.

Also, if you don't have a sorted slice to begin with, consider performing a partial sort using the select_unstable_by_key() family of methods.

ljh · September 26, 2022, 6:11am

Hi,
This is the function I want.
Thanks a lot.

fn main() {
    let mut haystack = vec![
        Foo{name: "aaa".to_string(), num: 10,},
        Foo{name: "bbb".to_string(), num: 20,}, //dup
        Foo{name: "ccc".to_string(), num: 30,},
        Foo{name: "bbb".to_string(), num: 20,}, //dup
        Foo{name: "bbb".to_string(), num: 20,}, //dup
    ];
    let needle =
        Foo{name: "bbb".to_string(), num: 20,};

    haystack.sort();
    let lower = haystack.partition_point(|x| x < &needle);
    let upper = haystack.partition_point(|x| x == &needle);

    if lower == upper {
        println!("not found");
    }
    for i in lower..upper {
        println!("{}: {:?}, {:?}", line!(), i, haystack[i]);
    }
}

#[derive(Debug, Eq, Ord, PartialEq, PartialOrd)]
struct Foo {
    name: String,
    num: i32,
}

Topic		Replies	Views
How to find element positions of a vector help	4	1569	May 14, 2023
Deduplicate vector in place while preserving order help	18	2937	June 22, 2021
Std Trait or impl for checking duplicate item	5	896	April 13, 2021
Mutating duplicate elements of an iterator	11	468	February 25, 2024
Find duplicates in a collection implemented as vector help	6	150	February 17, 2026

Find elements in vector

Related topics