Find elements in vector

Hand code for loop is better solution than the following : )

I tried with sort then binary_search, but binary_search returns match in random order. How can I improve the task.

Thanks

fn main() {
    let mut vec = vec![
        Foo{name:"aaa".to_string(), num: 10, }, //dup
        Foo{name:"bbb".to_string(), num: 20, },
        Foo{name:"ccc".to_string(), num: 30, },
        Foo{name:"aaa".to_string(), num: 10, }, //dup
    ];

    // vec.sort();
    let mut index: usize = 0;

    while index != vec.len() {
        let ret = vec[index..]
            .iter()
            .position(|x| x == &Foo{name:"aaa".to_string(), num: 10,});

        if let None = ret {
            break;
        }

        if let Some(i) = ret {
            index += i;
            println!("{}:{}: {:?}", file!(), line!(), index);
            index += 1;
        }
    }
}

#[derive(Debug, Eq, Ord, PartialEq, PartialOrd)]
struct Foo {
    name: String,
    num: i32,
}

What are you trying to do, exactly?

If what you want is to find the indices of all the duplicates (which is my best guess at what your code does), then simply looping will give you that:

let foos = vec![
    Foo { name:"aaa".to_string(), num: 10 },
    Foo { name:"bbb".to_string(), num: 20 },
    Foo { name:"ccc".to_string(), num: 30 },
    Foo { name:"aaa".to_string(), num: 10 },
];
let indices: Vec<_> = foos
    .iter()
    .enumerate()
    .filter_map(|(i, foo)| if foo == &foos[0] { Some(i) } else { None })
    .collect();
println!("{indices:?}");

But, if you want to actually do something with those indices, maybe you'd be better served by not using indices at all and just using filter or other methods directly.

There're duplicate elements in the vector

    //vec.sort();
    Foo { name:"aaa".to_string(), num: 10 }, //dup1
    Foo { name:"bbb".to_string(), num: 20 },
    Foo { name:"ccc".to_string(), num: 30 },
    Foo { name:"aaa".to_string(), num: 10 }, //dup2

I sorted the vector and tried to use binary_search to speedup finding the duplicates. But binary_search won't return the matches in order.

vec[0..].binary_search() may return dup2
vec[1..].binary_search() may return dup2, still
vec[2..].binary_search() returns Err

So, binary_search cannot find dup1

    Foo { name:"aaa".to_string(), num: 10 }, //dup 1
    Foo { name:"aaa".to_string(), num: 10 }, //dup 2
    Foo { name:"bbb".to_string(), num: 20 },
    Foo { name:"ccc".to_string(), num: 30 },

Thanks,

I still don't quite understand what you want. But a binary search is probably not what you want if you don't have an already sorted list.

If you want to find all occurrences of all duplicates, then sorting the vector is almost the entire solution. In a sorted list, duplicates are necessarily consecutive, so you can just iterate over the vector in a single pass and check if a value is the same as the following value. There is no need to binary search the vector at all.

3 Likes

Hi, there're two duplicated elements. It finds out only one. How can I find out both with binary search.

Thanks

fn main() {
    let mut vec = vec![
        Foo{name:"aaa".to_string(), num: 10, }, //dup
        Foo{name:"bbb".to_string(), num: 20, },
        Foo{name:"ccc".to_string(), num: 30, },
        Foo{name:"aaa".to_string(), num: 10, }, //dup
    ];

    vec.sort();
    let mut index: usize = 0;
    let x = Foo{name:"aaa".to_string(), num: 10, };

    while index != vec.len() {
        let ret = vec[index..].binary_search(&x);
        if let Err(_) = ret {
            break;
        }
        if let Ok(i) = ret {
            index += i;
            println!("{}", index);
            index += 1;
        }

    }
}

#[derive(Debug, Eq, Ord, PartialEq, PartialOrd)]
struct Foo {
    name: String,
    num: i32,
}

You can't, that's the point. If there are equal elements in the vector, which of them exactly is returned is intentionally undefined by the documentation of binary_search().

All you need is a fully sequential loop:

fn main() {
    let mut vec = vec![
        Foo{name:"aaa".to_string(), num: 10, }, //dup
        Foo{name:"bbb".to_string(), num: 20, },
        Foo{name:"ccc".to_string(), num: 30, },
        Foo{name:"aaa".to_string(), num: 10, }, //dup
    ];

    vec.sort();
    
    for (i, w) in vec.windows(2).enumerate() {
        let &[ref prev, ref next]: &[Foo; 2] = w.try_into().unwrap();
        
        if prev == next {
            println!("duplicates found at indexes {} and {}: {:?} and {:?}", i, i + 1, prev, next);
        }
    }
}
2 Likes

Hi, if there're a dozen of duplicates, it keeps repeating the same entries in between.

Well, yes, I assumed that's good enough for the requirement of "all duplicates". But the above loop is trivial to modify in a way that it only reports one duplicate per group.

1 Like

Thanks,

I tried for_each(). It seems it doesn't repeat the same entries.

And I tried other numbers other than 2 with windows(2). But those other numbers don't fit : )

fn main() {
    let mut vec = vec![
        Foo{name:"aaa".to_string(), num: 10, }, //dup
        Foo{name:"bbb".to_string(), num: 20, },
        Foo{name:"aaa".to_string(), num: 10, }, //dup
        Foo{name:"aaa".to_string(), num: 10, }, //dup
    ];

    vec.sort();

    vec.windows(2).enumerate().for_each(|(i, x)| println!("{}: {:?}", i, x));

}

#[derive(Debug, Eq, Ord, PartialEq, PartialOrd)]
struct Foo {
    name: String,
    num: i32,
}

Hi, is this as fast as binary_search on sorted vector?

I want a faster solution than the following for loop version.

fn main() {
    let mut haystack = vec![
        Foo{name:"aaa".to_string(), num: 10, }, //same
        Foo{name:"bbb".to_string(), num: 20, },
        Foo{name:"ccc".to_string(), num: 30, },
        Foo{name:"aaa".to_string(), num: 10, }, //same
    ];

    haystack.sort();

    let needle = Foo{name:"aaa".to_string(), num: 10, };
    let len = haystack.len();

    for i in 0..len {
        if haystack[i] == needle {
            println!("{}: {:?}", i, haystack[i])
        }
    }
}

#[derive(Debug, Eq, Ord, PartialEq, PartialOrd)]
struct Foo {
    name: String,
    num: i32,
}

for_each() is completely equivalent to a for loop. It doesn't do anything differently. You are changing the behavior by changing the body of the loop (which is the callback closure in this case). You are simply printing every pair of consecutive elements, you are not actually doing anything to check duplicates.

It's always windows(2) because you want to check if the previous and the next element are the same. I don't see how any other number would make sense.

1 Like

Hi, thanks a lot.

I'm poor in English language too. My original intent is to find any specified element in vector. If there're duplicated ones, find out them all too. I'll sort the vector, binary search is preferred than a simple for loop.

// poor for loop version. needed a binary search version.
fn main() {
    let mut haystack = vec![
        Foo{name:"aaa".to_string(), num: 10, }, //same
        Foo{name:"bbb".to_string(), num: 20, },
        Foo{name:"ccc".to_string(), num: 30, },
        Foo{name:"aaa".to_string(), num: 10, }, //same
    ];

    haystack.sort();

    let needle = Foo{name:"aaa".to_string(), num: 10, };
    let len = haystack.len();

    for i in 0..len {
        if haystack[i] == needle {
            println!("{}: {:?}", i, haystack[i])
        }
    }
}

#[derive(Debug, Eq, Ord, PartialEq, PartialOrd)]
struct Foo {
    name: String,
    num: i32,
}

OK, so your goal is not to find all duplicates, but to find all occurrences of a single element using binary search.

That is possible: find an arbitrary element via binary search, and then look before and after it until you find elements that differ in order to discover the whole cluster of repeated elements:

fn main() {
    let mut haystack = vec![
        Foo { name:"aaa".to_string(), num: 10, }, // same
        Foo { name:"bbb".to_string(), num: 20, },
        Foo { name:"ccc".to_string(), num: 30, },
        Foo { name:"eee".to_string(), num: 50, },
        Foo { name:"aaa".to_string(), num: 10, }, // same
        Foo { name:"aaa".to_string(), num: 10, }, // same
    ];

    haystack.sort();

    let needle = Foo { name:"aaa".to_string(), num: 10, };

    if let Ok(idx) = haystack.binary_search(&needle) {
        let (before, after) = haystack.split_at(idx);
        
        for (i, x) in (idx..).zip(after) {
            if *x != needle { break }
            println!("{}: {:?}", i, x);
        }
        for (i, x) in (0..idx).zip(before).rev() {
            if *x != needle { break }
            println!("{}: {:?}", i, x);
        }
    } else {
        println!("not found");
    }
}

Playground

1 Like

Thanks a lot

C++ has lower_bound and upper_bound. Swift has firstIndex. Rust binary_search won't return the first match if there're multiple ones. The zip function looks like python : )

Sounds like you want partition_point.

1 Like

if you want to emulate lower_bound and upper_bound, you can always use binary_search_by_key() and provide a tuple (element, index) as the sorting key; this ensures uniqueness.


Also, if you don't have a sorted slice to begin with, consider performing a partial sort using the select_unstable_by_key() family of methods.

2 Likes

Hi,
This is the function I want.
Thanks a lot.

fn main() {
    let mut haystack = vec![
        Foo{name: "aaa".to_string(), num: 10,},
        Foo{name: "bbb".to_string(), num: 20,}, //dup
        Foo{name: "ccc".to_string(), num: 30,},
        Foo{name: "bbb".to_string(), num: 20,}, //dup
        Foo{name: "bbb".to_string(), num: 20,}, //dup
    ];
    let needle =
        Foo{name: "bbb".to_string(), num: 20,};

    haystack.sort();
    let lower = haystack.partition_point(|x| x < &needle);
    let upper = haystack.partition_point(|x| x == &needle);

    if lower == upper {
        println!("not found");
    }
    for i in lower..upper {
        println!("{}: {:?}, {:?}", line!(), i, haystack[i]);
    }
}

#[derive(Debug, Eq, Ord, PartialEq, PartialOrd)]
struct Foo {
    name: String,
    num: i32,
}