BTreeSet<char> union produces BTreeSet<&char>

Hi, folks.

I'm new to Rust and the community and having trouble understanding what's going on here. I wanted to write some code like this:

use std::collections::BTreeSet as Set;

pub fn union_all(set: Set<Set<char>>) -> Set<char> {
    set.iter().fold(Set::new(), |acc, s| acc.union(s).collect())
}

This makes sense to me, but the compiler complains:

error[E0277]: a collection of type `std::collections::BTreeSet<char>` cannot be built from an iterator over elements of type `&char`
 --> src/lib.rs:4:55
  |
4 |     set.iter().fold(Set::new(), |acc, s| acc.union(s).collect())
  |                                                       ^^^^^^^ a collection of type `std::collections::BTreeSet<char>` cannot be built from `std::iter::Iterator<Item=&char>`
  |
  = help: the trait `std::iter::FromIterator<&char>` is not implemented for `std::collections::BTreeSet<char>`

I can make the compiler happy by adding in .map(|&c| c) like this:

use std::collections::BTreeSet as Set;

pub fn union_all(set: Set<Set<char>>) -> Set<char> {
    set.iter().fold(Set::new(), |acc, s| acc.union(s).map(|&c| c).collect())
}

But I'm not understanding why this is needed. The acc and s values should both be type Set<char>, but somehow acc.union(s).collect() is producing Set<&char>. I followed the definitions of union and collect around and can't see why this is happening. Could someone lend me some guidance?

And if there's anything else I'm doing here that looks just wrong, please let me know. :smile:

This method returns Union<'a, T>, which impls Iterator<Item=&'a T>

Thanks! Even after you said that it took me a while to find where Item was defined as &'a T.

Would you say my code is the correct idiomatic approach or is there a better way to deal with this situation? I thought about using Set<&char> instead to avoid the extra map, but that doesn't seem to be very clean since you shouldn't need to use &char usually.

Since current code doesn't reuse any of previously allocated sets, set.into_iter().flatten().collect() should be better.

2 Likes

Consider whether you want union_all to accept set: Set<...> or set: &Set<...>.

  • The former forces the caller to either relinquish control of their set, or give you a clone of it, but then you can use @Hyeonu's approach which is simpler.

  • The latter gives the caller more flexibility, but then you need to use .cloned() (or equivalently .map(|&c| c)).

1 Like

Huh, there doesn't appear to be any (BTreeSet, BTreeSet) -> BTreeSet function available, I was expecting impl BitAnd<BTreeSet> for BTreeSet at least. Maybe there's just no way to make this any more efficient than the naive implementation of adding every element again.

In case there is some kind of optimization available, the function that gives the BTreeSet the most choice in how to implement the merging is append which can be used like

use std::collections::BTreeSet as Set;

pub fn union_all(set: impl IntoIterator<Item = Set<char>>) -> Set<char> {
    let mut iter = set.into_iter();
    let first = iter.next().unwrap_or_else(Set::new);
    iter.fold(first, |acc, s| { acc.append(s); acc })
}

.map(|&c| c) is actually equivalent to .copied() (i.e., both require the bound T : Copy, more restrictive than the T : Clone bound required for .cloned()). In other words, .copied() and .cloned() are only equivalent when T : Copy (such as with T = char).
(Just pointing this out to be 100% clear for future readers)

Since we are talking about unions, I guess you meant BitOr, right?

It would be good to see which of .append(&mut s) and .extend(s) is more efficient. If it is the latter, we are back to @Hyeonu's flattening suggestion, except it can be improved to reuse the first allocation:

use std::collections::BTreeSet as Set;

pub
fn union_all (
    sets_iterable: impl IntoIterator<Item = Set<char>>,
) -> Set<char>
{
    let mut sets_iterator = sets_iterable.into_iter();
    if let Some(mut first_set) = sets_iterator.next() {
        first_set.extend(sets_iterator.flatten());
        first_set
    } else {
        Default::default()
    }
}

Oh wow, I didn't know copied existed!

It does ... in nightly :sweat_smile: we will have to wait in stable before we get it, but my comment was aimed to be a future reference :slight_smile:

Can you explain why this is true?

Thanks all for the help here. I’ve got some more reading to do. :slight_smile:

Yep...


append takes both self and other by unique reference, and also says

Moves all elements from other into Self, leaving other empty.

Which basically means it really takes other by value (I'm not sure why the API is designed how it is, if you look into the implementation you see that it essentially does mem::replace(other, BTreeSet::new()) so really should just be taking other by value).

The other unioning methods (union and the BitOr implementation) take both self and other by shared reference, so they cannot mutate either BTreeSet and must instead build a new BTreeSet and clone the values into it. Skimming the source for append it doesn't do much optimisation based on this, the only thing it handles better than union is not cloning the elements (which shouldn't really matter for char) and a shortcircuit for the case when one of the two sets is empty.

One other thing that only append exploits is the fact that both incoming values are already sorted, it uses a linear time method to build the tree instead of just using the normal insertion. (BTreeSet::union and BitOr could exploit this as well, but don't, flatten and extend and other generic iterator methods can't).

The downside of using append is that it is basically building a new set for each incoming set, whereas using flatten().collect() or extend will only build a single new set and insert all the elements into it. Probably the most optimal way to do this would be to do what append does with a sorted merging iterator, but with all the incoming sets put into the iterator at once instead of two at a time. Unfortunately BTreeMap doesn't expose its from_sorted_iter method that would be needed to try benchmarking this.

1 Like

Heh, while looking into the implementation of append I realised that reusing the first allocation for it is useless, it already has an optimization to just return the full tree when one is empty. Whereas it might actually be useful for extend here.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.