BTreeSet<&char> dedupes on the char value not the "reference"

Trying to learn Rust but coming from C++ and C# so my mental models need rewiring (trying to unlearn what I have learned) - so I really have 2 questions here.

First, if I have a Vec<char> and char is trivially copyable, why can I not collect direct into another collection of <char> - it has to be <&char>.
Then when I do use <&char> the BTreeSet only keeps one of each char value, which is what I want - but I originally thought it would compare the references which are, I thought/assumed under the hood, "pointers" to the different chars in memory that happen to have to the same value?

use std::collections::BTreeSet;

fn main() {
    
    let data = String::from("abcdae");
    
    let vec = data.chars().collect::<Vec<char>>();
    
    println!("{:?}", vec[0] == vec[4]);
    println!("{:?}", &vec[0] == &vec[4]);
    
    // char is copyable easily so why can't I 
    // let tree = vec.iter().collect::<BTreeSet<char>>(); ?

    let tree = vec.iter().collect::<BTreeSet<&char>>();
    
    println!("{:?}", tree);
}

(Playground)

Output:

true
true
{'a', 'b', 'c', 'd', 'e'}

Errors:

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 1.44s
     Running `target/debug/playground`

1 Like

That’s only because you’re using generic APIs: iter() is implemented generically for Vec<T> with all types T, including not trivially copyable ones. It thus consistently always produces an iterator over &T values. You are free to add an iterator transformation that does the copying (and only works for copyable types). That would look as follows:

let tree = vec.iter().copied().collect::<BTreeSet<char>>();

The relevant transformation is documented here.


Edit:

BTreeSet uses the comparison operators as defined via the trait Ord. The implementation for references explicitly compares values, not addresses, even though references indeed are using pointers. This means it only exists for references to types which in turn implement Ord, too, but Rust’s trait system makes such conditions very nice to work with.[1] As in your case, it’s very often the desired behavior (and in the cases where it’s not, one can always define an alternative wrapper type that defines an alternative Ord definition.)


  1. For example code like this

    use std::collections::BTreeSet;
    
    struct S;
    
    fn main() {
        [S, S, S].iter().collect::<BTreeSet<&S>>();
    }
    

    Results in a useful and not overly verbose or cryptic error message

    error[E0277]: the trait bound `S: Ord` is not satisfied
     --> src/main.rs:6:22
      |
    6 |     [S, S, S].iter().collect::<BTreeSet<&S>>();
      |                      ^^^^^^^ the trait `Ord` is not implemented for `S`
      |
      = help: the trait `FromIterator<T>` is implemented for `BTreeSet<T>`
      = note: required for `&S` to implement `Ord`
      = note: required for `BTreeSet<&S>` to implement `FromIterator<&S>`
    note: required by a bound in `collect`
     --> /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/iter/traits/iterator.rs:1891:5
    help: consider annotating `S` with `#[derive(Ord)]`
      |
    3 + #[derive(Ord)]
    4 | struct S;
      |
    
    For more information about this error, try `rustc --explain E0277`.
    

    Unlike, say, template instantiation errors in C++. It even manages to point out an actionable suggestion of adding Ord support to the relevant struct S via a derive. ↩︎

5 Likes

References implement equality by comparing the underlying value, so that's what it will use for a collection of references. Comparing addresses requires explicit calls to std::ptr::eq.

5 Likes

The technical reason is that Vec<T>, BTreeSet<T> etc etc all implement FromIterator<T> (i.e. they can be created by collecting an iterator of Ts) but not FromIterator<&T>. It would not be unreasonable to implement FromIterator<&T> too though, especially considering they implement both Extend<T> and Extend<&T> (the latter only when T: Copy).

It would probably break inference when collecting into a Collection<_>, i.e., when the item type of the collecton is a hole and the item type of the iterator is a reference, since it would then be ambiguous (both impls would apply).

3 Likes