In a program, I want to treat references retrieved from an Iterator (or Stream) differently depending on whether a reference occurs multiple times. I learned that I can use std::ptr::eq to compare references (not the values they refer to) for equality. ptr::eq::<T> just coerces its arguments to *const T and then compares the pointers for equality.
Likewise, it seems possible to coerce references to pointers and store them in a HashSet, so that I can check whether a reference has been processed previously:
use std::collections::HashSet;

struct S {}

fn count_distinct<'a>(items: impl IntoIterator<Item = &'a S>) -> usize {
    let mut count = 0;
    // Addresses of the references seen so far.
    let mut memorized: HashSet<*const S> = HashSet::new();
    for item in items {
        // insert returns true only if the address was not yet in the set.
        if memorized.insert(item as *const S) {
            count += 1;
        }
    }
    count
}

fn main() {
    let s1 = S {};
    let s2 = S {};
    let s = vec![&s1, &s1, &s1, &s2, &s2];
    println!("Distinct count: {}", count_distinct(s));
}
This code seems to run fine, as it creates the following output:
Distinct count: 2
The code also seems to be correct because the references are immutable and should live at least as long as the iterator does. While count_distinct is executing, it should not be possible to drop any values behind the references, so the pointers will be valid until count_distinct returns. Note that no unsafe code is needed, because I never dereference any pointer but only compare them (in the HashSet).
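To make the distinction between identity and value equality concrete, here is a minimal sketch. The names Point and same_object are hypothetical and not part of my program; Point deliberately has a field so that it is not zero-sized. Creating and comparing raw pointers requires no unsafe block, only dereferencing them would.

```rust
use std::ptr;

// Hypothetical type with one field, so it occupies memory.
#[derive(PartialEq)]
struct Point {
    x: i32,
}

/// Returns true only if both references point at the same object.
fn same_object(a: &Point, b: &Point) -> bool {
    ptr::eq(a, b)
}

fn main() {
    let p1 = Point { x: 1 };
    let p2 = Point { x: 1 };

    // Value equality: both hold x == 1.
    assert!(p1 == p2);

    // Identity: two distinct, simultaneously live values of a
    // non-zero-sized type have different addresses.
    assert!(!same_object(&p1, &p2));
    assert!(same_object(&p1, &p1));

    // Coercing to raw pointers and comparing them gives the same answer,
    // and no unsafe is needed because nothing is dereferenced.
    let a = &p1 as *const Point;
    let b = &p2 as *const Point;
    assert_eq!(ptr::eq(&p1, &p2), a == b);
}
```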
When I try to convert the code to async, weird things happen:
use std::collections::HashSet;

struct S {}

async fn count_distinct<'a>(
    items: impl IntoIterator<Item = &'a S>,
) -> usize {
    let mut count = 0;
    // Addresses of the references seen so far.
    let mut memorized: HashSet<*const S> = HashSet::new();
    for item in items {
        // insert returns true only if the address was not yet in the set.
        if memorized.insert(item as *const S) {
            count += 1;
        }
    }
    count
}

#[tokio::main]
async fn main() {
    let s1 = S {};
    let s2 = S {};
    let s = vec![&s1, &s1, &s1, &s2, &s2];
    println!("Distinct count: {}", count_distinct(s).await);
}
This code executes with the following output:
Distinct count: 1
My first question is: Is my code in the first example (non-async) correct, or am I missing something that makes the code unsafe or causes undefined behavior?
My second question is: Can anyone explain what's going on in the second example? Why does it return a count of 1 instead of 2?
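A diagnostic sketch that might help narrow down the second question: it prints the size of the type and the addresses involved, and repeats the count for a variant of the struct that has a field. The names Empty, Tagged and count_distinct_addresses are hypothetical. It rests on the assumption that the size of S matters here: a struct without fields is zero-sized, and Rust does not guarantee that distinct zero-sized values have distinct addresses.

```rust
use std::collections::HashSet;
use std::mem::size_of;

// Same shape as S: no fields, so the type is zero-sized.
struct Empty {}

// Hypothetical variant carrying one byte of data.
struct Tagged {
    _id: u8,
}

/// Counts how many different addresses occur among the references.
fn count_distinct_addresses<T>(items: &[&T]) -> usize {
    let set: HashSet<*const T> = items.iter().map(|r| *r as *const T).collect();
    set.len()
}

fn main() {
    println!("size_of::<Empty>()  = {}", size_of::<Empty>());
    println!("size_of::<Tagged>() = {}", size_of::<Tagged>());

    let e1 = Empty {};
    let e2 = Empty {};
    // Zero-sized values may or may not share an address, so the count
    // printed for Empty depends on how the compiler places them.
    println!("e1 at {:p}, e2 at {:p}", &e1, &e2);
    println!("Empty:  {}", count_distinct_addresses(&[&e1, &e1, &e1, &e2, &e2]));

    let t1 = Tagged { _id: 1 };
    let t2 = Tagged { _id: 2 };
    // Both values occupy memory and are alive at the same time,
    // so their addresses must differ.
    println!("t1 at {:p}, t2 at {:p}", &t1, &t2);
    println!("Tagged: {}", count_distinct_addresses(&[&t1, &t1, &t1, &t2, &t2]));
}
```

If the two Empty addresses turn out to be identical in the async build, that would point at the zero-sized struct rather than at async itself.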
Another issue is that *const _ is !Send, which means any futures containing such pointers are also !Send. A workaround is to cast the pointer to a usize and use a HashSet<usize>, as a usize can be sent to different threads, but that means I lose type information. I could also implement my own type storing the pointer and make it Send using unsafe:
#[derive(Clone, Copy, Eq, PartialEq, Hash)]
struct SId(*const S);
unsafe impl Send for SId {}
However, using unsafe doesn't seem right, as the code really isn't unsafe. Maybe the best way to go is this:
#[derive(Clone, Copy, Eq, PartialEq, Hash)]
struct SId(usize);

impl SId {
    fn new(s: &S) -> Self {
        // Keep only the address; the value is never turned back into a pointer.
        SId(s as *const S as usize)
    }
}
But that also feels like a lot of code just for the purpose of working around *const _ not being Send. Or is Rust protecting me from doing bad stuff for my own good?
See also: Shouldn’t pointers be Send + Sync? Or