Working with identity (comparing equality of references/pointers)

I think this one falls under a much broader caveat, though I don't know if it's explicitly written anywhere either: most operations on addresses, even though safe, are not useful in general.

For example, what does this do?

    let (p, q);
    {
        let x = 4;
        p = &x as *const _ as usize;
    }
    {
        let y = 5;
        q = &y as *const _ as usize;
    }
    dbg!(p == q);

It looks like right now it's false, but there's nothing guaranteeing that -- the compiler could notice that x and y have disjoint lifetimes, and use the same storage for them, at which point it could be true. (It might even do this today in async fn's desugaring; I don't recall.)

Or, more specifically, you can construct easy examples of other rules that imply that "different" ZSTs can have the same address -- consider [[(); usize::MAX]; usize::MAX], for example. (Yes, that's a valid type..)

Well, that's why the nomicon calls it exotic :slightly_smiling_face:

But I do think it'd be good to add a note of some kind there about addresses, since a ZST can have the same address as even a non-ZST.

From all your comments, I finally came up with a nicer approach that neither involves unsafe nor restricts usage to Sized types (and also doesn't do any dirty tricks such as coercing a pointer to usize). That's my current approach:

pub struct RefId<'a, T: ?Sized>(pub &'a T);

impl<'a, T: ?Sized> Clone for RefId<'a, T> {
    fn clone(&self) -> Self {
        RefId(self.0)
    }
}

impl<'a, T: ?Sized> Copy for RefId<'a, T> {}

impl<'a, T: ?Sized> PartialEq for RefId<'a, T> {
    fn eq(&self, other: &Self) -> bool {
        (size_of_val(self.0) == 0 && size_of_val(other.0) == 0)
            || (self.0 as *const T == other.0 as *const T)
    }
}

impl<'a, T: ?Sized> Eq for RefId<'a, T> {}

impl<'a, T: ?Sized> Hash for RefId<'a, T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        if size_of_val(self.0) != 0 {
            (self.0 as *const T).hash(state)
        }
    }
}

It treats zero-sized types as well as empty slices (of the same type) as equal.

As an exercise, I decided to create my very first Rust crate (refid) which provides the above newtype (RefId) for any reference &'a T (with T: ?Sized) that allows identity checks through pointer comparison. Feel free to criticize my code or any metadata, but please also keep in mind that these are my very first steps with crates.io.

Thanks to @H2CO3 for bringing up the issue with comparing regions instead of base addresses. Actually I don't need to manually compare the length as Rust can do that when comparing wide-pointers; but I did include special checks for zero sizes (see the full source, including some tests).

By the way: Coercing a pointer to a usize (which I'm not doing anymore) wouldn't lose the length information because the compiler actually refuses to do that coercion (see Playground).

Also thanks to all others for helping me understanding Rust's behavior better. I still feel like there are some open questions, and I'm sorry if my repeated questions for normative references come off as overly criticizing. I love Rust; I just would like to be sure that my code works as intended, and I feel like when dealing with pointers, some things aren't obvious (and/or undefined yet).

I hope I could express why identity checks (without extra runtime cost) may be a nice thing to have, and maybe I'm not the only one who has stumbled upon this problem. Perhaps my crate can help to solve this gently (e.g. by using HashSet<RefId(K)>).

1 Like

See also GitHub - mbrubeck/by_address: Rust wrapper type that implements hashing and comparison by pointer address (references implement Deref so this generalizes your approach)

2 Likes

So I'm not the only/first one who had this idea :grinning_face_with_smiling_eyes:. Supporting Deref seems to make sense in many cases. I think the by_address crate doesn't handle the zero-size case though? @mbrubeck

(Update: I published a new version 0.0.3 of refid, which also supports smart pointers with Deref, but still differs in regard to empty memory ranges, see below.)


use by_address::ByAddress;
use refid::RefId;

fn main() {
    let v = vec![5, 6, 7];
    assert_eq!(RefId(&v[0..0]), RefId(&v[1..1]));
    assert_ne!(ByAddress(&v[0..0]), ByAddress(&v[1..1]));
}

Maybe treating these two slices as non-equal might have advantages in certain cases though?

Nope, definitely not. The first I ran into it was doing something like https://github.com/rust-lang/rfcs/pull/3100 -- since for ZSTs, overlap checks can't tell if you got duplicates.

This is a valid approach, but may surprise others in niche circumstances, so it would be good to highlight in documentation. Doing something else would surprise a different set of people, so maybe this is a vacuous point. You need the documentation regardless. :slight_smile:

Mmm... not quite I'm afraid. This covers unsized types where the pointer metadata is length (slices and wrappers of slices, such as str and Path and so on). But with a wide pointer to dyn Trait (or a wrapper there-of), the metadata is a vtable pointer, and two pointers to the same type may have different vtable pointers. So it's possible to get some false negatives (e.g. two references to the same instance that underwent unsized coercion in different code generation units).

There's a deeper consideration here that I'll just mention in passing: different types cannot be compared by this crate. So if I coerce a &i32 to a &Debug and a &Display, they can't be compared. Maybe this is what you want, maybe it isn't. There are probably also some complications around dyn upcasting, should that land.

Anyway, since you're generalizing over metadata types, you can't rely on the wide pointer comparison; you'll want to not compare pointer metadata. Use size_of_val to compare the sizes; that will take care of the length of slice-like DSTs. (Everything else has the same size for the same type.)

That leaves the question of how to compare the pointers without comparing metadata. Which leads to...

You can't coerce a wide pointer into a usize, but you can coerce a wide pointer into a thin pointer, and then coerce the thin pointer into a usize.

You don't need usize, but as per above, you do need to get (just) the data pointer. Taking it all together, I think you want something like this:

match (size_of_val(self.0), size_of_val(other.0)) {
    (0, 0) => true,
    (x, y) if x != y => false,
    _ => self.0 as *const T as *const u8 == other.0 as *const T as *const u8
}

We're in the same boat; I love Rust but thinks it needs some more guarantees nailed down. Sadly my impression is this won't improve in the short term :slightly_frowning_face:.

One thing you definitely can't do is read data behind the pointer; it's dangling.

From what I've read in the nomicon and std::ptr::read docs, I don't think trying to read a dangling pointer to a zero sized type is UB. It can't be null or unaligned, but dangling is fine.

1 Like

At least in regard to base addresses, this is also true for non-ZSTs, as @quinedot said earlier:

But I guess that &() as *const _ may still be equal to the pointer of any empty slice (while the pointers of two empty slices can't be equal if the empty slices start/end at different addresses). Anyway, it feels surprising but likely makes sense. A note in the docs might help to reduce confusion, unless the problem is more general anyway.


I did mention it here in the struct's doc. Maybe I should add the note in the impl Eq or impl PartialEq also? How do you usually comment notes on equality: in Eq or PartialEq?

And how can you make yellow or red boxes with an info-symbol (or auto-fullscreen flashing + alarm-sounding boxes :stuck_out_tongue_winking_eye:)? I didn't figure out yet how that works (if it is possible with rustdoc).

Okay, thanks for this info! I didn't know. I will look more into wide-pointers to understand metadata other than size info. However, I think that in some (or even all?) cases it may be useful to treat wide-pointers with vtable pointers as different (because method calls will execute different code, right?) So two values with different metadata aren't "identical" in that case (depends on defintion of "identical", of course).

I just found the section "Memory model" in Rust's reference. :innocent:

I was thinking of these

  • Reference
  • Nomicon
    • (I now think the ZST case is being addressed in the paragraph below, but pretty indirectly IMO; that paragraph could use a rewrite.)

But when following up to this comment found this

Which is more specific and presumably normative.

So thank you for the correction!

Okay, after typing up one response and reading it over to check my assumptions, I think we're running into the deeper consideration I mentioned in passing. Let's look at:

#[derive(Debug)]
#[repr(transparent)]
struct Wrapper(i32);
fn f() {
    let w = Wrapper(0);
    let x: &dyn Debug = &w.0;
    let y: &dyn Debug = &w;
    let z: &dyn Debug = coerce(&w);
    // In some other module:
    // fn coerce(w: &Wrapper(i32)) -> &dyn Debug { w }
}

x and y have the same type, but they have different vtables because they have different (erased) base types. When it comes to comparison, which type do you care about? I think any reasonable person would agree that x and y execute different code. So I think you're saying you want them to compare differently. Let's call this the "base type objective"; we'll come back to it.

Now let's consider y and z. If coerce happened to be compiled in a different codegen unit, y and z may have different vtable pointers, even though they are both the same type, and have the same (erased) base type as well. Do you want these to compare differently? I can't imagine you do, regardless of what you think about x and y. That's what the issue I linked is about. If you rely on "vtable address equality matches base type equality", you will have false negatives.

(One could argue that the different vtables for the same base type also means they "execute different code", technically, but I don't think that's very reasonable in this context. I'll assume you don't care about that.)


So what are the options?

You could give up on the base type objective; that is what my suggestion does. That does mean x and y are considered equal; here, equality means they have the same type, plus the same address and size (or are ZSTs). But they may have different erased types, and thus observable behaviors, in the case of dyn Trait.

You could keep your current code. That means x and y are considered different. But it also means that y and z will occasionally and unpredictably and probably undeterministic-across-compiles compare as different. I have no idea if this is an acceptable trade-off or how common it is. Maybe you just document this gotcha around dyn types and move on. Maybe you limit yourself to Sized types.

Is there a third way that achieves the base type objective? I am not aware of any good way to achieve this. If my impressions of the vtable issue is correct, you could compare vtable contents. But I'm not sure my impressions are correct, it would be unsafe regardless, and it would also rely on non-normative implementation details.

Anything else I've thought of involves limiting the applicability of your data structure pretty severely, as you're basically looking at TypeId and/or dyn Trait downcasting.


Is this a bug or just unfortunate? It's not currently mentioned in the issue (I'll try to follow up later), but wide pointer comparison is the outcome of an RFC. The intention of that RFC was that you could compare base types by vtable address; at least that's my reading, and what others implied in the PR comments. So, I would argue that it's a bug.

Will it be fixed then? I'm afraid this is one of those situations where I'm not too confident it will, based on reading the issue itself; i.e., an area where I don't think Rust's guarantees are improving.

Does this gotcha apply to anything else? Yes, it applies to const and fn pointers too. You can read more in the issue and the things it links to.


(Edits: Minor typos corrected. Additional note, I don't think fn and const are in violation of an RFC; I did follow up and commented about the wide pointer RFC in the issue.)

5 Likes

:scream:

That'd at least be consistent, but I think from a semantic p.o.v., "identity" should be an even stricter requirement than equality. If it behaves differently, it certainly is not the same.

Doesn't sound to be the worst idea. Besides, if pointer comparison behaves weird, it's Rust's fault, not mine :innocent:. Seriously, I think I could just document the possibility of false negatives in case of trait objects. But yeah, it's also not really nice.

Doesn't seem nice to forbid slices, just because of problems with dyn objects.

There is a third type of wide-pointers: "Custom DSTs" as mentioned in the Rustonomicon subsection on DSTs. They are not in the bullet list but mentioned in the end of the subsection. As the pointer metadata is the metadata for the last field in the struct, it doesn't seem to be a special case to consider. However, documentation of the experimental std::ptr::Pointee API states that:

In the future, the Rust language may gain new kinds of types that have different pointer metadata.

Speaking of the unstable APIs in std::ptr, I wonder if it was possible to use [#feature(negative_impls)] and bounds on Pointee::Metadata not being DynMetadata to explicitly remove trait objects from my impls. Both negative_impls and DynMetadata are unstable though.

What needs to be done is to compare the contents of the vtable instead of the pointer to the vtable, right? Again, maybe it's possible to implement that in a crate with the available unstable APIs mentioned above (and impl bounds that treat the DynMetadata case differently). But ultimately, I guess that should be done by Rust when comparing these wide-pointers? That would make pointer comparison a less trivial operation though :slightly_frowning_face:.

The way I see it, you wrote some code to calculate an arbitrary, meaningless number, and you were disappointed that it wasn't some different arbitrary, meaningless number, despite never defining what behavior you actually wanted.

Object identity is an ill-defined concept in Rust. Bits can be freely copied from one memory address to another even for non-Copy types - the compiler may stop you from using the "original" but what does that mean for "identity"? If you move a NonCopyType into a Vec is it still "the same" object? Ownership has been transferred, but outstanding borrows are invalidated, and the address may or may not be the same as you have already observed. Rust doesn't have object IDs like in Python where the language itself keeps track of identity: it's all just bits and bytes when you really boil it down. All rustc cares about is whether the program matches the observable behavior of the (somewhat underspecified) abstract machine, which has no built in notion of "object identity" (at least, none which corresponds to address).

I don't think this really has anything in particular to do with ZSTs being "special"; it's really about incorrectly applying a notion of "identity" to the semantics of pointer comparison. Object identity is a foreign notion to Rust. If you want your objects to have a notion of identity, well, that's fine: but you will have to write some actual code, not just ascribe meaning to addresses, which are essentially arbitrary numbers that only have intrinsic meaning to the compiler and the CPU, neither of which cares about what you think "identity" is.

6 Likes

If you read the entire thread carefully, you will be able to find a short description of my use case (though arguably not a precise definition), and at least two other use cases where the concept of "identity" may be helpful for certain algorithms.

Pointers aren't meaningless in Rust, as there seem to be certain guarantees, as also pointed out in this thread, even if a lot of that is not normative. (But there are good reasons to demand normative descriptions of the behavior.)

Like I said:

A cost can't really be considered "overhead" if it's something you need as an inherent part of whatever problem you intend to solve, can it?

In Python (I'm not sure what language you are most familiar with, so I will continue to use snake language) you pay this cost implicitly. But you pay it all the time, whether you need it or not. So in that language, I would consider the cost of boxing every object "overhead". Python also doesn't have ZSTs so that is perhaps not a 1 to 1 comparison. In Rust you don't pay it unless you need it. But of necessity, if you do need it, you have to write code for it. Is that "overhead"?

1 Like

I also have a use case for this. I am implementing modular arithmetic, and each number refers to a (bignum) modulus. I want to check that the numbers being added refer to the same modulus, without having to compare all the many bits.

When you're comparing two references, then they haven't been invalidated (I guess... or at least they still existed just before the comparison).

Thanks for sharing it. I think there are really a couple of use cases, and I'm apparently not the only one who attempts to use pointers to solve this efficiently, as the above mentioned by_address crate does something similar:

From the README:

Rust wrapper type that implements hashing and comparison based on address rather than value.

I understand it might or might not be a bad idea to do this with (current) Rust as normative references on memory behavior may be missing, but I don't think it's bad to discuss this issue, and I don't think pointer values are (or should be) per se arbitrary or meaningless.

Yes, it is important that a reference exists in order for the value of the pointer to have a "meaning". If I don't make a huge mistake here, this should be ensured both by the refid and by_address crate (in case of refid, the newtype depends on 'a of the inner reference &'a T, and in case of by_address, the newtype depends on T which also can have a lifetime other than 'static and thus limits the lifetime of by_address<T>. (Please correct me if I'm wrong.)

There are still open questions about when or when not pointers can be equal. But I don't think that these open questions make the overall idea useless or unworthy to talk or theorize about.

1 Like

As per the issue and it's dupes, you'll be in good company :slight_smile:. Though with my suggestion, it just transforms it into the possibility of false positives in case of trait objects of both structs and their contents when they have the same size (which seems even more niche).

I've often wished for a NotDyn bound. Sized is a poor substitute. To base it off of DynMetadata, you would need to be able to test for associated type non-equality, which isn't quite the same thing... though maybe implementing with associated type X could be considered an opt-out for any other type Y, and T: Trait<Assoc != Y> could be sugar for T: Trait + !Trait<Assoc=Y> or something.

I think that's a ways off though. Note that negative bounds are not implemented on unstable yet, even. (I think coherence is in the process of landing in nightly.) And disjoint associated type coherence is delayed until the broader issue of mutually exclusive traits is tackled.

Those the wrappers-around like str that I mentioned. Their metadata is the same though, yep.

As for future possibilities, they'll have to work at least somewhat similarly due to the existence of methods like size_of_val. Unless we get a new Sized-like implicit-auto-trait.

Some comments implied that, but I don't actually know that the contents of the vtable are the same, so I can't answer this definitively. Another possibility is to unify the vtables when linking. There are trade-offs between compile time and run time cost.

That's a valid view, and one that motivated the acceptance of the RFC. And I'm not arguing against it, but there are other valid views.

Incidentally, you're throwing out behavior in the ZST case.

#[derive(Debug)] struct S;
#[derive(Debug)] struct T;

let (s, t) = (S, T);
// Maybe the same address
let v: Vec<&dyn Debug> = vec![&s, &t];

#[repr(transparent)] #[derive(Debug)] struct U(T);
// Definitely the same address
let u = U(T); // &u, &u.0

You may also have a bad time with "identity" due to optimization. Consider:

    let const I: i32 = 0;
    let i = 0;
    let a: &dyn Debug = &i;  // Base is i32 (stack address)
    let b: &dyn Debug = &&i; // Base is &i32 (stack address)
    let c: &dyn Debug = &a;  // Base is &dyn Debug (stack address)
    let e: &dyn Debug = &I; // Base is i32 (unstable address)
    let d: &'static dyn Debug = &0; // Base is i32 (static address)

These all "behave the same" but compare differently. Is it okay if optimizations make some of them compare the same?

I'm not really making a point here, other that pointing out it's probably always going to be a "best effort" type of situation.

(Though if you want all things that "behave the same" to compare the same, you'll have a worse time due to the halting problem :slight_smile:)

You are right (I think) that testing the associated type is different from negative impls. However, there is a way (with Rust nightly) to implement a NotDyn trait. That is because there are only three concrete types used as pointer metadata:

We can thus create a trait NotDynMetadata as follows:

trait NotDynMetadata {}
impl NotDynMetadata for () {}
impl NotDynMetadata for usize {}

And then, using #![feature(ptr_metadata)], we can easily define our NotDyn trait:

trait NotDyn {}
impl<T> NotDyn for T
where
    T: ?Sized,
    <T as Pointee>::Metadata: NotDynMetadata,
{
}

What do you think? I made a small example on Playground for testing it.

Yeah, I mistook negative impls for negative bounds. The first is just a guarantee for non-implementation. But luckily, we don't seem to need negative impls or negative bounds for an implementation of NotDyn, as shown above.

Hmmm, okay. Either way, it would make pointer comparison more complex.

Not every value that behaves the same should be considered identical. I stated the opposite: Every value that is identical should behave the same. (Otherwise, there would be the issue with the halting problem indeed :grinning_face_with_smiling_eyes:.)

If you are right that let x = 1; let y = &x; let z = &x; (or let z = y as a more relaxed case) ensures that y as *const _ == z as *const _, then it is possible to create two values (*y and *z) that are "identical" in the sense that they share the same memory location. Then, working with wrappers like RefId or ByAddress makes sense with today's Rust.

Of course, that won't mean that we will know in all cases whether two values are identical, but we would know at least in some cases (and in the future, those cases will hopefully be explicitly mentioned in the reference or other normative documentation).

1 Like

I noticed that the repository of by_address lists 2.0.0 as last version, while the most recent crate on crates.io, has version is 1.0.4 (actually 2.0.0 has been yanked, I just saw). Does anyone know why?


Update: Looks like version 2.0.0 was meant to perform a double-dereference on deref (returning T::Target instead of T), but the method body wasn't changed (see changeset). Not sure what that means. Deref always confuses me :sweat_smile:.

Changing only the type but not the method body works precisely because of deref coercion. If your method is declared to return &<T as Deref>::Target, then if your method body actually returns a &T, it will be converted to &<T as Deref>::Target.

2 Likes