Working with identity (comparing equality of references/pointers)

jbe · November 3, 2021, 1:34pm

I think so too. Otherwise the following code should also not be able to compile:

let mut val = 42; 
let ptr1 = &mut val as *mut i32; 
let ptr2 = &mut val as *mut i32;

Anyway, let val_after = val_before; also performs a copy, so it's no surprise that the pointers aren't equal, and it's not what @tczajka referred to:

I would assume (and hope) it can't change.

See also @quinedot's post above.

H2CO3 · November 3, 2021, 1:53pm

I get that, but it is exactly my point that this is not relevant, since we were talking about smart pointers and passing on ownership while having a pointer to an object. With regular references, that is not possible, so that simply does not apply here.

jbe · November 3, 2021, 1:57pm

That makes me realize there is a method called Rc::ptr_eq, which basically is that sort of "identity" comparison, but limited to this particular smart pointer. Maybe there could instead be a trait to generalize this? Like:

trait Identity {
    fn ptr_eq(&self, other: &Self) -> bool;
}

But that doesn't help us to add these to a HashMap or HashSet. I guess we'd rather need something like a method that extracts the (stable) address.
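One way to get such values into a HashSet is a small wrapper that hashes and compares by address. The following is only a sketch: `ByPtr` and `count_distinct` are hypothetical names made up for illustration, and the wrapper is limited to `Rc`. It relies on `Rc::ptr_eq` and `Rc::as_ptr` from the standard library.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

/// Hypothetical wrapper (not part of std): compares and hashes an `Rc`
/// by the address of its allocation instead of by the pointee's value.
pub struct ByPtr<T>(pub Rc<T>);

impl<T> PartialEq for ByPtr<T> {
    fn eq(&self, other: &Self) -> bool {
        Rc::ptr_eq(&self.0, &other.0)
    }
}

impl<T> Eq for ByPtr<T> {}

impl<T> Hash for ByPtr<T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash the address only, which keeps `hash` consistent with `eq`.
        Rc::as_ptr(&self.0).hash(state);
    }
}

/// Counts how many distinct allocations the given handles refer to.
pub fn count_distinct<T>(handles: &[Rc<T>]) -> usize {
    let set: HashSet<ByPtr<T>> = handles.iter().cloned().map(ByPtr).collect();
    set.len()
}
```

Because the wrapper owns an `Rc`, the allocation stays alive for as long as the entry is in the set, so the stored address cannot go stale.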

On the other hand, I don't think smart pointers are the only use case where "identity" makes sense. For example when taking slices from a Vec, it should also be safe to compare the pointers (apart from the zero-size case).

trentj · November 3, 2021, 3:35pm

You keep saying "safe", but what you actually mean is "does what I mean". Comparing pointers is always safe. The problem with "does what I mean" is that it's wholly contextual. I can imagine situations where comparing pointers to elements of a Vec is a perfectly reasonable thing to do and other situations where it makes no sense at all. In particular, if you're storing pointers somewhere (like in a HashSet) it becomes extremely relevant whether the objects behind the pointers might have been moved or mutated since the pointer was taken.
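To illustrate how contextual this is, here is a small sketch using `std::ptr::eq` on elements of a Vec. The function names are made up for the example. Equal values at different positions, or in a cloned Vec, compare unequal by address, so whether the result "does what I mean" depends on what the caller wants.

```rust
use std::ptr;

/// True if both references point at the same address.
pub fn same_address<T>(a: &T, b: &T) -> bool {
    ptr::eq(a, b)
}

/// Returns (same element, neighbouring elements, element vs. clone's element).
pub fn vec_pointer_checks() -> (bool, bool, bool) {
    let v = vec![10, 10, 30];
    let copy = v.clone(); // equal contents, separate heap buffer
    (
        same_address(&v[0], &v[0]),    // the very same element
        same_address(&v[0], &v[1]),    // equal values, different elements
        same_address(&v[0], &copy[0]), // equal values, different buffers
    )
}
```

Note that an address recorded before a later `push` may no longer match afterwards, because the Vec is allowed to reallocate its buffer.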

I can also readily imagine situations where comparing pointers to ZSTs is a reasonable thing to do and the notion of "identity" being applied is completely tolerant of the fact that all ZSTs are identical. In fact, by your own admission earlier in the thread, your use case is one of these situations! So to have a meaningful discussion about what it means to compare pointers to elements of a Vec, or any other two pointers for that matter, we need to establish what notion of identity is applicable. At the moment it still seems to me like you are attached to an idea of object identity that is imported from another language, and you imagine that everybody knows what you mean by identity and shares the same expectations about what comparing pointers should do, when that is simply not the case. This is why I asked earlier about whether "identity" is preserved across moves. The answer may be yes or no - neither is wrong. But which answer you want depends on what you're using it for; we can't have a meaningful discussion about object identity in the abstract.

jbe · November 3, 2021, 3:37pm

I attempted to write down a definition of "identity":

Two references refer to identical values if all operations which do not involve pointer arithmetic (including pointer comparison) with any of the two references behave exactly as if they were done with the respective other reference.

(If that is still not precise enough, I'd be happy if anyone else could phrase it in better words.)

The opposite isn't true, i.e. two references may refer to non-identical values even if all operations on one reference behave exactly as if they were done with the other.

This definition of identity, along with a fast implementation of identity checks, could allow us to do certain optimizations in a couple of algorithms when it's known that some values are identical.

It should be noted that the definition alone doesn't let us decide if two references refer to identical values. There could be different implementations fulfilling the definition. I assume that my implementation of RefId in version 0.1.0 (see source) fulfills the above definition of identity. (I'd be happy if someone could confirm this or give a counter-example.)

However, also a trivial implementation of an identity check that always returns false would fulfill the definition. Such an implementation of an identity check would be faster than Eq for sure, but it would be pretty much useless in practice.

jbe · November 3, 2021, 3:38pm

Sorry for using a term that is used differently in Rust. I should have used a different wording.

As for the rest you wrote, see my previous post where I attempt to make a definition of "identity" to clarify what I (or we) are talking about.

Actually, my original idea of "identity" has already been "defined" by the topic of this thread:

Working with identity (comparing equality of references/pointers)

Which would basically mean I defined identity through pointer comparison. During the discussion, questions were raised about whether that's useful, and in which cases pointer comparison is suitable.

My new definition in my previous post is more permissive in the sense that v[0..0] is allowed to be "identical" to v[1..1], even though &v[0..0] as *const _ != &v[1..1] as *const _.
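A minimal sketch of that case, with a made-up function name: the two empty subslices are equal by value and cannot be told apart through their contents, yet their start addresses differ.

```rust
/// Returns (equal by value, equal by start address) for two empty subslices.
pub fn empty_slice_checks() -> (bool, bool) {
    let v = vec![1u32, 2, 3];
    let a = &v[0..0]; // empty, starts at the first element
    let b = &v[1..1]; // empty, starts at the second element
    (a == b, a.as_ptr() == b.as_ptr())
}
```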

But I think I finally get your point. You assume that I imply the existence of some unique attribute such as an "object identity" that is preserved in some way. I don't think I did (but maybe I did somewhere?). I guess such an attribute could be generated by using Box::pin, though I'm not even entirely sure if that guarantees uniqueness! Anyway, perhaps it helps to not talk about "identity" as an attribute, but about an operation ("identity comparison") between references that follows the definition of my last post.

I do think I'm flexible enough to not apply everything I do in other languages 1:1 to Rust, but I'm still in the process of learning the semantics, guarantees, and behavior of Rust, and discussions like these can help me a lot. I would appreciate not having to feel all the time like I'm in the wrong, doing something wrong, or should just learn things the Rust way. Besides, in some regards Rust is incomplete, e.g. regarding async traits. I think there should be a welcoming atmosphere when a newcomer like me wishes to do something, whether it's identity checks, self-referential structs, or asynchronous traits. (And yeah, I know self-referential structs won't work with safe Rust. But I do believe identity checks (according to my provided definition) can make sense and are possible in today's Rust. Or am I wrong here?)

trentj · November 3, 2021, 7:03pm

I entered this thread with perhaps a needlessly antagonistic tone. Most people (including me) come to Rust with faulty assumptions based on languages they already know and it is important to explode those assumptions. It's not wrong to not know something. My comments earlier in the thread were not meant to make you feel bad about yourself but to challenge some of those latent assumptions which you might not even know you have. For my tone which made you feel attacked, I apologize.

That said, I don't think my continued participation can lend any further clarity to the technical issues under discussion here.

jbe · November 3, 2021, 7:07pm

You are certainly right that I also might sometimes be tempted to apply things learned in other languages too quickly to Rust. So it's alright if you emphasize that I shouldn't do that. I'm sorry if I was too sensitive about it. I'd like to thank you anyway for your input and for your warnings that are important to keep in mind!

Feel free to comment again whenever you like, and sorry if I was a bit sensitive.

jbe · November 5, 2021, 9:20am

I think Rc does in practice because in the implementation of Rc::new, we find:

box RcBox { strong: Cell::new(1), weak: Cell::new(1), value }

The RcBox isn't a ZST, even if value is. However, I'm still not sure whether the reference guarantees that this will keep producing a unique pointer in the future, regardless of how the value is used and which optimizations might be added to the compiler. (That's why I keep coming back to ask about guarantees in the reference or other normative documentation.)

Box::new doesn't allocate if the argument is a ZST (see documentation of Box::new). This isn't explicitly stated in the documentation of Box::pin, but also applies under the current implementation. Thus Box::pin does not generate a unique address in all cases. Consider:

let b1 = Box::pin(()); 
let b2 = Box::pin(()); 
assert_ne!(
    &*b1 as *const _,
    &*b2 as *const _
); // fails

(Playground)

But compare with Rc:

let rc1 = Rc::new(()); 
let rc2 = Rc::new(()); 
assert_ne!(
    &*rc1 as *const _,
    &*rc2 as *const _
); // passes

(Playground)

I noticed that RefId fails my intended definition of identity checks, because RefId(rc1) == RefId(rc2) while rc1 and rc2 are certainly not interchangeable.

Apparently, my "hack" to treat zero-sized pointees differently causes undesired behavior (according to my own definition) in the case of Rcs (or Arcs), because two distinct Rc<()> values will be considered equal even if they work with different counters.

What to learn from all of this or what to do about it? I'm not sure. Let's keep in mind that pointer comparisons for "identity" or "cheap equality" checks are done in real-life Rust (including the Rust compiler itself). (Yet another use-case seems to have been discussed in this thread, but please correct me if I'm wrong here and that is a different case.) To back up this statement, also consider that there exists a method Rc::ptr_eq in the standard library, even though the standard library doesn't seem to provide a method to store Rc's in a hash map using that method-implemented equality relation (ptr_eq).

The reason why Rc<()> behaves differently from Box<()> (or Pin<Box<()>>) comes down to two things:

Rc::as_ptr (which is used by Rc::ptr_eq) will calculate the pointer using ptr::addr_of_mut! on a non-zero-sized struct (RcBox) that is defined as #[repr(C)] in the source of Rc.
Boxes seem to always allocate if the inner type is non-zero-sized. (Which might or might not be guaranteed by any normative documentation? I'd really like to know.) And they never allocate if the inner type is zero-sized.
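The difference can be observed with a short sketch like the one below (function names are invented for the example). The Rc result reflects the implementation described above, where every Rc has its own counter block. I'm not claiming it is a documented guarantee, and the Box result is deliberately only returned, not assumed.

```rust
use std::rc::Rc;

/// Returns (clone shares identity, separately created `Rc<()>`s share identity).
pub fn rc_zst_checks() -> (bool, bool) {
    let rc1 = Rc::new(());
    let rc2 = Rc::new(());
    let rc1_clone = Rc::clone(&rc1);
    (Rc::ptr_eq(&rc1, &rc1_clone), Rc::ptr_eq(&rc1, &rc2))
}

/// Whether two separately created `Box<()>` values report the same address.
/// With a zero-sized pointee no allocation happens, so the addresses may
/// well coincide; the result is implementation-dependent.
pub fn boxed_units_share_address() -> bool {
    let b1 = Box::new(());
    let b2 = Box::new(());
    std::ptr::eq(&*b1, &*b2)
}
```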

Note that ByAddress uses Deref, which in turn will also use as_ptr if I understand the source correctly. Thus ByAddress can be used to compare the identity of two Rcs or Arcs (while RefId can't if the inner type T is zero-sized).

I wonder if it would make sense to introduce a new trait like Identity or something like that, which could be implemented for Rc and Arc, and maybe for a couple of other types as well.

(Edit: Sorry, I made a copy&paste mistake on my laptop and removed the duplicated parts. Currently on the road here.)

(Edit #2: Added: "And they never allocate if the inner type is zero-sized.")

jbe · November 5, 2021, 4:26pm

Not sure if anyone is still interested in this thread, but if so, I wanted to let you know that I finally put all the input into a new approach to allow "identity" comparison.

I defined such a new "Identity" trait, and I named it refid::Id. For references to Sized values or slices, it will check if the references point to the same memory range, whereas empty memory ranges are always considered equal. (For dyn objects it still behaves a bit oddly (due to the reasons explained in this thread), but I included a warning note in the documentation.) Rcs (or Arcs) are the same if they share the same reference counter (i.e. if they are clones). That now also works in cases where the inner type is a zero-sized type, because I added a specific implementation of Id for Rc<T> and Arc<T>.

The downside is, it won't work on any other smart pointers without adding an implementation for those. On the other hand, that might be wise, because we won't know how other smart pointers might act in regard to the &*x as *const _ operation, especially if zero-sized types are involved.
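For readers who want a feel for the idea, here is a rough sketch of what such a trait could look like. It is not the actual refid::Id API: the trait name `Identity`, the method `same_identity`, and the helper `identical` are all hypothetical, and slices and dyn objects are left out entirely.

```rust
use std::rc::Rc;

/// Hypothetical trait for illustration; not the actual `refid::Id` API.
pub trait Identity {
    fn same_identity(&self, other: &Self) -> bool;
}

/// Two `Rc`s are identical if they share one allocation (and thus one counter).
impl<T> Identity for Rc<T> {
    fn same_identity(&self, other: &Self) -> bool {
        Rc::ptr_eq(self, other)
    }
}

/// Plain references to `Sized` values: same address, with all zero-sized
/// values treated as identical.
impl<T> Identity for &T {
    fn same_identity(&self, other: &Self) -> bool {
        std::mem::size_of::<T>() == 0 || std::ptr::eq(*self, *other)
    }
}

/// Convenience helper so call sites don't depend on method auto-referencing.
pub fn identical<I: Identity>(a: &I, b: &I) -> bool {
    a.same_identity(b)
}
```

With this sketch, `identical(&rc, &rc.clone())` is true, while two separately created `Rc<()>` values are not identical. Each smart pointer needs its own impl, which matches the downside mentioned above.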

system · February 3, 2022, 4:27pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
