Working with identity (comparing equality of references/pointers)

I think being able to compare memory addresses (or ranges) for equality can be a helpful tool for a couple of algorithms. And I wouldn't be surprised if some crates already do that in order to solve certain tasks efficiently.

Now if Rust really doesn't give any guarantees, then these (hypothetical) implementations could break in future. Of course, it would be a mistake of those (yet hypothetical) crates to assume you can compare base addresses (or ranges) in order to do identity checks.

I would like if there was any form of statement in the reference that allows to deduce in which cases (if any cases at all) it is possible to use pointer comparison (or wide-pointer comparison, including length) for identity checks, because that could be helpful for a couple of algorithms and would avoid having to add otherwise useless auto-increment integers to data structures.

Even in case Rust decides to not give any guarantees, such a hint could be helpful, and it would motivate me to include runtime-overhead in my code to make my code sound.

If you can't find official documentation or an RFC, it's probably not considered normative / guaranteed, so if you want to be robust, take the conservative route. Rust sometimes gives an explicit non-guarantee (e.g. #[repr(Rust)] layout), but more often just does not give a positive guarantee on topics that are undecided or where flexibility is still desired.

E.g.

Even if addresses are a safe choice for your situation (heap allocated, live, not resized?), and even if you're talking about Sized, non-ZST types, you need to at least keep track of the type / only do this for a restricted set of types (vs. a *const () or usize or whatnot) to ensure identity. A struct and a field inside that struct can have the same address.

4 Likes

Why wouldn't it make sense?

If sizes are known at compile time, they would almost surely get optimized away. If they aren't known at compile time, then again, it is necessary, for example in the case of slices of length 0.

I'm curious what your use case is that could be broken by this? The only one I can imagine is to avoid repeating a slow computation or to avoid an expensive equality check, and neither of these is affected by the kind of optimization you're talking about. If two variables have the same address then their memory contents are identical, so who cares if they may have had different names?

1 Like

You guessed right: My use case is avoiding repeating a computation (or network access). And you are right: In my particular use case, false positives aren't a problem if the memory contents are equal. Thanks for pointing that out! I wasn't aware of that while writing my previous posts.

But there might be other use cases (see below); and there might also be an issue with false negatives (see farther below).

Yeah, it seems to be difficult, but I also think it's rarely needed. Maybe one particular use case could be the Entity-Component-System pattern if all entities have certain data attached to it (e.g. there is a special component that all entities have). Then it could make sense to not use an integer to define the entity, but an allocated piece of memory, where the base address (and Sized type) could be used to identify the entity (e.g. in hash tables), and the allocated piece of memory can be used to store extra information for that entity.

Yes, I was aware of that. In my Playground code above, I use a PhantomData field to keep track of the type (at compile-time).

Edit: And the PhantomData even captures the lifetime of the immutable reference.

That means I can count on the following?

let x = 1;
let y = &x;
let z = &x;
assert_eq!(y as *const _, z as *const _);

Or would the compiler only ensure that while I hold a reference the contents wont move (but there is no guarantee about y and z being equal)?

If y and z may differ, then this would have an impact for my use-case, because extra computation (or even I/O) could happen, where it is not desired.

Because for a particular type, the size is constant.

Slices are !Sized. You are right, if my type is ?Sized, then it is necessary to compare the length. Of course, also the type needs to be the same, as otherwise a struct could be equal to its first (or only) field. As I said, I kept track of the type through PhantomData.

The only remaining issue is with ZSTs or other cases where Rust doesn't give guarantees on pointer

  • stability,
  • uniqueness.

And checking the size won't help me here.

Following the links provided by @quinedot above, the topic indeed seems to be more tricky than it might appear at a first glance.


P.S.:

As pointed out a couple of times in this thread, the issue isn't really limited to ZSTs, as such optimizations could happen with other types too (except where there are explicit guarantees that they don't happen). Thus, I would like to revoke my comment on the Rustonomicon being misleading.

You can count on it.

Topics this fundamental can be hard to find normative citations for (as they've been hashed out pre-RFC-era and there is still no specification), but there does happen to be RFC 68 that talks about it (somewhat indirectly).

2 Likes

I think this one falls under a much broader caveat, though I don't know if it's explicitly written anywhere either: most operations on addresses, even though safe, are not useful in general.

For example, what does this do?

    let (p, q);
    {
        let x = 4;
        p = &x as *const _ as usize;
    }
    {
        let y = 5;
        q = &y as *const _ as usize;
    }
    dbg!(p == q);

It looks like right now it's false, but there's nothing guaranteeing that -- the compiler could notice that x and y have disjoint lifetimes, and use the same storage for them, at which point it could be true. (It might even do this today in async fn's desugaring; I don't recall.)

Or, more specifically, you can construct easy examples of other rules that imply that "different" ZSTs can have the same address -- consider [[(); usize::MAX]; usize::MAX], for example. (Yes, that's a valid type..)

Well, that's why the nomicon calls it exotic :slightly_smiling_face:

But I do think it'd be good to add a note of some kind there about addresses, since a ZST can have the same address as even a non-ZST.

From all your comments, I finally came up with a nicer approach that neither involves unsafe nor restricts usage to Sized types (and also doesn't do any dirty tricks such as coercing a pointer to usize). That's my current approach:

pub struct RefId<'a, T: ?Sized>(pub &'a T);

impl<'a, T: ?Sized> Clone for RefId<'a, T> {
    fn clone(&self) -> Self {
        RefId(self.0)
    }
}

impl<'a, T: ?Sized> Copy for RefId<'a, T> {}

impl<'a, T: ?Sized> PartialEq for RefId<'a, T> {
    fn eq(&self, other: &Self) -> bool {
        (size_of_val(self.0) == 0 && size_of_val(other.0) == 0)
            || (self.0 as *const T == other.0 as *const T)
    }
}

impl<'a, T: ?Sized> Eq for RefId<'a, T> {}

impl<'a, T: ?Sized> Hash for RefId<'a, T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        if size_of_val(self.0) != 0 {
            (self.0 as *const T).hash(state)
        }
    }
}

It treats zero-sized types as well as empty slices (of the same type) as equal.

As an exercise, I decided to create my very first Rust crate (refid) which provides the above newtype (RefId) for any reference &'a T (with T: ?Sized) that allows identity checks through pointer comparison. Feel free to criticize my code or any metadata, but please also keep in mind that these are my very first steps with crates.io.

Thanks to @H2CO3 for bringing up the issue with comparing regions instead of base addresses. Actually I don't need to manually compare the length as Rust can do that when comparing wide-pointers; but I did include special checks for zero sizes (see the full source, including some tests).

By the way: Coercing a pointer to a usize (which I'm not doing anymore) wouldn't lose the length information because the compiler actually refuses to do that coercion (see Playground).

Also thanks to all others for helping me understanding Rust's behavior better. I still feel like there are some open questions, and I'm sorry if my repeated questions for normative references come off as overly criticizing. I love Rust; I just would like to be sure that my code works as intended, and I feel like when dealing with pointers, some things aren't obvious (and/or undefined yet).

I hope I could express why identity checks (without extra runtime cost) may be a nice thing to have, and maybe I'm not the only one who has stumbled upon this problem. Perhaps my crate can help to solve this gently (e.g. by using HashSet<RefId(K)>).

1 Like

See also GitHub - mbrubeck/by_address: Rust wrapper type that implements hashing and comparison by pointer address (references implement Deref so this generalizes your approach)

2 Likes

So I'm not the only/first one who had this idea :grinning_face_with_smiling_eyes:. Supporting Deref seems to make sense in many cases. I think the by_address crate doesn't handle the zero-size case though? @mbrubeck

(Update: I published a new version 0.0.3 of refid, which also supports smart pointers with Deref, but still differs in regard to empty memory ranges, see below.)


use by_address::ByAddress;
use refid::RefId;

fn main() {
    let v = vec![5, 6, 7];
    assert_eq!(RefId(&v[0..0]), RefId(&v[1..1]));
    assert_ne!(ByAddress(&v[0..0]), ByAddress(&v[1..1]));
}

Maybe treating these two slices as non-equal might have advantages in certain cases though?

Nope, definitely not. The first I ran into it was doing something like RFC: Add a standard trait for getting many &mut to places by Nokel81 · Pull Request #3100 · rust-lang/rfcs · GitHub -- since for ZSTs, overlap checks can't tell if you got duplicates.

This is a valid approach, but may surprise others in niche circumstances, so it would be good to highlight in documentation. Doing something else would surprise a different set of people, so maybe this is a vacuous point. You need the documentation regardless. :slight_smile:

Mmm... not quite I'm afraid. This covers unsized types where the pointer metadata is length (slices and wrappers of slices, such as str and Path and so on). But with a wide pointer to dyn Trait (or a wrapper there-of), the metadata is a vtable pointer, and two pointers to the same type may have different vtable pointers. So it's possible to get some false negatives (e.g. two references to the same instance that underwent unsized coercion in different code generation units).

There's a deeper consideration here that I'll just mention in passing: different types cannot be compared by this crate. So if I coerce a &i32 to a &Debug and a &Display, they can't be compared. Maybe this is what you want, maybe it isn't. There are probably also some complications around dyn upcasting, should that land.

Anyway, since you're generalizing over metadata types, you can't rely on the wide pointer comparison; you'll want to not compare pointer metadata. Use size_of_val to compare the sizes; that will take care of the length of slice-like DSTs. (Everything else has the same size for the same type.)

That leaves the question of how to compare the pointers without comparing metadata. Which leads to...

You can't coerce a wide pointer into a usize, but you can coerce a wide pointer into a thin pointer, and then coerce the thin pointer into a usize.

You don't need usize, but as per above, you do need to get (just) the data pointer. Taking it all together, I think you want something like this:

match (size_of_val(self.0), size_of_val(other.0)) {
    (0, 0) => true,
    (x, y) if x != y => false,
    _ => self.0 as *const T as *const u8 == other.0 as *const T as *const u8
}

We're in the same boat; I love Rust but thinks it needs some more guarantees nailed down. Sadly my impression is this won't improve in the short term :slightly_frowning_face:.

One thing you definitely can't do is read data behind the pointer; it's dangling.

From what I've read in the nomicon and std::ptr::read docs, I don't think trying to read a dangling pointer to a zero sized type is UB. It can't be null or unaligned, but dangling is fine.

1 Like

At least in regard to base addresses, this is also true for non-ZSTs, as @quinedot said earlier:

But I guess that &() as *const _ may still be equal to the pointer of any empty slice (while the pointers of two empty slices can't be equal if the empty slices start/end at different addresses). Anyway, it feels surprising but likely makes sense. A note in the docs might help to reduce confusion, unless the problem is more general anyway.


I did mention it here in the struct's doc. Maybe I should add the note in the impl Eq or impl PartialEq also? How do you usually comment notes on equality: in Eq or PartialEq?

And how can you make yellow or red boxes with an info-symbol (or auto-fullscreen flashing + alarm-sounding boxes :stuck_out_tongue_winking_eye:)? I didn't figure out yet how that works (if it is possible with rustdoc).

Okay, thanks for this info! I didn't know. I will look more into wide-pointers to understand metadata other than size info. However, I think that in some (or even all?) cases it may be useful to treat wide-pointers with vtable pointers as different (because method calls will execute different code, right?) So two values with different metadata aren't "identical" in that case (depends on defintion of "identical", of course).

I just found the section "Memory model" in Rust's reference. :innocent:

I was thinking of these

  • Reference
  • Nomicon
    • (I now think the ZST case is being addressed in the paragraph below, but pretty indirectly IMO; that paragraph could use a rewrite.)

But when following up to this comment found this

Which is more specific and presumably normative.

So thank you for the correction!

Okay, after typing up one response and reading it over to check my assumptions, I think we're running into the deeper consideration I mentioned in passing. Let's look at:

#[derive(Debug)]
#[repr(transparent)]
struct Wrapper(i32);
fn f() {
    let w = Wrapper(0);
    let x: &dyn Debug = &w.0;
    let y: &dyn Debug = &w;
    let z: &dyn Debug = coerce(&w);
    // In some other module:
    // fn coerce(w: &Wrapper(i32)) -> &dyn Debug { w }
}

x and y have the same type, but they have different vtables because they have different (erased) base types. When it comes to comparison, which type do you care about? I think any reasonable person would agree that x and y execute different code. So I think you're saying you want them to compare differently. Let's call this the "base type objective"; we'll come back to it.

Now let's consider y and z. If coerce happened to be compiled in a different codegen unit, y and z may have different vtable pointers, even though they are both the same type, and have the same (erased) base type as well. Do you want these to compare differently? I can't imagine you do, regardless of what you think about x and y. That's what the issue I linked is about. If you rely on "vtable address equality matches base type equality", you will have false negatives.

(One could argue that the different vtables for the same base type also means they "execute different code", technically, but I don't think that's very reasonable in this context. I'll assume you don't care about that.)


So what are the options?

You could give up on the base type objective; that is what my suggestion does. That does mean x and y are considered equal; here, equality means they have the same type, plus the same address and size (or are ZSTs). But they may have different erased types, and thus observable behaviors, in the case of dyn Trait.

You could keep your current code. That means x and y are considered different. But it also means that y and z will occasionally and unpredictably and probably undeterministic-across-compiles compare as different. I have no idea if this is an acceptable trade-off or how common it is. Maybe you just document this gotcha around dyn types and move on. Maybe you limit yourself to Sized types.

Is there a third way that achieves the base type objective? I am not aware of any good way to achieve this. If my impressions of the vtable issue is correct, you could compare vtable contents. But I'm not sure my impressions are correct, it would be unsafe regardless, and it would also rely on non-normative implementation details.

Anything else I've thought of involves limiting the applicability of your data structure pretty severely, as you're basically looking at TypeId and/or dyn Trait downcasting.


Is this a bug or just unfortunate? It's not currently mentioned in the issue (I'll try to follow up later), but wide pointer comparison is the outcome of an RFC. The intention of that RFC was that you could compare base types by vtable address; at least that's my reading, and what others implied in the PR comments. So, I would argue that it's a bug.

Will it be fixed then? I'm afraid this is one of those situations where I'm not too confident it will, based on reading the issue itself; i.e., an area where I don't think Rust's guarantees are improving.

Does this gotcha apply to anything else? Yes, it applies to const and fn pointers too. You can read more in the issue and the things it links to.


(Edits: Minor typos corrected. Additional note, I don't think fn and const are in violation of an RFC; I did follow up and commented about the wide pointer RFC in the issue.)

5 Likes

:scream:

That'd at least be consistent, but I think from a semantic p.o.v., "identity" should be an even stricter requirement than equality. If it behaves differently, it certainly is not the same.

Doesn't sound to be the worst idea. Besides, if pointer comparison behaves weird, it's Rust's fault, not mine :innocent:. Seriously, I think I could just document the possibility of false negatives in case of trait objects. But yeah, it's also not really nice.

Doesn't seem nice to forbid slices, just because of problems with dyn objects.

There is a third type of wide-pointers: "Custom DSTs" as mentioned in the Rustonomicon subsection on DSTs. They are not in the bullet list but mentioned in the end of the subsection. As the pointer metadata is the metadata for the last field in the struct, it doesn't seem to be a special case to consider. However, documentation of the experimental std::ptr::Pointee API states that:

In the future, the Rust language may gain new kinds of types that have different pointer metadata.

Speaking of the unstable APIs in std::ptr, I wonder if it was possible to use [#feature(negative_impls)] and bounds on Pointee::Metadata not being DynMetadata to explicitly remove trait objects from my impls. Both negative_impls and DynMetadata are unstable though.

What needs to be done is to compare the contents of the vtable instead of the pointer to the vtable, right? Again, maybe it's possible to implement that in a crate with the available unstable APIs mentioned above (and impl bounds that treat the DynMetadata case differently). But ultimately, I guess that should be done by Rust when comparing these wide-pointers? That would make pointer comparison a less trivial operation though :slightly_frowning_face:.

The way I see it, you wrote some code to calculate an arbitrary, meaningless number, and you were disappointed that it wasn't some different arbitrary, meaningless number, despite never defining what behavior you actually wanted.

Object identity is an ill-defined concept in Rust. Bits can be freely copied from one memory address to another even for non-Copy types - the compiler may stop you from using the "original" but what does that mean for "identity"? If you move a NonCopyType into a Vec is it still "the same" object? Ownership has been transferred, but outstanding borrows are invalidated, and the address may or may not be the same as you have already observed. Rust doesn't have object IDs like in Python where the language itself keeps track of identity: it's all just bits and bytes when you really boil it down. All rustc cares about is whether the program matches the observable behavior of the (somewhat underspecified) abstract machine, which has no built in notion of "object identity" (at least, none which corresponds to address).

I don't think this really has anything in particular to do with ZSTs being "special"; it's really about incorrectly applying a notion of "identity" to the semantics of pointer comparison. Object identity is a foreign notion to Rust. If you want your objects to have a notion of identity, well, that's fine: but you will have to write some actual code, not just ascribe meaning to addresses, which are essentially arbitrary numbers that only have intrinsic meaning to the compiler and the CPU, neither of which cares about what you think "identity" is.

6 Likes

If you read the entire thread carefully, you will be able to find a short description of my use case (though arguably not a precise definition), and at least two other use cases where the concept of "identity" may be helpful for certain algorithms.

Pointers aren't meaningless in Rust, as there seem to be certain guarantees, as also pointed out in this thread, even if a lot of that is not normative. (But there are good reasons to demand normative descriptions of the behavior.)

Like I said:

A cost can't really be considered "overhead" if it's something you need as an inherent part of whatever problem you intend to solve, can it?

In Python (I'm not sure what language you are most familiar with, so I will continue to use snake language) you pay this cost implicitly. But you pay it all the time, whether you need it or not. So in that language, I would consider the cost of boxing every object "overhead". Python also doesn't have ZSTs so that is perhaps not a 1 to 1 comparison. In Rust you don't pay it unless you need it. But of necessity, if you do need it, you have to write code for it. Is that "overhead"?

1 Like