Newtypes and Phantom Data

Spoooky :ghost:

I'd like to guarantee the benefits of both newtype wrapping and phantom types at the same time, namely to define a "typed key" as shown in this other post.

I can't just do:

pub struct Id<T>(u32);

...since Rust complains about the unused T. Fine in Haskell, but in Rustland we must add a PhantomData field:

use std::marker::PhantomData;

pub struct Id<T> {
    id: u32,
    marker: PhantomData<T>,
}

Question: Does this undo the usual runtime advantages of a newtype? Is the second version of Id actually laid out as a full struct, without the "pointless" PhantomData being optimized away?

PhantomData has zero size, so there is no runtime overhead to it.
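
To make the zero-size claim concrete, here is a minimal sketch that checks it with size_of. The User marker type and the new/get helpers are hypothetical names added for illustration; they are not from the original post.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// Hypothetical marker type, used only as a type-level tag.
pub struct User;

pub struct Id<T> {
    id: u32,
    marker: PhantomData<T>,
}

impl<T> Id<T> {
    // Illustrative constructor; the name is an assumption.
    pub fn new(id: u32) -> Self {
        Id { id, marker: PhantomData }
    }

    pub fn get(&self) -> u32 {
        self.id
    }
}

fn main() {
    // PhantomData<T> is zero-sized for every T.
    assert_eq!(size_of::<PhantomData<User>>(), 0);
    assert_eq!(size_of::<PhantomData<String>>(), 0);

    // So the wrapper occupies no more space than the u32 it holds.
    assert_eq!(size_of::<Id<User>>(), size_of::<u32>());

    let id: Id<User> = Id::new(7);
    assert_eq!(id.get(), 7);
}
```

The size check only shows that the phantom field adds no bytes; it says nothing about whether the layout is formally guaranteed.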

Is there a difference in runtime representation between:

struct Id {
  id: u32
}

and

struct Id(u32);

?

I had always assumed the latter was a "true newtype" while the former was just a normal struct.

They are essentially the same thing, it's just different notation. They are both going to have the same memory representation.

AFAIK technically the layout is not guaranteed to be just one u32, but it's going to be that in reality because there is no other realistic choice.

If you want to guarantee the layout to be identical with u32 (e.g. if you want to use mem::transmute), you can write:

#[repr(transparent)]
struct Id {
    id: u32
}

or

#[repr(transparent)]
struct Id(u32);
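
As a rough sketch of the transmute use case mentioned above (the derives are added here only so the values can be compared in assertions):

```rust
use std::mem;

#[repr(transparent)]
#[derive(Debug, PartialEq)]
struct Id(u32);

fn main() {
    // With repr(transparent), size and alignment match the inner u32.
    assert_eq!(mem::size_of::<Id>(), mem::size_of::<u32>());
    assert_eq!(mem::align_of::<Id>(), mem::align_of::<u32>());

    // SAFETY: Id is #[repr(transparent)] over u32 and has no extra
    // invariants, so every u32 bit pattern is a valid Id.
    let id: Id = unsafe { mem::transmute::<u32, Id>(123) };
    assert_eq!(id, Id(123));

    // SAFETY: same reasoning in the other direction.
    let raw: u32 = unsafe { mem::transmute::<Id, u32>(id) };
    assert_eq!(raw, 123);
}
```

In ordinary code Id(123) and id.0 do the same job without unsafe; the transmute is only there to show what the attribute permits.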

Tuple structs are just normal structs whose field names are integers. The field names of a struct never affect its runtime representation.

So, a one-field struct will always be a "newtype", regardless of whether that field has a name?

What advantages are you after exactly?

A struct containing a u32 and a PhantomData is likely to be optimized identically to using a u32 directly.

Newtype in Rust is not a specific language feature but a conventional name for a type which wraps another type without additional state. It's not a strictly defined terminology.

Also, struct fields always have a name.

Precisely this.

Your choice of PhantomData parameter also influences variance. (Apparently it no longer affects drop check the way the documentation says, but AFAIK there is no guarantee this won't come back in some form. I can try to dig up a recentish discussion if you want more details.)
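
A small sketch of the variance point, assuming two hypothetical wrappers: Id<T> with PhantomData<T> (covariant in T) and Sink<T> with PhantomData<fn(T)> (contravariant in T). The function names shorten and lengthen are made up for illustration.

```rust
use std::marker::PhantomData;

// Covariant in T: behaves like a value that contains a T.
pub struct Id<T> {
    id: u32,
    marker: PhantomData<T>,
}

// Contravariant in T: behaves like a function that consumes a T.
pub struct Sink<T> {
    id: u32,
    marker: PhantomData<fn(T)>,
}

// Compiles because Id is covariant: a longer lifetime may shrink.
fn shorten<'a>(id: Id<&'static str>) -> Id<&'a str> {
    id
}

// Compiles because Sink is contravariant: a shorter lifetime may grow.
fn lengthen<'a>(sink: Sink<&'a str>) -> Sink<&'static str> {
    sink
}

fn main() {
    let a: Id<&'static str> = Id { id: 1, marker: PhantomData };
    assert_eq!(shorten(a).id, 1);

    let s: Sink<&str> = Sink { id: 2, marker: PhantomData };
    assert_eq!(lengthen(s).id, 2);
}
```

Swapping the two marker types would make both functions fail to compile, which is the practical effect of the variance choice.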

No, and you can use #[repr(transparent)] to confirm and emphasize that.

This works: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=96cb2f4225ee91a88f46a542500e9d11

use std::marker::PhantomData;

#[repr(transparent)]
pub struct Id<T> {
    id: u32,
    _phantom: PhantomData<T>,
    _other_1_zsts_too_if_you_want: (),
}

But this doesn't work https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=a36b90151bade80c49a77323a081ef97

#[repr(transparent)]
pub struct Id {
    id: u32,
    _something_not_a_zst: i32,
}

because

error[E0690]: transparent struct needs at most one non-zero-sized field, but has 2
 --> src/lib.rs:3:1
  |
3 | pub struct Id {
  | ^^^^^^^^^^^^^ needs at most one non-zero-sized field, but has 2
4 |     id: u32,
  |     ------- this field is non-zero-sized
5 |     _something_not_a_zst: i32,
  |     ------------------------- this field is non-zero-sized

Tuple structs are sugar for a braced struct, plus the constructor function (& pattern): https://rust-lang.github.io/rfcs/1506-adt-kinds.html#tuple-structs

This is so true that you can even construct them (and match them and such) via braced-struct syntax:

struct Id(u32);

let id = Id { 0: 123 };

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=6e6f73119fda1329bd2c452b1bcb62b2
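
For the "match them and such" part, a minimal sketch; the raw helper is a hypothetical name added for illustration:

```rust
struct Id(u32);

// Destructure a tuple struct with a braced pattern naming field `0`.
fn raw(id: &Id) -> u32 {
    let Id { 0: value } = id;
    *value
}

fn main() {
    // Both construction syntaxes produce the same type.
    let a = Id { 0: 123 };
    let b = Id(123);
    assert_eq!(a.0, b.0);
    assert_eq!(raw(&a), 123);

    // Braced patterns also work in match arms.
    match b {
        Id { 0: 0 } => println!("zero"),
        Id { 0: n } => println!("non-zero: {}", n),
    }
}
```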

I've gone with:

#[repr(transparent)]
#[serde(transparent)]
pub struct ID<T> {
    id: U53,
    // The `*const T` is a trick to tell the compiler that we don't own anything.
    marker: PhantomData<*const T>,
}

Note that *const T is a poor way of doing this, because it will make your type !Send + !Sync. That will in turn annoy downstream users of the code, because a trivial ID type like that should most likely be Send + Sync.

If you are trying to express "this type does not own a T but it stands in for a T and it doesn't require dropck", use PhantomData<fn() -> T> instead. The full list of such patterns can be found in the Nomicon.
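
A minimal sketch of the difference, using u32 in place of the poster's U53 type and a hypothetical assert_send_sync helper:

```rust
use std::marker::PhantomData;
use std::rc::Rc;

#[repr(transparent)]
pub struct Id<T> {
    id: u32,
    // fn() -> T is Send + Sync for every T and does not own a T.
    marker: PhantomData<fn() -> T>,
}

impl<T> Id<T> {
    pub fn new(id: u32) -> Self {
        Id { id, marker: PhantomData }
    }

    pub fn get(&self) -> u32 {
        self.id
    }
}

// Hypothetical helper: compiles only if X is Send + Sync.
fn assert_send_sync<X: Send + Sync>() {}

fn main() {
    // Rc<String> is neither Send nor Sync, yet Id<Rc<String>> is both.
    // With PhantomData<*const T> this line would not compile.
    assert_send_sync::<Id<Rc<String>>>();

    // The id can therefore be moved into another thread.
    let id: Id<Rc<String>> = Id::new(42);
    let handle = std::thread::spawn(move || id.get());
    assert_eq!(handle.join().unwrap(), 42);
}
```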

Paranoia is starting to grow... should I, in general in my day-to-day Rusting, be defensively marking things as #[repr(transparent)] on the off-chance that Rust would otherwise decide not to Do The Right Thing?

No.

I should hope not. Perhaps I need to talk to the Nomicon.

It depends what your day-to-day is.

In safe code, you should just let repr(Rust) do its thing, and not worry about it. That way you'll automatically get new optimizations, like the one described in "Optimizing Rust Struct Size: A 6-month Compiler Development Project" (Blindly Coding), that make your code more memory-efficient without you needing to do anything but recompile on the newer toolchain.

If you're doing unsafe, though, you need to be super-careful about every assumption on which you're relying. To re-use a previous post, check out "#[repr(transparent)] – Why?", post #6 by scottmcm.

And, of course, if one needs to make assumptions about the layout of a type, #[repr(transparent)] is not enough, and one may even need #[repr(C)]. (That should be the exception, rather than the norm, though.)
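
As a rough illustration of what #[repr(C)] pins down, here is a sketch with a hypothetical Header struct. The concrete numbers assume a target where u32 has 4-byte alignment, which holds on common platforms.

```rust
use std::mem::{align_of, size_of};

// Hypothetical struct; repr(C) keeps the fields in declaration order
// and pads them as a C compiler would.
#[repr(C)]
struct Header {
    tag: u8,
    len: u32,
    flags: u8,
}

fn main() {
    let h = Header { tag: 1, len: 2, flags: 3 };
    let base = &h as *const Header as usize;

    // Field offsets follow declaration order, with padding before `len`.
    assert_eq!(&h.tag as *const u8 as usize - base, 0);
    assert_eq!(&h.len as *const u32 as usize - base, 4);
    assert_eq!(&h.flags as *const u8 as usize - base, 8);

    // Trailing padding rounds the size up to a multiple of the alignment.
    assert_eq!(align_of::<Header>(), align_of::<u32>());
    assert_eq!(size_of::<Header>(), 12);
}
```

Without the attribute the compiler is free to reorder the fields, so none of these offsets could be relied on.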
