PhantomData<T> vs PhantomData<fn(T) -> T>, what about Send and Sync?

Hi all, I wondered about the following thing regarding PhantomData.

If I include PhantomData<T> in a struct, I understand that if T is !Send or !Sync, my struct will also be !Send or !Sync, respectively.

What if I instead use PhantomData<fn(T) -> T>? Then I get invariance. But how does it behave with regard to Send and Sync? I would guess that a function which takes a non-Send or non-Sync value isn't non-Send or non-Sync itself?

There is a table in the Rust reference explaining (co/contra)variance in regard to different uses of PhantomData, but it doesn't talk about Send and Sync.

Edit: the section in the reference actually isn't about PhantomData.

Neither does the table in the Nomicon, and additionally, I believe the remark about drop check is outdated (unless you use eyepatch).

Your value will still be both Send and Sync.

use std::marker::PhantomData;

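// Rc<u8> is neither Send nor Sync.
type T = std::rc::Rc<u8>;

// These only compile if the type argument satisfies the bound.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}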

fn main() {
    assert_send::<PhantomData<fn(T) -> T>>();
    assert_sync::<PhantomData<fn(T) -> T>>();
}
   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 1.42s
:scream:

Maybe a note should be added to these tables? I (wrongly) used these different "styles" of PhantomData to change variance only.

P.S.: I just noticed that so far I have only used them on lifetimes, so it won't be a problem for my previous uses.

Well, that's not about the "styles". PhantomData<T> works "as if" the struct contains T. So, PhantomData<fn(T) -> T> works "as if" the struct contains a function pointer. Function pointers are always Send and Sync, no matter the types involved; so the corresponding PhantomData is Send and Sync, too.
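
To make the "as if" rule concrete, here is a minimal sketch (not from the posts above). `Probe` and `Fallback` are hypothetical helper names; they rely on inherent associated consts taking priority over trait ones, so the check yields a `bool` instead of a compile error. `Rc<u8>` stands in for any type that is neither Send nor Sync.

```rust
use std::marker::PhantomData;
use std::rc::Rc;

// Hypothetical helper: reports whether `T` is Send / Sync as a `bool`.
#[allow(dead_code)]
struct Probe<T: ?Sized>(PhantomData<T>);

// Fallback answers, used when the inherent impls below do not apply.
trait Fallback {
    const IS_SEND: bool = false;
    const IS_SYNC: bool = false;
}
impl<T: ?Sized> Fallback for Probe<T> {}

// Inherent consts win over the trait consts whenever the bound holds.
impl<T: ?Sized + Send> Probe<T> {
    const IS_SEND: bool = true;
}
impl<T: ?Sized + Sync> Probe<T> {
    const IS_SYNC: bool = true;
}

type FnPtr = fn(Rc<u8>) -> Rc<u8>;

// (is Send, is Sync) for a marker that "contains" an Rc<u8>.
fn plain_marker() -> (bool, bool) {
    (
        <Probe<PhantomData<Rc<u8>>>>::IS_SEND,
        <Probe<PhantomData<Rc<u8>>>>::IS_SYNC,
    )
}

// (is Send, is Sync) for the function pointer type itself.
fn fn_pointer() -> (bool, bool) {
    (<Probe<FnPtr>>::IS_SEND, <Probe<FnPtr>>::IS_SYNC)
}

// (is Send, is Sync) for a marker that "contains" the function pointer.
fn fn_marker() -> (bool, bool) {
    (
        <Probe<PhantomData<FnPtr>>>::IS_SEND,
        <Probe<PhantomData<FnPtr>>>::IS_SYNC,
    )
}

fn main() {
    println!("PhantomData<Rc<u8>>:           {:?}", plain_marker());
    println!("fn(Rc<u8>) -> Rc<u8>:          {:?}", fn_pointer());
    println!("PhantomData<fn(Rc<u8>) -> ..>: {:?}", fn_marker());
}
```

Only the plain `PhantomData<Rc<u8>>` should report `false`; the function pointer and the marker wrapping it report `true` for both traits, which matches the explanation that the marker simply inherits whatever the "contained" type is.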

Yeah it makes sense. I still think a note that this affects Send and Sync might be helpful, because it's suggested in the context of using them for variance:

Table of PhantomData patterns

Here’s a table of all the wonderful ways PhantomData could be used:

Phantom type                 'a          T
PhantomData<T>               -           covariant (with drop check)
PhantomData<&'a T>           covariant   covariant
PhantomData<&'a mut T>       covariant   invariant
PhantomData<*const T>        -           covariant
PhantomData<*mut T>          -           invariant
PhantomData<fn(T)>           -           contravariant
PhantomData<fn() -> T>       -           covariant
PhantomData<fn(T) -> T>      -           invariant
PhantomData<Cell<&'a ()>>    invariant   -
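
As a rough illustration of what the variance columns mean in practice, the sketch below uses three hypothetical marker structs (the names `Covariant`, `Contravariant` and `Invariant` are chosen here, they are not from the Nomicon) and checks which lifetime conversions the compiler accepts.

```rust
use std::cell::Cell;
use std::marker::PhantomData;

// Covariant in 'a, like PhantomData<&'a T> in the table.
struct Covariant<'a>(PhantomData<&'a ()>);

// Contravariant in 'a, because 'a appears in argument position of a fn pointer.
struct Contravariant<'a>(PhantomData<fn(&'a ())>);

// Invariant in 'a, like PhantomData<Cell<&'a ()>> in the table.
#[allow(dead_code)]
struct Invariant<'a>(PhantomData<Cell<&'a ()>>);

// Covariance: a value with a longer lifetime is accepted where a shorter
// one is expected.
fn shorten<'short, 'long: 'short>(x: Covariant<'long>) -> Covariant<'short> {
    x
}

// Contravariance: the direction is reversed.
fn lengthen<'short, 'long: 'short>(x: Contravariant<'short>) -> Contravariant<'long> {
    x
}

// Invariance: neither direction is accepted. Uncommenting the function
// below is expected to produce a lifetime error.
//
// fn shorten_invariant<'short, 'long: 'short>(x: Invariant<'long>) -> Invariant<'short> {
//     x
// }

fn main() {
    let co: Covariant<'static> = Covariant(PhantomData);
    let _shorter = shorten(co);
    let _longer: Contravariant<'static> = lengthen(Contravariant(PhantomData));
    println!("variance checks compiled");
}
```

Note that none of this says anything about Send or Sync; those follow from the "contained" type, independently of the variance columns.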

But, it doesn't affect it with function pointers?

It was confusing to me that turning PhantomData<T> into PhantomData<fn(T) -> T> affects more than variance: it will also allow my type to be Send or Sync even if T is not.

The table suggests that the difference is variance only. Of course that was just a wrong conclusion on my part, but I would assume other people might draw the same wrong conclusion when looking at the table. Or not?

Perhaps others would also make the same assumption.

So I decided to open an issue. But I found it was already reported previously:

So basically, the thing is: PhantomData isn't special as in "PD<fn> is treated differently compared to any other PD<T>". Rather, the compositional nature of types (what PD pretends to be, and what happens when a struct contains a fn pointer) gives rise to an idiom that you can use to achieve the specific effect you are observing.

Neither the language nor PhantomData was designed with this single behavior in mind, just like the language is not specifically designed to implement e.g. blockchains, but it happens to be used for that a lot, because its perf and safety characteristics fit the distributed-cryptography-and-database use case.

Yeah, I see that PhantomData<whichever> just makes the struct behave like it contained some field of type whichever. That means there are more consequences than just variance on its type arguments (or allowing unused type arguments syntactically).

In that matter, the documentation isn't wrong. It was me drawing wrong conclusions.

Until now, I had the following strategy: If the compiler complains about an unused type argument T, then just add a PhantomData<T> to the struct. But apparently that's bad! One needs to think more carefully about it (e.g. whether it's desired that T being !Send or !Sync will "infect" the struct, making it !Send and/or !Sync too).

Yes, exactly! For example, when I'm writing strongly-typed database IDs (aka struct Uid<T>(u64)), I'm never blindly slapping a PD<T> onto them, because that potentially makes a poor raw integer newtype cease to be Send + Sync, which in turn is a huge pain for downstream code. So yes, you have to carefully consider how your type uses the type parameter, and choose the appropriate idiom.

Something that table doesn't make clear is how to decide which variance is correct for one's own pointer-based struct. AFAICT, it depends on the operations that the struct's public interface allows. Perhaps the docs could be improved in that respect.

What would you use in your case then? PhantomData<fn(T) -> T>?

I have the following case:

/// Constraints on database (type argument `C` to [`Db`])
pub trait Constraint: 'static {
    /// Duplicate keys allowed?
    const DUPLICATE_KEYS: bool;
}

/// Type argument to [`Db`] indicating unique keys
pub struct KeysUnique;

impl Constraint for KeysUnique {
    const DUPLICATE_KEYS: bool = false;
}

/// Type argument to [`Db`] indicating non-unique keys
pub struct KeysDuplicate;

impl Constraint for KeysDuplicate {
    const DUPLICATE_KEYS: bool = true;
}

/// Database handle
#[derive(Debug)]
pub struct Db<K: ?Sized, V: ?Sized, C> {
    key: PhantomData<fn(K) -> K>,
    value: PhantomData<fn(V) -> V>,
    constraint: PhantomData<fn(C) -> C>,
    backend: ArcByAddr<DbBackend>,
}

/// Pointer types that can be converted into an owned type
pub trait PointerIntoOwned: Sized {
    /// The type the pointer can be converted into
    type Owned;
    /// Convert into owned type
    fn into_owned(self) -> Self::Owned;
}

/// Types that can be stored
pub unsafe trait Storable: Ord + 'static {
    /* … */
    /// Pointer to aligned version of Self
    type AlignedRef<'a>: Deref<Target = Self> + PointerIntoOwned;
    /* … */
}

/// Read-write or read-only transaction
pub trait Txn {
    /// Get reference to value in database
    fn get<K, V, C>(
        &self,
        db: &Db<K, V, C>,
        key: &K,
    ) -> Result<Option<V::AlignedRef<'_>>, io::Error>
    where
        K: ?Sized + Storable,
        V: ?Sized + Storable,
        C: Constraint;
    /// Get owned value from database
    fn get_owned<'a, K, V, C>(
        &'a self,
        db: &Db<K, V, C>,
        key: &K,
    ) -> Result<Option<<<V as Storable>::AlignedRef<'a> as PointerIntoOwned>::Owned>, io::Error>
    where
        K: ?Sized + Storable,
        V: ?Sized + Storable,
        C: Constraint,
    {
        Ok(self.get(db, key)?.map(|x| x.into_owned()))
    }
    /* … */
}

impl<'a> TxnRw<'a> {
    /* … */
    /// Delete all values from database that match a given key
    pub fn delete<K, V, C>(&mut self, db: &Db<K, V, C>, key: &K) -> Result<bool, io::Error>
    where
        K: ?Sized + Storable,
        V: ?Sized + Storable,
        C: Constraint,
    {
        /* … */
    }
}

Edit: Extended code excerpt to cover also get and get_owned methods.

Is PhantomData<fn(T) -> T> the right choice here?

For my database ID, it's PhantomData<fn() -> T>. That makes it covariant, not invariant. Basically, fn() -> T is very similar to T, but instead of being a value, it produces a value.
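
A minimal sketch of what such an ID type might look like follows. The `new` and `raw` methods, the `Session` type and the `send_id_to_thread` function are hypothetical names made up for illustration; only the `Uid<T>` newtype around a `u64` and the `PhantomData<fn() -> T>` marker come from the discussion.

```rust
use std::marker::PhantomData;
use std::rc::Rc;
use std::thread;

// Typed ID: `fn() -> T` keeps `T` covariant, and since the marker "contains"
// a function pointer, it is Send + Sync regardless of `T`.
struct Uid<T>(u64, PhantomData<fn() -> T>);

impl<T> Uid<T> {
    fn new(raw: u64) -> Self {
        Uid(raw, PhantomData)
    }

    fn raw(&self) -> u64 {
        self.0
    }
}

// Manual impls: `#[derive(Clone, Copy)]` would add `T: Clone` / `T: Copy`
// bounds, which an ID does not need.
impl<T> Clone for Uid<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for Uid<T> {}

// A record type that is deliberately neither Send nor Sync.
#[allow(dead_code)]
struct Session {
    cache: Rc<String>,
}

// Compiles because Uid<Session> is Send even though Session is not.
fn send_id_to_thread(id: Uid<Session>) -> u64 {
    thread::spawn(move || id.raw()).join().unwrap()
}

fn main() {
    let id = Uid::<Session>::new(42);
    println!("{}", send_id_to_thread(id)); // prints 42
}
```

With `PhantomData<T>` instead of `PhantomData<fn() -> T>`, the `thread::spawn` call would be rejected for `Uid<Session>`, because the ID would then inherit `Session`'s lack of Send.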

I'm not immediately sure about your code. I'll have a look later.

I generally think PhantomData is too overloaded, and we need to split it to multiple structs handling variance, dropck, auto traits etc.

I think you could get a lot of the benefits of that for users of PhantomData just with type aliases like these:

type Contravariant<T> = PhantomData<fn(T)>;
type Covariant<T> = PhantomData<fn() -> T>;
type Invariant<T> = PhantomData<fn(T) -> T>;
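
For what it's worth, here is a small sketch of how such aliases might read at a use site. The `Handle` struct and the `is_send_sync` helper are hypothetical and exist only for this illustration.

```rust
use std::marker::PhantomData;
use std::rc::Rc;

#[allow(dead_code)]
type Contravariant<T> = PhantomData<fn(T)>;
type Covariant<T> = PhantomData<fn() -> T>;
type Invariant<T> = PhantomData<fn(T) -> T>;

// Hypothetical struct, for illustration only.
struct Handle<'a, T> {
    id: u32,
    _lifetime: Invariant<&'a ()>, // invariant in 'a
    _item: Covariant<T>,          // covariant in T
}

impl<'a, T> Handle<'a, T> {
    fn new(id: u32) -> Self {
        // The aliases only name types; the value is still written `PhantomData`.
        Handle {
            id,
            _lifetime: PhantomData,
            _item: PhantomData,
        }
    }
}

// Compiles only for type arguments that are both Send and Sync.
fn is_send_sync<X: Send + Sync>() -> bool {
    true
}

fn main() {
    let h: Handle<'static, Rc<u8>> = Handle::new(7);
    println!("id = {}", h.id);
    // All three aliases wrap function pointers, so the handle stays
    // Send + Sync even though Rc<u8> is neither.
    println!("{}", is_send_sync::<Handle<'static, Rc<u8>>>());
}
```

The field types now state the intended variance, but all three aliases also make the field Send + Sync unconditionally, so they would not be a drop-in replacement for a plain `PhantomData<T>` where inheriting T's auto traits is the desired behavior.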

I would guess that having one underlying thing would be easier for compiler developers, but that is just a guess.

Indeed, but the problem is that nobody bothers to do that, as it's easier to just go with PhantomData (there are crates that do these things, but they're not commonly downloaded). I think if these were in std, people would use them more.

I’m unaware of a reason why these type aliases couldn’t be added to the standard library.

I am also under the impression that since such a change would mainly consist of documentation, and the only added code would be type aliases, that change could just be a PR to the main Rust repo. That is, the change probably doesn’t need to go through the RFC process.

I’m basing this on the following sentence from the RFC repo's README:

Many changes, including bug fixes and documentation improvements can be implemented and reviewed via the normal GitHub pull request workflow.

IMO the biggest problem is not PhantomData's documentation but the fact that the compiler always suggests it when it encounters an unused generic parameter, even though the compiler itself has no way to know what kind of PhantomData is appropriate (obviously, since the whole point of it is to tell the compiler what it can't figure out on its own). PhantomData is not the best solution to the majority of cases where the error message suggests it, and PhantomData<T> is not the least error-prone suggestion it could make, but that's what it suggests anyway. This leads a lot of programmers to believe they should be using PhantomData a lot more than they really ought to, and discourages people from thinking critically about what it means for a struct to be generic (as opposed to a trait, an impl, or a function).

Adding type aliases wouldn't fix this problem; it would just be more type theory gobbledygook that most people would not bother to understand before blindly following the compiler suggestion.
