Safe use cases for `PhantomData`

Continuing the discussion from "Rust is not as hard as the topics discussed here!":

I file this under Tutorial somewhat arbitrarily. "Exploration of a concept" might be more appropriate. I am sort of curious how many of my prior uses were self-cargo-culted, as I saw a few that could probably be replaced with trait bounds or other approaches. (Maybe I should have filed it under Code Review :wink:.)

Anyway, I think at least some of these are legit, and thought I'd share them. After writing it up, I think most of them come down to:

  • Storing (the name of) an unknown type for later use
  • Being able to implement generically over a type that's not owned
  • Some maneuvers that use generic impls sorta like macros, to satisfy coherence

Generic over associated type (without a new trait)

I wanted something iterator-based for producing multiple types; there was one obvious implementation that covered all the types, but Iterator isn't generic over the produced Item. So:

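use std::marker::PhantomData;
use std::str::FromStr;

// (I was also avoiding borrowing for some reason)
pub struct SplitString<I> {
    data: String,
    offset: usize,
    // only records which type each piece is parsed into
    to_type: PhantomData<I>,
}

impl<I> Iterator for SplitString<I> where I: Default + FromStr {
    type Item = I;
    // fn next(&mut self) -> Option<I> { /* parse the next piece of `data` */ }
    // ...
}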
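For a runnable version, here is a minimal sketch that fills in `next`. The splitting policy is an assumption, not the original: pieces are whitespace-separated, and a piece that fails to parse becomes `I::default()` (one plausible use of the `Default` bound). The `new` constructor is also hypothetical.

```rust
use std::marker::PhantomData;
use std::str::FromStr;

/// Splits an owned `String` on whitespace and parses each piece as `I`.
pub struct SplitString<I> {
    data: String,
    offset: usize,
    to_type: PhantomData<I>,
}

impl<I> SplitString<I> {
    // Hypothetical constructor.
    pub fn new(data: String) -> Self {
        SplitString { data, offset: 0, to_type: PhantomData }
    }
}

impl<I> Iterator for SplitString<I>
where
    I: Default + FromStr,
{
    type Item = I;

    fn next(&mut self) -> Option<I> {
        // Skip whitespace starting from the current byte offset.
        let rest = &self.data[self.offset..];
        let trimmed = rest.trim_start();
        if trimmed.is_empty() {
            self.offset = self.data.len();
            return None;
        }
        let start = self.offset + (rest.len() - trimmed.len());
        // The piece ends at the next whitespace character (or end of string).
        let len = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        let piece = &self.data[start..start + len];
        self.offset = start + len;
        // Assumed policy: unparsable pieces become `I::default()`.
        Some(piece.parse::<I>().unwrap_or_default())
    }
}
```

The same struct yields `i32`s, `f64`s, or anything else that is `Default + FromStr`, chosen only by the type parameter, because `Iterator::Item` can't be made generic directly.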

And another similar case (there's no trait for `as` casts):

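use std::marker::PhantomData;

pub struct ViaAs<Inner, Frm, To> {
    pub inner: Inner,
    pub convert: PhantomData<(Frm, To)>,
}

impl<Inner, Frm, To> ViaAs<Inner, Frm, To> {
    pub fn new(inner: Inner) -> Self {
        ViaAs { inner, convert: PhantomData }
    }
}

// in some macro (`$fm` is the source type, `$to` the target type)
        impl<I: Iterator<Item=$fm>> Iterator for ViaAs<I, $fm, $to> {
            type Item = $to;
            fn next(&mut self) -> Option<Self::Item> {
                Some(self.inner.next()? as $to)
            }
        }

// using it elsewhere (usually another generic or macro context...)
for f in ViaAs::<_, _, f64>::new(iter) { /* ... */ }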
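A self-contained sketch of the macro side, assuming a hypothetical `impl_via_as!` macro that stamps out one `Iterator` impl per (from, to) pair. The phantom `Frm` and `To` parameters are what make each impl's self type distinct, so the impls don't overlap.

```rust
use std::marker::PhantomData;

pub struct ViaAs<Inner, Frm, To> {
    pub inner: Inner,
    pub convert: PhantomData<(Frm, To)>,
}

impl<Inner, Frm, To> ViaAs<Inner, Frm, To> {
    pub fn new(inner: Inner) -> Self {
        ViaAs { inner, convert: PhantomData }
    }
}

// Hypothetical macro: one impl per `from => to` pair.
macro_rules! impl_via_as {
    ($($fm:ty => $to:ty),* $(,)?) => {
        $(
            impl<I: Iterator<Item = $fm>> Iterator for ViaAs<I, $fm, $to> {
                type Item = $to;
                fn next(&mut self) -> Option<$to> {
                    // `as` works here because the macro sees concrete types.
                    Some(self.inner.next()? as $to)
                }
            }
        )*
    };
}

impl_via_as!(u8 => f64, i32 => f64, f64 => i32);
```

Usage then looks like `ViaAs::<_, u8, f64>::new(bytes.into_iter())`; here the source type is spelled out to keep inference simple.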

Generic on-demand types

That's a poor description. I need to store a type for invoking a generic method later. Not an instance of a type, but the type itself, so I can name it.

Hopefully this code sketch makes it more clear:

#[derive(Clone)]
pub struct Ticket { /* is just metadata for creation of a Tour type */ }

// Tours are created on demand and call a callback for each point in
// the tour (with additional data to be used by the callback)
pub trait Tour {
    fn new(point: Point, ticket: Ticket) -> Self;
    fn tour<F: FnMut(Point, Weight)>(&self, f: F);
}

// Each `Technique` is a ZST
pub trait Visitor<Technique>: Sized {
    // Note how this is generic over `Tour`s
    fn visit<T: Tour>(data: Data<Self>, ticket: Ticket) -> Data<Self>;
}

impl Visitor<SomeTechnique> for SomeDataType {
    fn visit<T: Tour>(data: Data<Self>, ticket: Ticket) -> Data<Self> {
        let mut output = todo!();
        for point in data {
            // a new tour per point, so the ticket is cloned each time
            let tour = T::new(point, ticket.clone());
            tour.tour(|pt, wt| { /* some callback modifying output */ });
        }
        output
    }
}

Then, elsewhere, if I want to store an action to be invoked later/repeatedly, I use this:

pub struct TravelPlan<Technique, T> {
    _tour: PhantomData<T>,
    _technique: Technique,
    pub ticket: Ticket,
}

And that type implements some "give me data and I'll generate you some new data" trait by calling the appropriate

<DataType as Visitor<Technique>>::visit::<T>(data, self.ticket)

This way I can implement a new Tour and it can be applied using every Technique, and vice versa. The Techniques are all ZSTs (it's their implementation that counts), so I store them directly, even though I am similarly only using the type and not the value. The Tours are typically not ZSTs, and no Tour value exists until `visit` creates one, so they only get a `PhantomData`.
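A compact, runnable sketch of the whole pattern. Everything concrete here is hypothetical: `Point` and `Weight` are simple aliases, `Ticket` carries a `radius` and is `Copy`, `Row` is a toy tour, `TotalWeight` is a toy technique, and `visit` takes a slice and returns `Self` instead of the post's `Data<Self>`.

```rust
use std::marker::PhantomData;

// Hypothetical stand-ins for the post's types.
pub type Point = (i32, i32);
pub type Weight = f64;

#[derive(Clone, Copy)]
pub struct Ticket {
    pub radius: i32,
}

pub trait Tour {
    fn new(point: Point, ticket: Ticket) -> Self;
    fn tour<F: FnMut(Point, Weight)>(&self, f: F);
}

/// Toy tour: the point plus its horizontal neighbors within `radius`.
pub struct Row {
    center: Point,
    radius: i32,
}

impl Tour for Row {
    fn new(center: Point, ticket: Ticket) -> Self {
        Row { center, radius: ticket.radius }
    }
    fn tour<F: FnMut(Point, Weight)>(&self, mut f: F) {
        for dx in -self.radius..=self.radius {
            f((self.center.0 + dx, self.center.1), 1.0);
        }
    }
}

/// A technique is a ZST; only its `Visitor` impls matter.
#[derive(Clone, Copy)]
pub struct TotalWeight;

pub trait Visitor<Technique>: Sized {
    fn visit<T: Tour>(data: &[Point], ticket: Ticket) -> Self;
}

impl Visitor<TotalWeight> for f64 {
    fn visit<T: Tour>(data: &[Point], ticket: Ticket) -> Self {
        let mut output = 0.0;
        for &point in data {
            // The tour only comes into existence here, on demand.
            T::new(point, ticket).tour(|_pt, wt| output += wt);
        }
        output
    }
}

/// Stores the technique by value (a ZST) and the tour only as a type.
pub struct TravelPlan<Technique, T> {
    _tour: PhantomData<T>,
    _technique: Technique,
    pub ticket: Ticket,
}

impl<Technique, T: Tour> TravelPlan<Technique, T> {
    pub fn new(technique: Technique, ticket: Ticket) -> Self {
        TravelPlan { _tour: PhantomData, _technique: technique, ticket }
    }

    /// Runs the stored plan: names `T` without ever holding a `T`.
    pub fn run<D: Visitor<Technique>>(&self, data: &[Point]) -> D {
        D::visit::<T>(data, self.ticket)
    }
}
```

For example, `TravelPlan::<TotalWeight, Row>::new(TotalWeight, Ticket { radius: 1 })` can be stored and later run against any slice of points.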

Associated item substitute

I had a set of types that could be converted between one another using a matrix conversion. I attempted to have something like

trait MatrixConvert<X> {
    const MATRIX: [[Value; SZ]; SZ];
}
impl MatrixConvert<SpecificX> for SpecificY { /* ... */ }

But I ran into problems I couldn't solve at the time. (I'm not sure whether Rust or I could solve them now; my memory is too vague on what the actual problem was. Orphan rules, maybe?)

Instead I ended up with something like

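use std::marker::PhantomData;

pub struct MatrixConvert<T, U> {
    left: PhantomData<T>,
    right: PhantomData<U>,
    matrix: [[Value; SZ]; SZ],
}
// ...
impl Default for MatrixConvert<SpecificX, SpecificY> { /* ... */ }
// ...
// (As written, a blanket impl of the foreign trait `From` for an uncovered `Y`
// is rejected by the orphan rules (E0210); this needs a local trait.)
impl<X, Y> From<X> for Y where MatrixConvert<X, Y>: Default, /* ... */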
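A runnable sketch of the same idea, assuming a hypothetical local `ConvertFrom` trait in place of `From`, a hypothetical `AsVector` helper trait, `SZ = 2`, `Value = f64`, and two made-up coordinate types. The `Default` impl on `MatrixConvert<Screen, World>` plays the role of the associated constant.

```rust
use std::marker::PhantomData;

// Hypothetical stand-ins for the post's `SZ` and `Value`.
const SZ: usize = 2;
type Value = f64;

/// Hypothetical: types that are just a fixed-size vector of `Value`s.
pub trait AsVector: Sized {
    fn to_vector(&self) -> [Value; SZ];
    fn from_vector(v: [Value; SZ]) -> Self;
}

pub struct MatrixConvert<T, U> {
    left: PhantomData<T>,
    right: PhantomData<U>,
    matrix: [[Value; SZ]; SZ],
}

/// A local conversion trait, so the blanket impl below is allowed.
pub trait ConvertFrom<X> {
    fn convert_from(x: &X) -> Self;
}

impl<X: AsVector, Y: AsVector> ConvertFrom<X> for Y
where
    MatrixConvert<X, Y>: Default,
{
    fn convert_from(x: &X) -> Self {
        let m = MatrixConvert::<X, Y>::default().matrix;
        let v = x.to_vector();
        let mut out = [0.0; SZ];
        for (o, row) in out.iter_mut().zip(m.iter()) {
            // Each output entry is one matrix row dotted with the input.
            *o = row.iter().zip(v.iter()).map(|(a, b)| a * b).sum();
        }
        Y::from_vector(out)
    }
}

// Hypothetical example types.
pub struct Screen(pub [Value; SZ]);
#[derive(Debug, PartialEq)]
pub struct World(pub [Value; SZ]);

impl AsVector for Screen {
    fn to_vector(&self) -> [Value; SZ] { self.0 }
    fn from_vector(v: [Value; SZ]) -> Self { Screen(v) }
}
impl AsVector for World {
    fn to_vector(&self) -> [Value; SZ] { self.0 }
    fn from_vector(v: [Value; SZ]) -> Self { World(v) }
}

// Only this direction is defined: scale by 2 and flip the y axis.
impl Default for MatrixConvert<Screen, World> {
    fn default() -> Self {
        MatrixConvert {
            left: PhantomData,
            right: PhantomData,
            matrix: [[2.0, 0.0], [0.0, -2.0]],
        }
    }
}
```

Since there is no `MatrixConvert<World, Screen>: Default`, `Screen::convert_from(&world)` simply doesn't compile, which is the point of keying the matrix on the phantom pair.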

More orphan rule shenanigans

This is getting too long for me to confirm this case properly, but I know I've used things like this to dodge the orphan rules before. Very roughly:

// `P` is a phantom in `Struct`
impl<P, T> Trait<P> for Struct<P, T> where T: SomeOtherTrait<P> { /* ... */ }

// Without the phantom in `Struct`, this errors with something like:
//   `SpecificTwo` may some day implement `SomeOtherTrait<SpecificOne>`
impl Trait<SpecificOne> for Struct<SpecificOne, SpecificTwo> { /* ... */ }

(The specific case I'm basing this on has more blanket implementations between multiple traits, so I suspect I didn't really capture the problem here. But it's too complex to want to reproduce right now.)

ZST

I mentioned this one before, but I've used PhantomData where I really just needed ZSTs for storing types (basically), e.g. `Kelvin` or `Celsius`. I'm not sure why I didn't just use the ZSTs directly. Maybe I didn't trust them to be zero-sized? I think today that's what I'd do :man_shrugging:. Maybe I was avoiding having to annotate with Copy bounds or something.
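A small sketch contrasting the two options, with hypothetical `Kelvin`/`Celsius` markers: recording the unit via `PhantomData<U>`, or storing the zero-sized unit value directly. Either way the wrapper is the same size as the bare `f64`.

```rust
use std::marker::PhantomData;

// Hypothetical unit markers; both are zero-sized types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Kelvin;
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Celsius;

/// Option 1: the unit is only a type parameter, recorded with `PhantomData`.
pub struct TempPhantom<U> {
    pub value: f64,
    _unit: PhantomData<U>,
}

impl<U> TempPhantom<U> {
    pub fn new(value: f64) -> Self {
        TempPhantom { value, _unit: PhantomData }
    }
}

/// Option 2: the unit is stored directly as a (zero-sized) value.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Temp<U> {
    pub value: f64,
    pub unit: U,
}

impl Temp<Celsius> {
    pub fn to_kelvin(self) -> Temp<Kelvin> {
        Temp { value: self.value + 273.15, unit: Kelvin }
    }
}
```

With option 2 you construct values as `Temp { value: 0.0, unit: Celsius }`, which reads naturally because the ZST has exactly one value.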


I almost never write unsafe code for my own memory management, so basically all of the uses of PhantomData I've encountered so far in real code were safe.

The most straightforward one comes up when I am writing a data abstraction/ORM layer. Specifically, when I need to create a Repository or Collection type that doesn't store the actual instances itself (since those are logically stored in the DB), but I still need to make it generic over the instance type so that it can give back statically-typed values rather than, say, a bunch of JSON. This is roughly what it looks like:

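use std::marker::PhantomData;
use serde::{Deserialize, Serialize};

struct Collection<T> {
    /// The backing `MongoDB` collection.
    inner: mongodb::coll::Collection,
    /// Just here so that the type parameter is used.
    _marker: PhantomData<T>,
}

impl<T> Collection<T>
where
    T: Serialize + for<'de> Deserialize<'de>
{
    fn find_by_id(&self, id: ObjectId) -> Result<T, Error> {
        // query `self.inner` for the document with this `_id`,
        // then deserialize it into a `T`
        todo!()
    }
}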
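A dependency-free sketch of the same shape, for trying it out without MongoDB or serde. Swapped-in assumptions: a hypothetical in-memory `RawStore` of untyped string records stands in for the database, and `FromStr` stands in for serde deserialization.

```rust
use std::collections::HashMap;
use std::marker::PhantomData;
use std::str::FromStr;

/// Hypothetical stand-in for the database: untyped records keyed by id.
pub struct RawStore {
    pub records: HashMap<u64, String>,
}

/// Typed view over the store; it never holds any `T` values itself.
pub struct Collection<'s, T> {
    inner: &'s RawStore,
    _marker: PhantomData<T>,
}

impl<'s, T: FromStr> Collection<'s, T> {
    pub fn new(inner: &'s RawStore) -> Self {
        Collection { inner, _marker: PhantomData }
    }

    /// Looks up a record and parses it as `T`; `None` if missing or unparsable.
    pub fn find_by_id(&self, id: u64) -> Option<T> {
        self.inner.records.get(&id)?.parse::<T>().ok()
    }
}
```

The same store can back `Collection::<i32>` and `Collection::<String>` at once; the type parameter only decides how records are interpreted on the way out.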

I wonder, is it reasonable to use "const generics" as an alternative? Like this:

struct Temp<const UNIT: char>(f64);

fn is_frozen(temp: Temp<'C'>) -> bool {
    temp.0 < 0.0
}

#[test]
fn test() {
    assert_eq!(is_frozen(Temp::<'C'>(-5.0)), true)
}


But I just noticed you can't use strings as const generic arguments (on stable, only integers, `bool`, and `char` are allowed). Besides, using a ZST might be cleaner than using a const string (or char) to specify the unit.

I can't think of many benefits unless you make use of the const parameter's value somehow. And you couldn't distinguish implementations by bounds on the const parameter, like you can with a type parameter.
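To illustrate what the const-generic version can still do, here is a small sketch (the `label` and `to_fahrenheit` methods are hypothetical): the const value is readable like an ordinary constant, and you can write an inherent impl for one specific value such as `Temp<'C'>`. What you can't write is a single impl for "every unit satisfying some trait".

```rust
struct Temp<const UNIT: char>(f64);

impl<const UNIT: char> Temp<UNIT> {
    /// The const parameter's value is usable at runtime.
    fn label(&self) -> String {
        format!("{} {}", self.0, UNIT)
    }
}

// Inherent impls for one specific const value are allowed.
impl Temp<'C'> {
    fn to_fahrenheit(&self) -> Temp<'F'> {
        Temp(self.0 * 9.0 / 5.0 + 32.0)
    }
}
```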


I would reach for a ZST in this case because then you can implement traits or hang extra methods on your Celsius and Fahrenheit types (e.g. fn freezing_point() -> f64).
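A sketch of that approach, assuming a hypothetical `Unit` trait implemented by `Celsius` and `Fahrenheit` ZSTs. One generic `is_frozen` then works for every unit through the `U: Unit` bound, which the `char` const parameter can't express.

```rust
use std::marker::PhantomData;

/// Hypothetical unit trait; each unit is a ZST implementing it.
pub trait Unit {
    const SYMBOL: char;
    fn freezing_point() -> f64;
}

pub struct Celsius;
pub struct Fahrenheit;

impl Unit for Celsius {
    const SYMBOL: char = 'C';
    fn freezing_point() -> f64 { 0.0 }
}

impl Unit for Fahrenheit {
    const SYMBOL: char = 'F';
    fn freezing_point() -> f64 { 32.0 }
}

pub struct Temp<U> {
    pub value: f64,
    _unit: PhantomData<U>,
}

impl<U: Unit> Temp<U> {
    pub fn new(value: f64) -> Self {
        Temp { value, _unit: PhantomData }
    }

    /// One impl for every unit, selected by the `U: Unit` bound.
    pub fn is_frozen(&self) -> bool {
        self.value < U::freezing_point()
    }

    pub fn label(&self) -> String {
        format!("{} {}", self.value, U::SYMBOL)
    }
}
```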


I've got one. I created a silly Timer that just sleeps a thread and periodically sends messages of some generic type. The struct doesn't need to store a value of that type; it just needs to know about it. Hence the PhantomData: cartunes/timer.rs at main · parasyte/cartunes (github.com)

And a usage example for this silly timer is in the update checker: cartunes/updates.rs at main · parasyte/cartunes (github.com)

Context: The whole thing just allows timer threads to clean up after themselves quickly (the update checking thread can be stopped by the user at any time) and lets the update checker periodically check for app updates.

edit: Whoops, I didn't mean to reply directly to your post. I am still not used to the Discourse UI.
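The cartunes code isn't reproduced here; this is a hypothetical sketch of the same idea. Assumptions: messages are produced with `T::default()`, and the "sleep" is a `recv_timeout` on a stop channel, so the thread exits promptly when asked to stop (or when the message receiver goes away). The timer only needs `T` as a type, hence `PhantomData<T>`.

```rust
use std::marker::PhantomData;
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;
use std::time::Duration;

/// Hypothetical timer: every `interval`, sends a `T::default()` message.
pub struct Timer<T> {
    interval: Duration,
    _message: PhantomData<T>,
}

impl<T: Default + Send + 'static> Timer<T> {
    pub fn new(interval: Duration) -> Self {
        Timer { interval, _message: PhantomData }
    }

    /// Spawns the timer thread. It stops when `stop` receives a value or is
    /// disconnected, or when the receiving side of `tx` has been dropped.
    pub fn spawn(self, tx: Sender<T>, stop: Receiver<()>) -> thread::JoinHandle<()> {
        thread::spawn(move || loop {
            match stop.recv_timeout(self.interval) {
                // No stop request within the interval: send a tick.
                Err(mpsc::RecvTimeoutError::Timeout) => {
                    if tx.send(T::default()).is_err() {
                        break;
                    }
                }
                // Stop requested, or the stop handle was dropped.
                _ => break,
            }
        })
    }
}
```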
