Modifying a generic definition of an enum widely used in the codebase

I need to modify a generic enum in a fairly large codebase. The enum is defined as

#[derive(Stuff)] 
pub enum Element<Out> {
    Item(Out),
    Timestamped(Out, Timestamp),
    Watermark(Timestamp),
    Terminate,
}

The problem is the Timestamp: it is often unused, since the Watermark and Timestamped variants are only used by certain types, but it still increases the size of the enum.
The enum is widely used, as there is an Operator trait that is key to the implementation

pub trait Operator<Out: Clone + Send + 'static> {
    fn next(&mut self) -> Element<Out>;
    // ...
}

I'm trying to find a way to refactor the enum so that the Timestamp becomes generic. One option is adding a second generic parameter, Element<Out, Ts>, that I can set to () if the timestamp is unused. The other is wrapping the Out type inside Element<Out>: a struct like Item<T> for non-timestamped items and an enum Timestamped<T, Ts> for those with timestamps, with different implementations depending on the type.
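To make the size concern concrete, here is a minimal sketch of the first option (a second type parameter). It assumes `Timestamp` is a plain `u64`, which is only a stand-in since the real type is not shown, and the two helper functions are hypothetical names used for illustration.

```rust
use std::mem::size_of;

// Hypothetical stand-in: the real `Timestamp` type is not shown in this thread.
pub type Timestamp = u64;

// `Element` with the timestamp type as a second generic parameter.
pub enum Element<Out, Ts> {
    Item(Out),
    Timestamped(Out, Ts),
    Watermark(Ts),
    Terminate,
}

/// Size in bytes of an element that can carry real timestamps.
pub fn size_with_ts<Out>() -> usize {
    size_of::<Element<Out, Timestamp>>()
}

/// Size in bytes of an element whose timestamp slot is the zero-sized `()`.
pub fn size_without_ts<Out>() -> usize {
    size_of::<Element<Out, ()>>()
}
```

For a small payload such as `u8`, `size_without_ts::<u8>()` should come out smaller than `size_with_ts::<u8>()`; the exact numbers depend on the layout the compiler picks.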

Is there a way to refactor the type without manually changing every fn and impl signatures using the type?

Are most of these signatures using Element<Foo> for concrete types Foo, or are they usually generic functions fn …<Out, …> using Element<Out>?


If many are using concrete types, and what you said about the Watermark and Timestamped variants only being used by certain types is consistently true, then an option would be to codify this dependency via some trait

trait HasTimestampInfo {
    type TimestampTy;
}

and implement this accordingly with

impl HasTimestampInfo for Foo {
    type TimestampTy = Timestamp;
}

for those types Foo that do use these enum variants, and

impl HasTimestampInfo for Bar {
    type TimestampTy = ();
}

for those types Bar that don’t. Or, maybe even nicer, use something uninhabited like Infallible.

impl HasTimestampInfo for Bar {
    type TimestampTy = Infallible;
}

Some overhead, of course, in the form of these impls, in case many types are affected, but depending on your usage, this might be reasonable.

Then Element becomes

pub enum Element<Out: HasTimestampInfo> {
    Item(Out),
    Timestamped(Out, Out::TimestampTy),
    Watermark(Out::TimestampTy),
    Terminate,
}

As hinted at above, generic code would still need to adapt; it could no longer be fn foo<Out>(x: &Element<Out>) or the like, but would need to get Out: HasTimestampInfo bounds.
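As a rough sketch of what that adaptation could look like for a map-style transformation: the input and output types both need the bound, and their timestamp types have to be tied together so the timestamp can be passed through unchanged. `Foo`, `Baz`, `Timestamp(u64)` and `map_element` are hypothetical names for illustration, not something from the original crate.

```rust
// Hypothetical stand-ins for types that are not shown in this thread.
pub struct Timestamp(pub u64);
pub struct Foo(pub u32);
pub struct Baz(pub u32);

pub trait HasTimestampInfo {
    type TimestampTy;
}
impl HasTimestampInfo for Foo {
    type TimestampTy = Timestamp;
}
impl HasTimestampInfo for Baz {
    type TimestampTy = Timestamp;
}

pub enum Element<Out: HasTimestampInfo> {
    Item(Out),
    Timestamped(Out, Out::TimestampTy),
    Watermark(Out::TimestampTy),
    Terminate,
}

// Generic code needs the bound on both sides, plus an equality constraint
// so that the timestamp of the input can be reused for the output.
pub fn map_element<In, Out, F>(e: Element<In>, f: F) -> Element<Out>
where
    In: HasTimestampInfo,
    Out: HasTimestampInfo<TimestampTy = In::TimestampTy>,
    F: FnOnce(In) -> Out,
{
    match e {
        Element::Item(x) => Element::Item(f(x)),
        Element::Timestamped(x, ts) => Element::Timestamped(f(x), ts),
        Element::Watermark(ts) => Element::Watermark(ts),
        Element::Terminate => Element::Terminate,
    }
}
```

The equality constraint `TimestampTy = In::TimestampTy` is one possible design; it means a map can never turn a timestamped stream into an untimestamped one or vice versa.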

Also, if any generic code wants to correctly handle the timestamp-containing variants, and possibly even get an actual Timestamp value out in those cases, you can e.g. create a conversion

use std::convert::Infallible;
impl From<Infallible> for Timestamp {
    fn from(x: Infallible) -> Timestamp {
        match x {}
    }
}

and add appropriate bounds to TimestampTy, e.g.

trait HasTimestampInfo {
    type TimestampTy: Copy + Into<Timestamp>;
}

(example usage).
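A self-contained sketch of how these pieces might fit together is below. It assumes `Timestamp` is a small `Copy` newtype, and `Foo`, `Bar` and `timestamp_of` are hypothetical names; the real crate will differ.

```rust
use std::convert::Infallible;

// Hypothetical stand-in for the crate's `Timestamp` type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Timestamp(pub u64);

// `Infallible` has no values, so this conversion can never actually run.
impl From<Infallible> for Timestamp {
    fn from(x: Infallible) -> Timestamp {
        match x {}
    }
}

pub trait HasTimestampInfo {
    type TimestampTy: Copy + Into<Timestamp>;
}

// Hypothetical payload types: `Foo` carries timestamps, `Bar` never does.
pub struct Foo;
pub struct Bar;
impl HasTimestampInfo for Foo {
    type TimestampTy = Timestamp;
}
impl HasTimestampInfo for Bar {
    type TimestampTy = Infallible;
}

pub enum Element<Out: HasTimestampInfo> {
    Item(Out),
    Timestamped(Out, Out::TimestampTy),
    Watermark(Out::TimestampTy),
    Terminate,
}

// Generic code can extract a real `Timestamp` whenever one is present.
pub fn timestamp_of<Out: HasTimestampInfo>(e: &Element<Out>) -> Option<Timestamp> {
    match e {
        Element::Timestamped(_, ts) | Element::Watermark(ts) => Some((*ts).into()),
        Element::Item(_) | Element::Terminate => None,
    }
}
```

For `Element<Bar>` the two timestamp-carrying variants cannot be constructed at all, so `timestamp_of` can only ever return `None` there.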


If this is not useful for your application, please share more information about what your most common usage sites look like (i.e. the ones so numerous you don’t want to refactor them).

Thank you for the answer!

Unfortunately, the usage is almost always generic: this enum is used as a wrapper around generic elements in a dataflow processing crate. Essentially, the generic Out type is like the Item type in std iterators, and the crate defines many different generic operators, like map, fold, group_by.

The timestamped variants are only used after an add_timestamp operator has attached a timestamp (or a watermark) to each element in the flow.
Most operators (like map) essentially do not care about the timestamp and modify the Out component, keeping the ts unchanged in the current implementation.

Ideally, what I would like to achieve is to have generic implementations that ignore the timestamp, plus a way to work with the timestamps when the elements have them. This is problematic, though, since if I implement

impl<T: Data> Operator<T> for Foo<T> { /* ... */ }

A specific impl for a hypothetical Timestamped<T>, or for a T that implements HasTimestamp, which I could use to interact with the timestamp content

impl<T: Data + HasTimestamp> Operator<T> for Foo<T> { /* ... */ }

would overlap with the generic implementation.

I wouldn't mind having to rewrite the operators that make use of the timestamps (and/or removing the variant from the enum and the consequent match arms), but a big change in the signature (e.g. adding another generic type to Element) would require changing the signature of hundreds of generic impl and fn blocks and I wouldn't know how to automate it.

I know that trying to "override" the impl is not idiomatic and the compiler is trying to stop me from doing that, so I'm wondering what is the proper way to achieve this.

Here is a slightly modified example of how the enum is used in the crate

impl<In: Data, Out: Data, F, Prev> Operator<Out>
    for Map<In, Out, F, Prev>
where
    F: Fn(In) -> Out + Send + Clone,
    Prev: Operator<In>,
{
    fn next(&mut self) -> Element<Out> {
        match self.prev.next() {
            // assumes the closure is stored in a field named `f`
            Element::Item(item) => Element::Item((self.f)(item)),
            Element::Timestamped(item, ts) => Element::Timestamped((self.f)(item), ts),
            Element::Watermark(w) => Element::Watermark(w),
            Element::Terminate => Element::Terminate,
        }
    }
    // ...
}

While for example a windowing operator needs to keep track of the latest timestamp it has received and attach it to its output if the elements have timestamps

Yeah, the compiler won’t approve of this pattern. The approach that would fix the problem is to make “does this type have a timestamp” a property recorded in the Data trait itself.

If this were used only for the behavior of code, a const HAS_TIMESTAMP: bool in the Data trait could work nicely; it would even allow setting a default, const HAS_TIMESTAMP: bool = false, so that many trait implementations would not be affected.
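A minimal sketch of the const approach, assuming a heavily simplified `Data` trait (the real one has more bounds and items) and hypothetical `Plain`, `Stamped` and `describe` names:

```rust
// Simplified, hypothetical version of the crate's `Data` trait.
pub trait Data {
    // Defaulted, so existing impls do not have to mention it.
    const HAS_TIMESTAMP: bool = false;
}

pub struct Plain;
pub struct Stamped;

// Existing impls stay exactly as they are.
impl Data for Plain {}

// Only the timestamp-aware types override the default.
impl Data for Stamped {
    const HAS_TIMESTAMP: bool = true;
}

// Generic code can branch on the constant; it is known at compile time.
pub fn describe<T: Data>() -> &'static str {
    if T::HAS_TIMESTAMP {
        "timestamped"
    } else {
        "plain"
    }
}
```

This only changes behavior: the constant cannot be used to pick a different field type inside `Element`.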

However, since this information is supposed to be used for defining types, not just behavior, a const won’t be usable, as far as I’m aware. It would still be possible to encode this choice directly either

  • by putting the type type TimestampTy directly into Data, requiring implementors to choose Timestamp or Infallible accordingly, or
  • by using a type-level encoding of the boolean, which might be nicer to read in the impls, and could be extended with e.g. logical operators if necessary[1]

The thing that they can’t do is supporting a default “value”, at least until we get #![feature(associated_type_defaults)] finished and stabilized; so all implementations of Data would need to be updated. But at least, that’s an easy find-and-replace operation to do in bulk, in case only a few ones need a Timestamp/True value (or some logical operation), which would then subsequently need to be implemented manually.
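The type-level boolean variant could look roughly like the sketch below. All names here (`True`, `False`, `Bool`, `HasTimestamp`, `TsOf`, `Plain`, `Stamped`) are hypothetical, the `Data` trait is reduced to the one associated type that matters, and `Timestamp` is a stand-in newtype.

```rust
use std::convert::Infallible;

// Hypothetical stand-in for the crate's `Timestamp` type.
pub struct Timestamp(pub u64);

// Type-level booleans.
pub struct True;
pub struct False;

// Maps each type-level boolean to the timestamp type it stands for.
pub trait Bool {
    type TimestampTy;
}
impl Bool for True {
    type TimestampTy = Timestamp;
}
impl Bool for False {
    type TimestampTy = Infallible;
}

// Simplified `Data`: every impl has to state its choice (no default on stable).
pub trait Data {
    type HasTimestamp: Bool;
}

// Shorthand for "the timestamp type selected by `T`".
pub type TsOf<T> = <<T as Data>::HasTimestamp as Bool>::TimestampTy;

pub enum Element<Out: Data> {
    Item(Out),
    Timestamped(Out, TsOf<Out>),
    Watermark(TsOf<Out>),
    Terminate,
}

pub struct Plain;
pub struct Stamped;
impl Data for Plain {
    type HasTimestamp = False;
}
impl Data for Stamped {
    type HasTimestamp = True;
}
```

With this, the bulk find-and-replace adds `type HasTimestamp = False;` to every existing impl, and only the timestamp-aware types get `True` by hand.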


  1. which might in turn, when actually used, require some value-level conversions. I can only guess whether or not something like this would come up. It’s slightly unwieldy to write, I’ll admit, but much should be possible.
