Vec of enums of same variant?

In our codebase, we have the following type:

pub enum ScalarValue {
    Bytes(Vec<u8>),
    Str(String),
    Int(i64),
    Uint(u64),
    F64(f64),
    Counter(i64),
    Timestamp(i64),
    Cursor(OpId),
    Boolean(bool),
    Null,
}

We want to create another type that is similar to a Vec<ScalarValue>, but that enforces the constraint that all of its members are of the same type.

To do that, we've done the following:

pub enum ScalarValues {
    Bytes(Vec<Vec<u8>>),
    Str(Vec<String>),
    Int(Vec<i64>),
    Uint(Vec<u64>),
    F64(Vec<f64>),
    Counter(Vec<i64>),
    Timestamp(Vec<i64>),
    Cursor(Vec<OpId>),
    Boolean(Vec<bool>),
    // length only (nulls cannot differ)
    Null(usize),
}

One issue we've run into is how to implement methods on this wrapper type without a lot of repetition.

For example, right now, len is implemented like this:

    pub fn len(&self) -> usize {
        match self {
            ScalarValues::Null(len) => *len,
            ScalarValues::Bytes(xs) => xs.len(),
            ScalarValues::Str(xs) => xs.len(),
            ScalarValues::Int(xs) => xs.len(),
            ScalarValues::Uint(xs) => xs.len(),
            ScalarValues::F64(xs) => xs.len(),
            ScalarValues::Counter(xs) => xs.len(),
            ScalarValues::Timestamp(xs) => xs.len(),
            ScalarValues::Cursor(xs) => xs.len(),
            ScalarValues::Boolean(xs) => xs.len(),
        }
    }

get is similarly repetitive. Note that get returns an Option<ScalarValue>, since from the outside this type behaves almost exactly like a Vec<ScalarValue>.

Our append method is also similarly repetitive, although here the repetitiveness is more justified since we need to check whether the incoming ScalarValue matches the current variant of ScalarValues:

    /// Add a ScalarValue to a ScalarValues
    pub fn append(&mut self, v: ScalarValue) -> Result<(), InvalidMultiSetValues> {
        Ok(match (self, v) {
            (ScalarValues::Bytes(xs), ScalarValue::Bytes(x)) => xs.push(x),
            (ScalarValues::Str(xs), ScalarValue::Str(x)) => xs.push(x),
            (ScalarValues::Int(xs), ScalarValue::Int(x)) => xs.push(x),
            (ScalarValues::Uint(xs), ScalarValue::Uint(x)) => xs.push(x),
            (ScalarValues::F64(xs), ScalarValue::F64(x)) => xs.push(x),
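            (ScalarValues::Counter(xs), ScalarValue::Counter(x)) => xs.push(x),
            (ScalarValues::Timestamp(xs), ScalarValue::Timestamp(x)) => xs.push(x),
            (ScalarValues::Cursor(xs), ScalarValue::Cursor(x)) => xs.push(x),
            (ScalarValues::Boolean(xs), ScalarValue::Boolean(x)) => xs.push(x),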
            (ScalarValues::Null(xs), ScalarValue::Null) => *xs += 1,
            (values, v) => {
                return Err(InvalidMultiSetValues::MixedTypes(
                    format!("{:?}", values),
                    v.to_string(),
                ))
            }
        })
    }

I was wondering if there are more idiomatic ways to handle this issue. I was thinking maybe something could be done where I use a library like strum to generate a ScalarValueKind enum and then have a struct of the form:

struct ScalarValues {
    vec: Vec<???>, // not sure what goes here
    kind: ScalarValueKind
}

impl ScalarValues {
    pub fn append(&mut self, v: ScalarValue) {
        // discriminant takes its argument by reference
        if std::mem::discriminant(&self.kind) == std::mem::discriminant(&v.as_kind()) {
             // somehow append the value
        }
    }
}

but this is pretty hazy.

Does anyone have thoughts on this?

Update: I realized one obvious solution that I've been missing here is:

struct ScalarValues {
    vec: Vec<ScalarValue>,
    kind: ScalarValueKind
}

impl ScalarValues {
    fn append(&mut self, v: ScalarValue) {
        // do the check here
    }
}

This solution feels a bit slow though since we have to store a useless Vec<ScalarValue::Null> (when in the above example we optimize it to a single integer).
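
A minimal sketch of that idea, under some assumptions: the enum is trimmed to three variants, the kind is stored as a std::mem::Discriminant rather than a strum-generated ScalarValueKind, the collection is seeded with a first element (so its kind is always known), and append hands the rejected value back instead of building an InvalidMultiSetValues error.

```rust
use std::mem::{discriminant, Discriminant};

// Trimmed-down stand-in for the real enum; only three variants are shown.
#[derive(Debug, Clone, PartialEq)]
pub enum ScalarValue {
    Str(String),
    Int(i64),
    Null,
}

// Hypothetical wrapper: a plain Vec plus the discriminant every element must share.
#[derive(Debug)]
pub struct ScalarValues {
    vec: Vec<ScalarValue>,
    kind: Discriminant<ScalarValue>,
}

impl ScalarValues {
    /// Start a collection whose variant is fixed by its first element.
    pub fn new(first: ScalarValue) -> Self {
        let kind = discriminant(&first);
        ScalarValues { vec: vec![first], kind }
    }

    /// Push `v` if its variant matches; otherwise hand it back as the error.
    pub fn append(&mut self, v: ScalarValue) -> Result<(), ScalarValue> {
        if discriminant(&v) == self.kind {
            self.vec.push(v);
            Ok(())
        } else {
            Err(v)
        }
    }

    pub fn len(&self) -> usize {
        self.vec.len()
    }

    pub fn get(&self, index: usize) -> Option<&ScalarValue> {
        self.vec.get(index)
    }
}
```

With this shape, len and get are one-liners that delegate to the inner Vec, and the only variant-aware code is the single comparison in append.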

I wouldn't do a ScalarValues. I'd make newtypes:

pub struct Bytes(Vec<u8>);
pub struct MyString(String);
pub struct Int(i64);

pub enum ScalarValue {
    Bytes(Bytes),
    Str(MyString),
    Int(Int),
    // ...
}

...then you can just pass Vec<Bytes> and friends around directly.

we have to store a useless Vec<ScalarValue::Null> (when in the above example we optimize it to a single integer).

That's not the case if you define a pub struct Null;. Because it's a ZST ("zero-sized type"), it takes up no space, so a Vec<Null> doesn't need any memory to store its elements.
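
To make the ZST point concrete, here is a small sketch. The Null struct and the helper names are just for illustration. The Vec value itself still carries its usual pointer, capacity, and length fields; it is only the element storage that costs nothing.

```rust
// Hypothetical unit struct standing in for the `Null` variant's payload.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Null;

/// Build a Vec holding `n` nulls; the elements occupy no memory.
pub fn nulls(n: usize) -> Vec<Null> {
    vec![Null; n]
}

/// Bytes needed to store the elements of a `Vec<Null>` of length `n`.
pub fn payload_bytes(n: usize) -> usize {
    n * std::mem::size_of::<Null>()
}
```

So a Vec<Null> ends up being, in effect, just a length counter, much like the Null(usize) variant in the original ScalarValues.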

What is missing from your post is why ScalarValues is a thing. If you are trying to abstract over its members or reuse code there are probably better ways to do that (like traits).

Thanks! This is an interesting idea. I'm a bit confused on the value of using newtypes to write Vec<Bytes> versus Vec<Vec<u8>>. Is this to save characters?

Here's an explanation of why we have a ScalarValue type and a ScalarValues type. Our library operates on very dynamic data. We might have a list that contains an integer, a string, and then another integer. Since this is not representable in native Rust, we use the ScalarValue type. (This is not exactly what is happening, but I think it's a useful enough simplification.)

Sometimes, we want to enforce the invariant/condition that a Vec<ScalarValue> has ScalarValues which are all of the same variant.

That is why we have the ScalarValues type.

Enum variants are not proper types (the type of a variant is the type of its parent enum). So you can't enforce a specific variant at the type level; you can't make Vec<ScalarValue> all have the same variant.

But by using new types, you can have a type-level distinction between a Vec<Int>, Vec<Counter>, and Vec<Timestamp>, say. A Vec<Timestamp> can only have Timestamps in it. That way, you don't need an enum-where-every-variant-is-a-Vec to distinguish between the variants.

You're basically using new types to give every variant its own type.
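
A rough sketch of what that could look like, trimmed to two variants. The From impls and the widen helper are my own additions for illustration, not something from your codebase.

```rust
// Hypothetical newtypes: each enum variant wraps its own distinct type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Counter(pub i64);

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Timestamp(pub i64);

#[derive(Debug, Clone, PartialEq)]
pub enum ScalarValue {
    Counter(Counter),
    Timestamp(Timestamp),
}

impl From<Counter> for ScalarValue {
    fn from(c: Counter) -> Self {
        ScalarValue::Counter(c)
    }
}

impl From<Timestamp> for ScalarValue {
    fn from(t: Timestamp) -> Self {
        ScalarValue::Timestamp(t)
    }
}

/// Widen any homogeneous Vec into the dynamic representation.
pub fn widen<T: Into<ScalarValue>>(xs: Vec<T>) -> Vec<ScalarValue> {
    xs.into_iter().map(Into::into).collect()
}

// Note: given `let mut stamps: Vec<Timestamp> = ...`, a call such as
// `stamps.push(Counter(3))` is rejected at compile time (mismatched types).
```

Both newtypes wrap an i64, yet a Vec<Timestamp> and a Vec<Counter> are different types, so the "all the same variant" invariant needs no runtime check. Converting to Vec<ScalarValue> is only needed at the boundary where dynamic data is expected.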

Oh! I think I understand this. My question is: why not just use Vec<Vec<u8>> or Vec<i64> directly, and also define the following?

enum ScalarValue {
    I64(i64),
    Bytes(Vec<u8>),
}

Apologies if this is obvious, I get the sense I'm missing something here.

we want to enforce the invariant/condition that a Vec<ScalarValue> has ScalarValues which are all of the same variant.

If I understand you correctly, you don't really care which ScalarValue variant you have, just that they're all the same one.

Your "obvious solution" is a newtype for enforcing exactly that, but a proper Vec<Timestamp> would be better because the constraint is enforced "at a deeper level": by the compiler rather than by a runtime check.

For example, say that you need to deal with different distance units.

One way would be to have an enum:

pub enum Distance {
    Meters(f64),
    Miles(f64),
}

The other way would be newtypes:

pub struct Meters(f64);
pub struct Miles(f64);

The disadvantage of the former is that you need runtime checks to check which variant of the enum you are dealing with. A big advantage of the latter is that if you get it wrong you find out because your code doesn't compile, not in production.
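
As a small sketch of the newtype version in use (the remaining function and the From conversion are made up for illustration):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Meters(pub f64);

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Miles(pub f64);

// The only way from miles to meters is an explicit, visible conversion.
impl From<Miles> for Meters {
    fn from(m: Miles) -> Self {
        Meters(m.0 * 1609.344) // one international mile is 1609.344 m
    }
}

/// Only accepts meters; passing a `Miles` value directly is a type error.
pub fn remaining(total: Meters, travelled: Meters) -> Meters {
    Meters(total.0 - travelled.0)
}

// `remaining(Meters(5000.0), Miles(1.0))` does not compile (mismatched types).
// `remaining(Meters(5000.0), Miles(1.0).into())` does, and converts first.
```

With the enum version, remaining would have to match on both arguments at runtime and decide what to do when the units differ.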

For a dramatic example of this, unit mix-ups are how spacecraft have been lost (NASA's Mars Climate Orbiter, for instance). This probably wouldn't have happened if they'd had newtypes.

Is your Vec<i64> full of Ints, Counters, Timestamps, something else, a mix of all the above? If you want to enforce a distinction, the new type will do so (with the additional benefit of being more semantically meaningful).

Your best bet here is to make a simple macro_rules! macro that will repeat the match arms for every variant.
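
Something along these lines, as a rough sketch rather than a drop-in: the enums are trimmed to three variants, the macro name impl_scalar_values! is made up, and append returns the rejected value instead of your InvalidMultiSetValues error. It relies on the variant names being identical in both enums, which they are in your post.

```rust
// Trimmed-down stand-ins for the real enums; three variants keep this short.
#[derive(Debug, Clone, PartialEq)]
pub enum ScalarValue {
    Str(String),
    Int(i64),
    Null,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ScalarValues {
    Str(Vec<String>),
    Int(Vec<i64>),
    Null(usize), // length only
}

// Hypothetical macro: takes the list of Vec-backed variants and repeats one
// match arm per variant. `Null` is written by hand because it has no Vec.
macro_rules! impl_scalar_values {
    ($($variant:ident),* $(,)?) => {
        impl ScalarValues {
            pub fn len(&self) -> usize {
                match self {
                    $(ScalarValues::$variant(xs) => xs.len(),)*
                    ScalarValues::Null(n) => *n,
                }
            }

            pub fn get(&self, i: usize) -> Option<ScalarValue> {
                match self {
                    $(ScalarValues::$variant(xs) => {
                        xs.get(i).cloned().map(ScalarValue::$variant)
                    })*
                    ScalarValues::Null(n) => {
                        if i < *n { Some(ScalarValue::Null) } else { None }
                    }
                }
            }

            /// Push `v` if its variant matches; otherwise hand it back.
            pub fn append(&mut self, v: ScalarValue) -> Result<(), ScalarValue> {
                match (self, v) {
                    $((ScalarValues::$variant(xs), ScalarValue::$variant(x)) => {
                        xs.push(x)
                    })*
                    (ScalarValues::Null(n), ScalarValue::Null) => *n += 1,
                    (_, v) => return Err(v),
                }
                Ok(())
            }
        }
    };
}

// Adding a variant to both enums then only needs one more name here.
impl_scalar_values!(Str, Int);
```

The generated code is the same match you wrote by hand, so there is no runtime cost, and a missing variant in the list shows up as a non-exhaustive match error in len and get.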