How to declare "Vec<T>" where `T` is only known at runtime?

Consider the enum

enum Value {
     Int(i32),
     String(String),
     List(Vec<"Value">),
}

"Value" because I am trying to express that List is a container whose elements can be Value (i.e. to support List(List(Int)) but that every item of the list is of the same type. I.e. vec![Value::Int(2), Value::String("aa".to_string())] would not compile.

Is this even possible? My current hypothesis is that it is not because we have no mechanism to declare this in the typing system, but I am not very familiar with the theory behind.

This doesn't sound like dynamic typing. Are you sure you can't express the exact type statically? What are you trying to use this for ultimately?

If I'm understanding what you want correctly, you'd need something like an "enum variants as types" feature which has been kicked around quite a lot but has never quite managed to converge on a design.

You can do something similar manually with wrapper structs though.

Playground

struct IntValue(i32);

struct StringValue(String);

// Marker trait so the expected types are required. If you were exposing this as a public API you'd want to seal the trait.
trait ValueContainer {}

impl ValueContainer for IntValue {}

impl ValueContainer for StringValue {}

enum Value<V: ValueContainer> {
    Int(IntValue),
    String(StringValue),
    List(Vec<V>),
}

fn main() {
    // Works
    Value::List(vec![IntValue(1), IntValue(0)]);
    // Fails
    Value::List(vec![IntValue(0), StringValue("String".into())]);
}

You could obviously just implement the trait for the wrapped values instead. Depending on what you're trying to do that might cause issues though.

1 Like

But doesn't this fail on:

Value::List(vec![Value::List(vec![IntValue(0)])]);

(Playground)

But maybe that's not required to work, @jorgecarleitao?

If it's not, then there is atrivial way to solve this:

enum Value {
     Int(i32),
     String(String),
     IntList(Vec<i32>),
     StringList(Vec<String>),
}

Or do I misunderstand something?

2 Likes

That just requires the blanket impl

impl<V> ValueContainer for Value<V> where V: ValueContainer {}

to work again. (Your fix doesn't allow recursive lists-in-lists).

However, this whole dance is not substantially different from just using Vec<i32> or Vec<Vec<String>> in the first place.

1 Like

I meant: If it's not necessary to allow lists-in-lists, then it's trivial to solve the original problem (not the one I showed). Sorry for the confusing wording.

1 Like

I think it is because the compiler needs to know the size of the variable to add it in the stack.
Recursive types like this do not make it possible.

This question is addressed in the doc here: https://doc.rust-lang.org/book/ch15-01-box.html

Then you can use Box to make your variable size deterministic and to point to the data stored on the heap:

#[derive(Debug)]
enum Value {
     Int(i32),
     String(String),
     List(Vec<Box<Value>>),
}

fn main() {
    let aux = Value::List(
        vec![
            Box::new(Value::Int(2)),
            Box::new(Value::String("bb".to_string()))
        ]
    );
    let my_list = Value::List(
        vec![
            Box::new(Value::Int(1)),
            Box::new(Value::String("aa".to_string())),
            Box::new(aux)
        ]
    );
    println!("{my_list:?}");
}

This makes it possible to create List of different types of Value inside.
If you don't want to allow it, you can read: https://doc.rust-lang.org/book/ch17-02-trait-objects.html

No, that's incorrect. There haven't been any directly self-contained (thus infinite-sized) or dynamically-sized types in either of the earlier code snippets in this thread whatsoever. Vec is already Sized and contains indirection, hence creating a recursive type by putting a Vec<Value> inside Value is perfectly fine.

1 Like

It feels like most of the solutions proposed so far are over-complicated. Can't you just have a Vec<Value> variant?

enum Value {
     Int(i32),
     String(String),
     List(Vec<Value>),
}

That's exactly what I do in my VM for OpenSCAD (a dynamically typed language for defining CAD models) and it works fine.

1 Like

Yes thank you, I had not tried Vec<Value>, my bad!
But as said @Michael-F-Bryan, I don't understand why there is a problem with Vec<Value> then...
Because this works fine:

#[derive(Debug)]
enum Value {
     Int(i32),
     String(String),
     List(Vec<Value>),
}

fn main() {
    let aux = vec![
        Value::Int(2),
        Value::String("bb".to_string())
    ];
    let my_list = Value::List(
        vec![
            Value::Int(1),
            Value::String("aa".to_string()),
            Value::List(aux)
        ]
    );
    println!("{my_list:?}");
}

That's exactly the problem. The desired data structure should apparently disallow heterogeneous lists. I.e., your last example is expected to fail to compile, but not for the reason of unsized/dynamically sized/infinitely-sized types, but because it tries to put an Int a String and a List in the same list.

2 Likes

Can that sort of thing be represented in any language?

It feels impossible unless you introduce some level of type erasure so Value doesn't contain an infinite number of items. For example, you might have a List(List) variant where List tracks the element type at runtime and provides a way to downcast to different types.

1 Like

This works:

struct Value(Box<dyn ValueType>);

trait ValueType {}
impl ValueType for i32 {}
impl ValueType for String {}
impl<V: ValueType> ValueType for Vec<V> {}
2 Likes

The key point, I think, is that Value doesn't implement ValueType. If you start enumerating the types that do, you get:

  • i32
  • String
  • Vec<i32>
  • Vec<String>
  • Vec<Vec<i32>>
  • Vec<Vec<String>>
  • Vec<Vec<Vec<i32>>>
  • Vec<Vec<Vec<String>>>
  • etc...
1 Like

Yeah, nevermind, it does work.

Thanks a lot for all the answers!

@tczajka, that would require to know all the possible values of T at compile time; how do we support arbitrarily nested at runtime with the Value you provided?

The background of this question is that some data formats only support homogeneous (although nested) list types. Usually, we have a enum of the types, such as:

enum Type {
    Int32,
    String,
    List(Box<Type>), // a list of homogeneous type `Type`
}

In this situation, we usually receive a file with some representation of this types (e.g. in flatbuffers, protobuf, etc.), together with a byte representation of the actual data (e.g. parquet, avro, arrow).

Say we receive a file with type Type. How would we support the arbitrary nestness of what it represents in the system you proposed?

struct Value(Box<dyn ValueType>);

trait ValueType {}
impl ValueType for i32 {}
impl ValueType for String {}
impl<V: ValueType> ValueType for Vec<V> {}

In other words, under this system, how do we compile the following?

enum Type {
    Int32,
    String,
    List(Box<Type>), // a list of homogeneous type `Type`
}

struct Value(Box<dyn ValueType>);

trait ValueType {}
impl ValueType for i32 {}
impl ValueType for String {}
impl<V: ValueType> ValueType for Vec<V> {}

fn deserialize_i32(bytes: &mut &[u8]) -> Result<i32, String> {
    todo!()
}

fn deserialize_string(bytes: &mut &[u8]) -> Result<String, String> {
    todo!()
}

fn deserialize(bytes: &[u8], type_: &Type) -> Result<Value, String> {
    let bytes = &mut bytes; // so we can move the pointer
    match type_ {
        Type::Int32 => deserialize_i32(bytes).map(|x| Value(Box::new(x) as Box<dyn ValueType>)),
        Type::String => deserialize_string(bytes).map(|x| Value(Box::new(x) as Box<dyn ValueType>)),
        Type::List(inner) => {
            deserialize_list(bytes, inner).map(|x| Value(Box::new(x) as Box<dyn ValueType>))
        }
    }
}

fn deserialize_list(bytes: &mut &[u8], type_: &Type) -> Result<Value, String> {
    let (length, remaining) = bytes.split_at(4);
    *bytes = remaining;
    let length = u32::from_le_bytes(length.try_into().unwrap());
    match type_ {
        Type::Int32 => (0..length)
            .map(|_| deserialize_i32(bytes))
            .collect::<Result<Vec<i32>, String>>()
            .map(|x| Value(Box::new(x) as Box<dyn ValueType>)),
        Type::String => (0..length)
            .map(|_| deserialize_string(bytes))
            .collect::<Result<Vec<String>, String>>()
            .map(|x| Value(Box::new(x) as Box<dyn ValueType>)),
        Type::List(inner) => (0..length)
            .map(|_| ???)
            .collect::<Result<Vec<???>, String>>()
            .map(|x| Value(Box::new(x) as Box<dyn ValueType>)),
    }
}

note how the ??? would need to recursively call deserialize_list, but this can't happen because deserialize_list returns the untyped Value. The alternative would be to write deserialize_list<T: ValueType>(...) -> Result<Vec<T>, String>, but then we do not know what generic to call on the function deserialize.

This is what I meant with "only known at runtime" in the question.

Then you can't use the type system to express it. The type system is a compile time concept, at the end of compilation there is always a finite set of types in the program.

Just use Vec<Value> and enforce the homogeneity constraint in your public interface, i.e. deserialize will count the depth and error out if the depths are different.

Is it helpful to consider a bound of T: Deserialize ?

I.e. use serde. I'm pretty sure that handles types only known at runtime using generics. It just needs to know how to deserialize the type.

I'll see if I can get a playground working, and report back with any problems!

Update: ran into a blocker for me. I forgot that trait objects can't be deserialized. Serde's Deserialize trait is not object safe. Looks like people are able to make Serialize object safe but not Deserialize.

Okay, so what you want to do is dynamically validate the homogeneity of arbitrary types. You are not going to be able to do that at the type level, but you can approximate it at runtime.

What you can do is allow Box<dyn ValueType> to represent an arbitrary tree of heterogeneously-typed values in order to allow writing recurrsive functions. Then your deserializers will perform the validation, and you can return an opaque type that could represent a heterogeneous list, but it never does (because validation doesn't let it be constructed like so).

Here is the updated playground:

pub struct Value(Box<dyn ValueType>);

trait ValueType {}
impl ValueType for i32 {}
impl ValueType for String {}
impl<V: ValueType> ValueType for Vec<V> {}
impl<V: ValueType> ValueType for [V] {}
impl ValueType for Box<dyn ValueType> {}

pub fn deserialize(mut bytes: &[u8], type_: &Type) -> Result<Value, String> {
    deserialize_impl(&mut bytes, type_).map(Value)
}

fn deserialize_impl(bytes: &mut &[u8], type_: &Type) -> Result<Box<dyn ValueType>, String> {
    match type_ {
        Type::Int32 => deserialize_i32(bytes).map(|x| Box::new(x) as _),
        Type::String => deserialize_string(bytes).map(|x| Box::new(x) as _),
        Type::List(elem_type) => deserialize_list(bytes, elem_type).map(|x| Box::new(x) as _),
    }
}

fn deserialize_i32(bytes: &mut &[u8]) -> Result<i32, String> {
    todo!()
}

fn deserialize_string(bytes: &mut &[u8]) -> Result<String, String> {
    todo!()
}

fn deserialize_list(bytes: &mut &[u8], elem_type: &Type) -> Result<Vec<Box<dyn ValueType>>, String> {
    let (length, remaining) = bytes.split_at(4);
    *bytes = remaining;

    let length = u32::from_le_bytes(length.try_into().unwrap());

    (0..length)
        .map(|_| deserialize_impl(bytes, elem_type))
        .collect()
}
1 Like