How to determine the type from type_id?

In polars I want to map a struct column into an actual Rust struct. Doing that requires handling the struct column as a vector of impl Scalar values, which can only be cast to Any; you are then expected to downcast_ref to the concrete type.

But the problem is that the concrete types are not obvious, so I have to guess the proper type to use for downcast_ref::<T>.

How can I know the type? Essentially I'm in a situation where I only know the following:

The String fields have:
TypeId: (16029866226928567356, 7151959947073222850)
ArrowDataType: Utf8View
Debug formatted sample: Scalar(Some("rank"))

The integer fields have:
TypeId: (13743848030338613455, 2905191465459971113)
ArrowDataType: Int32
Debug formatted sample: PrimitiveScalar { value: Some(4), data_type: Int32 }

So given only these pieces of information, how can I know what type to use to downcast_ref correctly?

I've been guessing the type, but it always returns None...

Full Example:

use polars::prelude::*;

let outcome = df!(
    "a" => ["a", "b"],
    "b" => [1i32, 2i32],
).unwrap().lazy().select([as_struct(vec![all()]).alias("info")]).collect().unwrap()
    .column("info")?
    .struct_()?.clone()
    .into_series().iter()
    .map(|s| match s {
        AnyValue::Struct(length, entries, field) => {
            println!("field: {field:?}");
            let q = entries.into_iter().map(|n| match n {
                Some(inner) => inner.into_iter().map(|scal| {

                    let s_any = scal.as_any();
                    let t = s_any.type_id();
                    let na = "NA".to_owned();
                    let guess = s_any.to_owned().downcast_ref::<String>().unwrap_or(&na);

                    let dt = scal.data_type();
                    println!(" casted: {guess:?} datatype: {dt:?} scalar: {scal:?}");
                    guess.to_owned()
                }).collect::<Vec<_>>(),
                _ => panic!("")
            }).collect::<Vec<_>>();

            q
        },
        a => {
            panic!("")
        }
    })
    .collect::<Vec<_>>();

println!("outcome in rust: {outcome:?}");

console output:

field: [Field { name: "a", dtype: String }, Field { name: "b", dtype: Int32 }]
casted: "NA" datatype: Utf8View scalar: Scalar(Some("a"))
casted: "NA" datatype: Int32 scalar: PrimitiveScalar { value: Some(1), data_type: Int32 }
casted: "NA" datatype: Utf8View scalar: Scalar(Some("b"))
casted: "NA" datatype: Int32 scalar: PrimitiveScalar { value: Some(2), data_type: Int32 }
field: [Field { name: "a", dtype: String }, Field { name: "b", dtype: Int32 }]
casted: "NA" datatype: Utf8View scalar: Scalar(Some("a"))
casted: "NA" datatype: Int32 scalar: PrimitiveScalar { value: Some(1), data_type: Int32 }
casted: "NA" datatype: Utf8View scalar: Scalar(Some("b"))
casted: "NA" datatype: Int32 scalar: PrimitiveScalar { value: Some(2), data_type: Int32 }
outcome in rust: [[["NA", "NA"], ["NA", "NA"]], [["NA", "NA"], ["NA", "NA"]]]

type_id is the result of a one-way hash, so there’s no way to extract any information from it. Instead, you have to get the id of some known type for comparison; downcast is doing exactly this behind the scenes.
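To make the comparison idea concrete, here is a minimal standard-library sketch. The helper name is_type is hypothetical, and it only approximates the check that downcast_ref performs internally; it does not involve polars at all.

```rust
use std::any::{Any, TypeId};

/// Hypothetical helper: true when the value behind `value` has concrete type `T`.
/// A TypeId cannot be decoded, only compared against the id of a known type.
fn is_type<T: Any>(value: &dyn Any) -> bool {
    value.type_id() == TypeId::of::<T>()
}

fn main() {
    let boxed: Box<dyn Any> = Box::new(42i32);

    // `&*boxed` borrows the inner `dyn Any`; calling `type_id()` on the Box
    // itself would report the id of `Box<dyn Any>` instead.
    let inner: &dyn Any = &*boxed;

    println!("is i32:    {}", is_type::<i32>(inner));
    println!("is String: {}", is_type::<String>(inner));
}
```

The candidate type still has to come from somewhere else (documentation, source code, or a debugging aid); the id only confirms or rejects a guess.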

In your case, if simply guessing fails, you can either search the documentation/internet for examples that show you the concrete types that Arrow is providing, or dive into the Arrow source code itself to figure out what concrete type was used to create the dyn Any in the first place.

Ultimately, though, your code is probably going to need to check several different type options in sequence to figure out which one you actually have, for example:

if let Some(s) = s_any.downcast_ref::<String>() { … }
else if let Some(i) = s_any.downcast_ref::<i32>() { … }
else if let Some(x) = …
   …
else {
    panic!("Unsupported type!");
}
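A self-contained version of that chain, using only std::any so it can be run without polars, might look like the sketch below. The function name describe and the choice of String and i32 as candidates are assumptions for illustration; with polars the candidates would be the Scalar implementors rather than plain Rust types.

```rust
use std::any::Any;

/// Hypothetical helper: tries each candidate type in turn and reports the match.
fn describe(value: &dyn Any) -> String {
    if let Some(s) = value.downcast_ref::<String>() {
        format!("String: {s}")
    } else if let Some(i) = value.downcast_ref::<i32>() {
        format!("i32: {i}")
    } else {
        // No candidate matched; a real program might return an error here.
        "unsupported type".to_owned()
    }
}

fn main() {
    let values: Vec<Box<dyn Any>> = vec![
        Box::new(String::from("rank")),
        Box::new(4i32),
        Box::new(1.5f64),
    ];
    for v in &values {
        // `&**v` reaches the boxed value; passing `v` directly would coerce
        // the `Box` itself into `dyn Any` and no downcast would match.
        println!("{}", describe(&**v));
    }
}
```

A downcast only succeeds for the exact type that was erased, so a wrapper type holding a string will never match String.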

ooof..! I am so frustrated by not being able to figure out the type. Thank you for telling me what I feared most - that it's brute force alone finding the right type.

If you're going to list all the types exhaustively anyway, maybe use an enum instead? The discriminant will be much smaller than TypeId.
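As a rough sketch of the enum suggestion, assuming the set of field types is known up front: the names FieldValue and render are hypothetical and are not part of polars.

```rust
/// Hypothetical closed set of field types, replacing `dyn Any` + downcasting.
#[derive(Debug, Clone, PartialEq)]
enum FieldValue {
    Text(String),
    Int(i32),
}

/// The compiler checks that every variant is handled, so no guessing is needed.
fn render(value: &FieldValue) -> String {
    match value {
        FieldValue::Text(s) => format!("text {s}"),
        FieldValue::Int(i) => format!("int {i}"),
    }
}

fn main() {
    let row = vec![FieldValue::Text("rank".to_owned()), FieldValue::Int(4)];
    for field in &row {
        println!("{}", render(field));
    }
}
```

This only applies to code you control; values coming out of a library as trait objects still have to be converted into the enum once, at the boundary.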


You are probably abusing the whole mechanism, then. The way to use polymorphism and dynamic dispatch is not to brute-force all the concrete types; that's missing the point. Whatever code you want to be part of the dynamic dispatch, you should put it in a method and implement it on each type, then use the trait object to actually dispatch it to the underlying type (regardless of what it is).
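A minimal sketch of that approach, assuming you own the trait: Describe and describe_all are hypothetical names, and this does not show how to extend the Scalar trait from polars-arrow.

```rust
/// Hypothetical trait holding the behaviour that should be dispatched dynamically.
trait Describe {
    fn describe(&self) -> String;
}

impl Describe for String {
    fn describe(&self) -> String {
        format!("string {self}")
    }
}

impl Describe for i32 {
    fn describe(&self) -> String {
        format!("int {self}")
    }
}

/// The caller never needs to know the concrete type behind each trait object.
fn describe_all(items: &[Box<dyn Describe>]) -> Vec<String> {
    items.iter().map(|item| item.describe()).collect()
}

fn main() {
    let items: Vec<Box<dyn Describe>> = vec![Box::new(String::from("rank")), Box::new(4i32)];
    println!("{:?}", describe_all(&items));
}
```

When the trait object comes from a library and the trait lacks the method you need, this pattern is not directly available, which is where downcasting comes back in.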


I agree, and what's preventing me is I can't, for the life of me, figure out what the type is!!!

Scouring the polars github repo but so far nothing :weary:

It’s probably going to be one of the types that are listed as implementors of the Scalar trait in the polars-arrow documentation.


If you're debugging a working program, would type_name_of_val help?

It won’t work for the dyn Any value, but I don’t know what it does for impl Trait opaque types:

Additionally, this function does not resolve trait objects. This means that type_name_of_val(&7u32 as &dyn Debug) may return "dyn Debug", but will not return "u32" at this time.

It calls the opaque type's implementation and prints the name of the type behind the curtain.
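A small sketch of the difference, using std::any::type_name_of_val (stable since Rust 1.76). The function opaque is a hypothetical stand-in for an API returning impl Trait; the exact strings returned are a debugging aid and are not guaranteed by the standard library.

```rust
use std::any::type_name_of_val;
use std::fmt::Debug;

/// Hypothetical function whose return type is opaque to callers.
fn opaque() -> impl Debug {
    7u32
}

fn main() {
    // Opaque `impl Trait` value: the name of the hidden concrete type is reported.
    let hidden = opaque();
    println!("impl Debug -> {}", type_name_of_val(&hidden));

    // Trait object: only the `dyn` type is reported, not the value behind it.
    let erased: &dyn Debug = &7u32;
    println!("dyn Debug  -> {}", type_name_of_val(erased));
}
```

This is why the function helps with impl Trait return types but not with a value that has already been erased to dyn Any.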


Incredible suggestions in here, thank you everyone.

Got it!:

let the_int = s_any.to_owned().downcast_ref::<PrimitiveScalar<i32>>();
let the_str = s_any.to_owned().downcast_ref::<BinaryViewScalar<str>>();
