Idea: add a `try_cast<T, U>(value: T) -> Result<U, T>` function?

Is it a good idea to add the following function to the standard library:

const fn try_cast<T, U>(value: T) -> Result<U, T> {
    // Pseudocode.
    if T == U {
        Ok(transmute(value))
    } else {
        Err(value)
    }
}

This function can be used in generic functions to provide a quick and dirty specialization behavior:

fn foo<T>(value: T) {
    match try_cast::<_, i32>(value) {
        Ok(value) => { /* `value` is an `i32`. */ }
        Err(value) => {
            match try_cast::<_, String>(value) {
                Ok(value) => { /* `value` is a `String`. */ }
                Err(value) => { ... }
            }
        }
    }
}

I think I might currently be able to simulate this behavior using std::any::Any::type_id, but it only supports 'static types. For non-'static types, I don’t know how to do it without the help of the compiler.
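For what it's worth, here is a minimal sketch of the 'static-only version, assuming it is acceptable to go through std::any::Any rather than a compiler intrinsic. It is not const, it does not cover non-'static types, and the `describe` helper is a made-up caller for illustration only:

```rust
use std::any::Any;

/// Sketch of `try_cast` restricted to `'static` types, built on `Any`.
fn try_cast<T: 'static, U: 'static>(value: T) -> Result<U, T> {
    // Park the value in an `Option` so it can be moved out through a
    // `&mut dyn Any` without any `unsafe` code.
    let mut slot = Some(value);
    // The downcast succeeds only when `Option<T>` and `Option<U>` are the
    // same type, i.e. when `T` and `U` are the same type.
    let taken = (&mut slot as &mut dyn Any)
        .downcast_mut::<Option<U>>()
        .and_then(|same| same.take());
    match taken {
        Some(cast) => Ok(cast),
        // The downcast failed, so `slot` still holds the original value.
        None => Err(slot.expect("value is still in the slot")),
    }
}

/// Hypothetical caller, mirroring the shape of `foo` above.
fn describe<T: 'static>(value: T) -> String {
    match try_cast::<T, i32>(value) {
        Ok(n) => format!("i32: {}", n),
        Err(value) => match try_cast::<T, String>(value) {
            Ok(s) => format!("String: {}", s),
            Err(_) => String::from("something else"),
        },
    }
}
```

The `T: 'static` and `U: 'static` bounds are exactly the restriction in question: remove them and the cast to `dyn Any` no longer compiles.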

For non-'static types, this will probably be unsound for exactly the same reasons specialization is unsound: since lifetimes are erased, you won't be able to ensure that the value is indeed valid for the scope you need.


What do you mean by “lifetimes are erased”?

You might be out of luck, there. The 'static requirement on std::any::Any is needed for soundness because otherwise you'd be able to transmute lifetimes... which would be bad.

I think they just mean we aren't checking whether the lifetimes are the same when we ask "are T and U the same type?".


Why don’t we check lifetimes? Is there a fundamental limitation in the compiler? Intuitively, &'a str and &'b str are the same type if and only if 'a is the same as 'b.

Also, if we take variance into consideration, we can relax the success condition:

const fn try_cast<T, U>(value: T) -> Result<U, T> {
    // Pseudocode.
    if T is a subtype of U {
        Ok(transmute(value))
    } else {
        Err(value)
    }
}

How would the compiler store a lifetime inside std::any::TypeId and check whether two items are "compatible" at runtime?

The naive approach would be to assign a unique number to each lifetime, then implement some sort of tree or lookup table to tell you whether 'b: 'a (i.e. &'b T is a subtype of &'a T). But considering how many references are used in Rust code (each of which has its own unique lifetime), you could have hundreds of thousands of entries in this data structure.

Not only would that have a significant effect on binary sizes, it also means checking if A has the same type as B goes from a trivial u64 comparison to an expensive lookup with lots of cache misses when lifetimes get involved... That's one hell of a performance cliff.
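For reference, the outlives relation is something the compiler already checks today, purely at compile time and with no runtime representation. A small sketch (the function name `shorten` is made up for illustration):

```rust
// `'long: 'short` reads as "'long outlives 'short".
fn shorten<'short, 'long: 'short>(r: &'long str) -> &'short str {
    // Accepted: `&'long str` is a subtype of `&'short str`, so the
    // reference can be used where the shorter lifetime is expected.
    // The reverse direction (returning `&'long str` from a `&'short str`)
    // is rejected at compile time.
    r
}
```

Nothing about 'short or 'long survives into the generated code, which is why a runtime check would need the extra bookkeeping described above.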


Currently, lifetimes can't affect the generated code at all. They can only cause a compile error if their validation fails.

I imagine the checking is done at compile time, and try_cast is zero overhead. It could be done like this: when the compiler instantiates a try_cast function, it must know what T and U actually are, so the compiler knows at compile time which value to return.

But now I do see a problem: for a generic function that calls try_cast, we may need to generate two instances:

fn foo<'a, 'b>(a: &'a str, b: &'b str) {
    try_cast::<&'a str, &'b str>(a);
}

This function has only lifetime parameters, but we would still need to generate two instances, depending on whether 'a and 'b are the same. Even worse, the number of instances can grow exponentially:

fn foo<'a, 'b, 'c, 'd, 'e, 'f>(a: &'a str, b: &'b str, c: &'c str, d: &'d str, e: &'e str, f: &'f str) {
    try_cast::<&'a str, &'b str>(a);
    try_cast::<&'c str, &'d str>(c);
    try_cast::<&'e str, &'f str>(e);
}

The function above may need 8 different instances (2^3, one for each combination of outcomes of the three checks).

One possible solution is to give up the zero-overhead behavior, somehow encode the relations between the lifetimes in hidden function arguments, and check the relations at runtime:

fn foo<'a, 'b>(a: &'a str, b: &'b str, __hidden_lifetime_relation: SomeMagicType) {
    try_cast::<&'a str, &'b str>(a, __hidden_lifetime_relation.a_and_b);
}

But that seems to be a complex solution, and I don’t think it is worth doing.


This is why codegen can't depend on the specific lifetimes.

(And it's even worse than you describe there, since reborrows are fresh borrows, so basically every callsite is a fresh lifetime.)
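To illustrate the reborrow point with a small sketch (the function names are made up):

```rust
fn bump(r: &mut i32) {
    *r += 1;
}

fn bump_three_times(r: &mut i32) {
    // Explicit reborrow: a fresh `&mut i32` with its own, shorter lifetime.
    bump(&mut *r);
    // Passing `r` directly is an implicit reborrow, equivalent to the line
    // above, so this call gets yet another fresh lifetime.
    bump(r);
    // `r` is still usable here because it was only reborrowed, never moved.
    *r += 1;
}
```

Each call site introduces its own lifetime, so any scheme that picks an instance based on lifetime equality would have a very large number of distinct lifetimes to reason about.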

Not even "in the compiler". This limitation is pretty fundamental, no matter how smart the compiler is. For example, recursion and other dynamic-ish schemes are basically impossible to encode in any reasonable manner.

A couple of excellent answers have been given to my question on IRLO.

Lifetimes are also commonly underdetermined. As long as they’re not overconstrained (i.e. with contradictory requirements), the compiler simply doesn’t care.[1] If you have some references, e.g.

let mut n1 = 1;
let mut n2 = 2;
let mut n3 = 3;
let r1 = &mut n1;
let r2 = &mut n2;
let r3 = &mut n3;
do_something();

Do the three have the same lifetime? Is one longer than another? The compiler doesn’t care. The code could be extended to

let mut n1 = 1;
let mut n2 = 2;
let mut n3 = 3;
let r1 = &mut n1;
let r2 = &mut n2;
let r3 = &mut n3;
n1 += 1;
do_something();
*r3 += 1;

Now the lifetime of r1 must be shorter than that of r3, because r1’s lifetime ends before the n1 += 1, and r3’s lifetime must stay alive until the *r3 += 1. What about r2 though? Is it the same as r1 or r3 or neither? The compiler doesn’t care; there still are no conflicting constraints, so everything is fine.


Now of course, this approach of “the compiler doesn’t care” would go entirely out the window if you were able to ask the question “are the lifetimes of these two types equal?”

In case the problems aren’t that obvious already by now, let me continue…

Imagine I’ll use your try_cast to implement some function are_the_types_the_same<U, T>(x: &mut U, y: &mut T) -> bool.

What is the output of

let mut n1 = 1;
let mut n2 = 2;
let mut n3 = 3;
let mut r1 = &mut n1;
let mut r2 = &mut n2;
let mut r3 = &mut n3;
dbg!(
    are_the_types_the_same(&mut r1, &mut r2),
    are_the_types_the_same(&mut r2, &mut r3),
);
n1 += 1;
do_something();
*r3 += 1;

It cannot be true twice, though it could be false twice. If it’s only true for one of the two, which one and why? Whatever you think it should be, let’s take it further by adding calls to a function fn assert_same_type<T>(x: &mut T, y: &mut T) {}, making the whole story even more complex:

let mut n1 = 1;
let mut n2 = 2;
let mut n3 = 3;
let mut r1 = &mut n1;
let mut r2 = &mut n2;
let mut r3 = &mut n3;
assert_same_type(&mut r1, &mut r2);
dbg!(
    are_the_types_the_same(&mut r1, &mut r2),
    are_the_types_the_same(&mut r2, &mut r3),
);
n1 += 1;
do_something();
*r3 += 1;

Now (assuming the code still compiles) we expect the output (true, false); whereas with

let mut n1 = 1;
let mut n2 = 2;
let mut n3 = 3;
let mut r1 = &mut n1;
let mut r2 = &mut n2;
let mut r3 = &mut n3;
assert_same_type(&mut r2, &mut r3);
dbg!(
    are_the_types_the_same(&mut r1, &mut r2),
    are_the_types_the_same(&mut r2, &mut r3),
);
n1 += 1;
do_something();
*r3 += 1;

we expect the output (false, true). Right? Otherwise, the successful call to assert_same_type would contradict the are_the_types_the_same output. Of course, combining both assert_same_type calls should result in a compilation failure. (See also the real code example below.)

I would consider this some pretty freaky spooky action at a distance though. I definitely do not want a call to the assert_same_type function (which is basically more of an “if these types aren’t the same, please fail to compile”) to have any influence on program behavior.

Here you can reproduce the compilation result of these examples without the hypothetical are_the_types_the_same function:

fn do_something() {}
fn assert_same_type<T>(x: &mut T, y: &mut T) {}

// works
fn test1() {
    let mut n1 = 1;
    let mut n2 = 2;
    let mut n3 = 3;
    let mut r1 = &mut n1;
    let mut r2 = &mut n2;
    let mut r3 = &mut n3;
    assert_same_type(&mut r1, &mut r2);
    // dbg!(
    //     are_the_types_the_same(&mut r1, &mut r2),
    //     are_the_types_the_same(&mut r2, &mut r3),
    // );
    n1 += 1;
    do_something();
    *r3 += 1;
}

// works
fn test2() {
    let mut n1 = 1;
    let mut n2 = 2;
    let mut n3 = 3;
    let mut r1 = &mut n1;
    let mut r2 = &mut n2;
    let mut r3 = &mut n3;
    assert_same_type(&mut r2, &mut r3);
    // dbg!(
    //     are_the_types_the_same(&mut r1, &mut r2),
    //     are_the_types_the_same(&mut r2, &mut r3),
    // );
    n1 += 1;
    do_something();
    *r3 += 1;
}

// fails
fn test3() {
    let mut n1 = 1;
    let mut n2 = 2;
    let mut n3 = 3;
    let mut r1 = &mut n1;
    let mut r2 = &mut n2;
    let mut r3 = &mut n3;
    assert_same_type(&mut r1, &mut r2);
    assert_same_type(&mut r2, &mut r3);
    // dbg!(
    //     are_the_types_the_same(&mut r1, &mut r2),
    //     are_the_types_the_same(&mut r2, &mut r3),
    // );
    n1 += 1;
    do_something();
    *r3 += 1;
}



  1. This is unlike types (apart from their lifetime parameters), where an underdetermined, i.e. ambiguous, type leads to a compilation error asking you to specify the type more precisely.

