Integer overflow is considered to be an error (except where it is not)

I would like to understand why this isn't considered an error in Rust:

fn main() {
    println!("{}", 257u16 as u8);
}

In debug mode, integer overflows cause a panic (see reference). In release mode, an integer overflow is still considered an error even if not panicking (see reference).
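For reference, the overflow policy of an arithmetic operation can also be spelled out explicitly instead of relying on the debug/release behavior of `+`. A minimal sketch using the standard integer methods; the helper name `add_report` is made up for this example:

```rust
// Hypothetical helper: shows the four explicit overflow policies for u8 addition.
fn add_report(a: u8, b: u8) -> (Option<u8>, u8, (u8, bool), u8) {
    (
        a.checked_add(b),     // None on overflow
        a.wrapping_add(b),    // wraps modulo 256
        a.overflowing_add(b), // wrapped value plus an overflow flag
        a.saturating_add(b),  // clamps to u8::MAX
    )
}

fn main() {
    let (checked, wrapped, overflowed, saturated) = add_report(250, 10);
    println!("checked:     {:?}", checked);
    println!("wrapping:    {}", wrapped);
    println!("overflowing: {:?}", overflowed);
    println!("saturating:  {}", saturated);
}
```

None of these variants panic, in either build mode, because the policy is part of the method name.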

I understand the rationale up to here.

But why is a cast that overflows not considered an error? Instead, the reference guarantees truncation (which I assume means bitwise truncation):

Numeric cast

  • […]
  • Casting from a larger integer to a smaller integer (e.g. u32 -> u8) will truncate
  • […]

What's the rationale behind this? Why aren't all integer overflows treated the same (i.e. panicking in debug mode)?

Because the cast is explicit. People very much expect explicit casts to truncate, whereas it's not at all apparent when doing arithmetic.

To put it differently, the point of a cast is to change representations; that obviously opens up the possibility of narrowing and thus truncating the bits that don't fit. However, the same is not something people intuitively expect when doing arithmetic, since the type doesn't change there. Even though operations on a type might not be closed, our mathematical intuition (wrongly) tells us that it "should" be.
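A small sketch of what "truncating the bits that don't fit" means for the cast in the question. The helper `low_byte` is a hypothetical name used only here:

```rust
// Hypothetical helper: a narrowing cast keeps only the low 8 bits.
fn low_byte(x: u16) -> u8 {
    x as u8
}

fn main() {
    // 257 is 0x0101; dropping the high byte leaves 0x01.
    assert_eq!(low_byte(257), 1);
    // Same result as masking off the high bits first.
    assert_eq!(low_byte(257), (257u16 & 0xFF) as u8);
    assert_eq!(low_byte(0x1234), 0x34);
    println!("257u16 as u8 = {}", low_byte(257));
}
```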

So I would understand it then as:

"If you use as, you know you accept some loss-of-precision or wraparound."

Whereas in arithmetic operations, only loss of precision is expected (in the case of floats), while a wraparound would be "surprising" (and is thus considered an error).

That makes sense, and I usually use .try_into().unwrap() instead of as.
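For comparison, a short sketch of the fallible conversion next to the cast. The helper `narrow` is a made-up name; `TryFrom` is imported explicitly so the snippet also builds on older editions:

```rust
use std::convert::TryFrom;

// Hypothetical helper: lossless narrowing, or None if the value does not fit.
fn narrow(x: u16) -> Option<u8> {
    u8::try_from(x).ok()
}

fn main() {
    println!("{:?}", narrow(255)); // fits into u8
    println!("{:?}", narrow(257)); // does not fit, so no value is produced
    println!("{}", 257u16 as u8);  // `as` silently keeps the low 8 bits
}
```

Calling `.unwrap()` on the `try_from` result turns the out-of-range case into a panic, which is the behavior described above.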


Side problem:

However, I can't do this when converting from a float to an integer:

fn main() {
    // Does not compile: u8 does not implement TryFrom<f64>.
    println!("{}", u8::try_from(256.0f64).unwrap());
}

Is there some method that converts a float to an integer but panics when encountering NaN (and maybe, optionally, also on overflow) instead of silently mapping NaN to 0? That is what as u8 does, for example, which is a bit odd:

fn main() {
    assert_eq!(f64::ln(-1.0) as u8, 0);
}

Because truncating, zero-extending or bit-casting a number between signed and unsigned are all valid operations, indispensable in some contexts. They are also super common in C, and Rust aims at good C interop.
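To make those three operations concrete, here is a small sketch with plain `as` casts (the variable names are arbitrary):

```rust
fn main() {
    // Truncation: only the low 8 bits survive.
    let wide: u32 = 0x1234_5678;
    assert_eq!(wide as u8, 0x78);

    // Zero-extension: an unsigned value keeps its value in a wider type.
    let small: u8 = 0xF0;
    assert_eq!(small as u32, 0xF0);

    // Sign-extension: a signed value keeps its value in a wider signed type.
    let neg: i8 = -16; // bit pattern 0xF0
    assert_eq!(neg as i32, -16);

    // Same-width signed/unsigned casts reinterpret the bit pattern.
    assert_eq!(neg as u8, 0xF0);
    assert_eq!(small as i8, -16);

    println!("all cast checks passed");
}
```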

However, there were some proposals to use more explicit syntax for those operations.

What should it do, then? It used to be undefined behavior by accident (which is a soundness hole, so it couldn't stay that way), but it was subsequently defined. 0 might not be the perfect option, but it's probably the least bad one.
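For the record, this is what the defined behavior looks like in current Rust: float-to-integer `as` casts drop the fractional part, saturate at the bounds of the target type, and map NaN to 0. A short sketch; the wrapper `f64_to_u8_lossy` is a hypothetical name:

```rust
// Hypothetical wrapper around the built-in cast, just to give it a name.
fn f64_to_u8_lossy(x: f64) -> u8 {
    x as u8
}

fn main() {
    assert_eq!(f64_to_u8_lossy(254.9), 254);           // fraction is dropped
    assert_eq!(f64_to_u8_lossy(256.0), 255);           // saturates at u8::MAX
    assert_eq!(f64_to_u8_lossy(-1.0), 0);              // saturates at u8::MIN
    assert_eq!(f64_to_u8_lossy(f64::INFINITY), 255);   // infinities saturate too
    assert_eq!(f64_to_u8_lossy(f64::NAN), 0);          // NaN becomes 0
    println!("float-to-int casts behave as described");
}
```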

Such functionality is implemented in num-traits:

use num_traits::ToPrimitive;

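fn main() {
    // In range: prints 255.
    println!("{}", 255.0_f64.to_u8().expect("255 is in range"));
    // Out of range: to_u8() returns None, so expect() panics here.
    println!("{}", 256.0_f64.to_u8().expect("256 is not in range"));
    // Not reached because of the panic above; NaN also yields None.
    println!("{}", f64::NAN.to_u8().expect("NaN is not in range"));
}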

At what stage are these plans, BTW? as just feels like an alien from the PHP world: something so magical that you want to avoid it if possible… but there is no good replacement for that simple truncating conversion.

You can do something like (u32 & 0xFF).try_into().unwrap(), but this doesn't work well with signed numbers (although I admit I usually do bit tricks with unsigned numbers), and even with unsigned numbers it is harder to understand than u32 as u8.
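A sketch of where the mask-then-convert idiom matches `as` and where it stops working. The values are arbitrary, and the signed case is my reading of "doesn't work well with signed numbers":

```rust
use std::convert::TryFrom;

fn main() {
    // Unsigned source and target: masking first makes try_from always succeed,
    // and the result equals the plain cast.
    let x: u32 = 0x1234;
    assert_eq!(u8::try_from(x & 0xFF).unwrap(), x as u8);

    // Signed target: the cast reinterprets the low byte as -1 ...
    let y: i32 = -1;
    assert_eq!(y as i8, -1);
    // ... but the masked value is 255, which does not fit into i8.
    assert!(i8::try_from(y & 0xFF).is_err());

    println!("mask + try_from only matches `as` for unsigned targets");
}
```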

It's not hard to create one's own trait for that, of course, but if the whole point is to make code more readable… a one-off trait that everyone writes for themselves is not much help.

Personally, I'm not a fan of those proposals. Certain kinds of low-level code use so many integer casts that making them more heavyweight would be a major degradation of readability. (u32 & 0xFF).try_into().unwrap() is certainly a no-go in those situations. Neither are potential panics all over the place.

One important issue with as-casts is that they can be freely used in const contexts. Traits currently can't be, so you would either have to introduce many disparate const intrinsics on integers, or wait for the stabilization of const traits (which looks very far away).
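To illustrate the const-context point, a minimal sketch. The names `low_byte`, `MASKED` and `DIRECT` are invented for this example:

```rust
// An `as` cast is allowed inside a const fn ...
const fn low_byte(x: u32) -> u8 {
    x as u8
}

// ... and therefore in constants evaluated at compile time.
const MASKED: u8 = low_byte(0x1234_5678);
const DIRECT: u8 = 0x1FFu32 as u8;

fn main() {
    println!("MASKED = {:#x}, DIRECT = {:#x}", MASKED, DIRECT);
}
```

A trait-based replacement such as `try_into()` could not be used in these two constants on stable Rust at the time of this discussion, which is the limitation described above.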

I don't see anything magical about Rust's casts. They are very explicit and limited. One suggestion I saw which makes them more manageable is to allow integer casts only when they either change sign, or the width of the types. Changing both at the same time is indeed confusing, and not common.
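A quick sketch of why changing sign and width in one step can be confusing: the two possible orders of the single-step casts give different results, and the direct cast silently picks one of them. The helper names are hypothetical:

```rust
// Hypothetical helper: sign-extend first, then reinterpret as unsigned.
fn widen_then_flip(x: i8) -> u32 {
    x as i32 as u32
}

// Hypothetical helper: reinterpret as unsigned first, then zero-extend.
fn flip_then_widen(x: i8) -> u32 {
    x as u8 as u32
}

fn main() {
    let x: i8 = -1;
    assert_eq!(widen_then_flip(x), u32::MAX);
    assert_eq!(flip_then_widen(x), 255);
    // The direct cast behaves like "widen, then flip".
    assert_eq!(x as u32, widen_then_flip(x));
    println!("{} vs {}", widen_then_flip(x), flip_then_widen(x));
}
```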

The magic is in as, not in casts themselves. X as Y has so many different meanings that it's really hard to keep track of them. And it's even harder to understand what happens in a piece of code which uses that syntax.

In value context, expr as Type only has a single meaning: convert the value of expr to type Type.

Yeah. And every expression in any language may have only a single meaning: it does what's written in it. Got it.

Nonetheless, the Rust reference includes a table with more than a dozen possibilities and a few pages of explanations. And it doesn't even mention the fact that the similar-looking expr as Trait is explained in an entirely different part of the documentation.

Sorry, but it's almost as convoluted as C casts (which C++ rightfully replaced with a bunch of different casts because the semantics were too unobvious and confusing).

It's not as complex as C casts, it's more like C++ static_cast, and it's even simpler than that, since Rust casts are not overloadable (there is a finite set of specific rules in the reference, and that's all).

It's good to introduce more specific functions with readable names, and there is work in that direction, but it's hard to avoid degradation of ergonomics. The integer casts in particular are too complex for an ergonomic API at the moment. There is also the issue of code bloat and unacceptable debug performance: as-casts are compiler primitives which are always compiled to efficient code, while a generic API will incur monomorphization penalties and function call overhead, which can have a terrible effect on debug performance.

I almost knew that was going to be your reply.

That long-winded explanation is basically what anyone would sensibly expect (and that's no coincidence). The only thing I couldn't guess without reading the documentation if I were new to Rust is whatever NaN as integer does. Everything else aims to produce the closest value in the target type (when converting between integers and floats-used-as-reals) or does what's the most hardware-efficient while still yielding a reasonable value (truncate or zero/sign-extend). See, this can be written down in 1 or 2 sentences.

The Rust documentation tries to be friendly, elaborate, and eloquent as a work of prose, but if you look at the content that really matters, it all just boils down to the high-level idea of "preserve bits if converting between ints, preserve values when converting to/from float, and both kinds of cast are on a best-effort basis". There might be a lot of words in the documentation, but that's not what the actual Kolmogorov complexity of its semantics is.

There's no expr as Trait in Rust. You are probably thinking about Type as Trait, which is in a completely different context (it's type-level, not value-level), so it can't be mistaken for value as Type. By the same argument, it could be said that the for keyword is ambiguous and must be abolished. Yet somehow hardly anyone is ever confused about $Trait for $Type (in impl definition context) having a different meaning from that of for $pattern in $value (in statement context), exactly because they are in different contexts, and context matters a lot.

See, you didn't even complain about use crate::name as other_name containing the as keyword, even though it's a thing. You probably didn't even think about it, because it's conceptually so far from both other uses of the keyword that it cannot really be confused with either of them.
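Here is a sketch that puts the different contexts side by side. `Describe`, `Byte` and `unit` are made-up names; only the syntax matters:

```rust
use std::fmt::Result as FmtResult; // `as` renaming an import

trait Describe {
    fn describe() -> &'static str;
}

struct Byte;

// `for` in impl context.
impl Describe for Byte {
    fn describe() -> &'static str {
        "byte"
    }
}

fn unit() -> FmtResult {
    Ok(())
}

fn main() {
    let cast = 300u16 as u8;              // value-level: `expr as Type`
    let zero = <u8 as Default>::default(); // type-level: `Type as Trait`
    let name = <Byte as Describe>::describe();

    // `for` in statement context.
    let mut sum = 0u32;
    for n in 1..4 {
        sum += n;
    }

    println!("{} {} {} {} {:?}", cast, zero, name, sum, unit());
}
```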

Oh, and finally, let me point out that this is a pretty useless strawman. The semantics of as are compiler built-ins; it is exactly for this reason that everything a cast can do comes from a small, finite, closed set of possible operations. In contrast, general expressions bear no such restrictions. Even the innocent function call can do arbitrary (and arbitrarily bad) things. Would you also suggest that there shouldn't be functions in the language, because they are too confusing?

But if there was a dedicated trait it would work nicely:

let a: u8 = 1234u32.wrapping_into();
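Such a trait does not exist in std; what follows is only a sketch of how it could look. The trait name `WrappingInto`, the macro, and the chosen set of type pairs are all hypothetical:

```rust
// Hypothetical trait: a named, explicit version of the truncating `as` cast.
trait WrappingInto<T> {
    fn wrapping_into(self) -> T;
}

// Implement the trait for a few source/target pairs by delegating to `as`.
macro_rules! impl_wrapping_into {
    ($($from:ty => $to:ty),*) => {
        $(
            impl WrappingInto<$to> for $from {
                fn wrapping_into(self) -> $to {
                    self as $to
                }
            }
        )*
    };
}

impl_wrapping_into!(u32 => u8, u32 => u16, i32 => u8, u64 => u32);

fn main() {
    // 1234 is 0x4D2; the low byte is 0xD2 = 210.
    let a: u8 = 1234u32.wrapping_into();
    println!("{}", a);
}
```

The target type is picked by inference from the annotation on `a`, much like with `into()`.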

I think as works the way it does for legacy reasons and it's unfortunate. It would be a lot better if it worked like other arithmetic, panicking in debug mode.

Most uses I've seen don't expect truncating. Typical use: array[index as usize] doesn't expect to truncate here.

That's not a great example though: casting indexes is a code smell. If one needs indices, one should always use usize for storing the index in the first place. If indexes come from somewhere else and in a different representation, one should attempt to bulk-convert them to usize in a fallible, all-or-nothing manner anyway. It's a pretty bad debugging UX for a data structure if some indexing operations fail or panic while others had previously succeeded, just because e.g. some of the indexes happen not to fit a 32-bit usize on a 32-bit platform.
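A minimal sketch of such an all-or-nothing bulk conversion, assuming the foreign indexes arrive as `i64`. The function name `to_indices` is made up:

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

// Hypothetical helper: convert every index, or fail as a whole on the first
// value that does not fit into usize.
fn to_indices(raw: &[i64]) -> Result<Vec<usize>, TryFromIntError> {
    raw.iter().map(|&i| usize::try_from(i)).collect()
}

fn main() {
    let data = ["a", "b", "c", "d"];

    // All indexes are valid, so indexing afterwards needs no casts.
    let good = to_indices(&[0, 2, 3]).expect("all indexes fit");
    for i in good {
        println!("{}", data[i]);
    }

    // One bad index rejects the whole batch up front.
    println!("{:?}", to_indices(&[1, -1, 3]).is_err());
}
```

Collecting an iterator of `Result`s into a `Result<Vec<_>, _>` stops at the first error, which gives the all-or-nothing behavior.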

Most legitimate uses of integer casts are encountered when manipulating binary formats, where truncating is usually desired, e.g. breaking up a wider number into its constituent bytes or words for parsing purposes is widespread.
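For example, splitting a 32-bit value into its bytes by shifting and truncating could look like this. The helper `split_be` is a hypothetical name; std's `to_be_bytes` does the same job:

```rust
// Hypothetical helper: big-endian byte split using shifts and truncating casts.
fn split_be(x: u32) -> [u8; 4] {
    [
        (x >> 24) as u8, // most significant byte
        (x >> 16) as u8,
        (x >> 8) as u8,
        x as u8,         // least significant byte
    ]
}

fn main() {
    let x: u32 = 0x1234_5678;
    assert_eq!(split_be(x), [0x12, 0x34, 0x56, 0x78]);
    // Matches the standard library's conversion.
    assert_eq!(split_be(x), x.to_be_bytes());
    println!("{:x?}", split_be(x));
}
```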

I agree that's how it should be. But the nice and short as notation encourages using it in place of .into() or .try_into() all over real code, even where wrapping is not expected. I see plenty of examples in std code, for example.

In practice that would mean all your integers are usize because most everything can be used for an index calculation. But even std functions give you other types. Hasher gives you u64, not usize. u64::trailing_zeros gives you u32. Etc. So you have to convert sometimes.

Also usize is not portable, so actually I don't think this is great advice. For instance, if I'm calculating the size of a data structure I need, and I know my calculation will fit in 40 bits, I don't want to use usize for that calculation because usize might not have 40 bits, which would make my calculation overflow. Totally not what I want. So I want to use u64, and then if the number is actually too big to fit in memory, I want this to fail when I try to allocate memory by calling Vec::with_capacity(size.try_into().unwrap()) or something like that.
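A sketch of that pattern: do the size arithmetic in `u64` and convert to `usize` only at the allocation boundary. The function name `reserve_bytes` and the numbers are invented for illustration:

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

// Hypothetical helper: fails if `size` does not fit into this platform's usize.
fn reserve_bytes(size: u64) -> Result<Vec<u8>, TryFromIntError> {
    let capacity = usize::try_from(size)?;
    Ok(Vec::with_capacity(capacity))
}

fn main() {
    // The calculation itself happens in u64, independent of the platform.
    let entries: u64 = 1_000;
    let entry_size: u64 = 24;
    let size = entries * entry_size;

    let buf = reserve_bytes(size).expect("size fits in usize on this platform");
    println!("reserved at least {} bytes", buf.capacity());
}
```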

I'd still argue that most of the time when you want to convert between types, you don't expect just the low bits -- that's rather specialized and low-level. Hence it should be something like wrapping_into, analogous to wrapping_add, wrapping_sub, etc.

Maybe you misunderstood and/or I expressed it badly.

I understand now why "as" has this behavior, so it isn't "odd" to me anymore.

But in many cases I need a different behavior, namely returning an error or panicking (which is what I can achieve with .try_into()? or .try_into().unwrap()). But I haven't found a way to do that in case of converting from f64 to an integer type unless I manually check for NAN. Is there any built-in function doing that for me?

No built-ins, but the num-traits crate I linked to is a de facto standard crate, maintained by Rust team members, and it implements that functionality.
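If pulling in a crate is not an option, a std-only check is short to write by hand. This is only a sketch: the name `f64_to_u8_checked` is made up, and dropping the fractional part before the range check is my assumption about the desired behavior, not necessarily what num-traits does in every edge case:

```rust
// Hypothetical helper: Some(value) if the truncated float fits into u8,
// None for NaN, infinities, and out-of-range values.
fn f64_to_u8_checked(x: f64) -> Option<u8> {
    let t = x.trunc(); // NaN stays NaN, so both comparisons below are false
    if t >= 0.0 && t <= 255.0 {
        Some(t as u8)
    } else {
        None
    }
}

fn main() {
    println!("{:?}", f64_to_u8_checked(255.0));
    println!("{:?}", f64_to_u8_checked(256.0));
    println!("{:?}", f64_to_u8_checked(f64::NAN));
    // Panicking on failure is then one unwrap/expect away:
    let n = f64_to_u8_checked(42.7).expect("value must be a valid u8");
    println!("{}", n);
}
```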

I don't think so. "Can be used", maybe, technically, but in practice not nearly every integer is used for indexing; this seems quite obvious to me. In many cases, integers are used for something completely different, such as:

  • cryptographic credentials
  • database identifiers
  • packet type descriptors in network protocols (similar to enum discriminants)
  • counting and doing math with money or time, without floating-point rounding errors

A cryptographic key or a payment amount specified in cents rarely ever ends up indexing into anything, so it's fine to specify them as an appropriate fixed-width integer.

That was exactly my suggestion w.r.t. bulk conversion.