"as" considered harmful?

ZiCog · December 4, 2019, 7:57am

During my few months of tinkering with Rust I have translated a few C programs, both to gain some Rust coding practice and to compare the performance that is achieved. With excellent results, I might add.

Along the way I have translated things like this:

        k=1; l=1;
        for(i=0;i<=x.fmax;i++){
            k*=ppow(x.p[i],z[i]);
            l*=ppow(x.p[i],x.n[i]-z[i]);
        }

To this:

            let mut k = 1;
            let mut l = 1;
            for i in 0..=self.factors.fmax {
                k *= self.factors.p[i].pow(self.z[i]);
                l *= self.factors.p[i]
                    .pow(self.factors.n[i] - self.z[i]);
            }

Only to find of course that there are type mismatches all over and that I can write this to get it working:

            let mut k = 1;
            let mut l = 1;
            for i in 0..=self.factors.fmax {
                k *= self.factors.p[i as usize].pow(self.z[i as usize] as u32);
                l *= self.factors.p[i as usize]
                    .pow(self.factors.n[i as usize] as u32 - self.z[i as usize] as u32);
            }

This is already a bit of a pig's ear and hard to read. The compiler of course knows something I did not and suggests I use ".try_into().unwrap()", which would render such code unreadable.

What I did not know until now is that using "as" can silently corrupt data by losing bits. It not only changes the type, it effectively masks off the high bits (a modulus operation) along the way.

I had naively assumed that "as" would move big things into small things and change signs but also fail with an overflow if there was some damage being done to the data. Turns out that "as" is "unsafe" in that respect.
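To make that behaviour concrete, here is a minimal sketch with illustrative values (not taken from the program above). It compares what "as" does to out-of-range integers with what TryFrom reports. The function names lossy_narrow and checked_narrow are made up for this example.

```rust
use std::convert::TryFrom;

/// Narrow an i32 to u8 the lossy way: `as` keeps only the low 8 bits.
fn lossy_narrow(x: i32) -> u8 {
    x as u8
}

/// Narrow an i32 to u8 the checked way: out-of-range values become None.
fn checked_narrow(x: i32) -> Option<u8> {
    u8::try_from(x).ok()
}

fn main() {
    // 300 is 256 + 44, so the low 8 bits are 44.
    assert_eq!(lossy_narrow(300), 44);
    // -1 is all ones in two's complement, so the low 8 bits are 255.
    assert_eq!(lossy_narrow(-1), 255);
    // The checked conversion refuses both values...
    assert_eq!(checked_narrow(300), None);
    assert_eq!(checked_narrow(-1), None);
    // ...and passes in-range values through unchanged.
    assert_eq!(checked_narrow(200), Some(200));
    println!("all conversions behaved as described");
}
```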

Having now read the docs properly ("Casting" in Rust By Example) I see that is what it is supposed to do. But now I wonder why?

A common case is all this messing about converting a loop variable that is an integer into an array index, which has to be usize. Very annoying. Often conceptually that array index is not a size of anything, it's intended to be an integer in its own right. For example when converting values with a look up table.

I hear the phrase "idiomatic Rust" used a lot. So what is the idiomatic way to deal with these things in Rust? Ways that will not silently trash values.

alice · December 4, 2019, 8:07am

One approach is to use from

let a: u16 = 1;    
let list = vec![10, 20];
println!("{}", list[usize::from(a)]);

This only works for conversions that are guaranteed to succeed, such as u16 -> usize. You can't use it to turn an i32 into a usize, because that might fail.
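For the i32 case, the fallible counterpart is TryFrom. The sketch below puts the two side by side; the helper names get_u16 and get_i32 are hypothetical and only there to illustrate the difference.

```rust
use std::convert::TryFrom;

/// Index with a u16: `usize::from` is always lossless, so it cannot fail.
fn get_u16(list: &[i32], i: u16) -> i32 {
    list[usize::from(i)]
}

/// Index with an i32: the conversion can fail (negative values), so
/// `usize::try_from` returns a Result that has to be handled.
fn get_i32(list: &[i32], i: i32) -> Option<i32> {
    let idx = usize::try_from(i).ok()?;
    list.get(idx).copied()
}

fn main() {
    let list = vec![10, 20];
    assert_eq!(get_u16(&list, 1), 20);
    assert_eq!(get_i32(&list, 1), Some(20));
    assert_eq!(get_i32(&list, -1), None); // negative: the conversion fails
    assert_eq!(get_i32(&list, 5), None); // converts fine, but out of bounds
}
```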

As for what idiomatic Rust is? I'd say it's when your values already have the correct type and you don't have to convert them. Why is your i not a usize anyway?

boxofrox · December 4, 2019, 8:11am

I don't know why as was designed that way, but you can consolidate some of the conversions to improve the readability. For the as u32, I would add a method to self that gets the value at i, attempts the conversion, then returns a u32 value.

If you're converting from floating point to usize or u32, I don't see how you avoid losing bits when the fraction is truncated.

let mut k = 1;
let mut l = 1;
let fmax : usize = self.factors.fmax.try_into().unwrap();
for i in 0..=fmax {
    k *= self.factors.p[i].pow(self.z[i] as u32);
    l *= self.factors.p[i]
        .pow(self.factors.n[i] as u32 - self.z[i] as u32);
}
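The helper-method idea mentioned above could be sketched roughly as follows. Everything about the types is a guess: the struct names Factors and State, the field types, and the helper names z_exp and nz_exp are assumptions made for illustration, since the original definitions are not shown in the thread. Unlike the "as" version, this subtracts first and converts afterwards, and it panics on a negative exponent instead of wrapping.

```rust
use std::convert::TryFrom;

// Hypothetical stand-ins for the poster's types; the field types are guesses.
struct Factors {
    p: Vec<u64>,
    n: Vec<i32>,
    fmax: usize,
}

struct State {
    factors: Factors,
    z: Vec<i32>,
}

impl State {
    /// Exponent z[i] as a u32; panics if the stored value is negative.
    fn z_exp(&self, i: usize) -> u32 {
        u32::try_from(self.z[i]).expect("z[i] must be non-negative")
    }

    /// Exponent n[i] - z[i] as a u32; panics if the difference is negative.
    fn nz_exp(&self, i: usize) -> u32 {
        u32::try_from(self.factors.n[i] - self.z[i])
            .expect("n[i] - z[i] must be non-negative")
    }

    /// The loop from the post, with the conversions hidden in the helpers.
    fn products(&self) -> (u64, u64) {
        let (mut k, mut l) = (1u64, 1u64);
        for i in 0..=self.factors.fmax {
            k *= self.factors.p[i].pow(self.z_exp(i));
            l *= self.factors.p[i].pow(self.nz_exp(i));
        }
        (k, l)
    }
}

fn main() {
    let s = State {
        factors: Factors { p: vec![2, 3], n: vec![3, 2], fmax: 1 },
        z: vec![1, 2],
    };
    // k = 2^1 * 3^2 = 18, l = 2^(3-1) * 3^(2-2) = 4
    assert_eq!(s.products(), (18, 4));
}
```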

raggy · December 4, 2019, 8:17am

The first thing I do when translated code runs into the problem of too many damn as conversions is see if I can just make the types fit at definition, like what @boxofrox suggested.
An alternative approach is to avoid indexing altogether and write something like this (untested):

let fmax: usize = self.factors.fmax.try_into().unwrap();
for ((p, n), z) in self.factors.p[0..=fmax]
    .iter()
    .zip(&self.factors.n[0..=fmax])
    .zip(&self.z[0..=fmax])
{
    k *= p.pow(z.into());
    //...
}
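A self-contained version of the same idea, reduced to two slices so it can be compiled and run as is. The function name product_of_powers and the choice of u64 and u32 element types are assumptions; storing the exponents as u32 from the start is what removes the need for any conversion in the loop.

```rust
/// Multiply primes[i]^exps[i] over the first `count` entries of two
/// parallel slices.
fn product_of_powers(primes: &[u64], exps: &[u32], count: usize) -> u64 {
    let mut k = 1;
    // Slicing panics if `count` exceeds either length; zip then walks both
    // slices in lockstep, so no index variable is needed in the loop body.
    for (p, z) in primes[..count].iter().zip(&exps[..count]) {
        k *= p.pow(*z);
    }
    k
}

fn main() {
    // 2^3 * 3^1 * 5^2 = 8 * 3 * 25 = 600
    assert_eq!(product_of_powers(&[2, 3, 5], &[3, 1, 2], 3), 600);
    // Only the first two factors: 2^3 * 3^1 = 24
    assert_eq!(product_of_powers(&[2, 3, 5], &[3, 1, 2], 2), 24);
}
```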

H2CO3 · December 4, 2019, 9:18am

C code is often sloppy with integer types, even around array indices. This is because you are basically allowed to index arrays with any type of integer (with implicit, lossy conversions happening under the hood).

I'd consider it idiomatic Rust if you used usize from the beginning for all array indices. If your 32-bit computations don't fit into a usize (because you are running on a platform where pointer size is 16 bits), then you couldn't index your arrays with the resulting big numbers anyway.

vmedea · December 4, 2019, 10:29am

i think rust being explicit about integer types and conversion between them is good, tons of bugs and vulnerabilities happen this way

my only gripe with regard to rust casts is really usize::from(u32), it tends to come up with pixel coordinates and such a lot; sometimes i really don't care about 16-bit architectures and hate to clutter the code for it

kornel · December 4, 2019, 2:12pm

Rust being explicit about integer types, but silently truncating in as is a source of bugs and vulnerabilities.

I think the current situation is awful and dangerous (not "unsafe" per Rust's definition, but unexpected truncation caused by as can cause dangerous bugs). And because Rust is missing the majority of useful From implementations for usize, the woeful as is de-facto required.

My Rust code would be more readable, clearer and safer if Rust allowed indexing by u32 or had a guaranteed-lossless conversion from u32 (and compile error on 16-bit platforms).

juleskers · December 4, 2019, 5:17pm

.zip would be the idiomatic way to iterate over multiple arrays in lockstep, as @raggy demonstrated. Let rust worry about checking all those "boring" things like "are these arrays equally long?" and "am I accessing out of bounds?"

std only offers a pairwise .zip, leading to raggy's .zip.zip. For a multi-item zip, see the itertools crate.

You could also consider changing your data layout, so that what you use together is stored together.
Something in the direction of a Vec of structs with fields {n: usize, p: usize, z: usize}. Or even a Vec of anonymous tuples: (usize, usize, usize).
In indecipherable jargon for search engines this is called "struct of arrays to array of structs", or "SOA to AOS transformation".
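A rough sketch of that layout change, assuming the three arrays always have the same length. The struct name Factor, the function name split_products, and the field types are all assumptions (the field names p, n and z follow the thread).

```rust
/// One prime factor with its bookkeeping, stored together.
#[derive(Debug, Clone, Copy)]
struct Factor {
    p: u64, // the prime
    n: u32, // its total exponent
    z: u32, // the part of the exponent that goes into k
}

/// Compute k = product of p^z and l = product of p^(n - z) with no indexing.
fn split_products(factors: &[Factor]) -> (u64, u64) {
    let mut k = 1;
    let mut l = 1;
    for f in factors {
        k *= f.p.pow(f.z);
        l *= f.p.pow(f.n - f.z);
    }
    (k, l)
}

fn main() {
    let factors = vec![
        Factor { p: 2, n: 3, z: 1 },
        Factor { p: 3, n: 2, z: 2 },
    ];
    // k = 2^1 * 3^2 = 18, l = 2^2 * 3^0 = 4
    assert_eq!(split_products(&factors), (18, 4));
}
```

With this layout there is no index variable left to convert, and the "are these arrays equally long?" question disappears because there is only one array.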

ZiCog · December 4, 2019, 5:36pm

Thanks for the replies everybody. There is a lot there to digest and I will have to find time to consider it all more when I'm a bit fresher.

In the meantime a few comments:

I am very happy that Rust is fussy about types and sizes. That is one reason why I'm enthusiastic about Rust and here at all. Over the decades I have probably created every possible bug the wibbly-wobbly type systems of C/C++ and other languages allow. I have in turn spent my fair share of time fixing such issues caused by others.
I would expect that in any remotely high level language an array is a container of similarly sized objects. It has an integer number of elements. Each element can be accessed using an integer index. Conceptually, abstractly arrays are all about integers.
As such, one should be able to access array elements with any integer type as an index.
Of course one's chosen integer might be negative, or too big for the array size. Surely that is what arrays bounds checking is there for? No problem in Rust.
Nothing about accessing arrays by integer index has anything to do with the size of a pointer vs the size of an integer on whatever platform. When are we using pointers in Rust, except when going unsafe?
Often an integer used to index an array is not just an array index. It is an integer value in its own right.
With all the above in mind it appears very odd to have to convert an integer to a kind of pointer type, usize, all the time in order to index an array. An array index is an integer!

Then I find that "as", which is a convenient way to get around this usize problem, introduces the same wibbly-wobblyness as we have in C. Silently throwing away bits and allowing for similar bugs to be written.

Perhaps I have missed an important point but just now the need for usize in Rust array access makes no sense to me. Also "as" is dangerous and should at least cause a warning in the compiler or clippy.

alice · December 4, 2019, 6:02pm

In my experience having indices that represent anything but an index or length of an array is pretty rare. I'm okay with having to convert my indices for the rare case where this happens, as I get the advantage of compile-time checking that I don't use something that isn't an index as an index.

juleskers · December 4, 2019, 6:06pm

I believe as is a concession to the low level nature of our computer hardware. Useful, but low-levelly-dangerous, just like pointers, which is why we try to avoid it.

As for your points on arrays, can you have a set with minus-4 items in it?
There is a big difference between "integers" and "non-negative integers". An array doesn't contain "an integer number of elements", it contains a non-negative integer (zero or more) number of elements.
That single word is important, and is exactly what usize vs int is about.

C plays fast and loose with signage, to enable "fancy" "optimisation" tricks with relative, negative indexing. That is a very powerful trick in capable hands, just like goto-instructions.
Both are also extremely easy to get wrong in even trivial cases (looking at you, 42.zip)

Just like the goto-instruction was made obsolete by structured programming and scoped functions, so should we find better ways to handle iterations, that allow the same power, but protect against more mistakes.

As for why usize changes size between architectures: that is the reality of the biggest array, of the smallest items, a computer can theoretically address: an array of bytes/u8 from memory address zero, all the way to the end of addressable memory, at 2^8, 2^16, 2^32 or 2^64 in existing hardware (practically, modern CPUs max out at 2^48, to save on silicon area and copper traces to RAM).
Architecture-sized numbers/pointers can be copied in one CPU-operation, making them fast.
Always copying 64-bit (or even worse 128 bit, for future-proofing) pointers, even on 8bit microcontrollers, would be wasteful, slow, and unworthy of Rust's system-level ambition.

The C-equivalent would be size_t
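
A small sketch to check this on whatever target you compile for. The function name usize_bits is made up for the example, and the printed width depends on the target, so no particular output is claimed here.

```rust
use std::mem::size_of;

/// Width of `usize` in bits on the target this is compiled for.
fn usize_bits() -> usize {
    size_of::<usize>() * 8
}

fn main() {
    // usize has the same size as a pointer on the current target.
    assert_eq!(size_of::<usize>(), size_of::<*const u8>());
    // The width, and so the largest possible index, depends on the target.
    println!("usize is {} bits wide", usize_bits());
    println!("usize::MAX = {}", usize::MAX);
}
```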

ZiCog · December 4, 2019, 6:44pm

Exactly my point. An array's length and indices are integers. How come we need some unrelated type that is something to do with the size of pointers in order to access an array?

Rare? The code I have been working with here does that on every other line!

When you come from the world of Algol, Fortran, Coral, PL/M, Ada, C, C++ that is how code looks. Heck, even BASIC and JavaScript.

alice · December 4, 2019, 6:56pm

It isn't an "unrelated" type — it's the type designed to contain indexes and lengths of arrays. It's equivalent to size_t from C. The usize type is fundamentally related to arrays — the link to pointers is just because arrays are linked to pointers.

ZiCog · December 4, 2019, 7:03pm

I don't recall exactly now but you probably can in Ada. In Ada you can define your own "integer" types with whatever min and max range of values you like and it will range check them. As far as I can recall you can also define arrays that don't start at zero and may well go backwards. I may be wrong, have to check...

Technically yes. I can't argue with that.

That does not satisfy me as to why I can't index an array with any old integer type. An i8 for example. If it's negative that is an array bounds error, no problem in Rust. If it's too small to reach where I want to access that is just a dumb error on my part. Likely an overflow error in debug builds. No problem in Rust.
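
For what it's worth, that behaviour can be approximated in today's Rust with a small helper. This is only a sketch: get_any is a hypothetical name, not a standard library function, and it returns None rather than panicking on a bad index.

```rust
use std::convert::TryFrom;

/// Look up `slice[i]` for any integer type that can be converted to usize.
/// Negative or too-large indices give None instead of a panic.
fn get_any<T, I>(slice: &[T], i: I) -> Option<&T>
where
    usize: TryFrom<I>,
{
    let idx = usize::try_from(i).ok()?; // negative or too wide -> None
    slice.get(idx) // past the end -> None
}

fn main() {
    let table = [10, 20, 30];
    assert_eq!(get_any(&table, 1i8), Some(&20));
    assert_eq!(get_any(&table, -1i8), None);
    assert_eq!(get_any(&table, 2u64), Some(&30));
    assert_eq!(get_any(&table, 3i32), None);
}
```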

I have some sympathy for that idea.

However unlike banishing GOTO, replacing loops by all kind of modern day functional programming iterators and such has two issues for me:

It obfuscates what one is intending to say. See example above with all its .zip stuff. It will take me half a year to figure out what that code is saying. It will likely be another year before I can write like that myself!
So far when I have tried that in Rust it had a very detrimental effect on performance.

Perhaps I should put this whole little project up on github and invite everybody to contribute towards improving its readability and performance?

ZiCog · December 4, 2019, 7:14pm

Perhaps "unrelated" is putting it too strongly. It is unrelated enough to require redundant conversions to index my arrays, thus obfuscating my code.

I think you have hit the nail on the head there. It's all down to the nature of the underlying hardware (size of pointers vs size of integers) leaking through the abstractions of Rust.

I guess there is no better way around that and we just have to live with it.

alice · December 4, 2019, 7:17pm

I mean, where does your index come from? It seems to come from self.factors.fmax, but what does this variable represent? Maybe it should be a usize to start with?

Note that you can simplify your code by converting self.factors.fmax into a usize before entering the loop, which would make i be a usize from the get-go.

ZiCog · December 4, 2019, 7:39pm

As it happens, if I change self.factors.fmax to a usize it simplifies things without making too much of a mess elsewhere in the code.

            for i in 0..=self.factors.fmax {
                k *= self.factors.p[i].pow(self.z[i] as u32);
                l *= self.factors.p[i]
                    .pow(self.factors.n[i] as u32 - self.z[i] as u32);
            }

fmax is counting prime factors. Which it happens to store in an array as a memo. I'm not sure I'm happy to make it a non integer.

Certainly I can factor out that conversion of self.factors.fmax. Somewhat better perhaps. Somewhat less verbose and annoying.

That still leaves the question of '"as" considered harmful?'. That silent data corruption does not sit well in Rust to my mind.

ryan · December 4, 2019, 8:31pm

Rust isn't absolutely safe, it only guarantees memory safety. In fact, there are quite a few issues surrounding numbers: overflow, underflow, etc. By default in debug mode these cause a panic, but you can opt into wrapping behaviour explicitly via the std::num::Wrapping type or the wrapping_* methods on the integer types.
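
As a minimal illustration of those options (the function names wrapping_inc and checked_inc are invented for this sketch), explicit wrapping and checked arithmetic look like this:

```rust
use std::num::Wrapping;

/// Add one to a u8, wrapping around on overflow (an explicit opt-in).
fn wrapping_inc(x: u8) -> u8 {
    (Wrapping(x) + Wrapping(1)).0
}

/// Add one to a u8, reporting overflow as None.
fn checked_inc(x: u8) -> Option<u8> {
    x.checked_add(1)
}

fn main() {
    assert_eq!(wrapping_inc(255), 0);
    assert_eq!(255u8.wrapping_add(1), 0); // the same thing as a method
    assert_eq!(checked_inc(255), None);
    assert_eq!(checked_inc(41), Some(42));
    // A plain `x + 1` with x == 255 panics in a debug build and wraps
    // in a release build with default settings.
}
```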

For casts the "safe," and most explicit way to handle it is by using the From and TryFrom traits.

I agree, it is surprising that rust silently allows this "unsafe," non-explicit behavior. Fortunately clippy does warn about it.

Maybe making as unsafe could be something to look at for an edition upgrade?

ZiCog · December 4, 2019, 8:52pm

I do appreciate that "safe" in the Rust context is talking about memory safety. Not overflows and other programmer mishaps. There are limits to what is possible.

I have been enabling overflow checks in release builds with "overflow-checks = true" in the Cargo.toml.

Given that we have overflow checks in Rust I found it very strange that this "as" thing gets through.

In my experiments clippy has never complained about possible error due to "as".

Not sure I would go as far as making "as" unsafe. But I see no reason it does not detect loss of bits, like an overflow, in debug builds.

ryan · December 4, 2019, 9:06pm

Funny, I had the opposite experience, that's how I learned there was an issue at all. I went and checked on the clippy lints page, and sure enough these cast lints are marked as allow. I wonder if that changed; they used to be warn I thought. (The easiest way to find them on that page is to use the search bar and look for cast)

I guess you could enable them in your project. Here is a link to get you started configuring clippy.
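
A sketch of what enabling them might look like. The three lint names are existing clippy lints; attaching them to a single function, and the function name exponent, are just choices for this example. The same list can go in a crate-level #![warn(...)] attribute at the top of main.rs or lib.rs instead.

```rust
// Opt in to clippy's cast lints for this one function.
#[warn(
    clippy::cast_possible_truncation,
    clippy::cast_sign_loss,
    clippy::cast_possible_wrap
)]
fn exponent(z: i32) -> u32 {
    // With the lints above, `cargo clippy` should flag this cast for
    // possible sign loss, while a plain `cargo build` stays silent.
    z as u32
}

fn main() {
    assert_eq!(exponent(3), 3);
    // The silent failure mode the lint is meant to catch:
    assert_eq!(exponent(-1), u32::MAX);
}
```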

But to answer your original question, yes, I consider as harmful. It is a loaded foot-gun, and apparently I need to update my clippy configuration!
