"as" considered harmful?

Hmm, that's not exactly true though, is it? A 32-bit arch can potentially access more than i32::MAX bytes of memory (though admittedly not in one array). Whereas on my x64 desktop I can't access i64::MAX bytes of memory no matter what I do (I think the limit is 48-bit addresses?).

No, it's not true. An angle I had not considered.

synn's statement was "... the size of usize / isize will vary across architectures, it can be used to represent all the memory addresses for that arch on which the program is running..."

Which is true. My argument has been that array indices are nothing to do with memory addresses. Abstractly they are integers used to identify elements in an array of similar items. Actual memory addresses are an implementation detail. As are pointers in references.

Correct me if I am wrong but you cannot make an array of 2 billion 32 bit words on a 32 bit machine. That would be 8 billion bytes. Which usize cannot reach. So usize is not helping w.r.t. memory safety.
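
To make that arithmetic concrete, here is a small sketch. It uses u32 to stand in for a 32-bit usize, and the function name is made up for illustration. As far as I know Rust also caps a single allocation at isize::MAX bytes, so the practical limit is lower still.

```rust
/// Byte size of `len` 32-bit words, computed in 32-bit arithmetic to mimic
/// a target whose usize is 32 bits wide. Returns None if the size overflows.
/// (Hypothetical helper, only here to illustrate the arithmetic.)
fn bytes_for_words_32(len: u32) -> Option<u32> {
    len.checked_mul(4)
}
```

With this, 2 billion words overflows (None), while 1 billion words still fits in a 32-bit byte count.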

Anyway, my point in this thread was about the dangers of "as" silently corrupting data. The whole usize thing is somewhat tangential.
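
For anyone landing here later, a minimal sketch of the problem and one checked alternative. The function names are made up for illustration; `TryFrom` is in the standard library.

```rust
use std::convert::TryFrom;

/// Lossy: `as` keeps only the low 8 bits and never reports an error.
fn narrow_with_as(x: u32) -> u8 {
    x as u8
}

/// Checked: returns None when the value does not fit in a u8.
fn narrow_checked(x: u32) -> Option<u8> {
    u8::try_from(x).ok()
}
```

Passing 300 to the first function quietly gives 44 (300 modulo 256), while the second gives None and forces the caller to decide what to do.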

I don't know why you keep talking about an array of words. You can have an array of bytes where the stride is one byte, or even an array of zsts where the stride is zero bytes.

I don't. I'm just talking about arrays in the abstract.

zsts don't exist in any practical programming I have ever seen.

Rust has zero sized types, and they have a stride of 0. () is the canonical ZST, and any struct Name; is also a ZST, as is any compound struct containing only ZSTs. A vector of ZSTs never allocates: in effect it is just a length counter, with the pointer left dangling and the capacity reported as usize::MAX.

Note that there's no zero sized allocation involved; the vector/box code is specialized to handle zero sized types without allocation (as all custom containers are required to be).
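
A quick sketch of how this can be checked, assuming a hypothetical unit struct called Marker. The documented behaviour I am relying on is that the capacity of a vector of zero sized elements is always usize::MAX.

```rust
use std::mem::size_of;

/// A unit struct: it has no fields, so it is a zero-sized type.
struct Marker;

/// Size in bytes of the hypothetical Marker type.
fn marker_size() -> usize {
    size_of::<Marker>()
}

/// Pushes `n` markers and returns the vector's (len, capacity).
fn zst_vec_stats(n: usize) -> (usize, usize) {
    let mut v: Vec<Marker> = Vec::new();
    for _ in 0..n {
        v.push(Marker); // no heap allocation is needed for a ZST
    }
    (v.len(), v.capacity())
}
```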

The stride between elements in an array is the size of the array's item type. A vector is not a list of pointers to the actual objects; it's a list of the items stored directly. So vec![0u16; 32] will be an array of 32 u16s for a total of 64 bytes allocated. A vec![0u8; 9] will be an array of 9 u8s for a total of 9 bytes allocated. (With exceptions for whatever the allocator's minimum allocation / overallocation policy is, but that isn't exposed to your code anyway.)
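
Those numbers can be checked with a sketch like the one below. Both helper names are invented for this post, and the stride helper assumes the slice has at least two elements.

```rust
use std::mem::size_of;

/// Bytes occupied by the elements of a slice: length times element size.
fn payload_bytes<T>(items: &[T]) -> usize {
    items.len() * size_of::<T>()
}

/// Distance in bytes between the addresses of elements 0 and 1.
/// Panics if the slice has fewer than two elements.
fn stride_bytes<T>(items: &[T]) -> usize {
    let first = &items[0] as *const T as usize;
    let second = &items[1] as *const T as usize;
    second - first
}
```

For vec![0u16; 32] this reports 64 bytes with a stride of 2, and for a slice of () the stride comes out as 0.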

I work on code that cares about integer index sizes because it has impacts on memory usage, memory bandwidth, and vectorization. If Rust let me write code that assumed I was on a system with a 32-bit or larger address space, and therefore let me index an array with a u32, I'd be so happy. My code will never run on a 16-bit system - don't make me spend the time to support it!

That is only true as long as you don't publish your code, but allowing crate developers to ignore the subject would only cause a proliferation of crates that do not work in a subset of cases. That goes against the design philosophy of Rust.

:man_shrugging: There are a million and a half things that I'm allowed to write in a Rust crate that make it non-portable. At least let me explicitly opt in to allowing u32 indexing.

@chrisd @ZiCog
Check this out: I32 vs isize, u32 vs usize

(He's from the Rust team, and he won't be wrong, that is what I meant to say in my original post)

I don't think I ever said usize was wrong. Smarter people than me have thought this through and concluded it was the best approach.

Despite my argument, I do understand why things are the way they are in Rust.

Still, I can hold the claim that conceptually, abstractly, the size of my arrays in my high level programs is nothing to do with the size of an address in whatever machine my code might run on.

In the same way that a u64 is an abstract concept of an unsigned 64 bit value, even on machines that only have a 32 or 16 bit word size.

I'm sure there are pragmatic reasons why the details of the machine hardware leak through into abstraction of arrays and array indexing in Rust.

Anyway, all of this is tangential to the point of this thread, which was the little trouble with 'as'.

I think your observation that using usize as indices is a leaky abstraction touches on a very important point: it is true that it is maybe a leaky abstraction, but Rust is a systems language, thus such leaks are sometimes unavoidable (as here, IMHO).

I love that idea! I just imagined how nice it would be, if we could assert the size of usize at compile time, which then allows me to write, e.g. array[0u8] instead of array[0u8 as usize], if usize has been asserted to be at least 1 byte big.
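
Part of this is already possible today, as far as I can tell: a const assertion can reject targets with a small usize at compile time. It does not give you array[0u8] syntax, but it makes a widening cast provably lossless. A rough sketch, where the helper names idx and get_u32 are made up:

```rust
// Compile-time check: the build fails on targets where usize is under 32 bits.
const _: () = assert!(usize::BITS >= 32, "this code assumes usize is at least 32 bits");

/// Hypothetical helper: widens a u32 index to usize.
/// The `as` cast cannot truncate here because of the assertion above.
fn idx(i: u32) -> usize {
    i as usize
}

/// Hypothetical helper: bounds-checked lookup with a u32 index.
fn get_u32<T>(items: &[T], i: u32) -> Option<&T> {
    items.get(idx(i))
}
```

So get_u32(&data, 1) works with a u32 directly, and a 16-bit target simply refuses to compile instead of misbehaving at run time.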

I can't see that working out.

usize is defined to be machine dependent in width. That is all for the good. We don't know if our code will be running on a machine with 16, 32, 64 or even 128 bit address widths (yes, 128 bit is a defined register size in the RISC-V architecture).

What it would take to get what we want is to change the types of arrays/vectors such that we can specify the type of the index as well as the type of the elements. As is done in Ada I believe.
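
For what it's worth, something in this direction can be approximated in user code with a wrapper type. This is only a sketch: Indexed is a hypothetical name, and it accepts any index type that converts to usize without loss (for example u8 and u16) rather than fixing one index type per array.

```rust
use std::ops::Index;

/// Hypothetical wrapper: a vector that can be indexed by any integer type
/// that converts to usize without loss, such as u8 or u16.
struct Indexed<T>(Vec<T>);

impl<T, I> Index<I> for Indexed<T>
where
    usize: From<I>,
{
    type Output = T;

    /// Converts the index losslessly, then defers to Vec's own
    /// bounds-checked indexing.
    fn index(&self, i: I) -> &T {
        &self.0[usize::from(i)]
    }
}
```

With that, a[0u8] and a[2u16] compile without any cast. A u32 index does not, because the standard library has no From<u32> for usize, which is exactly the portability concern discussed above.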

I don't see that happening in Rust. And I'm not sure I'd like to see that complication added to the language.

Of course, I know what machine my code will be run on. I have to compile it for a target platform, unless you want to say, that someone could still try to run a 64bit program on a 32bit processor, but that'll just silently crash how it's supposed to. I'm not sure how that works behind the scenes, but it shouldn't be an issue to insert a runtime assert at the beginning of the program, if that's really necessary.

Not if you publish it, as a crate say, and I use it.

OK. That is true. It's going to be a pain if I try to use your published code on a machine of a different size than the one you have.

Rust is a high level language. In general it should abstract the machine away as much as possible.

Of course being a "systems programming language" it also has to deal with system specific things. Like usize.

Meanwhile, being a high level language, we should be able to write code that does not care about machine specifics, like address space size. After all, if you publish your code you have no idea who will run it or on what. Hence the usize we have.

Even the C guys realized this after some decades of writing compiler and architecture specific code with int, char, etc. They also have size_t for indexing arrays.

There is #[cfg(target_pointer_width=...)] which lets you conditionally compile depending on how big a pointer is. This is how usize is defined.
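
A minimal sketch of what using it looks like. POINTER_BITS and pointer_bits are names I made up; only the cfg attribute itself is the real thing. Exactly one of the three constants is compiled in for a given target.

```rust
// Only the definition matching the compilation target survives.
#[cfg(target_pointer_width = "16")]
const POINTER_BITS: u32 = 16;
#[cfg(target_pointer_width = "32")]
const POINTER_BITS: u32 = 32;
#[cfg(target_pointer_width = "64")]
const POINTER_BITS: u32 = 64;

/// Pointer width of the target this was compiled for, in bits.
fn pointer_bits() -> u32 {
    POINTER_BITS
}
```

The result should always agree with usize::BITS.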

Cool.

Does not fix the issue of having to convert integers to usize for array indexing. But good to know.

In reference to this

I get the idea.

But if one's code fails an assertion because it does not like the usize, it's not generally useful.
