"as" considered harmful?

The reason that it doesn't panic on loss of bits is that it is used in many places where you want it to throw away the extra bits. This is also why clippy doesn't warn about it. Note that clippy will still suggest using from when the conversion is already lossless.
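To make the difference concrete, here is a minimal sketch of a narrowing `as` cast next to a lossless `From` conversion. The helper names `low_byte` and `widen` are made up for illustration; only `as` and `From` come from the discussion.

```rust
/// Narrowing with `as`: keeps the low 8 bits and silently drops the rest.
fn low_byte(x: u32) -> u8 {
    x as u8
}

/// Widening with `From`: this only compiles for conversions that cannot
/// lose information, so no runtime check is needed.
fn widen(x: u8) -> u32 {
    u32::from(x)
}

fn main() {
    // 300 is 0x12C; only the low byte 0x2C (44) survives the cast.
    println!("{}", low_byte(300));
    // Values that already fit are unchanged.
    println!("{}", low_byte(255));
    println!("{}", widen(200));
}
```

There is no `u8::from(some_u32)` to reach for, which is why the narrowing direction tends to end up as `as`.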

I agree that it's unfortunate and that it would be nice if it was more explicit about loss of bits being the intention, but now we're stuck like this because of backwards compatibility.

Perhaps we could add a type to std::num called std::num::Truncating that semantically works the same way std::num::Wrapping does? This would make the conversion explicit. Then it would make sense to make the clippy lint at least warn and suggest using Truncating instead.
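A rough sketch of what such a wrapper might look like follows. Nothing named `Truncating` exists in std; the type and its `From` impls below are entirely hypothetical and only cover `u32` sources, to show how the intent to drop bits could be spelled out at the call site.

```rust
/// Hypothetical wrapper, loosely modelled on `std::num::Wrapping`.
/// Wrapping a value states that losing high bits is intended.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Truncating<T>(T);

impl From<Truncating<u32>> for u8 {
    fn from(v: Truncating<u32>) -> u8 {
        v.0 as u8 // keep the low 8 bits
    }
}

impl From<Truncating<u32>> for u16 {
    fn from(v: Truncating<u32>) -> u16 {
        v.0 as u16 // keep the low 16 bits
    }
}

fn main() {
    let word = 0x1234_5678u32;
    let low8 = u8::from(Truncating(word));
    let low16 = u16::from(Truncating(word));
    println!("{:#x} {:#x}", low8, low16);
}
```

With something like this in place, a lint could flag a bare narrowing `as` and point at the explicit form, though whether that is worth the extra noise is a judgment call.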

Hmmm... In that case "as" would be better called "truncate", "mask", "trim", or "mangle" (I have no idea what a good name for it is), because it is not converting the value X as represented in one type to the X as represented in another type. It is potentially changing the value altogether.

I guess we just have to learn to live with it. Look carefully at the use of "as" in code reviews, as we might for "unsafe", and hope not to see too many of them.

usize is an integer type. It is not a pointer. Integers come in many sizes: u8, u16, u32, u64, and u128 are all integer types with an explicit number of bits, while usize is an integer big enough to handle pointer offsets.

It seems reasonable to assume that the program should be able to handle as many prime factors as possible? And it also seems that to do that, the program needs to handle arrays of at least fmax items? If so, usize is the most suitable integer type to use.

Clearly usize is an integral value of some kind and not a pointer as we know in C for example.

But it is pointer, or rather address width, related. It changes size on different machines depending on their addressing capabilities: 16, 32, or 64 bits.

The documentation talks about usize as if they were pointers:

The size of this primitive is how many bytes it takes to reference any location in memory. For example, on a 32 bit target, this is 4 bytes and on a 64 bit target, this is 8 bytes.

As such using a usize for array indexing is even more bizarre. I end up using a type that can reference "any location in memory" as an integer index to array elements which are likely bigger than memory locations.

As far as I can tell 'usize' in conjunction with 'as' introduces platform dependent behavior. The same as many of those in C. Which I find surprising.
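The platform dependence can be illustrated without owning several machines. The sketch below assumes a made-up helper, `as_usize_on`, that mimics what `value as usize` would produce on a target whose usize has a given width; it is a simulation for illustration, not how the compiler implements the cast.

```rust
/// Simulates `value as usize` for a hypothetical target whose usize is
/// `bits` wide (16, 32 or 64): the high bits are simply discarded.
fn as_usize_on(value: u64, bits: u32) -> u64 {
    if bits >= 64 {
        value
    } else {
        value & ((1u64 << bits) - 1)
    }
}

fn main() {
    let value: u64 = 0x1_0000_0001; // 2^32 + 1
    for bits in [16, 32, 64] {
        println!("usize of {} bits: {} becomes {}", bits, value, as_usize_on(value, bits));
    }
    // The real cast, on whatever target this happens to be compiled for.
    println!("this target ({} bits): {}", usize::BITS, value as usize);
}
```

The same source line therefore yields 1 on a 16-bit or 32-bit target and 4294967297 on a 64-bit one, with no diagnostic in either case.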

Perhaps it should state

Would you consider that an improvement?

Personally, I find usize to be the appropriate type for indexing arrays that could be arbitrarily large (albeit constrained to fit in memory). I do find the conversions irksome, so I either declare the index types to be usize or, as I have documented in prior posts, use the macro ix!(…), which implements debug-mode checking, to convert other integer types to usize.

[Edit] If you are worried about an expression that is greater than std::usize::MAX you could add that check to the debug assert.
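For readers who have not seen the earlier posts, a debug-only check along these lines might look like the sketch below. This is not the `ix!` macro referred to above; `ix_i64` is a hypothetical function written only to show where the `debug_assert!` calls would go.

```rust
/// Hypothetical helper: converts an i64 to an index, checking the value
/// only in debug builds (the `debug_assert!`s compile away in release).
fn ix_i64(i: i64) -> usize {
    debug_assert!(i >= 0, "negative index: {}", i);
    debug_assert!(
        i as u64 <= usize::MAX as u64,
        "index {} exceeds usize::MAX",
        i
    );
    i as usize
}

fn main() {
    let primes = [2, 3, 5, 7, 11];
    let k: i64 = 3;
    println!("{}", primes[ix_i64(k)]);
}
```

The trade-off is that a bad value is caught during testing but still truncates silently in a release build, unless the ordinary bounds check on the array happens to catch it.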

This sounds like it could be a source of confusion.
At the metal level, there is no difference between a pointer and an integer. Both are "just" numbers, represented exactly the same as electrons.

In that respect, "pointer" is just a half-way-there point on the spectrum of abstraction from raw memory bits to fully typed structs representing useful data: a slightly more useful interpretation of electrons, but by no means the best we can do.

Typed languages do their best in helping us keep the different interpretations from being accidentally mixed up. C's void* is arguably horrible at this, MyType* is getting better, and (for example) a HashMap<String, JdbcConnection<PsqlDriver>> in Java is very abstract indeed. (I don't know Ada syntax, or I would have included a made-up example in that too.)

In the end though, it's just electrons, and in the end, there is a lot of low-level performance to be gained by "converting" them in zero clock instructions, just by looking at them through a different lens.

Which I guess is indeed a lengthy way of saying that it's very powerful, and thus indeed dangerous.
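A few small examples of "looking at the same bits through a different lens" in Rust are sketched below. The function names are invented for illustration; the casts and `f32::to_bits` are standard.

```rust
/// Same 32 bits, read as a signed number instead of an unsigned one.
fn reinterpret_signed(x: u32) -> i32 {
    x as i32
}

/// The raw IEEE 754 bit pattern of a float, as an integer.
fn float_bits(x: f32) -> u32 {
    x.to_bits()
}

/// Addresses are just numbers too: adjacent bytes in an array differ by 1.
fn address_gap() -> usize {
    let bytes = [1u8, 2u8];
    let first = &bytes[0] as *const u8 as usize;
    let second = &bytes[1] as *const u8 as usize;
    second - first
}

fn main() {
    println!("{}", reinterpret_signed(u32::MAX)); // all bits set
    println!("{:#x}", float_bits(1.0));
    println!("{}", address_gap());
}
```

None of these conversions changes the bits; only the interpretation changes, which is exactly what makes them cheap and easy to misuse.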

XKCD summarises the feeling I want to express quite well in XKCD 676: Abstraction.

So, what's a poor programmer to do? I've got a bunch of u8 as usize in my code, which can't cause a problem. However, they are so frequent that I failed to notice a couple of usize as u16. It turns out that isn't a problem because of another constraint that keeps the usize value small enough. The problem is that it took me, who wrote the code, a couple of minutes to verify that fact, and I'm worried about the constraint being removed under maintenance.

Is there a conversion method that will return either an Option or a Result on a bad conversion, or do I have to roll my own? I can't just start with a u16, since some of the usize variables come from things like v.len().

Yeah. TryFrom

I had tried that, but got

no function or associated item named `try_from` found for type `u16` in the current scope

With your nudge, I noticed that TryFrom isn't in scope by default; it has to be imported from std::convert. Problem solved.
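For anyone hitting the same error, a minimal sketch of the checked conversion is shown here. The helper names `checked_u16` and `len_as_u16` are made up; the explicit `use` line is what fixes the "no function or associated item named `try_from`" error on editions where the trait is not in the prelude.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Checked narrowing: Err instead of silently dropping bits.
fn checked_u16(n: usize) -> Result<u16, TryFromIntError> {
    u16::try_from(n)
}

/// The `v.len()` case from the question, returning an Option.
fn len_as_u16(v: &[u8]) -> Option<u16> {
    u16::try_from(v.len()).ok()
}

fn main() {
    println!("{:?}", checked_u16(65_535));
    println!("{:?}", checked_u16(65_536).is_err());
    println!("{:?}", len_as_u16(&[1, 2, 3]));
}
```

If the constraint that keeps the usize small is ever removed under maintenance, this version fails loudly at the conversion instead of wrapping.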

Clearly a "pointer" or machine memory address is going to be an integer. Yes it can even be negative as it was in machines in the past. Probably today it's down to byte resolution. But that need not be so either.

Meanwhile by definition the abstract concept of an array in a high level language is a contiguous list of objects that can be accessed by an integer index.

But these integers are not the same thing.

By analogy, houses typically have integer house numbers along the road. But that is nothing to do with how many meters you have to walk to get there.

By using 'usize', a thing capable of accessing bytes in memory, as array indices we see a mix up of abstractions. The hardware is leaking through to the high level language.

Well, perhaps that is quite OK. Rust is after all touted as a "systems programming language".

Which, despite all I may have said, makes me very happy :slight_smile:

Digression

In many US cities the address of a house, including the choice of which street for houses at the corner of two streets, is a linear function of where the city's sewer line connects to the house. In those cities, e.g., Chicago, IL or Tempe, AZ, the house number is quite precisely related to how many meters you have to walk to get to a point above the house's connection to the communal sewer line.

That is an interesting factoid. It must be reassuring to know that your position in life is dictated by how far down the shit pipe you live. There is something deeply profound about that.

More pertinent to our discussion is:

Precisely related in the same way that an array full of 100 byte sized objects is separated by an array index change of 1 but a physical address change of 100.

To reiterate my point, usize is defined in the documentation as able to address memory bytes. But an array is an abstract thing indexed by an integer, even if each of those things is bigger than a byte.
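The gap between an index step and an address step can be measured directly. The sketch below assumes a 100-byte element type, here simply `[u8; 100]` under the made-up alias `Element`, and a hypothetical helper `byte_distance`.

```rust
use std::mem::size_of;

/// A stand-in for "an object that is 100 bytes in size".
type Element = [u8; 100];

/// Distance in bytes between elements `i` and `j` (requires i <= j).
fn byte_distance(arr: &[Element], i: usize, j: usize) -> usize {
    let a = &arr[i] as *const Element as usize;
    let b = &arr[j] as *const Element as usize;
    b - a
}

/// Builds a small array and measures the distance between two indices.
fn demo_distance(i: usize, j: usize) -> usize {
    let arr: [Element; 4] = [[0u8; 100]; 4];
    byte_distance(&arr, i, j)
}

fn main() {
    println!("element size: {}", size_of::<Element>());
    println!("index 0 to 1: {} bytes", demo_distance(0, 1));
    println!("index 1 to 3: {} bytes", demo_distance(1, 3));
}
```

So the index counts elements while the address counts bytes; both happen to be stored in a usize, which is the mix of abstractions being discussed.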

I have started to think the problem is that an array actually needs two types:

  1. Obviously the type of each element.

  2. The type used to index it.

Item 2) could be a u8, u16, u32, whatever it takes to access the whole array.

We don't have that of course, so the second required type is 'usize' in all cases.

Edit: Actually what I said is likely not even true. An array of 100 byte sized objects probably has 128 bytes between elements, for memory alignment's sake. usize has even less relevance in indexing then.

No, it doesn't, because a byte-sized object has an alignment of 1 only.

If you change those to usize::from(the_u8) it might make the as u16s stand out better.

A good idea, but it does introduce some annoying cruft, especially when there's more than one conversion in a source line. I'd prefer to turn on a compiler warning, but until I can I'll rely on good old grep.

Sorry, I think what I wrote was ambiguous. I said "An array of 100 byte sized objects..."

I meant an array of objects each of which is 100 bytes in size.

Not an array of length 100 for which each element was a byte.

For what it's worth, there is some discussion about allowing u8, u16, u32, u64, and even u128 as array indices. (Signed numbers are a much harder sell.) The main reason they aren't yet is the impact on number type inference and fallback as well as overload selection. (An obvious case is the simplest: arr[ix_u8.into()].)

Until such a day as this restriction might be lifted, using an ix! macro as mentioned above for indexing with non-usize types seems like the best option. An example quick implementation:

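macro_rules! ix {
    ($ix:expr) => {
        // Bind once so the argument expression is evaluated a single time.
        match $ix { ix => {
            // Widen to u128 (lossless for every unsigned integer type),
            // then narrow to usize with a checked conversion.
            <usize as ::std::convert::TryFrom<u128>>::
            try_from(u128::from(ix))
                .unwrap_or_else(|_| panic!(
                    "{} is not a valid index", ix
                ))
        }}
    }
}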

I'm glad you said that. After the discussion above I was starting to think I was the only one who had a problem with usize. I can see it might be a difficult fit in the language. Perhaps it's a complication we should reject after all...

As it happens, on advice from all above, I changed all my types around so that anything that gets used as an index is a usize. Things look much better now :slight_smile:

I'm happy to hear that! And I can relate, there is always this stage when porting/rewriting anything to anything else, where everything looks more horrible than when I started, and it makes me question why I even considered doing it.
Fortunately it gets better in the second pass, when we can allow ourselves to improve the rough initial port to be more idiomatic and clean.

Sizes are always a multiple of alignment in Rust (there's no separate stride), so a 100-byte object requires alignment of at most 4 (the largest power of two that divides 100).

(And there's enough unsafe code in the ecosystem that that's probably not something that can ever change.)
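This can be checked with `std::mem::size_of` and `std::mem::align_of`. The sketch below uses a made-up 100-byte `Record` type built from 25 `u32` words, and assumes `u32` has an alignment of 4, as it does on mainstream targets. `max_align_for_size` is a hypothetical helper for the "at most 4" arithmetic.

```rust
use std::mem::{align_of, size_of};

/// 25 u32 words: 100 bytes, with u32's alignment (assumed to be 4).
#[allow(dead_code)]
struct Record {
    _words: [u32; 25],
}

/// (size, alignment) of a type, in bytes.
fn size_and_align<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}

/// Largest power of two dividing `size` (for size > 0): since size must
/// be a multiple of alignment, no type of that size can be aligned more.
fn max_align_for_size(size: usize) -> usize {
    1 << size.trailing_zeros()
}

fn main() {
    println!("{:?}", size_and_align::<Record>());
    println!("{:?}", size_and_align::<[u8; 100]>());
    // Eight records occupy exactly 8 * 100 bytes: no padding between them.
    println!("{}", size_of::<[Record; 8]>());
    println!("{}", max_align_for_size(100));
}
```

An array of such records therefore has 100 bytes between elements, not 128.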
