Why is there no `impl From<u32> for usize`?

Would it not be safe to have the implementations:

#[cfg(any(target_pointer_width = "32", target_pointer_width = "64"))]
impl From<u32> for usize {
    // ...
}
#[cfg(target_pointer_width = "64")]
impl From<u64> for usize {
    // ...
}

Why are these not present?

It would certainly help clean up a lot of code that needs to constantly convert between u32/i32 and usize/isize.

4 Likes

So what should happen on 16-bit platforms? There, usize is 16 bits, so u32 -> usize is fallible.

By the way, the corresponding fallible conversions (TryFrom and TryInto) already exist on every platform.
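For example, something like this compiles on every target (a minimal sketch; index_from_u32 is a hypothetical helper):

use std::convert::TryFrom; // already in the prelude since the 2021 edition

fn index_from_u32(i: u32) -> usize {
    // Fails only on targets where usize is narrower than 32 bits,
    // i.e. 16-bit platforms.
    usize::try_from(i).expect("index does not fit in usize")
}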

You should not write code that needs to constantly convert between fixed-size and platform-size integers. Use usize/isize for indexing and pointer arithmetic directly.
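As a sketch of that usize-first style (sum_window is a hypothetical example):

// Take usize indices up front instead of carrying u32 indices
// and converting at every use site.
fn sum_window(data: &[u64], start: usize, len: usize) -> u64 {
    data[start..start + len].iter().sum()
}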

8 Likes

You don't want such basic API to be platform-specific -- it would split the ecosystem. You could easily end up with libraries that only compile on 32-bit platforms, others that only compile on 64-bit platforms, etc.

4 Likes

The Linux kernel extensively uses u32 and i32 for sizing, so any code interfacing with it needs to constantly cast between these types.

The real solution is to allow using u32 for sizes and indexes.

The blocker is that adding such an API breaks existing code whose type inference relies on Index<u32> and similar impls not existing.
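A minimal sketch of that inference hazard (assuming hypothetical Index<u32> impls for slices):

// Today this compiles: slices implement Index only for usize (via
// SliceIndex), so `i` is inferred to be usize. If Index<u32> impls
// were added, inference could no longer settle on usize here and
// this would stop compiling.
fn first(v: &[u8]) -> u8 {
    let i = 0;
    v[i]
}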

3 Likes

rustc supports MSP430, a 16-bit CPU, as a tier 3 target. As a language, Rust supports architectures with 16-bit pointers, and that's why we have impl From<u16> for usize. But many (if not most) libraries don't care much about 16-bit targets, so you need to be extra careful when using libraries on such chips. Note that 128-bit or bigger pointers, like in the CHERI ISA, may be supported later.

2 Likes

This feature request has died waiting for portability lints:

2 Likes

That's already possible: I can write static assertions in the code that cause a compilation error if the pointer size is wrong. I believe this has been possible since 1.0, but now that panics in consts are stable, it's trivial to do.
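For example, one way to write such an assertion (a minimal sketch using const panics, stable since Rust 1.57):

// Compilation fails on any target with pointers narrower than 32 bits.
const _: () = assert!(usize::BITS >= 32, "unsupported pointer width");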

It's not great, but lots of stuff depends on pointer width, particularly in FFI, and I'm not testing on 16-bit platforms anyway, so there are bound to be errors even if I didn't forbid this pointer width.

The thing you propose is currently impossible: the stdlib is distributed in compiled form, so all cfg directives have already been evaluated, and thus only one of your impls could exist across all platforms. I believe there is some talk of introducing features that could make your approach possible, but it's not something you should expect soon.

Not only is that possible, but std actually does it. It just provides an infallible TryInto instead of Into.

It could easily have created Into instead, but I don't think that's a good idea (as explained by others).

1 Like

My point wasn't that it's currently impossible to write non-portable code. Of course it's possible.

My point was that making something as basic as the existence of From<u64> for usize platform-dependent would make it much more likely that the code would be accidentally non-portable (and it could go both ways, 32->64 and 64->32 portability).

2 Likes

I argue that the kind of code which would be non-portable with impl From<u32> for usize is already non-portable, just silently so. People would just use usize::try_from(x_u32).unwrap() and crash at runtime. If anything, this kind of non-portability is much harder to find and fix: if an impl isn't found on a different platform, your code doesn't compile and you fix it. But how would you search for all those runtime panics?

Even worse, people could directly use bitwise or arithmetic operations on usize in a way where freedom from panics, or even basic correctness, depends on pointer width.
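For example (a minimal sketch; file_offset_to_index is hypothetical):

// `as` casts never fail; they truncate. This returns the right answer
// on 64-bit targets and silently wraps on 32-bit ones.
fn file_offset_to_index(offset: u64) -> usize {
    offset as usize
}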

This is not non-portable code. This code works on 16-bit systems if the data structure fits in memory.

If the unwrap fails, that indicates you don't have enough RAM for whatever data structure you're trying to index, in which case crashing is the correct behavior.

2 Likes

Speaking specifically about usize::from(u32), I think any worries about its (non)portability are completely irrelevant. It is just a theoretical consideration that has no practical value.

Programming for 16-bit targets requires special considerations that go way deeper than fixing .into() calls. The idea of taking code written for 32- or 64-bit platforms that never explicitly considered 16-bit targets and using it without any changes on a 16-bit platform is pure fantasy. 4-byte integers on such platforms are extravagantly large and expensive*. There are very few Rust crates that can even fit on such an architecture, and typical Rust code can easily eat more than 64KB for stack space alone.

When a 16-bit platform is a serious target, such code needs to be redesigned to work with 8 or 16-bit integers instead. Having needless use of 32-bit integers fail to compile on 16-bit platforms is a benefit.


* Imagine that every u32 were 1 megabyte in size, and every try_into().unwrap() added ~12MB of code to the executable. These are the proportions if you rescale the overhead on a 64KB machine to a 16GB machine: the scale factor is 16 GB / 64 KB = 2^18, so 4 bytes becomes 1 MB, and ~48 bytes of conversion-and-panic code becomes ~12 MB.

10 Likes

Check out the mz_ore::cast module in Materialize (materialize/src/ore at main · MaterializeInc/materialize · GitHub). While most of Materialize is under a proprietary license, ore is open-source (Apache 2).

It defines a trait CastFrom which operates like From, but is defined on more types (e.g., we impl CastFrom<u32> for usize on 32- and 64-bit platforms).
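A minimal sketch of the idea (simplified; see mz_ore::cast for the real definitions):

pub trait CastFrom<T> {
    fn cast_from(from: T) -> Self;
}

// Lossless on these pointer widths, so no fallible conversion is needed.
#[cfg(any(target_pointer_width = "32", target_pointer_width = "64"))]
impl CastFrom<u32> for usize {
    fn cast_from(from: u32) -> Self {
        from as usize
    }
}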

1 Like

I disagree -- even if it's true that all 16-bit code requires special care, you still want to be able to test the same code on 64-bit platforms, so there is value in not splitting the ecosystem between 16-bit and everything else.

2 Likes

This is all speculative. What non-trivial projects actually support both 16-bit and 32-bit platforms without 16-bit-specific #[cfg]?

16-bit platforms require a completely different coding style. Idiomatic 16-bit code would use a fixed memory layout, which means mostly global variables, avoiding even use of the stack. The same code would be in bad style on larger platforms. I can't imagine a crate that isn't either too bloated for 16-bit or too strangely overoptimized and feature-limited on larger platforms. Any project spanning these different worlds would need at least a few #[cfg]s and could be expected to run a compile test for 16-bit if it takes the target seriously.

3 Likes

I don't see why global variables would be better than local variables on the stack on 16-bit machines -- using local variables lets you reuse memory better, so it seems even more important in a memory-constrained environment.

But even if you use global variables, this doesn't in any way seem to prevent doing what I said, namely unit testing this code on a 64-bit development machine.

Of course, but why not both?

1 Like

We're getting way off-topic here, but it's because you may have as little as 256 bytes of stack. It's difficult to control the exact stack usage of each function (it depends on the optimizer), and it's difficult to know and limit maximum stack use throughout the program (it's unsolvable in the general case), so the safest bet is to be as conservative as possible.

The existence of additional From impls does not prevent testing 16-bit-compatible code on a 64-bit machine. It only creates the possibility of writing code that works on the 64-bit machine but won't on the 16-bit machine. But that's already a possibility: e.g. floats won't work (at best you'll get a software-emulated one that will immediately blow all your size and speed budgets). The 64-bit machine will have almost infinite RAM and stack space, so whatever you write, even if it type-checks, may not fit on the actual target.

There's very little you can test on a big host, because code for microcontrollers won't have OS abstractions and will do its work by bit-banging hardware registers (absolute memory addresses). You'll need at least an emulator to see what poking that memory does, a build for the real target to know the code size and RAM use, and most likely actual hardware to check timing and interaction with the real peripherals.

MSP430 chips that Rust can theoretically target can have as little as 2KB of RAM. You're not going to run tokio on that; it won't even fit the text of serde's error messages. When you have 2KB of RAM to work with, you'll probably know by heart what every byte is used for.

Look at this guide for MSP430 optimizations: https://www.ti.com/lit/an/slaa801/slaa801.pdf It celebrates a reduction of 200 bytes as a massive win and advises against using functions, because just calling a function is expensive at that scale. It's an environment where even C is bloated and needs special care. It's possible to write Rust for it, with zero-cost abstractions used very carefully so that they really are zero-cost, but it's unrealistic to expect to use general-purpose crates.io crates, written for big machines, without any changes.

8 Likes

Now you're making a portability claim, whereas previously you were saying that portability is irrelevant, which is what I was disagreeing with.

Is this true, though? Adding a trait impl can stop code from compiling. Perhaps it's true, or close to true, in this case because From already has a lot of impls anyway; I don't know.

This is a strawman, at least in response to what I said, because I never suggested anything like running tokio or serde on a 2KB MSP430.

It also seems unfounded to assume that on 16-bit platforms we are limited to 2KB. They can have up to 64KB (including MSP430).

1 Like

I don't think discussing chips with 2KB of RAM is useful here. At that point, why would you even use a high-level language? You should write assembly directly. Your entire program will likely be smaller than some hand-optimized assembly cryptographic functions.