How to convert &[i8] to &[u8]?

whoif · March 20, 2018, 9:04am

I tried to use
let u8slice : &[u8] = i8slice as &[u8];
But it complains

an as expression can only be used to convert between primitive types. Consider using the From trait

I don't know how to "use the From trait" to fix this problem.

whoif · March 20, 2018, 9:14am

OK, I used the following code to fix it.

let u8slice: &[u8] = unsafe { std::slice::from_raw_parts(i8slice.as_ptr() as *const u8, i8slice.len()) };
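The same call can be wrapped in a small helper, so the `unsafe` block and its justification live in one place. This is a minimal sketch. The function name `as_u8_slice` is hypothetical. The soundness argument in the comment relies on three properties of `i8` and `u8`: they have the same size and alignment, and neither has any invalid bit patterns.

```rust
use std::slice;

/// Reinterprets a slice of `i8` as a slice of `u8` without copying.
fn as_u8_slice(s: &[i8]) -> &[u8] {
    // SAFETY: `i8` and `u8` both have size 1 and alignment 1, every bit
    // pattern is a valid value of both types, and the returned slice borrows
    // from `s` (lifetime elision ties them together), so it cannot outlive it.
    unsafe { slice::from_raw_parts(s.as_ptr() as *const u8, s.len()) }
}

fn main() {
    let signed: [i8; 3] = [-1, 0, 127];
    // -1i8 has the bit pattern 0xFF, which reads back as 255u8.
    println!("{:?}", as_u8_slice(&signed)); // prints [255, 0, 127]
}
```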

vitalyd · March 20, 2018, 10:08am

std::mem::transmute is another option here.

aschampion · March 20, 2018, 5:46pm

The std::mem::transmute docs recommend something like this instead of transmute:

let u8slice = unsafe { &*(i8slice as *const [i8] as *const [u8]) };

vitalyd · March 20, 2018, 6:05pm

Yeah you can do:

let u8slice = unsafe { &*(i8slice as *const _ as *const [u8]) };

Too noisy for my liking.
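One way to keep the pointer-cast version out of call sites is to hide it behind a function. Below is a sketch with the hypothetical name `i8s_as_u8s`. The cast from `*const [i8]` to `*const [u8]` leaves the slice's length metadata unchanged. That is only correct here because both element types are one byte wide.

```rust
/// Reinterprets `&[i8]` as `&[u8]` using a raw slice-pointer cast.
fn i8s_as_u8s(s: &[i8]) -> &[u8] {
    // SAFETY: the element count is reused unchanged, which is valid because
    // i8 and u8 have identical size and alignment; every bit pattern is a
    // valid u8; and the result borrows from `s`.
    unsafe { &*(s as *const [i8] as *const [u8]) }
}

fn main() {
    let data: [i8; 2] = [-128, 5];
    let bytes = i8s_as_u8s(&data);
    println!("{:?} (len {})", bytes, bytes.len()); // prints [128, 5] (len 2)
}
```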

leonardo · March 20, 2018, 8:06pm

I think Rust+std should offer a way to perform this conversion without the need of unsafe code.

scottmcm · March 21, 2018, 11:55pm

There are a bunch of examples of conversions that are safe with as but stop working behind references, aggregates, or similar. For example, &i8 <-> &u8, [i8; 2] <-> [u8; 2], and Vec<i8> <-> Vec<u8>.
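When copying is acceptable, you can apply the value-level `as` conversion element by element and avoid `unsafe` entirely. The helper names below are hypothetical. `array::map` requires Rust 1.55 or later. Whether the `Vec` version reuses the original allocation is an implementation detail of the standard library, not a guarantee.

```rust
/// Arrays: `map` applies `as` to each element and returns a new array.
fn array_to_u8<const N: usize>(arr: [i8; N]) -> [u8; N] {
    arr.map(|x| x as u8)
}

/// Vecs: consume the Vec and collect the converted elements.
fn vec_to_u8(v: Vec<i8>) -> Vec<u8> {
    v.into_iter().map(|x| x as u8).collect()
}

/// Borrowed slices: copy into a new Vec, at the cost of an allocation.
fn slice_to_u8_vec(s: &[i8]) -> Vec<u8> {
    s.iter().map(|&x| x as u8).collect()
}

fn main() {
    let b: i8 = -1;
    println!("{}", b as u8); // prints 255
    println!("{:?}", array_to_u8([-1i8, 2])); // prints [255, 2]
    println!("{:?}", vec_to_u8(vec![-128, 0, 127])); // prints [128, 0, 127]
    println!("{:?}", slice_to_u8_vec(&[-1, 1])); // prints [255, 1]
}
```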

There are some conversations going on around this area on IRLO:

Spring · March 23, 2018, 2:34pm

let mut v: Vec<u8> = vec![0, 0, 0, 4, 240, 159, 146, 150];
let v_slice = &mut v[2..5];
let v_slice = unsafe { &*(v_slice as *mut [u8] as *mut [i8]) };
println!("{:?}", v_slice); // prints [0, 4, -16]
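The snippet above only produces a shared `&[i8]`, even though it starts from a `&mut`. If the goal is to write through the reinterpreted view, you can use `slice::from_raw_parts_mut` instead. This is a minimal sketch that uses the same data; `as_i8_slice_mut` is a hypothetical name.

```rust
use std::slice;

/// Reinterprets a mutable `u8` slice as a mutable `i8` slice.
fn as_i8_slice_mut(s: &mut [u8]) -> &mut [i8] {
    // SAFETY: u8 and i8 share size and alignment, every bit pattern is valid
    // for both, and `s` stays exclusively borrowed for as long as the result
    // lives, so nothing else can observe the bytes while they are modified.
    unsafe { slice::from_raw_parts_mut(s.as_mut_ptr() as *mut i8, s.len()) }
}

fn main() {
    let mut v: Vec<u8> = vec![0, 0, 0, 4, 240, 159, 146, 150];
    let view = as_i8_slice_mut(&mut v[2..5]);
    println!("{:?}", view); // prints [0, 4, -16]
    view[0] = -1; // stores the bit pattern 0xFF into v[2]
    println!("{:?}", v); // prints [0, 0, 255, 4, 240, 159, 146, 150]
}
```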

DevNull · June 28, 2022, 5:56pm

Is there still no way that avoids unsafe? (It's been 4 years now.)

H2CO3 · June 28, 2022, 6:44pm

No, converting between non-identical types via pointer punning is inherently unsafe. It's not a matter of time, it's a matter of principle.

(As an aside, please don't revive 4-year-old threads.)

afetisov · June 28, 2022, 8:33pm

Using mem::transmute for slice conversion is unsound. A &[T] is a fat pointer consisting of a pointer to the data, ptr: *const T, and the length, len: usize. However, there are no guarantees on the layout of those two fields, so they can come in a different order or even with different alignment for different types &[T] and &[U], even if T and U are themselves layout-compatible. Transmutation doesn't care about the semantics of the type; it does a blind reinterpretation of the underlying bits.

The only proper way to convert between slices is to use the

slice::from_raw_parts(slice.as_ptr() as *const U, slice.len())

idiom mentioned above.

Similar considerations apply to any generic type with a #[repr(Rust)] (default) layout, e.g. Vec<T>. Even if T and U are layout-compatible, you cannot just transmute between Vec<T> and Vec<U>.
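For owned vectors, the usual alternative to `transmute` is to take the `Vec` apart and rebuild it with `Vec::from_raw_parts`. That function's safety requirements allow a different element type, provided its size and alignment match the original allocation's. This is a sketch that assumes exactly that, which holds for `i8`/`u8`. The name `vec_i8_into_u8` is hypothetical.

```rust
use std::mem::ManuallyDrop;

/// Converts a `Vec<i8>` into a `Vec<u8>`, reusing the same heap allocation.
fn vec_i8_into_u8(v: Vec<i8>) -> Vec<u8> {
    // Wrap in ManuallyDrop so the original Vec never frees its buffer.
    let mut v = ManuallyDrop::new(v);
    let ptr = v.as_mut_ptr();
    let len = v.len();
    let cap = v.capacity();
    // SAFETY: the buffer was allocated by a Vec whose element type has the
    // same size and alignment as u8; len and capacity are unchanged; every
    // bit pattern is a valid u8; and ownership moves to the new Vec, since
    // the old one is never dropped.
    unsafe { Vec::from_raw_parts(ptr as *mut u8, len, cap) }
}

fn main() {
    let signed: Vec<i8> = vec![-1, 0, 1];
    println!("{:?}", vec_i8_into_u8(signed)); // prints [255, 0, 1]
}
```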

Some generic types may be declared with a fixed layout, e.g. #[repr(C)] or #[repr(transparent)]. In this case blind transmutation will not mess up the fields, but you still need to make sure that the fields themselves may be converted via a transmutation. E.g.

#[repr(transparent)]
struct Foo<T>(Vec<T>);

cannot be transformed via mem::transmute, since the underlying Vec<T> doesn't support it.

It's not a matter of principle, it is a matter of missing API. There is nothing unsafe in a conversion between &u8 and &i8, or &[u8] and &[i8]. The bytemuck crate provides safe API for a number of such conversions.
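The following hypothetical sketch shows what such a safe API can look like without pulling in a crate. The `PlainBits` trait and `cast_slice` function are made up for this example; they are not std or bytemuck items. The `unsafe` code is confined to the trait implementations and one function, so callers never write `unsafe` themselves. bytemuck's real functions, such as `bytemuck::cast_slice`, are more general. For example, they also handle element types of different sizes, whereas this sketch only accepts identical size and alignment.

```rust
use std::mem::{align_of, size_of};
use std::slice;

/// Hypothetical marker trait. Implementing it promises that the type has no
/// padding bytes and that every bit pattern is a valid value.
unsafe trait PlainBits: Copy {}

// SAFETY: primitive integers have no padding and no invalid bit patterns.
unsafe impl PlainBits for u8 {}
unsafe impl PlainBits for i8 {}
unsafe impl PlainBits for u16 {}
unsafe impl PlainBits for i16 {}

/// Safe to call: returns `None` unless `T` and `U` have identical size and alignment.
fn cast_slice<T: PlainBits, U: PlainBits>(s: &[T]) -> Option<&[U]> {
    if size_of::<T>() != size_of::<U>() || align_of::<T>() != align_of::<U>() {
        return None;
    }
    // SAFETY: same size and alignment, so the pointer is suitably aligned for U
    // and the element count is unchanged; every byte of a T is initialized
    // (no padding) and any bit pattern is a valid U; the result borrows from `s`.
    Some(unsafe { slice::from_raw_parts(s.as_ptr() as *const U, s.len()) })
}

fn main() {
    let signed: [i8; 2] = [-1, 5];
    println!("{:?}", cast_slice::<i8, u8>(&signed)); // prints Some([255, 5])
    let bytes: [u8; 2] = [1, 2];
    println!("{:?}", cast_slice::<u8, u16>(&bytes)); // prints None (sizes differ)
}
```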

VorfeedCanal · June 28, 2022, 9:16pm

True for most CPUs (in fact, I don't know of any where it wouldn't work). But conversion between &f32 and &i32 is unsafe on one very exotic platform: the original IBM PC, with an 8086 CPU and an 8087 FPU.

This is because the 8086 and the 8087 are both attached to the data bus, yet work asynchronously. This means you can write to a &mut f32, convert it to &f32 (that's just a reborrow) and then to &i32 (not currently supported), and your program wouldn't work, because nothing in the code would prevent the 8086 from reading the i32 before the 8087 has written the f32 to that location. Oops.

Note that one would probably never want to run a Rust-compiled program on MS-DOS, but there are other similar platforms (like M68k or early ARM CPUs) that may be interesting to retro-computing enthusiasts.

Well… I guess on some platforms, where you know such conversions are safe, you can use these. But I see why they wouldn't want to open that can of worms by adding things like these to the language or standard library.

afetisov · June 28, 2022, 9:32pm

This is the first time I hear about such a bug. Where could I read more about it?

This is the kind of issue that I would expect the compiler to deal with, although I don't know whether it can see through mem::transmute. On the other hand, why wouldn't it? Anyway, if what you are saying is true, then how did the Quake 3 fastsqrt algorithm work? Was it just random luck that the errors either were unobservable or within the error tolerance?

Also, how does f32::to_bits work? It is just a mem::transmute under the hood.

VorfeedCanal · June 28, 2022, 10:26pm

Why do you consider it a bug? Everything works according to the specs.

On bitsavers, I guess. Or just read the excerpt here.

How would it know when to insert ESC? Adding it before every memory access is not practical, you know.

Because it's a huge amount of complexity that isn't needed for C/C++ (such conversions are UB in C/C++).

It's not a question of whether the results are within the error tolerance: the 8087 would either have written the result or it wouldn't have.

But this algorithm would work on most CPUs, because it first does integer calculations and then sends the data to the FPU. FPUs were generally coprocessors and couldn't start calculating before the CPU, so this ordering works (and compilers weren't clever enough back then to destroy everything with optimizations). If it were the other way around (first FPU, then CPU), it would have been a problem.

In addition, by the time Quake 3 was written, CPUs and FPUs had been merged (the 68040 and 80486 were the CPUs where that happened on the desktop; the ARM story is more convoluted, but less relevant to Quake 3).

But it doesn't transmute a reference or a pointer, so it's enough for the compiler to move the value from a CPU register to an FPU register. Of course, any sane compiler knows how to do that. C++'s bit_cast does the same thing and has exactly the same property: it's safe to transform int32_t to float and back, but it's not safe to transform int32_t* to float* and back.

That puts the Rust and C++ compilers on the same page, so Rust can just use LLVM's support for that operation.
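To make the value-versus-pointer distinction concrete, here is a sketch of the Quake III fast inverse square root in Rust. It reinterprets only values, via `f32::to_bits` and `f32::from_bits`, and never references. The magic constant `0x5f3759df` and the single Newton-Raphson step follow the widely published C version. The function name is a choice made for this sketch, and so is the use of `wrapping_sub`. That call avoids an overflow panic in debug builds for inputs that aren't positive, where the result is meaningless anyway.

```rust
/// Approximates 1/sqrt(x) for positive, finite `x` using value-level bit casts.
fn fast_inv_sqrt(x: f32) -> f32 {
    let i = x.to_bits(); // copy the float's bits into a u32 (no pointers involved)
    let i = 0x5f37_59df_u32.wrapping_sub(i >> 1); // the magic-constant initial guess
    let y = f32::from_bits(i); // copy the bits back into an f32
    y * (1.5 - 0.5 * x * y * y) // one Newton-Raphson refinement step
}

fn main() {
    // Expected to be close to, but not exactly, 1/sqrt(4) = 0.5.
    println!("{}", fast_inv_sqrt(4.0));
}
```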

armanriazi · June 28, 2022, 11:17pm

If 'as' does not work, you can implement Into.

afetisov · June 28, 2022, 11:51pm

It would certainly be a bug in the end-user program.

That function accepts a float which would be arbitrarily modified prior to the call. Then it transmutes that float into an int, does its calculations and translates it back into a float. If float-int conversion would cause synchronization issues, that function would be incorrect, no? I don't think I understand what you are saying. Are you saying that it would be incorrect if its argument was float * instead of float? Do you mean that the bug would happen on actual original IBM PCs and not on contemporary hardware which is colloquially called "pc"?

Why wouldn't the compiler insert proper synchronization when it knows that int * is cast to float *?

Cerber-Ursi · June 29, 2022, 3:01am

The float, not the pointer. The number itself is shuffled around, not its address.

VorfeedCanal · June 29, 2022, 3:52am

Not necessarily. That's the issue with undefined behavior (and the reason Rust tries to ensure that only unsafe code may lead to undefined behavior): a technically incorrect program may work for years, and then break after a simple compiler upgrade, or even just a change in the libraries used.

Technically it was never correct. Type punning is undefined behavior in C/C++.

I'm explaining why it worked in practice.

No. I'm saying that this function is already incorrect, but works in practice. Type punning is undefined behavior in C/C++, but that doesn't mean the compiler is obliged to break every program that uses it. And on some CPUs, with some compilers, it works reliably.

Yes, that too. To trigger that bug on real hardware, you would need one of two things. Either a quite advanced compiler, which could exploit the fact that the function always invokes undefined behavior and optimize it away. Or old hardware together with a still very unusual compiler.

Microsoft's compiler passes floating-point values on the stack and issues ESC to ensure that the float reaches memory before it calls the function. This means that the function would work even on the original PC.

TopSpeed sounds promising: it passes arguments on the 8087 stack. However, it also generates a lot of code, so I'm not sure it's possible to coax it into breaking that code.

And the fact that you need an original PC to trigger the issue means it was never a problem for Quake 3. The computers you could realistically use to run Quake 3 all had a merged CPU and FPU, so the problem became a purely theoretical one for a while (until compilers learned to exploit these things for optimizations, but that's an entirely different story).

Heck, you couldn't even trigger it with an emulator, because all “precise” PC emulators only support the 80287+! That's specifically because it's a nightmare to support that asynchronous-yet-not-really behavior in software. Technically, as specified, the 8087 can write to memory at any time after you start the calculations. But a real 8087 would do so after a known number of ticks, and there were programs that actually relied on that!

Because this would make correct program slower.

C doesn't forbid you to convert int * to float * or back. Any pointer can be converted to any other pointer; you just have to convert it back to the proper type (actually, any similar type) before use. This is because C originally had no void type, so such conversions weren't unusual.

And if you use pointers in accordance with the rules (convert int * to float * temporarily, only to convert it back to int * before dereferencing it), there is no need to add any synchronization.

Note that i8 and u8 types (and even i16 and u16 types) are technically declared “similar”, so arrays of i16 ints can be accessed as arrays of u16 ints. But i32 and float are not “similar”, so you cannot convert an array of i32 into an array of floats (or back).

Unfortunately, that function actually plays with pointers and thus may lead to problems, in theory. In practice it's not easy to trigger. Still, Wikipedia even has a whole article dedicated to that function. It has a separate section that discusses why the function is not valid ANSI C and how to rewrite it as correct C++.

Rust has pushed all that complexity into third-party crates for now. IMO that's the right decision, given that so few developers understand all these issues in C/C++ and that there is no consensus about what we want to support in Rust.

H2CO3 · June 29, 2022, 6:24am

There is plenty of unsafe in converting between references to differing types. It's basically circumventing the type system. The fact that it can sometimes be done soundly does not mean that it is in general safe, nor that it is easy to generalize usefully and 100% correctly. The "missing" API is a sign of that.

Yes, technically it was random luck. Or more exactly, optimizers of the day likely weren't smart enough to break the code.

Type punning in C is only permitted via unions (which is by-value, and is thus the equivalent of transmute in Rust); casting between pointers to incompatible types is UB. (The exception is casting to a char * to read the raw bytes of any object.)
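Rust has direct counterparts of the by-value route: reading a `union` field, and calling `mem::transmute` on values rather than references. Both copy the bits the same way `f32::to_bits` does. This is a minimal sketch; the union and function names are hypothetical.

```rust
/// By-value type punning through a union, mirroring the C idiom.
union FloatOrBits {
    f: f32,
    u: u32,
}

fn bits_via_union(x: f32) -> u32 {
    let pun = FloatOrBits { f: x };
    // SAFETY: reading a union field is unsafe; it's fine here because the
    // 32 bits written as an f32 always form a valid u32.
    unsafe { pun.u }
}

fn bits_via_transmute(x: f32) -> u32 {
    // SAFETY: f32 and u32 have the same size and any f32 bit pattern is a valid u32.
    unsafe { std::mem::transmute::<f32, u32>(x) }
}

fn main() {
    let x = 1.0f32;
    // All three agree; 0x3f800000 is the IEEE 754 single-precision encoding of 1.0.
    assert_eq!(bits_via_union(x), x.to_bits());
    assert_eq!(bits_via_transmute(x), x.to_bits());
    println!("{:#x}", x.to_bits()); // prints 0x3f800000
}
```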

VorfeedCanal · June 29, 2022, 11:16am

Note that this means that char* is treated very differently from char8_t*. Here is a somewhat convoluted, yet very telling, example:

// Requires C++20 for char8_t.
#include <cstddef>

void foo(int* sum, char src[], std::size_t size) {
    for (std::size_t i = 0; i < size; ++i) {
        *sum += src[i];
    }
}

void bar(int* sum, char8_t src[], std::size_t size) {
    for (std::size_t i = 0; i < size; ++i) {
        *sum += src[i];
    }
}

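The code generated for these two functions is radically different, precisely because char* can be used to access objects of other types. In foo, the compiler must assume that each store to *sum may modify src[i], so it has to store and reload *sum on every iteration. In bar, it can keep the running sum in a register.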

Related Topics

Topic	Replies	Views	Activity
[u8; 8] to two [u8; 4]	19	1743	March 11, 2022
Safe way to cast &[u64;8] to &[u8;8]?	6	1140	February 9, 2022
Is This The Right Way to Transmut &mut [u8] to &mut MyType?	7	541	January 7, 2021
Convert slice &[u8] to &[u8; 4]	2	11330	November 19, 2021
What is the proper way to convert a &[u8] into &[u32]?	2	1199	January 12, 2023