Any_to_u8 O(1) time?


#1

I think I asked this before, but can’t dig up the topic.

I have the following code:


pub fn any_to_u8<'a, T>(v: &'a Vec<T>) -> &'a [u8] {
    unsafe { std::slice::from_raw_parts(v.as_ptr() as *const u8,
                                        v.len() * std::mem::size_of::<T>()) }
}

Is this function guaranteed to run in O(1) time?

Based on from_raw_parts, the answer seems to be yes:

#[inline]
#[stable(feature = "rust1", since = "1.0.0")]
pub unsafe fn from_raw_parts<'a, T>(data: *const T, len: usize) -> &'a [T] {
    debug_assert!(data as usize % mem::align_of::<T>() == 0, "attempt to create unaligned slice");
    debug_assert!(mem::size_of::<T>().saturating_mul(len) <= isize::MAX as usize,
                  "attempt to create slice covering half the address space");
    Repr { raw: FatPtr { data, len } }.rust
}

However, I would like to double check.


#2

Well, I’m pretty sure it’ll be O(1) time because you aren’t copying any data, and are just assembling a slice from the pointer/length. On a side note, I can’t check right now, but Vec also has a capacity function, which is the size of all the memory it has actually allocated, not just the memory that’s being used, so that’s something you may want to keep in mind.


#3

There’s no guarantee it’ll be O(1) time since compiler and processor operations could theoretically make it faster on some inputs than others. But it’s about as constant-time as you’re going to get without dropping into assembly, so I would say yes.


#4

O(1) absorbs constant factors.

I don’t see how anything you claim above is an argument against it being O(1).


#5

If there’s an upper bound (which there is), then it’s O(1).


#6

Fair. I wasn’t thinking straight with my comment on O(1), thanks. In any case it willbe very near constant time.


#7

Please note that this code allows reading uninitialized memory from safe code, and is thus unsound. Please don’t use it as-is.


#8

Why do you believe this? The code uses len, not capacity. Assuming your claim is true, how would you fix it?


#9

There may be padding in the struct.


#10

I’m on mobile so won’t make a play demo, but look at the last byte of (u16, u8) – it’s an LLVM undef.

There’s no nice fix. To be safe it would need something like bounding on an unsafe trait NoPadding or something, but that’s of course impossible to implement soundly today for repr(rust) because rust makes no layout guarantees there. (If you only need it for things like u32 or [u8; 3], though, it might be fine.) The other not-great fix is to have it return &[MaybeUninitialized<u8>], forcing the caller to prove things later – but I doubt that’s helpful to them. (Though it, interestingly, does allow safe code to (soundly but uselessly) write the padding bytes.)

These are general instances of the two usual ways to fix unsound code:

  1. make your caller promise something stronger, or
  2. return something that can’t be used in the problematic way without additional promises

#11

In summary, and in answer to your original question, if the Any item does not contain any padding or uninitialized fields, then your Any_to_u8() runs in O(1) time, generally without triggering UB. For any other case all bets are off (or as we say in the States, “your mileage may vary”).


#12

I don’t believe this is true. I have a Vec, where T is repr© and encodes a bunch of OpenGL Vertices data. Even in the case of padding, the locations computed by offset_of still has correct values (and the uninitialized padding data, if existent, is irrelevant as OpenGL uploads but does not read it.)


#13

And if the processor traps on attempting to read uninitialized memory (e.g., Itanium), then what?


#14

I’m using wasm32. Is there a real problem here, or are people being pedantic?


#15

You loose guarantees of the compiler as soon as your code invokes undefined behavior (as soon as it can read undefined memory in this case). Everything might be fine, but it could also break in very strange and unpredictable ways.

See posts like Undefined Behavior is Really Undefined and With Undefined Behavior, Anything is Possible. These posts reference C/C++ but the details with regard to undefined behavior are similar since Rust is backed by LLVM. TLDR; the compiler optimizes with the assumption that undefined behavior is never invoked, so if it is, the compiler optimizes with the assumption that your code does not run, and changes to it won’t do anything.


#16

Let me see if I get the high level points correctly:

View 1: (my view): reading uninitizlied memory = I get garbage values; but nothing else bad happens

View 2: (your view): reading uninitialized memory is completely undefined; it is perfectly valid for LLVM to do rm -rf ~ when the code asks to read uninitialized memory

I.e. a single uninitilzied memory read invalidates everything else.

Have I captured the point you are making?

===

If so, this is a lot more serious than I thought. In this case, is there a way to tell Rust structs to “zero all padding” ?


#17

Coincidentally, there’s a contemporaneous post on IRLO that addresses this topic. The underlying problem is that rustc uses a hyper-optimizing back-end (LLVM) to generate wasm code. It is likely that, in this case and on CPUs more recent than Itanium, your approach will work. Unfortunately, in the presence of UB that can’t be guaranteed.

Many of us would like to have a safe way to get read-access to arbitrary structs as [u8]. For structs we control, we usually use #[repr(C)] and order the struct’s fields with attention to meeting the alignment requirements of multi-byte types. Sometimes we have to add dummy fields to ensure those alignment requirements are met, then remember to initialize those dummy fields as well. That’s a case where the Default trait and the compiler’s optimizations come in handy; usually the result is a very-efficient memzero of the object’s storage.


#18

So the suggested solution is:

  1. use repr(C) (which I am also doing) and
  2. when there is padding forced due to alignment due to multi-byte types, add dummy fields – this doesn’t eliminate “padding”, it just makes the padding “explicit”, which forces them ot be initialized when we construct it as Rust types

?


#19

This is such a common misconception that it’s called out explicitly in the LLVM docs. You should probably read through https://llvm.org/docs/LangRef.html#undefined-values

I’ve also talked about this before, such as in

And it’s come up in other threads, like


Reading uninitialized value vs undefined behaviour
Function/method/idiom for in-place substitution using previous value
#20

From my knowledge (maybe too little) use read_volatile can get around some UB detection/optimization.

pub fn any_to_u8<'a, T>(v: &'a Vec<T>) -> &'a [u8] {
    unsafe { std::slice::from_raw_parts(
         std::ptr::read_volatile(&(v.as_ptr() as *const u8)),
         v.len() * std::mem::size_of::<T>()) }
}

Even if it compiles to legitimate code it still risks being unsound to use.