Unnecessary performance penalty for mem::MaybeUninit?

I tried to write some code that does late initialization of a stack-allocated array:

pub fn create_array_nocopy2() {
    println!("fn create_array_nocopy2()");
    let a = {
        const NUMBER_OF_ARRAY_ELEMENTS: usize = 1000;
        type ContentType = ContentI32;

        let mut uninit = mem::MaybeUninit::<
            [ContentType; NUMBER_OF_ARRAY_ELEMENTS],
        >::uninit();

        let unit_ptr = uninit.as_mut_ptr();

        for index in 0 .. NUMBER_OF_ARRAY_ELEMENTS {
            unsafe {
                std::ptr::addr_of_mut!((*unit_ptr)[index])
                    .write(ContentType::new(content_from_index(index)));
            };
        }

        // SAFETY: Everything is initialized.
        // Performance impact: Unnecessary memcopy occurs here
        // Performance impact: Unnecessary stack space allocated
        unsafe { uninit.assume_init() }
    };

    print_array(&a);
}

For this discussion it is not necessary to know what the type ContentI32 really is (the complete code is available on Compiler Explorer: Example code).
Looking at the assembly output, I made the following observations:

example::create_array_nocopy2:
        push    r14
        push    rbx
        sub     rsp, 2008 ; Allocation of 1000 unnecessary bytes

and later (comes from unsafe { uninit.assume_init() })

        mov     edx, 1000
        mov     rdi, rbx
        call    qword ptr [rip + memcpy@GOTPCREL] ; Unnecessary memcpy

What is the reason for this? Shouldn't unnecessary moves be removed by the compiler?

I don't know why this happens, but one trick you can use to force the compiler to eliminate the move is to define a type like this:

use std::marker::PhantomData;
use std::mem::MaybeUninit;
use std::ptr::NonNull;

struct InitArray<'a, T> {
    ptr: NonNull<T>,
    len: usize,
    _lifetime: PhantomData<&'a T>,
}

impl<'a, T> InitArray<'a, T> {
    pub unsafe fn assume_init_slice(arr: &'a mut [MaybeUninit<T>]) -> Self {
        let len = arr.len();
        let ptr = arr.as_mut_ptr();
        Self {
            ptr: unsafe { NonNull::new_unchecked(ptr.cast()) },
            len,
            _lifetime: PhantomData,
        }
    }
    
    pub unsafe fn assume_init_arr<const N: usize>(arr: &'a mut MaybeUninit<[T; N]>) -> Self {
        let ptr = arr.as_mut_ptr();
        Self {
            ptr: unsafe { NonNull::new_unchecked(ptr.cast()) },
            len: N,
            _lifetime: PhantomData,
        }
    }
    
    pub fn as_mut_ptr(&mut self) -> *mut T {
        self.ptr.as_ptr()
    }

    pub fn as_ptr(&self) -> *const T {
        self.ptr.as_ptr()
    }
}

impl<'a, T> std::ops::Deref for InitArray<'a, T> {
    type Target = [T];
    fn deref(&self) -> &[T] {
        unsafe {
            std::slice::from_raw_parts(self.as_ptr(), self.len)
        }
    }
}
impl<'a, T> std::ops::DerefMut for InitArray<'a, T> {
    fn deref_mut(&mut self) -> &mut [T] {
        unsafe {
            std::slice::from_raw_parts_mut(self.as_mut_ptr(), self.len)
        }
    }
}
impl<'a, T> Drop for InitArray<'a, T> {
    fn drop(&mut self) {
        unsafe {
            let slice_ptr = std::ptr::slice_from_raw_parts_mut(self.as_mut_ptr(), self.len);
            std::ptr::drop_in_place(slice_ptr);
        }
    }
}

playground

2 Likes

As an aside, your code would currently leak the already-initialized elements (their destructors would never run) if any of your writes panicked partway through. Something like this can be used to avoid that.
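The linked code isn't reproduced here, but the idea can be sketched as a drop guard that tracks how many elements have been written and cleans up the initialized prefix on unwind (the names, and the choice of String as element type, are illustrative, not from the linked code):

```rust
use std::mem::MaybeUninit;
use std::ptr;

// Hypothetical guard: if a panic occurs mid-initialization, Drop runs the
// destructors of the elements that were actually written, and nothing else.
struct Guard<'a, T> {
    slice: &'a mut [MaybeUninit<T>],
    initialized: usize,
}

impl<T> Drop for Guard<'_, T> {
    fn drop(&mut self) {
        // Drop only the initialized prefix.
        for elem in &mut self.slice[..self.initialized] {
            unsafe { ptr::drop_in_place(elem.as_mut_ptr()) };
        }
    }
}

fn init_array<const N: usize>(mut f: impl FnMut(usize) -> String) -> [String; N] {
    // Sound: an array of MaybeUninit does not require initialization.
    let mut storage: [MaybeUninit<String>; N] =
        unsafe { MaybeUninit::uninit().assume_init() };
    let mut guard = Guard { slice: &mut storage, initialized: 0 };
    for i in 0..N {
        guard.slice[i].write(f(i)); // if f panics here, the guard cleans up
        guard.initialized += 1;
    }
    // Fully initialized: defuse the guard, then move the array out.
    std::mem::forget(guard);
    unsafe { storage.as_ptr().cast::<[String; N]>().read() }
}

fn main() {
    let a: [String; 3] = init_array(|i| format!("item {i}"));
    assert_eq!(a[2], "item 2");
    println!("{}", a.join(", "));
}
```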

2 Likes

Hi alice,

thanks for the answer. As far as I understand it, InitArray is just used to convert the reference from the MaybeUninit<[T; N]> to a reference to the target [T; N] and then dereference it. Nice! However, this is exactly what I expected from MaybeUninit::assume_init() itself.
So my questions remain the same as before:

  • What is reason for this (see original post)?
  • Shouldn't unnecessary moves be removed by the compiler?

Conceptually they are very different. InitArray (as the name implies) keeps the array in the same place, while MaybeUninit::assume_init() returns a copy.

Of course. But in your code it's not an “unnecessary move”. Instead it's an “unnecessary object” (your uninit variable). It's much harder for the compiler to eliminate an “unnecessary object” than an “unnecessary move”.

And if the object is not eliminated, the move is, in fact, necessary.

I don't know why you get a redundant memcopy, but note that you likely have UB in your code. Here:

you are dereferencing unit_ptr: *mut [ContentType; N] and calling the safe indexing operation on it. Calling x[idx] is equivalent to *Index::index(&x, idx), so you are creating a safe reference to an uninitialized array, which is UB. This is unlike addr_of_mut!((*p).field), which directly creates a raw pointer to the place corresponding to field, without any temporary references.

The proper way to initialize your array would be to do something like

unit_ptr.cast::<ContentType>().add(index)
    .write(ContentType::new(content_from_index(index)));

However, this doesn't affect the missed optimization that you hit.
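For completeness, a self-contained version of that pointer-arithmetic initialization loop (with i32 standing in for the thread's ContentType, which is an assumption, and a doubling function in place of content_from_index):

```rust
use std::mem::MaybeUninit;

// Initialize a stack array element by element through raw pointers only,
// so no reference to uninitialized memory is ever created.
fn init_doubled<const N: usize>() -> [i32; N] {
    let mut uninit = MaybeUninit::<[i32; N]>::uninit();
    let base = uninit.as_mut_ptr().cast::<i32>();
    for index in 0..N {
        // Pointer offset + write: the raw-pointer equivalent of a[index] = ...
        unsafe { base.add(index).write(index as i32 * 2) };
    }
    // SAFETY: every element was written in the loop above.
    unsafe { uninit.assume_init() }
}

fn main() {
    let a: [i32; 8] = init_doubled();
    assert_eq!(a[3], 6);
    println!("{:?}", a);
}
```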

My understanding is that indexing an array uses an intrinsic rather than going through the Index trait, making it ok in this particular case.

Note that MaybeUninit::assume_init internally calls ManuallyDrop::into_inner, which seems to be the culprit. As you can see, simply creating and unwrapping a ManuallyDrop<[T; N]> ends up with a redundant memcpy (the call rbx calls memcpy). I don't know why it can't be eliminated.
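The linked Compiler Explorer example isn't reproduced here, but the pattern being discussed can be sketched like this (the function name and array type are illustrative):

```rust
use std::mem::ManuallyDrop;

// Minimal form of the pattern from the thread: wrapping an array in
// ManuallyDrop and immediately unwrapping it again. Semantically this is
// just a move in and a move out, yet the thread observes that the copy
// is not always optimized away.
pub fn roundtrip(x: [u64; 128]) -> [u64; 128] {
    ManuallyDrop::into_inner(ManuallyDrop::new(x))
}

fn main() {
    let a = [7u64; 128];
    let b = roundtrip(a);
    assert_eq!(a, b);
    println!("{}", b[0]);
}
```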

It may be so, but it's dangerously brittle and relies on an implementation detail. I don't know of any authoritative reference which would say that this operation is legal.

Because LLVM is not a Rust compiler, essentially.

It's not redundant. It copies from x to a.

I think the disconnect here comes from your inability to think like a compiler. You see that first there are some calculations done with x, then we move from x to a and continue from there.

And it's even easy to see that x is not used after the move (which makes it easy, in your mind, to merge x and a, which makes the memcpy redundant).

But that's not what LLVM is seeing! What LLVM is seeing is bazillion operations on x (and these operations send address of x to other places in your program many times) and then bazillion more operations on a (and these work with addresses, too).

To “eliminate the memcpy” it would have to eliminate the difference between x and a, and to do that it would have to prove that nothing in your program relies on x and a being different.

If you recall that, to prove this, it needs to prove that the code which initially worked with x hasn't stashed the address of any part of x anywhere… it's hard. Worse: because this is supposed to happen on the LLVM level, the compiler cannot rely on the usual properties of Rust references. It works with pointers (albeit annotated with some aliasing tags).

Frankly, I'm more surprised by the fact that the compiler can sometimes eliminate these variables and memcpys (when you are not dealing with arrays and thus don't have any loops) than by the fact that it couldn't do so in your case.

Here's a bug I have opened about this:

People are working on it, though.

3 Likes

You've written lots of words, but the conveyed information is just "the compiler works in mysterious ways". That's not helpful. It doesn't help in any way to fix or to work around this issue.

As this example shows, the missed optimization is entirely unrelated to any kind of hard escape analysis, and is caused just by using ManuallyDrop. Creating and unwrapping it is simply initializing and moving out of a struct. It's as primitive an operation as could be, and it is quite shocking that it's not optimized away.

3 Likes

I think you are right under certain circumstances. As far as I understand, we cannot rely on some intrinsics, because they may only be available on specific platforms. Also note that your solution could also be UB. The reason is that we have to think about array initialization in several different steps:

  1. Allocate the needed memory on the stack
  2. Initialize the array's internal structure (which on typical CPU architectures is a no-op)
  3. Initialize all the elements

Step 2 may be problematic if, on some CPU, the array does not have a standard layout. It might, for example, need to store the size of the array at the first memory address of the allocated block (not that I know of any architecture where this is necessary). I am also not sure whether Rust supports any atypical memory layout for arrays.
However, if this is not the case, alice is right, because nothing undefined can happen here.
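For what it's worth, Rust does guarantee the "standard" layout for arrays: [T; N] is exactly N contiguous T's, with T's alignment and no hidden header, which can be checked directly:

```rust
use std::mem::{align_of, size_of};

fn main() {
    // The Rust reference guarantees that [T; N] has size N * size_of::<T>()
    // and the alignment of T, so there is no room for a hidden length field
    // or any other per-array header.
    assert_eq!(size_of::<[u32; 1000]>(), 1000 * size_of::<u32>());
    assert_eq!(align_of::<[u32; 1000]>(), align_of::<u32>());
    println!("ok");
}
```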

... and it is shocking that ManuallyDrop does not fulfill its own requirements. From the documentation:

A wrapper to inhibit compiler from automatically calling T’s destructor. This wrapper is 0-cost.

This requirement is definitely not met.
The next one can be found in the documentation of ManuallyDrop::take:

Whenever possible, it is preferable to use into_inner instead, which prevents duplicating the content of the ManuallyDrop<T>.

This is also not met, as you pointed out.

@scottmcm : Thanks for the information. Hope it will get solved in the near future.

The intended meaning of this statement is about what happens in an ownership sense, and it is met in that sense. The into_inner method does prevent you from accessing the original ManuallyDrop<T> after your call to into_inner.

Alice basically already said this, but I think the goal of that statement is to contrast MaybeUninit with Option -- in a sense, MaybeUninit is UnsafeOption. Or, equivalently, Option is SafeMaybeUninit, as it keeps the extra bit of state to know whether there's an initialized value or not. And thus the zero cost is about not having that extra bit of state.
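That "extra bit of state" can be observed directly: MaybeUninit<T> is guaranteed to have exactly T's size, while Option<u32> must store its discriminant somewhere (u32 has no spare niche to hide it in):

```rust
use std::mem::{size_of, MaybeUninit};

fn main() {
    // MaybeUninit<T> is guaranteed to have the same size as T:
    // no "initialized?" flag is stored anywhere.
    assert_eq!(size_of::<MaybeUninit<u32>>(), size_of::<u32>());
    // Option<u32> has to track whether a value is present, and since every
    // bit pattern of u32 is a valid value, the discriminant needs extra space.
    assert!(size_of::<Option<u32>>() > size_of::<u32>());
    println!("ok");
}
```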

4 Likes

Just noticed this part of your post.

What this means is not about what the codegen does, but about how future code can interact with the value.

Namely,

unsafe {
    md.take();
    md.take();
}

will compile, so it's something you have to be careful not to do.

Whereas

unsafe {
    md.into_inner();
    md.into_inner();
}

will not compile, because it's a self method.

Thus the "duplicating" is in the sense of "avoiding double free" mistakes, not about "avoiding memcpy". (As well as the general "use the safe method if you don't need the unsafe one".)

So here are some remarks about the posts in this discussion.
In the Rust environment, some terms mean something completely different from what I normally see:

  1. Avoid duplication: This does not mean avoiding duplication, but avoiding something in the sense of avoiding a double free. So it can duplicate its content 100 times (deliberately exaggerated) and it's OK.
  2. 0-cost: does not mean 0-cost in general, but is restricted to the compiler's internal handling.
  3. Performance: This also has a different meaning compared to other languages. If I look at the home page:

Rust is blazingly fast and memory-efficient: with no runtime or garbage collector, it can power performance-critical services, run on embedded devices, and easily integrate with other languages.

So memory-efficient does not mean "use as little memory as possible" but something different, which is again defined in the context of ownership: a term that is itself not defined.

From an end user's point of view this is not understandable, and from my point of view it is just frustrating.

So the conclusion from my side is the following:
The original questions were:

  1. What is the reason for this?
    Finding unnecessary memory copies is hard. It is not supported by LLVM.
  2. Shouldn't unnecessary moves be removed by the compiler?
    That would be fine, if it were possible. However, see 1., and by the way, all the requirements are met (in the context of ownership).

I completely understand 1. and the first part of 2.; I cannot accept redefinitions of terms, however.

As a side note: I looked for the term "zero cost abstraction" as a general goal of Rust, but was only able to find it in certain contexts. MaybeUninit is a good example where this principle is broken: it is easy to write a function, as in the example, that is much better in memory consumption and runtime performance.

Can you explain where you see them? This would help me understand what you expect and how.

0-cost means the same thing it always meant: a certain thing is guaranteed to be eliminated at compilation time.

It never meant “everything you can imagine will be removed during compilation”. This term was used by compiler writers before Rust was even a dream.

Rust continues to use it for the same things it was used for 20+ years ago.

Welcome to the real world! Talk to car mechanics or bricklayers and you will find out that they, too, use certain words differently.

That's just how our world works. Again: I would love to see a context where words work the way you would like them to. No, really. Ada, C++, C#, Haskell, Java, JavaScript, Python, Ruby… all of these use words and word constructs similarly to how Rust does, and for similar language constructs.

If some higher-level (or lower-level) language uses them differently, then I want to know more about such a language.

Because it only holds in certain contexts. It originated in C++ and was always a goal, not a deliverable. In fact, the guy who invented this term was also the guy who wrote a C++ benchmark which showed people how far real compilers were from delivering on that promise.

It drove me similarly mad when I was in college, because an 8x slowdown didn't look like “zero cost” to me. But over the years Stepanov's benchmark was a guiding light which drove compiler development. And today its execution shows that the price of the abstractions used in that benchmark went from “something between 1.08 and 8.0, but typically around 2.0” to “almost always 1.0”.

But that doesn't mean that other features have suddenly become zero-cost.

It's not so much a redefinition as a clarification. Zero cost, as you correctly noted, never meant “as cheap as possible”. It always means that some specific, narrowly defined thing costs zero.

E.g. the first time I heard "zero cost" was in the context of C++ exceptions (I'm not sure whether that was where it was first used for real or not). It's the same mechanism that Rust uses today, and it means that zero code is added to the non-throwing path. It's all done with tables which are not consulted if exceptions are not thrown.

Does it mean our code is now as fast as possible? No, exceptions can still affect optimisations, and people know about this.

The situation with MaybeUninit is very similar: a certain thing is guaranteed to be zero-cost, but it may affect optimizations (as you observed).

What's the difference?

I mean: I can understand the frustration someone feels when they find out that certain terms mean, in the IT industry, not what a layman might expect them to mean. But it's not as if Rust suddenly started using them differently from how other languages use them.

2 Likes

There's a reason I commented on the duplication remark and not this one; I find this to be more of a gray area where I partially agree with your assessment. On one hand, the fact that the copy is not optimized away even in simple cases seems like an optimizer bug. On the other hand, you can never really promise that some optimization will always happen, so statements about zero cost will always be probabilistic statements about which optimizations are likely to happen, never a true guarantee that there is no cost.

No, in this statement, memory-efficient means what memory-efficient normally means: that in general, Rust will not use an excessive amount of memory. That said, I would like to be clear that it does not mean "use as little memory as possible" as you seem to claim; such a guarantee would be completely impossible in any programming language.

Again, in this particular instance, the optimizer isn't doing something it probably should be doing. But apparently there is some edge case for ManuallyDrop where the compiler fares poorly. However, the compiler is usually pretty good at these things.

Sorry for my late reply.
First I have to mention that I have been developing software for many years now (since the days of the Commodore 64). So I absolutely have no illusions about software :grinning:
I know that every piece of software has bugs. I know that every piece of software has design flaws. I also know that even when a better solution is possible, it is sometimes not taken, because of complexity, lack of time, and many other reasons.
So it is not my expectation that all things are perfect.
On the other hand, there are things that I am not very happy with, and they have nothing to do with Rust specifically; they also pop up in my working environment.

  1. Redefinition of terms: (@VorfeedCanal wants to see where this occurs)
    1. Avoid duplication: In the documentation of assume_init_read, I read that assume_init prevents duplicating the content of MaybeUninit<T>. However, as @alice pointed out:
    So "prevents duplicating" really does mean something different from my understanding of the term.
    2. The second finding is 0-cost, which can be found in the documentation of ManuallyDrop. I quote the sentence here: "This wrapper is 0-cost." As @afetisov pointed out, there is a redundant memcpy in the code. And thanks to @scottmcm, there is also a bug report. By the way: thank you for this!
    3. Memory efficiency: I already quoted what Rust itself says on the topic. I got back from @VorfeedCanal that I should not be a dreamer: really?!
  2. Documentation: So what do I expect? First of all, some thought about what terms really mean. The English language is vague, so misunderstandings are inevitable. Nevertheless, if terms in the Rust environment are used differently in a systematic way (see the quote from @alice), they need to be defined much more precisely, or better yet, other terms should be used. Adapting the documentation to reality is always the easiest way; I do not say that it is always the best solution.
  3. Code adjustments: Accept the behaviour as a bug. I do not expect bugs to be fixed at once. Perhaps they will never be fixed, because of technical reasons, low priority, complexity, ... That is life.

Looking at the previous posts, I still have a feeling of uncertainty: is the problem described a bug? Is it a feature request? Is the documentation insufficient here?

@alice

Yes, you are absolutely correct. While I am not a compiler expert, I understand your point completely. On the other hand, I remember a talk by Chandler Carruth(?) in which he explained compiler optimization in more detail for C++ based on LLVM. In it he came across some suboptimal behavior. The result: a bug report.