Object copy in constructors

Hello all. I am new to Rust and can't solve one issue (I asked for help on another forum and people recommended that I sign up here).
I am trying to optimize my Rust code to remove a memcpy. Small example:
struct Foo {
        val: i32,
        p: *const i32,
}

impl Foo
{
        #[inline]
        pub fn new() -> Foo {
                let mut o = Foo {val: 123, p: std::ptr::null()};
                o.p = &o.val;
                return o;
        }

        fn print(&mut self) {
                print!("In foo print {:p} : {:p} -> {}.", &self.val, self.p, self.val);
                unsafe {
                        print!(" *p = {}\n", *(self.p));
                }
        }
}

impl Drop for Foo
{
        fn drop(&mut self) {
                print!("Destroy foo {:p}\n", self);
        }
}

fn main() {
        let mut l = Foo::new();
        l.print();
}

Output:
In foo print 0x7ffd9b81c300 : 0x7ffd9b81c3d0 -> 123. *p = 1705798352
Destroy foo 0x7ffd9b81c300

So, as you can see, there are two different objects (in memory terms): one in Foo::new and a second in main. As I understand it, Rust makes a copy (memcpy?) of "o" from Foo::new into "l" in main. The same issue occurs with Box (it looks like Rust creates a local object on the stack, then allocates a memory block on the heap for the Box, and then copies the bytes to the heap). One more important issue: in the example above, "safe" code (Foo::new) produces an incorrect pointer (p), and if some unsafe code writes through this pointer (which is a logically correct operation), it will corrupt the stack.

Firstly, structs that contain a pointer into themselves aren't a good idea. You've already seen why: the struct can be moved around in memory, which invalidates the pointer. Box gives you a guaranteed stable address, but as you've seen, it has to go on the heap, which means indirection and cache misses. That's the kind of tradeoff you have to make.
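
To make the Box point concrete, here is a minimal sketch. The names `new_boxed` and `points_to_self` are made up for this example and are not part of the original code. It allocates first and only then takes the field's address, so the pointer refers to the heap location. It only compares addresses; dereferencing `p` would still need unsafe code and extra care.

```rust
struct Foo {
    val: i32,
    p: *const i32,
}

impl Foo {
    // Hypothetical constructor: put the value on the heap first, then
    // record the address of `val` at its final location.
    fn new_boxed() -> Box<Foo> {
        let mut b = Box::new(Foo { val: 123, p: std::ptr::null() });
        let addr: *const i32 = &b.val;
        b.p = addr;
        b
    }
}

// Hypothetical helper: does the stored pointer still match the field?
fn points_to_self(f: &Foo) -> bool {
    std::ptr::eq(f.p, &f.val)
}

fn main() {
    let b = Foo::new_boxed();
    // Moving the Box moves only the pointer to the allocation,
    // not the Foo stored inside it.
    let moved = b;
    assert!(points_to_self(&moved));
    println!("val at {:p}, p = {:p}", &moved.val, moved.p);
}
```

Moving the Foo out of the Box (for example with `*moved`) would invalidate `p` again, so this only holds while the value stays boxed.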

I'm not sure why you're trying to do this, but I'm guessing it has to do with directly porting some data structure or algorithm from C/C++. That's generally where things start to go wrong. In C++ you could fix up the pointer in a move constructor when the object is moved, or just assume your class will always be heap-allocated. Rust doesn't work that way.

However, I don't know why you're trying to optimize out memcpy. Foo is small enough that it will be passed around in registers most of the time (except, by taking a pointer to Foo.val, you've prevented the compiler from performing this optimization). So whatever you're trying to do, I might suggest trying it The Rust Way first, then optimizing if it's necessary.

Foo is small enough that it will be passed around in registers most of the time

On this: it seems clear to me that the struct presented in the question is only an example, and the struct in @zaz's actual code must be far larger.

So here's my question to @zaz: What kind of data does your struct contain that is so expensive to memcpy? Typically, the only "big" data structure that comes to mind is an array [T; N]; but if you have an array that large, you should be using a Vec<T> instead.
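
A rough sketch of why Vec<T> helps here, using two made-up structs (`Big` and `Small` are illustrative names only). Moving a struct that holds a Vec copies just the Vec's small header, and the heap buffer stays where it is:

```rust
use std::mem::size_of;

// Hypothetical struct with the array stored inline: a move copies all of it.
#[allow(dead_code)]
struct Big {
    arr: [u64; 1024],
}

// Hypothetical struct with the data behind a Vec: a move copies only
// the Vec's pointer, capacity and length.
struct Small {
    arr: Vec<u64>,
}

// Returns true if the heap buffer has the same address after a move.
fn moved_buffer_is_stable() -> bool {
    let s = Small { arr: vec![0u64; 1024] };
    let before = s.arr.as_ptr();
    let t = s; // move the struct
    before == t.arr.as_ptr()
}

fn main() {
    println!("size_of::<Big>()   = {}", size_of::<Big>());
    println!("size_of::<Small>() = {}", size_of::<Small>());
    assert!(size_of::<Small>() < size_of::<Big>());
    assert!(moved_buffer_is_stable());
}
```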

Modern CPU architectures have a lot of general-purpose registers. If it's big enough to still get memcpy'd, then yes, it does need to go to the heap.

The internal pointer is still concerning, though. That is a Rust antipattern.

I use the internal pointer only to track the struct's address (i.e. for debugging). In the first version I just added printk! in Foo::new, but someone suggested storing the address in the structure instead, because printk! prevents the constructor from being inlined.
So the question is still the same: is it possible to create complex constructors for complex structures without a memory copy?

For now I don't have any struct :slight_smile: I.e. I am not working on some large "real" project, I am just trying to understand some details of how Rust works.

The Rust compiler does something cool here.

Functions with large return types, such as [u64; 128], use return pointers: stack space is preallocated in the calling scope, and a pointer to it is passed into the called function, which writes its return value there. This is a common pattern in C. Rust just does it for you, so you don't have to do anything special to get good performance. Also, since Rust doesn't have support for write-only pointers that can be used from safe code, this is really the only way to get this kind of optimization.
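
As a rough illustration of the idea, here is the same function written twice: once returning the array normally, and once in the hand-written "caller provides the space" style that the return pointer resembles. The function names are made up for this sketch, and the second version only approximates what the compiler does. It is not the actual calling convention.

```rust
use std::mem::MaybeUninit;

// What you normally write: just return the big value.
fn make_array() -> [u64; 128] {
    let mut a = [0u64; 128];
    for (i, slot) in a.iter_mut().enumerate() {
        *slot = i as u64;
    }
    a
}

// Hand-written out-pointer style: the caller reserves the space and
// the callee fills it in place.
fn make_array_into(out: &mut MaybeUninit<[u64; 128]>) {
    let p = out.as_mut_ptr() as *mut u64;
    for i in 0..128 {
        // Safe to write: `p` points at room for 128 u64 values.
        unsafe { p.add(i).write(i as u64) };
    }
}

fn main() {
    let a = make_array();

    let mut slot = MaybeUninit::<[u64; 128]>::uninit();
    make_array_into(&mut slot);
    // All 128 elements were written above, so the value is initialized.
    let b = unsafe { slot.assume_init() };

    assert_eq!(a, b);
}
```

The first form is the one to use in practice; the second is only there to show what the caller and callee are agreeing on.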

Note that these return types have to be really big to hit this optimization. Types that you might consider "large" or "complex" are still probably small enough to be passed around entirely in registers and never touch the stack. Modern processors have a lot of large general purpose registers at their disposal. A modern x86-64 processor has 14 word-size general purpose registers available. Some of those may be used for other things and the stack spilling point is probably a bit more cleverly chosen, but that's at least 100 bytes of storage before we have to go to the stack. And that's not counting all the SIMD registers which are larger and far more numerous.

Granted, not all of these registers can be used all the time, and the compiler might reserve arbitrary ones for different arbitrary reasons. But I'm just trying to show that this isn't really something to worry about until you profile and find that it's actually causing a bottleneck.

Are you able to provide an example? I modified my Foo structure as follows:
struct Foo {
        val: i32,
        p: *const i32,
        arr: [u64; 1024],
}

And I still see different memory blocks in new() and main().

The answer is actually pretty simple. It sounds like you're not compiling in release mode. In debug mode, Rust uses a lot of temporaries for various reasons, and relies on LLVM optimizations to eliminate redundancy. It also means Foo::new() doesn't get inlined, because that doesn't kick in until at least opt-level 1. Otherwise it would be as if the contents of Foo::new() were copy-pasted into main() and you wouldn't see different addresses.

If you output LLVM IR of your code, you can see the return pointer optimization being used in Foo::new(), as the function takes Foo* as its only argument, though the function name is mangled so it's a bit hard to find. If you look at the IR in release mode, you can see it writes directly to the return pointer. I recommend using #[inline(never)] on new to test this, otherwise you won't find new anywhere in the IR, it will be inlined into main (and probably intermixed with some nasty stuff from Foo::print(). I hate looking at formatting stuff in IR/assembly, it makes such a mess).
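
Something like the following should work as a test case. It is a cut-down version of the Foo from this thread, without the internal pointer, and the file name `main.rs` is just an assumption. Compile it with `rustc -O --emit=llvm-ir main.rs` and look for the mangled `new` in the resulting `main.ll`.

```rust
struct Foo {
    val: i32,
    arr: [u64; 1024],
}

impl Foo {
    // Keep `new` as a separate function so it is still visible in the IR.
    #[inline(never)]
    pub fn new() -> Foo {
        Foo { val: 123, arr: [0; 1024] }
    }
}

fn main() {
    let f = Foo::new();
    // Use the value so the call is not optimized away entirely.
    println!("{} {}", f.val, f.arr.len());
}
```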

Unfortunately, the internal pointer you're using to observe the copies is still invalidated. This is because, for correctness, LLVM has to assume you wanted a pointer to Foo as it was as an on-stack temporary in Foo::new(). This forces it to create a copy. If you remove o.p = &o.val, it writes directly to the return pointer. That observer effect, it's a doozy.

I use "rustc -O"; is that not enough? I also did some digging with "rustc --emit llvm-ir" and found that with a "simple" initialization that returns the value directly, like this:
pub fn new() -> Foo { Foo { ....} }
the compiler does not allocate a temporary object on the stack, but if the code is a bit more complex:
pub fn new() -> Foo {
        let o = Foo { ....};
        return o;
}

the compiler creates a temporary object with a memcpy:
%o = alloca %Foo
....
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %11, i8* %10, i64 8216, i32 8, i1 false)
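
For anyone who wants to reproduce the comparison, here is a self-contained sketch with both forms side by side. `Payload`, `new_direct` and `new_via_local` are made-up names, and the fields are only a stand-in for the elided ones above. Both functions return the same value; whether the second one still shows an alloca plus memcpy in the IR is likely to depend on the compiler version and optimization level.

```rust
// Hypothetical stand-in for the struct discussed in this thread.
#[derive(Debug, PartialEq)]
struct Payload {
    val: i32,
    arr: [u64; 1024],
}

// "Direct" form: the struct literal is the return expression.
#[inline(never)]
fn new_direct() -> Payload {
    Payload { val: 123, arr: [0; 1024] }
}

// "Local" form: build the value in a named local, then return it.
#[inline(never)]
fn new_via_local() -> Payload {
    let o = Payload { val: 123, arr: [0; 1024] };
    return o;
}

fn main() {
    // The two forms are equivalent as far as the result is concerned.
    assert_eq!(new_direct(), new_via_local());
    println!("both constructors return the same value");
}
```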