Construct with own or ref

First is the sample code

use std::ptr::copy_nonoverlapping;

struct WrapTest<T> {
    v: T,
}

impl<T> WrapTest<T> {
    fn new_by_move(v: T) -> Self {
        Self::inner_new_by_move(v)
    }
    fn inner_new_by_move(v: T) -> Self {
        Self { v }
    }
    fn new_by_ref(v: &T) -> Self {
        Self::inner_new_by_ref(v)
    }
    fn inner_new_by_ref(v: &T) -> Self {
        let mut new = Self {
            v: unsafe { core::mem::zeroed() },
        };
        unsafe {
            copy_nonoverlapping(v, &mut new.v, 1);
        }
        new
    }
}

fn new_move_test() -> WrapTest<i32> {
    let a = 1;
    WrapTest::new_by_move(a)
}

fn new_ref_test() -> WrapTest<i32> {
    let a = 1;
    let tmp = WrapTest::new_by_ref(&a);
    core::mem::forget(a);
    tmp
}
fn main() {
    let a = new_move_test();
    let b = new_ref_test();
    println!("{},{}", a.v, b.v);
}

inner_func can be nested a lot, similar to this kind of initialization. Which of these two methods is better?

Rust uses Copy and Clone traits to specify if values can be trivially copied as underlying bytes or not. Your inner_new_by_ref ignores that and allows you to copy things that are not only not Copy but not Clone as well.

Vector copied this way will try to release allocated memory twice, when one copy is dropped - this will result in use after free, and so on.

Don't do that.

4 Likes

that's instant Undefined Behavior for a whole lot of types (all those for which the all-zero byte-pattern is not valid).

8 Likes

@jer New sample

use std::{mem::MaybeUninit, ptr::copy_nonoverlapping};

struct WrapTest<T> {
    v: MaybeUninit<T>,
}

impl<T> WrapTest<T> {
    fn new_by_move(v: T) -> Self {
        Self::inner_new_by_move(v)
    }
    fn inner_new_by_move(v: T) -> Self {
        Self {
            v: MaybeUninit::new(v),
        }
    }
    fn new_by_ref(v: &T) -> Self {
        Self::inner_new_by_ref(v)
    }
    fn inner_new_by_ref(v: &T) -> Self {
        let mut new = Self {
            v: MaybeUninit::uninit(),
        };
        unsafe {
            copy_nonoverlapping(v, new.v.as_mut_ptr(), 1);
        }
        new
    }
}

fn new_move_test() -> WrapTest<i32> {
    let a = 1;
    WrapTest::new_by_move(a)
}

fn new_ref_test() -> WrapTest<i32> {
    let a = 1;
    let tmp = WrapTest::new_by_ref(&a);
    core::mem::forget(a);
    tmp
}
fn main() {
    let a = new_move_test();
    let b = new_ref_test();
    println!("{:?}, {:?}", unsafe { a.v.assume_init() }, unsafe {
        b.v.assume_init()
    });
}

You can do this, I call forget

If your question is still "Which of these two methods is better?" then the answer will still be "use the first one".
It's easy to reason about, not unsafe, guaranteed to work.

2 Likes

You can do this, I call forget

If you want to keep using inner_new_by_ref - it should be marked as unsafe along with new_by_ref.

The need for all of this separation boils down a single fundamental property of Safe Rust, the soundness property:

No matter what, Safe Rust can't cause Undefined Behavior.

4 Likes

When I look at the assembly output, they compile down to the same code in release mode, so I would prefer new_by_move, since it's both less code and no need for unsafe code. Both of these (unsafe, more code) increase the chances that you'll make a mistake that's problematic (and, indeed, in your first version, you had at least 2 mistakes that would be problematic).

More generally, trust the compiler; it does a good job in release mode, by and large, and you can inspect the assembly output to see if it's done something weird that needs fixing.

3 Likes

Use val.clone() in new_by_ref(). The code you have written here is massively unsound, and, as others have pointed out, can cause a double free if you pass in a type that is not Clone. Use this code instead:

fn new_by_ref(v: &T) -> Self where T: Clone {
    Self::new_by_move(v.clone())
}

If you are concerned about performance, the compiler will optimize this into what you have right now, for types that implement the Copy trait as opposed to just Clone (integers implement this for example, and you can #[derive(Clone, Copy)] on your own types if all members implement Copy as well).

If you are absolutely positive you know what you are doing and want to keep your existing implementation, you should mark the new_by_ref() function as unsafe, and put a SAFETY: section in its doc comment, so people know that they have to drop the value.

/// Construct a new value by copying out of the reference.
/// 
/// SAFETY: You must forget the referenced value immediately after calling this function.
pub unsafe fn new_by_ref(v: &T) {
    ...
}
```i
1 Like

I call forget. Maybe use new_by_ref with macros.
For the Copy trait, if the size of T is large and the nesting like inner_func cannot be optimized, will it affect the efficiency?

It might affect efficiency; if it does, it will almost certainly do so in way that makes your current new_by_ref implementation worse than relying on the compiler optimizing moves, as done in new_by_move.

Moves are a common enough thing in Rust code that the compiler developers are always keeping an eye on how to keep them efficient; for example, the compiler will rearrange things to avoid copying padding if that increases performance. Similarly, if it can see that a moved field is never used again, it is allowed to avoid copying it; this may not happen today, but is permitted.

Given that your new_by_ref amounts to a move out of the thing referenced, you will always be, at best, equivalent to new_by_move, and at worst you will disable optimizations that the compiler can do on moves, making new_by_ref less efficient than new_by_move.

Undoubtedly any method that takes ownership through a reference, and requires the caller to call forget() to maintain soundness, should be marked as unsafe.

I would strongly suggest refactoring your code so that passing a reference to a value and then calling forget on the value is unneccessary. If that is completely impossible I would solve this problem by having a single new() function that takes a raw pointer (*const) as an argument.