Why do many From trait implementations take the array by value?

For example, Vec::from:

#[stable(feature = "vec_from_array", since = "1.44.0")]
impl<T, const N: usize> From<[T; N]> for Vec<T> {
    #[cfg(not(test))]
    fn from(s: [T; N]) -> Vec<T> {
        <[T]>::into_vec(box s)
    }
    #[cfg(test)]
    fn from(s: [T; N]) -> Vec<T> {
        crate::slice::into_vec(box s)
    }
}

Is this bad for performance with large arrays? It seems to make an unnecessary copy of the array.

In the initial implementation (impl From<[T; N]> for Vec<T> by jyn514, Pull Request #68692 in rust-lang/rust), there was a LengthAtMost32 bound.
That seems reasonable, since it keeps the user from passing a large array.

But in https://github.com/rust-lang/rust/pull/74060, the LengthAtMost32 bound was removed.

I also saw a Stack Overflow answer saying that a large array passed by value will actually be optimized into a pass by reference. I tested it, and that is not true.

The Stack Overflow question is: performance - Should I pass a large array by reference or value? - Stack Overflow

How should this be implemented otherwise? To create Vec<T>, no matter how, one must copy Ts into the allocated container, anyway.

The unnecessary copy is on the stack. Then Vec allocates memory on the heap and copies the array again.

In many cases, the current implementation doesn't copy the array even once.

How so? And more importantly, how else would you implement moving elements out of the array and into the vector, if not through passing it by value?

I wrote some code to test whether the compiler copies the array on the stack when it is passed by value.

use std::thread::Builder;

fn main() {
	let h = Builder::new()
		.stack_size(2048)
		.spawn(|| {
			let arr = [0; 6000];
			println!("{}", arr[0]);
			//println!("ptr = {:?}", &arr as *const i32);
			call_by_value(arr);
		})
		.unwrap();
	h.join().unwrap();
}

fn call_by_value<const N: usize>(arr: [i32; N]) {
	//println!("in func arr addr = {:?}", &arr as *const i32);
	println!("{}", arr[0]);
}

I create an array on the stack whose size is about half the stack size. If the program crashes with a stack overflow, it means the array was copied. Note: I ran this in a release build.

I found that accessing the array in certain ways in both the caller and the callee causes the array to be copied,
for example println!("{}", arr[0]) or casting to a pointer. But code like:

let x = arr[0];
println!("{}",x);

doesn't cause the array to be copied.

So the cases in the standard library are special cases that the compiler can optimize. But when we write our own code, we should always pass large variables by reference.

In the code you've posted here, arr is about 12 times the specified stack size, not half. It won't fit in even a single stack frame.
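The ratio can be checked with a small sketch. It assumes the element type is i32, as in call_by_value, and takes 2048 as the stack size requested in the test; the names STACK_SIZE and array_bytes are only illustrative.

```rust
use std::mem::size_of;

/// Stack size requested for the spawned thread in the posted test.
const STACK_SIZE: usize = 2048;

/// Size in bytes of `[0; 6000]` when the elements are `i32`.
fn array_bytes() -> usize {
    size_of::<[i32; 6000]>()
}

fn main() {
    let bytes = array_bytes();
    // 6000 elements * 4 bytes each
    println!("array size: {} bytes", bytes);
    // How many requested stacks the array would fill
    println!("ratio to stack size: {:.1}", bytes as f64 / STACK_SIZE as f64);
}
```

This prints an array size of 24000 bytes and a ratio of 11.7.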


"Always" is perhaps too strong here, though using some kind of pointer is often a good idea when dealing with large data. When ownership needs to be transferred, that'll generally be a Box instead of a reference.
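As a rough sketch of transferring ownership through a Box: the helper names make_boxed and into_vec_no_copy below are made up for illustration, and the buffer is built via vec! only so that the data never has to live on the stack. Moving the Box moves just the pointer and length, and into_vec reuses the same heap allocation.

```rust
/// Builds a boxed slice directly on the heap (hypothetical helper).
fn make_boxed(len: usize) -> Box<[u64]> {
    vec![0u64; len].into_boxed_slice()
}

/// Takes ownership of the heap buffer; only the pointer and length are moved.
fn into_vec_no_copy(data: Box<[u64]>) -> Vec<u64> {
    // Converts the box into a Vec without cloning or reallocating.
    data.into_vec()
}

fn main() {
    let boxed = make_boxed(1_000_000);
    let before = boxed.as_ptr();
    let v = into_vec_no_copy(boxed);
    // Same heap buffer: the elements themselves were not copied.
    assert_eq!(before, v.as_ptr());
    println!("len = {}", v.len());
}
```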

Also check out the generated machine code. The only memcpy call is in the backtrace support code, which is not called in the normal case. x86_64 doesn't have an implicit memory copy mechanism.

Also note that the call_by_value() function itself is optimized out. There's no function call cost if there's no function call.

What about IntoIter? That way I can copy just a pointer-sized variable and take ownership of the whole array.

fn move_element<const N:usize>(arr:&mut [String;N]){
	for x in arr.into_iter()
	{
		drop(x);
	}
}

You don't:

fn move_element<const N: usize>(arr: &mut [String; N]) {
    for x in arr.into_iter() {
        drop(x);
    }
}

fn main() {
    let mut arr = [String::from("first"), String::from("second")];
    move_element(&mut arr);
    println!("{:?}", arr); // prints ["first", "second"]
}

drop(x) in this case only drops the reference, not the value behind it.

You could use std::mem::take() (for T: Default) instead of drop() if you find a situation where it's actually beneficial. But definitely compare (e.g. on Godbolt); you could lose your memcpy, etc.

It doesn't take ownership of the array. into_iter() on a reference-to-array returns an iterator that produces references to the individual items. It does not move the items. Please read the documentation.
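For comparison, here is a minimal sketch of moving the items out through a mutable reference with std::mem::take. The function name take_elements is hypothetical. Each slot is left holding String::default(), so the array stays valid and is still owned by the caller.

```rust
use std::mem;

/// Moves every `String` out of the array through a mutable reference,
/// leaving an empty string (`String::default()`) in each slot.
fn take_elements<const N: usize>(arr: &mut [String; N]) -> Vec<String> {
    // iter_mut() yields `&mut String`; mem::take swaps in the default value
    // and returns the original by value.
    arr.iter_mut().map(mem::take).collect()
}

fn main() {
    let mut arr = [String::from("first"), String::from("second")];
    let moved = take_elements(&mut arr);
    println!("{:?}", moved); // prints ["first", "second"]
    println!("{:?}", arr); // prints ["", ""]
}
```

Only the String headers (pointer, length, capacity) are moved here; the heap buffers holding the text are not copied.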

I'm not an expert in assembly language, and I don't know what "backtrace support code" is. Do you mean there is optimized code that is called in some situations and does not copy the array?

And I have tried this:

This way, the compiler can't optimize away the copy of arr, because if the two arr were the same, it would print the same address twice, which would not be correct.

I had the feeling that a move is cheap in Rust. In some situations it looks like a variable alias. But it is not; it actually copies the memory. Some objects are too large to copy cheaply, and arrays make it very easy to create large objects.