I'm trying to understand why the map method on a Rust array moves (or copies) itself? For example this seems like idiomatic code in every other language I'm familiar with:
But it fails in Rust because the first map consumes the original array! I eventually did find .each_ref() which helps in my original circumstance where I don't want to copy the elements themselves either. Seems weird that I have to make an extra intermediate array just to get the final one, but maybe this is something I can just assume probably/sometimes/maybe/perhaps gets optimized away?
But I just feel like yet another thing about my mental model of Rust must be broken to get tripped up like this. Why doesn't .map iterate over immutable refs of the items in the first place?
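The snippet from the original post is not reproduced above. As a minimal sketch of the situation described, assuming a hypothetical non-Copy `Thing` struct with a single `id` field (both names are made up for illustration), the move and the `each_ref` workaround look roughly like this:

```rust
// Hypothetical element type; deliberately neither Copy nor Clone.
#[derive(Debug)]
struct Thing {
    id: u32,
}

// Borrow each element, then map over the references.
// The caller's array stays usable afterwards.
fn ids(things: &[Thing; 3]) -> [u32; 3] {
    things.each_ref().map(|t| t.id)
}

fn doubled_ids(things: &[Thing; 3]) -> [u32; 3] {
    things.each_ref().map(|t| t.id * 2)
}

fn main() {
    let things = [Thing { id: 1 }, Thing { id: 2 }, Thing { id: 3 }];

    // Calling `things.map(|t| t.id)` here would move `things`, so a second
    // `things.map(...)` afterwards would be rejected as a use of a moved value.
    let a = ids(&things);
    let b = doubled_ids(&things);

    println!("{a:?} {b:?} {things:?}");
}
```

The intermediate array built by `each_ref()` holds only references, so it is small regardless of how large `Thing` is.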
Because there can only be one map method, and by-reference access can be achieved by adding an each_ref or each_mut step. These kinds of methods are kinda suboptimal anyway when the array is super big (even just moving larger arrays around can add overhead or use a lot of stack memory), and for small arrays it should get optimized well.
Arguably, it’s of course possible to think of adding something like map_ref or map_mut methods that act like a fused each_ref().map(…) and each_mut().map(…) without the intermediate array, but that’s also somewhat a proliferation of methods[1].
and hopefully in most use cases not an actually relevant difference after optimization. Also at some point one might wonder if we aren’t instead looking for a more general fixed-size [i.e. size known at compile time] Iterator-like kind of interface for more flexibility & efficiency anyway ↩︎
Thanks both! So iiuc Rust doesn't really have a traditional map method, really I should think of its .map as more of a .transmogrify?
I originally wondered whether there was some aesthetic objection to overloads, but then I got distracted by how many .into() methods there were, and that seemed like less of a reason. But reviewing now, as much as there's generics and such, I see that Rust really doesn't have method overloads at all!
So either I can use the built-in "transform this array into some other type of array" method, in conjunction with other methods like .clone() (or an implicit copy), .each_ref, etc., and hope that all the "extra copies" created as intermediates get optimized away?
Or it sounds like the map method itself is not real efficient in the first place? So maybe something like providing my own, more traditional map-style method if I want to basically go straight from the original array as read-only to a new array?
trait MyMap<T, const N: usize> {
    fn map_directly<U, F>(&self, f: F) -> [U; N]
    where
        F: Fn(&T) -> U;
}

impl<T, const N: usize> MyMap<T, N> for [T; N] {
    fn map_directly<U, F>(&self, f: F) -> [U; N]
    where
        F: Fn(&T) -> U,
    {
        // Start with N uninitialized slots and fill them one by one.
        let mut out = [const { std::mem::MaybeUninit::<U>::uninit() }; N];
        for (idx, d) in self.iter().enumerate() {
            out[idx].write(f(d));
        }
        // `transmute_copy` rather than `transmute`: the compiler cannot prove
        // that the generic [MaybeUninit<U>; N] and [U; N] have the same size.
        // SAFETY: every slot was written in the loop above.
        unsafe { std::mem::transmute_copy::<_, [U; N]>(&out) }
    }
}
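For comparison, here is a sketch of the same idea without `unsafe`, swapping in `std::array::from_fn` for the manual `MaybeUninit` handling. The free function `map_by_ref` is a hypothetical name, not something from std or from the post above:

```rust
use std::array;

// Hypothetical helper: build a new array from a borrowed one.
fn map_by_ref<T, U, const N: usize>(src: &[T; N], mut f: impl FnMut(&T) -> U) -> [U; N] {
    // `from_fn` calls the closure once per index and assembles the array,
    // so no intermediate array of references is needed.
    array::from_fn(|i| f(&src[i]))
}

fn main() {
    let words = [String::from("a"), String::from("bcd")];
    let lens = map_by_ref(&words, |s| s.len());

    // `words` was only borrowed, so it is still usable here.
    println!("{lens:?} {words:?}");
}
```

This works in `core` as well (`core::array::from_fn`), so it should fit no_std code. Whether it compiles to the same machine code as the `unsafe` version is not something this sketch demonstrates.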
The central difference of Rust is that it has unique ownership by default — and everything else (Copy, &, Rc) is built on top of that. Array map() is following that default — it applies a function to the values in the array rather than also taking references.
I would argue that the "Rust version" of the map operator is the more fundamental and natural one, in any language with ownership. If you're coming from a language like Python or JS, with refcounting or other kinds of transparent pointers/handles/references, it might seem strange... but really, that's just ownership in general being different.
Perhaps using iterators first might help to grasp Rust's ownership-based APIs, since there you have .iter for iterating through references and .into_iter to consume self and iterate through owned values.
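A small sketch of that split, using a Vec and two hypothetical helper functions (`lengths` and `shout` are illustrative names):

```rust
// Borrowing: `.iter()` yields `&String`, so the caller keeps the data.
fn lengths(words: &[String]) -> Vec<usize> {
    words.iter().map(|w| w.len()).collect()
}

// Consuming: `.into_iter()` yields owned `String`s, which can be
// modified and passed along without cloning.
fn shout(words: Vec<String>) -> Vec<String> {
    words
        .into_iter()
        .map(|mut w| {
            w.push('!');
            w
        })
        .collect()
}

fn main() {
    let words = vec![String::from("hello"), String::from("world")];

    let lens = lengths(&words); // `words` is still usable afterwards
    let loud = shout(words); // `words` is moved here

    println!("{lens:?} {loud:?}");
}
```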
Rust arrays have a minor pain point in that you can't move out of them. (EDIT: with indexing; you can move out of them with slice patterns as discussed below.) This is noted as an aside at the very end of the array documentation. (This is kind of related to the fact that you can mutably borrow two different fields of a struct, but not two apparently different elements of a slice.) You can typically take an element, and for types such as String, it won't incur any extra allocations. But if the element type doesn't have a reasonable default value, this doesn't work. You can also write something like let [x, y, z] = my_array;. Helper methods such as map let you avoid some pain. You need it if you are writing code that's generic over the array length because you can't write down the pattern you'd need.
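As a sketch of the two workarounds mentioned here, assuming `String` elements (which have a cheap default value) and hypothetical helper names `take_first` and `split`:

```rust
use std::mem;

// Replace one element with its default value and return the original.
// For `String` the default is an empty string, which does not allocate.
fn take_first(names: &mut [String; 3]) -> String {
    // `let s = names[0];` would not compile: indexing cannot move a
    // non-Copy element out of an array.
    mem::take(&mut names[0])
}

// Move every element out at once with an array pattern.
fn split(names: [String; 3]) -> (String, String, String) {
    let [x, y, z] = names;
    (x, y, z)
}

fn main() {
    let mut names = [String::from("a"), String::from("b"), String::from("c")];

    let first = take_first(&mut names);
    let (x, y, z) = split(names);

    println!("{first:?} {x:?} {y:?} {z:?}");
}
```

The pattern in `split` only works because the length 3 is written out; for a generic `N` there is no pattern to write, which is where `map` helps.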
Care to expand on that? Note that the “problem” only happens when you explicitly make your Thing a non-Copy type.
If you made it Copy, then it would “just work”™.
That means that we are talking about languages where “move out” operation (that invalidates something) exists… which languages are these?
My take on that is that Rust works exactly like all other languages… if you are dealing with Copy types.
With non-Copy types it's different, but that has an explanation: Rust's move operation allows you to avoid clones or references… but yes, when you employ that superpower, map starts working differently, too.
I'm doing my best to not get nerdsniped by this because now I'm curious myself. But I can say my mental model (perhaps overly informed by JavaScript, but also in Python or Swift for example) is that I use map in a "purely functional" sense to project items to a new copy. Whereas I would use something like forEach to modify in-place.
You both mentioned ownership, but that still doesn't quite click for me. My surprise was that ownership has to be moved (rather than immutably borrowed) to make what in my mind is a copy. Albeit that copy is modified in some way — but in my mind it's the new version that is getting mutated, not the original!
I'm probably getting unreasonably hung up on this. There's a lot of stuff in Rust that is unergonomic to use the way I'm "used to" or simply just isn't named the way I would have predicted. I'll make a new homonym in my head for "what Rust calls map" (where the original disappears) which is distinct from "what Nate calls map". Which Rust also has, it's simply named each_ref().map(…)
The "traditional" map method in Rust is arr.iter().map(…) (borrowing) or arr.into_iter().map(…) (moving). But it’s currently difficult to collect an array iterator back to an array, so the .map() method on arrays exists as a convenience thing. Indeed it wasn’t added until version 1.55 because before that there was no way to write functions that are generic over array size. Because of that, for a long time arrays were kind of second-class citizens, and in some ways are still a bit awkward to use.
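To illustrate why collecting back into an array is awkward, here is one possible sketch that goes through a temporary Vec and `try_into`. The helper name `squares` is hypothetical, and the detour allocates on the heap, so it would not suit a no-alloc environment:

```rust
use std::convert::TryInto; // already in the prelude on edition 2021

fn squares(nums: &[u32; 4]) -> [u32; 4] {
    // Iterators can be collected into a Vec but not directly into an array.
    let v: Vec<u32> = nums.iter().map(|n| n * n).collect();

    // Converting Vec<T> to [T; N] fails if the length is not exactly N.
    v.try_into().expect("map does not change the number of elements")
}

fn main() {
    let nums = [1, 2, 3, 4];
    println!("{:?}", squares(&nums));
}
```

With the array method the same result is `nums.map(|n| n * n)`, with no allocation and no runtime length check.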
Languages like JS, Python, Scala, and even Swift have heap-backed arrays, which are much closer to Rust’s Vec. Rust arrays are a more special tool for more specific purposes, typically for when you need one or both of:
That makes sense. But do also note that if you had values that implemented Copy, then map() on an array of them would copy them.
In this description you’re focusing on the output of the map operation, but I think it would be useful to consider the input more.
In general, in Rust, applying a function to some input value consumes (moves) that input. In particular, [x].map(f) is equivalent to [f(x)].[1] This is what I meant when I said that map() has the “default” behavior. If map() also took a reference, that would be an additional transformation: [x].each_ref().map(f) is equivalent to [f(&x)], not [f(x)].
array.map(f) is exactly “apply f to each element of array” — no more and no less. Because, in the general case without Copy, applying f to some element moves that element, so must .map(f) move every element.
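The equivalence can be sketched with two hypothetical functions, `consume` (takes ownership) and `inspect` (borrows):

```rust
// Takes ownership of its argument.
fn consume(s: String) -> usize {
    s.len()
}

// Only borrows its argument.
fn inspect(s: &String) -> usize {
    s.len()
}

// `[x].map(consume)` behaves like `[consume(x)]`: `x` is moved.
fn by_value() -> [usize; 1] {
    let x = String::from("hello");
    [x].map(consume)
}

// `arr.each_ref().map(inspect)` behaves like `[inspect(&arr[0])]`:
// the array is only borrowed and can be returned afterwards.
fn by_ref() -> ([usize; 1], [String; 1]) {
    let arr = [String::from("hello")];
    let lens = arr.each_ref().map(inspect);
    (lens, arr)
}

fn main() {
    println!("{:?} {:?}", by_value(), by_ref());
}
```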
Of course I want to avoid clones! (That was another thing that was bugging me about .map — I focused my example on the array "disappearing" in the move case, but the array getting copied in the "working" case also seemed like an expensive operation for the language to gloss over!)
But why would I want to avoid references?
That seemed like a headscratcher because I didn't want to avoid references. Why would I want to avoid references? Well in retrospect, by using array instead of Vec I must have been signalling exactly that.
In Rust, an array is a value type. I've been using it like a fixed-size Vec [not that the Vec or Box structs themselves aren't values, but……] whereas array is right there in the list next to bool and char and i32. It's meant to be a small and cheap local variable that one doesn't get attached to:
I don't expect 2.pow(x) to work by reference to avoid "consuming my original" 2 and it doesn't bother me that it gets copied into the fn pow(self, exp: u32) method call. With both types of primitives (i32 vs. array) my mental model is that they're being used as a read-only source of truth for the .pow() or .map() operation.
But that mental model is wrong! In Rust, my_num.pow(x) and my_arr.map(fn) do both consume (and in a sense replace) the originals, unless the originals are Copy. That's what you all have been patiently telling me. My surprise coming from JavaScript or Python or Swift or whatever is a red herring because the map methods there are all on those languages' equivalent of a Vec, not an array!
In a sense array::map moves the original sequence, simply because normal people are busy manipulating actual data via Vec::iter instead!
The backstory here is both that:
I'm keen to learn to write code that can be used in no_std / sans-alloc environments, so I tend to hesitate reaching for anything that uses the heap when I don't "have to"
and in my case I did have the constraint of @steffahn's footnote way back in that early reply: a method in an impl<const N: usize> wrapper whose return value needed to be [_; N]
Anyway, thanks all! Seems like every line of Rust I write requires yet another round of research/reading to understand why. Usually I do that on my own time but in this case my research wasn't turning up any existing reading! So maybe this will be helpful to someone else who's gotten themselves confused, and if not, you've at least helped me understand yet another self-gotcha that made sense to everybody else.
Arrays and Vecs are equally useful with or without references (though one often tries to avoid constructing a temporary Vec<&str> by using iterators instead). I think you’re seeing a relationship that doesn’t exist, here.
It’s not that Vec is offering you references by default; it's that it offers you references when you call .iter(). You can use .iter() on an array too. And you can move each item out of a Vec too, with .into_iter().
That’s a fine thing to aim for, but in many problems, you do need the heap and the thing you want to do throughout your program is to avoid excessive heap operations, such as reallocating and copying data that isn't needed.
Thus, here is an example of when you might want to “avoid references”, because you want to construct something: if I have a string: String and I need an Option<String>, then if I don’t need the String any more, the most efficient way to do that is Some(string) — a move. If I have a [String; 2] and I want a [Option<String>; 2] then I can use strings.map(Some). If I have a Vec<String> and need a Vec<Option<String>>, then I can use strings.into_iter().map(Some).collect(). All of these cases want moves, whether or not an array is involved.
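Written out as a sketch, with hypothetical wrapper function names, the three cases look like this:

```rust
// A single value: wrapping it is just a move.
fn wrap_one(string: String) -> Option<String> {
    Some(string)
}

// An array: `map` moves each element into `Some`.
fn wrap_array(strings: [String; 2]) -> [Option<String>; 2] {
    strings.map(Some)
}

// A Vec: `into_iter` moves each element out, `collect` builds the new Vec.
fn wrap_vec(strings: Vec<String>) -> Vec<Option<String>> {
    strings.into_iter().map(Some).collect()
}

fn main() {
    let one = wrap_one(String::from("a"));
    let arr = wrap_array([String::from("b"), String::from("c")]);
    let vec = wrap_vec(vec![String::from("d")]);

    println!("{one:?} {arr:?} {vec:?}");
}
```

None of these clone a String; the existing heap buffers are handed over to the new containers.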
In Rust, a move is the fundamental operation. When we work with values where strict moving is wasted bookkeeping, we fix that with Copy to relax the moving rule. When we use references, we do it by creating references and then moving=copying those references.
That’s the fundamental “why” of array::map() moving: because that is the kind of map() that is most flexible. If we had a map() that referenced, we’d need another function map_move(). So, instead of having two versions, we prefer one composable one.
Were you under the impression that "moved semantics ==> must involve the heap" and/or "values in a (stack) variable that don't manage heap memory ==> must be Copy"? Among other learning material, The Book can give this impression, as their ownership introduction heavily (over-)emphasizes the heap.
If not, don't mind me. But in case you did...
A clearer view of ownership (IMO) is that it's about managing resources. Types which manage heap allocations (such as String) need to deallocate when they drop, and that's a kind of resource management. It's undefined behavior to deallocate the same heap memory more than once, and Rust chose against implicit duplication (invisible cloning of Strings), so Strings have move semantics.
But there are many other types that manage resources which don't necessarily involve the heap: Files need to be closed when they drop, and it's undefined behavior to close the same file more than once. They also have move semantics.
If your type has a field with a destructor, your type will also have a destructor, and your type will necessarily have move semantics too.[1]
In order to have copy semantics, a type must implement Copy. The Book calls this being "stack only data",[2] but that is misleading. The real requirement for Copy is that it cannot manage any resources -- it cannot have a destructor. You opt into this as a guarantee by implementing Copy, and it is only allowed if all of your fields are Copy too. Because you have to opt in, a type may have move semantics -- not implement Copy -- even if it doesn't have a destructor.
The Thing in your OP could implement Copy, but didn't. It's an "inline data only" type that is not Copy. So is Range<usize>, intentionally (it's considered a footgun for an iterator to implement Copy).
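A sketch of that opt-in, using two hypothetical inline-only types: `Point` derives Copy, while `Token` has no destructor either but does not opt in. The array `map` call is identical in both cases; only what happens to the original differs:

```rust
// Inline-only data that opts into copy semantics.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Also inline-only and without a destructor, but not Copy: move semantics.
#[derive(Debug)]
struct Token {
    id: u32,
}

fn map_copy_array() -> ([i32; 2], [Point; 2]) {
    let pts = [Point { x: 1, y: 2 }, Point { x: 3, y: 4 }];
    // `[Point; 2]` is Copy, so `map` receives a copy and `pts` stays valid.
    let sums = pts.map(|p| p.x + p.y);
    (sums, pts)
}

fn map_move_array() -> [u32; 2] {
    let toks = [Token { id: 7 }, Token { id: 9 }];
    let ids = toks.map(|t| t.id);
    // `toks` has been moved into `map`; using it here would not compile.
    ids
}

fn main() {
    println!("{:?} {:?}", map_copy_array(), map_move_array());
}
```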
This type requires a destructor as it is managing a resource (it's a read lock on some value). Thus it cannot ever implement Copy. And it resides in core, where there isn't even the concept of a heap.
This is true even if your type does not implement Drop itself. Destructors are more general than that. Currently String does not implement Drop itself, for example. ↩︎
"Inline only data" would be a better phrase for what they describe (a value of any type can be put on the heap) ↩︎