Hi, I'm trying to do some higher-order function magic, FP-style, but the borrow checker won't let me.
Essentially, I'm chaining map()s. As I understand it, while closures borrow their arguments, map() itself consumes the collection so that's kinda irrelivent? I'm mapping through functions that only borrow their inputs, thus return references to them. These references are to the map closure's arguments, and since the closure apparentlly owns those, this isn't allowed.
What I thought I needed was to use .as_ref() before calling map, so that map() consumes refs to the items, but ater the first map(), the resulting Map object doesn't have an as_ref().
let res = io::BufReader::new(f)
.lines()
.flatten()
.map(|l| l.split_at(l.len() / 2))
.map(|(one, two)| {
let s1: HashSet<_, RandomState>> =
HashSet::from_iter(one.chars().into_iter());
let s2 = HashSet::from_iter(two.chars().into_iter());
s1.intersection(&s2)
})
.map(|pair| foo(pair))
.sum::<u32>();
Borrow checker complains about l in the first closure - this makes sense if l is owned, because split_at() returns refs to it.
Also complains about s1 and s2 - also makes sense because one and two are consumed into s1 and s2, and again intersection() returns refs to their members.
I think this ultimately comes down to the fact that lines() returns an owned Iterator over owned Strings.
Things I've tried:
clone'ing() my way out of it - cloning the things I'm returning refs to. Ugly at the best of times, didn't work
Puttin things in Boxes, eg s1 and s2. Didn't do anything.
calling .lines().flatten().as_ref() - won't compile because there is no as_ref() - I think the Iterator itself isn't borrowed, so can't borrow the elements?
calling .lines().flatten().borrow().as_ref() - this gets closer - borrows the Iterator, and while as_ref() is now defined (on &Flatten), as_ref needs non-ref Flatten to implement AsRef. I have no idea what that means lol
collect()ing into Vecs. This solves the first error, ie lines().flatten().collect::<Vec<String>>().iter() allows the closure over l to work. I know this is a common pattern buy I'm not quite sure how it works? I think collect() gathers all the owned Strings on the heap, then the magic is that iter() borrows them out of that memory (because it's iter() not into_iter() ).
However the same trick of ..collect::<Vec<(&str, &str)>>().iter() doesn't work for the next one.
I feel like I know just enough here to be dangerous. I guess I'm working on a wrong / imcomplete mental mode, and it's led my down a rabbit hole. This can't be a rare or difficult thing to do - what gives?
Update: I've worked my round this as follows, with loads of collect()s and a cloned(). This is hideous and surely isn't needed? How should I be doing this?
@jongiddy Ok that's really helpful - I didn't think of that simple solution.
I still don't understand why the original code didn't work though?
What if instead of chained map()s I needed .map().and_then().map().flatten().map()? I couldn't combine them then, and I'd have the same problem?
There's no .as_ref iterator combinator because there's no way for it to work in the general case, as the results of an iteration must be able to be collected -- you have to be able to hold on to all the items at once -- and the data thus collected can outlive the iterator itself.
There's no way for one step of an iterator chain to "look forward" and see that a borrow you create here at step N ultimately cannot escape at step N + 12 or whatever.
In addition to the excellent answer above by quinedot, I'd point at that the flat_map() method exists in place of map().flatten(). Which itself is enough to obviate the need for calling map more than once in a chain in the common case.
There are some cases where you may wish to flatten more than once, but those usually show up by using combinators that are one or more levels deep, e.g. Option::and_then() and friends inside a map closure.
intersection doesn't copy the items. It only acts as a view into its inputs, so the result of the intersection is only valid for as long as its inputs can be accessed.
The problem here is that the intersection is needed for later steps, but s1 and s2 it intersects are local variables, and variables are automatically destroyed when their scope ends. Their scope is just one call of the map closure, and ends before another step.
Rust doesn't have a garbage collector, so it can't make anything live longer as needed. It ends up being your responsibility to change code in the way that avoids the problem (by copying the result of intersection to make it self-contained, or using it while s1 and s2 are still in the scope).
Thanks everyone that's really helpful. I'll confess I still don't understand the difference between one map() call with a long body and several map()s that all do one statement - why one is fine with the borrow checker and the other isn't. But I learned a bunch of stuff
Scopes in Rust are important and semantically meaningful. These: } destroy variables, so { code } is different from { co } { de } in terms of which data still exits.
Is it as simple as scoping? The intermediate results go out of scope if I chain several closures? How would I pass an intermediate result from one to the next (as I say in my original question - in this case I need many map()s, which I can combine, but what if I needed .map().reduce().map() and didn't have a choice but to use separate closures?)
I assume there's no escape analysis in Rust, so would I have to Box up any results I wanted to pass to the next higher-order function?
It is as simple as scoping, and data borrowed from local variables can't escape their scope. There's no magic to save you there.
Lifetime annotations are a DIY escape analysis, but they only assert to the compiler what is going on, and don't do anything (they're stripped out after being checked, and the code is then compiled without any lifetime knowledge).
Most importantly there is no garbage collector, so if something doesn't already live long enough, there is nothing that could make live longer. It's up to you to change the code manually to make things live longer.
If you didn't have a choice and had to use separate closures, you'd have to copy on-stack data to the heap and use self-contained non-borrowing types instead (e.g. convert &str to String, convert Intersection<'temporary> to Vec<String>, etc.). You may need Rc instead of & to reference things.
I mean that if you have data structures containing references, returning them from a closure will most likely create a self-referential struct, which the borrow checker doesn't support. RefCell alone can't avoid that, since Ref<'temp> is still temporary and bound to a scope.
Yes, you would have to put them outside and reference from closures.
This would work, but it's, probably, something mechanical translator from Haskell to Rust may use, not human.
Because at this point you have, more-or-less, replicated tha for loop and unless you are doing something extremely unusual for loop would just be easier and simpler to digest.
@kornel that's super-useful, thank you! "Lifetime annotations are a DIY escape analysis" is a very useful insight too.
I did sort of understand that I needed to get it all on the heap between closures, but I tried to just Box::new() my Intersections - I guess that's not the same as malloc(); memcpy().
I do take the point about just using a for-loop, but I'm used to this style of programming and it's how I think now, so I wanted to say with it if I could. I'm conditioned to find it much more readable, composable, easier to reason about side effects, etc.
Box::new is like malloc, but while the lifetime of the heap allocation itself is unrestricted, the Box as a whole is restricted by inner pointers it contains. You get Box<Intersect<'a>> and that 'a is a boat anchor that ties it to a scope.
Consider this UB situation that gives you malloced array of pointers to a stack:
char **map() {
char s1[10] = {}; // pretend it's a hashset
char s2[10] = {}; // pretend it's a hashset
char *intersect[2] = { &s1[0], &s2[0] }; // pretend it's the intersect iter
char **intersect_boxed = malloc(sizeof(char*) * 2);
memcpy(intersect_boxed, intersect, sizeof(char*) * 2);
// drop(&s2); // rust would also insert these
// drop(&s1);
return intersect_boxed;
}
Yup. So if you would find a way to do that the end result would be code which other people would find difficult to read.
Rust is anti-C++ in that department: it tends to actively reject efforts to add ways to write C++ in Rust, C# in Rust or Haskell in Rust.
Currently if offers two distinct ways: imperative way with for loops and functional way with combinators.
And, well… the fact that you can not easily implement that style is an advantage to me: if there are bunch of functional-style code with maps, inresections and other such things then I can be sure that style which you are proposing (with side-channel on the side which passes over all these adapters) is ugly and easy to spot.
I hate it when people write code like that in C#/Java/JavaScript and really hope Rust wouldn't make it easy.
Right, cause I'm just boxing Intersect, which borrows the sets I gave it. I.e. I'm allocating heap memory but then just copying pointers into it - pointers to the stack.
Interesting you say Rust deliberately makes it hard to write C++ / Haskell / etc. I guess I assumed it was like Scala (which I've used before) which says "choose any style you like!". I would say it's not really supporting the functional style given that my naïve "functional" code didn't work. I'm not saying that's wrong, just an interesting observation.
The style isn't vital, because there's no infinite data structures or lazy evaluation, so for loops it is I guess.