Is manually looping through a vector always strictly worse then using iterators?


#1

I’m new to rust, and I’m playing around with an optimization problem with graphs. The first thing I wrote was an implementation of Dijkstra’s algorithm to find the shortest path between all the nodes in the graph. The first version of the code is at http://is.gd/HPsCsa (function find_paths, note the graph nodes are activities, and the graph is not directed). It works according to a first test I have.

However, the outer loop in find_paths uses a range to go over each Activity one at a time. However, the Rust book says doing so is an anti-pattern and says it’s “strictly worse than using an actual iterator” ( https://doc.rust-lang.org/book/iterators.html ). Thus I wonder if there is a better way to deal with the outer loop. I initially tried to use iterators, but I ran into an issue as I wanted a mutable iterator, which borrowed the vector and wouldn’t let me access it further to get the other costs. Using indexing avoids that borrow allowing me to access element in the loop at any time.

Alternately, is the documentation a little strong, and iterators just generally the better path? From the little I’ve worked with rust, I’ve found iterators useful and I continue to try and use them everywhere else. If so, can the wording in the documentation be relaxed? As it stands, I feel like I did something wrong in this code because of that line. I’m happy to try and reword the sentence.


#2

There are two main reasons why using an iterator is better than manually indexing. One of those is performance; the bounds checks that need to be performed when indexing can adversely affect performance. The other is safety. Indexing into an array or vector has the possibility of panicking, if some of your arithmetic is off and you try to access something out of bounds.

Of course, indexing does have its uses, which is why it exists. But in general, when looping over a vec, array, or slice, using the iterator will be much better.

However, mutating the vec while iterating over it does raise some problems. Manually indexing is one way to work around them, and I wouldn’t say it’s something wrong. However, it’s what some people call a “code smell”; something that indicates that something is not quite right and maybe could be improved.

One way to do that would be to use interior mutability. Rather than iterating over the list mutably, which then prevents you from using any other references to the list, you can change the list to use one of the types in std::cell that allows interior mutability, and then iterate over it immutably, but still be able to update the contents. In this case, since you’re updating an entire Vec within an object, I think you’ll need to use RefCell.


#3

I would have thought going with interior mutability a worse code smell then manually indexing an array. At least this way I get statically checked borrowing, versus a RefCell's runtime checks. Also, wouldn’t RefCell be even worse for performance, since every access of the array gets at least an increment and a reference check compared to just a bounds check?

Either way still seems to have a code smell. Is the more idiomatic way to use a RefCell in this case? The documentation’s language would seem to point that way. What bugs me is the “strictly worse” language in the documentation, which suggests any manually indexed loop is wrong and should be avoided, compared to std::cell’s last resort. If it was a general rule of thumb I’d agree.


#4

“use an iterator over indexing” certainly smells like a rule of thumb to me. I definitely think indexing is preferable to interior mutability, which has its own complications. Not only does interior mutability add potentially more overhead (depending on how it’s implemented), but it adds more invariants to your code and potentially prevents its containing type from automatically impl’ing Sync.

If using an iterator isn’t practical, I personally don’t feel bad using indexing. Certainly, I agree that the specific example in the docs looks like a code smell to me.

Additionally, eliminating bounds checks is awesome, but they tend to be vanishingly cheap on modern hardware.


#5
  • It’s complicated. We have an optimizing compiler to play with, and they can be a bit fickle too, it’s not always clear why something optimizes well and something very similar doesn’t.

  • Strictly worse is not true about performance, since if you formulate it just right, an indexed loop may produce the exact same code output. Depends on the situation. I think the statement is about style just as much as performance.

  • Iterator is just a trait, so we can’t say anything about the performance of iterators in general. The vector’s and slice’s iterators work great


#6

I don’t know which way would perform better without trying it out.

As to which is better style, that’s more of a judgement call. I think that in this case, I do prefer the version with indexing; adding the RefCells add a good amount of complexity, without much benefit.

When I first started writing my previous comment, I thought you might be able to do it with Cell, which has no performance cost or dynamic checks, but that requires the types to be Copy, where this is replacing an entire Vec<u16> in each Activity.


#7

Thanks everyone for your answer! So it seems in this case, indexing isn’t bad, but as a general rule of thumb iterators are preferred. That seems reasonable.

I agree the example in the documentation is better written with an iterator, but I think I’ll see if I can reword that sentence to be clear its a strong rule of thumb.


#8

Indeed, there are cases where indexing is the right answer, but in general if you do it you should probably stop and think for a minute if you could do it with an iterator instead.

Actually, taking a closer look at your example, I think you are doing a lot of extra work, and could simplify this down considerably. I don’t think it’s necessary to clone each travel_cost vec and replace it after filling it in; is there any reason you can’t just update them in place? In fact, once you do that, you can indeed use Cell for interior mutability, and thus mix the iterator for the outer loop with indexing for updating the costs: https://play.rust-lang.org/?gist=2d096ef2c5db7abf1f7b&version=nightly (linking to nightly right now because it looks like the other channels are failing on Playpen). You would have to test to see if there’s a noticeable performance difference between this and a version of the same rewrite but with indexing rather than an iterator in the outer loop; sprinkling .get() and .set() around everywhere does add a little noise, but should be zero cost as they should just inline directly into reading and writing the contained value.

On the other hand, the only indexing we’re removing is in the outer loop, so the performance improvement would probably be negligible; I think that readability is probably a more important consideration, and getting rid of the Cell would improve readability. So overall, I think the better approach is the rewrite without copying the vec needlessly, but without using Cell and accept the indexing for the outer loop.

I’m also wondering why you’re filtering out all of the items in which cost is u16::max_value(); wouldn’t that generally indicate the set of items for which you have not yet computed a shortest path, and thus need to do so? I think it’s those items that you want to be working with, and those items which already have a cost computed that you don’t need to consider because you’ve already computed a cost in a previous iteration when computing costs from that node.


#9

I originally had it to try and satisfy the borrow checker. I can remove it by getting rid of the act variable and delaying the mutation of the original travel_cost by using the heap. It turns into http://is.gd/A8vdiG, which I think is much nicer. I trade the vector copy for some potential extra assignments, which I think will be faster.

Especially considering the code will only work over <50 nodes, and already takes less then a second to run in debug mode. I think I’ll avoid the Cell, as this data structure won’t be modified after this function. Avoiding mistakes in the future is probably better then using iterators now I think.

No, the way the algorithm works is to find the path to the shortest unvisited node, and see if there is a short path to any other node from that one. If one is found add it to the list of potential paths to search along and record that path for the original node. Any distance at u16::max_value() is considered to not have a path, and thus inaccessible.

As an example, consider a three node graph:
A <-> B <-> C
Where the cost to travel from A to B and B to C is 2. If we start at A, we find a path to B, but not C (which has a value u16::max_value()). So we store B as a place to look. Then we go to B, find a path to C. Since it cost 2 to get to B, it will cost 4 to get to C. C can now be added to the list at a cost of 4 (since we are searching from A), but since all the nodes are found we finish. If we had more nodes then we could continue searching in a similar manner.

Wikipedia has a nice drawing/explanation of the algorithm: https://en.wikipedia.org/wiki/Dijkstra’s_algorithm . I just don’t have an end node, so I find the shortest to all nodes from my starting node, and I do that for all nodes so I know the shortest path to all nodes from all nodes.


#10

Ah, right, sorry, I was mistaken when I was reading the code.


#11

Actually there were some good performance improvements in the std when switching some code from iterators to indexing. It’s just that it is hard to do indexing correctly, to avoid bounds checks.


#12

A trick I have seen in this thread is to do indexing through a local slice. Though I don’t know if that is still relevant.