Rerun an iterator / loop internal iterator management

Is it possible to define an iterator before entering a loop {} that can be re-iterated over again once within each pass through the loop?

I'm looking at using itertools::izip!() to combine together:

  1. a range
  2. iter() for an array or slice of immutable variables
  3. iter_mut() for a Vec of mutable variables
  4. and a Mutex<Vec<T>> or Vec<Mutex<T>> (depending on multithreaded access issues)

Item 4 on that list might be most of where my questions about the run/iterations are coming from. I have it there to act in place of mpsc channels for sending data to each of the mutable variables in the iter_mut(). I think it's viable as a result of how the phases within the loop can be separated by (in)dependence, but overall I'm just hoping to use the minimum overhead. Each pass through the loop can effectively be divided into two phases:

  • Phase 1 - set 'communication' (#4) variable elements to zero, sum various values into each element from any number of spawned threads.
  • Phase 2 - summed values from the 'communication' variable will be given to a function impl for each mutable variable (#2) run in its own thread (or parallel iterate mapped over, or similar).

Some further notes, info, and alternative ideas:

  • All of the zipped iterators will be the same length.
  • Each index in the range is for the same "object" (with attributes divided by mutability and independence).
  • Some instances of the loop I'd like to filter() the iterator to skip certain objects.
  • I could potentially drop the range, but it may help in instances where I`ve filtered but would still like to know the 'true' index.
  • I could drop the immutable part (#2) and just have each mutable struct instance hold a reference to it's immutable 'other half`.
  • The iterator will only need to hold a handful (1-3+) "objects". I could just use a simple for i in 0..n and filter manually. (Not sure if avoiding this is better for performance or just high-level code)

I'm guessing I'll need to put the izip!() call with new .iter()s and .iter_mut()s inside the loop, but just thought I`d check if there were alternatives. Also, a related question, are there any performance implications or concerns that come up because of anything I'm trying here? Or is there any better way to approach this?

Any help on this would be much appreciated. Thanks!

Usually, most iterators are not Copy or Clone (and even if they are, it's a bit surprising to copy or clone an iterator).

Sine your outer loop has only 2 iterations, I'd advise you to keep re-creating the iterator inside it. Creating an iterator is generally very cheap, so you shouldn't see performance implications. I'm not saying that caching the iterator is impossible, but probably any solutions would be uglier or more surprising than just invoking izip!() for each pass.

Thank you. I was pretty sure on that, but wondered if the mix of container types and mutability might lead to any alternatives for handling the lot altogether (or each separately) as efficiently as possible. Though, the items I'm iterating over are all technically Copy, so the option is there. It just sounds like it has no benefit.

The outer loop will be run MANY times. It is each run through the loop that has two phases, and thus the split by (in/inter)dependence level.

Also, I'm always happy to get the basic concept out. I feel like my knowledge of Rust is getting better on the syntax side, but knowing the smart/best way to architect larger solutions using it is still in progress...

The actual run-time loop will look something like:

let substeps = 4_u8; // could be 1 to 4

loop {
    for substep in 1..=substeps {
        // Here is where the zip would normally (and I guess ideally) be performed
        // And also where I`ll filter to get only objects with order >= substep
        
        // Phase 1: calculations to be summed into the Mutex(es)
        // threads here are independent, but must correctly address #4 in the zip.
        // The filter here will help avoid calculating for objects that won't run.

        // Phase 2: map() over the filtered zip, mutating each mutable struct
    };
    // Final part once all substeps are completed - moves to next step
    // This part will also map() over the full mutable vec (but no zip needed)
    // This step decides when to exit the loop/return
};

Daily reminder that inclusive ranges are slower than exclusive ranges, so prefer for substep in 0..substeps if you care about performance

3 Likes

What do you mean "slower"? That's quite an assertion.

Inclusive ranges need to consider that the element after the last one may overflow. This requires an additional branch which may also prevent auto-vectorization optimizations.

1 Like

Interesting! So would

for substep in 0..substeps {
    let substep_id = substep + 1;
    // etc.

really be measurably faster, then? I can guarantee substeps will be a relatively low value u8.

I don't like the 1-starting list either, but it's the convention I'm working with.

Depends on the complexity of the loop body, you can see pretty big differences when the loop could be auto-vectorized but with more complex loops the cost is just that of an additional branch which should be almost always predicted. Also note that with internal iteration (e.g. for_each, fold and other Iterator consuming methods, however this isn't always the case, for example if there's a zip in your iterator chain) the cost becomes just a branch after the loop (as opposed to inside the loop).

ps: In your case substep + 1 <= substeps, so you're guaranteed to not overflow.

There was a thread on reddit about this recently, where I put some details: scottmcmrust comments on Wondering about assembly code for factorial

See also bugs like Big performance problem with closed intervals looping · Issue #45222 · rust-lang/rust · GitHub

Though, as @SkiFire13 mentions, it's important to remember that this is a micro-difference. Once the body is doing material work, it's likely that there's no measurable difference.

1 Like

Ahh, quite interesting!

So if the issue is specifically in ..=, would going to 1..(substeps + 1) effectively fix it?

Yes, although the trick with that alternative is that if substeps is T::max_value(), then it's not the same thing any more.

So (0..substeps).map(|x| x + 1) (like you wrote earlier, albeit in a different form) might be preferable to avoid the risk of overflow.

1 Like

Ahh, that's pretty nice and compact!

On a related note, I was looking to create an immutable instance of each range in the hope that I could reuse them. No dice. Is the solution just that I have to write the range out each time? Would making a closure that returns a new range be... overkill (or even work)?

let substeps = || 1..(order+1);

for substep in substeps() {
    // I'm worried this would somehow get a new range after each substep loop
    // leaving this to just infinitely repeat for substep = 1
}

// maybe less likely, though I'm still unsure if using it in a zip
let example = itertools::zip(substeps(), solvers.iter_mut());

Reminder that a range is just a simple struct, so it's essentially free to construct.

But since you probably meant it as DRY, not perf, you can always do for substep in substeps.clone() {. The closure would work great too, and is also essentially free. So whichever you like best.

1 Like

Ah, excellent. Yeah, it's just as DRY - the one I'm repeating is a 0..objects.len() range. It doesn't need to be condensed, but only writing it one place ensures it's correct each place it's used. I was considering working out the parsing/desugaring of it, but sounds like I can skip that. Thanks!