Conditional move of enum data out of a collection

I'm filtering a vector of enums and I need to move the filtered data to another collection, but my solution doesn't seem very efficient because I have to test the enum twice - once in the filter, and once to extract and move its data with a pattern if let test:

    if matches!(self.instructions[i], Instruction::Label(_)) {
        if let Instruction::Label(label) = self.instructions.remove(i) {

Is there a better way to do this?

I made a simplified example, look at the double pattern test in compile(). Since drain_filter() is not stable yet I'm using a loop and remove() to move the Label items.

enum Instruction {
	Label(String),
	Op((u8, u8))
}

struct Program {
	instructions: Vec<Instruction>,
	labels: HashMap<String, usize>
}

impl Program {
	pub fn compile(&mut self) {
		let mut i = 0;
		while i < self.instructions.len() {
			if matches!(self.instructions[i], Instruction::Label(_)) {
				if let Instruction::Label(label) = self.instructions.remove(i) {
					self.labels.insert(label, i);
				}
			} else {
				i += 1;
			}
		}
	}
}

I cannot use a match because it can only give a reference or it would move everything including self. So I must move the data with remove (or drain), and even if drain_filter was stable and could make the filtering easier and more efficient, I would still have to extract the enum data - here a String, but I'm actually using that in a more complex use-case.

It's not a huge concern but I was wondering if I missed an easier solution.

I don't like having to use remove(i) either because it's O(n), but that's another problem.

Playground link.

How about:

pub fn compile(&mut self) {
    let mut new_instructions = Vec::new();
    for (i, instr) in self.instructions.iter().enumerate().rev() {
        if let Instruction::Label(label) = instr {
            self.labels.insert(label, i);
        } else {
            new_instructions.push(instr);
        }
    }
    self.instructions = new_instructions;
}
1 Like

You can consume and reform the Vec. The compiler is smart enough to reuse the allocation in many cases.

1 Like

I was going to suggest something similar to @quinedot's solution, except using partition_map().

impl Program {
    pub fn compile(&mut self) {
        let instructions = std::mem::take(&mut self.instructions);

        let (instructions, labels) =
            instructions
                .into_iter()
                .enumerate()
                .partition_map(|(i, item)| match item {
                    Instruction::Label(label) => Either::Right((label, i)),
                    other => Either::Left(other),
                });

        self.instructions = instructions;
        self.labels = labels;
    }
}

(playground)

This is still O(n), but that's a lot better than the remove() solution which is O(n2) in the worst-case.

If you don't care about retaining the old order, you could use swap_remove() to remove the item in O(1) time. You would still need to duplicate the match statement, but I'm confident LLVM would be smart enough to optimise that out.

3 Likes

Nuance note: enumerate does not give the desired numbering, they want the post-partitioned indices of the following op.

1 Like

Ah good catch. If that's the case, you can probably hoist a let mut i = 0 variable out of the closure and increment it whenever Either::Left(...) is returned.

Thanks all for your replies! :smiley:

That's very interesting, thanks! The take allows to separate the ownership from self, which was a blocking point for me.

I do mind about the order in the vector, that's why I avoided swap_remove. But this problem disappears with your solution and the others that have been posted so it's fine.

@Michael-F-Bryan The partition_map sounds like a good idea too. Yes, I can treat the indices differently to get the desired result, that's not a real problem.

@RedDocMD It wouldn't work like that and would require to clone the data, so unfortunately what I'm trying to avoid. :slight_smile: Maybe by combining this with take, though.