Ownership of Objects in Vector


#1

Hi
I’m quite new to Rust and have been trying to learn it, but now I’m stuck. I’m trying to port a Python script to Rust that reads and modifies DNA sequences. There are several “modules” for modifying, filtering and analyzing these sequences, and they can be chained in any order or combination. I implemented this by iterating over a vector of objects implementing a Processor trait.

Below is example code with two processors that modify or analyze an Item. CountProcessor only counts the occurrences of the different strings and doesn’t modify anything. After all the text has been processed, I’d like to access the counts attribute of the CountProcessor. That yields an error (use of moved value) because, as far as I understand, ownership of the object was transferred into the vector.
Unfortunately, I’m not familiar enough with the options I have (if any) for accessing the counts map. Using Vec<&Processor> doesn’t seem possible because the “modules” are added dynamically based on user input, which causes lifetime issues (maybe it would work with static lifetimes?).

Anyway, I would very much appreciate some help.

use std::collections::HashMap;
use std::collections::hash_map::Entry::{Occupied, Vacant};


struct Item {
    pub text: String,
}

trait Processor {
    fn process(&mut self, item: &mut Item);
}

struct TrimProcessor;

impl Processor for TrimProcessor {
    fn process(&mut self, item: &mut Item) {
        // Strip leading and trailing whitespace.
        item.text = item.text.trim().to_string();
    }
}

struct CountProcessor {
    pub counts: HashMap<String, usize>,
}
impl CountProcessor {
    fn new() -> CountProcessor { CountProcessor { counts: HashMap::new() } }
}

impl Processor for CountProcessor {
    fn process(&mut self, item: &mut Item) {
        // Count how often each distinct text occurs.
        match self.counts.entry(item.text.clone()) {
            Vacant(entry) => { entry.insert(1); }
            Occupied(mut entry) => { *entry.get_mut() += 1; }
        };
    }
}


fn main() {
    let mut processors: Vec<Box<dyn Processor>> = Vec::new();
    let do_trim = true;
    if do_trim { processors.push(Box::new(TrimProcessor)); }

    let cp = CountProcessor::new();
    // `cp` is moved into the vector here.
    processors.push(Box::new(cp));
    for text in ["a", "a ", "b", "c"].iter() {
        let mut item = Item { text: text.to_string() };
        for p in processors.iter_mut() {
            p.process(&mut item);
        }
    }

    // error: use of moved value: `cp.counts`
    for (text, count) in cp.counts.iter() {
        println!("{}: {}", text, count);
    }
}

#2

The problem here is that it’s not clear at all what kind of ownership semantics you want, nor how the code needs to fit together. Here’s a random assortment of possibilities to try:

  • Can you give CountProcessor a &mut HashMap<String, u64> and have it store the counts in that?
  • Incidentally, don’t use usize in the HashMap; it’s not appropriate. Only use usize/isize if you’re explicitly dealing with things like array lengths or memory distances.
  • Instead of trying to access cp directly, why don’t you define an API for accessing processors through processors itself? Instead of a plain Vec, make it an opaque type that you put processors into and that can later hand back a reference to them.
  • If you really, desperately want shared ownership, look at Rc<RefCell<_>>. Just note that this should be your second-to-last resort, as it basically throws away most of the compile-time guarantees that Rust gives you, and asserts them at runtime instead.
  • (Last resort is to just leak memory :P)
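
To make the Rc<RefCell<_>> option from the list above concrete, here is a minimal sketch, assuming a current stable Rust toolchain. The helper count_texts and the decision to store Rc<RefCell<dyn Processor>> in the vector are illustrative assumptions, not part of the original code. The idea is that one handle to the CountProcessor goes into the vector while a second handle stays outside, so the counts remain reachable after processing.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

struct Item {
    text: String,
}

trait Processor {
    fn process(&mut self, item: &mut Item);
}

struct TrimProcessor;

impl Processor for TrimProcessor {
    fn process(&mut self, item: &mut Item) {
        item.text = item.text.trim().to_string();
    }
}

struct CountProcessor {
    counts: HashMap<String, u64>,
}

impl Processor for CountProcessor {
    fn process(&mut self, item: &mut Item) {
        *self.counts.entry(item.text.clone()).or_insert(0) += 1;
    }
}

// Hypothetical helper (not from the thread): runs a trim + count pipeline
// over `texts` and returns a copy of the resulting counts.
fn count_texts(texts: &[&str]) -> HashMap<String, u64> {
    // Keep one handle to the counter outside the vector.
    let cp = Rc::new(RefCell::new(CountProcessor { counts: HashMap::new() }));

    let mut processors: Vec<Rc<RefCell<dyn Processor>>> = Vec::new();
    let trim: Rc<RefCell<dyn Processor>> = Rc::new(RefCell::new(TrimProcessor));
    processors.push(trim);
    // The clone only bumps a reference count; both handles share one counter.
    let shared: Rc<RefCell<dyn Processor>> = cp.clone();
    processors.push(shared);

    for text in texts {
        let mut item = Item { text: text.to_string() };
        for p in &processors {
            // Borrow rules are checked at runtime here; a conflicting
            // borrow would panic instead of failing to compile.
            p.borrow_mut().process(&mut item);
        }
    }

    // `cp` was never moved, so the counts are still accessible.
    let counts = cp.borrow().counts.clone();
    counts
}

fn main() {
    let counts = count_texts(&["a", "a ", "b", "c"]);
    // Sort for a stable print order; HashMap iteration order is unspecified.
    let mut sorted: Vec<_> = counts.iter().collect();
    sorted.sort();
    for (text, count) in sorted {
        println!("{}: {}", text, count);
    }
}
```

The cost is the one mentioned above: the exclusive-access check moves from compile time to runtime, so this is probably only worth it when the borrowing approach gets too awkward.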

#3

Thank you for the hints, that really helped me. Sorry for the bad example. It can be thought of as a subcommand that counts unique words/DNA sequences, in combination with some pre-processing (TrimProcessor). The processors approach seemed appropriate to me because it allows for a very flexible assembly of different modules.

I chose to pass the count map to the processor by reference, as you suggested. I struggled with lifetime issues for a while (it only seems to work if the count map is defined before the processors vector) and learnt how to define a scope to make the borrow checker happy.
This code finally seems to work:

use std::collections::HashMap;
use std::collections::hash_map::Entry::{Occupied, Vacant};


struct SeqRecord {
    pub id: String,
    pub sequence: String,
}

trait Processor {
    fn process(&mut self, record: &mut SeqRecord);
}

struct TrimProcessor;

impl Processor for TrimProcessor {
    fn process(&mut self, record: &mut SeqRecord) {
        // Strip gap characters ('-') from both ends of the sequence.
        record.sequence = record.sequence.trim_matches('-').to_string();
    }
}

struct CountProcessor<'a> {
    pub counts: &'a mut HashMap<String, i64>,
}
impl<'a> CountProcessor<'a> {
    fn new(counts: &'a mut HashMap<String, i64>) -> CountProcessor<'a> {
        CountProcessor { counts }
    }
}

impl<'a> Processor for CountProcessor<'a> {
    fn process(&mut self, record: &mut SeqRecord) {
        match self.counts.entry(record.sequence.clone()) {
            Vacant(entry) => { entry.insert(1); }
            Occupied(mut entry) => { *entry.get_mut() += 1; }
        };
    }
}


fn main() {
    let do_trim = true;

    // Declared before `processors`, so it outlives the vector that borrows it.
    let mut counts = HashMap::new();

    {
        // `+ '_` states explicitly that the boxed processors may hold a
        // borrow (here, of `counts`).
        let mut processors: Vec<Box<dyn Processor + '_>> = Vec::new();
        if do_trim {
            processors.push(Box::new(TrimProcessor));
        }

        processors.push(Box::new(CountProcessor::new(&mut counts)));
        let seqs = [("seq1", "ATGC"), ("seq2", "GCAA"),
                    ("seq3", "GGA"), ("seq4", "GGA-")];
        for &(id, seq) in seqs.iter() {
            let mut item = SeqRecord {
                id: id.to_string(),
                sequence: seq.to_string(),
            };
            for p in processors.iter_mut() {
                p.process(&mut item);
            }
        }
    } // `processors` is dropped here, which ends the mutable borrow of `counts`

    for (seq, count) in counts.iter() {
        println!("{}: {}", seq, count);
    }
}

Output (the line order may vary, since HashMap iteration order is unspecified):

GGA: 2
ATGC: 1
GCAA: 1
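
As a side note on the inner scope used above: the scope is there so that the processors vector, and with it the mutable borrow of the count map, goes away before the map is read. The same effect can probably be had by dropping the vector explicitly. Below is a minimal sketch of that variant, assuming a current stable Rust toolchain. The function count_sequences and the trimmed-down SeqRecord (without the id field) are illustrative assumptions, not part of the code above.

```rust
use std::collections::HashMap;

// Simplified record for this sketch (the `id` field is left out).
struct SeqRecord {
    sequence: String,
}

trait Processor {
    fn process(&mut self, record: &mut SeqRecord);
}

struct TrimProcessor;

impl Processor for TrimProcessor {
    fn process(&mut self, record: &mut SeqRecord) {
        record.sequence = record.sequence.trim_matches('-').to_string();
    }
}

struct CountProcessor<'a> {
    counts: &'a mut HashMap<String, i64>,
}

impl<'a> Processor for CountProcessor<'a> {
    fn process(&mut self, record: &mut SeqRecord) {
        *self.counts.entry(record.sequence.clone()).or_insert(0) += 1;
    }
}

// Hypothetical helper: runs trim + count over `seqs` and returns the counts.
fn count_sequences(seqs: &[&str]) -> HashMap<String, i64> {
    // The map is declared before the vector that borrows it.
    let mut counts: HashMap<String, i64> = HashMap::new();

    let mut processors: Vec<Box<dyn Processor + '_>> = Vec::new();
    processors.push(Box::new(TrimProcessor));
    processors.push(Box::new(CountProcessor { counts: &mut counts }));

    for seq in seqs {
        let mut record = SeqRecord { sequence: seq.to_string() };
        for p in processors.iter_mut() {
            p.process(&mut record);
        }
    }

    // Dropping the vector ends the mutable borrow of `counts`,
    // so the map can be used (here: returned) afterwards.
    drop(processors);
    counts
}

fn main() {
    let counts = count_sequences(&["ATGC", "GCAA", "GGA", "GGA-"]);
    // Sort for a stable print order; HashMap iteration order is unspecified.
    let mut sorted: Vec<_> = counts.iter().collect();
    sorted.sort();
    for (seq, count) in sorted {
        println!("{}: {}", seq, count);
    }
}
```

Whether a block scope or an explicit drop reads better is mostly a matter of taste; both make the end of the borrow visible in the source.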