I am writing a program that needs to take a huge input file, anywhere from 15 MB to 5 GB, and scan through it, similar to grep. I want to split the input file into 5 sections and feed each part to a separate thread. E.g., thread 1 covers lines 0-160,000, thread 2 covers lines 160,001-320,000, etc.
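Splitting by line count needs a little care. Dividing the number of lines by the thread count with integer division (as the `quotient` in the code further down does) leaves up to `NUMBER_OF_THREADS - 1` trailing lines unassigned. Below is a minimal sketch of computing per-thread line ranges that cover every line. It assumes the file has already been split into lines, and `chunk_ranges` is a hypothetical helper name:

```rust
use std::ops::Range;

// Hypothetical helper: split `total_lines` into `n_threads` contiguous,
// half-open ranges. The first `total_lines % n_threads` ranges get one
// extra line, so no lines are dropped when the division is not exact.
fn chunk_ranges(total_lines: usize, n_threads: usize) -> Vec<Range<usize>> {
    assert!(n_threads > 0, "need at least one thread");
    let base = total_lines / n_threads;
    let extra = total_lines % n_threads;
    let mut ranges = Vec::with_capacity(n_threads);
    let mut start = 0;
    for i in 0..n_threads {
        let len = base + if i < extra { 1 } else { 0 };
        ranges.push(start..start + len);
        start += len;
    }
    ranges
}

fn main() {
    // 800,002 lines over 5 threads: the first two ranges get one extra line.
    for (i, r) in chunk_ranges(800_002, 5).iter().enumerate() {
        println!("thread {}: lines {}..{}", i + 1, r.start, r.end);
    }
}
```

The ranges are half-open (`start..end`), which is what Rust slice indexing expects. A thread can then use `&lines[r.clone()]` directly, so inclusive bounds like "0-160,000" never need to be converted by hand.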
So I have shared data: the buffer holding the file contents, which is sliced up and given to the different threads for READING ONLY; the quotient of the number of lines in the file divided by the number of threads; and the terms to search for. I combined these into a struct and created an Arc from it. The idea was to pass this Arc into a wrapper function, which clones the Arc once per thread (5 times for 5 threads) and moves each clone into its `thread::spawn` closure.
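Cloning an `Arc` does not copy the data behind it. It copies a pointer and increments a reference count, so five clones of the shared struct are cheap and all see the same `Vec<String>`. Here is a minimal sketch of the wrapper described above. `ThreadData` is a trimmed-down, hypothetical stand-in for the real struct, and `fan_out` is a hypothetical name:

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical, trimmed-down stand-in for the question's `ThreadData`.
struct ThreadData {
    vector_of_strings: Vec<String>,
    quotient: usize,
}

// Hypothetical wrapper: clones the Arc once per thread and moves each
// clone into its own `thread::spawn` closure. Returns the number of
// lines each thread saw.
fn fan_out(td_arc: Arc<ThreadData>, n_threads: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..n_threads)
        .map(|_| {
            // Copies a pointer and bumps the reference count;
            // the Vec<String> itself is not duplicated.
            let td = Arc::clone(&td_arc);
            // `move` hands ownership of this clone to the new thread.
            thread::spawn(move || td.vector_of_strings.len())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let td = Arc::new(ThreadData {
        vector_of_strings: vec!["foo".to_string(), "bar".to_string()],
        quotient: 1,
    });
    let other = Arc::clone(&td);
    // Both handles point at the same allocation.
    println!("same data: {}", Arc::ptr_eq(&td, &other)); // prints "same data: true"
    println!("strong count: {}", Arc::strong_count(&td)); // prints "strong count: 2"
    println!("quotient: {}", other.quotient);
    println!("{:?}", fan_out(td, 5)); // prints "[2, 2, 2, 2, 2]"
}
```

Each thread owns its own clone outright. That ownership is what lets the closure satisfy the `'static` bound on `thread::spawn`.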
The following example program actually works fine, despite my attempts to make it fail:
```rust
use std::time::Duration;
use std::thread;
use std::sync::Arc;

struct Shared
{
    name: String,
    age: u32
}

fn print_person(person: &Shared)
{
    println!("Name: {}\nAge: {}", person.name, person.age)
}

fn main()
{
    let person1 = Shared {name: "James".to_string(), age: 32};
    let main_arc = Arc::new(person1);
    do_threading(main_arc);
}

fn do_threading(main_arc: Arc<Shared>)
{
    let p1_arc = main_arc.clone();
    let p2_arc = main_arc.clone();
    let my_string = p1_arc.name.clone();
    let t1 = thread::spawn(move || {
        println!("Thread 1 starting...");
        println!("Test1: {}", my_string);
        println!("Test1: {}", p1_arc.age);
        print_person(&p1_arc);
        thread::sleep(Duration::from_secs(5));
        println!("Thread 1 exiting...");
    });
    let t2 = thread::spawn(move || {
        println!("Thread 2 starting...");
        print_person(&p2_arc);
        println!("Thread 2 exiting...");
    });
    t1.join().unwrap();
    t2.join().unwrap();
}
```
I wrote that example to troubleshoot, but I haven't been able to reproduce the problem in it.
The actual problem code is below. It currently produces this error:
```
error[E0597]: `new_tda1` does not live long enough
  --> src/main.rs:70:33
   |
70 |         let strings_as_slice1 = new_tda1.vector_of_strings.as_slice();
   |                                 ^^^^^^^^ borrowed value does not live long enough
...
90 | });
   | -- borrowed value needs to live until here
   | |
   | `new_tda1` dropped here while still borrowed

error: aborting due to previous error

For more information about this error, try `rustc --explain E0597`.
error: Could not compile `file_io`.

To learn more, run the command again with --verbose
```
```rust
use std::fs::File;
use std::io::prelude::*;
use std::thread;
use std::sync::Arc;
extern crate regex;
use regex::Regex;

const NUMBER_OF_THREADS: usize = 5;

struct ThreadData {
    vector_of_strings: Vec<String>,
    terms: Vec<&'static str>,
    quotient: usize
}

fn main()
{
    let mut file = File::open("info2.txt").expect("Can't open file!");
    let mut terms: Vec<&str> = Vec::new();
    construct_regex(&mut terms); // defined elsewhere (not shown)
    let mut contents = String::new();
    file.read_to_string(&mut contents) //TODO: Uhhh... This should prob be a memory map at some point
        .expect("Oops! Can not read the file...");
    let vector_of_strings: Vec<String> = contents.split("\n").map(|s| s.to_string()).collect();
    let total_lines: usize = vector_of_strings.len();
    println!("Total # of lines in file: {}", total_lines);
    let quotient: usize = total_lines / NUMBER_OF_THREADS;
    let td = ThreadData {
        vector_of_strings: vector_of_strings,
        terms: terms,
        quotient
    };
    let td_arc = Arc::new(td);
    threaded_search(td_arc);
    println!("Done, exiting...");
}

fn threaded_search<'a>(td_arc: Arc<ThreadData>)
{
    let new_tda1 = td_arc.clone();
    let new_tda2 = td_arc.clone();
    let new_tda3 = td_arc.clone();
    let new_tda4 = td_arc.clone();
    let new_tda5 = td_arc.clone();
    let strings_as_slice1 = new_tda1.vector_of_strings.as_slice();
    let strings_as_slice2 = new_tda2.vector_of_strings.as_slice();
    let strings_as_slice3 = new_tda3.vector_of_strings.as_slice();
    let strings_as_slice4 = new_tda4.vector_of_strings.as_slice();
    let strings_as_slice5 = new_tda5.vector_of_strings.as_slice();
    // perform_search is defined elsewhere (not shown)
    let handle1 = thread::spawn(move || {perform_search(&strings_as_slice1[0..new_tda1.quotient], &new_tda1.terms);});
    let handle2 = thread::spawn(move || {perform_search(&strings_as_slice2[new_tda2.quotient..new_tda2.quotient*2], &new_tda2.terms);});
    let handle3 = thread::spawn(move || {perform_search(&strings_as_slice3[new_tda3.quotient*2..new_tda3.quotient*3], &new_tda3.terms);});
    let handle4 = thread::spawn(move || {perform_search(&strings_as_slice4[new_tda4.quotient*3..new_tda4.quotient*4], &new_tda4.terms);});
    let handle5 = thread::spawn(move || {perform_search(&strings_as_slice5[new_tda5.quotient*4..new_tda5.quotient*5], &new_tda5.terms);});
    handle1.join().unwrap();
    handle2.join().unwrap();
    handle3.join().unwrap();
    handle4.join().unwrap();
    handle5.join().unwrap();
}
```
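The difference between the two programs is where the borrow happens. In the small example, `p1_arc` is moved into the closure and `&p1_arc` is taken *inside* it. In `threaded_search`, `new_tda1.vector_of_strings.as_slice()` is called *outside* the closure, so the closure captures a reference into the local `new_tda1`. `thread::spawn` requires a `'static` closure, and that reference cannot outlive the function, hence E0597. The sketch below rewrites the function with the slicing moved inside each closure. It assumes a hypothetical `perform_search` that counts matching lines with `str::contains`, because the question's regex-based version isn't shown. It also assumes a trimmed-down `ThreadData` with no `quotient` field:

```rust
use std::sync::Arc;
use std::thread;

const NUMBER_OF_THREADS: usize = 5;

struct ThreadData {
    vector_of_strings: Vec<String>,
    terms: Vec<&'static str>,
}

// Hypothetical search routine: counts lines containing any of the terms.
// A plain substring match stands in for the question's regex search.
fn perform_search(lines: &[String], terms: &[&str]) -> usize {
    lines
        .iter()
        .filter(|line| terms.iter().any(|t| line.contains(*t)))
        .count()
}

fn threaded_search(td_arc: Arc<ThreadData>) -> usize {
    let total = td_arc.vector_of_strings.len();
    let quotient = total / NUMBER_OF_THREADS;
    let handles: Vec<_> = (0..NUMBER_OF_THREADS)
        .map(|i| {
            // Each thread gets its own owned handle to the shared data...
            let td = Arc::clone(&td_arc);
            thread::spawn(move || {
                // ...and borrows from it only inside the closure, so the
                // borrow lives exactly as long as the Arc this thread owns.
                let start = i * quotient;
                // The last thread also takes the remainder lines.
                let end = if i == NUMBER_OF_THREADS - 1 { total } else { start + quotient };
                perform_search(&td.vector_of_strings[start..end], &td.terms)
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let contents = "error one\nfine\nerror two\nfine\nwarning\nerror three\nfine";
    let td = ThreadData {
        vector_of_strings: contents.lines().map(str::to_string).collect(),
        terms: vec!["error", "warning"],
    };
    println!("matches: {}", threaded_search(Arc::new(td))); // prints "matches: 4"
}
```

Looping over `0..NUMBER_OF_THREADS` also removes the five hand-numbered copies of each variable.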
Please note that, at this point, most of this code is "troubleshooting code". It isn't how I originally wrote it; I modified it in an attempt to stop the errors. It's been many, many hours of troubleshooting and learning, since I'm new to Rust and need to do concurrency right away. For the real application, I'll rewrite it without the redundant variables. I'm having a difficult time simplifying this question any further, because whenever I change one thing in one area of my code, another area gets flagged as an error. This has gone on for a long time and I've gotten stuck, but I know the problem has to do with ownership. Thanks for your guidance.
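Another option, on Rust 1.63 or newer, is `std::thread::scope`. Every thread spawned inside a scope is joined before `scope` returns. Because of that guarantee, the closures may borrow non-`'static` data such as a `&[String]` directly, and no `Arc` is needed. Below is a sketch using the same kind of hypothetical `perform_search` (substring matching in place of regex), with `slice::chunks` doing the splitting. The chunk size is rounded up, so it may produce fewer chunks than requested threads:

```rust
use std::thread;

// Hypothetical search routine: counts lines containing any of the terms.
// A plain substring match stands in for the question's regex search.
fn perform_search(lines: &[String], terms: &[&str]) -> usize {
    lines
        .iter()
        .filter(|line| terms.iter().any(|t| line.contains(*t)))
        .count()
}

// Splits `lines` into at most `n_threads` chunks and searches them in parallel.
// `thread::scope` joins every thread it spawned before returning, so the
// closures may borrow `lines` and `terms` without an Arc.
fn scoped_search(lines: &[String], terms: &[&str], n_threads: usize) -> usize {
    assert!(n_threads > 0, "need at least one thread");
    // Round up so no remainder lines are dropped; `max(1)` keeps
    // `chunks` from panicking on an empty input.
    let chunk_len = ((lines.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = lines
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || perform_search(chunk, terms)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let contents = "error: disk full\nok\nwarning: low memory\nok\nerror: timeout";
    let lines: Vec<String> = contents.lines().map(str::to_string).collect();
    let terms = ["error", "warning"];
    println!("matches: {}", scoped_search(&lines, &terms, 5)); // prints "matches: 3"
}
```

With scoped threads the shared struct and the per-thread clones disappear entirely. The trade-off is that the data must outlive the `scope` call, which is naturally the case when it lives in `main`.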