Advice on reusing memory when parsing

I am loading the contents of a text file into a string, parsing that string and populating a struct with multiple Vec<&str> containing references to the string I parse. I take the struct and do some crunching. After that, I want to clear the original string and load the contents of another file into it, and repeat the process for a large number of files.

I want to reuse the original string to minimize allocations. If I drop the struct containing the Vec<&str> the borrow checker lets me mutably borrow the original string just fine. But during parsing, I have to allocate a new struct with new Vec<_>. Is there a way I can reuse the allocated memory of both the string and the Vec without reallocating either?

I tried putting the original string in an Rc<RefCell<_>> to do the borrow checking at runtime instead of compile time, but I couldn't get it working. I borrow a Ref<String>, parse it, do the crunching, and then .clear() the Vec<&str>. The borrow checker doesn't seem to realize that .clear() releases the borrowed data. I figured there must be a correct way of doing this. Any help is appreciated, thank you!

EDIT: I also considered populating my struct with indices, or Range<usize> instead of &str. But that just felt like an unergonomic way to trick the borrow checker.
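For reference, the index-based variant needs no unsafe or lifetime tricks at all; a sketch (the `parse` helper is made up for illustration, with byte offsets recovered via pointer arithmetic on the subslices):

```rust
use std::ops::Range;

#[derive(Default)]
struct Parsed {
    // Byte ranges into the source string instead of borrowed slices.
    words: Vec<Range<usize>>,
}

fn parse(input: &str, out: &mut Parsed) {
    out.words.clear(); // reuse the Vec's allocation across files
    let base = input.as_ptr() as usize;
    for w in input.split_whitespace() {
        // `w` is a subslice of `input`, so this subtraction is valid.
        let start = w.as_ptr() as usize - base;
        out.words.push(start..start + w.len());
    }
}
```

Lookups then go through the source string, e.g. `&input[parsed.words[0].clone()]`, which is the ergonomic cost mentioned above.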

The lifetimes of these references are checked statically, using rules that can't take run-time behaviour like .clear() into account.

This may optimize to keep the original capacity:

let new: Vec<&str> = old.into_iter().map(|_| "").collect();

Other than that you'd need unsafe to transmute the lifetimes.

Ok. Can you give me some pointers (no pun intended) as to how to do this? Any examples / learning material I can look at?

The general term is "arena allocation".
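Crates like typed-arena and bumpalo implement this. As a rough toy sketch of the idea (not any real crate's API), an arena hands out references tied to its own lifetime, so the borrow checker only has to track one borrow; real crates also allocate in large chunks and can reset to reuse memory:

```rust
use std::cell::RefCell;

// Toy arena: owns every allocation and hands out references that live
// as long as the arena itself. This only illustrates the lifetime idea,
// not the chunked allocation real arena crates do.
struct Arena {
    chunks: RefCell<Vec<Box<str>>>,
}

impl Arena {
    fn new() -> Self {
        Arena { chunks: RefCell::new(Vec::new()) }
    }

    fn alloc(&self, s: &str) -> &str {
        let boxed: Box<str> = s.into();
        // The heap data behind a Box<str> never moves, even when the
        // outer Vec reallocates, so this pointer stays valid for as
        // long as the arena owns the Box.
        let ptr: *const str = &*boxed;
        self.chunks.borrow_mut().push(boxed);
        unsafe { &*ptr }
    }
}
```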

Ok, thank you! Yes, I was considering using an arena allocator. I also just googled and found these threads discussing a similar issue (I guess I didn't google the right terms before). It looks like there are crates written to solve exactly this problem, plus other alternative approaches.

https://users.rust-lang.org/t/transmute-alternative-for-lifetimes/111437
https://users.rust-lang.org/t/reuse-a-vec-t-s-allocation-beyond-the-scope-of-its-content/66398
https://users.rust-lang.org/t/pattern-how-to-reuse-a-vec-str-across-loop-iterations/61657/10

After reading through some of these posts, I ended up using the following pattern without any crates. This is not my actual code, but just an example of the pattern I used.

#[derive(Default)]
struct Parsed<'a> {
    words: Vec<&'a str>,
}

impl<'a> Parsed<'a> {
    pub fn clear(&mut self) {
        self.words.clear();
    }
}

#[derive(Default)]
struct Loader {
    raw_text: String,
    parsed: Parsed<'static>
}

fn load_impl<'a>(input: &'a str, words: &mut Vec<&'a str>) -> Result<(), Error> {
    // Parse `input` and populate `words`.
    Ok(())
}

impl Loader {
    pub fn load<'a>(&'a mut self, filepath: &Path) -> Result<&'a Parsed<'a>, Error> {
        self.raw_text.clear();
        File::open(filepath)
            .map_err(|_| Error::CannotReadFile(filepath.to_path_buf()))?
            .read_to_string(&mut self.raw_text)
            .map_err(|_| Error::CannotReadFile(filepath.to_path_buf()))?;
        self.parsed.clear();
        let borrowed = unsafe { 
            std::mem::transmute::<&'a mut Parsed<'static>, &'a mut Parsed<'a>>(&mut self.parsed)
        };
        load_impl(self.raw_text.trim(), &mut borrowed.words)?;
        Ok(borrowed)
    }
}

I think this is safe, because until the returned reference to Parsed is dropped, the borrow checker won't allow the Loader to be borrowed again. I tested this pattern: it works, and I am predictably seeing performance improvements because both raw_text and parsed are reused. But I rarely write unsafe Rust and have never used std::mem::transmute before, so I am not sure. Is this really safe? Or am I introducing some weird UB? Thanks again!
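Here is a condensed, self-contained version of the same pattern (string input instead of file I/O, no error handling) that can be exercised directly:

```rust
#[derive(Default)]
struct Parsed<'a> {
    words: Vec<&'a str>,
}

#[derive(Default)]
struct Loader {
    raw_text: String,
    parsed: Parsed<'static>,
}

impl Loader {
    fn load<'a>(&'a mut self, input: &str) -> &'a Parsed<'a> {
        self.raw_text.clear();
        self.raw_text.push_str(input);
        // Clear stale references before shortening the lifetime.
        self.parsed.words.clear();
        let parsed = unsafe {
            std::mem::transmute::<&'a mut Parsed<'static>, &'a mut Parsed<'a>>(&mut self.parsed)
        };
        parsed.words.extend(self.raw_text.split_whitespace());
        parsed
    }
}
```

Both buffers keep their allocations across calls, and a second call to load is rejected by the borrow checker for as long as the previously returned Parsed is still in use.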

Here's a safe version that takes advantage of a special case in the standard library that optimizes some combinations of vec.into_iter().collect() to preserve the original vec's allocation:

    let mut vec = Vec::with_capacity(10);
    loop {
        let s = String::from("a b c");
        let mut short_lived_vec: Vec<_> = vec.into_iter().collect();
        short_lived_vec.extend(s.split(' '));
        
        short_lived_vec.clear();
        vec = short_lived_vec.into_iter().map(|_| "").collect();
    }

It works in this case, but isn't guaranteed.

Here's an unsafe version that is guaranteed to reuse the allocation, but you must be very careful to always clear short_lived_vec at the end of every loop iteration; otherwise it is actually unsafe, with a use-after-free bug:

    let mut vec = Vec::with_capacity(10);
    loop {
        let s = String::from("a b c");
        let mut short_lived_vec = unsafe { fudge(&mut vec) };
        short_lived_vec.extend(s.split(' '));
        short_lived_vec.clear();
    }


unsafe fn fudge<'r, 'short, 'long: 'short>(vec: &'r mut Vec<&'long str>) -> &'r mut Vec<&'short str> {
    std::mem::transmute(vec)
}

@kornel The solution I tried here seems equivalent to your second unsafe option. And yes, I am clearing the vector before transmuting. Thank you so much!

Here is the project where I am using this: GitHub - ranjeethmahankali/ftag

Your struct is also self-referential (which Rust doesn't allow, and which is a whole other can of worms). It will contain dangling pointers while being dropped, because fields are dropped in the order of their definition, and raw_text gets freed before parsed. If Parsed had a Drop impl, it could cause a use-after-free.

Ah, good catch! I am not implementing Drop for Parsed, so nothing can read the data inside parsed in the window after raw_text is freed and before parsed itself is dropped. But yeah, I will move parsed before raw_text in the struct to be safe. Thank you!
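Concretely, the fix just swaps the field declarations, since struct fields are dropped in declaration order:

```rust
#[derive(Default)]
struct Parsed<'a> {
    words: Vec<&'a str>,
}

#[derive(Default)]
struct Loader {
    // Declared first, so it is dropped before raw_text's buffer is freed.
    parsed: Parsed<'static>,
    raw_text: String,
}
```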

So, shouldn't this be ensured by design?

    let mut vec = Vec::with_capacity(10);

    loop {
        let s = String::from("a b c");
        vec.extend(s.split(' '));

        vec = recycle_vec(vec);
    }

fn recycle_vec<'a>(mut v: Vec<&'_ str>) -> Vec<&'a str> {
    v.clear();
    unsafe { std::mem::transmute(v) }
}

But this brings me to another point:

How many files can you read per second?

Let’s say you have a random-access transfer rate of 1 gigabyte per second, and your files are only 1 kilobyte each. This results in at most 1 million files per second, which translates to 1 microsecond per file.

However, a complete alloc() / dealloc() pair typically takes a few dozen nanoseconds on an average modern PC.

This means that repeated reallocation contributes to less than 5% of the processing time per file in this scenario. And if you’re unable to process 1 million files per second in the first place, the relative cost of reallocation becomes negligible, lost in the measurement noise.
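The estimate above, spelled out as arithmetic (the 50 ns figure is an assumed value for one alloc/dealloc pair, in the "few dozen nanoseconds" range):

```rust
// Returns the fraction of the per-file time budget spent on allocation.
fn alloc_share(transfer_rate_bps: f64, file_size_bytes: f64, alloc_pair_secs: f64) -> f64 {
    let time_per_file = file_size_bytes / transfer_rate_bps; // seconds per file
    alloc_pair_secs / time_per_file
}
```

With 1 GB/s, 1 KB files, and 50 ns per pair, this gives 50 ns / 1 µs = 5% of the per-file budget.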

So this is unlikely to be a worthwhile optimization.

However, a simpler approach might work just as well: Preallocating Vecs using Vec::with_capacity(relatively_large_number). This reduces the number of reallocations and the associated data copying when the Vec needs to grow.

The relatively_large_number could be chosen to fit, for example, around 80% to 90% of the expected file sizes. This way, only about 10% of the files would trigger a reallocation, along with its relatively expensive data copying.
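Within a single borrow scope this works with no tricks at all, since .clear() is documented to leave the allocated capacity untouched; only the across-files lifetime problem discussed above requires anything more. A minimal demonstration (the helper name is made up):

```rust
// Fills a preallocated Vec, then clears it; returns the element count
// before clearing and the capacity after clearing.
fn demo_capacity_reuse() -> (usize, usize) {
    let mut words: Vec<&str> = Vec::with_capacity(1024);
    words.extend("a b c".split(' '));
    let filled = words.len();
    words.clear(); // length drops to 0, but the allocation is kept
    (filled, words.capacity())
}
```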


@mroth I am benchmarking this on a set of 300 files, each around 8 to 12 KB. Your estimates are very close. I saw a 10% performance improvement with this optimization.

I have to admit, I’m a bit skeptical. Would you mind sharing how long it takes in total to process these 300 files?

When I tried this change a couple of days ago, I saw the runtime for the command go from ~22 ms to ~20 ms. Since then I've made other performance improvements and changed things significantly. Now both run at the same speed!

hyperfine, with the optimization:

  Time (mean ± σ):      13.6 ms ±   0.3 ms    [User: 5.2 ms, System: 8.4 ms]
  Range (min … max):    13.0 ms …  15.5 ms    500 runs

hyperfine, without the optimization (i.e. just returning a new vector each time):

  Time (mean ± σ):      13.6 ms ±   0.3 ms    [User: 5.3 ms, System: 8.4 ms]
  Range (min … max):    13.0 ms …  15.4 ms    500 runs

I ran this several times, saw some noise but generally cannot tell them apart.

Thank you for sharing the measurements. I had suspected that allocations wouldn’t have a significant impact compared to file operations, as file operations tend to be much slower than memory operations.