Advice on reusing memory when parsing

I am loading the contents of a text file into a string, parsing that string, and populating a struct with multiple Vec<&str> fields containing references into the string. I take the struct and do some crunching. After that, I want to clear the original string, load the contents of another file into it, and repeat the process for a large number of files.

I want to reuse the original string to minimize allocations. If I drop the struct containing the Vec<&str>, the borrow checker lets me mutably borrow the original string just fine. But during parsing, I have to allocate a new struct with new Vecs. Is there a way I can reuse the allocated memory of both the string and the Vecs without reallocating either?

I tried to put the original string in an Rc<RefCell<_>> and do the borrow checking at runtime instead of compile time, but I couldn't get it working. I borrow a Ref<String>, parse it, do the crunching, and .clear() the Vec<&str>. The borrow checker doesn't seem to realize that .clear() unborrows the data. I figured there must be a correct way of doing this. Any help is appreciated, thank you!

EDIT: I also considered populating my struct with indices, or Range<usize> instead of &str. But that just felt like an unergonomic way to trick the borrow checker.
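For what it's worth, a minimal sketch of what that index-based variant would look like (hypothetical names; storing byte ranges into the backing string so the struct borrows nothing):

```rust
use std::ops::Range;

// Hypothetical index-based variant: store byte ranges into the backing
// string instead of Vec<&str>, so the struct holds no references.
#[derive(Default)]
struct ParsedIndices {
    words: Vec<Range<usize>>,
}

fn parse_into(input: &str, out: &mut ParsedIndices) {
    out.words.clear();
    let base = input.as_ptr() as usize;
    for word in input.split_whitespace() {
        // Byte offset of this word within `input`.
        let start = word.as_ptr() as usize - base;
        out.words.push(start..start + word.len());
    }
}

fn main() {
    let mut parsed = ParsedIndices::default();
    let text = String::from("a bb ccc");
    parse_into(&text, &mut parsed);
    // Resolve the ranges back into &str slices on demand.
    let words: Vec<&str> = parsed.words.iter().map(|r| &text[r.clone()]).collect();
    assert_eq!(words, ["a", "bb", "ccc"]);
}
```

It works and reuses both allocations, but as I said, resolving ranges back into slices everywhere felt unergonomic.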

The lifetimes of these are checked statically using rules that can't understand run-time behaviour like .clear().

This may optimize to keep the original capacity:

    let new = old.into_iter().map(|_| "").collect();

Other than that you'd need unsafe to transmute the lifetimes.

Ok. Can you give me some pointers (no pun intended) as to how to do this? Any examples / learning material I can look at?

The general term is "arena allocation".

Ok, thank you! Yes, I was considering using an arena allocator. I also just googled and found all these threads talking about a similar issue. (I guess I didn't google the right terms before.) Looks like there are crates written to solve exactly this issue, and other alternative approaches.

https://users.rust-lang.org/t/transmute-alternative-for-lifetimes/111437
https://users.rust-lang.org/t/reuse-a-vec-t-s-allocation-beyond-the-scope-of-its-content/66398
https://users.rust-lang.org/t/pattern-how-to-reuse-a-vec-str-across-loop-iterations/61657/10

After reading through some of these posts, I ended up using the following pattern without any crates. This is not my actual code, but just an example of the pattern I used.

#[derive(Default)]
struct Parsed<'a> {
    words: Vec<&'a str>,
}

impl<'a> Parsed<'a> {
    pub fn clear(&mut self) {
        self.words.clear();
    }
}

#[derive(Default)]
struct Loader {
    raw_text: String,
    parsed: Parsed<'static>,
}

fn load_impl<'a>(input: &'a str, words: &mut Vec<&'a str>) -> Result<(), Error> {
    // Parse `input` and populate `words`.
    Ok(())
}

impl Loader {
    pub fn load<'a>(&'a mut self, filepath: &Path) -> Result<&'a Parsed<'a>, Error> {
        self.raw_text.clear();
        File::open(filepath)
            .map_err(|_| Error::CannotReadFile(filepath.to_path_buf()))?
            .read_to_string(&mut self.raw_text)
            .map_err(|_| Error::CannotReadFile(filepath.to_path_buf()))?;
        self.parsed.clear();
        let borrowed = unsafe { 
            std::mem::transmute::<&'a mut Parsed<'static>, &'a mut Parsed<'a>>(&mut self.parsed)
        };
        load_impl(self.raw_text.trim(), &mut borrowed.words)?;
        Ok(borrowed)
    }
}

I am thinking this is safe because, until the returned reference to Parsed is dropped, the borrow checker won't allow the Loader to be borrowed again. I tested this pattern: it works, and I am predictably seeing performance improvements because both raw_text and parsed are reused. But I rarely write unsafe Rust and have never used std::mem::transmute before, so I am not sure. Is this really safe, or am I introducing some weird UB? Thanks again!

Here's a safe version that takes advantage of a special case in the standard library that optimizes some combinations of vec.into_iter().collect() to preserve the original vec's allocation:

    let mut vec = Vec::with_capacity(10);
    loop {
        let s = String::from("a b c");
        let mut short_lived_vec: Vec<_> = vec.into_iter().collect();
        short_lived_vec.extend(s.split(' '));
        
        short_lived_vec.clear();
        vec = short_lived_vec.into_iter().map(|_| "").collect();
    }

It works in this case, but isn't guaranteed.
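If you want to see whether the allocation is actually being reused, you can check the capacity (and, informally, the buffer pointer) before and after the round trip. Note this observes current std behaviour only; as said above, it isn't guaranteed:

```rust
fn main() {
    let mut vec: Vec<&str> = Vec::with_capacity(10);
    // Raw pointer to the original buffer, kept only for comparison.
    let ptr_before = vec.as_ptr();

    let s = String::from("a b c");
    let mut short_lived_vec: Vec<&str> = vec.into_iter().collect();
    short_lived_vec.extend(s.split(' '));
    assert_eq!(short_lived_vec, ["a", "b", "c"]);

    short_lived_vec.clear();
    vec = short_lived_vec.into_iter().map(|_| "").collect();

    // Observation of the current std special case, not a guarantee:
    println!("capacity: {}, buffer reused: {}", vec.capacity(), vec.as_ptr() == ptr_before);
}
```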

Here's an unsafe version that is guaranteed to do what you want, but you must be very careful to always clear the short_lived_vec after every iteration of the loop; otherwise it will actually be unsafe and have a use-after-free bug:

    let mut vec = Vec::with_capacity(10);
    loop {
        let s = String::from("a b c");
        let mut short_lived_vec = unsafe { fudge(&mut vec) };
        short_lived_vec.extend(s.split(' '));
        short_lived_vec.clear();
    }


unsafe fn fudge<'r, 'short, 'long: 'short>(vec: &'r mut Vec<&'long str>) -> &'r mut Vec<&'short str> {
    std::mem::transmute(vec)
}

@kornel The solution I tried here seems equivalent to your second unsafe option. And yes, I am clearing the vector before transmuting. Thank you so much!

Here is the project where I am using this: GitHub - ranjeethmahankali/ftag

Your struct is also self-referential (which Rust doesn't allow, and which is another can of worms entirely). It will have dangling pointers when dropped, because fields are dropped in the order of their definition, and raw_text gets freed before parsed. If Parsed had a Drop impl, it could have a use-after-free.

Ah, good catch! I am not implementing Drop for Parsed, so nothing should be able to read the data inside parsed in the window after raw_text is dropped and before parsed is dropped. But yeah, I will move parsed before raw_text in the struct to be safe. Thank you!

So, shouldn't this be ensured by design?

    let mut vec = Vec::with_capacity(10);

    loop {
        let s = String::from("a b c");
        vec.extend(s.split(' '));

        vec = recycle_vec(vec);
    }

fn recycle_vec<'a>(mut v: Vec<&'_ str>) -> Vec<&'a str> {
    v.clear();
    unsafe { std::mem::transmute(v) }
}

But this brings me to another point:

How many files can you read per second?

Let’s say you have a random-access transfer rate of 1 gigabyte per second, and your files are only 1 kilobyte each. This results in at most 1 million files per second, which translates to 1 microsecond per file.

However, a complete alloc() / dealloc() pair typically takes a few dozen nanoseconds on an average modern PC.

This means that repeated reallocation accounts for less than 5% of the processing time per file in this scenario. And if you're unable to process 1 million files per second in the first place, the relative cost of reallocation becomes negligible, lost in the measurement noise.
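Spelling out the arithmetic:

```rust
fn main() {
    // 1 GB/s transfer rate, 1 KB files => at most 1 million files per second.
    let bytes_per_sec: u64 = 1_000_000_000;
    let file_size: u64 = 1_000;
    let files_per_sec = bytes_per_sec / file_size;
    assert_eq!(files_per_sec, 1_000_000);

    // That is a budget of 1 microsecond (1000 ns) per file.
    let ns_per_file = 1_000_000_000 / files_per_sec;
    assert_eq!(ns_per_file, 1_000);

    // An alloc/dealloc pair at ~50 ns is then at most 5% of the budget.
    let alloc_dealloc_ns = 50;
    assert_eq!(alloc_dealloc_ns * 100 / ns_per_file, 5);
}
```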

So even in this best case, it's questionable whether this is a worthwhile optimization.

However, a simpler approach might work just as well: Preallocating Vecs using Vec::with_capacity(relatively_large_number). This reduces the number of reallocations and the associated data copying when the Vec needs to grow.

The relatively_large_number could be chosen to fit, for example, around 80% to 90% of the expected file sizes. This way, only about 10% of the files would trigger a reallocation, along with its relatively expensive data copying.
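As a quick illustration: clear() keeps the existing capacity, so a preallocated Vec is reused across iterations as long as the contents fit (the 1024 here is an arbitrary placeholder for relatively_large_number):

```rust
fn main() {
    let mut words: Vec<&str> = Vec::with_capacity(1024);
    for text in ["a b c", "d e"] {
        words.clear(); // drops the elements, keeps the allocation
        words.extend(text.split(' '));
        assert!(words.capacity() >= 1024); // no reallocation needed
    }
}
```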


@mroth I am benchmarking this on a set of 300 files, each around 8 to 12 KB. Your estimates are very close. I saw a 10% performance improvement with this optimization.

I have to admit, I’m a bit skeptical. Would you mind sharing how long it takes in total to process these 300 files?

When I tried this change a couple of days ago, I saw the runtime for the command go from ~22 ms to ~20 ms. Since then I've made other performance improvements and changed things significantly. Now both run at the same speed!

hyperfine, with optimization:

  Time (mean ± σ):      13.6 ms ±   0.3 ms    [User: 5.2 ms, System: 8.4 ms]
  Range (min … max):    13.0 ms …  15.5 ms    500 runs

hyperfine, without optimization (i.e. just returning a new vector each time):

  Time (mean ± σ):      13.6 ms ±   0.3 ms    [User: 5.3 ms, System: 8.4 ms]
  Range (min … max):    13.0 ms …  15.4 ms    500 runs

I ran this several times; there is some noise, but I generally cannot tell them apart.

Thank you for sharing the measurements. I had suspected that allocations wouldn’t have a significant impact compared to file operations, as file operations tend to be much slower than memory operations.
