I am loading the contents of a text file into a string, parsing that string and populating a struct with multiple Vec<&str> containing references to the string I parse. I take the struct and do some crunching. After that, I want to clear the original string and load the contents of another file into it, and repeat the process for a large number of files.
I want to reuse the original string to minimize allocations. If I drop the struct containing the Vec<&str> the borrow checker lets me mutably borrow the original string just fine. But during parsing, I have to allocate a new struct with new Vec<_>. Is there a way I can reuse the allocated memory of both the string and the Vec without reallocating either?
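For context, the naive shape of the loop looks roughly like this (crunch, run, and the exact parsing are placeholders, not my real code):

fn crunch(_words: &[&str]) {
    // stand-in for the actual number crunching
}

fn run(paths: &[std::path::PathBuf]) {
    for path in paths {
        let raw_text = std::fs::read_to_string(path).unwrap(); // fresh String for every file
        let parsed: Vec<&str> = raw_text.split(' ').collect(); // fresh Vec for every file
        crunch(&parsed);
    }
}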
I tried putting the original string in an Rc<RefCell<_>> and doing the borrow checking at runtime instead of compile time, but I couldn't get it working. I borrow a Ref<String>, parse it, do the crunching, and then .clear() the Vec<&str>. The borrow checker doesn't seem to realize that .clear() unborrows the data. I figured there must be a correct way of doing this. Any help is appreciated, thank you!
EDIT: I also considered populating my struct with indices, or Range<usize> instead of &str. But that just felt like an unergonomic way to trick the borrow checker.
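For illustration, that index-based version would look something like this (the names are made up, and each Range<usize> is resolved against the source string when needed):

use std::ops::Range;

struct ParsedIdx {
    words: Vec<Range<usize>>, // byte ranges into the source string
}

impl ParsedIdx {
    fn parse(&mut self, text: &str) {
        self.words.clear(); // reuse the Vec's allocation across files
        let base = text.as_ptr() as usize;
        for word in text.split(' ') {
            let start = word.as_ptr() as usize - base;
            self.words.push(start..start + word.len());
        }
    }

    fn word<'a>(&self, text: &'a str, i: usize) -> &'a str {
        &text[self.words[i].clone()]
    }
}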
Ok, thank you! Yes I was considering using an arena allocator. I also just googled and found all these threads talking about a similar issue. (I guess I didn't google the right terms before). Looks like there are crates written to solve exactly this issue, and other alternate approaches.
After reading through some of these posts, I ended up using the following pattern without any crates. This is not my actual code, but just an example of the pattern I used.
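A rough reconstruction of the shape of that pattern (the exact snippet isn't shown here; Loader, Parsed, raw_text, and parsed are the names used in this thread, everything else is an assumption):

struct Parsed<'a> {
    words: Vec<&'a str>,
}

struct Loader {
    raw_text: String,
    parsed: Parsed<'static>, // the 'static is a lie; entries only live as long as raw_text
}

impl Loader {
    fn load_and_parse(&mut self, contents: &str) -> &Parsed<'_> {
        self.parsed.words.clear(); // drop old references before touching raw_text
        self.raw_text.clear();
        self.raw_text.push_str(contents);
        // Extend the references' lifetime; this relies on the returned borrow
        // keeping the whole Loader borrowed until the Parsed reference is dropped.
        let text: &'static str =
            unsafe { std::mem::transmute::<&str, &'static str>(self.raw_text.as_str()) };
        self.parsed.words.extend(text.split(' '));
        &self.parsed
    }
}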
I am thinking this is safe, because until the returned reference to Parsed is dropped, the borrow checker won't allow the Loader to be borrowed again. I tested this pattern: it works, and I am predictably seeing performance improvements because both raw_text and parsed are reused. But I rarely write unsafe Rust and have never used std::mem::transmute before, so I am not sure. Is this really safe? Or am I introducing some weird UB? Thanks again!
Here's a safe version that takes advantage of a special case in the standard library that optimizes some combinations of vec.into_iter().collect() to preserve the original vec's allocation:
let mut vec = Vec::with_capacity(10);
loop {
    let s = String::from("a b c");
    // Move `vec` into a new Vec whose element lifetime can be tied to `s`;
    // the standard library's specialization can hand back the same allocation.
    let mut short_lived_vec: Vec<_> = vec.into_iter().collect();
    short_lived_vec.extend(s.split(' '));
    short_lived_vec.clear();
    // Collect the now-empty Vec back into one with a longer element lifetime.
    vec = short_lived_vec.into_iter().map(|_| "").collect();
}
It works in this case, but isn't guaranteed.
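If you want to spot-check whether the allocation actually survived the round-trip, you can compare the buffer pointers (an observation, not a guarantee; main here is just a minimal harness):

fn main() {
    let vec: Vec<&str> = Vec::with_capacity(10);
    let ptr_before = vec.as_ptr();
    // If the in-place specialization applies, the buffer pointer is unchanged.
    let recycled: Vec<&str> = vec.into_iter().collect();
    println!("allocation reused: {}", std::ptr::eq(ptr_before, recycled.as_ptr()));
}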
Here's an unsafe version that is guaranteed to do what you want, but you must be very careful to always clear the short_lived_vec after every iteration of the loop; otherwise it will actually be unsafe and have a use-after-free bug:
let mut vec = Vec::with_capacity(10);
loop {
    let s = String::from("a b c");
    // Reborrow `vec` with a shorter element lifetime so it can hold references into `s`.
    let short_lived_vec = unsafe { fudge(&mut vec) };
    short_lived_vec.extend(s.split(' '));
    // This must happen before `s` is dropped, or `vec` would keep dangling references.
    short_lived_vec.clear();
}

// SAFETY: the caller must empty the returned Vec again before `'short` ends;
// otherwise the original `vec` is left holding dangling references.
unsafe fn fudge<'r, 'short, 'long: 'short>(vec: &'r mut Vec<&'long str>) -> &'r mut Vec<&'short str> {
    std::mem::transmute(vec)
}
@kornel The solution I tried here seems equivalent to your second unsafe option. And yes, I am clearing the vector before transmuting. Thank you so much!
Your struct is also self-referential (which Rust doesn't allow; that's a whole other can of worms). It will have dangling pointers when dropped, because fields are dropped in the order of their definition, and raw_text gets freed before parsed. If Parsed had a Drop impl, it could cause a use-after-free.
Ah, good catch! I am not implementing Drop for Parsed, so nothing should be able to read the data inside parsed in the window after raw_text is dropped but before parsed is dropped. But yeah, I will move parsed before raw_text in the struct to be safe. Thank you!
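That is, roughly (reusing the hypothetical definitions from the sketch above):

struct Loader {
    parsed: Parsed<'static>, // declared first, so it is dropped before raw_text
    raw_text: String,
}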
fn main() {
    // The buffer parked between iterations claims 'static elements, but it is always empty then.
    let mut vec: Vec<&'static str> = Vec::with_capacity(10);
    loop {
        let s = String::from("a b c");
        // Shorten the element lifetime for this iteration, keeping the allocation.
        let mut short_lived = recycle_vec(vec);
        short_lived.extend(s.split(' '));
        // Empty it and restore the 'static element type before the next iteration.
        vec = recycle_vec(short_lived);
    }
}

fn recycle_vec<'a>(mut v: Vec<&'_ str>) -> Vec<&'a str> {
    v.clear();
    // SAFETY: `v` is empty, so the transmute only changes the (unused) lifetime
    // parameter; the two Vec types have identical layout.
    unsafe { std::mem::transmute(v) }
}
But this brings me to another point:
How many files can you read per second?
Let’s say you have a random-access transfer rate of 1 gigabyte per second, and your files are only 1 kilobyte each. This results in at most 1 million files per second, which translates to 1 microsecond per file.
However, a complete alloc() / dealloc() pair typically takes a few dozen nanoseconds on an average modern PC.
This means that repeated reallocation contributes less than 5% of the processing time per file in this scenario. And if you’re unable to process 1 million files per second in the first place, the relative cost of reallocation becomes negligible, lost in the measurement noise.
So this is unlikely to be a worthwhile optimization.
However, a simpler approach might work just as well: Preallocating Vecs using Vec::with_capacity(relatively_large_number). This reduces the number of reallocations and the associated data copying when the Vec needs to grow.
The relatively_large_number could be chosen to fit, for example, around 80% to 90% of the expected file sizes. This way, only about 10% to 20% of the files would trigger a reallocation, along with its relatively expensive data copying.
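For example (a sketch; the capacities are made-up numbers you would tune to your actual file sizes):

// Sized so that roughly 80% to 90% of files fit without the buffers growing.
let mut raw_text = String::with_capacity(16 * 1024);
let mut parsed: Vec<&str> = Vec::with_capacity(4 * 1024);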
@mroth I am benchmarking this on a set of 300 files, each around 8 to 12 KB. Your estimates are very close. I saw a 10% performance improvement with this optimization.
When I tried this change a couple of days ago, I saw the runtime for the command go from ~22 ms to ~20 ms. Since then I have made other performance improvements and changed things significantly. Now both run at the same speed!
hyperfine, with optimization:
  Time (mean ± σ):     13.6 ms ±  0.3 ms    [User: 5.2 ms, System: 8.4 ms]
  Range (min … max):   13.0 ms … 15.5 ms    500 runs

hyperfine, without optimization (i.e. just returning a new vector each time):
  Time (mean ± σ):     13.6 ms ±  0.3 ms    [User: 5.3 ms, System: 8.4 ms]
  Range (min … max):   13.0 ms … 15.4 ms    500 runs
I ran this several times and saw some noise, but I generally cannot tell them apart.
Thank you for sharing the measurements. I had suspected that allocations wouldn’t have a significant impact compared to file operations, as file operations tend to be much slower than memory operations.