Is an arena allocator good for my use case?

I'm writing an app similar to a web browser. At one point, while the user is typing in the navigation bar, I'm going to show a list of suggested URLs that come from a navigation history table stored in SQLite. The list of suggestions will have a fixed size, say 1000 elements or fewer, and after each keystroke I need to regenerate that list with URLs that match what the user has typed.

I thought that using an arena for my 1000 strings would be a good idea, since I will be truncating and refilling it with 1000 new elements quite often. However, my tests with bumpalo haven't turned out as I expected: using the arena is much slower. I suspect I'm not testing this correctly, so I would appreciate feedback from anyone willing to look at the test I wrote and at the overall design of this feature.

use std::time::Instant;
use bumpalo::{Bump, collections::String as BString};

// This simulates 1000 results that are read from SQLite. I don't own them, so I only have
// references to them
const DB_RESULTS: [&str; 1000] = [
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut \
    labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco \
    laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in \
    voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat \
    cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";
    1000
];

fn normal() {
    // This is my storage with 1000 Strings, normally allocated
    let mut storage: Vec<String> = Vec::with_capacity(1000);
    // I will try truncating and filling my storage 5000 times
    let mut tries = 5000;
    while tries > 0 {
        storage.truncate(0);
        for item in DB_RESULTS.iter() {
            storage.push(String::from(*item));
        }
        tries -= 1;
    }
}

fn arena() {
    // This is my arena with enough capacity to allocate my 1000 results
    let mut b = Bump::with_capacity(DB_RESULTS.len() * DB_RESULTS[0].len());
    // I will try truncating and filling my storage 5000 times
    let mut tries = 5000;
    while tries > 0 {
        b.reset();
        for item in DB_RESULTS.iter() {
            BString::from_str_in(*item, &b);
        }
        tries -= 1;
    }
}

fn main() {
    let normal_start = Instant::now();
    normal();
    let normal_elapsed = normal_start.elapsed();
    println!("Normal allocation: {:.2?}", normal_elapsed);

    let arena_start = Instant::now();
    arena();
    let arena_elapsed = arena_start.elapsed();
    println!("Arena allocation: {:.2?}", arena_elapsed);

    println!("arena is {} times slower than normal",
             arena_elapsed.as_secs_f32() / normal_elapsed.as_secs_f32());
}

The text data you're testing with is hard-coded and never changes, so the compiler is probably able to optimize away a large part of the truncating and pushing in the "normal" case. You could instead generate a list of 1000 random strings for each run, which should bring the two results much closer together.

For example, use a function like this:

use rand::{distributions::Alphanumeric, Rng};

// generate a random 80-char ASCII string
fn get_random_string() -> String {
    rand::thread_rng()
        .sample_iter(&Alphanumeric)
        .take(80)
        .map(char::from)
        .collect()
}

However, I suspect this is all premature optimization. A slow allocation will still be sub-millisecond, so the user is not going to notice any delay caused by using the regular allocator.

Hi, thanks for taking the time to reply. I also imagined there could be some optimizations going on. I took your suggested function and modified my code to use it, and I made sure it gets new strings for each run. Still, the arena version is way slower than the normal one (I'm only measuring the time for the code that deals with allocation; the run time of your function is excluded).

use std::time::{Duration, Instant};
use bumpalo::{Bump, collections::String as BString};
use rand::{distributions::Alphanumeric, Rng};

fn get_random_strings() -> [String; 1000] {
    core::array::from_fn(|_|
        rand::thread_rng()
        .sample_iter(&Alphanumeric)
        .take(200)
        .map(char::from)
        .collect()
    )
}

fn normal() -> Duration {
    // This is my storage for 1000 Strings, normally allocated
    let mut storage: Vec<String> = Vec::with_capacity(1000);
    // I will try truncating and filling my storage 100 times
    let mut tries = 100;
    let mut duration = Duration::ZERO;
    while tries > 0 {
        let db_results = get_random_strings();
        let start = Instant::now();
        storage.truncate(0);
        for item in db_results.iter() {
            storage.push(String::from(item));
        }
        duration += start.elapsed();
        tries -= 1;
    }
    duration
}

fn arena() -> Duration {
    // This is my arena storage for 1000 Strings
    let mut b = Bump::with_capacity(1000 * 200);
    // I will try truncating and filling my storage 100 times
    let mut tries = 100;
    let mut duration = Duration::ZERO;
    while tries > 0 {
        let db_results = get_random_strings();
        let start = Instant::now();
        b.reset();
        for item in db_results.iter() {
            BString::from_str_in(item, &b);
        }
        duration += start.elapsed();
        tries -= 1;
    }
    duration
}

fn main() {
    let normal_duration = normal();
    println!("Normal allocation: {:.2?}", normal_duration);

    let arena_duration = arena();
    println!("Arena allocation: {:.2?}", arena_duration);

    println!("arena is {} times slower than normal",
             arena_duration.as_secs_f32() / normal_duration.as_secs_f32());
}

Sample output:

Normal allocation: 22.81ms
Arena allocation: 1.17s
arena is 51.45882 times slower than normal

Oh, that's definitely the case, but since this is a learning project I'm taking the time to try this sort of stuff and learn from it 🙂

Obligatory "are you running in release mode?"

If you're just doing a normal cargo run without the --release flag, it isn't that surprising that the standard allocator is faster: you're comparing the optimized system allocator against an unoptimized build of the arena allocator.
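For reference, the flag in question (this is standard Cargo usage, nothing project-specific):

```shell
# Plain `cargo run` produces a debug build, which compiles your code and
# bumpalo without optimizations and badly distorts allocator benchmarks.
# Benchmark with an optimized build instead:
cargo run --release
```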

Oops, I was not. Good catch. The arena version is still slower, but not by a ridiculous amount.

Normal allocation: 10.04ms
Arena allocation: 37.43ms
arena is 3.7294426 times slower than normal

I'm not super confident this is the case, but it does seem plausible that the compiler is avoiding some of the string clones, since the array of strings is never used after you iterate over it. I added some code to try to force the clones, and the ratio is generally lower. However, I also see calls to clone in the call stack even without that change when looking at a flamegraph, so I'm not entirely sure what's happening there.

Maybe I should just write the actual code where I'm planning to use this, with both allocators, and check whether there is an improvement. But benchmarks aside, would you agree that in theory this is the kind of scenario where an arena should be used?

It seems like a reasonable place to use an arena to me, yes. Whether or not the simple version using the system allocator would actually be slow enough to matter, I'm not sure.

Another simple method you could try is to do a static allocation upfront.

Specifically, you could create a fixed-length Vec<u8> of (say) 1 million bytes. Give 1000 bytes to each of the 1000 URLs, and store the first URL in the slice [0..1000], the second in [1000..2000], and so on. Since you never allocate at runtime, this avoids even the small bookkeeping overhead of the fastest arena allocator.

But you now have two new problems: 1) you constantly occupy around 1 MB of memory (which hopefully isn't an issue), and 2) you can't accommodate URLs over 1000 bytes. This method works best if your URLs have a known maximum length. Alternatively, if the number of extra-long URLs is small, you can allocate extra memory just for those exceptional cases.

Ah, I see: sort of writing an allocator of my own? That's interesting. I had thought about something similar, using one big String and putting everything in there while keeping track of where each URL ends and the next one starts. A friend of mine pointed out that's basically an arena, which is why I was investigating this crate. I may give these ideas a try and see how they perform. Thanks.
