Is there a better way than RwLock for concurrent reads (99%) and writes (1%)?

Hi.
I have nearly fixed data that is loaded from Postgres into a HashMap when the server first starts,
and occasionally (1% of the time) a new entity is written to it.

How much overhead does RwLock::read() have under concurrent access (running on 10 cores)?
Is there a dirty trick to get zero overhead?

arc_swap - Rust has lower read overhead than RwLock (IIRC it avoids the reader-counter-shared-between-CPUs problem that RwLock has).

You cannot modify the Arc in place, though, so you'd have to clone the hashmap, I believe. If you write often enough and have a large enough number of elements, that would be a problem. But you can mitigate expensive clones by replacing the std hashmap with im::HashMap - Rust. It's probably a little slower than the std hashmap, but my guess is that, combined with getting rid of the RwLock, it would be a net performance win.

3 Likes

Thanks. Yes, our data size is about 1 GB and grows very slowly.
Our system is completely CPU-bound.

Sorry, I misspoke:
we actually use IndexMap, and may switch to Vec, because
we almost always fetch data by filtering rather than by key.

Performance is crucial.
I tested this before; Rust can clone heavy data very fast, so that's not a problem.

There’s also ShardedLock in crossbeam::sync - Rust


Also, using RwLock in parking_lot - Rust instead of the standard library version should already be an improvement, as it should not make any OS calls when it doesn't have to block.


Which of these is better I don’t know, but they should be easy to swap out and benchmark.

3 Likes

But one question:
while a thread is doing the cloning, can all the other threads still read from the map?

Thanks, but our service is tokio-based, and ShardedLock blocks the whole thread,
while tokio::sync::RwLock only blocks the task.

Oh… it’s an asynchronous lock. Well, do you ever hold the write lock over an .await point though? If not, note that tokio docs suggest using actual blocking locking primitives for short-duration locking because those are more performant. See: “Which kind of Mutex should I use” which also applies to RwLock.

Edit: Admittedly, a write operation on a ~1GB collection can perhaps take a while when it resizes… On the other hand, resizes are probably quite rare and mostly happen during startup, so it might not be too bad.

1 Like

Actually, I've done a lot of research and testing on this.
The tokio docs use a mini-redis example;
std::sync::Mutex is better there because the hot path is just (lock -> fetch -> write).
But in our case we fetch the data and run many algorithms over it,
which takes about 70 microseconds.

tokio::sync::Mutex is generally not good for a RAM-bound server,
but in our case I tested it, and tokio::Mutex increased the overall throughput of the system.

You may be right, but what about the locking itself? It has overhead.

We changed half of our constant data structures to enums,
like (Product_category, ....). It improved performance a lot and removed locking completely.

I simulated 1,000 reads of a Vec with RwLock and ArcSwap.
RwLock was about 4x faster than ArcSwap:

RwLock result: 4.575021ms
ArcSwap result: 16.932217ms



use std::sync::Arc;
use arc_swap::ArcSwap; 
use tokio::{time::Instant, sync::RwLock};

#[tokio::main]
async fn main() {
    benchmark_rwlock().await;
    benchmark_atomic().await;
}


async fn benchmark_rwlock() {
    let mut v = vec![];
    (0..100i32).into_iter().for_each(|i| v.push(i.to_string()));
    let storage = Arc::new(RwLock::new(v));
    
    
    let now = Instant::now();
    // ===================================
    
    for i in 0..1_000 {
        let arr = storage.read().await;
        println!("==> {}", arr[i % 90])
    }
    

    // ===================================
    let new_now = tokio::time::Instant::now();
    println!("{:?}", new_now.checked_duration_since(now));
    
    
}

async fn benchmark_atomic() {
    // check concurrent access and change
    let mut v = vec![];
    (0..100i32).into_iter().for_each(|i| v.push(i.to_string()));
    let storage = ArcSwap::from(Arc::new(v));
    
    
    let now = Instant::now();
    // ===================================
    
    for i in 0..1_000 {
        let arr = storage.load();
        println!("==> {}", arr[i % 90])
    }
    

    // ===================================
    let new_now = tokio::time::Instant::now();
    println!("{:?}", new_now.checked_duration_since(now));
    
    
}



You really shouldn't use println! in benchmarks. When I run it, the println! calls take up 99.04% of the running time. Additionally, the arc-swap version runs in around 20µs and the rwlock version in around 50µs in my tests, so in my results arc-swap is more than twice as fast.

Also, your running times are really high. Did you run in release mode?

That's with this code:

use arc_swap::ArcSwap;
use std::sync::Arc;
use tokio::{sync::RwLock, time::Instant};

#[tokio::main]
async fn main() {
    benchmark_rwlock().await;
    benchmark_atomic().await;
}

async fn benchmark_rwlock() {
    let mut v = vec![];
    (0..100i32).into_iter().for_each(|i| v.push(i.to_string()));
    let storage = Arc::new(RwLock::new(v));

    let mut sum = 0;
    let now = Instant::now();
    // ===================================

    for i in 0..1_000 {
        let arr = storage.read().await;
        sum += arr[i % 90].len();
    }

    // ===================================
    let new_now = tokio::time::Instant::now();
    println!("{:?} {}", new_now.checked_duration_since(now), sum);
}

async fn benchmark_atomic() {
    // check concurrent access and change
    let mut v = vec![];
    (0..100i32).into_iter().for_each(|i| v.push(i.to_string()));
    let storage = ArcSwap::from(Arc::new(v));

    let mut sum = 0;
    let now = Instant::now();
    // ===================================

    for i in 0..1_000 {
        let arr = storage.load();
        sum += arr[i % 90].len();
    }

    // ===================================
    let new_now = tokio::time::Instant::now();
    println!("{:?} {}", new_now.checked_duration_since(now), sum);
}
3 Likes

Thanks, I ran in dev mode with opt-level = 3.
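For reference, that setup corresponds to this Cargo.toml profile; note that the dev profile keeps debug assertions and integer overflow checks on by default, which can skew benchmarks even at opt-level = 3:

```toml
[profile.dev]
opt-level = 3
# These stay on in dev unless disabled explicitly:
# debug-assertions = false
# overflow-checks = false
```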

Would be interesting to see how ArcSwap compares with RwLock when there's actually multiple reader threads. Not sure if I read the code correctly, but it seems to be single-threaded right now.

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.