Is there a better way than RwLock for concurrent reads (99%) and writes (1%)?

Hi.
I have nearly fixed data that is loaded from Postgres into a HashMap when the server first starts,
and occasionally (1% of the time) a new entity is written to it.

How much overhead does RwLock::read() have under concurrent access (running on 10 cores)?
Is there a dirty trick to get zero overhead?

arc_swap - Rust has lower read overhead than RwLock (IIRC it avoids the reader-counter-shared-between-CPUs problem that RwLock has).

You cannot modify the Arc in place, though, so you'd have to clone the hashmap, I believe. If you write often enough and have a large enough number of elements, that would be a problem. But you can mitigate expensive clones by replacing the std hashmap with im::HashMap - Rust. It's probably a little slower than the std hashmap, but my guess is that, combined with getting rid of the RwLock, it would be a net performance win.

3 Likes

Thanks. Yes, our data size is about 1 GB and grows very slowly.
Our system is completely CPU-bound.

Sorry, I misspoke:
we actually use IndexMap, and may switch to Vec, because
we almost always fetch data by filtering rather than by key.

Performance is crucial.
I tested this before; Rust can clone heavy data very fast, so that's not a problem.

There’s also ShardedLock in crossbeam::sync - Rust


Also, using RwLock in parking_lot - Rust instead of the standard library version should already be an improvement, as it should not make any OS calls when it doesn't have to block.


Which of these is better I don’t know, but they should be easy to swap out and benchmark.

3 Likes

But one question:
while a thread is doing the cloning, can all the other threads still read from the map?

Thanks, but our service is tokio-based, and ShardedLock blocks the whole thread,
while tokio::sync::RwLock only blocks the task.

Oh… it’s an asynchronous lock. Well, do you ever hold the write lock over an .await point though? If not, note that tokio docs suggest using actual blocking locking primitives for short-duration locking because those are more performant. See: “Which kind of Mutex should I use” which also applies to RwLock.

Edit: Admittedly, a write operation on a ~1GB collection can perhaps take a while when it resizes… On the other hand, resizes are probably quite rare and mostly happen during startup, so it might not be too bad.

1 Like

Actually, I've done a lot of research and testing on this.
The tokio docs use a mini-redis example;
std::sync::Mutex is better there because the hot path is just (lock -> fetch -> write).
But in our case we fetch the data and run many algorithms over it,
which takes about 70 microseconds.

tokio::sync::Mutex is generally not good for a RAM-bound server,
but in our case I tested it, and tokio::Mutex increased the overall throughput of the system.

You may be right, but what about the locking itself? It has overhead.

We changed half of our constant data structures to enums,
like (Product_category, ....). It improved performance a lot and removed locking completely.

I simulated 1,000 reads of a Vec with RwLock and ArcSwap.
RwLock was about 4x faster than ArcSwap:

RwLock result: 4.575021ms
ArcSwap result: 16.932217ms



use std::sync::Arc;
use arc_swap::ArcSwap; 
use tokio::{time::Instant, sync::RwLock};

#[tokio::main]
async fn main() {
    benchmark_rwlock().await;
    benchmark_atomic().await;
}


async fn benchmark_rwlock() {
    let mut v = vec![];
    (0..100i32).into_iter().for_each(|i| v.push(i.to_string()));
    let storage = Arc::new(RwLock::new(v));
    
    
    let now = Instant::now();
    // ===================================
    
    for i in 0..1_000 {
        let arr = storage.read().await;
        println!("==> {}", arr[i % 90])
    }
    

    // ===================================
    let new_now = tokio::time::Instant::now();
    println!("{:?}", new_now.checked_duration_since(now));
    
    
}

async fn benchmark_atomic() {
    // check concurrent access and change
    let mut v = vec![];
    (0..100i32).into_iter().for_each(|i| v.push(i.to_string()));
    let storage = ArcSwap::from(Arc::new(v));
    
    
    let now = Instant::now();
    // ===================================
    
    for i in 0..1_000 {
        let arr = storage.load();
        println!("==> {}", arr[i % 90])
    }
    

    // ===================================
    let new_now = tokio::time::Instant::now();
    println!("{:?}", new_now.checked_duration_since(now));
    
    
}



You really shouldn't use println! in benchmarks. When I run it, the println! calls take up 99.04% of the running time. Additionally, the arc-swap version runs in around 20µs and the rwlock version in around 50µs in my tests, so in my results arc-swap is more than twice as fast.

Also, your running times are really high. Did you run in release mode?

That's with this code:

use arc_swap::ArcSwap;
use std::sync::Arc;
use tokio::{sync::RwLock, time::Instant};

#[tokio::main]
async fn main() {
    benchmark_rwlock().await;
    benchmark_atomic().await;
}

async fn benchmark_rwlock() {
    let mut v = vec![];
    (0..100i32).into_iter().for_each(|i| v.push(i.to_string()));
    let storage = Arc::new(RwLock::new(v));

    let mut sum = 0;
    let now = Instant::now();
    // ===================================

    for i in 0..1_000 {
        let arr = storage.read().await;
        sum += arr[i % 90].len();
    }

    // ===================================
    let new_now = tokio::time::Instant::now();
    println!("{:?} {}", new_now.checked_duration_since(now), sum);
}

async fn benchmark_atomic() {
    // check concurrent access and change
    let mut v = vec![];
    (0..100i32).into_iter().for_each(|i| v.push(i.to_string()));
    let storage = ArcSwap::from(Arc::new(v));

    let mut sum = 0;
    let now = Instant::now();
    // ===================================

    for i in 0..1_000 {
        let arr = storage.load();
        sum += arr[i % 90].len();
    }

    // ===================================
    let new_now = tokio::time::Instant::now();
    println!("{:?} {}", new_now.checked_duration_since(now), sum);
}
3 Likes

Thanks, I ran in dev mode with opt-level = 3.
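For reference, that setup corresponds to this Cargo.toml profile; note that the dev profile keeps debug assertions and integer overflow checks on by default, which can skew benchmarks even at opt-level = 3:

```toml
[profile.dev]
opt-level = 3
# These stay on in dev unless disabled explicitly:
# debug-assertions = false
# overflow-checks = false
```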

Would be interesting to see how ArcSwap compares with RwLock when there's actually multiple reader threads. Not sure if I read the code correctly, but it seems to be single-threaded right now.

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.