Set up the database before running any of the tests

Hello,

I have a function:

async fn reset_and_seed_db() {
    // connect postgres
    let config: ConnectOptions =
        String::from("postgres://postgres:SuperSecret@localhost:5455/staking").into();
    let conn = Database::connect(config).await.unwrap();

    // reset database
    staking_db_migration::Migrator::fresh(&conn).await.unwrap();

    // ... seed rows
}

I want to call this function once before running any of the tests. If I call it in each test, it returns an error, because the tests run in parallel and in that case Postgres returns an error: https://groups.google.com/g/sqlalchemy/c/WoywAt3W2Lo (if I run cargo test -- --test-threads=1, it works fine). That is why I want to run this reset-and-seed function only once, before any test runs. I have looked into https://doc.rust-lang.org/stable/std/sync/struct.Once.html#method.call_once, but using async with closures is currently unstable, and I'm having a tough time implementing it.

If you have any solution or suggestion about how I should do this or organize my tests, please let me know.

Thank You


I've used a Mutex<bool> before for a similar purpose. Something like this:

use tokio::sync::Mutex;

static INITIALISED: Mutex<bool> = Mutex::const_new(false);

async fn reset_and_seed_db() {
    let mut initialised = INITIALISED.lock().await;
    if *initialised {
        return;
    }
    
    // do stuff

    *initialised = true;
}

Whichever caller happens to lock INITIALISED first will run the setup while the others wait. After it's done, the others will just check the bool, see that it's set to true, and continue on with the test.


Note though that a mutex around a small integer primitive is usually a red flag. The right thing to do if you need such a construct is to use an atomic:

use std::sync::atomic::{AtomicBool, Ordering};

static INITIALISED: AtomicBool = AtomicBool::new(false);

async fn reset_and_seed_db() {
    if INITIALISED.swap(true, Ordering::SeqCst) {
        return;
    }
    
    // do stuff
}

I'm not sure if that's a good idea in general because atomics (even with SeqCst) don't necessarily synchronize with all other operations. See also: This post on SeqCst in the "Better understanding atomics" thread.

When you feel like you need SeqCst, it's most often the wrong choice, as far as I understand the topic.

I don't feel I specifically need SeqCst, but it's the safest choice. In this case, its overhead is negligible compared to doing anything with a database, so I wanted to play it safe.

What other operations are there that would need to be synchronized with? This is a single function with a single critical section. I don't see how the linked post's fear-mongering is relevant. (That post also comes across as automatically denouncing everyone choosing SeqCst as incompetent, which is a stretch to say the least.)

My note was not referring to the particular use case but to:

I personally would use atomics when Relaxed is sufficient, else use mutexes. And if I really want to use atomics, I would first thoroughly read the synchronization guarantees, which are really complex to understand.

As I said, I was not referring to the particular use case but to the general case:

The general problem with atomics and synchronization is that it matters whether you load or store values (edit: and which values are stored/loaded, respectively).


Regarding the particular use case, I'm currently not sure whether there is a problem.

The use of swap might be a problem, since in that example the store happens before the critical section rather than after it.


I don't think this works:

use std::sync::atomic::{AtomicBool, Ordering};

static INITIALISED: AtomicBool = AtomicBool::new(false);

async fn reset_and_seed_db() {
    if INITIALISED.swap(true, Ordering::SeqCst) {
        return;
    }
    std::thread::sleep(std::time::Duration::from_millis(100));
    println!("Doing stuff. (This must happen first!)")
}

async fn foo() {
    reset_and_seed_db().await;
    println!("I hope stuff has been done.");
}

#[tokio::main]
async fn main() {
    let t1 = tokio::spawn(foo());
    let t2 = tokio::spawn(foo());
    let (r1, r2) = tokio::join!(t1, t2);
    r1.unwrap();
    r2.unwrap();
}

(Playground)

Output:

I hope stuff has been done.
Doing stuff. (This must happen first!)
I hope stuff has been done.


P.S.: What you really need is an async(!) mutex like @Heliozoa suggested, because you want to suspend the other tasks until initialization is complete. (see Playground with tokio's Mutex)

The last time I did something like this, I set it up so that the function that performed the initialization was what a test used to get a database connection. The Mutex protected an Option<DbConnectionPool>, so the first test found None and initialized the database, and each test after that just accessed the pool and grabbed a connection from it.
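A minimal sketch of that pattern, using std::sync::Mutex and plain threads so it runs without extra crates. DbConnectionPool here is a hypothetical stand-in, the setup body is a placeholder, and test_pool, setup_runs and SETUP_RUNS are names made up for the illustration. In async tests where the setup awaits while holding the lock, tokio::sync::Mutex would presumably take the place of the std mutex.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for a real connection pool type.
struct DbConnectionPool {
    name: String,
}

// Counts how often the expensive setup ran (for demonstration only).
static SETUP_RUNS: AtomicUsize = AtomicUsize::new(0);

// `None` means the database has not been reset and seeded yet.
static POOL: Mutex<Option<Arc<DbConnectionPool>>> = Mutex::new(None);

// Every test calls this to get the pool. The first caller performs the
// setup while holding the lock, so later callers wait until it is done.
fn test_pool() -> Arc<DbConnectionPool> {
    let mut guard = POOL.lock().unwrap();
    if guard.is_none() {
        // Placeholder: reset and seed the database here.
        SETUP_RUNS.fetch_add(1, Ordering::Relaxed);
        *guard = Some(Arc::new(DbConnectionPool {
            name: "staking".to_string(),
        }));
    }
    Arc::clone(guard.as_ref().unwrap())
}

fn setup_runs() -> usize {
    SETUP_RUNS.load(Ordering::Relaxed)
}

fn main() {
    // Simulate four tests starting in parallel.
    let handles: Vec<_> = (0..4).map(|_| thread::spawn(test_pool)).collect();
    for handle in handles {
        let pool = handle.join().unwrap();
        assert_eq!(pool.name, "staking");
    }
    // The setup ran once, no matter which thread got there first.
    assert_eq!(setup_runs(), 1);
}
```

Handing out the pool from the same function that initializes it means a test cannot obtain a connection without the setup having completed.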


Tokio provides a synchronization primitive for this: tokio::sync::OnceCell


Well, it depends on what you mean by "works". It works as it should, although that's probably not what was intended.

Incidentally, this makes me wonder whether one should bother with async in tests at all. (There's just not enough load to be worth the added complexity – tests don't need to be maximal-throughput.)
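If the tests were written synchronously, std::sync::Once (the type linked in the original question) might be enough, since call_once blocks other threads until the first call has finished. A rough sketch, where reset_and_seed_db_blocking is a hypothetical blocking version of the setup and the other names are made up for the illustration:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;
use std::thread;

// Counts how often the setup body ran (for demonstration only).
static SETUP_RUNS: AtomicUsize = AtomicUsize::new(0);

static DB_SETUP: Once = Once::new();

fn reset_and_seed_db_blocking() {
    // Placeholder: blocking connect, migrate, and seed.
    SETUP_RUNS.fetch_add(1, Ordering::Relaxed);
}

// Each test calls this first. Threads arriving while the setup is
// still running block until it has finished.
fn ensure_db_ready() {
    DB_SETUP.call_once(reset_and_seed_db_blocking);
}

fn setup_runs() -> usize {
    SETUP_RUNS.load(Ordering::Relaxed)
}

fn main() {
    // Simulate four tests starting in parallel.
    let handles: Vec<_> = (0..4)
        .map(|_| thread::spawn(ensure_db_ready))
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    assert_eq!(setup_runs(), 1);
}
```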

Indeed, but what would be better is an async-aware atomic-like construct. I still 100% uphold my assertion that Mutex<bool> is an anti-pattern. Alice's suggestion above, a primitive designed for doing things like this, is probably the best actual solution in this case.

The issue is that the only reason to use SeqCst is to get the synchronization with other operations. If all you care about is synchronization with other uses of this specific atomic, then Relaxed is plenty good enough - you don't need any of the heavier flushes.

And it's not unknown for the cost of SeqCst atomics to be higher than the cost of an Arc<Mutex<bool>>, which is why SeqCst is a code smell - it's surprisingly subtle in meaning, and use of SeqCst instead of Relaxed often implies that you've not thought about what you actually need here, but have just gone for SeqCst because it's the strongest ordering.

IMO, SeqCst atomics are more of a code smell than Arc<Mutex<bool>> - with Arc<Mutex<bool>>, I know you've not thought about it, and you've just gone for the most obvious implementation that works, while with SeqCst, I have to look at all accesses to the atomic to determine whether you're doing something deeply subtle and clever, or whether you've just chosen SeqCst because you weren't thinking it through.

To quote Mara Bos's excellent book on Rust Atomics and Locks:

While it might seem like the easiest memory ordering to reason about, SeqCst ordering is almost never necessary in practice.

In this particular case, Relaxed is almost certainly the right ordering to use - you only care about ordering with other users of this atomic, so you don't need Acquire or Release semantics, and given four threads (1, 2, 3, 4) touching the atomic in order, thread 4 does not care whether it agrees with thread 3 on the order in which threads 1 and 2 touched the atomic, only about the value it sees.
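To illustrate the point about orderings: a Relaxed swap is enough for exactly one thread to observe false and claim the flag. The sketch below uses hypothetical helper names (try_claim, count_winners) and only shows the claiming; it does not make the other threads wait for the winner's work to finish.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// Returns true for exactly one caller: the one that flips the flag
// from false to true.
fn try_claim(flag: &AtomicBool) -> bool {
    !flag.swap(true, Ordering::Relaxed)
}

// Spawns `threads` threads that all try to claim a fresh flag and
// returns how many of them succeeded.
fn count_winners(threads: usize) -> usize {
    let flag = Arc::new(AtomicBool::new(false));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let flag = Arc::clone(&flag);
            thread::spawn(move || try_claim(&flag))
        })
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().unwrap())
        .filter(|&won| won)
        .count()
}

fn main() {
    // All swaps act on the same atomic, so only one can see `false`.
    assert_eq!(count_winners(8), 1);
    println!("exactly one thread claimed the flag");
}
```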


Not sure which case you are referring to, but for ensuring the initialization has happened once, even SeqCst isn't sufficient, as I showed in this previous post. Thus neither Relaxed nor SeqCst is right (and using SeqCst may misleadingly imply guarantees that don't exist).


For the case where you're replacing Arc<Mutex<bool>> with AtomicBool (or similar for other primitives), Relaxed is normally good enough. Same applies if you remove the Arc or Mutex layer from around the primitive.

If Arc<Mutex<bool>> is not good enough, then an atomic isn't either - AtomicBool is just a performance optimization against Arc<Mutex<bool>>.

The reason that the mutex is (probably) necessary over the atomic is that it is actually a Mutex<(bool, SomeFFIControlledStruct)>, but the extra data is just stored next to the mutex rather than inside it. The mutex performs the important job of synchronizing access to the FFI thing.

