Database transactions and concurrency

I have a system with many tokio tasks whose job is to import products into a MySQL database: they receive a struct and populate the required tables. Each task keeps importing products in a loop.

Creating a product in the database is not trivial: there's the main product table, plus several other tables with related data (prices, images, etc.), and it takes a bit of processing. That's why each tokio task has a &mut Transaction, so that if anything fails while importing a product, the database is kept consistent.

So each tokio task does something like this (simplified):

    loop {
        let product = products_rx.recv().await.unwrap();
        let mut txn = pool.begin().await.unwrap();
        import_product(&mut txn, &product).await.unwrap();
        txn.commit().await.unwrap();
    }

Now, this doesn't work: because each task has its own Transaction and they all run in parallel, the transactions keep stepping on each other, and tasks die because of MySQL lock timeouts.

The only way I've managed to make it work is to prevent concurrent imports with an RwLock/Mutex inside the import_product function, so all tasks take a lock to avoid stepping on each other.
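The workaround looks roughly like this (a sketch, assuming a shared tokio::sync::Mutex; the real code differs):

    use std::sync::Arc;
    use tokio::sync::Mutex;

    // Shared lock that serializes all imports: correct, but no parallelism.
    let import_lock = Arc::new(Mutex::new(()));

    // In each task's loop:
    let _guard = import_lock.lock().await;
    let mut txn = pool.begin().await.unwrap();
    import_product(&mut txn, &product).await.unwrap();
    txn.commit().await.unwrap();
    drop(_guard); // the next task may now proceed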

But I'd like to be able to import them in parallel to make the whole process faster. Is there any way?

Are you using the same connection? Perhaps a connection pool is the solution here.

Each task has its own connection: pool.begin() retrieves a new connection from the pool and starts a transaction.

I don't use MySQL, but check whether it supports different transaction isolation levels and whether the Rust library exposes them. Some databases (like PostgreSQL) can use less "strict" transaction isolation levels like READ COMMITTED or even READ UNCOMMITTED that don't need to lock the whole table: if two different tasks update different parts of the table at the same time, the queries effectively run in parallel. But that depends on the database, and I don't use MySQL, so...

I did some research for you and... :wink:

  1. No, sqlx does not support setting transaction isolation levels, but that's not a problem, because you can always send a SET TRANSACTION as the first statement of the session; and

  2. MySQL supports all standard isolation levels, with specific locking requirements. Try issuing the following query as your first statement after connecting and check if it works: SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED. If it works, just make sure to check what to expect from that isolation level and choose the right one for you (see the sketch after this list). See https://dev.mysql.com/doc/refman/8.4/en/innodb-transaction-isolation-levels.html.
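In sqlx that could look roughly like this (an untested sketch, reusing the OP's Product and import_product):

    use sqlx::{Connection, MySql, Pool};

    // Sketch: relax the isolation level for this session before starting
    // the transaction. Product and import_product are the OP's types.
    async fn import_one(pool: &Pool<MySql>, product: &Product) -> sqlx::Result<()> {
        let mut conn = pool.acquire().await?;
        sqlx::query("SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED")
            .execute(&mut *conn)
            .await?;
        let mut txn = conn.begin().await?;
        import_product(&mut txn, product).await?;
        txn.commit().await?;
        Ok(())
    }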


If the write txns are conflicting (writing to the same records, or perhaps the same pages by chance), then having too many such concurrent txns can be counterproductive. You could limit the number of such txns while still allowing more than one at a time, for example with a connection pool of limited size that is used only for product creation.
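With sqlx that might be as simple as this (a sketch; the limit of 4 and the connection URL are placeholders):

    use sqlx::mysql::MySqlPoolOptions;

    // Sketch: a small pool dedicated to imports. Each import holds one
    // connection for the duration of its transaction, so max_connections
    // also caps the number of concurrent import txns.
    let import_pool = MySqlPoolOptions::new()
        .max_connections(4) // arbitrary; tune for your workload
        .connect("mysql://user:pass@localhost/mydb")
        .await?;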


Something like an auto-incrementing primary key can be an easy source of contention. If that is the case, it may make sense to split the import into two transactions: one that claims a unique key from the counter, and another that actually does the import process with that ID. If the import fails, the only long-lasting effect would be a skipped ID in the sequence, which is unlikely to cause any issues later.
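A sketch of that idea, assuming the product table's AUTO_INCREMENT id is the contended counter (column and field names are made up, and import_product_details is a hypothetical helper):

    // Txn 1: claim the id quickly, so any AUTO-INC lock is held only briefly.
    let mut txn = pool.begin().await?;
    let res = sqlx::query("INSERT INTO product (sku) VALUES (?)")
        .bind(&product.sku)
        .execute(&mut *txn)
        .await?;
    let product_id = res.last_insert_id();
    txn.commit().await?; // if a later step fails, we merely skip this id

    // Txn 2: the slow part, done with the id already in hand.
    let mut txn = pool.begin().await?;
    import_product_details(&mut txn, product_id, &product).await?; // hypothetical
    txn.commit().await?;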


"SHOW ENGINE INNODB STATUS" Reports that this seems to be the case: multiple threads inserting rows on the product_price table, which has an auto_increment primary key.

Now, I could delete that primary key, as it isn't of any use. But the same would happen for other related tables, and I guess removing theirs would have implications for other parts of the system.

How about retrying the transaction? It's a more brute-force approach: if a lock timeout happens, the task that gets the Err result just invokes the import process again.
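Something like this (a sketch, assuming import_product returns Result<(), sqlx::Error>; MAX_RETRIES is arbitrary, and a real version would probably retry only on lock-timeout errors):

    const MAX_RETRIES: u32 = 5;

    let mut attempt = 0;
    loop {
        let mut txn = pool.begin().await.unwrap();
        match import_product(&mut txn, &product).await {
            Ok(()) => {
                txn.commit().await.unwrap();
                break;
            }
            Err(_) if attempt < MAX_RETRIES => {
                // Discard the partial work and try the whole import again.
                txn.rollback().await.unwrap();
                attempt += 1;
            }
            Err(e) => panic!("import failed after {attempt} retries: {e}"),
        }
    }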

It looks like InnoDB has special lock modes to help with this problem. I haven't worked with MySQL, so I don't know what the exact implications of the various options would be.


Maybe try different lock modes for auto-incr, like this one. A good database should provide a low contention auto-incr of some kind.

innodb_autoinc_lock_mode = 2 (“interleaved” lock mode)

In this lock mode, no “INSERT-like” statements use the table-level AUTO-INC lock, and multiple statements can execute at the same time. This is the fastest and most scalable lock mode, but it is not safe when using statement-based replication or recovery scenarios when SQL statements are replayed from the binary log.

https://dev.mysql.com/doc/refman/8.0/en/innodb-auto-increment-handling.html
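If you want to try it, it's a server startup option, e.g. in my.cnf (a sketch; note the docs' warning about statement-based replication):

    [mysqld]
    # Interleaved AUTO-INC locking: no table-level AUTO-INC lock for
    # INSERT-like statements. Needs row-based binary logging to be safe.
    innodb_autoinc_lock_mode = 2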

This won't solve the contention problem caused by the auto-incr's table lock.
