Really slow MySQL inserts

    fn build_temporary_table(&self, tx: &mut Transaction) -> Result<(), Error> {
        const CREATE_INUSE_UUIDS_SQL: &str = r"CREATE TEMPORARY TABLE uuids_in_use (
            region_loc_x INT NOT NULL,
            region_loc_y INT NOT NULL,
            region_size_x INT NOT NULL,
            region_size_y INT NOT NULL,
            asset_type VARCHAR(20) NOT NULL,
            asset_uuid CHAR(36) DEFAULT NULL,
            asset_hash CHAR(8) NOT NULL)";
        const INSERT_INUSE_UUIDS_SQL: &str = r"INSERT INTO uuids_in_use
            (region_loc_x, region_loc_y, region_size_x, region_size_y, asset_type, asset_uuid, asset_hash)
            VALUES
            (:region_loc_x, :region_loc_y, :region_size_x, :region_size_y, :asset_type, :asset_uuid, :asset_hash)";
        //  Get all the active UUIDs. In memory all at once, but under 1MB
        let active_uuids = self.get_uuids_in_use(tx)?;
        log::info!("{} active UUIDs: {:?}", active_uuids.len(), &active_uuids[0..5.min(active_uuids.len())]);
        //  Create the temporary table
        tx.query_drop(CREATE_INUSE_UUIDS_SQL)?;
        log::info!("Temporary table created.");
        //  Put all the records in the temporary table.
        tx.exec_batch(
            INSERT_INUSE_UUIDS_SQL,
            active_uuids.iter().map(|p| params! {
                "region_loc_x" => p.region_loc[0],
                "region_loc_y" => p.region_loc[1],
                "region_size_x" => p.region_size[0],
                "region_size_y" => p.region_size[1],
                "asset_uuid" => p.asset_uuid.to_string(),
                "asset_hash" => p.asset_hash.clone(),
                "asset_type" => p.asset_type.to_str().to_string(),
            })
        )?;
        log::info!("Temporary table filled.");
        Ok(())
    }
03:09:00 [INFO] 406 active UUIDs: ...
03:09:00 [INFO] Temporary table created.
03:09:26 [INFO] Temporary table filled.

For some reason, MySQL called from Rust took 26 seconds to insert 406 records into an empty temporary table. SELECT is fast, but INSERT is insanely slow: about 15 records per second.

I tried ENGINE=MEMORY (the data never even goes out to disk) and got

03:15:19 [INFO] Temporary table created.
03:15:45 [INFO] Temporary table filled.

so it still took 26 seconds, the same as with the default InnoDB engine. This suggests the database engine is not the problem.

This is Dreamhost shared hosting, tested via remote access to the MySQL database from the "mysql" crate.

Is there something really slow about how Rust does this? Extreme throttling of external INSERT operations on Dreamhost? Something else?

MySQL 8.0.0, which is old.

I recommend you try accessing the database with some already existing frontend and see if the issue persists.

1 Like

You're inserting row by row, with a network round trip for each row, so inserting around 400 records sends about 400 requests (50 ms × 400 ≈ 20 s). The problem is your implementation. Use a multi-row insert, which inserts essentially all 400 records at once.

INSERT INTO table (...) VALUES
(...),
(...),
(...),
...

Basically you send the whole table in one statement.
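The pattern above can be assembled without any hand-escaping by generating one parenthesized group of `?` placeholders per row and letting the driver bind the values. A minimal sketch, not from the thread; the table and column names are made-up examples:

```rust
/// Build the SQL text for a multi-row INSERT using `?` placeholders,
/// one "(?,?,...)" group per row, so the driver binds the values and
/// no manual escaping is involved.
fn multi_row_insert_sql(table: &str, columns: &[&str], n_rows: usize) -> String {
    // One placeholder group per row, e.g. "(?,?)" for two columns.
    let group = format!("({})", vec!["?"; columns.len()].join(","));
    let groups = vec![group; n_rows].join(",");
    format!("INSERT INTO {} ({}) VALUES {}", table, columns.join(","), groups)
}

fn main() {
    // Three rows, two columns -> three placeholder groups.
    println!("{}", multi_row_insert_sql("my_table", &["foo", "bar"], 3));
    // prints: INSERT INTO my_table (foo,bar) VALUES (?,?),(?,?),(?,?)
}
```

With the "mysql" crate, the flattened values for a chunk could then presumably be passed as positional parameters; the finished statement still has to stay under the server's packet-size limit.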

3 Likes

One would naively assume that a function called exec_batch would actually send the statements as a batch, and not one statement per request (!!), but who knows…

2 Likes

One would expect exec_batch to do that efficiently. exec_batch accepts an iterator for the data, so the whole point is to stream data to the database.

One giant VALUES clause would work for this test set, but the real data will have about 20,000 rows, which is too much for that approach.

I put a debug log entry inside the iterator to watch the insertions. 17 records in one second. Bleah. And that's with ENGINE MEMORY; no disk at all. And the table has no indices. See below.

I'm using the approach recommended in the "mysql" crate documentation here: mysql - Rust, which is one SQL statement with an iterator providing the data. Reasonable.

I looked at the code in the "mysql" crate, at rust-mysql-simple/src/conn/query.rs at 08b7f90aec8586e1fc64a3a1fa9cb09bfdce98aa · blackbeam/rust-mysql-simple · GitHub

Although the comments indicate it uses MySQL's bulk mode, what the code seems to do (look at the loop that begins at line 384) is iterate over "params" and run each substatement as a separate operation. Am I reading that right?

This is all such mainstream CRUD-app stuff that it ought to Just Work. I must be missing something dumb here.

(Can't use LOAD DATA LOCAL; I'm not running on the same machine as the MySQL server.)

17:23:33 [INFO] Temporary table created.
17:23:33 [DEBUG] (2) common::tileassetsgc: Inserting [262144, 262144]
17:23:33 [DEBUG] (2) common::tileassetsgc: Inserting [262144, 262144]
17:23:33 [DEBUG] (2) common::tileassetsgc: Inserting [278528, 262144]
17:23:33 [DEBUG] (2) common::tileassetsgc: Inserting [278528, 262144]
17:23:33 [DEBUG] (2) common::tileassetsgc: Inserting [278528, 278528]
17:23:33 [DEBUG] (2) common::tileassetsgc: Inserting [278528, 278528]
17:23:33 [DEBUG] (2) common::tileassetsgc: Inserting [286720, 262144]
17:23:33 [DEBUG] (2) common::tileassetsgc: Inserting [286720, 262144]
17:23:33 [DEBUG] (2) common::tileassetsgc: Inserting [286720, 266240]
17:23:33 [DEBUG] (2) common::tileassetsgc: Inserting [286720, 266240]
17:23:33 [DEBUG] (2) common::tileassetsgc: Inserting [286720, 270336]
17:23:33 [DEBUG] (2) common::tileassetsgc: Inserting [286720, 270336]
17:23:33 [DEBUG] (2) common::tileassetsgc: Inserting [286720, 274432]
17:23:33 [DEBUG] (2) common::tileassetsgc: Inserting [286720, 274432]
17:23:34 [DEBUG] (2) common::tileassetsgc: Inserting [286720, 278528]
17:23:34 [DEBUG] (2) common::tileassetsgc: Inserting [286720, 278528]
17:23:34 [DEBUG] (2) common::tileassetsgc: Inserting [286720, 278528]
17:23:34 [DEBUG] (2) common::tileassetsgc: Inserting [286720, 278528]
17:23:34 [DEBUG] (2) common::tileassetsgc: Inserting [288768, 268288]
17:23:34 [DEBUG] (2) common::tileassetsgc: Inserting [288768, 268288]
17:23:34 [DEBUG] (2) common::tileassetsgc: Inserting [288768, 268288]
17:23:34 [DEBUG] (2) common::tileassetsgc: Inserting [288768, 268288]
17:23:34 [DEBUG] (2) common::tileassetsgc: Inserting [288768, 269312]
17:23:34 [DEBUG] (2) common::tileassetsgc: Inserting [288768, 269312]
17:23:34 [DEBUG] (2) common::tileassetsgc: Inserting [288768, 276480]
17:23:34 [DEBUG] (2) common::tileassetsgc: Inserting [288768, 276480]
17:23:34 [DEBUG] (2) common::tileassetsgc: Inserting [288768, 278528]
17:23:34 [DEBUG] (2) common::tileassetsgc: Inserting [288768, 278528]
17:23:34 [DEBUG] (2) common::tileassetsgc: Inserting [289280, 268288]
17:23:34 [DEBUG] (2) common::tileassetsgc: Inserting [289280, 268288]
17:23:35 [DEBUG] (2) common::tileassetsgc: Inserting [289280, 268800]
17:23:35 [DEBUG] (2) common::tileassetsgc: Inserting [289280, 268800]
17:23:35 [DEBUG] (2) common::tileassetsgc: Inserting [289280, 269312]

I want to second this recommendation:

It seems very possible that the cloud service you're using is slow. You should at least rule that out.

2 Likes

Wrt efficient bulk insert: Postgres, at least, supports COPY FROM for loading serialized data fast; maybe MySQL has something similar?

It's a slow shared server, but 17 TPS? Come on.

I think the Rust crate is doing a separate INSERT for each row, as I mentioned above.
The bulk operation may be MariaDB-only. See https://jira.mariadb.org/browse/MDEV-9114

I don't have a non-Rust front end handy, although I could write something in Python.
The command-line MySQL client doesn't support bulk-mode INSERT operations; you have to put all the data into the SQL statement explicitly.

This can't be a unique problem. I mostly do graphics, not CRUD apps, so this area is new to me, but it must be old hat to others.

It could be anywhere in the stack. One quick check is to measure ping times to the server, to at least rule out network issues.

> ping -c 5 terrain.animats.info
PING terrain.animats.info (208.113.244.178) 56(84) bytes of data.
64 bytes from figueira.iad1-mysql-e2-6b.dreamhost.com (208.113.244.178): icmp_seq=1 ttl=47 time=64.1 ms
64 bytes from figueira.iad1-mysql-e2-6b.dreamhost.com (208.113.244.178): icmp_seq=2 ttl=47 time=64.0 ms
64 bytes from figueira.iad1-mysql-e2-6b.dreamhost.com (208.113.244.178): icmp_seq=3 ttl=47 time=64.0 ms
64 bytes from figueira.iad1-mysql-e2-6b.dreamhost.com (208.113.244.178): icmp_seq=4 ttl=47 time=64.3 ms
64 bytes from figueira.iad1-mysql-e2-6b.dreamhost.com (208.113.244.178): icmp_seq=5 ttl=47 time=64.1 ms

--- terrain.animats.info ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 63.969/64.090/64.277/0.107 ms

64 ms round trip, or about 16 round trips per second. So this is just what you'd expect if the "mysql" connector doesn't do bulk mode.
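A quick sanity check of that arithmetic, assuming one round trip per row at the measured 64 ms RTT:

```rust
// If each of the 406 inserts costs one 64 ms round trip, the total wall
// time should be about 406 * 0.064 s, matching the 26 s seen in the log.
fn main() {
    let rows = 406.0_f64;  // rows inserted, from the log above
    let rtt_s = 0.064;     // measured round-trip time in seconds
    let expected = rows * rtt_s;
    println!("{:.1} s", expected); // prints "26.0 s"
}
```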

OK, so:

  • MariaDB has support for bulk insert, and the Rust "mysql" crate supports this, but MySQL does not. exec_batch isn't very "bulk" except with MariaDB. I'm stuck with MySQL 8.0 here.
  • Rust "sqlx" has QueryBuilder in sqlx - Rust, which can assemble an insert with multiple VALUES rows, but it doesn't do all the escaping and sanitization needed for a safe INSERT; it just constructs one big SQL string with a huge VALUES clause. And it's the user's responsibility to make sure the statement isn't too long.
  • Constructing a big SQL string with row after row of VALUES items, using string operations and manual escaping, is about as easy as using "sqlx". But it defeats the purpose of the "mysql" crate, with all its iterator and name-interpolation features.

Well, now I understand what's wrong. Bleah.

1 Like

Even if you only can send 1000 or even 100 values per request, this could increase throughput by orders of magnitude. Using VALUES is the recommended solution as far as I can tell.
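To put numbers on "orders of magnitude" (assumptions: 20,000 rows, the 64 ms RTT measured earlier, 200 rows per statement, and round-trip latency dominating the cost):

```rust
// Row-by-row, 20,000 rows means 20,000 round trips; chunked at 200 rows
// per statement it means only 100 round trips, a 200x cut in latency cost.
fn main() {
    let rows: u32 = 20_000;
    let rtt_ms: u32 = 64;
    let chunk: u32 = 200;
    let per_row_s = rows * rtt_ms / 1000;           // 1280 s, over 21 minutes
    let chunked_s = (rows / chunk) * rtt_ms / 1000; // 6 s (6.4 s, truncated)
    println!("row-by-row: {} s, chunked: {} s", per_row_s, chunked_s);
}
```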

1 Like

Yeah. I thought that exec_batch took care of that, but it doesn't. I'm amazed this isn't better supported.

2 Likes

OK, working on a workaround plan. I'd like to avoid my own string munging of MySQL values. That way lies SQL injection bugs.

Need a function to insert a variable-length list of values. Something like this:

/// My type to be inserted. Demo only.
struct MyValues {
    foo: i32,
    bar: bool,
    baz: String
}
/// Do the insert.
/// The basic question here is, having constructed a Vec of a Vec of mysql::Value,
/// how do I push it into MySQL? The crate has functions for constructing such a
/// type, but no indication of how to use it. This sketches out what I'd like to do.
/// But it probably won't work.
fn insert_vec_of_values(tx: &mut Transaction, values_vec: Vec<Vec<Value>>) -> Result<(), Error> {
    const INSERT_STMT: &str = r"INSERT INTO my_table (foo, bar, baz) VALUES (:values)";
    let params = params! {
        "values" => values_vec, // ***PROBABLY NOT ALLOWED***
    };
    tx.exec_drop(INSERT_STMT, &params)?; // do a multi-row insert.
    Ok(())
}

///  Assemble a vector of values to insert.
fn construct_vec_of_values(my_values: &[MyValues]) -> Result<Vec<Vec<Value>>, Error> {
    let mut values_vec = Vec::new();
    for p in my_values {
        let row_params = params! {
            "foo" => p.foo,
            "bar" => p.bar,
            "baz" => p.baz.clone(), // it's a string, can't copy implicitly
        };
        values_vec.push(row_params.vec_of_values()?); // hypothetical conversion to Vec<Value>
    }
    // So now we have a Vec of a Vec of "mysql" crate values. Is that what we want?
    Ok(values_vec)
}

///   Build and insert values
fn insert_my_values(tx: &mut Transaction, my_values: &[MyValues]) -> Result<(), Error> {
    insert_vec_of_values(tx, construct_vec_of_values(my_values)?)
}

It's clunky, but a Value can be converted to a string with

pub fn as_sql(&self, no_backslash_escape: bool) -> String

So I take what is mostly binary numeric data, convert it to mysql::Value, convert that to SQL text syntax, and the server then parses the text and converts it back into the binary SQL wire protocol.

Is this the way to do it?

If it doesn't work, get a bigger hammer: process in chunks of 200 records, use the params! macro, convert each param to mysql::Value, then to an SQL string, join the strings with commas, append all this to the canned SQL command, and shove that into MySQL. You know, all the stuff an ORM is supposed to do for you.

This takes under 1 second, compared to the previous nice, clean, 26-second approach.

The mysql crate handles all the escaping, so this approach should be safe against SQL injection attacks.

    /// Build temporary table rows
    fn build_temporary_table_chunks(&self, tx: &mut Transaction) -> Result<(), Error> {
       const INSERT_INUSE_UUID_ROWS: &str = r"INSERT INTO uuids_in_use
            (region_loc_x, region_loc_y, region_size_x, region_size_y, asset_type, asset_uuid, asset_hash) VALUES
            ";
        const NAMED_PARAMETERS: [&str;7] = ["region_loc_x", "region_loc_y", "region_size_x", "region_size_y", "asset_type", "asset_uuid", "asset_hash"];
        let named_parameters: Vec<_> = NAMED_PARAMETERS.iter().map(|p| p.to_string().into_bytes()).collect();
        //  Not too big; query size is limited to 32767 (?) bytes.
        const CHUNK_SIZE: usize = 200;
        let active_uuids = self.get_uuids_in_use(tx)?;
        log::info!("{} active UUIDs: {:?}", active_uuids.len(), &active_uuids[0..3.min(active_uuids.len())]);
        for active_uuid_chunk in active_uuids.chunks(CHUNK_SIZE) {
            log::debug!("Chunk with {} items", active_uuid_chunk.len());
            let mut value_strings = Vec::new();
            for p in active_uuid_chunk {
                let row_params = params!{
                    "region_loc_x" => p.region_loc[0],
                    "region_loc_y" => p.region_loc[1],
                    "region_size_x" => p.region_size[0],
                    "region_size_y" => p.region_size[1],
                    "asset_type" => p.asset_type.to_str().to_string(),
                    "asset_uuid" => p.asset_uuid.to_string(),
                    "asset_hash" => p.asset_hash.clone(),                    
                };
                //  Make one row.
                let row_string: String = row_params.into_values(Some(&named_parameters))?
                    .into_iter()
                    .map(|v: mysql::Value| v.as_sql(false))
                    .collect::<Vec<_>>().join(",");
                value_strings.push(format!("({})", row_string));
            }
            //  All escaping was done above.
            //  Make final SQL statement by concatenating query and values.
            let sql = INSERT_INUSE_UUID_ROWS.to_string() + value_strings.join(",\n").as_str();
            log::debug!("Query: {}", sql);    // ***TEMP***
            //  Actually do the insert of one chunk.            
            tx.query_drop(sql)?;           
        }
        Ok(())   
    }

The code claims to check first whether the server supports MariaDB's COM_STMT_BULK_EXECUTE and, if so, uses that capability; otherwise it falls back to running each statement individually.

It seems you can just use push_bind to add bound values. That needs no escaping/sanitization, and the only remaining issue should be the size of the resulting statement or the number of bound values.

Proposed a change to the mysql crate to make exec_batch actually do batching. See

So I have a workaround, and have proposed a more permanent fix. Done here.

5 Likes