Why does Python perform better than Rust (calling Cassandra)?

I know that, in theory, Rust outperforms Python. However, when I use both of them to query Cassandra, I find that Python is faster. Has anyone had the same experience?

Use case:

Python => python-cassandra-driver + flask

Rust => C++ cassandra-driver (called from Rust) + actix_web

As always when discussing performance, it would be helpful to share a benchmark if you can do so.

But most likely the answer is that either the C++ Cassandra driver or its Rust binding does something less smart than the Python Cassandra driver at the algorithmic level.

You can check which of those it is by writing the same program a third time in C++, if that isn’t too much effort.


And, as always, please link your other post so people can answer from an informed position.


So, have you tried cargo run --release?


Because the database contents and the Python code are company property, I can only share the Content-Length of the response and the average execution time of curl requests to the Python service.

  1. Average execution time of curl requests to Python: 0.533614
  2. Content-Length: 4965028

Rust side (plain cargo run, without --release):

  1. Average execution time of curl requests to Rust: 2.678472
  2. Content-Length: 4965028
  3. rust-code
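The averages above were measured from outside with curl, so they include HTTP handling, JSON encoding and gzip as well as the Cassandra query. To see how much of that time is spent in one particular call, a small in-process timer can help. This is a minimal sketch: mean_duration is a hypothetical helper, and the closure passed to it would wrap whatever call you want to measure.

```rust
use std::time::{Duration, Instant};

/// Mean wall-clock time of `runs` calls to `f`.
/// One untimed warm-up call is made first, because first-call costs
/// (connection setup, caches, lazy statics) can skew a small sample.
fn mean_duration(runs: u32, mut f: impl FnMut()) -> Duration {
    assert!(runs > 0, "need at least one timed run");
    f(); // warm-up, not timed
    let start = Instant::now();
    for _ in 0..runs {
        f();
    }
    // Duration implements Div<u32>, giving the per-call average.
    start.elapsed() / runs
}
```

Timing the query alone and the full handler separately shows whether the gap to Python is in the driver or in the response building.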

--release?


I had thought about it; it seems I have to try it.

My benchmark never used --release; I always ran plain cargo run. With --release, the time becomes 0.902928. What is the difference?

from 2.678472 -> 0.902928, that looks like a decrease to me.

Yes, the execution time is reduced and efficiency is improved.
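As for what --release changes: cargo run builds with Cargo's default dev profile (opt-level 0, debug assertions enabled), while cargo run --release uses the release profile (opt-level 3, debug assertions disabled), so benchmark numbers should come from the latter. A small guard like the following can make an accidental debug-build benchmark visible at startup. It is a sketch that uses debug_assertions as a proxy for the profile; since that setting can be overridden in Cargo.toml, treat the result as a hint.

```rust
/// Best-effort guess at the Cargo profile this binary was built with.
/// `debug_assertions` is on by default for `cargo run` (dev profile)
/// and off for `cargo run --release` (release profile).
fn build_profile() -> &'static str {
    if cfg!(debug_assertions) {
        "dev (unoptimized)"
    } else {
        "release (optimized)"
    }
}

/// Prints a warning to stderr when running an unoptimized build.
/// Returns true if the warning was printed.
fn warn_if_unoptimized() -> bool {
    let unoptimized = cfg!(debug_assertions);
    if unoptimized {
        eprintln!(
            "warning: built with the {} profile; timings will not reflect release performance",
            build_profile()
        );
    }
    unoptimized
}
```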

OK, so it's still slower than Python. The easiest way I have found to profile Rust is cargo flamegraph.


At first glance, your code has way too many to_string calls. There seems to be a lot of memory allocation going on that doesn't happen in the Python implementation. Python is slower in terms of raw execution time, but if you are doing unnecessary work on the Rust side, Rust can end up much slower.
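One rough way to see the extra work caused by to_string is to count heap allocations. The sketch below installs a counting global allocator (a measurement aid, not something for production) and compares a status map built with String keys and values, as in the thread's code, against the same map built from &'static str. The exact counts depend on the standard library version; only the relative difference matters.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::collections::HashMap;
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts every allocation, then defers to the system allocator.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Number of allocations performed while running `f`.
fn allocations_during(f: impl FnOnce()) -> usize {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    f();
    ALLOCATIONS.load(Ordering::Relaxed) - before
}

/// Pattern from the thread: every key and value is a freshly allocated String.
fn status_with_strings() -> HashMap<String, String> {
    let mut m = HashMap::new();
    m.insert("msg".to_string(), "ok".to_string());
    m.insert("result".to_string(), "ok".to_string());
    m
}

/// Same map with &'static str: typically only the table's storage is allocated.
fn status_with_static_strs() -> HashMap<&'static str, &'static str> {
    let mut m = HashMap::new();
    m.insert("msg", "ok");
    m.insert("result", "ok");
    m
}
```

black_box keeps the optimizer from discarding the maps when measuring, e.g. `allocations_during(|| { let _ = black_box(status_with_strings()); })`.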


I had also been warned about the to_string problem, but using &str sometimes leads to lifetime problems…

I have never used it, but let me try.

As long as all your keys are compile-time constants (which seems to be the case here), you can use &'static str as your key type without any lifetime issue.

EDIT: Removed bit about enum-indexed arrays since @Yandros is right that a struct will suffice in this case.
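To illustrate the point about &'static str keys: string literals already have the 'static lifetime, so a map keyed by them can be built inside a function and returned without any lifetime annotations. This is a minimal sketch; the column names and the i64 values are hypothetical placeholders for the real Cassandra columns and serde_json::Value.

```rust
use std::collections::HashMap;

// Hypothetical column names: compile-time constants, so they are &'static str.
static COLUMNS: [&str; 3] = ["deviceid", "epoch", "value"];

/// Builds one map per row. Keys borrow nothing from the caller,
/// so no lifetime parameters are needed and no key is allocated.
fn rows_from(raw: &[[i64; 3]]) -> Vec<HashMap<&'static str, i64>> {
    let mut rows = Vec::with_capacity(raw.len());
    for cells in raw {
        let mut row = HashMap::with_capacity(COLUMNS.len());
        for (key, value) in COLUMNS.iter().zip(cells) {
            row.insert(*key, *value);
        }
        rows.push(row);
    }
    rows
}
```

Lookups work with plain literals, e.g. `row.get("value")`, because &str keys can be borrowed as str.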


Here are some code snippets I will comment on:

use ::serde_json::Value;

...
unsafe fn parse_value (items_number_value : *const CassValue_) -> Value
{

    match cass_value_type(items_number_value) {
        CASS_VALUE_TYPE_TEXT |
        CASS_VALUE_TYPE_ASCII |
        CASS_VALUE_TYPE_VARCHAR => {
            /* 1. */
            let mut text = mem::zeroed();
            let mut text_length = mem::zeroed();
            cass_value_get_string(items_number_value, &mut text, &mut text_length);
            ...
}

...

/* 4. */
unsafe fn cassandra_connect (
    ...
) -> (Vec<HashMap<&'static str, Value>>, String)
{
    ...
    if ... {
        return (Vec::new(), "unknown feature".to_string());
    }
    ...
    (value_list, "ok".to_string())
}

unsafe fn cassandra_use (
    ...
) -> HashMap<String, Value>
{
    ...
    
    for ... {
        let (cass_data, cass_status) = cassandra_connect(...);
        /* 4. */
        if cass_status.contains("ok") {
            result_link.extend(cass_data);
        } else {
            status_link.push(cass_status);
        }
    }

    /* 4. */
    if status_link.contains(&"unknown feature".to_string()) {
        error!("unknown feature");
        /* 3. */
        let data_to_json = to_value(result_link).expect("data type was vec");
        /* 2. */
        status_map.insert("msg".to_string(), Value::String("unknown feature".to_string()));
        status_map.insert("result".to_string(), Value::String("unknown feature".to_string()));
        status_map.insert("data".to_string(), data_to_json);
        return status_map;
    } else if result_link.len() == 0 {
        /* 3. */
        let data_to_json = to_value(result_link).expect("data type was vec");
        /* 2. */
        status_map.insert("msg".to_string(), Value::String("ok".to_string()));
        status_map.insert("result".to_string(), Value::String("ok".to_string()));
        status_map.insert("data".to_string(), data_to_json);
        return status_map
    }

    /* 3. */
    let data_to_json = to_value(result_link).expect("data type was vec");

    /* 2. */
    status_map.insert("msg".to_string(), Value::String("ok".to_string()));
    status_map.insert("result".to_string(), Value::String("ok".to_string()));
    status_map.insert("data".to_string(), data_to_json);
    status_map

}

unsafe fn data_result (
    ...
) -> HashMap<String, Value>
{
    ...
    let data = cassandra_use(...);
    ...
    data
}

fn async_handler(url: Path<Url>) -> HttpResponse
{
    unsafe {

        let data= data_result(url.deviceid.as_str(), url.epoch.as_str(), url.feature.as_str());

        /* 2. */
        if data.get("msg").unwrap() == "unknown feature" ||
            data.get("msg").unwrap() == "Wrong time period, over one years" ||
            data.get("msg").unwrap() == "Wrong time period -" ||
            data.get("msg").unwrap() == "time range should be integer!" {

            HttpResponse::BadRequest()
                .content_type("text/html")
                .content_encoding(ContentEncoding::Gzip)
                .json(data)
        } else {
            HttpResponse::Ok()
                .content_type("text/html")
                .content_encoding(ContentEncoding::Gzip)
                .json(data)
        }
    }
}

My comments:

  1. Do not use mem::zeroed()! It is very dangerous, especially with unannotated types, since it is Undefined Behavior to use it with types for which an all-zero value is invalid (references, for instance). The safe alternative is to initialize with an explicitly typed value, such as:

    let mut text: *const ::std::os::raw::c_char = ::std::ptr::null();
    let mut text_length: usize = 0;
    
  2. This is the main point performance-wise: do not use Strings and HashMaps when enums and structs can do the job:

    use ::serde::{Serialize, Serializer};
    use ::serde_json::Value;
    
    #[derive(
        Debug,
        Clone,
        Serialize,
    )]
    struct Status {
        msg: Msg,
        result: Msg,
        data: Value,
    }
    
    #[derive(
        Debug,
        Clone, Copy,
        PartialEq, Eq,
    )]
    enum Msg {
        Ok,
        UnknownFeature,
        WrongTimePeriod(Option<WrongTimePeriodReason>),
        TimeRangeShouldBeInteger,
    }
    
    #[derive(
        Debug,
        Clone, Copy,
        PartialEq, Eq,
    )]
    enum WrongTimePeriodReason {
        OverOneYear,
    }
    
    impl Serialize for Msg {
        fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where
            S: Serializer,
        {
            use self::Msg::*;
            use self::WrongTimePeriodReason::*;
            serializer.serialize_str(match *self {
                | Ok => "ok",
                | UnknownFeature => "unknown feature",
                | WrongTimePeriod(Some(OverOneYear)) => "Wrong time period, over one years",
                | WrongTimePeriod(None) => "Wrong time period -",
                | TimeRangeShouldBeInteger => "time range should be integer!",
            })
        }
    }
    

    So that your code can become:

    unsafe fn cassandra_use (
        ...
    ) -> Status
    {
        ...
        else if result_link.len() == 0 {
            /* 3. */
            let data_to_json = to_value(result_link).expect("data type was vec");
            /* 2. */
            return Status {
                msg: Msg::Ok,
                result: Msg::Ok,
                data: data_to_json,
            };
        }
        ...
    }
    
    ...
    
    fn async_handler(url: Path<Url>) -> HttpResponse
    {
        let data = data_result(&url.deviceid, &url.epoch, &url.feature);
    
        /* 2. */
        let mut response_builder = match data.msg {
            | Msg::Ok => HttpResponse::Ok(),
            | _ => HttpResponse::BadRequest(),
        };
        response_builder
            .content_type("text/html")
            .content_encoding(ContentEncoding::Gzip)
            .json(data) // here is where Serialize comes into play
    }
    
  3. All three code paths compute the same value:

    let data_to_json = to_value(result_link).expect("data type was vec");
    

    So you should factor that part out to be DRY.

  4. Instead of a (Vec, String) tuple, where the String is either "ok".to_string() or "unknown feature".to_string() (in which case the Vec is empty), you should use an enum to define your error cases and return the Vec wrapped in a Result<Vec<_>, CassandraError>:

    #[derive(Debug)]
    enum CassandraError {
        UnknownFeature,
    }
    
    /* better:
    impl ::std::fmt::Display for CassandraError {
        ...
    }
    impl ::std::error::Error for CassandraError {}
    */
    
    unsafe fn cassandra_connect (
        ...
    ) -> Result<
        Vec<HashMap<&'static str, Value>>,
        CassandraError,
    > {
        ...
        if ... {
            return Err(CassandraError::UnknownFeature);
        }
        ...
        Ok(value_list)
    }
    
    unsafe fn cassandra_use (
        ...
    ) -> Status
    {
        ...
        
        for ... {
            match cassandra_connect(...) {
                | Ok(cass_data) => {
                    result_link.extend(cass_data);
                },
                | Err(CassandraError::UnknownFeature) => {
                    return Status {
                        msg: Msg::UnknownFeature,
                        result: Msg::UnknownFeature,
                        data: to_value(result_link).expect("data type was vec"),
                    };
                },
            }
        }
        ...
    }
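Point 4, with the commented-out "better:" impls filled in, might look like the following self-contained sketch. Assumptions: fetch_partition is a hypothetical stand-in for cassandra_connect and returns plain u32 rows rather than HashMaps of serde_json values, and the ? operator takes the place of the explicit match, returning on the first error in the same way.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CassandraError {
    UnknownFeature,
}

// The message text lives in one place instead of repeated to_string() calls.
impl fmt::Display for CassandraError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CassandraError::UnknownFeature => f.write_str("unknown feature"),
        }
    }
}

impl Error for CassandraError {}

/// Hypothetical stand-in for cassandra_connect: rows on success, a typed error otherwise.
fn fetch_partition(feature: &str, partition: u32) -> Result<Vec<u32>, CassandraError> {
    match feature {
        "temperature" => Ok(vec![partition * 10, partition * 10 + 1]),
        _ => Err(CassandraError::UnknownFeature),
    }
}

/// Collects rows from every partition, stopping at the first error.
fn fetch_all(feature: &str, partitions: u32) -> Result<Vec<u32>, CassandraError> {
    let mut rows = Vec::new();
    for p in 0..partitions {
        rows.extend(fetch_partition(feature, p)?);
    }
    Ok(rows)
}
```

Because CassandraError implements std::error::Error, it can later be boxed or converted into other error types without any extra string handling.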
    

And finally, there should be a clearer separation between the C++ Cassandra driver's FFI and your web server logic: the web server should not be calling unsafe functions.
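The separation suggested above, combined with point 1's typed initialization, might look like the sketch below: one small safe function owns all the unsafe code and its invariants, and the handler only calls that safe function. ffi::fake_get_string is a hypothetical stand-in, written in Rust, for a C getter shaped like cass_value_get_string (pointer and byte length returned through out-parameters); the status codes in handler are illustrative only.

```rust
use std::os::raw::c_char;
use std::{ptr, slice};

/// FFI layer (hypothetical stand-in for the driver's C API).
mod ffi {
    use std::os::raw::c_char;

    pub unsafe extern "C" fn fake_get_string(
        output: *mut *const c_char,
        output_size: *mut usize,
    ) -> i32 {
        static DATA: &[u8] = b"temperature";
        unsafe {
            *output = DATA.as_ptr() as *const c_char;
            *output_size = DATA.len();
        }
        0 // hypothetical "ok" return code
    }
}

/// Safe layer: the only place that contains `unsafe`.
fn read_text() -> Option<String> {
    // Point 1: explicitly typed initial values instead of mem::zeroed().
    let mut text: *const c_char = ptr::null();
    let mut text_length: usize = 0;
    // SAFETY: both out-pointers come from live locals and are valid for writes.
    let rc = unsafe { ffi::fake_get_string(&mut text, &mut text_length) };
    if rc != 0 || text.is_null() {
        return None;
    }
    // SAFETY: the getter promises `text_length` readable bytes at `text`.
    let bytes = unsafe { slice::from_raw_parts(text as *const u8, text_length) };
    std::str::from_utf8(bytes).ok().map(|s| s.to_owned())
}

/// Web layer: ordinary safe code, no `unsafe` blocks.
fn handler() -> (u16, String) {
    match read_text() {
        Some(feature) => (200, feature),
        None => (500, "driver error".to_string()),
    }
}
```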


Silly question, but worth asking, since it could make a big difference:
@pili2026 I can’t see any links to your python code.
Does it make use of async IO?
Because if it does, it will have an advantage over your rust code which is all synchronous.

Can you provide a link to the python code please.

Sorry, because the Python code is company property, I can't provide a link.
But I can confirm that the Python code currently does not use async IO (not sure why).

Great! Just thought I'd check they're on a level playing field with regard to async IO.
