Initial Rust support for Thrift lands in Thrift mainline


#4

@sfackler Yes, I built with cargo build --release and then used those binaries. I noticed during the compilation of Thrift that it used a debug build for whatever reason, but I definitely tested with release binaries.


#5

Well, I definitely don’t intend for the implementation to be non-performant :slight_smile:

That said, I haven’t looked into performance seriously yet. My plan is to clean up the library (i.e. the libs blitz checklist), add missing unit tests, improve the ergonomics of the generated code and support both default values as well as the json protocol.

So, back to your specific question @coder543: how are you executing these 20 MULTIPLY calls - both for C++/Rust?


#6

gotcha, I was just trying to figure out if I did something wrong! It’s just odd that the performance difference is so noticeable, given how fast Rust code naturally is. Thrift servers will naturally be slower than a hand-coded server from what I’ve seen, because of protocol overhead, but they also shouldn’t take half a second to process a little over 20 RPC calls either. I wish you luck on improving the performance! I don’t see how I could have done anything differently enough to be wrong in the Rust code here, but if I did, I’m just as happy to know it’s my fault.

Rust:

for logid in 33..53 {
    // let's do...a multiply!
    let res = client
        .calculate(logid, Work::new(7, 8, Operation::MULTIPLY, None))?;
    // println!("multiplied 7 and 8 and got {}", res);
}

C++:

for (int i = 33; i < 53; i++) {
  Work work;
  work.op = Operation::MULTIPLY;
  work.num1 = 5;
  work.num2 = 7;
  try {
    client.calculate(i, work);
  } catch (InvalidOperation& io) {

  }
}

Duly note that in both cases, I have inserted this after the first calculate command, and left all other operations from the “tutorial” in place. Gist link 1 and Gist link 2.


#7

@coder543 Doesn’t look like there’s anything wrong with your setup. Let me check it out. I can’t promise a timeline unfortunately :frowning: But I’ll ping this thread if I find out anything.


#8

@coder543 So, this is what I get on my local machine:

took PT0.001391609S seconds to make 20 calculate calls // RUST
took 0.001279s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%) // C++

FWIW, compiled the following:

C++

/bin/sh ../../libtool  --tag=CXX   --mode=link g++ -Wall -Wextra -pedantic -g -O2 -std=c++11 -Wno-deprecated-register -L/usr/local/lib   -o TutorialClient CppClient.o libtutorialgencpp.la ../../lib/cpp/libthrift.la -lpthread -lboost_timer
libtool: link: g++ -Wall -Wextra -pedantic -g -O2 -std=c++11 -Wno-deprecated-register -o .libs/TutorialClient CppClient.o -Wl,-bind_at_load  -L/usr/local/lib ./.libs/libtutorialgencpp.a /Users/allen/src/rust_projects/thrift/lib/cpp/.libs/libthrift.dylib ../../lib/cpp/.libs/libthrift.dylib -lssl -lcrypto -lz -lpthread -lboost_timer
{
  boost::timer::auto_cpu_timer t; // prints out timing when scope ends

  for (int i = 33; i < 53; i++) {
    Work work;
    work.op = Operation::MULTIPLY;
    work.num1 = 5;
    work.num2 = 7;
    try {
      client.calculate(i, work);
    } catch (InvalidOperation& io) {

    }
  }
}

Rust

cargo build --release
// let's do...a multiply!
let start = time::PreciseTime::now();

for logid in 33..53 {
  client.calculate(logid, Work::new(7, 8, Operation::MULTIPLY, None))?;
}

let end = time::PreciseTime::now();
println!("took {} seconds to make 20 calculate calls", start.to(end));

I used Boost CPU Timers in C++ and PreciseTime in Rust. AFAICT they should be equivalent. And from what I can tell the times are comparable. Rust is definitely a bit slower, but at least part of that’s because log (into which logid is inserted) is protected by a mutex in the Rust server but unsynchronized in the C++ one. All timings done at 11PM :slight_smile: on a fairly-recent MBP.

Is it possible you simply measured the entire client run time including argument parsing? Or, perhaps used the debug build (for me both the client and library build in release mode…)


#9

Earlier today, I double-checked, and the binaries I used came from the target/release folder after building with cargo build --release.

I was just using the time command built into my shell (fish) to measure the execution time. I could actually visibly see the multiply operations ticking down as they executed, it was that slow, whereas the C++ one was instantaneous.

I guess I’ll have to investigate this more on my end, but I tested it multiple times, and I didn’t see any notable variation in the results. I was using Rust 1.17.0 stable, if it makes any difference. I’m not on that computer at the moment, and it’s late here, so I can’t do any more testing today, but I am curious to try some more things!

The performance you’re seeing is the performance I would like to see, since that’s actually very good!


#10

It does seem like something in your setup; I’d be interested to hear if anyone else is experiencing the same problem.

As for the rustc version, I test on stable - so, 1.17.0 - and only use nightly for clippy; the timings I shared above used 1.17.0. One thing I will say is that simply running time on the command-line isn’t equivalent: the rust client does argument parsing etc. and they don’t execute the same service calls. It’s unlikely to be the only cause, but something to keep in mind.


#11

I’m doing more testing, and things are just strange on my computer with regards to the Rust version of Thrift. Even a Python client/server setup is able to run relatively instantly. Rust is the only one that’s being slow, and that makes zero sense. I have used Rust immensely over the past two years and I love it, it is a wonderful platform. End of story. So I’m not sure what’s going on over here. The timings are even slower today, with it taking nearly 2 seconds to run from starting the client until it finishes, and I can just watch the messages slowly tick down across the screen, so it isn’t argument parsing that’s slowing it down.

One other strange thing that’s happening is that all of the clients can interop with all of the servers, except for the Rust ones. The Rust client only works with the Rust server, and the Rust server only works with the Rust client. Everything else seems to either hang or spit out an error about an invalid message when trying to interop with the Rust Thrift code.

I’m rebuilding my copy of Thrift master from scratch and I’m going to test some other things here, I guess. If you have any advice, I’m all ears.

I’m using a heavily specced-out 2016 System76 Oryx Pro, so it’s not old hardware.

EDIT: I rebuilt, and nothing has changed. The Rust code can’t interop with other clients/servers, and it’s still slow. I’ve started instrumenting the code with PreciseTime, and each client.calculate call is taking some 88000220 nanoseconds, but the server-side processing appears to be roughly instantaneous, on the order of 4000ns. I ran 3 clients simultaneously, and the server was able to handle all of them without any delay, so it took the same amount of wall time to run 3 clients as it did to run 1 client.


#12

Inside the fn calculate(...) method itself, I started instrumenting, and the line let message_ident = self.i_prot_mut().read_message_begin()?; takes up the vast majority of the time.

Starting a PreciseTime at the beginning of the function, all of the code before that line took about 41071ns. From that line to the next, the PreciseTime jumped from 41_071ns to 87_915_893ns… so that’s interesting.

I instrumented the read_message_begin call on the CalculatorSyncProcessor::process function as well to get numbers on the server side, and it was taking about 87_924_036ns to run there as well… so, something might be wrong in that function. I’m not sure what.


#13

Hmm. Sounds like it’s blocking in reading bytes from the socket. At least that’s my completely uninformed guess (since I can’t reproduce that on my end).


#14

Can you do a “make clean” and “make precross” before trying again?


#16

interestingly, the c++ failed on the precross build.

make -C cpp precross
make[2]: Entering directory '/home/josh/workspace/misc/thrift/test/cpp'
depbase=`echo src/TestServer.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
g++ -DHAVE_CONFIG_H -I. -I../.. -I../../lib/cpp/src/thrift -I../../lib/c_glib/src/thrift  -I/usr/include  -I../../lib/cpp/src -Igen-cpp  -Wall -Wextra -pedantic -D__STDC_LIMIT_MACROS -g -O2 -Wno-deprecate
d-register -MT src/TestServer.o -MD -MP -MF $depbase.Tpo -c -o src/TestServer.o src/TestServer.cpp &&\
mv -f $depbase.Tpo $depbase.Po
In file included from src/TestServer.cpp:32:0:
../../lib/cpp/src/thrift/server/TNonblockingServer.h:40:19: fatal error: event.h: No such file or directory
#include <event.h>
                    ^
compilation terminated.
Makefile:741: recipe for target 'src/TestServer.o' failed
make[2]: *** [src/TestServer.o] Error 1
make[2]: Leaving directory '/home/josh/workspace/misc/thrift/test/cpp'
Makefile:899: recipe for target 'precross-cpp' failed
make[1]: *** [precross-cpp] Error 2
make[1]: Leaving directory '/home/josh/workspace/misc/thrift/test'
Makefile:1085: recipe for target 'precross-test' failed
make: *** [precross-test] Error 2

#17

If you have access to a 64-bit Linux box or VM, I have uploaded the binaries that my system built in release mode. I copied these over to a remote Linux server and tested them there, and the results were unchanged, so it isn’t just happening on my laptop. The binaries were built on Ubuntu 17.04, but they’re pretty close to statically linked, so they’ll likely work on any modern Linux distro.

The issue could be with my copy of Thrift and how its being built, or it could be some strange way that the Rust Thrift code is interacting with the Linux networking stack.


#18

@coder543 I’d be genuinely surprised by that. The precross builds, i.e. the binaries run for the cross-language tests, are built on a ubuntu docker environment hosted on ubuntu and run on every checkin. It’s an n-language x n-langauge test (i.e. rust server is tested against all the other language clients and the rust client runs against all the language servers). Timeouts of the sort you’re describing would be very obvious.

I’ll check out the thrift code base on a fresh ubuntu install on AWS and see if I can repro this.


#19

Just letting you know that I tested again from a fresh clone of the Thrift repo on my remote server, and got identical results.

Basically, I…

git clone https://github.com/apache/thrift.git
cd thrift
./bootstrap.sh
./configure
make
cd tutorial/rs
cargo build --release
mv target/release/tutorial_* bin/
cd bin/

and then spin up a server and connect the client to it. I also took intermediate steps to see that the debug binaries were on the order of 13MB, the release binaries were about 4.5MB, and that when I moved the release binaries into the bin folder that they were the expected size, etc. just to be sure that I was using release binaries.


#20

Hmm. So, just to confirm everything, I spun up a fresh Ubuntu 16.04 instance on AWS. Note that this isn’t a custom box, or anything - just whatever AWS has, specifically “Ubuntu Server 16.04 LTS (HVM), SSD Volume Type - ami-80861296”. Installed the deps listed in the build Dockerfile, built off master and got the following timings:

C++ -> C++:   0.000932s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
Rust -> Rust: took PT0.000821058S seconds to make 20 calculate calls
C++ -> Rust:  0.000872s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
Rust -> C++: took PT0.000934903S seconds to make 20 calculate calls

So, unfortunately I’m not seeing anything like the behavior you are. If anything, the Rust implementation seems to be a touch faster here.

@coder543 I hope you’re not trying to connect the Rust tutorial client/server to the C++ one however? They’re completely different protocols/transports; the C++ one uses a buffered transport with the binary protocol, and the Rust one uses the framed transport with the compact protocol… (I had to edit the rust tutorial client to use the proper transport/protocol for interconnectivity).

Note also, that I’m using the exact code as above to achieve timings. This sounds more like an issue with your specific environment.


#21

I tried it on two completely separate environments, though, so it isn’t my specific environment. Both were Ubuntu 17.04, admittedly, so… I am now spinning up a VM running Ubuntu 16.04 on DigitalOcean. If it doesn’t work, then I’m going to PM you some credentials to this temporary VM so you can see for yourself if you want. I will be destroying the VM tomorrow or the next day, so security is a nonissue.

Either way, I’ll report back.


#23

DO; Ubuntu 17.04 (Binary/Buffered):

C++ -> C++: 0.001095s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
Rust -> Rust: took PT0.000991742S seconds to make 20 calculate calls
C++ -> Rust:  0.000985s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
Rust -> C++: took PT0.001090593S seconds to make 20 calculate calls

@coder543 I think I figured out why you’re seeing timings that are so far off. The Rust client/server binds to 127.0.0.1, not localhost, so it’s using IPV4. I changed it to bind to localhost to match the C++ client, and to use binary/buffered, and I get the timings above.


#24

So, for anyone who comes along afterwards, it looks like switching from Framed to Buffered helps a lot, switching from 127.0.0.1 -> localhost might help some for whatever reason? And Compact -> Binary is probably just good advice, from what I saw.

Would it be possible to push these changes upstream to apache/thrift? That’ll make it easier for newcomers, and it’ll make the Rust server play better with the other reference implementations of the tutorial code.


TWiR quote of the week
#25

@coder543 Absolutely. I’ll change the default implementations.

Also, @coder543 helped identify a surprisingly large difference in times when a Framed transport is used. I’m going to check if that’s something specific to Rust, or common to all transports. I really want to thank him for:

  1. Being super helpful in working with me until we figured out the exact setup that caused the issue (despite my claiming it was a problem with his setup!)
  2. Having a great attitude! We need more people in our community like this.