Error handling and UTF-8 validation

You're measuring the time for a lot of irrelevant things, like how long it takes your shell to fork a new process and to print all of your debug messages. This is probably fair for testing against the original application (if you don't have source code for it), but because you are printing a ton of debug messages, you are still getting biased results.

When I run the code in your latest playground link and wrap everything in a dumb timer:

let start = std::time::Instant::now();

// ...

eprintln!("Total time: {:?}", start.elapsed());

I get between 1.6 ms and 2.5 ms on my laptop.

When I run the same test after commenting all of the debug prints, I get something between 434.1 µs and 1.0 ms.

All tests were run by piping the HTML output to /dev/null, which avoids some blocking caused by the terminal.

$ cat ausfluge | ./target/release/text-sanitizer >/dev/null
Total time: 434.126µs

In order to create a more uniform test environment, I also downloaded the test data into a static text file, which I then fed to the applications:

$ echo $(wget -S -O - "http://www.lanzarote.com/de/ausfluge" 2>&1) > lanzarote-com_de-ausfluge.html

I also included the Rust time measurement in the main() function:

fn main() {
  let start = std::time::Instant::now();

  //  [...]

  print!("{}", srsout.as_str());

  eprintln!("Total time: {:?}", start.elapsed());
}
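For reference, here is a self-contained sketch of this timing pattern; the sanitize step below is a lossy UTF-8 pass-through standing in for the actual sanitizer logic, not the real code:

```rust
use std::io::{self, Read, Write};
use std::time::Instant;

// Placeholder for the real sanitizing step: lossy UTF-8 conversion
// stands in for the actual text-sanitizer logic.
fn sanitize(input: &[u8]) -> String {
    String::from_utf8_lossy(input).into_owned()
}

fn main() {
    let start = Instant::now();

    // Read all of stdin into a buffer.
    let mut input = Vec::new();
    io::stdin().read_to_end(&mut input).expect("failed to read stdin");

    let output = sanitize(&input);
    io::stdout().write_all(output.as_bytes()).expect("failed to write stdout");

    // Timing goes to stderr so it does not mix with the sanitized output.
    eprintln!("Total time: {:?}", start.elapsed());
}
```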

This should show what the penalties outside of the application are.
I also made some improvements to the code. As I said, the key lies in the application logic.
The improved Rust application with error handling now runs in 10.7 ms on the test data:

$ echo $(cat lanzarote-com_de-ausfluge.html ; date +"%s.%N") | target/release/text-sanitizer >/dev/null ; date +"%s.%N"
Total time: 10.716834ms
1589628383.056664126

The penalty of the terminal output is also captured by the Rust time measurement:

$ echo $(cat lanzarote-com_de-ausfluge.html ; date +"%s.%N") | target/release/text-sanitizer ; date +"%s.%N"
1589628257.771927745
Total time: 11.487693ms
1589628257.783283633
$ echo "scale=3; 57.783283633-57.771927745"|bc -l
.011355888

Even this result suggests some inaccuracy in the Rust time measurements, because Rust measured more time than the surrounding operating system did.
The terminal penalty would amount to only about 1 ms, as measured by Rust.
But in production this application is invoked by a parent process and normally communicates over STDOUT, so writing to STDOUT is one of its main functionalities.

On the other hand, when measured with the same data on the same system, all three versions of the application run under the same circumstances, so the results are comparable relative to each other, though naturally not in absolute terms.

But I also reviewed the application in the other programming language and could make some improvements there as well.
Now it also runs in 10.6 ms:

$ echo $(cat lanzarote-com_de-ausfluge.html ; date +"%s.%N") | ./text-sanitizer.run -i es de ; date +"%s.%N"
1589627211.282232743
1589627211.292850963
$ echo "scale=3; 11.292850963-11.282232743"|bc -l
.010618220

The most decisive improvements are:

  1. Not allocating memory little by little but in big chunks:
    let mut srstxt = String::with_capacity(text.len());
  2. Not copying strings in memory but working with references/pointers,
    which Rust already does well by design.
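The effect of the first point can be sketched with a small, self-contained comparison (the input size here is illustrative, not taken from the real test data):

```rust
use std::time::Instant;

fn main() {
    let text = "x".repeat(1_000_000);

    // Growing a String incrementally forces repeated reallocations
    // as the buffer fills up.
    let t = Instant::now();
    let mut grown = String::new();
    for ch in text.chars() {
        grown.push(ch);
    }
    let incremental = t.elapsed();

    // Reserving the full capacity up front means at most one allocation.
    let t = Instant::now();
    let mut prealloc = String::with_capacity(text.len());
    for ch in text.chars() {
        prealloc.push(ch);
    }
    let preallocated = t.elapsed();

    // Both produce the same result; only the allocation pattern differs.
    assert_eq!(grown, prealloc);
    eprintln!("incremental: {:?}, preallocated: {:?}", incremental, preallocated);
}
```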

Now I was also able to see an execution time of 354.7 µs:

$ date +"%s.%N" ; target/release/text-sanitizer_v2 -i < lanzarote-com_de-ausfluge.html >/dev/null ; date +"%s.%N"
1589793780.372181577
Input time: 43.534µs
Parse time: 203.554µs
Total time: 354.7µs
1589793780.377126057
$ echo "scale=3; (80.377126057-80.372181577)*1000*1000"|bc -l
4944.480000000
$ echo "scale=3; 4944.48-354.7"|bc -l
4589.78

But this raises a question about a time gap of 4.6 ms before and after Rust's main() function.
So I printed a timestamp at the entrance of the main() function and added some profiling measurements,
as can be seen in my updated version:

$ date +"%s.%N" ; target/release/text-sanitizer -i -p < lanzarote-com_de-ausfluge.html >/dev/null ; date +"%s.%N"
1589793005.975659287
0: 1589793005.978346173s
1: 1589793005.978429231s
Input time: 73.079µs
2: 1589793005.978545255s
3: 1589793005.978575797s
Parse time: 440.709µs
4: 1589793005.979051209s
Total time: 752.843µs
5: 1589793005.979129697s
1589793005.980647882
$ echo "scale=3; 05.980647882-05.975659287"|bc -l
.004988595
$ echo "scale=3; 05.979129697-05.978346173"|bc -l
.000783524

On entering the main() function it prints the timestamp:

0: 1589793005.978346173s

And on leaving it prints:

5: 1589793005.979129697s

So before printing the first timestamp there is a time gap of 2.7 ms:

$ echo "scale=3; 05.978346173-05.975659287"|bc -l
.002686886

Is this the time the operating system needs to "fork" the executable, or is this the Rust start-up sequence?
With the -i and -p parameters I was able to exclude external helper commands and feed the test data directly into the STDIN of the process.
So I took an strace activity log to see what the application is doing before printing the first line.

$ strace -ttt -o strace_txt-san-rs_2020-05-18-3.log target/release/text-sanitizer -i -p < lanzarote-com_de-ausfluge.html >/dev/null 
0: 1589795630.000705955s
1: 1589795630.000997402s
Input time: 427.296µs
2: 1589795630.001685844s
3: 1589795630.001953786s
Parse time: 778.675µs
4: 1589795630.002967769s
Total time: 2.527188ms
5: 1589795630.003486638s

In summary: before it can measure anything or print the first line, Rust needs to load its external dependencies:

$ cat strace_txt-san-rs_2020-05-18-3.log|grep -niE "open(.*\.so.*)" 
5:1589795629.996421 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
9:1589795629.996613 open("/lib64/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
16:1589795629.996920 open("/lib64/librt.so.1", O_RDONLY|O_CLOEXEC) = 3
23:1589795629.997205 open("/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
32:1589795629.997744 open("/lib64/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
39:1589795629.998057 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3

In total, 84 operations precede the first write:

$ cat strace_txt-san-rs_2020-05-18-3.log|grep -ni "write(2, \"0: \""
86:1589795630.000725 write(2, "0: ", 3)    = 3

So all of this time is part of the Rust start-up sequence, and it is an important aspect of the user experience.

Regarding the question "Does correct error handling add to the processing time?":
the profiling analysis shows that the correct error handling doubles the processing time.
The community-suggested version with minimal parsing:

$ date +"%s.%N" ; target/release/text-sanitizer_v2 -i < lanzarote-com_de-ausfluge.html >/dev/null ; date +"%s.%N"
1589793780.372181577
Input time: 43.534µs
Parse time: 203.554µs
Total time: 354.7µs
1589793780.377126057
$ echo "scale=3; 80.377126057-80.372181577"|bc -l
.004944480

And the complete version with UTF-8 recognition and error handling:

$ date +"%s.%N" ; target/release/text-sanitizer -i -p < lanzarote-com_de-ausfluge.html >/dev/null ; date +"%s.%N"
1589793005.975659287
0: 1589793005.978346173s
1: 1589793005.978429231s
Input time: 73.079µs
2: 1589793005.978545255s
3: 1589793005.978575797s
Parse time: 440.709µs
4: 1589793005.979051209s
Total time: 752.843µs
5: 1589793005.979129697s
1589793005.980647882
$ echo "scale=3; 05.980647882-05.975659287"|bc -l
.004988595
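The error handling in question comes down to validating the input bytes as UTF-8 rather than assuming they are valid. A minimal sketch of the two approaches, with function names that are mine and not from the actual sanitizer:

```rust
// Strict: validation can fail, and the caller must handle the error.
fn sanitize_strict(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(bytes)
}

// Lossy: invalid sequences are replaced with U+FFFD instead of failing.
fn sanitize_lossy(bytes: &[u8]) -> std::borrow::Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}

fn main() {
    let valid = "Auflösen".as_bytes();
    // 0xC3 starts a 2-byte sequence but is not followed by a continuation byte.
    let invalid: &[u8] = &[0x41, 0xC3, 0x28];

    assert!(sanitize_strict(valid).is_ok());
    assert!(sanitize_strict(invalid).is_err());
    assert_eq!(sanitize_lossy(invalid), "A\u{FFFD}(");
}
```

The strict path forces every caller to decide what happens on bad input, which is where the extra branches and checks come from.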

I also see that the Rust start-up time, and the actual parsing time as well, vary a lot between 420 µs and 550 µs on the same static data.
All the timestamp reads also increased the overall processing time considerably.
I could not find a way to take a timestamp without calling SystemTime::now(), which executes a system call.
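As a note on that last point: for interval measurement, Instant::now() is the usual alternative to SystemTime::now(); on Linux both typically resolve to clock_gettime, which is often serviced via the vDSO rather than a full system call, though the exact cost is platform-dependent:

```rust
use std::time::{Instant, SystemTime, UNIX_EPOCH};

fn main() {
    // Instant is monotonic and intended for measuring elapsed intervals.
    let start = Instant::now();

    // SystemTime gives wall-clock time, e.g. for printing absolute timestamps.
    let wall = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before UNIX epoch");
    println!("unix time: {}.{:09}", wall.as_secs(), wall.subsec_nanos());

    println!("elapsed: {:?}", start.elapsed());
}
```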

As a side remark, the version implemented in the other programming language does the same job in 4.2 - 4.8 ms:

$ date +"%s.%N" ; ./text-sanitizer.run -i es de < lanzarote-com_de-ausfluge.html >/dev/null ; date +"%s.%N"
1589799325.021618767
1589799325.025870571
$ echo "scale=3; 25.025870571-25.021618767"|bc -l
.004251804

In this regard I see the argument of C programmers choosing the C programming language for the sake of maximum speed.

1 Like

Until they actually try it for themselves that is. If they can ever be convinced to do so.

There is plenty of evidence around to show that Rust can match C in performance without too much trouble. For example: https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/rust.html

As a long time C/C++ user of course one of the first things I did with Rust is to reimplement some of my smaller C/C++ codes in Rust. If only to prove to myself that I could do it. Happily in all cases Rust performance has been on a par with C/C++. This includes numerically intensive code as well as text handling.

1 Like

date; $command; date strikes me as a bad way to measure $command (imprecise, and subject to various systematic errors to do with your shell and system). I suspect the standard POSIX time command would handle this just fine. Many shells, including bash, also have a time builtin that would work.

2 Likes

Of course, every execution measures just one single sample.
A complete benchmark study would take at least thousands of samples, study the variation in time, and build proper statistics.
I did not take it that far because I also need to justify the bigger development effort to the project stakeholders with a clear, measurable advantage.

But even so, I already noticed that Rust itself is unable to measure the complete execution time, because it needs its own start-up time before it can take the first timestamp.
Every timestamp sample also further increases the processing time of the whole application and changes the test outcome.
Launching it under the strace command slows its execution down, too.

So the more I dig into it, the more biased the measurement becomes.

As the whole discussion comes to explain, every benchmark test is always relative to the system it runs on.
So by nature the results can only be compared to each other, never in absolute terms.

How can you demonstrate a "clear measurable advantage", or disadvantage for that matter, if you have not done the measurements?

I guess you could measure the execution time of a Rust program completely from Rust. Just run Rust from Rust using the std::process::Command struct.
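That idea might be sketched like this; the child command "true" is a stand-in for the binary actually being measured:

```rust
use std::process::Command;
use std::time::Instant;

fn main() {
    let start = Instant::now();

    // Run the program under test as a child process; "true" is a
    // stand-in for the actual binary being measured.
    let status = Command::new("true")
        .status()
        .expect("failed to launch child process");

    // This wall time includes fork/exec, the dynamic loader, and the
    // child's runtime start-up, not just its main() body.
    eprintln!("child exited with {}, wall time: {:?}", status, start.elapsed());
}
```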

But what do you want to measure with that? The performance of your operating system loader?

And why the emphasis on start up time? Are you really wanting to be restarting a program so much that that is significant?

3 Likes

If you're concerned about 2.7 ms runtime caused by loading shared libraries, keep in mind that the intrepid optimizers can statically link the libraries (e.g. with MUSL) and enable LTO for its excellent dead-code removal (to reduce the file size and therefore less code to load from disk).

Also do take notice that the FS cache plays a crucial role in reducing disk access latency. If you wanted to more precisely measure latency, you would take this into account by recording latency with and without a hot FS cache.

Ultimately, I do not understand what your goals are. Parsing and validating 27KB of HTML with sub-millisecond times is already insanely good. If you want to continue optimizing it more, then you should be aware that there are a few other tricks, such as adding this file to your repo to enable all optimizations for your CPU:

$ cat .cargo/config
[build]
rustflags = [ "-C", "target-cpu=native" ]

Or you could use a profiler and capture the hottest code paths for optimization... The problem is that the profiler I have (instruments on macOS) can only sample with 1 ms resolution, and the whole program completes in under 1 ms with the input file you are using. On Linux I believe the gprof sample time is about 10 ms. And oprofile claims it samples at 100,000 clock cycles by default. In any case, you should do some profiling! You might have to use a larger input file for a longer runtime, depending on your profiler.

Your last playground link from 10 hours ago still has a lot of suspicious code in it. Even the Some(io::stdin()) that I pointed out last time I looked at it. I suspect much of this code is burning CPU cycles needlessly, if that's what you really care about.

The last point I'll give you is an age old observation; you can't observe the system without affecting it. Every call to get a counter or the current time is just more wasted CPU cycles. Every branch to check cli args is just more wasted CPU cycles...

There is a tradeoff between ergonomics and performance. And in the grand scheme of things, we're still talking about sub-millisecond runtimes. So relatively speaking, I don't know if the effort required to optimize this any more will be worth it. But of course it's your decision for how you spend your time! Godspeed.

1 Like

That 2.7ms load time bugs me.

An out-of-the-box "cargo build --release" of a "hello world" program in Rust produces a binary that is 17 times bigger than its C counterpart on my Debian PC, even after "strip".

For whatever reason the Rust "hello world" pulls in a lot more libraries:

$ ldd loop_blocking target/release/loop_blocking
loop_blocking:
        linux-vdso.so.1 (0x00007fffced7d000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fcaa72f0000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fcaa74ec000)
target/release/loop_blocking:
        linux-vdso.so.1 (0x00007ffffa153000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f8157400000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f81573f0000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f81573cf000)
        libgcc_s.so.1 => /usr/local/lib64/libgcc_s.so.1 (0x00007f81571b0000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8156fe0000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f8157470000)

Which is odd since Rust is not supposed to have a run time.

On the other hand execution time of "hello world" in Rust and C is about the same, 9ms to 10ms, with the crudest measurements with "time". So I'm not about to worry about it.

libstd is the reason. I mentioned earlier that using MUSL will eliminate the need to dynamically link. You can give it a try with GitHub - clux/muslrust: Docker environment for building musl based static linux rust binaries -- And don't forget all the size optimization tricks from GitHub - johnthagen/min-sized-rust: 🦀 How to minimize Rust binary size 📦

(Note that LTO builds will not work with Xargo on a recent nightly toolchain: LTO builds on recent nightlies fail "to get bitcode from object file" · Issue #292 · japaric/xargo · GitHub)

Thanks parasyte, that is interesting.

But I stop when I get to:

Usage

...
Pull and run from a rust project root:
...
docker pull clux/muslrust

I don't have anything against dynamic linking most of the time. I'm not targeting a limited embedded system. But I do wonder why it pulls in pthreads and such when no threads are used.

If you don't need C deps, which I think is the case here, then the following suffices:

rustup target add x86_64-unknown-linux-musl

And then just specify --target x86_64-unknown-linux-musl whenever building, like so:

cargo build --target x86_64-unknown-linux-musl --release
2 Likes

This is just the way it has to work because the linker doesn't check whether your crate actually uses std::thread. Your crate uses std and therefore transitively also pulls in std::thread.

libstd is built and distributed with the toolchain in a form that fits the general purpose target of crates that want to use everything libstd has to offer. To that end, it includes everything.

Thanks daboross.

Using musl like that cuts the run time of a Rust "hello world" by pretty much half.

As an old timer it still horrifies me that a "hello world" program, that only requires half a dozen instructions in assembler, results in a 300KB executable.

But I gave up worrying about that years ago.

1 Like

You can always use #[no_std] if you want a tiny hello-world. :sweat_smile:

$ ./target/release/hello
Hello, world!

$ ls -lh ./target/release/hello
-rwxr-xr-x  1 jay  staff   8.2K May 18 18:23 ./target/release/hello*

Regarding the Some(io::stdin()): the Rust profiling already showed that the community-suggested code performed about 50% better than this construct. I also observed that the Ok(0) branch never occurs: when no STDIN is given, the application just hangs, waiting for a first byte that never comes. So I adopted the community-suggested construct and introduced a -i flag to skip this part.
I only avoid the ? operator and resolve the Result explicitly, because ? would again crash the application.
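Resolving the Result explicitly instead of using ? might look like the following sketch; the function and messages are illustrative, not the actual sanitizer code:

```rust
use std::io::{self, Read};

fn read_input() -> Vec<u8> {
    let mut buf = Vec::new();
    // Match on the Result instead of using `?`, so a read failure
    // degrades gracefully instead of aborting the whole program.
    match io::stdin().read_to_end(&mut buf) {
        Ok(_) => buf,
        Err(e) => {
            eprintln!("warning: could not read stdin: {}", e);
            Vec::new()
        }
    }
}

fn main() {
    let input = read_input();
    println!("read {} bytes", input.len());
}
```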

I also deleted all time-measurement code, since it was slowing down the application.
Non-intrusive time measurement seems to be the best way to see realistic performance.
A test launcher and processing application could be great, but even that would introduce some error.

As I already stated, all execution-time measurements are always relative to the host they run on. After copying the binary to another host, the Rust application runs in 2 - 2.5 ms with the given test data. The community-suggested version also runs in 2 - 2.2 ms on that host.

So these times are already extremely good in comparison with the Perl script, which does the same task in 30 ms.
But they are not exceptional, since the implementation in the other compiled language also runs in 2 - 2.5 ms on the same host.

This tool was initially conceived as a replacement for the cat -A | sed command combination, which had proved slow and hard to maintain.
Still, the cat -A command, which in its functionality could be compared to the community-suggested version, runs on this host in 1.6 - 1.9 ms:

$ date +"%s.%N" ; cat -A < ../../lanzarote-com_de-ausfluge.html >/dev/null ; date +"%s.%N"
1589965452.555909867
1589965452.557905242
$ echo "scale=3; 52.557905242-52.555909867"|bc -l
.001995375
$ date +"%s.%N" ; cat -A < ../../lanzarote-com_de-ausfluge.html >/dev/null ; date +"%s.%N"
1589965484.325613841
1589965484.327429474
$ date +"%s.%N" ; cat -A < ../../lanzarote-com_de-ausfluge.html >/dev/null ; date +"%s.%N"
1589965486.343199781
1589965486.344948441
$ date +"%s.%N" ; cat -A < ../../lanzarote-com_de-ausfluge.html >/dev/null ; date +"%s.%N"
1589965487.671542468
1589965487.673274394
$ date +"%s.%N" ; cat -A < ../../lanzarote-com_de-ausfluge.html >/dev/null ; date +"%s.%N"
1589965488.949716252
1589965488.951385296
$ echo "scale=3; 88.951385296-88.949716252"|bc -l
.001669044
$ echo "scale=3; 87.673274394-87.671542468"|bc -l
.001731926
$ echo "scale=3; 86.344948441-86.343199781"|bc -l
.001748660
$ echo "scale=3; 84.327429474-84.325613841"|bc -l
.001815633

So I guess you are measuring the time it takes to read the file? cat -A by itself does not do any parsing or validation.

The usage would be:

$ cat -A < lanzarote-com_de-ausfluge.html|sed -re 's/M-CM-6/oe/gi' -e 's/M-BM-[;\+]/"/gi'

which converts:

--2020-05-16 12:04:24-- http://www.lanzarote.com/de/ausfluge AuflM-CM-6sen des Hostnamen M-BM-;www.lanzarote.com (www.lanzarote.com)M-BM-+... 134.213.166.105 Verbindungsaufbau zu www.lanzarote.com (www.lanzarote.com)|134.213.166.105|:80... verbunden.

into ->

--2020-05-16 12:04:24-- http://www.lanzarote.com/de/ausfluge Aufloesen des Hostnamen "www.lanzarote.com (www.lanzarote.com)"... 134.213.166.105 Verbindungsaufbau zu www.lanzarote.com (www.lanzarote.com)|134.213.166.105|:80... verbunden.

By contrast, an output like:

$ target/release/text-sanitizer_v2 < lanzarote-com_de-ausfluge.html

--2020-05-16 12:04:24-- http://www.lanzarote.com/de/ausfluge Aufl\u{c3}\u{b6}sen des Hostnamen \u{c2}\u{bb}www.lanzarote.com (www.lanzarote.com)\u{c2}\u{ab}... 134.213.166.105 Verbindungsaufbau zu www.lanzarote.com (www.lanzarote.com)|134.213.166.105|:80... verbunden.

is even counterproductive because of the reserved characters like \ and {}, which would result in an even more unreadable construct like:

$ target/release/text-sanitizer_v2 < lanzarote-com_de-ausfluge.html|sed -re 's#\\u\{c3\}\\u\{b6\}#oe#g' -e 's#\\u\{c2\}\\u\{[ab]b\}#"#g'

to get the same result:

--2020-05-16 12:04:24-- http://www.lanzarote.com/de/ausfluge Aufloesen des Hostnamen "www.lanzarote.com (www.lanzarote.com)"... 134.213.166.105 Verbindungsaufbau zu www.lanzarote.com (www.lanzarote.com)|134.213.166.105|:80... verbunden.

That is why I stated that the proposed solution does not produce the required result.

I would like to suggest using the lovely Hyperfine command line benchmarking tool:

It's written in Rust and can be installed easily with

$ cargo install hyperfine

This installs hyperfine into your ~/.cargo/bin folder (which you should add to your $PATH). Then run your benchmark command like this:

$ hyperfine --warmup 3 'target/release/text-sanitizer -i < lanzarote-com_de-ausfluge.html.x100 > /dev/null'
Benchmark #1: target/release/text-sanitizer -i < lanzarote-com_de-ausfluge.html.x100 > /dev/null
  Time (mean ± σ):      14.8 ms ±   0.4 ms    [User: 12.9 ms, System: 2.0 ms]
  Range (min … max):    14.0 ms …  16.1 ms    194 runs

Hyperfine will take care of running the command 3 times to pre-warm your system (I think this mostly affects the disk cache with this benchmark), and it will then run the benchmark program repeatedly, keeping track of the timings.

Here I tested with the code from the playground and against a file with 100 copies of the Lanzarote flights (since the execution time is tiny otherwise). It was prepared with these two commands:

$ wget -S -O lanzarote-com_de-ausfluge.html http://www.lanzarote.com/de/ausfluge
$ for i in $(seq 100); do cat lanzarote-com_de-ausfluge.html >> lanzarote-com_de-ausfluge.html.x100; done
2 Likes