Did Rust make the right choice about error handling?

However, as a long-time C/C++ user, with experience in other compiled languages as well, one of the first things I did in Rust was to recreate some programs I had previously written in C/C++, just to get a handle on how I would get on with Rust and, of course, to get a feel for how it performs.

It turned out that, with little Rust experience and not much thought about optimization, my Rust results easily matched and sometimes surpassed those of C/C++.

That's interesting to me. Did you prefer exceptions or error codes while using C++? And did you switch to enums when recreating your programs in Rust?

Note that Rust doesn't have a GC and can even run without a runtime. Embedded systems are one of the most popular areas where Rust is used.

Also keep in mind that Rust is designed for zero-cost abstractions. Types like Option<Box<T>> optimize to a nullable pointer, identical to one you'd get from malloc.
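To make that concrete, here is a tiny check you can run yourself (the 1024-byte payload is just an arbitrary example):

use std::mem::size_of;

fn main() {
    // Box<T> can never be null, so Option<Box<T>> uses the null bit pattern
    // for None and stays the size of a single pointer.
    assert_eq!(size_of::<Option<Box<[u8; 1024]>>>(), size_of::<usize>());
    assert_eq!(size_of::<Box<[u8; 1024]>>(), size_of::<usize>());
    println!("Option<Box<_>>: {} bytes", size_of::<Option<Box<[u8; 1024]>>>());
}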

2 Likes

Yes, Rust is not GC-oriented, and that's why all this sum-type machinery, memcpying data around again and again, is inefficient. It could be efficient if all potentially large data were allocated on the heap, so that we would just be copying pointers around. But allocation is a pretty expensive operation in non-GC languages.

Option<Box<T>> being a nullable pointer is great, but Option<T> in general could have an arbitrarily large size.

You have an incorrect impression that it is inefficient.

  • Rust uses inlining a lot (even across crates. It can even inline across languages). It optimizes out copies in most cases, and structs that don't escape functions even get optimized out entirely into separate variables.
  • Heap allocation has its cost too.
  • Given how expensive cache misses are, pointer indirection can be even more costly than copying.
  • Rust doesn't have constructors. fn new() -> Self is only a convention. You can return fn new() -> Box<Self>, write fn init(&mut Self), or use MaybeUninit where it matters and where the optimizer doesn't do it for you automatically (a small sketch of these options follows this list).
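For example, a minimal sketch with made-up names showing those three conventions side by side (which one is appropriate depends entirely on the use case):

struct Buffer {
    data: [u8; 4096],
}

impl Buffer {
    // The usual convention: return by value and let the optimizer decide
    // where the value actually gets built.
    fn new() -> Self {
        Buffer { data: [0; 4096] }
    }

    // Hand back a heap allocation when the value is large or long-lived.
    fn new_boxed() -> Box<Self> {
        Box::new(Buffer { data: [0; 4096] })
    }

    // In-place (re)initialization through a mutable reference.
    fn init(&mut self) {
        self.data.fill(0);
    }
}

fn main() {
    let a = Buffer::new();
    let b = Buffer::new_boxed();
    let mut c = Buffer::new();
    c.init();
    println!("{} {} {}", a.data[0], b.data[0], c.data[0]);
}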

I've written a bit more on upsides and downsides of such abstractions, compared to C: https://kornel.ski/rust-c-speed

10 Likes

Thank you, I'll read it.
But am I wrong when I say that Rust can't do anything about this:

struct S {
   a1: [i32; 1024],
   a2: [i32; 1024]
}
fn f1() -> Result<[i32; 1024], SomeError> { ... }
fn f2() -> Result<[i32; 1024], SomeError> { ... }
fn f() -> Result<S, SomeError> {
    Ok(S { a1: f1()?, a2: f2()? })
}

Actually, fn init(&mut Self) -> Status is the "awfully unsafe" way we all do it in C (and sometimes in C++, where we have tremendous divergence in programming styles) :slight_smile: This effectively requires some "partially-formed" state for Self.

And again, the standard library extensively uses the -> Self convention; for example, many functions take a factory FnOnce() -> T.
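A couple of everyday examples of that FnOnce() -> T factory shape from std (the specific names and values here are arbitrary):

use std::collections::HashMap;

fn main() {
    // Option::unwrap_or_else takes an FnOnce() -> T factory and only calls
    // it when there is no value.
    let name: String = std::env::var("USER").ok().unwrap_or_else(|| "anonymous".to_string());

    // HashMap's Entry::or_insert_with takes the same kind of factory.
    let mut cache: HashMap<&str, String> = HashMap::new();
    cache.entry("greeting").or_insert_with(|| format!("hello, {name}"));

    println!("{cache:?}");
}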

I have almost never used exceptions. They really don't fit the embedded real-time systems I have been involved in.
Don't get me started. Philosophically I don't like exceptions.

The whole motivation behind exceptions is to allow one to write one's business logic, to concentrate on what one likes to think one's program will do, without having lots of fiddly error checking and handling code obscuring that logic. Error situations are therefore swept under the carpet with "try" and kept out of sight with "catch".

However, in my world view failure is not exceptional; it is a common happening, and it's too important to be hidden away. Therefore failure handling should be in one's face in the code you write. Certainly in the face of those who read it.

Besides, so many times I have seen exceptions used for non-exceptional situations. For example, I don't count failure to find a file the user has specified as exceptional; it is a regular happening. Using exceptions for handling normal behavior only obscures the code paths and makes code difficult to reason about. Totally the opposite of what they were created for.

Whenever I think about it long enough, I conclude that the only truly unexpected, exceptional things worthy of using exceptions for are bugs in one's code or hardware failures. But then the best thing to do is die immediately rather than stagger along in an indeterminate state and cause more damage.

That is before we get into looking at performance, and behavioral determinism.

Yes, I use enums in Rust rather than special error values in return codes.
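In case it helps anyone following along, a bare-bones version of that style looks something like this (the names are made up):

#[derive(Debug)]
enum ConfigError {
    Missing,
    Malformed(String),
}

fn parse_port(line: Option<&str>) -> Result<u16, ConfigError> {
    // The failure paths are spelled out in the signature and at each step,
    // rather than being thrown past the caller.
    let line = line.ok_or(ConfigError::Missing)?;
    line.trim()
        .parse()
        .map_err(|_| ConfigError::Malformed(line.to_string()))
}

fn main() {
    println!("{:?}", parse_port(Some("8080")));
    println!("{:?}", parse_port(Some("not a number")));
    println!("{:?}", parse_port(None));
}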

10 Likes

I don't advocate the usage of exceptions. I am just trying to understand how the Rust analogs of your C++ programs could be faster, with all that introduced memcpying, compared to returning error codes and propagating pointers to uninitialized data.

There is nothing special about "error" return codes as opposed to "successful" ones. The common C pattern is

enum Status {
    SUCCESS,
    ERROR1,
    ...,
    ERRORN
};
enum Status init(struct S *struct_to_initialize);

Echoing a suggestion that was given earlier, you should probably write some benchmarks so that you have some actual data to present. That would be a more useful conversation than what is, from what I can tell, conjecture.

In my personal experience, most of what people think of as "efficient" is simply based on incomplete or incorrect data. An actual example: someone once observed that a function for decoding percent-encoded data, written in C, was a potential candidate for optimization. Their solution? Rewrite it in assembly. Given only this data, that may have been a reasonable conclusion. It still surprises me that the investigation didn't lead to another observation, namely that percent-encoding is not even required in the specific use case.

You see, percent-encoding was simply chosen as a serialization format for a character-delimited message protocol. Another reasonable conclusion would have been to replace that with a length-delimited protocol to optimize away the percent-encoding entirely.

The moral of the story is that going lower in the tech stack is not always the best way to optimize code, or even the best use of a developer's time. Beyond measurement and analysis, one should also look at the wider situation to find cheaper solutions. In other words, to answer the question, "why do we need this in the first place?" Understanding knock-on effects in a complex system will probably net you more gains than muddling over minutiae. Cf. Amdahl's law.
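If it helps get that started, here is a rough, dependency-free harness comparing the two shapes being debated in this thread. It is deliberately naive (a tool like criterion would do a better job), and the numbers will depend heavily on opt-level and inlining, so treat it as a starting point rather than evidence:

use std::hint::black_box;
use std::time::Instant;

const N: usize = 1024;

// Returns the whole array by value inside a Result.
fn by_value() -> Result<[i32; N], ()> {
    let mut a = [0i32; N];
    a[10] = black_box(100);
    Ok(a)
}

// Writes through a caller-provided buffer and only returns the status.
fn in_place(a: &mut [i32; N]) -> Result<(), ()> {
    a[10] = black_box(100);
    Ok(())
}

fn main() {
    let start = Instant::now();
    for _ in 0..1_000_000 {
        black_box(by_value().unwrap());
    }
    println!("by value: {:?}", start.elapsed());

    let mut buf = [0i32; N];
    let start = Instant::now();
    for _ in 0..1_000_000 {
        in_place(black_box(&mut buf)).unwrap();
    }
    println!("in place: {:?}", start.elapsed());
}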

11 Likes

It is perfectly possible to use partially uninitialized receivers in initialization functions, you just need to be explicit about it. Rust encourages the use of the type system itself much more extensively than C++, instead of special-casing particular operators and functions. If there are multiple initialization steps, you may want to add some newtype wrappers around &mut _ to represent them in the type system yourself. But the basic idea is:

fn init(this: &mut MaybeUninit<Self>) -> Result<&mut Self, Status> {
  // .. do your in-place initialization.
  step_a(this)?;
  step_b(this)?;
  // SAFETY: we just initialized this
  Ok(unsafe { &mut *this.as_mut_ptr() })
}
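For completeness, a call site for that sketch might look roughly like this (assuming init lives in an impl S block, Status is the error type above, and do_something is just a stand-in for whatever comes next):

use std::mem::MaybeUninit;

fn build() -> Result<(), Status> {
    let mut storage = MaybeUninit::<S>::uninit();
    // init fills `storage` in place and only hands back &mut S once every
    // field has been written. Note that MaybeUninit never runs S's
    // destructor on its own, which is fine here because S is plain arrays.
    let s: &mut S = S::init(&mut storage)?;
    s.do_something();
    Ok(())
}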
1 Like

Generally, types like Result<[i32; 1024], SomeError> are the exception rather than the rule. Moving large arrays around on the stack is generally considered a bad idea, and I imagine real-world code would look more like this:

pub struct S {
    a1: [i32; 1024],
    a2: [i32; 1024],
}

fn f1(arr: &mut [i32; 1024]) -> Result<(), Error> {
    arr[10] = 100;
    Ok(())
}

fn f2(arr: &mut [i32; 1024]) -> Result<(), Error> {
    arr[20] = 200;
    Ok(())
}

impl S {
    pub fn new() -> Result<Box<S>, Error> {
        let mut s = Box::new(S {
            a1: [0; 1024],
            a2: [0; 1024],
        });
        f1(&mut s.a1)?;
        f2(&mut s.a2)?;
        Ok(s)
    }
}
4 Likes

@vmgolovin I tried using godbolt to inspect the assembly rustc generates with your example; is it doing what you'd expect?

For comparison, I tried to write a direct C++ translation and got this.

They both seem to be doing the same number of copies, although Rust's use of the global offset table and vmovups makes it a bit harder to count the memcpy calls.

4 Likes

Oddly enough, since first starting with C in 1982 I don't recall seeing that pattern used anywhere in any project I have joined.

The common patterns are more like:

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream)

That does not even have an error return indication. One has to deduce it from the returned size.

Or

int open(const char *pathname, int flags);

Where the error indication is a special return value, "-1", and one still has to get the actual error cause from elsewhere.

Yes, real-world code would look more like this, but it requires that the [i32; 1024] (I use it just as an example of a big type) be constructed before it can actually be filled in, so we have to allow a partially-formed state for that type. Or we have to use MaybeUninit and destroy it manually (are we still programming in Rust at this point?..).

Arrays are allocated on the stack, vectors on the heap.

MaybeUninit allows uninitialized memory; you don't need to destroy a stack variable manually.
Besides, primitives do not have destructors, so you can omit that anyway.

Anyway if you need big data storage then you do the same as in C++:

  • Pass it by pointer/reference
  • Allocate it on heap

I'm not sure what's complicated about it.

Returning Result<(), Error> is perfectly fine because, depending on the layout, it can be the same size as Error itself (since enums can use spare values or padding for their discriminant).
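A quick way to see that on your own toolchain (the exact numbers are not a language guarantee, but this is what current rustc tends to do):

use std::mem::size_of;

#[allow(dead_code)]
enum Error {
    NotFound,
    PermissionDenied,
    Timeout,
}

fn main() {
    // Error only uses 3 of the 256 possible byte values, so Result<(), Error>
    // can typically store Ok(()) in one of the spare values and both sizes
    // come out the same.
    println!("Error:             {} byte(s)", size_of::<Error>());
    println!("Result<(), Error>: {} byte(s)", size_of::<Result<(), Error>>());
}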

2 Likes

This is a common UNIX pattern (involving errno). But when that is impossible, the error code is returned directly:
int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine) (void *), void *arg);
But files and threads are kernel-managed resources, so they must be allocated in kernel space.
When it is not necessary, we usually have something like this:
struct tm *gmtime_r(const time_t *timep, struct tm *result);

No. I can allocate it on the stack if I don't need a dynamic lifetime and the stack is big enough.

The complicated thing about it is that we want to minimize both allocation and memory moving. This is possible only by means of uninitialized objects or "effectively uninitialized" objects (Stepanov's partially-formed state).

I'm struggling to understand your point. Rust isn't restricted to C's calling convention and also is allowed to optimize memcpy and memmove away.

If in some cases it doesn't optimize well and profiling shows this causes a relevant performance hit, then Rust gives you the tools to optimize the code. I'm not sure how using optimizations in Rust, when and if necessary, is somehow "not Rust". That sounds like an arbitrary distinction that wouldn't be made in other low-level languages.

11 Likes

gmtime_r returns NULL when it fails rather than the pointer to the result it is supposed to return.

I count that as a special value mixed up in the return value.

The way I see it, in your example there is a function return value that happens to be an enum indicating the error. And there is the actual return value one wants, written to the pointer to the struct.

It's not clear to me that would be more efficient than returning a result in Rust. It's certainly not as nice to read or reason about.

This is no different from C/C++, aside from the fact that Rust makes using uninitialized memory more complicated.

I'm not sure how it is Rust's fault.
Rust falls into the same trap as C++, where you have to think about how to optimize your code.
Just because you write Rust or C++ doesn't make your code the fastest.

Rust has a lot of fancy stuff like slices, etc., for passing pointers to storage, so I'd say you should practice Rust a bit more before seeking to blame the tool.

P.S. Do not use examples of Rust code as guidance. If you need optimized code then write it and profile. :slight_smile: Just because the majority of the Rust community doesn't worry about the cost of a move doesn't make it the language's fault.

2 Likes

It is certainly more efficient than returning Result because, after the function exits, you have the result exactly where you want it to be (you passed in the address where it should be stored). If the function fails, it will not be there, and you can find that out via the return value/errno. If you get a Result, you have to move the successful result out of it.