Blog: Why not to use Rust

I lack the historical perspective that it would take to be sure, but I strongly get the impression that as of today, the ISO C++ standardization committee is much more focused on the needs of a small number of language gurus than on those of the vast majority of C++ users.

Take a look at the C++17 feature set, for example. For this standard revision, the C++ committee started with plenty of important usability improvements on the table, such as modules (aka fixing the broken include system), concepts (aka fixing the broken template system), or the Concurrency TS and Coroutines (aka fixing the broken futures and enabling standard and pleasant asynchronous library interfaces). These all had the potential to greatly improve the everyday experience of using C++.

Instead, they chose to focus on things like the Parallel STL (aka duplicating the thousands of data-parallel libraries out there so that gurus can do with one less dependency), elliptic integrals and Bessel functions (aka duplicating the thousands of advanced math libraries out there so that the gurus can drop another dependency), allowing static asserts to be undocumented (because the gurus can't be bothered to write documentation), or more constexpr stuff (aka engraving some old and popular compiler optimizations in stone so that the gurus can guarantee that they occur).

I hope the gurus are happy.

9 Likes

I think adding good Concepts, Modules, etc. to C++ is hard, and that's the most important reason such features keep slipping past the C++ standard dates.

4 Likes

This is true; however, one should not procrastinate replacing broken windows by thinking that putting pretty curtains behind them will be enough.

4 Likes

… Meanwhile, the HPC compilers are probably still stuck on C++03 :P

It's not easy, but they will try hard, for many many years to come, and they succeed on some things, because C++ is like a big blob of mud that's rolling. And if you look at the conferences you will see things like:

Type safety, as shown here, is just a small piece of Rust. Other pieces are visible in other talks and libraries.

2 Likes

For me, that blog post translates to "the only reason to not use Rust is if something/someone won't let you."


Regarding the C++ discussion, when I started programming the only viable oss version control system was cvs. It was horrible, but better than nothing. Then subversion was created and it was like a breath of fresh air, because it did the same thing well. Then alternatives exploded and among them git emerged as this amazing, amazing game-changer because it changed the whole approach to version control, enabling amazing things.

To me, Rust is that git-like game-changer of systems programming languages because it changes the whole approach, enabling amazing things.

I am curious to see if there is another wave of things better than git/Rust in my lifetime.

20 Likes

There are a few things that I think are reasons not to use Rust (not that there is anything better right now, but I think there can be something better). I think lifetimes are overly complex, and there are too many passing conventions. You could get away with a RAII-style mechanism that uses pass by reference when calling a function and move semantics when returning things, resulting in a much simpler language with the same memory safety. I think the restrictions on write references are a problem: holding multiple write references to a collection (for example, to swap items) is safe in many cases that Rust does not allow. Finally, I think the type system is a bit ad hoc, and Rust does not strike the best balance between keeping the type system simple and symmetrical and choosing the language semantics to match (I think the work on Chalk shows how the type system does not fit nicely as a consistent logic).

For me, at the moment Rust has the right idea with type-safety, but rejects too many safe programs, resulting in the style I want to write in getting rejected. It's easier for me to write in C++ right now.

1 Like

Let me try to comment on some of these points:

While I hate the syntax with passion and really hope we can come up with something easier to understand someday, I think the concept of lifetimes is sound and needed.

Recall that one design goal of Rust is to make it hard for people to break a function's interface when only changing its implementation. For this, you really need something to express, in a function's signature, that given two inputs, a function returns a reference to one of them (or to something else, like a global variable). This is part of the interface contract for callers of the function, because it limits what kind of objects can be used as input. That's the same reason why there is no (full) type inference in function signatures.

The problem here is that there are good use cases both for moving input into a function, and for returning a reference, which would be hard to emulate if both of these options were disallowed.

Moving input into a function means that the function takes ownership of the input. That is appropriate when inserting data in containers, and whenever a piece of data is sent to someone else (another thread, an output peripheral...). When you are inserting a piece of data into a container, you usually don't want to make an unsynchronized copy of it, but rather to move it there and keep it there, so any copies should be explicit. Similarly, when sending a piece of data to the outside world, you're usually done with it, and don't want to use it again, so moving is the proper semantic as well.
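To make these two cases concrete, here is a minimal sketch. The helper names `store` and `send_to_worker` are made up for illustration; the channel is the standard library's `std::sync::mpsc`.

```rust
use std::sync::mpsc;
use std::thread;

/// Takes ownership of `item` and keeps it in the container.
/// (Hypothetical helper, for illustration only.)
fn store(shelf: &mut Vec<String>, item: String) {
    shelf.push(item); // moved into the Vec, the string data is not copied
}

/// Sends `msg` to another thread and returns the length that thread saw.
/// (Hypothetical helper, for illustration only.)
fn send_to_worker(msg: String) -> usize {
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || {
        let received: String = rx.recv().unwrap();
        received.len()
    });
    tx.send(msg).unwrap(); // `msg` is moved into the channel
    worker.join().unwrap()
}

fn main() {
    let mut shelf = Vec::new();
    let title = String::from("The Rust Book");
    store(&mut shelf, title.clone()); // an explicit copy, because `title` is still needed
    store(&mut shelf, title); // a move: `title` cannot be used after this line
    // println!("{title}"); // would not compile: value used after move
    println!("{} items stored", shelf.len());
    println!("worker saw {} bytes", send_to_worker(String::from("hello")));
}
```

The point is that any copy has to be spelled out with `clone()`, while the default is a move.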

Returning a reference is useful whenever you're returning a different view of the input data. Consider, for example, iterators which extract subsets of a string according to various criteria (word by word, in blocks of N code points...). Needing to create a new string for each of these subsets is extremely inefficient, and totally unneeded: just return a view of the original string's data. But to implement such a view, you need pointers/references. A similar use case is container lookup functions: when searching for something in a container, you don't want to make an unnecessary copy of it when returning the result, but if your functions are required to return by move, you will need to either do that, or worse, move the content out.
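A short sketch of both use cases, using only standard library calls. The function names `words` and `find_by_prefix` are hypothetical, chosen for this example.

```rust
/// Returns the words of `text` as slices borrowed from `text`.
/// (Hypothetical helper, for illustration only.)
fn words(text: &str) -> Vec<&str> {
    text.split_whitespace().collect()
}

/// Looks up the first entry starting with `prefix` and returns a reference
/// to it, so the entry is neither copied nor moved out of the container.
/// (Hypothetical helper, for illustration only.)
fn find_by_prefix<'a>(entries: &'a [String], prefix: &str) -> Option<&'a String> {
    entries.iter().find(|e| e.starts_with(prefix))
}

fn main() {
    let text = String::from("borrow checker says no");
    // Each word is a view into `text`; no new String is built per word.
    println!("{:?}", words(&text));

    let entries = vec![String::from("alpha"), String::from("beta")];
    println!("{:?}", find_by_prefix(&entries, "be"));
}
```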

Indeed, Rust's borrow checking definitely needs improvements in its handling of composite objects, from arrays to collections. I think the current rules were chosen because they are easy to implement while achieving the stated goal of forbidding memory unsafety (e.g. by modifying a collection as someone holds an iterator to it), but I would expect borrow gurus like @nikomatsakis to be open to suggestions on how to handle this kind of legitimate "partial borrow" use case better.
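For what it's worth, the usual workaround today for slices is `split_at_mut`, which hands out two non-overlapping mutable halves. A minimal sketch, where `two_mut` is a made-up helper name (for a plain swap, the standard `slice::swap` already covers it):

```rust
/// Returns mutable references to two distinct elements of a slice.
/// Hypothetical helper: it panics unless `i < j` and `j` is in bounds.
fn two_mut<T>(items: &mut [T], i: usize, j: usize) -> (&mut T, &mut T) {
    assert!(i < j && j < items.len());
    // `left` covers indices 0..j, `right` starts at index j.
    let (left, right) = items.split_at_mut(j);
    (&mut left[i], &mut right[0])
}

fn main() {
    let mut scores = vec![10, 20, 30];
    // Taking `&mut scores[0]` and `&mut scores[2]` and then using both at
    // once is rejected by the borrow checker, although it would be safe.
    let (a, b) = two_mut(&mut scores, 0, 2);
    std::mem::swap(a, b);
    println!("{:?}", scores);
}
```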

4 Likes

Obviously it's hard to cover all the details in a short post, but I believe there are reasonable solutions to those problems you mention.

For example with returning objects, any object local to the function should be returned using move semantics, anything passed in with move semantics should be returned by move, anything passed by reference should be returned by reference.

When passing objects into a function, I don't want the function definition to be different, so the caller should specify move or reference, the called function signature should not change.

Regarding containers, there are internal storage containers, where you always have to copy data in (like an array). If we disallow references from being stored in containers, then I think the above rules are safe, and you don't need lifetimes. So far this seems straightforward.

Now it gets a bit difficult, and this is just an early idea without too much thought as to how to proceed:

So the only time I think you need lifetimes is when storing a reference in a container. If the lifetime of a reference is the stack frame of the function the referenced object is defined in, then we can automatically assign the lifetime to any collection as the lowest numbered reference lifetime stored to the collection. We would then limit code to those situations where this can be determined statically. If you cannot determine it statically, then you cannot prove the use of the collection safe, and you need to copy or have garbage collection.

That pretty much is the case today already.

[quote="keean, post:16, topic:11388"]
When passing objects into a function, I don't want the function definition to be different, so the caller should specify move or reference, the called function signature should not change.
[/quote]
Caller choosing move vs reference can be accomplished using generic functions.
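As a rough sketch of what I mean, here is one way to do it with the standard `AsRef` trait. The function name `shout` is made up for the example.

```rust
/// Accepts an owned `String`, a `&String` or a `&str`: the caller decides
/// whether to move or to borrow, and the signature stays the same.
/// (Hypothetical function, for illustration only.)
fn shout<S: AsRef<str>>(s: S) -> String {
    s.as_ref().to_uppercase()
}

fn main() {
    let owned = String::from("move me");
    let kept = String::from("borrow me");
    println!("{}", shout(owned)); // `owned` is moved into the call
    println!("{}", shout(&kept)); // `kept` is only borrowed
    println!("{kept}"); // still usable here
}
```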

There are cases where you, as the function author, want to dictate the ownership semantics of arguments. Sure, you could take that flexibility away but you'd be losing legitimately useful functionality.

I might be misunderstanding you, but disallowing containers to store references is very limiting and would create performance hazards.

How about values storing references to other values? It's not just containers. The notion of a lifetime is always there as long as references are allowed. Other languages have this too, it's just not part of the type system (and so compiler doesn't point out mistakes) or there's a GC. But even in GC'd languages you often need to consider lifetimes - not for memory safety, but for leak safety.
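A small sketch of a value that stores a reference. The `Cursor` type and its methods are hypothetical, invented for this example; the lifetime parameter `'a` is what tells the compiler that a `Cursor` must not outlive the text it points into.

```rust
/// A value that stores a reference to some text owned elsewhere.
/// (Hypothetical type, for illustration only.)
struct Cursor<'a> {
    text: &'a str,
    pos: usize, // byte offset of the next unread character
}

impl<'a> Cursor<'a> {
    fn new(text: &'a str) -> Self {
        Cursor { text, pos: 0 }
    }

    /// Returns the next whitespace-separated word, borrowed from the
    /// original text rather than from the cursor itself.
    fn next_word(&mut self) -> Option<&'a str> {
        let text: &'a str = self.text;
        let rest = text[self.pos..].trim_start();
        if rest.is_empty() {
            return None;
        }
        let start = text.len() - rest.len();
        let len = rest.find(char::is_whitespace).unwrap_or(rest.len());
        self.pos = start + len;
        Some(&text[start..start + len])
    }
}

fn main() {
    let text = String::from("ab  cd");
    let mut cursor = Cursor::new(&text);
    while let Some(word) = cursor.next_word() {
        println!("{word}");
    }
    // Dropping `text` while `cursor` is still in use would be a compile error.
}
```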

Now, the current borrow checker isn't perfect and does reject some cases that you "know" to be safe. The issue is those cases aren't currently expressible in the language, and so you need workarounds. But any perceived limitations of the existing borrow checker shouldn't be viewed as evidence of lifetimes being unnecessary.

2 Likes

All this is enforced today by the borrow checker. And in many of these cases, you can do this without any explicit lifetime annotations, thanks to lifetime elision. These annotations become necessary, however, as soon as the lifetime of something is ambiguous when looking only at the function's signature.

Consider this function, as a trivial example:

fn return_ref(x: &T, y: &U) -> &V {
    /* ... */
}

What is the lifetime of the &V that is being returned? Is it that of x? That of y? The intersection of both of these lifetimes? The lifetime of something else, like a global variable? Without explicit lifetime annotations, the interface contract is ambiguous, and interface ambiguity is the enemy of maintainable code as it makes the validity of caller code implementation-dependent.
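Here is a sketch of how explicit annotations settle the question, using `&str` in place of the generic `T`, `U` and `V` above. The function names are made up for the example.

```rust
/// The result is tied to `x` only, so `y` may be short-lived.
/// (Hypothetical function, for illustration only.)
fn pick_first<'a>(x: &'a str, _y: &str) -> &'a str {
    x
}

/// The result may come from either input, so both must live for `'a`.
/// (Hypothetical function, for illustration only.)
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

fn main() {
    let long_lived = String::from("long-lived");
    let result;
    {
        let short_lived = String::from("tmp");
        result = pick_first(&long_lived, &short_lived);
        // Calling `longest` here instead would be rejected, because its
        // signature says the result may borrow from `short_lived`.
        println!("{}", longest(&long_lived, &short_lived));
    }
    println!("{result}");
}
```

The caller's code is accepted or rejected based on the signatures alone, whatever the function bodies do.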

That would require function implementations to be built to work with both values and references. But these have very different interface contracts, and taking the intersection of all of these contracts would be limiting for the function implementation:

  • A value allows you to move, something which you cannot do with either &mut or & references.
  • An &mut reference allows you to modify the target data, an & reference doesn't.
  • An & reference can be freely duplicated (as that only results in read-only aliasing), something which has very different semantics and may be impossible when working with values and &mut.

It is thus fair for a function to require that it be called solely with value, & or &mut parameters, whenever it needs the specific features of one of these parameter passing modes.
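A minimal sketch of three functions that each need exactly one of these modes. All names are hypothetical.

```rust
/// By value: the function owns `log` and may move it somewhere else.
fn archive_log(log: Vec<String>, shelf: &mut Vec<Vec<String>>) {
    shelf.push(log);
}

/// By `&mut`: the function may modify the data, but cannot move it away.
fn append(log: &mut Vec<String>, line: &str) {
    log.push(line.to_string());
}

/// By `&`: read-only access, and the reference may be duplicated freely.
fn count_errors(log: &[String]) -> usize {
    log.iter().filter(|l| l.starts_with("ERROR")).count()
}

fn main() {
    let mut log = vec![String::from("INFO start")];
    append(&mut log, "ERROR disk full");

    let view_a = &log;
    let view_b = view_a; // copying a shared reference is fine
    println!("{} {}", count_errors(view_a), count_errors(view_b));

    let mut shelf = Vec::new();
    archive_log(log, &mut shelf); // `log` is moved and unusable afterwards
    println!("{} archived", shelf.len());
}
```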

So, you would forbid people from having containers of Box for dynamic dispatch purposes, for example? As that is one trivial use case for storing references inside of a container.
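To illustrate the kind of code I have in mind, here is a sketch that assumes a container of boxed trait objects. The `Shape` trait and the two structs are invented for the example.

```rust
/// Hypothetical trait, for illustration only.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

/// Sums the areas through dynamic dispatch: each element is a pointer
/// to a value whose concrete type is only known at run time.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> = vec![Box::new(Square(2.0)), Box::new(Rect(1.0, 3.0))];
    println!("{}", total_area(&shapes));
}
```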

2 Likes

I think it may seem like Rust's ownership model and the various ways to pass things around carry overwhelming (and seemingly unnecessary) complexity, but I think that's just part of the learning curve. Once beyond that curve, they're legitimately useful features that allow conveying desirable semantics to the compiler, which then helps in enforcing them (and can optimize codegen using them). That's not to say the implementation of this system is without quirks, as mentioned, but I think the idea is sound. The implementation will likely improve over time too.

And of course Herb Sutter's talk on "Leak Freedom by default" echoes this, calling for the adoption of some common idioms beyond RAII for leak-free C++ code:
https://github.com/CppCon/CppCon2016/blob/master/Presentations/Lifetime%20Safety%20By%20Default%20-%20Making%20Code%20Leak-Free%20by%20Construction/Lifetime%20Safety%20By%20Default%20-%20Making%20Code%20Leak-Free%20by%20Construction%20-%20Herb%20Sutter%20-%20CppCon%202016.pdf

They truly try hard, but I don't expect quantum jumps by the C++ community.

1 Like

If a reference is part of the type system, rather than an argument annotation, you can pass everything by value (as references are values).

I may be missing something, but as far as I can tell, a reference is part of Rust's type system. You can build a struct with reference members (see e.g. iterators), or parametrize a generic entity with a reference, and instances of such types are values (with a finite lifetime). Could you clarify what you are thinking about here?

My mistake, you are right. It was the specialisation rules for type classes that stopped me treating values and references uniformly. But in this case your comment here is misleading:

The problem here is that there are good use cases both for moving input into a function, and for returning a reference, which would be hard to emulate if both of these options were disallowed.

And my comment about too many passing conventions was wide of the mark too. Essentially there is only one passing convention, and that is pass by value. A reference type &A is just an example of a value no different to an Int (ignoring lifetimes). The compiler chooses to copy or move depending on whether Copy is defined.

So I don't think that is part of the problem, and it was a bit of a red herring.
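A tiny sketch of that "everything is pass by value" view. The function `pass_by_value` is a made-up name; it takes any `T` by value, and whether the argument is copied or moved depends only on whether `T` implements `Copy`.

```rust
/// Takes any value and hands it back. (Hypothetical function.)
fn pass_by_value<T>(value: T) -> T {
    value
}

fn main() {
    let n = 7;
    let r: &i32 = &n;
    let r2 = pass_by_value(r); // `&i32` is Copy, so `r` stays usable
    println!("{} {}", r, r2);

    let s = String::from("owned");
    let s2 = pass_by_value(s); // `String` is not Copy, so `s` is moved
    // println!("{s}"); // would not compile: value used after move
    println!("{s2}");
}
```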

Ah, yes, sorry about this. Coming from a C++ background, I have learned to keep a strong mental distinction between passing parameters/returning results by value versus by reference, and I keep thinking in these terms when writing Rust code even though Rust's references are really first-class values.

So that leaves me with the lifetime annotations, the type system, and exclusive write references as my main problems with Rust. Also a failure to distinguish between immutability and read-only references. If lifetimes can be fully inferred then I find them less of a problem.

So, basically, lifetimes are inferred when...

  1. They don't affect a function's interface semantics (e.g. a function with no output, or which returns a value of unbounded lifetime like an i32)
  2. There is a good default choice based on established usage (either the function only has one input, in which case the lifetime of the output is inferred to be that of the input, or it is an object method, in which case the lifetime of the output is inferred to be that of the object ("self"))
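The two rules above can be sketched as follows. All function and type names are invented for the example.

```rust
// Rule 1: the output holds no reference, so there is nothing to annotate
// even with two reference inputs.
fn total_len(a: &str, b: &str) -> usize {
    a.len() + b.len()
}

// Rule 2, single input: the output is inferred to borrow from `text`.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

// The same signature with the inferred lifetime written out.
fn first_word_explicit<'a>(text: &'a str) -> &'a str {
    text.split_whitespace().next().unwrap_or("")
}

struct Config {
    name: String,
}

impl Config {
    // Rule 2, method: the output is inferred to borrow from `self`,
    // not from `prefix`.
    fn name_if_prefixed(&self, prefix: &str) -> Option<&str> {
        if self.name.starts_with(prefix) {
            Some(self.name.as_str())
        } else {
            None
        }
    }
}

fn main() {
    println!("{}", total_len("ab", "cde"));
    println!("{}", first_word("hello world"));
    println!("{}", first_word_explicit("hello world"));
    let config = Config { name: String::from("rust-forum") };
    println!("{:?}", config.name_if_prefixed("rust"));
}
```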

In other cases, Rust does not attempt to impose a default, because as you saw in my example above (a function with two input references returning one reference), there may be no obvious good answer, and the more things Rust makes implicit based on arbitrary criteria, the more cognitive load it imposes on programmers. Implicit vs explicit is a difficult balance.

Regarding exclusive write references, it's certainly one of the most opinionated features of Rust, but I also think that although it lacks polish (e.g. disallows &mut to two independent container elements, which is safe), it is a very sensible one in general. Let's see what it buys us:

  • Programmers do not need to worry about aliasing ever again. Screw memcpy vs memmove, iterator invalidation, spooky action at a distance where touching one variable has an impact on another... life just became so much simpler.
  • Compilers do not need to worry about aliasing either, so you can scrap that broken restrict keyword and enjoy more reliable optimization "for free". Better autovectorization and code reordering, data that stays in registers instead of taking round trips through caches and main memory, etc.
  • And, of course, there is also everyone's favorite: data races. Gone for good. People can finally stop regarding multithreading as black voodoo magic and start realizing that every CPU is multicore today and punting on threads is not a reasonable option.
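As a sketch of the last point, here is a slice mutated from two threads at once, using the standard library's scoped threads. The function name is made up. It compiles only because the two `&mut` halves are known not to overlap.

```rust
use std::thread;

/// Doubles every element, using one thread per half of the slice.
/// (Hypothetical function, for illustration only.)
fn double_in_parallel(data: &mut [u64]) {
    let mid = data.len() / 2;
    // Two exclusive references to disjoint halves of the same slice.
    let (left, right) = data.split_at_mut(mid);
    thread::scope(|s| {
        s.spawn(|| left.iter_mut().for_each(|x| *x *= 2));
        s.spawn(|| right.iter_mut().for_each(|x| *x *= 2));
    });
    // Giving both threads access to the whole of `data` mutably would be
    // rejected at compile time instead of racing at run time.
}

fn main() {
    let mut data = vec![1, 2, 3, 4, 5];
    double_in_parallel(&mut data);
    println!("{:?}", data);
}
```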

Regarding the type system and the failure to distinguish between immutability and read-only references, I do not understand your arguments here well enough to have a discussion. Could you elaborate a bit on these two?

1 Like

In other cases, Rust does not attempt to impose a default, because as you saw in my example above (a function with two input references returning one reference), there may be no obvious good answer, and the more things Rust makes implicit based on arbitrary criteria, the more cognitive load it imposes on programmers. Implicit vs explicit is a difficult balance.

There needs to be a principal type for types including lifetimes.

Regarding exclusive write references, it's certainly one of the most opinionated features of Rust, but I also think that although it lacks polish (e.g. disallows &mut to two independent container elements, which is safe), it is a very sensible one in general.

Or the compiler can track the aliases. Rust has an affine type system, but you could use a linear-alias type system.

Regarding the type system and the failure to distinguish between immutability and read-only references, I do not understand your arguments here well enough to have a discussion. Could you elaborate a bit on these two?

Mutability is a property of an object, whereas Readable, Writable, are properties of a reference to an object (a reference is a kind of access token, see capability based security). I think there needs to be a distinction between l-values and r-values as well. A 'value' is immutable, and cannot be referenced (is an r-value, has no location / has referential transparency), a 'container' is mutable, and can be referenced (is an l-value, has a location / does not have referential transparency).