Rust as a High Level Language

Eager coercions are a very hackish feature. They interact terribly with inference, causing order dependency and other craziness. If you add too many of them, type inference becomes so loosely coupled that it tears itself apart (see functions-destroy-closure-inference).
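
For contrast, here is a minimal sketch (my own illustration, not from the post above) of how Rust's deliberately restricted coercions behave today: they fire only at sites where the expected type is already known, which is part of what keeps inference order-independent:

fn main() {
    let owned = String::from("hi");

    // The expected type is annotated, so the &String -> &str
    // coercion happens right here, at a known coercion site.
    let s: &str = &owned;

    // Without an annotation, inference settles on `&String` for `t`;
    // nothing later retroactively re-coerces it. The next line works
    // only because it is itself a fresh, annotated coercion site.
    let t = &owned;
    let u: &str = t;

    println!("{} {} {}", s, t, u);
}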

Rust's original approach was to use Rc (then @) everywhere so that nobody had to think about memory management. It's just that we found Rust's ownership/borrowing semantics to be sufficiently nice that we started using them everywhere, though @ still exists.

I think that you're arguing with a strawman here. I don't think anyone thinks that Rust is a be-all, end-all language to replace all use cases.

I think that all that is claimed are the following:

  1. GC is also not a be-all, end-all answer to memory and resource management; see the fact that C and C++ still exist when there are many GC'd languages to choose from
  2. Purely functional programming is also not a be-all, end-all answer to problems of memory management, resource management, and parallelism
  3. It is possible to provide performance characteristics, including a mental model to judge and tweak performance characteristics, similar to C or C++, without exposure to the very unsafe semantics that they have
  4. It is possible to provide higher level constructs and nice tools that are commonly used in functional programming in such a language conveniently, efficiently, and safely
  5. The tools used to provide such type and memory safety are also useful for organizing code better than you might in either C, C++, or higher-level GC'd languages; the notions of who owns an object, who holds a unique borrowed reference to it, and who holds a shared immutable reference to it are not made explicit in most other languages, but they can be very useful for keeping track of your semantics and of the rules about who is allowed to do what (sketched in the snippet after this list).
  6. This combination of high-level features, some good thought on API design, and tools that help you structure your code a little better can also make Rust suitable beyond the use cases you might expect if you focused only on point 3. Point 3 is what most people think of as the reason for using Rust, the thing that makes it stand out; but the supply of reasonable high-level abstractions, the tools that help you reason about ownership and sharing, and some thoughtful API and ecosystem design around them make Rust appropriate for tasks outside the core niche that differentiates it the most.
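
As a minimal sketch of point 5 (the type and function names are made up purely for illustration), note how the signatures alone document who owns, who may mutate, and who may only read:

struct Account { balance: i64 }

// Shared immutable reference: any number of readers may hold one.
fn audit(account: &Account) -> i64 {
    account.balance
}

// Unique mutable borrow: exactly one writer at a time.
fn deposit(account: &mut Account, amount: i64) {
    account.balance += amount;
}

// Ownership transfer: the caller gives the account up for good.
fn close(account: Account) -> i64 {
    account.balance
}

fn main() {
    let mut acct = Account { balance: 100 };
    deposit(&mut acct, 50);
    println!("audited: {}", audit(&acct));
    println!("final: {}", close(acct)); // `acct` is moved; further use is a compile error
}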

I don't think anyone thinks that Rust's approach is going to be employed everywhere, just that you shouldn't think of it as a one-trick pony for low-level, performance-sensitive stuff where you want safety, and that it's actually applicable to a much broader range of applications.

Please continue that discussion in the correct thread. I've replied to you there:

My point was that higher-level optimizations or abstractions can eliminate the overhead (runtime and programming) for both in some cases.

For example from the other thread:

I am open-minded to your points and hope to see the use cases where Rust shines. Per my prior reply to arielb1, maybe iterators were not such a great idea, yet they are well matched to low-level control over resource conflicts, so I think that could possibly serve as an example of:

I don't know if a well designed GC functional programming language could displace the C++ lineage entirely. I am giving it some thought. And I don't claim to be omniscient. The community collectively knows more than any one of us.

I tried searching for this phrase, but Google finds only this thread plus the neighbouring "Why are ... mut ... required ..." one.

Please give a more direct link to the discussion about the interference between those features.

I have sketched a proposal to make Rust what is, as far as I know, the first programming language in the world that can completely solve Wadler's famous Expression Problem. This would, I think, elevate Rust to one of the best high-level programming languages.

Note, my two cents: I currently loathe the apparent hassle (tsuris) of low-level resource lifetime management, especially when I think it can be better encapsulated another way.

I think that while Rust has an amazing ability to abstract (much like Haskell), it shines best in these situations (disclaimer: I have not actually written code in these fields):

  • Highly concurrent code where zero-copy messaging is a performance requirement.
    Rust can safely share mutable data structures across threads without resorting to deep copying. I know of no other production-grade language that can do this, and I suspect that it can be a massive performance win (see the first sketch after this list).

  • Low-level code where the resources for a garbage collector do not exist
    GC'd languages cannot run everywhere. You will never see a garbage-collected language running on a small embedded device; these languages simply cannot run there. Rust can, because it uses manual memory management and allows programming without the heap where needed. Rust may someday be usable for safety-critical code, because it allows for completely deterministic resource management and for living within fixed, preallocated amounts of memory.

  • Code that provides services to other code.
    Rust excels at implementing garbage collectors, memory managers, and other "backbone" code. Such code cannot itself be written in a garbage-collected language because of the obvious circularity.

  • Real-time code.
    You cannot write low-latency audio code in a GC'd language. GC pauses would be devastating. You can write such code in Rust. The same goes for kernel-mode interrupt handlers.

  • Highly performance-critical code
    Rust is fast. And it stays fast even when high-level abstractions are used, without losing safety. In most languages, one either does not have memory safety or must give it up for speed. This is not true in Rust. In the benchmarks game, Rust and Swift each win 4 benchmarks against the other, but Rust does so with no unsafe code, while Swift is compiled with -Ounchecked, which I believe turns off array bounds checks, among other checks. Furthermore, Rust wins by a wider margin.

  • Libraries
    Rust's minimal runtime requirements (only the parts of libstd and its dependencies that the application or library actually uses, assuming dead-code elimination is done by the linker or LTO is used) and the lack of a runtime to initialize mean that Rust excels at writing libraries with a C API, to be consumed by C or by any other language that can call C. I believe you cannot do this at all in Swift (see the second sketch after this list).
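
On the zero-copy point, here is a minimal sketch (my own illustration, not from the post above) of moving a large buffer between threads through a channel: ownership of the heap allocation transfers, the receiver may freely mutate it, and the bytes themselves are never copied:

use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    let producer = thread::spawn(move || {
        // A large buffer; `send` moves ownership of the allocation.
        let big = vec![0u8; 10_000_000];
        tx.send(big).unwrap();
    });

    // Receiving hands us the same allocation, now uniquely owned,
    // so we can mutate it without any locking or deep copy.
    let mut big = rx.recv().unwrap();
    big[0] = 1;
    println!("received {} bytes without copying them", big.len());
    producer.join().unwrap();
}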

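On the libraries point, a minimal sketch (the crate setup and function name are hypothetical): build the crate with crate-type = ["cdylib"] in Cargo.toml, and C code can call straight into the exported symbol, with no runtime to start up first:

// In lib.rs of a crate built as a cdylib:
#[no_mangle]
pub extern "C" fn rust_add(a: i32, b: i32) -> i32 {
    // Wrap on overflow, matching what a C caller would typically expect.
    a.wrapping_add(b)
}

// A C caller would declare it as: int32_t rust_add(int32_t a, int32_t b);
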
Rust obliterates the competition when it comes to such tasks. That it is good in other areas too is a testament to how well it is designed. Indeed, I find that it is quite good in other fields. But it will never be as productive as (say) OCaml or F#.

I agree it is important to identify Rust's focus and strengths.

Unless I am reading too much between the lines, you seem to be oversimplifying the issue of concurrency.

First of all, let's establish for all readers that parallelism is not concurrency{1}. Parallelism is partitioning an algorithm into chunks that can run independently, for performance. Concurrency is the avoidance of blocking a thread, both for performance and to reduce the latency of interactivity. For performance, we prefer to use the minimum number of threads, because each thread adds task-switching and stack-memory overhead, yet we also don't want fewer threads running than there are CPU cores{2}.

Concurrency is not achieved just by having tasks running on separate threads and passing data between them. Rather, concurrency is fundamentally about avoiding all thread-sequencing primitives{3} and instead either forming a code-path DAG (directed acyclic graph) of asynchronous futures (an inductive algebra){4}, or inverting the control into a dependency DAG driven by an event loop (a reactive, coinductive algebra){5}. In the former case, afaics we must have either GC or ARC (with strong and weak references), else I visualize that the resource lifetime management is going to infect everything and get far too hairy to be justified, except where GC performance is absolutely intolerable.

In the asynchronous DAG model, the code path (as distinguished from the thread) can either be entirely blocked while awaiting the future (thus no possibility of a race condition), or it can split, with one code path continuing immediately; in that case one of the splits has to receive an immutable reference. I don't really see how mutable resource borrowing can help us, because in the former case the code is blocked, and in the latter case it doesn't wait for the future to return. It seems we merely need our future to enforce that one of the splits takes an immutable reference to any data that both of the splits can reference (a rough sketch follows below).

And I am conjecturing that we will not get clean code without native support for generators, coroutines, and/or continuations. I intend to dig deeper into the details of that in the future to test my intuition with detailed examples.
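
As a rough sketch of the "split" case (using std::thread::spawn as a stand-in for a future, since this is only my illustration): after the split, both code paths hold shared immutable references to the data, which is exactly what the borrow rules would enforce:

use std::sync::Arc;
use std::thread;

fn main() {
    // Data that both code paths will read after the "split".
    let data = Arc::new(vec![1, 2, 3, 4]);

    // One code path continues in the spawned "future"; it receives
    // only a shared immutable handle to the data.
    let data_for_worker = Arc::clone(&data);
    let future = thread::spawn(move || data_for_worker.iter().sum::<i32>());

    // The other code path continues immediately, also only reading.
    println!("len = {}", data.len());

    // Blocking on the result is the join point of the DAG.
    let sum = future.join().unwrap();
    println!("sum = {}", sum);
}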

The YouTube links below skip to different timestamps within the same video, Sean Parent: Better Code: Concurrency:

  1. Parallelism and Concurrency, Revisited | Existential Type
    https://www.youtube.com/watch?v=32f6JrQPV8c#t=511

  2. https://www.youtube.com/watch?v=32f6JrQPV8c#t=951
    https://www.youtube.com/watch?v=32f6JrQPV8c#t=3766
    https://www.youtube.com/watch?v=32f6JrQPV8c#t=4495

  3. https://www.youtube.com/watch?v=32f6JrQPV8c#t=531
    https://www.youtube.com/watch?v=32f6JrQPV8c#t=52
    https://www.youtube.com/watch?v=32f6JrQPV8c#t=684

  4. https://www.youtube.com/watch?v=32f6JrQPV8c#t=2164

  5. https://www.youtube.com/watch?v=32f6JrQPV8c#t=3288
    https://www.youtube.com/watch?v=32f6JrQPV8c#t=3632

Agreed. Garbage-collected languages require five times more memory to achieve the same performance as explicit memory deallocation. With only three times as much memory, the collector runs on average 17% slower than explicit memory management. And with only twice as much memory, garbage collection degrades performance by nearly 70%.

Incorrect.

And note that immutability can make GC of temporary objects much more efficient:

Haskell's computation model is very different from that of conventional mutable languages. Data immutability forces us to produce a lot of temporary data, but it also helps to collect this garbage rapidly. The trick is that immutable data NEVER points to younger values. Indeed, younger values don't yet exist at the time when an old value is created, so they cannot be pointed to from the start; and since values are never modified, they cannot be pointed to later either. This is the key property of immutable data.

This greatly simplifies garbage collection (GC). At any time we can scan the most recently created values and free those that are not pointed to from within the same set (of course, the real roots of the live-value hierarchy live in the stack). This is how it works: by default, GHC uses a generational GC. New data is allocated in a 512 KB "nursery". Once it's exhausted, a "minor GC" occurs: it scans the nursery and frees unused values, or, to be exact, copies live values to the main memory area. The fewer values that survive, the less work there is to do. If, for example, you have a recursive algorithm that quickly filled the nursery with generations of its induction variables, only the last generation of the variables will survive and be copied to main memory; the rest will not even be touched!

JavaScript is an example that immutability isn't the only cause of creating too many temporary objects.

I realize Rust can do static analysis of explicit memory deallocation, but note tangentially that even reference-counting memory management can't be assumed to be real-time:

Not real-time
Naive implementations of reference counting do not in general provide real-time behavior, because any pointer assignment can potentially cause a number of objects, bounded only by total allocated memory size, to be recursively freed while the thread is unable to perform other work. It is possible to avoid this issue by delegating the freeing of objects whose reference count dropped to zero to other threads, at the cost of extra overhead.
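
To make the "unbounded work at a single free" point concrete in Rust terms, a minimal sketch (my illustration): dropping the head of this owned list frees every node in one cascade, so the cost of that single drop is proportional to the whole structure (and the default recursive drop can even overflow the stack for much longer lists):

struct Node {
    _payload: u64,
    next: Option<Box<Node>>,
}

fn main() {
    let mut head = None;
    for i in 0..10_000u64 {
        head = Some(Box::new(Node { _payload: i, next: head }));
    }
    drop(head); // one statement, 10,000 cascading deallocations
}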

And reference counting memory management isn't faster than GC:

Here are the numbers I got on my dual proc PIII-600:

ref_gc: 531ms
ref_rm: 3563ms
ref_rs: 844ms

And reference counting won't free unreachable circular references, which GC will.

Edit: non-crashing "memory leaks", with GC and with Rust's resource lifetimes alike, are caused by higher-level semantics and data structures which have unmatched add/remove of items{1}. And given that Rust's lifetimes are more unsafe than GC, Rust may have crashing memory leaks (accessing freed memory) which GC doesn't have. Edit #3: the prior sentence is incorrect, as is evident from the linked discussion.

Edit #2: on further thought, the following paragraph from my prior edit is untrue, except that the italicized part remains true:

Although afaik it is possible to model and check these higher-level resource lifetimes with Rust's checker, afaik _Rust can't ensure that the programmer will, or that the code will be reasonably manageable in certain circular-dependency imperative spaghetti scenarios_. Another interesting point is that Rust's lifetime checker could be useful even alongside GC.

  1. Can you have memory leaks with a garbage collector? - Stack Overflow
    http://c2.com/cgi/wiki?MemoryLeakUsingGarbageCollection

You cannot have circular references in Rust:

fn main() {
    let x;
    let y;
    x = &y;
    y = &x;
}

<anon>:5:9: 5:11 error: mismatched types:
 expected `_`,
    found `&&_`
(cyclic type of infinite size) [E0308]
<anon>:5     y = &x;
                 ^~

So Rust code is safe from these kinds of problems, unless you write an unsafe code block, in which case you are at least warned about these kinds of problems. So it looks like (safe) Rust code prevents more memory leaks than a garbage collector does.

The Sean Parent talk on data structures goes to some lengths to explain why incidental data structures are a bad thing. We want data structures to be connected, non-circular, logically disjoint, and owning. As locality matters, we should prefer arrays or vectors.

So you should not need to write unsafe code, and Rust helps you write good code that adheres to the above principles.

Don't spread incorrect notions. You can totally have circular references as long as no destructors are involved.

use std::cell::Cell;

#[derive(Copy, Clone)]
struct Ref<'a> {
    data: &'a Cell<Option<Ref<'a>>>
}

fn main() {
    let mut v = Vec::new();
    v.push(Cell::new(None));
    v.push(Cell::new(None));
    v[0].set(Some(Ref { data: &v[1] }));
    v[1].set(Some(Ref { data: &v[0] }));
}

Sorry, I got it wrong. Why didn't the example I tried work? Perhaps I should rephrase and say that you cannot have circular references statically. Perhaps even that using 'Cell' is to be avoided?

It certainly doesn't change the goal of having no incidental data structures.

The example code won't leak memory like an Rc cycle will. It's also memory safe.

The reason it didn't work is right there in the error message: the compiler can't infer the data's type, because it would be a cyclic type of infinite size.

Presumably when 'v' goes out of scope, it all gets cleaned up because the 'Cell' is owned by the vector. I guess what I meant to say is you cannot get a cycle of 'owning' references in Rust. You also cannot get a cycle of 'borrows', but you can get a cycle of non-owning references (using Cell or raw pointers). I think I can claim I over-simplified it, rather than getting it completely wrong :slight_smile:

There are still Rc cycles to worry about...
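
For instance, a minimal sketch of the kind of Rc cycle that leaks (the node type is invented for illustration): two reference-counted nodes point at each other, so when both local handles go out of scope each strong count stays at 1 and neither destructor ever runs:

use std::cell::RefCell;
use std::rc::Rc;

struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

fn main() {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    // `a` and `b` now form a cycle; dropping both handles leaves each
    // node's strong count at 1, so the memory is never reclaimed.
}

Downgrading one of the links to std::rc::Weak would break the cycle and avoid the leak.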

I would have thought Rc, as it represents 'shared ownership', should not be allowed to have cycles. That would fit with the semantics of ownership (and we are just talking about sharing it, after all). I guess I don't agree with the design choice to allow cycles in shared ownership (but my not liking it doesn't change the fact that people have to deal with it, if that is how Rust is currently implemented).

There was sort of a proposal that would use the type system in a complicated way to prevent some kinds of cycles/leaks with Rc, but it was decided against.

Even defining what a "memory leak" is can be difficult, and memory leaks are memory safe. Rust's focus is very specific; we don't claim to solve all problems. :smile: