Rust as a High Level Language


#68

I am open-minded to your points and hope to see the use cases where Rust shines. Per my prior reply to arielb1, maybe iterators were not such a great idea, yet they are well matched to the low-control over resource conflicts, so I think that could possibly serve as an example of:

I don’t know if a well designed GC functional programming language could displace the C++ lineage entirely. I am giving it some thought. And I don’t claim to be omniscient. The community collectively knows more than any one of us.


Blog Post: Lifetime Parameters in Rust
#69

I tried searching for this phrase, but Google finds this thread plus neighbouring “Why are … mut … required …” one.

Give more direct link to the discussion about interference between those features please.


#70

I have sketched a proposal to make Rust the afaik first programming language in the world that can afaics completely solve Wadler’s famous Expression Problem. This would I think perhaps elevate Rust to one of the best high-level programming languages.

Note my 2 cents opinion is I am currently loathing the apparent tsuris pita of low-level resource lifetime management, especially when I am thinking it can be better encapsulated another way.


#71

I think that while Rust has an amazing ability to abstract (much like Haskell), it shines best in these situations (disclaimer: I have not actually written code in these fields):

  • Highly concurrent code where zero-copy messaging is a performance requirement.
    Rust can safely share mutable data structures across threads, without resorting to deep copying. I know of no other production-grade language that can do this, and I suspect that it can be a massive performance win.

  • Low-level code where the resources for a garbage collector do not exist
    GC’d languages cannot run everywhere. You will never see a garbage collected language running on in a small embedded device; these languages simply cannot run there. Rust can, because it uses manual memory management and allows to program without the heap where needed. Rust may someday be usable for safety-critical code, because it allows for completely deterministic resource management and living within fixed, preallocated amounts of memory.

  • Code that provides services to other code.
    Rust excels at garbage collectors, memory managers, and other “backbone” code. Such code cannot be written in a garbage-collected language because of the obvious recursion.

  • Real-time code.
    You cannot write low-latency audio code in a GC’d language. GC pauses would be devastating. You can write such code in Rust. The same goes for kernel-mode interrupt handlers.

  • Highly performance-critical code
    Rust is fast. And it stays fast even when high-level abstractions are used, and without losing safety. In most languages, one either does not have memory safety or must give it up for speed. This is not true in Rust. In the benchmarks game, Rust and Swift each win 4 benchmarks against the other, but Rust does so with no unsafe code, while Swift is compiled with -Ounchecked, which I believe turns off array bounds checks, among others. Furthermore, Rust wins by a higher safety margin.

  • Libraries
    Rust’s minimal runtime requirements (the part of libstd and its dependencies that are used by the application/library, assuming dead-code elimination is done by the linker or LTO is used) and lack of a runtime to initialize mean that Rust excels at writing libraries to be used by C or other code that have a C API. You cannot do this at all in Swift I believe.

Rust obliterates the competition when it comes to such tasks. That it is good at other areas is a testament to how well it is designed. Indeed, I find that it is quite good in other fields. But it will never be as productive as (say) OCaml, or F#.


Most coveted Rust features
#72

I agree it is important to identify Rust’s focus and strengths.

Unless I am reading too much between the lines, you are ostensibly oversimplifying the issue of concurrency.

First of all, let’s establish for all readers that parallelism is not concurrency{1}. Parallelism is partitioning the algorithm into chunks that can run independently for performance. Concurrency is the avoidance of blocking a thread, for performance and to reduce latency of interactivity. For performance, we prefer to use the minimum number of threads because each thread adds task switching and stack memory overhead, yet we also don’t want fewer threads running than there are CPU cores{2}.

Concurrency is not achieved just by having tasks running on separate threads and passing data between them. Rather concurrency is fundamentally about avoiding all thread sequencing primitives{3} and instead forming an inductive algebra code path DAG (directed acyclic graph) of asynchronous futures{4}, or inverting the control into a reactionary coinductive algebra dependency DAG driven by an event loop{5}. In the former case, afaics we must have either GC or ARC (with strong and weak references) else I visualize that the resource lifetime management is going to infect everything and get far too hairy to be justified except where GC performance is absolutely intolerable.

In the asynchronous DAG model, the code path (distinguished from thread) can be entire blocked while awaiting the future (thus no possibility for a race condition) or it can split with one code path continuing immediately and thus in this case one of the splits has to receive an immutable reference. I don’t really see how mutable resource borrowing can help us because in the former case the code is blocked and in the later case it doesn’t wait for the future to return? Seems we merely need to have our future enforce that one of the splits takes an immutable reference to any data that both of the splits can reference.

And I am conjecturing that we will not get clean code without native support for generators, coroutines, and/or continuations. I intend to dig deeper into the details of that in the future to test my intuition with detailed examples.

YouTube links below skips into the same video, Sean Parent: Better Code: Concurrency:

  1. https://existentialtype.wordpress.com/2014/04/09/parallelism-and-concurrency-revisited/
    https://www.youtube.com/watch?v=32f6JrQPV8c#t=511

  2. https://www.youtube.com/watch?v=32f6JrQPV8c#t=951
    https://www.youtube.com/watch?v=32f6JrQPV8c#t=3766
    https://www.youtube.com/watch?v=32f6JrQPV8c#t=4495

  3. https://www.youtube.com/watch?v=32f6JrQPV8c#t=531
    https://www.youtube.com/watch?v=32f6JrQPV8c#t=52
    https://www.youtube.com/watch?v=32f6JrQPV8c#t=684

  4. https://www.youtube.com/watch?v=32f6JrQPV8c#t=2164

  5. https://www.youtube.com/watch?v=32f6JrQPV8c#t=3288
    https://www.youtube.com/watch?v=32f6JrQPV8c#t=3632


Blog Post: Lifetime Parameters in Rust
#73

Agreed. Garbage collected languages require 5 times more memory to achieve the same performance as explicit memory deallocation. With only three times as much memory, the collector runs on average 17% slower than explicit memory management. However, with only twice as much memory, garbage collection degrades performance by nearly 70%.

Incorrect.

And note that immutability can make GC of temporary objects much more efficient:

Haskell’s computation model is very different from that of
conventional mutable languages. Data immutability forces us to produce a
lot of temporary data but it also helps to collect this garbage
rapidly. The trick is that immutable data NEVER points to younger
values. Indeed, younger values don’t yet exist at the time when an old
value is created, so it cannot be pointed to from scratch. And since
values are never modified, neither can it be pointed to later. This is
the key property of immutable data.

This greatly simplifies garbage collection (GC). At anytime we
can scan the last values created and free those that are not pointed to
from the same set (of course, real roots of live values hierarchy are
live in the stack). It is how things work: by default, GHC uses
generational GC. New data are allocated in 512kb “nursery”. Once it’s
exhausted, “minor GC” occurs - it scans the nursery and frees unused
values. Or, to be exact, it copies live values to the main memory area.
The fewer values that survive - the less work to do. If you have, for
example, a recursive algorithm that quickly filled the nursery with
generations of its induction variables - only the last generation of the
variables will survive and be copied to main memory, the rest will be
not even touched!

Javascript is an example that immutability isn’t the only case of creating too many temporary objects.

I realize Rust can do static analysis of explicit memory deallocation, but tangentially note even reference counting memory management can’t be assumed to be real-time:

Not real-time
Naive implementations of reference counting do not in general
provide real-time behavior, because any pointer assignment can
potentially cause a number of objects bounded only by total allocated
memory size to be recursively freed while the thread is unable to
perform other work. It is possible to avoid this issue by delegating the
freeing of objects whose reference count dropped to zero to other
threads, at the cost of extra overhead.

And reference counting memory management isn’t faster than GC:

Here are the numbers I got on my dual proc PIII-600:

ref_gc: 531ms
ref_rm: 3563ms
ref_rs: 844ms

And reference counting won’t free the dangling circular references that GC will.

Edit: non-crashing “memory leaks” with GC and Rust’s resources lifetimes are caused by higher-level semantics and data structures which have unmatched add/remove items{1}. And given that Rust’s lifetimes are more unsafe than GC, then Rust may have crash (accessing freed memory) memory leaks which GC doesn’t have. Edit#3: the prior sentence is incorrect, as evident by the linked discussion.

Edit#2: on further thought the following paragraph from my prior edit is untrue, except the italicized part remains true:

Although afaik it is possible to model and check these higher-level resource lifetimes with Rust’s checker, afaik _Rust can’t insure the programmer will or that the code will be reasonably manageable in certain circular dependency imperative spaghetti scenario_s. Another interesting point is that Rust’s lifetime checker could be useful even alongside GC.

  1. http://stackoverflow.com/questions/10577534/can-you-have-memory-leaks-with-a-garbage-collector
    http://c2.com/cgi/wiki?MemoryLeakUsingGarbageCollection

Blog Post: Lifetime Parameters in Rust
Inversion-of-control resource management
Blog Post: Lifetime Parameters in Rust
Blog Post: Lifetime Parameters in Rust
#74

You cannot have circular references in rust:

fn main() {
    let x;
    let y;
    x = &y;
    y = &x;
}

<anon>:5:9: 5:11 error: mismatched types:
 expected `_`,
    found `&&_`
(cyclic type of infinite size) [E0308]
<anon>:5     y = &x;
                 ^~

So Rust code is safe from these kind of problems, unless you write an unsafe code block, in which case you are at least warned about these kind of problems. So it looks like (safe) Rust code prevents more memory leaks than a garbage collector.

The Sean Parent talk on data structures goes to some lengths to explain why incidental-data-structures are a bad thing. We want data structures to be connected, non-circular, logically-disjoint and owning. As locality matters we should prefer to use arrays or vectors.

So you should not need to write unsafe code, and Rust helps you write good code that adheres to the above principles.


#75

Don’t spread incorrect notions. You can totally have circular references as long as no destructors are involved.

use std::cell::Cell;

#[derive(Copy, Clone)]
struct Ref<'a> {
    data: &'a Cell<Option<Ref<'a>>>
}

fn main() {
    let mut v = Vec::new();
    v.push(Cell::new(None));
    v.push(Cell::new(None));
    v[0].set(Some(Ref { data: &v[1] }));
    v[1].set(Some(Ref { data: &v[0] }));
}

#76

Sorry I got it wrong. Why didn’t the example I tried work? Perhaps I should rephrase that and say you cannot have circular references statically. Perhaps even that using ‘Cell’ is to be avoided?

It certainly doesn’t change the goal of having no incidental data structures.


#77

The example code won’t leak memory like an RC cycle will. It’s also memory safe.

The reason it didn’t work is right there in the error message; it can’t infer the data type.


#78

Presumably when ‘v’ goes out of scope, it all gets cleaned up because the ‘Cell’ is owned by the vector. I guess what I meant to say is you cannot get a cycle of ‘owning’ references in Rust. You also cannot get a cycle of ‘borrows’, but you can get a cycle of non-owning references (using Cell or raw pointers). I think I can claim I over-simplified it, rather than getting it completely wrong :slight_smile:


#79

There’s still RC cycles to worry about…


#80

I would have thought RC as it represents ‘shared ownership’ should not be allowed to have cycles? That would fit with the semantics of ownership (and we are just talking about sharing it after all). I guess I don’t agree with the design choice to allow cycles in shared ownership. (but me not liking it doesn’t change the fact that people have to deal with it, if that is how Rust is currently implemented).


#81

There was sort of a proposal that would use the typesystem in a complicated way to prevent some kinds of cycles/leaks with Rc, but it was decided against.

Defining what a “memory leak” even is is difficult, and memory leaks are memory safe. Rust’s focus is very specific; we don’t claim to solve all problems. :smile:


#82

Please kindly :slight_smile: do not quote me out-of-context. I didn’t write that Rust will allow statically checked owned circular references. I wrote that sometimes we encounter circular dependencies required by the most straightforward implementation of an algorithm, and then I presumed we would have to use unsafe code or mangle our algorithm sufficiently to appease Rust.

The others covered the other potential points. Seems I agree with all that was written.


#83

So in conclusion compared to GC, Rust’s lifetimes incur more hassles and force you to declare code unsafe to handle circular or multiple mutable references, but have the advantage:

In other words, as I documented upthread, GC can achieve the same average throughput performance as explicit memory mgmt but requires 3 - 5 times more memory and/or it can attain the same average (or maximum) pauses latency but with a loss in average throughput. Rust’s lifetime management can optimize both but at the cost of more hassles and complexity.

I have an additional thought which is that most software that is not for mission critical applications typically will have those sort of semantic “memory leaks” I mentioned upthread, thus asymptotic memory leak performance may dominate in both GC and Rust’s lifetimes for these sort of programs and thus Rust may not have a significant advantage over GC for these non-mission critical apps. Does that make sense? Note as mentioned at one of my upthread links, C++ would simply crash sooner than reach asymptotic memory leak performance because of accessing freed memory instead of maintaining a circular reference that is no longer accessible by the program (even though still accessible by the chain of references from the variables on the stack).


Blog Post: Lifetime Parameters in Rust
#84

The eternal GC vs. manual-memory-management comparison depends on your usage pattern. High-performance code (in all languages) avoids most memory allocation overhead by intelligent use of object pools. Ownership/borrowing makes object pool usage quite safe and easy in Rust.

GC is generally much faster in “Lisp-style” code - if you have lots of tiny objects referencing each-other and frequently created and destroyed. Manual memory management is generally faster in “kernel-style” code - when you have a large heap of large objects that are rarely created or destroyed.


#85

I understood all you wrote, except this part. Why faster? Do you mean less hassle to code?

Do you mean the generational GC can be more optimized to recycle and free these temporal objects?


#86

When you have lots of object churn and only few objects leaving eden / young gen / whateveryounameit, GC can become arena allocation + a bit of copying, whereas manual management must free each object by itself.


#87

I see. That is what I thought, but now I understand how to phrase it better. Efficiency is gained through batching because the GC has an orthogonal statistical or batch perspective, versus point-by-point allocation and deallocation of explicit memory mgmt.