Rust From Pure-Functional Scala

I've been using Scala with pure-functional libraries, mainly Cats, professionally for years. The learning curve is steep, but I have become convinced that the extra effort pure functional programming requires is worth it - never have I worked on large-scale systems that had so few bugs and were so easy to analyze (no need for a debugger, ever).

One thing about Rust that has intrigued me is the claim that the borrow-checker/ownership model makes immutability unnecessary - or, at least, makes the type of strict, managed state as manipulated in a monadic pipeline in pure FP unnecessary.

So the promise of getting the benefits of pure FP without the overhead (the learning curve plus the always-there overhead of everything being more effort) is quite tempting!

What I've found so far, though, is that the manual memory management required by Rust makes using it at least as time-consuming and complex as pure FP in Scala. (To some degree this will change as I develop more expertise, but tracking ownership, lifetimes, values vs. references, etc., certainly adds objective complexity.)

There is one related matter I am curious about:

Are there advantages of the Rust borrow-checker system over garbage-collection specifically in highly concurrent programs?

Obviously we have the constant-latency advantage, but is there anything more?

Also, I'd love to hear from anyone out there doing pure FP in Scala (or another language that supports it) who has moved to Rust.

3 Likes

Performance and safety gains

2 Likes

Rust does not have manual memory management (unless you opt in to highly unsafe abstractions). Rust has automatic memory management.

Citation needed. Rust's core data structures are no more complex than necessary.

The fact that there's no GC running all the time means that there are no GC pauses in Rust – the behavior related to memory management is thus much more predictable.

Apart from that, the fact that you can't write data races in (safe) Rust, thanks to the borrow checker and the other aspects of the type system, is probably much more valuable in most cases than micro-optimizing memory performance could be.
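
For example, a minimal sketch: the type system won't even let you share a plain mutable counter across threads; it forces you to reach for something like Arc<Mutex<_>> (or atomics):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Sharing a bare `&mut i32` across threads would not compile;
    // the compiler forces explicit synchronization instead.
    let counter = Arc::new(Mutex::new(0));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || *counter.lock().unwrap() += 1)
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    assert_eq!(*counter.lock().unwrap(), 4);
}
```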

11 Likes

Based on what? What's the theory here?

FYI, here is one little experiment I did: I created gRPC servers in Scala and Rust, each of which accepts a simple object and writes it to a MongoDB cluster, with high concurrency.

After VM warmup, the Scala version is about 70% faster in terms of RPS.

This is not surprising because the JVM is highly-tuned, and because this is a strongly IO-bound system.

My point is just that Rust is not always faster.

"Citation needed. Rust's core data structures are no more complex than necessary."

The extra complexity is in deciding when to store by value or by reference (even for objects as simple as strings), when to pass by value or by reference, etc. None of these questions exist in a GC'd language.

I think the plain inability to create data races is indeed very powerful. Pure FP concurrency systems give you that too.

What I'm trying to figure out is whether the guarantees of correctness, especially with regard to concurrency, that the Rust model provides out of the box are as powerful as a pure FP effects system like Cats Effect or ZIO. My gut feeling is that they are, but I'd love to hear from anyone who's gone down this path.

That's not very hard, though. You store stuff by value – your data structures should almost always contain values, not references. References should be treated as short-lived views into some pre-existing data, and be handed out only temporarily.
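
A minimal sketch of that discipline: the struct owns its String by value, and functions borrow a &str view only for the duration of a call:

```rust
// The struct owns its data by value.
struct Config {
    name: String,
}

// A temporary view into data someone else owns.
fn greet(name: &str) -> String {
    format!("hello, {name}")
}

fn main() {
    let cfg = Config { name: "world".to_string() };
    // Hand out a borrow for the duration of the call; `cfg` keeps ownership.
    println!("{}", greet(&cfg.name));
}
```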

GC'd languages may not have this dimension to their type system, but they introduce a whole lot of complexity precisely because of that (shared mutability is the big one, but confusion around "reference identity" and inadvertent leaks are also annoying - empirically, these are much worse to debug after the fact than it is to design with these questions in mind upfront).

2 Likes

Is it that simple? I've already been instructed to optimize structs by sharing strings between them using references.

There's also more complexity in knowing when to use dyn, Box, etc.

I'm just pointing this out - it's certainly not a criticism. It's an inevitable side effect of eschewing a garbage collector.

However, I'm certainly still a Rust novice and may not yet be grasping enough to see that this complexity largely falls away in practice...

Yes, all these problems exist, but all are eliminated with a pure-functional library.

(Using shared mutable state is the worst possible way to design any concurrent system IMO.)

2 Likes

I think I'm going to have to code a substantial concurrent subsystem in both Rust and ZIO (my new favorite pure FP library) and see what I think.

If there are any (ex) Pure FP Scala people out there, though, please chime in.

FWIW, the reason I originally started learning Rust was that I was trying to make a Minecraft clone in Scala but found no way to overcome constant GC freezes without making the code ugly.

To address the main thing you're talking about, I can't really think of a theoretical, fundamental reason why average throughput would have to be significantly faster for Rust-style memory management than JVM-style garbage collection. I could be wrong though.

For your gRPC example, it would be interesting for you to start a thread posting the code to get some other eyes on it. It's certainly possible that Rust is losing that one fair and square, but there are also some mistakes a new Rust developer can make that'll make their code run a lot slower than it could with some minor improvements.

One thing I might also mention, though, is that, while Rust's ownership model does solve the same problem as traditional functional programming immutability (that being unconstrained state mutation hairballs that are difficult to reason about), I feel that Rust's ownership model more easily handles side effects than the functional programming style. Side effects in pure functional programming, handled with monads and stuff, are pretty notorious. On the other hand, Rust's ownership model's ability to let you reason about in-place mutation in constrained ways makes it quite natural to, say, pass a &mut File into a function and expect that the function may write to the file, or pass a &mut Sender to a function and expect that the function may send messages down that concurrent queue. This may make it easier to construct Rust programs which use a broader variety of concurrency patterns, rather than the limited set of patterns such as fork-join that maps very nicely into functional programming.
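
For instance, a sketch of what that looks like in the types (the function names here are made up for illustration):

```rust
use std::fs::File;
use std::io::{self, Write};
use std::sync::mpsc::Sender;

// The &mut in the signature advertises the side effect: this function
// may write to the file, and callers can see that in the types.
fn log_line(file: &mut File, line: &str) -> io::Result<()> {
    writeln!(file, "{line}")
}

// Likewise, taking a Sender makes "this may push messages onto the
// queue" part of the visible contract.
fn report(tx: &Sender<String>, msg: &str) {
    // `send` only fails if the receiving end has been dropped.
    let _ = tx.send(msg.to_string());
}
```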

7 Likes

Obligatory "did you run with the --release flag?".

It would be nice if you could share your code. It's quite likely that you have made some novice errors which caused a drop in performance.

That's the problem: Rust gives you more control, which means you can usually make Rust programs faster than Java ones (and faster still than Scala, given its overhead). But it's just as easy to misuse that power and tank your performance. Rust puts much higher requirements on programmer skill than e.g. Java, which solves many hard memory management problems at the JVM level.

Storing references in structs is usually an antipattern. It has some valid use cases (e.g. a by-reference iterator for Vec<T> will of course store a reference to the buffer), but more often is caused by some misplaced premature optimization. The problem is that a struct with a lifetime is always restricted to exist within a single specific scope (e.g. a specific function), and can never move out of it. This is not a problem if it is already semantically restricted to that scope (e.g. it's a temporary view, like an iterator or a buffer slice), but usually that's not what you want, and it just causes problems.
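
A small illustration of why: a struct holding a reference is pinned to the data it borrows, so it can't be returned or stored anywhere longer-lived:

```rust
struct View<'a> {
    name: &'a str,
}

fn main() {
    let owned = String::from("payload");
    let view = View { name: &owned }; // `view` borrows from `owned`
    println!("{}", view.name);
    // Returning `view` from a function that owns `owned`, or pushing it
    // into some long-lived collection, would not compile: the struct
    // cannot outlive the String it borrows from.
}
```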

It may work if you're doing some zero-copy deserialization, where you just parse a message into a borrowing struct, do some operation on it and discard it, moving to the next message (e.g. you could be doing some packet filtering). Even in this case, you should consider whether a reference-counted buffer (like Bytes) would suit your case better.

Rust tends to have far less boxing than garbage-collected languages, since borrowing and stack-allocated values are used extensively.

Rust is ahead-of-time optimized (partially by the compiler, and partially by the programmer). This means that you are not beholden to the whims of the JIT. You don't need to wait for your code to warm up and optimize (which is very important for short-running processes, including serverless applications), and you don't get random performance regressions due to the randomness of JIT optimizations. The AoT compiler tends to produce very predictable output, and in any case you can benchmark your compiled artifacts before you put them in production, catching any regressions early (the usual caveats apply: not everything can be tested outside of production).

Smaller data structures with less indirection tend to give better cache utilization and less memory traffic.

14 Likes

The big difference is that GC only solves use-after-free for memory. It does nothing at all to help with use-after-free for sockets, files, locks, etc. And it turns out that once you have all of Rust's rules to help you not write mutexed data after it's unlocked, applying those rules to memory is comparatively easy.
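
A sketch of the lock case: references derived from a MutexGuard cannot outlive the guard, i.e. the unlock:

```rust
use std::sync::Mutex;

fn main() {
    let data = Mutex::new(vec![1, 2, 3]);

    let first = {
        let guard = data.lock().unwrap();
        guard[0] // copy the value out while the lock is held
        // `guard` is dropped here, releasing the lock. Trying to return
        // `&guard[0]` instead would not compile: the borrow cannot
        // outlive the guard.
    };

    println!("{first}");
}
```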

Rust's rules are arguably most useful in highly-concurrent programs, because &mut is such a strong statement. If you have one of those, you know that nobody else can be reading or writing it. That's nice in single-threaded programs, but it's in highly-concurrent ones where it really shines, because "nobody" also means "no other threads"! So if you have &mut, it's a guarantee that you don't even need to think about threading for that code. And similarly, a & to a type (well, to a type that doesn't transitively contain an UnsafeCell) means that it can't change while you hold that reference. That means it can safely be read non-atomically, whereas in GC languages it's typical that the language doesn't know whether something is shared, and thus it needs to always use atomic reads (definitely it needs them for objects, assuming a concurrent GC, but they tend to end up using them for integers too to prevent tearing).
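
A tiny example of that guarantee, using scoped threads:

```rust
use std::thread;

fn main() {
    let mut total = 0;

    thread::scope(|s| {
        // The spawned closure captures `total` by &mut: while it runs,
        // no other thread (including this one) may even read it.
        s.spawn(|| total += 1);
        // A second `s.spawn(|| total += 1)` here would be a compile
        // error: the exclusive borrow has already been handed out.
    });

    // The scope joins the thread, so the borrow has ended.
    assert_eq!(total, 1);
}
```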

Remember that the original reason Rust became anything other than a side project was to make it feasible to write parallel page layout for a browser. That's really its raison d'être more than anything else. And, in fact, pre-1.0 Rust actually had a garbage collector (and green threads, and other such things) because it was expected that they'd be needed to reasonably solve this problem. It's only later that they got removed, once all those "you're only allowed to read this so it's safe to share between threads" and "this allows writing so it can only be moved between threads, not shared" annotations turned into "hey, wait, these solve the memory allocation issues too!"

22 Likes

I have moved to Rust from Scala, though not doing pure FP.

The borrow-checker not only protects you from nulls and dangling pointers, but also from thread-unsafety. In Java/Scala, you usually have to assume that each object can be accessed by other threads through other references, and you therefore have to lock, which is expensive and introduces contention. Or you use something like Akka, which introduces a lot of extra complexity. I was particularly disillusioned with Akka when I used Akka Streams in the hope of using multiple threads (i.e. CPUs), but the resulting code still only used a single CPU, despite all the fanfare about reactive streams. Or when I spent two days finding the cause of a mysterious timeout, which was caused by a Future still not having completed even though the containing code had completed many hours ago.

And to talk about performance: the JVM is an interpreter, and interpreters are slow. And each object has an object header of at least 12 bytes, IIRC.

Things like performance and thread-safety are simply not high priorities for Scala. Scala's highest priorities are research (look, DOT calculus) and teaching (it must be clear to CS 101 students).

5 Likes

Yes, it is that simple. If you can elaborate on the specific advice you got, we can critique it, but in general, references are to be treated as temporary. There is also more than one way to "share strings" without references – one way is to use Rc or Arc, the reference-counted smart pointer types, which incur no requirement of thinking about lifetimes (since, unlike references, these types provide shared ownership).
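
For example (a sketch), two structs can share one string allocation through Arc<str>, with no lifetime parameters at all:

```rust
use std::sync::Arc;

struct User {
    name: Arc<str>,
}

struct AuditRecord {
    name: Arc<str>,
}

fn main() {
    let name: Arc<str> = Arc::from("Ada Lovelace");
    let user = User { name: Arc::clone(&name) };
    let record = AuditRecord { name };
    // Both structs share one allocation; Arc provides shared ownership,
    // so no lifetimes need to be tracked.
    assert_eq!(&*user.name, &*record.name);
}
```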

That is not related to the borrow checker. dyn is needed when you want dynamic dispatch. It has rare but legitimate use cases, primarily 1. when you must use dynamism because generics simply wouldn't work (eg. a router for an HTTP server must store function pointers of a uniform type, keyed by arbitrary strings), or 2. when you want to get rid of generics because eg. you don't want to expose some implementation-detail type parameters in your API.
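
A sketch of that first case (the route names are made up):

```rust
use std::collections::HashMap;

// Handlers of arbitrary concrete types, erased to one uniform type.
type Handler = Box<dyn Fn(&str) -> String>;

fn main() {
    let mut routes: HashMap<String, Handler> = HashMap::new();
    routes.insert("/hello".into(), Box::new(|body| format!("hi, {body}")));
    routes.insert("/echo".into(), Box::new(|body| body.to_string()));

    if let Some(handler) = routes.get("/hello") {
        println!("{}", handler("world"));
    }
}
```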

Box is needed when you want to heap-allocate a single object. Allocating a Box is the heap-allocated counterpart of simply storing an object by-value. It is usually needed when you want a recursive data structure – which is completely logical, given that a by-value recursive data structure would have infinite size.
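
For instance, a minimal recursive list:

```rust
// A by-value recursive type would have infinite size; the Box gives the
// compiler a fixed-size indirection to work with.
enum List {
    Cons(i32, Box<List>),
    Nil,
}

fn sum(list: &List) -> i32 {
    match list {
        List::Cons(head, tail) => *head + sum(tail),
        List::Nil => 0,
    }
}

fn main() {
    let list = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))));
    assert_eq!(sum(&list), 3);
}
```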

Indeed, when you don't know the language very well, it is entirely conceivable that you find it hard. That's not a problem with the language, though.

5 Likes

Note that this exists in GC'd languages too.

For example, in C# I can use void Consume(IEnumerable<int> seq) (which is like dyn Iterator<Item = i32>), but I can also do void Consume<T>(T seq) where T : IEnumerable<int> (which is like impl Iterator<Item = i32>).

Controlling exactly where dynamic dispatch happens (and doesn't happen!) is a big deal for high-performance code. So either you ignore it and hope that optimizations and branch prediction do a good enough job when everything is dynamic, or the language needs to offer a way to pick. That's true regardless of memory management strategy.

(See also final in Java, for example, to say that something should be statically dispatched instead of the default dynamic dispatch.)
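
In Rust, that choice might look roughly like this (a sketch):

```rust
// Dynamic dispatch: one compiled body; each `next` call goes through a vtable.
fn consume_dyn(seq: &mut dyn Iterator<Item = i32>) -> i32 {
    seq.sum()
}

// Static dispatch: monomorphized for each concrete iterator type.
fn consume_static(seq: impl Iterator<Item = i32>) -> i32 {
    seq.sum()
}

fn main() {
    assert_eq!(consume_dyn(&mut (1..=3)), 6);
    assert_eq!(consume_static(1..=3), 6);
}
```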

This is a great point, because once you're outside the "we have a GC; it'll be fine" zone, the languages tend to be missing the features that would help you do these things more easily.

For example, C# added ArrayPool<T> (in System.Buffers) to help high-performance code avoid GC costs. But C# doesn't have anything to make sure that, when you give an array back to the pool, you didn't save a reference to it somewhere. The language assumes that it's fine to just have references to it wherever, since after all the GC will clean it up once it's no longer referenced.

This is exactly a classic use-after-free problem -- putting the array back in the pool is "free"ing it, and the code that did that better not use it any more -- but of course C# doesn't have the language features to help avoid that.

Whereas in Rust the ownership and lifetimes model lets you set it up so that once you move something into the pool, Rust checks that nothing else is referencing it, even though the memory is still allocated.
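
A toy sketch of the same idea (the BufferPool type here is hypothetical): returning a buffer to the pool moves it, so using it afterwards is a compile error rather than a latent bug:

```rust
struct BufferPool {
    free: Vec<Vec<u8>>,
}

impl BufferPool {
    // Taking a buffer moves it out of the pool to the caller.
    fn take(&mut self) -> Vec<u8> {
        self.free.pop().unwrap_or_default()
    }
    // Giving it back moves it in; the caller's binding is consumed.
    fn give(&mut self, buf: Vec<u8>) {
        self.free.push(buf);
    }
}

fn main() {
    let mut pool = BufferPool { free: vec![] };
    let mut buf = pool.take();
    buf.extend_from_slice(b"hello");
    pool.give(buf);
    // buf.push(b'!'); // ERROR: `buf` was moved into the pool
}
```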

13 Likes

To begin, no offense, but you are missing something if you assume that even IO-bound code is going to be faster in Rust than on the JVM. That's not the case at all.

Again, at this point, the JVM is extremely highly-optimized - and the ability to recompile with new optimizations in realtime is an advantage non-VM languages do not have.

(20 years ago I demonstrated to quants at a hedge fund that Java could and did outperform C++ for most of what they were doing.)

The weaknesses of the JVM are memory footprint, cold start, and inconsistent latency due to GC, but, not overall throughput - definitely not for anything substantially IO-bound.

Your presumption that Scala will be less performant than Java is also incorrect. It depends on many factors. Pre-Java-7 GCs dealt poorly with the large number of short-lived objects that functional languages tend to produce, but that is no longer the case.

The differences in performance I was referring to here are "small" in that they are within a factor of two or three - definitely under an order of magnitude. They could be due to multiple things:

  • ZIO has an extremely efficient concurrency model based on fibers (green threads)
  • Maybe the gRPC & Mongo libs are more highly-optimized than the Rust counterparts
  • Etc.

I expected the IO-bound test to be close. What I didn't expect was that even in my CPU-bound test, Rust was only marginally faster - about 20% faster than the Scala/ZIO version.

Yes, it was a release build, and with no rookie mistakes, because those were corrected here. If you look at my profile you can see the thread - I posted the code and made the changes people suggested, which did make a difference, but not a huge one.

When I look at the serious benchmarks, Rust is often neck-and-neck with the JVM - but sometimes 10x faster. I haven't yet figured out what it is about those particular tests that make Rust perform so well.

I am talking about the data here:

Regarding performance in general, the small memory footprint of Rust executables, zero startup cost, and consistent latencies matter a lot in some contexts! So even when Rust's overall RPS throughput is "only" 20% higher, that is not always inconsequential.


For fuller disclosure, the main reasons I'm interested in Rust now are these:

  • I find the possibility that the advantages of pure functional programming can be had without its attendant complexity fascinating. Lambda calculus and category theory are beautiful things, but if a borrow-checking system makes them extraneous, that's amazing.

  • I am sick & tired of the fact that the Scala ecosystem and tools pretty much suck. In 2023 there isn't one Scala 3 IDE that works really well, all the time. (Yes, this is largely due to Scala's very powerful type system but a PITA is a PITA no matter the reason.)

  • Rust has lots of momentum now. In the latest TIOBE index, Scala is at 0.26% and Rust at 0.7%. So both are niche languages, but, hey, Rust is almost three times more popular - and it has upward momentum (unlike, sadly, Scala).

If anything can turn the Scala future around it's ZIO, but that remains to be seen.

2 Likes

I am well aware of that, and, yes, it's great.

ZIO gives you the same thing in Scala with finalizers (ZManaged, Scope, etc.), but to have this built into the language is awesome.

Agree, this is super powerful, and I feel the full impact of this fact hasn't come to me yet.

I'm repeating myself now, but, for the record, my comparison is not Rust vs. plain Scala, but Rust vs. pure-functional Scala (we'll say with ZIO). Pure FP solves these issues (but is obviously extraneous to the language per se).

Akka is, frankly, a PoS, especially the original, untyped actor version.

No offense, but this is a seriously incorrect statement. First, the JVM is not an interpreter, but a virtual machine that runs compiled code. Secondly, such a VM has the advantage of being able to recompile on-the-fly with knowledge gained by observing real-time performance.

JVM performance often equals or even exceeds that of Rust or other pre-compiled languages. Check the benchmarks linked above.

VMs have weaknesses (footprint, startup time, GC latency), but this is not one of them.

Your comments apply to actually interpreted languages (say, Python), not the JVM.

Again, no offense, but you need to get out more. Scala is used heavily in finance, including in real-time trading systems, though not in areas like HFT, where consistent, microsecond-level latencies are required.

3 Likes

Whenever I hear finalizers I get worried.

I'm not familiar with ZIO, so I looked it up (Google gave me this), and I think this quote about ZManaged emphasizes the difference:

Resources do not survive the scope of use, meaning that if we attempt to capture the resource, leak it from use, and then use it after the resource has been consumed, the resource will not be valid anymore and may fail with some checked error, as per the type of the functions provided by the resource.

In Rust it's not "may fail with an error". Instead, the resource can't escape the scope in the first place. And, as a bonus, that means there's no need to check a "did it escape?" flag on every use.

It's like the difference between std::unique_ptr<Foo> in C++ and Box<Foo> in Rust. A unique_ptr might be nullptr. So it's not that Rust is safe because it checks for nullptr; it's that the Box can't be null at all. If one wants a might-not-be-there Box in Rust, that's an Option<Box<_>> -- which, thanks to layout optimizations, doesn't even take more memory than the Box alone.
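
That layout claim is easy to check (a quick sketch):

```rust
use std::mem::size_of;

fn main() {
    // The None case reuses the null-pointer bit pattern, so the Option
    // wrapper costs no extra space.
    assert_eq!(size_of::<Option<Box<u64>>>(), size_of::<Box<u64>>());
}
```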

Guarantees like that can only come from compiler-checked object lifetimes. Yes, there's an annotation burden for it, but the magic of Rust is that the burden usually isn't that bad, and that frequently the additional work to understand ownership helps improve the program anyway.

(Not always, admittedly! There are certainly various specific situations in which GC is quite handy.)

11 Likes

It prevents large classes of data races. Things like

https://www.uber.com/blog/data-race-patterns-in-go/

I used to use Scala/ScalaJS quite a bit, now playing with OCaml.

One thing that we cannot do in Scala/OCaml is to control whether a field of a struct is stored "inline" or as a ref on the heap. Rust allows this control, but at the intellectual cost of writing T, Box<T>, Rc<T>, or Arc<T>.

And this is my current take on Rust -- if you want low-level control of memory layout, you have to write more code. From an information-theoretic perspective, there is no way to avoid this: you want to specify how things are laid out in memory, so you have to write more code to state that layout.
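
Concretely (a sketch), the same field can be declared four ways, each with a different layout and ownership story:

```rust
use std::rc::Rc;
use std::sync::Arc;

struct Inline { point: [f64; 3] }        // stored inline, no indirection
struct Boxed { point: Box<[f64; 3]> }    // owned, behind one heap pointer
struct Shared { point: Rc<[f64; 3]> }    // shared within a single thread
struct Sendable { point: Arc<[f64; 3]> } // shareable across threads
```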

My current view is: start with Scala/OCaml, benchmark, and only write the high-performance low-latency multi-core part in Rust; for everything else, it's quite overkill.

1 Like