Rust runtime compat in Clojuresque lisp frontend


#1

Hello!

I plan to write an LLVM frontend for Clojure, partially because it’s awesome, partially for my own education, and partially because I see a demand: Many Clojurists writing scripts must currently employ somewhat lackluster tooling and language support built around the unworkable, unstable server-side infrastructure of node.js, which doesn’t really even implement threading. Conversely, Clojure on its native Java platform takes about a second just to start up on my 3.5GHz quad-core, let alone intern everything inside of an actual Clojure module. The alternative I see is to write a Clojure implementation that can ahead-of-time compile images into assembly. Currently, I call this project Flak (no meaning; sounds cool).

However, much of the pragmatic appeal around the Clojure project is its seamless, near-reflexive operation with imperative language runtimes, e.g. Java from the original Clojure sources, node.js and browser API from ClojureScript — references to variables in global scope and interning modules from their respective packaging systems can be more reflexive than in any other lisp-family language, period, and it isn’t as though this hasn’t been thoroughly explored in other areas of lispdom: Just look at the amazing work on cl-autowrap.

I believe the ability to drop down into a lower level language so effortlessly is an important aspect of the marriage of parallel paradigm and productivity that makes Clojure pragmatic, and remains responsible for a big part of what makes it compelling to use.

The upshot is that seamless interoperation with an imperative ecosystem, the host platform, is part of the Clojure tradition, but it presents a bit of a challenge when building something similar on Rust. As I’ve brought this project to lisp users, especially those familiar with Clojure, there has been great excitement around the prospect of that level of interoperation with an ecosystem like Rust’s, which is rapidly gaining in popularity and stands only to grow in the near future (AFAIK). Rust semantics have the added bonus of beautifully complementing Clojure’s approach to concurrency by solving problems in ways that Clojure and Clojurists cannot and/or will not (might as well be writing Java). The ultimate goal would be completely effortless interoperation with Rust, embracing Rust idiom through ergonomic declarations of dependencies and leveraging Cargo, in order to be able to write forms such as

(extern crate some-lib)

which would map to Rust source like extern crate some_lib; in the top-level module of a normal crate, and, following that, to intern Rust definitions much as Rust code does, e.g. with use:

(use some-lib::SomeStruct)

(.meth (SomeStruct::new (& ["Here's" "a" "few" "args"])))

…and call into the appropriate method, as one does with Java libraries under the JVM. A separate runtime would likely be implemented with a rudimentary generational garbage-collection scheme, incorporating a record in binding meta-data reminiscent of wrapping everything in Arc<T> before passing vars to subsequent generations, in which I resort to mark-and-sweep: This results in fundamentally different semantics from the ownership system. I hope to loosen up the memory constraints around Rust types, partially in the interest of Clojuresque flexibility of paradigm and partially because I simply don’t understand them very well; besides, I don’t see that kind of memory safety as a big goal for a language whose only implementations are currently garbage-collected. Clojure makes similar concessions with respect to mutability in favor of concurrency, and it still runs blazing-fast on top of the JVM. If you feel this should be done differently, I would love to hear how, but I value the ergonomics of this approach.
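For reference, here is a rough sketch of the Rust that the forms above might correspond to. Everything in it is a stand-in: some_lib is written as an inline module rather than a real crate, and SomeStruct, new, and meth are hypothetical names taken from the lisp example, with made-up bodies.

```rust
// Hypothetical stand-in for an external crate. In a real project this would
// be a separate crate declared as a Cargo dependency.
mod some_lib {
    pub struct SomeStruct {
        args: Vec<String>,
    }

    impl SomeStruct {
        // Associated function (no `self`), so it is callable by path alone.
        pub fn new(args: &[&str]) -> SomeStruct {
            SomeStruct {
                args: args.iter().map(|s| s.to_string()).collect(),
            }
        }

        // Hypothetical method standing in for `.meth` in the lisp form.
        pub fn meth(&self) -> usize {
            self.args.len()
        }
    }
}

// Corresponds to (use some-lib::SomeStruct).
use some_lib::SomeStruct;

// Corresponds to (.meth (SomeStruct::new (& ["Here's" "a" "few" "args"]))).
fn call_meth() -> usize {
    SomeStruct::new(&["Here's", "a", "few", "args"]).meth()
}

fn main() {
    println!("{}", call_meth());
}
```

Note that the temporary SomeStruct is dropped as soon as the call finishes, which is exactly the kind of lifetime a GC'd runtime would have to account for differently.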

Here we finally come to the bit that applies to the forum. Although I would love to interoperate with Rust, I would also like to avoid resorting to simply using Rust source code as a compile target, for fear of driving up compilation times and having to build around the compiler as a separate component. The consensus among more experienced lispers seems to have been that an integral feature of a lisp is to be able to JIT-evaluate top-level forms in a loop, dumping their bindings into the image and results into stdout on demand: It is no coincidence that virtually every self-respecting lisp-family language implementation boasts a read-eval-print loop. The ability to introspect on code as data and program in terms of transformations of this data is arguably what defines a lisp.

Thus, many have suggested that rather than limit myself to statically compiling rustc -> LLVM IR -> asm, building a runtime on top of Rust, I build a separate, minimalist runtime happening to be compatible with Rust’s. There are a few complexities I find a little confounding in this approach, which, especially starting out, may make it more feasible simply to target Rust and somehow dynamically incorporate its artifacts into an image, or perhaps even its mid-level intermediate representation. There are questions, many probably implied somewhere in the midst of the context dump, for either path from this fork in the road, and their answers will determine how I proceed.

Foremost on my mind now: Is it possible to resolve qualified symbols without introspecting on sources in order to make use of precompiled Rust libraries, e.g. .rlib? (One rather difficult kink to work out is how to hash namespaces robustly and consistently with rustc.)

Of course, there are some considerations that have to be taken into account in the course of addressing this question, but given that even constructors of structs are exposed essentially as free-standing static functions, with some pipework around fleshing out the runtime, they don’t seem too daunting to resolve. This would likely be the simplest solution to exposing quick, dirty interop still satisfactory for everyday use and leveraging what’s already been written in a blazing-fast systems programming language in early iterations without requiring Rust libraries to expose a C interface.
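To illustrate the point about constructors being free-standing functions, here is a minimal sketch in which an associated function is stored as a plain function pointer in a table keyed by a qualified name. Point, the Ctor alias, and the "geometry::Point::new" key are all hypothetical, and this only works inside a single compiled Rust program; it says nothing about locating symbols in an .rlib or matching rustc's name mangling.

```rust
use std::collections::HashMap;

// Hypothetical type standing in for something exported by a crate.
#[derive(Debug, PartialEq)]
struct Point {
    x: i64,
    y: i64,
}

impl Point {
    // A "constructor" is just an associated function with no `self`.
    fn new(x: i64, y: i64) -> Point {
        Point { x, y }
    }
}

// A plain function pointer type that the constructor coerces to.
type Ctor = fn(i64, i64) -> Point;

// Build a toy symbol table mapping qualified names to constructors.
fn build_table() -> HashMap<&'static str, Ctor> {
    let mut table: HashMap<&'static str, Ctor> = HashMap::new();
    table.insert("geometry::Point::new", Point::new as Ctor);
    table
}

// Look up a constructor by qualified name and call it if present.
fn construct(name: &str, x: i64, y: i64) -> Option<Point> {
    build_table().get(name).map(|ctor| ctor(x, y))
}

fn main() {
    println!("{:?}", construct("geometry::Point::new", 1, 2));
    println!("{:?}", construct("geometry::Missing::new", 1, 2));
}
```

A runtime that resolved symbols this way would still need some source of truth for the names and signatures, which is where the question about precompiled libraries comes in.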

All of these complexities for what would appear to be a rather brittle and lackluster solution to interoperating with Rust led me to my second question: Would it be feasible to somehow enlist librustc to generate a lisp image? I envision reducing the EDN sexpr frontend to something like Rust high-level intermediate representation and passing it into the rest of the compiler machinery to retrieve bindings that could then be easily serialized and trivially recalled, as in shared object files. To take this approach would probably be best in keeping with the Clojure philosophy of “embracing the host platform.” However, despite my enjoyment of using Rust, I haven’t a good enough idea of its internals to lay out a good implementation plan on the spot.

Maybe I’ve been going about all of this all wrong. My only prior programming has been at the application level, so I would really like some guidance. Looking forward to reading any and all of what y’all have to say. If this is an inappropriate forum for this topic (and it may well be), please be so kind as to point me somewhere I can take my inquiry and my wall.


#2

There’s a pretty serious challenge inherent to hooking any GC’d language up to Rust libraries: Rust’s types almost all make claims about ownership and aliasing that GC subverts.

For example, suppose your Clojure code wants to call some Rust function that takes a &T for some T. A Rust shared reference means not only that the callee can’t modify the referent, but that nothing in the system can modify the referent. This is essential to Rust’s idea of a shared reference; all the safety and concurrency guarantees depend on it.

But the reason one wants a GC in the first place is that one wants to not worry about who might have a pointer to what. An object you’ve plucked out of a GC heap could have any number of references to it, from anywhere in the system. So unless the value is completely immutable for all its users, then you can’t correctly create a &T pointing to it.
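A small sketch of what that looks like in plain Rust, using Rc<RefCell<T>> as a rough stand-in for a value with arbitrarily many references to it (an assumption for illustration, not a real GC heap). The function names are made up. The "nobody may mutate while a shared borrow exists" rule doesn't go away; it just turns into a runtime check.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// While one handle holds a shared borrow, can another handle mutate?
fn mutation_allowed_while_shared() -> bool {
    let heap_value = Rc::new(RefCell::new(vec![1, 2, 3]));
    let alias = Rc::clone(&heap_value); // a second reference, as in a GC heap

    let reader = heap_value.borrow(); // roughly: handing a &T to a callee
    let ok = alias.try_borrow_mut().is_ok(); // checked at runtime
    drop(reader);
    ok
}

// Once the shared borrow has ended, mutation through the alias is fine.
fn mutation_allowed_after_release() -> bool {
    let heap_value = Rc::new(RefCell::new(vec![1, 2, 3]));
    let alias = Rc::clone(&heap_value);
    {
        let _reader = heap_value.borrow();
    } // shared borrow ends here
    let ok = alias.try_borrow_mut().is_ok();
    ok
}

fn main() {
    println!("while shared: {}", mutation_allowed_while_shared());
    println!("after release: {}", mutation_allowed_after_release());
}
```

The first function reports false and the second true, so the guarantee a &T carries has to be tracked somewhere, either by the compiler or by bookkeeping like this.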

Okay, so values in Clojure are all immutable. But then suppose you want to call a Rust function that takes an argument by value — that takes ownership of it. Rust guarantees that values are only moved when they’re not borrowed. How do you ensure that in a GC’d system? You have to know that there is only one reference to the value, which is a dynamic global property of the system.
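Here is a sketch of that "only one reference" check, again with Rc standing in for a GC handle (an assumption for illustration). consume, try_move_out, and demo are made-up names. Rc::try_unwrap gives up the inner value only when the strong count is exactly one, which is the dynamic check described above.

```rust
use std::rc::Rc;

// A Rust function that takes its argument by value, i.e. takes ownership.
fn consume(words: Vec<String>) -> usize {
    words.len()
}

// Try to move the value out of a shared handle and pass it by value.
// On failure the handle is returned unchanged in the Err variant.
fn try_move_out(handle: Rc<Vec<String>>) -> Result<usize, Rc<Vec<String>>> {
    Rc::try_unwrap(handle).map(consume)
}

// Returns (moved when unique, moved when aliased).
fn demo() -> (bool, bool) {
    let unique = Rc::new(vec!["a".to_string()]);
    let moved_unique = try_move_out(unique).is_ok();

    let shared = Rc::new(vec!["b".to_string()]);
    let _alias = Rc::clone(&shared); // another reference is still alive
    let moved_shared = try_move_out(shared).is_ok();

    (moved_unique, moved_shared)
}

fn main() {
    println!("{:?}", demo());
}
```

With reference counting the count is right there to inspect; a tracing collector generally has no cheap equivalent, which is what makes this hard.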

Mutable references: mutatis mutandis.

The exception is Copy types. Unfortunately, you can’t really participate in the Rust ecosystem if Copy types are the only ones you can exchange between one language and the other.

People say, “Rust doesn’t have a GC”. But when I look at the lengths people go to to create safe interfaces to GC’d values in Rust (see Josephine and cell-gc), I start to feel like it’s deeper than that: Everything that makes you need a GC seems to be exactly what Rust scrupulously avoids. Rust is, I dunno, the diametrically opposite point on the globe from GC. That’s over-dramatic, but I feel like there’s something pretty fundamental there.

(I’m pretty sure people will follow up with a dozen excellent ideas for doing the integration you want; I’m sure I’ll learn a lot!)


#3

For example, suppose your Clojure code wants to call some Rust function that takes a &T for some T. A Rust shared reference means not only that the callee can’t modify the referent, but that nothing in the system can modify the referent. This is essential to Rust’s idea of a shared reference; all the safety and concurrency guarantees depend on it.

In my opinion, it’s not unreasonable to ask Clojure writers to use Rust semantics in the course of interfacing with Rust libraries. Clojure writers in Java have to make similar concessions; at the risk of stating the obvious, Java code has to be passed values that Java expects. This is rendered compatible with Java’s semantics and not impossible to integrate with Clojure largely by virtue of the fact that clojure.core’s constructs are, for the most part, implemented in terms of Java. Like Rust, Java has a type system capable of ensuring that Clojure data subscribing to certain requirements can masquerade as something Java interfaces will recognize, and vice versa.

Primarily, manipulations on user-defined Java types (analogous to those found in a crate) would be done by Java procedures called from Clojure code, owing to the difficulty of safely, practically, and generically supporting that sort of shim. This is eased somewhat in the standard library by the fact that the two intrinsically have some java.lang.* and java.util.* types in common, e.g. String and Pattern: The ergonomics of the API exposed in clojure.string and clojure.core are made possible by Clojure constructs like the . special form, which allows Clojure libraries to encapsulate the functionality given by mutable state under the hood in a more Clojuresque API. Similar cases of this are evident in core.async, which uses a set of powerful macros to expose concise asynchronous transformation and passing of data, but uses Java threading under the hood.

Returns from combining calls to Java procedures can subsequently be wrapped in generic Clojure synchronization primitives and data structures for manipulations by Clojure, and mapped to strip out what parts of the payload don’t need to persist. Data generated by functions in terms of mutable state can be packaged by the caller however it wants; Rust in particular places a premium on this principle. Given that this packaging can be done in terms of Rust itself, ultimately, it seems just another case of asking users to recognize exactly what code does, and if we didn’t ever do that, a lot more of the enterprise than lies in the scope of the project would be in hot water.


#4

I don’t quite understand what you mean, but what I know of Clojure I like, and I like Rust, so seeing them work together well would be great.


#5

Oh, sure; sorry if I was confusing.

Maybe I’ll just try internals.rust-lang.org.