Rust-like scripting language

# Note: split is a monkey patching module, not a function.
use string.split

# A weak spot, currently there is no "auto trimming".
# Let's fix that for the moment.
_int_ = int
int = |s| _int_(s.trim() if s: String else s)

# Here we go:
array = |s| s.split("\n").map(|line| line.split(",").map(int))

# Alternatively:
array = |s| list(list(int(x)
   for x in line.split(","))
      for line in s.split("\n"))
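
For comparison, here is a rough sketch of the same parse in today's Rust. The helper name `parse_grid` and the `i64` element type are assumptions made for illustration. Note that `str::parse` does not trim either, so the whitespace handling has to be spelled out here too.

```rust
use std::num::ParseIntError;

/// Parse newline-separated rows of comma-separated integers.
/// `parse_grid` is a hypothetical helper; `i64` is an assumed element type.
fn parse_grid(s: &str) -> Result<Vec<Vec<i64>>, ParseIntError> {
    s.lines()
        .map(|line| {
            line.split(',')
                // `str::parse` rejects surrounding whitespace, so trim first.
                .map(|field| field.trim().parse::<i64>())
                // Collecting into a Result stops at the first bad field.
                .collect::<Result<Vec<i64>, _>>()
        })
        .collect()
}
```

Calling `parse_grid("1, 2,3\n4,5 ,6")` should give two rows of three integers each, and any non-numeric field turns the whole result into an `Err`.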

You know, what the heck. Let's design one. Rust has about three key concepts:

  • Algebraic data types
  • Lifetimes
  • Multiple immutable XOR single mutable references

I feel that for a scripting language, lifetimes aren't the most appropriate as scripting languages don't need to provide strong memory guarantees, so we'll scratch that. Let's start with algebraic data types.

In terms of 'atomic' data types (using the lispy definition of 'atomic'), we've got the following:

  • Bytes (xFF, x00)
  • Integers (10, -420)
  • Floating point numbers (-20.20)
  • Strings ("Hello, world!")
  • Booleans (True, False)

Vecs and HashMaps are so ubiquitous at this point that they deserve to be their own thing. In terms of combinatory data types, we've got:

Structs, a set of identifiers mapping to data types:

struct <Name> {
    <field>: <Type>,
    ...,
}

Enums, a set of variants with an optional data type:

enum <Name> {
    <Variant> <Type>,
    ...,
}

Tuples, a fixed-size list of different data types:

(<item of Type 1>, ...)

Vecs, a dynamically sized collection of the same type:

[<item of Type>, ...]

Maps, a dynamically sized collection mapping one type to another:

{
    <item of Type 1>: <item of Type 2>,
    ...,
}
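
To make the list above concrete, this is roughly how the same five shapes look in current Rust. All names (`Point`, `Shape`, `describe`, `collections`) are made up for the example; a scripting dialect would presumably drop most of the annotations.

```rust
use std::collections::HashMap;

// All names below are hypothetical and exist only for illustration.

/// Struct: a set of identifiers mapping to data types.
struct Point {
    x: f64,
    y: f64,
}

/// Enum: a set of variants, each with an optional payload.
enum Shape {
    Empty,
    Circle(Point, f64),
    Polygon(Vec<Point>),
}

/// Pattern matching takes the enum apart again.
fn describe(shape: &Shape) -> String {
    match shape {
        Shape::Empty => "empty".to_string(),
        Shape::Circle(center, r) => {
            format!("circle at ({}, {}) with radius {}", center.x, center.y, r)
        }
        Shape::Polygon(points) => format!("polygon with {} points", points.len()),
    }
}

/// A tuple, a Vec and a map side by side.
fn collections() -> ((i64, String), Vec<i64>, HashMap<String, i64>) {
    let tuple = (10, "ten".to_string());
    let vec = vec![1, 2, 3];
    let map = HashMap::from([("one".to_string(), 1), ("two".to_string(), 2)]);
    (tuple, vec, map)
}
```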

I don't think that a Rust scripting language would forgo static type checking; rather, it would use a flexible Hindley-Milner type inference system. This would make it possible to forgo type annotations in function definitions. Additionally, with the work recently done on compile-time garbage collection, it's totally feasible that a new Rust scripting language could do that too.
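
As a point of reference, Rust already infers most types inside a function body; only the signature has to be written out. A small sketch (the function name `squares_up_to` is invented for the example):

```rust
/// Only the signature is annotated; everything inside is inferred.
/// `squares_up_to` is a hypothetical name.
fn squares_up_to(limit: u64) -> Vec<u64> {
    let mut out = Vec::new(); // element type not known yet
    let square = |n| n * n;   // parameter and return types left open
    for n in 1..=limit {
        // `n` is a u64 here, which pins down both the closure and the Vec.
        out.push(square(n));
    }
    out
}
```

A scripting dialect along the lines described above would extend the same idea to the signature itself.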

One final key feature of Rust is impl, which allows traits to be implemented for specific types. I think types should be able to have impl statements, though traits need not be defined; rather, something similar to Go's interfaces or Python's duck typing should be used. Finally, semicolons need not apply.

This is all a fairly pointless what-if exercise, so here's a sieve:

fn sieve(limit) {
    primes = [2]
    for n in 3..limit {
        for prime in primes {
            if n % prime == 0    { break                 }
            if prime * prime > n { primes.push(n); break }
        }
    }
    return primes
}
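
For anyone who wants to run it, here is one way the same trial-division idea could be written in actual Rust. This is a sketch, not a translation: Rust won't allow pushing to `primes` while iterating over it, so the primality check is computed first, and the loop stops testing once `p * p` exceeds `n`. It also starts from an empty list, so a limit of 2 or less yields no primes.

```rust
/// Trial division against the primes found so far.
/// Returns all primes below `limit` (exclusive, like `3..limit` above).
fn sieve(limit: u64) -> Vec<u64> {
    let mut primes: Vec<u64> = Vec::new();
    for n in 2..limit {
        // n is prime if no known prime p with p * p <= n divides it.
        let is_prime = primes
            .iter()
            .take_while(|&&p| p * p <= n)
            .all(|&p| n % p != 0);
        if is_prime {
            primes.push(n);
        }
    }
    primes
}
```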

Wow, little did I know (though I should have guessed) that my suggestion for determining data types at run time in a Rust-like scripting language (Rust-like scripting language - #5 by ZiCog) was invented in the late 1950s and has a name.

Mind you, had I ever read anything about Hindley-Milner that started like Wikipedia's "A Hindley–Milner (HM) type system is a classical type system for the lambda calculus with parametric polymorphism," I would never have guessed it was anything to do with determining data types in a scripting language.

What you have nicely outlined there is just what I had in mind. If only I had the chops to build such a thing.

2 Likes

Notes on Smaller Rust points out that lifetimes are about more than just memory safety and that a scripting Rust would have to include them. Otherwise, we simply have OCaml. It has ADTs, it's garbage collected, its module system is amazing and... the first Rust compiler was even implemented in it. I think that's what I would call Rust's soulmate. Not really a scripting language, but close. :smile:

2 Likes

I agree, it would have to have lifetimes.

The goal in my mind would be to create a scripting language that is as close to Rust as possible, both syntactically and semantically, but that could be hacked around with and run as quickly as Python or JavaScript.

Then, when one feels the need for speed, or for whatever other reason, one could easily recast one's program into actual Rust.

As far as I can tell, the Hindley-Milner idea allows us to avoid all that messy type specifying everywhere, as we would expect of a scripting language.

Hopefully it can also be used to do lifetime tracking at run time without having to put all those ugly tick marks into the source.

But what do I know, this may not even be possible.

1 Like

I have been thinking about such a simpler Rust-like language for a while (I'll call it RustScript in this post).
In my opinion it would really stand out from the myriad of other scripting languages, if it were binary-compatible with Rust crates. Then you could first write your application in RustScript, but still use Rust libraries and data structures for code that needs to be very performant or you could port parts of your application to pure Rust as the design stabilizes. Of course there would need to be a suitable API.

I think this could also help Rust become more popular. Companies that are currently not willing to wait months for their programmers to become productive in Rust could let their new hires start writing applications in RustScript and their more experienced coders do the high performance parts in Rust.
Basically a similar model to Python and the Python C API.

Of course since Rust doesn't have a stable ABI, this would mean working on top of rustc, i.e. transpiling RustScript to Rust and letting rustc compile the final binary. Transpilation to Rust might not be easy, but should be doable if the language is designed for this purpose and it doesn't have to be zero cost.

The problem would be compile times. For a lighter language meant to be used (nearly) interactively Rust compile times are too long. Maybe we could get there by compiling dependencies only once as dylibs and just recompiling the current RustScript crate?

2 Likes

OCaml is a great programming language; I see it as a higher-level analogue to Rust. Because Rust was inspired by ML, a Rust-derived scripting language would certainly be similar to ML.

What I think makes Rust unique (compared to other ML languages) are impl traits (though OCaml does support classes), borrow-checking, and zero-cost abstractions.

Lifetimes aren't the same as borrowing, though they are very similar. AFAIK, borrowing helps Rust infer the lifetimes of data in the program, and provides certain guarantees to how that data can be accessed and modified.

The Rust borrow checker, from what I understand, uses scoping and borrowing rules to determine where in the program variables are live, meaning still accessible. This region of where the variable is live is called the variable's lifetime. Rust currently uses the NLL borrow checker, which computes the lifetime of each reference, and the lifetimes of loans to that reference. A borrow checker error arises when a statement accesses a reference that violates some loan.

IIRC, the Polonius borrow checker intends to make this lifetime inference more flexible. Instead of directly computing the lifetime of everything, it starts by finding the origin of each reference. Polonius does away with directly computing liveness. Instead, it states that a loan is live if some live variable has that loan.

// modified from nikomatsakis's presentation on Polonius
use std::collections::HashMap;

let mut map: HashMap<u32, String> = HashMap::new();

let twenty_two = match map.get(&22) {
    Some(v) => v,
    None => { map.insert(22, "boop".to_string()); &map[&22] },
};

This would throw a borrow checker error with NLL, but not in Polonius, because v is not live in the None branch of the match.
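
Whichever checker is in use, the standard library's entry API is one way to express "look up, or insert and then look up" with a single mutable borrow, so the question of overlapping loans never comes up. A minimal sketch, with `get_or_insert` as a made-up helper name:

```rust
use std::collections::HashMap;

/// Return the value for `key`, inserting `default` first if it is missing.
/// `get_or_insert` is a hypothetical helper name.
fn get_or_insert<'a>(
    map: &'a mut HashMap<u32, String>,
    key: u32,
    default: &str,
) -> &'a String {
    // `entry` takes one mutable borrow up front, so the lookup and the
    // insert never need two overlapping borrows of `map`.
    map.entry(key).or_insert_with(|| default.to_string())
}
```

This only works around the pattern for maps; it says nothing about how a more flexible checker would treat the match-based version.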

But note that lifetimes aren't explicitly needed for borrowing to work. A system with borrowing (i.e. single mutable xor aliasable immutable) could manage lifetimes of the objects with a garbage collector, or statically determine the lifetimes using ASAP (as static as possible) memory management techniques.

I'm developing an experimental programming language that forgoes garbage collection and other traditional memory management techniques. It hasn't been released yet as it's still under heavy development. In short, it tries to infer borrowing and lifetimes dynamically, using a memory-management technique I call 'vaporization'. The rules of vaporization-based memory management are simple:

  • Values are immutable, variables are mutable references to values.
  • When a variable is reassigned or goes out of scope, the value it holds is released.
  • When a variable is used, a copy of the data it contains is used.

It also makes the following optimizations:

  • When a variable is passed to a function, a reference to its value is passed.
  • The last use of a variable before it is released does not make a copy.

What does this look like in practice? Values are immutable, variables are references to values:

x = 7
y = x
x = x + 2
-- (comment) y is still 7

When a variable is reassigned or goes out of scope, the value it holds is released.

x = 7
x = 9 
-- 7 is released

When a variable is used, a copy of the data it contains is used.

x = 7
y = x
-- y is not the same 7 as x
-- think of it as `let y = x.clone()`

Note, however, that although this system is memory safe, it's also memory intensive.

x = 17
y = x + x -- three copies of x exist

To combat this, a few optimizations are made; I'll cover the most impactful ones. First, when a variable is passed to a function, a reference to its value is passed.

-- function syntax is `<pattern> -> <expression>`
increment = x -> x + 7

x = 7
y = increment x

Here, a reference to x is passed into increment. However, this language is fork on mut, so x wouldn't be copied until x + 7 inside increment. This prevents passing many copies of the same data around functions, say in a recursive function, for instance.
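
The observable behavior of "fork on mut" can be approximated in Rust with `Rc::make_mut`, which clones the underlying data only if it is currently shared. This is an analogy only: it relies on reference counting, which the language described here is said to avoid, and `append` is a hypothetical helper.

```rust
use std::rc::Rc;

/// Append to a possibly shared vector, copying the data only when
/// another handle still points at it.
/// `append` is a hypothetical helper used as an analogy.
fn append(values: &mut Rc<Vec<i32>>, item: i32) {
    // `make_mut` clones the Vec if the Rc is shared and otherwise
    // hands out a mutable reference to the existing allocation.
    Rc::make_mut(values).push(item);
}
```

If `y` is a second handle to the same vector as `x`, appending through `x` leaves `y` untouched; once `x` is the only handle, further appends reuse its allocation.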

Finally, the last use of a variable before it is released does not make a copy.

x = 7
x = x + x

Let's annotate it. V<N> indicates that all V<N> are the same. Additionally, V<Nf> indicates the last use of V<N>.

x<1> = 7
x<2> = x<1> + x<1f>

Following the above rule, x<1f> is not a copy of the value of x, rather it is the value of x itself. When writing code that might mutate something in a linear manner, this significantly reduces the memory usage:

x = [1, 2, 3]
x = x + [4] -- no copies are made
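
Rust's move semantics give a similar "last use does not copy" effect when it is written out by hand. In this sketch (`with_item` is an invented name) the caller gives up `x` at its last use, so the buffer is handed over rather than duplicated; the capacity is reserved up front only so that the push itself does not reallocate.

```rust
/// Takes ownership of `xs`, so the caller's last use hands over the
/// buffer instead of copying it. `with_item` is a hypothetical name.
fn with_item(mut xs: Vec<i32>, item: i32) -> Vec<i32> {
    xs.push(item);
    xs
}

/// Rebinding `x` to the result mirrors `x = x + [4]` above.
fn demo() -> (Vec<i32>, bool) {
    let mut x = Vec::with_capacity(4);
    x.extend([1, 2, 3]);
    let buffer = x.as_ptr();
    let x = with_item(x, 4); // `x` is moved, not cloned
    let same_buffer = x.as_ptr() == buffer;
    (x, same_buffer)
}
```

The difference from vaporization is that Rust makes the programmer choose between moving and calling `.clone()`, whereas the language above decides based on last use.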

This language also supports a flexible hygienic macro system, which 'hides' the assignment in most cases (like mutating an object). In combination with passing references to functions, copies of the data are only made when the data needs to be used in two places at once. I haven't discovered any memory leaks or excessively high memory usage through testing yet. If you have any feedback, or notice that something's off, please leave a reply :slight_smile:

It really would. One of the hard things about writing a programming language is building an ecosystem. Being able to jumpstart off another language's ecosystem gives a large benefit.

Have there been any discussions about standardizing Rust's ABI? I couldn't find any, but I swear I read something once... who knows. Anyway, perhaps RustScript could target the MIR or something. This would still have high compile times, but it would circumvent Rust's compiler frontend. Compiling to Rust would also allow compiling to wasm and the like, and you get the free performance gains from the work on the Rust compiler.

If the language were built on top of Rust, an FFI that interops with Rust might be able to do the job.

I agree, compiling Rust dependencies statically would be a good idea if such a language were ever implemented.

1 Like

Absolutely. I've heard that OCaml is actually a very good language, but the ecosystem struggles a bit and it is a nightmare on Windows, which really hurts adoption.

I've actually brought that up on the internals forum before:

Unfortunately it looks like it essentially isn't going to happen any time soon and maybe never.

I think that's part of the reason why transpile-to-JS languages (think TypeScript) are so popular these days. OCaml is fun to use and has great packages, but like any old language it suffers from old design paradigms and package specifics that are seldom in use today. For a language to remain viable, it has to undergo a major update at least once every five years, and OCaml's package system hasn't been overhauled in forever.

I forgot who said it (I think it might've been about using Rust in the Fuchsia microkernel) but I remember what I read about Rust's ABI. Their core argument went something like this:

For a language to do well at a systems level (i.e. OS), it needs to be able to interface with other applications. Because applications can be written in different languages, it's necessary for a protocol, such as an ABI, to allow interface between them. C's ABI is very stable which is why C is in use in most operating systems today. Rust's ABI is unstable and there are no plans to stabilize it, which makes it a poor candidate for OS level work.

I don't completely agree with this, but I do think that a static ABI (maybe bound to each Edition) would immensely benefit the Rust ecosystem.
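
For context, the usual workaround today is to opt individual items into the C ABI. A minimal sketch, with `Pair` and `pair_sum` as made-up names; exporting the symbol from a library under a fixed name needs a further attribute that is left out here.

```rust
/// A struct with C-compatible field layout.
/// `Pair` and `pair_sum` are hypothetical names.
#[repr(C)]
pub struct Pair {
    pub a: i32,
    pub b: i32,
}

/// Uses the platform's C calling convention instead of the
/// unspecified Rust one.
pub extern "C" fn pair_sum(p: Pair) -> i32 {
    p.a + p.b
}
```

Everything that crosses such a boundary has to be expressible in C terms, which is why generics, trait objects and most of the standard library types don't travel well this way.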

1 Like

10 posts were split to a new topic: Modular ABI for Rust

@mbrubeck @BurntSushi can posts 56 and 57 be moved into a new topic? I don't want the ABI draft proposal especially to get lost 50 pages down an unrelated topic.

3 Likes

Done.

3 Likes

The difference is more subtle, I think.
Python uses classes to declare objects, but most of the time, what defines an object's behavior are 'protocols', i.e. a set of methods the object implements, independently of the class inheritance (inheritance exists mainly to avoid code repetition).

So in Python, dynamically resolved methods are a kind of generalization of the concept of traits.
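
The closest Rust equivalent is probably a trait object, where the method is also looked up at run time, except that the protocol has to be declared and implemented explicitly. A small sketch with invented names (`Speak`, `Dog`, `Robot`, `chorus`):

```rust
/// The "protocol": any type that can speak.
/// All names in this sketch are hypothetical.
trait Speak {
    fn speak(&self) -> String;
}

struct Dog;
struct Robot;

impl Speak for Dog {
    fn speak(&self) -> String {
        "woof".to_string()
    }
}

impl Speak for Robot {
    fn speak(&self) -> String {
        "beep".to_string()
    }
}

/// Each call goes through the trait object's vtable, so the concrete
/// method is chosen at run time.
fn chorus(speakers: &[Box<dyn Speak>]) -> Vec<String> {
    speakers.iter().map(|s| s.speak()).collect()
}
```

In Python, `Dog` and `Robot` would only need a `speak` method; here the `impl Speak for ...` lines are what opt them in.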

1 Like

I've been working with Python again lately, and I realized why I like it (and Rust) so much. I can't work with unformatted or inconsistently formatted code. Python enforces formatting at the syntactic level, and both it and Rust have strong opinions about naming and formatting. So I can jump into a code base without having to wonder whether something is a struct, a function, or a variable. Generally it's quite clear.

Edit: Forgot to say: it's generally similar to my experience with natural languages. I take about 3-4 times longer to read writing that doesn't capitalize correctly, uses punctuation incorrectly or not at all, and has lots of spelling mistakes. Analogously, if code isn't formatted consistently, I take a lot longer to "grasp" the basic meaning. So more time goes by before I can start figuring out what it's doing. That's what I find so nice about Python: it's a syntax error to format your code inconsistently.

1 Like

Your suspicion is right, by counterexample. My second favorite language, Kotlin, is not dynamically typed and has a REPL mode like Python.

I would love to have something similar in Rust, but if it required losing the ability to detect memory errors and data race conditions at compile time, I'll live without the REPL capabilities. Having a Rust REPL might make it easier for novice programmers to learn.

There has been some interest/work on a Rust REPL:

Another Rust REPL is evcxr:

I can't think of a dynamically typed and compiled language. Although I guess it could be done. Has it?

Common Lisp

2 Likes

There had to be one. Never used it.

... or have I? Does Racket count as Common Lisp?

I've used Python for years and I can't say I love it. I think it is very simple and approachable to beginners (myself, years ago), but as time went by I started creating more and more bugs in Python and spent a lot of time debugging my programs because of the cryptic error messages.

IMO Python has strayed off the "one-way-to-do-things" path. With the introduction of decorators, lambdas, and comprehensions it has become a much more colorful language than it set out to be.

A lot of modern JavaScript has been growing on me. It's the worst language with the best comebacks IMO. If I had to weigh in on Rust's companion scripting language, I'd go for Lua and JavaScript.

Also check out Dyon, a Rust-based scripting language.

1 Like