Is there a Rust equivalent of an ML "signature"?

In ML-family languages, there is a concept of an "implementation" and its "interface". The interface is often called a "signature" (the implementation being called a "structure"), and the idea is that the structure is type-checked, producing a "type" that must match the signature.

This allows an implementor to write their signature and then as they change their implementation, know that they're not breaking that signature -- which is the contract with the consumers of the implementation. Of course, when it comes to optimizing compilation, the compiler peers past the signature, but for type-checking, it is never necessary.

I'm new to Rust, but from what I can tell, it doesn't have this concept. Am I mistaken?

1 Like

In Rust, the line [access-modifier] [extern modifier] [ABI-modifier] fn function-name<type parameters>(arguments) -> return-type is the signature, and hence the interface.
The implementation would be the body of the function, and the type of the body must match the declared return type.
But yes, unlike ML (I suppose; I don't know ML), I don't think people in Rust describe it in those terms.
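
A minimal sketch of that split, using a hypothetical `average` function (the name and types are assumptions made for illustration): everything before the opening brace is the signature, and the compiler checks the body against it.

```rust
// The signature: visibility, name, parameter types, and return type.
// Callers are type-checked against this line only.
pub fn average(values: &[f64]) -> Option<f64> {
    // The body: it may change freely as long as it still produces
    // the `Option<f64>` that the signature promises.
    if values.is_empty() {
        return None;
    }
    Some(values.iter().sum::<f64>() / values.len() as f64)
}
```

Changing the body to return, say, a bare `f64` would be rejected at this function, before any caller is even looked at.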

2 Likes

For a single type Rust doesn't really have that, but for generics that sounds like traits: https://doc.rust-lang.org/stable/book/ch10-02-traits.html
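
As a rough illustration of traits acting as the interface for generic code, here is a small sketch; `Shape`, `Square`, and `total_area` are hypothetical names, not taken from the book chapter linked above.

```rust
// The trait plays the role of the interface: it lists signatures only.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

// The impl is checked against the trait, item by item.
impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

// Generic code is type-checked against the trait alone; it never
// looks at how `Square` (or any other implementor) is defined.
fn total_area<S: Shape>(shapes: &[S]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}
```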

2 Likes

So, taking OCaml as a typical example of an ML language, the "signature of an API" you are talking about could be something like:

API

module type Stack = sig
  type 'item collection
  val empty: 'item collection
  val push: 'item collection -> 'item -> 'item collection
  val pop: 'item collection -> ('item * 'item collection) option
end

which defines an API called Stack.

Transposition from OCaml to Rust terms
  1. type 'item collection

    with a collection (associated[1]) type, which is generic over some 'item parameter (In OCaml, the "lifetime syntax" of 'identifier signifies a type parameter[2]; also, generic parameters, in OCaml, go before the generic thing: t option rather than Option<T>, for instance).

    trait Stack {
        type Collection<Item>;
        …
    }
    
  2. val empty: 'item collection

    with an empty (associated) constant (val) of (generic) type Collection<Item>:

    trait … {
      …
      /// Pseudo-code, in Rust.
      const EMPTY<Item>: Self::Collection<Item>;
    }
    
  3. val push: 'item collection -> 'item -> 'item collection

    with a push (associated) constant whose (generic) type is that of a function signature (in Rust, this would be an associated function), with the signature being written in a curried fashion:

    trait … {
      …
      fn push<Item> (
          _: Self::Collection<Item>,
          _: Item,
      ) -> Self::Collection<Item>
      ;
    }
    
  4. And ditto for pop (T * U is the OCaml syntax for a tuple type; in Rust: (T, U)).
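
Put together, the four items can be approximated in compilable Rust roughly as follows. This is only a sketch under two assumptions: a toolchain with stable generic associated types (Rust 1.65 or later), and an `empty` associated function standing in for the generic constant, which a stable trait cannot express directly as far as I know. `VecStack` is a hypothetical implementor added so that the trait can be exercised.

```rust
trait Stack {
    // type 'item collection
    type Collection<Item>;
    // val empty: 'item collection (a function here, not a const)
    fn empty<Item>() -> Self::Collection<Item>;
    // val push: 'item collection -> 'item -> 'item collection
    fn push<Item>(c: Self::Collection<Item>, item: Item) -> Self::Collection<Item>;
    // val pop: 'item collection -> ('item * 'item collection) option
    fn pop<Item>(c: Self::Collection<Item>) -> Option<(Item, Self::Collection<Item>)>;
}

// Hypothetical implementor: the type itself carries no data and only
// names the implementation, much like an OCaml module would.
struct VecStack;

impl Stack for VecStack {
    type Collection<Item> = Vec<Item>;

    fn empty<Item>() -> Vec<Item> {
        Vec::new()
    }

    fn push<Item>(mut c: Vec<Item>, item: Item) -> Vec<Item> {
        c.push(item);
        c
    }

    fn pop<Item>(mut c: Vec<Item>) -> Option<(Item, Vec<Item>)> {
        // The last pushed item comes out first.
        let top = c.pop()?;
        Some((top, c))
    }
}
```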

An implementor / implementation

Now, we can try to define an implementation of this:

(*   our implementor : the API it implements *)
module ListStack     : Stack                   = struct
  type 'item collection = 'item list
  let empty = []
  let push list new_head = new_head::list
  let pop = function
    | head::tail -> Some (head, tail)
    | [] -> None
end

with an implementation using a data structure that is pervasive in ML languages: the cons-list.

  • for short, in Rust parlance: () is the empty list, otherwise lists are defined as something like (head, tail), with head being the first item of the list, and tail being a sub-list with the remaining items. That is, the sequence, 42, 27, 0, in a cons-list, with Rust tuples, would be expressed as:

    (42, (27, (0, ())))

    In OCaml, the (head, tail) list-packaging is done with the :: operator, both for constructing and destructuring, and the empty list is represented as [].
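
The tuple analogy can be tried out directly. In this small sketch, `sample_list` and `head_and_second` are hypothetical helper names:

```rust
// The sequence 42, 27, 0 as nested tuples; `()` plays the role of `[]`.
fn sample_list() -> (i32, (i32, (i32, ()))) {
    (42, (27, (0, ())))
}

// Destructuring with a tuple pattern mirrors OCaml's `head :: tail`.
fn head_and_second(list: (i32, (i32, (i32, ())))) -> (i32, i32) {
    let (head, tail) = list;
    let (second, _rest) = tail;
    (head, second)
}
```

Note that with tuples every list length is a distinct type, so the analogy is only about the shape of the data; a list whose length is not fixed in its type needs an enum.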

If we were to forget some item, or implement it with an incorrect signature, we'd get an implementation mismatch error. For instance, reversing the order of args in push, we get:

Error: Signature mismatch:
Modules do not match:
  sig
    type 'item collection = 'item list
    val empty : 'a list
    val push : 'a -> 'a list -> 'a list
    val pop : 'a list -> ('a * 'a list) option
  end
is not included in
  Stack
Values do not match:
  val push : 'a -> 'a list -> 'a list
is not included in
  val push : 'item collection -> 'item -> 'item collection

In Rust

The "generic const" thingy is not directly expressible in Rust (we'd need a helper trait for that), and the generic associated type required nightly when this was written (generic associated types have since been stabilized, in Rust 1.65) and would needlessly complicate the API here. Rather, we're gonna move genericity from the associated items up to the OCaml Module / Rust trait, which actually makes more sense. (The reason this is not done in OCaml is that their Module genericity only works over other Modules (the whole thing being called a Functor), and that lifting a type to a Module is cumbersome, so it ends up being way more convenient to just be generic at the associated item level.)

But in Rust we'll directly and properly be generic at the trait level:

trait Stack<Item> {
    type Collection;
    const EMPTY: Self::Collection;
    fn push (_: Self::Collection, _: Item) -> Self::Collection;
    fn pop (_: Self::Collection) -> Option<(Item, Self::Collection)>;
}

Note: OCaml modules don't have Self, so we need an associated item for it. To be completely idiomatic in Rust, that Self::Collection associated type could directly be the implementor of the trait: let's replace Self::Collection with Self:

trait Stack<Item> : Sized {
    const EMPTY: Self;
    fn push(self, _: Item) -> Self;
    fn pop(self) -> Option<(Item, Self)>;
}

and the implementation:

enum List<T> {
    Nil,
    Cons(T, Box<List<T>>),
}

impl<Item> Stack<Item> for List<Item> {
    const EMPTY: List<Item> = List::Nil;
    fn push(self: List<Item>, item: Item) -> List<Item> {
        List::Cons(item, Box::new(self))
    }
    fn pop(self: List<Item>) -> Option<(Item, List<Item>)> {
        match self {
            | List::Nil => None,
            | List::Cons(head, tail) => Some((head, *tail)),
        }
    }
}

And now, regarding the OP's question: what if the implementation violated the desired API? We'd get an error as well:

error[E0053]: method `push` has an incompatible type for trait
  --> src/main.rs:16:23
   |
14 |     impl<Item> Stack<Item> for List<Item> {
   |          ---- this type parameter
15 |         const EMPTY: List<Item> = List::Nil;
16 |         fn push(item: Item, this: List<Item>) -> List<Item> {
   |                       ^^^^
   |                       |
   |                       expected enum `List`, found type parameter `Item`
   |                       help: change the parameter type to match the trait: `List<Item>`
   |
note: type in trait
  --> src/main.rs:5:20
   |
5  |         fn push(_: Self, _: Item) -> Self;
   |                    ^^^^
   = note: expected fn pointer `fn(List<Item>, Item) -> List<_>`
              found fn pointer `fn(Item, List<Item>) -> List<_>`

Testing it

Now, we can even write a test for this "generic" API.

OCaml

Generic test:

(* Convenience stuff: *) 
(* rather than doing `foo (bar (baz …))`, write `foo @< bar @< baz …` *)
let (@<) f arg = f arg 
(* assert_eq helper *)
let assert_eq x y = assert (x = y)
(* non-keyword assert (to play with @<) *)
let assert_ predicate = assert predicate
    
module TestStack(S : Stack) = struct
  let test_it () = (
    let l = S.empty in
    let l = S.push l 42 in
    let l = S.push l 27 in
    let (fst, l) = Option.get @< S.pop l in
    assert_eq fst 27;
    let (snd, l) = Option.get @< S.pop l in
    assert_eq snd 42;
    assert_ @< Option.is_none @< S.pop l;
  )
end

Running it against our ListStack:

module TestListStack = TestStack(ListStack);;

TestListStack.test_it();;

Rust

Generic test (we have to pick the integer item type in advance, rather than inside the test's body):

struct TestStack<S : Stack<i32>> /* whatever: */ (*mut Self);
impl<S : Stack<i32>> TestStack<S> {
    fn test_it() {
        let l = S::EMPTY;
        let l = l.push(42);
        let l = l.push(27);
        let (fst, l) = l.pop().unwrap();
        assert_eq!(fst, 27);
        let (snd, l) = l.pop().unwrap();
        assert_eq!(snd, 42);
        assert!(l.pop().is_none());
    }
}

Running it against our List:

type TestListStack = TestStack<List<i32>>;

TestListStack::test_it();

Note that there is a level of extra boilerplate here, in Rust, because OCaml can't feature a plain function that is generic over a Module; only Modules can be (they are then called Functors), hence the TestStack helper Functor, which is the one providing the non-generic testing function.

I've tried to replicate the resulting semantics in Rust. But Rust does not separate the world of traits (module signatures) and impls (module structs) from the world of types, so a generic function can directly take a type parameter that is required to impl the given trait.

  • Thence yielding the following simplification:

    fn test_it<S : Stack<i32>>() {
        let l = S::EMPTY;
        let l = l.push(42);
        let l = l.push(27);
        let (fst, l) = l.pop().unwrap();
        assert_eq!(fst, 27);
        let (snd, l) = l.pop().unwrap();
        assert_eq!(snd, 42);
        assert!(l.pop().is_none());
    }
    
    test_it::<List<i32>>();
    


  1. in Rust parlance ↩︎

  2. with the abandoned in_band semantics Rust considered featuring at some point ↩︎

14 Likes

This looks like a typo, did you mean *mut S? Also, the playground link after this code is not clickable for me.

1 Like

[again, I'm just starting out with Rust; I've been programming ML language since Standard ML in 1989 and caml-light since 1991, so I hope you'll forgive my lack of knowledge]

This is good: a "trait" is the "interface" for a "type". But

  1. this only works for types that can have traits. Not for modules.

  2. When accessing the actual implementation, one does so via the name List, not Stack.

The idea with an ML signature, is that you don't need to have access to the implementation, in order to write your code, period.

Are these things possible in Rust ? And sure, the answer may be "nobody should need such a thing". But that's really the same as "no".
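
On point 2 specifically, one partial answer is to return an opaque `impl Trait` from a constructor, so that consumer code can only ever name the trait. The sketch below assumes a hypothetical `stack` module with a private `VecStack` type. It covers single types only and is not an equivalent of a whole-module signature.

```rust
mod stack {
    // The public contract: the only thing consumers program against.
    pub trait Stack<Item>: Sized {
        fn push(self, item: Item) -> Self;
        fn pop(self) -> Option<(Item, Self)>;
    }

    // Private implementation: its name cannot be written outside this module.
    struct VecStack<Item>(Vec<Item>);

    impl<Item> Stack<Item> for VecStack<Item> {
        fn push(mut self, item: Item) -> Self {
            self.0.push(item);
            self
        }

        fn pop(mut self) -> Option<(Item, Self)> {
            let top = self.0.pop()?;
            Some((top, self))
        }
    }

    // The constructor exposes only "some type implementing Stack<Item>".
    pub fn empty<Item>() -> impl Stack<Item> {
        VecStack(Vec::new())
    }
}

// Consumer code: mentions `Stack`, never `VecStack`.
fn top_after_two_pushes() -> i32 {
    use stack::Stack;
    let s = stack::empty::<i32>().push(1).push(2);
    let (top, _rest) = s.pop().unwrap();
    top
}
```

Swapping `VecStack` for another private type would not require touching `top_after_two_pushes`, as long as the trait stays the same.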

1 Like

There’s no direct analogue to ML signatures or C headers. This does limit separate compilation quite a bit.

Though, when programming Rust, I try to keep human benefits of signatures in mind. I try to structure my source code such that API fits concisely into one file, and try to push implementation details elsewhere.

Folks generally tend to rely on cargo doc for teasing apart impl details from public API, but I generally try to optimize source code layout such that cargo doc is not needed.

5 Likes

I guess one could create a crate, module or even single method that had the required signature but no code in it except returning a dummy value of the required type. That at least would allow one to write code against the signature(s), the API.

Come time to actually build a working program substitute the dummy with the real code that provides the API.

This would allow one to write code against the given signatures without the code behind the signatures.
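
A minimal sketch of such a stub, assuming a hypothetical `Config` type and `load_config` function invented for illustration:

```rust
// Hypothetical API used only for illustration.
#[derive(Debug, Default, PartialEq)]
pub struct Config {
    pub verbose: bool,
}

// Stub: the signature is final, the body just returns a dummy value.
pub fn load_config(_path: &str) -> Result<Config, String> {
    Ok(Config::default())
}

// Code written against the signature compiles today and keeps
// compiling once the stub body is replaced by the real one.
pub fn is_verbose(path: &str) -> bool {
    load_config(path).map(|c| c.verbose).unwrap_or(false)
}
```

When no sensible dummy value exists, a body consisting of `todo!()` also type-checks against any return type, at the cost of panicking if it is ever called.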

Why would one want to do that?

Thanks for reporting this; I must have forgotten to perform the copy-paste :sweat_smile: Luckily all the components were in the post, so reconstructing it was easy :upside_down_face: : Playground

Here *mut S would have worked as well, but I've personally gotten in the habit of just using *mut Self (which works!) to get Rust to shut up about the unconstrained variance of the parameters, without having to remember / repeat the generic params:

- struct Foo<S> (
+ struct Foo<'lt, S, Added> (
       *mut Self,
  );

This is useful in cases such as this one, where the type itself is just playing the role of a "(Rust) module with generic parameters" (thereby leading to the type never being instantiated and variance not mattering).

Other examples

The typical example of it being:

pub struct ptr<T : ?Sized> (
    *mut Self,
);

// `null_mut` needs a thin pointee (in practice, `T: Sized`), so this impl drops `?Sized`.
impl<T> ptr<T> {
    pub const NULL: *mut T = ::core::ptr::null_mut();
}

so as to be able to use ptr::NULL everywhere :upside_down_face:


or when playing with hand-rolled Higher-Kinded Types and so on:

struct Ref_<T : ?Sized> (
    *mut Self,
);

impl<T> HKT for Ref_<T> {
    type T<'lt> = &'lt T
    where
        Self : 'lt,
    ;
}
1 Like

True! Contrary to OCaml's modules (and file APIs, which yield implicit Modules as well), in Rust a module is just a namespacing and privacy boundary; it cannot declare just its API in a separate file. Basically everything @matklad said is on point :100:, but I prefer to confirm this myself as well, given that my initial post forgot to cover it (which could be perceived as dismissive, I guess). So, as Matklad mentioned, you can try to organize the code to keep most of the API in one file, and the implementations and logic in another, but there is no hard constraint to do so.

That being said, from experience, in Rust most APIs are associated with types, free functions being there mostly to act as entrypoints or global wrappers. There are thus not many free functions with unconstrained APIs; and the associated type functions or APIs can often[1] be abstracted behind a trait, to then use:

to still get the ergonomics of inherent impls. This could be another very rigorous approach to keep the API / impl split more in mind; but truth be told I don't think I've seen people do that in practice :sweat_smile:


  1. modulo -> impl Trait… basically ↩︎

TIL that you can use Self in type definition, thanks :smile:

1 Like

In the ML style of development, a package publishes both .ml and .mli files. The .mli file contains the interface that is implemented by the .ml file. As a developer, if I publish a new version of a package that does not change any of its .mli files from the previous version, I can publish it with the knowledge that users of this package cannot incur build-errors by upgrading (perhaps by only bumping the minor release number or patchlevel). And if I do change the .mli file, I can warn users of the package (perhaps by bumping the major version-number).

For sure, one can almost always delete all the .mli files and still build a project just fine. But this isn't just about "information hiding" -- otherwise, pub/private would be enough. It's about giving a precise specification of what consuming packages can rely upon. And (at least in OCaml) that specification can be generated from the implementation (and writing one is not mandatory).

Last: one doesn't switch between an empty version of a package (the "specification") and the real version (the "implementation"). The interface is part of the package, and a publisher can check that the interface has not changed by just checking that the file hasn't changed.

3 Likes

OK. Got it.

Could be useful. Can't tests in Rust be used to the same effect?

The visibility aspects of ML signatures are also interesting, since they don't impose a binary pub/private visibility. If you've ever had two crates in a workspace and wanted to share code between them beyond the public interface (for instance, for use in the tests of both crates) while publicly exporting a different signature, you have run into this. I am under the impression that Rust avoided this intentionally for documentation reasons, but it is a feature that I have lamented not having multiple times.
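
For contrast, within a single crate Rust's visibility is not strictly binary: `pub(crate)` and `pub(super)` give intermediate levels. The sketch below uses a hypothetical `storage` module with made-up function names. None of these levels reaches across crates, which is the limitation described above.

```rust
mod storage {
    // Visible to every crate that can reach this item through public paths.
    pub fn public_len(items: &[u8]) -> usize {
        checked_len(items)
    }

    // Visible anywhere inside this crate, but not to other crates.
    pub(crate) fn checked_len(items: &[u8]) -> usize {
        raw_len(items)
    }

    // Visible only within the parent module of `storage` (and its children).
    pub(super) fn raw_len(items: &[u8]) -> usize {
        items.len()
    }
}
```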

Indeed, tests can provide similar coverage, by failing to build when the interface changes. But I'd note that even back when TDD was starting to spread through the industry (e.g. with Ruby-on-Rails), it was widely acknowledged that a lot of tests were there to check for type-correctness. Rust is statically typed, so you don't need to do that within crates: the compiler does it. But between crates, there's nothing besides just "compile both crates and see if something breaks". And sure, that works, but the purpose of "interface specifications" is that you don't have to check -- you know that if the interface file doesn't change, then any downstream consumers won't be affected (well, modulo bugs/behaviour).

1 Like

How about having a test crate that tests the crate under development? Change either and the build will fail.

Sure, that'll work. But you have to -write- that test crate, whereas with an interface, it can be automatically generated from the implementation. As @ratmice notes, this allows you to finely control what aspects of the implementation are made visible outside the crate.
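
For what the hand-written variant might look like, here is a sketch with a hypothetical `parse_port` function standing in for the public API. Coercing each public function to an explicit function-pointer type makes the build fail as soon as a signature drifts.

```rust
// Hypothetical public API of the crate under development.
pub fn parse_port(input: &str) -> Option<u16> {
    input.trim().parse().ok()
}

// A hand-written "interface check": this constant only compiles while
// `parse_port` still has exactly this parameter and return type.
const _: fn(&str) -> Option<u16> = parse_port;
```

In a real project the `const` line would live in a separate test crate or a `tests/` file, and, as noted, every such line has to be written and maintained by hand.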

The ML solution sounds better, more reliable, in that it sounds like it was designed into the language, but Rust has a tool that tries to check for inadvertently introduced breaking changes:

I wonder why Rust doesn't have such interface descriptions. The early Rust team surely would have known of the possibility. I imagine the answer might lie somewhere in the depths of the ancient mailing list archives.

I only vaguely recall it being about documentation, etc. If we imagine that Rust does have signatures, and that they map directly to structures in 1:1 correspondence (whereas ML structures and modules can have multiple signatures for a structure), it becomes clearer why you might want to make such a trade-off: basically, for simplicity. And once you have that 1:1 correspondence, signatures can seem pretty redundant.

At least that is how I look at it: they traded some flexibility for lower complexity, to lower the barrier to entry, because honestly even in MLs it typically seems to be the case that there is a 1:1 correspondence between a given structure and a single signature, even though there could be others with more or fewer fields.

Note that it does have them, in a way, they're just not written by humans. Pipelined compilation works by emitting an interface description for a crate before it's completely done, so that the next one can start.

One likely difference between ML and Rust is that ML is generally more type-erased. Traditional ML generics are polymorphic, not monomorphized, so less detail is needed in signature files. As you can see in C++, if you need all the struct fields, inlined functions, and templates to be in the interface file, then the separation rapidly becomes more of a pain than a value.
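
The difference can be seen inside Rust itself, which offers both styles. In this sketch, `show_generic` and `show_erased` are hypothetical names:

```rust
use std::fmt::Display;

// Monomorphized: the compiler generates one copy per concrete `T`, so a
// downstream crate needs the body (not only the signature) to instantiate it.
pub fn show_generic<T: Display>(value: T) -> String {
    format!("<{}>", value)
}

// Type-erased: a single compiled copy serves every type through a
// vtable, so callers only ever need the signature.
pub fn show_erased(value: &dyn Display) -> String {
    format!("<{}>", value)
}
```

An interface file for the second function could be as small as its signature, whereas one for the first would have to carry the whole body, which is the C++ template situation described above.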

2 Likes