Review of a language addition idea

I am relatively new to rust, so this may already be a thing in one way or another. I had an idea for an addition to the rust language, and was hoping to understand how good/bad of an idea it was.

I am calling my idea compile time type tagging. Essentially you can create a type tag for a specific type, and then tag that type when applicable. I think an example would be best...

This example creates a function to tag a string as ascii if it is ascii, and then any function that requires the type tag would be compile time checked for that tag.

Tag IsAscii for String;

pub fn check_ascii(self) -> Option<<IsAscii>self> {
    if self.is_ascii() {
        Some(<IsAscii>self)
    } else {
        None
    }
}
pub fn make_ascii_lowercase(&<IsAscii>mut self) {blah}

I do not like this syntax, but I was just trying to check the soundness of the idea. The ergonomics would need to be improved significantly.

I think this could be applied in a similar way to traits where you can create your own Tags and apply them to any types, or you can take an external Tag and apply it to your own types, but you can't apply an external Tag to an external type.

Existing functions could be tagged such that it does not undo the tag (since any mutable function would have to reset the type tag). I am not sure what this syntax would look like, but:

// This function could be tagged to indicate it does not change the tag state
pub fn make_ascii_lowercase(&mut self);

The goal of this type tagging is to create a sort of contract driven design that would have minimal impact on the runtime. The only runtime change would be the checks functions that would apply the tag to the type. It might also be possible to sometimes do this at compile time for simpler cases.

Thoughts? If the idea sucks I am fine with that. Just trying to learn by experimenting.

1 Like

This is what we call a newtype.

struct Ascii(String);
fn from_str(text: &str) -> Option<Ascii> {}
fn foo(text: Ascii) {}
8 Likes

Maybe I am misunderstanding, but I don't think this is what I am looking for. The newtype method does not protect you from modifying the string to contain non-ascii characters. Also you would not be able to access the existing string methods without going through the self.0.foo(). Also creating a whole new type for each contract constraint would make it very difficult to compose multiple constraints, where as you could create as many tags as you wanted.

It doesn't protect you, but it does protect the users of your code - you just need to restrict mutable access to the string (by making the field private and not providing any mutable accessors).

5 Likes

But then you don't have a string. The goal of what I was trying to come up with is to provide constraints on existing types. At the end of the day, I still want a string with all of the functionality, but I also want to be able to constrain that type for a more limited set of functionality based on a specific contract. I want to be able to modify the string, and then it would loose the tag until it was re-checked. This would allow the full use of the type as a normal string, but also enable compile time checking on a more constrained set of functionality.

Well, you can have an as_str method to provide &self -> &str conversion (or just implement AsRef/Deref/etc.), which will be enough for any read-only case. And everything mutable should go through the logic inside newtype anyway.

into_inner(self) -> String to the rescue.

10 Likes

I agree this would solve the issue of making sure the IsAscii type is always valid but.

  1. There are plenty of existing string functions that don't change the IsAscii status, even ones that mutate the string such as make_ascii_lowercase. The newtype solution would have to loose the IsAscii type with into_inner(self) in order to call this function

  2. You can't tag the type multiple times without convoluted code, so if you had multiple constraints such as IsAscii, NotEmpty, SomethingElse? (these aren't great examples, but I am trying to show the difference between composability of both approaches), then you would have to create a type for every combination of these constraints, where as with a tag system, you would just create rules for when to tag, and untag each Tag.

You can implement those functions on your Ascii type with a one-line implementation that delegates to the str function. You have to indicate which functions you want to provide, somehow.

In your example code you're already doing the same thing: you have a make_ascii_lowercase function.

10 Likes

Since you want an extensible set of properties for the types, it seems to me that the feature you want is called design by contract. If that is correct, that I can't see it happening in Rust in the foreseeable future. It's just too complex, and with too many unexplored design possibilities. Which functions should be annotated? All? That's hard to do even for the standard library, and unrealistic for crates.io. Which properties are important enough to annotate? I think different people would have different opinions. E.g. it's nice that make_ascii_lowercase preserves the property of the string being ASCII, but does it preserve length? Should we somehow encode "being lowercase" at the type level? How should the compiler check that the guarantees are satisfied? If you want an exact type match, you'll have lots of boilerplate and error-prone, if redundant, conversions in your code. Probably the compiler should do some reasoning to automatically deduce at least simple conditions, but at that point there are many possible ways to implement such inferences. Probably it would require to pull in something like an SMT solver, but that's way too complex to put into the Rust compiler. Even if we pulled in e.g. Z3, it would leave open the questions of ergonomics, proof composition and steering the solver's proof search.

If you really want to use function contracts, you can try one of the formal verification systems which attempt to implement such reasoning for Rust. Prusti allows you to encode pre- and post-conditions of functions, and to check symbolically that they are really satisfied. Creusot is an alternative verifier based on Why3 infrastructure. I don't have direct experience with either, so I don't know which one would be a better choice. The Rust Formal Methods group page may also contain other alternatives.

4 Likes

In a more limited form, refinement types[1][2][3] are also similar to your proposed concept of type tags. They are also very close semantically to the full contracts, and thus imho unlikely to be added to Rust, for the same reasons. That said, there are experiments on more limited pattern types (closed PR), which may eventually get into the language. It doesn't support your use case though, and it's very far away even if it's ever added.

If you just want to add some compile-time information for existing types, you can consider adding a marker trait and implementing it for the relevant types. Example from the rand crate:

pub trait CryptoRng {}

impl<'a, R> CryptoRng for &'a mut R where
    R: CryptoRng + ?Sized, 
{}

impl<R> CryptoRng for Box<R, Global> where
    R: CryptoRng + ?Sized, 
{}

impl<R> CryptoRng for BlockRng<R> where
    R: BlockRngCore + CryptoRng, 
{}

impl CryptoRng for ChaCha12Rng {}

impl CryptoRng for ChaCha12Core  {}

// etc

impl CryptoRng for OsRng {}

impl CryptoRng for StdRng {}

impl CryptoRng for ThreadRng {}

impl<R, Rsdr> CryptoRng for ReseedingRng<R, Rsdr> where
    R: BlockRngCore + SeedableRng + CryptoRng,
    Rsdr: RngCore + CryptoRng, 
{}

This allows a generic function fn my_fn<R: Rng>(..) to strengthen its trait bounds to R: Rng + CryptoRng if it can properly work only for cryptographically secure RNGs. For example, you need CSRNG to generate symmetric and asymmetric encryption keys.

A custom type MyType which supports variants with extra guarantees could be made into a generic type, like this:

struct MyType<T> {
    _ph: PhantomData<T>,
    // extra stuff
}

// Zero-sized types which are used only to encode compile-time guarantees
struct HasFoo;
struct HasBar;
struct HasFooAndBar;

impl<T> MyType<T> {
    // put unconditional methods here
}

impl<T> MyType<T> {
    fn ensure_bar(self) -> Option<MyType<HasBar>> { todo!() }

    fn ensure_foo(self) -> Option<MyType<HasFoo>> { todo!() }

    fn ensure_foo_and_bar(self) -> Option<MyType<HasFooAndBar>> { todo!() }
}

impl MyType<HasBar> {
    // methods which are valid only when an instance of type has Bar
}

// etc

I wouldn't recommend it as a common solution to encoding invariants, but it does have its uses. Examples of crates which use this approach for compile-time metaprogramming are typenum and generic-array. I have also seen it in the wild used by more concrete crates working with concrete types (rather than doing metaprogramming), but I can't recall any specific examples at this moment.

While you can't use this approach directly with foreign types, you can always make your own generic wrappers over them. Unfortunately, this approach isn't particularly composable w.r.t. multiple different conditions.

3 Likes

If you want to see the (nightly-only for now) way to do this, see https://doc.rust-lang.org/std/?search="as_ascii".

The problem with tags, as I see them, is one of the following:

  • You can only call methods that match the tags, at which point it's just a different syntax for a newtype, since you're going to have to forward everything anyway.

  • You can also call methods that don't match the tags, in which case it's useless for your ascii example, since I could still call .push_str("☃") on a &mut<IsAscii> String.

If you want a realistic shot at a feature getting added, you should have it build as much as possible on existing things.

So maybe your problem is "gee, compositionality of repr(transparent) newtypes is awkward sometimes", and thus rather than making tags, you want a way that someone still makes a newtype, exactly like they do today, but there's some feature (I have no idea what it would be) where for some Foos and Bars you can mix Foo<Bar<T>> and Bar<Foo<T>> more easily.

9 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.