Nutype v0.2 is released

Blog post: Nutype 0.2.0 is out! | Sergey Potapov (greyblake)
Github: GitHub - greyblake/nutype: Rust newtype with guarantees

Nutype is a library that lets you define extra constraints on newtypes and guarantees that those constraints always hold.

For example, the new version 0.2 supports regex, which allows the following:

use nutype::nutype;

#[nutype(validate(regex = "^[0-9]{3}-[0-9]{2}-[0-9]{4}$"))]
pub struct Ssn(String);

// Valid SSN
let ssn = Ssn::new("123-45-6789").unwrap();

// Invalid SSN: new() returns a Result, so compare against Err(..)
assert_eq!(
    Ssn::new("123-45-67"),
    Err(SsnError::RegexMismatch)
);

It makes it impossible to obtain an instance of Ssn that violates the defined regex.
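To make the guarantee concrete, here is a minimal hand-written sketch of the same pattern in plain Rust. It is not nutype's generated code: the `FormatMismatch` variant, the `as_str` accessor and the `is_ssn_format` helper are hypothetical names, and the regex is replaced by an equivalent manual check so the sketch needs no dependencies.

```rust
// Hand-written illustration of a validated newtype; NOT what nutype expands to.
// In a real crate the type would live in its own module, so the private field
// prevents callers from writing `Ssn(some_string)` directly.

#[derive(Debug, PartialEq, Eq)]
pub enum SsnError {
    FormatMismatch, // hypothetical variant name for this sketch
}

#[derive(Debug, PartialEq, Eq)]
pub struct Ssn(String);

impl Ssn {
    /// The only way to obtain an `Ssn`: validation always runs first.
    pub fn new(raw: &str) -> Result<Self, SsnError> {
        if is_ssn_format(raw) {
            Ok(Ssn(raw.to_owned()))
        } else {
            Err(SsnError::FormatMismatch)
        }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Manual equivalent of `^[0-9]{3}-[0-9]{2}-[0-9]{4}$`.
fn is_ssn_format(s: &str) -> bool {
    let bytes = s.as_bytes();
    bytes.len() == 11
        && bytes.iter().enumerate().all(|(i, b)| match i {
            3 | 6 => *b == b'-',      // dashes at fixed positions
            _ => b.is_ascii_digit(),  // digits everywhere else
        })
}
```

Because every value passes through `new`, code that receives an `Ssn` never needs to re-check the format.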

I like the crate and will likely use it in the future, thanks for your work!


One thing that stood out to me was that you don't (at least I couldn't find it) mention the term invariant anywhere in the docs or the README. Marking new_unchecked as unsafe implies that the constraints put on the inner value by validation form a safety invariant, which you should then be allowed to soundly exploit in unsafe code. (One example that comes to mind is #[validate(max_len=255)] and then serializing the string length as a u8).

Is this left unspecified on purpose or am I not allowed to rely on this invariant in unsafe code?
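The `max_len = 255` case mentioned above could look roughly like the following in safe code. This is a hand-written sketch, not nutype output: `TooLong` and `to_bytes` are hypothetical names, and the sketch checks the byte length, which is an assumption (the thread does not say whether nutype's `max_len` counts bytes or characters).

```rust
// Hand-written sketch of a length-bounded string; NOT nutype's generated code.

#[derive(Debug, PartialEq, Eq)]
pub struct TooLong; // hypothetical error type for this sketch

#[derive(Debug, PartialEq, Eq)]
pub struct ShortString(String);

impl ShortString {
    /// Assumption: the limit applies to the length in bytes.
    pub fn new(s: String) -> Result<Self, TooLong> {
        if s.len() <= 255 {
            Ok(ShortString(s))
        } else {
            Err(TooLong)
        }
    }

    /// Length-prefixed encoding: one length byte followed by the UTF-8 bytes.
    pub fn to_bytes(&self) -> Vec<u8> {
        // The cast cannot truncate as long as the invariant len <= 255 holds.
        let len = self.0.len() as u8;
        let mut out = Vec::with_capacity(1 + self.0.len());
        out.push(len);
        out.extend_from_slice(self.0.as_bytes());
        out
    }
}
```

Here a broken invariant would only produce a wrong length prefix, not undefined behavior; the question is whether unsafe code may lean on the same invariant.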

Hi! Thanks for your comment.

Indeed, the word invariant is not mentioned.

I've read through your message a few times, but failed to understand the question fully. I haven't yet read the article about validity invariants, so maybe it will fill my gap.

But would you mind rephrasing your question?

Am I allowed to rely on the validation provided by this crate for soundness?

Let's say I write the following code using nutype (I can't think of a better example right now)

#![feature(vec_into_raw_parts)]

use nutype::nutype;

#[nutype(validate(max_len = 255))]
#[derive(*)]
struct ShortString(String);

impl ShortString {
    pub fn rebuild(self) -> Self {
        let rebuilt = unsafe {
            // into_inner() unwraps the newtype so the String can be taken apart
            let (ptr, len, cap) = self.into_inner().into_raw_parts();
            String::from_raw_parts(
                ptr,
                len as u8 as usize, // imagine it was serialized as a u8
                cap
            ) // this is UB if len > 255
              // (not because of the memory leak, but the string potentially not being utf-8)
        };

        // SAFETY: len comes from a u8, making it impossible for this to be Err
        unsafe { ShortString::new(rebuilt).unwrap_unchecked() }
    }

    pub fn nonsense(&self) {
        if self.len() > 255 {
            // unreachable_unchecked is an unsafe fn, so it needs an unsafe block
            unsafe { core::hint::unreachable_unchecked() }
        }
    }
}

Both functions, rebuild and nonsense, are UB if the struct has a string with more than 255 bytes, one is just more subtle than the other. Are these functions sound?


I just noticed I accidentally mixed up validity and safety invariants, I edited the previous post and added a link that makes the difference clearer (imo).

It's a tricky example that I've never thought about.

Short answer: nutype will not repair the string if it's constructed with unsafe and has potential UB.

ShortString is mostly just a wrapper around String.

I'd like to note that your example will not compile, because instantiating a value like

ShortString(rebuilt)

is not possible (that's one of the main points of nutype).

Instead you'd need to do

ShortString::new(rebuilt).unwrap()

Note that ShortString::new(rebuilt) will return Err if rebuilt.len() > 255.
However, in this example that's not going to happen, because len is obtained through a u8.
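A small sketch of why that holds, and why the original length matters: the cast through `u8` keeps only the low 8 bits, so the result is always at most 255, but a longer original string gets cut at an arbitrary byte offset. The helper names `roundtrip_len` and `cut_is_char_boundary` are made up for illustration.

```rust
/// Mirrors the `len as u8 as usize` round trip from the example.
fn roundtrip_len(len: usize) -> usize {
    len as u8 as usize // keeps only the low 8 bits, so the result is <= 255
}

/// True if cutting `s` at the round-tripped length lands on a char boundary,
/// i.e. the shortened byte range would still be valid UTF-8.
fn cut_is_char_boundary(s: &str) -> bool {
    s.is_char_boundary(roundtrip_len(s.len()))
}
```

For lengths up to 255 the round trip is the identity, so nothing is cut. A 300-byte string made of 3-byte characters round-trips to 44, which falls in the middle of a character; building a `String` of that length from raw parts would violate the UTF-8 requirement.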

Let me know if this answers your question and if I can help you with anything else.

Fixed.


That's clear to me, as a string with invalid UTF-8 is already UB. My question is whether I can rely on nutype's guarantees in unsafe code, such as invoking UB if the validation didn't turn out correct. Perhaps this also deserves a new thread.

You can rely on nutype's guarantees within unsafe { } as long as you don't use ::new_unchecked().
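As a rough illustration of that contract, here is a hand-written sketch (not nutype's generated code) in which the checked constructor and an unsafe `new_unchecked` sit side by side. The `len_u8` method is a hypothetical example of unsafe code that relies on the invariant, and checking the length in bytes is an assumption of the sketch.

```rust
// Hand-written sketch; NOT what nutype expands to.
// In a real crate the field stays private to its module, so these two
// constructors are the only ways to create a value.

pub struct ShortString(String);

impl ShortString {
    /// Checked constructor: upholds the invariant `len <= 255` (bytes, by assumption).
    pub fn new(s: String) -> Option<Self> {
        if s.len() <= 255 {
            Some(ShortString(s))
        } else {
            None
        }
    }

    /// # Safety
    /// The caller must guarantee `s.len() <= 255`. Violating this makes
    /// later calls such as `len_u8` undefined behavior.
    pub unsafe fn new_unchecked(s: String) -> Self {
        ShortString(s)
    }

    /// Hypothetical method that relies on the invariant inside unsafe code.
    pub fn len_u8(&self) -> u8 {
        let len = self.0.len();
        if len > 255 {
            // SAFETY: both constructors require len <= 255; `new` checks it
            // and `new_unchecked` makes it the caller's obligation.
            unsafe { std::hint::unreachable_unchecked() }
        }
        len as u8
    }
}
```

Marking `new_unchecked` as `unsafe` is what moves the burden of proof to the caller: if the only safe constructor validates, code like `len_u8` is sound, and any violation can be traced back to an `unsafe` block that broke the documented precondition.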