Say I have a type EmailAddress(String) whose constructor does validation/sanitization before returning a Result<Self, _>. This is great for data going into a database. But once in the database it is trusted. We don't want to use the constructor on the way out of the database, e.g. our validation rules may tighten over time, (making once valid database entries, now invalid).
So we want to trust that the &str or String coming from the database is safe, without allowing anyone to just construct an EmailAddress
Easiest solution is to just have an unsafe constructor that skips validation and call that on the DB retrieval code. You are, after all, just asserting that the DB retrieval is safe, and unsafe is the canonical way to do that.
You could also go through an intermediate type, or put the EmailAddress constructor on String via an extension trait, but these kinds of solutions ultimately depend on the DB retrieval code having private access to something that your EmailAddress-constructing users don't.
The critical question is whether you want library UB for violations, or if you "just" want violating the rules to be a logic error. It sounds, in this case, like you want the latter because you need weird data in the database to be returnable.
At that point, the general answer is just to make an associated function with a big ugly name.
But memory is a synonym for data. And I really like the unsafe approach, because the data returned from a database is just like data you may get from FFI.
That's not my understanding of what memory safety means, which is things like "use after free", accessing memory that is not defined ( outside the range of memory allocated ), and such-like.
[ I just checked, and both clippy and cargo will issue warnings if you mark a code block as unsafe when it isn't ]
I am not saying using unsafe in this way is definitively wrong, I would consider it confusing and un-ideomatic though, just my opinion.
It really depends what invariants the library guarantees. Typically, NonZeroU8 does not obviously refers to memory, but because the library docs say it has to be non zero, its new_unchecked method is unsafe. The same goes for str being valid UTF-8 for example.
Regarding the OP issue, it depends whether the "valid email" invariant is used to guarantee soundness afterwards, or exposed to the user with such a guarantee.
My understanding is that the memory safety of the standard library does depend on Strings having valid UTF-8 data. Whereas (I presume) this does not apply regarding the validity of an email address (it's possible it does, if there is unsafe code in use which assumes email addresses are valid, but this seems an unlikely scenario to me ).
Well it really depends what the library doc says. If it says the email is valid for example, I could expect EmailAddress to contain an @, and write some really optimized code:
The general: Please don't use unsafe unless you're actually depending on the invariant for memory safety elsewhere. Despite the standard library being quite strict about this (see how Box::leak isn't unsafe, for example) we still get many people who try the "but how bad could it be?" response when told that they're doing something UB. Any place that needs unsafe for just logical correctness actually weakens Rust's soundness propositions.
The specific: the OP specifically mentioned that the rules will tighten over time, so there's no guarantee that what's in the database actually meets the rules, and thus it's particularly inappropriate to say "I promise this is validation-clean" for this data from the DB.
These are both relied upon for memory safety. The only reason Option<NonZeroU8> can have size 1 is because it's immediately unsound to make a NonZeroU8 that holds a zero. And str depends on it for codepoint decoding (like .chars()) -- it can read off the end of a buffer if the last byte has the high bit set, for example, because that can't happen in valid UTF-8 where such a byte always has the correct number of continuation bytes after it.
error: Undefined Behavior: entering unreachable code
--> /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/hint.rs:51:14
|
51 | unsafe { intrinsics::unreachable() }
| ^^^^^^^^^^^^^^^^^^^^^^^^^ entering unreachable code
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: inside `std::hint::unreachable_unchecked` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/hint.rs:51:14
= note: inside `std::option::Option::<&u8>::unwrap_unchecked` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/option.rs:820:30
= note: inside `core::str::validations::next_code_point::<std::slice::Iter<u8>>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/str/validations.rs:49:23
= note: inside `<std::str::Chars as std::iter::Iterator>::next` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/str/iter.rs:44:18
= note: inside `<std::str::Chars as std::iter::Iterator>::fold::<(), [closure@std::iter::Iterator::for_each::call<char, fn(char) {std::mem::drop::<char>}>::{closure#0}]>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/iter/traits/iterator.rs:2170:29
= note: inside `<std::str::Chars as std::iter::Iterator>::for_each::<fn(char) {std::mem::drop::<char>}>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/iter/traits/iterator.rs:736:9
note: inside `main` at src/main.rs:5:5
--> src/main.rs:5:5
|
5 | s.chars().for_each(drop);
| ^^^^^^^^^^^^^^^^^^^^^^^^
It's also probably easier to justify if you think of the DB as arbitrary memory that can be accessed from anywhere. If you can safely decode an invalid email from the DB into an EmailAddress, then you can round-trip an invalid email back into the DB without validation. Then the DB is effectively corrupted, and who knows where that will have an impact.
That's quite clever, thanks! At first I thought that I'd be able to smuggle the token to the non-dangerous code via dangerous(|token| token), but compiler stopped me:
error: lifetime may not live long enough
--> src/main.rs:24:35
|
24 | let token = dangerous(|token| token);
| ------ ^^^^^ returning this value requires that `'1` must outlive `'2`
| | |
| | return type of closure is dangerous::Dangerous<'2>
| has type `dangerous::Dangerous<'1>`
Now the constructor that bypasses validation must accept one of these types as a parameter to the function.
I'd probably have to re-export the token in another package though to avoid cyclical dependencies (Database needs Email to create the EmailAddress, Email needs Database to ask for the AccessToken)
Edit: 25-04-2023: So honestly, I've come to regret this a little bit. It's really cool, and I put the constructor behind a feature flag so that tests in foreign crates can construct these types, but every so often running unit tests just breaks because for one reason or another the feature is not enabled correctly in one of my crates, and sometimes the feature is enabled when it shouldn't be etc. so it's just more effort to maintain, than it is to review bad uses of a from_trusted fn that doesn't ask you to supply a unconstructable PriviledgeToken