Niche optimization, NonZero and `improper_ctypes`

Niche optimizations are very useful. The following types are all the same size as u32.

use std::num::NonZeroU32;

type A = Option<NonZeroU32>;
type B = Result<NonZeroU32, ()>;
type C = Result<(), NonZeroU32>;
enum NonNonZeroU32 {
    Zero,
    NonZero(NonZeroU32)
}
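The size claim can be checked directly. The following is a minimal sketch using `std::mem::size_of`; the helper name `all_same_size_as_u32` is made up for illustration, and it only reports what the compiler in use happens to do, not what the language guarantees.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

type A = Option<NonZeroU32>;
type B = Result<NonZeroU32, ()>;
type C = Result<(), NonZeroU32>;

#[allow(dead_code)]
enum NonNonZeroU32 {
    Zero,
    NonZero(NonZeroU32),
}

/// Hypothetical helper: true if every type above is exactly as large as `u32`.
fn all_same_size_as_u32() -> bool {
    let target = size_of::<u32>();
    size_of::<A>() == target
        && size_of::<B>() == target
        && size_of::<C>() == target
        && size_of::<NonNonZeroU32>() == target
}
```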

However, only the Option is marked as FFI-safe by Rust's lint. Is the lint being overly restrictive here, or is it true that even the simplest niche optimization cases could still be undone? I've come to semi-rely on this optimization in Rust code, so I hope it won't break. It would also be useful to be able to use it for more than Option in FFI, at the very least for Result types.

The enum NonNonZeroU32 case works without a warning. It was fixed by this PR:

https://github.com/rust-lang/rust/pull/60300

Perhaps the author @mjbshaw could comment on whether it's feasible to extend this to Result-like enums, too.

Thanks, that's great! Having this guarantee extended to include Result-like enums would be particularly useful. My biggest worry is that the optimisation was considered unstable.

Generally, if a type is allowed by the lint, this makes its layout part of the language-level guarantees, and thus the conceptual Rust specification, instead of just an optimization that any implementation is free to implement or omit (or anything in between). This means that the language team has to sign off on doing this, at least.

While applying the optimization for the currently allowed Option-like enums is fairly simple (one variant's field cannot be zero, so zero is used to represent the dataless variant), for other cases it is less simple, so this looks like a not insignificant additional guarantee.

I think the optimization could be reasonably applied to Result-like enums where one of the variants is zero-sized, and I don't think it would be too hard to implement. I'd like to be able to use Result<NonZeroPointer, ()> in some FFI situations. I know I could just use Option<NonZeroPointer>, but sometimes I'd like to have Result's semantic meaning.

But yes, as @jschievink mentioned, this is something that needs signoff from the lang team. It might be worth posting this to the internals forum to get some feedback, and if it's positive, submitting a PR that T-lang then votes on.

The default answer to "can the language guarantee ________ for me?" is "No." (Disclaimer: Not an official team statement, but certainly a de facto rule as far as I can tell.)

Option<NonZeroU32> and friends were a reasonable extension of the existing Option<Box<T>> and Option<&T> guarantees. But I'm not aware of any layout guarantees ever being made for Result, so this would have a fairly high hill to climb.

Especially for Result<NonZeroU32, ()>, which is something I would personally consider an anti-pattern. If you have (literally) nothing to say about an error, why not just use Option?

And while Result<(), SomethingNonZero> is more justifiable for polarity reasons, if one just needs to expose it in FFI one can make a thin wrapper that just calls .err() to get out an FFI-compatible type.
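The thin wrapper described here might look roughly like the sketch below. The function names and the error code are invented for illustration, and a real export would also need an attribute such as `no_mangle`, which is omitted to keep the example small.

```rust
use std::num::NonZeroU32;

/// Hypothetical internal function with the "natural" Rust signature.
fn flush_buffer(fail: bool) -> Result<(), NonZeroU32> {
    if fail {
        // 5 is an arbitrary, made-up error code.
        Err(NonZeroU32::new(5).expect("5 is non-zero"))
    } else {
        Ok(())
    }
}

/// Thin FFI wrapper: `None` (zero on the C side) means success,
/// `Some(code)` carries the non-zero error code.
pub extern "C" fn flush_buffer_ffi(fail: bool) -> Option<NonZeroU32> {
    flush_buffer(fail).err()
}
```

The Rust-facing function keeps Result's polarity, while only the wrapper exposes the Option type that the lint already accepts.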

Plus, Result<Zero, NonZeroU32> seems like it'd lead inexorably to a request to make Result<PositiveI32, NegativeI32> work as a way of doing error codes, and any number of other related things.

The problem with using Option for Result<NonZeroU32, ()> is that Option has "Maybe" semantics and not "Error" semantics. This could be solved by using a custom enum type, except that Result is still special. Sure, it's an anti-pattern, but it's one we can thank K&R for, and it matters as long as some C API still uses it.

A thin wrapper around FFI makes sense, but you can end up with a lot of thin wrappers whose work could just as well be handled automatically by the type system, instead of needing manual conversion for every function call.

While I agree that () is not better than "NoneError", the error type, even when zero-sized, can be more meaningful than () (and "NoneError").

For instance, Vec's .pop() method could have had the following signature instead:

struct VecIsEmptyError;

impl<T> Vec<T> {
    fn pop (self: &'_ mut Self) -> Result<T, VecIsEmptyError>
    { ... }
}
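Since inherent methods cannot be added to Vec outside the standard library, the same idea can be tried out with an extension trait. This is only a sketch: `PopOrError` and `pop_or_error` are hypothetical names, not anything in std.

```rust
/// Zero-sized error type whose name describes the failure.
#[derive(Debug, PartialEq)]
struct VecIsEmptyError;

/// Hypothetical extension trait; not part of the standard library.
trait PopOrError<T> {
    fn pop_or_error(&mut self) -> Result<T, VecIsEmptyError>;
}

impl<T> PopOrError<T> for Vec<T> {
    fn pop_or_error(&mut self) -> Result<T, VecIsEmptyError> {
        // Reuse the real `Vec::pop` and attach the named error.
        self.pop().ok_or(VecIsEmptyError)
    }
}
```

The error type carries no data, so it costs nothing at runtime, yet the signature documents why the call can fail.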

Back to the topic at hand, although it is non-normative, Rust seems to be going in the direction the OP has mentioned:

One point remains unclear in this quote: does the discriminant elision apply to something like:

struct None;

enum MyOption<T> {
    Some(T),
    None(None),
}

It would be surprising if MyOption did not operate like Option, but from that it follows that Result<T, ZST> also has to operate like Option, since MyOption<T> = Result<T, None> ~ Result<T, ZST>.

Yes, it would be surprising, but the reference makes no guarantee that Result<T, Zst> will have its discriminant elided.

Result<T, Zst> has a field in both variants, so it is not an Option-like enum, so nothing is guaranteed.

I don't think it's unclear in the slightest. It directly says

the other variant has no fields

and your MyOption example has fields in both variants.

Given the discussion so far, I would amend my opening post to read something like the following...


It would be useful if a single zero-sized type field in an enum could be treated as equivalent to a unit variant for the purpose of guaranteed layout. For example, if ZST is zero sized like the unit type (), then:

enum ResultLike {
    Variant1(ZST),
    Variant2(NonZeroI32)
}

Would be, for guaranteed layout purposes, equivalent to:

enum OptionLike {
    Variant1,
    Variant2(NonZeroI32)
}
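To see what a compiler currently does with these two shapes, their sizes can be compared. This is a sketch under assumptions: `Zst` stands in for the `ZST` type above, `layout_sizes` is a made-up helper, and equal sizes observed today say nothing about what is guaranteed.

```rust
use std::mem::size_of;
use std::num::NonZeroI32;

/// Stand-in for the zero-sized type called `ZST` above.
#[allow(dead_code)]
struct Zst;

#[allow(dead_code)]
enum ResultLike {
    Variant1(Zst),
    Variant2(NonZeroI32),
}

#[allow(dead_code)]
enum OptionLike {
    Variant1,
    Variant2(NonZeroI32),
}

/// Hypothetical helper returning the sizes of (ResultLike, OptionLike).
fn layout_sizes() -> (usize, usize) {
    (size_of::<ResultLike>(), size_of::<OptionLike>())
}
```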

Given your restatement, would you now list some relatively compelling reasons why that would be useful?

In C APIs, one common way of representing an error is with an i32 (or other integer type) that is either zero or non-zero. The zero case represents an error. The non-zero case represents success. This could naturally be represented in Rust as:

Result<NonZeroI32, ()>

However this isn't currently FFI safe. Instead you can use Option<NonZeroI32> for this but Option does not have the error semantics that is intended by the API.
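One way to get the error semantics back today is to bind the C function with the Option type and convert on the Rust side. In this sketch `c_open_handle` is a stand-in defined in Rust so the example is self-contained; a real binding would be declared in an `extern "C"` block, and both function names and the value 42 are made up.

```rust
use std::num::NonZeroI32;

/// Stand-in for a C function that returns 0 on error and a non-zero
/// handle on success (hypothetical; defined here only for the example).
extern "C" fn c_open_handle(succeed: bool) -> Option<NonZeroI32> {
    if succeed { NonZeroI32::new(42) } else { None }
}

/// Safe wrapper that restores the error semantics on the Rust side.
fn open_handle(succeed: bool) -> Result<NonZeroI32, ()> {
    c_open_handle(succeed).ok_or(())
}
```

This is exactly the kind of per-function conversion the post would like the type system to make unnecessary.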

MyOption is indeed not included by the quote, but it is not mentioned as a counterexample either, despite there being no reason whatsoever for MyOption and Option to have different semantics. So it is unclear whether it not being guaranteed for MyOption is an oversight or if there is an explicit reason behind it.


I find @chrisd's suggestion of using Result<(), NonZeroI32> to be quite neat. I'd personally ignore the lint in that case and add a sanity check:

const _: [(); 1] = [(); unsafe {
    type Src = Result<(), ::core::num::NonZeroI32>;
    type Dst = i32;

    /// This is *very* unlikely to break
    const _: [();
        ::core::mem::size_of::<Src>()
    ] = [();
        ::core::mem::size_of::<Dst>()
    ];

    union Transmute {
        src: Src,
        dst: Dst,
    }
    
    // this cannot fail given that we can go from `&Err::<T, E>` to `&E`
    Transmute {
        src: Ok(())
    }.dst == 0
} as usize];
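On newer compilers the same compile-time check can arguably be written more directly with `assert!` in a const context and a const `transmute`. This is a sketch of that variant, not the poster's code; the constant name `OK_IS_ZERO` is invented, and the check still relies on layout behavior the thread treats as not guaranteed.

```rust
use core::mem::{size_of, transmute};
use core::num::NonZeroI32;

type Src = Result<(), NonZeroI32>;
type Dst = i32;

// Compile-time check: both types occupy the same number of bytes.
const _: () = assert!(size_of::<Src>() == size_of::<Dst>());

// Compile-time check: `Ok(())` is stored as the all-zero bit pattern.
// Compilation fails if this assumption ever stops holding.
const OK_IS_ZERO: bool = unsafe { transmute::<Src, Dst>(Ok(())) } == 0;
const _: () = assert!(OK_IS_ZERO);
```

If either assumption breaks, the crate stops compiling instead of silently misbehaving across the FFI boundary.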

I'll note that allowing a field, even a ZST field, starts to allow some weird things.

For example, here's a legal ZST:

enum Seven {
    Seven = 7
}

So should Result<Seven, NonZeroU32> guarantee this optimization?
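To make the oddity concrete: a single-variant enum needs no stored discriminant, so it takes no space, yet it still has a discriminant value when cast. A small sketch, with `seven_facts` as a made-up helper name:

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum Seven {
    Seven = 7,
}

/// Hypothetical helper: (size of `Seven` in bytes, its discriminant as `u8`).
fn seven_facts() -> (usize, u8) {
    (size_of::<Seven>(), Seven::Seven as u8)
}
```

So a layout guarantee would have to say which integer, if any, represents the `Seven` variant on the C side: 0 from the niche, or 7 from the discriminant.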

Similarly, you could raise the alignment of a ZST

#[repr(align(64))]
struct OverAlign;

Should Result<OverAlign, NonZeroU32> have the same calling convention as u32?
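The effect of the over-aligned ZST can be observed with `size_of` and `align_of`. In this sketch the alias `Wrapped` and the helper `over_align_facts` are hypothetical names introduced for illustration.

```rust
use std::mem::{align_of, size_of};
use std::num::NonZeroU32;

#[repr(align(64))]
#[allow(dead_code)]
struct OverAlign;

type Wrapped = Result<OverAlign, NonZeroU32>;

/// Hypothetical helper:
/// (size of OverAlign, align of OverAlign, align of Wrapped, size of Wrapped).
fn over_align_facts() -> (usize, usize, usize, usize) {
    (
        size_of::<OverAlign>(),
        align_of::<OverAlign>(),
        align_of::<Wrapped>(),
        size_of::<Wrapped>(),
    )
}
```

Because the enum inherits the 64-byte alignment and a type's size is a multiple of its alignment, `Wrapped` cannot have the same size as u32, even though `OverAlign` itself is zero-sized.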

Given over-alignment and other repr annotations, I suspect the guarantee would only say Result-like enums are Option-like when one payload is Unit-like.

So Result<Seven, NonZeroU32> would be an option-like result-like, because Seven is unit-like (size 0 align 1). Result<OverAlign, NonZeroU32> would not, because OverAlign is not unit-like because it has a nonminimal alignment.

Alternatively, it could only provide the guarantee for explicitly unit types (struct X;) (with #[repr(Rust)] and no other repr annotations) and not any other monostate type.

"All" the enum part of the guarantee would have to say is that a variant Variant(Unit) for any unit struct Unit is treated the same in layout as the unit variant.

It then falls out that Enum { A(()), B(T) } ≈ Enum { A, B(T) } and gets the option-like layout guarantees.

I think it makes sense to the user to guarantee the option layout for results with a unit variant. The actual impact on compiler (/specification) complexity is unclear at best, however.

Thank you scottmcm and KrishnaSannasi for noting those issues. Obviously then ZST is overly broad and even unit-like needs qualification. And thanks CAD97 for working through those details.

I don't mean at all to downplay how difficult it may be to fully specify. But I do think it would be useful if it can be.

I would argue no, and it should not be FFI-safe. It could (as long as the alignment of the ZST was <= the alignment of the other type), but we should start conservative. I would only allow this if we also started allowing #[repr(transparent)] on struct NewType(NonZeroU32, OverAlign);, which is currently rejected. These two are closely related in my eyes and I think we should be consistent in how we handle them, which today would mean Result<OverAlign, NonZeroU32> would not be equivalent to u32 or FFI-safe.

In other words:

IF

  • #[repr(transparent)] Tuple(T, E); is a valid, FFI-safe type for some T and E, AND
  • Either Option<T> or Option<E> (for whichever T/E is a non-ZST) is an FFI-safe type and has the same representation as Tuple

THEN

  • Result<T, E> should have the same representation and FFI-safety as Tuple (and by extension, the corresponding Option<T> or Option<E>)

I think this is something that can be reasonably understood, communicated, guaranteed, and implemented.
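The first condition of that rule can be illustrated with a transparent tuple struct whose second field is the unit type. This is a sketch: `NewType` follows the name used above, `newtype_sizes` is a made-up helper, and the rejected variant is left as a comment because it does not compile.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Accepted: exactly one non-zero-sized field plus a size-0, align-1 field.
#[repr(transparent)]
#[allow(dead_code)]
struct NewType(NonZeroU32, ());

// Rejected by the compiler, as noted above, because `OverAlign`
// has an alignment greater than 1:
//
// #[repr(align(64))]
// struct OverAlign;
// #[repr(transparent)]
// struct Rejected(NonZeroU32, OverAlign);

/// Hypothetical helper: sizes of (NewType, Option<NewType>).
fn newtype_sizes() -> (usize, usize) {
    (size_of::<NewType>(), size_of::<Option<NewType>>())
}
```

Under the proposed rule, Result<NonZeroU32, ()> would qualify because the corresponding tuple is accepted, while Result<OverAlign, NonZeroU32> would not.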

Maybe niche-filling specifics should be guaranteed by the language for FFI use only if the enum is annotated with something like #[repr(optionlike)], instead of automatically based on its shape?

That would make it simple: no #[repr] => no FFI. Has #[repr] => look at its docs.
