Safe macro constructor for "unsafe" type?

Hello,

I’m working on a data type for which certain invariants need to be observed upon construction. I have written a macro that ensures this statically. Is there a way to allow users of my crate to use this macro, but prevent them from constructing that type any other way? I believe that this is not possible, because a macro just expands to some code, but perhaps I’m overlooking something. Or perhaps there is another solution to my problem?

What follows is a detailed description.

The data type is supposed to efficiently represent a sequence of subsequences of symbols. The symbols are dumb, pointer-sized values, so to represent the subsequence “a, b” followed by the subsequence “c”, the following memory layout seems appropriate: “2, a, b, 1, c”. (The numbers are the lengths of the subsequences.)
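To make the encoding concrete, here is a minimal sketch that walks such a flat buffer and recovers the subsequences. It assumes the symbols are plain usize ids (a, b, c written as 10, 11, 12), and split_subsequences is a hypothetical helper, not part of the crate being discussed.

```rust
/// Splits the flat encoding "len, sym, ..., len, sym, ..." into its subsequences.
/// Returns `None` if some length claims more symbols than remain in the buffer.
fn split_subsequences(flat: &[usize]) -> Option<Vec<&[usize]>> {
    let mut out = Vec::new();
    let mut rest = flat;
    // Each iteration consumes one length element plus the symbols it covers.
    while let Some((&len, tail)) = rest.split_first() {
        if len > tail.len() {
            return None; // malformed: the length runs past the end
        }
        let (sub, remaining) = tail.split_at(len);
        out.push(sub);
        rest = remaining;
    }
    Some(out)
}

fn main() {
    // "2, a, b, 1, c" with the symbols a, b, c written as the ids 10, 11, 12.
    let flat = [2usize, 10, 11, 1, 12];
    let subs = split_subsequences(&flat).unwrap();
    println!("{:?}", subs); // prints [[10, 11], [12]]
}
```

Note that a single wrong length shifts every later boundary, which is why the lengths have to be trusted.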

My motivation for this representation is keeping all the bits together in memory.

Let’s say that I have a macro that, when used like this,

let seq = seqofseq![a b; c];

expands to

let seq = SeqOfSeq([
    Element { len: 2 },
    Element {
        symbol: Symbol::new("a"),
    },
    Element {
        symbol: Symbol::new("b"),
    },
    Element { len: 1 },
    Element {
        symbol: Symbol::new("c"),
    },
]);

where Element is either a symbol or the length of a subsequence:

pub union Element {
    symbol: Symbol,
    len: usize,
}

and SeqOfSeq wraps a slice of Elements (anything that implements AsRef<[Element]>):

pub struct SeqOfSeq<T>(T)
where
    T: ?Sized,
    T: AsRef<[Element]>;

All of this works within a single module. The macro uses some magic to count the lengths of the subsequences, so that seqofseq![a b; c] compiles down to a static literal, or at least something simple and efficient.
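The counting "magic" isn't shown. One common macro_rules! technique, sketched here as an assumption about how it might look, is a recursive macro that expands to a sum of ones; count! is a hypothetical name. Because the expansion is a constant expression, it costs nothing at runtime and can even initialize a const.

```rust
/// Counts the token trees it is given: `count!(a b c)` expands to
/// `1usize + 1usize + 1usize + 0usize`.
macro_rules! count {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count!($($tail)*) };
}

// The expansion is a constant expression, so it is usable in a `const`.
const LEN_AB: usize = count!(a b);

fn main() {
    println!("{} {} {}", LEN_AB, count!(c), count!());
}
```

Each token costs one level of macro recursion, so very long subsequences could hit the default recursion limit (128).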

But if I make that macro available outside of the defining module, I also have to allow the literal to which it expands. But then I can no longer ensure that the lengths will be correct and the whole thing becomes unsafe.
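To see why wrong lengths are more than a logic bug, consider code that reads a SeqOfSeq back. The sketch below has to read union fields, which is unsafe, and it is only sound if every length element really is followed by that many symbol elements. Symbol here is a stand-in wrapping a &'static str (unlike the real one, not pointer-sized), and subsequences is a hypothetical method.

```rust
/// Stand-in for the real, pointer-sized `Symbol` (assumption).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Symbol(&'static str);

#[derive(Clone, Copy)]
pub union Element {
    symbol: Symbol,
    len: usize,
}

pub struct SeqOfSeq<T>(T)
where
    T: ?Sized,
    T: AsRef<[Element]>;

impl<T> SeqOfSeq<T>
where
    T: ?Sized + AsRef<[Element]>,
{
    /// Yields the symbols of each subsequence in turn (hypothetical API).
    pub fn subsequences(&self) -> impl Iterator<Item = Vec<Symbol>> + '_ {
        let mut rest: &[Element] = self.0.as_ref();
        std::iter::from_fn(move || {
            let (head, tail) = rest.split_first()?;
            // SAFETY: relies on the invariant that this position holds a length;
            // otherwise we would get a meaningless number.
            let len = unsafe { head.len };
            // A too-large length at least panics here rather than reading out of bounds.
            let (sub, remaining) = tail.split_at(len);
            rest = remaining;
            // SAFETY: relies on the invariant that the next `len` elements are symbols.
            // A wrong length would reinterpret a `usize` as a `&'static str`: undefined behavior.
            Some(sub.iter().map(|e| unsafe { e.symbol }).collect())
        })
    }
}

fn main() {
    // Inside the defining module the private field is visible, so the literal compiles.
    let seq = SeqOfSeq([
        Element { len: 2 },
        Element { symbol: Symbol("a") },
        Element { symbol: Symbol("b") },
        Element { len: 1 },
        Element { symbol: Symbol("c") },
    ]);
    for sub in seq.subsequences() {
        println!("{:?}", sub);
    }
}
```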

The alternative seems to be to provide a safe API for construction of such sequences-of-sequences. But I do not see how to do this in an efficient and convenient (for the user) way.

Thanks!

This is not possible. No matter what you do, the code that the macro generates ends up in the user's crate, so it can only use pub items of your crate.

Maybe the best you can do is mark the constructor #[doc(hidden)], which hides the item from the documentation, and rust-analyzer will ignore the item for autocompletion.

Besides #[doc(hidden)], another common trick is to give the pub item or module an "ugly", suggestive name to reduce the chance that users call it by accident, something like:

#[doc(hidden)]
pub mod __macro_internal_do_not_use_directly {
    /// SAFETY: do not use this directly; use the macro `create_foo_safe!()` instead.
    pub unsafe fn create_foo() -> super::Foo {
        todo!()
    }
}

Thanks. As far as I can see there’s no way to require the use of unsafe around a SeqOfSeq literal? (That would be something like “unsafe types” but that does not seem to exist.)

If it existed, then my macro could expand to unsafe { SeqOfSeq(...) } and it would be documented that the safety of this has been checked.

Otherwise, I could provide some unsafe API for construction of these SeqOfSeqs, but given that each subsequence can have a different length I don't yet quite see how to do this efficiently.

No, unsafe can only be applied to functions (and traits), not types.

I think it's enough to leave the constructor unsafe, possibly with a warning in its documentation, and to suggest that users construct the value with the macro:

/// # Safety
/// The safety requirement is ...; normally you should use `construct_foo_safe!()` instead.
pub unsafe fn construct_foo(seq: &[Element]) -> Foo {
    todo!();
}

#[macro_export]
macro_rules! construct_foo_safe {
    ($arg:expr) => {{
        const SAFE_SEQ: ... = $crate::do_some_check($arg);
        // SAFETY: the condition is checked at compile time by do_some_check
        unsafe {
            $crate::construct_foo(&SAFE_SEQ)
        }
    }};
}
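The snippet above elides the check itself. Here is a runnable sketch of the same pattern, simplified to a plain &'static [usize] encoding. Foo, is_well_formed (standing in for do_some_check) and the panic message are assumptions. The const items force the argument to be a compile-time constant and turn a failed check into a compile error.

```rust
/// Simplified stand-in for `Foo`: a flat "len, sym, ..., len, sym, ..." buffer.
pub struct Foo(pub &'static [usize]);

/// Hypothetical `do_some_check`: every length must fit in what remains.
pub const fn is_well_formed(flat: &[usize]) -> bool {
    let mut i = 0;
    while i < flat.len() {
        let len = flat[i];
        // `flat.len() - i - 1` cannot underflow because `i < flat.len()`.
        if len > flat.len() - i - 1 {
            return false;
        }
        i += 1 + len; // skip the length and the symbols it covers
    }
    true
}

/// # Safety
/// `flat` must satisfy `is_well_formed`; normally use `construct_foo_safe!()` instead.
pub const unsafe fn construct_foo(flat: &'static [usize]) -> Foo {
    Foo(flat)
}

#[macro_export]
macro_rules! construct_foo_safe {
    ($arg:expr) => {{
        const SAFE_SEQ: &[usize] = $arg;
        // Evaluated during compilation: a malformed literal is a compile error.
        const _: () = assert!($crate::is_well_formed(SAFE_SEQ), "malformed sequence");
        // SAFETY: the condition is checked by the `const _` assertion above.
        unsafe { $crate::construct_foo(SAFE_SEQ) }
    }};
}

fn main() {
    let foo = construct_foo_safe!(&[2, 10, 11, 1, 12]);
    println!("{:?}", foo.0);
    // construct_foo_safe!(&[3, 10]); // would be rejected at compile time
}
```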

I don't know how idiomatic it is or isn't, but you can fake it.

Maybe someday.

Your example expansion is already creating a SeqOfSeq::<[Element; 5]> (Rust doesn't have unsized locals so far, you have to unsize behind a pointer). And while I wouldn't be surprised if nominal tuple type constructors have some bespoke optimizations, they're at least notionally also function items.[1] So I don't know that there's really much loss here.
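Both points can be seen in a small sketch. It uses plain usize elements instead of the Element union for brevity, and elements and total_len are hypothetical helpers. The tuple-struct constructor SeqOfSeq is used as an ordinary function value, and the sized SeqOfSeq<[usize; 5]> it produces coerces to SeqOfSeq<[usize]> behind a reference or a Box.

```rust
/// Simplified stand-in: plain `usize` elements instead of the `Element` union.
pub struct SeqOfSeq<T>(T)
where
    T: ?Sized,
    T: AsRef<[usize]>;

impl<T> SeqOfSeq<T>
where
    T: ?Sized + AsRef<[usize]>,
{
    pub fn elements(&self) -> &[usize] {
        self.0.as_ref()
    }
}

/// Accepts any length, because it takes the unsized form.
fn total_len(seq: &SeqOfSeq<[usize]>) -> usize {
    seq.elements().len()
}

fn main() {
    // The tuple-struct constructor is an ordinary function item.
    let make: fn([usize; 5]) -> SeqOfSeq<[usize; 5]> = SeqOfSeq;
    let owned = make([2, 10, 11, 1, 12]);

    // Unsizing coercion: `&SeqOfSeq<[usize; 5]>` -> `&SeqOfSeq<[usize]>`.
    let borrowed: &SeqOfSeq<[usize]> = &owned;
    println!("{}", total_len(borrowed));

    // The same coercion works behind a `Box`.
    let boxed: Box<SeqOfSeq<[usize]>> = Box::new(SeqOfSeq([1usize, 42]));
    println!("{}", total_len(&boxed));
}
```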


  1. You can't let _ = SeqOfSeq::<[()]>([][..]) since you can't pass unsized values, for example.

Indeed, that’s what I ended up doing. Originally, I wanted to construct the type “directly” inside the macro, without even using a constructor. But this was silly (see below).

That's one clever trick.

Ah, you mean that I could just add

impl<T> SeqOfSeq<T>
where
    T: AsRef<[Element]>,
{
    pub unsafe fn new_unchecked(elements: T) -> Self {
        Self(elements)
    }
}

and then make the macro expand to

{
    let elements = [...];
    unsafe { SeqOfSeq::new_unchecked(elements) }
}

Well, this is what I tried and it seems to solve my problem perfectly! I’m not sure why I didn’t come up with this solution myself. I guess because of the T: ?Sized bound that I need for some other functions, but here it must not be present. But as you say this is not a limitation.
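Putting the pieces together, here is a self-contained sketch of that approach. The T: ?Sized bound stays on the struct and on the impl with the reading methods, while new_unchecked lives in a separate impl where T is implicitly Sized. Symbol is a stand-in wrapping a &'static str (not pointer-sized), and count!, elements and the exact macro grammar are assumptions rather than the original crate's code. The macros refer to items through $crate:: so the expansion still resolves when used from another crate.

```rust
/// Stand-in symbol type (assumption: the real one is pointer-sized).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Symbol(&'static str);

impl Symbol {
    pub const fn new(name: &'static str) -> Self {
        Symbol(name)
    }
}

/// Either a subsequence length or a symbol; which one is determined by position.
#[derive(Clone, Copy)]
pub union Element {
    pub symbol: Symbol,
    pub len: usize,
}

/// Invariant: every length element is followed by exactly that many symbols.
pub struct SeqOfSeq<T>(T)
where
    T: ?Sized,
    T: AsRef<[Element]>;

impl<T> SeqOfSeq<T>
where
    T: AsRef<[Element]>, // implicitly `T: Sized`, since `elements` is passed by value
{
    /// # Safety
    /// `elements` must uphold the length invariant documented on `SeqOfSeq`.
    pub unsafe fn new_unchecked(elements: T) -> Self {
        Self(elements)
    }
}

impl<T> SeqOfSeq<T>
where
    T: ?Sized + AsRef<[Element]>,
{
    /// Raw view of the encoded elements.
    pub fn elements(&self) -> &[Element] {
        self.0.as_ref()
    }
}

/// Counts token trees by expanding to `1usize + ... + 0usize`.
#[doc(hidden)]
#[macro_export]
macro_rules! count {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + $crate::count!($($tail)*) };
}

/// `seqofseq![a b; c]`: symbols separated by spaces, subsequences by `;`.
#[macro_export]
macro_rules! seqofseq {
    ($($($sym:ident)*);*) => {{
        let elements = [
            $(
                $crate::Element { len: $crate::count!($($sym)*) },
                $( $crate::Element { symbol: $crate::Symbol::new(stringify!($sym)) }, )*
            )*
        ];
        // SAFETY: each length was computed by `count!` from exactly the
        // symbols emitted right after it.
        unsafe { $crate::SeqOfSeq::new_unchecked(elements) }
    }};
}

fn main() {
    let seq = seqofseq![a b; c];
    let elems = seq.elements();
    // SAFETY: positions 0 and 3 are length elements by construction.
    println!("{} {} {}", elems.len(), unsafe { elems[0].len }, unsafe { elems[3].len });
}
```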

In your playground example you use #[repr(transparent)]. Should I also use it? From looking it up my impression is that it’s only relevant for FFI.

It's relevant when you need a guaranteed layout, for example if you needed to use unsafe to go from [Element] to SeqOfSeq<[Element]> for some reason.
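For instance, a zero-copy view of an existing slice needs that guarantee. This sketch uses plain usize elements, and from_slice_unchecked and elements are hypothetical names. The pointer cast keeps the slice length as metadata, and #[repr(transparent)] is what makes reinterpreting [usize] as SeqOfSeq<[usize]> sound.

```rust
/// `repr(transparent)` guarantees `SeqOfSeq<[usize]>` has exactly the layout of `[usize]`.
#[repr(transparent)]
pub struct SeqOfSeq<T>(T)
where
    T: ?Sized,
    T: AsRef<[usize]>;

impl SeqOfSeq<[usize]> {
    /// Views an existing slice as a `SeqOfSeq` without copying (hypothetical helper).
    ///
    /// # Safety
    /// `flat` must satisfy the length invariant.
    pub unsafe fn from_slice_unchecked(flat: &[usize]) -> &Self {
        // SAFETY: the cast keeps the slice length as pointer metadata, and
        // `repr(transparent)` makes the two layouts identical; the caller
        // vouches for the invariant.
        unsafe { &*(flat as *const [usize] as *const Self) }
    }

    pub fn elements(&self) -> &[usize] {
        &self.0
    }
}

fn main() {
    let flat = vec![2usize, 10, 11, 1, 12];
    // SAFETY: "2, a, b, 1, c" with a, b, c encoded as 10, 11, 12 is well formed.
    let seq = unsafe { SeqOfSeq::<[usize]>::from_slice_unchecked(&flat) };
    println!("{:?}", seq.elements());
}
```

Without the attribute, the compiler would be free to lay out SeqOfSeq<[usize]> differently, and the cast would be unsound.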

If the built-in unsizing coercion is good enough for you, you may not need it.