How to get a buffer type with the right size and alignment for a type?

In C++:

template<class T>
using buffer_t = std::aligned_storage_t<sizeof(T), alignof(T)>;

After this, sizeof(buffer_t<T>) == sizeof(T) and alignof(buffer_t<T>) == alignof(T), but buffer_t<T> is just an array of chars with the right alignment, suitable for doing a placement new of T. I'm not sure how to accomplish the same in generic code in Rust.

fn make_buffer<T>() -> [u8; std::mem::size_of::<T>()] {
    [0; std::mem::size_of::<T>()]
}

Results in:

error: generic parameters may not be used in const operations
 --> src/lib.rs:1:49
  |
1 | fn make_buffer<T>() -> [u8; std::mem::size_of::<T>()] {
  |                                                 ^ cannot perform const operation using `T`
  |

I would also need the alignment to be correct, and I'm not sure how to do that at all because I really doubt #[repr(align(std::mem::align_of::<T>()))] is gonna work.

My ultimate goal is I have a U: 'a that I want to hold in a struct by value, but with the lifetime erased. If the struct directly stores U it will get infected with the 'a lifetime (this is for building new primitives with unsafe -- I use other mechanisms to enforce the data can't actually be used after drop). So I want to have an array of u8s with sufficient alignment as a struct field, and manually move the U: 'a into that space by copying the bits. Then I can cast to hand out &U with lifetime tied to my struct.

You probably want to be calling the allocator directly, as in this example.
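
The linked example isn't reproduced here, so as a rough sketch of what "calling the allocator directly" could look like: the ErasedBox type below is hypothetical (not from any crate) and assumes heap allocation is acceptable. It uses std::alloc with Layout::new::<T>() to get storage of the right size and alignment, while the struct itself carries no type or lifetime parameter.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Hypothetical helper: heap storage for one value whose type has been forgotten.
pub struct ErasedBox {
    ptr: NonNull<u8>,
    layout: Layout,
}

impl ErasedBox {
    pub fn new<T>(value: T) -> Self {
        let layout = Layout::new::<T>();
        let ptr = if layout.size() == 0 {
            // Zero-sized types need no allocation, only a well-aligned pointer.
            NonNull::<T>::dangling().cast::<u8>()
        } else {
            // SAFETY: the layout has a non-zero size.
            let raw = unsafe { alloc(layout) };
            match NonNull::new(raw) {
                Some(p) => p,
                None => handle_alloc_error(layout),
            }
        };
        // SAFETY: ptr is valid for writes of T and aligned for T.
        unsafe { ptr.as_ptr().cast::<T>().write(value) };
        ErasedBox { ptr, layout }
    }

    /// # Safety
    /// T must be the type passed to `new`, and anything the stored value
    /// borrows from must still be alive.
    pub unsafe fn get<T>(&self) -> &T {
        debug_assert_eq!(Layout::new::<T>(), self.layout);
        unsafe { &*self.ptr.as_ptr().cast::<T>() }
    }
}

impl Drop for ErasedBox {
    fn drop(&mut self) {
        // Only the memory is freed; the stored value's destructor is NOT run,
        // because the type is no longer known here.
        if self.layout.size() != 0 {
            // SAFETY: ptr was allocated in `new` with exactly this layout.
            unsafe { dealloc(self.ptr.as_ptr(), self.layout) };
        }
    }
}
```

Note that the compiler no longer checks anything about the stored value: getting the type or the lifetime wrong in get is undefined behavior, which is the price of the erasure.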

Alternatively, create a MaybeUninit and then work with pointers.

I don't think I can use MaybeUninit here, because as soon as my generic struct takes U: 'a as a type argument it inherits the 'a lifetime, and erasing the lifetime is the goal. Really I want whatever you would use to get the array type I expect MaybeUninit needs for its own implementation.

I'm trying to get an array type I can hold as a field so that I can avoid heap allocating.

MaybeUninit<T> contains a ManuallyDrop<T>, which is still generic. Not sure how (or even why) you are planning to get rid of the lifetime. By the way, both types are magic (they are language items), which is needed for the compiler to grant them:

  1. the special property that values stored in ManuallyDrop aren't dropped automatically; and
  2. that MaybeUninit is allowed to be uninitialized.

[MaybeUninit<T>; N] is as close as you can get IIUC.

The best idea I can come up with off the top of my head is to declare your struct #[repr(align(8))] to minimize the possibility of an alignment mismatch, and then use [MaybeUninit<u8>]::align_to{,_mut} to get a properly-aligned transmuted slice.

But that certainly feels messy, and you might have trouble making your buffer exact-sized. If the type you’re trying to store might have a larger alignment requirement, you’ll also need to be careful about moves that might misalign the inner type.
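
For illustration, here is a simplified sketch of the "fixed maximum alignment" idea. It skips align_to and instead checks size and alignment with assert! before writing. The Holder type, its method names, and the choice of 8 as the maximum alignment are assumptions made for this example.

```rust
use std::mem::{align_of, size_of, MaybeUninit};

/// Inline byte storage with a fixed maximum alignment of 8.
/// repr(C) keeps `buf` at offset 0, so it is as aligned as the struct itself.
#[repr(C, align(8))]
pub struct Holder<const N: usize> {
    buf: [MaybeUninit<u8>; N],
}

impl<const N: usize> Holder<N> {
    pub fn new() -> Self {
        Holder { buf: [MaybeUninit::uninit(); N] }
    }

    /// Moves `value` into the buffer. Any previously stored value is
    /// overwritten without being dropped.
    pub fn put<T>(&mut self, value: T) {
        assert!(size_of::<T>() <= N, "value does not fit in the buffer");
        assert!(align_of::<T>() <= align_of::<Self>(), "value is over-aligned");
        // SAFETY: the asserts above guarantee enough room and alignment.
        unsafe { self.buf.as_mut_ptr().cast::<T>().write(value) };
    }

    /// # Safety
    /// A T must have been stored with `put`, and anything it borrows from
    /// must still be alive.
    pub unsafe fn get<T>(&self) -> &T {
        unsafe { &*self.buf.as_ptr().cast::<T>() }
    }
}
```

Because the alignment is a property of Holder itself, moving a Holder keeps the contents aligned. The cost is the one mentioned above: the struct's size is rounded up to a multiple of 8, so the buffer is generally not exact-sized.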

Say the only types I cared about supporting had alignment 1 and that I have defined this struct:

struct Holder<const N: usize> {
    buf: [u8; N]
}

I can now write:

impl<const N: usize> Holder<N> {
    unsafe fn move_inside<T>(&mut self, t: T) {
        std::ptr::copy_nonoverlapping(
            &t as *const T as *const u8,
            &mut self.buf[0] as *mut u8,
            std::mem::size_of::<T>(),
        );
        std::mem::forget(t);
    }
}

After calling move_inside my Holder<N> will store t but with type and lifetime information completely erased. I can later retrieve the object back out with casting (obviously the onus is on me to not create UB). The problem is this only works if I only care about alignment 1 and have worst case sized my N for all T that I may care about. I need a trait that given a T gives me the right holder...

trait GetHolder<T> {
    type HolderT;
}

impl<T> GetHolder<T> for T {
    type HolderT = Holder<{ std::mem::size_of::<T>() }>;
}

But this fails to compile for the same reason make_buffer does.

It gets you the right size and layout on the stack... but it's true you can't erase the type information (including alignment) and then safely move it around as bytes; you'd need it to be pinned to the stack and work with pointers. Except, I guess, along the lines of @2e71828's suggestion of a maximum (assert!-checked) align.

This crate has only one version and few downloads, but the code seems reasonable (ignoring the high bar of getting anything useful back out soundly).

I don't think that is possible as-is in the current language, as AFAICT there's simply no way to use size_of and align_of on a generic parameter in a const generic context; the compiler refuses any expression that depends on a generic type parameter as the length of an array type. (I don't know where this limitation comes from, but I remember having seen the same kind of question before and not having been able to implement it directly.)

However, you still didn't specify why you (think you) need this. It is likely that there is a substantially better alternative, rather than just going full BCPL. Are you trying to parse a binary zero-copy format generically? Are you trying to build a heterogeneous collection? Are you trying to build some sort of an executable image in memory? Erasing types and lifetimes shouldn't be the goal per se.

You can go the route of sticking in 'static as a placeholder, or you can try to work with for<'a> type/bounds. But fundamentally, if you're naming something which relies on a type with a lifetime in it, you need to introduce a lifetime somewhere.

But also an important note: you don't need for<'a, U: 'a> unless that 'a is used for something else. If you just have a generic without any lifetime bounds, the user can stick any type in there, including ones carrying lifetimes. The obvious example being Vec<&i32>.
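
A small sketch of that point (the Holder and first_of names are made up for this example): the struct has a single unbounded type parameter and no lifetime parameter of its own, yet it happily stores a Vec<&i32>. The borrow is simply part of T.

```rust
/// A plain generic wrapper: no lifetime parameter, no bounds on T.
pub struct Holder<T> {
    value: T,
}

impl<T> Holder<T> {
    pub fn new(value: T) -> Self {
        Holder { value }
    }

    pub fn get(&self) -> &T {
        &self.value
    }
}

/// Here T is inferred as Vec<&i32>, borrowing from `numbers`.
/// The compiler still tracks that borrow through Holder<T>.
pub fn first_of(numbers: &[i32]) -> Option<i32> {
    let refs: Vec<&i32> = numbers.iter().collect();
    let holder = Holder::new(refs);
    holder.get().first().map(|r| **r)
}
```

The lifetime has not been erased here; it is carried inside T, so the holder cannot outlive the data it borrows from.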

'static bounds do get implied, but as far as I recall, only in dyn Trait (Box<dyn Trait> is Box<dyn Trait + 'static>, &dyn Trait is &'a dyn (Trait + 'a) IIUC) and sometimes with Self types of impl blocks without an explicit '_ (this one warns).

MaybeUninit (currently) isn't really special. It's a lang item, but that's so that diagnostics can point at it, rather than imbuing any special semantics (currently).

MaybeUninit is (currently) just[1] a union; union MaybeUninit<T> { init: ManuallyDrop<T>, uninit: () }. (Disclaimer: being able to define MaybeUninit like this is relying on implementation details that aren't necessarily guaranteed for user code in the face of compiler updates yet.) There's no lifetime covering going on; you just have a union that's either T or nothing (()).

If the goal is to get an inline buffer sufficient for putting T in, use MaybeUninit<T>. If you want untyped storage, use heap allocation. There's no way to get statically sufficient storage for T without naming T.
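
As a minimal sketch of the MaybeUninit<T> route (Slot and same_layout are names invented for this example): MaybeUninit<T> is documented to have the same size and alignment as T, so it plays the role of the C++ aligned storage, but it still names T.

```rust
use std::mem::{align_of, size_of, MaybeUninit};

/// Inline, possibly-uninitialized storage for exactly one T.
pub struct Slot<T> {
    storage: MaybeUninit<T>,
}

impl<T> Slot<T> {
    pub fn empty() -> Self {
        Slot { storage: MaybeUninit::uninit() }
    }

    /// Moves `value` into the slot, like a placement new.
    /// A previously stored value is overwritten without being dropped.
    pub fn fill(&mut self, value: T) -> &mut T {
        self.storage.write(value)
    }

    /// # Safety
    /// The slot must have been filled.
    pub unsafe fn take(self) -> T {
        unsafe { self.storage.assume_init() }
    }
}

/// True when MaybeUninit<T> has the same size and alignment as T.
pub fn same_layout<T>() -> bool {
    size_of::<MaybeUninit<T>>() == size_of::<T>()
        && align_of::<MaybeUninit<T>>() == align_of::<T>()
}
```

Since Slot<T> is generic over T, any lifetime inside T is still visible in the type, which is exactly the limitation described above.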

(Very long term, I'd like to have a struct Buffer<const LAYOUT: Layout>, but that's a good way off from being reasonably possible. You can sorta almost get partway there with <const SIZE, const ALIGN> and { size: MaybeUninit<[u8; SIZE]>, align: <Const<ALIGN> as AlignZst>::ALIGNER }, but it's very much a hack not worth the complexity and limitations. And you still need to name T in order to know what it needs, anyway.)
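
To make the hack a bit more concrete, here is one way it might be spelled out. Const and AlignZst follow the names used above, but the associated Aligner type, the AlignN marker structs, and the restriction to alignments 1, 2, 4 and 8 are assumptions of this sketch.

```rust
use std::mem::MaybeUninit;

/// Carries an alignment value at the type level.
pub struct Const<const N: usize>;

/// Maps Const<ALIGN> to a zero-sized type with that alignment.
pub trait AlignZst {
    type Aligner: Copy;
    const ALIGNER: Self::Aligner;
}

#[derive(Clone, Copy)]
#[repr(align(1))]
pub struct Align1;
#[derive(Clone, Copy)]
#[repr(align(2))]
pub struct Align2;
#[derive(Clone, Copy)]
#[repr(align(4))]
pub struct Align4;
#[derive(Clone, Copy)]
#[repr(align(8))]
pub struct Align8;

impl AlignZst for Const<1> { type Aligner = Align1; const ALIGNER: Align1 = Align1; }
impl AlignZst for Const<2> { type Aligner = Align2; const ALIGNER: Align2 = Align2; }
impl AlignZst for Const<4> { type Aligner = Align4; const ALIGNER: Align4 = Align4; }
impl AlignZst for Const<8> { type Aligner = Align8; const ALIGNER: Align8 = Align8; }

/// Untyped storage: SIZE bytes, aligned to ALIGN via the zero-sized field.
pub struct Buffer<const SIZE: usize, const ALIGN: usize>
where
    Const<ALIGN>: AlignZst,
{
    bytes: MaybeUninit<[u8; SIZE]>,
    _align: <Const<ALIGN> as AlignZst>::Aligner,
}

impl<const SIZE: usize, const ALIGN: usize> Buffer<SIZE, ALIGN>
where
    Const<ALIGN>: AlignZst,
{
    pub fn new() -> Self {
        Buffer {
            bytes: MaybeUninit::uninit(),
            _align: <Const<ALIGN> as AlignZst>::ALIGNER,
        }
    }
}
```

The caller still has to write the numbers by hand, e.g. Buffer<8, 8> for a u64: spelling this as Buffer<{ size_of::<T>() }, { align_of::<T>() }> for a generic T runs into the same "generic parameters may not be used in const operations" error as before.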


  1. There are some further details, such as #[repr(transparent)] and whatever that means, but they're not related to the point I'm making. ↩︎

I was trying it the way I was because I don't know how to do this generically. If I'm given some unknown type U, is there a way for me to get "U but with all lifetimes converted to static"? E.g. if I have a generic function foo<U>(u: U) -> ???, what do I write in place of the ??? so that if I pass in an Iter<'a, T> I get out an Iter<'static, T>? Separate from how it would be implemented, I'm asking how I write that signature. If foo only operated on Iter instead of any type, then it's easy.

What are you actually trying to do? This isn't a meaningful transformation on its own.

My best guess is that you're trying to take some arbitrary T and shove it through some API which requires T: 'static. You can't do this.

You can ask the caller to give you a 'static version of the type and a way to reinject a lifetime, like yoke does.

You could write the opposite of the Yokeable trait, so you have some

trait MakeStatic {
    type Output: 'static;
}

which you then require people to implement (or derive).

But there's no way to write a type-level transform that takes an arbitrary type and makes a 'static version of it. That's just not a well-formed ask.
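
A sketch of how such an opt-in trait could be used (the impls and the static_type_id helper are illustrative assumptions, not part of yoke): each type that wants to participate names its own 'static counterpart, and generic code refers to it as U::Output.

```rust
use std::any::TypeId;
use std::slice;

/// Opt-in mapping from a type to its 'static counterpart.
pub trait MakeStatic {
    type Output: 'static;
}

// Each implementor spells out its own 'static version by hand.
impl<'a, T: 'static> MakeStatic for slice::Iter<'a, T> {
    type Output = slice::Iter<'static, T>;
}

impl<'a> MakeStatic for &'a str {
    type Output = &'static str;
}

/// Generic code can now name "U with 'static lifetimes" as U::Output,
/// for example to obtain a TypeId, which requires a 'static type.
pub fn static_type_id<U: MakeStatic>(_value: &U) -> TypeId {
    TypeId::of::<U::Output>()
}
```

This only names the type. Actually converting a value from U to U::Output would still need unsafe code and a separate argument for why the borrowed data outlives it.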

No. I suspect you're just working towards something unsound.

To that point, the provider API soundly utilizes 'static TypeId tags to request both references and values from the same interface, but it necessarily uses separate entry points for request_ref and request_value. It's possible to extend the provider API to describe and process other lifetime-carrying structs, but each one needs to define their own 'static “tag” type and request_view entry point that uses that tag type. The fundamental reason is that &'static str could either be a request_ref::<str> or request_value::<&'static str>, and there's fundamentally no way to differentiate the two without the user specifying which they want. (This manifests as overlap between impl<T: 'static> SelectTag for T { type Tag = tag::Value<T>; } and impl<'a> SelectTag for &'a str { type Tag = tag::Ref<str>; } for 'a = 'static.)

It's very interesting to chase extending the provider tag system into a reflection system, but it's searching for a needle in a haystack of mines trying to find a useful formulation.

Implement an Rc supporting a Rc::map that can take F: Fn(&T) -> U (notice that it's U not &U). In other words I want:

fn get_iterator(v: Rc<Vec<i32>>) -> RcMap<Iter<'static, i32>>
{
    v.map(|v| { v.iter() })
}

AFAICT it is sound as long as the RcMap and the Rc share the same reference count. The RcMap staying alive will keep the Vec alive, so the iterator should stay valid. But so far I haven't found a way that compiles to write map while staying generic. The type map wants should be something like F: impl Fn(&T) -> (U + '_) or F: impl for<'a> Fn(&T) -> (U + 'a). Erasing the lifetimes was me trying to implement RcMap in a way where it didn't get infected by the non-static lifetime on the Iter that v.iter() is going to return.

It is unsound if you ever expose Iter<'static, i32> to the user. Here is an example use-after-free, assuming there is an impl Deref for RcMap<T>:

let rc = Rc::new(vec![1, 2, 3]);
let rcmap = get_iterator(rc);
let iter: Iter<'static, i32> = (*rcmap).clone();
drop(rcmap);

for n in iter {
    // n points to deallocated memory
}

This is, as far as I'm aware, not possible without a trait implementation on Iter. With such a trait implementation and not using Deref, it's possible, though significantly unwieldy.

(map is attach_to_cart and/or map_project, but note these still have some extremely subtle (but resolvable) soundness issues.)

fn get_iterator(v: Rc<Vec<i32>>) -> Yoke<Iter<'static, i32>, Rc<Vec<i32>>>
{
    Yoke::attach_to_cart(v, |v| v.iter())
}

struct Iter<'a, T>(slice::Iter<'a, T>);
// forwarding impls

// I don't think this can be derived, as that requires fields to be Yokeable
unsafe impl<'a, T> Yokeable<'a> for Iter<'static, T> {
    type Output = Iter<'a, T>;
    fn transform(&'a self) -> &'a Iter<'a, T> { self }
    fn transform_owned(self) -> Iter<'a, T> { self }
    unsafe fn make(this: Iter<'a, T>) -> Self {
        mem::transmute(this)
    }
}

You also have the limitation that the output iterator can't be mutated (iterated) in the yoke and has to be pulled out, so the utility of this is extremely limited, since calling .iter() is essentially zero cost for most iterable types.

let yoked = get_iterator(v);

// iterate
let it: &Iter<i32> = yoked.get();
for n in it.clone() {}
// ... or just
for n in v.iter() {}

// advance one
let mut n = None;
let yoked = yoked.map_project(|mut it, _| {
    n = it.next();
    it
});

A transform_mut which allows advancing the iterator is technically possible but even more difficult to use in order to ensure it stays sound. A get_mut interface is fundamentally unsound (see the linked docs as to why).

The 'static in the generic argument is just to prevent the lifetime from infecting the RcMap. You would still want accessing the object through RcMap to go through something like fn get<'a>(&'a self) -> Iter<'a, T>. Which I guess means I need both directions in terms of swapping static and a real lifetime.

This interface requires cloning the Iter out of the self-referential container. That's why yoke gives essentially fn get<'a>(&'a self) -> &'a Y::Rehydrated<'a>; you can still clone the Yoked data, but you don't have to if you don't want to. yoke is designed first and foremost to support zero-copy deserialization patterns, e.g. yoking Cow-like views which may borrow from the cart or have an owned expensive-to-Clone version, e.g. JSON string deserialization can borrow from the source in the common case the source string doesn't contain escapes, but if the string has to be unescaped, an owned string has to be created; thus, Cow<'de, str>, and not wanting to clone it.

In the purpose-built use case, it's not just Cow; it's much larger optimistically-zero-copy deserialized views of encoded Unicode data tables for icu4x.

The point here, though, is that if you're cloning the view out to use it anyway, vec.iter() is a terrible example use case, because just calling vec.iter() is the same amount of work as iter.clone(). This holds for basically all by-ref iterators; no reasonable by-ref iterator involves any work to clone beyond data shuffling; essentially all by-ref iterators are just pointers into the owning structure, and creating those pointers is just more data shuffling.

The amount of data shuffling added by passing Yoke<Iter, Arc<Data>> and using yoke.get().clone() instead of just passing Arc<Data> and using data.iter() almost certainly outweighs any potential minuscule savings of using iter.clone() instead of data.iter().

If you can accomplish what you're trying to do with yoke, it's (almost certainly[1]) sound, and if you can't, it's (probably[2]) unsound.


  1. Usage is sound, except if it falls into the small soundness holes in yoke's API. All known holes are either a deliberate soundness attack (cart covariance) or fixable without primary API impact (by-value access to the cart, e.g. via replace_cart, is obviously unsafe but also unsound for unique carts like &mut or less obviously Box; the wrap_cart_with_* methods do this but could be rewritten to not cause UB for these), so while fixing these soundness holes is important and something we're working on doing (and in some cases, extending the language to make doing practical), it's unlikely to impact whether your usage is theoretically sound. ↩︎

  2. yoke has one big compromise: it only allows shared access to the cart payload. Obviously this is what you want for Arc carts, but this isn't a strict requirement of self-referencing if the cart can give unique access. However, it makes more things possible (e.g. shared access to the cart), and, importantly, it mitigates the impact of the noalias soundness hole without a language way to make the cart dormant. (I believe that while a unique cart causes Rust-level UB, specifically because yoke only allows shared access, LLVM UB also requires a shared-mutable cart payload, because the noalias semantics mean “no aliased mutation,” not “exclusive.” The unsoundness is still important to fix, but this one is probably benign for the time being.) ↩︎