Converting between (unsized) transparent types safely in place

I have a type OrderedVecSet which is a transparent wrapper around Vec<T>:

#[repr(transparent)]
struct OrderedVecSet<T>(Vec<T>)
where
    T: Ord;

It obviously has the same layout as Vec<T>, but with the library-level guarantee that the elements are in ascending order and without duplicates.

Since I want to use Rc<OrderedVecSet> a lot, I would like an unsized version to reduce the number of indirections:

#[repr(transparent)]
struct OrderedVecSetUnsized<T>([T])
where
    T: Ord;

I'm struggling to work out how to convert between OrderedVecSet<T> and Rc<OrderedVecSetUnsized<T>> using only safe Rust. My current implementation is below, but uses std::mem::transmute, which I would like to avoid if possible. Also, as I understand it, the version below reuses the original allocation, and I would like to keep it that way.

use std::mem;
use std::rc::Rc;

impl<T> OrderedVecSetUnsized<T>
where
    T: Ord,
{
    fn from_slice_unchecked(slice: Rc<[T]>) -> Rc<Self> {
        // SAFETY: OrderedVecSetUnsized is a transparent wrapper around a slice.
        // (Actually, this is not safe.)
        unsafe { mem::transmute(slice) }
    }
}

impl<T> From<OrderedVecSet<T>> for Rc<OrderedVecSetUnsized<T>>
where
    T: Ord,
{
    fn from(set: OrderedVecSet<T>) -> Self {
        let rc: Rc<[T]> = set.0.into();
        OrderedVecSetUnsized::from_slice_unchecked(rc)
    }
}

Can it be done in safe Rust? (Stable Rust is preferred, but if necessary, solutions requiring unstable are also fine.)

1 Like

It does not. Vec<T>Rc<[T]> is a conversion that can't easily be done in-place because the reference counts are stored in the same allocation with the data, so From<Vec<T>> for Rc<[T]> makes a new allocation and copies the slice into it. source

2 Likes

This is UB. repr(transparent) doesn't do much on generic structs. For example, transmuting Vec<i32> into Vec<u32> is UB as they are different types and neither have defined memory layout.

2 Likes

Ah, you're right. That's a shame, but I guess it's unavoidable. I might have to rethink the construction of Rc<OrderedVecSetUnsized<T>> then, since I was just going to construct OrderedVecSet<T>s and convert them. Maybe the unsized values will be more trouble than they are worth.

And this is exactly the sort of reason I wanted to avoid unsafe Rust in the first place. Admittedly, this was just a placeholder since I am still trying to find a safe substitute before resorting to unsafe code, so I hadn't yet read the Rustonomicon section linked in the docs yet (which I see contains your exact example). Now this brings up the question of how to do this with unsafe code, because evidently I don't know how to do that either.

1 Like

I think part of this can actually work.

You can't prevent the copy. As @trentj says, Rc stores the strong and weak counts in-line within the allocation, so Rc<[T]> and [T] have different layouts. But you can convert Rc<[T]> to Rc<OrderedVecSetUnsized<T>> without copying. Or at least the documentation for Rc::from_raw seems to indicate that.

It says

The raw pointer must have been previously returned by a call to Rc<U>::into_raw where U must have the same size and alignment as T . This is trivially true if U is T . Note that if U is not T but has the same size and alignment, this is basically like transmuting references of different types. See mem::transmute for more information on what restrictions apply in this case.

So instead of transmuting between Rc<[T]> and Rc<OrderedVecSetUnsized<T>>, which is UB(? see footnote), you can use Rc::into_raw and Rc::from_raw to effectively transmute between &[T] and &OrderedVecSetUnsized<T>. And because OrderedVecSetUnsized is simply a repr(transparent), new-type wrapper around [T], it is guaranteed to have the same layout.

You could change your implementation to

impl<T> From<OrderedVecSet<T>> for Rc<OrderedVecSetUnsized<T>>
where
    T: Ord,
{
    fn from(set: OrderedVecSet<T>) -> Self {
        let rc: Rc<[T]> = set.0.into();
        unsafe { Rc::from_raw(Rc::into_raw(rc)) }
    }
}

I can't see anything that would make this unsound, but maybe I'm missing something?

Footnote

As a separate question, is it actually UB to transmute between Rc<T> and Rc<U>, where U is a repr(transparent) wrapper around T? Rc only contains two fields:

  • NonNull<RcBox<T>>, which points to a repr(C) struct
  • PhantomData<...>, which is zero-sized

There is only one non-zero sized field, so the actual memory layout is fixed. I suppose Rc reserves the right to add new fields, which justifies the UB label. But practically, the whole point of Rc is to act as a single pointer to the RcBox, so I can't see why anyone would ever actually do that. Unfortunately, UB is still UB, so you shouldn't do it. But it seems like you could probably get away with it if you were a real maverick. I guess they provide into_raw and from_raw to avoid this exact problem.

Edit: I just noticed the comment on RcBox:

// This is repr(C) to future-proof against possible field-reordering, which
// would interfere with otherwise safe [into|from]_raw() of transmutable
// inner types.
5 Likes

Yes, and not just for backcompat reasons - as recently mentioned in #[repr(transparent)] – Why? wrapping something in a structure can change how it's treated in ABI. For Rc<T> to be soundly transmuted to Rc<U>, Rc<_> itself would need to have an appropriate repr, which it does not.

from_raw and into_raw are indeed the exact mechanism by which you should do this, and they work reliably in this case because RcBox<_> is repr(C).

1 Like

@trentj, follow up question then.

For the following, assume U is a repr(transparent) wrapper around T.

Even if transmuting from Rc<T> to Rc<U> is officially UB, is there any possible case you can think of where the result would not be what you would expect? Every transitive field of Rc<U> is either repr(transparent), repr(C), or zero-sized. And the top-level repr(rust) struct contains only a single pointer. In this specific case, I don't see any way Rc<T> and Rc<U> could possibly be treated differently by the compiler, either at the layout level or the ABI level. Am I missing anything?

1 Like

This is bad logic. In previous versions of rustc, struct Foo(u8, u16, u8); was always identical in layout to that same thing with repr(C). But then someone implemented field reordering, and it wasn't any more.

It doesn't matter how it's currently implemented. If you want to rely on something, make a PR to the docs for libs-api to FCP to add that guarantee.

(We've also been adding more intentional breakage related to this, too -- you might have seen that transmute will now abort your program in certain situations that used to "work", for example.)

It sounds like what you really want is what GitHub - rust-lang/project-safe-transmute: Project group working on the "safe transmute" feature is working towards.

4 Likes

@scottmcm, my question isn't really about what is safe to do. I acknowledge it's definitely UB and has to be in the general case, for all the reasons listed. I also don't really want anything. The correct way to do this transformation is with into_raw and from_raw. That's clear.

My question is really to understand if I'm fundamentally missing a piece of my mental model. In this particular case and with this particular version of std, it seems to me that the compiler really has no reasonable alternative other than binary compatibility between Rc<T> and Rc<U>. The only way it could give anything different would be deliberate breakage, which I didn't realize would actually happen in some cases. Is that correct? Or am I missing some other aspect of the problem?

1 Like

In other words, "Even if I tell the compiler that it is free to ignore everything I've written because I have deliberately lied to it, is there any case where it might actually do so?"

UB means that you have violated your contract with the compiler. Anything at all can happen once you have released the compiler from all obligations to follow your directions.

1 Like

@TomP, as I have already said, I understand that it is UB. I understand the compiler makes absolutely no guarantees about the resulting code. I understand no one should do it. That's not what I'm asking.

My question is whether, given the assumptions stated above, is there any reasonable transformation the compiler could do that would lead to binary incompatibility. Here, I've defined reasonable as anything other than deliberate breakage. Again, I understand that the compiler is free to deliberately break, but that's not my question. My question is about understanding what transformations the compiler actually does.

Sometimes it seems like the phrase "UB" has a similar effect on conversations. It can send them in absolutely any direction.

2 Likes

When your program compiles but has UB, it is well-formed syntactically but not semantically. You seem to be asking whether LLVM behaves in a semi-predictable manner when presented with a semantically ill-formed program. Which release of LLVM and rustc? When presented with which exact LLVM-IR stream?

To me this question is akin to asking whether it is possible to predict the trajectory of a ball on an honest roulette wheel when one has exact measurements of the forces involved. How exact? What about Brownian motion? What about dust irregularities on the surface? Maybe van der Wahl's forces?

1 Like

If the question is "is there, now, any combination of flags that will require the current compiler with the current std to actually transform that to a ud2 instruction, for example? None that I'm aware of.

But will it happen in the future? Perhaps.

2 Likes

Apparently my question is too ill-formed to be answered, so I'll just stop here. Sorry to @Ontonator for derailing the thread.

1 Like

I couldn't think of any "reasonable" case today (but am unwilling to assert that means anything).

I understand how that can be frustrating when you're just trying to suss out current behavior, or for conversations on the "what should the spec be" or "what can the compiler implementation itself assume" level.

Here I blab on awhile about why I think things are that way.

The thing is, people come around not-infrequently trying to nail down exactly what the compiler actually does with the goal of exploiting that in their programs and calling it a valid program. And asking how they can disable some optimizations to "turn off" things like rust's exclusive borrow guarantees, not understanding that it's a foundational part of the language. They're often coming from C/C++, where this is just what you do, because actually avoiding all UB is so impractical.

As people embrace Rust, they tend to absorb that UB is not only respected (in the sense that you never assume you can "get away with it"), but also that this separation is valued. Code that happens to get away with it, even when based on technical knowledge, is to be avoided. You're setting up for failure later and it's counter to Rust's reliability goals.

The net result is a culture where saying "but you could reliably avoid UB in this case" is going to get you a lot of "no, and just don't", even when you've prefixed it with "I know but...". At least on a user forum like this.

I just used the term UB a lot where I should have used "unsound".

Additionally, if the compiler is not guaranteed to avoid doing something, it is allowed to do something. So answering this question of whether the compiler transforms this particular case requires proving a negative. And here, "the compiler" means not only rustc, but also llvm, and you would need to be an expert in both. Speaking personally, I'm still pretty ignorant of what PGO does or doesn't do in practice, for example. Speaking more generally, I can imagine knowing rustc quite well and still being unsure of what transformation llvm applies.

This makes me, at least, quite hesitant to make assertions about what non-guarantees you can count on generally.

Anyway, back to the "is this a reasonable assumption [given a million caveats]" topic. The relevant UCG topic would seem to be this one on the layout of single-(non-1-aligned-ZST)-field structs. The outcome -- not yet guaranteed via RFC! -- is that you can assume the same layout. The issues linked in the following "unresolved questions" section are also interesting (IMO).

5 Likes

Perfectly reasonable. That's what I expected the answer to be.

Hmm, I didn't realize statements in the UCG are not "official". Where would such an RFC be accepted then? And more broadly, if you can't trust the UCG as an authoritative source on what is/isn't UB, then what can you trust?

It will take me some time to read through the unresolved issues. Thanks for the links.

Also, one follow-up question. Is PhantomData<T> considered a 1-ZST regardless of T? I think it must, otherwise Rc<T> would have the alignment requirements of RcBox<T>, which it definitely doesn't.

Edit: I think I can answer that last question myself, by scrolling up slightly in the UCG.

In particular, a struct with no fields is a ZST, and if it has no repr attribute it is moreover a 1-ZST as it also has no alignment requirements.

PhantomData<T> has no repr attributes, so it is a 1-ZST.

1 Like

The RFC repo, and see also the RFC book. (From the book you have to check the tracking issues at a minimum to see the actual state of things.)

There's no spec yet, but generally: official documentation and RFCs. The idea behind UCG was to work towards a spec or at least more RFCs, but it blew up and stalled out somewhat, sadly.

Relevant quote.

Anyway: the standard library docs say "check the nomicon"
then the nomicon says "here is some advice and ultimately we don't know, maybe check UCG"
then UCG says "ultimately we don't know it's probably like this but there's no RFC yet"
then Ralf says "probably it should be allowed if the layout matches".

(This is another reason the answer is always "it's guaranteed" or "don't assume anything", and not "definitely defined", "definitely UB", or "technically unsound but ok as things are implemented for now". There's no solid line between the latter two yet.)

Yes.

3 Likes

Yeah, this looks safe to me (although I don't have a great track record). I did have to add a cast though, so now I have:

impl<T> From<OrderedVecSet<T>> for Rc<OrderedVecSetUnsized<T>>
where
    T: Ord,
{
    fn from(set: OrderedVecSet<T>) -> Self {
        let rc: Rc<[T]> = set.0.into();
        unsafe { Rc::from_raw(Rc::into_raw(rc) as _) }
    }
}

And while this isn't safe, at least it is sound now. (And there probably isn't a safe alternative unless/until Project "safe transmute" produces results.)

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.