Casting `Rc<T>` to `Arc<T>`

Is it safe to transmute an Rc<T> into an Arc<T>? I'm aware that this can be potentially unsound if you use the Rc<T> or its clones after this, I'm just wondering if they are layout compatible. (In my application, I have a big single threaded computation to produce an Rc graph, and then I want to "freeze" it and send it to another thread, without additional copying.)

In general, layouts of any Rust struct cannot be guaranteed, unless it is #[repr(C)]. Arc isn't, hence this is unsound.

11 Likes

This is unhelpful (and partially false). I'm not asking if they are guaranteed to be layout compatible, I'm asking if they are, currently and for the last 57 versions of rust. I'm quite aware that there are no promises. It's only UB if the layouts don't match. Unstable and unsound are very different things.

Directly transmutting would be unsound for the reasons you state. But I would like to note that both Rc and Arc take the exact same approach of storing a NonNull pointer to an inner type that is repr(C).

I would have to think about what pre-conditions need to be satisfied, but I believe there would be some way to convert between them soundly. You would have to use Rc::into_raw and Arc::from_raw.

@digama0, since you say the graph is frozen, I assume you never need to modify it again. Do you need to be able to read it from multiple threads? Or do you just need to transfer it to another thread?

2 Likes

You can't really know this, because the Rust compiler can mess around with a struct's layout in completely arbitrary ways.

10 Likes

This is not correct. The language makes no guarantees that your code won't miscompile if you transmute between repr(Rust) structs. Just because it gives you what you want practically doesn't mean it's not UB.

15 Likes

@bradleyharden I need to read it from multiple threads. Previously, I was actually sending the Rc itself around, with a wrapper to ensure that Rc::clone is never called to avoid the data race, but now I have a place where I actually do need to clone it from the reader thread, and so it should really be an Arc::clone.

@RedDocMD I'm not sure what you mean: the current and last 57 versions of the compiler are all visible to the public on github, so you certainly can know this. This is application code, not library code, so I'm okay with compiler sensitive details, at least when it seems to be relatively stable.

@bradleyharden Yes, that's right, the language doesn't guarantee that version 58 won't break my code. But that doesn't make it UB on old versions. UB only applies to a single execution on a specific compiler and version, and as long as on that version no bad reads or out of bounds accesses are made then it is defined behavior. Since it's unstable, a "non-breaking" change to the compiler can make the code UB, but the original code would not be UB in this scenario.

For the reference, the transmute's document mentions this nomicon page which lists possible UB cases.

When transmuting between different compound types, you have to make sure they are laid out the same way! If layouts differ, the wrong fields are going to get filled with the wrong data, which will make you unhappy and can also be UB (see above).

So how do you know if the layouts are the same? For repr(C) types and repr(transparent) types, layout is precisely defined. But for your run-of-the-mill repr(Rust), it is not. Even different instances of the same generic type can have wildly different layout. Vec and Vec might have their fields in the same order, or they might not.

3 Likes

I think the problem here is that we're using different definitions of the term "undefined behavior". As I understand it, UB relates to the contract between you and the language, nothing more. If you violate that contract, then you are invoking UB. The compiler may give you exactly what you want, and you may be able to verify that it gives you exactly what you want in all previous versions of the compiler, but you are still invoking UB according to that definition.

8 Likes

I'm using "undefined behavior" in the sense used in for example the C/C++ specifications, which roughly tracks the way it is used in Rust as well. It is an error condition in the Rust abstract machine. If the compiler "happens to" lay out two structs the same way, then the R-AM will not throw an error condition if you transmute between them, because memory is not typed in the R-AM - it does not have the necessary runtime information to tell, even if it wanted to.

1 Like

More generally, I think it would be sound to convert Rc to Arc using the into_raw and from_raw methods, so long as you can guarantee you are converting all outstanding strong and weak pointers. If you know that all of them are contained within a single graph structure, and you can convert the entire graph at once, I think it could work.

However, I think there's a caveat that this all depends on the strong and weak counts being interpreted exactly the same by both Rc and Arc. I think that's true, but there are some wrinkles. For example, here's a comment in the Arc source code:

    strong: atomic::AtomicUsize,

    // the value usize::MAX acts as a sentinel for temporarily "locking" the
    // ability to upgrade weak pointers or downgrade strong ones; this is used
    // to avoid races in `make_mut` and `get_mut`.
    weak: atomic::AtomicUsize,

It sounds like usize::MAX is only a transient value for weak, so it wouldn't be relevant if you convert your whole graph structure at once. But the conversion is on somewhat tenuous grounds.

2 Likes

Undefined behavior is not about what actually happens, it's about what could happen. I don't think you could ever meaningfully speak about what actually happens, because there are effectively infinitely many scenarios for the compiler to consider. The only way to speak meaningfully about the compiler output is to discuss the contract between the programmer and the language. And by transmuting between repr(Rust) structs, you are violating that contract and violating assumptions of the Rust abstract machine.

Edit to add one more comment:

The compiler will make assumptions based on the "language spec" (fuzzy for Rust). If you violate those assumptions in any way, then there's no way to tell whether the compiler will miscompile your code. I suppose you could, in principle, inspect the compiler source and consider the effect of changing that assumption on all possible optimizations and their interactions. That might give you a "guarantee" for a particular version of the compiler. But that's a tall order and practically not possible.

9 Likes

There probably aren't any compiler versions that would lay them out in different ways considering that the internal allocated type is #[repr(C)] in both cases, and the Arc/Rc structs themselves are just a single pointer. However, it has not always been this way — they were not marked #[repr(C)] until 2020 in this commit, and before that, I wouldn't be surprised to see them laid out differently.

I would feel better about a piece of code that puts the Rc graph into a struct that unsafely implements Send, and somehow guarantees that all clones of the Rcs are moved across threads "together", and that the user can't get a clone of the Rcs.

I would feel even better about a piece of code that uses Arc to generate it in the first place.

11 Likes

Technically it's true, but the compiler devs will be very unhappy with it as if enough people relies on undocumented, accidentally happened, internal implementation detail of the compiler it would be practically a breaking change to change such behavior. That's why they're implementing struct layout randomization.

5 Likes

I think this is really the best option. It seems unlikely that the performance penalty of Arc would be a problem during creation of the graph. But maybe it's a huge graph?

In this context, I’m curious as to how much performance benefit you’re getting in practice by using Rc instead of Arc while generating the graph. Certainly you wouldn’t be trying this unsafe operation at all if there wasn’t some performance benefit, right?

3 Likes

While I agree with you, the scope of the could there is important. It could mean "for all possible inputs to the compiled program" (which is what I mean) or it could mean "for all future versions of the compiler / all spec-compliant compilers" (which is what you and RedDoc mean). Unfortunately there is no spec so the latter interpretation is already on shaky ground.

@alice That's actually exactly what I've been doing so far. It's a bit tricky to read a weak pointer without Rc::upgrade but it can be done using only stable (but unsafe) functions in the API.

@bradleyharden @steffahn The issue is not that it's a huge graph, but that the code that generates the graph is very hot. For context, it's an interpreter for a garbage collected language. The graph that results is simply the output of the interpreted program, which is then imported read-only into other modules. Steffahn is right that I should benchmark whether Arc itself is a big cost, but the running of the interpreted code is by far the most expensive part of the program at the moment.

Even if generating the graph turns out to be slow, since the graph is immutable after construction, I strongly suspect there are even more performant ways than the Rc transmute. E.g. maybe storing all of the values in a big vector and using indexes would be even faster due to its reduction in the number of allocations.

8 Likes

FYI, I wasn’t even trying to criticize this optimization, but I’m just genuinely curious as to what kind of speedup Rc can give over Arc in realistic settings.

I have been considering that option as well. Doing this during the construction phase might be iffy, since the user code might do a lot of construction and destruction of short lived objects, and if I just use a bump allocator then I'm liable to produce huge outputs with a lot of dead space. Having the "freeze" operation reallocate everything in a dense vector with entirely separate types for everything also has the advantage of making serialization possible (I tried low effort serialization with serde/bincode and it was a waste of a couple days of work; it turned out that deserializing was slower than running the user code in the first place, because serde doesn't support non-tree Rc graphs, it duplicates everything, which on top of being potentially exponentially worse on non-contrived inputs is also semantically wrong since I use Rc<RefCell<T>> for mutable pointers).

1 Like