Casting `Rc<T>` to `Arc<T>`

@alice, interestingly, I think there is a case to be made that transmuting Rc to Arc is at least straddling the border and has one foot in "sound" territory. The Unsafe Code Guidelines indicate that both Rc and Arc are single-field structs and thus have equivalent layouts. I guess the UCG is not "official", though, so this isn't yet guaranteed by the language. But as you say, there's really nothing the compiler could reasonably do to make them incompatible without intentionally trying to break your code.

1 Like

It occurs to me that another way to avoid the transmute would be if Rc offered a function equivalent to Arc::clone. That's an actual standard library change, though, and it might just be too confusing of a function for newcomers (it wouldn't have to be an unsafe function, but you have to be writing weird code like me to need it). (EDIT: Actually, this wouldn't really work, since if it returned Rc then you are hosed when it calls Rc::drop.)

They are definitely not guaranteed to be laid out in the same way, see e.g. this PR of mine that clarifies the actual current guarantees, but the compiler will lay them out in the same way today, so it does not, strictly-speaking, invoke UB today.

XY solution-ish:

9 Likes

Ooh, that is a solution to my cloning problem: I still have to pass around a hybrid_rc::Rc<T> behind a Sync wrapper, but I can now clone a hybrid_rc::Arc out of the wrapper without breaking the original Rc using to_shared.

3 Likes

What is unstable today can easily become unsound tomorrow.

3 Likes

That's true. What of it? Not all code cares about stability wrt future versions.

My requirements are for code that works, for now and for as long as I need to continue using it, with good performance, and I'm using Rust because it's easier to write than C, but my concern for soundness is only an instrumental goal. The program I'm writing doesn't need to be correct, because it generates an object that is separately checked for correctness. As long as it makes it to the finish line on the specific inputs given to it, it has achieved its purpose. (The architecture works like this because Rust's safety guarantees are not good enough for my application, which is formal verification. So Rust is completely excluded from the trusted code base.)

EDIT: Sorry, I shouldn't get too aggressive about this, but I often see a tendency in Rust circles to demonize quick-and-dirty code of the sort that is common in C, and I think we should recognize that priorities can vary significantly depending on the application domain and there are legitimate reasons why soundness might not be at the top of the list even if a project is using Rust.

6 Likes

I get it, but I think you're going to keep getting friction, for at least two distinct reasons:

  • One is cultural -- preferring "definitely sound" to "happens to work today" is pretty entwined with the general safety guarantees of Rust, which are highly valued. Seems likely that you'll need to get past this part of the conversation every time. (This also leads to people not wanting "quick-and-dirty like C" to be common in Rust, as that would weaken it's "safer and more reliable" associations.)

  • Conflation of terminology: you led with "Is it safe", and in a Rust context, this is almost always going to be taken as "is it sound". Similarly, UB is usually not considered a what-it-compiled-to term, like was already hashed-over above.

The latter could be mitigated by being more explicit or using different terminology. But you'll also probably never get a very satisfying answer, due to the former cultural considerations -- people are a lot less interested in "getting away with" unsoundness compared to C, and thus don't know or necessarily even care when you can do it, because "you shouldn't".

22 Likes

As long as it makes it to the finish line on the specific inputs given to it, it has achieved its purpose.

And if you run into UB and your program puts out garbage, or it randomly crashes, does it serve its purpose?
Presumably something like that would require a lot of debugging, testing and bashing your head against the wall. That's probably not worth it, given that the things you want to do unsound things for are trivially done with sound unsafe code, or without using unsafe at all.

Previously, I was actually sending the Rc itself around, with a wrapper to ensure that Rc::clone is never called to avoid the data race

That's not actually sufficient on its own, and if you do this you're just going to get data races on the refcount and mysterious and rare crashes.

1 Like

Certainly not. Of course, like any good rustacean I have added small essays to all my unsafe blocks, so if something does go wrong I know where to look. To be clear, I'm not "anti-safe-Rust" or anything like that. I know full well the advantages of having code that is 95% safe code, and one of them is a lot less head-bashing.

The whole reason I started down this path is because it can't trivially be done without unsafe code. (In fact it's worse than that, it's unstable implementation details in this case.) It can maybe be done with an overhaul, but given that I already tried one overhaul and got nothing but a waste of time and a negative performance gain for my trouble, I'm not in a hurry to try that again. The next best thing would be to outsource to a crate to do the dirty work, assuming it can be used as a drop in replacement, and hybrid_rc looks like it can do that so I'm thnankful to Yandros for the pointer.

Care to elaborate? How are you going to get a data race on the refcount if no one is writing to it?

Rc::drop() also reads and writes the refcount to decide whether or not to free the underlying memory.

2 Likes

In this case, the other threads get a &MySyncWrapper(GraphWithRc), built from a &GraphWithRc, so none of the Rc's will be dropped by the other threads as they don't own the data. The sync wrapper's lifetime is linked to an actual Arc which ensures the other threads have dropped read access first. If they could Rc::clone then they could drop those clones, but that's exactly what the sync wrapper is preventing.

This sounds much more sensible than what you originally wrote, which implied that you were sending owned Rcs to the other thread, and not just references to them:

From this quote, I assumed that you cloned the Rc on the original thread, wrapped them up, and then sent the entire wrapper to the other thread.

2 Likes

Ah, actually that part is sort of true as well. The whole GraphWithRc is moved from one thread to another as part of a larger data structure, so the thread that does the dropping is not necessarily the same as the one that does the construction. (More specifically, this is part of an async function which is running on a multithreaded runtime, so although it appears to be the same thread doing the construction and the destruction it might be paused and sent around in between await points.) However, the reader threads have access only while the graph is not in motion, in the same way that a Box<T> cannot be moved while a &T is around.

I suspect most of the friction here is not because you're using unsafe, but because you're using it in a dodgy way for no reason.

It seems like you are trying hard to avoid the negligible (or possibly even non-existent) costs of some atomic accesses, but you're a fine with designing a data structure full of pointers and indirection that is scattered across memory, which will destroy your performance? It sounds like you're optimizing the wrong thing. I suspect (as @alice pointed out also) that you can get much better speedups by using a vector and indices.

Unfortunately that's not always true, you can cause UB and have its effect manifest far away from the source. A typical example would be an out of bounds write overwriting an unrelated piece of data.

You could use an Arc.From what I'm reading you construct the graph once and then send it to another thread to do...things with it that don't actually change anything about it? Only creating and destroying - not reading - the graph incurs the added cost of atomic accesses.

Have you actually benchmarked this?

3 Likes

The "graph construction" phase is actually the majority of the program. This cross-thread reading thing is a comparatively small part (it is what happens when one file imports another), while everything else happens inside a single file on a single thread. Introducing atomic refcounting everywhere for just a few, rare, read only cross thread accesses seems like a high price to pay. Ideally, I would not be using Rc/Arc either, but rather a proper garbage collector, but there isn't a really good garbage collector library for Rust (ironically).

Like I said, maybe "a vector and indices" can be done but it's an overhaul and it's not obviously a win because who is doing the garbage collection in this scenario? If I do something like for _ in 0..10000000 { let x = (1, 2); } in the interpreter (where we assume for the sake of discussion that (1, 2) allocates) then does that mean my vector is now full of dead allocations and I run out of memory?

I've been calling it an Rc graph because that's what it looks like from the rust side, but it's running arbitrary code, and "constructing a graph" is only a side effect, the main point is to execute the code, and so executing the code is the thing that is optimized most heavily. Lots of small allocations is a part and parcel of running code in these kinds of languages. Show me an interpreter for lisp, scheme, javascript or python that doesn't create lots of small allocations for the code I just presented.

Well no, in this instance if there is breakage it will be due to code in an unsafe block, namely the transmute that we've been talking about this whole time. The error might manifest somewhere else but if there is data corruption or some other weird bug it will be the unsafe blocks that get the scrutiny. That's rust working as intended.

@mejrs @steffahn I benchmarked the performance of just replacing Rc with Arc on an example that normally takes about 1.7 seconds. I only ran it a handful of times manually to get a ballpark, but I see about a 6% regression (1.682 s -> 1.785 s) with a standard deviation of 2% in the baseline.

8 Likes

Like I said, maybe "a vector and indices" can be done but it's an overhaul and it's not obviously a win because who is doing the garbage collection in this scenario? If I do something like for _ in 0..10000000 { let x = (1, 2); } in the interpreter (where we assume for the sake of discussion that (1, 2) allocates) then does that mean my vector is now full of dead allocations and I run out of memory?

The memory will be dropped when the backing vector itself is dropped.

They probably don't do that out of the box, but if you use a graph library you'll likely find out that all the nodes happen to be adjacent in memory. It's an easy and good optimization, so there's usually no reason not to do it.

Well no, in this instance if there is breakage it will be due to code in an unsafe block, namely the transmute that we've been talking about this whole time.

The problem with unsafe is that it pollutes the entire abstraction. "safe" code might violate an invariant that unsafe code relies on and cause otherwise sound unsafe code to do bad things. I'm guessing you have a lot of unsafe in your program, and if you are not careful w/r to abstraction barriers that can leak into your entire program.

For example, Vec::set_len contains no unsafe code, but if it is misused it can cause undefined behavior. If someone figured out Vec was unsound, fixing it would require auditing the entire module, not just the unsafe blocks.

So... I do run out of memory then. The backing vector is the entire memory of the interpreted program here, so it won't be dropped until the program terminates, which means in a hot loop example like that one I will allocate repeatedly and never free. "Never deallocate" is a valid garbage collection strategy, but it is limited for obvious reasons. Plus even if I don't run out of memory I can still easily get not particularly local data, i.e. if the hosted program builds a linked list with a bunch of allocation between each link then these correspond to widely separated indices in the vector, so not very local. Also if the T in my Vec<T> is an enum of all different kinds of object then I can end up using a ton of space for small data. Real allocators avoid this issue by reserving different sizes for different things.

You impute too much. All of my unsafe is encapsulated in modules with a clear safety barrier and copious documentation. I did my homework, thank you very much. I do this because it is good for maintenance and makes bugs easier to spot. Most of my bugs are not safety violations, and in fact I don't think I have ever gotten a memory safety error while working on the project (which is up to about 55 000 lines at the moment). Rust does its job very well.

2 Likes

I benchmarked the performance of just replacing Rc with Arc on an example that normally takes about 1.7 seconds. I only ran it a handful of times manually to get a ballpark, but I see about a 6% regression (1.682 s -> 1.785 s) with a standard deviation of 2% in the baseline.

That is definitely where I'd start looking into unsafe to speed things up :slight_smile: But I would not transmute between Rc and Arc. I'd be fine with the read-only reference across threads.

All of my unsafe is encapsulated in modules with a clear safety barrier and copious documentation. I did my homework, thank you very much.

I assume this is about your "mm0" project? I've glanced through it and most of the unsafe usage is

  • lifetime transmutes, which seem to be to get around borrowchecker limitations. I haven't looked into them but from my experience those are either unneeded or unsound.
  • transmutes between various repr(Rust) structures, similar to the Rc to Arc transmute you want to do. I'd recommend avoiding them, or putting repr(C/transparent) on them for the cases where they are your own structs.
  • Some unsafe that's useless because there are safe apis to do the same things.
  • Some ffi, which is fair.

Overall I think there is a decent amount of reckless or unneeded unsafe. I guess we'll have to agree to disagree on that though.

At any rate, I'd recommend running miri and cargo test as part of your CI. It's little effort and these are also very nice features of Rust :slight_smile:

1 Like