Is RefCell a zero-cost abstraction

I think you could still get UB. Imagine if you had a shared reference to a struct, read one field and used that field value to interpret a union value also in that struct, but on some intermediate step, you inadvertently mutate the union and tag fields. Now you're reading a union value with the wrong interpretation. Which would require unsafe, but be perfectly valid if you knew your reference was unique, but now it is UB because someone else is observing it.

I'm not really sure, but I think some sound unsafe code could become unsound in such a manner.

Actually, here might be a simpler example:

Create a shared reference to the same object, and then mutate it in such a way to cause it to be reallocated, then use the shared ref. I don't think noalias is relevant here, because there's nothing you can do safely with the shared ref, even if you know someone else is mutating it.

To answer the question that is in the subject of this thread (but not the body): No, RefCell is not zero-cost. It tracks borrows at run-time. Cell is, but only allows replacement of its contents, not references.

Iterator invalidation is the classic example. Here's some further explorations.

10 Likes

That's a quite straightforward "yes"; shared mutability is probably the #1 cause of memory unsafety. The prime
example is iterator invalidation. Consider the following code:

let mut vec = vec![1, 2, 3];
let ptr = &vec[0];
vec.extend(4..1000);
println!("{}", ptr);

If this compiled, ptr would be dangling as extend would cause a reallocation of the vector's buffer.


One more important distinction to be made is that shared mutability would still be UB even with noalias turned off, because the language defines it so. The existence of UB and the set of code considered valid or invalid has nothing to do with what the optimizer happens to allow. It has to do with what the language definition, the memory model, the "abstract machine", etc. allows in theory. UB means "this program violates the constraints of memory safety". It doesn't mean "guaranteed crash", nor does not crashing mean "free from UB".

14 Likes

The RefCell type is not zero-cost because it must prevent the existence of multiple mutable references, and to do that it must count them. If you compare to the Cell type, which is zero-cost, it is interesting to note that obtaining a zero-cost abstraction for this imposes some rather strict requirements on what kind of mutation you can do. Read more here.

4 Likes

I tend to disagree. "UB" is not a god-made thing, and neither are the language definition, memory model, "abstract machine" (which doesn't even exist as such according to claims elsewhere in this forum). Unless "UB" means something else in Rust than it means conventionally it is not useful to put any undesired behavior under the "UB" umbrella. "UB" means that the compiler and runtime is literally free to do anything - crash or not, output "pineapple" for all we know. See Regehr's blog post for an introduction of the common use of the term.

Specifically, I don't understand/tend to not concur with the argument that iterator invalidation leads to undefined behavior in the presence of a universal garbage collector (as was claimed by Niko in one of the blog post linked; I'm not aware of any examples where iterator invalidation led to a "browser compromise" when a universal garbage collector was used.) And Rc<RefCell<>> arguably is a universal garbage collector. Certainly, in languages like Java mutation of a collection can lead to the invalidation of iterators. This may result in exceptions, or it may even result in silent program failure, but it doesn't result in undefined behavior - it's what's commonly (at least in the C world) called implementation-defined behavior.

Before you point out that I'm switching the subject here, I am of course interested in the aggregate cost of Rc<RefCell<T>> primarily since this combination represents the interior mutability pattern that AFAICS most closely emulates access to a struct as you may do it in other languages. Now whether this pattern can really be avoided without introducing extensive (and expensive) indirection is another topic I'm very interested in learning about.

Just a reminder that RefCell<T> can have a cost and yet be a zero-cost abstraction. "Zero-cost abstraction" refers to how much overhead the abstraction itself adds, not how much it actually costs. "Do you pay for this abstraction when you do not use it?" and "If you hand-coded it yourself, could you do better?" are two key questions here.

So "what does RefCell<T> cost" vs "is RefCell<T> a zero-cost abstraction" are two different questions. Could you do runtime borrow checking faster by implementing it differently? That's the primary question to be answered for the latter question.

Shared mutability can cause memory unsafety regardless of restrict, and even in single-threaded contexts. A classic example is iterator invalidation. (EDIT: Oh, I see this was brought up since I last opened this tab)

14 Likes

Undefined Behavior can only be understood in terms of the documents or conventions that establish which behaviors are defined. History has shown that using the compiler binary as this definition is a bad idea, as it hamstrings the development of the language: If all programs emitted by one particular version of one particular compiler are defined to be “correct,” there’s no room left for language developers to make improvements.

Instead, all most programming languages have a kind of contract between the language developers and users that describe what each is allowed to change so that they don’t step on each others’ toes. UB is the design space that the compiler authors have reserved for themselves, whether or not any particular compiler version produces inconsistent code when it’s used, and any non-UB code should work on any compiler version.

By “compiler version” here, I don’t just mean released binaries, but also any supported combination of configurations as well: rustc and rustc without noalias are two different compilers targetting the same language, and therefore have the same set of undefined behaviors. This is what @H2CO3 was referring to with his statement that “The existence of UB and the set of code considered valid or invalid has nothing to do with what the optimizer happens to allow.”

In Rust, the exclusivity of access provided by &mut is deeply baked into the language design, and is often a key point when analyzing unsafe code for soundness. Removing that guarantee would invalidate all of those analyses, potentially leading to soundness holes in all manner of Rust libraries.

12 Likes

Is that a chiasmus? :upside_down_face:


So, as per the current definition of Rust the language, a &mut reference is a unique one, per definition (FWIW, at some instances it was suggested it be called &uniq, precisely because people reasoning about it as a mutable reference rather than as a unique one were not understanding the language correctly). Violating its uniqueness guarantee goes against its very definition in the language, thus representing a "contract violation" w.r.t. the compiler guarantees, i.e., the very definition of UB.

That being said, you can very legitimately ask about the rationale behind this design / ask if the restrictions could be loosened. To think about the pros and cons of replacing &uniq in the language with &cell (and at that point, replace & _ with &cell _ as well, and we'd have C++-like references):

  • pro: it simplifies a bit the mental model, by not having to worry about aliasing.

  • con: by now having to deal with aliased mutability, writing some code in a bug-free no-matter-the-API-misusage manner becomes more complex1 at best, or more difficult, or even straight up impossible. To avoid the memory-unsafe bugs of languages such as C/C++, a global gc becomes highly advisable, and we are back to being a gc / managed language, such as Python, Kotlin, go, etc.

    • Another option is to get rid of mutability, since it plays so poorly with aliasing. This comes at the price of more copying, or usage of more complex data structures to palliate that. Rust becomes a functional language (OCaml, Haskell, etc.)

1 yet another chiasmus

Granted, having to produce &uniq references can be demanding, at times, but in practice not that often (otherwise Rust would not be language one could be productive with!). But let's not overlook the power, the peace of mind, that being given &uniq references, or, by contraposition, being given &shared references that happen to imply immutability of the referee, both offer.

Again, a Vec's ptr/len/cap fields are immutable when referred to by a shared reference, thus making &mut Vec<_> the only way one can append an element to the vec, potentially modifying the *ptr allocation, it immediately follows that yielding &'shared Elem references to the Vec's heap-allocated Elements when given a &'shared Vec<Elem> is something safe to do, and which no API misusage can break whatsoever.

Designing correct and robust code, that is, way less buggy code, becomes way easier in Rust thanks to that. That's why I wouldn't be surprised that the language with which mid-to-long term code is written and where the maintainers of such code are most productive turned out to be Rust (and functional languages; but that's where Rust efficiency (by not having to deal with the optimized immutable collection shenanigans from functional languages) might outshine those).

The power that comes from &uniq (and &shared often meaning &immutable) is one of Rust greatest strengths :slightly_smiling_face:

I'm tempted to inverse-quote uncle Ben, and say:

3 Likes

Could you provide an example or 2 of such analyses that would be invalidated?
Let's restrict ourselves to Rc<RefCell<>> (so that we avoid use-after-free errors), and let's make the assumption that the pointers in question are not aliased.

That is definitely incorrect. You can replace Vec<i32> with Rc<RefCell<Vec<Rc<RefCell<i32>>>>>, i.e. insert the "universal GC" at every possible point, and yet if shared mutability were allowed, it would blow up (have UB) upon reallocation due to a use-after-free.

Update: this blows up, as expected.

3 Likes

This would be the case only if the iterator were represented as an address into the reallocated buffer. That's incompatible with "universal gc" - although admittedly we haven't really defined what universal gc is. Iterators under this model are stored as indices, and the index is valid even after reallocation. I understood "universal gc" as not allowing you to take the address of things without pinning down the referenced object, or not taking the address at all.

A Vec::[Into]Iter is represented as such.

The point is, in Rust, it is not expected that you insert Rc<RefCell> at arbitrary places, even within the guts of a data structure. As such, even if you insist on doing this, you won't end up with a GC language, and you can't arbitrarily redefine or ignore what the designers of the language say about what is or isn't defined behavior.

1 Like

Let's take a step back here.

I agree (if that's the point) that you cannot implement iterators (or anything else) using direct pointers if you allow reallocation. That's pretty clear, undoubtedly.

My question is specifically what I am paying the overhead of .borrow_mut() for in a Ref<RefCell<>>. (*) If all accesses, mutable or not, go through the Ref (so no direct pointers are kept anywhere), would we agree that there's no potential for use-after-free of the heap allocated object to which the Rc refers?

That then leaves the issue of the compiler assuming that no mutable references exist (and thus being able to perform the same optimizations a C compiler could when a pointer is declared with the restrict keyword). This issue could be side-stepped with a marker trait or something similar like it's already done for some types.

My question sort of is, what else? What are other examples where things would completely break under the alternative scenario that, say, multiple calls to borrow_mut() were allowed to succeed or perhaps equivalently, if Rc<T> allowed mutable access to T (and accesses weren't optimized under the noalias assumption).

(*) I'm especially interested in that since so far no one has been able to present a solution to the alternative question I posed of how to avoid interior mutability that doesn't involve double indirection or hashing or indexing via integers, or avoiding direct references altogether.

You are paying the overhead of runtime borrow checking. You are allowed to call borrow_mut twice in a row, keeping the first mutable borrow in scope, as borrow_mut takes &self, an immutable (and thus shareable) reference. This means that your code will compile, and as the guarantee of soundness, it will panic at runtime, which in turn requires maintaining a borrow counter and a branch.

Yes. That's a very elaborate way of saying that the implementation of RefCell is sound. (BTW this has nothing to do with heap allocation or Rc; the memory unsafety demonstrated above concerns the Rc object itself, not the pointed heap allocation.)

I don't understand what you are getting at here. Declaring that a type does or does not uphold the "no shared mutability" rule doesn't actually affect what it does in reality, and does not license you to violate the language spec. The only correct way to introduce interior mutability is by means of UnsafeCell, raw pointers, or some wrapper type that abstracts over either of them (RefCell uses UnsafeCell internally).

You can get into trouble with functions that invoke user callbacks:

fn f<F:Fn()>(x: &mut X, cb: F) {
    // get some information from x, or partially update it
    cb(); // let user code do things
    // finalize changes to x
}

Under the normal Rust rules, there’s no danger of cb trying to use the partially-updated x. If you allow Rc to produce an aliased &mut X, however, the callback function may have its own reference and make unexpected changes that break the behavior of f.

Even if the intermediate code isn’t completely arbitrary, you’d have to check that every function you call while updating x is ok with seeing it in an inconsistent state, as they all might have a reference to it. If &mut is guaranteed to be non-aliased, however, there is no such danger.

3 Likes

Here's another example using threads. This playground contains a race condition which causes the assertion to sometimes fail.

Weird that you'd ask the question I've already given an aswer to; is there some part of my answer you did not understand / you'd want to see further explained?

One of the things you're paying for is the safety guarantees of Rust, and all the benefits that flow from that*. Instead of UB, you'll get a panic, should you try to create a situation that could violate Rust's memory safety guarantees (like multiple mutable borrows).

Aside from the examples others have provided, the unique &mut property is so baked into the language that it's nigh impossible to be sure you've enumerated all the cases that rely on it. Formal proofs like Stacked Borrows are also based on invariants like the &mut rule. If you're just asking out of curiosity or trying to map out what you can and can't do in unsafe, I'm not discouraging you, but it's something to be aware of.

(Granted, I'm unaware of an analysis that assumes you're going to wrap everything in an Rc<RefCell<_>>... because that's generally not a reasonable thing to do, and requiring it would be some language that isn't Rust.)

Use raw pointers and unsafe everywhere. This throws out many benefits of Rust, so you'll rarely see it suggested on a Rust forum. (One could also argue it qualifies as interior mutability.)


*: One of the benefits which is underrepresented in my opinion, is the ability to reason locally about safe code, while corralling an entire category of bugs into the (hopefully relatively few and small) unsafe portions of your code base.

If I, the programmer, write a function that takes a &mut, and I don't share that with any other function I call, I know exactly what's going on with the data behind it for the entirety of the function. And if any bug happens during runtime that violates that knowledge -- that alters the data while I hold the &mut, or even observes the data in an intermediate state -- that bug must lead back to an unsafe part of the code somewhere. The root bug cannot be in any safe part of the code!

That's an extremely valuable benefit, even if there is the occasional runtime cost.

You may enjoy this blog series. In particular, this section goes over the benefits with regards to local reasoning. (Apologies if I'm repeating myself, I recently suggested that series elsewhere and I can't remember if it was to you or not.)

5 Likes

I'm actually relying on this property in a specialized wrapper around SQLite that I'm writing. I also rely on the lifetime system to tie the lifetime of strings and blobs to that of the SQL prepared statement object they are coming from, which eliminates the possibility of dangling pointers (given that executing or resetting the statement works with &mut methods exactly for this reason).