Is RefCell a zero-cost abstraction

In another thread the question of what Rc<RefCell<T>> is was discussed.

I had asked about the rationale for paying the runtime cost for .borrow/.borrow_mut when RefCell is used in combination with Rc. Is this cost being paid so that the compiler can maintain the restrict assumption of all pointers (in other words, to reap potential optimization benefits), or is there a memory safety reason. (Usually, borrowing rules assure that memory safety is not violated.)

RefCell exists to enforce memory safety. It allows you to mutate structures behind a shared & references, and thus it has to track at runtime whether any other references are given out. Otherwise, you could have one function taking a shared reference and another taking an &mut, and since the optimizer will assume no mutations are possible while the shared reference exists, it will make a false assumption, leading to UB.

Rc has little to do with this, because Rc doesn't usually typically offer &mut references. The primary way to mutate through Rc is through a shared reference, which is what RefCell enables.

4 Likes

Perhaps I wasn't clear.

When I said: "reap potential optimization benefits" I meant the same as what you phrased as: "the optimizer will assume no mutations are possible while the shared reference exists."

My question is whether there are other reasons besides this.

On that note, though, let me check if my understanding here is correct or not. Suppose you turned off noalias in the backend (as it was disabled apparently due to bugs for a long time). If so, would there still be places in which UB could result if mutations were allowed while a shared reference exists (in a single-threaded context)? What would be examples?

I think you could still get UB. Imagine if you had a shared reference to a struct, read one field and used that field value to interpret a union value also in that struct, but on some intermediate step, you inadvertently mutate the union and tag fields. Now you're reading a union value with the wrong interpretation. Which would require unsafe, but be perfectly valid if you knew your reference was unique, but now it is UB because someone else is observing it.

I'm not really sure, but I think some sound unsafe code could become unsound in such a manner.

Actually, here might be a simpler example:

Create a shared reference to the same object, and then mutate it in such a way to cause it to be reallocated, then use the shared ref. I don't think noalias is relevant here, because there's nothing you can do safely with the shared ref, even if you know someone else is mutating it.

To answer the question that is in the subject of this thread (but not the body): No, RefCell is not zero-cost. It tracks borrows at run-time. Cell is, but only allows replacement of its contents, not references.

Iterator invalidation is the classic example. Here's some further explorations.

10 Likes

That's a quite straightforward "yes"; shared mutability is probably the #1 cause of memory unsafety. The prime
example is iterator invalidation. Consider the following code:

let mut vec = vec![1, 2, 3];
let ptr = &vec[0];
vec.extend(4..1000);
println!("{}", ptr);

If this compiled, ptr would be dangling as extend would cause a reallocation of the vector's buffer.


One more important distinction to be made is that shared mutability would still be UB even with noalias turned off, because the language defines it so. The existence of UB and the set of code considered valid or invalid has nothing to do with what the optimizer happens to allow. It has to do with what the language definition, the memory model, the "abstract machine", etc. allows in theory. UB means "this program violates the constraints of memory safety". It doesn't mean "guaranteed crash", nor does not crashing mean "free from UB".

14 Likes

The RefCell type is not zero-cost because it must prevent the existence of multiple mutable references, and to do that it must count them. If you compare to the Cell type, which is zero-cost, it is interesting to note that obtaining a zero-cost abstraction for this imposes some rather strict requirements on what kind of mutation you can do. Read more here.

4 Likes

I tend to disagree. "UB" is not a god-made thing, and neither are the language definition, memory model, "abstract machine" (which doesn't even exist as such according to claims elsewhere in this forum). Unless "UB" means something else in Rust than it means conventionally it is not useful to put any undesired behavior under the "UB" umbrella. "UB" means that the compiler and runtime is literally free to do anything - crash or not, output "pineapple" for all we know. See Regehr's blog post for an introduction of the common use of the term.

Specifically, I don't understand/tend to not concur with the argument that iterator invalidation leads to undefined behavior in the presence of a universal garbage collector (as was claimed by Niko in one of the blog post linked; I'm not aware of any examples where iterator invalidation led to a "browser compromise" when a universal garbage collector was used.) And Rc<RefCell<>> arguably is a universal garbage collector. Certainly, in languages like Java mutation of a collection can lead to the invalidation of iterators. This may result in exceptions, or it may even result in silent program failure, but it doesn't result in undefined behavior - it's what's commonly (at least in the C world) called implementation-defined behavior.

Before you point out that I'm switching the subject here, I am of course interested in the aggregate cost of Rc<RefCell<T>> primarily since this combination represents the interior mutability pattern that AFAICS most closely emulates access to a struct as you may do it in other languages. Now whether this pattern can really be avoided without introducing extensive (and expensive) indirection is another topic I'm very interested in learning about.

Just a reminder that RefCell<T> can have a cost and yet be a zero-cost abstraction. "Zero-cost abstraction" refers to how much overhead the abstraction itself adds, not how much it actually costs. "Do you pay for this abstraction when you do not use it?" and "If you hand-coded it yourself, could you do better?" are two key questions here.

So "what does RefCell<T> cost" vs "is RefCell<T> a zero-cost abstraction" are two different questions. Could you do runtime borrow checking faster by implementing it differently? That's the primary question to be answered for the latter question.

Shared mutability can cause memory unsafety regardless of restrict, and even in single-threaded contexts. A classic example is iterator invalidation. (EDIT: Oh, I see this was brought up since I last opened this tab)

14 Likes

Undefined Behavior can only be understood in terms of the documents or conventions that establish which behaviors are defined. History has shown that using the compiler binary as this definition is a bad idea, as it hamstrings the development of the language: If all programs emitted by one particular version of one particular compiler are defined to be “correct,” there’s no room left for language developers to make improvements.

Instead, all most programming languages have a kind of contract between the language developers and users that describe what each is allowed to change so that they don’t step on each others’ toes. UB is the design space that the compiler authors have reserved for themselves, whether or not any particular compiler version produces inconsistent code when it’s used, and any non-UB code should work on any compiler version.

By “compiler version” here, I don’t just mean released binaries, but also any supported combination of configurations as well: rustc and rustc without noalias are two different compilers targetting the same language, and therefore have the same set of undefined behaviors. This is what @H2CO3 was referring to with his statement that “The existence of UB and the set of code considered valid or invalid has nothing to do with what the optimizer happens to allow.”

In Rust, the exclusivity of access provided by &mut is deeply baked into the language design, and is often a key point when analyzing unsafe code for soundness. Removing that guarantee would invalidate all of those analyses, potentially leading to soundness holes in all manner of Rust libraries.

12 Likes

Is that a chiasmus? :upside_down_face:


So, as per the current definition of Rust the language, a &mut reference is a unique one, per definition (FWIW, at some instances it was suggested it be called &uniq, precisely because people reasoning about it as a mutable reference rather than as a unique one were not understanding the language correctly). Violating its uniqueness guarantee goes against its very definition in the language, thus representing a "contract violation" w.r.t. the compiler guarantees, i.e., the very definition of UB.

That being said, you can very legitimately ask about the rationale behind this design / ask if the restrictions could be loosened. To think about the pros and cons of replacing &uniq in the language with &cell (and at that point, replace & _ with &cell _ as well, and we'd have C++-like references):

  • pro: it simplifies a bit the mental model, by not having to worry about aliasing.

  • con: by now having to deal with aliased mutability, writing some code in a bug-free no-matter-the-API-misusage manner becomes more complex1 at best, or more difficult, or even straight up impossible. To avoid the memory-unsafe bugs of languages such as C/C++, a global gc becomes highly advisable, and we are back to being a gc / managed language, such as Python, Kotlin, go, etc.

    • Another option is to get rid of mutability, since it plays so poorly with aliasing. This comes at the price of more copying, or usage of more complex data structures to palliate that. Rust becomes a functional language (OCaml, Haskell, etc.)

1 yet another chiasmus

Granted, having to produce &uniq references can be demanding, at times, but in practice not that often (otherwise Rust would not be language one could be productive with!). But let's not overlook the power, the peace of mind, that being given &uniq references, or, by contraposition, being given &shared references that happen to imply immutability of the referee, both offer.

Again, a Vec's ptr/len/cap fields are immutable when referred to by a shared reference, thus making &mut Vec<_> the only way one can append an element to the vec, potentially modifying the *ptr allocation, it immediately follows that yielding &'shared Elem references to the Vec's heap-allocated Elements when given a &'shared Vec<Elem> is something safe to do, and which no API misusage can break whatsoever.

Designing correct and robust code, that is, way less buggy code, becomes way easier in Rust thanks to that. That's why I wouldn't be surprised that the language with which mid-to-long term code is written and where the maintainers of such code are most productive turned out to be Rust (and functional languages; but that's where Rust efficiency (by not having to deal with the optimized immutable collection shenanigans from functional languages) might outshine those).

The power that comes from &uniq (and &shared often meaning &immutable) is one of Rust greatest strengths :slightly_smiling_face:

I'm tempted to inverse-quote uncle Ben, and say:

3 Likes

Could you provide an example or 2 of such analyses that would be invalidated?
Let's restrict ourselves to Rc<RefCell<>> (so that we avoid use-after-free errors), and let's make the assumption that the pointers in question are not aliased.

That is definitely incorrect. You can replace Vec<i32> with Rc<RefCell<Vec<Rc<RefCell<i32>>>>>, i.e. insert the "universal GC" at every possible point, and yet if shared mutability were allowed, it would blow up (have UB) upon reallocation due to a use-after-free.

Update: this blows up, as expected.

3 Likes

This would be the case only if the iterator were represented as an address into the reallocated buffer. That's incompatible with "universal gc" - although admittedly we haven't really defined what universal gc is. Iterators under this model are stored as indices, and the index is valid even after reallocation. I understood "universal gc" as not allowing you to take the address of things without pinning down the referenced object, or not taking the address at all.

A Vec::[Into]Iter is represented as such.

The point is, in Rust, it is not expected that you insert Rc<RefCell> at arbitrary places, even within the guts of a data structure. As such, even if you insist on doing this, you won't end up with a GC language, and you can't arbitrarily redefine or ignore what the designers of the language say about what is or isn't defined behavior.

1 Like

Let's take a step back here.

I agree (if that's the point) that you cannot implement iterators (or anything else) using direct pointers if you allow reallocation. That's pretty clear, undoubtedly.

My question is specifically what I am paying the overhead of .borrow_mut() for in a Ref<RefCell<>>. (*) If all accesses, mutable or not, go through the Ref (so no direct pointers are kept anywhere), would we agree that there's no potential for use-after-free of the heap allocated object to which the Rc refers?

That then leaves the issue of the compiler assuming that no mutable references exist (and thus being able to perform the same optimizations a C compiler could when a pointer is declared with the restrict keyword). This issue could be side-stepped with a marker trait or something similar like it's already done for some types.

My question sort of is, what else? What are other examples where things would completely break under the alternative scenario that, say, multiple calls to borrow_mut() were allowed to succeed or perhaps equivalently, if Rc<T> allowed mutable access to T (and accesses weren't optimized under the noalias assumption).

(*) I'm especially interested in that since so far no one has been able to present a solution to the alternative question I posed of how to avoid interior mutability that doesn't involve double indirection or hashing or indexing via integers, or avoiding direct references altogether.

You are paying the overhead of runtime borrow checking. You are allowed to call borrow_mut twice in a row, keeping the first mutable borrow in scope, as borrow_mut takes &self, an immutable (and thus shareable) reference. This means that your code will compile, and as the guarantee of soundness, it will panic at runtime, which in turn requires maintaining a borrow counter and a branch.

Yes. That's a very elaborate way of saying that the implementation of RefCell is sound. (BTW this has nothing to do with heap allocation or Rc; the memory unsafety demonstrated above concerns the Rc object itself, not the pointed heap allocation.)

I don't understand what you are getting at here. Declaring that a type does or does not uphold the "no shared mutability" rule doesn't actually affect what it does in reality, and does not license you to violate the language spec. The only correct way to introduce interior mutability is by means of UnsafeCell, raw pointers, or some wrapper type that abstracts over either of them (RefCell uses UnsafeCell internally).

You can get into trouble with functions that invoke user callbacks:

fn f<F:Fn()>(x: &mut X, cb: F) {
    // get some information from x, or partially update it
    cb(); // let user code do things
    // finalize changes to x
}

Under the normal Rust rules, there’s no danger of cb trying to use the partially-updated x. If you allow Rc to produce an aliased &mut X, however, the callback function may have its own reference and make unexpected changes that break the behavior of f.

Even if the intermediate code isn’t completely arbitrary, you’d have to check that every function you call while updating x is ok with seeing it in an inconsistent state, as they all might have a reference to it. If &mut is guaranteed to be non-aliased, however, there is no such danger.

3 Likes