Is RefCell a zero-cost abstraction

godmar · July 5, 2021, 4:41pm

2e71828:

godmar:
Or, perhaps, an alternative design would be (that's a question)
struct X {
    mutablefield1: Cell<T1>,
    mutablefield2: Cell<T2>,
    ...
}
and then use an `Rc<X>`? In that case, each of the cells could be assigned individually with `.set()` etc., and a direct `&mut T1` would never be obtained?
This is indeed a viable pattern and behaves as you describe.

I will think about this some more. What I would like about this pattern is that I could make certain fields what in C++ would be const and follow an "immutable by default" style. Basically, all fields that aren't Cells would be quasi-const automatically, something that, if I understand correctly, is not possible with the current design of Rust. (I seem to recall that when I made my first attempt at Rust years ago, fields in a struct were immutable unless you specified mut, but I think this design was changed.)

PS: thinking about this approach some more, though, it would mean I couldn't have any fields that aren't themselves mutable but require mutability for access, such as an embedded HashMap<> which must be borrowed mutably. But it may work for some cases.
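Below is a minimal sketch of the Cell-field pattern discussed above. The `Counter` type and its methods are hypothetical.

- The field without a `Cell` is effectively read-only behind an `Rc`.
- `Copy` fields are updated with `.get()` and `.set()`.
- A non-`Copy` field such as a `HashMap` can still live in a `Cell` if it is moved out with `Cell::take` and moved back with `Cell::set`.

That last point is one possible workaround for the concern in the PS. It assumes that a reentrant access during the short "moved out" window is acceptable: such an access would see an empty map, which is a logic error rather than a memory-safety one. `RefCell` is the more usual choice.

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical struct: `name` is not wrapped in a Cell, so it is read-only
// behind an `Rc` (the "quasi-const" field); the other fields opt in to mutation.
struct Counter {
    name: String,
    hits: Cell<u64>,
    // `HashMap` is not `Copy`, so `Cell::get` is unavailable. Instead the map is
    // moved out with `take` (requires `Default`) and moved back with `set`.
    by_key: Cell<HashMap<String, u64>>,
}

impl Counter {
    fn new(name: &str) -> Rc<Counter> {
        Rc::new(Counter {
            name: name.to_string(),
            hits: Cell::new(0),
            by_key: Cell::new(HashMap::new()),
        })
    }

    // Takes `&self`, yet updates state through the Cell fields;
    // no `&mut Counter` is ever created.
    fn record(&self, key: &str) {
        self.hits.set(self.hits.get() + 1);
        let mut map = self.by_key.take(); // the Cell holds an empty map meanwhile
        *map.entry(key.to_string()).or_insert(0) += 1;
        self.by_key.set(map);
    }

    fn count_for(&self, key: &str) -> u64 {
        let map = self.by_key.take();
        let n = map.get(key).copied().unwrap_or(0);
        self.by_key.set(map);
        n
    }
}

fn main() {
    let a = Counter::new("requests");
    let b = Rc::clone(&a); // a second handle to the same object
    a.record("x");
    b.record("x");
    b.record("y");
    // a.name.push('!'); // rejected: an `Rc` only gives shared access
    println!("{}: {} hits, x = {}", a.name, a.hits.get(), a.count_for("x"));
}
```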

H2CO3 · July 5, 2021, 6:05pm

It is first-class. You can in general do anything with an Rc you can do with any other type. It is not a compiler builtin, which is probably for the better.

What exactly do you mean by "leak"?

I'm confused. Memory safety doesn't need to be "provided" for all types and values, one-by-one – it's built into the language in the absence of unsafe, and user code can't (modulo bugs) break it. Accordingly, Rc doesn't break it, either. Rc is memory safe.

Because the need for them is rare and they have alternatives for when dynamic borrowck and refcounting are an absolutely unacceptable cost. Therefore the issue you are bringing up is mostly a non-issue.

bes · July 5, 2021, 6:59pm

You keep mentioning Swift as an example where memory management "just works" (my words), but there are several pitfalls in Swift:

"Conflicting access to memory" as described by their own documentation Memory Safety — The Swift Programming Language (Swift 5.7)
There is no built-in thread safety in Swift, even when using GCD. Rust is on a whole other level.

The amount of extra work I need to do to make my iOS app work correctly is ridiculous. Working in Rust is just a pleasure because I know memory safety and threading just work.

I don't have the technical knowledge to answer your specific Rust questions but I know "how Swift does it" isn't the answer.

godmar · July 5, 2021, 7:34pm

It's my understanding that both Rc<T> and RefCell<T> are single-threaded abstractions, so the reference to thread-safety does not apply.

godmar · July 5, 2021, 7:46pm

It can be dereferenced, as was explained by 2e71828 earlier in this thread.

When it is dereferenced, an immutable &T pointer results, which can be passed on and used in the scenarios described by others earlier in this thread, and then the arguments/worries about whether this pointer is aliased or not apply when it comes to memory safety.

If a smart pointer were built into the language (like in Swift or Python, for that matter), it would be impossible to extract a &T from a BuiltinRc<T>, so the issues would not arise.

That is good to know. To be clear, I'm not concerned about the refcounting cost (which I'll gladly pay to avoid use-after-free), just the dynamic borrowck cost on top, which appears to have marginal benefit and it's a cost I seem to have to pay only due to the existing design of Rc<RefCell<T>>.

Now I'd like to learn what those alternatives are so I can use them.

H2CO3 · July 5, 2021, 7:58pm

The "marginal" "benefit" is that you can still have shared mutability in a controlled, memory-safe manner. It's not really "marginal", and it's not really a "benefit"; it's a rare but absolute necessity when shared mutability is needed for some reason.

1. Most of the time, the best alternative to shared ownership is unique ownership. For example, if you need a bidirectional, mutable key <=> index mapping, you'll have much less trouble with an encapsulated, synchronized pair of (Vec<String>, HashMap<String, usize>) than with (Vec<Rc<str>>, HashMap<Rc<str>, usize>).
2. For complex graph data structures, which is where Rc usually arises, you can just use indices or map keys, maybe with a strongly typed wrapper like NodeId(usize) or NodeId(String).
3. If neither of these is sufficient, which is an exceedingly rare circumstance (I would go so far as to say it only ever applies to implementing a core, super-optimized data structure), then you can drop down to raw pointers and unsafe.
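The first alternative can be sketched as follows, assuming a hypothetical `Interner` type. Each string is owned twice (once in the `Vec` and once as a map key) instead of being shared through `Rc<str>`. The two halves stay synchronized because they are private behind methods that take `&mut self`.

```rust
use std::collections::HashMap;

/// Hypothetical bidirectional name <=> index mapping built on unique ownership.
#[derive(Default)]
struct Interner {
    names: Vec<String>,          // index -> name
    ids: HashMap<String, usize>, // name -> index
}

impl Interner {
    /// Returns the existing index for `name`, or assigns the next free one.
    fn intern(&mut self, name: &str) -> usize {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.names.len();
        // Both halves are updated together, so they cannot drift apart.
        self.names.push(name.to_string());
        self.ids.insert(name.to_string(), id);
        id
    }

    fn lookup(&self, id: usize) -> Option<&str> {
        self.names.get(id).map(String::as_str)
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern("alpha");
    let b = interner.intern("beta");
    assert_eq!(interner.intern("alpha"), a); // re-interning returns the same id
    println!("{} -> {:?}", b, interner.lookup(b));
}
```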

skysch · July 5, 2021, 8:33pm

Rc and RefCell enforce their single-threadedness using the type system. The fact that you cannot blindly substitute them for Arc and Mutex is what makes them so great. If the language had one set of these 'builtin' you'd suffer for having to use the wrong one when the other is more appropriate.
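The snippet below illustrates this point; the `parallel_count` function is hypothetical. The thread-safe counterpart of `Rc<RefCell<T>>` is `Arc<Mutex<T>>` (or `Arc<RwLock<T>>`). If you swap `Rc` in for `Arc` here, the code is rejected at compile time, because `Rc` is not `Send`. That rejection is the type-system enforcement described above.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments one shared counter from several threads.
fn parallel_count(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            // With `Rc<RefCell<usize>>` here, `thread::spawn` would not compile:
            // `Rc` is `!Send`.
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", parallel_count(4, 1000));
}
```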

jameseb7 · July 5, 2021, 8:34pm

The issue is that an Rc represents shared ownership and can itself be cloned, so it doesn't matter if the reference is passed on in any case, since there is already the potential for aliasing. Making it built-in won't change that, it would just move much of what is in the standard library into the compiler instead. There are no worries about memory safety though, because both Rc and & references are explicitly designed to be shared. It's for that reason that Rc doesn't implement DerefMut, which would allow for obtaining a &mut reference, since &mut references are required to be unique. Rc thus ensures safety by design.
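Here is a small sketch of the kinds of access an `Rc` grants. The `rc_access` helper is hypothetical; the `Rc` functions it calls are from the standard library.

- `Deref` yields a `&T`.
- `Rc::get_mut` only returns a `&mut T` while no other handle exists.
- `Rc::make_mut` restores uniqueness by cloning the value.

```rust
use std::rc::Rc;

/// Shows which kinds of access an `Rc` hands out.
fn rc_access() -> (Vec<i32>, Vec<i32>) {
    let mut a = Rc::new(vec![1, 2, 3]);

    // `Rc<T>: Deref<Target = T>`, so `&self` methods of `T` work directly.
    assert_eq!(a.len(), 3);

    // While `a` is the only handle, `Rc::get_mut` can safely return `&mut T`.
    Rc::get_mut(&mut a).unwrap().push(4);

    // Once a second handle exists, it returns `None`: a `&mut` would alias `b`.
    let b = Rc::clone(&a);
    assert!(Rc::get_mut(&mut a).is_none());

    // `Rc::make_mut` restores uniqueness by cloning the value (copy-on-write).
    Rc::make_mut(&mut a).push(5);
    ((*a).clone(), (*b).clone())
}

fn main() {
    let (a, b) = rc_access();
    println!("a = {:?}, b = {:?}", a, b);
}
```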

godmar · July 5, 2021, 8:55pm

I'm not sure what you're referring to by "shared mutability in a controlled manner,"
because it was my understanding that RefCell's .borrow_mut() doesn't support shared mutability - only one mutable reference can be obtained or else the thread panics.

I called the benefit of dynamic borrowck on top of a hypothetical enhanced BuiltinRc<T> "marginal" because:

(a) my program runs successfully when using .borrow/.borrow_mut, so shared mutable references don't actually arise

(b) even if there were attempts to acquire multiple mutable references, the consequences would be logic errors, not safety errors. Again, this is not true under the existing design, but it would be under the hypothetical design where the compiler understands the nature of the smart pointers. (*)

With respect to the alternatives you described, I think (I hope I'm not misrepresenting this) that they match the ones previously recommended, and all of them involve one form of indirection or another. This can be extremely low-overhead indirection (as in the use of indices, to which you refer in 2.), or indirection that could become more costly on hotter paths, such as the use of map keys. In many ways, these techniques amount to avoiding storing long-lived pointers (which is a bit ironic given the thrust of Rust's design, don't you think?)
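To make the "indices" option concrete, here is a sketch of a graph stored in a `Vec` and addressed through a strongly typed `NodeId(usize)` wrapper. The `Graph` and `Node` types are hypothetical. Nodes refer to each other by id rather than by pointer, so cycles need neither `Rc` nor `RefCell`. The price is one bounds-checked index per hop, and all mutation goes through `&mut Graph`.

```rust
/// Strongly typed index: it cannot be mixed up with unrelated `usize`s.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(usize);

struct Node {
    value: i32,
    edges: Vec<NodeId>,
}

/// Hypothetical arena-style graph: it alone owns every node.
#[derive(Default)]
struct Graph {
    nodes: Vec<Node>,
}

impl Graph {
    fn add_node(&mut self, value: i32) -> NodeId {
        self.nodes.push(Node { value, edges: Vec::new() });
        NodeId(self.nodes.len() - 1)
    }

    fn add_edge(&mut self, from: NodeId, to: NodeId) {
        self.nodes[from.0].edges.push(to);
    }

    /// Mutation goes through `&mut self`: the graph as a whole is uniquely owned.
    fn set_value(&mut self, id: NodeId, value: i32) {
        self.nodes[id.0].value = value;
    }

    /// Follows each edge by index (the "low-overhead indirection").
    fn neighbour_sum(&self, id: NodeId) -> i32 {
        self.nodes[id.0].edges.iter().map(|n| self.nodes[n.0].value).sum()
    }
}

fn main() {
    let mut g = Graph::default();
    let a = g.add_node(1);
    let b = g.add_node(2);
    g.add_edge(a, b);
    g.add_edge(b, a); // a cycle, with no reference counting involved
    g.set_value(b, 10);
    println!("{}", g.neighbour_sum(a));
}
```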

I do not plan to explore (3.) the use of raw pointers and unsafe, at least not yet.

I'm a bit unsure how certain you are about statements such as "exceedingly rare", and how opinionated they could potentially be. I thought Rust was a relatively young language that doesn't have many libraries (like Boost, for instance) that have been engineered and optimized for a long time. But I'm obviously no expert here.

(*) BTW, I'm not suggesting that Rust's existing design be scrapped and replaced with smart pointers, or that there exist only one universal thread-safe smart pointer type as in Swift; still, I view adding a smart pointer type as I envision it as a potentially useful addition to the language in cases where it is appropriate.

alice · July 5, 2021, 9:03pm

The reason we call it shared mutability is that you can have multiple references to it (i.e. several Rc<RefCell<T>> objects), and you can mutate it through any one of them. The fact that .borrow_mut() enforces that these mutations borrow it in non-overlapping regions is not in contradiction with the fact that the access to the value is shared.
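Here is a sketch of shared mutability in this sense; the `shared_log` function is hypothetical. Three `Rc` handles point to one `RefCell<Vec<String>>`, and each handle can mutate it. The borrows are checked at run time to make sure they do not overlap. `RefCell::try_borrow_mut` reports an overlapping borrow as an `Err`, where `borrow_mut` would panic instead.

```rust
use std::cell::RefCell;
use std::rc::Rc;

fn shared_log() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let writer_a = Rc::clone(&log);
    let writer_b = Rc::clone(&log);

    // Shared: three handles to one Vec. Mutable: each handle can push to it.
    // Each `borrow_mut()` lasts only for its statement, so the borrows don't overlap.
    writer_a.borrow_mut().push("from a".to_string());
    writer_b.borrow_mut().push("from b".to_string());

    // An overlapping mutable borrow is detected at run time.
    let guard = log.borrow_mut();
    assert!(writer_a.try_borrow_mut().is_err());
    drop(guard);
    assert!(writer_a.try_borrow_mut().is_ok());

    let entries = log.borrow().clone();
    entries
}

fn main() {
    println!("{:?}", shared_log());
}
```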

H2CO3 · July 6, 2021, 12:26am

No, I don't think it is ironic at all, and I ask you not to hijack the discussion towards unrelated and technically or logically unfalsifiable claims and concepts from æsthetics such as "irony".

Again, needing shared mutability is an exception, not the rule. I do not consider it bad for the language to provide an escape hatch in the case when the basic design principle isn't completely satisfactory. Certainly it's a lot better than everyone writing unsafe left and right. It's a pragmatic choice that tries to give most people useful tools to work with. The goal is not to design a silver bullet, a 100% pure paradigm, which is perfect all the time and solves all problems by itself. The goal is instead that a higher general quality of software be attained by using general, easy-to-conceptualize rules of thumb such as exclusive mutability, and to still make other designs possible if necessitated by real-life problem solving.

Your sarcasm is not welcome. My comments are based on several years of experience with writing Rust, and accordingly, I continue to stand by my claim that the need for interior mutability is vastly less frequent than the problems that can be solved without it. Furthermore, I'm not the only one claiming this – you can ask many other Rust programmers about interior mutability, and you will get similar opinions. Of course, some applications may need to use more interior mutability – this is typical of safe wrappers around unsafe C libraries, for example. But those are, again, somewhat advanced use cases which many Rust programmers don't encounter or need to think about often (or at all).

You continue to allude to Swift's design. First of all, I would like to point out that Rust is older than Swift (Rust was conceived in 2007, while Swift started in 2010 and was only publicly released in 2014). Second, the age of a language has very little to do with its technical merits. Various flavors of multiple decades old languages such as Pascal, Basic, and COBOL are still used in the industry, yet very few programmers would cite those languages as an example of great language design. They have had their place, and have largely been superseded by design principles which have subsequently been proven superior.

As of writing this post, crates.io features more than 60 thousand crates with more than 7 billion cumulative downloads.

The most important libraries, including std, and things like serde, libc, rand, syn, quote, just to mention a few, are under active development and maintenance. Many of them have already been around pre-1.0, and yet many others are developed by members of the core Rust teams. To dismiss their collective efforts sounds downright insulting to me, although at this point I have to try very hard to keep myself from speaking for them.

godmar · July 6, 2021, 12:27am

There was no sarcasm.

godmar · July 6, 2021, 12:48am

BTW, I was using Swift just as an example I thought this audience might be familiar with. It works the same in Python or other languages that use automatic reference counting.

The original reference to reference counting is George Collins' 1960 paper: A method for overlapping and erasure of lists.

quinedot · July 6, 2021, 12:52am

The uniqueness guarantee is at the center of Rust's memory safety, lack of data races, and concurrency safety guarantees. The performance edge isn't the primary motivator, and in fact, being sure you're preserving those qualities is so desirable as to warrant a potential performance hit (e.g. due to indirection) for most use-cases. From this vantage, there is no irony at all.

(Incidentally, once you're used to thinking in an ownership-oriented fashion, you'll feel much less bent out of shape.)

Yandros · July 10, 2021, 3:51pm

Note that we agree on this: the issue with pervasive &cell references everywhere (resulting from losing &uniq references in the language, like all the other languages do) is indeed that we already lose many immutability guarantees. While the most dangerous bugs stem from concurrent mutations, sequential mutations already suffice to introduce some bugs or make the code a bit more complex to reason about. So we both agree that having guaranteed immutability is a win, there. But you must acknowledge that without &uniq being the keystone abstraction for mutation, the mutations can happen from any shared & reference, which means that unless extra abstractions were baked into the language, &immut would cease to exist. Here is a diagram showcasing a bit what I mean:

The best idea behind Rust's great design was to realize that the whole left arm of that diagram could exist, and that a whole language could be built atop such a foundational design.

(That, and the whole concept of lifetime, are the two keystones of Rust; and funnily enough, they were made to coexist).

No other language that I know of has this concept of unique reference / handle built into the language, in a way that is both checked by the compiler at construction time, and potentially exploited by the compiler on the receiving end (e.g., LLVM's noalias-based optimizations, which, incidentally, have been quite buggy since no other language (including C and C++) has managed to actually feature a proper pervasive use of them, leaving the feature undertested for years).

Granted, the right arm is easier to work with from a producer-of-references point of view, and it's actually quite nice that Rust can still feature them, with a proper expression of most of their capabilities baked in at the type level (e.g., them being non-Sync unless the type is willing to pay an extra runtime cost to ensure synchronization / data-race-free mutations, although that does not guarantee the absence of race conditions). But working with these already disables some compiler optimizations, as you very well mentioned, and thus induces other runtime costs. Throughout this thread, you have claimed that RefCell could be removed in favor of a stronger garbage collector and thus extra usage of GC-managed pointers. While I can't refute your argument (I do believe that garbage collectors have had such a gigantic amount of engineering and research put into them that they have become wonderful pieces of technology, even performance-wise), that whole talk is merely replacing one runtime cost (in the case of RefCell, that of checking an aliased-mutability flag) with another runtime cost (the whole GC runtime). I have neither the knowledge nor the tests to quantify and actually compare both costs, and while you may have that knowledge (if that's the case, you ought to provide sources backing your claims), you definitely do not seem to have the benchmarks / measurements either.

This, in my opinion, makes the talk about sheer runtime performance / cost a bit futile, and is why I won't engage further in that direction, since I don't have the time or motivation to set up proper benchmarks and measurements: I can neither confirm that you're right, nor claim that you're wrong. But neither can you. If you want to continue investigating this thought experiment, which is indeed interesting, do try to provide some measurements.

From now on, I will say a bit more about &cell, which I have mentioned a few times, since it's indeed a very interesting avenue, and yet nobody asked about it. It's also related to the diagram above. To recap that diagram and Rust's design, there are three kinds of references in Rust:

- &unique, and thus mutable, references, dubbed &mut by Rust: unique mutable;
- & (shared) references to types featuring no interior / shared mutability: shared immutable;
- & shared and yet mutable references, to types that thus necessarily feature interior mutability: shared mutable.
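The three kinds can be illustrated with function signatures; all names here are hypothetical. The last function may legitimately receive two aliases of the same cell, which a version taking `&mut i32` for one of its parameters could not.

```rust
use std::cell::Cell;

// unique mutable: while `x` is live, nothing else may access the integer.
fn bump_unique(x: &mut i32) {
    *x += 1;
}

// shared immutable: any number may coexist, and `i32` has no interior mutability.
fn read_shared(x: &i32) -> i32 {
    *x
}

// shared mutable: `dst` and `src` are allowed to be the very same cell.
fn add_into(dst: &Cell<i32>, src: &Cell<i32>) {
    dst.set(dst.get() + src.get());
}

fn three_kinds() -> i32 {
    let mut x = 1;
    bump_unique(&mut x); // x is now 2
    let (r1, r2) = (&x, &x); // two shared references at once
    assert_eq!(read_shared(r1) + read_shared(r2), 4);

    let c = Cell::new(x);
    // Aliased on purpose: an `add_into(&mut i32, &i32)` version could not be
    // called with two references to the same integer.
    add_into(&c, &c); // 2 + 2
    c.get()
}

fn main() {
    println!("{}", three_kinds());
}
```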

When you look at their summarized names, one interesting thing to observe, and one on which the current best theoretical model for Rust's aliasing rules, Stacked Borrows, is based, is that while &Cell<…> is a shared mutable reference, if one were to write the following program:

fn main() {
    let mut x: i32 = 42;
    let p: &mut i32 = &mut x;
    // C-like code style; not worrying about `&unique` refs:
    let p1: *mut i32 = p;
    let p2: *mut i32 = p1; // a plain copy of the same raw pointer
    unsafe { *p1 += 1; }
    unsafe { *p2 = 0; }
    assert_eq!(x, 0);
}

is valid / well-defined code. This means that p1 and p2 are actually imbued with the shared mutable semantics of a &Cell<i32>.

And this can actually be showcased in safe Rust by writing the very same program without unsafe:

use ::core::cell::Cell;

fn main() {
    let mut x: i32 = 42;
    let p: &mut i32 = &mut x;

    let p1: &Cell<i32> = Cell::from_mut(p);
    let p2: &Cell<i32> = p1; // shared references are `Copy`
    p1.set(p1.get() + 1);
    p2.set(0);
    assert_eq!(x, 0);
}

This is a very interesting observation, especially if we were to rename the things as follows:

Cell → Mut,
mut x → uniqueable (I know, the name is ugly, but those are the semantics of mut on a binding);
&mut … → &uniq …

use ::core::cell::Cell as Mut;

let uniqueable x: i32 = 42;
let p: &uniq (i32) = &uniq x;

let p1: &Mut <i32> = Mut::from_unique(p); // Downgrade the ref
let p2: &Mut <i32> = p1; // can copy `&Mut` references!
p1.set(p1.get() + 1);
p2.set(0);
assert_eq!(x, 0);

I personally find such renaming to be quite enlightening. We observe that Rust never forced mutable references to be unique; it only syntactically favored the &unique-based references as the ones featuring the most ergonomic mutation. But nothing prevents one from "downgrading" / letting go of the uniqueness properties of a &uniq reference and just keeping its mutable properties: one can get a &Mut <_> out of a &uniq (_).

Once that is done, note that the ship has already sailed and we won't be renaming things to &uniq. So, to remain compatible with the current naming, if that Cell::from_mut() transformation were blessed by the language rather than being just a library construction, the only name available for it would be &cell references:

let mut x: i32 = 42;
let p: &mut i32 = &mut x;

let p1: &cell i32 = p;
let p2: &cell i32 = p1;
p1.set(p1.get() + 1);
p2.set(0);
assert_eq!(x, 0);
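Without the `&cell` sugar, the closest thing in today's Rust is `Cell::from_mut` plus the one projection that std does offer. The sketch below uses hypothetical helper names.

- For a tuple, the fields are projected while the borrow is still unique, and each field is downgraded afterwards. Safe std code offers no way to go from `&Cell<(i32, u8)>` to its fields.
- For slices, `Cell::as_slice_of_cells` provides the `&Cell<[T]>` to `&[Cell<T>]` projection.

```rust
use std::cell::Cell;

/// Project *before* downgrading: split the unique borrow of the tuple into two
/// unique field borrows, then downgrade each one with `Cell::from_mut`.
fn tuple_fields(pair: &mut (i32, u8)) -> (&Cell<i32>, &Cell<u8>) {
    let (a, b) = pair;
    (Cell::from_mut(a), Cell::from_mut(b))
}

/// For slices, std provides the projection `&Cell<[T]>` -> `&[Cell<T>]`.
/// Computes a running maximum in place, reading one cell while writing its
/// neighbour, both reached through shared references.
fn running_max(values: &mut [i32]) {
    let cells: &[Cell<i32>] = Cell::from_mut(values).as_slice_of_cells();
    for (prev, next) in cells.iter().zip(cells.iter().skip(1)) {
        next.set(next.get().max(prev.get()));
    }
}

fn main() {
    let mut pair = (1, 2u8);
    let (a, b) = tuple_fields(&mut pair);
    a.set(a.get() * 10);
    b.set(b.get() + 1);
    assert_eq!(pair, (10, 3));

    let mut v = [3, 1, 4, 1, 5];
    running_max(&mut v);
    println!("{:?}", v);
}
```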

The advantage of making these references first-class citizens, by promoting them to the language level, would be letting the language easily provide the most needed feature that Cell's API currently cannot offer on its own: structural / to-fields projection. That is, if you have a &cell (i32, u8), you should be able to get a (&cell i32, &cell u8), and a &cell [T] ought to be equivalent to a &[Cell<T>] (this one is featured in the standard library). In the meantime, we have a user-level / third-party crate and macro to do this for us:

TL;DR

I do believe that &cell references, which already exist in Rust except for the currently lacking sugar, do feature the mutable-reference semantics you are asking for: not required to be unique, and with no runtime check;
except of course, for &cell playing quite poorly with lifetime-tracked / compile-time-tracked pointers and more complex reallocating structs (e.g., featuring both a fn push: (self: &CellVec<Item>, Item) and a fn get (self: &'lt CellVec<Item>, usize) -> &'lt Cell<Item> obviously wouldn't work).
But that last point can indeed be handled by using runtime mechanisms to keep "sub-pointers" alive (e.g., a fn get (&'_ CellVec<Item>, usize) -> Rc<Item> could very well work despite a CellVec::push (although it wouldn't be thread-safe without extra locking mechanisms, of course)). In the case of a garbage collector, this would mean replacing Rc<_> with the Gc<_> pointer of some flavor of a garbage collector.
Doing so would mean replacing the overhead of a RefCell in, for instance, a RefCell<Vec<T>> (along with other API / semantic changes) with the overhead of almost systematically Rc<_>- / Gc<_>-wrapping each and every item used by such more complex / convoluted &cell-mutable collections (such as that CellVec), and it is not at all clear that this change of runtime costs is a win.
This change thus opens the way to making it more difficult to identify which references / handles point to truly immutable data, and which don't. In practice, once some accesses are &cell / aliased-mutable, then all the accesses need to account for mutations happening under their very feet (immediately hindering compiler optimizations, by the way), which paves the way for bugs in the code.
More generally, a lot of code out there does not need those aliased-mutability (e.g. &cell) shenanigans and, on the contrary, clearly benefits from being able to define and feature an API based on precisely chosen &immut and &uniq references, which get to be thread-safe without any extra runtime cost, do not need a GC, and sometimes even open the door to more advanced compiler optimizations.
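Here is a minimal, single-threaded sketch of the `CellVec` idea from the list above. It assumes that `get` returns an `Rc` handle, as suggested; the type and its methods are hypothetical, not a real library.

The sketch relies only on `Cell`. The inner `Vec` is moved out and back in for each operation, so no borrow of it ever escapes a method. Note the `Rc` allocation per item: it is the cost that the last points weigh against `RefCell<Vec<T>>`.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical `CellVec` (name taken from the discussion above): every method
/// takes `&self`, and `get` hands out an owning `Rc` handle instead of a borrow,
/// so a later `push` that reallocates the buffer cannot invalidate it.
struct CellVec<T> {
    // The Vec is moved out and back in (`take`/`set`) for each operation,
    // so no reference into it ever escapes a method.
    items: Cell<Vec<Rc<Cell<T>>>>,
}

impl<T> CellVec<T> {
    fn new() -> Self {
        CellVec { items: Cell::new(Vec::new()) }
    }

    fn push(&self, item: T) {
        let mut items = self.items.take();
        items.push(Rc::new(Cell::new(item)));
        self.items.set(items);
    }

    fn get(&self, index: usize) -> Option<Rc<Cell<T>>> {
        let items = self.items.take();
        let handle = items.get(index).cloned();
        self.items.set(items);
        handle
    }

    fn len(&self) -> usize {
        let items = self.items.take();
        let len = items.len();
        self.items.set(items);
        len
    }
}

fn main() {
    let v = CellVec::new();
    v.push(1);
    let first = v.get(0).unwrap(); // an owning handle, not a `&` into the Vec
    for i in 2..=100 {
        v.push(i); // may reallocate; `first` stays valid
    }
    first.set(42);
    println!("{} items, first = {}", v.len(), v.get(0).unwrap().get());
}
```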

adamreichold · July 12, 2021, 9:05pm

I think one important point that was mentioned earlier

does imply a downside to adding such a built-in smart pointer to the existing language, i.e. it would be contagious. When being handed a BuiltinRc<Something>, it would only allow calling functions on Something and its internals that accept BuiltinRc<T> themselves, but never &T or &mut T thereby requiring implementing all of the involved data structures using BuiltinRc and paying the cost for the heap allocations and dynamically checking for use-after-free even though for example only the top-level allocation actually needs it.

I think one of the most useful insights from the design of Rust's ownership semantics was that controlling for shared mutability is actually the same as controlling for data races. So while Rc and RefCell explicitly target single-threaded usage, the general question of what is gained by upholding Rust's mutable-xor-aliasing rules does always also involve the concerns of multi-threaded applications.

godmar · July 13, 2021, 7:17am

I think when people read that Rust supports reference counting, they assume that they can program in a style that they are used to from languages such as Python if they wish. Perhaps someone like the hypothetical Alan in the async Rust vision document?

Based on my understanding now, this style is indeed possible in Rust by using Rc<T> throughout for all structs T, and by wrapping all fields in a Cell. It doesn't need RefCell<T>. Like in Python, in this style, you can't take references to anything (and you wouldn't need to). (This means struct/slice/etc. projection is not necessary.) Like in Python, it's effectively single-threaded. Unlike in Python, you don't have an extra garbage collector besides the reference counting machinery to deal with cycles of unreachable objects.
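Below is a sketch of that style; the `Node` type and helper functions are hypothetical. Every object lives in an `Rc` and every field in a `Cell`. The sketch also shows two consequences raised in this post:

- Non-`Copy` fields need a verbose `take`/`set` sequence.
- With no cycle collector, a reference cycle is leaked unless it is cut by hand. Here this is observed through a `Weak` handle.

```rust
use std::cell::Cell;
use std::rc::{Rc, Weak};

/// A "Python-style" object: always handled through `Rc`, every field in a `Cell`.
struct Node {
    value: Cell<i32>,
    // A non-`Copy` field has no `get`; it is read with `take` and put back with `set`.
    next: Cell<Option<Rc<Node>>>,
}

fn new_node(value: i32) -> Rc<Node> {
    Rc::new(Node { value: Cell::new(value), next: Cell::new(None) })
}

/// Mutates `b` through the link stored in `a`, then reads it back through `b`.
fn mutate_through_link() -> i32 {
    let a = new_node(1);
    let b = new_node(2);
    a.next.set(Some(Rc::clone(&b)));

    let linked = a.next.take().expect("a links to b");
    linked.value.set(20);
    a.next.set(Some(linked));
    b.value.get()
}

/// Builds the cycle a -> b -> a and drops both local handles. The returned
/// `Weak` tells the caller whether `a` was actually freed.
fn cycle(break_before_drop: bool) -> Weak<Node> {
    let a = new_node(1);
    let b = new_node(2);
    a.next.set(Some(Rc::clone(&b)));
    b.next.set(Some(Rc::clone(&a)));
    if break_before_drop {
        a.next.set(None); // cut the cycle by hand
    }
    Rc::downgrade(&a)
}

fn main() {
    println!("{}", mutate_through_link());
    // No cycle collector: the uncut cycle stays allocated (it leaks).
    println!("leaked: {}", cycle(false).upgrade().is_some());
    println!("leaked: {}", cycle(true).upgrade().is_some());
}
```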

One could create a simple language with Python-style assignments and syntax and compile it to Rust, I think.

A question then is whether Rust should provide more direct support for this style of single-threaded programming (given that forgoing direct references likely costs performance, as Yandros surmised above). The current style when written in Rust would be awkward and verbose (requiring .set, .get, Cell::new(), etc. etc.).

It's perhaps a possible thought experiment. What if, for instance, the compiler saw an assignment to a field that would be rejected under the borrow checker, but where the "cellified" version of it would not. Should it replace, or offer to replace, the field with a Cell-wrapped field and replace the assignments and accesses with get/set if that compiled? It's my understanding that this could lead to identical code.

And similarly, perhaps the option of wrapping in an Rc<> could be similarly offered. There is previous work on automatic program enhancement in response to failing to compile.

Lastly, with respect to whether Rust proper could or should be changed to support some of these fundamentally non-rusty styles in one way or another, I'll share something I read that surprised me, which is that - apparently - the Rust compiler already gives special built-in treatment to Rc as it is a language item. (Now I don't think it's feasible or fruitful to try to change the actual Rc as it currently is in the way I discuss above, but I do find it interesting that there's no dogma against adding to the language or even changing the language when needed.)

2e71828 · July 13, 2021, 7:27am

The only special treatment I know of is that Rc<Self> is a valid self type in methods.
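For reference, here is a minimal sketch of that special treatment; the `Session` type is hypothetical. A method can declare `self: Rc<Self>` as its receiver and then be called with ordinary method syntax on an `Rc<Session>`.

```rust
use std::rc::Rc;

struct Session {
    id: u32,
}

impl Session {
    // `self: Rc<Self>` is an accepted receiver type, like `Box<Self>` and `Arc<Self>`.
    // The method takes ownership of one `Rc` handle and hands it back.
    fn handles(self: Rc<Self>) -> (Rc<Self>, usize) {
        let count = Rc::strong_count(&self);
        (self, count)
    }
}

fn main() {
    let s = Rc::new(Session { id: 7 });
    let extra = Rc::clone(&s);
    let (back, count) = s.handles(); // ordinary method-call syntax on `Rc<Session>`
    println!("session {} has {} handles", back.id, count);
    drop(extra);
}
```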

godmar · July 13, 2021, 7:30am

I actually spent almost 10 minutes trying to find a link to the page where I read (exactly) this, but was only able to find the lang items page. There's a page somewhere that lists box, Box::new, and also Rc. I came across it when trying to find information about the box keyword. But the Rust documentation currently is in a state that makes googling and refinding information somewhat difficult.

PS: finally found it (googled "rust rc receiver type box"). The section name is "Special Types and Traits".

Yandros · July 13, 2021, 9:54am

[Meta] By the way, @godmar, thanks for creating this thread, I find the whole discussion that has followed (and is following) quite interesting

Indeed, that's a very legitimate question. I think that you may find the following blog post(s) relevant:

My own "rushed" two cents (by rushed I mean I haven't fully thought about all the implications):

as you mentioned, perhaps some of these types, such as {A,}Rc and Cell fields, could be slightly more lang-ified. I don't mean that they would be given special superpowers (like Box currently has). I mean that the language may sometimes unsugar to constructions using these types. Lang items are, in a way, the part of the core/standard library that the language itself is allowed to use (≠ the hard-coded path / magic elements strategy used in other languages). A nice example of this is Option::Some(…): it needs to be usable by the language for the unsugaring of for <pat> in <iterable> { … } into match <iterable>.into_iter() { mut iter => while let Some(<pat>) = iter.next() { … } }
together with a standard library extension providing Cell-friendly collections, such as that CellVec I mentioned in the previous post (renamed to AliasedVec within what follows),

then, we could imagine some kind of compiler preprocessor that would convert .smallrs files into their equivalent .rs files, thus allowing both interop with classic Rust codebases and simpler code for some use cases:

//! example.smallrs
pub
fn cumulative_sums_in_place (v: AliasedVec<i32>)
{
    mut s = 0; // or `var s`, etc.
    for x in v {
        s += x;
        x = s; // <- unsure about the syntax here…
    }
}

which would unsugar into something like the following Rust code:

use ::core::cell::Cell as Mut;
/// `Gc<T>` is like `Rc<T>`, except with an actual (single-threaded)
/// garbage-collector runtime capable of handling cycles.
use ::some::gc_impl::Gc;

pub
fn cumulative_sums_in_place (
    v: Gc<AliasedVec<i32>>,
)
{
    //      this alloc could be skipped if known
    //      not to escape the local scope
    //      vvvvvvv
    let s = Gc::new(Mut::new(0));
    //                            this clone could be skipped if
    //                            known to be the last owned use.
    //                            vvvvvvvvv
    for x /* : Gc<Mut<i32>> */ in Gc::clone(&v) {
        s.set(s.get() + x.get());
        x.set(s.get());
    }
}

And, obviously, .smallrs would allow, much like Python and other languages do, "fun" things like:

fn silly (v: AliasedVec<i32>)
{
    for x in v { v.push(x); } // probably an infinite loop
}

By the way, most of these ideas could be implemented without official support, except for the required knowledge of the types in question: super proc-macro, or build.rs, or external CLI tool could take care of performing this rewrite. But I lean towards the external CLI tool, implemented as an experimental Rust plugin, in order to get access to proper type knowledge and whatnot.
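For comparison, the unsugared sketch above can be approximated in today's Rust without the hypothetical `Gc` and `AliasedVec`, using `Rc<[Cell<i32>]>` as a stand-in. The `to_shared` helper is hypothetical. Unlike `AliasedVec`, this slice cannot grow, so the "silly" `push` example has no counterpart here. `s` stays a plain local because it never escapes, which matches the "alloc could be skipped" comment.

```rust
use std::cell::Cell;
use std::rc::Rc;

// `Rc<[Cell<i32>]>` stands in for `Gc<AliasedVec<i32>>`: shared ownership of
// elements that can each be mutated through a shared reference.
fn cumulative_sums_in_place(v: Rc<[Cell<i32>]>) {
    let mut s = 0; // no allocation: `s` never escapes this function
    for x in v.iter() {
        s += x.get();
        x.set(s);
    }
}

fn to_shared(values: &[i32]) -> Rc<[Cell<i32>]> {
    values.iter().copied().map(Cell::new).collect()
}

fn main() {
    let v = to_shared(&[1, 2, 3, 4]);
    cumulative_sums_in_place(Rc::clone(&v)); // the caller keeps its own handle
    let sums: Vec<i32> = v.iter().map(Cell::get).collect();
    println!("{:?}", sums);
}
```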
