Lazy initialization vs. interior mutability

wduquette · July 27, 2019, 4:18pm

I have a type with a string value that is computed in lazy fashion; but once it is computed, it never changes. I've got an implementation, but I'm wondering whether there's a more idiomatic (or simply more clever) way of doing it.

I'm currently using an implementation like this:

pub struct Value {
    inner: Rc<RefCell<InnerValue>>,
}

struct InnerValue {
    string_rep: Option<Rc<String>>,
    ...
}

impl Value {
    pub fn as_string(&self) -> Rc<String> { ... }
}

Either the string exists when the Value is created, or it's computed and saved by a method the first time it's wanted. Clients need to be able to retrieve the string_rep on demand.

I'm not entirely happy with this implementation; it seems like I've got a whole lot of allocation going on, and it seems overly complicated. In particular, I've got that second Rc in Option<Rc<String>>. That's because I don't want the use of interior mutability to appear in the API (semantically, the Value is immutable). Rather than letting a stray Ref escape, the retrieval method calls RefCell::borrow, clones the Rc<String>, and returns it.

What I'm wanting is something where I can still compute and return the string_rep in lazy fashion; but I can provide a method like this:

impl Value {
    // I.e., the lifetime of the result is the same as the lifetime of the `Value`.
    pub fn as_string(&self) -> &str { ... }
}

Is there a canonical way to do this that doesn't involve RefCell? Is there a simple unsafe solution?

Yandros · July 27, 2019, 4:24pm

Indeed, once the value is known to have been initialized (e.g., no None in InnerValue), RefCell checks are superfluous.

For a manually checked version of RefCell, there is UnsafeCell (on top of which RefCell is built).

For this particular pattern, however, there is already a crate that does the unsafe for us: ::once_cell
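
A minimal sketch of what that looks like, assuming a recent toolchain (Rust 1.70 or later) where std::cell::OnceCell, the standard-library counterpart of once_cell's unsync::OnceCell, is available. The Value layout, the i64 data_rep, and the from_int / from_string constructors are hypothetical stand-ins for illustration, not the types from the question:

```rust
use std::cell::OnceCell;

/// Hypothetical stand-in for the `Value` type in the question.
pub struct Value {
    /// Assumed data representation; the real type is richer.
    data_rep: i64,
    /// Filled at most once, either at construction or on first access.
    string_rep: OnceCell<String>,
}

impl Value {
    /// The string rep is not known yet; it is computed lazily.
    pub fn from_int(n: i64) -> Self {
        Value { data_rep: n, string_rep: OnceCell::new() }
    }

    /// The string rep is known up front, so the cell starts out filled.
    /// (The `data_rep` of 0 is just a placeholder in this sketch.)
    pub fn from_string(s: &str) -> Self {
        Value { data_rep: 0, string_rep: OnceCell::from(s.to_string()) }
    }

    /// The returned `&str` borrows from `self`; no `Ref` guard and no `Rc` clone.
    pub fn as_str(&self) -> &str {
        self.string_rep
            .get_or_init(|| self.data_rep.to_string())
            .as_str()
    }
}
```

Because the cell can only go from empty to filled, get_or_init can hand out a plain shared reference tied to the lifetime of the Value, which is the signature asked for above.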

wduquette · July 27, 2019, 4:34pm

I'll take a look at once_cell. Thanks!

wduquette · August 3, 2019, 5:14pm

This worked very well. I started using the ::once_cell crate, and then converted over to using UnsafeCell<Option<String>> directly. The UnsafeCell is created with either Some, or None; and it is queried in exactly one method which returns it if Some and sets and returns the string if None.

Yandros · August 3, 2019, 6:24pm

Yes, as long as you don't write an unsafe impl Sync or something like that, it should be sound (though that is hard to tell without seeing the code).

wduquette · August 11, 2019, 3:06pm

You can find it here, if you're interested:

github.com

wduquette/molt/blob/master/molt/src/value.rs

//! The Value Type
//!
//! The [`Value`] struct is the standard representation of a data value
//! in the Molt language.  It represents a single immutable data value; the
//! data is reference-counted, so instances can be cloned efficiently.  Its
//! content may be any TCL data value: e.g., a number, a list, a string, or a value of
//! an arbitrary type that meets certain requirements.
//!
//! In TCL, "everything is a string": every value is defined by its _string
//! representation_, or _string rep_.  For example, "one two three" is the string rep of a
//! list with three items, the strings "one", "two", and "three".  A string that is a
//! valid string rep for multiple types can be interpreted as any of those types;
//! for example, the string "5" can be used as a string, the integer 5, or a list of one
//! element, the value "5".
//!
//! Internally, the `Value` can also have a `data representation`, or `data rep`, that
//! reflects how the value has been most recently used.  Once a `Value` has been used
//! as a list, it will continue to be efficiently used as a list (until it is used something
//! with a different data rep).
//!


But I'm an old C programmer. I like not having to worry about safety, but I still know how if I need to.

Yandros · August 11, 2019, 5:09pm

github.com

wduquette/molt/blob/8b0105ccca962aa4bb8ab10d7cebe75d0b69aff8/molt/src/value.rs#L376-L395


      
pub fn as_str(&self) -> &str {
    // FIRST, get the string rep, computing it from the data_rep if necessary.
    // self.inner.string_rep.get_or_init(|| (self.inner.data_rep.borrow()).to_string())

    // NOTE: This method is the only place where the string_rep is queried.
    let slot = unsafe {&*self.inner.string_rep.get()};

    if slot.is_some() {
        return slot.as_ref().expect("string rep");
    }

    // NOTE: This is the only place where the string_rep is set.
    // Because we returned it if it was Some, it is only ever set once.
    // Thus, this is safe: as_str() is the only way to retrieve the string_rep,
    // and it computes the string_rep lazily after which it is immutable.
    let slot = unsafe {&mut*self.inner.string_rep.get()};
    *slot = Some((self.inner.data_rep.borrow()).to_string());

    slot.as_ref().expect("string rep")
}

(Context: string_rep: UnsafeCell<Option<String>>)
This is indeed sound, good job; but it relies on Value not being Sync (otherwise there could be a data race if two threads attempted to initialize it at the same time). I recommend mentioning this fact in one of the NOTE: comments, to discourage anyone from adding an unsound unsafe impl Sync for Value {} later on.
- Or even better, you could add an assert_not_impl!(Value, Sync) to explicitly make compilation fail if someone were to add such an impl.
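
For readers without such a macro at hand, here is a hedged, crate-free sketch of a compile-time "not Sync" check. It uses the ambiguous-impl trick that assertion macros of this kind (for example those in the static_assertions crate) are commonly built on. The trait name AmbiguousIfSync, the marker type Invalid, and the stripped-down Value are hypothetical names chosen for this illustration:

```rust
use std::cell::UnsafeCell;

/// Hypothetical, stripped-down `Value` holding only the lazily set string rep.
pub struct Value {
    string_rep: UnsafeCell<Option<String>>,
}

impl Value {
    pub fn new() -> Self {
        Value { string_rep: UnsafeCell::new(None) }
    }

    /// Consumes the value and returns whatever string rep was stored.
    pub fn into_string_rep(self) -> Option<String> {
        self.string_rep.into_inner()
    }
}

// Compile-time check that `Value` is NOT `Sync`.
// Every type gets the `AmbiguousIfSync<()>` impl; `Sync` types additionally
// get `AmbiguousIfSync<Invalid>`. If `Value` were `Sync`, the `_` below could
// be either `()` or `Invalid`, type inference would be ambiguous, and the
// build would fail.
const _: fn() = || {
    trait AmbiguousIfSync<A> {
        fn some_item() {}
    }
    impl<T: ?Sized> AmbiguousIfSync<()> for T {}
    #[allow(dead_code)]
    struct Invalid;
    impl<T: ?Sized + Sync> AmbiguousIfSync<Invalid> for T {}
    let _ = <Value as AmbiguousIfSync<_>>::some_item;
};
```

With this in place, adding unsafe impl Sync for Value {} anywhere in the crate should turn into a type-inference error at the assertion instead of a silent soundness hole.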

if slot.is_some() {
    return slot.as_ref().expect("string rep");
}

You can avoid this use of .expect by matching on the Option directly:

if let &Some(ref inner) = slot {
    return &**inner;
}

Or, if you prefer to let the compiler handle the reference patterns for you, you can just write:

if let Some(inner) = slot {
    return inner;
}
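
Putting these pieces together, a self-contained sketch of the same lazy-initialization pattern using the if let form might look like the following. The i64 data_rep and the from_int constructor are hypothetical simplifications; the real code reads its data_rep through a RefCell:

```rust
use std::cell::UnsafeCell;

/// Hypothetical simplified `Value`: an integer data rep plus a lazy string rep.
pub struct Value {
    data_rep: i64,
    string_rep: UnsafeCell<Option<String>>,
}

impl Value {
    pub fn from_int(n: i64) -> Self {
        Value { data_rep: n, string_rep: UnsafeCell::new(None) }
    }

    pub fn as_str(&self) -> &str {
        // SAFETY: `Value` is not `Sync` (it contains an `UnsafeCell`), so no
        // other thread can touch the slot, and no `&mut` to the slot is alive
        // while this shared reference is in use.
        let slot = unsafe { &*self.string_rep.get() };
        if let Some(inner) = slot {
            return inner.as_str();
        }

        // Compute the string before taking the mutable reference, so nothing
        // else runs while the `&mut` exists.
        let computed = self.data_rep.to_string();

        // SAFETY: the slot is `None`, so no reference into its contents has
        // ever been handed out, and this is the only place that writes to it.
        // The earlier shared reference is no longer used past this point.
        let slot = unsafe { &mut *self.string_rep.get() };
        slot.insert(computed).as_str()
    }
}
```

Option::insert stores the value and returns a reference to it, which removes the second .expect as well. Once the slot is Some, every later call takes the early-return branch and only ever reads.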

wduquette · August 11, 2019, 5:16pm

Thank you very much for taking the time to give me some style notes! Writing idiomatic Rust takes time to learn, and I'm still very much learning.

Regarding Sync, I'll see about adding the assert_not_impl. In fact, Value is part of a much larger system that isn't Sync either; the usual thing in multi-threaded TCL programming is to keep each TCL Interp in its own thread and send TCL commands back and forth. I suppose it would be possible to make the Interp Sync if I worked at it....

system · November 9, 2019, 5:24pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
