Global once_cell::sync::Lazy is not thread safe

I'm new to Rust and coming from a C++ background. I feel explaining what I want to do with C++ terms is easier for me.

I want to create a global Hashmap what holds many objects (pointers)

during runtime, I give a key to the hashmap and get a weak pointer to an object in return.

The hashmap has the ownership of the pointers.

I don't know how to achieve this?

The hashmap is created with

static STATES: Lazy<HashMap<&String, Arc<dyn StateFunc>>> = Lazy::new(|| {
    let hm = HashMap::new();
    hm
});

getting:

12 | / static STATES: Lazy<HashMap<&String, Arc<dyn StateFunc>>> = Lazy::new(|| {
13 | |     let hm = HashMap::new();
14 | |     hm
15 | | });
| |___^ `dyn StateFunc` cannot be sent between threads safely

StateFunc is a trait:

trait StateFunc{
    fn to_next(&self, c: char) -> Option<&Arc<dyn StateFunc>>;
}

I'm using a trait object, because I want to use it as the base class to achieve Polymorphism.

If to translate the above in C++ , I want:

class StateFunc{};
class StateA: public StateFunc{};
class StateB: public StateFunc{};
std::map<std::string, std:: shared_ptr <StateFunc>> globalHashMap{}

I only intent to use it in a single thread.

Arc requires both Send + Sync to be thread-safe itself. You can add these constraints in your trait object declaration like Arc<dyn StateFunc + Send + Sync>. Or you can make it required for all uses of your trait by declaring trait StateFunc: Send + Sync {}.

Then you might want to make STATES a thread-local instead, and use unsync::Lazy.

3 Likes

If you want a threadsafe version of this, then have a look at the lazy_static crate.

In general this is a bad idea, because 1. Global mutable state is undesirable and potentially fragile, and 2. Depending on usage patterns you can get runtime contention issues.

Why not just create a HashMap in your main fn instead, and pass that (or a mutable borrow to it) along to the rest of your code? That way you can get compile time rather than runtime checks of your accesses to the HashMap.

2 Likes

lazy_static and once_cell serve the same purpose. The error isn't crate-specific.

Lazy doesn't allow mutability. From what OP has said, it sounds like they're looking to essentially have some number of statics, but keyed. Nothing about this implies mutable globals.

Unless the title is wrong, once_cell::sync::Lazy is not threadsafe, and lazy_static definitely is.
In addition, that same title suggests mutability because thread safety is not an issue if there are only readers. Even data races aren't an issue then.

Then wouldn't OP be better off with a global, readonly, indexable array? It reduces the lookup times from O(log n) to O(1), and has better cache locality.

The title is misleading. The error is `dyn StateFunc` cannot be sent between threads safely. That is the issue — Lazy implements Send + Sync when all its components are Send + Sync. Anything that is dyn without a Send + Sync bound still has to be thread-safe to be stored in a static, unless that static is also thread-local.

Yes, it would be more efficient, assuming they're also fine with the change from strings to integers for indexing. That's not the question at hand, though.

I merely mentioned it because it seems bad form to allow someone with an X/Y problem to just continue on their path undisturbed.

I took the advice of passing it into functions, now I'm stuck at a different place:

impl Copy for Arc<dyn StateFunc>{}

impl StateFunc for StartState
{
    fn to_next(&self, s: &HashMap<&String, Arc<dyn StateFunc>>, c: char) -> Option<Arc<dyn StateFunc>>{
        match c {
            '\n' => Some(s[&StartState::name()]),
            _ => None
        }
    }
}

Here The Some() call fails, because Arc doesn't implement the Copy trait,

However implementing Copy for Arc also doesn't work, because I can't implement Copy for something that has a destructor.

What I want to do is returning a weak pointer to the heap allocation.

This will return a weak pointer:

Arc::downgrade(&s[&StartState::name()])
1 Like

Arcs are cloneable, and that does the very same thing that Copy would do if Arc also implemented Copy. So you can do this:

impl StateFunc for StartState
{
    fn to_next(&self, s: &HashMap<&String, Arc<dyn StateFunc>>, c: char) -> Option<Arc<dyn StateFunc>>{
        match c {
            '\n' => Some(s[&StartState::name()].clone()),
            _ => None
        }
    }
}

Alternatively you can likely* return a borrow to it:

impl StateFunc for StartState
{
    fn to_next(&self, s: &HashMap<&String, Arc<dyn StateFunc>>, c: char) -> Option<Arc<dyn StateFunc>>{
        match c {
            '\n' => Some(&s[&StartState::name()]), // Note the &s[..] rather than a .clone() call
            _ => None
        }
    }
}

*I haven't put this through borrowck, so it could potentially fail there.

I can see how / why your wrote that coming from C++, but in Rust

image

:smile:

So, the idea is that what C++ calls copy constructors is actually called in Rust Clone and not Copy.
Rust uses the word Copy to refer to a subset of Clone-able types: those that are TriviallyCopyable in C++ parlance, i.e., those that can be copied using a very specific compiler built-in: bitwise copies.

Rust really wants to optimize that Copy-ies are just memcpys, so one cannot plug custom code there. That's what Clone is for.

The core idea is that Rust, precisely because of how footgunny it was in C++, decided to make function calls explicit. No more of "this type mismatch led to an implicit copy and and implicitly cast which could throw an exception, instead of a type error" hell; instead, if custom copy code needs to be invoked, the programmer needs to write an explicit .clone() call. Otherwise, if you don't write anything, then that's because there is no custom copy code, i.e., that the type is Copy (see below for more info).

In a way, the image is not so bad: a Copy may be a perfect "copy", whereas a Clone involves a more hand-crafted / synthesized "copy" operation:

Rust C++
Clone Copy constructor
(and auto-generated assignment)
Copy TriviallyCopyable

So now you should realize why Copy for Arc is not possible: TriviallyCopyable for shared_ptr makes no sense!


Now, you may be wondering how to write your example code.

For that, you need to know that container[idx], in Rust, is sugar for:

*container.index(idx)
 ^^^^^^^^^^^^^^^^^^^^ Indexing operation yields a reference `&T`
^ we can (copy-)dereference from `&T` to `T` if and only if the referee is `Copy`

For types that do require custom code to be copied, i.e., for Clone-able types, we can go from &T to T by explicitly calling .clone():

container.index(idx) .clone()
                     ^^^^^^^^ go from `&T` to `T`
^^^^^^^^^^^^^^^^^^^^ Indexing operation yields a reference `&T`

i.e., ("operations * and & cancel each-other")

(&*container.index(idx)) .clone()

i.e.,

(&container[idx]) .clone()

which, by dot operator autoref, can simply be written as:

container[idx] .clone()

Hence the solution:

impl StateFunc for StartState
{
    fn to_next(&self, s: &HashMap<&String, Arc<dyn StateFunc>>, c: char) -> Option<Arc<dyn StateFunc>>{
        match c {
-           '\n' => Some(s[&StartState::name()]), // Using Copy
+           '\n' => Some(s[&StartState::name()].clone()), // Using Clone
            _ => None
        }
    }
}
5 Likes

You never-ever want to use the &String type (and generally watch out for overuse of &, because it's a false friend to C++'s references and pointers):

  • It's a needless double indirection. It's an immutable borrow of a growable string, which is entirely pointless, because immutable access prevents you from growing the string. For borrowed strings use &str (which is a single level of indirection and doesn't require tracking capacity for no reason). For owned strings use Box<str> pointer or String (which is pointer-y enough, especially that Rust has single ownership).

  • & is not a general-purpose pointer. It comes with very specific semantic: it's temporary and non-owning. In this case it means there are no strings stored in your HashMap, and all the strings are supposed to be stored somewhere else (not vaguely stored by the program on heap somewhere, but specifically by you in some variable in a larger scope).

In Rust Box and Arc are pointers. &Arc is a pointer-to-pointer, which is also temporary and bound to a scope it came from.