Lockless lazily computed lookup table

I'm exploring an idea to lazily compute some lookup tables at runtime on a separate thread instead of baking them into the data segment of the executable.

The high-level idea is that the code that would use the lookup tables would check if the table is there and, if it is, use it. If not, it would check if the table is being built and, if not, start a thread to build it. If the table is not there, a slower path would be taken.

I'm wondering if the check for "is the table there" can be lockless if the table, once built, is never deallocated (i.e. once built, the table stays on the heap for the rest of the lifetime of the process).

On to specific questions:

Is Option<&[T; N]> a single word? I'm thinking that it should be, considering that the reference can be a pointer without the length, since length is part of the type, and None fits into the same pointer as null.

If a thread sees that a static Option<&[T; N]> is Some(arr), is it always safe to access arr? That is, does the memory model actually match my expectation that a word written by one thread shows up to another thread as either fully written or not yet written at all? And if the address of the heap allocation has been written to a word and the word has become visible to other threads, can those threads also see the heap-allocated memory and what was written to it prior to the address of the buffer getting written to the shared word? (std::sync::atomic::Ordering suggests that it might not be OK to expect the writes to the allocated buffer to be visible to all threads that can see the address of the buffer.)

As for the "is the table being built" check if the "is the table there" check fails, is there some kind of less-expensive-than-mutex test-and-set primitive for a boolean that makes a one-way transition? (I guess if "is the table there" is lockless, making "is the table being built" lockless doesn't really matter, but I'm curious if it can be made lockless, too.) Is compare_exchange on AtomicBool it?

You can use ::std::mem::size_of to find that out 🙂
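As a quick sanity check, something like this confirms the niche optimization the original poster expects (None reuses the null representation, and a reference to a fixed-size array carries no runtime length):

```rust
use std::mem::size_of;

fn main() {
    // A reference to a fixed-size array is just a pointer (the length
    // is part of the type), and None can be represented as null, so
    // the whole Option fits in a single word.
    assert_eq!(size_of::<&[u32; 256]>(), size_of::<usize>());
    assert_eq!(size_of::<Option<&[u32; 256]>>(), size_of::<usize>());
    println!(
        "Option<&[u32; 256]>: {} bytes",
        size_of::<Option<&[u32; 256]>>()
    );
}
```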

You definitely don't want to use Option for this sort of thing - even if it were word-sized, you don't have enough control over the compiler or cpu memory model to make it work for that.

Would something like lazy-static work?

If not, and you are certain that using an RW lock would be too slow, you can look up some C++11 (maybe some Rust somewhere) code for using double-checked locking to initialize a static global - this one appears correct to me, although it's strange they use fences instead of an acquire load and a release store.

if the address of the heap allocation has been written to a word and the word has become visible to other threads, can those threads also see the heap-allocated memory and what was written to it prior to the address of the buffer getting written to the shared word? (std::sync::atomic::Ordering suggests that it might not be OK to expect the writes to the allocated buffer to be visible to all threads that can see the address of the buffer.)

In general, this is not true, although much hardware (and, by accident, software) satisfies it. In general, loads and stores to different variables obey no ordering. With Release you can specify that an atomic store happens after previous stores (in other words, if the result of that store is visible, all prior stores are visible), and an Acquire load ensures that loads after it in program order cannot be reordered before it.

In the wiki page, the release fence is used so that storing the pointer to the singleton happens after creating the singleton, and the acquire fence means that loading the pointer to the singleton happens before reading from the singleton. Together, they mean you can't load a non-null pointer yet still observe a partially initialized singleton. There are more formal definitions of all the orderings in the C++11 memory model that are worth reading, but the above will suffice for probably 95% of cases where you have to use them.

As for the "is the table being built" check if the "is the table there" check fails, is there some kind of less-expensive-than-mutex test-and-set primitive for a boolean that makes a one-way transition?

It really doesn't matter here, but as for things exposed to rust you could fetch-add a usize from 0 to 1. On Intel, many arithmetic operations including test and set have atomic versions. For this however, there's no reason to deal with extra flags of that sort - the pointer already is one. If it's null, the table hasn't been created and otherwise it has.
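For the one-way flag specifically, compare_exchange on an AtomicBool does act as a cheap test-and-set; a sketch:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static BUILDING: AtomicBool = AtomicBool::new(false);

// Returns true for exactly one caller: the one that wins the race to
// flip the flag from false to true. Everyone else gets false.
fn try_claim_build() -> bool {
    BUILDING
        .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
        .is_ok()
}

fn main() {
    assert!(try_claim_build()); // first caller wins
    assert!(!try_claim_build()); // subsequent callers lose
}
```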

Won't that cause readers to block until the static has been initialized? (As opposed to spawning an initialization thread and taking a slow path meanwhile without blocking.)

Thank you. I take it that I need an AtomicPtr written to with Ordering::Release and read with Ordering::Acquire.

Why don't I need an extra flag? If I don't have a "being created" flag, won't multiple threads potentially see the pointer as null and kick off initialization some of which will end up with wasted compute and a leaked heap allocation?

@hsivonen take a look at Cargo's lazy cell: https://github.com/rust-lang/cargo/blob/master/src/cargo/util/lazy_cell.rs

It shows how to create such a "lazily initialized" value without accounting for thread safety.

To handle thread safety properly, you can add the stdlib's Once to the mix: std::sync::Once

To handle thread safety properly, you can add stdlib Once to the mix:

Ah, and you certainly need AtomicPtr as well!

Yeah, it seems to me I need AtomicPtr for the pointer and Once abstracts over what I wanted to do with the atomic flag manually. Thank you.

Ahhh, if you don't want to block while waiting on this, then you could replace a mutex protecting the actual object with a flag, so that readers that find the object not yet ready take the fallback path instead of blocking.
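Putting the thread's pieces together, a minimal non-blocking sketch might look like this (slow_path is a hypothetical placeholder for the un-tabled computation, and the table is intentionally leaked for the life of the process):

```rust
use std::ptr;
use std::sync::atomic::{AtomicBool, AtomicPtr, Ordering};
use std::thread;

const N: usize = 256;

static TABLE: AtomicPtr<[u32; N]> = AtomicPtr::new(ptr::null_mut());
static BUILDING: AtomicBool = AtomicBool::new(false);

// Hypothetical placeholder for the computation the table would cache.
fn slow_path(i: usize) -> u32 {
    (i as u32).wrapping_mul(2654435761)
}

fn lookup(i: usize) -> u32 {
    let p = TABLE.load(Ordering::Acquire);
    if !p.is_null() {
        // Fast path: the table was published with Release, so its
        // contents are fully visible here.
        return unsafe { (*p)[i] };
    }
    // Claim the build exactly once; losers fall through without blocking.
    if BUILDING
        .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
        .is_ok()
    {
        thread::spawn(|| {
            let mut table = Box::new([0u32; N]);
            for i in 0..N {
                table[i] = slow_path(i);
            }
            // Publish the fully written table; it is never deallocated.
            TABLE.store(Box::into_raw(table), Ordering::Release);
        });
    }
    slow_path(i)
}

fn main() {
    assert_eq!(lookup(3), slow_path(3));
}
```

Whether the table is there yet or not, lookup returns the same value; the only difference is which path computes it.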

Simply put, any unsynchronized memory access (i.e., any load or store that's not explicitly atomic or protected by a lock or something similar) can lead to undefined behavior. So don't do that.

Sources: Nomicon, LLVM documentation
(Note that the latter isn't technically binding for Rust-the-language, but it's relevant for what will be miscompiled by today's rustc. Also note that an LLVM undef value does not cause UB just by existing, but handling such a value carelessly can easily lead to UB by way of miscompilation.)