Statics inside functions are fun

I've been learning Rust as a hobby for two years now. I don't actually code much in it, but I like to read a lot about it and sometimes help a friend when he has questions. I'm weird like that.

Because of that, I only realized this week that statics inside functions cannot be backed by fresh data each time the function runs. How would that even work? Statics on the heap?

So they are just like global statics, but only accessible from the function's scope. After realizing that, I had some fun writing a memoized Fibonacci function with mutable statics. Definitely unsound when used from multiple threads. But hey, who cares about soundness in the Rust community, am I right?

Maybe it will give you a chuckle, or maybe you hadn't realized this yourself either.

fn main() {
    #[inline(never)]
    fn fib(n: u32) -> u64 {
        static mut CACHE: [u64; 1_000_000] = [0; 1_000_000];

        if n == 0 || n == 1 {
            return 1;
        } else if unsafe { CACHE[n as usize] } != 0 {
            return unsafe { CACHE[n as usize] };
        } else {
            unsafe { CACHE[n as usize] = fib(n - 1) + fib(n - 2) };
            return unsafe { CACHE[n as usize] };
        }
    }

    dbg!(fib(92));
}

(Playground)

Output:

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.62s
     Running `target/debug/playground`
[src/main.rs:16] fib(92) = 12200160415121876738

2 Likes

Interesting... You can deal with the multi-threading problem using thread-local storage. Not 100% sure about the safety, but it should prevent data races.

use std::cell::RefCell;
use std::thread_local;

thread_local! {
    static CACHE: RefCell<[u64; 1_000_000]> = RefCell::new([0; 1_000_000]);
}
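A quick sketch of how such a per-thread cache can be declared and accessed. This uses `RefCell` for interior mutability (since `thread_local!` statics can't be `static mut`), and a smaller array size just for illustration:

```rust
use std::cell::RefCell;

thread_local! {
    // One independent cache per thread; RefCell gives interior
    // mutability without any `unsafe`.
    static CACHE: RefCell<[u64; 100]> = RefCell::new([0; 100]);
}

fn main() {
    // Access always goes through `.with`, which hands out a
    // reference to this thread's copy.
    CACHE.with(|c| c.borrow_mut()[5] = 42);
    assert_eq!(CACHE.with(|c| c.borrow()[5]), 42);
}
```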
1 Like

Ah nice. I actually didn't know this. Thanks :slight_smile:

Does anyone know where thread_local!{} puts its memory? Is there a per-thread static section?

1 Like

or… just use AtomicU64? (playground with minimal changes)

Edit: If the redundant .loads are eliminated, the (optimized) x86 assembly seems to even be the same as the static mut version.
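For reference, here is a sketch of what that atomic-backed cache might look like (my own reconstruction, not the exact playground code; it sizes the table at 93 entries since fib(92) is the largest non-overflowing input, and assumes a recent compiler for the `[const { ... }; N]` initializer):

```rust
use std::sync::atomic::{AtomicU64, Ordering::Relaxed};

// fib(92) is the largest value that fits in a u64 with this
// indexing (fib(0) = fib(1) = 1), so 93 entries suffice.
static CACHE: [AtomicU64; 93] = [const { AtomicU64::new(0) }; 93];

fn fib(n: u32) -> u64 {
    if n <= 1 {
        return 1;
    }
    match CACHE[n as usize].load(Relaxed) {
        // Not cached yet: compute, store, return.
        0 => {
            let v = fib(n - 1) + fib(n - 2);
            CACHE[n as usize].store(v, Relaxed);
            v
        }
        cached => cached,
    }
}

fn main() {
    assert_eq!(fib(92), 12200160415121876738);
}
```

Relaxed ordering is enough here because each cache slot only ever goes from 0 to one specific value, so a stale read just means recomputing.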

9 Likes

Yeah yeah, I know :slight_smile: Although, I have to say it's way more fun to post a suboptimal solution and see what people come up with^^

1 Like

What does the 1.. part mean in the code below?

else if let cached@1.. = CACHE[n as usize].load(R) {
        cached

I guess it is somehow an if cached >= 1, but I've never seen this syntax before.

EDIT
Ahhh damn, it's just a pattern, just like in a match statement. Holy cow. That's neat.
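Exactly. A small self-contained illustration of that binding pattern (hypothetical values, just to show the syntax):

```rust
fn main() {
    let n: u64 = 7;

    // `name @ pattern` binds the matched value to `name`;
    // `1..` is a half-open range pattern matching 1..=u64::MAX.
    if let cached @ 1.. = n {
        assert_eq!(cached, 7);
    }

    // The same thing written as a match:
    let label = match n {
        0 => "empty",
        _nonzero @ 1.. => "cached",
    };
    assert_eq!(label, "cached");
}
```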

3 Likes

In that case, your cache is way too big, as you've demonstrated by showing the largest non-overflowing result. Just pregenerate and store all the non-overflowing values. :slightly_smiling_face:
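A sketch of that pregeneration, done entirely at compile time in a const block (my own take on the suggestion, not code from the thread):

```rust
// fib(92) (with fib(0) = fib(1) = 1) is the last value that fits
// in a u64, so 93 entries cover every valid input.
const FIB: [u64; 93] = {
    let mut t = [1u64; 93];
    let mut i = 2;
    while i < 93 {
        t[i] = t[i - 1] + t[i - 2];
        i += 1;
    }
    t
};

fn fib(n: u32) -> u64 {
    // Anything above 92 would overflow u64; the out-of-bounds
    // index panics instead of silently wrapping.
    FIB[n as usize]
}

fn main() {
    assert_eq!(fib(92), 12200160415121876738);
}
```

No statics, no unsafe, no synchronization: the whole table lives in read-only data.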

6 Likes

I would like to know this too. My understanding was that thread locals are kind of delegated to LLVM and handled there in a platform-specific way, but that's more of a guess than something you should quote me on. I was having trouble figuring it out and would be happy if someone who knows could tell me.

Yes, thread locals are an OS service like threads in general. For example, in pthreads you use pthread_key_create to create a "key" or id for a new thread-local, and then use pthread_setspecific and pthread_getspecific to access its value, and each thread sees a different value associated with the same key. The thread implementation maintains some suitable shared map-like data structure that's keyed by both thread id and the threadlocal key. Presumably you could implement a similar thing purely in user code, I don't think it requires any OS-level magic in principle.
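The same "each thread sees its own value for the same key" behavior that pthread keys provide is what Rust's `thread_local!` exposes; a minimal sketch:

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Each thread gets its own independent COUNTER.
    static COUNTER: Cell<u32> = Cell::new(0);
}

fn main() {
    COUNTER.with(|c| c.set(10));

    // A spawned thread sees a fresh value, not the parent's 10.
    let seen = thread::spawn(|| COUNTER.with(|c| c.get()))
        .join()
        .unwrap();

    assert_eq!(seen, 0);
    assert_eq!(COUNTER.with(|c| c.get()), 10);
}
```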

2 Likes

In practice this wouldn't be implemented “purely in user code”. You need at least some way to tell threads apart, or you couldn't do it at all.

And, of course, most OSes provide you with a “thread id”, which you could then use to implement thread-local storage.

But the exact same kernel component that gives you a “thread id” also gives you thread-local storage!

So it's not a “purely user-space implementation”; it's an implementation where you take a complicated tool like a chainsaw and use it like a sledgehammer.

Sometimes this actually makes sense, e.g. if you want to create a shared library that is entirely divorced from libc (so you can use it easily on a glibc platform, a musl platform, an Android/Bionic platform, etc.).

Most of the time, though, that's a bad idea.

Specifically, the OS ABI guarantees a structure hanging off a register reserved for per-thread information.

In Windows x64 for example, this is the TEB structure and is pointed to by the gs register.

On Linux, it's the TCB pointed to by fs, though things are a little more complicated there due to libc getting involved.

This is a pretty good rundown of what's involved: A Deep dive into (implicit) Thread Local Storage

1 Like

Wow. This is cool. Thanks for sharing :wink:

3 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.