In theory, do `static` (global) variables 'perform' better than variables allocated with a lifetime 'almost as long as' `fn main()`?

I understand that all claims on performance ought to be measured and hence backed by benchmarks.

However, I would like to ask, theoretically, whether there are good reasons to believe there will be performance benefits to defining static variables instead of allocating them at the start of the main() function (hence, with a lifetime 'almost as long as' main()).

Suppose I have this code:

#[derive(Debug)]
pub struct Edge {
    pub(crate) from: &'static str,
    pub(crate) to: &'static str,
    pub(crate) symbol: &'static str,
    pub(crate) bid_or_ask: BidOrAsk,
    pub(crate) rate: AtomicF64,
    pub(crate) threshold: AtomicF64,
    pub(crate) stop_flag: RwLock<bool>,
}

pub(crate) static ETH_BTC_EDGE: Edge = Edge {
    from: "ETH",
    to: "BTC",
    symbol: "ETHBTC",
    bid_or_ask: BidOrAsk::BID,
    rate: AtomicF64::new(ZERO_FLOAT),
    threshold: AtomicF64::new(ZERO_FLOAT),
    stop_flag: RwLock::new(false),
};

pub(crate) static ETH_XRP_EDGE: Edge = Edge {
    from: "ETH",
    to: "XRP",
    symbol: "XRPETH",
    bid_or_ask: BidOrAsk::ASK,
    rate: AtomicF64::new(ZERO_FLOAT),
    threshold: AtomicF64::new(ZERO_FLOAT),
    stop_flag: RwLock::new(false),
};

where I know at compile time which variables are going to be created at runtime, so I declare them as static.

Compared to:

fn main() {

    let edges = make_eth_xrp_btc_edges();
    ...
}

Assuming that the Edge structs are going to have identical fields (because, well, I want to have, say, 4 threads updating the rate field), can we form a strong suspicion that the former code will be faster than the latter?

I am asking because in C/C++, as statics are stored in the .bss or .text segment, their access and memory usage can be much faster (?).

(One can create 'static data dynamically via leaking in various forms.)

No, there's no intrinsic difference. Memory is memory; it doesn't matter where it is. 'static memory is not faster to access than stack memory or heap memory. "Static", "stack", and "heap" are purely man-made concepts: they are simply names given to different memory areas at different addresses, and every address in RAM has the same access time.

Storing the state in locals may actually be better because the stack is statistically more likely to be in cache in most programs.


If we suppose compiling with zero optimizations, translating the abstract machine operations directly to machine code, then static has a single advantage over let here: with static, the data is already ready to use — present at a known address — when the program is loaded, but with let edges =, the data must first be copied into the stack space allocated for edges.

But the compiler is capable of optimizing away that copy and directly accessing the static data, regardless of whether you use a let or not. More generally, lots of the literal, constant data in your program is likely to be implemented as a static value even if you haven't explicitly asked for it to be. For example:

fn main() {
    let x = &[1, 2, 3, 4];
    let y = &std::time::Instant::now();
    eprintln!("{x:p} {y:p}");
}

When I run this program on the Playground, x and y are wildly different addresses, because x is pointing into static memory (that is, the place where the program was loaded from disk to) and y is pointing into the thread’s stack.

There are still cases where the compiler might not, though, so picking the static when you have the opportunity might be a good choice. But if you're trying to squeeze out the last drop of performance, then don’t just follow coding-style rules; profile the program to determine which parts need the most attention; make changes and benchmark them to see whether they are improvements; and study the assembly your program compiles to. Studying the assembly will also teach you about how the compiler treats programs in general.


I'd take this a step further and say that high-level code organization should not be based on perceived "performance" benefits, and especially not micro-optimizations. Global state is hard enough as-is; putting in more of it for "speed" is just asking for trouble.

You should structure your code with locality, ease of composition, and explicitness (when it makes sense) in mind by default; and that in turn dictates that almost all data you create, use, and update should be declared as locals and passed as arguments, instead of being shoved into a Large Implicit Environment™ which then makes it magically appear at highly non-obvious places.


Non-mutable statics are not global state. I agree with you that global state should be avoided, but that’s not what is being proposed here.


I don't agree with that. Non-mutable statics can still be dynamic and depend on launch-time or build-time values, such as environment variables.

For example, I would absolutely hate to read code that is littered with branching dependent on a global static config object that was e.g. lazily initialized based on environment variables on first access. I'd much rather read code that initializes the config object as a local and then passes an (immutable) reference to functions that need some of the config.

And that wasn't what OP was proposing either. The topic of this thread was putting literal data in let vs static.


Yes, but it's irrelevant; my point still stands in general.

Also, I want to clarify that the static variables are meant to be mutable as well, through interior mutability: e.g. the field .rate is an AtomicF64, which is of course interior-mutable.

Oh, I missed that. Thanks for pointing that out. Definitely do not use statics, then; not for performance reasons but simply because it’s bad program organization. It’ll be a lot harder to test, for example.


Generally no. In reality it's so deep in "it depends" territory that you can go mad chasing it.

RAM is very very slow compared to speeds at which modern CPUs can process data, so program speed heavily depends on making the best use of caches.

However, details of caches are complex, and varying between vendors and CPU generations. There are many levels and sizes of them, and they're designed around several heuristics, some clever, some primitive.

See for example cache associativity. There are certain worst-case access patterns and memory addresses which will keep kicking data out of the cache, and you can't do much about this when your addresses may be literally random thanks to ASLR and non-determinism in the program.


There are even (theoretical) reasons why static data might be (very slightly) slower: if it's allocated, you will likely receive the address as an argument in a register, which can be accessed with a simple, short instruction, while a static will have its address encoded into the instruction stream (bloating the size), likely as an instruction-relative address that needs an additional ALU operation to compute the absolute address.

This isn't to say you should prefer the heap, of course; simply that performance is weird and unintuitive.


This looks to me like a memory layout optimization - in general, that's a class of optimization that should be entirely driven by tools, not by humans. Ideally, if you're doing this, you'd teach the compiler how to detect that main is not called by the Rust code at all (but only by the startup code), and from there how to detect values in main that can sensibly be moved from the stack frame to the program image.

Note, too, that by making a variable static, you're making the compiler's job optimizing the rest of the code harder; the compiler works on the basis of being able to prove some interesting property about your code, and thus permitting a transform that optimizes it. Moving a variable from local scope in main to global scope makes the analysis harder, in turn making it more likely that the compiler will miss an optimization opportunity.

These two are in conflict; C++ will put static consts in .rodata, but they are then read-only and cannot be mutated without UB. If you're going to mutate them, they have to go in .data if they have non-zero default values, or in .bss if an all-zeroes bit pattern is an acceptable initializer.

.rodata is read-only (and on at least Windows, macOS and Linux, your program will raise an OS-level fault if you try to modify it without complicated trickery to remove the read-only protection). .bss and .data are both read-write, with .bss being used for things that can be zero-initialized, and .data for things that have an initialization value. All decent compilers, including Rust's compiler, will put things in .bss whenever possible, only using .data when an all-zeroes bit pattern is not the correct starting value.


Um, I don't know about PE or macOS, but you surely mean .rodata? .text is for executable code, not const data. (Also, the loader doesn't really care about these sections; in ELF there is a separate table describing which offsets and lengths should be mapped with what permissions. The section names are just a convention used by the toolchain and don't matter for the actual executable, though the dynamic loader may make use of some specific sections; I don't know if it does that by name or otherwise.)


I do mean .rodata on modern systems, yes. Thanks for the correction - when I learnt about this stuff, .rodata didn't yet exist, and compilers used .text for read-only data.


Visual Studio emits .rdata instead of .rodata, but otherwise the behavior on Windows with PE is identical, though the thing doing the mapping is conventionally called "the loader" there, as the OS handles dependency resolution, memory mapping, resource lookup, and a million other things.

there is actually an important case where static data has better performance, which is down to a singular difference.

for extremely large inline values (i.e. not behind a pointer or some other form of indirection), a significant amount of time will be wasted copying the data onto the stack. it is partially for this reason that include_bytes! produces a &'static [u8; N] instead of a [u8; N].

however, immutable static variables will be initialized as soon as the program is loaded into memory by the loader, and for some embedded systems with a memory-mapped ROM, you don't even need to incur that cost.

basically, unless mem::size_of::<T>() gives a value over several thousand, it's totally fine[1] to just put it in a stack variable.


  1. and usually preferable, due to cache and organizational concerns

