Best practice for defining static variables that implement Copy

Let's say we define the following global static array. Is it possible for the static array MY_ARR to get copied implicitly when I use it?

static MY_ARR: [usize; 5] = [1, 2, 3, 4, 5];

If I want to make sure that the static array is never implicitly copied, should I make it a static reference instead?

static MY_ARR: &[usize] = &[1, 2, 3, 4, 5];


No. Optimizations may cause the array to be copied even in this case, if the compiler deems copying to be better than referencing, for any reason.

Generally though, you shouldn't care whether a Copy value is copied or not, especially such a small one (yours is 20-40 bytes). The definition of Copy is that bitwise copies behave identically. If they don't, then, well, you've got a problem.

I used a very small array here for illustration. But if the static array is very large and I absolutely do not want it to be copied accidentally by any means, should we prefer to define a static reference?

Still no.

First of all, you can't "absolutely" prevent copying, as per the reasons given above.

Second, there's no material difference between the two declarations apart from

  1. the superficial syntactic one (whether you need to dereference the static in order to access the value)
  2. the difference in types (the slice implies a "dynamic" size, i.e., the type doesn't change with the number of elements).

However, #1 is really purely syntactic (a static FOO: &[…] = &[1, 2, 3, …] is an address that points into a static array itself, it's as if you made a separate static and then a reference in two steps), and #2 is an artefact of your concrete example (you could have declared a reference to a fixed-size array just as well).

Generally, I dislike creating references to arrays in statics, because the fundamental model of values is ownership, so a reference feels spurious unless I can't use anything else. One example of "can't use anything else" is when you specifically want the code not to care about the number of elements, in which case you have no choice but to construct a slice (as opposed to an array), which in turn must be behind indirection, since it's unsized.
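As a rough sketch of the three declarations being compared here (the names OWNED, ARRAY_REF, SLICE and the helper total are made up for illustration, not taken from the question), all of them can be handed to code that only wants a slice, and none of the calls copies the array:

```rust
// Owned static array: the length is part of the type.
static OWNED: [usize; 5] = [1, 2, 3, 4, 5];
// Static reference to a fixed-size array: the type still knows the length.
static ARRAY_REF: &[usize; 5] = &[1, 2, 3, 4, 5];
// Static slice: the length is runtime data carried alongside the pointer.
static SLICE: &[usize] = &[1, 2, 3, 4, 5];

// Hypothetical helper that does not care about the number of elements.
fn total(values: &[usize]) -> usize {
    values.iter().sum()
}

fn main() {
    // Each argument is (or coerces to) `&[usize]`; only a pointer and a
    // length are passed, never the array contents.
    assert_eq!(total(&OWNED), 15);
    assert_eq!(total(ARRAY_REF), 15);
    assert_eq!(total(SLICE), 15);
}
```

The only difference at the use site is whether you write `&OWNED` or just `SLICE`, which is the "purely syntactic" point above.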


What code would cause the array to be copied?

That's not quite fully accurate; it depends significantly on how you're defining "absolutely." To the extent that we can ensure any property of our code under optimizations, static items are not copied unless you use them by-value.

The address of a static item is guaranteed to be consistent (unlike const items).

Namely, if I have

const ARR: [usize; 5] = [1, 2, 3, 4, 5];
// somewhere
println!("{:p}", &ARR);
// somewhere else
println!("{:p}", &ARR);

then there is no guarantee that the printed addresses are the same. Similarly, with

const ARR: &[usize; 5] = &[1, 2, 3, 4, 5];
// somewhere
println!("{:p}", ARR);
// somewhere else
println!("{:p}", ARR);

or even with &[usize], there's still no guarantee that the shown addresses are the same.

Within a single codegen unit (roughly, a single crate), it's likely that they are the same address. From different codegen units (roughly, separate crates), it's likely that they aren't.

However, with

static ARR: [usize; 5] = [1, 2, 3, 4, 5];
// somewhere
println!("{:p}", &ARR);
// somewhere else
println!("{:p}", &ARR);

it is guaranteed that both println! will see and print the same static at the same address.
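A minimal way to check this yourself, assuming two separate functions stand in for "somewhere" and "somewhere else" (the function names are hypothetical):

```rust
static ARR: [usize; 5] = [1, 2, 3, 4, 5];

// Two independent use sites, each taking a reference to the static.
fn addr_from_first_site() -> *const [usize; 5] {
    &ARR
}

fn addr_from_second_site() -> *const [usize; 5] {
    &ARR
}

fn main() {
    let first = addr_from_first_site();
    let second = addr_from_second_site();
    // Both pointers refer to the one and only `ARR`.
    assert!(std::ptr::eq(first, second));
    println!("{:p} {:p}", first, second);
}
```

Swapping `static` for `const` in this sketch still compiles, but the assertion is then no longer backed by any guarantee.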

This isn't a 100% guarantee that all compilers will never introduce a spurious copy; under the as-if rule of optimization, the compiled program is free to do whatever extra work it wants to do so long as that doesn't change the behavior of the code you wrote, and only changes "trivial" details such as how quickly it executes, or how much memory it allocates (stack or heap[1]) or copies around.

What puts a bound on behavior is that the compiler wants to produce good code. Since the address of a static is guaranteed to be consistent, a reference to that static must have the same address. If you never actually observe the address (and the compiler is able to determine this), then it's allowable for the reference to be to a different address. But what benefit is there to that? It's simpler and just more efficient in essentially all cases for the compiler to just give you a reference to the shared static memory when you take a reference to the static.

To answer the OP question directly, given

static ARR: /* type */ = /* expression */;

you semantically get a reference to the static, and no copy of the static is performed; the reference is to the same "allocated object" with the same address as every other reference to the static. If it does not change the behavior of the code, the compiler/optimizer is allowed to introduce a spurious copy, but it has no reason to.

If you wrote const ARR instead, a copy is semantically performed every time you mention ARR, even if the type is not Copy, and even if you just immediately take a reference. If it does not change the behavior of the code, the compiler/optimizer is allowed to remove unnecessary copies, and it's typically quite eager to do so.
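To illustrate the "even if the type is not Copy" part, here is a small sketch using a made-up constant EMPTY of type `Vec<u8>`. Every mention of the const produces a fresh value, so mutating one does not affect the next:

```rust
// A `const` of a non-Copy type: `Vec<u8>` does not implement `Copy`.
const EMPTY: Vec<u8> = Vec::new();

// Hypothetical helper that mentions the const twice.
fn fresh_values() -> (Vec<u8>, Vec<u8>) {
    // Each mention of `EMPTY` behaves like writing `Vec::new()` in place.
    let mut first = EMPTY;
    first.push(1);
    let second = EMPTY;
    (first, second)
}

fn main() {
    let (first, second) = fresh_values();
    assert_eq!(first, vec![1]);
    // The push above did not touch the value produced by the second mention.
    assert!(second.is_empty());
}
```

With `static EMPTY` instead, `let mut first = EMPTY;` would be rejected, because that would be moving a non-Copy value out of a static.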

  1. Yes, the compiler is allowed to insert completely spurious heap allocation if it wants to. This goes hand in hand with justifying the removal of heap allocations. In order to justify removing heap (or stack) allocation, the optimizer incidentally must also receive sufficient justification to allow inserting spurious allocation. ↩︎


Your static MY_ARR: [usize; ...] = ... declaration should be fine as it is.

The only way it can be implicitly copied is if you explicitly pass it around by value, but arrays like that are almost always passed around by reference anyway.

This is what I was referring to. I very deliberately didn't mention anything about the address being stable or not (it isn't what is being asked, anyway).

Thanks, everyone, for the detailed explanations!

The following code is an example in which static arrays are copied.

static A: [usize; 10] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
static B: [usize; 10] = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90];

fn f() {
    // Each element is a by-value use, so A and B are copied into `a`.
    let a = [A, A, A, B, B];
}

The user might have meant to write

let a = [&A, &A, &A, &B, &B];

While true, the use site is asking for the copy (it's not fully implicit, and the code wouldn't compile if the type weren't copyable). It's unlikely (though not impossible) that someone wants [&[usize; 10]; 5] and then accidentally gets and successfully uses [[usize; 10]; 5]. If the statics were &[usize; 10], the use site could still write [*A, *A, *A, *B, *B].
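For what it's worth, the two spellings have different types, so the difference is visible in the sizes. A sketch, with hypothetical functions by_value and by_reference wrapping the two array expressions:

```rust
use std::mem::{size_of, size_of_val};

static A: [usize; 10] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
static B: [usize; 10] = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90];

// By-value use: the result holds five full copies of the arrays.
fn by_value() -> [[usize; 10]; 5] {
    [A, A, A, B, B]
}

// By-reference use: the result holds five pointers into the statics.
fn by_reference() -> [&'static [usize; 10]; 5] {
    [&A, &A, &A, &B, &B]
}

fn main() {
    let copies = by_value();
    let refs = by_reference();

    // 5 arrays of 10 `usize` each versus 5 thin pointers.
    assert_eq!(size_of_val(&copies), 50 * size_of::<usize>());
    assert_eq!(size_of_val(&refs), 5 * size_of::<usize>());

    // The references really do point at the statics themselves.
    assert!(std::ptr::eq(refs[0], &A));
    assert!(std::ptr::eq(refs[4], &B));
}
```

Because the element types differ, code written against `[&[usize; 10]; 5]` generally won't type-check if it is handed `[[usize; 10]; 5]` by mistake.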

It is fair to worry about large structurally-Copy values getting copied around when not strictly necessary, causing excessive stack usage and copying. It can be difficult for the optimizer to remove unnecessary stack copies because of address uniqueness guarantees.

But the way to address this is by using types. If you have some large data blob that shouldn't be implicitly copied around, it should have some non-Copy type, e.g. struct Blob<const N: usize>([u8; N])[1], rather than a defensive reference into the static data region that exists only to add syntactic salt to copying out of the static data region.

If your extra reference or type exists for some semantically explainable reason (e.g. while the static data is a usize slice, it's more specific than just a usize slice, it's X), then it's idiomatic for it to exist. If it solely exists as syntactic salt against copies, it's nonidiomatic. You can still argue its utility, but it's not idiomatic.

  1. If you add a Deref impl to whatever NotCopy wrapper, you can even make &NotCopy<T> implicitly usable as &T in many cases. ↩︎
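A minimal sketch of that wrapper approach, assuming a small stand-in static DATA and a made-up helper checksum (neither is from the thread); the Blob type follows the shape suggested above and adds the Deref impl from the footnote:

```rust
use std::ops::Deref;

// Wrapper that deliberately does not derive `Copy` or `Clone`.
struct Blob<const N: usize>([u8; N]);

impl<const N: usize> Deref for Blob<N> {
    type Target = [u8; N];

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

// Stand-in for a large static data blob.
static DATA: Blob<4> = Blob([10, 20, 30, 40]);

// Hypothetical consumer that only needs to read the bytes.
fn checksum(bytes: &[u8]) -> u32 {
    bytes.iter().map(|&b| u32::from(b)).sum()
}

fn main() {
    // let copy = DATA; // rejected: cannot move out of a static item

    // Array and slice methods are reachable through `Deref`.
    assert_eq!(DATA.len(), 4);
    assert_eq!(checksum(DATA.as_slice()), 100);
}
```

A by-value copy is still possible by spelling it out (for example `*DATA`, since the inner array is Copy), but it can no longer happen just by mentioning the static.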


Agree that defining wrapper non-Copy structs is the idiomatic way.
