Static initialization of self-referential types

Hi all!

For a while now, we have been using a hack to statically initialize variables that must contain a reference or pointer to themselves. Empirically, it works.

struct X(*const X);
// Raw pointers are not Sync, so this promise is required to store X in a static.
unsafe impl Sync for X {}

impl X {
    const fn new_at(this: &X) -> Self {
        Self(this)
    }
}
static A: X = X::new_at(&A);

Such constructs are then nested:

struct Y(X, X);

impl Y {
    const fn new_at(this: &Y) -> Self {
        Self(X::new_at(&this.0), X::new_at(&this.1))
    }
}

static B: Y = Y::new_at(&B);
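A quick runtime sanity check of the trick might look like the following sketch. It reuses X, Y, A and B from the question (Y gets Sync automatically from X's unsafe impl); the main function and its std::ptr::eq assertions are an illustrative addition. Passing these checks on one compiler version shows that the addresses are right today, not that future versions guarantee it.

```rust
// X, Y, A and B as in the question; `main` is an illustrative check.
struct X(*const X);

// Raw pointers are not Sync, so storing X in a static needs this promise.
unsafe impl Sync for X {}

impl X {
    const fn new_at(this: &X) -> Self {
        Self(this)
    }
}

struct Y(X, X);

impl Y {
    const fn new_at(this: &Y) -> Self {
        Self(X::new_at(&this.0), X::new_at(&this.1))
    }
}

static A: X = X::new_at(&A);
static B: Y = Y::new_at(&B);

fn main() {
    // Each stored pointer should equal the address of the X that holds it.
    assert!(std::ptr::eq(A.0, &A));
    assert!(std::ptr::eq(B.0.0, &B.0));
    assert!(std::ptr::eq(B.1.0, &B.1));
    println!("all self-pointers point at their owners");
}
```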

This trick depends on what one is allowed to do with a reference to a variable that is still being initialized, which is not documented in the Reference.

Empirically, it works: we use it to statically initialize a global allocator that has been heavily tested (though only on a single platform). It saves us from writing an asm block or adding code next to the entry point.

But I wonder how much we can rely on this hack. Specifically, what are the chances that a compiler update breaks it?

There is no need to use unsafe. Static references work fine:

struct Rec {
    field: i32,
    r: &'static Rec,
}

static R: Rec = Rec {
    field: 42,
    r: &R,
};

static R1: Rec = Rec {
    field: 1,
    r: &R2,
};

static R2: Rec = Rec {
    field: 2,
    r: &R1,
};
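To see that the references point where expected, the sketch below walks the cycles at runtime. The helpers field_after and assert_sync are hypothetical names introduced only for illustration; assert_sync fails to compile if Rec is not Sync, which also touches on the data-race question further down: with plain &'static references, the compiler derives Sync on its own, no unsafe impl needed.

```rust
struct Rec {
    field: i32,
    r: &'static Rec,
}

static R: Rec = Rec { field: 42, r: &R };
static R1: Rec = Rec { field: 1, r: &R2 };
static R2: Rec = Rec { field: 2, r: &R1 };

// Hypothetical helper: compiles only when T is Sync.
fn assert_sync<T: Sync>() {}

/// Hypothetical helper: follows `r` `steps` times and returns the
/// `field` of the value reached.
fn field_after(start: &'static Rec, steps: usize) -> i32 {
    let mut cur = start;
    for _ in 0..steps {
        cur = cur.r;
    }
    cur.field
}

fn main() {
    assert_sync::<Rec>(); // Sync is derived automatically, no unsafe impl
    assert!(std::ptr::eq(R.r, &R)); // R points to itself
    assert_eq!(field_after(&R1, 1), 2); // R1 -> R2
    assert_eq!(field_after(&R1, 2), 1); // R1 -> R2 -> R1
    assert_eq!(field_after(&R, 10), 42); // R -> R -> ... -> R
}
```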

(extended example in playground)


There is no unsafe block in my code...

I didn't say "block"

That is needed because my structure has a pointer field.

Is it really guaranteed that this pointer will not lead to data race when accessed from different threads?

Actually, in my code it is nested within thread-local or mutex-guarded structures; I just reduced the example to the minimum.

Okay, but why does it have a pointer field? In your example code the pointer is used for the self-referential reference, which can be replaced by a static reference. If there is any other reason, you should perhaps provide a more complete example.

Or, more simply: your code exemplifies my problem perfectly.

Are you sure that the initialization of field r by &R is not undefined behavior?

If it doesn't contain unsafe, it is not undefined behavior; that's how Rust works ^^

Or more precisely: if you encounter undefined behavior without using unsafe, you can either blame the author of a library you’re using whose internal use of unsafe (behind some safe, non-unsafe fn API) caused the undefined behavior, or you can blame the Rust compiler and file a bug there.


Thank you... So I should just believe it? Is it that simple? Really, seriously?? So if the compiler accepts it, it is probably OK. I am a C++ coder; I suppose paranoia is not needed in Rust. It will take me time to accept not being paranoid anymore!!


Indeed, it is a huge relief.

It does make it much harder to use unsafe correctly in Rust, though: you have to uphold the guarantee yourself that your code never violates "memory safety", no matter how adversarial the inputs your API is called with, as long as the caller didn’t use unsafe themselves. And even apart from that, there are more guarantees to uphold when working with, e.g., raw pointers in Rust than in C++. So as a beginner you should probably avoid using unsafe yourself, or at least ask for advice or make sure you’ve read and thoroughly understood the nomicon first.
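As a small illustration of that obligation, here is a sketch of a safe function whose body uses unsafe. The function nth_or_zero is hypothetical and deliberately contrived (the idiomatic version would be data.get(index).copied().unwrap_or(0)); the point is that the bounds check must stay, because a safe caller may pass any index at all.

```rust
/// Hypothetical example: a safe API built on an unsafe operation.
/// It must be sound for *every* input a safe caller could pass,
/// so the bounds check below cannot be skipped.
fn nth_or_zero(data: &[u32], index: usize) -> u32 {
    if index < data.len() {
        // SAFETY: `index < data.len()` was checked just above.
        unsafe { *data.get_unchecked(index) }
    } else {
        0
    }
}

fn main() {
    let v = [10, 20, 30];
    assert_eq!(nth_or_zero(&v, 1), 20);
    // An adversarial index must not cause UB; it just yields the fallback.
    assert_eq!(nth_or_zero(&v, usize::MAX), 0);
}
```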

By the way, Rust’s "memory safety" is exactly why you cannot easily produce undefined behavior in Rust, since undefined behavior could result in memory corruption, etc. This is also why, e.g., integer overflow in release mode is not undefined behavior but is clearly defined as wrapping around instead (even though you are not supposed to rely on this behavior, since overflow panics in debug mode). The standard library goes to great lengths to avoid undefined behavior from malicious but safe code; e.g., in the source code of Arc, the atomically reference-counted smart pointer, you’ll find safeguards against problems that can only occur if a program clones the same pointer more than usize::MAX times.
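If you need a specific overflow behavior regardless of build mode, the integer types provide explicit methods for it. The helper add_modes below is a hypothetical wrapper used only to show them side by side; the comment about plain + assumes the default profile settings (overflow checks on in debug, off in release).

```rust
/// Hypothetical helper: computes `a + b` under each explicit overflow
/// policy that `u8` offers. Returns (wrapping, checked, overflowing, saturating).
fn add_modes(a: u8, b: u8) -> (u8, Option<u8>, (u8, bool), u8) {
    (
        a.wrapping_add(b),    // wraps modulo 256 in every build mode
        a.checked_add(b),     // None on overflow
        a.overflowing_add(b), // wrapped value plus an "overflowed" flag
        a.saturating_add(b),  // clamps to u8::MAX
    )
}

fn main() {
    assert_eq!(add_modes(250, 10), (4, None, (4, true), 255));
    assert_eq!(add_modes(250, 5), (255, Some(255), (255, false), 255));
    // A plain `a + b` with these values, computed at runtime, panics in
    // debug builds and wraps to 4 in release builds (default settings).
}
```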

There is also (AFAIK) still a problem with the forward-progress guarantees that LLVM assumes, which makes undefined behavior possible in safe Rust code (for example, with side-effect-free infinite loops such as loop {}), but this is of course considered a bug in the Rust compiler and will get fixed eventually.


Since I was curious what happens when you try to dereference the recursive reference too early, here it is:

#[derive(Clone, Copy)]
struct Recursive {
    field: i32,
    rec: &'static Recursive,
}
impl Recursive {
    const fn new_containing(this: &'static Recursive) -> Self {
        Self {
            field: 42,
            rec: this,
        }
    }
    const fn new_copying(this: &'static Recursive) -> Self {
        *this
    }
}

// works fine
static R1: Recursive = Recursive::new_containing(&R1);

// works, too
static R2: Recursive = Recursive::new_copying(&R1);

// doesn’t work:
// static R3: Recursive = Recursive::new_copying(&R3); 

When uncommented, the last line produces an error:

error[E0391]: cycle detected when const-evaluating `R3`
  --> src/lib.rs:26:1
   |
26 | static R3: Recursive = Recursive::new_copying(&R3); 
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: ...which requires const-evaluating `R3`...
  --> src/lib.rs:26:1
   |
26 | static R3: Recursive = Recursive::new_copying(&R3); 
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   = note: ...which again requires const-evaluating `R3`, completing the cycle
note: cycle used when const-evaluating + checking `R3`
  --> src/lib.rs:26:1
   |
26 | static R3: Recursive = Recursive::new_copying(&R3); 
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
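Where the two working statics end up pointing can be checked with a sketch like the following. The types and statics are copied from the example above; the assertions in main are an illustrative addition. Because new_copying returns a bitwise copy of R1, R2.rec should still point to R1 rather than to R2.

```rust
#[derive(Clone, Copy)]
struct Recursive {
    field: i32,
    rec: &'static Recursive,
}

impl Recursive {
    const fn new_containing(this: &'static Recursive) -> Self {
        Self { field: 42, rec: this }
    }
    const fn new_copying(this: &'static Recursive) -> Self {
        *this
    }
}

// Only stores the address of R1: no read, so no cycle.
static R1: Recursive = Recursive::new_containing(&R1);
// Reads R1's finished value: also fine, since R1 does not depend on R2.
static R2: Recursive = Recursive::new_copying(&R1);

fn main() {
    assert!(std::ptr::eq(R1.rec, &R1)); // R1 points to itself
    assert!(std::ptr::eq(R2.rec, &R1)); // the copy still points to R1...
    assert!(!std::ptr::eq(R2.rec, &R2)); // ...not to R2
    assert_eq!(R2.field, 42);
}
```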

Yes it really is beautiful. You can throw together code without fear as long as you don't throw that unsafe keyword in. It really frees you to be more creative and have more fun programming because in many, many cases, if it compiles, it does what you think it will do.

That's a really nice error message for such an obscure edge case.

Niko Matsakis explains the dependency-driven architecture of the Rust compiler (which is why these kinds of errors can come up) really well in this presentation:

The relevant slides are around the 11:34 timestamp.

For reference, the latter example gives the following error (if you fix the type error first by writing [u8; LEN as usize])

  Compiling playground v0.0.1 (/playground)
error[E0391]: cycle detected when const-evaluating + checking `DATA::{{constant}}#0`
 --> src/lib.rs:2:18
  |
2 | const DATA: [u8; LEN as usize] = [1, 1, 1];
  |                  ^^^^^^^^^^^^
  |
note: ...which requires const-evaluating + checking `DATA::{{constant}}#0`...
 --> src/lib.rs:2:18
  |
2 | const DATA: [u8; LEN as usize] = [1, 1, 1];
  |                  ^^^^^^^^^^^^
note: ...which requires const-evaluating `DATA::{{constant}}#0`...
 --> src/lib.rs:2:18
  |
2 | const DATA: [u8; LEN as usize] = [1, 1, 1];
  |                  ^^^^^^^^^^^^
  = note: ...which requires normalizing `LEN`...
note: ...which requires const-evaluating + checking `LEN`...
 --> src/lib.rs:1:1
  |
1 | const LEN: u8 = DATA[0] + DATA[1] + DATA[2];
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: ...which requires const-evaluating + checking `LEN`...
 --> src/lib.rs:1:1
  |
1 | const LEN: u8 = DATA[0] + DATA[1] + DATA[2];
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: ...which requires const-evaluating `LEN`...
 --> src/lib.rs:1:1
  |
1 | const LEN: u8 = DATA[0] + DATA[1] + DATA[2];
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: ...which requires type-checking `LEN`...
 --> src/lib.rs:1:1
  |
1 | const LEN: u8 = DATA[0] + DATA[1] + DATA[2];
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  = note: ...which again requires const-evaluating + checking `DATA::{{constant}}#0`, completing the cycle
note: cycle used when checking that `DATA` is well-formed
 --> src/lib.rs:2:1
  |
2 | const DATA: [u8; LEN as usize] = [1, 1, 1];
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error: aborting due to previous error

For more information about this error, try `rustc --explain E0391`.
error: could not compile `playground`.

To learn more, run the command again with --verbose.
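One way to break this particular cycle, assuming the intent is for LEN to be the sum of DATA's elements, is to stop mentioning LEN in DATA's type. In this sketch DATA becomes a slice constant, a hypothetical const fn named sum computes LEN, and ZEROS is an illustrative array sized by LEN. The dependency now only runs from LEN to DATA.

```rust
// DATA's type no longer mentions LEN, so there is no cycle.
const DATA: &[u8] = &[1, 1, 1];

/// Hypothetical helper: sums a byte slice at compile time
/// (`while` loops are allowed in `const fn`).
const fn sum(bytes: &[u8]) -> u8 {
    let mut total = 0u8;
    let mut i = 0;
    while i < bytes.len() {
        total += bytes[i];
        i += 1;
    }
    total
}

// LEN depends on DATA, but DATA does not depend on LEN.
const LEN: u8 = sum(DATA);

// LEN can now size another array without creating a cycle.
const ZEROS: [u8; LEN as usize] = [0; LEN as usize];

fn main() {
    assert_eq!(LEN, 3);
    assert_eq!(ZEROS.len(), 3);
}
```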

I did write loop {} in place of abort in panics in debug mode when the allocator is used as the global allocator! Thanks. I naively thought I could fix this dirty way of stopping the program later.