Possible bug regarding linker script constants

Hello all!

I'm working on some no_std/no_main code and I'm having some trouble with using constants from a linker script. At the moment I'm trying to minimize the amount of assembly used to initialize the platform before getting into the Rust environment.

I believe I have either found a compiler issue or I'm fundamentally misunderstanding how to bring external constants into Rust. Here are the relevant sections from the linker script and the Rust code:

    .kernel_text : ALIGN(8)
    {
        . = ALIGN(8);
        __start_kernel_text = .;
        *(.text .text.*);
        . = ALIGN(8);
    } > RAM AT>ROM
    . = ALIGN(8);
    __end_kernel_text = .;
    __load_kernel_text = LOADADDR(.kernel_text);

// Linker script constants
extern "C" {
    // destination location in RAM for kernel .text to be copied to
    pub static __start_kernel_text: *mut u8;
    pub static __end_kernel_text: *mut u8;

    // location in ROM where the kernel .text is to be copied from
    pub static __load_kernel_text: *const u8;
}

I got stuck at the first major step of copying the kernel .text section from ROM into RAM. I'm first attempting to figure out how many bytes need to be copied, but here lies my issue:

    let size = unsafe {
        __end_kernel_text.offset_from(__start_kernel_text)
    };
    if size != 0 {
        // do something with size
    }

If you look at the assembly generated by this code, rather than using the known constant values of these pointers, the code is dereferencing the pointers and comparing the data they point at.

mov    0xf80500,%eax
cmp    0xf80018,%eax
je     f8030f <setup+0x127>

Am I missing something here?

Where do you see the dereferencing? The assembly you posted literally checks for the equality of 0xf80500 and 0xf80018. There is no memory operation going on. (And these couldn't be the pointed values even if there were – they are too big to fit in a u8.)

I got the disassembly with objdump, which uses GNU/AT&T syntax by default. Here's the same code in Intel and in AT&T syntax.

Intel syntax:

objdump -j .kernel_text -M 386 -M intel -D target/386-none/release/rustos.elf

  f802fb:	a1 00 05 f8 00       	mov    eax,ds:0xf80500
  f80300:	3b 05 18 00 f8 00    	cmp    eax,DWORD PTR ds:0xf80018
  f80306:	74 07                	je     f8030f <setup+0x127>
  f80308:	66 ba 62 f8          	mov    dx,0xf862
  f8030c:	31 c0                	xor    eax,eax
  f8030e:	ee                   	out    dx,al
  f8030f:	e8 23 fe ff ff       	call   f80137 <start>

GNU/AT&T syntax:

objdump -j .kernel_text -M 386 -D target/386-none/release/rustos.elf

  f802fb:	a1 00 05 f8 00       	mov    0xf80500,%eax
  f80300:	3b 05 18 00 f8 00    	cmp    0xf80018,%eax
  f80306:	74 07                	je     f8030f <setup+0x127>
  f80308:	66 ba 62 f8          	mov    $0xf862,%dx
  f8030c:	31 c0                	xor    %eax,%eax
  f8030e:	ee                   	out    %al,(%dx)
  f8030f:	e8 23 fe ff ff       	call   f80137 <start>

In AT&T syntax, a bare number is a memory operand, and immediates are prefixed with a '$' sign.

Oh, you are right, it's AT&T syntax. I always forget that.

For what it's worth, this causes the compiler to dereference __start_kernel_text as well.

extern "C" { pub static __start_kernel_text: *mut u8; }
//...
// dereferences __start_kernel_text as well
let address = unsafe { __start_kernel_text as usize };

I guess I sort of solved this.

In C, you'd typically access these values like this:

extern uint8_t __start_kernel_text[];
extern uint8_t __end_kernel_text[];
extern uint8_t __load_kernel_text[];
// ...
size_t size = __end_kernel_text - __start_kernel_text;

I figured out how to get around this by some other unsafe shenanigans. Rather than declaring the symbols as pointers, declare them as the underlying type (u8 in this instance).

extern "C" {
    pub static __start_kernel_text: u8;
    pub static __end_kernel_text: u8;
}
// ... later ...
let size = unsafe {
    let start = &__start_kernel_text as *const u8;
    let end = &__end_kernel_text as *const u8;
    end.offset_from(start)
};
if size != 0 {
    // do something with size
}

It results in the assembly you'd expect (Intel syntax):

  f802fb:	b8 00 05 f8 00       	mov    eax,0xf80500
  f80300:	3d 18 00 f8 00       	cmp    eax,0xf80018
  f80305:	74 07                	je     f8030e <setup+0x126>
  f80307:	66 ba 62 f8          	mov    dx,0xf862
  f8030b:	31 c0                	xor    eax,eax
  f8030d:	ee                   	out    dx,al
  f8030e:	e8 24 fe ff ff       	call   f80137 <start>
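
With the addresses available as plain constants, the copy step mentioned at the start of the thread could be sketched roughly as follows. This is only an illustration under stated assumptions: `copy_section` is a hypothetical helper that does not appear in the original code, and it takes the three addresses as parameters so that it can be tried out on a hosted target.

```rust
use core::ptr;

/// Copies `end - start` bytes from `load` to `start` and returns the count.
/// Hypothetical helper, not part of the original kernel code.
///
/// # Safety
/// `start..end` must be a writable region, `load` must be readable for the
/// same number of bytes, and the two regions must not overlap.
unsafe fn copy_section(load: *const u8, start: *mut u8, end: *const u8) -> usize {
    // Work on the addresses as integers; nothing is read through `end`.
    let len = (end as usize) - (start as usize);
    // SAFETY: guaranteed by the caller, see the contract above.
    unsafe { ptr::copy_nonoverlapping(load, start, len) };
    len
}
```

In the kernel, the arguments would be the addresses of __load_kernel_text, __start_kernel_text and __end_kernel_text, each obtained by taking the address of the symbol as in the workaround above.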

I guess it comes down to the fact that these linker script symbols are just addresses; no pointer object is stored in memory for them. Rust doesn't have a direct equivalent of C's uint8_t[] declaration. Rust was assuming that a pointer named "__start_kernel_text" was stored at the address the linker script set, and it was "dereferencing" that address to get the value it contained.

It would be useful if there were an attribute or something that prevented direct accesses to __start_kernel_text. For now I've just removed "pub" from everything and require going through accessor functions to get at the constants.
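
The accessor-function idea could look something like the sketch below. To keep it buildable without the linker script, a private array stands in for the real region; the module name `linker`, the function names, and `FAKE_KERNEL_TEXT` are all made up for illustration.

```rust
mod linker {
    use core::ptr::addr_of;

    // Stand-in for the region the linker script would describe, so that the
    // sketch builds on a hosted target. In the kernel this would instead be
    //     extern "C" { static __start_kernel_text: u8; static __end_kernel_text: u8; }
    // without `pub`, so the raw symbols cannot be named outside this module.
    static FAKE_KERNEL_TEXT: [u8; 64] = [0x90; 64];

    /// Start address of the region; only the address is taken, never the value.
    pub fn start_kernel_text() -> *const u8 {
        addr_of!(FAKE_KERNEL_TEXT) as *const u8
    }

    /// One-past-the-end address of the region.
    pub fn end_kernel_text() -> *const u8 {
        start_kernel_text().wrapping_add(FAKE_KERNEL_TEXT.len())
    }

    /// Size of the region in bytes, computed from the two addresses.
    pub fn kernel_text_len() -> usize {
        end_kernel_text() as usize - start_kernel_text() as usize
    }
}
```

Because the statics stay private to the module, the only way for other code to reach them is through functions that take the address, which rules out accidentally reading the "value" of a symbol.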

Oh, okay, so that is what you are trying to do. You are actually looking for the address of the variables, and not their value. That's very different from what you seemed to want based on the first piece of code you posted!

So, of course, the Rust compiler isn't buggy. There isn't any sort of special behavior for constants configured by linker scripts. There isn't any implicit dereferencing.

What happens is that the two statics are place expressions (as are regular variables). Unless they are stored directly in registers, reading their value needs to go through memory. Given a variable or static, reading its value by mentioning its name foo semantically generates code that is equivalent to *&foo: look up the address of the variable, dereference it, and read from it. In this sense, every variable access involves a "pointer dereference", regardless of the variable's type. So, just as I suspected, the values being read are not the u8 values pointed to by the pointers; they are the values of the pointers themselves.
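
A small sketch of that distinction, using an ordinary static rather than a linker symbol so that it builds anywhere. `BOOT_MAGIC` and the function names are invented for the example.

```rust
use core::ptr::addr_of;

// An ordinary static standing in for any named place in memory.
static BOOT_MAGIC: u32 = 0x1234_5678;

/// Mentioning the name reads the value stored at the static's address.
fn value_of_static() -> u32 {
    BOOT_MAGIC
}

/// Taking the address explicitly yields the location, not the stored value.
fn address_of_static() -> *const u32 {
    addr_of!(BOOT_MAGIC)
}

/// Spelling out the `*&foo` reading by hand gives the same value as the name.
fn value_via_address() -> u32 {
    // SAFETY: the pointer comes from a live, initialized static.
    unsafe { *address_of_static() }
}
```

For a symbol defined by a linker script, only the second form is meaningful, since there is no stored value to read.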

What you were expecting is more of a footgun of C. In C, arrays often "decay" into pointers. That is, an array-typed variable T foo[] will often be implicitly converted to, and behave as if it were, a pointer to the first element. Mentioning the name of the variable will then not read the whole array out from memory by-value. The expression foo, instead of meaning *&foo, will magically mean &foo[0] instead. This is a peculiarity in C, and it is inconsistent with how the rest of the types work.
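
For contrast, here is a minimal sketch of how a Rust array behaves. The helper names are hypothetical; the point is only that getting a pointer to the first element is an explicit call, while using the array itself yields the whole array by value.

```rust
/// Returns the address of the first element; the conversion is explicit.
fn first_element(buf: &[u8; 4]) -> *const u8 {
    buf.as_ptr()
}

/// Returns the whole array by value; `*buf` copies all four bytes.
fn whole_array(buf: &[u8; 4]) -> [u8; 4] {
    *buf
}
```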

Rust has uniformity and consistency among its core values. Every time you read a variable by value, you will get its value. It is not the case that sometimes you get its value and sometimes you get its address (at least conceptually – of course, compiler optimizations and the code actually being generated are a different question). So, if you are interested in the address of a variable or a static, you have to explicitly take its address.