Data in &'static kernel entry point argument seems corrupted

I'm developing an OS in Rust, and I'm still relatively new to it. I have a custom UEFI bootloader that boots my WIP kernel; both use my own custom boot protocol.

Details hidden; out of date, see discussion below

The kernel entry point signature is as follows:

type EntryPoint = fn (&'static boot_proto::BootData) -> !;

BootData is defined as the following:

pub const CURRENT_REVISION: u64 = 0x1;

#[derive(Debug)]
pub struct BootData {
    pub revision: u64,
    pub tags: &'static [tags::TagType],
}

impl BootData {
    pub fn new(tags: &'static [tags::TagType]) -> Self {
        Self {
            revision: CURRENT_REVISION,
            tags,
        }
    }
}

Take into consideration the following example kernel code:

static SERIAL: Mutex<SerialWriter> = Mutex::new(SerialWriter::new(0x3F8));

#[no_mangle]
fn kernel_main(boot_data: &'static boot_proto::BootData) -> ! {
    let mut serial = SERIAL.lock();

    writeln!(serial, "Bootloader data: {:#X?}", boot_data).unwrap();
    assert_eq!(explosion.revision, boot_proto::CURRENT_REVISION);

    loop {
        unsafe { asm!("hlt") };
    }
}

A portion of the boot loader code is the following:

// [..]
let mut explosion = Box::new(boot_proto::BootData::new(Default::default()));
let mut tags = Vec::with_capacity(3);
// A bunch of initialization here, then an exit boot services call
explosion.tags = tags.leak();
unsafe {
    asm!("cli");
    core::mem::transmute::<_, fn(&'static kaboom::ExplosionResult) -> !>(
        elf.header().entry_point() as *const (),
    )(Box::leak(explosion));
};

Now that you know the behaviour of both ends, let's inspect the serial output:

Bootloader data: BootData {
    revision: 0x6972646C,
    tags: [
        ACPI(
            0x0000000007ed1f80,
        ),
        ACPI(
            0xafafafafafafafaf,
        ),
        ACPI(
            0x0000000000010001,
        ),
        ACPI(
            0x82e3000010202020,
        ),
        ACPI(
            0xafafafafafafafaf,
        ),
        ACPI(
            0xafafafafafafafaf,
        ),
        ACPI(
            0xafafafaf6c617470,
        ),
        ACPI(
            0xafafafafafafafaf,
        ),
        ACPI(
            0xafafafafafafafaf,
        ),
        ACPI(
            0xafafafafafafafaf,
        ),
    ],
}

That doesn't look good, but why is this happening?
The boot loader clearly is passing the correct data.

Forgot to mention that I'm linking the kernel using a custom linker script and a custom target. Here is a gist with both: Kernel ld script, and custom target specification (github.com)

The tags are the problematic ones, right? Could you give more details regarding:

?

The revision in the struct is also wrong, it's a random value instead of 0x1.

It's hard to diagnose without more context :grimacing:, or the actual code you used (the one you've given has several typos)

I understand the limitations of the info I've given. This is a closed-source project, which is why I've not provided the entire source code. I'll try to provide better context in a bit, my apologies.


@Yandros That took a bit. I've updated the gist with more code. Posting it again just in case: Kernel & boot loader code. (github.com).

Thanks, that should already allow people more knowledgeable in this area than I am to help you :slightly_smiling_face:

In my case I don't get the exact semantics of the memory mappings shenanigans, so I'll assume they're fine.

Could you move the info!("{:#X?}", explosion) right before calling the entry()?

  • If you have access to some debugger of sorts, could you inspect the layout of that explosion static reference (and that of its referee) before and after the call to entry()?

  • If you don't, you can try to mock a debugger through debug printing, by using stuff such as:

    unsafe // Safety: `*it` must not contain padding or uninit bytes
    fn debug_bytes (it: &'_ impl ?Sized)
    {
        let bytes = ::core::slice::from_raw_parts(
            <*const _>::cast::<u8>(it),
            ::core::mem::size_of_val(it),
        );
        info!("{:#X?}", bytes);
    }
    macro_rules! debug_bytes {( $place:expr $(,)? ) => (
        debug_bytes(&$place)
    )}
    

    so as to do:

    let at_explosion = Box::leak(explosion);
    unsafe {
        debug_bytes!(at_explosion); // shallow-inspect the `&'static` itself
        debug_bytes!(*at_explosion); // shallow-inspect the struct
    }
    

    right before the entry() call, and, ideally, do something similar right at the start of kernel_main().

From there, two cases:

  • either the addresses match, and there is something wrong with the virtual memory setup / the memory mappings;

  • or the addresses don't match, and there is something wrong with the call ABI and/or the layout of these things.

    • In that regard, you can reduce the chances of ABI issues by using #[repr(C)] things at as many layers as possible (using, for instance, SharedSlice<'lt, T> or slice_ref<'lt, T> instead of &'lt [T]), together with extern "C" functions (mainly for kernel_main and the EntryPoint type definition).
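The #[repr(C)] and extern "C" suggestion above could look roughly like the following. This is only a minimal sketch: `SliceRef` is a hypothetical stand-in for the `slice_ref`-style types mentioned (not the API of any particular crate), `BootData` is simplified to carry `u64` tags instead of `tags::TagType`, and the entry point returns `u64` rather than `!` so that the sketch can be called and checked.

```rust
use core::marker::PhantomData;

/// Hypothetical FFI-safe stand-in for `&'a [T]`: pointer plus length, with a
/// field order fixed by `#[repr(C)]`.
#[repr(C)]
pub struct SliceRef<'a, T> {
    ptr: *const T,
    len: usize,
    _lifetime: PhantomData<&'a [T]>,
}

impl<'a, T> SliceRef<'a, T> {
    pub fn new(slice: &'a [T]) -> Self {
        Self { ptr: slice.as_ptr(), len: slice.len(), _lifetime: PhantomData }
    }

    pub fn as_slice(&self) -> &'a [T] {
        // Safety: `ptr` and `len` were taken from a valid `&'a [T]` in `new`.
        unsafe { core::slice::from_raw_parts(self.ptr, self.len) }
    }
}

/// Simplified boot data: `u64` tags stand in for `tags::TagType`.
#[repr(C)]
pub struct BootData {
    pub revision: u64,
    pub tags: SliceRef<'static, u64>,
}

/// The ABI is spelled out in the type shared by both sides.
/// (The real entry point returns `!`; `u64` keeps the sketch callable.)
pub type EntryPoint = extern "C" fn(&'static BootData) -> u64;

/// Kernel side: defined with the same explicit ABI as `EntryPoint`.
pub extern "C" fn kernel_main(boot_data: &'static BootData) -> u64 {
    boot_data.revision + boot_data.tags.as_slice().iter().sum::<u64>()
}
```

With this shape, neither the struct layout nor the way the argument is passed depends on the unspecified defaults of the Rust ABI.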

Thanks for your suggestions. I will try them soon and report my results

Before: [output not preserved]

After: [output not preserved]

No corruption happening

debug_bytes!(explosion) and debug_bytes!(*explosion)

Before: [output not preserved]

After: [output not preserved]

The debug_bytes!(explosion) output before and after is clearly not the same, so it's the 2nd case.

@Yandros, I changed the entry point ABI to extern "C", but the addresses still don't match, even if I change the argument to take just a u64:

[ INFO]:  src/main.rs@245: 0x0000000006538018
Fuse ignition begun.
Boot loader data ptr: 0x6537918

EDIT: Just as an experiment, I decided to call the entry point manually using inline assembly, like so:

unsafe {
    asm!("call {}", in(reg) entry, in("rdi") explosion);
}

and...

[ INFO]:  src/main.rs@245: 0x0000000006538018
Fuse ignition begun.
Boot loader data ptr: 0x6538018
Fuse initialization complete.

Huh, it works.
Does this mean I've found a compiler bug?
I suspect that rustc might be ignoring the called function's calling convention and using the caller's entry point convention instead. Since UEFI uses the efiapi ABI, that would make sense.
EDIT 2: Bingo! I've just found a compiler bug!

[ INFO]:  src/main.rs@248: 0x0000000006538018
Fuse ignition begun.
Boot loader data ptr: 0x6538018
Fuse initialization complete.

Above output is with extern "efiapi"
Final edit: I made an issue in the rust GitHub repo: Rust ignores extern ABI used for called function if is different than the caller's · Issue #88749 · rust-lang/rust (github.com)


In your transmute you explicitly told rustc to cast the entrypoint to a function with the default extern "Rust" calling convention, but if it's actually been defined as an extern "C" or extern "efiapi" function and they pass around arguments differently that would explain the "corrupted" arguments you've been seeing.
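To make the point about the transmute concrete: the ABI is part of the function pointer type, so the type named in the transmute decides how the arguments get passed. Below is a minimal sketch, assuming a hypothetical `entry` function that stands in for the address read from the ELF header; it is not the code from the gist.

```rust
/// Stand-in for the kernel entry point. In the real setup the address would
/// come from the ELF header rather than from a Rust item.
extern "C" fn entry(boot_data_addr: u64) -> u64 {
    boot_data_addr
}

/// Calls `addr` as an `extern "C"` function taking one `u64`.
///
/// # Safety
/// `addr` must point to a function that really has this signature and ABI.
unsafe fn call_entry(addr: *const (), arg: u64) -> u64 {
    // The target type spells out the ABI. Writing plain `fn(u64) -> u64`
    // here would mean `extern "Rust"`, which is not guaranteed to pass
    // arguments the same way as the `extern "C"` callee expects.
    let f = unsafe {
        core::mem::transmute::<*const (), extern "C" fn(u64) -> u64>(addr)
    };
    f(arg)
}
```

Calling `call_entry(entry as *const (), value)` hands `value` to `entry` unchanged, because the caller and the callee agree on the convention.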

Here is a list of all calling conventions supported by Rust:

https://doc.rust-lang.org/nomicon/ffi.html#foreign-calling-conventions


The examples in the post itself are out of date. See the gist instead.
But, both the entry point and the transmute used the same signature and ABI.
So yeah, it's a compilation bug.
EDIT: note: I'm using rust nightly

https://gist.github.com/VisualDevelopment/9260154493fa8b6b006a9a1ff9cfa017 is missing the source for kaboom.

Yup.
Here is lib.rs:

kaboom source code
#![no_std]

pub mod tags;

pub const CURRENT_REVISION: u64 = 0x5;

pub type EntryPoint = fn(&'static ExplosionResult) -> !;

#[derive(Debug)]
pub struct ExplosionResult<'a> {
    pub revision: u64,
    pub tags: &'a [tags::TagType<'a>],
}

impl<'a> ExplosionResult<'a> {
    pub fn new(tags: &'a [tags::TagType<'a>]) -> Self {
        Self {
            revision: CURRENT_REVISION,
            tags,
        }
    }
}

I've also added this:

static _KERNEL_MAIN_CHECK: kaboom::EntryPoint = kernel_main;

to kernel/src/main.rs in order to make sure both use the same signature

EDIT: Updated the gist with more code

ExplosionResult and TagType need a #[repr(C)] to ensure that the two builds of kaboom (one compiled for the bootloader, one for the kernel) use the same layout for them. This is not necessarily the fix though.

Why would the layout be different? As far as I know, the layout should not be compiled differently each time you run rustc.

And you need to use extern "C" fn both in the bootloader when calling and in the kernel when defining the kernel entry point. The default Rust ABI is unstable.
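One caveat, offered as a possible explanation rather than something confirmed in this thread: on x86_64, extern "C" means whatever the C convention of the compilation target is. UEFI targets follow the Microsoft x64 convention (first integer argument in rcx), while typical ELF targets follow System V (first integer argument in rdi), which would be consistent with the manual `in("rdi")` call working. If the bootloader and the kernel are built for targets that disagree, naming one convention explicitly on both sides removes the ambiguity. A minimal sketch, assuming both binaries are x86_64; the `pinned` module and the `jump` helper are hypothetical names, and the entry point returns `u64` instead of `!` so it can be exercised:

```rust
// Assumption: both binaries target x86_64, so the System V convention can be
// named explicitly instead of relying on what "C" means for each target.
#[cfg(target_arch = "x86_64")]
pub mod pinned {
    /// Shared by bootloader and kernel. The `u64` return keeps the sketch
    /// callable (the real entry point returns `!`).
    pub type EntryPoint = extern "sysv64" fn(boot_data_addr: u64) -> u64;

    /// Kernel side: defined with the same explicit ABI.
    pub extern "sysv64" fn kernel_main(boot_data_addr: u64) -> u64 {
        boot_data_addr
    }

    /// Bootloader side: calls through the shared type.
    pub fn jump(entry: EntryPoint, boot_data_addr: u64) -> u64 {
        entry(boot_data_addr)
    }
}
```

Using extern "efiapi" on both sides, as in the experiment above, pins the convention in the same way.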

The -Cmetadata argument passed by cargo may be different. It is possible that this causes a different layout.
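Since the default layout is not guaranteed to match between the two builds, one way to guard against drift is to pin the layout with #[repr(C)] and assert on it at compile time in the shared crate. A minimal sketch: `RawBootData` and `field_offsets` are hypothetical names, and storing the tag slice as an address plus a length in `u64` fields is an assumption made here so the layout does not depend on pointer width.

```rust
use core::mem::size_of;

/// Hypothetical fixed-layout variant of the boot data: every field is a
/// `u64`, so the layout does not depend on pointer width or compiler choices.
#[repr(C)]
pub struct RawBootData {
    pub revision: u64,
    pub tags_addr: u64, // address of the first tag
    pub tags_len: u64,  // number of tags
}

// Compile-time check: building either crate fails if the size drifts.
const _: () = assert!(size_of::<RawBootData>() == 24);

/// Byte offsets of the three fields, computed from a live value.
pub fn field_offsets(data: &RawBootData) -> [usize; 3] {
    let base = data as *const RawBootData as usize;
    [
        &data.revision as *const u64 as usize - base,
        &data.tags_addr as *const u64 as usize - base,
        &data.tags_len as *const u64 as usize - base,
    ]
}
```

With #[repr(C)] the fields are laid out in declaration order, so printing `field_offsets` from both the bootloader and the kernel should give the same three numbers.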

I see.

I had no idea. Thanks for all the information