Data in &'static kernel entry point argument seems corrupted

VisualEhrmanntraut · September 6, 2021, 11:54am

I'm developing an OS in Rust, and I'm still relatively new to it. I have a custom UEFI bootloader that boots my WIP kernel; both use my own custom boot protocol.

Details hidden; out of date, see discussion below

The kernel entry point signature is as follows:

type EntryPoint = fn (&'static boot_proto::BootData) -> !;

BootData is defined as the following:

pub const CURRENT_REVISION: u64 = 0x1;

#[derive(Debug)]
pub struct BootData {
    pub revision: u64,
    pub tags: &'static [tags::TagType],
}

impl BootData {
    pub fn new(tags: &'static [tags::TagType]) -> Self {
        Self {
            revision: CURRENT_REVISION,
            tags,
        }
    }
}

Take into consideration the following example kernel code:

static SERIAL: Mutex<SerialWriter> = Mutex::new(SerialWriter::new(0x3F8));

#[no_mangle]
fn kernel_main(boot_data: &'static boot_proto::BootData) -> ! {
    let mut serial = SERIAL.lock();

    writeln!(serial, "Bootloader data: {:#X?}", boot_data).unwrap();
    assert_eq!(boot_data.revision, boot_proto::CURRENT_REVISION);

    loop {
        unsafe { asm!("hlt") };
    }
}

A portion of the boot loader code is the following:

// [..]
let mut explosion = Box::new(boot_proto::BootData::new(Default::default()));
let mut tags = Vec::with_capacity(3);
// A bunch of initialization here, then an exit boot services call
explosion.tags = tags.leak();
unsafe {
    asm!("cli");
    core::mem::transmute::<_, fn(&'static kaboom::ExplosionResult) -> !>(
        elf.header().entry_point() as *const (),
    )(Box::leak(explosion));
};

Now that you know the behaviour of both ends, let's inspect the serial output:

Bootloader data: BootData {
    revision: 0x6972646C,
    tags: [
        ACPI(
            0x0000000007ed1f80,
        ),
        ACPI(
            0xafafafafafafafaf,
        ),
        ACPI(
            0x0000000000010001,
        ),
        ACPI(
            0x82e3000010202020,
        ),
        ACPI(
            0xafafafafafafafaf,
        ),
        ACPI(
            0xafafafafafafafaf,
        ),
        ACPI(
            0xafafafaf6c617470,
        ),
        ACPI(
            0xafafafafafafafaf,
        ),
        ACPI(
            0xafafafafafafafaf,
        ),
        ACPI(
            0xafafafafafafafaf,
        ),
    ],
}

That doesn't look good, but why is this happening?
The boot loader clearly is passing the correct data.

VisualEhrmanntraut · September 6, 2021, 12:03pm

Forgot to mention that I'm linking the kernel using a custom linker script and a custom target. Here is a gist with both: Kernel ld script, and custom target specification (github.com)

Yandros · September 7, 2021, 9:35am

The tags are the problematic ones, right? Could you give more details regarding:

?

VisualEhrmanntraut · September 7, 2021, 9:37am

The revision in the struct is also wrong, it's a random value instead of 0x1.

Yandros · September 7, 2021, 9:48am

It's hard to diagnose without more context, or the actual code you used (the one you've given has several typos).

VisualEhrmanntraut · September 7, 2021, 10:06am

I understand the limitations of the info I've given. This is a closed-source project, which is why I haven't provided the entire source code. I'll try to provide better context in a bit; my apologies.

VisualEhrmanntraut · September 7, 2021, 2:15pm

@Yandros That took a bit. I've updated the gist with more code. Posting it again just in case: Kernel & boot loader code. (github.com).

Yandros · September 8, 2021, 12:19pm

Thanks, that should already allow people more knowledgeable in this area than I am to help you.

In my case I don't get the exact semantics of the memory mappings shenanigans, so I'll assume they're fine.

Could you move the info!("{:#X?}", explosion) right before calling the entry()?

If you have access to some debugger of sorts, could you inspect the layout of that explosion static reference (and that of its referent) before and after the call to entry()?

If you don't, you can try to mock a debugger through debug printing, by using stuff such as:

unsafe // Safety: `*it` must not contain padding or uninit bytes
fn debug_bytes (it: &'_ impl ?Sized)
{
    let bytes = ::core::slice::from_raw_parts(
        <*const _>::cast::<u8>(it),
        ::core::mem::size_of_val(it),
    );
    info!("{:#X?}", bytes);
}
macro_rules! debug_bytes {( $place:expr $(,)? ) => (
    debug_bytes(&$place)
)}

so as to do:

let at_explosion = Box::leak(explosion);
unsafe {
    debug_bytes!(at_explosion); // shallow-inspect the `&'static` itself
    debug_bytes!(*at_explosion); // shallow-inspect the struct
}

right before the entry() call, and, ideally, do something similar right at the start of kernel_main().

From there, two cases:

- either the addresses match, and there is something wrong with the virtual memory setup / the memory mappings;
- or the addresses don't match, and there is something wrong with the call ABI and/or the layout of these things.
  - In that regard, you can reduce the chances of ABI issues by using #[repr(C)] things at as many layers as possible (using, for instance, SharedSlice<'lt, T> or slice_ref<'lt, T> instead of &'lt [T]), together with extern "C" functions (mainly for kernel_main and the EntryPoint type definition).
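To make the #[repr(C)] plus extern "C" suggestion concrete, here is a minimal sketch. The names SliceRef, BootData and demo_entry are hypothetical stand-ins (not the poster's types, and not the API of any particular crate), and the entry point returns a value instead of diverging only so it can be exercised on a hosted target.

```rust
use core::marker::PhantomData;

/// Hypothetical FFI-safe stand-in for `&'a [T]`: a pointer and a length
/// with a layout fixed by `#[repr(C)]`.
#[repr(C)]
pub struct SliceRef<'a, T> {
    ptr: *const T,
    len: usize,
    _lifetime: PhantomData<&'a [T]>,
}

impl<'a, T> SliceRef<'a, T> {
    pub fn new(slice: &'a [T]) -> Self {
        Self {
            ptr: slice.as_ptr(),
            len: slice.len(),
            _lifetime: PhantomData,
        }
    }

    pub fn as_slice(&self) -> &'a [T] {
        // SAFETY: `ptr` and `len` were taken from a valid `&'a [T]` in `new`.
        unsafe { core::slice::from_raw_parts(self.ptr, self.len) }
    }
}

/// Boot data with a defined field order, so that two separately compiled
/// crates agree on where each field lives.
#[repr(C)]
pub struct BootData<'a> {
    pub revision: u64,
    pub tags: SliceRef<'a, u64>,
}

/// The ABI is part of the function pointer type, so both sides can share it.
pub type EntryPoint = extern "C" fn(&BootData<'_>) -> u64;

/// Hypothetical entry point; a real kernel entry would return `!`.
pub extern "C" fn demo_entry(boot_data: &BootData<'_>) -> u64 {
    boot_data.revision + boot_data.tags.as_slice().iter().sum::<u64>()
}
```

Assigning `demo_entry` to a variable of type `EntryPoint` only compiles when the signature and the ABI string match, which is the same check as a static of that type.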

VisualEhrmanntraut · September 8, 2021, 1:42pm

Thanks for your suggestions. I will try them soon and report my results

VisualEhrmanntraut · September 8, 2021, 3:12pm

Before

After

No corruption happening

debug_bytes!(explosion) and debug_bytes!(*explosion)
Before

After

debug_bytes!(explosion) before and after are obviously not the same, so it's the second case.

VisualEhrmanntraut · September 8, 2021, 3:26pm

@Yandros, I changed the entry point ABI to extern "C", but, the addresses still don't match, even if I change the argument to take just u64:

[ INFO]:  src/main.rs@245: 0x0000000006538018
Fuse ignition begun.
Boot loader data ptr: 0x6537918

EDIT: Just as an experiment, I decided to call the entry point manually using inline assembly, like so:

unsafe {
    asm!("call {}", in(reg) entry, in("rdi") explosion);
}

and...

[ INFO]:  src/main.rs@245: 0x0000000006538018
Fuse ignition begun.
Boot loader data ptr: 0x6538018
Fuse initialization complete.

Huh, it works.
Does this mean I've found a compiler bug?
I'm suspecting that rustc might be ignoring the function's calling convention and using the entry point convention. Since UEFI uses the efiapi ABI, that would only make sense.
EDIT 2: Bingo! I've just found a compiler bug!

[ INFO]:  src/main.rs@248: 0x0000000006538018
Fuse ignition begun.
Boot loader data ptr: 0x6538018
Fuse initialization complete.

Above output is with extern "efiapi"
Final edit: I made an issue in the rust GitHub repo: Rust ignores extern ABI used for called function if is different than the caller's · Issue #88749 · rust-lang/rust (github.com)
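One possible reading of these results, not confirmed anywhere in this thread: on x86_64, extern "C" is not a single convention but means "whatever the C ABI is on the current target". A UEFI target uses the Microsoft x64 convention (first integer argument in rcx), while a typical bare-metal or Linux-like target uses System V (first argument in rdi). extern "efiapi" maps to the Microsoft convention on x86_64, which would be consistent with it working on both sides. If that is the cause, naming a target-independent ABI in both crates removes the ambiguity. A minimal sketch, where PinnedEntry, demo_entry and call_entry are hypothetical names:

```rust
// On x86_64, `extern "sysv64"` names the System V convention regardless of
// the target, unlike `extern "C"`, which follows the target's C ABI.
#[cfg(target_arch = "x86_64")]
pub type PinnedEntry = extern "sysv64" fn(u64) -> u64;
// Fallback so the sketch still builds on other architectures, where
// "sysv64" does not exist.
#[cfg(not(target_arch = "x86_64"))]
pub type PinnedEntry = extern "C" fn(u64) -> u64;

/// Hypothetical entry point that echoes the address it received.
#[cfg(target_arch = "x86_64")]
pub extern "sysv64" fn demo_entry(boot_data_addr: u64) -> u64 {
    boot_data_addr
}

#[cfg(not(target_arch = "x86_64"))]
pub extern "C" fn demo_entry(boot_data_addr: u64) -> u64 {
    boot_data_addr
}

/// The caller uses the same pinned ABI through the shared type alias.
pub fn call_entry(entry: PinnedEntry, boot_data_addr: u64) -> u64 {
    entry(boot_data_addr)
}
```

With the ABI pinned like this, the value the callee sees is the value the caller passed, whichever target each crate was built for.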

Michael-F-Bryan · September 8, 2021, 4:35pm

In your transmute you explicitly told rustc to cast the entry point to a function with the default extern "Rust" calling convention. If it has actually been defined as an extern "C" or extern "efiapi" function, and the two conventions pass arguments differently, that would explain the "corrupted" arguments you've been seeing.
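As a rough illustration of that point, the ABI can be written into the function pointer type that the address is transmuted to, so the call site and the definition cannot silently disagree. This is only a sketch: EntryPoint, demo_entry, entry_address and call_at are hypothetical names, and the entry returns a value instead of diverging so it can be run on a hosted target.

```rust
use core::mem::transmute;

/// The calling convention is spelled out in the pointer type itself.
pub type EntryPoint = extern "C" fn(u64) -> u64;

/// Hypothetical stand-in for the kernel entry point.
pub extern "C" fn demo_entry(boot_data_addr: u64) -> u64 {
    boot_data_addr + 1
}

/// Stand-in for an address read from an ELF header.
pub fn entry_address() -> *const () {
    demo_entry as EntryPoint as *const ()
}

/// Calls the function at `address` using the ABI named by `EntryPoint`.
///
/// # Safety
/// `address` must point to a function whose real signature and ABI are
/// exactly those of `EntryPoint`.
pub unsafe fn call_at(address: *const (), arg: u64) -> u64 {
    let entry: EntryPoint = unsafe { transmute::<*const (), EntryPoint>(address) };
    entry(arg)
}
```

The transmute itself checks nothing, so the type alias is the only place where the contract between the two sides is written down.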

Here is a list of all calling conventions supported by Rust:

VisualEhrmanntraut · September 8, 2021, 4:41pm

The examples in the post itself are out of date. See the gist instead.
But, both the entry point and the transmute used the same signature and ABI.
So yeah, it's a compilation bug.
EDIT: note: I'm using rust nightly

bjorn3 · September 8, 2021, 8:04pm

https://gist.github.com/VisualDevelopment/9260154493fa8b6b006a9a1ff9cfa017 is missing the source for kaboom.

VisualEhrmanntraut · September 9, 2021, 6:29pm

Yup.
Here is lib.rs:

kaboom source code

#![no_std]

pub mod tags;

pub const CURRENT_REVISION: u64 = 0x5;

pub type EntryPoint = fn(&'static ExplosionResult) -> !;

#[derive(Debug)]
pub struct ExplosionResult<'a> {
    pub revision: u64,
    pub tags: &'a [tags::TagType<'a>],
}

impl<'a> ExplosionResult<'a> {
    pub fn new(tags: &'a [tags::TagType<'a>]) -> Self {
        Self {
            revision: CURRENT_REVISION,
            tags,
        }
    }
}

I've also added this:

static _KERNEL_MAIN_CHECK: kaboom::EntryPoint = kernel_main;

to kernel/src/main.rs in order to make sure both use the same signature

EDIT: Updated the gist with more code

bjorn3 · September 9, 2021, 7:39pm

ExplosionResult and TagType need a #[repr(C)] to ensure that the two builds of kaboom (one compiled for the bootloader, the other for the kernel) use the same layout for them. This is not necessarily the fix, though.

VisualEhrmanntraut · September 9, 2021, 7:42pm

Why would the layout be different? As far as I know, the layout should not be compiled differently each time you run rustc.

bjorn3 · September 9, 2021, 7:42pm

And you need to use extern "C" fn both in the bootloader when calling and in the kernel when defining the kernel entry point. The default Rust ABI is unstable.

bjorn3 · September 9, 2021, 7:43pm

The -Cmetadata argument passed by cargo may be different. It is possible that this causes a different layout.
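To show what #[repr(C)] buys here: with the default representation the compiler is free to reorder fields, whereas #[repr(C)] lays them out in declaration order with C-style padding, so separately compiled copies of the crate agree as long as their targets use the same alignments. A small sketch with a hypothetical DemoHeader struct (not one of the kaboom types):

```rust
use core::ptr::addr_of;

/// Hypothetical header; with `#[repr(C)]` the fields are placed in
/// declaration order, with padding inserted only for alignment.
#[repr(C)]
pub struct DemoHeader {
    pub kind: u8,
    pub revision: u64,
    pub flags: u16,
}

/// Returns the byte offsets of `kind`, `revision` and `flags`.
pub fn field_offsets() -> (usize, usize, usize) {
    let header = DemoHeader { kind: 0, revision: 0, flags: 0 };
    let base = addr_of!(header) as usize;
    (
        addr_of!(header.kind) as usize - base,
        addr_of!(header.revision) as usize - base,
        addr_of!(header.flags) as usize - base,
    )
}
```

On a typical x86_64 target this gives offsets 0, 8 and 16. Without the attribute, the offsets are an implementation detail that rustc does not promise to keep stable.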

VisualEhrmanntraut · September 9, 2021, 7:45pm

I see.

I had no idea. Thanks for all the information
