Can rust create truly position independent code?

I have a kernel (an ELF binary) which I load at an arbitrary location and then jump to.
This works for the most part, but I noticed that attempts to use a trait object cause a crash: the vtables contain absolute addresses, so calls through them jump to a location where my code hasn't been loaded.

I tried adding the following to my target file:

    "position-independent-executables": true,
    "relocation-model": "pic"

But that doesn't appear to work either: it still tries to call into addresses where the code isn't loaded, and I'm not really sure how or where those addresses are coming from.

I found a similar thread here:

Which seems to suggest that a fully relocatable binary isn't really possible without manually patching the absolute data pointers in the vtable or the GOT (what is the GOT, by the way? Where can I learn more about it?). Is this still the case, or are there other ways to create a binary that can be loaded and executed from any address?

I am curious if there are thoughts about the last post, though, which seems to imply that the original poster was able to get it somewhat working.


GOT is the Global Offset Table in ELF. The Wikipedia article on it ("Global Offset Table") isn't entirely useless, and has more details than I knew about before.

Unfortunately I can't help you with the rest of your question. I thought GOT was only relevant when linking dynamically, but seems I was wrong.


You can read Drepper's article on the subject.

The core ideas are still the same, even though quite a few extensions have been added since.

The GOT is relevant for PIC, and normally the main binary is not PIC, so it doesn't need one. But of course, if you try to create PIC code you get a GOT too, even if you don't have any shared libraries.

The story here lies in the fact that modern CPUs are not designed to support PIC (I think the most famous CPU that was designed for it was the 6809).

In particular, x86-64 can only use addresses to read or write memory, but when you want an indirect call… you need an absolute address for that.

This makes it absolutely impractical to have a vtable without absolute addresses: it's much easier to adjust these addresses once, when the program is loaded, than to [try to] create a specialized compiler that no one would use in production.

It could be interesting to have something like that for educational purposes, but apparently the interest is not big enough to justify a fork of the Rust compiler that would work in such a mode.

Vtables are part of the data section. Position independent code does not apply to them. Even with position independent code you still have to apply relocations for the data section.
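To make that concrete, here's a rough sketch of what the compiler builds for `dyn Trait`, written out by hand: a static table of function pointers plus a (data pointer, vtable pointer) pair. All the names (`ShapeVTable`, `DynShape`, `SQUARE_VTABLE`) are made up for illustration, and rustc's real vtable layout is unspecified. The point is only that `SQUARE_VTABLE` is data that is statically initialized with code addresses, so a PIC/PIE build has to emit load-time relocations for it.

```rust
use core::marker::PhantomData;

/// Hand-rolled stand-in for a vtable (illustrative only; rustc's real layout is unspecified).
pub struct ShapeVTable {
    pub area: unsafe fn(*const ()) -> f64,
    pub name: fn() -> &'static str,
}

pub struct Square {
    pub side: f64,
}

/// # Safety
/// `this` must point to a live `Square`.
unsafe fn square_area(this: *const ()) -> f64 {
    let sq = unsafe { &*(this as *const Square) };
    sq.side * sq.side
}

fn square_name() -> &'static str {
    "square"
}

/// Statically initialized with code addresses. In a PIC/PIE build this typically lands in
/// a relocated read-only data section (e.g. .data.rel.ro), because the linker cannot know
/// where square_area / square_name will live at run time.
pub static SQUARE_VTABLE: ShapeVTable = ShapeVTable {
    area: square_area,
    name: square_name,
};

/// Hand-rolled "fat pointer": a type-erased data pointer plus a vtable pointer.
pub struct DynShape<'a> {
    data: *const (),
    vtable: &'static ShapeVTable,
    _borrow: PhantomData<&'a Square>,
}

impl<'a> DynShape<'a> {
    pub fn from_square(sq: &'a Square) -> Self {
        DynShape {
            data: sq as *const Square as *const (),
            vtable: &SQUARE_VTABLE,
            _borrow: PhantomData,
        }
    }

    pub fn area(&self) -> f64 {
        // SAFETY: `data` came from a `&Square` in `from_square`, the borrow is kept
        // alive by `'a`, and SQUARE_VTABLE's `area` expects exactly a `Square`.
        unsafe { (self.vtable.area)(self.data) }
    }

    pub fn name(&self) -> &'static str {
        (self.vtable.name)()
    }
}
```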


Modern CPUs do support PC-relative addressing. In fact, it is generally more efficient than absolute addressing for addresses that are part of the code. PC-relative calls can generally be done with a single instruction, while absolute addressing needs multiple instructions to assemble the address in a register before doing an indirect call (except on x86_64, where you can use a single 10-byte movabs instruction followed by a 2-byte call in place of a 5-byte relative call instruction). 32-bit x86 is the major exception. See https://rust.godbolt.org/z/1hq4xhv5P. What they don't support is loads and stores of PC-relative pointers stored in a register, but those are not that useful for position-independent code anyway, at least in userspace, as they would effectively defeat ASLR and don't help with dynamic linking.

The reason for having the GOT is to support dynamic linking without having to apply relocations to the code section (allowing it to be shared between processes, thus reducing memory usage).
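A toy model of that indirection, in case it helps. This is not how a real dynamic linker is structured, and `Got` / `load_symbol_address` are hypothetical names: the shared "code" only knows which slot to read, and each process gets its own table filled in by the loader, so the code itself never needs patching.

```rust
/// Toy model of GOT indirection (not real ELF machinery).
pub struct Got {
    slots: Vec<usize>,
}

impl Got {
    /// Filled in once by the "loader", after it has decided where things live.
    pub fn new(load_base: usize, symbol_offsets: &[usize]) -> Self {
        Got {
            slots: symbol_offsets.iter().map(|off| load_base + off).collect(),
        }
    }

    pub fn slot(&self, index: usize) -> usize {
        self.slots[index]
    }
}

/// Stand-in for shared, read-only code: it embeds only a slot index, never an
/// address, so the same "code" works for every process.
pub fn load_symbol_address(got: &Got, slot_index: usize) -> usize {
    got.slot(slot_index)
}
```

Two processes that load the same library at different bases get different tables but run identical code.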

In userspace most executables are PIE (position independent executable) executables. This is effectively PIC + some minor optimizations enabled by using different rules for symbol resolution.


Seriously? Which instruction should I use to jump to a non-fixed address using the current PC on x86-64, ARM, or RISC-V? And if these are not “modern”, then what do you call “modern”?

Well… to decide that, we first need to explain which instructions you plan to use…

Only when the target is fixed. Just one word… and yet such a difference. Calls to a fixed address are useless if we want to implement traits.

Nope. That's only true if we talk about a call to a fixed address, not a selectable one.

Maybe, but without these it's impossible to implement dyn Trait.

By default all compilers create “normal”, non-PIE executables. Yes, it's true that on some platforms these are not supported (Android 5.0+ is one example), but that's not because someone decided that it would bring some kind of “security”.

I have yet to see a deluge of exploits unleashed on regular GNU/Linux distros that became possible because executables are non-PIE…

I don't think that's what position-independent code is about. My understanding of PIC is that one can jump (or call) to some place in a blob of code using the offset from the current position of the program counter in that same blob. At least that was the case the last time I thought about such things, while programming the 6809 in assembler, as it happens. In that respect, the offset of the target address from the jmp/call is always fixed, and the code can be moved around.

Jumping to some non-fixed address, as in a vtable, is something different.
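The fixed-offset property is easy to check with a bit of arithmetic. On x86-64, a direct `call rel32` (opcode E8) is 5 bytes long, and the CPU adds the signed 32-bit displacement to the address of the next instruction. This sketch (the helper name is hypothetical) shows that moving the caller and the callee by the same amount leaves the encoded displacement unchanged, which is why such calls need no fixup; a vtable slot, by contrast, stores the absolute target.

```rust
/// Length of `call rel32` on x86-64: one opcode byte (E8) plus a 4-byte displacement.
const CALL_REL32_LEN: i64 = 5;

/// Displacement an assembler would encode for a `call target` placed at `call_site`.
/// The CPU adds it to the address of the *next* instruction (call_site + 5).
/// Returns None if the target is out of rel32 range (roughly more than 2 GiB away).
pub fn rel32_displacement(call_site: i64, target: i64) -> Option<i32> {
    let disp = target - (call_site + CALL_REL32_LEN);
    if (i32::MIN as i64..=i32::MAX as i64).contains(&disp) {
        Some(disp as i32)
    } else {
        None
    }
}
```

Relocating the whole blob by any delta gives the same bytes for the call instruction.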


echo 'int main() { return 0; }' | gcc -x c -v - |& grep -- -pie shows that GCC tells the linker to produce a PIE executable by default.

PIC/PIE is mandatory for full ASLR support. And ASLR only exists for security reasons. ASLR interferes with debugging, so if security was not an issue, we probably would never have invented ASLR.

Well… dyn Trait requires that… so how can that work? Isn't that where we started:

Yes, some distros do that, too. Your point here? Go download GCC, even the latest version, run ./configure --help and you'll see:

  --enable-host-pie       build position independent host executables

By default PIE is disabled and there are no plans to change that, at least for now.

Yes. And that would have probably been better. But that's separate issue.

You're trying to do what I do in my toy OS, so I'll share some of my setup and hopefully it's useful. Here's my JSON:

{
  "arch": "x86_64",
  "code-model": "kernel",
  "cpu": "x86-64",
  "crt-objects-fallback": "false",
  "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
  "disable-redzone": true,
  "features": "-x87,-xsave,-mmx,-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-avx,-avx2,-fma,-f16c,+soft-float,+shstk",
  "link-self-contained": {
    "components": [
      "linker"
    ]
  },
  "linker-flavor": "gnu-lld-cc",
  "llvm-target": "x86_64-unknown-none-elf",
  "max-atomic-width": 128,
  "metadata": {
    "description": "Hexeract kernel build environment",
    "host_tools": false,
    "std": false
  },
  "panic-strategy": "abort",
  "plt-by-default": true,
  "position-independent-executables": true,
  "pre-link-args": {
    "gnu-cc": [
      "-m64"
    ],
    "gnu-lld-cc": [
      "-m64",
      "-nostdlib",
      "-Wl,--no-dynamic-linker",
      "-z", "cet-report=error"
    ]
  },
  "relro-level": "full",
  "rustc-abi": "x86-softfloat",
  "stack-probes": {
    "kind": "inline"
  },
  "static-position-independent-executables": true,
  "supported-sanitizers": [],
  "target-pointer-width": 64
}

There's good news and bad news with this. The good news is that it does create a proper PIE kernel, without any need for a special linker script or other shenanigans. It's linked at base address zero, is a DYN elf rather than an EXEC, and has the appropriate PIE flags in the dynamic array.
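If you want your loader to sanity-check that it was handed this kind of image, a minimal sketch is to look at `e_type` in the ELF header: ET_DYN (3) for a PIE, ET_EXEC (2) for a fixed-address executable. This assumes a 64-bit little-endian image and only checks the handful of bytes it needs; `elf_image_kind` and `ImageKind` are hypothetical names.

```rust
const ET_EXEC: u16 = 2;
const ET_DYN: u16 = 3;

#[derive(Debug, PartialEq, Eq)]
pub enum ImageKind {
    Exec,
    Dyn,
    Other(u16),
}

/// Reads `e_type` from a 64-bit little-endian ELF header.
pub fn elf_image_kind(bytes: &[u8]) -> Option<ImageKind> {
    // e_ident (16 bytes) is followed by the 2-byte e_type.
    if bytes.len() < 18 || bytes[..4] != [0x7f, b'E', b'L', b'F'] {
        return None;
    }
    // EI_CLASS must be ELFCLASS64 (2) and EI_DATA must be ELFDATA2LSB (1).
    if bytes[4] != 2 || bytes[5] != 1 {
        return None;
    }
    let e_type = u16::from_le_bytes([bytes[16], bytes[17]]);
    Some(match e_type {
        ET_EXEC => ImageKind::Exec,
        ET_DYN => ImageKind::Dyn,
        other => ImageKind::Other(other),
    })
}
```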

The bad news is that there are still some relocations to be fixed up. These are almost certainly vtables, since those need absolute addresses in memory.

The relocations are fairly easy, at least. There's only one type of relocation in my kernel file - R_X86_64_RELATIVE. If you're already loading enough of the elf file to parse out program headers, then getting the DYNAMIC segment and parsing the RELA tags to get the relocations is pretty easy. While you're at it, you should also look for a GNU_RELRO segment to make sure your page tables have the right permissions once relocation is completed.
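A sketch of the "parse the RELA tags" step, assuming the PT_DYNAMIC segment has already been located and read as an array of `Elf64_Dyn` entries. The tag values DT_NULL = 0, DT_RELA = 7, DT_RELASZ = 8 and DT_RELAENT = 9 come from the ELF spec; the Rust types and function name are hypothetical. Since the kernel is linked at base zero, the DT_RELA value can be treated as an offset from wherever the image was loaded.

```rust
/// Mirrors Elf64_Dyn: a tag plus a value/pointer union (kept as a plain u64 here).
#[derive(Clone, Copy)]
#[repr(C)]
pub struct Elf64Dyn {
    pub d_tag: i64,
    pub d_val: u64,
}

const DT_NULL: i64 = 0;
const DT_RELA: i64 = 7;
const DT_RELASZ: i64 = 8;
const DT_RELAENT: i64 = 9;

/// Location of the RELA table, as an offset from the load base (image linked at 0).
#[derive(Debug, PartialEq, Eq)]
pub struct RelaTable {
    pub offset: u64,
    pub size: u64,
    pub entry_size: u64,
}

/// Returns None if any of the three RELA tags is missing.
pub fn find_rela(dynamic: &[Elf64Dyn]) -> Option<RelaTable> {
    let (mut offset, mut size, mut entry_size) = (None, None, None);
    for d in dynamic {
        match d.d_tag {
            DT_NULL => break, // end of the dynamic array
            DT_RELA => offset = Some(d.d_val),
            DT_RELASZ => size = Some(d.d_val),
            DT_RELAENT => entry_size = Some(d.d_val),
            _ => {} // other tags are irrelevant for a static PIE's relocations
        }
    }
    Some(RelaTable {
        offset: offset?,
        size: size?,
        entry_size: entry_size?,
    })
}
```

Dividing `size` by `entry_size` (24 bytes for `Elf64_Rela`) gives the number of relocation entries to apply.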

Another thing to consider is kernel modules - My OS is a microkernel, but if you're going to have kernel modules then you'll need to deal with a lot more relocation and symbol machinery to make that happen. It's worth making sure your elf handling code is robust in that case.


It is. I don't know. It's just that PIC is about jumping somewhere relative to where you are, while jumping through a variable (vtable) must be something else. If the code can be loaded at address A or address B, then PIC code will still run. But somebody has to tweak that variable (vtable) before running the code, to match whether we are at A or B. Which is, I guess, what happens when a dynamic library is loaded.

Or PIE binary. Yes. And that's precisely what we are discussing here. It was right there, in the very first paragraph of the very first post thus I'm not sure why people started discussing other things…

Oh, good, I understood correctly then.

Perhaps because the title of the thread is about "position independent code".

Some further thoughts:

To answer the headline question: I think it depends what you mean by "truly position independent". If you mean "no relocations", the answer is "not practically". Even the stub kernel I use in my bootloader tests has one relocation, and it's in the panic machinery. That would seem to imply that it's basically impossible to completely avoid them in normal rust code.

If you were to build with no_core, and were very careful in how you implement any global pointers so that they're always in the .bss section and are initialized at runtime, and never used dyn... it could be done? It would be a lot more effort than getting your bootloader to handle a few relocations.
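For the "initialize global pointers at runtime" route, the pattern looks roughly like this (plain `core` rather than `no_core`, and the names are hypothetical). The static is zero-initialized, so it typically goes to .bss and needs no relocation; the address is only computed when `init_globals` runs, which on x86-64 PIE would normally be a RIP-relative lea. Contrast this with something like `static P: &u32 = &X;`, whose initial value is an address and therefore needs a relocation.

```rust
use core::ptr::{addr_of_mut, null_mut};
use core::sync::atomic::{AtomicPtr, Ordering};

/// Some hypothetical global kernel state.
static mut BOOT_COUNTER: u32 = 0;

/// Zero-initialized: no address is baked into the image, so no load-time relocation.
static COUNTER_PTR: AtomicPtr<u32> = AtomicPtr::new(null_mut());

/// Called early at boot: the address is computed at run time, then stored.
pub fn init_globals() {
    // SAFETY: only takes the address; no reference to the static mut is created.
    let p = unsafe { addr_of_mut!(BOOT_COUNTER) };
    COUNTER_PTR.store(p, Ordering::Release);
}

pub fn bump() -> Option<u32> {
    let p = COUNTER_PTR.load(Ordering::Acquire);
    if p.is_null() {
        return None; // init_globals() hasn't run yet
    }
    // SAFETY: p points at BOOT_COUNTER; this assumes single-threaded early boot.
    unsafe {
        *p += 1;
        Some(*p)
    }
}
```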

If your kernel is built properly as a static PIE, you won't have any relocations that depend on symbol resolution or other complex dynamic loading bits. You will not have a GOT - that is only used for dynamic linking. You will have a .data.rel.ro section (and associated segment), with a bunch of simple relocations pointing to it. R_X86_64_RELATIVE is about the most straightforward relocation type there is, and it's the only one my bootloader knows how to handle. I won't say for sure that it's the only type that will ever be emitted for a static PIE, but so far it's the only one I've had to deal with.
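Applying R_X86_64_RELATIVE (type 8) amounts to writing `load_base + r_addend` as a 64-bit value at `load_base + r_offset`. A minimal sketch, assuming the image was linked at 0 and modeling the loaded memory as a byte slice (a real loader would write into the mapped pages); the types and function names are hypothetical, and anything other than R_X86_64_RELATIVE is rejected rather than handled.

```rust
/// Mirrors Elf64_Rela (24 bytes).
#[derive(Clone, Copy)]
#[repr(C)]
pub struct Elf64Rela {
    pub r_offset: u64,
    pub r_info: u64,
    pub r_addend: i64,
}

const R_X86_64_RELATIVE: u32 = 8;

#[derive(Debug, PartialEq, Eq)]
pub enum RelocError {
    Unsupported(u32),
    OutOfBounds(u64),
}

/// Patches every entry in place; entries before a failing one have already been applied.
pub fn apply_relative_relocs(
    image: &mut [u8],
    load_base: u64,
    relas: &[Elf64Rela],
) -> Result<usize, RelocError> {
    for r in relas {
        // ELF64_R_TYPE(r_info) is the low 32 bits.
        let kind = (r.r_info & 0xffff_ffff) as u32;
        if kind != R_X86_64_RELATIVE {
            return Err(RelocError::Unsupported(kind));
        }
        // The 8-byte slot to patch sits at r_offset from the start of the image.
        let start = r.r_offset;
        let end = start.checked_add(8).ok_or(RelocError::OutOfBounds(start))?;
        if end > image.len() as u64 {
            return Err(RelocError::OutOfBounds(start));
        }
        // B + A: load base plus the signed addend.
        let value = load_base.wrapping_add(r.r_addend as u64);
        image[start as usize..end as usize].copy_from_slice(&value.to_le_bytes());
    }
    Ok(relas.len())
}

pub fn read_u64_le(image: &[u8], offset: usize) -> u64 {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&image[offset..offset + 8]);
    u64::from_le_bytes(bytes)
}
```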

I also just want to point out that while vtables are the most obvious culprit, they are not the only place where this can happen. A relocation needs to be created any time you have:

  1. a pointer
  2. in a (ro)data segment
  3. statically initialized with the address of other program code/data

This is because in order to get the true value of that pointer, the compiler would need to know the final loaded address of the code. That is impossible, so you end up with a relocation. For pointers which are determined at runtime, a rip-relative lea will be emitted, so there's no need for a relocation in those cases.

You should be able to see both of these cases in your code: the vtable itself needs relocations, but the pointer to the vtable that's used for the dyn Trait metadata will be created with a rip-relative lea.
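A small host-side sketch of the `dyn Trait` side (names are hypothetical): on current rustc, `&dyn Trait` is two pointers wide, data plus vtable. Building something like this as a PIE and looking at the assembly, or at `readelf -r` output for the result, should show the lea for the vtable pointer and a relocation for the vtable's contents, though the exact output depends on compiler version and optimization level.

```rust
use core::mem::size_of;

pub trait Greet {
    fn greet(&self) -> &'static str;
}

pub struct English;

impl Greet for English {
    fn greet(&self) -> &'static str {
        "hello"
    }
}

/// Returns a trait object. The vtable-pointer half of this fat pointer is typically
/// materialized with a RIP-relative lea in a PIE build; the vtable it points to holds
/// absolute function addresses and is what needs relocating.
pub fn make_greeter() -> &'static dyn Greet {
    &English
}

/// Size of `&dyn Greet` in machine words.
pub fn fat_pointer_words() -> usize {
    size_of::<&dyn Greet>() / size_of::<usize>()
}
```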


Yes. And the question is: how can dyn Trait be implemented in “position independent code”?

In what is usually called “position independent code” they are supported by storing absolute addresses — and then adjusting them at loading time. Because modern CPUs don't offer any other simple alternative.

Whether the term “position independent code” is still suitable for that construct is less relevant than the question of how to either support it or change it.

Supporting it is relatively easy: you just need [a very simple] linker that handles relocations, as @branan explained. A real, full-blown dynamic linker also has that code: it first relocates itself, then later does another pass that makes functions in the linker available to libc (and other libraries). Quite a PITA, but ultimately still easier than trying to change the whole toolchain to have a special mode just for one binary (such things were done before, but it's not clear if they are still justified today).

Avoiding that… is another avenue, but I'm not really even sure where to start: any jump table would have to have absolute addresses stored somewhere, I'm afraid. Or, alternatively, instead of using the CPU-provided CALL (or “branch with link”) instructions, we could handle everything manually.

Not impossible, actually; I worked with such a system many years ago. But it requires significant changes in the compiler and linker, and it's not clear that a simple OS is worth it.

I guess the idea is that PIC is just to improve loader performance by reducing the number of relocations?

If you truly want no relocations you'd theoretically need to have only one section, since the loader might need to deal with arbitrarily large pages when mapping your sections...

Oh I get it. What the modern world calls "Position Independent Code" actually means code that is NOT position independent but needs fixing up at run time.

Obviously I'm out of touch.

Code doesn't need fixups. Data needs fixups. This may sound like a trivial distinction, but it's important: it means that code may be shared (but data has to be unique… it needs to be unique anyway, because globals can be changed, you know).

It's less about loader performance (although I'm sure on some architectures it helps), and more about memory usage. Without PIC, every process would have to relocate shared libraries by fixing up addresses in the code segments. This means that every process would need to have its own copy of the code pages, since they're all different.

With PIC, all the fixups are contained to a few global tables, so all of the code pages can be shared read-only across every process that uses a shared library. Only the explicitly-relocated pages (and any writable data pages) need to be duplicated per-process.

PIE goes a step further and lets the program assume that it won't be loaded as a library, so it can avoid those global tables for any of its internal function calls and directly use pc-relative addressing.

EDIT: But important in the context of this thread is that even in PIE mode, vtables are still global tables of function pointers and thus need to be fixed up, even though the PIC global offset table (GOT) isn't used for a static position-independent executable.

Sections don't really exist at runtime in ELF, so I'm going to assume you meant segments.

One segment isn't going to necessarily avoid runtime relocations, since that segment could still have static data pointers that would need fixing up. Conversely, multiple segments can easily exist side-by-side with no relocations so long as they contain no static pointers. The relative offset from one segment to another is part of the ELF structure, so position-independent code can use PC-relative addressing to call code or read data across segments.

I'm not sure what you mean by "arbitrarily large pages"?


Gah, sorry, yes, that's PE terminology poisoning my brain. PE uses "section" for both, confusingly.

Yeah, it's not the only cause, but I do think it is required in theory.

My understanding is that this is the assumed offset if the segments are all loaded at their preferred virtual addresses (relative to the image base virtual address), and that relocations for references across segments are required/allowed (depending on your PoV) to support the loader mapping them differently, for example if the page size is larger than the alignment of the preferred virtual address.

Theoretically, the size of the page is an arbitrary choice by the operating system at runtime, including at the process level or even lower. In practice for a bunch of reasons (including CPU hardware!), it tends to be pretty much fixed to 4kB outside carefully administered closed systems like database servers.
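Whatever the page size, a loader that maps file pages directly needs each segment's `p_vaddr` and `p_offset` to agree modulo that page size. A rough sketch of that check (`LoadSegment` is a trimmed-down, hypothetical stand-in for `Elf64_Phdr`, and the function names are made up): a layout that is fine with 4 KiB pages can stop being directly mappable with 64 KiB pages, which is the situation where the loader would have to map segments differently.

```rust
/// Trimmed-down stand-in for a PT_LOAD program header (hypothetical, not the full Elf64_Phdr).
pub struct LoadSegment {
    pub p_offset: u64,
    pub p_vaddr: u64,
}

/// Rounds `x` down to a multiple of `page` (which must be a power of two).
pub fn align_down(x: u64, page: u64) -> u64 {
    x & !(page - 1)
}

/// Can this segment be mapped file-page-to-memory-page with `page_size`, assuming the
/// image base itself is page-aligned? Offset and vaddr must agree modulo the page size.
pub fn mappable_with_page_size(seg: &LoadSegment, page_size: u64) -> bool {
    assert!(page_size.is_power_of_two());
    seg.p_vaddr % page_size == seg.p_offset % page_size
}
```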

Linux (and others) often calls them "huge pages"