Can Rust create truly position-independent code?

Nope, that's wrong. That's why Android mandates support for 16K pages now.

If your ideas had some relationship to reality then these changes wouldn't have been needed at all.

If the page size is larger than the alignment of the preferred virtual address then you need to switch to a CPU mode with smaller pages.

Huge pages are an entirely different thing: they're a way to reduce the number of levels in the multi-level mappings. And they are not per-process but per-mapping: there is a mark in the higher-level page table saying that the next level doesn't exist, and that's it.

4KiB pages vs 16KiB pages is an entirely different thing: these are per-process… e.g. on macOS pages are 16KiB for native ARM apps and 4KiB for x86-64 apps.

For ELF (on Linux) specifically, that isn't the case. There's only a single base address used for relocating an image, and segments are all loaded at their offset from that base. The granule of relocation is the object file, not the segment. From my minimal experience with PE on UEFI, I believe what you're saying is correct for PE environments (at least sometimes)? But that's not my area of expertise. And I have no comment on mach-o whatsoever.

This means that it's common to build aarch64 executables with 64K segment alignment even if your current target environment only has 4K or 16K pages, since it's always possible to load an over-aligned executable, but not an under-aligned one.
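To make the over-aligned vs under-aligned point concrete, here is a minimal sketch of the check involved. The function name `can_load` and the plain-integer model are assumptions for illustration only; a real loader checks a lot more than this.

```rust
/// Returns true if a segment aligned to `segment_align` bytes can be mapped
/// on a system whose page size is `page_size` bytes.
/// Hypothetical helper, not any real loader's API.
fn can_load(segment_align: u64, page_size: u64) -> bool {
    // Both values are expected to be powers of two, so "over-aligned" simply
    // means the segment alignment is a multiple of the page size.
    segment_align.is_power_of_two()
        && page_size.is_power_of_two()
        && segment_align % page_size == 0
}

fn main() {
    const K: u64 = 1024;
    // A 64K-aligned image can be mapped with 4K, 16K and 64K pages.
    for page in [4 * K, 16 * K, 64 * K] {
        println!("64K-aligned, {}K pages: {}", page / K, can_load(64 * K, page));
    }
    // A 4K-aligned image cannot be mapped correctly with 16K pages.
    println!("4K-aligned, 16K pages: {}", can_load(4 * K, 16 * K));
}
```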

Fair enough!

An attempt to do that would make the whole PIC vs non-PIC difference pointless: PIC works precisely because the distance between the code segment and the data segment is constant. This allows the code to be unchanged when the whole construct is moved in memory.
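A small sketch of why a single load base keeps that distance constant. The `Segment` type and the addresses are made up for illustration; this is a simplified model, not a real ELF loader.

```rust
/// A loadable segment, described by its link-time virtual address.
/// Simplified, hypothetical stand-in for an ELF PT_LOAD entry.
struct Segment {
    name: &'static str,
    vaddr: u64,
}

/// Run-time address of a segment when the whole image is loaded at `base`.
fn loaded_at(base: u64, seg: &Segment) -> u64 {
    base + seg.vaddr
}

/// Distance from code to data, as PC-relative code would see it.
fn code_to_data(base: u64, code: &Segment, data: &Segment) -> u64 {
    loaded_at(base, data) - loaded_at(base, code)
}

fn main() {
    let code = Segment { name: "code", vaddr: 0x1000 };
    let data = Segment { name: "data", vaddr: 0x2_0000 };
    // Two arbitrary load bases: the absolute addresses differ,
    // the code-to-data distance does not.
    for base in [0x40_0000u64, 0x7f00_0000_0000] {
        println!(
            "base {:#x}: {} at {:#x}, {} at {:#x}, distance {:#x}",
            base,
            code.name,
            loaded_at(base, &code),
            data.name,
            loaded_at(base, &data),
            code_to_data(base, &code, &data)
        );
    }
}
```

If each segment were placed independently, `code_to_data` would depend on the placement and the code would need patching after all.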

These are different: there the loader is part of the OS and has special superpowers. And the notion of PIC code doesn't exist: in theory everything may always be relocated freely, because that was a strict requirement of Win32s, where segments had to be allocated in memory that may be badly fragmented. In practice modern compilers don't produce Win32s-compatible code, but I'm not sure what the exact requirements are (are they even documented these days?).

ASLR is implemented radically differently: because the loader is in the OS it can [try to] place all libraries at some random-yet-shared place in memory. That allows sharing of pages between processes, if not between processes and HDD/SSD[1].


  1. And that problem was solved, back in the day, in a way that reads as both ingenious and horrendous these days: it still allows one to discard the code segment, but explains why the loader has to be part of the OS in the Windows world…

I do know what you mean.

In the case of vtables and such though the only reason for their existence is to direct the flow of code and they are fixed once the code is started. Effectively they are code. As far as I can tell for position independent code they would move around with the code blob and contain relative offsets into that blob so that they work no matter where the blob is loaded.

As for that actual global data that would also have to be at a fixed position wrt the code or a base pointer to the globals passed in at run time.

Of course with modern day operating systems with their loaders and dynamic linkers fixing up addresses and such none of what I'm saying necessarily holds. But I often live in machines and environments where there is no such OS or loader magic. Which sounds like the situation our opening poster is in with their kernel development.

Perhaps in another universe… with a different development of literally everything this may work like that.

Maybe, but they are also a way to add some dynamism to the whole thing. The keyword dyn in dyn Trait was added for a reason: it's dynamic dispatch… and when you have two different DSOs it may not even direct you to the code in your module; it may be inherited from another module, too!

Saying “it's simply just code” in that case is definitely an oversimplification…
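For reference, a minimal sketch of what dynamic dispatch through `dyn Trait` looks like. The `Greeter` trait and both types are invented for the example; the point is only that the callee is selected at run time through the vtable pointer carried by the trait object.

```rust
trait Greeter {
    fn greet(&self) -> &'static str;
}

struct English;
struct French;

impl Greeter for English {
    fn greet(&self) -> &'static str {
        "hello"
    }
}

impl Greeter for French {
    fn greet(&self) -> &'static str {
        "bonjour"
    }
}

/// The concrete `greet` is not known here at compile time: it is looked up
/// through the vtable pointer stored in the `&dyn Greeter` fat pointer.
fn call(g: &dyn Greeter) -> &'static str {
    g.greet()
}

fn main() {
    let greeters: Vec<Box<dyn Greeter>> = vec![Box::new(English), Box::new(French)];
    for g in &greeters {
        // `&**g` turns `&Box<dyn Greeter>` into `&dyn Greeter`.
        println!("{}", call(&**g));
    }
    // A trait-object reference is two words: data pointer plus vtable pointer.
    println!(
        "&dyn Greeter: {} bytes, &English: {} bytes",
        std::mem::size_of::<&dyn Greeter>(),
        std::mem::size_of::<&English>()
    );
}
```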

No. The opening poster is in a different situation: they just need to do relocations like it was done in PC DOS 1.0.

Yes, even these extra-primitive, trivial systems back then already did most of these tricks. Even CP/M and OSes from the 1960s included them! They were just done in the system generation phase.

Fascinating. Would be interesting to know more about that. Something even more primitive than PC DOS 1.0 or CP/M sounds rare, indeed.

Do you even have something that is “position independent” there, or is it the banal case of everything having a fixed address (and thus being resolved with a static linker, not a dynamic linker)?

Because I know of no OSes (or OS-less environments) that have support for what the opening poster had in mind: something that you may put at any random place in memory and simply execute… usually the fixups and a loader are there; it's just that sometimes they are bundled together with the executable.
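A rough sketch of what such a bundled loader boils down to. The fixup format here (a list of byte offsets of 8-byte little-endian slots that hold link-time addresses) is invented for the example and is not the format of DOS, ELF or PE; `apply_fixups` and `read_u64` are hypothetical names.

```rust
/// Read an 8-byte little-endian value at `off`. Panics if out of bounds.
fn read_u64(image: &[u8], off: usize) -> u64 {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&image[off..off + 8]);
    u64::from_le_bytes(bytes)
}

/// Add `base` to every slot listed in `fixups`.
/// Each fixup is the byte offset of a slot holding a link-time address.
fn apply_fixups(image: &mut [u8], fixups: &[usize], base: u64) -> Result<(), String> {
    for &off in fixups {
        let end = off
            .checked_add(8)
            .ok_or_else(|| format!("fixup at {:#x} overflows", off))?;
        let slot = image
            .get_mut(off..end)
            .ok_or_else(|| format!("fixup at {:#x} is out of bounds", off))?;
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(slot);
        let patched = u64::from_le_bytes(bytes).wrapping_add(base);
        slot.copy_from_slice(&patched.to_le_bytes());
    }
    Ok(())
}

fn main() {
    // A 16-byte "image": slot 0 holds the link-time address 0x10,
    // slot 1 holds plain data that must not be touched.
    let mut image = vec![0u8; 16];
    image[..8].copy_from_slice(&0x10u64.to_le_bytes());
    image[8..].copy_from_slice(&7u64.to_le_bytes());

    // Pretend the image was loaded at 0x8000 and patch the single fixup.
    apply_fixups(&mut image, &[0], 0x8000).expect("fixups should apply");
    println!("slot 0 = {:#x}", read_u64(&image, 0));
    println!("slot 1 = {:#x}", read_u64(&image, 8));
}
```

Whether this loop lives in the OS, in a boot stub linked into the binary, or on a sheet of paper is the only real difference between the cases discussed here.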

The original Apple 1 ROMs (eg booting into WozMon) maybe counts?

Somehow I think not. I don't think the 6502 processor was suited to position-independent code; worse still, to get performance the infamous "zero page" (the first 256 bytes of memory) was used, because it was accessible with short one-byte addressing.

The 6809 was mentioned. A novel feature of that chip at the time was its relative addressing for calls and jumps, which made PIC easy. It was a great help when we built a debug monitor for it in 1980 or so, and also some applications to load and run from that. We just had to load the code from our C60 cassette tape, serial or paper tape interfaces anywhere there was memory free and jump to it. Ahhh... happy days.

These were manually assembled; the fixups were still there, they were just hand-written on paper.

Yes, that was an interesting design, but… has it actually gone anywhere? OS-9 still, supposedly, exists… but has it retained that ability to load a program anywhere in memory and simply jump there to execute it? Given that the modern rewrite is C-based, I strongly suspect that these days it's done like with every other OS: by using relocations and a loader.

I recall my Amiga-owning friends saying that "absolute addressing is forbidden in AmigaOS programs, because they may be loaded at any address". I never checked to see if this is actually true, but this thread made me do some digging, and apparently it isn't true. The Amiga Loader does relocation patching.

.. and while I was at it, I checked to see if anyone was building Rust for Amiga, which was silly of me to do, because obviously there are people working on Rust for Amiga.

Oh what, there is support for 68000 in Rust?! I have a couple of 68000 chips in a drawer here that I've been meaning to build into an SBC for a couple of years now.

Yes, but if I understand correctly it is a bit buggy. See m68k-unknown-linux-gnu - The rustc book and https://github.com/rust-lang/rust/issues?q=is%3Aissue%20state%3Aopen%20label%3AO-motorola68k

People have done proof of concepts of Rust on classic MacOS, Windows 95, etc. So why not? At this point I would assume all of the popular retro computers from the 90s or late 80s should be covered, or someone is working on it. :laughing:

(Older than that, such as the C64, is more doubtful.)

Yeah, a native compiler is never going to happen, but someone has built Rust against llvm-mos and cross-compiled Rust-to-C64, which is pretty neat.

Thanks for all the details everyone!

I decided to just forgo the relocatable binary/sections and use a linker script to absolutely position the kernel (it's eventually going to be anyway) and add support for virtually mapping the ELF binary to its proper position.

The problem is, after removing the following from my target:

    "position-independent-executables": true,
    "relocation-model": "pic"

And using the link script (added this to .cargo/config.toml):

    rustflags = [
        "-C", "link-arg=-Tmemory_layout.ld",
    ]

And rebuilding... and it's positioned correctly... but now I have a GOT section.

    $ objdump -h target/target.x86_64/debug/kernel | more

    target/target.x86_64/debug/kernel:     file format elf64-x86-64

    Sections:
    Idx Name          Size      VMA               LMA               File off  Algn
      0 .text         0000e0e6  ffffff8000000000  ffffff8000000000  00001000  2**4
                      CONTENTS, ALLOC, LOAD, READONLY, CODE
      1 .got          00000028  ffffff800000e0e8  ffffff800000e0e8  0000f0e8  2**3
                      CONTENTS, ALLOC, LOAD, DATA
      2 .data         00003664  ffffff800000e110  ffffff800000e110  0000f110  2**3
                      CONTENTS, ALLOC, LOAD, DATA
    etc...

But based on what I'm reading above, it seems like it shouldn't (?).
If I'm explicitly specifying a starting virtual address for my binary, why is a section emitted for supporting PIC?
Or is that simply for if the loader needs to load it elsewhere? (Meaning if I load it to the correct address then there's nothing I need to do with the GOT, other than load it?)

Thanks,
Jeff

I don't have a direct answer to this (maybe someone else does), but it's probably worth checking the binary's symbol tables to see which specific symbols are allocated inside .got, and the relocation entries to see what specific relocations are to be applied if any - that may be a useful clue for why it's being created at all. Since it's only 40 bytes there's probably only one or two things in it.

Not relevant if you're not using PIC any more, but aside on PIC + vtables...

Clang actually has an option -fexperimental-relative-c++-abi-vtables which represents vtable entries as offsets relative to the beginning of the vtable itself. This makes the calling code slightly more complex to turn the offset into an absolute address, but removes the need to have relocations for vtable entries.

This isn't ABI-compatible with "normal" C++ libraries (though it is used as the platform ABI for Fuchsia), but if your program statically links any C++ dependencies and only dynamically links to libraries with a C ABI then you can use it, and Chromium does this in some configurations to significantly reduce the number of relocations and the load-time cost of applying them. Chromium has a large number of large vtables from things like C++ classes which mirror the type hierarchy and general interface of the DOM.

It looks like someone was interested in supporting this in Rust as well!
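To illustrate the idea behind relative vtables, here is a toy model using plain integers as addresses. It is not how Clang or rustc implement it; `encode` and `resolve` are hypothetical names, and the 32-bit entry width is an assumption of the sketch.

```rust
/// Encode a function address as a 32-bit offset from the vtable's own address.
/// Returns None if the target is too far away to fit in an i32.
fn encode(vtable_addr: u64, target: u64) -> Option<i32> {
    let delta = target.wrapping_sub(vtable_addr) as i64;
    if delta >= i32::MIN as i64 && delta <= i32::MAX as i64 {
        Some(delta as i32)
    } else {
        None
    }
}

/// Turn a stored offset back into an absolute address at call time.
/// This is the extra work the calling code has to do.
fn resolve(vtable_addr: u64, entry: i32) -> u64 {
    vtable_addr.wrapping_add(entry as i64 as u64)
}

fn main() {
    // Made-up link-time layout: two methods and one vtable.
    let (method_a, method_b, vtable) = (0x1000u64, 0x1040u64, 0x5000u64);
    let entries = [encode(vtable, method_a), encode(vtable, method_b)];
    println!("entries at link time: {:?}", entries);

    // Load the whole image somewhere else: everything moves by the same amount.
    let shift = 0x7000_0000u64;
    let moved = [
        encode(vtable + shift, method_a + shift),
        encode(vtable + shift, method_b + shift),
    ];
    // The stored entries are identical, so nothing needs to be patched.
    println!("entries after moving: {:?}", moved);
    println!("method_a resolves to {:#x}", resolve(vtable + shift, -0x4000));
}
```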

This still doesn't guarantee no relocations: anything in .rodata or .data that is a compile-time-initialized pointer to something else in the binary still needs a relocation entry. The compiler can't easily avoid this in the same way, because it would need to modify any generated code that accessed the pointer in any way, and doing that correctly for every conceivable way that it could be used seems impractical (e.g. what if you memcpy a struct that contains such a pointer-represented-as-offset to some arbitrary location on the heap or stack: you'd probably need to resolve the pointer value as part of the memcpy operation, which is awkward and intrusive at best).
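For a Rust flavour of the same problem, these are the kinds of statics that hold the address of something else in the binary. The names are invented for the example, and whether a run-time relocation is actually emitted for them depends on the target and relocation model, so treat this as an illustration of the pattern rather than a statement about any particular build.

```rust
static ANSWER: u32 = 42;
// Holds the address of another static: the stored value is only known
// once the image's load address is known.
static ANSWER_REF: &u32 = &ANSWER;

fn double(x: u32) -> u32 {
    x * 2
}

fn square(x: u32) -> u32 {
    x * x
}

// A table of function pointers, initialized at compile time:
// conceptually the same thing as a hand-rolled vtable.
static OPS: [fn(u32) -> u32; 2] = [double, square];

fn main() {
    println!("ANSWER via pointer: {}", *ANSWER_REF);
    for op in OPS.iter() {
        println!("op(3) = {}", op(3));
    }
}
```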

The GOT is not used just to support PIC. It also makes it possible to use symbols from other libraries (in particular, publicly visible symbols from the main executable may be called from other libraries).

Try to see what symbols are in that .got; then you would be able to tell where they come from. There are only 5 entries, judging from the size… probably from some precompiled code that was compiled with the -fPIC option.

If you haven't rebuilt everything from scratch (including std and compiler support libraries) it's non-trivial to avoid GOT.
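The "5 entries" estimate is just the section size from the objdump output (0x28 bytes) divided by the pointer size on x86-64. As a trivial sketch, with `got_entries` being a made-up helper name:

```rust
/// Number of pointer-sized slots in a GOT-like section, or None if the
/// size is not a whole number of slots. Hypothetical helper for the estimate.
fn got_entries(section_size: u64, pointer_size: u64) -> Option<u64> {
    if pointer_size == 0 || section_size % pointer_size != 0 {
        None
    } else {
        Some(section_size / pointer_size)
    }
}

fn main() {
    // .got size 0x28 from the objdump listing, 8-byte pointers on x86-64.
    println!("{:?}", got_entries(0x28, 8));
}
```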

I do not know the Rust details about Trait Objects and why their vtables would strictly require absolute addresses. But LLVM has a C++ ABI for vtables with relative addresses according to these sources:

But it's not widely supported.