Reliable cross-platform way to fetch context/individual registers?

Thought I'd take a shot at implementing my own garbage collector. One of the requirements for a conservative implementation seems to involve scanning everything that looks like a root, including the stack as well as any of the CPU's GPRs the compiler might have stored a pointer in.

After a bit of research, there appears to exist a whole bunch of ways to skin this cat:

  1. getcontext for Linux, or GetThreadContext for Win32
  2. Linux-only setjmp, which is not exactly reliable
  3. manual architecture-specific asm! write-up
  4. a cross-platform (?) version of 1.
  5. interrupting the thread with a signal, then 1. again

Option 4 appears to be the right tool for the job. Yet the specifics of how I'm meant to be calling into it elude me somewhat. The function itself expects a *mut ucontext_t, which I can only provide safely through a MaybeUninit, be it zeroed or otherwise. Setting that up is a call of its own, though, and so scrambles the very register values I was looking for to begin with. By the time getcontext returns, what are the chances I'm reading the exact GPR values as they were before the call?

Assuming there is a way around this particular part, what do I do about the stack frame of the very function I call to begin scanning for roots? For instance, at any point in my main program, I may want to run my GC. Thus, I call into GC::collect, which is where I'm fetching GPRs via getcontext or similar. Yet by the time the call is underway and I'm in the frame of GC::collect, the registers are no longer the same as they were before the call.

What I'm looking for are the GPRs at the exact moment I choose to run the GC, not what they are a few frames deeper where I might be making my call into getcontext. Am I supposed to make GC::collect not a function, but an inlined collect! macro instead? That is, record the point in the stack at which I'm about to call into collect, then inside the function itself scan the last few dozen bytes leading up to that point, just to ensure any registers spilled onto the stack in the given prologue/epilogue are still properly scanned and accounted for?
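For what it's worth, the "record a point in the stack, then scan up to it" idea can be sketched roughly as below. This is only an illustration under loud assumptions: the stack grows downward, the functions are not inlined, and reading other frames through raw addresses is tolerated (Rust's memory model does not really bless it). The names scan_stack, holds_root and demo_scan are made up for the example. Note that it only sees values that actually live in stack memory; anything still sitting in a register that the scanning function never spills is missed, which is exactly the concern raised above.

```rust
use std::hint::black_box;
use std::mem::size_of;
use std::ops::Range;
use std::ptr;

/// Counts stack words between this frame and `stack_base` whose value falls
/// inside `heap`. Hypothetical helper; assumes a downward-growing stack.
#[inline(never)]
fn scan_stack(stack_base: usize, heap: Range<usize>) -> usize {
    // The address of a local approximates the current top of the stack.
    let top_marker = 0usize;
    let top = black_box(&top_marker) as *const usize as usize;
    let word = size_of::<usize>();
    let mut addr = (top + word - 1) & !(word - 1); // round up to word alignment
    let mut hits = 0;
    while addr < stack_base {
        // Volatile read: this is some other frame's memory, opaque to us.
        let value = unsafe { ptr::read_volatile(addr as *const usize) };
        if heap.contains(&value) {
            hits += 1; // conservative: anything that looks like a pointer counts
        }
        addr += word;
    }
    hits
}

/// A frame that keeps an integer copy of a heap address in stack memory.
#[inline(never)]
fn holds_root(stack_base: usize) -> usize {
    let boxed = Box::new(42u64);
    let start = &*boxed as *const u64 as usize;
    let hidden = start;
    black_box(&hidden); // force `hidden` into a real stack slot before the scan
    let hits = scan_stack(stack_base, start..start + size_of::<u64>());
    black_box(&hidden); // keep the slot alive across the scan
    hits
}

/// Records a stack base in an outer frame, then scans from a deeper one.
fn demo_scan() -> usize {
    let base_marker = 0usize;
    let stack_base = black_box(&base_marker) as *const usize as usize;
    holds_root(stack_base)
}
```

Calling demo_scan() should report at least one hit for the address parked in holds_root. The count may be higher, since the scan may also pick up other copies of that address (the Box pointer itself, spilled arguments), which is the usual price of scanning conservatively.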

"Talking" to an LLM about this clearly was a mistake. Whatever "intuition" and clarity it might have imparted to me at the beginning quickly turned into a bunch of self-contradicting nonsense: from "you must use getcontext to scan for any pointers in the registers", to "you're absolutely right, machine-specific asm! is the way to go", to "I apologize, you're absolutely right, there's no way to capture the registers at any one moment in any given thread from within itself, as it's only possible by suspending it from outside by a signal, before fetching the context inside the signal's handler".

I simply can't imagine how someone could leave an entire codebase at the mercy of a GPT assistant, at this point. But that has quite little to do with the topic at hand. So, any pointers?

  1. How are you going to distinguish Box<T> (one heap value, one pointer to it) from Vec<T> (many heap values, one pointer to the first)?
  2. You are breaking data structures like XOR-linked-lists (they store prev.bits^next.bits instead of two pointers), or any that use tagged pointers.
  3. Multithreading support?

It's really easier if you apply GC to a subset of objects. (Have you seen Rc/Arc yet? Does your app produce reference cycles?) An example is kyren/gc-arena on GitHub (incremental garbage collection from safe Rust); it even links to a design explainer in the README.

Registers and the stack are all just raw binary; you can't really reliably tell if they contain pointers to something. I recommend you look more into how garbage collectors actually keep track of what pointers exist at any given time.

A conservative garbage collector is one which does not try to tell, but assumes that each piece of data might be a pointer and makes sure it is not collected if it is.
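As a rough illustration of that idea (not any particular collector's implementation), the sketch below treats every scanned word as a potential pointer and maps it back to the block it falls into. The Allocations type and its table of block start addresses and sizes are assumptions made up for the example. It also hints at why the Box<T> versus Vec<T> distinction matters less here: a pointer anywhere inside a block keeps the whole block alive.

```rust
use std::collections::BTreeMap;

/// Hypothetical allocation table: block start address -> block size in bytes.
struct Allocations {
    blocks: BTreeMap<usize, usize>,
}

impl Allocations {
    /// Returns the start of the block that `word` points into, if any.
    fn owner_of(&self, word: usize) -> Option<usize> {
        // Closest block starting at or below `word`.
        let (&start, &size) = self.blocks.range(..=word).next_back()?;
        (word < start + size).then_some(start)
    }

    /// Conservatively marks every block that some candidate word points into.
    fn mark(&self, candidates: &[usize]) -> Vec<usize> {
        let mut live: Vec<usize> = candidates
            .iter()
            .filter_map(|&word| self.owner_of(word))
            .collect();
        live.sort_unstable();
        live.dedup();
        live
    }
}
```

Anything not returned by mark would be considered garbage. A plain integer that happens to equal an address inside a block keeps that block alive too; that false retention is the trade-off a conservative collector accepts.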

So then. This did turn out to be a bit more interesting of a challenge than I had anticipated. For one, there seems to be no single reliable way to scrape all of the registers at any given point. Not without hand-rolling one's own custom assembly for each and every individual platform of interest. Different OS's, different calling conventions, different prologues/epilogues; all make it rather impractical.
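To give an idea of what the hand-rolled route looks like for just one target, here is a minimal sketch for x86_64 that dumps the registers the System V calling convention treats as callee-saved (rbx, rbp, r12 to r15). The function name callee_saved_gprs is made up, and the values are only a best effort: the compiler is free to have reused some of these registers inside the function before the asm! block runs. Other architectures, and the Windows x64 convention (which also preserves rdi and rsi), would each need their own variant.

```rust
/// Best-effort snapshot of rbx, rbp, r12, r13, r14, r15 (in that order).
/// Hypothetical helper; the register set assumes the x86_64 System V ABI.
#[cfg(target_arch = "x86_64")]
#[inline(never)]
pub fn callee_saved_gprs() -> Option<[usize; 6]> {
    let mut regs = [0usize; 6];
    unsafe {
        std::arch::asm!(
            // rax is pinned explicitly so the buffer pointer can never be
            // allocated to one of the registers being recorded.
            "mov [rax], rbx",
            "mov [rax + 8], rbp",
            "mov [rax + 16], r12",
            "mov [rax + 24], r13",
            "mov [rax + 32], r14",
            "mov [rax + 40], r15",
            in("rax") regs.as_mut_ptr(),
            options(nostack, preserves_flags),
        );
    }
    Some(regs)
}

/// Fallback for every other architecture: nothing is captured.
#[cfg(not(target_arch = "x86_64"))]
pub fn callee_saved_gprs() -> Option<[usize; 6]> {
    None
}
```

Caller-saved registers are left out on purpose: if they held anything still live across the call, the caller would already have had to spill it to its own stack frame, where a stack scan can find it.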

For this particular use case, however, turns out I might not need any of it at all. Might package this little experiment into a crate of its own. I'm kinda enjoying the way it's coming together.

Unless you're replacing your own global allocator, I don't believe there's any sensible way there. Boehm's GC does precisely that, under the hood. Unless I really misunderstood what it's about.

Raw pointer scan won't help much with those, true. Although just a few days ago I did learn that D lang's team had to deal with (are still dealing with?) the fact that their own garbage collector wasn't quite smart enough to figure out its way around preserving XOR LL's either. The first few people stumbling upon this must have had quite a fun experience. Tagged pointers are not as tricky, in theory: as long as your scanner takes into account the bit-shifting, you won't have much trouble.
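A small sketch of the difference, with made-up addresses and helper names. It assumes one particular tagging scheme (tag bits kept in the low three bits of an 8-byte-aligned pointer); schemes that shift the pointer or use the high bits would need a different normalization step.

```rust
/// Low bits reserved for tags; assumes allocations are 8-byte aligned.
const TAG_MASK: usize = 0b111;

/// Removes the tag so the word can be compared against real block addresses.
fn strip_tag(word: usize) -> usize {
    word & !TAG_MASK
}

/// Stand-in for the collector's "does this word point at a block?" check.
fn is_block_start(word: usize, starts: &[usize]) -> bool {
    starts.contains(&word)
}

/// What an XOR-linked list node stores instead of two separate pointers.
fn xor_link(prev: usize, next: usize) -> usize {
    prev ^ next
}
```

With blocks at 0x1000, 0x2000 and 0x4000, the tagged word 0x2005 is not recognized as-is, but strip_tag(0x2005) gives 0x2000, which is. The XOR link between 0x1000 and 0x4000 is 0x5000, which matches neither neighbour, and no fixed mask recovers them: the scanner would need to already know one of the two addresses.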

In all honesty, for my personal use case / vision, it would be way too much of a hassle. As well as cognitive overhead. The kind of control I'm going after implies being able to pause and run the collector (to completion or incrementally, in a later iteration) at any given instant. Forcing several threads at a time to cooperate would require access to the stack as well as the thread-local storage of each, in addition to the static memory of the executable as well as the heap itself.

I can imagine this being far easier to account for in a higher-level lang, backed by a VM that can put every thread on temporary hold after N operations, check if the garbage collector is about to run, and submit all of its own memory space into a joint registry which the reclamation process about to execute can then rely on. Yet for Rust, it still feels completely out of place. Not to mention that my main concern with this little experiment was chiefly ergonomic.

Sure have. These might/would have been perfectly sufficient too, if their usage didn't imply clone-ing all over the codebase just to make sure one is not missing out on any tiniest bit of "control" and/or performance considerations one may (initially) not care in the slightest about.

I'd like to support them, yes; for convenience purposes, if nothing else.