How to unwind stack like C setjmp/longjmp or C++ throw

Perhaps, after you've been programming as long as I have-- (since 1979 when I coded z80 assembler on my TRS-80 I got after painting my parents' cottage, lol)-- you learn what you thought was abnormal at one time can surprise you in its normality once you run into the problems that make it the elegant solution. I tend to believe it is the problems you solve that dictate normality, not the strategies you use to solve them.

A point where you make a call into a deeply recursive function where it is known that only stack allocations (i.e. not heap) will occur represents a point in the code to which it may be desirable to return control when a certain condition is met and at an arbitrary depth in the recursive call tree. And you may need to do so as efficiently as possible, and avoid the overhead of a return and frame adjustment at each call point. Hypothetically speaking now, as a formal part of the language, it's possible that an artifact could be introduced (analogous to lifetimes) with which the compiler could make use for generating an optimal execution path. (I'm not proposing this, but I introduce it as just a thought experiment to illustrate an idea.)

To gain some historical perspective it's useful to consider Lisp, which has been around longer than C and many constructs that modern languages like to boast as their best features were first implemented in Lisp. To that end we find that common Lisp (CL) has a longjmp like construct called "tag-body and throw" which solves the problem I mention elegantly. Having to unwind the stack in order to restore (or rollback) the execution flow is a real problem, worth solving and optimizing for, and is far more than what you are characterizing as the "mother of all gotos."

That said, Prolog, as a fundamental part of the language has a built-in feature called back tracking which automatically generates "return points" to which execution must resume more efficiently than an unwind of each call would be desirable. This is, again, because it solves a problem produced by deep recursion.

Rust declares itself as a systems programming language and that says to me (for good or bad) that I would ideally like to use it to replace C in all cases. Therefore the problem landscape is very coarse and offers many hurdles to which the best solution, you may, at first glance, deem abnormal, but later a reasonable one.

I may be wrong of course, but if in doubt I would prefer to do what I can do in C rather than some other-- perhaps more "normally" viewed conventional approach that could be more expensive performance-wise. My current focus is Genetic Programming, after than I hope to create a WAM (Prolog abstract machine) in Rust and ejecting stack accumulated artifacts more efficiently than a call-by-call rewiind will likely be a "problem" to solve at some point.

1 Like

Quite a newbie then :slight_smile:

What then is the problem we are trying to solve? Sounds like the desire to make a call like:

    let result = black_hole(param1, param2, ...)

Where black_hole can run down all kinds of recursion, allocate all kinds of memory and spawn all kinds of threads, claim all kinds of resources, on its way to the singularity at the bottom.

At which point it should hit "bail out". And magically return to the statement after the call, as if nothing had happened. Perhaps with some kind of result.

In assembler that is easy, just a "JMP" instruction gets you back. Likely with a lot of chaos left behind. In C that is easy with setjmp/longhmp, also likely with a lot of chaos left behind. C++ gives us exceptions.

All of them make reasoning about what a program does tricky.

1 Like

Lol--- I often envy the newbies!

In my case currently, your "black_hole" (is that what you meant to call it and not back_hole?-- I like that!) would be great. I had thought a similar idea actually as a way to "tag" your call so that a "bail" is possible. (Not the mother of all gotos, but the mother of all returns!)

I don't know if that captures all the cases that the Lisp tag-body and throw / setjmp/ longjmp captures, but it's better than a goto that's for sure. (I agree gotos-- simple branches without meta information are bad and no doubt would quickly lead to spaghetti code!)

Yes, "black hole" ... "singularity", get it? Your alternative sounds horribly obscene.

Oddly enough, since I started programming, in 1974, I have never felt the need for that black hole and bail out. Not setjmp/longjmp, not exceptions, whatever.

What I have often wanted is for a way for threads to be able to instantly commit suicide or be killed off as if nothing had ever happened.

Perhaps that is logically the same problem...

Are the details of this documented anywhere? (Not looking to utilize it, just curiosity).

This may be a cultural difference, and I certainly can't speak for @CKait here. setjmp and friends are certainly a very ad-hoc way to approach it, but there are languages out there with control-flow primitives that accomplish similar goals and where the culture around the language engages with them as a way to construct and explore new structured control flows.

I'm thinking particularly of call/cc here, and its less-powerful delimited-continuation equivalents, which are widely used in Scheme for expressing computations that may complete abruptly. It's rare for code to use those primitives directly, though - the pattern is usually to build a higher-level flow abstraction ("exceptions," say) on top of them, and then to call that from application code.

Rust isn't easily amenable to that kind of control-flow abstraction, in large part because Drop is hard to express in those terms, but it's not unreasonable to want to explore alternatives to Rust's existing "return or panic" rules.

This is due to SEH. Windows has a unified way of handling unwinding, and Rust panics, C++ exceptions, and longjmp hook into the same thing.

4 Likes

1974? Very nice. In the 80's I worked with several people from your era during summer jobs at a place called Digilog in Montgomery county PA, doing and CP/M 8080 drivers and MP/M-86 assembler. Invaluable experience learning from these guys. Sadly have completely lost touch. Those were the days of bulletin boards so perhaps you may know some of them through that.

btw/ That obscene name was not a suggestion--- it was simply a "back-at-cha" for your "mother of all goto's". How's mine was any worse? Lol.

I have considered asking about how a thread could commit suicide here-- but that constrains the case a little bit more than desirable.

Here's a possible use case you may have run into. Imagine a very deep binary tree that you must search thousands of times a second and you do this using recursion and you cannot expect the compiler to implement tail call recursion very well ( a thing I asked previously here and learned I cannot with Rust. ) So at some arbitrary depth when your result is found you must do a return and with the stack an average depth of 1000-- That would be 1000 stack pops when all that really must happen is a jump and stack pointer adjustment.

Even if the compiler did tail call recursion well, there could be something about the recursive call that might prohibit that. Some real world side effect you cannot code around. That's pretty close to the case I've got here. In Genetic Programming you must generate thousands, or even millions of S-Expressions-- that are basically B-Tree structures and interpretation looks a lot like this.

I have not written a Prolog WAM compiler ( yet ), but when I do I expect the optimization problems to look similar.

Correction 1973. I was 17 in second year of technical college. We had no computer, only a single teletype linked by phone to a far away mainframe. We learned BASIC and assembler as part of a Computer Science course, the rest was computer architecture, statistics and numerical analysis. I thought it was wonderful. CP/M was another year or so in the future. The first I heard of micro-processors was when some friends were ordering 8080 chips or some such from the USA.

Anyway, if I remember correctly people managed pretty well without recursion back in the day. One just has to write a loop and maintain ones own stack or stacks to keep track of where one in in the problem. I got he impression that is how recursive descent parsers/compilers were built.

Thank you. Good comments.

I think it's best to focus here on the problem not the solution. I'm not looking for a longjmp if there is a better way to solve the problem in Rust. Since I don't think that is available it was natural for me to describe how I solved the problem with C. In no case am I trying to argue longjmps are pretty. Goto's are ugly too, and as ZiCog pointed out longjmps, are Gotos. And I am not advocating that Rust adds goto's!

On the other hand I would say C has goto's not because they are pretty, and not to satisfy people who don't know better, it's just sometimes they are necessary because there simply is no other elegant way that provides the performance needed. I think that's what makes it a systems programming language. It's there to replace assembler where jumps are more common than calls.

Rust has "unsafe" and I don't think anyone would argue that they ever want to use it. But if my choice is either to use an unsafe block, or something as ugly as a goto, I would choose the latter every time.

btw/ I must also invest the effort to document it very well, and add perhaps other coding constructs around it so that it is always handled with care by future programmers.

As you're saying ( I did also ) there are indeed other languages that solve this problem. I mentioned Prolog and Common Lisp. It's just built into Prolog, and in CL you need to be explicit.

As a complete side. Rust is a great thing. I think it's the first language that really can replace C. Problems calling for setjmp / longjmps are probably always going to be corner cases-- Things that just happen from time to time in this cruel world. I also think Rust will evolve to solve it without the need for unsafe code blocks. ( Perhaps the optimizer will just "do" it right, but I'm thinking maybe that's not possible, and it would require (tags or) something that look more like lifetimes. ) Maybe a way to tag a call as a "black_hole with bailout", as I think ZiCog suggested -- which might mean the compiler would reject anything in recursive tree that would not permit a "fast return" to the root level.

You can convert a recursive solution into an iterative one. If you like, you can even emulate an unstructured GOTO in Rust using loops.

I wonder how much that is true now a days. The optimisations compilers can do today are amazing.

In many decades I have only gotten one "GOTO" accepted into any serious code that was subject to review. It was in the name of performance in an interrupt handler that fired off quite a lot in the days when processors were very much slower. I was very proud of that GOTO :) Goto is used well in the Linux kernel. Only partially for reasons of performance I think. It is used very well there is quick and orderly error returns from functions cleaning up as necessary on the way out. Much easier to comprehend than a mess of loops and conditionals and so on.

On a whim I counted the goto's, steampunk and longjmp in the Linux source:

➜  linux git:(master) ✗ grep -r goto * | wc -l 
  176611
➜  linux git:(master) ✗ grep -r setjmp * | wc -l 
     143
➜  linux git:(master) ✗ grep -r longjmp * | wc -l 
      78

Intersting. Seems setjmp/longjmp don't find much use there. Actually, looking at that, does it not imply that half the setjmp are not used?!

Anyway, what really are the overheads when bailing out of a recursion in some massive tree search? Is it really significant enough to worry about?

I'd love to see an example.

https://crates.io/crates/colossal

Rust is necessarily a little more verbose, but main.rs looks almost like the original FORTRAN IV code it is based on. More importantly, it functions almost identically.

Yes of course, but-- I don't like-.- lol

So, I mentioned Prolog-- it's a fun language to become proficient in and once you do it leaves a permanent effect on you. Well, for me it did. It took me a whole summer (and then some) though. Anyhow, once you do a fair amount of Prolog, recursion becomes the natural way of looking at things.

If you've ever studied a grammar notation like, BNF-- or yacc syntax-- you'd get an idea what I'm talking about. Recursion is everywhere there too. I suppose the could use loops there too. But uggh, who would want that.

( swi-prolog is awesome. couldn't resist giving Jan the maintainer a plug!)

Prolog is not everyone's cup of tea. I know.

Indeed it is.

Those who want to have control over their memory usage. Rather than delegate it to the language/OS and hope it all works out.

For most of my career recursive solutions were not allowed. For exactly that reason in embedded systems.

I have even worked on a PCB CAD system where routining, an inherently recursive algorithm, was done without using recursion in the language. You can still buy it here: CADSTAR | eCADSTAR. Although I guess recursion in the language is now used, since it has all been rewritten in C++.

I don't think so, not running destructors is completely safe (e.g. mem::forget) and unsafe code cannot rely on the execution of a drop implementation for soundness.

From one of the OP's original links:

UB by circumventing the borrow checker.

2 Likes

Borrow checker is of a course problem. It is also one of the reasons why we don't have goto. I thought that you was talking about running drop implementations while jumping, which panic does while unwinding.

There's actually a way around this: put it on the stack. If a value is on the stack, you're guaranteed that one of three things is true:

  • The stack frame is never freed, because the program is aborted or a lower stack frame never returned
  • The value is dropped during panic unwinding
  • Control flow is returned to your stack frame, and you control what happens to the value

This is relied on for soundness, by things like pin_mut! and closure-based scoped threads.

Everyone here would likely benefit reading the C-unwind RFC, as it discusses when it is even potentially sound to unwind over Rust frames by anything other than a Rust-controlled panic.

5 Likes