Catching panic through C code, how bad is it?

The documentation for catch_panic says:

It is currently undefined behavior to unwind from Rust code into foreign code

but unfortunately that's exactly what I wanted to do. libjpeg expects its error handler callback to never return, so I make it panic on error via something like this:

catch_panic(|| ffi::c_function(panicking_rust_callback));

It appears to work fine on x86_64. How big a risk is it if:

  • I'm mixing with only plain C, not C++
  • setjmp is not used
  • catch_panic + panic are always called on the same thread, same stack (there are no async callbacks).

?

Side note: maybe use another thread and/or a global variable to catch such errors?

I already do, but this is not a solution. The design of libjpeg forces error handlers to either unwind the stack or abort the program. I don't want to abort the program, so stack unwinding is the only other option I have. That's because libjpeg has lots of code like:

if (!foo) {
    error_callback();
}
foo[0] = 1;

so if my callback ever returns, the program will crash or do unsafe things.

Sounds like it's time for a new jpeg library.

That is an unhelpful answer.

I am in fact designing a new Rust JPEG library, but to get something working first I want to leverage a few of libjpeg-turbo's functions, since Rust doesn't have SIMD support yet.

Still, none of that is relevant to the question of what the consequences are of unwinding through C stack frames in a controlled situation.

It's a really unsatisfying answer, but your best bet will probably be to print a really descriptive error message then abort. I'm pretty sure libbacktrace can also be used to generate backtraces for any ELF executable, so that may be useful during development.

The big issue is that unwinding across the FFI boundary is UB and even if it appears to work on your machine, you're betting that the behaviour will always be the same. It's kinda like how you can return a pointer to a local variable in C and still read the variable just fine even though that stack frame has been popped. Sometimes it works, but other times you'll get garbage or leave your application in an indeterminate state (or demons could fly out of your nose).

I don't doubt it's UB in general, e.g. if the FFI language has a non-standard stack. And I get that it would be really bad for a foreign language with destructors (maybe even if the stack goes in and out of FFI multiple times).

But what can go wrong in the case of C, when the stack is Rust -> C -> Rust and C is compiled with about the same version of LLVM?

I'm not actually sure, to be honest. I can't say I've ever tried something like that, mainly because all the FFI resources out there say it's really bad (even across the C/C++ boundary with exception safety), and I once segfaulted when a Rust function I was calling from Python panicked.

You might want to write a dummy Rust program which calls some C which calls into Rust again and then step through the unwinding with gdb to see what actually happens. Also, is the stack layout for Rust and C guaranteed to be the same on x86_64?

Sorry I can't help you much more than that! Someone on the compiler team (like @nikomatsakis or @nrc) might be more knowledgeable than me though. Keep us informed on how your experiments go, I'm really curious to find out what actually happens when you unwind across the FFI boundary!

EDIT: I got curious and found an issue on the Julia language repo about unwinding from C++ into Julia. It may help answer some of your questions.

This is probably because libjpeg may now be in an inconsistent state and can't continue. So, if you do do this, you had better unload and reload libjpeg (as a dynamic library).

For the actual "unwinding", you can implement this yourself with a setjmp and longjmp in C.

Yes, I was going to say the same -- write the error handler in C, so it is a stack of Rust->C->C->C unwinding only within the C part.

On some targets it will work fine; on others your program will abort because the Rust unwinder will not be able to locate stack unwind info for C code. As an example, I'd expect that x86_64-pc-windows-msvc is going to be among the former, while i686-pc-windows-gnu will be one of the latter.

I think that the safest option in your case (for some definition of "safe") might be C's own setjmp/longjmp. Preferably without crossing the language boundary (i.e. create C wrappers around libjpeg functions that do the setjmp and return an error code to Rust caller), though calling setjmp from Rust code might work too. Of course, this will leak all sorts of resources, but that's a given for this sort of stuff.
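
The wrapper idea above can be sketched roughly as follows. This is a minimal illustration against a stand-in library, not the real libjpeg API: `fake_err`, `fake_decode`, `jump_out`, and `wrapped_decode` are all hypothetical names. The point it tries to show is that the `longjmp` starts and ends inside C frames, so the Rust caller only ever sees an ordinary return code.

```c
#include <setjmp.h>
#include <stddef.h>

/* Stand-in for a libjpeg-style error manager (hypothetical type): the
   library calls error_exit and assumes the call never returns. */
struct fake_err {
    void (*error_exit)(struct fake_err *err);
    jmp_buf escape;   /* where the wrapper wants to land on error */
};

/* Stand-in for a library routine. Like the code quoted earlier in the
   thread, it does not check whether error_exit returned. */
static int fake_decode(struct fake_err *err, const unsigned char *data, size_t len) {
    if (data == NULL || len == 0) {
        err->error_exit(err);   /* must not return */
    }
    return data[0];
}

/* Error handler written in C: jumps back to the wrapper instead of returning. */
static void jump_out(struct fake_err *err) {
    longjmp(err->escape, 1);
}

/* The function a Rust caller would declare as extern "C".
   Returns 0 on success, -1 on a library error. */
int wrapped_decode(const unsigned char *data, size_t len, int *out) {
    struct fake_err err;
    err.error_exit = jump_out;

    if (setjmp(err.escape) != 0) {
        /* Reached via longjmp: only C frames were skipped. */
        return -1;
    }
    *out = fake_decode(&err, data, len);
    return 0;
}
```

With this shape, no Rust frame sits between the `setjmp` and the `longjmp`, which sidesteps the question of what unwinding through foreign frames does on a given target.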

It depends on the distribution. Some compile all their C code with support for unwinding, others do not. It might be safer to use a small C wrapper with setjmp and longjmp, but the overhead might be prohibitive.

Another alternative would involve bundling libjpeg and recompiling it with -fexceptions. If libjpeg indeed assumes that the error handler callback never returns, it should be able to cope with the stack unwinding without resource leaks.

Because we make no guarantee at all here. UB is UB, even if "it works" in some circumstances. Doing this is inherently playing with fire.

Indeed, great idea. Why not compile the C code of libjpeg-turbo with a C++ compiler, throw an exception inside the callback, catch it in a C++ wrapper around the corresponding function, and return an error code to Rust?

Oh, that's an interesting hack. Thank you for the suggestion.

Can you afford to call exit(1)?

Beware that if you catch an exception or panic, it may not be safe to call into that FFI library anymore. It usually takes some effort to maintain exception safety, which is why Rust has UnwindSafe.
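
One possible way to act on this warning in a C wrapper is to mark the handle as unusable once an error has escaped, so that later calls fail fast instead of touching state that may be half-updated. The sketch below is only an illustration under that assumption; the `codec` type and the `codec_init`, `codec_step`, and `library_step` functions are hypothetical and not part of libjpeg.

```c
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical wrapper handle around some library state. */
struct codec {
    jmp_buf escape;
    bool poisoned;      /* set once an error has jumped out of the library */
    int rows_done;      /* stand-in for internal library state */
};

/* Stand-in for a library step that bails out halfway through an update. */
static void library_step(struct codec *c, bool fail) {
    c->rows_done += 1;              /* state is already modified... */
    if (fail) {
        longjmp(c->escape, 1);      /* ...when the error path fires */
    }
}

void codec_init(struct codec *c) {
    c->poisoned = false;
    c->rows_done = 0;
}

/* Returns 0 on success, -1 on a library error, and -2 if the handle
   was already poisoned by an earlier error. */
int codec_step(struct codec *c, bool fail) {
    if (c->poisoned) {
        return -2;
    }
    if (setjmp(c->escape) != 0) {
        c->poisoned = true;
        return -1;
    }
    library_step(c, fail);
    return 0;
}
```

Whether a poisoned handle can be reset or must be destroyed and recreated depends on the library in question; the flag only makes the hazard explicit to the caller.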

libjpeg is widely used with setjmp, and uses its own memory pool, so unwinding it is not a problem.
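
To illustrate why a memory pool makes a longjmp escape tolerable, here is a small sketch of the general idea. It is not libjpeg's actual allocator: `pool_alloc`, `pool_free_all`, `failing_work`, and `run_and_cleanup` are hypothetical names. Every allocation is recorded in the pool, so the code that lands after the jump can release everything in one call even though the skipped frames never freed anything themselves.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pool: each allocation is prefixed by a list link. */
struct block { struct block *next; };

static struct block *pool_head = NULL;
static int pool_live = 0;          /* number of blocks currently allocated */
static jmp_buf escape;

static void *pool_alloc(size_t n) {
    struct block *b = malloc(sizeof *b + n);
    if (b == NULL) {
        return NULL;
    }
    b->next = pool_head;
    pool_head = b;
    pool_live += 1;
    return b + 1;                  /* usable memory starts after the link */
}

/* Releases every block the pool has handed out. */
static void pool_free_all(void) {
    while (pool_head != NULL) {
        struct block *next = pool_head->next;
        free(pool_head);
        pool_head = next;
        pool_live -= 1;
    }
}

/* Allocates, then hits an error before it can free anything locally. */
static void failing_work(void) {
    pool_alloc(64);
    pool_alloc(128);
    longjmp(escape, 1);
}

/* Returns the number of pool blocks still live after cleanup. */
int run_and_cleanup(void) {
    if (setjmp(escape) == 0) {
        failing_work();
    }
    pool_free_all();               /* runs on both the normal and error path */
    return pool_live;
}
```

The pool state is kept in static variables here so that nothing local to the function calling `setjmp` is modified between the `setjmp` and the `longjmp`.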