Elide side effect free external API calls

Is there a way to mark extern functions as side effect free, so the compiler can elide calls?
Similar to how I think most Linux system calls set the thread-local errno integer variable, many Windows APIs set a thread-local, errno-like variable, which can be retrieved via the side-effect-free function kernel32.GetLastError. I created a wrapper crate for certain Windows APIs, which automatically calls GetLastError in the error case, before the next API call overwrites the error code. The issue is that the call to GetLastError is issued even when the crate user is not interested in the exact error code. I would like to get rid of those calls without requiring the user to retrieve the error code manually.
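To make the pattern concrete, here is a minimal sketch of such a wrapper. It is not the actual crate: a thread-local Cell stands in for the OS error slot so the example runs on any platform, and the names get_last_error, raw_close_handle, close_handle and close_handle_ok are hypothetical.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical stand-in for the thread-local "last error" slot kept by the OS.
    static LAST_ERROR: Cell<u32> = Cell::new(0);
}

/// Stand-in for kernel32's GetLastError: it only reads the thread-local slot.
fn get_last_error() -> u32 {
    LAST_ERROR.with(|slot| slot.get())
}

/// Hypothetical raw API: returns false on failure and records an error code.
fn raw_close_handle(handle: usize) -> bool {
    if handle == 0 {
        LAST_ERROR.with(|slot| slot.set(6)); // 6 = ERROR_INVALID_HANDLE
        false
    } else {
        true
    }
}

/// Wrapper in the style described above: the error code is captured eagerly,
/// before any later API call can overwrite it.
fn close_handle(handle: usize) -> Result<(), u32> {
    if raw_close_handle(handle) {
        Ok(())
    } else {
        Err(get_last_error())
    }
}

/// A caller that only cares about success or failure. The error code is
/// discarded here, so ideally the read in `close_handle` would be elided.
fn close_handle_ok(handle: usize) -> bool {
    close_handle(handle).is_ok()
}
```

With a real extern import in place of get_last_error, the compiler cannot see into the function, which is why the call stays even in close_handle_ok.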


There's an unstable ffi_pure feature for this (enabled with #![feature(ffi_pure)]), but I don't know how well it behaves, as it simply adds an LLVM attribute and relies on LLVM to do the data flow analysis.



Thanks, I tried it out in godbolt with all different -C opt-level values, but it does not seem to elide the call, even though the assembly code discards the return value in eax by zeroing it out with an xor eax,eax, directly after the call returns. I may not be using it correctly?

The tracking issue for the feature links to this document: Common Function Attributes (Using the GNU Compiler Collection (GCC)), which states:

[...] functions declared with the pure attribute can safely read any non-volatile objects [...]

Does this thread-local errno-like variable qualify as non-volatile?

[...] tells GCC that subsequent calls to the [pure] function [...] with the same [parameters] can be replaced by the result of the first call provided the state of the program observable by [the function] [...] does not change in between.

The GetLastError function does not expect any parameters. Does a call to a non-ffi_pure function cause the compiler to understand that the state might have changed, and that it would need to keep the result of a call to GetLastError after the non-ffi_pure function, not before?

I don't think either ffi_pure or ffi_const is appropriate for it, because the return value can change depending on when you call this function. I don't see any attribute that would fit this use case.

ffi_const definitely isn't, but ffi_pure should be appropriate:

ffi_pure only guarantees that the function has “no effects except for its return value, which shall not change across two consecutive function calls with the same parameters”. The functions are still allowed to read global variables.

In LLVM IR #[ffi_pure] currently sets the function attributes memory(read) and nofree.

  • memory(read): Doesn't write any memory → no observable side effects besides unwinding, synchronizing with another thread or diverging.
  • nofree: Doesn't free any memory → all dereferenceable memory is still dereferenceable after the call.

This is enough to elide duplicate calls when there are no memory writes in between, but it doesn't allow the optimizer to completely remove calls with unused return values, because the function may still contain an endless loop or a memory fence, or it may unwind.
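As a rough illustration of that rule, the following sketch models a read-only function with plain Rust. The thread-local SLOT and the functions read_slot, write_slot, two_reads and reads_around_write are hypothetical stand-ins. The comments describe what an optimizer would be allowed to do for a memory(read) import; they do not claim that rustc performs the transformation on this exact code.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical stand-in for state that a read-only external function observes.
    static SLOT: Cell<u32> = Cell::new(0);
}

/// Stand-in for a function that only reads memory.
fn read_slot() -> u32 {
    SLOT.with(|s| s.get())
}

/// Stand-in for any other call that may write memory.
fn write_slot(value: u32) {
    SLOT.with(|s| s.set(value));
}

/// Two reads with no write in between: the second read must observe the same
/// state, so it could be replaced by the result of the first.
fn two_reads() -> (u32, u32) {
    let first = read_slot();
    let second = read_slot(); // candidate for merging with `first`
    (first, second)
}

/// A write between the reads: the observable state may have changed,
/// so both reads have to be kept.
fn reads_around_write(value: u32) -> (u32, u32) {
    let before = read_slot();
    write_slot(value);
    let after = read_slot();
    (before, after)
}
```

A call whose result is never used is a different case: merging needs only "reads memory, writes nothing", while removal additionally needs a guarantee that the call returns normally.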


I don't have much technical knowledge about LLVM, but from a quick glance at the documentation you linked I could not see any other obvious attribute which could help me out here.

This is the GetLastError code on different platforms:

  • x86:
    mov eax, fs:[0x34]
  • AMD64:
    mov eax, gs:[0x68]
  • ARM64:
    ldr w0, [x18, #0x68]

As far as I can see it only reads from the always readable thread environment block, without any loops, fences or unwinding (though I am not entirely sure about the meaning in this context).

The logic of GetLastError is technically an opaque OS implementation detail, which "can" change at any time, but is practically set in stone, so I tried inline assembly in this godbolt snippet, which uses a single-instruction block (instead of calling GetLastError) with options(nostack, preserves_flags, pure, readonly).

  1. This succeeds in eliding the block when the error code is not used in return_bool. Any way to transfer those options to the GetLastError function import?
  2. In return_error the error code from both create_file and close_handle is used, which causes the block to be emitted twice. This did not happen in the original godbolt snippet, which used the GetLastError import. Maybe that is because of the opaqueness of the block? I tried to use the __read(f/g)sdword intrinsic (?), but it does not seem to be part of the documented LLVM intrinsics. The core_arch crate does not provide this kind of intrinsic either (nor one for reading the register x18 on ARM64).
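For readers without access to the godbolt snippet, a single-instruction block of this kind might look roughly like the sketch below. It is not the original snippet: it covers only AMD64 Windows, hard-codes the 0x68 offset from the disassembly above (an undocumented OS detail), and falls back to a placeholder that returns 0 on every other target so the code still compiles there. The function name last_error and this body of return_bool are assumptions.

```rust
/// Reads the calling thread's last-error value directly from the thread
/// environment block. Assumption: the value lives at gs:[0x68] on AMD64
/// Windows, as in the disassembly quoted above.
#[cfg(all(windows, target_arch = "x86_64"))]
fn last_error() -> u32 {
    let code: u32;
    // SAFETY: a single load from the always-readable TEB; no stack use,
    // no flag changes, no memory writes.
    unsafe {
        core::arch::asm!(
            "mov {code:e}, gs:[0x68]",
            code = out(reg) code,
            options(nostack, preserves_flags, pure, readonly),
        );
    }
    code
}

/// Placeholder for all other targets so the sketch compiles everywhere.
/// It does not read any real error state.
#[cfg(not(all(windows, target_arch = "x86_64")))]
fn last_error() -> u32 {
    0
}

/// Hypothetical caller that discards the error code. Because the asm block
/// is marked `pure` and its output is unused, the compiler may drop it.
fn return_bool(api_succeeded: bool) -> bool {
    let result: Result<(), u32> = if api_succeeded {
        Ok(())
    } else {
        Err(last_error())
    };
    result.is_ok()
}
```

The pure option tells the compiler the block has no side effects and that its output depends only on its inputs and, with readonly, on memory it reads. That is what permits removing the block when the output is unused.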

In theory, an LLVM function attribute combination like memory(read) willreturn nounwind nosync nofree should be enough to mark the function as fully side effect free, but apparently this optimization is not applied yet. There is also currently no way to get this attribute combination in Rust code.

Looks like you're right. I tried some variations of the LLVM IR your godbolt snippet produces, but LLVM refuses to merge identical inline assembly blocks. This seems to be a missed optimization.

The __read(f/g)sdword functions are intrinsics in clang and thus only usable in C code. They generate LLVM IR like

load volatile i32, ptr addrspace(256) %offset, align 4

(load from a pointer in target-specific address space 256, which means "relative to GS" for AMD64)

Currently, there is no way to generate that with Rust code.

It looks like you're out of luck for now. Some longer-term options would be to ask the compiler team for read(f/g)s intrinsics in core::arch, or to open a ticket in the LLVM issue tracker for the missed optimization with inline assembly.
