What's the problem of using different CRTs on Windows over FFI?

Hi, sorry for asking a somewhat vague question. I have written several R packages that use Rust via FFI.

Recently, I received some concerns about R and Rust using different CRTs on Windows: R (as of version 4.2) uses UCRT while, if I understand correctly, the Rust toolchain uses MSVCRT. The concern seems to come from experience with the problems that statically built C/C++ libraries had when R switched from MSVCRT to UCRT.

https://blog.r-project.org/2022/11/07/issues-while-switching-r-to-utf-8-and-ucrt-on-windows/

In my understanding, memory allocation and deallocation can be a problem in Rust's case as well (the Rust library should not free memory allocated by R, and vice versa; I think this is a basic rule of FFI anyway). Apart from this, however, I have no idea what problems there can be. In the case of C/C++, the CRT's encoding and locale support matters a lot, but I don't think Rust's string system depends on it.

Rust is not C/C++. I guess Rust depends much less on the CRT than C/C++ does, but I know it does depend on it at least to some extent. What are some examples of problems that a difference in CRTs can cause?

Also, I'd like to know the current status of UCRT support on Windows. I found this comment, but couldn't find any sign of progress on UCRT support for windows-gnu.
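A minimal sketch of that ownership rule, assuming a hypothetical pair of exports named `greeting_new` and `greeting_free` (neither comes from R or from any real package). The string is allocated by Rust and has to come back to Rust to be released, so it does not matter which CRT the caller's `free()` belongs to. A real export would also carry `#[no_mangle]` (spelled `#[unsafe(no_mangle)]` in the 2024 edition), which is left out here to keep the sketch edition-agnostic.

```rust
use std::ffi::{c_char, CStr, CString};

/// Hypothetical export: returns a NUL-terminated string allocated by Rust.
/// The caller must hand the pointer back to `greeting_free`, never to a CRT `free()`.
pub extern "C" fn greeting_new(name: *const c_char) -> *mut c_char {
    if name.is_null() {
        return std::ptr::null_mut();
    }
    // SAFETY: the caller promises `name` points to a valid NUL-terminated string.
    let name = unsafe { CStr::from_ptr(name) }.to_string_lossy();
    match CString::new(format!("hello, {name}")) {
        // Ownership of the buffer moves to the caller until `greeting_free`.
        Ok(s) => s.into_raw(),
        Err(_) => std::ptr::null_mut(),
    }
}

/// Hypothetical export: releases a string previously returned by `greeting_new`.
pub extern "C" fn greeting_free(ptr: *mut c_char) {
    if ptr.is_null() {
        return;
    }
    // SAFETY: `ptr` was produced by `CString::into_raw` in `greeting_new`.
    let owned = unsafe { CString::from_raw(ptr) };
    drop(owned);
}
```

The C helper on the R side would copy the text into an R-owned object and then call `greeting_free`, so each allocator only ever frees its own memory.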

In general, that's true for any pair of binaries.

It is.

Correct. Rust strings are nothing like Microsoft strings.

Yeah. From your confusion, I'm confused about such things as well. Why do you believe your R package built with Rust is statically linking to any C runtime? While it has been a few years, the last time I dealt with this issue it was clear that our Rust application used whatever CRTL was available on the target machine (which can make the situation much worse than using a statically linked CRTL).

Thanks.

Sorry, I didn't describe the details around this. No, I don't use crt-static on Rust, and R is not statically linked to any C runtime. So, the concern is about what happens when a staticlib-type Rust library that was compiled for MSVCRT is called from an R session, which uses UCRT.

(Honestly, I don't understand these things well. Sorry if my explanation doesn't make sense...)

That is certainly understandable given this...

I'm struggling to understand how an external library is statically linked to the dynamic library of R. I guess they're trying to say they use import libraries.

That question can be boiled down to the interface between the Rust library and R. If the folks who built R have done a good job then responsibilities are clearly defined and memory ownership is always retained by the thing that allocated that memory.

In other words, you nailed it in your original post...

I don't think I understand this part, but the Rust crate is compiled to a static library, and that static library then gets linked when compiling some helper C code that makes it possible to call the Rust functions from R's side. As R uses UCRT, when the session loads the resulting DLL, the DLL should use UCRT accordingly.

So, are you saying the only possible problem is about memory allocation? Doesn't Rust rely on the C runtime for other things than memory allocation...?

Memory management is not the only potential problem. Some of the C library is stateful. On rare occasions I've seen incorrectly built programs call the "wrong" function, resulting in a crash. But...

As far as I know, Rust, on Windows, only uses the C library for heap management. On the several occasions that I've dug into the details of various bits of the Rust standard library the end result has always been a call directly to the operating system.

When I'm concerned about such things a MAP file is an invaluable tool. Armed with a MAP file it's possible to definitively answer both of those questions.

Rust heap management on Windows does not use the C library. As you can see at rust/alloc.rs at c8e6a9e8b6251bbc8276cb78cabe1998deecbed7 · rust-lang/rust · GitHub, it uses HeapAlloc directly, which is part of kernel32.dll.

Looking at a program of mine using DUMPBIN, the C library is used for the executable entry point before main(), maths functions, memcpy() etc., and stack unwinding for panics.
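To make the heap point concrete, here is a small sketch that allocates through `std::alloc::System`, which the standard library documents as based on `HeapAlloc` on Windows and `malloc` on Unix. The function name `roundtrip_through_system_allocator` is made up for illustration; the point is only that the allocation and the matching deallocation go through the same allocator, whatever CRT the rest of the process uses.

```rust
use std::alloc::{handle_alloc_error, GlobalAlloc, Layout, System};

/// Hypothetical helper: stores `value` in memory obtained from the system
/// allocator, reads it back, and releases the memory through the same allocator.
pub fn roundtrip_through_system_allocator(value: u64) -> u64 {
    let layout = Layout::new::<u64>();
    // SAFETY: `layout` has non-zero size, the pointer is checked for null
    // before use, and it is deallocated with the same allocator and layout.
    unsafe {
        let ptr = System.alloc(layout) as *mut u64;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        ptr.write(value);
        let read_back = ptr.read();
        System.dealloc(ptr as *mut u8, layout);
        read_back
    }
}
```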

Thanks. I hadn't thought of this. So, there may be no single thing you could call "the problem of using different CRTs," but there actually are differences, and they might matter, right?

Indeed. Many years ago I worked on a project that pulled in three different CRTs for one process. We discovered that this was a problem because each CRT would, at startup, pull in the environment variables from the system and store them in its own CRT-specific buffer. After the initialization, the setenv(), unsetenv() and getenv() calls would operate on this CRT-specific buffer. We first noticed this when one DLL would set an environment variable, but it would not become available to another DLL.
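For comparison, a sketch of the Rust side of that story, using only `std::env`. The helper name `publish_setting` is hypothetical. As far as I know, on Windows `std::env::set_var` and `std::env::var` talk to the process environment through the OS rather than through a CRT's private copy, so Rust code sees its own writes, while C code calling `getenv()` through a CRT that took its snapshot at startup might not.

```rust
use std::env;

/// Hypothetical helper: sets an environment variable from Rust and reads it
/// back through the same `std::env` API.
// `set_var` is an `unsafe fn` in the 2024 edition and a safe one before that,
// so the `unsafe` block is only needed on newer editions.
#[allow(unused_unsafe)]
pub fn publish_setting(key: &str, value: &str) -> Option<String> {
    // SAFETY: this sketch assumes no other thread is reading or writing the
    // environment at the same time.
    unsafe {
        env::set_var(key, value);
    }
    env::var(key).ok()
}
```

If a value has to reach C or C++ code in another DLL, passing it explicitly as a function argument avoids relying on which environment buffer that code consults.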

Yes. The most obvious one out of what I noted is that I wouldn't expect stack unwinding on panic to work across code compiled for different CRTs. And as blonk noted, even if Rust calls HeapAlloc, GetEnvironmentVariable and so on directly, the C and C++ code linked against the different CRTs might not.
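One common way to sidestep the unwinding question is to never let a panic leave the Rust side at all. The sketch below, with made-up names (`checked_divide`, `STATUS_OK`, `STATUS_PANIC`), wraps the work in `std::panic::catch_unwind` and turns a panic into a status code. It assumes the crate is built with the default `panic = "unwind"`; with `panic = "abort"` there is nothing to catch and the process simply aborts.

```rust
use std::panic;

/// Hypothetical status codes returned across the FFI boundary.
pub const STATUS_OK: i32 = 0;
pub const STATUS_PANIC: i32 = -1;

/// Ordinary Rust code that may panic.
fn checked_divide_impl(a: i32, b: i32) -> i32 {
    if b == 0 {
        panic!("division by zero");
    }
    a / b
}

/// Hypothetical export: writes the quotient to `out` and returns a status code.
/// A panic is caught here, so no unwinding crosses into the caller's frames.
pub extern "C" fn checked_divide(a: i32, b: i32, out: *mut i32) -> i32 {
    let result = panic::catch_unwind(|| checked_divide_impl(a, b));
    match result {
        Ok(value) => {
            if !out.is_null() {
                // SAFETY: the caller promises `out` points to a writable i32.
                unsafe {
                    *out = value;
                }
            }
            STATUS_OK
        }
        // `out` is left untouched when the Rust side panicked.
        Err(_) => STATUS_PANIC,
    }
}
```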

Note that the article is talking about MinGW, which (in Rust) uses the very, very ancient msvcrt.dll distributed with the OS (it was intended as a private DLL and should not have been used by third parties, but it's kept around because people did).

The msvc and the (tier 3) gnullvm toolchains use the UCRT.

Thank you all for a lot of useful information!

This is exactly the kind of problem I wanted to know about. Interesting.

Yes, unwinding is one of the biggest problems related to FFI. The good news is that the "C-unwind" ABI is getting stabilized! (But I honestly don't understand to what extent it will solve the problem.)

Oh, I didn't know the MSVC toolchain also uses UCRT. Unfortunately, if I remember correctly, R requires the library to be built with the GNU toolchain, so that's probably not an option. Good to know anyway.

The general rule of thumb here is "it's the interface's problem": they have to define not just what functions exist, but also everything that is legal to do with them. If they, for example, allocate and return a string that you have to release with the CRT's free(), they have to define exactly which CRT that is, which can get very complex very quickly.

The two main approaches for the interface to avoid having to deal with this mess are to demand you build both sides of the interface, then it's your problem to make sure the compiler and library settings and versions match up; or to provide a completely hermetic interface, where no assumption about the runtime is made: mostly this is things like having a free_foo() for every new_foo(), doing their own last_error() instead of using errno, and so on.
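A rough sketch of the second, hermetic approach written from the Rust side. It reuses the `new_foo`, `free_foo` and `last_error` names from the paragraph above; `Foo`, `foo_value` and the non-negative rule are invented for illustration. Nothing here touches `errno` or expects the caller to free anything itself.

```rust
use std::cell::RefCell;
use std::ffi::{c_char, CString};

/// Hypothetical handle type; callers are expected to treat it as opaque.
#[repr(C)]
pub struct Foo {
    value: i32,
}

thread_local! {
    // Per-thread error slot owned by this library, used instead of errno.
    static LAST_ERROR: RefCell<Option<CString>> = RefCell::new(None);
}

fn set_last_error(msg: &str) {
    let msg = CString::new(msg).unwrap_or_default();
    LAST_ERROR.with(|slot| *slot.borrow_mut() = Some(msg));
}

/// Creates a `Foo`, or returns null and records an error message.
pub extern "C" fn new_foo(value: i32) -> *mut Foo {
    if value < 0 {
        set_last_error("value must be non-negative");
        return std::ptr::null_mut();
    }
    Box::into_raw(Box::new(Foo { value }))
}

/// Reads the stored value, or -1 for a null handle.
pub extern "C" fn foo_value(foo: *const Foo) -> i32 {
    // SAFETY: `foo` is either null or a live pointer returned by `new_foo`.
    let foo = unsafe { foo.as_ref() };
    foo.map_or(-1, |f| f.value)
}

/// Releases a handle created by `new_foo`; the only valid way to free it.
pub extern "C" fn free_foo(foo: *mut Foo) {
    if !foo.is_null() {
        // SAFETY: `foo` came from `Box::into_raw` in `new_foo`.
        let boxed = unsafe { Box::from_raw(foo) };
        drop(boxed);
    }
}

/// Returns the last error on this thread, or null if there is none.
/// The pointer stays valid until the next failing call on the same thread.
pub extern "C" fn last_error() -> *const c_char {
    LAST_ERROR.with(|slot| {
        slot.borrow()
            .as_ref()
            .map_or(std::ptr::null(), |msg| msg.as_ptr())
    })
}
```

Because every resource has a matching release function in the same library, the caller never needs to know which allocator or runtime sits behind the handle.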

So uh, tldr read the R library docs, I guess!