I'm currently investigating how to identify functions declared in standard and third-party libraries in Rust malware.
I noticed that some functions and traits declared with the "pub" keyword were not exported when we built DLLs from third-party library code with the "dylib" option.
Question
When a DLL is created from code with the "dylib" option, I think a function or trait is not exported if either of the following conditions applies:
- it is not declared with the "pub" keyword;
- it is a generic function or trait and has never been called with a concrete type.
Is my understanding correct?
In addition, if there are any other conditions, could you please let me know?
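A minimal sketch of the two conditions, written as a hypothetical library crate (all names are made up for illustration). Whether checksum actually shows up in the export table of a dylib still depends on the rustc version and flags, as the replies explain.

```rust
// Hypothetical library crate, built with e.g. `rustc --crate-type dylib lib.rs`.

// Condition 1: no `pub`, so the function is not reachable from other
// crates and there is no reason to export it.
fn private_helper(x: u32) -> u32 {
    x.wrapping_mul(31).wrapping_add(7)
}

// Public, non-generic and not marked #[inline]: a candidate for export,
// although rustc may still choose cross-crate inlining instead.
pub fn checksum(data: &[u8]) -> u32 {
    data.iter()
        .fold(0u32, |acc, &b| private_helper(acc ^ u32::from(b)))
}

// Condition 2: generic and never called with a concrete type inside this
// crate, so no machine code exists for it here; only its MIR is kept in the
// crate metadata for downstream crates to instantiate.
pub fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut iter = items.iter().copied();
    let first = iter.next()?;
    Some(iter.fold(first, |best, x| if x > best { x } else { best }))
}
```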
Thank you for taking the time to read my question.
Looking forward to your response.
cdylib: This only exports a C interface and only #[no_mangle] functions are exported.
dylib: This additionally exports a Rust interface, with the crate metadata stored in the .rustc section and the rust_metadata_* symbol pointing to the start of that section. In addition to #[no_mangle] functions, it exports an undefined set of functions that can only be called from Rust. For functions that are not exported from the DLL but are still callable according to the privacy rules, the crate metadata instead stores the MIR of the function, which then gets codegened locally in the crate that calls it. This happens for all generic functions, all #[inline] functions, and functions for which rustc has determined that cross-crate inlining may be beneficial, for example because they are very small. Generic functions can still get exported if they were called locally and debug mode is enabled; in that case only the specific instantiations that were actually called get exported, to make compilation of downstream crates a bit faster.
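As an illustration of the last point, here is a hypothetical crate in which a generic function is instantiated locally only for u32. Going by the explanation above, a debug build of this crate as a dylib may export the describe::<u32> instantiation, while describe::<String> would only ever be generated in a downstream crate that uses it. None of this is guaranteed across rustc versions or flags.

```rust
use std::fmt::Debug;

// Hypothetical generic API: no machine code exists for it until some
// crate instantiates it with a concrete type.
pub fn describe<T: Debug>(value: T) -> String {
    format!("value = {:?}", value)
}

// Non-generic wrapper that instantiates `describe::<u32>` inside this crate.
// In a debug build, rustc may export this instantiation so that downstream
// crates can reuse it instead of generating their own copy.
pub fn describe_u32(value: u32) -> String {
    describe(value)
}
```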
I meant an undefined set of functions, not a set of undefined functions. In other words, if a function is in the set of exported functions, it is always defined. However, the exact functions that are exported, aside from #[no_mangle] functions, depend on the exact rustc version and compilation flags.
This is an example of functions that are not no_mangle, yet are often exported in current rustc versions.
You have to compile as cdylib rather than as dylib. A cdylib only exports #[no_mangle] functions, as those are the only ones that can be called from C.
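For reference, a minimal sketch of what a cdylib export looks like (the function names are hypothetical). Only add_u32 is expected in the export table of the resulting library; double_u32 is pub but not #[no_mangle], so it should not be exported.

```rust
// Hypothetical cdylib crate, built with e.g. `rustc --crate-type cdylib lib.rs`.
// `#[unsafe(no_mangle)]` needs Rust 1.82 or later; older toolchains
// spell it `#[no_mangle]`.

// Exported with the C ABI under the unmangled symbol name `add_u32`.
#[unsafe(no_mangle)]
pub extern "C" fn add_u32(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

// Ordinary Rust function: `pub`, but not part of a cdylib's export table.
pub fn double_u32(x: u32) -> u32 {
    add_u32(x, x)
}
```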
There is no strict term "defined functions" as far as I know. I think he used that term loosely to clarify your earlier question: it's the set (of exported functions), NOT the functions themselves, that is undefined. In other words, the functions are not "undefined"; hence "defined", for emphasis.
Here, saying the set of exported functions is "undefined" (or "unspecified") means the compiler reserves the right to decide whether a function is exported or not, and users shall NOT depend on this implementation detail. If you need a stable, deterministic result that you can depend on, you should use cdylib, not dylib.
To be clear, the dylib crate type is NOT meant to be consumed by foreign languages like C/C++, and only the exact same rustc version is guaranteed to know how to import the library correctly. This is part (but not all) of what it means that "Rust does NOT guarantee a stable ABI".
Of course you can inspect the file with any tool (like objdump), but you will not be able to interpret the content in a meaningful way unless you are rustc (or use the same internal code as rustc).
Thank you for your reply.
Am I correct in understanding that, when using dylib, the set of exported functions varies between compiler versions and is unstable?
To provide some context, I am currently investigating the possibility of identifying third-party library code in Rust malware.
In order to do so, we need to create a signature, and it is crucial that the public functions of the third-party libraries are included in the executable file.
Assuming that the third-party libraries are not modified, the easiest approach seemed to be creating the executable using dylib.
However, when building the executable with dylib, some functions were not defined in the executable, and there was no explicit way to include them.
In this case, is the only way to ensure that the compiled third-party public functions are included in the executable file to compile code that calls those public functions?
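One way to try this without modifying the library is a separate "harness" crate that takes a function pointer to each public function of interest. That forces a concrete instantiation (which matters for generics), and std::hint::black_box discourages the optimizer from dropping the reference. The sketch below uses std functions as stand-ins for a crate such as syn, and the helper name referenced_functions is hypothetical. As the replies point out, the resulting machine code may still be a local copy under a mangled name, or be shaped differently by inlining and LTO than the code in a real sample.

```rust
use std::hint::black_box;
use std::num::ParseIntError;

// Hypothetical signature harness. Each entry names a public function of the
// library being fingerprinted; std functions stand in for a crate like `syn`.
// Casting to a concrete `fn` pointer forces a concrete instantiation, and
// `black_box` discourages the optimizer from discarding the reference.
pub fn referenced_functions() -> Vec<(&'static str, usize)> {
    vec![
        (
            "str::to_uppercase",
            black_box(str::to_uppercase as fn(&str) -> String) as usize,
        ),
        (
            "Vec::<u8>::with_capacity",
            black_box(Vec::<u8>::with_capacity as fn(usize) -> Vec<u8>) as usize,
        ),
        (
            "u32::from_str_radix",
            black_box(u32::from_str_radix as fn(&str, u32) -> Result<u32, ParseIntError>)
                as usize,
        ),
    ]
}
```

A binary that calls referenced_functions() (for example from main) should then contain code for each listed function, which could serve as a starting point for signatures. Whether that code matches a sample built with other compiler versions or flags is not guaranteed.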
As mentioned before, it is assumed that the third-party libraries are not modified.
Do you mean something like recognizing/guessing standard or well-known library function names in the disassembled code? Am I understanding this right?
By "third party library", do you mean a Rust crate or a C library?
To my understanding, even if the functions are public, they are not guaranteed to be exported in the final executable unless they are explicitly exported as FFI, e.g. annotated with the #[unsafe(no_mangle)] attribute. But I'm not completely sure about this; I might be wrong.
By "not modified", do you mean the third-party library is distributed as pre-built binary blobs instead of being compiled from source? Or do you mean the source code is not modified?
That's how Rust libraries work. They are designed to be consumed by other Rust code built with the same compiler version; they are not meant to be used in other ways, e.g. loaded at runtime.
I might be wrong about this too, but to my understanding, even if you call the functions, they are not necessarily exported. Even functions defined in your own crate might not be exported. As @bjorn3 stated, functions can exist only as MIR in the metadata.
So my conclusion is that, because of how Rust compiles code, there's no reliable way to export an extern "Rust" function in an executable. You'd better find heuristics other than the symbol table.
That is correct. For example, small functions implicitly get marked as #[inline], which causes codegen to be delayed to the crate that uses the function.
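A sketch of how this plays out, using hypothetical functions. The #[inline(never)] attribute is shown only to illustrate the mechanism: adding it would mean modifying the crate, and even then rustc and LLVM give no hard guarantee about the final symbol table.

```rust
// Very small: rustc may treat this as implicitly inlinable and store only
// its MIR, so each calling crate gets its own copy (or inlines it away).
pub fn is_even(x: u64) -> bool {
    x % 2 == 0
}

// Explicit #[inline]: codegen is deferred to each crate that calls it.
#[inline]
pub fn square(x: u64) -> u64 {
    x.wrapping_mul(x)
}

// #[inline(never)] asks the compiler not to inline this function, which
// makes a separate symbol more likely, though still not guaranteed.
#[inline(never)]
pub fn sum_of_even_squares(xs: &[u64]) -> u64 {
    xs.iter()
        .copied()
        .filter(|&x| is_even(x))
        .map(square)
        .fold(0u64, u64::wrapping_add)
}
```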
"third party library" means a rust crate. (e.g. syn)
"Not modified" refers to not making changes to the crate in order to create a signature (e.g. inserting #[no_mangle]).
Even though LTO, #[inline], and optimization are unavoidable, my understanding is that the public functions of a crate will be included in the executable file (not a .dll built with dylib, but an .exe) when the code containing those public functions is compiled.
This is true if you create the .dll using dylib, right?
If I compile it as an .exe file, is it correct that the .exe file will always contain the public functions of the crate that are called by my code?