I'm interested in a post-build step that embeds a hash of the binary into the binary itself. In C/C++ I would do this by placing a single variable in a specially named section of the binary and adding a post-link step in which a helper program or script writes the data at that offset.
I'm wondering what the best way to do this in Rust is, as it seems build.rs does NOT support post build actions.
To avoid an XY problem, I should also describe the use case I'm trying to solve: I'm implementing a command-not-found handler for Arch Linux (the thing that suggests which package to install when you type a nonexistent command in your shell; the current one is too slow for my taste). This involves a process that needs to start and do its job incredibly quickly. The overall latency is determined by the total runtime of my program.
I'm looking at serialising binary cache files and mmapping them back in (using rkyv). I want to use the non-validating API (to avoid loading more data than needed in case of a cold disk cache), so I need to ensure the data was written by a compatible version of the binary. Re-generating the cache files is slow, but not a deal breaker (several seconds, normally done from a cron job or systemd timer on a daily basis). I therefore need to version the data format, to avoid UB from loading incompatible data when the format changes.
I could do this manually, but knowing myself that is error-prone, especially during development: I will forget to update the value.
I could embed the git hash, but for releases that becomes difficult to get hold of (not built from git repo) and when developing I might not have made a new commit yet.
What is always unique is the hash of the build artifact itself. For obvious reasons I don't want to compute this at runtime (slow, doesn't matter during cache update, but definitely matters in the hot path of a lookup!). Thus I'm looking at embedding said hash into the binary so it is just a static variable.
I know (approximately) how to embed a hash: put a static variable into a separate section, then patch the data in this section after building. I'm fairly sure I could get that working on my own using objdump to get the section offset and a bit of shell scripting (or using a suitable crate and doing it from Rust code).
The problem is that, from what I can tell, I can't run any code post-build, at least not in a way that works with cargo install. And since I want to publish to crates.io, I don't see any other option.
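For reference, the placeholder half of that idea might look roughly like the sketch below. The section name `.buildhash`, the 32-byte size and the `build_hash` helper are all made up for illustration, and the section name follows ELF conventions (other object formats name sections differently). The patching step itself is not shown.

```rust
use std::ptr;

/// Placeholder that a (hypothetical) post-link tool would overwrite in place.
/// `.buildhash` is an arbitrary section name chosen for this sketch.
/// Before Rust 1.82 the attribute is spelled `#[link_section = "..."]`.
#[used]
#[unsafe(link_section = ".buildhash")]
static BUILD_HASH: [u8; 32] = [0u8; 32];

/// Read the value with a volatile load, so the compiler cannot fold the
/// all-zero placeholder into the calling code at compile time.
fn build_hash() -> [u8; 32] {
    // SAFETY: `BUILD_HASH` is a valid, initialised, properly aligned static.
    unsafe { ptr::read_volatile(&BUILD_HASH) }
}

fn main() {
    let hex: String = build_hash().iter().map(|b| format!("{:02x}", b)).collect();
    println!("build hash: {}", hex);
}
```

The volatile read is the part that is easy to forget: without it the optimiser is free to treat the static as a known constant, and the patched bytes would never be observed.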
Thus the answer to my question about how to do a post build action might be "I can't do that (currently)", in which case the followup question naturally is: what can I do instead? I listed some ideas I considered but discarded.
I found appending the data to work a little better. If I remember correctly, doing that does not affect a digital signature. Neither the Linux nor the Windows loader were bothered by the extra stuff. Setting and checking the extra data are almost trivial.
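A minimal sketch of the append approach, assuming a made-up trailer layout of an 8-byte magic marker followed by a 32-byte hash. The names `MAGIC`, `append_trailer` and `read_trailer` are hypothetical; a real program would read its own file (for example via `std::env::current_exe()`) rather than an in-memory buffer.

```rust
/// Marker that identifies the trailer; the value is arbitrary.
const MAGIC: &[u8; 8] = b"BLDHASH1";
const HASH_LEN: usize = 32;

/// Append `MAGIC` followed by the hash to the end of an executable image.
fn append_trailer(image: &mut Vec<u8>, hash: &[u8; HASH_LEN]) {
    image.extend_from_slice(MAGIC);
    image.extend_from_slice(hash);
}

/// Return the hash stored at the very end of `image`, if a trailer is present.
fn read_trailer(image: &[u8]) -> Option<[u8; HASH_LEN]> {
    let start = image.len().checked_sub(MAGIC.len() + HASH_LEN)?;
    let (magic, hash) = image[start..].split_at(MAGIC.len());
    if magic != &MAGIC[..] {
        return None;
    }
    let mut out = [0u8; HASH_LEN];
    out.copy_from_slice(hash);
    Some(out)
}

fn main() {
    // Stand-in for the bytes of a real executable.
    let mut image = b"pretend this is an executable".to_vec();
    println!("before: {:?}", read_trailer(&image).is_some());
    append_trailer(&mut image, &[0xab; HASH_LEN]);
    println!("after:  {:?}", read_trailer(&image).is_some());
}
```

Note that this still needs a step after linking to do the appending, and the check reads the executable file at runtime.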
Thanks, that would indeed work. Now I feel stupid for not thinking about that. Cargo.lock + src is not quite all that is needed, since I use a workspace with several local crates. But this should be doable.
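Roughly what I have in mind for build.rs, as an untested sketch. The input paths (in particular `../my-local-crate/src`) and the `FORMAT_HASH` variable name are placeholders, and FNV-1a is used only because it fits in a few lines of std-only code; a real version would probably want a cryptographic hash from a crate.

```rust
// build.rs (sketch)
use std::{fs, io, path::Path};

/// FNV-1a 64-bit parameters.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Fold `bytes` into a running FNV-1a state.
fn fnv1a(mut state: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        state ^= u64::from(b);
        state = state.wrapping_mul(FNV_PRIME);
    }
    state
}

/// Fold a file, or every file below a directory (in sorted order), into `state`.
fn hash_path(mut state: u64, path: &Path) -> io::Result<u64> {
    if path.is_dir() {
        let mut children = Vec::new();
        for entry in fs::read_dir(path)? {
            children.push(entry?.path());
        }
        children.sort(); // read_dir order is not guaranteed to be stable
        for child in &children {
            state = hash_path(state, child)?;
        }
    } else {
        let data = fs::read(path)?;
        // Mix in the length so that file boundaries affect the result.
        state = fnv1a(state, &(data.len() as u64).to_le_bytes());
        state = fnv1a(state, &data);
    }
    Ok(state)
}

fn main() -> io::Result<()> {
    // Hypothetical inputs; a workspace needs every local crate listed here.
    let inputs = ["Cargo.lock", "src", "../my-local-crate/src"];
    let mut state = FNV_OFFSET;
    for input in inputs.iter() {
        println!("cargo:rerun-if-changed={}", input);
        let path = Path::new(input);
        if path.exists() {
            state = hash_path(state, path)?;
        } else {
            // A real build script should probably fail here instead.
            println!("cargo:warning=format hash input {} is missing", input);
        }
    }
    println!("cargo:rustc-env=FORMAT_HASH={:016x}", state);
    Ok(())
}
```

The crate can then pick the value up at compile time with `env!("FORMAT_HASH")`, so nothing is computed in the hot path.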
I don't know what your use case is, but if it is purely for debugging or identification purposes, you could embed the hash of the current git commit instead of hashing the source code.
As I indicated in the original post, it is for data format purposes, and since I will mmap in data serialised with rkyv and not validate it, getting it wrong is a soundness issue. rkyv is a crate for zero-copy deserialisation.
Thus git hash is not ideal (also the git hash doesn't exist on crates.io any more).
While that is generally true, for a cache file (that gets regenerated by a nightly cron job anyway) that is less important. If the cache isn't valid: just re-generate it.
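A sketch of that check, using only std and plain byte slices. The 16-byte header and the function names are assumptions for illustration; the rkyv and mmap parts are left out, and a real layout would also have to keep the payload aligned the way rkyv expects.

```rust
/// Length of the build identifier stored at the start of each cache file.
/// 16 bytes is an arbitrary choice for this sketch.
const ID_LEN: usize = 16;

/// Build the bytes of a cache file: identifier first, then the payload.
fn write_cache(build_id: &[u8; ID_LEN], payload: &[u8]) -> Vec<u8> {
    let mut file = Vec::with_capacity(ID_LEN + payload.len());
    file.extend_from_slice(build_id);
    file.extend_from_slice(payload);
    file
}

/// Return the payload only if the file was written by the same build.
/// `None` means "regenerate the cache", never "try to read it anyway".
fn compatible_payload<'a>(file: &'a [u8], build_id: &[u8; ID_LEN]) -> Option<&'a [u8]> {
    if file.len() < ID_LEN {
        return None;
    }
    let (header, payload) = file.split_at(ID_LEN);
    if header == &build_id[..] {
        Some(payload)
    } else {
        None
    }
}

fn main() {
    let current = [1u8; ID_LEN];
    let stale = [2u8; ID_LEN];
    let file = write_cache(&stale, b"archived data");
    match compatible_payload(&file, &current) {
        Some(payload) => println!("cache usable, {} payload bytes", payload.len()),
        None => println!("cache was written by another build, regenerating"),
    }
}
```

Only the header is touched before the decision is made, so a stale cache costs one small read on the lookup path.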