As an aside: file sizes on trivial programs aren't terribly interesting, unless you're talking about multiple orders of magnitude difference, or you're trying to target a pathologically space-constrained system.
Thanks.
So the executable can be even smaller than what I reported!
Do you have reference links for what you mentioned: LTO, the system allocator, and executable stripping?
I already used '--release'; I forgot to mention it in my post.
I think the expectation about performance here is probably unfounded. They are both compiled with LLVM, yes, but how the runtime and object model work matters a lot, probably more than which particular codegen backend is used.
For purely number-crunching applications, when you don't do any allocation and just do a lot of math (not explicitly vectorized), the performance indeed should be similar.
However, as soon as you start using allocation, "objects", and the standard library, the differences in runtime, memory management, and allocation patterns should make a big difference, and Kotlin and Rust are very different in these respects.
Also, I would naively expect that Kotlin on the JVM would be faster than Kotlin/Native for typical workloads: Kotlin's object model is basically the Java object model, and the JVM is heavily optimized for dealing with it.
To drop the size even further, you can use Xargo to build the Rust stdlib from source when you build the application, which allows you to apply lto and opt-level to the stdlib build, stripping out lots of the stdlib you don't need.
I've been meaning to try this for some of my performance-critical crates. I would think that it might speed up cases of many Vec calls.
What's the easiest way to do this? Do I need to list every stdlib crate that I want recompiled in my Cargo.toml, or is there a single option to Xargo that will do it? (The target CPU is just my host CPU.)
By changing main.rs and Cargo.toml as you recommended, and using cargo build --release, the size I got is 238K, which is less than the 253K you mentioned. Any idea what could be the reason for this variance in numbers?
After running strip in addition to the above, the final file size is 169K.
You'll probably need a more realistic example than only printing Hello World to see a difference. In this example, the large majority of the binary is probably the statically linked stdlib. You'd need Xargo to help make that smaller by building the stdlib from source.
The s and z optimisation levels are now stable. These optimisations prioritise making smaller binary sizes. z is the same as s with the exception that it does not vectorise loops, which typically results in an even smaller binary.
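For anyone following along, here is a rough sketch of what the size-related settings mentioned in this thread might look like in Cargo.toml. The particular combination is an assumption on my part, not the exact configuration anyone above posted, so measure on your own project before relying on it.

```toml
# Cargo.toml (fragment): release profile tuned for size.
# The values below are illustrative assumptions, not the thread's exact setup.
[profile.release]
opt-level = "z"  # optimise for size; "s" is the variant that keeps loop vectorisation
lto = true       # link-time optimisation across the crate graph
```

Build with cargo build --release as usual. Running strip on the resulting executable afterwards, as described above, is a separate step.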