I was trying to read a simple CSV file with the polars package. My code is modified from this page.
use polars::prelude::*;

fn example() -> PolarsResult<DataFrame> {
    CsvReader::from_path("iris.csv")?
        .has_header(true)
        .finish()
}

fn main() {
    let df = match example() {
        Ok(d) => d,
        Err(e) => {
            println!("Error while reading file: {}", e.to_string());
            return;
        }
    };
    println!("{:?}", df);
}
I tested the code with the command cargo run. It took a long time to build, but it ran properly. I checked the target/debug folder. For this simple program, the executable was 700 MB in size!
I ran the release version with the command cargo run --release. The executable created was 22 MB in size.
Why is such a large debug executable being created for this simple program? Thanks for your insight.
Your program may be simple, but polars certainly isn't.
When building in debug mode, the compiler will include data that a debugger can use to map every instruction back to the source code it came from, plus things like the names of local variables and what fields each struct may contain. These detailed debug symbols are also what give you nice backtraces that include line numbers, and what make debugging with gdb useful.
However, as you can imagine, all this information isn't free. You are pulling in polars, which in turn pulls in a lot of other libraries. That adds up to a lot of debug symbols, especially in debug mode, because the compiler generates data for many trivial functions and types that would normally be inlined and optimised into oblivion.
To see how much debug symbols contribute towards your debug binary, you could try running strip on it. There is also a strip option in Cargo.toml which lets you do this automatically.
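As a rough sketch of what that could look like, the fragment below shows two alternative profile settings for debug builds. Which one to use (if either) is an assumption on my part about what you want; both trade away line numbers in backtraces and most of what gdb can show you for the affected code.

```toml
# Cargo.toml (illustrative fragment, not a recommendation for every project)

# Option A: strip debug info from the final debug binary at link time.
[profile.dev]
strip = "debuginfo"

# Option B: keep debug info for your own crate, but do not generate it
# for any dependency (polars and everything it pulls in).
[profile.dev.package."*"]
debug = false
```

Option B also tends to cut build time, since the debug info for dependencies is never produced in the first place, whereas Option A generates it and then throws it away.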
Is it possible to reduce the executable size by using some other use statement instead of the one I used?
I used the following statement to import the crate:
use polars::prelude::*;
No; a use statement just changes what names you can use and has no effect on compilation after name resolution has finished. You can always write a program with no use statements and get the same result.
The compiler won't include actually unused items ("dead code") in the final executable, regardless of whether your source code imports them.
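To illustrate, here is a minimal sketch using only the standard library (the module and function names are made up for this example). One version uses a glob import, the other writes every path out in full; both compile to the same thing and behave identically.

```rust
// Version 1: bring names into scope with a glob import.
mod with_glob {
    use std::collections::*;

    pub fn word_counts(text: &str) -> BTreeMap<String, usize> {
        let mut counts = BTreeMap::new();
        for word in text.split_whitespace() {
            *counts.entry(word.to_string()).or_insert(0) += 1;
        }
        counts
    }
}

// Version 2: no `use` at all, every path is spelled out in full.
mod fully_qualified {
    pub fn word_counts(text: &str) -> std::collections::BTreeMap<String, usize> {
        let mut counts = std::collections::BTreeMap::new();
        for word in text.split_whitespace() {
            *counts.entry(word.to_string()).or_insert(0) += 1;
        }
        counts
    }
}

fn main() {
    let text = "a b a";
    // Both versions produce the same map.
    assert_eq!(
        with_glob::word_counts(text),
        fully_qualified::word_counts(text)
    );
    println!("{:?}", with_glob::word_counts(text));
}
```

The glob import makes every public name in std::collections nameable inside with_glob, but only BTreeMap is actually referenced, so nothing else is affected.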
This is very reassuring. I do not need to keep checking if some imports can be reduced by replacing * with specific packages.
Correct. This isn't C where a rogue #include could accidentally pull in piles of unnecessary header files.
The base compilation unit in Rust is the crate, and after a crate has been compiled once, both the machine code and parsed metadata will be reused in future builds. All a use statement does is add an extra entry in whatever data structure the compiler uses to track which names are visible within a particular scope.
Polars has a lot of dependencies. Not all of them are used on every target, but it is still a lot. I wouldn't be surprised if most of the size you see in debug mode comes from debug information for all those dependencies. Also, it seems like the csv feature accidentally pulls in polars-sql by enabling its csv feature using polars-sql/csv rather than polars-sql?/csv. The former enables polars-sql, while the latter only enables the csv feature of polars-sql if polars-sql is already enabled. polars-sql in turn pulls in an SQL parser, polars-plan, polars-arrow and polars-lazy. Each of these has a lot of deps too, some deliberate and some accidental due to the same csv feature mistake.
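For anyone unfamiliar with the two spellings, here is a small Cargo.toml sketch of the difference. The crate name helper and the feature names are hypothetical and are not taken from the polars manifest.

```toml
# Cargo.toml of some library (illustrative fragment)

[dependencies]
# `helper` is a made-up optional dependency that has its own `csv` feature.
helper = { version = "1", optional = true }

[features]
# "helper/csv" turns on the optional `helper` dependency itself
# AND its `csv` feature.
csv-strong = ["helper/csv"]

# "helper?/csv" only turns on `helper`'s `csv` feature if something else
# has already enabled `helper`; on its own it does not pull `helper` in.
csv-weak = ["helper?/csv"]
```

With the first form, anyone who enables csv-strong also compiles helper and all of its dependencies, whether they wanted it or not.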