I created this little crate as a first project to teach myself Rust: https://crates.io/crates/csv_log_cleaner (My background's mainly in Python). It's a CLI tool which cleans CSV files to conform to a type schema by streaming them through small memory buffers using multiple threads and logs data loss.
Docs are here: csv_log_cleaner - Rust
GitHub repo is here: GitHub - ambidextrous/csv_log_cleaner
I would very much appreciate any comments or code reviews (or even PRs) that people might be willing to offer.
In particular, I'd appreciate it if someone could take a look at the Rayon threadpool handling around here: csv_log_cleaner/lib.rs at 07ca52da3b8a82037877e031258a057e02e726ea · ambidextrous/csv_log_cleaner · GitHub. All of my tests are passing, but I've had enough experience with multi-threading in other languages to know that this might just be blind luck, and that I may have actually messed up the multi-threading disastrously.
A couple of things stand out already in connection with
- `impl Trait` shouldn't be used in public functions unless you specifically want to restrict the caller from explicitly instantiating the corresponding type parameters (reader and writer). I don't see any good reason for that, here, so you should probably be using explicit type parameters instead.
- Taking a `&String` (or a `&Vec`, etc.) parameter is actively harmful. It adds no value over `&str` (because a `&String` can't grow as it's behind an immutable reference), but it requires the caller to have a concrete `String` allocated. This would incur a superfluous allocation when they have something that would coerce to a `&str` but is not a `String`. You should just take an `&str`, or even better, a type implementing `AsRef<Path>` if you are trying to describe a filesystem path. Better yet, you should take Writers, so that people can use your function with arbitrary readers and writers, not only with files.
- `Box<dyn Error>` in a library is not very useful. It conveys no information. Create a semantic error type instead.
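To make these three points concrete, here is a rough sketch of what such a signature could look like. It is not the crate's actual API: `CleanError`, `copy_valid_rows` and `copy_valid_rows_with_log_file` are hypothetical names, and the naive comma split only stands in for real CSV parsing so that the example needs nothing beyond the standard library.

```rust
use std::error::Error;
use std::fmt;
use std::fs::File;
use std::io::{self, BufRead, Write};
use std::path::Path;

/// Hypothetical semantic error type: callers can match on the failure kind,
/// which `Box<dyn Error>` does not let them do without downcasting.
#[derive(Debug)]
pub enum CleanError {
    /// Reading the input or writing the output/log failed.
    Io(io::Error),
    /// The input was empty, so there is no header row to validate against.
    MissingHeader,
}

impl fmt::Display for CleanError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CleanError::Io(e) => write!(f, "I/O error: {}", e),
            CleanError::MissingHeader => write!(f, "input has no header row"),
        }
    }
}

impl Error for CleanError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            CleanError::Io(e) => Some(e),
            CleanError::MissingHeader => None,
        }
    }
}

impl From<io::Error> for CleanError {
    fn from(e: io::Error) -> Self {
        CleanError::Io(e)
    }
}

/// Explicit type parameters (not `impl Trait`), and the log is any `Write`
/// rather than a `&String` path. Returns the number of data rows kept.
/// Assumption: a row is "valid" if it has as many fields as the header.
pub fn copy_valid_rows<R, W, L>(input: R, mut output: W, mut log: L) -> Result<usize, CleanError>
where
    R: BufRead,
    W: Write,
    L: Write,
{
    let mut lines = input.lines();
    let header = match lines.next() {
        Some(h) => h?,
        None => return Err(CleanError::MissingHeader),
    };
    let expected = header.split(',').count();
    writeln!(output, "{}", header)?;

    let mut kept = 0;
    for (i, line) in lines.enumerate() {
        let line = line?;
        let found = line.split(',').count();
        if found == expected {
            writeln!(output, "{}", line)?;
            kept += 1;
        } else {
            // Record the dropped row; the header is line 1, so data starts at 2.
            writeln!(log, "line {}: expected {} fields, found {}", i + 2, expected, found)?;
        }
    }
    Ok(kept)
}

/// Convenience wrapper for callers who do want a log file on disk.
/// `AsRef<Path>` accepts `&str`, `String`, `&Path`, `PathBuf`, and so on.
pub fn copy_valid_rows_with_log_file<R, W, P>(input: R, output: W, log_path: P) -> Result<usize, CleanError>
where
    R: BufRead,
    W: Write,
    P: AsRef<Path>,
{
    let log = File::create(log_path)?;
    copy_valid_rows(input, output, log)
}
```

Because the reader, writer and log are ordinary type parameters, a test can pass a byte slice and two `Vec<u8>` buffers without touching the filesystem, and a caller can spell the types out with a turbofish such as `copy_valid_rows::<_, Vec<u8>, Vec<u8>>(...)` when inference needs help.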
Moving on to the implementation:
@H2CO3 Thank you very much! Will implement.