First crate code review

I'm super excited. I just published my first crate. I don't know anyone else who writes Rust, so I would appreciate a code review.

https://crates.io/crates/elp
https://github.com/ereichert/elp

Thanks in advance.


Congratulations! It looks like you've documented the design decisions well. It's easy to follow :smiley: and I like that you've used chrono.

Can you explain what the file_list method is about, though? It mutates a vec of filenames passed as an argument and returns the new length of that vec. Specifically, I'm not sure what state the filenames vec is expected to be in if the method returns an Err. Does that mean the vec shouldn't be used, or not? The fact that the partially mutated vec escapes the Err result seems like a bad thing.
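For reference, the shape I'm reacting to is roughly this (a hypothetical reconstruction rather than the actual elp code, so I may have the exact types wrong):

```rust
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical reconstruction of the pattern in question: the vec is
// filled in place *and* a length comes back, and on Err it's unclear
// whether the partially filled vec is safe to use.
fn file_list(dir: &Path, filenames: &mut Vec<PathBuf>) -> Result<usize, io::Error> {
    for entry in std::fs::read_dir(dir)? {
        filenames.push(entry?.path());
    }
    Ok(filenames.len())
}
```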

Do you plan to do any benches on more than a single line too? It'd be interesting to see what the scale is like.

ooph... yah... file_list isn't very good.

That is some of the first Rust I wrote when I started working on this project. Taking the Vec as a parameter was the result of writing C-style code in Rust, which I now understand isn't necessary in most cases. You can see I didn't write in that style later, once I learned that a Vec can be returned cheaply. As for the return value, I can't remember why I returned the size of the Vec.
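For anyone following along, the difference I mean is basically this (not the actual elp code, just the two patterns side by side):

```rust
// C-style: the caller allocates, the callee fills it in and reports a count.
fn fill_names(out: &mut Vec<String>) -> usize {
    out.push("one.log".to_string());
    out.push("two.log".to_string());
    out.len()
}

// What I'd write today: just return the Vec. Returning it is a cheap
// move (pointer + length + capacity), not a copy of the contents.
fn names() -> Vec<String> {
    vec!["one.log".to_string(), "two.log".to_string()]
}
```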

I never really paid attention to that function after I got deeper into the project, but it should probably be removed entirely. I was never thrilled with it being part of the library anyway. process_files should probably take a &[std::path::Path] and let the user decide how they want to build the file list, something like the sketch below.
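Roughly the shape I have in mind (an untested sketch; the names will probably change before 2.0.0, and counting lines stands in for the real record parsing):

```rust
use std::path::Path;

// Hypothetical summary type just for this sketch.
#[derive(Debug)]
struct ParseSummary {
    records: usize,
}

// Sketch of the v2 shape: the library stops walking directories and
// just parses whatever paths the caller hands it.
fn process_files(paths: &[&Path]) -> std::io::Result<ParseSummary> {
    let mut records = 0;
    for path in paths {
        let contents = std::fs::read_to_string(path)?;
        records += contents.lines().count();
    }
    Ok(ParseSummary { records })
}
```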

And that's what v2.0.0 is for. :slightly_smiling:

As far as benchmarking goes, I'm not a fan of microbenchmarks in general. The one in the repo is somewhat useful, however, so I don't think I'll do much more than that on the microbenchmark front. Were you considering something specific?

If you go through the git history, you can see I was benchmarking with an example program run against real data as I learned and made changes, and recording the results in the main README. However, that example program is actually being used by my employer, so I had to move it to a project in their GitHub account before I published. Plus, it did more than just benchmark the library; it benchmarked the workflow my employer actually uses the library for.

That being said, I just stripped the example program down so it's pretty close to benchmarking only the parsing. These are the results:

Processed 138 files having 11446671 records in 36120 milliseconds.

Not bad. That's with a pull request someone sent me, which got rid of a ton of allocations. Also, that's without any concurrency. It would be very easy to write a program that did the parsing in parallel.

I'll probably add an example project later that does the same as the current, stripped-down example. The problem is that it's nearly impossible to reproduce the results unless I leave a bunch of log files hanging around on my local drive. I'll have to think about that.

Thanks for taking a look. I appreciate the feedback.


Ah ok, I've never written C code, so I'm not familiar with that coding style.

Not necessarily, just a suggestion while I was at it :slightly_smiling:

Have you had a look at something like rayon for introducing potential concurrency?
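The appeal is that the change tends to be small, something along these lines (just a sketch, assuming each file can be parsed independently, with line counting standing in for the real parsing):

```rust
use rayon::prelude::*;
use std::path::PathBuf;

// Parse each file on rayon's thread pool; rayon handles the scheduling.
fn parse_all(paths: &[PathBuf]) -> usize {
    paths
        .par_iter()
        .map(|path| {
            // Stand-in for the real per-file parsing.
            std::fs::read_to_string(path)
                .map(|s| s.lines().count())
                .unwrap_or(0)
        })
        .sum()
}
```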

It looks like you've got a solid plan for where you're going, which is awesome, and getting to write Rust for work is an enviable position!

rayon is interesting but, as a first run at adding concurrency, I'll probably stick with the primitives built into the std lib, just to get a little more experience with them.

Maybe later though.
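If I do go that route, the first pass would probably look something like this (untested sketch: chunk the file list across threads, with line counting standing in for the actual parsing):

```rust
use std::path::PathBuf;
use std::thread;

// First-pass parallelism with std only: split the file list into one
// chunk per thread, parse each chunk, then sum the per-thread counts.
fn parse_in_parallel(paths: Vec<PathBuf>, threads: usize) -> usize {
    let chunk_size = (paths.len() + threads - 1) / threads.max(1);
    let mut handles = Vec::new();

    for chunk in paths.chunks(chunk_size.max(1)) {
        let chunk = chunk.to_vec();
        handles.push(thread::spawn(move || {
            chunk
                .iter()
                // Stand-in for the real per-file parsing.
                .filter_map(|p| std::fs::read_to_string(p).ok())
                .map(|s| s.lines().count())
                .sum::<usize>()
        }));
    }

    handles.into_iter().map(|h| h.join().unwrap_or(0)).sum()
}
```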