Hi all,
I recently published my first crate (https://github.com/kodemartin/rustpostal) and I would greatly appreciate a review.
Thank you!
Hi all,
I recently published my first crate (https://github.com/kodemartin/rustpostal) and I would greatly appreciate a review.
Thank you!
Hi! Thanks for your work, it's an interesting library. This is a use case that I've been pondering myself for some time.
Let's jump into the review!
First of all, the customary way of wrapping a C library in Rust is to split the functionality in two separate crates:
libpostal-sys
. This should contain the immediate C FFI bindings. These include C types and functions, as well as equivalents of other C constructs that can't be directly translated to Rust, e.g. const
items that replace #define
macros.-sys
one.This is preferable to your current approach of mixing safe and unsafe APIs in the same create.
Next, it would be nice to follow the Rust API Guidelines as closely as possible. A couple of things that stick out immediately:
unsafe
blocks are generally very long and coarse-grained, often spanning almost entire functions.Default
, Debug
, Clone
, Copy
(if applicable, especially on simple enum
s and newtype wrappers around primitive types), PartialEq
, Eq
(if applicable) and Hash
are the most important ones you'll want to provide for interoperability. It is very hard to use types that lack these implementations.NormalizeOptions::get_default_options()
should just be the Default::default()
impl. NormalizeOptions::initialize()
could be the derived Default
impl. (Having a constructor named initialize()
is confusing anyway – why do you need it? Why isn't NormalizeOptions::new()
enough?)In general, taking Vec<_>
in function arguments is considered an anti-pattern (see e.g. expand_address_with_options()
). They can only be used if the caller actually has a Vec
. It's preferable to accept a slice (&[T]
) or better yet, an Iterator<Item=T>
instead.
Your NormalizeOptions::as_libpostal_options()
function takes &mut self
, which is inappropriate for a function named as_something()
(without it being called as_mut_something()
). Do you really need this? If you are mutating the structure, then the method shouldn't be called as_something()
, since the idiomatic naming convention dictates that as_something()
methods perform a read-only conversion between types, and do not mutate the receiver.
Exposing separate (and unsafe
!) setup and the teardown functions is very error-prone. Consider wrapping setup and teardown functionality into an RAII "context" type, which calls setup()
in its constructor and teradown()
in its Drop
implementation.
Your documentation for address::parse_address()
is unidiomatic: you explicitly call .into_iter()
on the result before iterating, which is unnecessary.
The use libc;
statement in ffi.rs
is unnecessary.
Globally suppressing compiler warnings against unused items (#![allow(unused)]
) is a code smell, too. Why do you need it? You should in general fix warnings rather than allowing them, especially if you are allowing them globally.
The AddressComponents
bitset shouldn't be implemented manually. Bitset boilerplate can be easily implemented using the bitflags
crate.
The code is unnecessarily verbose in some places. For example, the following code snippet:
let mut n: libc::size_t = 0;
let n = &mut n as *mut libc::size_t;
let raw = ffi::libpostal_expand_address(address.as_ptr() as *const libc::c_char, options, n);
could be rewritten as just
let mut n: usize = 0;
let raw = ffi::libpostal_expand_address(address.as_ptr() as *const libc::c_char, options, &mut n);
because usize
is guaranteed to be the same as libc::size_t
, and an implicit conversion ("coercion") from &mut T
to *mut T
exists (as well as from &T
to *const T
, of course).
This is an amazing review! Thank you very much for the time you spent, the depth
you went into, and the constructive quality of your comments.
Indeed, many aspects of my first release seemed a bit off, and getting feedback
on the customary and idiomatic ways to do things in Rust, is extremely helpful.
A lot of things you mention, simply did not occur to me due to lack of
experience with the language.
Most of the recommendations are straight-forward and outline a roadmap for the
project.
To recap:
Use dedicated libpostal-sys
crate for the ffi.
Q: Do you think this relates only to the ffi.rs
module of the crate?
Or do you see declarations in other modules that should not be there?
Use RAII-enabled wrappers on the higher-level crate.
Converge to the Rust API Guidelines.
Note: You raise very good points in your specific remarks (derive
traits, implement Default
etc.), so no argue there!
Use bitflags
for AddressComponents
. Note: I value also the generic
implications of the comment, to reuse accepted community solutions and void
boilerplate.
Revisit reasons for suppressing compiler warning, and prefer fixing any
underlying issues.
Apart from these general directions, I would like to discuss more two specific
comments:
NormalizeOptions::as_libpostal_options()
:
Your
NormalizeOptions::as_libpostal_options()
function takes&mut self
,
which is inappropriate for a function namedas_something()
(without it being
calledas_mut_something()
). Do you really need this? If you are mutating
the structure, then the method shouldn't be calledas_something()
, since
the idiomatic naming convention dictates thatas_something()
methods perform
a read-only conversion between types, and do not mutate the receiver.
This has troubled me a lot during the implementation. Wanted to create a
higher-level interface to construct ffi::libpostal_normalize_options
. This
requires the population of a *const *const libc::c_char
field, and the
only solutions I could think of, was to mutate
NormalizeOptions.language_c_strs
and NormalizeOptions.language_ptrs
.
The latter holds the data in the target ffi
structure for the whole
lifetime of NormalizeOptions
. So this was the reaseon for &mut self
.
After reviewing API guidelines C-Conv I reckon the to_
convention is more appropriate, no?
About verbose code snippets:
an implicit conversion ("coercion") from
&mut T
to*mut T
exists (as well as from&T
to*const T
Is this because of the CoercedUnsized
implementation?
Thanks again!
All of the other modules contain pub unsafe
functions, which should not be there. The prime example of this is the setup-teardown pair which is directly in lib.rs
. But as_libpostal_options()
is also unsafe
.
Not really. to_xyz()
and into_xyz()
methods still shouldn't mutate. In general, conversion methods should not mutate the receiver. The difference between them is only that:
as_()
methods convert from either references or owned values to references or cheaply-constructed values (typically, primitive types or small Copy
types), e.g.: as_ref()
, as_str()
, as_bytes()
.as_mut_()
methods convert from either mutable references or owned values to a mutable reference. E.g.: as_mut()
, as_mut_slice()
.to_()
methods convert from borrowed to owned / expensive-to-construct types, e.g.: to_owned()
, to_string()
.into_()
methods convert from owned to owned types, e.g.: into()
, into_inner()
.Why? Couldn't you create a new *const *const char
value? Currently, you are indeed returning the pointers inside self
– and while I'm not familiar with the internals of libpostal
, it seems really dubious that you are returning something with pointers inside self
but that return value has raw pointers and thus it doesn't have a lifetime that would tie it to self
. All in all, this smells like Undefined Behavior waiting to happen.
No, it isn't. Nothing's unsized here. &T
always coerces to *const T
and &mut T
always coerces to *mut T
, full stop. Unsized coercions are an entirely different kind of beast. They are related to the fact that you can e.g. convert a &[T; N]
reference-to-array into a &[T]
reference-to-slice, and that you can do the same using Box<[T]>
, and trait objects, etc. None of those compilcations is happening here. Note how that CoerceUnsized
impl you linked to has T
on one side but U
, a different type, on the other side. In your case, there aren't two different pointed types. It's &mut T
to *mut T
, not &mut T
to *mut U
.
It's a lot simpler: a raw pointer has strictly less capabilities and information content than a reference — basically, references are pointers with extra lifetime and borrowing information. Therefore, you can always convert a reference to a raw pointer that points to the same type.
You got a point(er) there! In fact, if currently self
is moved after the ffi
structure is created and dropped for some reason, we will be left with dangling reference. I have to revisit this.
This makes sense. Have to understand coercion better.
Are there any other references, sample codes you could recommend on the topics we discussed?
This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.