Elsewhere it was pointed out that a C function signature roughly like this
T* fn(T*, T*);
represents a multitude of different concepts. So I started to try to enumerate the possible meanings.
So far I've got
- The first parameter could be nullable, or not. In Rust: should the first parameter be an
Option? That doubles the number of possible Rust signatures. Let's represent this with
- The first parameter might point to a single datum, or a sequence. Rust:
- The first parameter might be borrowed or owned. Rust:
Vec<T>, or ...):
So far, that's
2x2x2 = 8 variations on the first parameter. Something similar applies to the second parameter and the return value. That's a factor of 8 for each, so we have
8x8x8 = 512 variations.
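To make the three axes concrete, here is a rough sketch of the eight readings of a single `T*` parameter. It assumes `T = i32`, and every function name is made up for illustration. The bodies only exist so that the signatures compile.

```rust
// Eight hypothetical readings of one C parameter `T*`, with T = i32.
// Axes: nullable or not, single datum or sequence, borrowed or owned.

// Borrowed, single, non-null.
fn borrowed_one(x: &i32) -> i32 {
    *x
}

// Borrowed, single, nullable.
fn borrowed_one_opt(x: Option<&i32>) -> i32 {
    x.copied().unwrap_or(0)
}

// Borrowed, sequence, non-null.
fn borrowed_many(xs: &[i32]) -> i32 {
    xs.iter().sum::<i32>()
}

// Borrowed, sequence, nullable.
fn borrowed_many_opt(xs: Option<&[i32]>) -> i32 {
    xs.map_or(0, |s| s.iter().sum::<i32>())
}

// Owned, single, non-null.
fn owned_one(x: Box<i32>) -> i32 {
    *x
}

// Owned, single, nullable.
fn owned_one_opt(x: Option<Box<i32>>) -> i32 {
    x.map_or(0, |b| *b)
}

// Owned, sequence, non-null.
fn owned_many(xs: Vec<i32>) -> i32 {
    xs.into_iter().sum::<i32>()
}

// Owned, sequence, nullable.
fn owned_many_opt(xs: Option<Vec<i32>>) -> i32 {
    xs.map_or(0, |v| v.into_iter().sum::<i32>())
}
```

The owned variants consume their argument, so a caller that still needs the data afterwards has to clone it first. In C that obligation would live only in the documentation.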
When both inputs and the return contain references, we have a number of choices for the lifetime of the output:
- first parameter
- second parameter
That's a factor of 4 which must be applied to the 1/8 of the 512 which have refs in all three positions. There are 512/8 = 64 of those, which turn into 64x4 = 256, for an additional 192. This leaves us with 512 + 192 = 704 variations.
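A sketch of what four lifetime choices could look like when all three positions are plain references. The first two are the ones listed above. The "either" and "neither" cases are my guess at the remaining two, and all names are hypothetical. The last function just redoes the arithmetic.

```rust
// Result borrows from the first argument only.
fn from_first<'a, 'b>(a: &'a i32, _b: &'b i32) -> &'a i32 {
    a
}

// Result borrows from the second argument only.
fn from_second<'a, 'b>(_a: &'a i32, b: &'b i32) -> &'b i32 {
    b
}

// Result may borrow from either argument (assumed third case).
fn from_either<'a>(a: &'a i32, b: &'a i32) -> &'a i32 {
    if a >= b { a } else { b }
}

// Result borrows from neither argument (assumed fourth case).
static ZERO: i32 = 0;
fn from_neither(_a: &i32, _b: &i32) -> &'static i32 {
    &ZERO
}

// 512 base variations, of which 1/8 have references in all three
// positions; each of those gains 3 extra lifetime choices.
fn count_variations() -> u32 {
    let base = 8 * 8 * 8;
    let all_refs = base / 8;
    base + all_refs * (4 - 1)
}
```

With `from_first`, the compiler lets the result outlive the second argument. With `from_either` it does not. The C signature is identical in both cases.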
These are just the obvious ones. I guess there are more.
What else could the C signature mean, that should be documented and must be kept in the C programmer's head, but which can be expressed in and verified by Rust's type system?
I can see one more set of meanings - because C does not check lifetimes, any lifetime parameters in either input parameter (if it's a reference, or if it's an owned pointer to a struct containing a borrow) can be either
'static (meaning that it can be stored in a global variable or a
static within the function) or
'_ (meaning that the borrow is no longer guaranteed to be valid after the function returns).
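A small sketch of that distinction, assuming a string argument and a global slot to store it in. `REMEMBERED`, `remember`, `recall` and `measure` are invented names.

```rust
use std::sync::Mutex;

// Hypothetical global slot that keeps a reference after the call returns.
static REMEMBERED: Mutex<Option<&'static str>> = Mutex::new(None);

// The function stores its argument, so the borrow must be 'static.
fn remember(s: &'static str) {
    *REMEMBERED.lock().unwrap() = Some(s);
}

// Reads back whatever was stored last.
fn recall() -> Option<&'static str> {
    let guard = REMEMBERED.lock().unwrap();
    *guard
}

// The function only looks at its argument during the call,
// so any (anonymous) lifetime is acceptable.
fn measure(s: &str) -> usize {
    s.len()
}
```

Passing a borrow of a local `String` to `remember` is rejected at compile time, while `measure` accepts it. In C both would take a `const char *` and only the documentation could tell them apart.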
Another axis is whether the data behind the pointer is initialized or not (
&mut T vs
&mut MaybeUninit<T>). In C it's common to use write-only pointers for output parameters, as a replacement for multiple-valued return.
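Roughly what a write-only output parameter looks like when translated literally, assuming a C function along the lines of `int div_rem(int n, int d, int *rem)`. The function names here are hypothetical.

```rust
use std::mem::MaybeUninit;

// Returns the quotient and writes the remainder through `rem_out`.
// The parameter type says the callee must not read the old contents.
fn div_rem(n: i32, d: i32, rem_out: &mut MaybeUninit<i32>) -> i32 {
    rem_out.write(n % d);
    n / d
}

// Caller side: allocate an uninitialized slot, let the callee fill it.
fn div_rem_pair(n: i32, d: i32) -> (i32, i32) {
    let mut rem = MaybeUninit::<i32>::uninit();
    let quot = div_rem(n, d, &mut rem);
    // SAFETY: `div_rem` always writes `rem_out` before returning.
    let rem = unsafe { rem.assume_init() };
    (quot, rem)
}
```

In Rust-only code you would normally just return the tuple, so this shape mostly shows up at FFI boundaries.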
In practice I also often miss information about whether the function mutates its arguments. Some libraries simply don't bother to put
const, or can't due to a weird edge case like
const char *const * actually allowing the target to be mutated.
And it's missing information whether the function is thread-safe, which Rust expresses by having
Do you have an example of this? I always interpreted
const char *const x as
x: &[u8] in Rust (as opposed to
const char* x, which is closer to
mut x: &[u8]). Obviously, in Rust neither of these forms lets you modify the
u8 being referred to.
That's because you can cast
const char*, and you can alias pointers, so you can replace the target of a
const char* pointer while you can still use it through its
Yeah, yay for non-transitive immutability
Ahh, I forgot that other languages still allow aliased mutation.
The nullable return type axis is a bit longer than might appear at first sight: the function might throw an exception (in C++) or call
longjmp(). But it's the same fundamental Rust mechanism that caters for all of these:
I still don't understand what you said here.
Yes, if you have a
const char *const *, somebody else might be modifying the
char. But the same is true if you have a
This doesn't apply in the case you said. A
const char *const * parameter can take a
char ** argument.
Edit: Oh gcc still produces a warning in C. Weird. It's allowed in C++ with no warning.
Yeah, I think that's what the page was talking about when it said
C++ has more complicated rules for assigning const-qualified pointers which let you make more kinds of assignments without incurring warnings, but still protect against inadvertent attempts to modify const values.
Another meaning could be when the first and second parameters are supposed to point into the same range (which is different from having the same lifetime!), i.e. when they are the begin and end iterators.
How would you express this sort of idea in Rust? Slice + indices?
It depends on what you mean by "iterator".
The C++ definition is a pointer-like object that can be used in a
for (auto i = start; i != end; ++i) loop, which is logically equivalent to accepting
However, if it's meant as a sub-section of some contiguous chunk of memory (i.e. an array or
std::vector), then Rust would use
Given that this tangent was started by the observation that two C++ pointers might be interpreted as two iterators defining a range, I guess we have to recognize that C++ has (pre-C++20) 5 different categories of iterators (input, output, forward, bidirectional, random-access) arranged in a not-entirely-trivial hierarchy.
That only covers up to forward, input and output iterators in the C++ hierarchy.
I guess that the complexity of C++ iterators makes this a can of worms that is probably not too interesting to explore in this context.
Yeah, in general, Rust started with the approach that an "iterator" is something which can yield a next item (similar to Python and most other languages), whereas C++ started with iterators being a generalisation of a pointer to some range of objects. Once you realise that, the various types of iterator in C++ start to make a lot more sense.
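For what it's worth, here is a minimal sketch of how the begin/end reading might come out in Rust, assuming `T = i32`. The three function names are invented, and which form fits depends on what the C pair actually meant.

```rust
// Begin/end pointers into one contiguous buffer collapse into one slice.
fn sum_slice(range: &[i32]) -> i32 {
    range.iter().sum::<i32>()
}

// "Something that can yield a next item": any Iterator will do,
// whether or not the items sit in contiguous memory.
fn sum_iter(items: impl Iterator<Item = i32>) -> i32 {
    items.sum::<i32>()
}

// Slice + indices: keep the parent buffer and pass the bounds separately.
// Returns None if the bounds do not describe a valid sub-range.
fn sum_between(data: &[i32], start: usize, end: usize) -> Option<i32> {
    data.get(start..end).map(|r| r.iter().sum::<i32>())
}
```

The slice version cannot express "these two pointers belong to different buffers" at all, which is the bug the C signature leaves open.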