The multiple meanings of T* fn(T*, T*) in C(++)

Elsewhere it was pointed out that a C function signature roughly like this

T* fn(T*, T*);

represents a multitude of different concepts. So I started to try to enumerate the possible meanings.

So far I've got

  • The first parameter could be nullable, or not. In Rust: should the first parameter be an Option? That doubles the number of possible Rust signatures. Let's represent this with x2.
  • The first parameter might point to a single datum, or a sequence. Rust: &T vs &[T]: x2.
  • The first parameter might be borrowed or owned. Rust: &T vs T (or &[T] vs Vec<T>, or ...): x2.

So far, that's x2x2x2 = x8 variations on the first parameter. Something similar applies to the second parameter and the return value. That's a factor of 8 for each, so we have 8x8x8 = 512 variations.
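
To make the three axes concrete, here is a rough sketch of the eight readings of one T* parameter, with T fixed to i32. The function names are made up for illustration, and using Box<T> for "owned single datum" is my assumption; plain T would fit too, depending on who allocated the pointee.

```rust
// Eight hypothetical readings of a single `T*` parameter, with T = i32.
// Axes: nullable or not, one datum or a sequence, borrowed or owned.
// `Box<i32>` stands in for an owned single datum (an assumption of this sketch).

fn borrowed_one(p: &i32) -> i32 {
    *p
}

fn nullable_borrowed_one(p: Option<&i32>) -> i32 {
    p.copied().unwrap_or(0)
}

fn borrowed_many(p: &[i32]) -> i32 {
    p.iter().sum()
}

fn nullable_borrowed_many(p: Option<&[i32]>) -> i32 {
    p.map_or(0, |s| s.iter().sum::<i32>())
}

fn owned_one(p: Box<i32>) -> i32 {
    *p
}

fn nullable_owned_one(p: Option<Box<i32>>) -> i32 {
    p.map_or(0, |b| *b)
}

fn owned_many(p: Vec<i32>) -> i32 {
    p.into_iter().sum()
}

fn nullable_owned_many(p: Option<Vec<i32>>) -> i32 {
    p.map_or(0, |v| v.into_iter().sum::<i32>())
}
```

In C all eight of these would take the same T*, and the caller would have to read the documentation to find out which one it is.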

When both inputs and the return contain references, we have a number of choices for the lifetime of the output:

  • first parameter
  • second parameter
  • both
  • static

That's a factor of 4 which must be applied to the 1/8 of the 512 which have refs in all three positions (each position is a reference in half of its variations, so 1/2 x 1/2 x 1/2 = 1/8). There are 512/8 = 64 of those, which turn into 64x4 = 256, for an additional 192. This leaves us with 512 + 192 = 704 variations.
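
The four lifetime choices could be written roughly like this, again with T = i32 and with invented function names. It's only a sketch of the all-borrowed, non-nullable, single-datum case.

```rust
// The output borrows from the first parameter only.
fn from_first<'a, 'b>(a: &'a i32, _b: &'b i32) -> &'a i32 {
    a
}

// The output borrows from the second parameter only.
fn from_second<'a, 'b>(_a: &'a i32, b: &'b i32) -> &'b i32 {
    b
}

// The output may borrow from either input, so both must outlive it.
fn from_both<'a>(a: &'a i32, b: &'a i32) -> &'a i32 {
    if a >= b { a } else { b }
}

// The output borrows from neither input; it points at static data.
static ZERO: i32 = 0;

fn from_static(_a: &i32, _b: &i32) -> &'static i32 {
    &ZERO
}
```

With from_first, the caller may drop the second argument while still holding the result; with from_both the compiler refuses that. The C signature is identical in all four cases.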

These are just the obvious ones. I guess there are more.

What else could the C signature mean, that should be documented and must be kept in the C programmer's head, but which can be expressed in and verified by Rust's type system?


I can see one more set of meanings - because C does not check lifetimes, any lifetime parameters in either input parameter (if it's a reference, or if it's an owned pointer to a struct containing a borrow) can be either 'static (meaning that it can be stored in a global variable or a static within the function) or '_ (meaning that the borrow is no longer guaranteed to be valid after the function returns).
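
A small sketch of that distinction, assuming a hypothetical global REGISTRY and made-up function names: the function that stores its argument has to ask for 'static, while the one that only looks at it can accept any lifetime.

```rust
use std::sync::Mutex;

// Hypothetical global slot that outlives every call.
static REGISTRY: Mutex<Option<&'static str>> = Mutex::new(None);

// Keeps the reference after returning, so the borrow must be 'static.
fn remember(name: &'static str) {
    *REGISTRY.lock().unwrap() = Some(name);
}

// Reads back whatever was stored.
fn remembered() -> Option<&'static str> {
    *REGISTRY.lock().unwrap()
}

// Only uses the reference during the call, so any lifetime is accepted.
fn inspect(name: &str) -> usize {
    name.len()
}
```

Passing a reference to a local String into remember is a compile error, which is exactly the mistake C would let through silently.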


Another axis is whether the data behind the pointer is initialized or not (&mut T vs &mut MaybeUninit<T>). In C it's common to use write-only pointers for output parameters, as a replacement for multiple-valued return.
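
Here is a minimal sketch of the out-parameter pattern with MaybeUninit. The functions fill_out and add_checked are invented for the example; in plain Rust you would normally just return the Option directly.

```rust
use std::mem::MaybeUninit;

// C-style out-parameter: writes the sum into `out` and returns true,
// or returns false and leaves `out` untouched on overflow.
fn fill_out(out: &mut MaybeUninit<i32>, a: i32, b: i32) -> bool {
    match a.checked_add(b) {
        Some(sum) => {
            out.write(sum);
            true
        }
        None => false,
    }
}

// Wrapper that turns the out-parameter back into a return value.
fn add_checked(a: i32, b: i32) -> Option<i32> {
    let mut slot = MaybeUninit::uninit();
    if fill_out(&mut slot, a, b) {
        // SAFETY: fill_out returned true, so it wrote a value into `slot`.
        Some(unsafe { slot.assume_init() })
    } else {
        None
    }
}
```

The type makes it visible that fill_out never reads its first argument, which a plain T* in C cannot say.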


In practice I also often miss information about whether the function mutates its arguments or not. Some libraries simply don't bother to put const, or can't due to weird edge cases like const char *const * actually letting you mutate the target const char.

And it's missing information about whether the function is thread-safe, which Rust expresses by having T implement Send and Sync.
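
Roughly, both pieces of information show up in a Rust signature like this. The function names are made up, and this is just one way to spell the bounds.

```rust
use std::thread;

// Shared borrow: the signature promises not to mutate the data.
fn read_only(v: &[i32]) -> i32 {
    v.iter().sum()
}

// Exclusive borrow: the signature announces mutation.
fn mutates(v: &mut [i32]) {
    for x in v.iter_mut() {
        *x *= 2;
    }
}

// Moving data to another thread requires `Send` (and 'static for `spawn`).
fn sum_on_thread<T>(data: T) -> i32
where
    T: AsRef<[i32]> + Send + 'static,
{
    thread::spawn(move || data.as_ref().iter().sum::<i32>())
        .join()
        .unwrap()
}

// Sharing a borrow with a scoped thread works because `i32` is `Sync`.
fn sum_shared(data: &[i32]) -> i32 {
    thread::scope(|s| s.spawn(|| data.iter().sum::<i32>()).join().unwrap())
}
```

A type like Rc<[i32]> would be rejected by sum_on_thread at compile time, since Rc is not Send.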


Wait... What? :exploding_head:

Do you have an example of this? I always interpreted const char *const x as x: &[u8] in Rust (as opposed to const char* x, which is closer to mut x: &[u8]). Obviously, in Rust neither of these forms lets you modify the u8 being referred to.

That's because you can cast char* to const char*, and you can alias pointers, so you can replace the target of a const char* pointer while you can still use it through its char* alias.

https://c-faq.com/ansi/constmismatch.html
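
The aliasing itself can be mimicked in Rust with raw pointers, which give about as few guarantees as C pointers do. This is only a sketch of the "const pointer, mutable alias" situation, not of the const char ** assignment rule from the FAQ; the function name is invented.

```rust
// Returns the byte seen through a read-only pointer before and after
// a write through a writable alias of the same location.
fn observe_through_const_alias() -> (u8, u8) {
    let mut c: u8 = b'a';
    let writable: *mut u8 = &mut c;
    // Like converting `char*` to `const char*`: same address, read-only view.
    let read_only: *const u8 = writable;
    // SAFETY: both pointers derive from the same live local variable
    // and are only used on this thread while `c` is alive.
    unsafe {
        let before = *read_only;
        *writable = b'b';
        let after = *read_only;
        (before, after)
    }
}
```

So a *const u8, like a const char*, only says "I won't write through this pointer", not "nobody will change the target". A &u8 makes the stronger promise.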


Yeah, yay for non-transitive immutability :sweat_smile:

Ahh, I forgot that other languages still allow aliased mutation.


The nullable return type axis is a bit longer than might appear at first sight: the function might throw an exception (in C++) or call longjmp(). But it's the same fundamental Rust mechanism that caters for all of these: Option/Result.
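
As a sketch, the "may return NULL" and "may fail some other way" readings of the return type might look like this. LookupError and both function names are hypothetical.

```rust
// Hypothetical error type covering the ways the lookup can fail.
#[derive(Debug, PartialEq)]
enum LookupError {
    Empty,
    NoEven,
}

// "Might return NULL": absence is the only failure mode.
fn first_even(v: &[i32]) -> Option<&i32> {
    v.iter().find(|x| **x % 2 == 0)
}

// "Might throw or longjmp": the failure carries a reason.
fn first_even_or_error(v: &[i32]) -> Result<&i32, LookupError> {
    if v.is_empty() {
        return Err(LookupError::Empty);
    }
    first_even(v).ok_or(LookupError::NoEven)
}
```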

I still don't understand what you said here.

Yes, if you have a const char *const *, somebody else might be modifying the char. But the same is true if you have a const char*.

This doesn't apply in the case you said. A const char *const * parameter can take a char ** argument.

Edit: Oh gcc still produces a warning in C. Weird. It's allowed in C++ with no warning.


Yeah, I think that's what the page was talking about when it said

C++ has more complicated rules for assigning const-qualified pointers which let you make more kinds of assignments without incurring warnings, but still protect against inadvertent attempts to modify const values.

Another meaning could be when the first and second parameter are supposed to point into the same range (which is different than having the same lifetime!), i.e. when they are the begin and end iterators.


Ah yes.

How would you express this sort of idea in Rust? Slice + indices?

It depends on what you mean by "iterator".

The C++ definition is a pointer-like object that can be used in a for (auto i = start; i != end; ++i) loop, which is logically equivalent to accepting impl Iterator<Item=T>.

However, if it's meant as a sub-section of some contiguous chunk of memory (i.e. an array or std::vector), then Rust would use &[T].

You see other languages (C#, JavaScript, Python, etc.) accepting the whole list and start/end indices, but that's mainly because they have no way of expressing slices without creating a new list. Passing around list+indices is almost never necessary in Rust because we have more precise control over memory access/layout and slices are a first-class citizen.
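
A rough sketch of the two readings, plus the bridge from a C-style begin/end pointer pair to a slice. The function names are made up; as_ptr_range, offset_from and from_raw_parts are the standard library calls.

```rust
// "Sub-section of contiguous memory": take a slice.
fn sum_range(range: &[i32]) -> i32 {
    range.iter().sum()
}

// "Anything that can be stepped through": take an iterator.
fn sum_any(items: impl Iterator<Item = i32>) -> i32 {
    items.sum()
}

/// Rebuilds a slice from a C-style begin/end pointer pair.
///
/// # Safety
/// `begin` and `end` must delimit one live, contiguous, initialized
/// range of `i32` within a single allocation, with `begin <= end`,
/// and the range must stay valid and unmodified for `'a`.
unsafe fn from_begin_end<'a>(begin: *const i32, end: *const i32) -> &'a [i32] {
    unsafe {
        let len = end.offset_from(begin) as usize;
        std::slice::from_raw_parts(begin, len)
    }
}
```

The safety comment on from_begin_end is essentially the "both pointers belong to the same range" contract that C leaves unwritten; once it is a &[i32], the relationship is part of the type.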


Indeed.

Given that this tangent was started by the observation that two C++ pointers might be interpreted as two iterators defining a range, I guess we have to recognize that C++ has (pre-C++20) 5 different categories of iterators (input, output, forward, bidirectional, random-access) arranged in a not-entirely-trivial hierarchy.

That only covers up to forward, input and output iterators in the C++ hierarchy.

I guess that the complexity of C++ iterators makes this a can of worms that is probably not too interesting to explore in this context.

Yeah, in general, Rust started with the approach that an "iterator" is something which can yield a next item (similar to Python and most other languages), whereas C++ started with iterators being a generalisation of a pointer to some range of objects. Once you realise that, the various types of iterator in C++ start to make a lot more sense.
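
For illustration, the Rust side of that really is just one method. Countdown is a made-up type for the example.

```rust
// A hypothetical iterator that counts down from `remaining` to 1.
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    type Item = u32;

    // The whole contract: yield the next item, or None when finished.
    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            None
        } else {
            let current = self.remaining;
            self.remaining -= 1;
            Some(current)
        }
    }
}
```

There is no separate "end" value to compare against; running out is signalled by next returning None.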
