The multiple meanings of T* fn(T*, T*) in C(++)

jacg · October 7, 2022, 9:11am

Elsewhere it was pointed out that a C function signature roughly like this

T* fn(T*, T*);

represents a multitude of different concepts, so I started trying to enumerate the possible meanings.

So far I've got:

The first parameter could be nullable, or not. In Rust: should the first parameter be an Option? That doubles the number of possible Rust signatures. Let's represent this with x2.
The first parameter might point to a single datum, or to a sequence. Rust: &T vs &[T]: x2.
The first parameter might be borrowed or owned. Rust: &T vs T (or &[T] vs Vec<T>, or ...): x2.
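To make the three axes concrete, here is a minimal Rust sketch. It assumes a hypothetical element type `T` wrapping an `i32`, and all function names are illustrative only; each group of functions differs along exactly one axis.

```rust
// Hypothetical stand-in for the C element type `T`.
#[derive(Debug)]
pub struct T(pub i32);

// Axis 1: nullable vs non-null. `Option<&T>` is the nullable form.
pub fn first_or_default(x: Option<&T>) -> i32 {
    x.map_or(0, |t| t.0)
}

// Axis 2: single datum vs sequence. `&T` vs `&[T]`.
pub fn one(x: &T) -> i32 {
    x.0
}

pub fn many(xs: &[T]) -> i32 {
    xs.iter().map(|t| t.0).sum()
}

// Axis 3: borrowed vs owned. `&[T]` vs `Vec<T>`.
pub fn peek(xs: &[T]) -> usize {
    xs.len() // the caller keeps ownership of the data
}

pub fn consume(xs: Vec<T>) -> usize {
    xs.len() // `xs` is dropped (freed) when this function returns
}
```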

So far, that's x2x2x2 = x8 variations on the first parameter. Something similar applies to the second parameter and the return value. That's a factor of 8 for each, so we have 8x8x8 = 512 variations.

When both inputs and the return contain references, we have a number of choices for the lifetime of the output:

first parameter
second parameter
both
static
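A sketch of how those four choices appear in Rust signatures, using `&str` in place of `&T` for brevity (the function names are hypothetical). The lifetime annotations are what tell the caller which input the result may borrow from:

```rust
// Output borrows from the first parameter only.
pub fn from_first<'a, 'b>(x: &'a str, _y: &'b str) -> &'a str {
    x
}

// Output borrows from the second parameter only.
pub fn from_second<'a, 'b>(_x: &'a str, y: &'b str) -> &'b str {
    y
}

// Output may borrow from either, so both inputs must outlive it.
pub fn from_either<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

// Output borrows from neither input: it lives for the whole program.
pub fn from_static(_x: &str, _y: &str) -> &'static str {
    "constant"
}

// With `from_first`, the second argument may be dropped while the result is in use.
pub fn borrow_outlives_second() -> String {
    let a = String::from("kept");
    let r;
    {
        let b = String::from("temporary");
        r = from_first(&a, &b);
    } // `b` is dropped here; `r` stays valid because it only borrows `a`.
    r.to_string()
}
```

Swapping `from_first` for `from_either` in `borrow_outlives_second` would be rejected by the compiler, which is exactly the distinction a C signature leaves undocumented.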

That's a factor of 4 which must be applied to the 1/8 of the 512 which have refs in all three positions. There are 512/8 = 64 of those, which turn into 64x4 = 256, for an additional 192. This leaves us with 512 + 192 = 704 variations.
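The count can be double-checked with a few lines of arithmetic (a throwaway sketch; the function name is made up):

```rust
/// Reproduces the counting argument above.
pub fn count_variations() -> u32 {
    // Three independent yes/no choices per position:
    // nullable?, single datum or sequence?, borrowed or owned?
    let per_position: u32 = 2 * 2 * 2; // 8
    // Two parameters plus the return value.
    let base = per_position.pow(3); // 512
    // "Borrowed" is one of two options in each of the three positions: 1/8 of the total.
    let all_borrowed = base / 8; // 64
    // Each of those splits into 4 lifetime choices instead of 1.
    let extra = all_borrowed * 4 - all_borrowed; // 192
    base + extra // 704
}
```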

These are just the obvious ones. I guess there are more.

What else could the C signature mean, that should be documented and must be kept in the C programmer's head, but which can be expressed in and verified by Rust's type system?

farnz · October 7, 2022, 9:56am

I can see one more set of meanings - because C does not check lifetimes, any lifetime parameters in either input parameter (if it's a reference, or if it's an owned pointer to a struct containing a borrow) can be either 'static (meaning that it can be stored in a global variable or a static within the function) or '_ (meaning that the borrow is no longer guaranteed to be valid after the function returns).
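A small sketch of the difference, assuming a hypothetical global `REGISTRY`: a function that stashes a borrow somewhere long-lived must demand `'static`, while an elided lifetime only promises the borrow for the duration of the call. (`Mutex::new` in a `static` needs Rust 1.63 or later.)

```rust
use std::sync::Mutex;

// Hypothetical global registry: anything stored here must live for the whole program.
static REGISTRY: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

// `'static`: the function is allowed to keep the borrow after it returns.
pub fn register(name: &'static str) {
    REGISTRY.lock().unwrap().push(name);
}

// Elided lifetime (`'_`): the borrow is only guaranteed during the call.
pub fn name_len(name: &str) -> usize {
    name.len()
}

pub fn registered_count() -> usize {
    REGISTRY.lock().unwrap().len()
}
```

Passing a borrow of a local `String` to `register` is a compile error, while `name_len` accepts it; in C both would just be `const char *`.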

kornel · October 7, 2022, 10:15am

Another axis is whether the data behind the pointer is initialized or not (&mut T vs &mut MaybeUninit<T>). In C it's common to use write-only pointers for output parameters, as a replacement for returning multiple values.
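A sketch of an output-parameter function in both styles. `div_mod` and the C prototype in its comment are hypothetical examples; `MaybeUninit::write` and `assume_init` are standard library calls. The tuple-returning version is what Rust code would usually write instead.

```rust
use std::mem::MaybeUninit;

// Hypothetical C equivalent: `void div_mod(int a, int b, int *quot, int *rem);`
// The callee must initialise both outputs before returning. Assumes `b != 0`.
pub fn div_mod(a: i32, b: i32, quot: &mut MaybeUninit<i32>, rem: &mut MaybeUninit<i32>) {
    quot.write(a / b);
    rem.write(a % b);
}

// Idiomatic Rust: return multiple values directly as a tuple.
pub fn div_mod_idiomatic(a: i32, b: i32) -> (i32, i32) {
    (a / b, a % b)
}
```

The caller of `div_mod` still needs `unsafe` to call `assume_init`, because the type system cannot see that both outputs were written; returning a tuple avoids that entirely.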

kornel · October 7, 2022, 10:18am

In practice I also often miss information about whether the function mutates its arguments or not. Some libraries simply don't bother to use const, or can't, due to weird edge cases like const char *const * actually letting the target const char be mutated.

It's also missing information about whether the function is thread-safe, which Rust expresses through the Send and Sync traits on T.
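A sketch of how both pieces of information show up in Rust signatures (function names are hypothetical): `&` versus `&mut` says whether the callee may mutate, and a `T: Sync` bound says the data may be read from several threads at once. It uses `std::thread::scope`, which needs Rust 1.63 or later.

```rust
use std::thread;

// `&[i32]`: the function promises not to mutate the data.
pub fn total(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

// `&mut [i32]`: the function may mutate, with exclusive access while it does.
pub fn double_all(xs: &mut [i32]) {
    for x in xs.iter_mut() {
        *x *= 2;
    }
}

// `T: Sync` allows two threads to read the same slice concurrently.
pub fn parallel_count<T: Sync>(data: &[T], pred: fn(&T) -> bool) -> usize {
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        // The spawned thread reads `left` while this thread reads `right`.
        let handle = s.spawn(|| left.iter().filter(|x| pred(*x)).count());
        let here = right.iter().filter(|x| pred(*x)).count();
        here + handle.join().unwrap()
    })
}
```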

Michael-F-Bryan · October 7, 2022, 10:24am

Wait... What?

Do you have an example of this? I always interpreted const char *const x as x: &[u8] in Rust (as opposed to const char* x, which is closer to mut x: &[u8]). Obviously, in Rust neither of these forms lets you modify the u8 being referred to.

kornel · October 7, 2022, 10:43am

That's because you can cast char* to const char*, and pointers can alias, so the target of a const char* can still be modified through a char* alias that points at the same data.

https://c-faq.com/ansi/constmismatch.html
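As a rough analogy: a Rust `&u8` may never change while it is borrowed, so to model a C `const char *` that might be aliased by a `char *`, the data has to opt into shared mutation, for example with `std::cell::Cell`. The function name below is hypothetical.

```rust
use std::cell::Cell;

// The function only reads through `readonly`, but the caller may pass the
// same cell for both arguments, like a `const char *` and a `char *` to one byte.
pub fn read_twice(readonly: &Cell<u8>, writer: &Cell<u8>) -> (u8, u8) {
    let before = readonly.get();
    writer.set(b'B'); // mutate through the other alias
    let after = readonly.get();
    (before, after)
}
```

Calling `read_twice(&c, &c)` is allowed, and the "read-only" view observes the write; with plain `&u8` and `&mut u8` the compiler would refuse the aliasing.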

H2CO3 · October 7, 2022, 11:26am

Yeah, yay for non-transitive immutability

Michael-F-Bryan · October 7, 2022, 11:28am

Ahh, I forgot that other languages still allow aliased mutation.

jacg · October 7, 2022, 12:34pm

The nullable return type axis is a bit longer than might appear at first sight: the function might throw an exception (in C++) or call longjmp(). But it's the same fundamental Rust mechanism that caters for all of these: Option/Result.
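A minimal sketch of the two cases, with hypothetical function names: a possibly-null return becomes `Option`, and a call that can fail (where C++ might throw or C might `longjmp()`) becomes `Result`.

```rust
use std::num::ParseIntError;

// "May return NULL" becomes an `Option` in the signature.
pub fn find_first_even(xs: &[i32]) -> Option<&i32> {
    xs.iter().find(|x| *x % 2 == 0)
}

// "May fail with an error" becomes a `Result` in the signature.
pub fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    s.trim().parse::<u16>()
}
```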

tczajka · October 7, 2022, 2:24pm

I still don't understand what you said here.

Yes, if you have a const char *const *, somebody else might be modifying the char. But the same is true if you have a const char*.

This doesn't apply in the case you mentioned. A const char *const * parameter can take a char ** argument.

Edit: Oh, gcc still produces a warning in C. Weird. It's allowed in C++ with no warning.

Michael-F-Bryan · October 7, 2022, 3:37pm

Yeah, I think that's what the page was talking about when it said

C++ has more complicated rules for assigning const-qualified pointers which let you make more kinds of assignments without incurring warnings, but still protect against inadvertent attempts to modify const values.

SkiFire13 · October 7, 2022, 3:48pm

Another meaning could be when the first and second parameter are supposed to point into the same range (which is different than having the same lifetime!), i.e. when they are the begin and end iterators.
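A sketch contrasting the two forms, assuming `i32` elements. `sum_range` takes a C++-style begin/end pair and has to be `unsafe`, because nothing in its signature ties the two pointers to the same range; `sum_slice` takes a slice, which carries that guarantee by construction. `<[T]>::as_ptr_range` is a real standard library method that produces such a pointer pair from a slice; the two function names are hypothetical.

```rust
// C++-style: two raw pointers that must point into the same allocation,
// with `begin <= end`. Nothing in the types enforces that relationship.
pub unsafe fn sum_range(begin: *const i32, end: *const i32) -> i32 {
    // SAFETY (caller's obligation): `begin..end` lies within one live allocation.
    unsafe {
        let len = end.offset_from(begin) as usize;
        std::slice::from_raw_parts(begin, len).iter().sum()
    }
}

// Rust-style: a slice stores the start and length together.
pub fn sum_slice(xs: &[i32]) -> i32 {
    xs.iter().sum()
}
```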

jacg · October 8, 2022, 3:56pm

Ah yes.

How would you express this sort of idea in Rust? Slice + indices?

Michael-F-Bryan · October 8, 2022, 4:28pm

It depends on what you mean by "iterator".

The C++ definition is a pointer-like object that can be used in a for (auto i = start; i != end; ++i) loop, which is logically equivalent to accepting impl Iterator<Item=T>.

However, if it's meant as a sub-section of some contiguous chunk of memory (i.e. an array or std::vector), then Rust would use &[T].
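A sketch of the two readings (function names hypothetical). The first uses `impl IntoIterator<Item = i32>`, a slightly more permissive bound than `impl Iterator`, so that ranges, collections and iterators can all be passed; the second takes a sub-slice, which replaces the list-plus-indices idiom.

```rust
// Any sequence of items, consumed one at a time.
pub fn count_matching(items: impl IntoIterator<Item = i32>, target: i32) -> usize {
    items.into_iter().filter(|&x| x == target).count()
}

// A contiguous sub-range of memory (a section of an array or Vec).
pub fn max_in(window: &[i32]) -> Option<i32> {
    window.iter().copied().max()
}
```

A caller passes `&v[1..4]` to `max_in` rather than `v`, `1` and `4` separately.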

You see other languages (C#, JavaScript, Python, etc.) accepting the whole list and start/end indices, but that's mainly because they have no way of expressing slices without creating a new list. Passing around list+indices is almost never necessary in Rust because we have more precise control over memory access/layout and slices are a first-class citizen.

jacg · October 9, 2022, 11:24am

Indeed.

Given that this tangent was started by the observation that two C++ pointers might be interpreted as two iterators defining a range, I guess we have to recognize that, pre-C++20, C++ has 5 different categories of iterators (input, output, forward, bidirectional, random-access) arranged in a not-entirely-trivial hierarchy.

That only covers up to forward, input and output iterators in the C++ hierarchy.

I guess that the complexity of C++ iterators makes this a can of worms that is probably not too interesting to explore in this context.

Michael-F-Bryan · October 9, 2022, 11:34am

Yeah, in general, Rust started with the approach that an "iterator" is something which can yield a next item (similar to Python and most other languages), whereas C++ started with iterators being a generalisation of a pointer into some range of objects. Once you realise that, the various types of iterator in C++ start to make a lot more sense.
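To illustrate the "yield the next item" model, here is a minimal hypothetical iterator: it holds its own state and implements the single required method, `next`, with no separate end position to compare against.

```rust
// Counts down from `from` to 1, then signals exhaustion with `None`.
pub struct Countdown {
    remaining: u32,
}

impl Countdown {
    pub fn new(from: u32) -> Self {
        Countdown { remaining: from }
    }
}

impl Iterator for Countdown {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            None
        } else {
            self.remaining -= 1;
            Some(self.remaining + 1)
        }
    }
}
```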
