Why was ‘s: string’ used for type definition instead of C’s ‘string s’?


#1

Why was ‘s: string’ used for type definition instead of C’s ‘string s’? I notice it’s also used in Pascal, so it’s not really new and probably goes back decades. I also notice that many other languages use it, e.g. Swift, Kotlin, TypeScript, even Python, so it seems to be a trend. What advantages does it have from the user’s perspective and from the compiler developer’s perspective compared to the C way?


#2

Disclaimer: I am not a compiler writer nor a Rust designer, and this is just my opinion.

A first issue with the C declaration syntax is that it does not cleanly separate the name of a variable from its type the way Pascal-style colons do. This makes life harder for whoever writes the parser, but that’s not the worst problem with it. The worst problem is how the C language designers purposely exploited that confusion when they introduced pointers, arrays, and const qualifiers into the language.

T a, b; // Two variables of type T
T* c, d; // One pointer to T, one variable of type T (confusing)
T e[50], f[50]; // Two arrays of 50 objects of type T (redundant)
const T * const g, * const h; // Seriously, this is getting ridiculous
T *(* const i)(U *, const V *); // Please, stop, you're hurting yourself

The syntax for declaring pointers, arrays and constness in C is an atrocious mess. And as soon as you try to fix it, you’re not compatible with C anymore, and won’t manage to make C developers happy. Worse yet, if your syntax is similar enough, it will trigger their muscle memory and they will suffer more from learning your language. With this in mind, going for something completely different is a good idea.
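For comparison, here is a rough sketch of how the same declarations could be spelled in Rust. T, U and V are hypothetical stand-in types, Callback and null_callback are names made up for illustration, and raw pointers are used only to mirror the C snippet (references would be more idiomatic).

```rust
// Hypothetical stand-in types, named after T, U and V in the C snippet.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct T(pub i32);
pub struct U;
pub struct V;

// C: T *(* const i)(U *, const V *);
// In Rust the whole type reads left to right and can be given a name.
pub type Callback = fn(*mut U, *const V) -> *mut T;

// A trivial function with a matching signature, for illustration only.
pub fn null_callback(_u: *mut U, _v: *const V) -> *mut T {
    std::ptr::null_mut()
}

pub fn declarations() -> (usize, i32, bool) {
    // C: T a, b;
    let a: T = T(1);
    let b: T = T(2);

    // C: T* c, d;  (c is a pointer, d is a plain T)
    // `*const T` keeps the sketch simple; `*mut T` is the mutable counterpart.
    let c: *const T = &a;
    let d: T = b;

    // C: T e[50], f[50];
    let e: [T; 50] = [a; 50];
    let f: [T; 50] = [d; 50];

    // C: const T * const g, * const h;
    // Bindings are immutable unless marked `mut`, so no second `const` is needed.
    let g: *const T = &e[0];
    let h: *const T = &f[0];

    // C: T *(* const i)(U *, const V *);
    let i: Callback = null_callback;
    let p: *mut T = i(std::ptr::null_mut(), std::ptr::null());

    // Dereferencing raw pointers is unsafe; all three point at live values here.
    let sum = unsafe { (*c).0 + (*g).0 + (*h).0 };
    (e.len() + f.len(), sum, p.is_null())
}
```

Each binding carries its complete type after the colon, so there is no equivalent of the "T* c, d" trap where the pointer marker attaches to only one of the names.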

Now, you might argue that this “something completely different” might as well still have the type in front. But there is an intrinsic advantage to putting the type in the back: it allows for a terser type inference syntax. If your grammar for variable declarations looks like this…

type SEP identifier [= initial_value];

…then the first thing which the compiler will look for when parsing a variable declaration is a type, which means that you cannot just elide the type, you must put a placeholder in its place, like C++ does with “auto”. Whereas if you put the type after, you can easily make it entirely optional:

identifier [SEP type] [= initial_value];

…and then all of the following are valid according to the grammar:

a = 5;
a : MyInt = 5;
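To make the parsing argument concrete, here is a toy sketch of a parser for the second grammar, with ":" as SEP. Decl and parse_decl are hypothetical names, and the string splitting stands in for a real tokenizer. The point is only that the parser always reads the identifier first, and the type is simply present or absent.

```rust
#[derive(Debug, PartialEq)]
pub struct Decl {
    pub name: String,
    pub ty: Option<String>,   // None when the type is left to inference
    pub init: Option<String>, // None when there is no initial value
}

// Toy parser for `identifier [: type] [= initial_value];`.
// Returns None for input that does not fit the grammar.
pub fn parse_decl(src: &str) -> Option<Decl> {
    let body = src.trim().strip_suffix(';')?;

    // Split off the optional initializer.
    let (head, init) = match body.split_once('=') {
        Some((h, v)) => (h, Some(v.trim().to_string())),
        None => (body, None),
    };

    // Whatever remains starts with the identifier; the type is optional.
    let (name, ty) = match head.split_once(':') {
        Some((n, t)) => (n.trim(), Some(t.trim().to_string())),
        None => (head.trim(), None),
    };

    // Reject empty pieces and names that are not plain identifiers.
    let valid_name = !name.is_empty()
        && !name.starts_with(|c: char| c.is_ascii_digit())
        && name.chars().all(|c| c.is_alphanumeric() || c == '_');
    let empty_part = |p: &Option<String>| matches!(p, Some(s) if s.is_empty());
    if !valid_name || empty_part(&ty) || empty_part(&init) {
        return None;
    }

    Some(Decl { name: name.to_string(), ty, init })
}
```

With this sketch, parse_decl("a = 5;") yields a Decl whose ty is None, while parse_decl("a : MyInt = 5;") yields the same name and value with ty set to "MyInt". No placeholder keyword is needed in the untyped case.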

#3

C’s syntax has ambiguity:

a * b

could mean either a multiplication or the declaration of a pointer variable b of type a. It depends on what typedefs the parser has seen so far, creating a bit of a chicken-and-egg situation: you need the results of parsing in order to parse properly.

Also, complex C types require parentheses. In Rust, a type like “array of functions returning 2d array” is straightforward. In C I wouldn’t know where to even start.
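As a small sketch of that last example, here is an array of functions that each return a 2D array. The element type, the dimensions and the names Grid, zeros, identity and MAKERS are all made up for illustration.

```rust
// Hypothetical element type and dimensions, chosen only for illustration.
pub type Grid = [[i32; 2]; 2];

pub fn zeros() -> Grid {
    [[0; 2]; 2]
}

pub fn identity() -> Grid {
    [[1, 0], [0, 1]]
}

// "Array of 2 functions, each returning a 2x2 array", read left to right.
// A C function cannot return an array directly, so the closest C declaration
// returns a pointer to one: int (*(*makers[2])(void))[2][2];
pub const MAKERS: [fn() -> [[i32; 2]; 2]; 2] = [zeros, identity];
```

Calling MAKERS[1]() returns the identity grid. The type is written out in full here, without the Grid alias, to show that it nests without any extra parentheses.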


#4

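C has an even bigger problem in that the syntax of declarations and the syntax of types are different. What’s the type of char *argv[]? It’s char *[], obviously: you have to pluck the name out of the middle of the declaration (and as a function parameter it is adjusted to char **). Good luck papering over that in a preprocessor macro.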
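By contrast, a Rust type is spelled the same way wherever it appears. A minimal sketch, where Args and count_args are hypothetical names:

```rust
// Hypothetical alias: the text after `=` is exactly what follows the colon
// in a declaration of the same type.
pub type Args<'a> = Vec<&'a str>;

pub fn count_args(argv: Vec<&str>) -> usize {
    // The same type, written out in full and through the alias.
    let direct: Vec<&str> = argv;
    let aliased: Args = direct;
    aliased.len()
}
```

For example, count_args(vec!["prog", "--flag"]) returns 2. Nothing has to be cut out of the declaration to obtain the type.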

Of course, this alone doesn’t prevent us from having reasonably separated types like Vec<&str>, it’s just that @HadrienG got all the good points. :)


#5

I’ve always thought it meshes better with my mental model of type inference as well:

Rust

let x: i64 = 4;
let y = add_nums();

In both cases the type information flows right to left, just like the value.

C++11

In C++11, x gets its type information from the “left” while y gets its type information from the “right” (from add_nums).

int64_t x = 4;
auto y = add_nums();