Why are strings and vectors treated differently (syntactically)?

I am quite new to Rust and have been trying to learn it, so sorry in case the question is trivial.

Strings and vectors are similar concepts in Rust. The "Rust Book" states: "Vectors are to slices what String is to &str". So why is the syntax so different?

While I agree that literals should be syntactically different (e.g. "1234" vs [1, 2, 3, 4]), I wonder why the other syntax should be so different when dealing with them.

To explain what I'm referring to, let's consider the following examples.

Example 1

let a = "1234";
let b = [1, 2, 3, 4];

This is ok: the syntax of the two statements is similar apart from the different literals.

Example 2

let a: &str = "1234";
let b: [i32; 4] = [1, 2, 3, 4];

This already starts to puzzle me. Why do I need the reference symbol & for the string, and why do I instead need to specify the length for the vector? If the two concepts are similar (one is an array of chars, the other an array of integers), why is the notation so different? I would have thought it clearer to use, for instance, str and vec, and to write:

let a: str(4) = "1234";
let b: vec[i32; 4] = [1, 2, 3, 4];

or:

let a: str = "1234";
let b: vec<i32> = [1, 2, 3, 4];

(in this case the length would have been inferred through initialization).

Example 3

let a = "1234".to_string();
let b = vec![1, 2, 3, 4];

Same question here. Couldn't we use str!("1234") and vec![1, 2, 3, 4], or otherwise "1234".to_string() and [1, 2, 3, 4].to_vector(), or otherwise String("1234") and Vector[1, 2, 3, 4]? (I would have liked the latter one the most).

Example 4

let a: String = "1234".to_string();
let b: Vec<i32> = vec![1, 2, 3, 4];

The two statements are even more different now... Couldn't we use either String and Vector<i32>, or Str and Vec<i32>, etc.?

Thanks

I think the main problem with merging these two concepts is that Strings and &strs do not represent characters in an array the way you would expect. The length of a string in memory isn't always equal to the number of characters in it, because strings are stored as UTF-8, where non-ASCII characters (things like סֶ) occupy more than one byte while ASCII characters (like a) occupy exactly one. To make sure characters are accessed safely, the string types need to be wrapped in their own types (String and &str). If they were treated just like arrays, it would be possible to create corrupt UTF-8 strings which wouldn't be valid to display or use.

Internally, a String is really just a Vec<u8>, but in order to preserve UTF-8 validity, it needs its own data type which validates it. &str, similarly, is really just a wrapper over &[u8], but is provided to assure anyone using it that it is a valid UTF-8 string.

All in all, strings really are the same as vectors on the inside, but they need to have their own distinct types in order to keep things from corrupting or invalidating the UTF-8 structure.
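
As a minimal sketch of the point above (the helper name byte_and_char_len is made up for illustration), this shows that byte length and character count can differ, and that String::from_utf8 refuses bytes that are not valid UTF-8:

```rust
// Hypothetical helper: byte length versus character count of a UTF-8 string.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // ASCII text: one byte per character.
    println!("{:?}", byte_and_char_len("abc"));

    // U+00E9 (e with acute accent) takes two bytes in UTF-8.
    println!("{:?}", byte_and_char_len("\u{e9}"));

    // A String can be built from raw bytes only if they are valid UTF-8.
    let valid = String::from_utf8(vec![0x61, 0xC3, 0xA9]);
    println!("{:?}", valid);

    // A lone 0xC3 is a truncated two-byte sequence, so this is rejected.
    let invalid = String::from_utf8(vec![0xC3]);
    println!("{}", invalid.is_err());

    // Going the other way is always allowed: a String can hand over its bytes.
    let bytes: Vec<u8> = String::from("a\u{e9}").into_bytes();
    println!("{:?}", bytes);
}
```

A plain Vec<u8> would accept the lone 0xC3 byte without complaint, which is the kind of corruption the separate string types are there to rule out.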


Side note:

One other thing you mentioned is that you need to declare a length in order to have an array, like [i32; 4]. This isn't always necessary; you only really need it if you want an owned array that is fixed to a specific size.

If you want the slice equivalent of an &str, use something like this:

let a: &str = "1234";
let b: &[i32] = &[1, 2, 3, 4];

This way the array of [1,2,3,4] is stored in the same way that &strs are. The [i32; 4] syntax is really just there for specific cases where you need to confirm that the array is that specific length, which isn't really something you can do with strings. The above allows the [i32] to be treated exactly like a string slice would be.
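
To make the parallel concrete, here is a small sketch (the function names sum and count_digits are invented for this example) in which one function takes a &[i32] and another takes a &str, and each accepts borrowed data regardless of what owns it:

```rust
// Accepts any borrowed run of i32 values, whatever owns them.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

// Accepts any borrowed UTF-8 text, whatever owns it.
fn count_digits(text: &str) -> usize {
    text.chars().filter(|c| c.is_ascii_digit()).count()
}

fn main() {
    let array: [i32; 4] = [1, 2, 3, 4];
    let vector: Vec<i32> = vec![1, 2, 3, 4];
    let owned: String = String::from("1234");

    println!("{}", sum(&array)); // &[i32; 4] coerces to &[i32]
    println!("{}", sum(&vector)); // &Vec<i32> coerces to &[i32]
    println!("{}", sum(&vector[1..3])); // a sub-slice of the vector
    println!("{}", count_digits("1234")); // a literal is already a &str
    println!("{}", count_digits(&owned)); // &String coerces to &str
}
```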

Hi, thank you very much for the good explanation, I guess I need some more time to digest Rust...

For now I have a question about the second half of your reply. The following two statements seem to produce the same result:

let a: &str = "1234";
let a: &str = &"1234";

Instead, I cannot write:

let b: &[i32] = [1, 2, 3, 4];

Can you please explain why this is allowed for strings and not for arrays?

Thanks

The reason &"something" works is that "something" (without &) is already an &str. So you end up with an &&str, which Rust will turn into an &str automatically (this is deref coercion). On the other hand, [...] is just an [i32; 4] by default, so you need to reference it.

Someone else could probably speak more in-depth to why string literals are always stored as &str, but I think it mostly has to do with the way they are embedded in the program binary. Because the majority of strings just want to exist, and not be modified, I guess it would be impractical to create a new mutable string each time the code runs.

Meanwhile, arrays, and all other data types, are created and owned each time the code is run. This means you can mutate it if you store it as &mut [i32], or as [i32; 4], but also that you need to explicitly create a reference with & if you want an &[i32]. Owning things by default is more flexible, but less efficient (I guess) if you never mutate it.

To recap, mostly everything you can create is owned by default (no & to start with), except for strings, which are already references when you create them using string literals ("1234").
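
The recap can be sketched roughly like this (double_in_place is a hypothetical helper, only there to show that an owned array can be mutated):

```rust
// Hypothetical helper: doubles every element through a mutable slice.
fn double_in_place(values: &mut [i32]) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}

fn main() {
    // A string literal is already a reference, so no `&` is needed.
    let a: &str = "1234";
    // `&"1234"` is a `&&str`; deref coercion turns it back into a `&str`.
    let a2: &str = &"1234";
    assert_eq!(a, a2);

    // An array literal is an owned value, so it has to be borrowed explicitly.
    let b: &[i32] = &[1, 2, 3, 4];
    println!("{:?}", b);

    // Because the array is owned, it can be declared mutable and changed.
    let mut c: [i32; 4] = [1, 2, 3, 4];
    double_in_place(&mut c);
    println!("{:?}", c);
}
```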

This is basically it, as far as I know. They are stored in a read-only part of the memory, and what you get is a reference to this spot, with the type &'static str. This makes it possible to reuse the same reference for multiple strings with the same content, and you don't have to make space on the stack or the heap for a possibly huge chunk of text if you just want to show some text. There may be more to this, but that's all I know.
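
One practical consequence of the &'static str type, as a rough sketch (greeting and greeting_for are names made up for this example): a function can hand out a literal by reference without owning anything, whereas text assembled at run time has to be returned as an owned String.

```rust
// The literal lives in the program's static data, so the returned
// reference stays valid after the function returns.
fn greeting() -> &'static str {
    "hello"
}

// Text built at run time has no static storage to point into,
// so it is returned as an owned String instead.
fn greeting_for(name: &str) -> String {
    format!("hello, {}", name)
}

fn main() {
    let s: &'static str = greeting();
    println!("{}", s);
    println!("{}", greeting_for("Rust"));
}
```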

I once imagined changing the string literal syntax to &"hello", rather than just "hello", to be consistent with other parts of the language. But it will also make the language slightly more verbose, which may not be worth the change. Your mileage may vary, though.

Well, that's also a bit confusing, because "hello" isn't just a &"hello"; it's actually a &'static "hello".

Thank you for the explanations, now I understand the real meaning of "hello" in Rust. However, I still believe the notation could be made more consistent between strings and arrays. In the end, strings and arrays are still similar concepts, with two notable differences being that strings are always of type u8 and of undefined length.

There can be several possibilities, the one I propose here is to use notation vec<type>[size] (or vec<type; size>) for arrays, to modify Vec into Vector, and to add an str! macro. I am going to explain this with reference to the same examples above.

Example 1

let a = "1234";
let b = [1, 2, 3, 4];

No change.

Example 2

let a: &str = "1234";
let b: &vec = &[1, 2, 3, 4]; // or: &vec<_> = &[1, 2, 3, 4];
let c: &vec<i32> = &[1, 2, 3, 4];
let d: vec = [1, 2, 3, 4]; // or: vec<_> = [1, 2, 3, 4];
let e: vec[4] = [1, 2, 3, 4]; // or: vec<_>[4] = [1, 2, 3, 4];
let f: vec<i32> = [1, 2, 3, 4];
let g: vec<i32>[4] = [1, 2, 3, 4];
let h: vec<i32>[2][2] = [[1, 2], [3, 4]];

This looks more consistent to me. Notice that arrays b, d, e have their element type inferred through initialization; if this is not needed, they could be removed. Array f has its size inferred through initialization, which would be convenient when just the element type needs to be specified.

Example 3

let a = str!["12", "34"];
let b = vec![1, 2, 3, 4];

Example 4

let a: String = str!["12", "34"];
let b: Vector = vec![1, 2, 3, 4]; // or: Vector<_> = vec![1, 2, 3, 4];
let d: Vector<i32> = vec![1, 2, 3, 4];

The main differences here are the modification of Vec into Vector and the addition of the str! macro.

These are the "correct" (as in "should compile in current Rust") versions of your 2nd example:

let a: &'static str = "1234";
let b: &[_] = &[1, 2, 3, 4];
let c: &[i32] = &[1, 2, 3, 4];
let d: Vec<_> = vec![1, 2, 3, 4];
let e: [_; 4] = [1, 2, 3, 4];
let f: Vec<i32> = vec![1, 2, 3, 4];
let g: [i32; 4] = [1, 2, 3, 4];
let h: [[i32; 2]; 2] = [[1, 2], [3, 4]];

The relationship:

String :: &str :: N/A :: N/A (but similar to &'static str)
Vec<i32> :: &[i32] :: &[i32; 4] :: [i32; 4]

Note that "abcd" is &'static str but [1, 2, 3, 4] is [i32; 4].
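
Reading each row of the relationship from left to right and back, a small sketch (the three function names are invented for illustration) could look like this:

```rust
// String -> &str (borrow) -> String (copy).
fn text_round_trip(owned: String) -> String {
    let view: &str = &owned;
    view.to_owned()
}

// Vec<i32> -> &[i32] (borrow) -> Vec<i32> (copy).
fn numbers_round_trip(owned: Vec<i32>) -> Vec<i32> {
    let view: &[i32] = &owned;
    view.to_vec()
}

// [i32; 4] -> &[i32; 4] -> &[i32] -> Vec<i32>.
fn array_to_vec(array: [i32; 4]) -> Vec<i32> {
    let fixed: &[i32; 4] = &array;
    let view: &[i32] = fixed; // the length moves from the type into the slice
    view.to_vec()
}

fn main() {
    println!("{}", text_round_trip(String::from("1234")));
    println!("{:?}", numbers_round_trip(vec![1, 2, 3, 4]));
    println!("{:?}", array_to_vec([1, 2, 3, 4]));
}
```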

Also note that String and Vec are slightly slower than the alternatives, since they keep their contents in a heap allocation.

We do not have fixed size strings because we rarely need to "know" the string length. We do not have easy syntax for String literals because we don't need them a lot in practice.
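
For what it's worth, and assuming a reasonably recent stable compiler, there are already several near-symmetric ways to spell "make an owned String" and "make an owned Vec". The sketch below builds each one and checks that they agree (all_equal is a hypothetical helper, not a standard library function):

```rust
// Hypothetical helper: true when every element equals its neighbour.
fn all_equal<T: PartialEq>(items: &[T]) -> bool {
    items.windows(2).all(|pair| pair[0] == pair[1])
}

fn main() {
    // Several spellings that all produce an owned String.
    let strings: [String; 4] = [
        String::from("1234"),
        "1234".to_string(),
        "1234".to_owned(),
        "1234".into(),
    ];
    println!("{}", all_equal(&strings));

    // Several spellings that all produce an owned Vec<i32>.
    let vectors: [Vec<i32>; 4] = [
        vec![1, 2, 3, 4],
        Vec::from([1, 2, 3, 4]),
        [1, 2, 3, 4].to_vec(),
        (1..=4).collect(),
    ];
    println!("{}", all_equal(&vectors));
}
```

So String::from("1234") and Vec::from([1, 2, 3, 4]), or "1234".to_owned() and [1, 2, 3, 4].to_vec(), come fairly close to the paired notation asked for earlier in the thread.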

I think some of these would be sound changes to make, but it is a bit late to do so now. I think the original decision to do things the way they are now was really just to give strings a first-class position as a primitive, rather than just another type. Vecs, on the other hand, aren't given any such special treatment. They are just another struct provided in the standard library.

If this discussion had happened around a year ago, these probably would have been excellent changes to make. However, as Rust is now in beta status and fast approaching 1.0, there isn't really any room any more for huge breaking changes like this. This has already been decided, for better or for worse.

Thank you for your answer. Reading your message and the others, I feel I am not the only one who thinks the syntax for strings and arrays could be improved. I am sorry if this discussion comes late, but I started studying Rust only recently (actually, the last time I checked Rust out there was no good documentation available yet).

I have then opened a few threads on this forum regarding aspects of Rust syntax that I think could be improved before final release of v1.0. In particular, these are the three main points that I noticed while studying Rust:

Syntax for strings and arrays:
(this thread)

Dereference operator notation:

Iterators notation:

When I opened these threads Rust was still in alpha and I thought syntactic changes were still possible (and I believe, for instance, that the notation [T; N] was introduced not long ago). A few days later Rust turned into beta, and all replies afterwards highlighted that now that Rust is in beta no more changes are allowed...

Ok, I can accept that. I still like Rust and I think it is a very promising language, although with a few rough edges. However, I think it is a pity to disregard possible improvements of the syntax just because we are in beta now. Actually, in my opinion, now that Rust is in beta and is going to be studied and used by many more programmers outside the core developer team (also because documentation is available), many useful suggestions from fresh eyes looking at the language for the first time may come to smooth out the last few rough edges. It will be a pity if those suggestions (I am not necessarily talking about mine) are disregarded because it is too late. The reason for being late is probably that not many developers outside the core developer team studied and used Rust while it was still experimental or in alpha and without good documentation.

In my opinion, now would actually be the right time to listen to suggestions and evaluate possible changes before the final v1.0 release. If it were me, I would purposely plan to leave Rust in beta, leaving some time for new developers to grasp the language, and I would ask for feedback and suggestions. After the release of v1.0 it will really be too late, so we will all be forever stuck with whatever syntax Rust v1.0 comes with. I am ok if suggestions are disregarded because the present solution is better, but it would be a pity to disregard them just because of time. It is obviously extremely important to get Rust right at v1.0.

In my opinion, syntax changes could be managed through a smooth transition period, where both the old and the new syntax are supported and the old one is marked as deprecated. I think everyone should agree that, for instance, the syntax proposed above is compatible with the present syntax, and they could coexist during the transition period. No code would break, and a few weeks' time would allow programmers to update their code before v1.0 is released.

Please notice that the aim of my posts is not to criticize but to contribute. As I wrote, I think Rust is a very promising language, but making it even better would benefit everyone. I think clear and consistent syntax is a very important aspect for a new language in order to attract as many programmers as possible. We all deal with awkward languages when needed, but prefer to use languages with beautiful syntax whenever possible. This is particularly important in the open source world and for "spare time" community projects (this, in my opinion, is one of the reasons behind Python's success).

To conclude, I think the point is not to evaluate whether it is too late or not for this or other suggestions (at least until the official v1.0) but to decide which solution is the best possible, as once finalized we will all be stuck with it for the next 20 years and more (long live Rust!).

Thanks everybody for the great work!

Indeed. And I'm sure that feedback will be incorporated into Rust as it evolves. 1.0 is the beginning, not the end. But at some point, a programming language needs to commit to stability if you have any hope of achieving widespread adoption.

Rust has been in development for a long time. It's time for 1.0.

This is false. Tons and tons of people outside the core team have been using Rust. Certainly, fewer people have been using it than the number that will be using it once it gets to 1.0, but suggesting that the user base is as small as a handful of people developing the language in isolation is a gross misrepresentation.

Rust's core team and its surrounding community have been doing this for years.

The bottom line is that the time for breaking changes is over. This is a commitment that the Rust core team has made. Rust will ship with mistakes---all software does.

I would strongly recommend that you read exactly what the beta period is for, in "Announcing Rust 1.0 Beta" on the Rust Blog. Beta isn't for experimentation, it's a staging area for stable release.
