Just first-timer's feedback

(Typo fix fixed. Arggh.)

"Help" isn't really the right category for this, but it seems closest...

A few observations from someone who's worked in a fair range of languages but is learning Rust for the first time. Note that if this is the worst I've found to gripe about it means I'm pretty impressed with the language so far; this is nitpicking and flyspecking, and since it's language syntax I don't expect it to change -- it's just spots where a few more words in the intro book might be helpful.

The "match field and create binding" syntax, with the @ modifier after the variable name being bound, doesn't read very clearly to novice human eyes -- especially when there's a space between them so they aren't immediately visually associated. I think I understand the language developers' rationale, but as a newcomer to the language it was a point where my eyes crossed for a moment, and it may deserve a few more words. I would have expected @name by analogy with the & and * syntax, or at least a convention of formatting the code as name@ rather than name @.

The other eye-crosser for me is the ..= closed-range syntax. Again, it makes perfect sense to a language developer but it looks a bit weird to a newbie. It took a moment to realize that this was a three-character token rather than .. followed by =. For folks who have only seen = as assignment and comparison, having it be a modifier on .. may initially be hard to visually parse. I'm not sure how to improve documentation for that one; it's just something we have to get used to seeing.
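For readers who have not met either construct yet, here is a minimal sketch that puts both in one place. The function name `classify` and its labels are invented for illustration and do not come from the book.

```rust
/// Classifies an age; `classify` and its labels are invented for this sketch.
fn classify(age: u32) -> String {
    match age {
        // `n @ 13..=19` reads as: bind the value to `n` if it matches `13..=19`.
        // `..=` is one token meaning "up to and including the end".
        n @ 13..=19 => format!("teen ({n})"),
        // Without `@`, the range is tested but no new name is bound.
        0..=12 => String::from("child"),
        _ => String::from("adult"),
    }
}
```

Reading `@` as "bound to the value that matches" may make the `name @ pattern` order feel less arbitrary.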

As I say, much too late to change the language now, but I thought it was worth documenting that these small details were surprising stumbling points, so future editions of the book can think about whether there's a way to ease the user into them a bit more smoothly.

I guess Rust had to leave something for the next language to improve...!

(And I had to leave something to correct? Can I claim it's proof that I'm not an AI?)

Yeah, that's a wonky bit of syntax that you don't really appreciate until you see the alternatives.

For a long time, you could use a..b for the normal half-open range, [a, b), and a...b for the inclusive range, [a, b]. However, it's pretty easy to mistake .. for ... (or vice versa) when you are skimming code, so after a considerable amount of bikeshedding we ended up with the less-bad ..= syntax and deprecated ... (which was available on nightly at the time). Also, note that ranges use dots, not colons (i.e. ..= instead of ::=).
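As a quick illustration of the two range forms that survived, here is a small sketch. The helper names `half_open` and `inclusive` are made up for this example.

```rust
/// Collects the half-open range `a..b`, i.e. [a, b).
fn half_open(a: i32, b: i32) -> Vec<i32> {
    (a..b).collect()
}

/// Collects the inclusive range `a..=b`, i.e. [a, b].
fn inclusive(a: i32, b: i32) -> Vec<i32> {
    (a..=b).collect()
}
```

With these, `half_open(1, 4)` stops before 4 while `inclusive(1, 4)` includes it.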

Tbh, I don't mind that the @ binding syntax is so unfamiliar because you almost never see it used in real code. The vast majority of the time, you won't be matching on something's internal structure while binding to the larger object, which is where @ is most useful.

The if-let chain syntax also feels a lot more intuitive to me. That means I would prefer to write something like this:

if let Some(person) = maybe_a_person
    && let Person { address: Some(addr), .. } = person
{
    println!("Address is {addr}");
    do_something_with(person);
}

Instead of something like this:

if let Some(person @ Person { address: Some(addr), .. }) = maybe_a_person {
    println!("Address is {addr}");
    do_something_with(person);
}

Even if it costs me an extra line.
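For anyone who wants to try the @ form, here is a self-contained sketch. The `Person` fields and the `describe` function are assumptions made for the example, since the snippets above do not define them. It matches through a reference so that both `person` and `addr` are borrows; binding both by value would only work if the inner data were `Copy`.

```rust
/// Hypothetical type; the field names are assumptions for this sketch.
struct Person {
    name: String,
    address: Option<String>,
}

/// Returns a description only when a person with an address is present.
fn describe(maybe_a_person: &Option<Person>) -> Option<String> {
    // Matching on a reference: `person` is `&Person`, `addr` is `&String`.
    if let Some(person @ Person { address: Some(addr), .. }) = maybe_a_person {
        return Some(format!("{} lives at {}", person.name, addr));
    }
    None
}
```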

After a lot of head scratching, I concluded that you meant ..= when you wrote ::=. That is to say this. As an anecdotal counterpoint let me say that I found this syntax extremely obvious from the very first time I saw it. (It took me at least an order of magnitude less time to grok the syntax completely, than it did to decipher your ::= typo!). What I found far more confusing and unexpected was that all these bits of related syntax

  1. start..end
  2. start..
  3. ..end
  4. ..
  5. start..=end
  6. ..=end

evaluate to different types. I had a very strong implicit expectation that they would be different variants of a single enum type. (And was slightly disappointed that only 6 of the 9 possible combinations of the variants of Bound are covered by the syntaxes and their corresponding types.)
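The six types can be seen by writing the annotations out, and the `RangeBounds` trait is what lets one function accept all of them. This is only a sketch; `six_syntaxes` and `bounds_of` are names invented here.

```rust
use std::ops::{
    Bound, Range, RangeBounds, RangeFrom, RangeFull, RangeInclusive, RangeTo, RangeToInclusive,
};

/// Each range syntax produces a value of its own struct type.
fn six_syntaxes() {
    let _r1: Range<i32> = 1..5;
    let _r2: RangeFrom<i32> = 1..;
    let _r3: RangeTo<i32> = ..5;
    let _r4: RangeFull = ..;
    let _r5: RangeInclusive<i32> = 1..=5;
    let _r6: RangeToInclusive<i32> = ..=5;
}

/// Reads the bounds of any range type through the shared `RangeBounds` trait.
fn bounds_of<R: RangeBounds<i32>>(range: R) -> (Bound<i32>, Bound<i32>) {
    (range.start_bound().cloned(), range.end_bound().cloned())
}
```

Seen through `bounds_of`, every range does reduce to a pair of `Bound` values, which is close to the single-enum picture described above.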

I don't mind syntax being unfamiliar in general. Admittedly, if all the syntax in a new language is completely unfamiliar, far fewer people will ever get past the initial deluge of WTFs and continue exploring the language, thus significantly lowering the chances that the language will ever accumulate enough users to surpass a critical mass needed for the language to thrive. Much of Rust's syntax is very familiar to the majority of programmers coming from other widely-used languages, and it has by now gained enough followers for little syntactic surprises to be irrelevant to its uptake and success.

On a philosophical tangent, experience of teaching programming for many years has taught me that all too many people struggle with decoupling syntax (or notation, in general, if we move away from programming) from the concepts that it represents. One example:

  1. fn(a, b)
  2. a.fn(b)
  3. (fn a b)
  4. fn a b
  5. a b fn

There are many people who thoroughly understand that case 1 above applies function fn to arguments a and b, but somehow struggle when the same concept is expressed in the forms 2-5 (and that's even before we bring dispatch into the picture). It turns out that (for some) the intellectual understanding that two notations are isomorphic is overwhelmed by some strong visceral distinction in perception between the notations that obscures the link to the concept that they represent.
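Rust itself offers notations 1 and 2 for the same call, which makes a handy demonstration. The `Meters` type and its `add` method are invented for this sketch.

```rust
/// Hypothetical newtype used only to show two call notations.
struct Meters(f64);

impl Meters {
    fn add(&self, other: &Meters) -> Meters {
        Meters(self.0 + other.0)
    }
}

/// Calls the same function in method form and in path form.
fn both_notations() -> (f64, f64) {
    let a = Meters(1.5);
    let b = Meters(2.0);
    let method_form = a.add(&b); // notation 2: a.fn(b)
    let function_form = Meters::add(&a, &b); // notation 1: fn(a, b)
    (method_form.0, function_form.0)
}
```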

So perhaps, in the real world, unfamiliar syntax is more important and problematic than I would like it to be.

I don't want to derail the topic too much, but your comment reminds me of this koan:

The venerable master Qc Na was walking with his student, Anton. Hoping to prompt the master into a discussion, Anton said "Master, I have heard that objects are a very good thing - is this true?" Qc Na looked pityingly at his student and replied, "Foolish pupil - objects are merely a poor man's closures."

Chastised, Anton took his leave from his master and returned to his cell, intent on studying closures. He carefully read the entire "Lambda: The Ultimate..." series of papers and its cousins, and implemented a small Scheme interpreter with a closure-based object system. He learned much, and looked forward to informing his master of his progress.

On his next walk with Qc Na, Anton attempted to impress his master by saying "Master, I have diligently studied the matter, and now understand that objects are truly a poor man's closures." Qc Na responded by hitting Anton with his stick, saying "When will you learn? Closures are a poor man's object." At that moment, Anton became enlightened.

It makes sense once you realise that the different forms have different semantic meanings, and by giving them their own types you can implement methods so they return different things based on the range type.

For example, imagine if a BinaryHeap type had a .get() method which accepts a range. When you pass in the full range (..) it just returns an iterator over the backing array in whatever order it uses (really cache friendly!). On the other hand, when you use .get(..=10) it might return a LessThan iterator which is aware of how the heap is laid out and can use the fact that we only need to make a single < comparison because there is effectively no lower bound.
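A rough sketch of that idea follows. Everything in it is hypothetical: `Heap`, the `HeapQuery` trait, and the `AtMost` iterator are invented names, and the bounded iterator is a plain linear filter rather than anything that exploits the heap layout. The point is only that each range type selects its own return type at compile time.

```rust
use std::ops::{RangeFull, RangeToInclusive};

/// Hypothetical container standing in for the heap in the discussion.
struct Heap {
    items: Vec<i32>,
}

/// Hypothetical trait: each range type chooses its own iterator type.
trait HeapQuery<'a> {
    type Iter: Iterator<Item = &'a i32>;
    fn query(self, heap: &'a Heap) -> Self::Iter;
}

impl<'a> HeapQuery<'a> for RangeFull {
    // The full range hands back the plain slice iterator.
    type Iter = std::slice::Iter<'a, i32>;
    fn query(self, heap: &'a Heap) -> Self::Iter {
        heap.items.iter()
    }
}

/// Yields only the items that are <= `limit`.
struct AtMost<'a> {
    inner: std::slice::Iter<'a, i32>,
    limit: i32,
}

impl<'a> Iterator for AtMost<'a> {
    type Item = &'a i32;
    fn next(&mut self) -> Option<&'a i32> {
        let limit = self.limit;
        // One comparison per item: there is no lower bound to check.
        self.inner.find(|x| **x <= limit)
    }
}

impl<'a> HeapQuery<'a> for RangeToInclusive<i32> {
    type Iter = AtMost<'a>;
    fn query(self, heap: &'a Heap) -> Self::Iter {
        AtMost { inner: heap.items.iter(), limit: self.end }
    }
}

impl Heap {
    /// Static dispatch: the return type depends on the range type passed in.
    fn get<'a, Q: HeapQuery<'a>>(&'a self, q: Q) -> Q::Iter {
        q.query(self)
    }
}
```

Here `heap.get(..)` and `heap.get(..=10)` return two unrelated iterator types, with no runtime check deciding between them.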

Why could they not return different things based on the enum variant?

Essentially, we need some dispatch and (hopefully not derailing the topic) when it comes to dispatch, generics are a poor man's enums and enums are a poor man's generics.

With enums the dispatch must be dynamic; with generics you have the option of static or dynamic. But the separate Range* types and the RangeBounds trait add complexity. Do the various tradeoffs clearly favour generics over enums?

I guess sometimes it's desirable to return not just different values, but different types (e.g., using a BinaryHeap example above - full range will produce some continuous storage, which is not available for non-full range).

Functions can only have one return type.

From a compiler's perspective, it doesn't make sense to say "sometimes I'll return a 64-byte object with an alignment of 1, but other times I'll return a 16 byte object with an alignment of 16", because then you don't know how much memory the caller needs to set aside for the return value or how that value should be aligned.

Things like impl Trait help hide the return type's name, but they don't magically make that "functions can only have one return type" requirement go away.

You could return an enum with variants for each of the iterator types, but that means the caller now needs to do a match to get the right iterator and consume it (remember, our full range iterator has a different type to our bounded one, so they can't be assigned to a common variable and iterated with one loop).

Alternatively, you could implement Iterator for that enum, but now every single call to next() will impose a match statement, and when your implementation is just incrementing a pointer and doing a bounds check (like the iterator for a full slice) that extra branching can have a non-trivial effect or could act as a barrier to optimisation.
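To make the enum alternative concrete, here is a small sketch, with invented names (`HeapIter`, `get`), of an enum that wraps two iterator types and implements `Iterator` itself. Note the `match` that runs on every call to `next()`.

```rust
use std::iter::Take;
use std::slice::Iter;

/// One enum standing in for two different iterator types.
enum HeapIter<'a> {
    Full(Iter<'a, i32>),
    Prefix(Take<Iter<'a, i32>>),
}

impl<'a> Iterator for HeapIter<'a> {
    type Item = &'a i32;
    fn next(&mut self) -> Option<Self::Item> {
        // This branch is taken on every single call to `next()`.
        match self {
            HeapIter::Full(it) => it.next(),
            HeapIter::Prefix(it) => it.next(),
        }
    }
}

/// A single return type, chosen at runtime from the `limit` argument.
fn get(items: &[i32], limit: Option<usize>) -> HeapIter<'_> {
    match limit {
        None => HeapIter::Full(items.iter()),
        Some(n) => HeapIter::Prefix(items.iter().take(n)),
    }
}
```

Whether the extra branch matters in practice depends on the workload and on what the optimiser manages to do with it.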

No, not really.

Generics allow an open, extensible set of types. Enums don't make that possible, as they consolidate a fixed set of variants and that's it.

In general, if you need to rely on a piece of common behavior across (abstract) types, it's best expressed via generics. It's much more usable and convenient down the line from the API consumer's perspective, since it makes it possible to write arbitrary 3rd-party code that is compatible with a trait, including code that the inventor of the trait didn't even think of.

On the other hand, enums are a good fit when you have a truly fixed set of choices. In that case, it's not about common behavior, it's often about choosing between different behaviors instead. Of course, the two might need to be mixed and matched, since enums can implement traits as well, e.g. by simply forwarding to the wrapped data.
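A small sketch of both halves of that argument, using invented names (`Area`, `Square`, `Rect`, `Shape`, `doubled_area`): the generic function accepts any implementor of the trait, and the enum joins in by forwarding to the data it wraps.

```rust
trait Area {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Area for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Area for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

/// Generic: open to any type that implements `Area`, including third-party ones.
fn doubled_area<T: Area>(shape: &T) -> f64 {
    2.0 * shape.area()
}

/// Enum: a fixed set of choices that forwards to the wrapped data.
enum Shape {
    Square(Square),
    Rect(Rect),
}

impl Area for Shape {
    fn area(&self) -> f64 {
        match self {
            Shape::Square(s) => s.area(),
            Shape::Rect(r) => r.area(),
        }
    }
}
```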

Yes. I'd swept this all under the dynamic-vs-static carpet, not only on argument types but also on return type.

I guess that this is key: in a language like Rust this really matters; in a language where everything is boxed (or at least heap-allocated) anyway, it's much less important.

In the context of

closures are a poor man's objects and objects are a poor man's closures

(the phrase that the koan upthread inspired): Yes, really. :slight_smile:

Exactly. And my naive, first-glance gut feeling is that ranges have a truly fixed set of exactly 9 choices: {Unbounded, Included, Excluded} × {Unbounded, Included, Excluded}, but I'd be very happy to be enlightened as to why this is not the case.

I'd even venture to suggest that the 3 cases not covered by the Range* types and their corresponding syntaxes (presumably because they are considered to be rarely useful) are already enough of a counterexample.

My point is exactly that generics and enums are not isomorphic the way closures and objects are, and it's not merely syntax or convenience. Generics are more powerful (as in "you can do this with generics but not with enums") when it comes to designing interfaces around types that need to uphold constraints while also remaining extensible. For example, good luck creating a strongly-typed ORM like Diesel or SQLx using enums instead of generics.

Meanwhile, enums are more powerful (as in "you can do this with enums but not with generics") when one needs dynamic dispatch over concrete types. For example, a serde_json::Value is serializable, deserializable, and storable by-value. You couldn't do the same with a Box<dyn Trait> – for some pretty trivial and fundamental reasons, such as an abstract, opaque (existential) type not being constructible (and thus deserializable) to begin with.

But this doesn't mean there are exactly 9 types. For example, RangeBounds is implemented not only for Range, RangeFrom, RangeInclusive, etc., but also for (Bound<T>, Bound<T>) and (Bound<&T>, Bound<&T>) and a heap of other types. This would be impossible to achieve with just a single, gigantic enum: every time you wanted to add a new case, you would have to create a new enum variant, breaking the API. (No, #[non_exhaustive] doesn't solve this, because what would you do when you encounter a range of unknown bounds?)
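The tuple implementation is also how the combinations without dedicated syntax can be expressed today. As a sketch (the helper name `open_closed` is invented), here is an excluded lower bound paired with an included upper bound, passed to `BTreeSet::range`:

```rust
use std::collections::BTreeSet;
use std::ops::Bound;

/// Returns the elements x with lo < x <= hi, a combination with no range syntax.
/// Assumes lo < hi; `BTreeSet::range` panics if the start is past the end.
fn open_closed(set: &BTreeSet<i32>, lo: i32, hi: i32) -> Vec<i32> {
    // A `(Bound<i32>, Bound<i32>)` tuple implements `RangeBounds<i32>`.
    set.range((Bound::Excluded(lo), Bound::Included(hi)))
        .copied()
        .collect()
}
```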

I like how this topic still derailed hard.

Back to topic.

I prefer ident @ pat since the @ separates ident and pat, so I can easily spot which part is a binding and which part is a pattern.

I agree ..= is kind of hard to parse visually. But I don't have a better alternative proposal either. At least it's better than ....

Could you edit your text to replace all ::= with ..=?
It's super confusing to read because you sometimes use the correct dot notation and sometimes the non-existent colon notation.

Will edit as soon as I'm back at my desk. Apologies, of course! BNF reflexes, apparently.

(Fixed.)

Re the digression: If the koan is rewritten as enums versus generics rather than objects and closures, I think @H2CO3's point that each is more powerful in some areas does make it an enlightening comparison.
......
Back on topic:

I agree that ..= is better than ...

I'm less sure that it's better than the [,) notations, which make the closed/open choice visible rather than implicit. And alternatives like suggesting that .. should be closed and __ should be open, with appropriate combinations, are probably a bit of a pain for the tokenizer.

As I said, this is down in the nitpicking and flyspecking range. Users will adjust after seeing/using it a few times. It's just a corner I tripped over, and next time I write a language I may consider changing it... but Rust is Rust, and that's fine. Thanks to all for the context/history!

So, getting back to koans: The answer to my initial question is somewhere between "mu" and "meh".

I guess there would be some parsing issue with [,). It makes parentheses pairing quite hard. Do note that rust range expressions can contain any expression, instead of integer literal only.
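To show what "any expression" means in practice, here is a short sketch with invented helper names (`arithmetic_bounds`, `window`). Range operators bind more loosely than arithmetic and method calls, so both bounds below are evaluated as whole expressions before the range is formed.

```rust
/// `1 + 1..=2 * 3` parses as `(1 + 1)..=(2 * 3)`, i.e. `2..=6`.
fn arithmetic_bounds() -> Vec<i32> {
    (1 + 1..=2 * 3).collect()
}

/// Slices a window around `center`, clamped to the slice; bounds are method calls.
fn window(values: &[i32], center: usize, radius: usize) -> &[i32] {
    // Clamp the start as well, so an out-of-range `center` gives an empty slice.
    let start = center.saturating_sub(radius).min(values.len());
    let end = center.saturating_add(radius).saturating_add(1).min(values.len());
    &values[start..end]
}
```

With `[a, b)` style brackets, bounds like these would have to be disentangled from indexing brackets and call parentheses.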

@zirconium-n: Ranges bound with expressions: Good point, thanks. Though of course as things get more complex, additional parens to help the human parse the code are a Good Thing.

@Michael-F-Bryan: I'm still getting used to if let syntax, so that mapping hadn't occurred to me. Thanks!

(It does make me wonder whether this could be extended to make let usable wherever an expression is expected just as = is in C, so if let was just one idiomatic use thereof... but I presume that was considered and rejected as one step too far and I'm not sure I would disagree.)

There are now other errors in there 😂
E.g. you typed .= instead of ..=.

Arggh.

Thanks, sorry for bothering you.

It was a good catch, thanks! Just frustrated with myself, not with anyone else.