How idiomatic are `Index` and `IndexMut`?

But it excludes cases where element may exist or not exist. Which means indexing is not suitable for hashmaps is other similar objects.

Which are used, in many applications, more often than actual arrays.

That's precisely my point: there are confusion and digression because originally feature is just misdesigned.

It's possible to extend it and make it more flexible, but confusion and digression is with us to stay.

Thus I'm not 100% sure if fix is not worse than disease at this point.

I literally mentioned insertion in that quote. It's quite common to have an Entry object handle the not-present state, and allow insertion: VacantEntry in std::collections::hash_map - Rust

Indeed it's common kludge which helps to mitigate initial problem to some extent.

The fact that you need that kludge where better design wouldn't need it is precisely what makes design not optimal.

That's part of the rust standard library entry API design, which doesn't use Index at all. What's the kludge?

The whole hoopla with Vacant and Occupied entries is needed because of Index/IndexMut design.

They are not needed otherwise.

And what's more funny is the fact that IndexMut support was removed leaving only kludges behind.

It's a kludge which lets you only pay for lookup once when doing mutations, separate your traversal and update logic, do both of those even when the element is missing, can handle any sequence of read, write, insert, delete and perhaps other container specific actions, are almost certainly how the other operations would be implemented internally anyway... wait why is this a kludge, and not just a good and useful API?

Index{Get,Set} are probably the right things to add, or something pretty close, but that doesn't mean allowing Index{,Mut} to return something other than a reference is a mistake.

1 Like

Because good and useful API MUST (not SHOULD!) support the following:

  // Declare empty  dictionary somehow: string ⇨ int.
  dictionary["foo"] = 42;
  dictionary["bar"] = dictionary["foo"];

It works in such different languages as C#, C++, Go, Python and many, many others. Heck, you can even type the following in shell:

declare -A dictionary
dictionary["foo"]=42
dictionary["bar"]=${dictionary["foo"]}
echo $((dictionary["foo"]+dictionary["bar"]))

And it would print 84. Bash is extremely ugly, limited and awful programming language yet even it does indexing right.

Yet Rust fails to do that. How can you call it good and useful API?

Adding Index{,Mut} as optional optimisation traits would have been fine. Instead currently these are the only allowed traits. And as a result one of the most important data structures is needlessly hard to use. If that is not a design mistake then I don't know what is.

I was talking about the existing entry API, and how it supports insertion, when you called "it" an ugly kludge. It's not, and has never, been used by any Index API, because the index API can't return a non-reference.

The whole point of allowing Index to return arbitrary types would be for it to return custom references to item slots, so that code exactly like what you've written would work.

Eg:

struct CoolHashMap ...

enum CoolHashMapEntry<'map, K, V> { Present ..., Vacant ... };

impl<K, V> Index for CoolHashMap<K, V> {
  // generalizes, and defaults to, existing &'self Item;
  type Reference<'self> = CoolHashMapEntry<'self, K, V>;
  type Item = V;
  fn index(&self, key: K) -> Self::Reference { ... }
}

impl<'map, K, V> Deref for CoolHashMapEntry<'map, K, V> {
  type Target = V;
  fn deref(&self) => &Self::Target ...
}

impl<'map, K, V> DerefMut for CoolHashMapEntry<'map, K, V> {
  fn deref_mut(&mut self) => &mut Self::Target ...
  // maybe something like?
  fn deref_set(&mut self, value: Self::Target) ...
}

would let you do hash_map[key] = value and it would automatically insert or update, because indexing automatically derefs.

1 Like

TL;DR: not allowing custom reference types is due to the lack of LGATs to tie the returned proxy lifetime to the indexing.


Interestingly enough, this shares a lot with "placement". In previous versions of the compiler, there was a placement type that allowed you to write e.g. vec.place_back() <- value; I don't know if it was ever actually supported, but this extends simply to allowing map[key] <- value.

Despite the syntactical simplicity of and theoretical purity of allowing $receiver[$index] = $expr as sugar for IndexSet::index_set(k#autoref $receiver, $index, $expr), it's more complicated than that in practice.

container[index] evaluates to a place. This place is then used as any other place is, and is subject to the same autoref and usage rules as any other place. The compiler then (after type inference!) substitutes container[index] with roughly *Index::index(&container, index) or *IndexMut::index_mut(&mut container, index) depending on if the place was used as a shared place or as a mutable place.

There isn't even a concept in the language for a place which is written to but not read from. Without such a concept, the compiler cannot tell the difference between use as a mutable place (thus indexing translating to IndexMut) or as a write target place (thus indexing translating to IndexSet). Perhaps just specifically the index expression could be special cased, but this doesn't compose at all. Most of the people working on the language are trying to remove special cases, not add new ones. Usually around once a year someone brings up the concept of some "&lateinit mut T" type[1] which would add the necessary semantics into the language for separating mutable places from write target places. However, the utility of such a type is somewhat limited (and relies on typestate, a huge feature on its own right, for much of it), so what gets significantly more attention is "&move T"/IndexMove, because Box already can do that, and extending this ability to user types (and thus making Box incrementally less special) has a much higher impact.

&lateinit mut T also runs into significant issues around necessarily being a linear type as well. Plus, there's evaluation order issues as well: container[ix] = expr is IIUC ordered as

  • evaluate container
  • evaluate ix
  • evaluate container[ix]
  • evaluate expr
  • drop_in_place container[ix]
  • move expr into container[ix]

This evaluation order is different from a setter function. If you write container.index_set(ix, expr), this is ordered as

  • evaluate container
  • evaluate ix
  • evaluate expr (swapped!)
  • call container.index_set with ix, expr
    • evaluate the place (swapped!)
    • drop_in_place the place
    • move expr into the place

For a language that manages memory for you and doesn't try to expose exact control over the execution model, it's relatively simple to have indexing assignment as a special case of syntax sugar. For a language that is going for exposing such control, though, there are a lot of devils in the details of how exactly this would work.


Without an argument supported by some measurement, this is just an assertion without basis.

And one I disagree with. An API which offers just get and get_mut but not set can absolutely be good and useful.

It's worth noting that C++ is really cheating here. Just map[key] for most containers inserts a default constructed value at key and returns a normal C++ reference. There's actually nothing preventing you from providing the same semantics in Rust, at least for IndexMut inserting a mut reference to a newly inserted Default constructed value.

You shouldn't, but you can implement the C++ semantics here.


  1. ^You'll also see this as &out mut T or rarely &in T. I really don't like labeling it with in or out, though, because either label is wrong from one side of the reference. ↩︎

5 Likes

Yeah, the RFC has a lot of talk about "if this trait is implemented" to handle distinguishing the different Index* cases, which is it's own can of worms.

Very much a "nice idea in theory" kind of thing.

What measurements? It's about meeting the expectations. Most popular languages are supporting dictionaries and this leads to certain expectations. Take top 5 or top 10 or top 20 from the language popularity index and you would find variation of that syntax supported in the majority of them.

Simple application of POLA says that it must be supported in Rust, too. Yet it doesn't work. That's a problem.

Note that Rust was willing to replace sigils with things like Box<T> to make syntax less confusing. Yet associative arrays still don't work as they should.

It's worth noting that C++ is really cheating here.

Absolutely. But they manage to achieve the end goal with these kludges: syntax that most developers expect “just works”. Yet Rust with it's nonkludge fails to achieve that.

There's actually nothing preventing you from providing the same semantics in Rust, at least for IndexMut inserting a mut reference to a newly inserted Default constructed value.

Then why that natural syntax is not provided by Rust? Note: it wasn't even working as people expected when HashMap included IndexMut implementation (that's why it was removed before Rust 1.0, BTW).

Also, please, read what you cited:

container[index] evaluates to a place.

Yes, that's the wrong design decision which C++ inherited from C and which Rust picked up without thinking.

C++ have found out that thus wrong decision can be papared over with a pile of kludges. Rust made different decisions and doesn't provide associative arrays at all (it's not an associative array if you can't use array syntax).

The rest of discussion, basically, says that it's hard to do the right thing and just easier not to support what people expect.

True, but so what?

I don't understand your actual position here? From what I can tell it's something like:

Rust should support a[b] syntax

It does.

Standard maps should allow insertion using indexing

That's desired, but not currently implemented for largely technical reasons, see the RFC and the rest of this thread.

Rust should behave like other languages in order to not surprise the user

Rust, in general, will copy other languages, where they feel it's appropriate. Like any language, they will try to not copy choices they feel are mistakes.

Rust should not behave like C/C++ specifically in what a[b] does, because that was a mistake.

Perhaps, but:

  • They can hardly break pretty much every Rust crate in existence
  • You have yet to actually explain why it's worse or a mistake, despite me explaining why returning a reference (of some sort) has significant benefits.
  • They're still trying to add get/set like you want anyway, see the RFC.

So, please, clarify what you actually want to happen here?

@moderators : This topic seems to have strayed from the original question of whether the current version of Index and IndexMut should be part of an idiomatic crate's API. It is now a discussion of Rust's design choices, which is more properly a topic for IRLO.

Please consider splitting off, and potentially locking, this spin-off discussion.

3 Likes

Two most popular data structures in most popular languages are arrays and associative arrays. And they use a[b] syntax. In most popular languages but not in Rust.

Yet it copied C approach (which was largely copied to C++): a[b] is a reference to something. Which, as C++ showed, is a mistake (but C++ largely managed to mitigate it via pile of kludges).

At this point? IndexSet would, probably, be nice. Not sure how feasible it is given the fact that original mistake happened so long ago and so many things depend on it.

It's a mistake because it doesn't give users what they want and/or need. You have to keep in mind that the whole indexing thing is just one big syntax sugar which is required to make people feel familiar. You can write literally any program without ever using indexing. Which means only expectations of people matters.

And people expect that associative arrays would be available and would work in any modern language.

They are still discussing whether they may fix the mistake. That's good, but doesn't mean the whole thing wasn't a mistake.

We can easily return it back to URLO space. If we would stop arguing about semantic and IndexSet (which actually belong to IRLO, you are right)… should crates use Index and IndexMut for anything but yet-another-type-of-vector?

My gut feeling is that HashMap makers were right when they removed IndexMut: without IndexSet it would definitely cause more harm than good.

1 Like

Sure, but "majority" is weakly stated: it's not a majority for the top 6 (which currently includes C, java, VB) or the top 10 (also includes asm, SQL).

Rust's ability to deal with (i.e. generate references to) place expressions is a feature not shared by the majority of the languages that do support insertion with indexing. So it's not an inherent error to support place expressions there - just a trade off, whose utility you could argue against with concrete cases if you wanted to. But...

...this feels really over the top to me. I've used maps plenty, and the lack of an indexing insertion has never come in the way of "goodness" and "utility" of my code or resulted in a different number of lines or worsening of readability than it would have otherwise in my use cases. Don't get me wrong: this syntax would be nice to have, but not much more than that for cases I've run into, personally.

4 Likes

That is a biased take. Yes, once you learn and internalize that maps do not support IndexMut and that you need to use insert and entry, there is no issue working with their API. However, it is hard to argue that it isn't a stumbling block for people new to the language. The insertion of values through indexing is something that people come to expect from other languages, and it is even more confusing since the indexing on arrays works exactly as expected.

Why does it rely on typestate?

Also, while the utility may be limited, there is an important number of use cases which is very difficult to support without such feature. This includes unsized returned values, non-movable data, partial initialization and IndexSet itself.

With regards to expectations from other languages, I suspect that the majority of new users are actually coming from JavaScript-land, where I think it's interesting to note that TC39 decided to make the hot new Set and Map collections not support indexing (at least, any more than every object does, which is probably why!), and you need to use map.set(key, value) and the like.

People are used to languages having quirks for obscure technical reasons. It doesn't mean they shouldn't be fixed if possible, but having a different spelling than we'd like isn't that bad considering same of the crap I've had to deal with. (Actionscript 3 deciding that a[x++] += 1; should read a[x++] then write a[x++], incrementing x twice is up there, Visual Basic for Applications needing you to ping pong recursion because recursion to the same function was spelled the same as indexing into the return value is another...)

4 Likes

&lateinit mut T relies on typestate because

let mut x:
dbg!(x); // error: x may not be initialized
x = 0;
dbg!(x); // ok: x is guaranteed initialized

it is fundamentally a typestate problem; before you've written to the place you have a reference uninitialized T, and once you've written to it you have a reference to initialized T. Rust already has the small amount of typestate required for this behavior for local places, but does not support piecewise initialization of them.

The bigger issue is the linearity of such a typestate changing reference interacting with unwinding and in particular catch_unwind and drop flags.

The moveit crate provides support for C++ style address aware constructors which use a kind of library &lateinit mut T; it gets around the typestate issue with unsafe assertions that you don't do it wrong.

&lateinit mut T is a solvable problem, as is &move T. (Roughly, say the unwind landing pads are in change of dropping the pointee if it's there.) It's not full typestate, but it is adjacent and that complicates the support.


With the current Index[Mut], I would agree that IndexMut should not be implemented for growable maps, as it is unclear whether map[key] = val (or more problematic, map[key].by_mut()) should be update (panicking if the entry does not exist) or upsert (create the entry if it does not exist).

For growable arrays (vectors), it is uncontroversial that vec[vec.len()] = val is a panic and not a push.

Where I disagree slightly is that IndexMut does make sense for maps where the set of keys is immutable (e.g. a perfect hash map). In this case, the only possible behavior for map[key] = is update and not upsert, so providing IndexMut won't cause any surprises.

C++ defines self[ix] as quite literally being just self.operator[](ix), with an expectation (but not requirement) that the return value act like an lvalue reference. It is imho the lack of guaranteed consistency which is the problem with C++ — not the fact that indexing acts as an lvalueC++ (placeRust) — along with the general chaos that results from C++ references not being a proper type.

Rust does have this "complexity" in that indexing evaluates to a place and not a value, but I argue that this is inherent complexity to supporting indexing. The alternatives to allow container[key] = val are

  • Make []= a completely separate syntactical form and not just the composition of [] and =, or
  • Allow overloading operator=, which is its own disastrous can of worms which Rust wants to avoid.

container[key] does semantically evaluate to a place, just the same as *reference does, in that you can do the same operations to either, and they operate on the value in the place, not a fresh copy of it.

What you can argue for is that Index[Place] not supporting returning a custom Deref[Place] is a semantic lacking, and that not having overloads for two of the four ways to use a place is a semantic lacking. But the indexing syntax should evaluate to a place.

1 Like

I feel like this is pretty obviously refutable for Rust.

Let's look at the top 5 languages from your link in terms of how they "let" you modify a collection while iterating:

But trying to do that with the same syntax as you'll see in those language in Rust won't compile, and you have to structure your code rather differently as a result.

That could certainly be considered "astonishing". But I'd also consider it wonderful, which sounds like a direct refutation of the PoLA.

The "normal"/"least astonishing" ways to use hashmaps are often really bad -- see, for example, every time someone does if m.contains_key(k) { use(m[k]); }. Encouraging use of APIs that are more performant and better express intent anyway seems totally fine to me.


And as a meta-note, the idea of "astonishment" is *really* difficult to use to make decisions, because it's fundamentally experiential for individual users. What will "astonish" a javascript programmer learning Rust is *very* different from what will "astonish" a haskell programmer learning Rust.
5 Likes