In Ruby, for instance, calling the 'values' method on a hash returns an array. This is more intuitive to me, though perhaps that is because I learned Ruby first. What is the reasoning behind the equivalent method in Rust returning a special-purpose iterator rather than a general-purpose collection?
Because an iterator is more general. What if you don't want all the values? What if you don't need them all at once? What if you don't want to allocate? You can turn an iterator into a vector without doing unnecessary work, but the reverse is not true.
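A minimal sketch of that asymmetry, assuming a map from String keys to u32 counts (the function names values_as_vec and has_value_over are hypothetical, made up for illustration). Getting a Vec from values() is a single explicit collect call, while a question like "is any value over the limit?" can stop early and never allocates:

```rust
use std::collections::HashMap;

/// Materialize all values as a Vec. The allocation is explicit: it only
/// happens because we call `collect`.
fn values_as_vec(map: &HashMap<String, u32>) -> Vec<u32> {
    map.values().copied().collect()
}

/// Stop as soon as one matching value is found; no Vec is ever built and
/// the remaining values are never visited.
fn has_value_over(map: &HashMap<String, u32>, limit: u32) -> bool {
    map.values().any(|&v| v > limit)
}

fn main() {
    let mut stock = HashMap::new();
    stock.insert("apples".to_string(), 3);
    stock.insert("pears".to_string(), 12);

    let mut all = values_as_vec(&stock);
    all.sort(); // HashMap iteration order is unspecified, so sort before printing
    println!("{:?}", all); // [3, 12]
    println!("{}", has_value_over(&stock, 10)); // true
}
```

Going the other way, from a Vec that was already built back to "only the values I needed", cannot undo the allocation and copying that already happened.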
It seems to me that calling 'values()' implies one wants all the values. Do you mean work by the compiler or work by the coder? Because 'iter.collect()' is the same as 'vec.iter()' in terms of coding work.
More work at runtime. Returning a Vec would require allocating space for, and copying, the whole thing. Returning the iterator is essentially free. (Basically it's the equivalent of returning a &[T] instead of a Vec<T> from a method to save copying the local Vec.)
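The &[T] versus Vec<T> analogy can be sketched with a hypothetical Scores struct (not from any library). The slice accessor hands out a view of the struct's own buffer, while the Vec accessor has to allocate a new buffer and copy every element into it:

```rust
/// Hypothetical container used only to illustrate the analogy.
struct Scores {
    data: Vec<u32>,
}

impl Scores {
    /// Borrowing view: no allocation, no copying; the caller sees the data in place.
    fn as_slice(&self) -> &[u32] {
        &self.data
    }

    /// Owned copy: allocates a new buffer and copies every element.
    fn to_vec(&self) -> Vec<u32> {
        self.data.clone()
    }
}

fn main() {
    let s = Scores { data: vec![4, 8, 15] };
    let view = s.as_slice();
    let copy = s.to_vec();
    // Both contain the same values, but only `copy` required a fresh allocation.
    assert_eq!(view, &copy[..]);
    // The slice points at the struct's own buffer.
    assert!(std::ptr::eq(view.as_ptr(), s.data.as_ptr()));
    println!("{:?} {:?}", view, copy);
}
```

HashMap::values() sits on the "view" side of this split: it borrows the map rather than copying out of it.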
Why? All it says is that you want access to the values, not that you want all of them, or all of them at the same time. Besides, the point was more that you don't want a method for every possible way you might want to access the values: you want the most fundamental kind of access possible, so that you can build other access schemes on top of it.
I understand not wanting them all at the same time, perhaps, but any specific value is accessible by key without requiring a dedicated values() method. Calling that method only makes sense when some action or evaluation involves all, or at least many, of the values.
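Even when an evaluation does touch every value, it usually doesn't need them gathered into a collection first. A minimal sketch, assuming a map of item names to u64 prices (summarize is a hypothetical helper): the total and the maximum are computed straight from values(), with no intermediate Vec:

```rust
use std::collections::HashMap;

/// Total and maximum price, computed directly over `values()`.
/// Each call below walks the map once; neither builds a Vec.
/// Returns None for an empty map (there is no maximum).
fn summarize(prices: &HashMap<&str, u64>) -> Option<(u64, u64)> {
    let total: u64 = prices.values().sum();
    let max = prices.values().copied().max()?;
    Some((total, max))
}

fn main() {
    let mut prices = HashMap::new();
    prices.insert("tea", 3);
    prices.insert("cake", 5);
    println!("{:?}", summarize(&prices)); // Some((8, 5))
}
```

This is the "build other access schemes on top" idea in practice: sum, max, filter, any and friends all work on the same iterator.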
I suppose I see this point. It still feels counter-intuitive. Thanks for the explanations.
As a general policy, Rust likes to make runtime costs explicit, even if that makes the resulting code more complex. For example, if you want a String containing "foo", you have to write "foo".to_string() rather than just "foo". There are a couple of reasons for this:
- If your HashMap is small, the extra runtime cost might not matter, but if it's huge, the extra runtime cost might be crippling. A language like Ruby (or Python or JavaScript) might say "we don't care about the 20% of use-cases, we want our language to be pleasant for the other 80%", but Rust wants to be useful for 98% of use-cases[1], so it provides a zero-runtime-cost approach wherever possible.
- Because things with a runtime cost must generally be written out explicitly, when your Rust program has a performance bottleneck it's likely to be some specific function call or loop that you wrote, and therefore in code that you can fix or work around or redesign. In languages that hide runtime costs, the performance bottleneck is often in some secret glue code constructed by the compiler, without your knowledge or understanding.
[1]: Even Rust still discards some use-cases; don't try to use it with an 8-bit micro-controller or obscure CPU architectures like SuperH or PA-RISC.
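A small sketch of what "explicit runtime cost" looks like in code, assuming a map from u32 ids to String names (total_len and cloned_values are hypothetical helpers). The literal is free and the owned String must be asked for; likewise, reading values in place is free, while getting owned copies needs a visible cloned() and collect():

```rust
use std::collections::HashMap;

/// Cheap: walks the map's values in place; nothing is allocated.
fn total_len(map: &HashMap<u32, String>) -> usize {
    map.values().map(|s| s.len()).sum()
}

/// Visible cost: a new Vec, plus a heap copy of every String.
fn cloned_values(map: &HashMap<u32, String>) -> Vec<String> {
    map.values().cloned().collect()
}

fn main() {
    // A literal is a &'static str stored in the binary: no allocation.
    let borrowed: &str = "foo";
    // An owned String lives on the heap, and you have to ask for it.
    let owned: String = "foo".to_string();
    assert_eq!(borrowed, owned);

    let mut map = HashMap::new();
    map.insert(1, "one".to_string());
    map.insert(2, "three".to_string());
    println!("{}", total_len(&map)); // 8
    println!("{}", cloned_values(&map).len()); // 2
}
```

If cloned_values ever shows up as a hotspot, the cost is sitting in a line you wrote, which is the second point above.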
The built-in dict type in the Python programming language went through an evolution from Python 2 to Python 3, and it ended up looking more like HashMap in Rust.
In Python 2, the dict.values() method returned a copy of all the values (a Python list). Such a list can be costly to produce if the dictionary contains many values; you had to use dict.itervalues() to get an iterator. This was changed in Python 3 so that dict.values() returns a view rather than a new list, making it a cheap, lightweight operation. Compare:
Having an iterator is handy when you want to loop over all the values one by one. In that case you don't actually want to pay the cost of allocating a list of all the values: the dictionary already contains them, and you just need a way to iterate over them.
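The Rust counterpart of Python 3's for v in d.values(): loop can be sketched as follows, assuming a map of names to u8 ages (print_values and birthday are hypothetical helpers). values() only borrows the map, and values_mut() gives in-place mutable access the same way, so neither builds a list first:

```rust
use std::collections::HashMap;

/// Visit each value in turn; `values()` only borrows the map,
/// so no list of values is built before the loop starts.
fn print_values(ages: &HashMap<&str, u8>) {
    for age in ages.values() {
        println!("{}", age);
    }
}

/// Mutating every value in place works the same way via `values_mut()`.
fn birthday(ages: &mut HashMap<&str, u8>) {
    for age in ages.values_mut() {
        *age += 1;
    }
}

fn main() {
    let mut ages = HashMap::new();
    ages.insert("ann", 30);
    ages.insert("bo", 41);
    birthday(&mut ages);
    print_values(&ages); // prints 31 and 42, in unspecified order
}
```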
Because that would allocate memory. OTOH, the iterator can be walked without any extra allocation, regardless of the size of the hash map.
Because it's more flexible and efficient. I may want to get the values as a HashSet rather than a Vec (or vice versa). An iterator can be collected into any collection that implements FromIterator.
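A minimal sketch of that flexibility, assuming a hypothetical map from device names to owner names (owner_list, owner_set and owner_sorted are made-up helpers). The same values() iterator is collected into three different collection types, and only the return type decides which:

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Every value, duplicates included, in the map's arbitrary order.
fn owner_list<'a>(map: &HashMap<&str, &'a str>) -> Vec<&'a str> {
    map.values().copied().collect()
}

/// Distinct values only; the HashSet drops duplicates as it is built.
fn owner_set<'a>(map: &HashMap<&str, &'a str>) -> HashSet<&'a str> {
    map.values().copied().collect()
}

/// Distinct values in sorted order.
fn owner_sorted<'a>(map: &HashMap<&str, &'a str>) -> BTreeSet<&'a str> {
    map.values().copied().collect()
}

fn main() {
    let mut owners = HashMap::new();
    owners.insert("laptop", "alice");
    owners.insert("phone", "bob");
    owners.insert("tablet", "alice");

    println!("{}", owner_list(&owners).len()); // 3
    println!("{}", owner_set(&owners).len()); // 2
    println!("{:?}", owner_sorted(&owners)); // {"alice", "bob"}
}
```

If values() returned a Vec instead, the HashSet and BTreeSet versions would pay for an intermediate Vec they immediately throw away.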