I have a function that reads a file and parses its lines into a HashMap:
pub async fn from_file(file: &str) -> Result<DotLocalDNSConfig> {
    let contents = fs::read_to_string(&file).await?;
    let mut result = empty();
    for line in contents.lines() {
        match line {
            "" => continue,
            s if s.starts_with("#") => continue,
            s => {
                let (name, ip) = parse_line(s)?;
                if result.contains_key(&name) {
                    return Err(anyhow!("Duplicate hostname: {name}"));
                }
                result.insert(name, ip);
            }
        }
    }
    Ok(result)
}
While this is pretty straightforward, I work mostly with functional languages, and it makes me feel uneasy.
In most functional languages somemap.Add(...) creates a new map with the supplied values (hopefully efficiently). I tried to search for a way to do it in Rust and found this:
pub async fn from_file(file: &str) -> Result<DotLocalDNSConfig> {
    let contents = fs::read_to_string(&file).await?;
    contents.lines().try_fold(empty(), |acc, line| {
        match line {
            "" => Ok(acc),
            s if s.starts_with("#") => Ok(acc),
            s => {
                let (name, ip) = parse_line(s)?;
                if acc.contains_key(&name) {
                    return Err(anyhow!("Duplicate key: {name}"));
                }
                let new_acc = acc
                    .iter()
                    .map(|(k, v)| (k.to_owned(), v.to_owned()))
                    .chain(std::iter::once((name, ip)))
                    .collect();
                Ok(new_acc)
            }
        }
    })
}
Which is arguably worse.
Isn't there a straightforward way to work with an immutable HashMap in Rust?
Part of the reason why functional languages like to use immutable data structures is solved by the "shared xor mutable" rule in Rust. In other words, you have provably unique access to your hash map, so I doubt you would gain much, if anything, with an immutable version in this example.
That said, there are a couple of options on crates.io. You may also be interested in the entry API.
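As a rough sketch of how the entry API could apply to the duplicate check in the question: `insert_unique` is a hypothetical helper, and plain `String` errors and `IpAddr` values are assumptions standing in for the `anyhow` errors and the `DotLocalDNSConfig` type, which are not shown in the thread.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::net::IpAddr;

// Hypothetical helper: insert `name -> ip` only if `name` is not present yet.
// `entry` performs a single lookup instead of `contains_key` followed by `insert`.
fn insert_unique(
    map: &mut HashMap<String, IpAddr>,
    name: String,
    ip: IpAddr,
) -> Result<(), String> {
    match map.entry(name) {
        Entry::Occupied(e) => Err(format!("Duplicate hostname: {}", e.key())),
        Entry::Vacant(e) => {
            e.insert(ip);
            Ok(())
        }
    }
}

fn main() {
    let mut map = HashMap::new();
    let ip: IpAddr = "192.168.0.10".parse().unwrap();
    // The first insert succeeds, the second one reports the duplicate.
    assert!(insert_unique(&mut map, "printer".to_string(), ip).is_ok());
    assert!(insert_unique(&mut map, "printer".to_string(), ip).is_err());
}
```

The map is still updated in place here; the entry API only removes the separate lookup.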
No. There are some crates, but I wouldn't recommend trying them.
And who told you that Rust would do the same? It doesn't even try.
You can exploit the fact that, when you own a hashmap and pass it around while pretending that you are creating a new one, the optimizer can produce somewhat decent code… but this is fragile, tends to blow up, and very often leads to huge inefficiencies.
Trying to write Haskell in Rust is as bad as trying to write JavaScript in Rust… maybe even worse: the tricks that JavaScript uses under the hood are pretty easy to imitate in Rust, while the tricks that functional languages employ to produce code that's not too awful… Rust doesn't even try to replicate them, and a manual implementation is very non-trivial.
You're both right. Certainly in this case it's not worth going against the language or searching for special crates.
It just seemed weird to me that a language that uses an ML-like type system, is immutable by default, and has a good selection of functions to iterate over collections offers them only on mutable instances (it's been a while since I last used Rust, so I forgot about it).
You may have an exaggerated impression about how immutable things are. let mut doesn't change the type of the binding compared to let /* no mut */, for example, and you can move something you own from a non-mut binding to a mut binding. There's more discussion in this recent IRLO thread.
(&T and &mut T are different types with different guarantees and capabilities, however.)
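A small sketch of both points, using made-up function names (`build`, `total`, `append`) purely for illustration: the `mut` on a binding is not part of the value's type, while `&T` and `&mut T` are distinct types.

```rust
// Hypothetical example: the same Vec<i32> moves from a non-`mut` binding
// to a `mut` binding without any change of type.
fn build() -> Vec<i32> {
    let v = vec![1, 2]; // non-`mut` binding, type Vec<i32>
    let mut w = v; // moved into a `mut` binding, still Vec<i32>
    w.push(3); // allowed: we own the vector exclusively
    w
}

// A shared reference only permits reading.
fn total(v: &[i32]) -> i32 {
    v.iter().sum()
}

// An exclusive reference permits mutation.
fn append(v: &mut Vec<i32>, x: i32) {
    v.push(x);
}

fn main() {
    let mut v = build();
    append(&mut v, 4);
    println!("{}", total(&v));
}
```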
mut usually means exclusivity (emphasis on no-aliases) in rust. the keyword mut has different meanings for variable bindings and for reference types, as already pointed out by @quinedot. ownership is always exclusive, so mut in local bindings is redundant information, the type checker doesn't actually need it. some expert is even "suggesting" to update the language and remove the mut keyword -- remove it entirely for local bindings, and replace it with different terms (that put emphasis on the no-aliases nature) for reference types.
in some sense, it's a bit like "transient" collections in Clojure, which is a language famous for its use of immutable persistent data structures.
in Clojure, you can convert a persistent collection to a transient one, and mutate it in place (i.e. iteratively adding a batch of elements to the collection), all done locally, and then turn it back to persistent when done, and the persistent collection can then be shared for concurrent access as usual. transient is not necessary for the language, but it can greatly improve efficiency when needed.
the major difference is, rust's exclusivity is guaranteed by the type system, but since Clojure is dynamically typed, the transient is implemented at runtime, IIRC, by checking the current thread's exclusive access (akin to ownership), and would lazily do copy-on-write for the portion which might be potentially shared with other threads.
Both of you raise good points (some of them I'm familiar with) but that's not what I meant.
I'm not sure I can think of a good reason to use a function in the family of fold* or reduce* on a mutable element. These functions are:
Very expensive in terms of resources, and the implementation has to be fairly complicated to be performant.
Conceptually, fold is not easy to understand if you're not familiar with functional languages. for loops are much more natural if you're coming from non-functional languages. However, if you do come from functional languages, you might expect that adding a value to a collection will produce a new collection.
What I'm trying to say (and I'm not sure I'm succeeding) is that, as far as I know, one of the main reasons to use this family of functions is that you cannot update a value in place. Why would I use constant allocation if I can just update a mutable value? (This refers to other languages, not Rust, where I think you technically can't in this context.)
Hope I managed to convey why I think it's weird to have a fold function that has to run on a mutable instance.
Actually, they can be optimized better (or at least not worse) than the ordinary for loop, see this for a little context. That's sometimes another reason to go for fold and friends instead of an imperative loop.
what fold() function are you referring to? for Iterator::fold(), it does NOT require the collection of data to be mutable. you don't need a mutable collection of data to fold over them.
let xs = [1, 2, 3, 4];
let sum = xs.iter().fold(0, |acc, x| acc + *x);
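The same holds for a map-valued accumulator. The sketch below is one possible shape, not a recommendation over the for loop: `try_fold` hands the accumulator to the closure by value, so the closure can update it in place and return it, with no per-step copy of the entries. `parse_line` and `parse` here are hypothetical stand-ins, with `String` values and `String` errors assumed in place of the question's unshown types.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the question's parser: splits "name ip" on whitespace.
fn parse_line(line: &str) -> Result<(String, String), String> {
    let mut parts = line.split_whitespace();
    match (parts.next(), parts.next()) {
        (Some(name), Some(ip)) => Ok((name.to_string(), ip.to_string())),
        _ => Err(format!("Malformed line: {line}")),
    }
}

fn parse(contents: &str) -> Result<HashMap<String, String>, String> {
    contents
        .lines()
        // Skip blank lines and comments, as the original `match` does.
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        // The accumulator is moved into the closure on every step, so it
        // can be mutated in place and then handed on to the next step.
        .try_fold(HashMap::new(), |mut acc, line| {
            let (name, ip) = parse_line(line)?;
            if acc.contains_key(&name) {
                return Err(format!("Duplicate hostname: {name}"));
            }
            acc.insert(name, ip);
            Ok(acc)
        })
}

fn main() {
    let parsed = parse("# hosts\n\nprinter 10.0.0.2\nnas 10.0.0.3\n");
    println!("{parsed:?}");
}
```

The input `contents` is only borrowed and never modified; the only thing mutated is the accumulator that the fold itself owns.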
Immutable data structures are data structures which can be copied and modified efficiently without altering the original. The most uncomplicated example of this is the venerable cons list.
─API docs
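To make the cons list from the quote concrete, here is a minimal hand-rolled sketch using `Rc` for sharing. It is an illustration only, not the API of any crate; `List`, `cons`, and `to_vec` are names invented for this example.

```rust
use std::rc::Rc;

// A minimal persistent cons list over i32.
enum List {
    Nil,
    Cons(i32, Rc<List>),
}

// "Adding" an element builds a new list whose tail is the shared original;
// the original list is left untouched.
fn cons(head: i32, tail: &Rc<List>) -> Rc<List> {
    Rc::new(List::Cons(head, Rc::clone(tail)))
}

// Walk the list front to back and copy the elements out.
fn to_vec(list: &List) -> Vec<i32> {
    let mut out = Vec::new();
    let mut cur = list;
    while let List::Cons(x, rest) = cur {
        out.push(*x);
        cur = rest.as_ref();
    }
    out
}

fn main() {
    let base = cons(2, &cons(3, &Rc::new(List::Nil)));
    let a = cons(1, &base); // shares the nodes of `base`
    let b = cons(0, &base); // shares the same nodes again
    println!("{:?} {:?} {:?}", to_vec(&a), to_vec(&b), to_vec(&base));
}
```

Prepending costs one allocation and one reference-count increment, regardless of the list's length, which is the "copied and modified efficiently" property the quote describes.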
As other people have already pointed out, lazy clone operation can bring significant performance gain for certain algorithms, but can backfire for others. This is why these data structures are not shipped with std.
However, your concerns might be less about performance and more about correctness. Programs with a complicated ownership story[1] often want to share data all over the codebase. Outright using shared mutability in such cases quickly leads to logic bugs that cannot be caught by the compiler.[2] If this is your reason for desiring immutable data types, I can more easily understand that.
Still, I consider it a good idea to try simplifying your program's data flow before resorting to such drastic measures. Clean code rarely ever needs cloning, lazy or otherwise.
[1] I think this isn't an official term of art, but it's one I've read from other forum regulars and is most likely to come up in searches.
[2] The functional paradigm has become popular for this very reason; I and many others have learned this the hard way.
I was not resorting to drastic measures: when I saw what it takes to use immutable data types, I immediately went back to the for loop.
You were right about my motivation though; it wasn't about performance (in my case performance is not relevant), it was about clean code and about the way I like my code to look. For me, for loops with continues and breaks are ugly and I prefer to avoid them, but not at the cost of going even uglier by forcing immutability where it doesn't exist.
Bottom line, I was just surprised. It gave the vibes of functional constructs but, as everybody pointed out, it's not.
By the way: HashMap can be built with a "fluent" method chain via its FromIterator impl. Unfortunately, it doesn't apply well to your case.
You need to error out if the file contains duplicate hostnames, but when using Iterator::collect, it's not possible to inspect the collection while it's being built.
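A short sketch of that limitation, with a hypothetical `build` function and made-up input lines: collecting pairs into a `HashMap` silently keeps the last value for a repeated key, so the duplicate never surfaces as an error.

```rust
use std::collections::HashMap;

// Hypothetical helper: split each "name ip" line at the first space and
// collect the pairs through HashMap's FromIterator impl.
fn build<'a>(lines: &[&'a str]) -> HashMap<&'a str, &'a str> {
    lines
        .iter()
        .copied()
        .filter_map(|line| line.split_once(' '))
        .collect()
}

fn main() {
    let map = build(&["printer 10.0.0.2", "nas 10.0.0.3", "printer 10.0.0.9"]);
    // The repeated "printer" key did not raise an error:
    // the later pair replaced the earlier one.
    assert_eq!(map.len(), 2);
    assert_eq!(map["printer"], "10.0.0.9");
}
```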
In general, method chains tend to scale poorly when you need to perform interleaved operations on the same data. Don't insist on one tool for everything you need to do. Rust is advertised as a pragmatic language. Use it as such.