I came across an old blog post about generating name anagrams from a file: https://brad.livejournal.com/2354680.html
Since I’m familiar with Python, here is a simple Python version of this that shows what I’m trying to do:
from collections import defaultdict

anagrams = defaultdict(list)
for line in open('dist.male.first'):
    name = line.split()[0]
    sorted_name = "".join(sorted(name))
    anagrams[sorted_name].append(name)

for sorted_name, names in anagrams.items():
    if len(names) > 1:
        print(anagrams[sorted_name])
This takes about 100 ms to run on my machine. You can download the file being opened at: https://www2.census.gov/topics/genealogy/1990surnames/dist.male.first .
I thought it would be a good way to practice working with strings and hashmaps in Rust, so here’s what I came up with:
use std::fs::File;
use std::collections::HashMap;
use std::iter::{Iterator, FromIterator};
use std::io::{BufRead, BufReader, Result};

fn main() -> Result<()> {
    let file = File::open("dist.male.first")?;
    let mut anagrams: HashMap<String, Vec<String>> = HashMap::new();
    for line in BufReader::new(file).lines() {
        let line = line.unwrap();
        let words: Vec<&str> = line.split(" ").collect();
        let name = String::from(words[0]);
        let mut chars: Vec<char> = name.chars().collect();
        chars.sort();
        let sorted_name = String::from_iter(chars);
        anagrams.entry(sorted_name).or_insert_with(Vec::new).push(name);
    }
    for names in anagrams.values() {
        if names.len() > 1 {
            println!("{:?}", names);
        }
    }
    Ok(())
}
As you can see, the Rust version is longer, but on my machine it runs noticeably faster: about 10 ms in release mode, versus about 100 ms for the Python version. (I didn’t do rigorous timing on either; I just ran each with time several times.) Python’s ~50 ms startup time alone is noticeable next to the more or less instant run of the Rust version, so I think the additional effort is worth it.
My question is: is there a way to make simple data munging tasks like this more concise in Rust?
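To make the question concrete, here is a rough sketch of one direction that might trim the grouping step a little. It is only an illustration, not a claim about the idiomatic answer. The helper name group_anagrams is made up for this example, the inline sample rows are invented stand-ins in the same column layout as dist.male.first, and the sketch assumes the whole input fits in memory as one string (for the real file, something like std::fs::read_to_string("dist.male.first") could supply it).

```rust
use std::collections::HashMap;

/// Hypothetical helper: groups the first whitespace-separated token of each
/// line under a key made of that token's letters in sorted order.
fn group_anagrams(text: &str) -> HashMap<String, Vec<String>> {
    let mut anagrams: HashMap<String, Vec<String>> = HashMap::new();
    // `filter_map` skips blank lines, which have no first token.
    for name in text.lines().filter_map(|line| line.split_whitespace().next()) {
        let mut chars: Vec<char> = name.chars().collect();
        chars.sort_unstable();
        let key: String = chars.into_iter().collect();
        // `or_default` inserts an empty Vec the first time a key is seen.
        anagrams.entry(key).or_default().push(name.to_string());
    }
    anagrams
}

fn main() {
    // Made-up sample rows; swap in the contents of the real file as needed.
    let text = "ARNOLD 0.1 1.0 1\nRONALD 0.2 1.2 2\nJAMES 0.3 1.5 3\n";
    for names in group_anagrams(text).values().filter(|names| names.len() > 1) {
        println!("{:?}", names); // prints ["ARNOLD", "RONALD"]
    }
}
```

Collecting the sorted chars straight into a String avoids the FromIterator import, and or_default replaces or_insert_with(Vec::new), but the overall shape stays about the same as the version above.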
Thank you for taking the time to read this. I hope you’re doing OK.