Is there an easy way to find the first duplicated value in an iterator consisting only of said values (maybe with itertools)? The way I am currently solving this is by building a HashSet until insertion fails due to duplication, which is fine tbh, but just wondering whether someone has another solution.
Since itertools
also uses a hashset underneath the hood of all_unique
, I don't think there's a better solution.
This is what I have for now:
// Find duplicated ports
let mut unique_ports = HashSet::new();
for camera in cameras.iter() {
if !unique_ports.insert(&camera.port) {
// throw error
}
}
Any improvement suggestions?
Yes, I'd use Itertoos::all_unique
:
if !cameras.iter().all_unique() {
// throw error
}
But I don't get the duplicated value with .all_unique
, and I want to display that in the error (sorry, should have made that clear in my example).
Ah yes of course. Then I think your solution is fine. If you abstract it in a function or method check_duplicated_ports
I think everybody that will read your code will understand what is going on.
In this case this is the entire code I am working on:
impl TryFrom<CamerasConfigJSON> for CamerasConfig {
type Error = anyhow::Error;
fn try_from(cameras: CamerasConfigJSON) -> std::result::Result<Self, Self::Error> {
// Find duplicated ports
let mut unique_ports = HashSet::new();
for camera in cameras.0.iter() {
if !unique_ports.insert(&camera.port) {
error!("Parsing failed because JSON contains duplicated port: {}", camera.port);
bail!("Parsing failed because JSON contains duplicated port {}, even though ports are expected to be unique", camera.port);
}
}
Ok(Self(
cameras.0
.into_iter()
.map(|camera| (camera.port, camera.command))
.collect(),
))
}
}
Do you think it still makes sense to extract it to another method? If yes, where would I put it? Inside the TryFrom
impl, or outside as a private function?
In that case I like @tguichaoua's solution. I just feel like hiding away implementation details (the hashset) is always a good idea to increase code quality.
Do you need to preserve the config order when storing it inside CamerasConfig
?
If not, it might make sense to store a HashMap
¹ inside CamerasConfig
:
(untested, just a sketch)
impl TryFrom<CamerasConfigJSON> for CamerasConfig {
type Error = anyhow::Error;
fn try_from(cameras: CamerasConfigJSON) -> std::result::Result<Self, Self::Error> {
let mut port_map = HashMap::new();
for camera in cameras.0.into_iter() {
match port_map.entry(camera.port) {
Vacant(entry) => { entry.insert(camera.command); }
Occupied(_) => { bail!(...); }
}
}
Ok(Self(port_map))
}
}
¹ or BTreeMap
, if you want a predictable iteration order
This is probably the best solution for my exact use case. Storing a HashMap
in CamerasConfig
works perfectly. Didn't know about the entry
method, but looks super cool. Just to test that I understand the .entry()
method correctly (I think the documentation is a little lacking): If the key doesn't yet exist in the hashmap, it gets inserted and I receive an Entry back, into which I can insert the value. In case the key does already exist, I get the Occupied variant back.
Thanks for the great answer!
That's pretty much right, but you've got the order of operations a little bit wrong— If the key doesn't exist, you get a Vacant
variant back. Only if you call insert
(or similar) on that variant is the hash map written to. So, if you don't call insert
, there isn't a dangling slot for the key you passed to entry
.
This avoids a double-lookup because the VacantEntry
keeps a reference to the empty slot where the item should be stored, which it had to find to discover that the key is missing in the first place.
Quick question: Why not use port_map.insert(key, val)
directly? Is there some difference/disadvantage to this compared to the Entry
version? The only difference I see is that with your Entry
version the value doesn't actually get inserted if the key already exists. But I'm throwing an error in that case anyways, so I don't really care if that element gets added to the hashmap I no longer care about, right?
impl TryFrom<CamerasConfigJSON> for CamerasConfig {
type Error = anyhow::Error;
fn try_from(cameras: CamerasConfigJSON) -> std::result::Result<Self, Self::Error> {
let mut port_map = HashMap::new();
for camera in cameras.0.into_iter() {
if port_map.insert(camera.port, camera.command).is_some() {
bail!("Parsing failed because JSON contains duplicated port {}, even though ports are expected to be unique", camera.port);
}
}
Ok(Self(port_map))
}
}
Where entry
really shines is when you want to do an upsert operation, where you insert a default value (if necessary) and then modify the value (whether you inserted something or not). For example:
fn count_items<T:Hash+Eq>(items: impl Iterator<Item=T>)->HashMap<T,usize> {
let mut map = HashMap::new();
for x in items {
*map.entry(x).or_insert(0) += 1;
}
map
}
As this is usually the sort of operation I’m writing, I tend to reach for entry
first. In this case, there’s no real advantage over a plain insert
; it’s just a question of style.
This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.