HashMap keys of different composite types


#1

I’ve been working on this problem for a few hours every day for about 10 days. I think I’ve exhausted all options and it can’t be solved. I’m hoping someone can put me out of my misery by confirming it can’t be solved or pointing me in the direction of a solution.

The root problem is that the program has to parse billions of records, read from files, and aggregate them based on a composite key, defined as a struct, in a HashMap. And it needs to be fast. Right now it will parse and aggregate 19M records in about 47 seconds. But testing with some dummy values indicates that if I can get rid of most of the .to_string() calls I can get that down to 30 seconds.

To get the best performance I don’t want to convert &str to String unless I absolutely have to. It’s billions of records during each run and .to_string() gets really, really expensive especially considering I only need one copy of the final key for permanent storage in the HashMap.

I thought I could solve the problem by using a generic form of the key that can take &str until it needs to be inserted into the HashMap, then build the owned version of the key with a String. And I’m really close.

You can see my work at https://github.com/ereichert/counter/blob/feature/temp_record_approach/src/record_handling.rs.

But it doesn’t compile because of this line https://github.com/ereichert/counter/blob/feature/temp_record_approach/src/record_handling.rs#L35.

    compiling counter v1.0.0 (file:///Users/ereichert/dev/counter)
error[E0495]: cannot infer an appropriate lifetime for autoref due to conflicting requirements
  --> src/record_handling.rs:35:43
   |
35 |             system_name: self.system_name.as_str(),
   |                                           ^^^^^^

error: aborting due to previous error

error: Could not compile `counter`.

To learn more, run the command again with --verbose.

There may be other problems. It seems every time I solve one puzzle another is presented. But that line is the problem I have not been able to solve.

If that can’t be solved is there any way to check

HashMap<AggregateELBRecord<String>, i64>

for an existing key using a

AggregateELBRecord<&str>

as the key?


#2

Can you store a Cow<str> in your AggregateELBRecord?


#3

We’ve been down that road also.


#4

Well, there’s always

impl<'a> Borrow<AggregateELBRecord<&'a str>> for AggregateELBRecord<String> {
    #[inline(always)]
    fn borrow(&self) -> &'static AggregateELBRecord<&'static str> {
        unsafe { std::mem::transmute::<&AggregateELBRecord<&'static str>, &'static AggregateELBRecord<&'static str>>(&AggregateELBRecord {
            system_name: std::mem::transmute::<&str, &'static str>(self.system_name.as_ref()),
        })}
    }
}

:innocent:

The Rust community did a great job of justifiably discouraging unsafe, but this could be an acceptable place to use it: you need the last drop of performance, and can preserve the invariants (supposedly, I haven’t looked at the rest of the code.)

[Aside: I had to put in that #[inline(always)] for my minimal test code to work in debug mode, with the latest stable and a couiple of nightlies. Could that be a compiler bug worth reporting?]


#5

I would see if you can make Cow work as I believe it’s pretty much meant for the case you’re describing. In your thread about get vs get_mut, I think the issue was you’re trying to use a single function, do_stuff, to retrieve immutable and mutable borrows and running into conflicting lifetime requirements. Perhaps you can split your immutable and mutable cases into separate functions, and then use different lifetime constraints on them to appease variance.


#6

I definitely want to stay away from unsafe. But I can be bought with a healthy speed increase.

Thanks.


#7

I will take another look at this.

Thanks.


#8

On second reading of the code, yes, you should. I failed to notice why the proposed impl of Borrow can’t ever work: it’s trying to return a reference to a temporary struct, which vanishes when fn borrow() returns. Therefore, fudging lifetimes with unsafe isn’t going to buy you anything. Using Cow<str> remains the best bet.


#9

I did implement it but it was slower because of the underlying copies