Borrowing for HashMap

Hi, I have some bioinformation algorithm that I am trying to implement. I take a long string and find to count occurances of lenght k. If they appear more than t times I put them in a Set.

I can't figure out the right borrowing thing to do:

fn clump_finding(text: String, k: usize, t: i32)

{    

    //let mut res = Vec::new();

    let tlen = text.len();

    let mut my_collect = HashMap::new();

    let mut my_set = HashSet::new();

    // First round go over the first L character

    for j in 0..(tlen-k+1)

    {

        let pattern = &text[j as usize..(j + k) as usize];

        if !my_collect.contains_key(&pattern)

        {

            my_collect.insert(&pattern, 1);

        }

        else

        {

            let e1 = my_collect.get(&pattern).unwrap();

            *e1 += 1;

            if *e1 >= t { my_set.insert(&pattern); }

        }

    }

}

The errors I get:

error[E0594]: cannot assign to `*e1` which is behind a `&` reference
   --> src\ba_problems.rs:315:13
    |
314 |             let e1 = my_collect.get(&pattern).unwrap();       
    |                 -- help: consider changing this to be a mutable reference: `&mut i32`
315 |             *e1 += 1;
    |             ^^^^^^^^ `e1` is a `&` reference, so the data it refers to cannot be written

error[E0597]: `pattern` does not live long enough
   --> src\ba_problems.rs:316:41
    |
316 |             if *e1 >= t { my_set.insert(&pattern); }
    |                           ------        ^^^^^^^^ borrowed value does not live long enough
    |                           |
    |                           borrow later used here
317 |         }
318 |     }
    |     - `pattern` dropped here while still borrowed

For the first part, references are immutable by default. The HashMap type's get function returns an immutable reference. To get a mutable reference, use its get_mut function instead: std::collections::HashMap - Rust

For the second part, I would consider inserting the pattern as a Vec instead of a slice using the to_vec function. This will allow it to outlive the borrow due to now being owned by the Vec: slice - Rust

Also, give the section on references and borrowing from the book a read: References and Borrowing - The Rust Programming Language

The get_mut helped. I couldn't find an example of converting a slice to to_vec.

As @camden-smallwood already mentioned, get_mut solves the first problem. Slightly more efficient would be to use the .entry(…) API to avoid re-indexing twice (and also for slightly cleaner code):

let e1 = my_collect.entry(pattern).or_insert(0);
*e1 += 1;

How to approach the question of owned vs borrowed data in the map depends on what the return value of your function is ultimately going to be. If you want to return my_set you need it to contain owned Strings or you need to take a borrowed &str parameter in which case your function could return a HashSet<&str> containing slices of that original string slice, i.e. without extra copying. For the intermediate HashMap no owned data is needed in any case. The problem currently in your code is that pattern is already a &str, in which case writing &pattern creates a reference-to-reference whose lifetime is restricted to the scope of the variable pattern itself (so it can’t live multiple loop iterations).

There are multiple ways to solve this second problem, e.g. by specifying the HashMap type manually (which is a good idea anyways), in which case the &pattern expressions are automatically dereferenced back from a &&str to a &str, or by removing the superflous &s for pattern.

Minimal fix by adding type annotations and get_mut:

use std::collections::HashSet;
use std::collections::HashMap;
fn clump_finding(text: String, k: usize, t: i32) {
    //let mut res = Vec::new();
    let tlen = text.len();
    let mut my_collect = HashMap::<&str, i32>::new();
    let mut my_set = HashSet::<&str>::new();

    // First round go over the first L character
    for j in 0..(tlen - k + 1) {
        let pattern = &text[j as usize..(j + k) as usize];
        if !my_collect.contains_key(&pattern) {
            my_collect.insert(&pattern, 1);
        } else {
            let e1 = my_collect.get_mut(&pattern).unwrap();
            *e1 += 1;
            if *e1 >= t {
                my_set.insert(&pattern);
            }
        }
    }
}

More refactored code with some return value and using entry (also avoiding re-inserting the same pattern all the time whenever *e1 > t)

use std::collections::HashSet;
use std::collections::HashMap;
fn clump_finding(text: &str, k: usize, t: i32) -> HashSet<&str> {
    //let mut res = Vec::new();
    let tlen = text.len();
    let mut my_collect = HashMap::new();
    let mut my_set = HashSet::new();

    // First round go over the first L character
    for j in 0..(tlen - k + 1) {
        let pattern = &text[j as usize..(j + k) as usize];
        let e1 = my_collect.entry(pattern).or_insert(0);
        *e1 += 1;
        if *e1 == t {
            my_set.insert(pattern);
        }
    }
    my_set
}

alternative returning HashSet<String>, also taking String argument, although arguably even with a HashSet<String> the text: &str argument could/should be kept:

use std::collections::HashSet;
use std::collections::HashMap;
// FIXME: `text: &str` would be more ideomatic and more flexible
// in this case, too.
fn clump_finding(text: String, k: usize, t: i32) -> HashSet<String> {
    //let mut res = Vec::new();
    let tlen = text.len();
    let mut my_collect = HashMap::new();
    let mut my_set = HashSet::new();

    // First round go over the first L character
    for j in 0..(tlen - k + 1) {
        let pattern = &text[j as usize..(j + k) as usize];
        let e1 = my_collect.entry(pattern).or_insert(0);
        *e1 += 1;
        if *e1 == t {
            my_set.insert(pattern.to_string());
        }
    }
    my_set
}
1 Like

Thank you very for the detailed answer. Really helpful.

nice catch with the unnecessary e1 >= t

Also once you have non-ascii characters, the string indexing like this will fail. In case you only need ASCII, you could use Vec<u8> in place of String and &[u8] in place of &str. In case you want to keep unicode support, you have to think about what kind of character boundaries you want to consider and what concept of length of a string is the one you want.

Also, for Vec<u8> / &[u8] you can rewrite the

    for j in 0..(tlen - k + 1) {
        let pattern = &text[j as usize..(j + k) as usize];
        // ...
    }

into

    for pattern in text.windows(k) {
        // ...
    }