Efficiency of atom distance matrix

Hi everyone!
I have a function that analyzes distances in a molecule (represented by points in cartesian space). It is supposed to tell me if for a given atom there are other atoms within a given radius and if yes print information about it to stdout. I have written a function to do that this:

 pub fn find_contacts(&self, other: &Molecule, level: u8) {
        let mut vdw_radii = HashMap::new();
        vdw_radii.insert("H".to_string(), 1.54);
        vdw_radii.insert("C".to_string(), 1.90);
        vdw_radii.insert("N".to_string(), 1.79);
        vdw_radii.insert("O".to_string(), 1.71);
        vdw_radii.insert("P".to_string(), 2.23);
        vdw_radii.insert("S".to_string(), 2.14);
        vdw_radii.insert("Cl".to_string(), 2.06);
        vdw_radii.insert("Na".to_string(), 2.25);
        vdw_radii.insert("Cu".to_string(), 2.17);

        // uses a 'prettytable' table
        let mut table = Table::new();
        table.set_format(*format::consts::FORMAT_NO_LINESEP_WITH_TITLE);
        table.set_titles(row![
            "Atom ID 1",
            "Atom Name 1",
            "Residue Name 1",
            "Atom ID 2",
            "Atom Name 2",
            "Residue Name 2",
            "Distance"
        ]);

        for residue in &self.residues {
            for atom in &residue.atoms {
                for other_residue in &other.residues {
                    for other_atom in &other_residue.atoms {
                        // Avoid double counting of atoms and terms within residues
                        if atom.atom_id < other_atom.atom_id && residue != other_residue {
                            let radius = match level {
                                0 => 1.0,
                                1 => *vdw_radii
                                    .get(&atom.element)
                                    .expect("No Radius found for given element."),
                                2 => {
                                    vdw_radii
                                        .get(&atom.element)
                                        .expect("No Radius found for given element.")
                                        + vdw_radii
                                            .get(&other_atom.element)
                                            .expect("No Radius found for given element.")
                                }
                                _ => panic!("Too high level given"),
                            };
                            let dist = atom.calc_distance(other_atom);
                            if dist < radius {
                                table.add_row(row![
                                    atom.atom_id,
                                    atom.atom_name,
                                    atom.res_name,
                                    other_atom.atom_id,
                                    other_atom.atom_name,
                                    other_atom.res_name,
                                    format!("{:.2}", dist)
                                ]);
                            }
                        }
                    }
                }
            }
        }
        if table.len() > 1 {
            println!("\nContact analysis");
            table.printstd();
        }
    }

As an explanation: This is a method of the Molecule struct. Each of these contains a vector of Residue (basically a local group of ~10-15 atoms close to each other) structs which in turn contain a vector of Atom structs.
So my solution was to take two references to an instance of Molecule, iterate over all the Atoms of each and calculate the distance between each unique pair. This is sort of the brute-force way but I wasn't yet able to come up with an alternative.
Also I tried to limit the number of evaluated distances via excluding calculation for atoms within the same residue.
However, since the number of atoms is in the 10,000s, this can take up to 12 seconds on my machine (depending on level).
Since I'm still a beginner there's probably a more idiomatic and/or efficient way to do this that I haven't thought of (or don't know yet) so I came here for advice.
Any help is appreciated!

There is enormous literature on this and related problems. Look e.g. here or here.

You might want to look into the Octree data structure to store all those positions and allow for much faster distance searching:

Thanks for your suggestions, I will go read :slight_smile:

The rstar crate will probably be useful to you here. You may have to adapt your structs or impl the relevant rstar traits for them, but it should be a lot faster to query the tree. The RTree struct has a locate_within_distance method which you can use once you've loaded your point data.

2 Likes

Cool, this is exactly what I'm looking for! I was reading up on octrees but I think a decent implementation of of this would have exceeded my existing project size by quite a bit.

No need to write an octree yourself:

Octree algorithm for nearest neighbor search in 3D space:
https://crates.io/crates/octree

Sorry I can't vouch for the quality of performance of that crate.

I had seen that previously but since the author said they created this as a learning project, I lean more towards something a little more well-maintained and established. But yes, it would probably work too.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.