Sorting lots of data from the network

Hi all (again). The code below collects returned SNMP data in threads. Each returned varbind is put in a tuple of (u32, String, String, String) and sent to the receiver. When running at full throttle, the order of the Strings is not always the same. Often, but not always. I was wondering how to make these reliably sortable. Instead of tuples, should I use a hash map and sort on keys, or a struct and sort on fields, after the data comes in from the receiver? Below is the collector code, and this is an example of a chunked piece of data.

[(1, "10.0.9.95:161", "1.3.6.1.2.1.4.21.1.1.0.0.0.0", "IP ADDRESS: 0.0.0.0"), (1, "10.0.9.95:161", "1.3.6.1.2.1.4.21.1.11.0.0.0.0", "IP ADDRESS: 0.0.0.0"), (1, "10.0.9.95:161", "1.3.6.1.2.1.4.21.1.7.0.0.0.0", "IP ADDRESS: 10.255.9.28"), (1, "10.0.9.95:161", "1.3.6.1.2.1.4.21.1.9.0.0.0.0", "INTEGER: 14"), (1, "10.0.9.95:161", "1.3.6.1.2.1.4.21.1.10.0.0.0.0", "INTEGER: 16283738")]

crossbeam::scope(|scope| {
        for a in agents {
            let tx = sender.clone();
            scope.builder()
                .name(a.clone())
                .spawn(move |_| {
                    let agent: &str = &a;
                    let mut collected_oids: Vec<Vec<u32>> = oids.iter().map(|&s| s.to_owned()).collect();
                    let mut end_of_column = 0;
                    let mut sess = SyncSession::new(
                        agent, community, Some(timeout), 0).unwrap();
                    while end_of_column < cols {
                        let oids_for_getbulk: Vec<&[u32]> = collected_oids.iter().map(|s| &s[..]).collect();
                        let response = sess.getbulk(
                            &oids_for_getbulk,
                            non_repeat, max_repetitions);
                        if let Err(ref e) = response {
                            println!("Error in {}: {:?}", agent, e);
                            end_of_column = cols;
                            continue;
                        }
                        let mut col = 0_usize;

                        for (name, val) in response.unwrap().varbinds {
                            // If the column counter, col, has hit the length of
                            // columns, start the count back at 0 for the next table row.
                            if col == cols {
                                col = 0;
                            }

                            // Find the last table row and add each column's
                            // object identifiers to the new OIDs.
                            let oid: String = oids[col].iter().map(|x| x.to_string() + ".").collect();
                            // TODO: needs to be logged, not printed.
                            // Keep track of failures. Remember: if SNMP-querying two
                            // adjacent columns, at some point the next column will be
                            // queried but should fail, as you've already queried it.
                            if name.to_string().starts_with(&oid) {
                                //println!("{}. {} {} => {}", row, agent, name, val.to_string());
                                tx.send((row, agent.to_string(), name.to_string(), val.to_string())).unwrap();
                                let next_getbulk = name.to_string()
                                    .split('.')
                                    .map(|c| c.parse::<u32>().unwrap())
                                    .collect::<Vec<u32>>();
                                collected_oids[col] = next_getbulk;
                                if col == cols - 1 {
                                    // Counting rows for sorting later
                                    row += 1;
                                }
                            } else {
                                println!("Failed: {} => {}", oid, name);
                                end_of_column += 1;
                                println!("End of column: {} => cols: {}", end_of_column, cols);
                                // Note: `!end_of_column < cols` bitwise-NOTs the integer,
                                // so it never breaks; the intended check is:
                                if end_of_column >= cols {
                                    break;
                                }
                            }
                            col += 1;
                        }
                    }
                });
        }
    }).unwrap();
}

Here is the code for when data comes back from the receiver. The current sort makes sure the groupings from each device are in order but, again, not necessarily the rows within each device. To my chagrin, I discovered that chunks are groupings of shared references and therefore cannot be sorted in place.

let mut rows = bulk_receiver.iter().collect::<Vec<(u32, String, String, String)>>();
    rows.sort();

    for chunk in rows.chunks(5) {

        if chunk[4].3.contains("14") {
            println!("{} => {:?}", chunk.len(), chunk);
            println!();
            let split1 = chunk[0].1.split(":").collect::<Vec<&str>>();
            let address = IpAddr::from_str(split1[0]).unwrap();
            let split2 = chunk[0].3.split(": ").collect::<Vec<&str>>();
            let prefix = IpAddr::from_str(split2[1]).unwrap();
            let split3 = chunk[2].3.split(": ").collect::<Vec<&str>>();
            let net_mask = IpAddr::from_str(split3[1]).unwrap();
            let split4 = chunk[3].3.split(": ").collect::<Vec<&str>>();
            let peer = IpAddr::from_str(split4[1]).unwrap();
            let last_seen = std::time::SystemTime::now();
            let split5 = chunk[1].3.split(": ").collect::<Vec<&str>>();
            let age = split5[1].parse::<i64>().unwrap();
            let missing = false;
            let route = Route {
                address,
                prefix,
                net_mask,
                peer,
                age,
                last_seen,
                missing,
            };

I think you meant to use [4] and not [1] here? Maybe this is why things appear unsorted. You never use [4] anyway.

Here is the reply I was writing before I noticed that. You probably at least want to read the parenthetical comments I put in italics.


So if I understand correctly, you're always going to have five tuples that start with, say, 1? And these are indices starting from 1, or maybe 0. You want to sort the chunks you're iterating over.

However, you don't need to with your current code. When you use rows.sort(), it sorts not only by the first tuple member (the u32) but by all tuple members, in order. If the sorting isn't what you're expecting, maybe you need sort_unstable_by with a custom comparator. (Incidentally, you want sort_unstable here, not sort, as nothing is gained by a stable sort; the keys are the data.)
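To illustrate, here's a minimal sketch with made-up agent/OID/value strings (not your real varbinds): tuples implement Ord member by member, so a single sort orders by row number, then agent, then OID string, then value.

```rust
fn main() {
    // Tuples compare lexicographically, left to right, so one sort
    // orders by row, then agent, then OID string, then value.
    let mut rows: Vec<(u32, String, String, String)> = vec![
        (2, "10.0.9.95:161".into(), "1.3.6.1.2.1.4.21.1.7.0.0.0.0".into(), "b".into()),
        (1, "10.0.9.95:161".into(), "1.3.6.1.2.1.4.21.1.7.0.0.0.0".into(), "c".into()),
        (1, "10.0.9.95:161".into(), "1.3.6.1.2.1.4.21.1.1.0.0.0.0".into(), "a".into()),
    ];
    rows.sort_unstable();
    // Row 1's entries come first, ordered by OID string within the row.
    assert_eq!(rows[0].3, "a");
    assert_eq!(rows[2].0, 2);
    println!("{:#?}", rows);
}
```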

If you did need to sort the chunks, you could do so by using chunks_exact_mut, which iterates over mutable slices. (Incidentally, you want chunks_exact since you assume you get 5 every time.)
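A minimal sketch of that, using plain integers instead of your tuples: chunks_exact_mut hands out mutable slices, which can be sorted in place, while plain chunks() yields shared slices, which can't.

```rust
fn main() {
    // Rows are already grouped three at a time, but each group
    // is possibly out of order internally.
    let mut rows = vec![3, 1, 2, 6, 4, 5];
    // chunks_exact_mut yields &mut [i32] slices, so each chunk
    // can be sorted in place.
    for chunk in rows.chunks_exact_mut(3) {
        chunk.sort_unstable();
    }
    assert_eq!(rows, vec![1, 2, 3, 4, 5, 6]);
    println!("{:?}", rows);
}
```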

There are also other approaches you could take: group the tuples of strings by their indices and then sort just the strings in sets of five at a time. But I guess I won't go into that in depth.

If you ever find yourself needing that, you probably want a BTreeMap.
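A rough sketch of what that could look like (the key choice and values here are hypothetical, not from your code): key a BTreeMap by (row, agent) and the groups come back out already ordered.

```rust
use std::collections::BTreeMap;

fn main() {
    // Hypothetical flattened varbinds: (row, agent, oid, value).
    let data = vec![
        (2u32, "10.0.9.95:161", "oid.b", "val2"),
        (1, "10.0.9.95:161", "oid.a", "val1"),
        (1, "10.0.9.95:161", "oid.c", "val3"),
    ];
    // Group by (row, agent); BTreeMap keeps the keys sorted, so
    // iterating yields row 1 before row 2 with no explicit sort.
    let mut grouped: BTreeMap<(u32, String), Vec<(String, String)>> = BTreeMap::new();
    for (row, agent, oid, val) in data {
        grouped
            .entry((row, agent.to_string()))
            .or_default()
            .push((oid.to_string(), val.to_string()));
    }
    for ((row, agent), varbinds) in &grouped {
        println!("row {} @ {}: {:?}", row, agent, varbinds);
    }
}
```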

Thanks for all the tips. So far so good using sort_unstable_by and chunks_exact. (At some point that needs to be made more general, based on the number of OIDs.) Not being a computer science guy, I have to ask: why BTreeMap over HashMap?

BTrees (and most tree structures, I think; there are lots of trees) keep their internal state sorted as you keep adding data. They seek the correct branch to insert into, and they rebalance their branches if needed.

So you don't need to sort after the fact, the data is sorted by virtue of how a tree has to work.
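A tiny demonstration of that sorted-by-construction property:

```rust
use std::collections::BTreeMap;

fn main() {
    let mut map = BTreeMap::new();
    // Insert out of order; the tree keeps its keys sorted internally.
    map.insert(3, "c");
    map.insert(1, "a");
    map.insert(2, "b");
    // Iteration is always in key order, with no sort call needed;
    // a HashMap gives no such guarantee.
    let keys: Vec<i32> = map.keys().copied().collect();
    assert_eq!(keys, vec![1, 2, 3]);
    println!("{:?}", map);
}
```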
