Graphs, user input and Strings

#1

Hello,
I was banging my head in the wall for the last 2 days, but I still cannot figure out a good way to do this in Rust, as I think the solution is a lot different that in other languages.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e67de2a4757b511c58ab746de74937ae

This is a non-working example of what I am trying to achieve. In general I have a struct holding data from user input from in my case - infinite loop(although the example is with a for I simplified it), and I want to use the Strings from the Struct and construct vertices (exchange: String/&str, from: String/&str) and (exchange: String/&str, to: String/&str) and connect them with edges. I know this is not possible, since String doens’t have Copy and I had many issues related to &str and lifetimes, but also I want to own the String in the struct. One way I started thinking is to have a Vec with the vertices and use their index as well as a HashMap holding the vertices as a key and the index as a value, so I can also be able to check if the vertices exist already so I can update them, as the example is not full and the struct is missing the Edges’ weights.

So my question is: What approaches can I take to ensure I can do this, but also what are my options in general dealing with such cases, as this is not the first time I bump into this issue.
I also checked Box, but it I am also not quite sure when should I use these smart pointers at the moment. So guides towards a bit of information about what cases they are used in, is also welcome, as I am new the strictly-typed and system programming languages.

Thank you in advance!

#2

This seems to work.

I’m storing references to exchanges in graph, I understood that’s what you wanted. Therefore, I needed to put lifetimes on to add_edges. The other thing that needed doing is changing the loop. You’re modifying exchanges in the loop, and therefore putting references to it into graph will not ever work. So I changed it to 2 loops, where the first does all the modifying of exchanges it needs to do, and the second then puts the references into graph.

#3

Yes, but the problem is that I want to check the Input of the user(which is in an infinite loop) and on every new input to add it to the Graph or updated it, as I will need to handle a second user input type, that will work with the Graph in the same loop. So basically I need everything to work in an infinite loop, until the command is terminated and every input should be handled and not after you have all the User input.

#4

Uh ok, that’s not gonna work with references then. Whatever data structure owns your Strings, you will need to modify it, but you can’t when your graph holds references to it.

How about working with indices? Keep storing the strings in a Vec, and have the graph use indices into it?

#5

One way I started thinking is to have a Vec with the vertices and use their index as well as a HashMap holding the vertices as a key and the index as a value, so I can also be able to check if the vertices exist already so I can update them, as the example is not full and the struct is missing the Edges’ weights.

That’s what I meant with this. The only question is: Is this idiomatic way of doing it, since I am going to keep 2 copies of the same data - One in Vec (for the indices of the nodes/vertices) and one for checking if they already exist(the HashMap with the vertices as keys and indices as values), so I can update them, if a new request from the user comes for updating the weights(that I haven’t put into this code here).

Or is there a better way to do this with Smart pointers and using for example Box, but not sure if this will work out for checking if the Nodes/Vertices exists like (Box<String>, Box<String>)

#6

Keeping the same data twice sounds bad, alright. But I don’t fully understand why you’d need to do so. Which Vec would you want to keep indices in? I’d make edge something like (usize, usize) (maybe I’d make a struct Edge {from: usize, to: usize}), where the usizes are indices into Vec<Exchange>. If you need to specify which part of an Exchange any of the components of the Edge refers to, save an additional enum value and make a get method on Exchange.

Or am I missing something (again)?

#7

Ok, maybe I should have added a more complete snippet, my bad.
You can check it here

The idea is that I want to also be able to find out if that Node/Vertex already exists and if it exists I want to update it. As you see the 3rd user input has the same Bobby Exchange so that input should update the already existing Edges which are:
("Bobby Exchange", "USD") -> ("Bobby Exchange", "BTC") and ("Bobby Exchange", "BTC") -> ("Bobby Exchange", "USD") with the new ratios/weights.

#8

Gotta run, sorry, but won’t that just mean you have to search your Vec<Exchanges> for an existing one and update appropriately?

#9

That’s also a thing I was considering, but I am not sure if this is performance issue if you search every time for a specific value in the Vec with an Iterator and find method.

PS: I also thing it’s a bit error prone, since I need to guard against duplication. I.e. I cannot use a HashSet or HashMap way of guarding against have the same values twice. Maybe I will look into a library or make one if necessary for handling indexed HashMap.

Edit: Actually that seems like a good alternative. Use a struct that wraps HashMap with the key I want to search on and value of the index and keep an internal index. But still we have the same issue of: how can I get the key(the (String, String) of the hashmap by the value (the internal index)

#10

I wouldn’t necessarily worry about this. Sure, it doubles the amount of space you use to store certain strings, but only some strings, and the multiplier is fixed at 2. Unless I am misunderstanding the problem, you don’t have to clone any particular string more than once. From what I’ve seen, it’s fairly common (and not only in Rust) to have an indexing data structure that duplicates some of the data from your main data structure.

But there are some other things you might be able to apply here:

  • Rc<str> is a reference counted string. Its layout is the same as &str or Box<str>, and you can’t mutate the string, but it’s cheap to clone. Use Arc instead if you need to share across threads.

  • You can use an arena to store strings and only deal with references. Even cheaper than Rc, but the arena has to outlive your main data structure. There are crates that provide arenas; typed-arena is one.

  • If you never need to delete or reallocate strings, maybe you can just leak them and get away with &'static str. Be careful with this: once you leak something, you can’t safely free it. But there are some situations where you just know the string will never need to be cleaned up until the program ends, and leaking it lets you “just do the thing” without fooling around with lifetimes.

2 Likes
#11

What do you think it a good idea:

  1. Having a HashMap<(String,String), usize> where the value is index and searching every time when you want to check for existence of index or getting the Key by index
  2. Having both HashMap<(String,String), usize> where the value is the index from the Vec, which will be Vec<(String, String)> and be used as indexing for searching by index.

PS: No removal of nodes will be preformed.

#12

Have you seen http://smallcultfollowing.com/babysteps/blog/2015/04/06/modeling-graphs-in-rust-using-vector-indices/? Your comment about removal made me think about it as it says “The primary disadvantage comes about if you try to remove things from the graph.”

#13

Do you really need to store the strings as data at all, or just use them as keys for the HashMap? Does the node need to know it’s own names, or just be found by them, remembering that you can get them out as key-value pairs via iter() etc. The HashMap will take care of that storage for you, if so.

In addition to the other suggestions: If you do need the strings, and won’t be deleting them, a more ergonomic way to do what you’re suggesting with Vec indices is to use Intern<String> from the internment crate (or one of several similar alternatives). They behave like strings, rather than usize, and hide all that indirection.

#14

I am using petgraph for the GraphMap @scottmcm, but I will check the blog post, thanks!

@dcarosone I am not sure if I need the String in Data, but I do plan to use them and make calculation on the graph on each iteration, so first pushing the Graph Nodes and creating Edges with the weights, with the ability to update the weights, but also use the Edges and perform calculations with all of them(i.e. a Floyd-Warshall algorithm will be used)

And I guess one of the questions is - is it better to use an iterator (hash_map.iter()), to always look up for the index, in this case it will be probably the Value of the HashMap in order to use the index(usize) for the graph.

Another question is - is it better to store the String, as @dcarosone mentioned, with internment, and use the (Intern<String>, Intern<String>) for creating the Nodes. I haven’t tried it, but I see the Itern implements Copy, so it should be good.

Tbh at this point I managed to store it in a struct containing both a HashMap and a Vec pushing the data in both places, to be able to quickly check for both an Index(which is used in the graph) or a key (String, String) which is used in the actual Node.

That was one of my main question that I wanted to raise, as I am pretty new to system-programming and only worked in GC high-level languages, and I cannot weight the difference between calling iter() and find() a record, every time I want to search by index or duplicating the data, or I am just missing knowledge of crates like the one shared for interning strings.

Thank you for all your help! I hope I get more clarification though :smiley:

#15

I took a look at petgraph, and read your code and initial comments more carefully.

Initial thoughts:

  • I don’t really think (for the purposes shown here at least) you need the nodes to know their own name, since you can get a list of nodes, and from a node the list of its neigbours, etc out of the graph.
  • Even if you do want to keep the node name in the data, you don’t want to try and borrow from it. I think this is the problem you ran into initially with lifetimes.
  • I recommend to start with you just .clone() the strings whenever you need to pass an owned value and don’t worry about the extra space and copies. You might need the odd .as_str() or similar as well. The important thing is to look at each case and understand why some of these are necessary to keep the type system happy, rather than try to force it to something else, when learning.
  • Then if you have a lot of data, look at string internment as an optimisation.
#16

Well - yes, that’s what I am apparently going to do, and to keep track of the already inserted Nodes I will use a different structure to hold them as well as their indices. I already started implementing this, but was still wondering if I am on the right track.

And as a note to my learning experience: I do know that’s not how I should solve it, even though in other languages that will work, I just couldn’t wrap my head around and think of solutions in the Rust way.

#17

Ah, your node weights need to be Copy, so to revise my recommendations:

  • I’d jump straight to Intern<String> rather than tracking a separate map of nodes to indices.
  • It looks to me like the graph already provides the “insert or update” semantics you want, so there’s no need to handle repeat data specially just for the graph. If you want to know this (to provide some different UI) just ask the graph.

So, in short, don’t create the extra complexity of tracking everything twice.