Hi all! I'm trying to port a small app I wrote in (ugly) C++ to Rust, and learn the language better in the process. It parses a BNF grammar file, builds the FSAs of the regular expressions it describes, and then derives some information about them. To do so, I need to store a series of references to the FSAs. Basically:
in the first phase, it needs to create the FSAs, and store the references to them into the data structure
in the second phase, it should access the data structure to read
Now, everything could be done by manually passing a reference to the map to every piece of code that uses it, but that's cumbersome and cluttering. Note that there will be only one thread accessing the structure while building it (potentially multiple ones could read it in the future, but that's not an issue at the moment, and in any case they won't modify it), so it will act as a sort of "read only, but built in more than one statement" dictionary.
In my C++ code I stored a global variable/singleton HashMap. I thought I would do the same here in Rust, but I can't wrap my head around the errors the compiler is giving me.
after having specified the trait my structs are using as
trait ASTElem: Send + Sync {...}
Now, what do you think would be the best way to achieve the same? And in this case, why does it suggest using Lazy to avoid calling non-const code in a const variable? Since I don't need multithreading now, I suppose I could use something that doesn't require Send and Sync? Does specifying them cost anything? Thanks!
Without more to go on I think your best bet would actually be to
1. store the HashMap in a field of a struct
2. define methods on the struct for the building stage
3. after building, convert that struct into an Arc<HashMap<_, _>>
4. store a clone of the Arc<_> in any struct that needs to access it
Though you may be able to use a lazy static for (4) instead.
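The four steps above could look roughly like the following. This is only a minimal sketch: `Fsa`, `FsaBuilder`, `Analyzer`, `build_table` and the `String` keys are made-up names for illustration, not anything from the original code.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-in for the real automaton type.
#[derive(Debug)]
struct Fsa {
    states: usize,
}

// Steps 1 and 2: the map lives in a struct that has methods for the build phase.
#[derive(Default)]
struct FsaBuilder {
    map: HashMap<String, Fsa>,
}

impl FsaBuilder {
    fn add(&mut self, name: &str, states: usize) {
        self.map.insert(name.to_string(), Fsa { states });
    }

    // Step 3: consuming `self` means nothing can mutate the map afterwards.
    fn finish(self) -> Arc<HashMap<String, Fsa>> {
        Arc::new(self.map)
    }
}

// Step 4: every struct that needs the table keeps its own clone of the Arc.
struct Analyzer {
    fsas: Arc<HashMap<String, Fsa>>,
}

impl Analyzer {
    fn state_count(&self, name: &str) -> Option<usize> {
        self.fsas.get(name).map(|fsa| fsa.states)
    }
}

fn build_table() -> Arc<HashMap<String, Fsa>> {
    let mut builder = FsaBuilder::default();
    builder.add("ident", 3);
    builder.add("number", 2);
    builder.finish()
}

fn main() {
    let table = build_table();
    let analyzer = Analyzer { fsas: Arc::clone(&table) };
    println!("{:?}", analyzer.state_count("ident"));
}
```

Cloning the `Arc` only bumps a reference count, so handing the table to several readers is cheap, and the same type keeps working if reader threads are added later.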
If you want it to be global from the get-go, you need some sort of synchronization for the mutation, like a Mutex<_>. Not being explicitly multithreaded is not enough for many of Rust's invariants; e.g. &mut _ requires exclusivity even on a single thread. So, for example, having two unrelated &mut to the same HashMap at different places on your call stack is UB.
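To make the single-thread exclusivity point a bit more tangible, here is a small sketch with a global `Mutex` from the standard library. The names `NAMES`, `outer` and `inner_can_lock` are invented for the example. While one function on the call stack holds the lock, a nested attempt on the same thread is refused instead of producing a second mutable access.

```rust
use std::sync::Mutex;

// A global that may be mutated, so every access has to go through the Mutex.
static NAMES: Mutex<Vec<String>> = Mutex::new(Vec::new());

// Holds the lock (and therefore exclusive access) while calling further down.
fn outer() -> bool {
    let mut guard = NAMES.lock().unwrap();
    guard.push("outer".to_string());
    // `guard` is still alive here, so the nested attempt below cannot succeed.
    inner_can_lock()
}

// Uses `try_lock` so that a lock that is already held is reported, not waited on.
fn inner_can_lock() -> bool {
    NAMES.try_lock().is_ok()
}

fn main() {
    println!("lock available inside outer(): {}", outer());
    println!("lock available afterwards: {}", inner_can_lock());
}
```

Without the `Mutex`, the same call pattern on a `static mut` would hand out two overlapping `&mut` to the same data, which is exactly the situation described above.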
I guess you could have a global Mutex protected version, and then after building, move that into a non-Mutex protected lazy static. With an increased risk of logic errors such as accidentally initializing the non-Mutex version early or mutating the Mutex version late...
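A rough sketch of that two-stage idea, using only `std` (`OnceLock` instead of a lazy-static crate, which is my substitution, not something from the thread). `BUILDING`, `FROZEN`, `register`, `freeze` and `lookup` are hypothetical names, and the map stores plain state counts to keep the example short.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Build phase: all mutation goes through the Mutex.
static BUILDING: Mutex<Option<HashMap<String, usize>>> = Mutex::new(None);
// Read phase: set once, then read without taking any lock.
static FROZEN: OnceLock<HashMap<String, usize>> = OnceLock::new();

fn register(name: &str, states: usize) {
    let mut guard = BUILDING.lock().unwrap();
    guard
        .get_or_insert_with(HashMap::new)
        .insert(name.to_string(), states);
}

// Moves the map out of the Mutex and into the read-only global.
fn freeze() {
    let map = BUILDING.lock().unwrap().take().unwrap_or_default();
    // A second call is silently ignored: the first frozen map wins.
    let _ = FROZEN.set(map);
}

// Returns None both for unknown names and when `freeze` has not run yet.
fn lookup(name: &str) -> Option<usize> {
    FROZEN.get()?.get(name).copied()
}

fn main() {
    register("ident", 3);
    register("number", 2);
    freeze();
    println!("{:?}", lookup("ident"));
}
```

The logic-error risk mentioned above is visible here: nothing stops a late `register` call after `freeze`, and its entry would simply never show up in `lookup`.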
This may lead to a bad time, unless perhaps you leak things so that you can get ahold of &'static _ references. Rust references are typically for short-term borrows, not long-lived data structures.
Send and Sync have no runtime costs on their own. They limit what types can implement your trait or be type erased to dyn YourTrait at compile time.
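One way to convince yourself of the "no runtime cost" part is to compare the size of a boxed trait object with and without the bounds. The sketch below uses invented names (`Plain`, `Shared`, `Node`, `same_size`); the commented-out part shows the kind of type the bounds rule out.

```rust
use std::mem::size_of;

// Same shape of trait, once without and once with the marker bounds.
trait Plain {
    fn plain_id(&self) -> u32;
}

trait Shared: Send + Sync {
    fn shared_id(&self) -> u32;
}

struct Node(u32);

impl Plain for Node {
    fn plain_id(&self) -> u32 {
        self.0
    }
}

impl Shared for Node {
    fn shared_id(&self) -> u32 {
        self.0
    }
}

// Both boxes are a data pointer plus a vtable pointer; the bounds add nothing.
fn same_size() -> bool {
    size_of::<Box<dyn Plain>>() == size_of::<Box<dyn Shared>>()
}

// The restriction is purely a compile-time one. A type holding an `Rc` is
// neither Send nor Sync, so uncommenting this is rejected by the compiler:
//
// struct Local(std::rc::Rc<u32>);
// impl Shared for Local {
//     fn shared_id(&self) -> u32 { *self.0 }
// }

fn main() {
    let a: Box<dyn Plain> = Box::new(Node(1));
    let b: Box<dyn Shared> = Box::new(Node(2));
    println!("{} {}", a.plain_id(), b.shared_id());
    println!("same size: {}", same_size());
}
```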
I see, thank you both. So, IIUC, the information you give to the compiler via Sync/Send and the like must be defined on a per-type basis, and therefore there's no way to attach it to a single variable: you have to encapsulate what you need in a single-use struct. The upside of this is that it will be elided by the compiler, I guess, and so there will be no runtime cost?
I get that having multiple &mut is prevented by the compiler, but in this case there would only be one mutating reference "active" at a time. But again, IIUC, the whole point of doing this is to avoid explicitly passing the reference in the function signatures, and therefore there's no way for the compiler to know in advance the relative scopes of the various references, hence the need to use synchronization mechanisms even if the code is single-threaded. Did I get it all right?
I must correct myself: in the end I am trying to store not references, but Box<>es to the FSAs. This opens up three more questions:
In case I need to hold the pointer in more places of the code, I have to switch to Rc<>, right?
Can you please elaborate on the first sentence? I'm not sure I'm understanding what you are talking about, sorry
I get that the idiomatic use of references is short term, but why? And can they be made long lived, if desired? My understanding is that for long-term purposes you'll quickly need to synchronize accesses to the data they point to, if you also need to mutate it, thus bringing us back to the problems we had in the first place. In this case, I can technically use &'static references for reading, but since I need to mutate the data first, they can no longer be 'static and multiple! Did I get this right?
I guess the type erasure always happens, right?
@2e71828 thanks for your reference, I tried to use once_cell but without success. I guess it was because I was trying to avoid the synchronization structs. I think I could revisit that approach now.
I guess I'd say the benefit of the properties being on types is that the compiler can reasonably use the type system to prove soundness without global analysis.
One of the jobs of rustc is to prove that your program is sound / that UB is impossible.[1] There's no reasonable way to do that in the face of something like a static mut, where aliasing &muts can be created everywhere in the program.
In part "only short term use" is shorthand for "new to Rust and thinking you'll build some pointer-heavy data structure out of references? Think again...". It is possible to have long-lived lifetime-parameterized data structures; "no cost" parsers are one example.
But references are a poor fit for arbitrary chains of borrowing, especially when mutation is involved. In part because they're a poor fit for the ownership/borrowing model, and in part because lifetimes and borrow checking are a compile time analysis, while arbitrarily nested borrowing is usually a dynamic situation.
I don't think what I just wrote is very illuminating, but I'm not sure how to word it better either. You could always give it a shot and experience some struggles of trying to get the compiler to accept lifetime heavy code yourself. Or here's a classic exploration relevant to the topic.
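For a concrete feel of the "long-lived, lifetime-parameterized" case mentioned above, here is a tiny sketch of a parser whose output borrows from its input instead of copying it. `Rule` and `parse_rule` are hypothetical names, and the `::=` split is only a toy, not a real BNF parser.

```rust
// The struct borrows from the source text, so it cannot outlive that text.
#[derive(Debug, PartialEq)]
struct Rule<'a> {
    name: &'a str,
    body: &'a str,
}

// Splits one line at "::=" and returns slices into the original line.
fn parse_rule(line: &str) -> Option<Rule<'_>> {
    let (name, body) = line.split_once("::=")?;
    Some(Rule {
        name: name.trim(),
        body: body.trim(),
    })
}

fn main() {
    let source = String::from("digit ::= '0' | '1'");
    if let Some(rule) = parse_rule(&source) {
        println!("{} -> {}", rule.name, rule.body);
    }
}
```

This works nicely because the borrowing is one level deep and read-only. The struggles start when such structs need to point at each other or be mutated while borrowed.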
You said something like "read a file ... and store references to the result". I interpreted this as using references (&_) to some dynamically read or generated data (from a file). I guess now this isn't what you meant, so probably it's irrelevant. But anyway:
References to dynamic data generally can't be &'statics because it either lives on the stack, and the stack frame will pop eventually; or it lives on the heap, and a destructor will reclaim it eventually (unless you leak it). If you have some dynamically generated data, let's say a String, and you want a &'static _ or &'static mut _ to the data, you can leak the data in order to get the 'static reference.
Leaks are generally undesirable, but IMO this is a perfectly reasonable approach for things like non-daemon CLI tools that immediately read a file, use the data in the file for the entire program, and then exit.
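A minimal sketch of the leaking approach, assuming the data is a `String` or a `HashMap<String, usize>` (both chosen just for illustration; `leak_table` and `leak_str` are made-up helper names). `Box::leak` is the standard-library call doing the work.

```rust
use std::collections::HashMap;

// Moves the map to the heap and deliberately never frees it, which is what
// makes handing out a `'static` reference sound.
fn leak_table(map: HashMap<String, usize>) -> &'static HashMap<String, usize> {
    Box::leak(Box::new(map))
}

// Same idea for string data read at runtime.
fn leak_str(s: String) -> &'static str {
    Box::leak(s.into_boxed_str())
}

fn main() {
    let mut map = HashMap::new();
    map.insert("ident".to_string(), 3);

    // All mutation happens before the leak; afterwards the table is read-only.
    let table = leak_table(map);
    let file_name = leak_str(String::from("grammar.bnf"));

    println!("{:?} from {}", table.get("ident"), file_name);
}
```

Because a shared `&'static` reference is `Copy`, it can be stored in any struct or passed around freely, without lifetime parameters and without reference counting.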
I phrased that poorly. I meant, Send and Sync are enforced at compile time and have no runtime costs on their own. But not all types implement Send or Sync, so it is a restriction on what types can implement your trait.
You can't coerce something to dyn YourTrait ("type erase" it) if its type doesn't implement YourTrait. You don't have to use type erasure, so it doesn't "always happen", but probably I'm misunderstanding the question here.
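To illustrate the difference, here is a sketch that uses the `ASTElem` trait name from the question with an invented `name` method; `Terminal`, `Sequence`, `describe` and `describe_dyn` are likewise hypothetical. The generic function involves no type erasure at all, while the `Box<dyn ASTElem>` elements are coerced (erased) at the point where they are boxed.

```rust
trait ASTElem: Send + Sync {
    // Hypothetical method, only here to have something to call.
    fn name(&self) -> String;
}

struct Terminal(String);

impl ASTElem for Terminal {
    fn name(&self) -> String {
        self.0.clone()
    }
}

// Holds elements of different concrete types behind one erased type.
struct Sequence(Vec<Box<dyn ASTElem>>);

impl ASTElem for Sequence {
    fn name(&self) -> String {
        let parts: Vec<String> = self.0.iter().map(|e| e.name()).collect();
        parts.join(" ")
    }
}

// Static dispatch: the concrete type is known, nothing is erased.
fn describe<T: ASTElem>(elem: &T) -> String {
    elem.name()
}

// Dynamic dispatch: the caller's concrete type is erased to `dyn ASTElem`.
fn describe_dyn(elem: &dyn ASTElem) -> String {
    elem.name()
}

fn main() {
    let a = Terminal("a".to_string());
    println!("{}", describe(&a));

    // Each `Box<Terminal>` is coerced to `Box<dyn ASTElem>` here.
    let elems: Vec<Box<dyn ASTElem>> = vec![
        Box::new(Terminal("a".to_string())),
        Box::new(Terminal("b".to_string())),
    ];
    println!("{}", describe_dyn(&Sequence(elems)));
}
```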
[1] Unless you use unsafe, in which case that's your job.
My bad, I phrased that poorly. What I meant was: do traits like Send and Sync, which only enforce some guarantees on your data, have a runtime cost, maybe some allocated overhead on the data or on the functions accepting them? My guess would be no, since, as you said, they only provide information to the type system. In any case, their usage is mandatory in these cases, so it doesn't really matter what they cost, IIUC.
I need to go deeper into the lifetime and reference thing, thank you for the reference you shared. Eventually I need to build graphs from the information I get here, and a resource I found observed that building graphs in Rust is tricky given Rust's rules, because you'll probably need to build your graph in more than one pass (especially if you have cycles), which is similar to my problem, so I'll try to investigate that as well. Thank you again!