I'm trying to tell the compiler that the command string slice in my tokenize function has the same lifetime as the owned input string in the same struct. I started learning Rust a couple of weeks ago, so my understanding of the language is still very basic, but any help is appreciated.
struct Lexer<'a> {
    input: String,
    tokens: Vec<&'a str>,
}

impl<'a> Lexer<'a> {
    fn new() -> Self {
        Lexer {
            input: String::new(),
            tokens: Vec::new(),
        }
    }

    fn read_line(&mut self) -> &mut Self {
        std::io::stdin().read_line(&mut self.input).unwrap();
        self
    }

    fn tokenize(&mut self) -> &mut Self {
        for command in self.input.split_whitespace() {
            self.tokens.push(command); // the compiler doesn't know that command has the same lifetime as input
        }
        self
    }
}
I've noticed that we write so-called 'ergonomic' code to fight less with the compiler: the ownership and borrowing rules get abstracted into function blocks so they're more manageable, and then we can chain function calls and they just work. I think I'm doing something not so ergonomic here, something about the structure of the whole thing, that put me in a position where I can't seem to specify the lifetime of the command variable within the loop.
Note: I know I can make the tokens vector hold Strings instead of slices, but that's just wasteful and I want to learn things deeply.
Rust lifetimes (those '_ things) are generally about the duration of some borrow, not the liveness of some value.
You're trying to create a self-referential struct. For an example of why it won't work out: if you managed to construct a self-referential lexer: Lexer, you could never get a &mut lexer again (because if you could, you could write lexer.input = String::new() and make all your tokens dangle).
If you want to keep going the borrowing route, put the input and tokens in separate structs. But you'll still have some limitations (input must remain borrowed so long as the tokens exist, and e.g. can't be moved while borrowed), so also consider using something like offsets (spans) instead of using references at all.
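To make the "separate structs" suggestion concrete, here is a minimal sketch. The Input type and the shape of Lexer::tokenize are my own assumptions for illustration, not something from the original code. The point is only that the tokens borrow from a string owned by a different value, so nothing refers to itself.

```rust
/// Owns the raw text. `Input` is a hypothetical name, not from the original post.
struct Input {
    text: String,
}

impl Input {
    fn new() -> Self {
        Input { text: String::new() }
    }

    /// Appends one line from stdin to the owned buffer.
    fn read_line(&mut self) -> &mut Self {
        std::io::stdin().read_line(&mut self.text).unwrap();
        self
    }
}

/// Holds slices into a string that lives somewhere else for at least 'a.
struct Lexer<'a> {
    tokens: Vec<&'a str>,
}

impl<'a> Lexer<'a> {
    /// The tokens borrow from `input`, not from the Lexer itself,
    /// so there is no self-reference.
    fn tokenize(input: &'a str) -> Self {
        Lexer {
            tokens: input.split_whitespace().collect(),
        }
    }
}
```

Usage would look roughly like `let mut input = Input::new(); input.read_line(); let lexer = Lexer::tokenize(&input.text);`. While lexer is alive, input stays borrowed, so it can't be mutated or moved, which is the limitation mentioned above.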
Ooh, a self-referential struct, so now I know the name of what I'm trying to do. Based on that, it sounds like there are many patterns that Rust doesn't allow because they complicate or even prevent the compiler's safety checks, which means I need to learn the patterns that it does allow, since there are probably far fewer of those than the ones it rejects.
I used a borrowed self in the functions because I'm not sure when to take ownership and when to just take a reference when chaining functions. Taking ownership seems to simplify the ownership and borrowing rules within the functions, while taking a reference seems to be more efficient, but from your comment it seems to have limitations I'm not aware of.
Finally, using indices to mark the start and end of each token/command seems like a good approach, but since the structure of the code is a bad pattern to begin with (a self-referential struct), I'll have to find another way.
I really appreciate your response, it gave me so much insight.
Note that storing indices into the string in the same struct that owns said string is fine. You just can't store a reference to the string inside the owning struct due to Rust's move semantics and the guarantees that a reference has (which incidentally was discussed today in this topic). If you move something (like an instance of your Lexer) to a new location, i.e. by passing it to a function that takes the Lexer by value and not by reference, the value is copied bitwise to the new location and the old location in memory becomes invalidated.[1]
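As a rough sketch of the index-based variant (the field layout and the token helper are assumptions of mine, not the original poster's code), the struct can own the String and store byte ranges into it. The ranges are plain numbers, so they stay meaningful when the struct is moved.

```rust
use std::ops::Range;

struct Lexer {
    input: String,
    // Byte ranges into `input` rather than references, so no lifetime is needed.
    tokens: Vec<Range<usize>>,
}

impl Lexer {
    fn new(input: String) -> Self {
        Lexer { input, tokens: Vec::new() }
    }

    /// Records the byte span of every whitespace-separated token.
    fn tokenize(&mut self) -> &mut Self {
        self.tokens.clear();
        let mut start: Option<usize> = None;
        for (i, c) in self.input.char_indices() {
            if c.is_whitespace() {
                // A token just ended: store its span.
                if let Some(s) = start.take() {
                    self.tokens.push(s..i);
                }
            } else if start.is_none() {
                // A new token begins at this byte offset.
                start = Some(i);
            }
        }
        // A token that runs to the end of the input.
        if let Some(s) = start {
            self.tokens.push(s..self.input.len());
        }
        self
    }

    /// Resolves a stored span back into a slice, borrowed only for this call.
    fn token(&self, index: usize) -> Option<&str> {
        let span = self.tokens.get(index)?;
        Some(&self.input[span.clone()])
    }
}
```

The trade-off is that nothing stops the spans from going stale if input is changed afterwards, so tokenize has to be called again after any modification.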
Clearly, this poses a problem for references that refer to the old location. After all, there is no valid data there any more, and Rust tries its utmost to prevent you from having dangling references like that. So while references to the old location exist, you can't move a value to a new location. If you try, the compiler will politely inform you about your mishap by slapping an E0505 in your face. To solve this you must first get rid of any existing references to the old location. You can see how the snake bites its own tail here when we build a self-referential struct. We want to move the struct, but alas, we can't because we have an existing reference to it within itself.
[1] The actual copying of the memory might be optimized away by the compiler during optimization, but that has no effect on the semantics of a move.
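Here is a small sketch of that rule with an ordinary (not self-referential) borrow. The simplified Lexer and the consume function are made up for illustration. The commented-out line is where I'd expect the compiler to report E0505, because the borrow in first is still used on the line after it.

```rust
struct Lexer {
    input: String,
}

/// Takes the Lexer by value, so calling it moves the Lexer.
fn consume(lexer: Lexer) -> usize {
    lexer.input.len()
}

fn move_after_borrow_ends() -> usize {
    let lexer = Lexer { input: String::from("push 1 pop") };

    // `first` holds a reference into `lexer.input`.
    let first = lexer.input.split_whitespace().next();

    // Moving here instead should be rejected (E0505), since `first` is used below:
    // let n = consume(lexer);

    // Last use of the borrow.
    let first_len = first.map_or(0, |t| t.len());

    // No reference into `lexer` is used past this point, so the move is accepted.
    let n = consume(lexer);
    n + first_len
}
```

With a self-referential struct there is no such "past this point": the reference lives inside the value itself, so the borrow never ends while the value exists.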
Very fascinating! Coming from higher-level languages like C# and Java, I can really see how the creators of Rust made design choices grounded in real-world experience and past pitfalls, rather than just throwing in features that sound good on paper—cough C++ cough.
Moving something to a new location and making sure all references to the old location are removed makes perfect sense. Thanks a lot for the extra info!
Half the problem with C++ features isn't even that they aren't grounded in experience and past pitfalls[1]; it's that the correct version of the feature is usually proposed first, but the committee doesn't understand the problems that led to that design and guts it until it's a mess that works 80% of the time and breaks everything the other 20%.
Move semantics are actually a good example of this: they were originally proposed to be destructive like Rust's, but were negotiated down into the awful "must leave the moved-from object in a valid state" form they have today, because adding destructive moves was seen as too complicated and there wasn't enough evidence that it would be useful.
[1] Except for std::initializer_list. There's no excuse for that one.
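For anyone unsure what "destructive" means on the Rust side, here is a minimal sketch (Noisy, consume, and the drop counter are hypothetical names for illustration). After a move the source binding is simply gone: it can't be used, and no destructor runs for it, so there is no "valid but unspecified" leftover to worry about.

```rust
use std::cell::Cell;

/// Increments a shared counter every time a value is dropped.
struct Noisy<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Takes ownership; the value is dropped when this function returns.
fn consume(_value: Noisy<'_>) {}

fn count_drops_after_move() -> u32 {
    let drops = Cell::new(0);
    {
        let value = Noisy { drops: &drops };
        consume(value); // `value` is moved out; using it after this is a compile error
    } // nothing is dropped here, because `value` no longer owns anything
    drops.get()
}
```

If moves left a live moved-from object behind, as in C++, the destructor would run a second time at the end of the inner block; here the counter should end up at 1.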