How to find the length of a substring that is at the end of the string?


The previous version of the lexer for my assembler project failed so hard once I was able to test it that I switched to a library called logos to handle actually making the lexer. With it, I'm only really writing an enum that logos uses to generate the lexer, plus some callback functions. There are only three callbacks left to implement, and all of them deal with a variable-length substring, unlike every other callback, which deals with a fixed-length substring.

As an example, here's the code I have for a fixed-length token: an absolute hexadecimal address.

fn addr_abs_hex(lex: &mut Lexer<Token>) -> Result<AddressInfo, ()> {
    let slice: &str = lex.slice();

    // The address is the last six hex digits of the slice.
    let poss_addr: Result<i32, ParseIntError> = i32::from_str_radix(&slice.substring(slice.len() - 6, slice.len()), 16);

    match poss_addr {
        Ok(t) => {
            // Reject anything outside the 24-bit range 0..=0xFFFFFF.
            if t > 16777215 || t < 0 {
                Err(())
            } else {
                Ok(AddressInfo { addr_type: AddressType::Absolute, base: NumBase::Hexadecimal, val: t })
            }
        }
        Err(_) => Err(()),
    }
}

If you couldn't tell, the pertinent information is at the end of the string slice. With the fixed-length substrings, that's fine because I know how long they are. With something that can be any length, that poses a problem. I can't just scan forward from the start of the string, as I can potentially run into an identifier or string that the lexer has already tokenized. So I need to scan the string back to front. On top of that, when trying to find a string literal within the string, I need to check for an escaped ", i.e. \", which, as you may guess, complicates things.
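For reference, here is a minimal sketch of the back-to-front scan described above, using only the standard library. The function names `trailing_hex_len` and `trailing_string_len` are made up for illustration and are not part of logos. The string version assumes that a quote is escaped exactly when it is preceded by an odd number of backslashes.

```rust
/// Length in bytes of the run of ASCII hex digits at the end of `s`.
/// (Hypothetical helper, not a logos API.)
fn trailing_hex_len(s: &str) -> usize {
    // Hex digits are ASCII, so the character count equals the byte count.
    s.chars().rev().take_while(|c| c.is_ascii_hexdigit()).count()
}

/// True if the byte at index `i` is preceded by an odd number of backslashes.
fn is_escaped(bytes: &[u8], i: usize) -> bool {
    let backslashes = bytes[..i].iter().rev().take_while(|&&b| b == b'\\').count();
    backslashes % 2 == 1
}

/// If `s` ends with a double-quoted string literal, return the literal's
/// length in bytes, quotes included. (Hypothetical helper.)
fn trailing_string_len(s: &str) -> Option<usize> {
    // `"` and `\` are ASCII, so scanning bytes is safe for UTF-8 input.
    let bytes = s.as_bytes();
    let end = bytes.len().checked_sub(1)?;
    // The last byte must be an unescaped closing quote.
    if bytes[end] != b'"' || is_escaped(bytes, end) {
        return None;
    }
    // Walk backwards to the nearest unescaped quote: the opening quote.
    (0..end)
        .rev()
        .find(|&i| bytes[i] == b'"' && !is_escaped(bytes, i))
        .map(|start| bytes.len() - start)
}
```

With these, `trailing_hex_len("LDA $00FF1A")` gives 6, and the trailing literal in `db "a \" b"` is reported as 8 bytes long.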

I would like some help with this, as it has been stumping me. Thank you for your time, and have an awesome day!

Why though? From the documentation of logos, it looks like slice only returns the relevant string slice, i.e. whatever portion of the entire string was last matched. Isn't that the case?

Is that so? From the examples given in the logos documentation, I had assumed that the slice contains the entire source up to the moment the callback was called. Perhaps that assumption was wrong.

I was looking at this example. It shows that slice() always returns the next word, nothing else.

I was mainly looking at the example after that one, but now that I'm looking at that first example, I have no idea how I didn't notice that that was how slice() worked. D'oh!
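Since slice() only covers the text of the current match, a variable-length token needs no backwards scan: the callback can parse the whole slice. A rough sketch of that idea follows, written against a plain &str so it runs without logos. The function name `parse_hex_addr` and the `$` prefix are assumptions for illustration, as the thread does not show the actual token format.

```rust
/// Parse a hexadecimal address token of any length, e.g. `$1F` or `$00FF1A`.
/// Hypothetical helper: the `$` prefix is an assumed token format.
fn parse_hex_addr(slice: &str) -> Result<u32, ()> {
    // Drop the assumed prefix; everything after it is the number.
    let digits = slice.strip_prefix('$').ok_or(())?;
    let val = u32::from_str_radix(digits, 16).map_err(|_| ())?;
    // Keep the 24-bit limit from the original callback (16777215 = 0xFFFFFF).
    if val > 0xFF_FFFF {
        Err(())
    } else {
        Ok(val)
    }
}
```

Inside a real callback, the argument would be `lex.slice()`, and the returned value would go into the `val` field of `AddressInfo`.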
