Parse a String using delimiters


#1

Greetings,

I'm migrating some Python code to Rust, but I've hit a dead end. Sorry for posting some .py lines here, but I have doubts about the best (fastest) way to do this in Rust.

Calling parsertoken("_My input.string", " _,.", 2) returns "input".
Calling parsercount("Rust=-rocks!", " =-") returns 2.

def parsertoken(istring, idelimiters, iposition):
    """
    Return a specific token of a given input string,
    considering its position and the provided delimiters

    :param istring: raw input string
    :param idelimiters: delimiters to split the tokens
    :param iposition: position of the token (1-based)
    :return: token
    """
    # Replace every delimiter with a space, then split on whitespace
    vlist = ''.join([s if s not in idelimiters else ' ' for s in istring]).split()
    # Positions are 1-based, so position 2 of "_My input.string" is "input"
    return vlist[iposition - 1]

def parsercount(istring, idelimiters):
    """
    Return the number of tokens in the input string,
    considering the delimiters provided

    :param istring: raw input string
    :param idelimiters: delimiters to split the tokens
    :return: the number of tokens found
    """
    # Replace every delimiter with a space, then split on whitespace
    vlist = ''.join([s if s not in idelimiters else ' ' for s in istring]).split()
    return len(vlist)

Since I really care about speed, I am thinking of changing this API in my Rust implementation: as it stands, getting multiple tokens from one string means splitting the string again on every call.

Thanks in advance


#2

https://doc.rust-lang.org/stable/std/string/struct.String.html#method.split may be a useful place to start? Or maybe the regex crate?


#3

You can just split on a list of delimiters (the Pattern trait is quite flexible) and keep the Vec of slices:

let tokens: Vec<_> = input.split(&[' ', '=', '-']).filter(|k| !k.is_empty()).collect();

Now you can access tokens[i] and tokens.len(). All entries are slices that borrow from the original input string, so nothing is copied. To turn them into owned strings, use e.g. tokens[0].to_string().
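
As a small illustration of the above, here is a minimal sketch that wraps the split in a helper. The name tokenize is made up for this example, and the delimiter set is hard-coded to the three characters used above.

```rust
/// Split `input` on a fixed set of delimiter characters, dropping empty pieces.
/// `tokenize` is a hypothetical helper name, not something from std.
fn tokenize(input: &str) -> Vec<&str> {
    // A slice of chars is a valid pattern for `str::split`:
    // the string is cut wherever any of these characters occurs.
    let delims: &[char] = &[' ', '=', '-'];
    input
        .split(delims)
        // Adjacent delimiters produce empty slices; skip them.
        .filter(|k| !k.is_empty())
        .collect()
}
```

With this, tokenize("Rust=-rocks!") yields the two slices "Rust" and "rocks!", both borrowed from the input.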

If you need to take the delimiters as a string:

let delims: Vec<_> = delim_string.chars().collect();
let tokens: Vec<_> = input.split(&delims[..]).filter(|k| !k.is_empty()).collect();
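
To avoid re-splitting for every lookup, one option is to split once and keep the result around. The following is only a sketch under some assumptions: the names Tokens, token and count are invented here, positions are taken to be 1-based as in the parsertoken example from post #1, and only the listed delimiters split the input (the Python version also splits on any other whitespace).

```rust
/// Holds the tokens of one input string so the split happens only once.
/// `Tokens`, `token` and `count` are hypothetical names for this sketch.
struct Tokens<'a> {
    parts: Vec<&'a str>,
}

impl<'a> Tokens<'a> {
    /// Split `input` on any character found in `delimiters`.
    fn new(input: &'a str, delimiters: &str) -> Self {
        let delims: Vec<char> = delimiters.chars().collect();
        let parts = input
            .split(&delims[..])
            .filter(|k| !k.is_empty())
            .collect();
        Tokens { parts }
    }

    /// Token at a 1-based `position` (assumed from the example in post #1),
    /// or `None` if the position is out of range.
    fn token(&self, position: usize) -> Option<&'a str> {
        position
            .checked_sub(1)
            .and_then(|i| self.parts.get(i))
            .copied()
    }

    /// Number of tokens found.
    fn count(&self) -> usize {
        self.parts.len()
    }
}
```

Usage would look like: let t = Tokens::new("_My input.string", " _,."); then t.token(2) gives Some("input") and t.count() gives 3, with no further splitting. Returning Option instead of indexing directly means an out-of-range position does not panic.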

#4

Thanks a lot!