Possible to split String returning [(&str, index: usize)]


#1

Problem: Strings of variable text that I need to parse semantically. For instance, the text may give a person’s affiliation within a university. The format of this affiliation may be inconsistent, however: “Department of Foo” vs “Bar Department”. The affiliation can also occur at any point in the text.

My approach: Split the text into a collection of individual words using text.split(" "), then iterate over the collection with enumerate() looking for the keyword (e.g., “Department”), and return either the word immediately preceding the keyword or the word two positions after it.

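// Fragment: assumes an enclosing fn that returns Option<&str>.
let splittext: Vec<&str> = text.split(' ').collect();
for (i, word) in splittext.iter().enumerate() {
  if word.contains("Department") {
    // "Department of Foo": take the word after "of"; otherwise take the word before.
    return match splittext.get(i + 1) {
      Some(&"of") => splittext.get(i + 2).copied(),
      _ => i.checked_sub(1).map(|j| splittext[j]),
    };
  }
}
None // keyword not found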

However, this means that if the department’s name is more than one word long, the name will be truncated: “Department of Baz Qux” would return only “Baz”.

Possible solution: Upon finding the keyword, scan forward or backward in the original text until reaching another delimiter (a lowercase word or punctuation). If the word before the keyword is neither lowercase nor ends in punctuation, return the preceding words back to the nearest such delimiter. Otherwise, return the words following the keyword (skipping an initial “of”) up to the next lowercase word or punctuation.
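A rough sketch of that scan, working on whitespace-separated words rather than on byte offsets. The names department_name, is_name_word, and ends_with_punct are made up for illustration, and "lowercase word" is approximated as "does not start with an uppercase letter", so treat this as a starting point under those assumptions rather than a finished parser:

```rust
/// Hypothetical helper: a word is treated as part of a name if it starts
/// with an uppercase letter (an assumption, not a rule from the post).
fn is_name_word(word: &str) -> bool {
    word.chars().next().map_or(false, char::is_uppercase)
}

/// Hypothetical helper: true if the word ends with ASCII punctuation.
fn ends_with_punct(word: &str) -> bool {
    word.chars().last().map_or(false, |c| c.is_ascii_punctuation())
}

/// Returns the (possibly multi-word) name attached to the first "Department".
fn department_name(text: &str) -> Option<String> {
    let words: Vec<&str> = text.split_whitespace().collect();
    let i = words.iter().position(|w| w.contains("Department"))?;

    // Is the word right before the keyword a name word with no punctuation?
    let preceded_by_name = i
        .checked_sub(1)
        .map_or(false, |j| is_name_word(words[j]) && !ends_with_punct(words[j]));

    let mut name: Vec<&str> = Vec::new();
    if preceded_by_name {
        // "Applied Foo Department": walk backward until a delimiter.
        for w in words[..i].iter().rev() {
            if !is_name_word(w) || ends_with_punct(w) {
                break;
            }
            name.push(*w);
        }
        name.reverse();
    } else {
        // "Department of Baz Qux": skip an initial "of", then walk forward.
        let mut rest = &words[i + 1..];
        if rest.first() == Some(&"of") {
            rest = &rest[1..];
        }
        for w in rest {
            if !is_name_word(w) {
                break;
            }
            name.push(w.trim_end_matches(|c: char| c.is_ascii_punctuation()));
            if ends_with_punct(w) {
                break; // trailing punctuation closes the name
            }
        }
    }
    if name.is_empty() { None } else { Some(name.join(" ")) }
}
```

One known weakness of this heuristic: a capitalized word that happens to sit right before the keyword (a surname, say) will be taken as the department name.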

It occurs to me that I could get the index in the original string using find(keyword) rather than splitting the string in the first place, but I am curious now if it is possible to split a string and return a tuple including the index of the split in the original string.
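For reference, the find-based variant might look something like the sketch below. str::find returns the byte offset of the first match, which can be used to slice the original string directly; split_around_keyword is a hypothetical name chosen here for illustration:

```rust
/// Returns the text before and after the first occurrence of `keyword`,
/// or None if the keyword does not occur.
fn split_around_keyword<'a>(text: &'a str, keyword: &str) -> Option<(&'a str, &'a str)> {
    // `find` yields a byte offset into `text`, not a character count.
    let idx = text.find(keyword)?;
    Some((&text[..idx], &text[idx + keyword.len()..]))
}
```

The two slices still point into the original string, so they can be scanned outward from the keyword without any copying.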


#2

You can probably write this behavior yourself using str::match_indices, which yields each match as a pair: its starting byte index in the original string and the matched substring. That gives you both where each delimiter starts and how long it is.
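A minimal sketch of what that could look like, assuming a non-empty delimiter. split_with_indices is a hypothetical helper name, not something in the standard library, and the indices are byte offsets rather than character positions:

```rust
/// Splits `text` on `delim`, pairing each piece with the byte offset at
/// which it starts in `text`.
fn split_with_indices<'a>(text: &'a str, delim: &str) -> Vec<(&'a str, usize)> {
    let mut pieces = Vec::new();
    let mut start = 0;
    for (idx, matched) in text.match_indices(delim) {
        // Everything between the previous delimiter and this one is a piece.
        pieces.push((&text[start..idx], start));
        start = idx + matched.len();
    }
    // The remainder after the last delimiter (or the whole string if none).
    pieces.push((&text[start..], start));
    pieces
}
```

Like str::split, this keeps empty pieces between consecutive delimiters, so filter them out if you only want words.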