As you can imagine, this is all text coming from form inputs such as text and textarea.
All this:
without using regex
with unicode chars
Some tests to satisfy are:
    #[test]
    fn test() {
        assert_eq!(magic(" ".to_string()), "");
        assert_eq!(
            magic(" a l l lower ".to_string()),
            "a l l lower"
        );
        assert_eq!(
            magic(" i need\nnew lines \n\nmany times ".to_string()),
            "i need\nnew lines\n\nmany times"
        );
        assert_eq!(magic(" à la ".to_string()), "à la");
    }
A test that expects a single space for an all-blank input appears to be inconsistent with the others, which remove all spaces from the beginning and end of the string. From the textual description, either that case should yield the empty string or the others should retain a single blank space at the beginning and end.
Since this can be written as a regex, it can also be written as a simple finite automaton. One possible solution would be something along these lines:
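    use std::iter::Peekable;

    // Iterator adapter that drops redundant spaces from a stream of chars.
    struct S<I: Iterator>(Peekable<I>);

    impl<I> Iterator for S<I>
    where
        I: Iterator<Item = char>,
    {
        type Item = char;

        fn next(&mut self) -> Option<char> {
            loop {
                match self.0.next()? {
                    // Keep a space only if the next char is not a space, newline or tab.
                    ' ' => match self.0.peek()? {
                        ' ' | '\r' | '\n' | '\t' => (),
                        _ => return Some(' '),
                    },
                    // Drop every space that directly follows a newline or tab.
                    w @ ('\r' | '\n' | '\t') => {
                        while let Some(' ') = self.0.peek() {
                            self.0.next();
                        }
                        return Some(w);
                    }
                    // Any other char passes through unchanged.
                    c => return Some(c),
                }
            }
        }
    }

    fn magic(chars: String) -> String {
        // `trim` removes leading and trailing whitespace up front, so an
        // all-blank input collapses to the empty string, as the tests expect.
        S(chars.trim().chars().peekable()).collect()
    }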
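The same automaton can also be written with its states spelled out, which makes it easier to add the "replace \r\n with \n" rule. The following is only a sketch under assumptions: `squeeze_spaces` and `State` are hypothetical names that do not appear in the answer above, and it keeps that answer's behaviour of preserving tabs while removing the spaces around them.

```rust
// Hypothetical explicit-state version of the automaton (names are illustrative).
#[derive(Clone, Copy, PartialEq)]
enum State {
    AfterBreak,   // at the start, or right after '\r', '\n' or '\t'
    InText,       // the last emitted char was ordinary text
    PendingSpace, // one or more spaces seen after text, not yet emitted
}

fn squeeze_spaces(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut state = State::AfterBreak;
    let mut chars = input.trim().chars().peekable();

    while let Some(c) = chars.next() {
        match c {
            // Remember at most one space, and only if it follows text.
            ' ' => {
                if state == State::InText {
                    state = State::PendingSpace;
                }
            }
            // Drop the '\r' of a "\r\n" pair; the '\n' is handled on the next turn.
            '\r' if chars.peek() == Some(&'\n') => {}
            // A break discards any pending space and is emitted as-is.
            '\r' | '\n' | '\t' => {
                out.push(c);
                state = State::AfterBreak;
            }
            // Ordinary text: flush the pending space first, then the char.
            _ => {
                if state == State::PendingSpace {
                    out.push(' ');
                }
                out.push(c);
                state = State::InText;
            }
        }
    }
    out
}
```

Because a space is only emitted when ordinary text follows it, trailing spaces and spaces next to a break never reach the output, and the whole pass allocates a single `String`.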
Reading a regex is always a challenge, and you are not the only one who finds it hard. However, you can use a regex parser/explainer to see what it is doing (for example https://regex101.com/ ). You should also write a comment explaining what the regex is meant to do.
As for performance, regular expressions can be faster than chaining several string search-and-replace calls. Regular expressions are a great tool and you shouldn't avoid them, especially when a regex saves you from writing a ton of code.
two or more spaces in a string reduced to one (ex: " text other " -> "text other")
one or more spaces removed after and before characters such as:
\n
\r\n
\t
replace \r\n with \n
I tried with +|\\n +|\t +\\r\n .+ but obviously this doesn't fully work.
We can use the assertions below to check that it works:
assert_eq!(not_useful_space(" "), "");
assert_eq!(not_useful_space(" a l l lower "), "a l l lower");
assert_eq!(not_useful_space(" i need\n new lines\n\n many times "), "i need\nnew lines\n\nmany times");
assert_eq!(not_useful_space(" i need \n new lines \n\n many times "), "i need\nnew lines\n\nmany times");
assert_eq!(not_useful_space(" i need \r\n new lines\r\nmany times "), "i need\nnew lines\nmany times");
assert_eq!(not_useful_space(" i need \t new lines\t \t many times "), "i need new lines many times");
assert_eq!(not_useful_space(" à la "), "à la");
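For comparison, here is a minimal regex-free sketch that should satisfy the assertions above using only the standard library. The body is an assumption, not the implementation these assertions were written for: it treats every Unicode whitespace character inside a line (tabs included) as a separator to be collapsed into one space, and it relies on `str::lines` to accept both \n and \r\n.

```rust
// Sketch of a std-only `not_useful_space`; the body is an assumption.
fn not_useful_space(input: &str) -> String {
    input
        // `lines` splits on '\n' and also strips the '\r' of a "\r\n" pair.
        .lines()
        // Within a line, collapse each whitespace run to one space and
        // drop leading/trailing whitespace.
        .map(|line| line.split_whitespace().collect::<Vec<_>>().join(" "))
        .collect::<Vec<_>>()
        // Re-join with plain '\n', which normalizes the line endings.
        .join("\n")
        // Remove blank lines left at the very start or end.
        .trim()
        .to_string()
}
```

Note that `split_whitespace` follows the Unicode `White_Space` property, so characters such as a no-break space are collapsed as well; if only ASCII spaces and tabs should count, the per-line step would need a narrower predicate.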