Hi, I'm building something like a lexer and I need to build a char from two UTF-8 values to escape some characters like \n, \r, etc. I don't know if constructing a character is possible or what the ideal option is, but basically I need it not to take the backslash as an escaping character itself (\\).
Parsing usually uses some sort of hard-coded mapping from escape sequences to the characters they should produce.
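To make the "hard-coded mapping" idea concrete, here is a minimal sketch. The function name `simple_escape` and the exact set of escapes are my own choices for illustration, not something a lexer has to use:

```rust
/// Maps the character that follows a backslash to the character it stands for.
/// Returns `None` for escapes this sketch does not know about.
/// (`simple_escape` is a hypothetical helper name.)
fn simple_escape(c: char) -> Option<char> {
    match c {
        'n' => Some('\n'),
        'r' => Some('\r'),
        't' => Some('\t'),
        '0' => Some('\0'),
        '\\' => Some('\\'),
        '"' => Some('"'),
        '\'' => Some('\''),
        _ => None,
    }
}

fn main() {
    assert_eq!(simple_escape('n'), Some('\n'));
    assert_eq!(simple_escape('\\'), Some('\\'));
    // Unknown escapes are reported to the caller instead of being guessed at.
    assert_eq!(simple_escape('q'), None);
}
```

Returning an `Option` leaves it up to the caller whether an unknown escape is an error or should be passed through unchanged.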
Sometimes one of the escape sequences is an arbitrary Unicode escape, which you could implement via char::from_u32, if that's what you're getting at.
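For the Unicode case, a small sketch of going from hex digits to a `char` could look like this. `char_from_hex` is a made-up helper name, and it assumes you have already extracted the digits between the braces:

```rust
/// Converts the hex digits of a Unicode escape (the part between the braces)
/// into a `char`. Returns `None` if the text is not valid hex or if the value
/// is not a valid Unicode scalar value.
/// (`char_from_hex` is a hypothetical helper name.)
fn char_from_hex(hex: &str) -> Option<char> {
    let code = u32::from_str_radix(hex, 16).ok()?;
    // `char::from_u32` rejects surrogates (0xD800..=0xDFFF) and anything
    // above 0x10FFFF, so it returns an `Option<char>`.
    char::from_u32(code)
}

fn main() {
    assert_eq!(char_from_hex("2705"), Some('✅'));
    assert_eq!(char_from_hex("41"), Some('A'));
    assert_eq!(char_from_hex("D800"), None); // surrogate
    assert_eq!(char_from_hex("110000"), None); // out of range
    assert_eq!(char_from_hex("xyz"), None); // not hex
}
```

Note that a `char` is built from a single code point here, not from individual UTF-8 bytes.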
Otherwise I'm not entirely sure what you're asking; it might help to show some expected inputs and outputs.
Here's an extremely non-optimal re-implementation of some of Rust's built-in escape sequences, maybe that will indirectly answer the question?
const INPUT: &str = r"Here's some escapes\n\\and\\ such \u{2705}!";

fn main() {
    println!("BEFORE:\n{INPUT}");

    let mut source = INPUT.chars();
    let mut data = String::new();
    // True when the previous character was an unescaped backslash.
    let mut in_escape = false;

    while let Some(c) = source.next() {
        if in_escape {
            let replace = match c {
                'n' => '\n',
                'r' => '\r',
                'u' => {
                    // Unicode escape: `\u{`, then 1 to 6 hex digits, then `}`.
                    let open = source.next();
                    if open != Some('{') {
                        panic!("Expected opening bracket in unicode escape: {open:?}")
                    }
                    let char_data = u32::from_str_radix(
                        // Take at most 7 chars (6 digits plus the `}`) so the
                        // closing bracket is consumed even after 6 digits.
                        &source
                            .by_ref()
                            .take(7)
                            .take_while(|c| *c != '}')
                            .collect::<String>(),
                        16,
                    )
                    .unwrap();
                    // `from_u32` returns `None` for surrogates and values above 0x10FFFF.
                    char::from_u32(char_data).unwrap()
                }
                '\\' => '\\',
                _ => panic!("Invalid escape: {c}"),
            };
            data.push(replace);
            in_escape = false;
        } else if c == '\\' {
            in_escape = true;
        } else {
            data.push(c);
        }
    }

    println!("AFTER:\n{data}");
}
Output:
BEFORE:
Here's some escapes\n\\and\\ such \u{2705}!
AFTER:
Here's some escapes
\and\ such ✅!
Note that I had to use a raw string literal to avoid escaping my backslashes in the string literal there.
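In case the raw string part is unclear, here is a tiny sketch of the difference. The variable names are only for illustration:

```rust
fn main() {
    // Raw string: the backslash is kept as-is, so this is 4 chars: a \ n b
    let raw = r"a\nb";
    // Regular string with the backslash escaped: the same 4 chars.
    let escaped = "a\\nb";
    // Regular string: the compiler turns `\n` into one newline, so 3 chars.
    let cooked = "a\nb";

    assert_eq!(raw, escaped);
    assert_eq!(raw.chars().count(), 4);
    assert_eq!(cooked.chars().count(), 3);
}
```

Text that your lexer reads from a file or from stdin at run time is never processed by the compiler's escape handling, so this only matters for test inputs written as literals in your source.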
Thanks! That's exactly what I asked for, maybe I didn't explain it well.