Rust doesn't support octal escape sequences (integer literals can be written in octal with the 0o prefix, but escapes cannot). A 7-bit escape is still specified in hexadecimal; the first hex digit is just limited to 0-7 instead of 0-F.
You'll need to specify the byte in hexadecimal: octal 33 is hexadecimal 1b, so use \x1b for this literal.
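A minimal sketch of that translation. The function names are just for illustration, and the ANSI reset sequence is only an example of where an ESC byte might show up:

```rust
/// C-style "\033" has no Rust equivalent; the same value is written '\x1b'.
fn esc() -> char {
    '\x1b'
}

/// Example use: the ANSI "reset attributes" sequence.
fn ansi_reset() -> &'static str {
    "\x1b[0m"
}

fn main() {
    // Octal *integer* literals exist (0o prefix); octal *escapes* do not.
    assert_eq!(0o33, 0x1b);
    assert_eq!(esc() as u32, 27);
    assert_eq!(ansi_reset().as_bytes(), &[0x1b, b'[', b'0', b'm']);
    println!("ESC = {:#04x}", esc() as u32);
}
```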
I'm confused. "An octal digit followed by a hexadecimal digit" is very obviously not the same as "two octal digits or two hexadecimal digits", which is what your interpretation would imply. How could the documentation possibly mean that?
7-bit escapes are used in Unicode string and character literals, where they map to the subset of Unicode scalar values (USVs) that are identical to ASCII and can be represented as single bytes in UTF-8. For USVs outside that range, you must use Unicode escape sequences like \u{FF}.
8-bit escapes are used in byte and byte string literals (which are not UTF-8, so they can contain arbitrary u8 values).
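To make the split concrete, here is a small sketch contrasting the literal kinds (the helper names are made up). A '\xFF' char literal or "\xFF" string literal would be a compile-time error, so that case only appears in a comment:

```rust
/// Char and string literals: \x escapes only cover 0x00..=0x7F (ASCII).
fn ascii_a() -> char {
    '\x41'
}

/// Outside that range a char literal needs a Unicode escape instead;
/// '\xFF' would be rejected at compile time.
fn y_diaeresis() -> char {
    '\u{FF}' // U+00FF, 'ÿ'
}

/// Byte literals accept the full 8-bit range.
fn raw_byte() -> u8 {
    b'\xFF'
}

fn main() {
    assert_eq!(ascii_a(), 'A');
    assert_eq!(y_diaeresis() as u32, 0xFF);
    // U+00FF takes two bytes in UTF-8 (C3 BF), so it is not the single byte 0xFF.
    assert_eq!(y_diaeresis().to_string().as_bytes(), &[0xC3, 0xBF]);
    assert_eq!(raw_byte(), 255);
    assert_eq!(b"\xFF\x00", &[0xFFu8, 0x00]);
}
```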
I forgot to mention why this difference is enforced. One possible reason is that removing the b from a byte string literal (like b"\x41\x42") to turn it into a Unicode string literal (like "\x41\x42") will result in a string whose UTF-8 representation exactly matches the original byte string, or a compile-time error if that is not possible.
That is, it ensures that various types of literals that look the same are guaranteed to have the same in-memory representation.
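A quick check of that property, as a sketch with hypothetical function names:

```rust
/// The same escapes written as a byte string literal...
fn byte_lit() -> &'static [u8] {
    b"\x41\x42\x7F"
}

/// ...and as a string literal with the b prefix removed.
fn str_lit() -> &'static str {
    "\x41\x42\x7F"
}

fn main() {
    // Each 7-bit escape in a str is exactly one UTF-8 byte, so both
    // literals have the same bytes in memory.
    assert_eq!(str_lit().as_bytes(), byte_lit());
    // b"\xFF" is accepted, but "\xFF" is a compile-time error rather than
    // silently becoming the two-byte UTF-8 encoding of U+00FF.
    assert_eq!(b"\xFF".len(), 1);
    assert_eq!("\u{FF}".len(), 2);
}
```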
The escape sequence consists of \x followed by an octal digit then a hexadecimal digit.
I think this documentation is needlessly confusing because base 8 isn't used here. It should just say two hexadecimal digits, and that the number must be in the range 0x00 to 0x7F.
If it's part of the lexer, it's just writing out the regex x[0-7][a-fA-F0-9] in English, and talking about the individual USVs in the source code is the correct approach in the lexer.
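Read that way, the rule is just a character-by-character check. Here's a sketch of a hypothetical lexer helper (lex_ascii_escape is made up, and this is not how rustc's lexer is actually written) that takes the text right after the backslash:

```rust
/// Hypothetical lexer helper: check x[0-7][0-9a-fA-F] one character at a
/// time and return the escape's value. Only the first three characters are
/// examined. (Illustration only; not rustc's lexer.)
fn lex_ascii_escape(s: &str) -> Option<u8> {
    let mut chars = s.chars();
    if chars.next()? != 'x' {
        return None;
    }
    // First digit: an octal digit, i.e. [0-7].
    let hi = chars.next()?.to_digit(8)?;
    // Second digit: any hexadecimal digit, i.e. [0-9a-fA-F].
    let lo = chars.next()?.to_digit(16)?;
    // Value: both digits read together as one hexadecimal number (max 0x7F).
    Some((hi * 16 + lo) as u8)
}

fn main() {
    assert_eq!(lex_ascii_escape("x1b"), Some(0x1b));
    assert_eq!(lex_ascii_escape("x7F"), Some(0x7F));
    assert_eq!(lex_ascii_escape("x80"), None); // '8' is not an octal digit
    assert_eq!(lex_ascii_escape("x7g"), None); // 'g' is not a hex digit
}
```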
Note that for the value the same section describes it as
is the result of interpreting the final two characters in the escape sequence as a hexadecimal integer
which sounds more like what you're talking about.
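Keeping the two steps separate, the value computation can be sketched like this (escape_value is a hypothetical helper that assumes the lexer has already accepted the escape):

```rust
/// Value step only. Assumes the lexer already accepted \x[0-7][0-9a-fA-F],
/// then reads the final two characters as one hexadecimal integer.
fn escape_value(escape: &str) -> u8 {
    let digits = &escape[escape.len() - 2..];
    u8::from_str_radix(digits, 16).expect("lexer guarantees two hex digits")
}

fn main() {
    // "1b" read as hex is 27, although the lexer checked '1' against the
    // octal-digit rule and 'b' against the hex-digit rule.
    assert_eq!(escape_value(r"\x1b"), 27);
    // Agrees with the value the compiler gives the literal itself.
    assert_eq!(escape_value(r"\x7F") as u32, '\x7F' as u32);
}
```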
This is the reference, so it's phrased for that, and that's very different from how I'd informally describe it to people.
Also, the page clarifies
In the definitions of escapes below:
An octal digit is any of the characters in the range [0-7].
A hexadecimal digit is any of the characters in the ranges [0-9], [a-f], or [A-F].
OK, so my proposed definition matches this: two hexadecimal digits in the range 00..=7f. What's wrong with that? I don't see what is informal about that; it's just as formal.
Imagine if the range was from \x00 to \x6c for some reason. How would you describe that? Clearly just specifying the range is both more general and cleaner.
Imagine there was a token that was required to be a decimal number between 00 and 39, would it be better to say "one quaternary digit and one decimal digit, whose value is computed by reinterpreting the quaternary digit as a decimal digit", or would it be better to simply say "a two-digit number in the range 0..=39"? The latter is equally formal, and much shorter and clearer.
Unicode escapes also have a limited range: the lexer has to reject values above \u{10ffff}. And yet the grammar and its English description don't specify that range with regexes; they just say "up to 6 digits".
There's a bunch of open conversations about exactly what the lexer should accept in which positions. For example, '\u{AA0000}' is clearly not semantically valid, but it's possible that it might be lexically valid, either to a proc macro or as an ignored tt.
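The gap between "up to 6 digits" and "a valid scalar value" is easy to see in a sketch. unicode_escape_value is a hypothetical helper that uses char::from_u32 as a stand-in for the semantic check; it only handles plain hex digits, not every form the real grammar accepts:

```rust
/// Hypothetical helper for the digits inside \u{...}. The "up to 6 hex
/// digits" shape is checked first; char::from_u32 then plays the role of
/// the semantic check, rejecting values above 0x10FFFF and surrogates.
fn unicode_escape_value(digits: &str) -> Option<char> {
    // Shape check: 1 to 6 ASCII hex digits (plain digits only in this sketch).
    if digits.is_empty() || digits.len() > 6 || !digits.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    // Six hex digits always fit in a u32.
    let n = u32::from_str_radix(digits, 16).expect("six hex digits fit in u32");
    char::from_u32(n)
}

fn main() {
    assert_eq!(unicode_escape_value("FF"), Some('\u{FF}'));
    assert_eq!(unicode_escape_value("10FFFF"), Some('\u{10FFFF}'));
    // Well-formed as "up to 6 digits", but not a Unicode scalar value:
    assert_eq!(unicode_escape_value("110000"), None);
    assert_eq!(unicode_escape_value("AA0000"), None);
    assert_eq!(unicode_escape_value("D800"), None); // surrogate
}
```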
So from the perspective of the reference, I still want both "this is the regex that you should put in your tokenizer" that does talk in terms of the individual characters.
From the perspective of
There's an important lexical choice to make there. If that were the range, would \x6d be valid inside a cfg(FALSE) or not? Is it legal to call ignore_one_tt!('\x6d'); because the lexer doesn't care, even if let c = '\x6d'; has to fail?
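For concreteness, here's what the contexts in that question look like in code. ignore_one_tt is the hypothetical macro from the question, and cfg(any()) is used as an always-false cfg in place of cfg(FALSE). In real Rust \x6d is within range, so everything below compiles; the open question is what should happen if it weren't:

```rust
// Hypothetical macro from the question: takes exactly one token tree and
// expands to nothing, so the literal's value is never needed.
macro_rules! ignore_one_tt {
    ($t:tt) => {};
}

// Removed by an always-false cfg, but its tokens still go through the
// lexer before the item is dropped.
#[cfg(any())]
fn never_compiled() {
    let _c = '\x6d';
}

fn demo() -> char {
    // The literal only has to be a valid token here.
    ignore_one_tt!('\x6d');
    // Here the value is actually used, so it must be a valid char.
    let c = '\x6d';
    c
}

fn main() {
    assert_eq!(demo(), 'm');
}
```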
As a programmer, I almost certainly don't care. But the reference has to describe such things.
The escape sequence consists of \x followed by an octal digit then a hexadecimal digit.
But my point is that a description of the form "exactly two hex digits with value up to 0x7F" is exactly equivalent to the regex [0-7][a-fA-F0-9]. So it's described properly either way. Regexes aren't the only way to define tokens.
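The equivalence is small enough to check exhaustively. This sketch (all names hypothetical) compares the regex form against "two hex digits with value at most max" over every pair of ASCII characters; non-ASCII characters fail both checks anyway. The max parameter also covers the hypothetical \x6c bound from earlier:

```rust
/// Regex-style description: [0-7][0-9a-fA-F], checked per character.
fn regex_form(a: char, b: char) -> bool {
    matches!(a, '0'..='7') && b.is_ascii_hexdigit()
}

/// Range-style description: two hex digits whose value is at most max.
fn range_form(a: char, b: char, max: u32) -> bool {
    match (a.to_digit(16), b.to_digit(16)) {
        (Some(hi), Some(lo)) => hi * 16 + lo <= max,
        _ => false,
    }
}

/// Compare both descriptions for every pair of ASCII characters.
fn descriptions_agree() -> bool {
    (0u8..=127).all(|x| {
        (0u8..=127).all(|y| {
            let (a, b) = (x as char, y as char);
            regex_form(a, b) == range_form(a, b, 0x7F)
        })
    })
}

fn main() {
    assert!(descriptions_agree());
    // The range form adapts directly to the hypothetical \x6c upper bound.
    assert!(range_form('6', 'c', 0x6C));
    assert!(!range_form('6', 'd', 0x6C));
}
```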
Sure, in the formal grammar you can use a regex instead, but we're talking about the part in English.
If it's unclear what happens under cfg(FALSE) or in macros, then switching from one way of describing it to the other doesn't clarify it because the two definitions are equivalent. So that's a separate issue.