Escaping Windows paths for TOML

Hey folks! I am looking for some advice on how to handle a situation.

I am writing some tests that take some TOML input. One of the fields in the TOML is a path. The issue is that TOML treats \ as the start of an escape sequence (\U, for instance, begins a Unicode escape), so Windows paths such as C:\Users don't work. I am currently doing something like this:

let config = format!(
            r#"
            path = {:?}
            "#,
            path
        );

which abuses the fact that debug print:

  1. includes ""s
  2. turns \ into \\

This works, but feels kind of gross. Does anyone have any advice for doing this in a better way? Thanks!

(There are more keys and values than just path, hence the raw string, I've elided those parts as they're not that important.)
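
For reference, here is a minimal sketch of what the `{:?}` trick produces, using a plain `&str` in place of the real path. The helper name `config_with_debug` is made up for illustration. Two caveats: the exact `Debug` output is not a stability guarantee, and control characters come out as Rust-style `\u{...}` escapes, which TOML does not accept.

```rust
/// Hypothetical helper: builds one TOML line by leaning on `Debug` formatting,
/// which adds the surrounding quotes and doubles every backslash.
fn config_with_debug(path: &str) -> String {
    format!("path = {:?}\n", path)
}
```

Given the input `C:\Users\me`, this yields the line `path = "C:\\Users\\me"`.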


Sorry if I'm being dense but wouldn't using a proper TOML serializer/deserializer be better here? Or is that not an option?

The only other good option is to escape them yourself. Convert the path to a string and write a new string that doubles any \ characters.

Alternatively TOML does support raw string literals like this:

'''A raw string \that won't escape characters'''

But that's a bit hacky as you'd have to reject any path with ''' in it (which is unlikely but possible).
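
A rough sketch of that literal-string route, assuming the path is already valid UTF-8. The name `toml_literal` is hypothetical, and the checks are deliberately conservative: besides `'''`, the sketch also rejects a trailing `'` and any control character rather than trying to track exactly what each TOML version tolerates.

```rust
/// Hypothetical helper: wrap `path` in a TOML multi-line literal string,
/// or return `None` when the path cannot safely be written that way.
fn toml_literal(path: &str) -> Option<String> {
    let unsupported = path.contains("'''")          // would end the string early
        || path.ends_with('\'')                     // would merge with the closing delimiter
        || path.chars().any(|c| c.is_control());    // conservative: rejects tab as well
    if unsupported {
        None
    } else {
        Some(format!("'''{}'''", path))
    }
}
```

The caller still has to decide what to do with `None`, for example by falling back to an escaped basic string.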


This is testing some code that uses a proper deserializer; I think the test works a bit better by having the raw values, because if we used a serializer here, then we're just testing a round trip, rather than the behavior.

You can use str::escape_default. It has well-defined behavior.

If you don't want to write test data by hand but don't trust your serializer, you can use snapshot tests (e.g. insta). Serialize the data using your serializer, confirm that the result is valid, and tests will alert you if the serializer's behavior changes in the future.


You could use escape_default which uses escaping that is almost compatible with TOML's escape sequences. You'll need to do a little more work if you want to support paths containing non-ASCII characters.
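
To make the "almost compatible" part concrete, here is a small sketch of what `str::escape_default` emits. The wrapper name `rust_escaped` is just for illustration. Backslashes are doubled, which is what TOML wants, but non-ASCII characters become `\u{...}` with braces and a single quote becomes `\'`, neither of which is a TOML escape sequence.

```rust
/// Hypothetical wrapper around `str::escape_default`, collecting the
/// escaped characters into a `String` so the output is easy to inspect.
fn rust_escaped(s: &str) -> String {
    s.escape_default().to_string()
}
```

For example, `C:\Users\me` becomes `C:\\Users\\me`, while `café` becomes `caf\u{e9}` and `it's` becomes `it\'s`.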


Thanks @Riateche and @mbrubeck, this seems like a good middle ground :)

The issue here is that these paths are dynamically generated, so they can change, which means I can't use snapshot tests. Would be great if I could though!

Why not just do replace(r"\", r"\\")? Is there a case where that wouldn't work?

That might work too! It certainly addresses the direct question, but maybe there are other aspects of this problem I wasn't thinking about.

Windows paths can't contain ASCII control codes or these characters:
" * < > ? |

TOML has the following escape sequences:

\b         - backspace       (U+0008)
\t         - tab             (U+0009)
\n         - linefeed        (U+000A)
\f         - form feed       (U+000C)
\r         - carriage return (U+000D)
\"         - quote           (U+0022)
\\         - backslash       (U+005C)
\uXXXX     - unicode         (U+XXXX)
\UXXXXXXXX - unicode         (U+XXXXXXXX)

Anything else starting with a \ is an error. So I don't think there's an issue with simply escaping all the backslashes.
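
As a sketch of that conclusion, assuming the input really is a Windows path (so it holds no double quotes or control characters) and is valid UTF-8. The function name is hypothetical.

```rust
/// Hypothetical helper: turn a Windows path into a quoted TOML basic string
/// by doubling every backslash. Not suitable for arbitrary strings, since
/// quotes and control characters are passed through untouched.
fn toml_basic_string_for_windows_path(path: &str) -> String {
    format!("\"{}\"", path.replace('\\', r"\\"))
}
```

With `C:\Users\me` as input, the result is `"C:\\Users\\me"`, which can be dropped straight after `path = ` in the test config.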


You should also just be able to use forward slashes. All of the WIN32 file handling functions accept forward slashes. IIRC it's cmd.exe that doesn't like forward slashes, so this won't work if you're using those paths with a system-like call that invokes cmd but otherwise they should work fine.
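
If the paths come from the system with backslashes, a conversion along these lines would be one way to try the forward-slash idea. This is a sketch under the assumption that the path is valid UTF-8, and `with_forward_slashes` is a made-up name. One caveat as far as I know: verbatim paths that start with `\\?\` skip the usual normalization on Windows, so they should not be rewritten this way.

```rust
/// Hypothetical helper: swap backslashes for forward slashes so the path
/// needs no escaping inside a TOML basic string.
fn with_forward_slashes(path: &str) -> String {
    path.replace('\\', "/")
}
```

The output for `C:\Users\me\config.toml` is `C:/Users/me/config.toml`.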

You also need to escape double-quotes ("), if this code also runs on platforms where double-quotes may appear in paths.

Oh for sure, but my understanding is that Linux is another problem entirely. Paths can contain any control characters, so you'd have to escape those too?

I am not creating these paths, I am asking the system for them. As such, Windows gives them to me in their native format, which is \.

Come to think of it, if you want to escape arbitrary strings for TOML (e.g. both Windows paths and Linux ones) it would be simple enough to write your own escape code based on the TOML table above. Something like:

fn toml_escape(path_str: &str) -> String {
    let mut escaped = String::with_capacity(path_str.len());
    for ch in path_str.chars() {
        match ch {
            // Characters with a dedicated short escape in TOML.
            '\u{08}' => escaped.push_str(r"\b"),
            '\t'     => escaped.push_str(r"\t"),
            '\n'     => escaped.push_str(r"\n"),
            '\u{0C}' => escaped.push_str(r"\f"),
            '\r'     => escaped.push_str(r"\r"),
            '"'      => escaped.push_str(r#"\""#),
            '\\'     => escaped.push_str(r"\\"),
            // TOML also requires escaping the remaining control characters,
            // U+0000 to U+001F and U+007F (DEL); all of them fit in \u00XX.
            c if c <= '\u{1f}' || c == '\u{7f}' => {
                let hex = to_hex(c as u8);
                escaped.push_str(r"\u00");
                escaped.push(hex[0] as char);
                escaped.push(hex[1] as char);
            },
            // Everything else can appear in a basic string as-is.
            c => escaped.push(c)
        }
    }
    escaped
}
fn to_hex(n: u8) -> [u8;2] {
    let digits = b"0123456789ABCDEF";
    [
        digits[(n >> 4) as usize],
        digits[(n & 0xF) as usize]
    ]
}

That's pretty off-the-cuff so doubtless it could be written more efficiently. Especially if you were to write directly to the formatter.
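
One possible shape for the "write directly to the formatter" variant is a small `Display` wrapper, sketched below. The type name `TomlEscaped` is hypothetical, and the sketch also escapes U+007F (DEL), which the TOML 1.0 spec lists among the characters that must be escaped in basic strings.

```rust
use std::fmt::{self, Write};

/// Hypothetical wrapper: displays its contents with TOML basic-string escapes,
/// writing straight into the formatter instead of building a temporary String.
struct TomlEscaped<'a>(&'a str);

impl fmt::Display for TomlEscaped<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for c in self.0.chars() {
            match c {
                '\u{08}' => f.write_str(r"\b")?,
                '\t' => f.write_str(r"\t")?,
                '\n' => f.write_str(r"\n")?,
                '\u{0C}' => f.write_str(r"\f")?,
                '\r' => f.write_str(r"\r")?,
                '"' => f.write_str(r#"\""#)?,
                '\\' => f.write_str(r"\\")?,
                // Remaining control characters (and DEL) as four-digit \uXXXX.
                c if c <= '\u{1F}' || c == '\u{7F}' => write!(f, "\\u{:04X}", c as u32)?,
                // Everything else is written unchanged.
                c => f.write_char(c)?,
            }
        }
        Ok(())
    }
}
```

It slots into the original test setup as `format!("path = \"{}\"", TomlEscaped(path_str))`, where `path_str` is the path already converted to a `&str`.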

