Custom formatter in tracing_subscriber::fmt to filter ANSI escape codes?

I wrote a function, replace_control_chars, that filters ANSI escape codes out of the stdout/stderr of untrusted processes to prevent terminal injection.

replace_control_chars:
/// Remove control characters from string. Protects a terminal emulator
/// from obscure control characters
///
/// Control characters are replaced by � U+FFFD Replacement Character
pub(crate) fn replace_control_chars(s: &str, keep_newlines: bool) -> String {
    /// Return whether Unicode character is safe to print in a terminal
    /// emulator, based on its General Category
    fn is_safe(c: char) -> bool {
        !matches!(
            unicode_general_category::get_general_category(c),
            unicode_general_category::GeneralCategory::Control // Cc
            | unicode_general_category::GeneralCategory::Format // Cf
            | unicode_general_category::GeneralCategory::PrivateUse // Co
            | unicode_general_category::GeneralCategory::Unassigned // Cn
            | unicode_general_category::GeneralCategory::LineSeparator // Zl
            | unicode_general_category::GeneralCategory::ParagraphSeparator // Zp
        )
    }

    s.chars()
        .map(|c| {
            if is_safe(c) || (keep_newlines && c == '\n') {
                c
            } else {
                '\u{FFFD}'
            }
        })
        .collect()
}

Currently we call this function in every println!/eprintln! call, like this:

let stderr = String::from_utf8_lossy(&result.stderr);
eprintln!(
    "Warning: something failed: {stderr_sanitized}",
    stderr_sanitized = replace_control_chars(&stderr, true)
);

I was wondering if there are better ways to do this?
Couldn't we use the tracing crate for our logging/output and use tracing_subscriber::fmt to register our custom replace_control_chars as a formatter for all output?

do you output these messages from the app code or lib code? if you have full control over the entire code base, then it's no problem.

but if the code that prints these messages is in a library crate that can be used by different applications, then the tracing subscriber may not be configured as you expect.

also, when your application code sets up the formatter, you should consider: do you want to sanitize all log messages, including those from third-party dependencies, or do you only need to sanitize messages from a specific crate/module?

Could you elaborate on this? Do you mean that if someone uses my library they can 'overwrite' the formatter or something?

So for libraries you would stick to println?

Edit: I guess I understand. The library itself cannot enforce if or how tracing is used/configured, right?

yes, that's what I meant.

the application may use whatever formatter for messages, or it may use a custom subscriber without the fmt layer, it may even use a subscriber that does not write the messages to the console at all.

in general, a library (as opposed to an application) should use tracing mainly for instrumentation, not as a replacement for terminal ui output.


Thanks for your reply. What do you recommend in my case? Continue using replace_control_chars in every println, or make a custom macro? Or a custom struct for output?

You could, in principle, replace stdout's and stderr's file descriptors and redirect them to a thread that sanitizes the data and then actually prints it. That isn't the job of a library either; imagine what happens if two versions of your library end up in the dependency tree.

my personal recommendation is to explicitly sanitize strings at the point where you send them to the terminal. it's the same whether you use std::println!() or tracing::warn!().

there are plenty of ways to do it, and I don't think it really matters which one you pick; it's more about personal habits and style. you can use a function, a wrapper type, a macro, etc. for example:

let stderr = ...;

// your original approach using `eprintln!()` and a function:
fn sanitize(s: &str) -> String {
  //...
}
eprintln!("Warning: blahblah: {}", sanitize(&stderr));
//or
tracing::warn!("blahblah: {}", sanitize(&stderr));

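// a small variant: use a wrapper type and implement `Display` for it
use std::fmt::{self, Display, Formatter};
struct Sanitized<T>(T);
impl<'a> Display for Sanitized<&'a str> {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        // can write directly to the `Formatter`'s output,
        // potentially saving an extra `String` allocation
        //...
    }
}
// `&*stderr` yields a `&str` whether `stderr` is a `String` or a `Cow<str>`;
// a plain `&stderr` would make `T` a `&String`/`&Cow<str>`, which has no impl
eprintln!("Warning: blahblah: {}", Sanitized(&*stderr));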

// you can always abstract the whole process into a reusable unit of code:
fn report_failure(result: ???) {
    eprintln!("Warning: blahblah: {}", sanitize(&String::from_utf8_lossy(&result.stderr)));
}
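
to make the wrapper-type variant concrete, here is a minimal sketch. it is not the code from the question: to stay dependency-free it assumes a simplified safety check, using `char::is_control` from the standard library (General Category Cc only) as a stand-in for the broader `unicode_general_category` check. the `keep_newlines` field mirrors the parameter of the original function; the struct layout itself is just one possible choice.

```rust
use std::fmt::{self, Display, Formatter, Write as _};

/// Wrapper whose `Display` impl replaces unsafe characters with U+FFFD
/// while formatting, so no intermediate `String` is built.
struct Sanitized<'a> {
    text: &'a str,
    keep_newlines: bool,
}

/// ASSUMPTION: simplified stand-in for the question's `is_safe`.
/// It only rejects General Category Cc, not Cf/Co/Cn/Zl/Zp.
fn is_safe(c: char) -> bool {
    !c.is_control()
}

impl Display for Sanitized<'_> {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        for c in self.text.chars() {
            let keep = is_safe(c) || (self.keep_newlines && c == '\n');
            // write each character straight into the formatter's output
            f.write_char(if keep { c } else { '\u{FFFD}' })?;
        }
        Ok(())
    }
}
```

usage would look like `eprintln!("Warning: something failed: {}", Sanitized { text: &stderr, keep_newlines: true })`. note that this sketch ignores width and padding flags in the format string.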

note, for maximal efficiency, it is possible to skip all the intermediate String allocations and fuse the utf8 validation and the escape code replacement into a single pass over the raw bytes. but a truly single-pass algorithm needs a lot of manual work, since the standard library has no incremental utf8 decoder that yields single code points (as opposed to chunks of code points, which is what [u8]::utf8_chunks() yields).
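
a middle ground that avoids the intermediate String, but is not the truly single-pass algorithm described above, can be sketched with `utf8_chunks()` (needs a reasonably recent Rust toolchain). the wrapper name `SanitizedBytes` is hypothetical, and `char::is_control` is again an assumed, simplified stand-in for the question's `is_safe`. each valid chunk is still decoded a second time by `chars()`, so this is a validation pass plus a decoding pass, just without allocating.

```rust
use std::fmt::{self, Display, Formatter, Write as _};

/// Hypothetical wrapper: sanitizes raw process output while formatting it.
struct SanitizedBytes<'a>(&'a [u8]);

impl Display for SanitizedBytes<'_> {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        for chunk in self.0.utf8_chunks() {
            // the valid part of the chunk is a `&str`, so no re-validation
            for c in chunk.valid().chars() {
                // ASSUMPTION: simplified check (Cc only), newlines kept
                let keep = !c.is_control() || c == '\n';
                f.write_char(if keep { c } else { '\u{FFFD}' })?;
            }
            // an invalid byte sequence becomes one replacement character,
            // the same policy `String::from_utf8_lossy` uses
            if !chunk.invalid().is_empty() {
                f.write_char('\u{FFFD}')?;
            }
        }
        Ok(())
    }
}
```

with this, `eprintln!("Warning: something failed: {}", SanitizedBytes(&result.stderr))` would replace both the `from_utf8_lossy` call and the separate sanitizing step.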
