Eprintln!() generates tonnes of write() system calls in strace

I have a macro

#[macro_export]
macro_rules! event {
    ($lvl:expr, $($arg:tt)*) => ({
        eprintln!($($arg)*);
    })
}

Which gets called as follow, in a library constructor function a follows

#[ctor]
fn cfoo() {
    event!(Level::INFO, "Constructor: {}\n", std::process::id());
}

When straced, this is what I see. I was expecting a single write() system call.

write(2, "Constructor: ", 13)           = 13
write(2, "41302", 5)                    = 5
write(2, "\n\n", 2)                     = 2

It gets far worse for

    event!(Level::INFO, "Incoming Environment: {:?}\n", env::vars_os().map(|(x,y)| x.to_str().unwrap().to_owned()).collect::<Vec<String>>());

becomes

write(2, "Incoming Environment: ", 22)  = 22
write(2, "[", 1)                        = 1
write(2, "\"", 1)                       = 1
write(2, "LD_PRELOAD", 10)              = 10
write(2, "\"", 1)                       = 1
write(2, ", ", 2)                       = 2
write(2, "\"", 1)                       = 1
write(2, "RUST_BACKTRACE", 14)          = 14
write(2, "\"", 1)                       = 1
write(2, ", ", 2)                       = 2
write(2, "\"", 1)                       = 1
write(2, "TERM", 4)                     = 4
write(2, "\"", 1)                       = 1
write(2, ", ", 2)                       = 2
write(2, "\"", 1)                       = 1
write(2, "HOME", 4)                     = 4
write(2, "\"", 1)                       = 1
write(2, ", ", 2)                       = 2
write(2, "\"", 1)                       = 1
write(2, "PATH", 4)                     = 4
write(2, "\"", 1)                       = 1
write(2, ", ", 2)                       = 2
write(2, "\"", 1)                       = 1
write(2, "USER", 4)                     = 4
write(2, "\"", 1)                       = 1
write(2, ", ", 2)                       = 2
write(2, "\"", 1)                       = 1
write(2, "WISK_TRACE", 10)              = 10
write(2, "\"", 1)                       = 1
write(2, ", ", 2)                       = 2
write(2, "\"", 1)                       = 1
write(2, "WISK_TRACK", 10)              = 10
write(2, "\"", 1)                       = 1
write(2, ", ", 2)                       = 2
write(2, "\"", 1)                       = 1
write(2, "WISK_CONFIG", 11)             = 11
write(2, "\"", 1)                       = 1
write(2, ", ", 2)                       = 2
write(2, "\"", 1)                       = 1
write(2, "WISK_WSROOT", 11)             = 11
write(2, "\"", 1)                       = 1
write(2, "]", 1)                        = 1
write(2, "\n\n", 2)                     = 2

Is this expected and normal?
How can I make this more efficient.

You can write to a global once_cell::sync::Lazy<Mutex<LineWriter<Stderr>>> to buffer stderr line by line.

1 Like

Does this mean, that is normal and expected that 1 eprintln!() command will translate to so many write system calls in strace ?

Yes, it's intended behaviour that stderr is not buffered. The reasoning is that the standard error needs to be seen by the user as soon as possible, should the program crash or get stuck for a while. It is also not generally a performance concern as masses amounts of data is not typically transferred across it.

9 Likes

To be precise, there are two issues that combine here - I/O buffering, vs. the implementation of formatted output.

stderr should indeed not be buffered, but the reasons for that don't call for separate output events for each punctuation element as it's computed, and indeed you'd get only one write with C fprintf.

1 Like

I certainly consider this a bug in the way rust eprintln is implemented.

I had to implement something like this, instead of using eprintln!()

#[macro_export]
macro_rules! event {
    ($lvl:expr, $form:tt, $($arg:tt)*) => ({
        WISKTRACE.write_all(format!(concat!("{}: ", $form, "\n"), UUID.as_str(), $($arg)* ).as_bytes());
        // eprintln!($($arg)*);
    });
    ($lvl:expr, $form:tt) => ({
        WISKTRACE.write_all(format!(concat!("{}: ", $form, "\n"), UUID.as_str(), ).as_bytes());
        // eprintln!($($arg)*);
    });
}

It’s a matter of tradeoffs: your version is more performant, but will produce less information than it could have if the Debug or Display implementation of an argument is faulty. Neither choice is universally superior, and the libs team chose to favor robustness over performance here.

3 Likes

Also this version allocate, which is not always possible in error cases.

Given the name "events" these are probably usually relatively short, right?

You could try changing the macro to write to a local fixed-size buffer than write that buffer to the output all at once.

1 Like

I just cut and paste a wrapper macro i had coded as a whole which is meant for more later.
My main point being, I had to format!() to collect and build the args writeall to file/stderr.
To avoid allocate, I could create a thread local buffer and build/format the message in there before I write it out.

The key messsage, anyway you look at it, it makes no sense that a single eprintl() translates to like more that 30 system calls and writing as little as 1 or 2 characters each.

By comparison, C printf to STDERR would be one system call.

Indeed, the system call sequence in sarvi's example goes far beyond a separate write for each object that could crash in formatting. But to account for those as well - is there a reasonably economical way to catch errors in Debug or Display, and then output the whole thing with all the items that formatted successfully, in one write()? That seems like a more desirable outcome as well as more efficient.

:100:

For instance:

//! Assuming this is at the module `some_path` of your crate

pub(in crate)
fn buffered_eprint (fmt: ::core::fmt::Arguments<'_>)
{
    use ::once_cell::sync::OnceCell;
    use ::std::{
        io::{stderr, Stderr, LineWriter, Write},
        sync::Mutex,
    };
    // println!("called buffered eprint");
    static STDERR: OnceCell<Mutex<LineWriter<Stderr>>> = OnceCell::new();
    let _ =
        STDERR
            .get_or_init(|| Mutex::new(LineWriter::new(stderr())))
            .lock()
            .unwrap_or_else(|err| err.into_inner())
            .write_fmt(fmt)
    ;
}

macro_rules! my_eprint {
    ( $($input:tt)+ ) => (
        crate::some_path::buffered_eprint(format_args!($($input)*))
    );
    () => (crate::some_path::eprint!(""));
}
pub(in crate) use my_eprint as eprint;

macro_rules! my_eprintln {
    ( $($input:tt)+ ) => (
        crate::some_path::eprint!("{}\n", format_args!($($input)*))
    );
    () => (crate::some_path::eprintln!(""));
}
pub(in crate) use my_eprintln as eprintln;

macro_rules! event {
    ($lvl:expr, $($arg:tt)*) => ({
        crate::some_path::eprintln!($($arg)*);
    });
}
pub(in crate) use event;

And then you can use

  • either crate::some_path::event!

  • or event!, if you previously have #[macro_use] mod some_path;

This seems to be using mutex. Would it still be efficient I a multithreaded environment.
Shouldnt the storage used for buffer before writing be thread local?

1 Like

eprintln! already uses a mutex (or equivalent) over stderr such that writes aren't interleaved. (That's what stderr().lock() is. Not locking stderr just does a short-lived lock on each write.)

If you want to keep stderr from interleaving, you will need to use a mutex.

makes sense.

That said, println!() and eprintln!(....) is the simplest way to get out to stdout and stderr and the fact that the default implementation is so inefficient in terms of how many sys calls it makes to do its job, makes it almost unusable in a lot of cases.

I am bit surprised I am not hearing agreement that implementation needs to be fixed in the first place.

I think what you're hearing in part is unsurprise that the simplest way isn't always the right one.

There are many, many cases where the easy ones are the wrong one -- another obvious one is that IIRC they .unwrap() all the errors, so there are people who don't use them if their program is likely to be used in a shell where the output can go to a closed pipe. But does that mean it's wrong of them to panic on errors? Probably not.

2 Likes

Are you saying eprintln!() isn't the right choice for diagnostic output? I think that's what sarvi referred to as the simplest way. I'm surprised so many people are unsurprised, if so. Maybe I'm just an old timer too used to C, where stdio buffered output works efficiently and you're rather encouraged to use it.

Depends what "diagnostic output" means to you.

Other things that might be appropriate include

1 Like

It does seem surprisingly inefficient. (I guess the reason for doing it that way is to avoid the possibility of having to allocate a buffer, especially if the line turns out to be unexpectedly long.) But we already knew that Rust stdout/stderr handling is inefficient because it has to lock a mutex each time.

If you don't care about speed, it is fine. If you do care about speed, you're going to want to output blocks of data (whole bunches of lines at a time) and not flush after every newline anyway, and you'll probably grab the lock for the entire duration of your output. So even without the multiple writes and the mutex (even if those overheads didn't exist somehow), you'd still know that println! can't possibly give you maximum performance because it flushes every line.

So perhaps this explains why no-one is concerned, because they're already doing it another way if they need maximum speed.

3 Likes

That pretty much describes me— I learned long ago (before Rust) that writing to the console is slow no matter how you try to do it, and so I avoid doing logging in the hot path if at all possible. I certainly wouldn’t mind a more efficient implementation, but neither do I consider poor performance here a bug— in every language I’ve used stderr has been unsuitable for performance-critical sections.

2 Likes