Is there a way to add to the format string options?

I would like to be able to print directly from a null-terminated "ptr: *const c_char" pointer.

I find that doing format_args!("{:?}", CStr::from_ptr(ptr)) adds an expensive overhead.
I have measured it in the raw LD_PRELOAD interception I am doing, and it can double the time.

It was suggested that I could write directly from ptr: *const c_char into a libc/syscall write.

Additionally, I would like to implement a custom "{X}" format option that prints strptr as a JSON string, "something", escaping characters such as \ and ".

Can this be done?

I am trying to implement this with a zero-copy approach.

Is format_args!() zero-copy?

Is this possible?

CStr::from_ptr calls strlen internally. If you already know the length, you could avoid this overhead by using std::slice::from_raw_parts instead.
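
A minimal sketch of that suggestion, assuming the caller already has the byte length from somewhere else (for example, an intercepted call that also passes a size). The helper names `bytes_with_known_len` and `bytes_via_cstr` are hypothetical and only contrast the two paths:

```rust
use std::ffi::CStr;
use std::os::raw::c_char;
use std::slice;

/// Hypothetical helper: view `len` bytes at `ptr` as a slice without scanning for NUL.
///
/// # Safety
/// `ptr` must be non-null and valid for reads of `len` bytes for the whole lifetime 'a.
unsafe fn bytes_with_known_len<'a>(ptr: *const c_char, len: usize) -> &'a [u8] {
    // No strlen here: the length is supplied by the caller.
    unsafe { slice::from_raw_parts(ptr as *const u8, len) }
}

/// Fallback when the length is unknown: CStr::from_ptr must find the terminator first.
///
/// # Safety
/// `ptr` must point to a valid NUL-terminated string that outlives 'a.
unsafe fn bytes_via_cstr<'a>(ptr: *const c_char) -> &'a [u8] {
    unsafe { CStr::from_ptr(ptr).to_bytes() }
}
```

Both helpers return a borrowed view of the original memory, so neither copies the string; the only difference is whether the terminator scan happens.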

I'm curious why you are calling format_args!. Usually this is not called directly; instead you can call the other macros like write! or print! which call format_args! internally. (This won't change the performance, but might make your code simpler.)

You could create a wrapper type that does escaping inside its Debug or Display implementation. Something like this playground.

You might want to have a separate, zero-copy fast path for the case where no escaping is needed.
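
As a rough sketch of such a wrapper (not the code from the playground link): the type name `JsonStr` and the helper `needs_escape` are hypothetical, and the input is assumed to be UTF-8, with invalid bytes rendered lossily. When the bytes are valid UTF-8 and contain nothing to escape, they are handed to the formatter as a single borrowed str.

```rust
use std::fmt::{self, Write};

/// Hypothetical wrapper: displays the bytes as a double-quoted JSON string.
struct JsonStr<'a>(&'a [u8]);

/// Characters that must be escaped inside a JSON string.
fn needs_escape(c: char) -> bool {
    c == '"' || c == '\\' || (c as u32) < 0x20
}

impl fmt::Display for JsonStr<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Borrows the input when it is valid UTF-8; allocates only for invalid bytes.
        let text = String::from_utf8_lossy(self.0);
        f.write_char('"')?;
        if !text.chars().any(needs_escape) {
            // Fast path: nothing to escape, pass the whole str through at once.
            f.write_str(&text)?;
        } else {
            for c in text.chars() {
                match c {
                    '"' => f.write_str("\\\"")?,
                    '\\' => f.write_str("\\\\")?,
                    '\n' => f.write_str("\\n")?,
                    '\r' => f.write_str("\\r")?,
                    '\t' => f.write_str("\\t")?,
                    c if (c as u32) < 0x20 => write!(f, "\\u{:04x}", c as u32)?,
                    c => f.write_char(c)?,
                }
            }
        }
        f.write_char('"')
    }
}
```

With this in place, something like write!(out, "{}", JsonStr(bytes)) emits the quoted, escaped form without building an intermediate String in the common case.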


format_args! doesn't do anything at run-time. Any potential copying is done by the consumer of the std::fmt::Arguments that it returns.

For example, the format! macro always allocates a new String and formats into that String, so it cannot be zero-copy. But the write! macro formats directly to a writer, so it depends on what the destination Write impl does. For example, using write! with a std::fs::File is zero-copy. Using it with a BufWriter instead will (sometimes) copy into a buffer on the heap in order to make fewer syscalls.

As mentioned before, you can almost always call these macros rather than calling format_args! directly.
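
To make the difference between destinations visible, here is a small sketch. `CountingSink`, `direct` and `buffered` are hypothetical names: the sink just records how many times its `write` method is called. Writing straight into it hands over each formatted piece as it is produced, while BufWriter collects the pieces and passes them on when flushed.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical sink that counts how often `write` is called on it.
struct CountingSink {
    calls: usize,
    bytes: Vec<u8>,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.bytes.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Formats one record straight into the sink; returns (write calls, bytes written).
fn direct(id: u32, path: &str) -> io::Result<(usize, Vec<u8>)> {
    let mut sink = CountingSink { calls: 0, bytes: Vec::new() };
    writeln!(sink, "{} SYMLINK {}", id, path)?;
    Ok((sink.calls, sink.bytes))
}

/// Formats the same record through a BufWriter and flushes it once.
fn buffered(id: u32, path: &str) -> io::Result<(usize, Vec<u8>)> {
    let mut out = BufWriter::new(CountingSink { calls: 0, bytes: Vec::new() });
    writeln!(out, "{} SYMLINK {}", id, path)?;
    out.flush()?;
    let sink = out.get_ref();
    Ok((sink.calls, sink.bytes.clone()))
}
```

Both paths produce identical bytes. With a real file as the destination, each `write` call on the sink would correspond to a syscall, which is the cost the buffered path trades a copy for.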


Thanks for being patient with me.

Each line of data being written can have different pieces of data


A * marks a line whose data is partial because libc::PIPE_BUF was reached; the rest is continued in the next buffered write.

So I was looking for a simple, convenient way/API to implement report_readlink(), report_execvp(), report_symlink(), etc. I currently do this as

pub unsafe fn reportsymlink(&self, target: *const libc::c_char, linkpath: *const libc::c_char) {
    // Collect the fields and serialize them together as one JSON array.
    let args = (utils::ptr2str(target), pathgetabs(linkpath, AT_FDCWD));
    write!(self.file, "{} SYMLINK {}", UUID, &serde_json::to_string(&args).unwrap());
}

If I have to write each field individually, the report*() code can become quite long.

I also need buffered writing where the buffer is thread-local, so that there is zero locking when a multithreaded program invokes my LD_PRELOAD intercept. I buffer because doing a syscall for each individual write seemed expensive when I profiled with Linux perf. So I want to buffer up to libc::PIPE_BUF bytes and then write in blocks of libc::PIPE_BUF.

I started with BufferedWrite, but I read here that it has a single buffer. That means it cannot be shared by writes from multiple threads.

So I am doing my own implementation of BufferedWrite using thread_local!{} buffers. The trouble is that I have yet to figure out how to have thread-local buffer fields within a BufferedWrite instance. So far I have only seen thread_local examples that are globals or class-level statics, not ones that are both thread-local and specific to a BufferedWrite instance. There are at least 2 instances of BufferedWrite, for 2 destination files.

Thanks for this example. That shows me how to implement custom Display code for my structures/data.

By the way, the standard library's formatting system is not really designed for speed. (It's designed first for compile-time safety and flexibility, and second for avoiding code bloat.) I would definitely experiment with lower-level code in areas where I/O speed is super-critical.

The thread_local crate might be useful.

Thanks for the pointer. I will skip that line of thought. More code, I guess. I am wondering if maybe there is a way to define something
Maybe I can write a macro to make this code simpler.

The problem is that the "thread_local" examples I see are still global or class-level,
not instance-specific.

The closest I got was

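use std::cell::RefCell;

struct File;
impl File {
    thread_local! {
        // Could add pub to make it public to whatever File already is public to.
        // RefCell allows the per-thread buffer to be mutated through `with`.
        static FOO: RefCell<Vec<i8>> = RefCell::new(Vec::with_capacity(libc::PIPE_BUF));
    }
}

fn main() {
    // Each thread that runs this sees its own, initially empty, buffer.
    File::FOO.with(|x| println!("{:?}", x.borrow()));
}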

But this would give one buffer per thread shared across the whole "File" structure,
not one per instance of File, which I think is the right level of buffer uniqueness for what I want.

On thread_local performance I saw this

and the reference to


static mut RUST_THREAD_LOCAL: ThreadLocal = ThreadLocal{i: 0};

So I was thinking I should use that instead?

The thread_local crate is different from the std::thread_local! macro or the unstable thread_local feature. You can store its ThreadLocal type in a struct or local variable, rather than a static. Example usage.
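
For comparison, here is a std-only sketch of the same idea (one buffer per thread and per instance) that does not use the thread_local crate. It swaps in a different technique: a std::thread_local! map keyed by a per-instance id. The names `ThreadBuffered`, `BUFFERS`, `NEXT_ID` and `BUF_CAP` are hypothetical, and `BUF_CAP` is a stand-in for libc::PIPE_BUF so that the sketch needs no external crate.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::io::{self, Write};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Stand-in for libc::PIPE_BUF (4096 on Linux); an assumption of this sketch.
const BUF_CAP: usize = 4096;

/// Source of unique ids, one per writer instance.
static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    // One map per thread, keyed by writer id: each (thread, instance) pair gets a buffer.
    static BUFFERS: RefCell<HashMap<usize, Vec<u8>>> = RefCell::new(HashMap::new());
}

/// Hypothetical writer whose buffer is private to the calling thread and this instance.
struct ThreadBuffered<W> {
    id: usize,
    inner: W,
}

impl<W> ThreadBuffered<W>
where
    for<'a> &'a W: Write, // holds for std::fs::File, so writing needs no lock
{
    fn new(inner: W) -> Self {
        ThreadBuffered { id: NEXT_ID.fetch_add(1, Ordering::Relaxed), inner }
    }

    /// Appends `data` to the calling thread's buffer, flushing first if it would overflow.
    fn write_record(&self, data: &[u8]) -> io::Result<()> {
        BUFFERS.with(|cell| -> io::Result<()> {
            let mut map = cell.borrow_mut();
            let buf = map.entry(self.id).or_insert_with(|| Vec::with_capacity(BUF_CAP));
            let mut out = &self.inner;
            if buf.len() + data.len() > BUF_CAP {
                out.write_all(buf.as_slice())?;
                buf.clear();
            }
            if data.len() > BUF_CAP {
                // Oversized record: bypass the buffer entirely.
                out.write_all(data)
            } else {
                buf.extend_from_slice(data);
                Ok(())
            }
        })
    }

    /// Writes out whatever the calling thread has buffered for this instance.
    fn flush_current_thread(&self) -> io::Result<()> {
        BUFFERS.with(|cell| -> io::Result<()> {
            if let Some(buf) = cell.borrow_mut().get_mut(&self.id) {
                let mut out = &self.inner;
                out.write_all(buf.as_slice())?;
                buf.clear();
            }
            Ok(())
        })
    }

    /// Bytes the calling thread currently has buffered for this instance.
    fn buffered_len(&self) -> usize {
        BUFFERS.with(|cell| cell.borrow().get(&self.id).map_or(0, |b| b.len()))
    }
}
```

One limitation of this sketch: a thread can only flush its own buffer, so each thread has to call flush_current_thread before it exits, and buffers left behind by dropped instances are not reclaimed until the thread ends.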