Formatting and writing to a file without malloc or locks

In my LD_PRELOAD library, written in Rust, I write trace output to a trace file for debugging, and file-system tracking data to a tracker file.
The LD_PRELOAD library will be used across a very large build to track all of its dependencies, and with some changes I run into core dumps or hangs of the main program.

It has been difficult to debug, as the core dump has no stack trace to inspect.
I suspect that the Rust write/format code I am using does a malloc during formatting, and takes locks to prevent interleaving:

TRACER.write_all(format!(concat!("{}: ", $form, "\n"), UUID.as_str(), ).as_bytes());

I am considering creating a thread-local memory buffer, constructing the formatted content in that thread-local memory, and then writing it using direct system calls, to see if that cleans up this unstable behaviour.
I would also ideally like to avoid allocating and deallocating memory every time I write a line of formatted trace output or metadata output.

What is the best way/API in Rust to fully control/avoid memory allocation during formatting and writing of output to a file?
And to avoid locking as well?

It’s format! that allocates. It evaluates to a String, and String is heap-backed.

If you want to avoid allocation, use write! instead.


defmt may be an alternative (or may not, it has its own limitations).

I don't think you can avoid locking without risking corrupted/interleaved output.

You might be able to get away with it by ensuring only one thread ever writes to the file and using channels to send information from the caller to that thread. But even then, channels can lock or allocate.

Are the Rust I/O write APIs even designed to allow writing without any locking at all?

Does the following C code do any locking or mallocing?

fprintf(fp, "My formatted %s message %d", "Hello World", 100);

This doesn't lock either.

I am looking for the equivalent code in Rust, and I have yet to find a good answer. So I started thinking in terms of calling Linux syscalls directly. But then I realized I would have to convert a Rust String to a CString, which would be a malloc. And I would still need to format!() a format string and values somewhere, which would be another allocation, and I haven't found a way around it.

If the above is true, then between writing C code that uses fprintf to write large amounts of formatted data to a file and writing the equivalent program in Rust, the Rust version is going to have a lot of both locking and malloc/free overhead.
I am not hearing a nice alternative to fprintf() in C.

In my case, the LD_PRELOAD will be writing close to 30G of dependency tracking data, and possibly more tracing data, to the 2 files. So the format/write overhead definitely matters. I'm not sure whether the locking does.

The equivalent Rust code with no locks or allocations is:

use std::fs::File;
use std::io::Write;

let fp = File::create("output.txt").unwrap();
write!(&fp, "My formatted {} message {}", "Hello World", 100).unwrap();

So if I understand this correctly, directly writing to a File as in your example performs unbuffered I/O?


That's correct. You can use a BufWriter<File> or a LineWriter<File> instead if you want buffered I/O.


So this has no locking or memory allocation? Nice!
This will do for my case then. Thanks.

A bit of help with a tweak.

I want to define this in a macro, would the following still not alloc or lock?

#[macro_export]
macro_rules! event {
    ($lvl:expr, $form:tt, $($arg:tt)*) => ({
         write!(&TRACEFP, concat!("{}: ", $form, "\n"), UUID.as_str(), $($arg)*);
    });
    ($lvl:expr, $form:tt) => ({
         write!(&TRACEFP, concat!("{}: ", $form, "\n"), UUID.as_str());
    });
}

Also, one behavior I noticed with eprintln!(): to avoid buffering, it ended up making tonnes of syscalls for a single eprintln!() call.
I even have a thread in this forum complaining about that behaviour, and was told that's just how eprintln!() works and there is no way around it.
Can I expect the above code to translate to one syscall that writes the whole string?
Where does the memory to expand/buffer format string with the values come from?

fprintf() is one Linux syscall. Not sure where the memory to expand the format string comes from; the stack, probably.
Is it the same with your example?

Correct.

You need to do the exact same amount of locking in Rust that you do in C. The only difference is Rust's Write trait forces you to do synchronisation (by taking &mut self), while the C APIs let you generate garbled data.

You should be using a lock in the C version, so the two will perform pretty much identically.

To avoid any buffering or hidden allocations, statements like eprintln!("Hello, {}!", name) will perform three writes to STDERR. In C, this would look like:

// const char *name = "Michael";

write(STDERR_FILENO, "Hello, ", 7);
write(STDERR_FILENO, name, strlen(name));
write(STDERR_FILENO, "!\n", 2);

If you want the write to be done in a single syscall, you need a temporary buffer to write the formatted text into, which is then written to STDERR in one hit. But you've just said you don't want the code to do any allocations, so this temporary buffer will need to be allocated by you and cached, or live on the stack and be big enough for all conceivable messages.


No, write! with a format string will generally make multiple syscalls.

If you want to reduce the number of syscalls, you can use a LineWriter or BufWriter which allocates a buffer on the heap. (But unlike format!, this allocates a single buffer and reuses it, instead of allocating and freeing a buffer for every write.)

This depends on the type being formatted, but for built-in types it is generally on the stack. For example, the Display formatting for built-in numeric types uses a 39-byte buffer on the stack.

What is the most efficient way to write the following code in Rust?
In C, I would have created a large enough thread-local buffer once, assembled the whole message (or messages) to be sent in that buffer, and used write() syscalls to write out those blocks.

I am not sure whether I could be doing this any more efficiently in Rust.

I am writing to a Unix pipe, which guarantees that a Linux write system call under a certain size (4096 bytes) is atomic.
Most messages written are only a few hundred bytes.
For the very few messages that can exceed that 4096-byte limit, I have a convention of prefixing each partial message with "*" so that the receiver knows the data is split across multiple blocks. The example below splits one very large data block across 3 entries.
Context: there will be multiple processes, and threads within processes, writing to the same Unix pipe.

<uuid> <command> *<partial-message>
<uuid> <command> *<partial-message>
<uuid> <command>  <partial-message>

The summary of what I need to do is:

  1. construct a JSON rendering of the data I want to write
  2. when writing the rendered JSON message, split it across multiple messages if it is over the 4096-byte limit
  3. print/write that data to the file handle
  4. preferably without a mutex/lock on the Tracker/File object to do 1-3 above, since 1-2 can be done in a thread-local buffer, and 3 is a Linux syscall that, when under 4096 bytes, doesn't interleave and so won't require locking

const SENDLIMIT: usize = 4094;

pub struct Tracker {
    pub file: fs::File,
    pub fd: i32,
}

impl Tracker {
    pub fn new() -> Tracker {
    .......
    }
   pub fn report(&self, op: &str, value: &str) {
        let mut minlen: usize = UUID.as_str().len() + op.len() + 2;
        let mut availen: usize = SENDLIMIT - minlen;
        let mut lenleft = value.len();
        let mut ind = 0;

        while lenleft != 0 {
            let max = if lenleft > availen { lenleft -= availen; ind + availen }
                      else { let x = lenleft; lenleft = 0; ind + x };
            // Mark the chunk with "*" whenever more data remains.
            let contin = if lenleft > 0 { "*" } else { " " };
            if let Err(_e) = (&self.file).write_all(format!("{} {} {}{}\n", UUID.as_str(), op, contin, &value[ind..max]).as_bytes()) {
                // TODO: handle the write error
            }
            ind = max;
            minlen = UUID.as_str().len() + op.len() + 2 + 1;
            availen = SENDLIMIT - minlen;
        }
    }

    pub unsafe fn reportexecvpe(self: &Self, variant: &str, path: *const libc::c_char,
                                argv: *const *const libc::c_char, envcstr: &Vec<CString>) {
        let mut vargv: Vec<&str> = vec![];
        let mut venv: Vec<Vec<&str>> = vec![];
        for i in 0 .. {
            let argptr: *const c_char = *(argv.offset(i));
            if argptr != ptr::null() {
                vargv.push(utils::ptr2str(argptr))
            } else {
                break;
            }
        }
        for i in envcstr.iter() {
            venv.push(i.to_str().unwrap().splitn(2,"=").collect());
        }
        let args = (variant, pathgetabs(path,AT_FDCWD), vargv, venv);
        self.report("EXECUTES", &serde_json::to_string(&args).unwrap());
    }
}

To start, here's a very simplified example showing how you can format and write a maximum of 4096 bytes in a single syscall without malloc or locks:

use std::io::{Cursor, Write};

impl Tracker {
    fn report(&self, op: &str, val: &str) {
        // Create a stack buffer, and a cursor to track our position.
        let mut buf = [0u8; 4096];
        let mut cursor = Cursor::new(&mut buf[..]);
        
        // Format into the stack buffer.
        if let Err(_e) = write!(cursor, "{} {}", op, val) {
            // TODO: handle errors
        }
        
        // Output what we've written to the file in a single syscall.
        let len = cursor.position() as usize;
        (&self.file).write_all(&buf[..len]).unwrap();
    }
}

This Playground has the complete program, including writing to a global Tracker without locking.

I hope this is a useful starting point, but feel free to ask more questions!

Thanks for this example. It helps my understanding.
The Cursor/buf is the stack buffer where we assemble the message before writing it out with one write system call.

  1. If op or val is larger than 4096 bytes, I would have to loop through ranges of val, writing each to the cursor and then to the file.
  2. Say instead of val: &str it is a vector of strings, or a hashmap of strings, and I need to render it into a JSON list of strings or a JSON dict. Do I then have to call `serde_json::to_string(&args).unwrap()`, which will malloc, and then write the result to the cursor? Is there a way to accommodate JSON rendering in this code? I see that serde_json::to_writer() exists, but I'm not sure whether that is a better way of doing this, or how to use it in your example.

This part shouldn't be too hard; you can repeatedly write! into the stack buffer, and flush it to the file after each write! or after several of them. After writing the current buffer contents to the file, you can create a new cursor or use cursor.set_position(0) to rewind the existing one.

Yes, you can serialize directly into the stack buffer with serde_json::to_writer(&mut cursor, value).

However, this will fail if the value you pass is too large to serialize into the fixed-sized buffer you provided. If a single item of your input may be arbitrarily-sized, and you can't break it down into smaller pieces to be serialized separately, then it won't be easy to use serde_json to serialize it without allocations. (It might be possible by creating a custom serde::Serializer implementation, or something.)

You could also consider using a resizable Vec<u8> buffer that lives on the heap, but reusing it so that most writes do not need to do any allocation. This buffer could be thread-local, or it could be owned by the caller of report.

You might also consider using allocating methods like serde_json::to_string and then profiling the code before choosing how to optimize it further. With a good allocator, the cost of allocations may be trivial compared to the costs of things like JSON serialization and I/O.

Here is another hang situation I ran into, and malloc seems to be the root cause.
In this case, one of the intercepted APIs was called from a signal handler, and it ran into a malloc that hung, because the signal was delivered while the main program was already inside malloc.

The learning for me, after many stumbles all relating to malloc within the intercept code, is: malloc - BAD :slight_smile:

I am going to take some of the learning from this thread and see how I can eliminate mallocs in the intercept code.

Attaching to process 812
Reading symbols from /usr/bin/bash...Reading symbols from /usr/bin/bash...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Reading symbols from /nobackup/sarvi/cel7root/lib64/libwisktrack.so...done.
Reading symbols from /lib64/libtinfo.so.5...Reading symbols from /lib64/libtinfo.so.5...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Reading symbols from /lib64/libnss_vas4.so.2...(no debugging symbols found)...done.
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
Reading symbols from /lib64/libresolv.so.2...(no debugging symbols found)...done.
Reading symbols from /lib64/libfreebl3.so...Reading symbols from /lib64/libfreebl3.so...(no debugging symbols found)...done.
(no debugging symbols found)...done.
0x00002b259f7d35bc in __lll_lock_wait_private () from /lib64/libc.so.6
(gdb) bt
#0  0x00002b259f7d35bc in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00002b259f74fb35 in _L_lock_17166 () from /lib64/libc.so.6
#2  0x00002b259f74cb73 in malloc () from /lib64/libc.so.6
#3  0x00002b259eefdfe8 in alloc::alloc::alloc () at /rustc/7e6d6e5f535321c2223f044caba16f97b825009c/library/alloc/src/alloc.rs:80
#4  <alloc::alloc::Global as core::alloc::AllocRef>::alloc ()
    at /rustc/7e6d6e5f535321c2223f044caba16f97b825009c/library/alloc/src/alloc.rs:173
#5  alloc::raw_vec::RawVec<T,A>::allocate_in () at /rustc/7e6d6e5f535321c2223f044caba16f97b825009c/library/alloc/src/raw_vec.rs:186
#6  alloc::raw_vec::RawVec<T,A>::with_capacity_in () at /rustc/7e6d6e5f535321c2223f044caba16f97b825009c/library/alloc/src/raw_vec.rs:161
#7  alloc::raw_vec::RawVec<T>::with_capacity () at /rustc/7e6d6e5f535321c2223f044caba16f97b825009c/library/alloc/src/raw_vec.rs:92
#8  alloc::vec::Vec<T>::with_capacity () at /rustc/7e6d6e5f535321c2223f044caba16f97b825009c/library/alloc/src/vec.rs:362
#9  <&[u8] as std::ffi::c_str::CString::new::SpecIntoVec>::into_vec () at library/std/src/ffi/c_str.rs:383
#10 std::ffi::c_str::CString::new () at library/std/src/ffi/c_str.rs:396
#11 std::sys::unix::os::getenv () at library/std/src/sys/unix/os.rs:516
#12 std::env::_var_os [_ZN3std3env7_var_os1...] () at library/std/src/env.rs:250
#13 0x00002b259eefde86 in std::env::var_os () at library/std/src/env.rs:246
#14 std::env::_var [_ZN3std3env4_var17h7...] () at library/std/src/env.rs:216
#15 0x00002b259ed1edac in std::env::var [_ZN3std3env3var17h50...] (key=...)
    at /rustc/7e6d6e5f535321c2223f044caba16f97b825009c/library/std/src/env.rs:212
#16 0x00002b259eb84f04 in wisktrack::my_waitpid [_ZN9wisktrack10my_wa...] (pid=-1, status=0x7fffcd626f50, options=1) at src/lib.rs:802
#17 0x00002b259eb4e78a in wisktrack::waitpid::{{closure}} [_ZN9wisktrack7waitpi...] ()
    at /users/sarvi/.cargo/git/checkouts/redhook-16e0a55232cdd48a/55ac3da/src/ld_preload.rs:67
#18 0x00002b259eb87e53 in std::panicking::try::do_call [_ZN3std9panicking3tr...] (data=0x7fffcd626df0 "\304nb\315\377\177")
    at /rustc/7e6d6e5f535321c2223f044caba16f97b825009c/library/std/src/panicking.rs:373
#19 0x00002b259eb8971d in __rust_try () from /nobackup/sarvi/cel7root/lib64/libwisktrack.so
#20 0x00002b259eb8682a in std::panicking::try [_ZN3std9panicking3tr...] (f=...)
    at /rustc/7e6d6e5f535321c2223f044caba16f97b825009c/library/std/src/panicking.rs:337
#21 0x00002b259eb5c750 in std::panic::catch_unwind [_ZN3std5panic12catch...] (f=...)
    at /rustc/7e6d6e5f535321c2223f044caba16f97b825009c/library/std/src/panic.rs:394
#22 0x00002b259eb8513c in wisktrack::waitpid [waitpid] (pid=-1, status=0x7fffcd626f50, options=1)
    at /users/sarvi/.cargo/git/checkouts/redhook-16e0a55232cdd48a/55ac3da/src/ld_preload.rs:67
#23 0x0000000000440b84 in waitchld.isra ()
#24 0x0000000000441418 in sigchld_handler ()
#25 <signal handler called>
#26 0x00002b259f749602 in _int_malloc () from /lib64/libc.so.6
#27 0x00002b259f74cb1c in malloc () from /lib64/libc.so.6
#28 0x000000000046a895 in xrealloc ()
#29 0x0000000000449fd7 in command_substitute ()
#30 0x00000000004501a4 in expand_word_internal ()
#31 0x00000000004512bc in call_expand_word_internal.constprop ()
#32 0x000000000045175a in expand_string_assignment ()
#33 0x000000000044b49e in expand_string_if_necessary ()
#34 0x000000000044b89f in do_assignment_internal ()
#35 0x0000000000452416 in expand_word_list_internal ()
#36 0x0000000000430c20 in execute_simple_command ()
#37 0x0000000000432343 in execute_command_internal ()
#38 0x0000000000433d2e in execute_command ()
#39 0x0000000000432ba3 in execute_command_internal ()
#40 0x0000000000433d2e in execute_command ()
#41 0x000000000041e365 in reader_loop ()
#42 0x000000000041c9ce in main ()
(gdb) 

Any function that returns a String is going to call malloc. You'll want to call it once in advance and store the result, probably in a OnceCell.