I have a Rust library that writes some tracking data to a file or a Linux pipe.
The library is meant to work in a multi-threaded and multi-process environment, with everyone writing to the same file or pipe.
Each write emits one line of the form
`<uuid> <operation> <somedata>`
I open the file with `File::create(&fname).unwrap()`.
I can limit the length of the line to some maximum value if needed, to make sure it is not interleaved.
I know that when writing to a Linux pipe, writes of PIPE_BUF (4,096 bytes) or fewer are guaranteed not to be interleaved.
I am finding that when I write from Rust and run my test suite without `cargo test -- --test-threads=1`,
I see interleaving of the output as follows.
Basically the writes are interleaved.
With `--test-threads=1` the tests pass.
I could take a Mutex lock around these writes, but I am trying to avoid that.
Questions:
1. Can I expect writes to not be interleaved for regular files?
2. If I can't, how do log files manage to do this? Do they lock? Logs from multi-threaded programs don't seem to be jumbled.
3. Can I expect writes to not be interleaved for linux pipes?
How do you write to the file? Do you use `write!(f, "{} {} {}", uuid, operation, somedata)`? If so, every formatted part is an independent write. You should use `f.write_all(format!("{} {} {}", uuid, operation, somedata).as_bytes())` to format into a string that is written all at once. (I used `write_all` instead of `write` to ensure that if the write is too big to succeed all at once, it uses multiple write calls instead of silently truncating the part after what was successfully written.)
Now things have gotten a little better, in that I get a few successful runs.
And I am only using a max size of 100 bytes: `const SENDLIMIT: usize = 100;`
But it still fails on some runs if I run the test suite a few times, as can be seen in the second-to-last line below.
Note that `write_all` makes multiple write system calls in a loop in the case of partially successful writes, which would get in the way of atomicity. For files, unlike sockets, writes tend to succeed in their entirety (the exception might be if the write is interrupted by a signal?), so it's usually not an issue. I'm not sure about pipes. But it's something to keep in mind.
It's definitely more robust to just use a Mutex to coordinate writing.
I thought the same, but of course these things are tricky and turn out to be more complicated. I found a pretty great answer about the atomicity of the write system call on Stack Overflow -- check it out.
The TL;DR for Linux is:

- For pipes, POSIX requires concurrent writes of PIPE_BUF bytes or fewer to be atomic.
- For regular files, you can expect atomicity if the file is opened with O_APPEND, but non-append writes are likely to be atomic as well.
- HOWEVER, there is technically no guarantee you won't get a short write.
- BUT on the other hand, you almost certainly won't get a short write unless there are extenuating circumstances, such as ENOSPC, a hardware failure, or a bug.
All that said, while the system call itself may technically be atomic, in that concurrent write requests won't clobber each other, this says absolutely nothing about the durability of the written data in the face of sudden power loss. In other words, this is probably an OK thing to rely on for low-overhead debug logging, but probably not the best plan if you're implementing a database. Perhaps that goes without saying??