Atomic file operations?

How many ways can a write operation fail?

I know a few:

  1. Hardware failure
  2. Out of space
  3. Filesystem bug
  4. Unmounted disk

Are there more?

And is there any guarantee that multiple write operations either all succeed or all fail, and hence are atomic?

For example:

    let ops1 = file.write_at(&[0u8; 10], 0);
    let ops2 = file.write_at(&[1u8; 10], 10);
    let ops3 = file.write_at(&[2u8; 10], 20);

Here we are writing some data with three write operations. Either all of them (ops1, ops2, ops3) should succeed, or all of them should fail.

From Wikipedia:

An atomic transaction is an indivisible and irreducible series of database operations such that either all occurs, or nothing occurs.[1] A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright.

Look at the ERRORS section of the write(2) man page for an idea.
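In Rust, each of those errno values surfaces as an io::Error — a minimal sketch, assuming a Unix std::fs::File (report_write is an illustrative name):

    use std::fs::File;
    use std::os::unix::fs::FileExt;

    // Each write call returns its own io::Result; on failure, the
    // underlying errno (ENOSPC, EIO, EBADF, ...) from the ERRORS
    // section of write(2) is available via raw_os_error().
    fn report_write(file: &File) {
        match file.write_at(&[0u8; 10], 0) {
            Ok(n) => println!("wrote {n} bytes"),
            Err(e) => eprintln!("write failed, errno {:?}: {e}", e.raw_os_error()),
        }
    }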

Well, any write operation either succeeds or fails. I don't quite get what the question is here.

Could you please clarify what you mean by atomicity here?


I updated my question for more clarity.

Only some of them may be applied. The kernel or the hardware may also reorder the writes for better performance, so for example only the writes at offsets 0 and 20 may have been committed, while the write at offset 10 got interrupted by hardware failure or a crash.
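If the order in which the writes reach the disk matters, the usual (and slow) workaround is an explicit flush between them — a minimal sketch, assuming a Unix std::fs::File (ordered_writes is an illustrative name):

    use std::fs::File;
    use std::io;
    use std::os::unix::fs::FileExt;

    // Force the write at offset 0 to reach the device before the write
    // at offset 10 is issued, so a crash can never leave the second
    // write on disk without the first. This is slow: each sync_data
    // (fdatasync) waits for the device to acknowledge the data.
    fn ordered_writes(file: &File) -> io::Result<()> {
        file.write_all_at(&[0u8; 10], 0)?;
        file.sync_data()?; // barrier between the two writes
        file.write_all_at(&[1u8; 10], 10)?;
        file.sync_data()
    }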


while the write at offset 10 got interrupted by hardware failure or a crash.

What kind of crash?

A crash of the whole system. For example, someone pressed the power button, or a bug in the kernel (or a random bitflip due to bad RAM or by chance) caused a kernel panic.


The short answer is no. For starters, the first two writes may succeed while the third fails. Then, as @bjorn3 mentioned, writes may be reordered by the kernel. Thirdly, on a network file system (NFS), any operation may randomly fail due to network errors. There are probably other reasons as well, but these are the ones off the top of my head.
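To make the first point concrete: each call returns its own io::Result, and an error in the third write does not undo the first two. A sketch based on the snippet from the question (three_writes is an illustrative name):

    use std::fs::File;
    use std::io;
    use std::os::unix::fs::FileExt;

    // With `?`, execution stops at the first failing write, but nothing
    // rolls back the writes that already succeeded.
    fn three_writes(file: &File) -> io::Result<()> {
        file.write_all_at(&[0u8; 10], 0)?;  // may succeed
        file.write_all_at(&[1u8; 10], 10)?; // may succeed
        file.write_all_at(&[2u8; 10], 20)?; // may fail, e.g. with ENOSPC --
                                            // the first 20 bytes stay written
        Ok(())
    }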

Read this article and weep. This part is especially telling:

The CrashMonkey developers did a bunch of testing on filesystem behavior after crashes, then brought some of the problems they found to the Btrfs developers, who said that they were not bugs. So the CrashMonkey folks wanted to document the expected behavior, then test and file bugs for filesystems that did not conform; that didn't work either, he said. He said it resulted in a long discussion between Dave Chinner, Ted Ts'o, and the Btrfs developers about the expected behavior, but there was a concern about committing to the existing behavior.


https://danluu.com/deconstruct-files/

It would be better if you started with a list of the things you actually need to protect against, based on the things in that article, for example. Then we can start suggesting appropriate solutions.


If the bytes written are within the same page, and assuming you use a file system which handles crashes properly, maybe it's possible to

  • read the entire page,
  • make your modifications,
  • and then write the entire page in a single write operation.

I would try to get rid of any buffering mechanisms, such that what you do is pretty much what write in C does. I don't think that guarantees atomicity, but it might be atomic in practice?
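A minimal sketch of that read-modify-write idea, assuming a Unix std::fs::File and a 512-byte chunk (CHUNK and patch_chunk are illustrative names; whether the single write at the end is actually atomic is exactly what's in question):

    use std::fs::File;
    use std::io;
    use std::os::unix::fs::FileExt;

    const CHUNK: usize = 512; // assumed sector/page size

    // Read a whole aligned chunk, patch it in memory, and write the
    // whole chunk back with a single positioned write.
    fn patch_chunk(file: &File, chunk_index: u64, at: usize, data: &[u8]) -> io::Result<()> {
        assert!(at + data.len() <= CHUNK);
        let offset = chunk_index * CHUNK as u64;
        let mut buf = [0u8; CHUNK];
        file.read_exact_at(&mut buf, offset)?;          // read the entire chunk
        buf[at..at + data.len()].copy_from_slice(data); // make the modifications
        file.write_all_at(&buf, offset)                 // one write of the chunk
    }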

P.S.: See also https://wiki.postgresql.org/wiki/FreeBSD/AtomicIO for some thoughts on that.

P.P.S.: I guess I should have said "sector" instead of "page". But I also see that file systems operate on "blocks"/"clusters" (the terminology differs between file systems), which are not equal to the sector size.

Generally, there is no filesystem-level atomic operation AFAIK.

The only thing that could act like an atomic operation would be writing a single sector at once; even a single cluster (the smallest chunk of data the filesystem reads or writes) spans multiple sectors and therefore requires multiple write operations (one per sector).

So if you use a low-level API to write a single sector, then it should be possible. But this is a very specific scenario; generally speaking you shouldn't deal with sectors but with the OS-provided clusters, which, as I said, cannot be atomic by design.
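For reference, the low-level route on Linux looks roughly like this — a sketch, assuming the libc crate, a 512-byte logical sector size, and O_DIRECT semantics (aligned buffer, offset, and length); the names are illustrative:

    use std::fs::OpenOptions;
    use std::io::{self, Write};
    use std::os::unix::fs::OpenOptionsExt;

    const SECTOR: usize = 512; // assumed logical sector size

    // O_DIRECT bypasses the page cache and requires the buffer, offset
    // and length to all be sector-aligned.
    #[repr(align(512))]
    struct AlignedSector([u8; SECTOR]);

    fn write_one_sector(path: &str, data: [u8; SECTOR]) -> io::Result<()> {
        let mut file = OpenOptions::new()
            .write(true)
            .custom_flags(libc::O_DIRECT) // Linux-specific open(2) flag
            .open(path)?;
        let buf = AlignedSector(data);
        file.write_all(&buf.0)?; // one sector at offset 0, aligned in memory
        file.sync_data()
    }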

Also, even if the operation succeeds, it does not mean the data was actually written: it can still be lost if the kernel crashes or the computer loses power, because data to be written can sit in a cache before being actually written, be it in the OS's memory or in the drive's own volatile write cache.
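The usual mitigation is to flush explicitly before reporting success — a sketch, assuming a Unix std::fs::File (durable_write is an illustrative name):

    use std::fs::File;
    use std::io;
    use std::os::unix::fs::FileExt;

    // Don't report success until the kernel has told the device to
    // persist the data: sync_data (fdatasync) flushes the file contents,
    // sync_all (fsync) also flushes metadata such as the file length.
    // Even then, a drive whose volatile write cache ignores flush
    // commands can still lose the data.
    fn durable_write(file: &File) -> io::Result<()> {
        file.write_all_at(&[1u8; 10], 0)?;
        file.sync_all()
    }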

That would require querying the sector size of the underlying disk, which I don't think you can easily do.
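For what it's worth, Linux does expose the logical sector size via an ioctl, but only on the block device node itself, not on a regular file, which rather supports the point — a sketch, assuming the libc crate:

    use std::fs::File;
    use std::io;
    use std::os::unix::io::AsRawFd;

    // BLKSSZGET reports the logical sector size, but only when called on
    // the block device node (e.g. /dev/sda); on a regular file the ioctl
    // fails with ENOTTY. Mapping a file back to its device is a separate
    // problem, which is what makes this hard.
    fn logical_sector_size(dev: &File) -> io::Result<libc::c_int> {
        let mut size: libc::c_int = 0;
        let rc = unsafe { libc::ioctl(dev.as_raw_fd(), libc::BLKSSZGET as _, &mut size) };
        if rc == -1 {
            return Err(io::Error::last_os_error());
        }
        Ok(size)
    }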

Not even remotely close. NTFS stores small files (less than about 900 bytes) in the directory entry of the MFT, but files larger than that are stored in a separate place on the SSD/HDD; Btrfs uses a CoW approach, which means that if you read something from the SSD/HDD and then write that same value back, it still goes to an entirely different place on the SSD/HDD. And so on.

And all that sits, in a modern system, on top of yet another filesystem, the FTL (flash translation layer).

This wouldn't help, because many modern filesystems don't store sectors on disk as-is. Sure, there are O_DIRECT and other such things, but unless you are writing a database it's a really bad idea to deal with all that.

And not even that is guaranteed. 512e sectors, shingled HDDs, and other "modern inventions" make even that simple thing not 100% guaranteed.

I'm really no expert on this, but atomicity and flushing (i.e. guaranteeing that data has been written out before you report it as written) are two separate things. The OP (@greanleaf) asked about "atomicity" in the context of multiple writes either all succeeding or all failing.

We're not talking about concurrent access here, but about atomicity in regard to the failure modes listed in the original question (hardware failure, out of space, filesystem bug, unmounted disk).

Actually, now that I'm thinking about it: causes 1 and 3 (hardware failure and a filesystem bug) could basically result in anything; you can't give any guarantees there. But I could imagine that on power failure, for example, some file systems (e.g. journaling filesystems or copy-on-write approaches such as ZFS) would result in either all 512 bytes or zero bytes being written when you write less than 512 bytes within a single 512-byte-aligned chunk.

In which cases can, for example, writing 4 bytes within the first 512 bytes of a file (using the C write call) result in only 1, 2, or 3 of those bytes being written when you use a file system like ZFS? Is there really such a case? I would doubt it, but I might be totally wrong.

P.S.: So maybe the term "atomicity" is/was misleading here, depending on what is actually meant.
