What do you think about sharing open file descriptions?

Hello everyone!

The Rust standard library implements a concept of I/O safety, which means that every file descriptor has an owner object, and all operations on the file descriptor should happen through that object. However, in some circumstances, several file descriptors (even in different processes) may refer to the same open file description. On Linux, this may happen in the following cases:

  • Duplicating the file descriptor using the dup3() or fcntl() system calls
  • Creation of a child process
  • Transferring the file descriptor through a UNIX-domain socket
  • Processes with "ptrace-attach" privileges may obtain a copy with the pidfd_getfd() system call

Such sharing may lead to various undesirable effects, such as:

  • Uncontrollable offset change
  • Changing the file status flags (for example, setting or unsetting the O_APPEND flag can lead to writes going to the wrong place)
  • Another process changing the status of file locks that are owned by the open file description (on Linux, this includes open file description locks, BSD-style locks and file leases)
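
The first effect in the list above can be reproduced with the standard library alone. This is only a minimal sketch, assuming a Unix-like system; the helper name offset_after_reading_via_duplicate and the temporary file name are made up for illustration. File::try_clone() duplicates the descriptor, so both File values refer to the same open file description:

```rust
use std::fs::{self, File};
use std::io::{self, Read, Seek};
use std::path::Path;

/// Hypothetical helper: reads `n` bytes through a duplicate of the file
/// handle, then returns the offset observed through the ORIGINAL handle.
fn offset_after_reading_via_duplicate(path: &Path, n: usize) -> io::Result<u64> {
    let mut original = File::open(path)?;
    // try_clone() duplicates the descriptor; both descriptors refer to the
    // same open file description and therefore share a single offset.
    let mut duplicate = original.try_clone()?;

    let mut buf = vec![0u8; n];
    duplicate.read_exact(&mut buf)?;

    // The original handle never read anything, yet its position has moved.
    original.stream_position()
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir()
        .join(format!("shared_offset_demo_{}", std::process::id()));
    fs::write(&path, b"hello world")?;
    let pos = offset_after_reading_via_duplicate(&path, 5)?;
    fs::remove_file(&path)?;
    println!("offset seen through the original handle: {pos}");
    Ok(())
}
```

Here the owner of `original` sees its offset advance by the 5 bytes that were read through `duplicate`, even though it performed no read itself. The same thing happens when the duplicate lives in another process.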

The simplest way of dealing with this is to avoid duplicating file descriptors, to mark every opened file descriptor as "close on exec", and to close file descriptors on the sending side after passing them through a UNIX-domain socket. (I don't touch the topic of pidfd_getfd() here: the program itself cannot control it, so it requires external means such as mandatory access control, and a process that can use it can also alter the target process's memory with ptrace() anyway.)

A more sophisticated approach may be needed when you hold a file descriptor that you know shares an open file description with other processes (for example, if you received it through a UNIX-domain socket and cannot be sure the sending side closed its copy after sending). In that case you can interact with the file descriptor in a way that doesn't affect the other holders: use the absolute-offset system calls (preadv() instead of read() and pwritev() instead of write()), and don't change the file status flags or the file locking state. That may require creating a new struct to use instead of std::fs::File, one that also implements the I/O traits but stores the offset in the struct itself rather than relying on the offset in the open file description.

Using mmap() also avoids relying on the open file description offset, but it carries the risk of file truncation (which may result in delivery of SIGBUS; this may be mitigated with the hw-exception crate) and the unsafety related to shared memory.
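
To make the idea of a struct with its own offset more concrete, here is a rough sketch, assuming a Unix target. The names PositionedFile and demo are hypothetical, and the sketch uses read_at()/write_at() from std::os::unix::fs::FileExt (which map to pread()/pwrite()) rather than the vectored preadv()/pwritev() calls:

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::os::unix::fs::FileExt; // read_at / write_at
use std::path::Path;

/// Hypothetical wrapper that keeps a private offset and never touches the
/// offset stored in the (possibly shared) open file description.
struct PositionedFile {
    file: File,
    pos: u64,
}

impl PositionedFile {
    fn new(file: File) -> Self {
        Self { file, pos: 0 }
    }
}

impl Read for PositionedFile {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.file.read_at(buf, self.pos)?;
        self.pos += n as u64;
        Ok(n)
    }
}

impl Write for PositionedFile {
    // Caveat: on Linux, pwrite() on a descriptor opened with O_APPEND
    // appends to the end of the file regardless of the given offset.
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.file.write_at(buf, self.pos)?;
        self.pos += n as u64;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(()) // nothing is buffered in user space
    }
}

impl Seek for PositionedFile {
    fn seek(&mut self, from: SeekFrom) -> io::Result<u64> {
        let target = match from {
            SeekFrom::Start(p) => Some(p),
            SeekFrom::Current(d) => self.pos.checked_add_signed(d),
            SeekFrom::End(d) => self.file.metadata()?.len().checked_add_signed(d),
        };
        self.pos = target
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "invalid seek"))?;
        Ok(self.pos)
    }
}

/// Writes and reads through the wrapper, then returns the text that was read
/// and the offset of the shared open file description.
fn demo(path: &Path) -> io::Result<(String, u64)> {
    let mut shared = OpenOptions::new()
        .read(true).write(true).create(true).truncate(true)
        .open(path)?;
    let mut private = PositionedFile::new(shared.try_clone()?);

    private.write_all(b"hello")?;
    private.seek(SeekFrom::Start(0))?;
    let mut text = String::new();
    private.read_to_string(&mut text)?;

    Ok((text, shared.stream_position()?))
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir()
        .join(format!("positioned_file_demo_{}", std::process::id()));
    let (text, shared_offset) = demo(&path)?;
    std::fs::remove_file(&path)?;
    println!("read {text:?}, shared offset is {shared_offset}");
    Ok(())
}
```

In demo(), the wrapper writes and reads five bytes while the offset of the shared open file description stays at 0. The sketch only protects the offset; file status flags and locks still belong to the open file description and need the same discipline.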

P.S. I am using Linux and don't know how things work on other operating systems, but I'll be glad to hear your opinions on the topic, including similar problems and solutions on other OSes.

Your question appears quite open-ended, but IMO this kind of sharing is unavoidable for some use cases. And this is something that I/O safety acknowledges.

Note that exclusive ownership of a file descriptor does not imply exclusive ownership of the underlying kernel object that the file descriptor references (also called “file description” on some operating systems). File descriptors basically work like Arc: when you receive an owned file descriptor, you cannot know whether there are any other file descriptors that reference the same kernel object. However, when you create a new kernel object, you know that you are holding the only reference to it. Just be careful not to lend it to anyone, since they can obtain a clone and then you can no longer know what the reference count is! In that sense, OwnedFd is like Arc and BorrowedFd<'a> is like &'a Arc (and similar for the Windows types). In particular, given a BorrowedFd<'a>, you are not allowed to close the file descriptor – just like how, given a &'a Arc, you are not allowed to decrement the reference count and potentially free the underlying object. There is no equivalent to Box for file descriptors in the standard library (that would be a type that guarantees that the reference count is 1), however, it would be possible for a crate to define a type with those semantics.
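
The warning about lending can be illustrated with a small sketch, assuming a Unix target and Rust 1.66 or later for std::os::fd. The function names sneak_a_clone and offset_after_lending are invented for this example. A borrower cannot close the descriptor, but it can duplicate it and keep the duplicate after the borrow ends:

```rust
use std::fs::File;
use std::io::{self, Read, Seek};
use std::os::fd::{AsFd, BorrowedFd, OwnedFd};
use std::path::Path;

/// Hypothetical borrower: it is not allowed to close `fd`, but nothing stops
/// it from creating a new descriptor for the same kernel object.
fn sneak_a_clone(fd: BorrowedFd<'_>) -> io::Result<OwnedFd> {
    fd.try_clone_to_owned()
}

/// Opens a file, lends it out once, and returns the offset the owner sees
/// after the borrower has used the duplicate it kept.
fn offset_after_lending(path: &Path) -> io::Result<u64> {
    // Freshly opened: at this point we hold the only reference.
    let mut file = File::open(path)?;

    // The borrow ends when the call returns, but the duplicate lives on.
    let kept = sneak_a_clone(file.as_fd())?;
    let mut other = File::from(kept);

    let mut byte = [0u8; 1];
    other.read_exact(&mut byte)?;

    // The owner's view of the offset changed without the owner doing any I/O.
    file.stream_position()
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir()
        .join(format!("lend_fd_demo_{}", std::process::id()));
    std::fs::write(&path, b"abc")?;
    let pos = offset_after_lending(&path)?;
    std::fs::remove_file(&path)?;
    println!("owner's offset after lending: {pos}");
    Ok(())
}
```

This mirrors cloning an Arc through a shared reference: once a BorrowedFd has been handed out, the owner can no longer assume it holds the only reference.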

In particular, stdin/stdout/stderr are effectively globals.

Related discussion.
