I found a problem with blocking opens of FIFO files under concurrency. If a FIFO is opened for reading, opened for writing, and deleted concurrently, and the opens are blocking, there is a chance that one side gets stuck in the open call, especially the write side. Because the file has been deleted, the read side can never reach it, so the writer hangs forever. I ran the same test in Go and the problem is still there. Looking at the kernel stack, the open is always stuck at
" wait_for_partner+0x19/0x50"
I don't know whether this is by design in the kernel (this is on a Linux 6.x kernel). Can Rust avoid this situation?
use std::{fs::OpenOptions, path::PathBuf};

use nix::{sys::stat::Mode, unistd::mkfifo};

#[tokio::main]
async fn main() {
    let fifo_path = "/tmp/";
    let mut cnt = 1;
    loop {
        let std_in = PathBuf::from(fifo_path).join("stdin-fifo");
        // Ignore the error if the FIFO already exists.
        mkfifo(&std_in, Mode::S_IRWXU).unwrap_or_default();
        let std_in2 = std_in.clone();
        let std_in3 = std_in.clone();
        // Task 1: blocking open of the read end.
        let r2 = tokio::task::spawn(async move {
            println!("open read fifo start");
            let _ = OpenOptions::new()
                .read(true)
                .open(&std_in2)
                .map_err(|e| println!("err {:?}", e));
            // tokio::fs::OpenOptions::new().read(true).open(&std_in2).await;
            println!("open read fifo end");
        });
        // Task 2: unlink the FIFO path while the opens are in flight.
        let r3 = tokio::task::spawn(async move {
            println!("remove fifo start");
            let _ = std::fs::remove_file(&std_in3);
            // tokio::fs::remove_file(&std_in3).await;
            println!("remove fifo end");
        });
        // Task 3: blocking open of the write end (creates a file if the path is gone).
        let r1 = tokio::task::spawn(async move {
            println!("open write fifo start");
            let _ = OpenOptions::new()
                .create(true)
                .write(true)
                .open(&std_in)
                .map_err(|e| print!("err {:?}", e));
            // tokio::fs::OpenOptions::new().create(true).write(true).open(&std_in).await;
            println!("open write fifo end");
        });
        let _ = r1.await;
        let _ = r2.await;
        let _ = r3.await;
        println!("success {}", cnt);
        cnt += 1;
    }
}
Because you do not delete files on a *nix system. You delete a file path instead, and that is very different from deleting a file.
On *nix, files are reference-counted objects that live inside the kernel. References to a file can be held by filesystem paths and by processes. The same file can even exist under multiple filesystem paths at the same time if you hard link them. A file is deleted when its refcount reaches zero.
So why doesn't the kernel inform you that the file is deleted? Because it's not deleted: your process still holds a reference to it. It can even be read or written again after its path is deleted, by your own process, potentially from other threads. You can even mount that pathless file again at some arbitrary file path.
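To make the refcount point concrete, here is a minimal sketch using only the standard library on a regular file. The function name read_after_unlink is made up for illustration. It writes some bytes, unlinks the path, and then reads the bytes back through the handle that is still open.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Hypothetical helper: returns (path still exists?, bytes read back after unlink).
fn read_after_unlink(path: &Path, data: &[u8]) -> io::Result<(bool, Vec<u8>)> {
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;
    file.write_all(data)?;

    // Removes the name only; the open handle keeps the file itself alive.
    fs::remove_file(path)?;
    let path_exists = path.exists();

    // The data is still reachable through the handle we hold.
    file.seek(SeekFrom::Start(0))?;
    let mut buf = Vec::new();
    file.read_to_end(&mut buf)?;
    Ok((path_exists, buf))
}
```

The path is gone as soon as remove_file returns, but the kernel only frees the file once the last handle is dropped.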
Thanks. So how can we avoid this situation? The code above is just a demo. In reality, two processes interact: one process deletes the FIFO file after a request times out, while the other process gets stuck in the FIFO open. We hit this in real environments.
Well, what is the problem? That sounds like a fundamental concurrency issue with the file system as global state. If you don't want to get stuck on an operation that will never end, maybe add a timeout?
Thank you, but I don't really want to add a timeout. That was my first idea when I saw this issue, so I was hoping to find better solutions in the community. Can modifying the kernel solve this problem? Or can Rust solve it inside its OpenOptions wrapper?
Sorry, I was wrong to say it can't solve my problem; I had not actually tried it. open_receiver is non-blocking, so it does resolve this problem. The problem occurs because this FIFO is opened in blocking mode.
I've checked the FIFO documentation, the fifo(7) Linux manual page. But I think it's difficult to avoid this situation in highly concurrent asynchronous scenarios.
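For reference, the non-blocking behaviour described in fifo(7) can be sketched with the standard library alone. On Linux, opening a FIFO write-only with O_NONBLOCK fails with ENXIO while no reader has it open, instead of waiting. The helper name open_writer_with_deadline and the 10 ms polling interval are assumptions for illustration, and the two constants are hard-coded Linux values for x86_64 and aarch64; real code should take them from the libc crate.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;
use std::thread;
use std::time::{Duration, Instant};

// ASSUMPTION: Linux values on x86_64/aarch64; prefer libc::O_NONBLOCK and libc::ENXIO.
const O_NONBLOCK: i32 = 0o4000;
const ENXIO: i32 = 6;

/// Hypothetical helper: open the write end of an existing FIFO without hanging forever.
fn open_writer_with_deadline(path: &Path, timeout: Duration) -> io::Result<File> {
    let deadline = Instant::now() + timeout;
    loop {
        // No create(true): a missing path must be reported, not papered over.
        match OpenOptions::new().write(true).custom_flags(O_NONBLOCK).open(path) {
            Ok(file) => return Ok(file),
            // ENXIO: the FIFO exists but no reader has opened it yet.
            Err(e) if e.raw_os_error() == Some(ENXIO) => {
                if Instant::now() >= deadline {
                    return Err(io::Error::new(
                        io::ErrorKind::TimedOut,
                        "no reader opened the FIFO in time",
                    ));
                }
                thread::sleep(Duration::from_millis(10));
            }
            // Anything else, including NotFound once the path has been unlinked.
            Err(e) => return Err(e),
        }
    }
}
```

Because every retry looks the path up again, a FIFO deleted by the other process shows up as NotFound on the next attempt rather than as a hang. The returned handle stays in non-blocking mode, so later writes can fail with WouldBlock, and since the loop sleeps it should not run directly on a tokio worker thread.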
Rust cannot overrule the fundamental semantics of the system interfaces your program calls, no. It can (and does) provide safeguards around those system interfaces to protect your program from certain kinds of mistake, but it can't change the rules entirely.
I don't know if this is the mechanism of the kernel or what?
Any kind of file in Unix, FIFOs included, exists for as long as it either has a name or is open. Unlinking all the names to a file ("deleting the file" as most people think of it) will not actually delete the file from disk or from the kernel's internal bookkeeping, until all open handles to that file are closed. However, unlinking a name frees the name up to be used for some other file.
Your program creates a named FIFO at the path /tmp/stdin-fifo. It then tries to do three things concurrently, in no particular order:
Open the path /tmp/stdin-fifo in read mode (failing with an error if no file exists).
Open the path /tmp/stdin-fifo in write mode (creating an empty file there if no file exists).
Unlink the path /tmp/stdin-fifo.
This is not a sensible collection of operations, but the outcomes are at least well-defined. Unfortunately, depending on the sequence in which those tasks are completed, many of the possible outcomes include deadlocks: it is possible for the task trying to open the path for reading to open the FIFO and block until the FIFO is opened for writing, while the writer finds no file there, creates a new, empty regular file, and opens that instead. It is also possible for the inverse to happen, with the writer opening the FIFO and blocking while it waits for a reader, and the reader task finding no file there and failing with an error, so no reader ever arrives.
As I do not understand what you're trying to use FIFOs for in this situation, I can't give you advice on what to do, but I can see some immediate suggestions.
First and foremost, do not try to delete the FIFO concurrently with work to open the FIFO. There's no reason to do that, it is almost certainly incorrect no matter what problem you are trying to solve, and the best-case outcome for that is errors and deadlocks. Delete the FIFO only when your program is entirely finished using it, or at least only after opening both ends of it.
Second, do not use .create(true) when opening a file you expect to already exist. This will mask errors and produce file handles connected to newly-opened normal files, and not connected to whatever you expected to find at that path. In tandem with the first point, this is a serious contributor to the deadlocks your program is encountering.
Third, do not use synchronous IO, such as std::fs, inside of an asynchronous task. When a synchronous IO operation blocks, the whole thread is paused, which prevents the async runtime (tokio, in your case) from using it for other tasks. You don't have this option with mkfifo, so you may need to wrap that call in spawn_blocking to avoid blocking the tokio worker while it runs, but in a toy program you can just call the function and let the worker block for the tiny interval that mkfifo takes in practice.
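As a rough illustration of the second suggestion, the sketch below shows what .create(true) hands back when the FIFO path has already been unlinked: a handle to a brand-new regular file. Both function names are hypothetical, and the check uses is_fifo from std::os::unix::fs::FileTypeExt on the open handle rather than on the path, so it cannot race with a later unlink.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::os::unix::fs::FileTypeExt;
use std::path::Path;

/// What the demo's writer does: silently creates a regular file if the path is missing.
fn open_writer_with_create(path: &Path) -> io::Result<File> {
    OpenOptions::new().create(true).write(true).open(path)
}

/// Hypothetical check: is this open handle actually connected to a FIFO?
fn handle_is_fifo(file: &File) -> io::Result<bool> {
    // Inspects the open file itself, not whatever the path names right now.
    Ok(file.metadata()?.file_type().is_fifo())
}
```

Dropping .create(true) turns the missing-path case into a NotFound error, which is the signal the writer needs in order to stop waiting.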
Thank you for your help, but this issue is difficult to avoid from the perspective of software code design.
Imagine two processes, A and B. Process A creates a FIFO file, and then process B blocks while opening that FIFO for reading. At this point, process A may never reach the code that opens its end of the FIFO, because of a timeout or a cancellation. Process A then cleans up its resources and deletes the FIFO file, which leaves process B blocked forever. On top of that, if process A runs lsof against the FIFO file during cleanup, lsof does not show that process B is using it.
This is not something you can solve with just file system primitives. You probably want the first messages in your FIFO to assert that all processes can read and write to it. Then, once both sides acknowledge that they are ready, unlink the file path.
If having a path on the filesystem is not a useful property in your system, you might want to consider the pipe system call as a way to open a channel without a name. mkfifo is more useful when a pipe needs to connect two unrelated programs, or where it needs to substitute for a normal file in some way.
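For related processes, one way to get such a nameless channel from the standard library is Stdio::piped(), which sets up anonymous pipes between a parent and the child it spawns. The sketch below is only an illustration: the function name is made up, and using cat as the child is an assumption standing in for the real peer process.

```rust
use std::io::{self, Read, Write};
use std::process::{Command, Stdio};

/// Hypothetical helper: send `input` to a child over an anonymous pipe and read its reply.
/// Only suitable for small inputs, since it writes everything before reading anything.
fn round_trip_through_child(input: &[u8]) -> io::Result<Vec<u8>> {
    let mut child = Command::new("cat") // assumption: `cat` stands in for the peer
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Dropping stdin closes the write end, so the child sees end-of-file.
    let mut stdin = child.stdin.take().expect("stdin was piped");
    stdin.write_all(input)?;
    drop(stdin);

    let mut output = Vec::new();
    child
        .stdout
        .take()
        .expect("stdout was piped")
        .read_to_end(&mut output)?;
    child.wait()?;
    Ok(output)
}
```

There is no path to unlink here, so one side cannot delete the channel out from under the other; when either end is closed, the peer sees end-of-file or a broken pipe instead of blocking in open.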
It's difficult to solve this in kernel mode, so I decided to write a user-mode library to meet my needs.
Now I am prepared to meet my needs in the simplest and most crude way possible.
I had to put together a repository for business needs. Thank you all. For now I can only use this relatively simple library. Although it is not very good, it solves my problem. If you're interested, you can take a look. https://github.com/jokemanfire/sfifo