Linux x86_64: run at most one copy of this server

Is there a Rust crate for doing the following:

I want to express that at most one copy of this Rust program should be running at any given time. (If we try to start a 2nd copy, it needs to detect that the first copy is running, print an error message, then exit.)

This only needs to work on x86_64 Linux.

XY problem: I have a directory the server reads from and writes to. I cannot prevent other processes from trashing the directory; however, I'd prefer to avoid data corruption caused by 2 copies of the server running at the same time.

Thanks!

In C, I use PID file functions from libutil (on BSD) and libbsd (on Linux). I suppose there should be some Rust bindings as well.


Note that when dealing with (advisory) file locks, you may run into trouble if your data directory is hosted on a filesystem that doesn't support locking.


See also README.md file of the pidfile-rs crate, which lists some alternatives for PID file handling.

Cargo uses file locks:


cargo::util::flock is private, and it uses the unsafe libc::flock internally:

#[cfg(unix)]
mod sys {
    /* … */
    #[cfg(not(target_os = "solaris"))]
    fn flock(file: &File, flag: libc::c_int) -> Result<()> {
        let ret = unsafe { libc::flock(file.as_raw_fd(), flag) };
        if ret < 0 {
            Err(Error::last_os_error())
        } else {
            Ok(())
        }
    }
    /* … */
}

(source)

Sorry, yeah, I meant that as an example that could be followed, not something to directly use.

Quoting that man page:

The pidfile_open() function opens (or creates) a file specified by the path argument and locks it.

Dumb question on my part:

What is the relation of pidfile and lockfile ?

  1. they are the same technique, different name

  2. they are different, unrelated techniques

  3. one technique is a special case of another (inferred from the above, where creating a pid file also LOCKS it)

Thanks!

As far as I understand, the startup process of a daemon opens a pidfile and locks it (using pidfile_open). If that fails, the daemon is already running (and you can exit).

On success, you fork once, and the child process writes its process ID (PID) into the pidfile by calling pidfile_write. The child process will also keep the file open and locked, so no other instance of the daemon will be started.

If the daemon process forks further, the further-forked child processes can use pidfile_close to close the handle.

When the main daemon process stops running, it will call pidfile_remove to remove the pidfile (and to release the held lock).

AFAIK, it's the file lock that prevents starting the process more than once. Writing the PID into the file merely helps other programs detect whether the daemon is running without having to obtain a file lock: they just read the file and check whether the PID inside belongs to a running process. The PID can also be used to send signals to the daemon.

If the daemon aborts without cleaning up, there is the chance that the pidfile doesn't get cleared and this mechanism fails (or causes you to send signals to a wrong process when using the pidfile's contents). So it's somewhat dirty. But it's like it is on Unix-like systems, I guess.

(But I'm not totally sure on all of this.)

So I would say it's 3.
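A minimal sketch of that combined technique (option 3) in C, using open(2) plus flock(2) directly rather than the libutil pidfile_* wrappers; the function name and the error handling are made up for illustration:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/* Try to create/open the pidfile and take an exclusive lock.
 * Returns the open fd on success (keep it open for the whole
 * process lifetime!), or -1 if another instance holds the lock. */
int try_acquire_pidfile(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        close(fd); /* lock held via another open file description */
        return -1;
    }
    /* we own the lock: record our PID for other tools to read */
    char buf[32];
    int len = snprintf(buf, sizeof buf, "%d\n", (int)getpid());
    if (ftruncate(fd, 0) == 0)
        (void)!write(fd, buf, len);
    return fd;
}
```

The lock dies with the process (even on a crash), while the PID contents may go stale, matching the explanation above. Note that flock locks belong to the open file description, so a second open() of the same file (even within the same process) will fail to acquire the lock.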

IMO using pid files and file locks is obsolete and not ideal, as those files can still get accidentally wiped out by some other misbehaving process. In Cargo this is OK because, worst case, if something is corrupted you just delete the Cargo cache; in a long-running server you probably want something more robust.

Services targeting the mainstream Linux distributions will usually connect to dbus and request a unique name, then the dbus daemon will give an error if some other process owns the name. In Rust you can use the zbus crate and call zbus::Connection::request_name or ConnectionBuilder::name.


Sorry, I am missing something very basic. What is the point of this fork? Why not have the daemon process just write its own PID and continue running ?

Privilege de-escalation, generally. Daemons often need to do a couple of things with root privileges at startup, but then want to avoid running as root the whole time. Forking the process is an easy way to surrender the root privileges for your long-running code.


Forking is not used to drop privileges. All you need to do to drop privileges is call setuid. The reason to fork is to "daemonize" a process by detaching from the controlling terminal. The confusion here is because typically the dropping of privileges happens after daemonizing. This method, however, is also obsolete on modern Linux when using a process supervisor such as systemd. You should never need to fork a daemon on startup or use pidfiles anymore; see man 7 daemon for details.


Huh, I was under the impression that there was potential for problems if you called setuid without the fork. That's good to know!

Not ideal: yes. Obsolete? Not sure. Maybe under Linux it's obsolete now, but it's pretty common on other Unix-like operating systems at the time of this writing. (Maybe these are of no concern for the OP, but I think it's good practice to write platform-independent code where possible and where it's not too much effort.)

Under Linux: Maybe. But, for example, FreeBSD doesn't ship with dbus in its base system. You can install it as a package. Relying on dbus will add extra dependencies for your software. I don't think most services require dbus. (edit: And I think DBUS is more targeted for Desktop environments, hence the "D"?)

I'm not really sure what daemon() does exactly, but it looks like it doesn't fork under (Free)BSD either (see edit below) (man 3 daemon on FreeBSD). Not sure, though. Regarding pidfiles: at least under BSD, they still seem to be heavily used.

I guess the documentation for the pidfile_ C functions still covers the "old way" of forking, though.

So it depends on how your process daemonizes. It looks like forking isn't necessary anymore. But if your PID changes during the daemonization process (depending on how you do it), then you want to determine/write the PID after that step, so that the PID file contains the actual process ID of your daemon.

Edit: Actually invoking the daemon C function does change the PID on my system. So I guess it forks internally. Tested with this program:

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

int main() {
    printf("PID before daemonizing: %i\n", (int)getpid());
    /* daemon(nochdir=1, noclose=1): keep the cwd and stdio open
     * so the second printf remains visible */
    if (daemon(1, 1)) {
        perror("could not daemonize");
    }
    /* this line runs in the daemonized (forked) child;
     * the original parent process has already exited */
    printf("PID after daemonizing: %i\n", (int)getpid());
    return 0;
}

Coming back to your problem:

I think the easiest way would be advisory file locks (as @cuviper suggested by pointing to what cargo does). AFAIK, these will fail to protect you if your data is on an NFS share. From the above linked cargo source:

    // File locking on Unix is currently implemented via `flock`, which is known
    // to be broken on NFS. We could in theory just ignore errors that happen on
    // NFS, but apparently the failure mode [1] for `flock` on NFS is **blocking
    // forever**, even if the "non-blocking" flag is passed!
    //
    // As a result, we just skip all file locks entirely on NFS mounts. That
    // should avoid calling any `flock` functions at all, and it wouldn't work
    // there anyway.
    //
    // [1]: https://github.com/rust-lang/cargo/issues/2615
    if is_on_nfs_mount(path) {
        return Ok(());
    }

(source)
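Cargo's is_on_nfs_mount check essentially boils down to statfs(2) and comparing the filesystem magic number; a Linux-specific sketch (the function name mirrors Cargo's, but this C version is my own illustration):

```c
#include <sys/vfs.h>
#include <linux/magic.h>

/* Return 1 if the path lives on an NFS mount, 0 otherwise (or on
 * error, in which case we optimistically assume locking works). */
int is_on_nfs_mount(const char *path)
{
    struct statfs buf;
    if (statfs(path, &buf) != 0)
        return 0;
    return buf.f_type == NFS_SUPER_MAGIC;
}
```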

I (personally) think that also the pidfile approach is reasonable (which is "basically" a lock file, with some extra contents: the PID). In addition to (or, on NFS, as an alternative?) acquiring a lock, you could also check its contents and test if a process with the given PID is still running. Thus, even if the locking fails (e.g. due to a weird network file system), you can get a warning that an instance of the process may be still running. You could then provide a command-line option to wipe out the pidfile. But not sure if that's worth the effort.
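Checking whether the PID read from the file still refers to a live process can be done with kill(pid, 0): signal number 0 performs only the existence/permission check without delivering anything. A sketch (function name made up):

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Return 1 if a process with this PID currently exists, 0 if not.
 * kill() with signal 0 delivers nothing; EPERM still means the
 * process exists (we just may not signal it). */
int pid_is_running(pid_t pid)
{
    if (pid <= 0)
        return 0;
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM;
}
```

Note that PIDs get reused, so a stale pidfile can still point at an unrelated live process, which is exactly the "send signals to a wrong process" hazard mentioned earlier in the thread.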

Of course, you could also rely on a third service, such as DBUS, or possibly any other form of interprocess communication, e.g. System V interprocess communication semaphores. I don't think the latter would be idiomatic though (it's rather exotic, nowadays, I believe).


The OP explicitly mentions Linux. If using a major Linux distribution, then dbus will already be there because of systemd. Of course there are always portable methods but they tend to be buggy and inferior to system-specific methods.

The other most relevant server operating system is actually Windows, which has its own server APIs that are not compatible with any Unix-based ones. So IMO you don't really gain much by sticking to those obsolete methods. To me, caring too much about portability falls under "premature optimization" if it hasn't been defined as a goal from the start.

BTW I don't know why pidfiles are still common on BSD, that makes no sense to me. You only really need pidfiles on a SysV-style system that can't do process supervision. Those are known to be a somewhat buggy concept; AFAIK even on BSD you probably want to be using a process supervisor to avoid the forking and pidfiles.

Yes, that's why I wrote:

Yes, I know systemd is standard on Linux. But it's not standard on other unix-like systems.

I'm not sure I would call pidfile_open or flock buggy. If I have access to the data directory, I could also mess up the database manually. Sure, systemd may be better than the traditional UNIX approach, but it's not even specific to Linux. It's specific to particular (major) Linux distributions which chose systemd.

So I would say: If you write a service that's targeted to only run with systemd, then you can use DBUS or any systemd specific facilities.

But in my opinion it's a bad idea to write a unix service that depends on systemd (edit: without providing some sort of alternative start-up, at least). A lot of major services do not depend on systemd, e.g. Apache (just to name one). Maybe they will drop support for non-systemd OSes one day. But I don't think it's time for that yet.

I don't really want to disagree here. However, as a fact, there exist operating systems which do not use systemd (and that includes some Linux ones, I believe). (And major server software doesn't usually require systemd to be existent on a system in order to be run.)

Sorry I think there is some confusion here. As with many things, there is a difference between a concept and an implementation of that concept. Process supervision is just a concept, you don't need systemd to do it, but systemd is one implementation of it that happens to be used a lot on Linux. There are also implementations of the concept on BSD, and it is also required for Windows services and macOS services to use that. So it really depends on how your system is set up and how far down the "portability" hole you want to go. In general, process management is not really portable between operating systems at all, even basic things are different.

You also don't need systemd to use dbus services, it just happens that dbus is (almost) always available when a distribution uses systemd. Of course dbus can also be installed on other Linux systems or on BSD. And Windows and Mac have their own methods to do unique instancing that are not compatible with Linux or BSD...

So what's your proposed solution for the problem in the OP? Use dbus because dbus is usually there? I guess that's a possible way to go. But it does add the dependency (at runtime).


I (personally) prefer approaches where I keep my dependency tree small.

However, I understand that from a (major distribution) Linux p.o.v., DBUS really isn't a new "dependency".

If targeting RHEL, Debian/Ubuntu, Suse, Arch, etc, or derivatives, then systemd and dbus will already be there. If targeting some other distro with a non-standard setup then OP would already be sacrificing some portability anyway... In my experience people who are using Slackware or whatever usually tend to say that up front :slight_smile:

And of course none of this matters if you are using docker/k8s or some other container manager because that handles creating unique containers and isolates the service directory for you. On BSD, you probably want to put the service into a chroot jail for extra security.

Last time I did any sysadmin-level work like this, my go-to was always daemontools. That was a long time ago, though; it's likely something better has been developed in the meantime.

Yes, AFAIK that is one of the options for BSD but there are others.

BTW I just remembered, file locking actually has a number of other problems and is actually not as portable as it may seem, i.e. flock(2) and pidfile_* are BSD interfaces that just happen to have Linux wrappers, but other Unixes do not support them. There is this blog post by the systemd author that goes into detail why you probably never want to use file locks for services: On the Brokenness of File Locking
