Can vfork() be used safely with inline asm?

Hello everyone!

Recently, I've been researching the vfork() system call on Linux. It is similar to fork(), but it suspends the calling thread and lets the child process use the parent process's memory instead of copying it (or marking it copy-on-write). When the child process performs an execve(), exits, or is killed by a signal, the parent process resumes from the vfork() call. This is faster than fork(), but may be dangerous to use from high-level languages. Therefore, I wonder: does the following code snippet use vfork() correctly?

Code snippet with vfork()
use core::arch::asm;
use core::ffi::CStr;

static PROG_NAME: &CStr = c"/usr/bin/id";
static ARG1: &CStr = c"id";
static NULLARG: &CStr = c"";

unsafe extern "C"
{
   fn waitpid(pid: core::ffi::c_int, wstatus: *mut core::ffi::c_int, options: core::ffi::c_int) -> core::ffi::c_int;
}

fn main() {
   let mut p = 0;
   unsafe
   {
      let mut ptr1: *const core::ffi::c_char = PROG_NAME.as_ptr();
      let args: [*const core::ffi::c_char; 2] = [ARG1.as_ptr(), core::ptr::null()];
      let env: [*const core::ffi::c_char; 1] = [core::ptr::null()];
      let mut ptr2: *const *const core::ffi::c_char = args.as_ptr();
      let mut ptr3: *const *const core::ffi::c_char = env.as_ptr();
      asm!("mov rdi,{}", "mov rsi,{}", "mov rdx,{}", "mov rax,58", "syscall", "cmp rax,0", "je 2f", "jmp 3f",
       "2:", "mov rax,59", "syscall", "ud2", "3:",
       inout(reg) ptr1, inout(reg) ptr2, inout(reg) ptr3, out("rax") p);
   }
   let mut status: core::ffi::c_int = 0;
   let r = unsafe { waitpid(p, &mut status as *mut core::ffi::c_int, 0) };
   println!("Spawned process {p}");
}

This code first prepares the execve() arguments, then uses the syscall instruction to invoke the vfork() syscall. If the rax register contains 0 afterwards (meaning we are in the child), it performs the execve() system call, and terminates via the ud2 instruction if that fails. The "vforked" child therefore never leaves the inline asm block, so the "vfork() returns twice" problem shouldn't affect the Rust code. Spawning with vfork() may result in better performance, but the disadvantage for me is that the ability to prepare the environment for the new process is limited (as far as I understand, if I need to replace stdin/stdout/stderr with dup2(), I also have to do it in inline asm).

I'm on Linux x86_64, where syscall 58 is vfork() and syscall 59 is execve().
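For comparison, here is a minimal sketch of the same vfork()/execve() sequence with every register pinned explicitly, assuming Linux x86_64 only. The snippet above lets the compiler pick registers for the three pointer operands and then writes to rdi, rsi and rdx without declaring them; the syscall instruction also overwrites rcx and r11. The names `vfork_exec` and `wait_for` are hypothetical helpers made up for this sketch, and the `exit(127)` fallback (syscall 60) before ud2 is my own addition, not something the original code does.

```rust
use std::arch::asm;
use std::ffi::{c_char, c_int, CStr};

unsafe extern "C" {
    fn waitpid(pid: c_int, wstatus: *mut c_int, options: c_int) -> c_int;
}

/// Hypothetical helper, Linux x86_64 only.
/// Returns the child's PID in the parent, or a negative errno value if vfork() failed.
///
/// Safety: `argv` and `envp` must be NULL-terminated arrays of pointers to C strings.
unsafe fn vfork_exec(path: &CStr, argv: &[*const c_char], envp: &[*const c_char]) -> isize {
    let ret: isize;
    unsafe {
        asm!(
            "syscall",       // vfork(); rax was preloaded with 58
            "test rax, rax",
            "jnz 3f",        // parent (rax > 0) or error (rax < 0): leave the block
            "mov eax, 59",   // child: execve(rdi, rsi, rdx)
            "syscall",
            "mov edi, 127",  // execve() failed: exit(127), an addition in this sketch
            "mov eax, 60",
            "syscall",
            "ud2",           // only reached if exit() itself fails
            "3:",
            inlateout("rax") 58isize => ret,
            in("rdi") path.as_ptr(),
            in("rsi") argv.as_ptr(),
            in("rdx") envp.as_ptr(),
            lateout("rcx") _, // the syscall instruction overwrites rcx and r11
            lateout("r11") _,
        );
    }
    ret
}

/// Waits for `pid` and returns the raw wait status.
fn wait_for(pid: c_int) -> c_int {
    let mut status: c_int = 0;
    let r = unsafe { waitpid(pid, &mut status, 0) };
    assert_eq!(r, pid, "waitpid failed");
    status
}
```

Called with the path `/usr/bin/id`, an argv of `["id", NULL]` and an envp of `[NULL]`, this should behave like the original snippet. Since the kernel preserves every register except rax, rcx and r11 across a syscall, the child still finds its execve() arguments in rdi, rsi and rdx after vfork() returns 0.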

Why do you want to use vfork instead of posix_spawn? Or Rust's std::process::Command?
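For reference, a rough std::process::Command equivalent of the snippet could look like the sketch below. `spawn_and_wait` is a hypothetical helper name, and `env_clear()` is used only to mirror the empty environment array in the original code.

```rust
use std::io;
use std::process::{Command, ExitStatus};

/// Hypothetical helper: spawns `program` with `args` and an empty environment,
/// waits for it, and returns the child's PID together with its exit status.
fn spawn_and_wait(program: &str, args: &[&str]) -> io::Result<(u32, ExitStatus)> {
    let mut child = Command::new(program)
        .args(args)
        .env_clear() // mirrors the empty `env` array in the original snippet
        .spawn()?;
    let pid = child.id();
    let status = child.wait()?;
    Ok((pid, status))
}
```

Something like `spawn_and_wait("/usr/bin/id", &[])` would then cover the spawn, the PID lookup and the wait from the original `main`, with no unsafe code.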

I'm just experimenting with inline asm and trying to find places where it can be useful. I've looked up posix_spawn(), and it looks like it indeed uses vfork() (or clone() with CLONE_VFORK | CLONE_VM).
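To make the comparison concrete, here is a minimal sketch of calling posix_spawn() directly from Rust, assuming Linux (where pid_t is a C int). The extern declarations are written by hand for this sketch, with the two opaque struct pointers simplified to `c_void` pointers that are always passed as NULL; the `libc` crate provides properly typed versions. `spawn` and `wait_for` are hypothetical helper names.

```rust
use std::ffi::{c_char, c_int, c_void, CStr};
use std::ptr;

unsafe extern "C" {
    // Hand-written declarations for this sketch.
    fn posix_spawn(
        pid: *mut c_int,
        path: *const c_char,
        file_actions: *const c_void, // really *const posix_spawn_file_actions_t
        attrp: *const c_void,        // really *const posix_spawnattr_t
        argv: *const *const c_char,
        envp: *const *const c_char,
    ) -> c_int;
    fn waitpid(pid: c_int, wstatus: *mut c_int, options: c_int) -> c_int;
}

/// Hypothetical helper: spawns `path` with the given argv and an empty environment.
/// Returns the child's PID, or the error number reported by posix_spawn().
///
/// Safety: `argv` must be a NULL-terminated array of pointers to C strings.
unsafe fn spawn(path: &CStr, argv: &[*const c_char]) -> Result<c_int, c_int> {
    let envp: [*const c_char; 1] = [ptr::null()];
    let mut pid: c_int = 0;
    let rc = unsafe {
        posix_spawn(
            &mut pid,
            path.as_ptr(),
            ptr::null(), // no file actions
            ptr::null(), // default attributes
            argv.as_ptr(),
            envp.as_ptr(),
        )
    };
    if rc == 0 { Ok(pid) } else { Err(rc) }
}

/// Waits for `pid` and returns the raw wait status.
fn wait_for(pid: c_int) -> c_int {
    let mut status: c_int = 0;
    let r = unsafe { waitpid(pid, &mut status, 0) };
    assert_eq!(r, pid, "waitpid failed");
    status
}
```

The file_actions argument that is left NULL here is where posix_spawn() accepts dup2-style redirections for the child, which is the part that would otherwise have to be written in inline asm.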