Question about std::env::args()

While the library documentation sounds a bit mysterious, the purpose of the zeroth argument is to allow the program to be known under many names and change behaviour based on how it was called. This is used e.g. by busybox to pack as much functionality as possible while offering traditional Unix command line usage.
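To make the multi-call idea concrete, here is a minimal sketch of dispatching on the zeroth argument in Rust. The applet names `hello` and `bye`, the fallback name `multitool`, and the helpers `invoked_name` and `run` are all made up for illustration; this is not how busybox itself is implemented.

```rust
use std::env;
use std::path::Path;

/// Returns the final path component of argv[0], or a fallback name
/// when argv[0] is missing, empty, or has no file name component.
/// (`multitool` is a hypothetical default, not a real program.)
fn invoked_name(argv0: Option<&str>) -> &str {
    argv0
        .and_then(|s| Path::new(s).file_name())
        .and_then(|n| n.to_str())
        .unwrap_or("multitool")
}

/// Picks behaviour based on the name the program was called under.
/// The applets here are invented for the example.
fn run(name: &str) -> String {
    match name {
        "hello" => "Hello!".to_string(),
        "bye" => "Goodbye!".to_string(),
        other => format!("unknown applet: {other}"),
    }
}

fn main() {
    // The first item of env::args() is argv[0], if the caller supplied one.
    let argv0 = env::args().next();
    println!("{}", run(invoked_name(argv0.as_deref())));
}
```

If the compiled binary is symlinked as `hello` and `bye`, each link would select a different branch, assuming the caller follows the usual convention for argv[0].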


There are two interlocking issues here.

The first is that the Unix kernel (including non-Unix unix-alikes, like Linux) treats the argument vector for a program as a userland concern. The process for launching a program (via the exec family of syscalls) takes both a path to a program and an argument vector, and the launched program receives the argument vector verbatim in a predictable place. In theory, you can put anything you want there, so long as it's a NULL-terminated array of pointers to NUL-terminated strings.
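You can see that the path and argv[0] are independent from Rust itself. The sketch below is Unix-only and assumes an `sh` is available on `PATH`; it uses `CommandExt::arg0` from the standard library to launch `sh` with a made-up zeroth argument and asks the shell to echo `$0` back. The helper name `shell_reported_name` and the path `/made/up/path/sh` are invented for the example. The fake value deliberately keeps `sh` as its last component, because a multi-call binary standing in for `sh` may dispatch on that name.

```rust
use std::io;
use std::os::unix::process::CommandExt; // Unix-only: provides `arg0`
use std::process::Command;

/// Runs `sh -c 'echo "$0"'` with a caller-chosen argv[0] and returns
/// whatever the shell reports as `$0`.
fn shell_reported_name(fake_argv0: &str) -> io::Result<String> {
    let output = Command::new("sh") // the program that is actually executed
        .arg0(fake_argv0) // what the program is told its name is
        .args(["-c", "echo \"$0\""])
        .output()?;
    Ok(String::from_utf8_lossy(&output.stdout).trim_end().to_string())
}

fn main() -> io::Result<()> {
    // The shell should report the made-up value, not the path that was run.
    println!("{}", shell_reported_name("/made/up/path/sh")?);
    Ok(())
}
```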

The second is that the Unix userland (ditto unix-alikes) has very strong conventional expectations about the environment programs run in, and those conventions are maintained by the user side of the programming environment. While the kernel may not care, nearly every unix program expects that argv[0] will indicate the name of the program, and nearly every unix program that runs another program will pass the path as argument zero.

Your program will be "correct" in both senses if it starts processing options and arguments from argv[1] on unix-like platforms. Only programs deliberately designed to break those conventions will ever call you with something other than the path of the program at the start of the argument vector, and only programs deliberately designed to expect it will treat argv[0] any other way. Tools like clap build on this, and will not treat the first element as anything other than a program name.
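In Rust terms, that means taking the first item of `std::env::args()` as an optional program name and treating everything after it as the real arguments. A minimal sketch follows; `split_program_name` is a hypothetical helper, not a standard library function. Note that `env::args()` panics if an argument is not valid Unicode, so `env::args_os()` may be the better choice when that matters.

```rust
use std::env;

/// Splits an argument iterator into an optional program name (argv[0])
/// and the remaining arguments (argv[1] onward).
fn split_program_name<I>(mut args: I) -> (Option<String>, Vec<String>)
where
    I: Iterator<Item = String>,
{
    // argv[0] may be absent, so keep it as an Option rather than unwrapping.
    let program = args.next();
    (program, args.collect())
}

fn main() {
    let (program, rest) = split_program_name(env::args());
    // Informational only: never base security decisions on this value.
    println!("program: {}", program.as_deref().unwrap_or("<unknown>"));
    for (i, arg) in rest.iter().enumerate() {
        println!("argument {}: {}", i + 1, arg);
    }
}
```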

The caution is simply this: while this is a convention, there's no guarantee. You cannot use argv[0] to make security decisions within your program. Treat it as informational.


Thanks, all. I think what it really boils down to is that I was hoping the documentation itself would explicitly guarantee something equivalent to "if argv[0] exists, it will take the 'program name' role, though it might be absent. argv[1] and beyond correspond to options supplied to the program".


The documentation of the API can't say that, because the API does not care. It is the specification of the shell that says so (in section 2.9.1, subsection “Command Search and Execution”, point 1.e.i.b.):

Otherwise, the shell executes the utility in a separate utility environment (see Shell Execution Environment) with actions equivalent to calling the execl() function as defined in the System Interfaces volume of POSIX.1-2017 with the path argument set to the pathname resulting from the search, arg 0 set to the command name, and the remaining execl() arguments set to the command arguments (if any) and the null terminator.

Since the shell does it, programs expect it, so other programs call them the same way, with some notable exceptions like login, which sets argv[0] to the shell's name prefixed with "-" (for example "-bash") to mark a login shell.
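The login case is itself a small example of a program reading meaning from argv[0]. A sketch of the check a shell-like program might make is below; the function name `looks_like_login_shell` is made up, and real shells also consult other signals, such as command line flags.

```rust
use std::env;

/// Returns true when argv[0] starts with '-', the conventional marker
/// that `login` uses to request login-shell behaviour.
fn looks_like_login_shell(argv0: &str) -> bool {
    argv0.starts_with('-')
}

fn main() {
    // A missing argv[0] is treated as "not a login shell" in this sketch.
    let argv0 = env::args().next().unwrap_or_default();
    if looks_like_login_shell(&argv0) {
        println!("started as a login shell");
    } else {
        println!("started normally");
    }
}
```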

And POSIX recommends it for posix_spawn():

The value in argv[0] should point to a filename string that is associated with the process image being started by the posix_spawn() or posix_spawnp() function.

and exec*():

The value in argv[0] should point to a filename string that is associated with the process being started by one of the exec functions.
