Question about std::env::args()

In the documentation for std::env::args there's the warning:

The first element is traditionally the path of the executable, but it can be set to arbitrary text, and might not even exist. This means this property should not be relied upon for security purposes.

Is there anything that can be done to check that the first argument is, in fact, the path of the executable? Maybe comparing it to some value returned by an OS-specific API or something? I'm not worried about someone exploiting my hobby projects that only I will ever use, but I'm wondering for a theoretical correctness perspective.

Maybe there's some way to get your "real" path from the process id, or something?

But of course everything like that gets weird with symlinks, it running after the file was deleted so it only has an inode, etc.


Exactly, I was, eh, "worried" about symlinks too. It just raises its own next set of questions, heh.

If you want to check if it matches the current executable's path, you can use std::env::current_exe().

But you cannot ensure that this argument is not a user-provided path that happens to be the current executable's path, though.
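
A minimal sketch of that comparison, assuming you want "same file after resolving symlinks" rather than "same string". The helper name argv0_matches_exe is made up for illustration. Keep in mind the caveat above: a match only tells you that argv[0] happens to resolve to the running executable, not who put it there.

```rust
use std::env;
use std::fs;
use std::path::Path;

/// Returns true if `argv0` and `exe` resolve to the same file once both
/// paths are canonicalized (made absolute, symlinks followed).
/// Any error counts as "no match". Note that a bare command name such as
/// "ls" is resolved against the current directory here, not against PATH.
fn argv0_matches_exe(argv0: &Path, exe: &Path) -> bool {
    match (fs::canonicalize(argv0), fs::canonicalize(exe)) {
        (Ok(a), Ok(b)) => a == b,
        _ => false,
    }
}

fn main() {
    // args_os() avoids a panic if argv[0] is not valid Unicode.
    let argv0 = env::args_os().next();
    let exe = env::current_exe();

    match (argv0, exe) {
        (Some(argv0), Ok(exe)) => {
            let same = argv0_matches_exe(Path::new(&argv0), &exe);
            println!("argv[0]     = {:?}", argv0);
            println!("current_exe = {}", exe.display());
            println!("resolve to the same file: {}", same);
        }
        (None, _) => println!("started with an empty argument list"),
        (_, Err(e)) => println!("current_exe() failed: {}", e),
    }
}
```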


Yeah, maybe that's the best way. This caveat does seem to echo @scottmcm's and my concerns about symlinks and such, though:

If the executable was invoked through a symbolic link, some platforms will return the path of the symbolic link and other platforms will return the path of the symbolic link’s target.
If the executable is renamed while it is running, platforms may return the path at the time it was loaded instead of the new path.

I guess not, but can you think of a situation in which that would matter?

What sort of attack or problem are you envisioning? If someone can execute your program and then move the executable out from under the process, they can probably do much more, such as altering the executable or your running memory directly.


The most rigorous approach (in my opinion) is to completely ignore this argument — make your program's behavior completely independent of what it is named or where it is located. (This is true by default of most Unix-style installed programs.)

There is not necessarily any such thing as the true name of the program.


Well really, I can only speculate. What you're saying is true though, and I like your examples.

I think my question really first came up when I was just trying to do some very basic command-line parsing and read the documentation.

Let's say I was writing a CLI app to be invoked like:

cool-util file1 file2

This means args() at 0 is the path (maybe), at 1 is file1, and at 2 is file2. But then I read that args() at 0 "might not even exist". To me, one possible interpretation is that args() at 0 is actually file1 and args() at 1 is file2. It just seems to me that args() isn't promising me anything about where the first CLI arg is going to come in.

Then again, I'm probably being pedantic (like I said, this is more me being curious about a "correct" way to handle this) and "might not even exist" could mean "there might not be a file at this path". Maybe you should just say "part of this executable's in-contract with the OS/shell/whatever is that the first CLI arg is at index 1".

Thanks all for humoring me :slight_smile:

No, what that means is: "It is possible for your program to be started with zero argument strings, so that args() is an empty iterator." You can assume that if it's being passed any arguments at all, the 0th element has the "program name" role even if its value is nonsense, and it should be skipped to get to the "options and inputs" sort of arguments that programs' documentation describes.

If this weren't the case, everything would be ambiguous, since there wouldn't be a way to recognize whether the 0th argument is "program name" or not.


The upshot of @kpreid's summary is, you almost always want to use env::args().skip(1) and not env::args(). I.e. just ignore the value at 0.

(Or env::args_os().skip(1) to avoid panics on non-Unicode input.)
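
For what it's worth, here is a rough sketch of that pattern. The helper user_args is a hypothetical name, and it takes any iterator so the empty-argument-list case can be tried without actually launching a process in a weird way.

```rust
use std::env;
use std::ffi::OsString;

/// Drops the "program name" slot and returns everything after it.
/// Generic over the input so it can be exercised with hand-built lists.
fn user_args<I>(argv: I) -> Vec<OsString>
where
    I: IntoIterator<Item = OsString>,
{
    // skip(1) on an empty iterator simply yields nothing; it does not panic.
    argv.into_iter().skip(1).collect()
}

fn main() {
    for (i, arg) in user_args(env::args_os()).iter().enumerate() {
        // to_string_lossy() substitutes invalid Unicode instead of panicking.
        println!("input {}: {}", i + 1, arg.to_string_lossy());
    }
}
```

With an argument list of cool-util, file1, file2 this hands back file1 and file2, and with an empty list it hands back nothing.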


Don't get me wrong, I fully expect it to behave the way you describe and agree about that ambiguity. In fact maybe I should have asked "Is there a way to resolve this ambiguity without assuming things that aren't in this documentation?".

There cannot be, because all args are arbitrary strings that can't be distinguished from each other inherently. For example, ["echo", "echo", "echo"] is an entirely normal and meaningful argument list. Only the standard interpretation tells you what to do with it.


On Linux you can check the /proc/self/ directory. It contains a bunch of interesting stuff including exe, which is a symlink to your executable.

$ ls /proc/self
arch_status  clear_refs          cpuset   fdinfo   loginuid   mounts      oom_adj        projid_map  setgroups     statm           timers
attr         cmdline             cwd      gid_map  map_files  mountstats  oom_score      root        smaps         status          timerslack_ns
autogroup    comm                environ  io       maps       net         oom_score_adj  sched       smaps_rollup  syscall         uid_map
auxv         coredump_filter     exe      latency  mem        ns          pagemap        schedstat   stack         task            wchan
cgroup       cpu_resctrl_groups  fd       limits   mountinfo  numa_maps   personality    sessionid   stat          timens_offsets

$ ls -la /proc/self/exe
lrwxrwxrwx 0 michael 31 Jan 01:23 /proc/self/exe -> /home/michael/.cargo/bin/exa

$ xxd -u -a -e /proc/self/cmdline
00000000: 00647878 2D00752D 652D0061 72702F00  xxd.-u.-a.-e./pr
00000010: 732F636F 2F666C65 6C646D63 00656E69  oc/self/cmdline.

That symlink will end with (deleted) [1] if the executable is deleted. This is true if the file is replaced, too. However, it can lead to an ambiguity if you also have an executable whose name ends with (deleted). (Granted, it's a slightly suspicious case :wink:.)


  1. There's a leading space there.


Isn't that just the ls output? If you used something like std::fs::metadata() the OS would dereference the symlink and give you an ErrorKind::NotFound error.


So, it turns out this is more complicated than I thought. The direntry is that of a symlink, and ls just shows the content of the symlink. If you cp -a the symlink somewhere else, it will act like a normal symlink. If you use other non-following operations (like readlink) on it, it will also act like a symlink.
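
A small sketch of the non-following view, assuming Linux with procfs mounted (on other platforms the read simply fails). Both function names are made up for illustration, and the suffix check inherits the ambiguity mentioned earlier: a file that really is named "foo (deleted)" matches too.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Reads the text stored in the /proc/self/exe symlink without following it.
/// Linux-specific: elsewhere this returns an error.
fn proc_self_exe_target() -> io::Result<PathBuf> {
    fs::read_link("/proc/self/exe")
}

/// True if the link text ends with " (deleted)" (note the leading space).
/// This is only a string test, so it cannot tell a deleted executable apart
/// from a live file whose name genuinely ends that way.
fn looks_deleted(target: &Path) -> bool {
    target.to_string_lossy().ends_with(" (deleted)")
}

fn main() {
    match proc_self_exe_target() {
        Ok(target) => println!(
            "{} (looks deleted: {})",
            target.display(),
            looks_deleted(&target)
        ),
        Err(e) => println!("could not read /proc/self/exe: {}", e),
    }
}
```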

However, if you open the procfs file with the normal symlink-following semantics, you get a filehandle to the original underlying data, even if it has been deleted, or replaced -- and even if the (deleted) path also exists. (This includes execution.)

Thus, while it's a pain and not always possible to get a path to the file on disk corresponding to the currently executing program, you can still reliably get a filehandle, and thus the contents, execute it, etc.
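
To make the filehandle point concrete, here is a hedged sketch that opens /proc/self/exe with ordinary following semantics and reads the first few bytes of the running program. It assumes Linux with procfs available, and exe_prefix is just an illustrative name.

```rust
use std::fs::File;
use std::io::{self, Read};

/// Opens the running executable through /proc/self/exe (Linux only) and
/// returns at most its first `n` bytes. Because the open follows the
/// procfs link, it reaches the original file data even if the on-disk
/// path has since been deleted or replaced.
fn exe_prefix(n: usize) -> io::Result<Vec<u8>> {
    let file = File::open("/proc/self/exe")?;
    let mut buf = Vec::with_capacity(n);
    file.take(n as u64).read_to_end(&mut buf)?;
    Ok(buf)
}

fn main() {
    match exe_prefix(4) {
        Ok(bytes) => println!("first bytes of the running executable: {:02X?}", bytes),
        Err(e) => println!("could not open /proc/self/exe: {}", e),
    }
}
```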


As for std::fs::metadata() in particular, you'll get the metadata about the original executable, even if it has been replaced or removed, whether or not the symlink dangles. (Technically Rust stdlib reserves the right to change this behavior in the future.)

Incidentally what I was going to reply with was an example where program was deleted and a trojan was copied to program (deleted), and pointing out the TOCTOU problems with checking the symlink for (deleted), but the "magical" behavior addresses many of these concerns.


Here's another way of thinking about things which (I hope) might help.

If a user is using your program from the command line in a sensible way, argv[0] will be the name of your executable (or a symlink), argv[1] will be the first argument they write, argv[2] will be the second argument, etc.

There is no way (that I can think of) that argv[0] could end up being the first argument the user meant to give your program. All (sensible) ways of calling your program will make argv[0] the executable name.

However, a malicious (or plain weird) user can directly set the argv array to be anything, or nothing at all. Therefore (particularly if you are writing software that runs as root) you shouldn't assume argv[0] exists, or is the executable name.


The first element is traditionally the path of the executable, but it can be set to arbitrary text, and might not even exist. This means this property should not be relied upon for security purposes.

Even in non-malicious cases it is not generally true that the first argument is the path of the executable. It is the name of the program as written by the user. For example if I run /bin/ls by writing ls, the first argument will be ls. I think this statement should be reworded to:

The first element is traditionally the name of the executable, but it can be set to arbitrary text. This means this property should not be relied upon for security purposes.


In the case of pkexec, the bug would already have been mitigated by Rust: writing std::env::args().nth(1).unwrap() to get the second argument would panic if it didn't exist, rather than point to the first environment variable. In addition, it would give access to a copy of the second argument rather than allow arbitrary mutation of the original.
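
As a sketch of the same idea without the panic, one could turn the missing element into an ordinary error. The function name arg_at_index_1 and the usage message are invented for this example, and the function is generic over the iterator so short or empty argument lists can be tried directly.

```rust
use std::env;

/// Returns the element at index 1 of the argument list, or an error
/// message if the list has fewer than two elements.
fn arg_at_index_1<I>(mut argv: I) -> Result<String, String>
where
    I: Iterator<Item = String>,
{
    // nth(1) yields None when the list is too short; there is nothing
    // past the end of the list for it to read by accident.
    argv.nth(1)
        .ok_or_else(|| String::from("usage: <program> <argument>"))
}

fn main() {
    // env::args() still panics on non-Unicode arguments; args_os() avoids that.
    match arg_at_index_1(env::args()) {
        Ok(arg) => println!("argument at index 1: {}", arg),
        Err(msg) => eprintln!("{}", msg),
    }
}
```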
