Best way to implement this command line argument parser code?

Hi everyone, I'm porting a C++ project to Rust, and I'm stuck at command line argument parsing.
Here are some things I need to do:

  • Print arguments, their aliases, and their descriptions when -h is called
  • Parse command line arguments
  • Store their values
  • Print summary of changed values

The main problem I have is that some arguments can set the values of other arguments, and they have to set them in order, so I can't simply post-process them.

In the original code, all arguments inherit from a class called setting_base, and are contained in a class called setting_container. When a setting_base is initialized, it registers itself to the container:

setting_base::setting_base(setting_container *dictionary, /* ... */)
{
    /* ... */
    if (dictionary) {
        dictionary->register_setting(this);
    }
}

/* ... */

class setting_container
{
    /* ... */
    std::set<setting_base *> _settings;
    /* ... */
};

This makes it easier to iterate over them for parsing and printing. Here's an example of a setting_container:

// settings.hh
class common_settings : public virtual setting_container
{
public:
    // global settings
    setting_int32 threads;
    setting_bool lowpriority;

    setting_invertible_bool log;
    /* ... */

    common_settings();

    /* ... */
};

// settings.cc
// global settings
common_settings::common_settings()
    : threads{this, "threads", 0, &performance_group,
          "number of threads to use, maximum; leave 0 for automatic"},
      lowpriority{this, "lowpriority", true, &performance_group,
          "run in a lower priority, to free up headroom for other processes"},
      log{this, "log", true, &logging_group, "whether log files are written or not"},
      /* ... */
{}

Most of the argument types are pretty straightforward: they parse arguments and store the parsed value. Some, however, aren't:

  • setting_invertible_bool

setting_invertible_bool inherits from setting_bool, which acts as a flag (or, if a value is available, parses 0 and -1 as false and 1 as true). Upon instantiation, it adds its own name prefixed with "no" to its list of names. This "no" variant acts as a false-setting alias of itself. For example, -nolog has the same effect as -log 0.
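A minimal Rust sketch of that behavior, assuming the rules are exactly as described above. The function name `parse_invertible_bool` and its return shape (new value plus number of arguments consumed) are made up for illustration and are not taken from the original code.

```rust
/// Tries to match the front of `args` against an invertible bool named `name`.
/// Returns the parsed value and how many arguments were consumed, or `None`
/// if the first argument does not belong to this setting.
/// (Hypothetical helper; not part of the original C++ project.)
fn parse_invertible_bool(name: &str, args: &[&str]) -> Option<(bool, usize)> {
    let flag = args.first()?.strip_prefix('-')?;
    if flag.strip_prefix("no") == Some(name) {
        // `-nolog` always means false and never takes a value.
        return Some((false, 1));
    }
    if flag != name {
        return None;
    }
    // `-log` may be followed by an explicit 0, -1 or 1.
    // Anything else is left alone and `-log` counts as a bare flag.
    match args.get(1).copied() {
        Some("0") | Some("-1") => Some((false, 2)),
        Some("1") => Some((true, 2)),
        _ => Some((true, 1)),
    }
}
```

Calling it with `&["-log", "0"]` and with `&["-nolog"]` both yield `false`; only the number of consumed arguments differs.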

  • setting_redirect

setting_redirect doesn't store a value; instead it redirects to other arguments, essentially acting as an alias for multiple arguments. For example, -quiet is set to be equivalent to -nopercent -nostat -noprogress. This class is initialized with a std::initializer_list<setting_base *> containing the list of settings to redirect to.
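One way this could look in Rust, as a rough sketch: keep the pending arguments in a queue and have a redirect push its targets back onto the front, so they are handled at the position where the redirect appeared. The `Output` struct, the `REDIRECTS` table and the `parse` function are assumptions for illustration; only the `-quiet` expansion comes from the description above.

```rust
use std::collections::VecDeque;

/// Hypothetical settings touched by `-quiet`; all default to true here.
#[derive(Debug, PartialEq)]
struct Output {
    percent: bool,
    stat: bool,
    progress: bool,
}

/// Hypothetical redirect table: a flag and the flags it expands to.
const REDIRECTS: &[(&str, &[&str])] =
    &[("-quiet", &["-nopercent", "-nostat", "-noprogress"])];

fn parse(args: &[&str]) -> Result<Output, String> {
    let mut out = Output { percent: true, stat: true, progress: true };
    let mut queue: VecDeque<&str> = args.iter().copied().collect();
    while let Some(arg) = queue.pop_front() {
        if let Some((_, targets)) = REDIRECTS.iter().find(|(name, _)| *name == arg) {
            // Push in reverse so the expansion keeps its written order
            // once it sits at the front of the queue.
            for target in targets.iter().rev().copied() {
                queue.push_front(target);
            }
            continue;
        }
        match arg {
            "-percent" => out.percent = true,
            "-nopercent" => out.percent = false,
            "-stat" => out.stat = true,
            "-nostat" => out.stat = false,
            "-progress" => out.progress = true,
            "-noprogress" => out.progress = false,
            other => return Err(format!("unknown argument {other}")),
        }
    }
    Ok(out)
}
```

Because the expansion is processed in place, `-quiet -stat` leaves stat enabled while `-stat -quiet` disables it, which is the ordering requirement mentioned earlier.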

  • setting_func

setting_func is, I think, the worst offender of them all. It acts like a flag but contains a lambda that is executed while parsing. Here is an example:

// light.hh
setting_func bspxonly;
// light.cc
bspxonly{this, "bspxonly",
    [&](source source) {
        write_litfile = lightfile::bspx;
        write_luxfile = lightfile::bspx;
        novanilla.set_value(true, source);
    },

It sets the values of bitflags (write_litfile, write_luxfile) and sets a different flag's value. This would be a nightmare with Rust's mutability rules.
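For what it's worth, the borrow problem mostly goes away if the callback does not capture anything and instead receives the whole settings struct as `&mut`. Below is a hedged sketch of that idea. The `Source` variants, the `LightFile::External` variant, the tuple used for `novanilla`, and the names `Handler`, `FUNC_SETTINGS` and `apply_func_setting` are all assumptions; only `bspxonly`, `write_litfile`, `write_luxfile`, `novanilla` and `bspx` come from the snippet above.

```rust
/// Where a value came from; the variants are assumptions for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Source {
    Default,
    CommandLine,
}

/// Only `Bspx` appears in the original snippet; `External` is a placeholder.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LightFile {
    External,
    Bspx,
}

#[derive(Debug)]
struct LightSettings {
    write_litfile: LightFile,
    write_luxfile: LightFile,
    /// Value plus the source that last set it.
    novanilla: (bool, Source),
}

impl LightSettings {
    fn new() -> Self {
        Self {
            write_litfile: LightFile::External,
            write_luxfile: LightFile::External,
            novanilla: (false, Source::Default),
        }
    }
}

/// A handler borrows the whole struct mutably for the duration of the call,
/// so it can touch any number of fields without capturing references.
type Handler = fn(&mut LightSettings, Source);

fn bspxonly(settings: &mut LightSettings, source: Source) {
    settings.write_litfile = LightFile::Bspx;
    settings.write_luxfile = LightFile::Bspx;
    settings.novanilla = (true, source);
}

/// Hypothetical table of function-style settings.
const FUNC_SETTINGS: &[(&str, Handler)] = &[("-bspxonly", bspxonly)];

/// Runs the handler for `arg` if there is one; returns whether it matched.
fn apply_func_setting(settings: &mut LightSettings, arg: &str) -> bool {
    match FUNC_SETTINGS.iter().find(|(name, _)| *name == arg) {
        Some((_, handler)) => {
            handler(settings, Source::CommandLine);
            true
        }
        None => false,
    }
}
```

The trade-off is that handlers are plain functions rather than capturing lambdas, so anything they need has to be reachable through the settings struct.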

Oh yeah, also, setting containers can be inherited, and that's how global settings are done.

I've probably gone through five iterations of my code trying to come up with a good way to implement this in Rust. I tried writing a proc macro, because a lot of this can be simplified with compile-time evaluation. That turned out to be overcomplicated, and I don't particularly want to revisit it. I have maybe over a hundred settings to get through, so I really don't want to write the same thing in several places (at struct declaration, at initialization, when printing, etc.). I'm out of ideas and I'm tired. Any ideas?

It sounds like you are not struggling with having an implementation, but more so with the duplication in that implementation. Is that right? If so, could you share more details about what your solution looks like and what you would like it to look like?

Here is my understanding of what you're trying to generate, approximately.

use std::iter::Peekable;

#[derive(Debug)]
struct Settings {
    threads: Option<i32>,
    low_priority: bool,
    stat: bool,
    progress: bool,
    common_settings: CommonSettings,
}

impl Default for Settings {
    fn default() -> Self {
        Self {
            threads: None,
            low_priority: false,
            stat: true,
            progress: true,
            common_settings: CommonSettings::default(),
        }
    }
}

impl Settings {
    pub fn parse<I: Iterator<Item = String>>(&mut self, args: &mut Peekable<I>) {
        while let Some(arg) = args.peek() {
            match arg.as_str() {
                // Value
                "-threads" => {
                    let _ = args.next();
                    self.threads = Some(
                        args.next()
                            .expect("expected a value")
                            .parse()
                            .expect("expected an i32"),
                    );
                }
                // Invertible bool
                "-low-priority" => {
                    let _ = args.next();
                    self.low_priority = args
                        .next()
                        .expect("expected a value")
                        .parse()
                        .expect("expected a bool");
                }
                "-nolow-priority" => {
                    let _ = args.next();
                    self.low_priority = false;
                }
                // Redirect
                "-quiet" => {
                    let _ = args.next();
                    self.stat = false;
                    self.progress = false;
                }
                _ => {
                    if !self.common_settings.parse_one(args) {
                        panic!("unknown argument");
                    }
                }
            }
        }
    }
}

#[derive(Debug, Default)]
struct CommonSettings {
    color: bool,
}

impl CommonSettings {
    pub fn parse_one<I: Iterator<Item = String>>(&mut self, args: &mut Peekable<I>) -> bool {
        let Some(arg) = args.peek() else {
            return false;
        };
        match arg.as_str() {
            "-color" => {
                let _ = args.next();
                self.color = args
                    .next()
                    .expect("expected a value")
                    .parse()
                    .expect("expected a bool");
                true
            }
            _ => false,
        }
    }
}

fn main() {
    let mut args = std::env::args().skip(1).peekable();

    let mut settings = Settings::default();
    settings.parse(&mut args);

    println!("{settings:#?}");
}

Yes. In the original code, all the information needed was written in one line. The kind of information depends on the specific argument type, but usually it takes:

  1. a name or names,
  2. argument group (for printing)
  3. a default value or behavior (depending on the type)
  4. a description
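To make the "one line per setting" idea concrete, here is a rough sketch of a descriptor table that carries those four pieces of information and drives the -h output. `SettingInfo`, `SETTINGS`, `help_text`, the group display names and the default strings are invented for this sketch; the setting names and descriptions are copied from the C++ example earlier in the thread.

```rust
/// One row per setting. All field names are hypothetical.
struct SettingInfo {
    names: &'static [&'static str],
    group: &'static str,
    default: &'static str,
    description: &'static str,
}

/// Assumed to be sorted by group so each group header is printed once.
const SETTINGS: &[SettingInfo] = &[
    SettingInfo {
        names: &["threads"],
        group: "Performance",
        default: "0",
        description: "number of threads to use, maximum; leave 0 for automatic",
    },
    SettingInfo {
        names: &["lowpriority"],
        group: "Performance",
        default: "1",
        description: "run in a lower priority, to free up headroom for other processes",
    },
    SettingInfo {
        names: &["log", "nolog"],
        group: "Logging",
        default: "1",
        description: "whether log files are written or not",
    },
];

/// Builds the help text from the table alone.
fn help_text() -> String {
    let mut out = String::new();
    let mut current_group = "";
    for setting in SETTINGS {
        if setting.group != current_group {
            out.push_str(&format!("{}:\n", setting.group));
            current_group = setting.group;
        }
        let names: Vec<String> = setting.names.iter().map(|n| format!("-{n}")).collect();
        out.push_str(&format!(
            "  {} (default {}): {}\n",
            names.join(", "),
            setting.default,
            setting.description
        ));
    }
    out
}
```

The same table could in principle also feed the parser and the summary printer, which is where the duplication would be saved.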

What I really hated while writing my proc macro was that I was still writing boilerplate code everywhere I needed to go through all the arguments, just because I had to deal with them differently. It seems I have no option but to do the whole thing in a proc macro, though, as I need to go through hundreds of them, and I don't think I can do this within Rust's bounds with types, instances, etc.

Since the argument types also save extra information (source of the change, default value, etc.), I'd ideally want to use types that contain these values as well. But then I'd be checking the types by name within the proc macros, since type checking kicks in after macro expansion, which feels messy and bad for maintainability.
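A possible shape for such a type, sketched under assumptions: a generic `Setting<T>` that stores the default, the current value and the source, plus a small trait so differently typed settings can be walked together for the summary. With this, generated code would only need to call trait methods and would not have to inspect type names. `Setting`, `Source`, `Summary` and `changed_summary` are hypothetical names.

```rust
use std::fmt::Debug;

/// Where a value came from; the variants are assumptions for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Source {
    Default,
    CommandLine,
}

#[derive(Debug)]
struct Setting<T> {
    name: &'static str,
    default: T,
    value: T,
    source: Source,
}

impl<T: Clone> Setting<T> {
    fn new(name: &'static str, default: T) -> Self {
        Self { name, value: default.clone(), default, source: Source::Default }
    }

    fn set_value(&mut self, value: T, source: Source) {
        self.value = value;
        self.source = source;
    }
}

/// Object-safe view used when iterating over settings of different types.
trait Summary {
    /// Returns a line for the summary if the value differs from the default.
    fn summary_line(&self) -> Option<String>;
}

impl<T: PartialEq + Debug> Summary for Setting<T> {
    fn summary_line(&self) -> Option<String> {
        if self.value == self.default {
            return None;
        }
        Some(format!(
            "{} = {:?} (default {:?}, set from {:?})",
            self.name, self.value, self.default, self.source
        ))
    }
}

/// Collects the summary lines of every changed setting, in the given order.
fn changed_summary(settings: &[&dyn Summary]) -> Vec<String> {
    settings.iter().filter_map(|s| s.summary_line()).collect()
}
```

A settings struct would still need some way to list its fields as `&dyn Summary`, which is the part a macro or generator would have to produce.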

You can get pretty far with declarative macros:

use std::iter::Peekable;

macro_rules! arg_struct {
    ($T:ident {
        $($a:tt => $f:ident: $t:ty where default = $d:expr, parse_fn = $p:expr),* $(,)?
    }) => {
        #[derive(Debug)]
        struct $T {
            $($f: $t),*
        }

        impl Default for $T {
            fn default() -> Self {
                Self {
                    $($f: $d),*
                }
            }
        }

        impl $T {
            pub fn parse_one<I: Iterator<Item = String>>(&mut self, args: &mut Peekable<I>) -> bool {
                let Some(arg) = args.peek() else {
                    return false;
                };
                match arg.as_str() {
                    $(
                        $a => {
                            let _ = args.next();
                            let parse_fn: &mut dyn FnMut(&mut $T, &mut Peekable<I>) -> bool = $p;
                            parse_fn(self, args)
                        },
                    )*
                    _ => false,
                }
            }
        }
    }
}

arg_struct! {
    Settings {
        "-threads" => threads: Option<i32> where default = None, parse_fn = &mut |settings, args| {
            settings.threads = Some(args.next().expect("expected a value").parse().expect("expected an i32"));
            true
        },
        "-low-priority" => low_priority: bool where default = false, parse_fn = &mut |settings, args| {
            settings.low_priority = args.next().expect("expected a value").parse().expect("expected a bool");
            true
        },
        // No support for -nolow-priority or common settings without a more advanced macro,
        // probably an incremental tt-muncher:
        // https://danielkeep.github.io/tlborm/book/pat-incremental-tt-munchers.html
    }
}

impl Settings {
    fn parse() -> Self {
        let mut args = std::env::args().skip(1).peekable();
        let mut settings = Self::default();
        while args.peek().is_some() {
            if !settings.parse_one(&mut args) {
                panic!("unknown argument {:?}", args.peek().unwrap());
            }
        }
        settings
    }
}

fn main() {
    let settings = Settings::parse();
    println!("{settings:#?}");
}

You can even have the macro support the invertible bool and nested settings. Doing so would probably involve writing an incremental tt-muncher. A procedural macro does indeed allow transforming the input tokens into whatever you like, which is more powerful.

In essence, you're rewriting clap for your particular argument syntax. Fortunately, unlike clap, you only need to support what you need. Perhaps there are crates that can help you, maybe lexopt or pico-args (RazrFalcon/pico-args on GitHub, "an ultra simple CLI arguments parser"). I have not checked or used either.

P.S. Note that the macro I gave you uses &mut for the closures so that I can specify the type; I'm not sure how else to do it, and it may not be the best way. Also, using bools to represent whether the argument was parsed isn't great. There are probably more areas where my example code could be improved.
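As one possible improvement on the bool return value, here is a small sketch where `parse_one` returns a `Result`: `Ok(true)` when the argument was consumed, `Ok(false)` when it belongs to someone else, and an error when it was recognized but malformed. The `ParseError` type is an assumption, and the example handles only `-threads` to stay short. It uses `Peekable::next_if_eq` from the standard library to consume the flag only when it matches.

```rust
use std::iter::Peekable;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ParseError {
    MissingValue(&'static str),
    InvalidValue { arg: &'static str, value: String },
}

#[derive(Debug, Default, PartialEq)]
struct Settings {
    threads: Option<i32>,
}

impl Settings {
    /// Ok(true): argument consumed. Ok(false): not one of ours, nothing consumed.
    /// Err: the argument was ours but its value was missing or invalid.
    fn parse_one<I: Iterator<Item = String>>(
        &mut self,
        args: &mut Peekable<I>,
    ) -> Result<bool, ParseError> {
        // Consumes the flag only if the next argument equals "-threads".
        if args.next_if_eq("-threads").is_some() {
            let value = args.next().ok_or(ParseError::MissingValue("-threads"))?;
            let threads = value
                .parse::<i32>()
                .map_err(|_| ParseError::InvalidValue { arg: "-threads", value })?;
            self.threads = Some(threads);
            return Ok(true);
        }
        Ok(false)
    }
}
```

The caller can then turn `Ok(false)` into an "unknown argument" error after trying every settings struct, and report `Err` values without panicking.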

After I realized I couldn't do it with existing parser crates, I went through about four different macro implementations. My latest mimics clap in how it's implemented, but it still feels painful to debug and maintain. Your solution is basically how it works, just a bit more complicated. (I initially tried a declarative macro but realized I had more freedom with a proc macro.) Is it by any chance possible to give a macro a path to a file containing all the information I need, which the macro can then use to generate the code? I feel like staying within Rust might be making this more complicated than it needs to be.

You could use a build script to just generate your Rust code in whatever way you please, based on the contents of your input file. This is what prost does to generate Rust type definitions from protobuf definitions.
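As a very rough sketch of what the generator side could look like: a function that turns a line-based listing into Rust source for a struct and its `Default` impl. The `name | type | default | description` format and the `generate` function are made up for illustration; a real input file could use any format you like.

```rust
/// Turns a hypothetical `name | type | default | description` listing into
/// Rust source. Blank lines and lines starting with `#` are skipped.
/// Rows with fewer than four fields will panic; a real generator would
/// report a proper error instead.
fn generate(spec: &str) -> String {
    let rows: Vec<Vec<&str>> = spec
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(|line| line.splitn(4, '|').map(str::trim).collect())
        .collect();

    // Struct declaration with the description as a doc comment.
    let mut out = String::from("#[derive(Debug)]\npub struct Settings {\n");
    for row in &rows {
        out.push_str(&format!("    /// {}\n    pub {}: {},\n", row[3], row[0], row[1]));
    }
    out.push_str("}\n\nimpl Default for Settings {\n    fn default() -> Self {\n        Self {\n");

    // Default values, taken verbatim from the third column.
    for row in &rows {
        out.push_str(&format!("            {}: {},\n", row[0], row[2]));
    }
    out.push_str("        }\n    }\n}\n");
    out
}
```

In a build script, the returned string would typically be written to a file under the directory named by the `OUT_DIR` environment variable and pulled into the crate with `include!(concat!(env!("OUT_DIR"), "/settings.rs"))`, where `settings.rs` is whatever file name you chose.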

That seems ideal then. I'll work on that for the time being.

I'm pretty sure you can describe it with bpaf by creating primitives with its combinatoric API:

  • setting_invertible_bool
fn invertible_bool(yes: &'static str, no: &'static str, help: &'static str) -> impl Parser<bool> {
    let yes = long(yes).help(help).req_flag(true);
    let no = long(no).req_flag(false);
    construct!([yes, no]).fallback(false)
}
  • setting_redirect
    I think the example for cat does what you describe: examples/coreutils.rs

  • setting_func
    This one is doable with Parser::map or Parser::parse. It is required to be static, so doing something like this should help, as would a top-level static Arc with a Mutex.
