Anyinput -- proc macro to easily create functions that accept any type of string-, path-, iterator-, or array-like inputs

anyinput - Rust (docs.rs)

When I make a function with a path input, sometimes I'd like it to accept &Path, PathBuf, &PathBuf, &str, String, &String, etc.

You can do this with generics, but I got tired of remembering the syntax, especially for iterators and ndarrays. I wrote a proc macro that remembers the syntax for you.

I'd be interested in opinions on whether this is a good or a terrible idea. It seemed useful in my bed-reader genomics crate for simplifying AnyIter<AnyString> inputs.

Example: Create a function that adds 2 to the length of any string-like thing.

use anyinput::anyinput;
use anyhow::Result;

#[anyinput]
fn len_plus_2(s: AnyString) -> Result<usize, anyhow::Error> {
    Ok(s.len() + 2)
}

fn main() -> Result<()> {
    // By using AnyString, len_plus_2 works with
    // &str, String, or &String -- borrowed or moved.
    assert_eq!(len_plus_2("Hello")?, 7); // move a &str
    let input: &str = "Hello";
    assert_eq!(len_plus_2(&input)?, 7); // borrow a &str
    let input: String = "Hello".to_string();
    assert_eq!(len_plus_2(&input)?, 7); // borrow a String
    let input2: &String = &input;
    assert_eq!(len_plus_2(&input2)?, 7); // borrow a &String
    assert_eq!(len_plus_2(input2)?, 7); // move a &String
    assert_eq!(len_plus_2(input)?, 7); // move a String
    Ok(())
}

Nesting and multiple AnyInputs are allowed. Here we create a function with two inputs. One input accepts any iterator-like thing of usize. The second input accepts any iterator-like thing of string-like things. The function returns the sum of the numbers and string lengths.

We apply the function to the range 1..=10 and an array of &str values.

use anyinput::anyinput;
use anyhow::Result;

#[anyinput]
fn two_iterator_sum(
    iter1: AnyIter<usize>,
    iter2: AnyIter<AnyString>,
) -> Result<usize, anyhow::Error> {
    // Explicit type for the result of sum().
    let mut sum: usize = iter1.sum();
    for any_string in iter2 {
        // Needs .as_ref to turn the nested AnyString into a &str.
        sum += any_string.as_ref().len();
    }
    Ok(sum)
}

fn main() -> Result<()> {
    assert_eq!(two_iterator_sum(1..=10, ["a", "bb", "ccc"])?, 61);
    Ok(())
}
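For comparison, here is roughly what writing two_iterator_sum by hand with plain generics looks like; this is the iterator syntax the macro is meant to remember for you. It is a sketch, not the macro's actual output: it assumes AnyIter<T> corresponds to an IntoIterator<Item = T> bound and AnyString to AsRef<str>, and it returns a plain usize instead of a Result to keep it short.

```rust
// Hand-written sketch, not the macro's real expansion.
// Assumption: AnyIter<T> ~ IntoIterator<Item = T>, AnyString ~ AsRef<str>.
fn two_iterator_sum<I1, I2, S>(iter1: I1, iter2: I2) -> usize
where
    I1: IntoIterator<Item = usize>,
    I2: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut sum: usize = iter1.into_iter().sum();
    for any_string in iter2 {
        // Each item is only known to be AsRef<str>, so convert it to &str.
        sum += any_string.as_ref().len();
    }
    sum
}

fn main() {
    // A range plus an array of &str.
    assert_eq!(two_iterator_sum(1..=10, ["a", "bb", "ccc"]), 61);
    // A Vec<usize> plus an owned Vec<String>, both moved in.
    let words = vec!["a".to_string(), "bb".to_string()];
    assert_eq!(two_iterator_sum(vec![1, 2], words), 6);
}
```

Three type parameters and a where clause for a two-argument function is exactly the bookkeeping the macro hides.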

See the docs.rs link above for examples with paths and arrays. There is also an example of applying ndarray functions to any array-like collection of numbers; for example, you can apply mean or std to a Vec of f32.
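As a stdlib-only stand-in for the ndarray example, the sketch below accepts any array-like collection of f32 through an AsRef<[f32]> bound. The bound and the mean helper are illustrative assumptions, not the crate's API; the point is that Vec<f32>, &Vec<f32>, [f32; N], and &[f32] all convert to a concrete &[f32] inside the function.

```rust
// Hand-written sketch; `mean` is a hypothetical helper, not from the crate.
// Assumption: an "any array" input corresponds to an AsRef<[f32]> bound.
fn mean<A: AsRef<[f32]>>(values: A) -> Option<f32> {
    let values = values.as_ref(); // concrete &[f32] from here on
    if values.is_empty() {
        return None;
    }
    Some(values.iter().sum::<f32>() / values.len() as f32)
}

fn main() {
    let v: Vec<f32> = vec![1.0, 2.0, 3.0];
    assert_eq!(mean(&v), Some(2.0)); // borrow a Vec
    assert_eq!(mean(&v[..1]), Some(1.0)); // borrow a slice
    assert_eq!(mean([4.0_f32, 6.0]), Some(5.0)); // move an array
    assert_eq!(mean(v), Some(2.0)); // move the Vec
    assert_eq!(mean(Vec::<f32>::new()), None); // empty input
}
```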

Suggestions, feature requests, and contributions are welcome.

How it works:

The #[anyinput] macro uses standard Rust generics to support multiple input types. To do this, it rewrites your function with the appropriate generics. It also adds lines to your function to efficiently convert from any top-level generic to a concrete type. For example, the macro transforms len_plus_2 from:

#[anyinput]
fn len_plus_2(s: AnyString) -> Result<usize, anyhow::Error> {
    Ok(s.len() + 2)
}

into

fn len_plus_2<AnyString0: AsRef<str>>(s: AnyString0) -> Result<usize, anyhow::Error> {
    let s = s.as_ref();
    Ok(s.len() + 2)
}

Here AnyString0 is the generic type. The line let s = s.as_ref() converts from generic type AnyString0 to concrete type &str.
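The same rewrite covers the path case from the start of the post. The sketch below is hand-written in the shape of the expansion above and assumes a path-like input maps to an AsRef<Path> bound; file_name_len is a hypothetical function. It accepts all six types listed at the top: &str, String, &String, &Path, PathBuf, and &PathBuf.

```rust
use std::path::{Path, PathBuf};

// Hypothetical function, written like the expansion of len_plus_2.
// Assumption: a path-like input corresponds to an AsRef<Path> bound.
fn file_name_len<P: AsRef<Path>>(path: P) -> usize {
    let path = path.as_ref(); // concrete &Path from here on
    path.file_name().map_or(0, |name| name.len())
}

fn main() {
    let s = String::from("data/genome.bed");
    let pb = PathBuf::from("data/genome.bed");
    assert_eq!(file_name_len("data/genome.bed"), 10); // &str
    assert_eq!(file_name_len(&s), 10); // &String
    assert_eq!(file_name_len(Path::new("genome.bed")), 10); // &Path
    assert_eq!(file_name_len(&pb), 10); // &PathBuf
    assert_eq!(file_name_len(pb), 10); // PathBuf, moved
    assert_eq!(file_name_len(s), 10); // String, moved
}
```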

Any feedback, negative or positive, is welcome.

- Carl


Its use in std, and the fact that the alternative takes more typing, have made this pattern idiomatic:

fn foo<P: AsRef<Path>>(path: P) {
    let path = path.as_ref();
    // ...
}

In my opinion, though, a better and more correct pattern is:

fn foo<P: ?Sized + AsRef<Path>>(path: &P) {
    let path = path.as_ref();
}

In the absence of other bounds, the only thing foo can do with path is call as_ref, which takes a reference. So foo has no reason to take ownership of something like a PathBuf.

On the downside, the function can no longer take a non-reference, so in cases where giving away ownership was fine, callers may need to add a & at the call site. Still, it's something to consider.
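Here is a quick sketch of the by-reference version at call sites (component_count is a hypothetical name). Because P may be unsized, a &str or &Path passes straight through with P = str or P = Path, while owned PathBuf and String values need an explicit &.

```rust
use std::path::{Path, PathBuf};

// By-reference pattern: P may be unsized (str, Path), so callers pass &P.
fn component_count<P: ?Sized + AsRef<Path>>(path: &P) -> usize {
    path.as_ref().components().count()
}

fn main() {
    let owned = PathBuf::from("a/b/c");
    let text = String::from("a/b");
    assert_eq!(component_count("a/b/c"), 3); // &str, so P = str
    assert_eq!(component_count(Path::new("a")), 1); // &Path, so P = Path
    assert_eq!(component_count(&owned), 3); // PathBuf needs an explicit &
    assert_eq!(component_count(&text), 2); // so does String
    // component_count(owned); // would not compile: a reference is required
}
```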


Another thing to consider is splitting the functions into a small generic translation stub and then calling a common function with the rest of the body, to help minimize code bloat and compilation time.
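A minimal sketch of the stub-plus-inner pattern, using a hypothetical word_count function: the generic outer function only performs the as_ref() conversion, and the real body lives in a non-generic inner function that is compiled once rather than once per input type. The standard library uses the same shape internally for functions such as std::fs::read.

```rust
// Generic stub: only this thin wrapper is monomorphized per input type.
fn word_count<S: AsRef<str>>(s: S) -> usize {
    // Non-generic inner function holding the real body; compiled once.
    fn inner(s: &str) -> usize {
        s.split_whitespace().count()
    }
    inner(s.as_ref())
}

fn main() {
    assert_eq!(word_count("a b c"), 3); // &str
    assert_eq!(word_count(String::from("one two")), 2); // String
    assert_eq!(word_count(&String::from("x")), 1); // &String
}
```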


Thanks for your suggestions!

Does adding "?Sized" make things more efficient?
In bed-reader, users can pass in a list of chromosome names as IterLike<StringLike>. This can end up being [&str], Vec<String>, etc. I like that they can pass an owned String if they want; I think this helps make the Rust API as easy to use as the Python API. (I wrote about the port from Python to Rust.)

Does anyone know of a macro for splitting generic functions this way? It seems like a great use for one.

- Carl

It doesn't really make things more efficient in my opinion. Let's see what happens in both cases though.

use std::path::Path;
fn foo<P: AsRef<Path>>(path: P) { let _ = path.as_ref(); }
fn bar<P: AsRef<Path> + ?Sized>(path: &P) { let _ = path.as_ref(); }

#[derive(Clone)]
struct S;
impl AsRef<Path> for S {
    fn as_ref(&self) -> &Path { "".as_ref() }
}

fn main() {
    let s = S;
    bar(&s);   // A
    foo(&s);   // B
    foo(s);    // C

    let path: &Path = "".as_ref();
    bar(path); // D
    foo(path); // E
}
  • Sized types (S)
    • (A) We pass a &S and call <S as AsRef<Path>>::as_ref(path)
    • (B) We pass a &S and call <&S as AsRef<Path>>::as_ref(&path)
    • (C) We pass a S and call <S as AsRef<Path>>::as_ref(&path)
  • Unsized types (Path)
    • (D) We pass a &Path and call <Path as AsRef<Path>>::as_ref(path)
    • (E) We pass a &Path and call <&Path as AsRef<Path>>::as_ref(&path)

(B) and (E) are arguably less efficient than (A) and (D), since they create a nested reference that then has to be automatically dereferenced (within the as_ref body) to recursively call as_ref again. But I don't think this matters in practice, even if it isn't optimized out (which it almost surely will be).
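To see where the nested reference comes from: std's AsRef has a blanket impl for &T that forwards to T by dereferencing self once. The sketch below mirrors that shape with a hypothetical MyAsRef trait so the forwarding step is visible; each extra & in the argument type adds one more pass through the forwarding impl.

```rust
// Hypothetical trait with the same shape as std's AsRef.
trait MyAsRef<U: ?Sized> {
    fn my_as_ref(&self) -> &U;
}

// Base case: str converts to itself.
impl MyAsRef<str> for str {
    fn my_as_ref(&self) -> &str {
        self
    }
}

// Blanket case, like std's impl of AsRef for &T: `self` is a &&T,
// so `*self` is a &T, and we recurse into T's implementation.
impl<T: ?Sized + MyAsRef<U>, U: ?Sized> MyAsRef<U> for &T {
    fn my_as_ref(&self) -> &U {
        <T as MyAsRef<U>>::my_as_ref(*self)
    }
}

fn len_of<P: MyAsRef<str>>(p: P) -> usize {
    p.my_as_ref().len()
}

fn main() {
    assert_eq!(len_of("hello"), 5); // P = &str: one forwarding step
    assert_eq!(len_of(&"hello"), 5); // P = &&str: two forwarding steps
}
```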

(C) gives away ownership, and that is the functionality you lose: you can't pass a non-reference to bar. It's probably the main reason not to go with my suggestion, as it is an ergonomic hit (especially if you have owned-but-Copy types). When dealing with String and PathBuf, though, you generally want to pass a reference (or call .as_ref() yourself) anyway, as opposed to having the option of calling .clone().

It's a tradeoff.


I don't know of such a macro myself but agree it would be nice.

