Parsing a comma-separated string with a generic parser

I'm trying to write a generic parsing function that parses a comma-separated string of values using a parsing function passed in by the caller and returns a vector of Option. The caller also sends a total_size (usize) that's used (explained below) when the input string is None.

Some rules of the parser:

  • if input string is None, return a vector with "total_size" number of values set to None
  • if any token is empty, its corresponding value in vector should be None
  • if parsing of an individual item fails, return an error (the parser function sent by the caller returns a Result).

Eg: "1,2,3,,5" should parse to Result<[Some(1), Some(2), Some(3), None, Some(5)]>
I have to parse many such strings with different custom types (which I intend to parse using serde_json).

Here's how I've tried to implement it, but I'm having trouble getting it to compile.

fn tokenize<F, T>(input: Option<String>, total: usize, parse_fn: F) -> Result<Vec<Option<Result<T, String>>>, String>
where
    F: Fn(&str) -> Result<T, String>,
{
    match input {
        Some(s) => {
            let tokens: Vec<&str> = s.split(',').collect();
            let mut result: Vec<Option<Result<T, String>>> = Vec::with_capacity(total);
            for token in tokens {
                let val = if token.is_empty() {
                    None
                } else {
                    match parse_fn(token) {
                        Ok(num) => Some(Ok(num)),
                        Err(e) => Some(Err(format!("Failed to parse token {}: {}", token, e))),
                    }
                };
                result.push(val);
            }
            if result.len() < total {
                result.resize_with(total, || None);
            }
            Ok(result)
        }
        None => Ok(vec![None; total]),
    }
}

The error I get is:

error[E0277]: the trait bound `Result<T, String>: Clone` is not satisfied
  --> src/main.rs:25:25
   |
25 |         None => Ok(vec![None; total]),
   |                    -----^^^^--------
   |                    |    |
   |                    |    the trait `Clone` is not implemented for `Result<T, String>`
   |                    required by a bound introduced by this call
   |
   = note: required for `Result<T, String>` to implement `Clone`
   = note: 1 redundant requirement hidden
   = note: required for `Option<Result<T, String>>` to implement `Clone`

I'm not sure this is the right way to go about it. I'm also struggling to figure out the syntax for "invoking" the tokenize function.

The easy thing to do is add another bound, T: Clone, to the function signature:

fn tokenize<F, T>(input: Option<String>, total: usize, parse_fn: F) -> Result<Vec<Option<Result<T, String>>>, String>
where
    F: Fn(&str) -> Result<T, String>,
    T: Clone,

It isn't entirely clear why it returns such a complex type, though. I would expect it to return Result<Vec<Option<T>>, E>. The inner Result appears to be redundant.
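To make that concrete, here is a rough sketch of the simpler signature. The error type parameter E and the generic order are my own choices for illustration, not something from the original code; the rules are the ones listed in the question.

```rust
// Sketch only: same rules as the question, with the flatter return type.
// `E` is an assumed generic error type; the question used `String`.
fn tokenize<T, E, F>(input: Option<String>, total: usize, parse_fn: F) -> Result<Vec<Option<T>>, E>
where
    F: Fn(&str) -> Result<T, E>,
{
    let mut result: Vec<Option<T>> = Vec::new();

    // No input: return `total` empty slots. `resize_with` does not need `T: Clone`.
    let Some(s) = input else {
        result.resize_with(total, || None);
        return Ok(result);
    };

    for token in s.split(',') {
        if token.is_empty() {
            result.push(None);
        } else {
            // `?` returns early with the parser's error.
            result.push(Some(parse_fn(token)?));
        }
    }

    // Pad short inputs up to `total`, as the original code did.
    if result.len() < total {
        result.resize_with(total, || None);
    }
    Ok(result)
}
```

With this shape, a failed token surfaces as the outer Err, so callers only have one Result to unwrap.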

You can replace the vec![None; total] with

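{
  let mut vec = Vec::with_capacity(total);
  // resize_with appends values produced by the closure until the length is `total`.
  // (fill_with only overwrites existing elements, and a new Vec has none.)
  vec.resize_with(total, || None);
  vec
}

The compiler will optimize this fill loop.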

But if T is a known, concrete type, you can define a typed None constant. Note that vec![x; n] still requires the element type to be Clone, which Option<String> is:

const STRING_NONE: Option<String> = None;

then try again

Ok(vec![STRING_NONE; total])

Besides, I would suggest trying the syn crate. It may help you greatly; it is not limited to Rust code but can handle almost any syntax you want.

You're right! That was a copy-pasta mistake on my part. The inner Result wasn't needed, and removing it fixed the Clone error.

I'm embarrassed to admit though, I can't figure out the syntax to call this function! :confused:

Eg:

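fn parse_int(input: &str) -> Result<u32, String> {
    // parse is a method on &str; convert ParseIntError into the String error type
    input.parse::<u32>().map_err(|e| e.to_string())
}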
...
// What should I pass in the generics syntax?
let int_input = "1,2,3,,5";
println!("{:?}", tokenize::<??, u32>(Some(int_input.to_string()), 4, parse_int).unwrap())

Thanks, I'll have a look, although it looks above my Rust paygrade :slight_smile: (I'm still learning the basics of Rust).

In many cases, you can pass just _, asking the compiler to infer the type for you. If this doesn't work in your case, could you share the example on playground, so that we're able to see the problem exactly as it is?
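As a rough illustration of the `_` placeholder, here is a minimal sketch. The tokenize below is a cut-down stand-in I wrote with the generics in the same order (F, T) as in the question, and call_examples is a hypothetical helper; your real function may differ.

```rust
// Minimal stand-in for the question's function, generics ordered (F, T).
fn tokenize<F, T>(input: Option<String>, total: usize, parse_fn: F) -> Result<Vec<Option<T>>, String>
where
    F: Fn(&str) -> Result<T, String>,
{
    let mut out = Vec::new();
    if let Some(s) = input {
        for token in s.split(',') {
            out.push(if token.is_empty() { None } else { Some(parse_fn(token)?) });
        }
    }
    if out.len() < total {
        out.resize_with(total, || None);
    }
    Ok(out)
}

fn parse_int(input: &str) -> Result<u32, String> {
    input.parse::<u32>().map_err(|e| e.to_string())
}

// Hypothetical helper showing three equivalent ways to call it.
fn call_examples() -> Result<(), String> {
    let input = || Some("1,2,3,,5".to_string());
    let a = tokenize::<_, u32>(input(), 5, parse_int)?; // F inferred, T explicit
    let b = tokenize::<_, _>(input(), 5, parse_int)?; // both inferred
    let c = tokenize(input(), 5, parse_int)?; // no turbofish at all
    assert_eq!(a, b);
    assert_eq!(b, c);
    Ok(())
}
```

In all three calls the compiler works out F from the argument, and T follows from the return type of parse_int.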

Hmm looks like I'm back to square one with the Clone error after trying to clean a few things up.

Here's the playground link: Rust Playground

Sorry, here's the correct link to the latest code I'm trying with: Rust Playground

For the clone error, here's one way:

        None => Ok((0..total).map(|_| None).collect())
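
A small sketch of why this sidesteps the Clone bound: mapping over a range builds each None separately instead of cloning one. NotClone and empty_slots are made-up names for the demonstration.

```rust
// A type that deliberately does not implement Clone.
#[derive(Debug, PartialEq)]
struct NotClone;

// Hypothetical helper: builds `total` empty slots for any T.
fn empty_slots<T>(total: usize) -> Vec<Option<T>> {
    // vec![None; total] would require Option<T>: Clone; this does not.
    (0..total).map(|_| None).collect()
}
```

Calling empty_slots::<NotClone>(3) compiles even though NotClone has no Clone impl.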

For the Err arm, you can replace the match with

                    Some(parse_fn(token)?)

The code contains some other weird/sub-optimal choices; here is a more idiomatic version that allocates less:

use std::iter::repeat_with;
use std::str::FromStr;

fn tokenize<T>(input: Option<&str>, total: usize) -> Result<Vec<Option<T>>, T::Err>
where
    T: FromStr,
{
    let Some(input) = input else {
        return Ok(repeat_with(|| None).take(total).collect());
    };

    input
        .split(',')
        .map(|tok| if tok.is_empty() {
            None
        } else {
            Some(tok.parse())
        })
        .map(Option::transpose)
        .chain(repeat_with(|| Ok(None)))
        .take(total)
        .collect()
}

That iterator chain is too complicated for my tastes; I’d be more likely to write something like this:

use core::iter::{repeat_with, zip};
use core::str::FromStr;

fn tokenize<T>(input: Option<&str>, total: usize) -> Result<Vec<Option<T>>, T::Err>
where
    T: FromStr,
{
    let mut res: Vec<_> = repeat_with(|| None).take(total).collect();
    for (slot, tok) in zip(&mut res, input.unwrap_or("").split(',')) {
        if !tok.is_empty() {
            *slot = Some(tok.parse()?);
        }
    }
    Ok(res)
}

Thanks, this looks neat!

I assume passing input as a string slice (&str) is what's saving the allocation (like a const ref in C++)?

The reason I was passing in a custom parsing function instead of using parse was because in my actual code I need to parse custom (deserializable) types (using serde_json::from_str).

I still haven't found the right syntax for passing in the generic function to tokenize.

I guess this works, but I'm curious to know what the syntax looks like if I have to specify it fully.

This is indeed a bit easier on the eyes with fewer iterator chains!

still trying to grok what this does :stuck_out_tongue:

Passing input as &str instead of String saves an allocation conditionally (i.e., when you have a string and you can't pass ownership of it – the original signature would have required a superfluous clone then). The unconditional removal of allocation comes from not collecting the tokens in a temporary vector, but using map instead.

Just write out its name, like this:

let values: Vec<Option<serde_json::Value>> =
    tokenize(Some(jsons), 3, serde_json::from_str)?;
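
For that call to work, tokenize has to accept a parser argument. A minimal sketch, assuming the zip-based version from earlier in the thread with a parse_fn parameter added; parse_u32 and parse_values are hypothetical names, and a plain integer parser stands in for serde_json::from_str so the example has no dependencies:

```rust
use std::iter::{repeat_with, zip};
use std::num::ParseIntError;

// Assumed signature: the zip-based version, plus a caller-supplied parser.
fn tokenize<T, E>(
    input: Option<&str>,
    total: usize,
    parse_fn: impl Fn(&str) -> Result<T, E>,
) -> Result<Vec<Option<T>>, E> {
    let mut res: Vec<Option<T>> = repeat_with(|| None).take(total).collect();
    for (slot, tok) in zip(&mut res, input.unwrap_or("").split(',')) {
        if !tok.is_empty() {
            *slot = Some(parse_fn(tok)?);
        }
    }
    Ok(res)
}

// Stand-in parser; any fn with this shape can be passed by name.
fn parse_u32(s: &str) -> Result<u32, ParseIntError> {
    s.parse()
}

fn parse_values() -> Result<Vec<Option<u32>>, ParseIntError> {
    // The function is passed by its path, with no generic arguments spelled out.
    tokenize(Some("1,2,3,,5"), 5, parse_u32)
}
```

Note that with zip, tokens beyond total are silently dropped, which differs from the original loop that kept every token.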

The type of a closure is unnameable. There's no concrete type that you could specify at that place which would match the closure's type exactly.

If the closure doesn't capture, then you can specify a function pointer, fn(&str) -> Result<T, E>, but that doesn't work for capturing closures. In that case, you could do &mut dyn FnMut(&str) -> Result<T, E>. Both of these options force dynamic dispatch, though, and they are not identical to the type of the passed closure or fn item (they result in a coercion).
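
To illustrate both options, here is a sketch using a small hypothetical helper, parse_one, standing in for tokenize (generics ordered F, T, with FnMut so the capturing case fits). The names and the ParseIntError error type are assumptions for the example.

```rust
use std::num::ParseIntError;

// Hypothetical generic helper standing in for `tokenize`.
fn parse_one<F, T>(token: &str, mut parse_fn: F) -> Result<T, ParseIntError>
where
    F: FnMut(&str) -> Result<T, ParseIntError>,
{
    parse_fn(token)
}

fn parse_int(s: &str) -> Result<u32, ParseIntError> {
    s.parse()
}

fn fully_specified() -> Result<(u32, u32, usize), ParseIntError> {
    // Function pointer type: the fn item `parse_int` is coerced to it.
    let a = parse_one::<fn(&str) -> Result<u32, ParseIntError>, u32>("7", parse_int)?;

    // Capturing closure behind a trait object: it mutates `calls`.
    let mut calls = 0usize;
    let mut counting = |s: &str| {
        calls += 1;
        s.parse::<u32>()
    };
    let b = parse_one::<&mut dyn FnMut(&str) -> Result<u32, ParseIntError>, u32>("8", &mut counting)?;

    Ok((a, b, calls))
}
```

In practice the inferred form (no turbofish, or `_` for F) is shorter and keeps static dispatch.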

It's clearly documented.
