Conditional computation based on input type with PhantomData

I have a set of Nom parsers for various markup languages. Each takes &str as input (cf. parse-hyperlinks).

My goal is:

  1. to tag the input string with the markup language used, e.g. "md" or "rst", by associating a type

  2. statically dispatch two code versions for each parser:

    a) the right type for this parser: the parser does its computation.

    b) the wrong type: the parser returns Err.

  3. a generic parser combinator, that takes for example md-input
    and skips all non-md parsers.

  4. to have a marker type, let's say "all", for which the combinator
    does not skip any parser.

I did explore what can be done with the PhantomData type, but I think I am on the wrong track. Any better ideas?


use core::convert::AsRef;
use std::marker::PhantomData;
use std::println;
use std::str;

/// Hidden type tagging Markdown texts.
struct Md;
/// Hidden type tagging reStructuredText texts.
struct Rst;

#[derive(Debug)]
pub enum ParseError {
    WrongMarkup,
}

// A phantom tuple struct which is generic over `A` with hidden parameter `B`.
#[derive(Debug, PartialEq)] // Allow equality test for this type.
struct Input<'a, A: 'a + AsRef<str> + ?Sized, B>(&'a A, PhantomData<B>);

fn some_md_computation<'a, I>(i: Input<'a, I, Md>) -> Result<&'a str, ParseError>
where
    I: 'a + AsRef<str> + ?Sized,
{
    let i = i.0.as_ref();
    // Some computation.
    Ok(&i[0..4])
}

// This does not work because:
// src/main.rs:17:6:   B: !Md,    negative bounds are not supported
// src/main.rs:27:1    fn rend    the name `render_md` is defined multiple times
/*
fn some_md_computation<'a, I, B>(_i: Input<'a, I, B>) -> Result<&'a str, ParseError>
where
    I: 'a + AsRef<str> + ?Sized,
    B: !Md,
{
    // Return Error.
    Err(ParseError::WrongMarkup)
}
*/

fn main() {
    let md_input: Input<str, Md> = Input("Markdown abc", PhantomData);
    let md_output = some_md_computation(md_input).unwrap();
    println!("{:?}", md_output); // Prints: "Mark"
    let _rst_input: Input<str, Rst> = Input("Markdown abc", PhantomData);
    // I would like to be able to do this:
    //let _ = some_md_computation(_rst_input).unwrap_err();
    // But as the above does not work...
    // src/main.rs:48:32   some_md_computation(...   expected struct `Md`, found struct `Rst`
}

You don't need PhantomData at all. There's no overloading in Rust, either (thankfully). For static dispatch, use traits. (Working playground.)

#[derive(Debug, PartialEq)]
struct Input<'a, T: ?Sized + 'a, U>(&'a T, U);

trait Parser<'a, S> {
    fn parse(&self) -> Result<&'a str, ParseError> {
        Err(ParseError::WrongMarkup)
    }
}

impl<'a, T: AsRef<str> + ?Sized + 'a> Parser<'a, Md> for Input<'a, T, Md> {
    fn parse(&self) -> Result<&'a str, ParseError> {
        Ok(&self.0.as_ref()[0..4])
    }
}

impl<'a, T: AsRef<str> + ?Sized + 'a> Parser<'a, Rst> for Input<'a, T, Rst> {
    fn parse(&self) -> Result<&'a str, ParseError> {
        Ok(&self.0.as_ref()[0..4])
    }
}

impl<T: ?Sized> Parser<'_, Rst> for Input<'_, T, Md> {} // use default impl which errors
impl<T: ?Sized> Parser<'_, Md> for Input<'_, T, Rst> {}

fn some_md_computation<'a, T, U>(input: Input<'a, T, U>) -> Result<&'a str, ParseError>
where
    T: AsRef<str> + ?Sized + 'a,
    U: 'a,
    Input<'a, T, U>: Parser<'a, Md>
{
    input.parse()
}

Thank you a lot for your answer. And also that it came so quickly.

Your solution is very close to my needs. I realized that I actually forgot one important requirement:

  1. The output type should carry the same type tag as the input type.

The reason is the following: the output &str in Result<&'a str, ParseError> stands for the remaining untreated string. It should have the same type as the input, because it will be fed to the next parser. The full output type would be something like

Result<(Input<'a, T, Md>, Cow<'a, str>), ParseError>

where Input<'a, T, Md> holds the remaining untreated string and Cow<'a, str> is the actual output.
The parser then looks like:

impl<'a, T: AsRef<str> + ?Sized + 'a> Parser<'a, Md> for Input<'a, T, Md> {
    fn parse(&self) -> Result<(Input<'a, T, Md>, Cow<'a, str>), ParseError> {
        let rest = &self.0.as_ref()[3..];
        let res  = &self.0.as_ref()[..3];
        Ok((Input(rest, Md), Cow::Borrowed(res)))
    }
}

Unfortunately, I frequently run into the error "cycle detected when computing function signature of Parser::parse". What is wrong with my approach?

I'm probably missing something as to why that is a problem. I was able to add the same return type without getting any errors and with the same dynamic output.

The devil is in the details: the parser needs to return a slice of the input, not the whole input.
The following does not work:

    fn parse(&mut self) -> Result<(Self, &'a str), ParseError> {
        self = &mut Input(&self.0[4..], Md);  // <--------------------------
        Ok((*self, &self.0[..4]))
    }

I suppose there has to be an additional constraint on T, something like std::slice::SliceIndex<str>.

I'm probably missing something as to why that is a problem.

I am just learning about Rust's type system. I do not yet understand many of the details
needed to get this right.

That really doesn't matter at all. The signature and the argument/return types of a function are not inferred from or influenced by its implementation. When the compiler typechecks a function signature, it does not look at the function body. You can't write two functions with the same signature but different bodies such that one of them typechecks whereas the other one fails to compile due to the signature. This is a crucial feature of abstraction – acting otherwise would silently introduce breaking changes into the public interface of the code, depending on its private implementation details.

It does not work because you are trying to assign to self, which is immutable. self is itself a (&mut) reference, so it does not make sense to try and mutate the state by assigning to the reference. You want to assign to the object pointed to by the reference, hence you need to dereference it.
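To illustrate the difference between rebinding self and writing through it, here is a minimal sketch. It assumes the simplified Input<'a, U>(&'a str, U) shape used later in this thread; the advance method is a hypothetical name chosen for illustration and does not come from any of the linked playgrounds.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Md;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Input<'a, U>(&'a str, U);

impl<'a> Input<'a, Md> {
    /// Hypothetical stateful variant: takes the first four bytes and
    /// leaves the remainder in `self`.
    /// Panics if byte 4 is past the end or not on a char boundary.
    fn advance(&mut self) -> &'a str {
        let s: &'a str = self.0;
        let (head, rest) = s.split_at(4);
        // `self = &mut Input(rest, Md);` would try to rebind the local
        // reference, which is rejected. Dereferencing writes to the value
        // the caller owns instead:
        *self = Input(rest, Md);
        head
    }
}
```

With this, calling advance on a mutable Input("Markdown abc", Md) yields the first four bytes and leaves the rest in place. As the next paragraph notes, a stateless parser that returns the rest is probably the simpler design.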

However, as I understand, there's no need for mutability here, either, because the parser is stateless, and it returns the rest of the yet-unparsed text.

There's also the problem of your generic T type not being the same as str itself – generics means universal quantification ("any type"), so you can't just assign a &str to a T. If you insist on using generics, you'll have to add a means of converting from a &str back to a T. However, quite honestly, at this point I think that's over-engineering, and you should rather work with &str directly.
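One way to express "a means of converting from a &str back to a T" is an Into bound on the reference type. The following is only a sketch under that assumption and may differ from the linked playground; split_md and the NoSplitPoint variant are hypothetical names introduced here.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Md;

// Generic input as in the earlier posts: a reference to some `T` plus a tag.
struct Input<'a, T: ?Sized, U>(&'a T, U);

#[derive(Debug, PartialEq)]
enum ParseError {
    NoSplitPoint,
}

/// Splits off the first four bytes and re-wraps the remainder as `&'a T`.
fn split_md<'a, T>(input: &Input<'a, T, Md>) -> Result<(Input<'a, T, Md>, &'a str), ParseError>
where
    T: AsRef<str> + ?Sized + 'a,
    // The extra bound: a `&str` must be convertible back into a `&T`.
    &'a str: Into<&'a T>,
{
    let inner: &'a T = input.0;
    let s: &'a str = inner.as_ref();
    // Avoid the panic of `split_at` on short input or a non-boundary index.
    if !s.is_char_boundary(4) {
        return Err(ParseError::NoSplitPoint);
    }
    let (head, rest) = s.split_at(4);
    Ok((Input(rest.into(), Md), head))
}
```

For T = str the bound is satisfied by the standard reflexive conversion, so an Input<str, Md> can be passed directly. For other T the bound is rarely available, which is one more argument for dropping the generic parameter.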

Here's a working playground with the code you likely wanted to write, and here's another playground with the code you likely should write.

If you insist on using generics, you'll have to add a means of converting from a &str back to a T.

This was the point where I struggled. At some point I even tried the constraint std::slice::SliceIndex<str> instead of T, but it became a big mess. In your first Playground a simple into() does the trick. Sometimes the solution is so easy!

However, quite honestly, at this point I think that's over-engineering, and you should rather work with &str directly.

Yes, absolutely! Originally, I copied <'a, T: AsRef<str> + ?Sized + 'a> from Nom, where it makes sense because Nom can also parse binary data. In my use case &str is absolutely fine. Thank you for this beautiful API. For wider exposure, I repeat the code from H2CO3's playground here:

#[derive(Debug, Clone, Copy)]
struct Md;

#[derive(Debug, Clone, Copy)]
struct Rst;

#[derive(Debug)]
pub enum ParseError {
    WrongMarkup,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Input<'a, U>(&'a str, U);

trait Parser<'a, S>: Sized {
    fn parse(&self) -> Result<(Self, &'a str), ParseError> {
        Err(ParseError::WrongMarkup)
    }
}

impl<'a> Parser<'a, Md> for Input<'a, Md> {
    fn parse(&self) -> Result<(Self, &'a str), ParseError> {
        let (head, rest) = self.0.split_at(4);
        Ok((Input(rest, Md), head))
    }
}

impl<'a> Parser<'a, Rst> for Input<'a, Rst> {
    fn parse(&self) -> Result<(Self, &'a str), ParseError> {
        let (head, rest) = self.0.split_at(4);
        Ok((Input(rest, Rst), head))
    }
}

impl Parser<'_, Rst> for Input<'_, Md> {} // use default impl which errors
impl Parser<'_, Md> for Input<'_, Rst> {}

fn some_md_computation<'a, U>(input: Input<'a, U>) -> Result<(Input<'a, U>, &'a str), ParseError>
where
    U: 'a,
    Input<'a, U>: Parser<'a, Md>
{
    input.parse()
}

fn main() {
    let markdown = Input("foobar", Md);
    let restruct = Input("quxlol", Rst);
    
    println!("{:?}", some_md_computation(markdown));
    println!("{:?}", some_md_computation(restruct));
}

I want to throw in the "typed enum" pattern I recently stumbled upon, in case it is helpful.

See this edited playground of your original code:

  • If the parse operation truly cannot continue for an input/parser type mismatch, then let the compiler enforce type mismatch:
    Remove these two lines, since Parser is not really implemented for these combinations (i.e. it always fails with WrongMarkup):
    //impl Parser<'_, Rst> for Input<'_, Md> {}
    //impl Parser<'_, Md> for Input<'_, Rst> {}
    
  • The added TypedInput enum wraps the Input struct, so you can call different type-specific functions at runtime.
  • Now you can process a Vec<TypedInput>, where before you could not collect Inputs of different types.

The point where you want to return a ParseError::WrongMarkup error is at runtime, only if the exact type is not known.
The WrongMarkup error might still make sense for the common-path functions... it just depends on the overall logic.
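Since the edited playground is not reproduced here, the following is a rough sketch of what such a "typed enum" wrapper could look like. The names TypedInput, parse_md, parse_rst and split4 are assumptions made for illustration, and Option is used in place of the thread's ParseError to keep the sketch short.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Md;
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rst;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Input<'a, U>(&'a str, U);

/// Hypothetical helper: split after four bytes if that is a valid index.
fn split4(s: &str) -> Option<(&str, &str)> {
    if s.is_char_boundary(4) { Some(s.split_at(4)) } else { None }
}

// Type-specific parsers: there is no `parse_md` for `Input<Rst>`,
// so a mismatch is a compile-time error rather than a runtime one.
impl<'a> Input<'a, Md> {
    fn parse_md(&self) -> Option<(Self, &'a str)> {
        let (head, rest) = split4(self.0)?;
        Some((Input(rest, Md), head))
    }
}

impl<'a> Input<'a, Rst> {
    fn parse_rst(&self) -> Option<(Self, &'a str)> {
        let (head, rest) = split4(self.0)?;
        Some((Input(rest, Rst), head))
    }
}

/// The wrapper: one variant per statically typed input.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TypedInput<'a> {
    Md(Input<'a, Md>),
    Rst(Input<'a, Rst>),
}

impl<'a> TypedInput<'a> {
    /// Runtime dispatch to the matching type-specific parser.
    fn parse(&self) -> Option<(TypedInput<'a>, &'a str)> {
        match self {
            TypedInput::Md(i) => i.parse_md().map(|(rest, head)| (TypedInput::Md(rest), head)),
            TypedInput::Rst(i) => i.parse_rst().map(|(rest, head)| (TypedInput::Rst(rest), head)),
        }
    }
}
```

A Vec<TypedInput> can then hold Markdown and reStructuredText inputs side by side, at the cost of a match at runtime.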


Thank you for sharing your thoughts. Your proposal of wrapping the input in an enum is indeed interesting, although it does not meet all my requirements. For example, I need static dispatch for my use case: I want the compiler to generate different versions of take_links(), depending on the input text type.


BTW: I recently stumbled upon Visualizing Rust's type-system, presenting a mental model of Rust's type system. People reading this thread might be interested in this as well.


In addition: What would be the most elegant way to have a type All that enables all parsers, Md, Rst etc to work?
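One possible approach, which nobody in the thread has confirmed, is to add a third tag type All and give Input<'a, All> a real implementation of every Parser<'a, S>. The sketch below builds on H2CO3's API from above. The names take4, md_parser and rst_parser are hypothetical helpers added for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Md;
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rst;
/// Assumed marker: input tagged `All` is accepted by every parser.
#[derive(Debug, Clone, Copy, PartialEq)]
struct All;

#[derive(Debug, PartialEq)]
pub enum ParseError {
    WrongMarkup,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Input<'a, U>(&'a str, U);

trait Parser<'a, S>: Sized {
    fn parse(&self) -> Result<(Self, &'a str), ParseError> {
        Err(ParseError::WrongMarkup)
    }
}

/// Shared body; like the thread's code it panics on input shorter than 4 bytes.
fn take4<'a, U: Copy>(input: &Input<'a, U>) -> Result<(Input<'a, U>, &'a str), ParseError> {
    let (head, rest) = input.0.split_at(4);
    Ok((Input(rest, input.1), head))
}

impl<'a> Parser<'a, Md> for Input<'a, Md> {
    fn parse(&self) -> Result<(Self, &'a str), ParseError> { take4(self) }
}
impl<'a> Parser<'a, Rst> for Input<'a, Rst> {
    fn parse(&self) -> Result<(Self, &'a str), ParseError> { take4(self) }
}
impl Parser<'_, Rst> for Input<'_, Md> {} // default impl: WrongMarkup
impl Parser<'_, Md> for Input<'_, Rst> {}

// `All` gets a working implementation for each markup parser,
// and the returned rest keeps the `All` tag.
impl<'a> Parser<'a, Md> for Input<'a, All> {
    fn parse(&self) -> Result<(Self, &'a str), ParseError> { take4(self) }
}
impl<'a> Parser<'a, Rst> for Input<'a, All> {
    fn parse(&self) -> Result<(Self, &'a str), ParseError> { take4(self) }
}

fn md_parser<'a, U>(input: Input<'a, U>) -> Result<(Input<'a, U>, &'a str), ParseError>
where
    Input<'a, U>: Parser<'a, Md>,
{
    <Input<'a, U> as Parser<'a, Md>>::parse(&input)
}

fn rst_parser<'a, U>(input: Input<'a, U>) -> Result<(Input<'a, U>, &'a str), ParseError>
where
    Input<'a, U>: Parser<'a, Rst>,
{
    <Input<'a, U> as Parser<'a, Rst>>::parse(&input)
}
```

The drawback of this sketch is that every new markup type needs one more impl for Input<'a, All>. Whether that is acceptable depends on how many parsers there are.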