Asking for Rust solution

Hello.

I'm looking for someone who has plenty of experience in the Rust programming language.

I have a task that I hope will be added to Rosetta Code.
This task is based on pqmarkup, and is called pqmarkup-lite.

What makes this task special is that the formatting characters of pqmarkup {the paired quotation marks themselves, and Н, Р, С, Т, О, which can be written in either Latin or Cyrillic} go beyond ASCII. This significantly complicates working with the input UTF-8 string directly (especially in languages [like C++] which have no built-in UTF-8 support).
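
To illustrate the complication, here is a minimal C++ sketch. It is not taken from any of the implementations mentioned here. The helper names `starts_with_at` and `formatting_char_len` are hypothetical, and the sketch assumes the paired quotation marks are ‘ (U+2018) and ’ (U+2019). On the byte level, one formatting character can occupy one, two, or three bytes, so a scanner cannot simply compare a single `char`:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `needle` occurs in `text` starting at byte offset `pos`.
bool starts_with_at(const std::string& text, std::size_t pos, const std::string& needle) {
    return pos <= text.size() && text.compare(pos, needle.size(), needle) == 0;
}

// Hypothetical helper: length in bytes of the formatting character that starts
// at byte offset `pos`, or 0 if none of the candidates starts there.
// Only a few candidates are listed; the byte sequences are spelled out
// explicitly so that the encoding of the source file does not matter.
std::size_t formatting_char_len(const std::string& text, std::size_t pos) {
    static const std::string candidates[] = {
        "\xE2\x80\x98", // U+2018 left quotation mark, 3 bytes in UTF-8
        "\xE2\x80\x99", // U+2019 right quotation mark, 3 bytes in UTF-8
        "\xD0\x9D",     // U+041D Cyrillic Н, 2 bytes in UTF-8
        "H",            // Latin H, 1 byte
    };
    for (const std::string& candidate : candidates) {
        if (starts_with_at(text, pos, candidate)) {
            return candidate.size();
        }
    }
    return 0;
}
```

A scanner built on this has to advance by the returned length rather than by one, and offsets that fall in the middle of a multi-byte sequence match nothing.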

I've already translated the original Python implementation of this task into C++ (there are two implementations: one using u16string and one using a UTF-8 string).
Also there is a Nim translation by Luc Secouard.

And now I'm looking for people who can translate this task into other languages (Rust, Swift, D, etc.), and then I will compare all implementations by code readability and by performance.

So, can someone provide the most idiomatic Rust solution to this task?
I can pay for that (not very much though).

What's your policy on using libraries in these implementations?

One way to write an idiomatic Rust implementation would probably be to use a dedicated parser library, such as nom. It could be completed without additional libraries, but most idiomatic Rust programs do use libraries.

Edit: An additional question - do you want the implementations to use the same general strategy, or can they be anything as long as they accomplish the goal and produce the same output?

What's your policy on using libraries in these implementations?
One way to write an idiomatic Rust implementation would probably be to use a dedicated parser library

I'm not sure whether pqmarkup can be parsed effectively by a general-purpose parser library, but I don't mind the use of dedicated libraries.

do you want the implementations to use the same general strategy, or can they be anything

They can be anything. (The only requirements that pqmarkup-lite implementations must satisfy are listed in this comment of mine.)

Looking at the C++ implementation, starting at line 41:

template <int oldN, int newN> std::string &&replace_all(std::string &&str, const char (&old)[oldN], const char (&n)[newN])
{
    size_t start_pos = 0;
    while((start_pos = str.find(old, start_pos)) != str.npos) {
        str.replace(start_pos, oldN-1, n, newN-1);
        start_pos += newN-1;
    }
    return std::move(str);
}

std::string &&html_escape(std::string &&str)
{
    replace_all(std::move(str), "&", "&amp;");
    replace_all(std::move(str), "<", "&lt;");
    return std::move(str);
};

If I'm not mistaken, you have a use after move in the html_escape function. The return value of replace_all, which is actually the moved-to string, isn't used. Also, return std::move(whatever) prevents copy elision by the compiler.

I know this is [not very] dirty code, but it works.
Without std::move() (i.e. with just return str;) this code produces a compile time error (cannot convert from 'std::u16string' to 'std::u16string &&').

Maybe using std::forward<std::u16string>(str) would be better here, but it looks like the generated code is the same.

Can you elaborate on this?
Since these functions deal with references, there cannot be any copying.

Also I'd like to note that std::move() is a very special [almost "magical"] function.
For example, if you have such code:

std::string s("some text");
std::move(s);
std::cout << s;

it prints some text.
But this code:

std::string s("some text");
std::string s2(std::move(s));
std::cout << s;

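prints nothing (at least with common standard library implementations; formally, a moved-from std::string is left in a valid but unspecified state).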
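
The difference comes from the fact that std::move() does not move anything by itself. It is only a cast to an rvalue reference, and the actual move happens in whatever move constructor or move assignment operator receives the result. A small sketch of the two cases above (the function names `cast_only` and `move_constructed` are made up for illustration):

```cpp
#include <string>
#include <type_traits>
#include <utility>

// std::move is only a cast: applied to a std::string lvalue it yields std::string&&.
static_assert(
    std::is_same<decltype(std::move(std::declval<std::string&>())), std::string&&>::value,
    "std::move yields an rvalue reference");

// First case: the result of the cast is discarded, so nothing consumes `s`.
std::string cast_only() {
    std::string s("some text");
    (void)std::move(s); // cast only; no constructor or assignment takes the rvalue
    return s;           // still holds "some text"
}

// Second case: the move constructor of `s2` receives the rvalue and takes the contents.
std::string move_constructed() {
    std::string s("some text");
    std::string s2(std::move(s)); // `s` is now valid but unspecified
    return s2;                    // holds "some text"
}
```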

Yeah, sorry, the copy elision comment is just a general remark about any return std::move(whatever);. With references you would just take a const ref and return a reference, or just take a (mutable) reference, since you don't need to return anything from replace_all. The compiler error is because std::string::substr returns a string, but you pass it directly as a temporary, so you're having to deal with everything as temporaries. I know the code is long and quite complex, but if we take just that part:

#include <string>
#include <stdio.h>

template <int oldN, int newN>
void replace_all(std::string &str, const char (&old)[oldN], const char (&n)[newN]) {
    size_t start_pos = 0;
    while ((start_pos = str.find(old, start_pos)) != std::string::npos) {
        str.replace(start_pos, oldN - 1, n, newN - 1);
        start_pos += newN - 1;
    }
}

void html_escape(std::string& str) {
    replace_all(str, "&", "&amp;");
    replace_all(str, "<", "&lt;");
}

int main() {
    std::string test = "&whatever<string!";

    // Calling html_escape(test.substr(0, test.length())) directly would fail to build,
    // because a temporary cannot bind to a non-const lvalue reference.
    // Instead of taking rvalue refs, just assign the substr to a variable.
    std::string sub = test.substr(0, test.length());

    html_escape(sub);

    printf("%s\n", sub.c_str());
}

The return value of replace_all is not needed here, but it is needed in another place in the source code (write(replace_all(...))).

The compiler error is because std::string::substr returns a string

There may be some misunderstanding, but there is no call to substr() [in these 2 functions]. (And replacing return std::move(str); with return str; prevents just these functions from compiling, not the places where they are called.)

but if we take just that part:

I don't get your point. There are some places in the source code where replace_all and html_escape are called and their return value is actually used. So, what are you suggesting? Making these functions return void and adding some extra code here and there?
The current code, despite being dirty, is near optimal for this task, and it satisfies the main purpose of the C++ implementation: maximum performance. There would have to be a very strong reason to rewrite it.

Also, I want to say in my defense that the current C++ implementation is dirty partly because there is already a non-dirty C++ "implementation". I have a transpiler which translates Python code to human-readable C++, and this generated code can be considered a "non-dirty C++ implementation" [besides, it has comparable performance].
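
For reference, one possible middle ground between the two versions discussed above is to take the string by value and return it by value. This is only a sketch, not code from the repository: it swaps the array-length template parameters for plain std::string arguments, and its performance relative to the original has not been measured here. It accepts temporaries, keeps the return value usable in calls like write(replace_all(...)), and needs no std::move on return.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Takes the string by value: an rvalue argument is moved in, an lvalue is copied.
std::string replace_all(std::string str, const std::string& from, const std::string& to) {
    assert(!from.empty()); // an empty pattern would match at every position forever
    std::size_t pos = 0;
    while ((pos = str.find(from, pos)) != std::string::npos) {
        str.replace(pos, from.size(), to);
        pos += to.size(); // skip the replacement so it is not scanned again
    }
    return str; // a by-value parameter is implicitly moved on return
}

// "&" must be escaped first, otherwise the "&" of "&lt;" would be escaped again.
std::string html_escape(std::string str) {
    return replace_all(replace_all(std::move(str), "&", "&amp;"), "<", "&lt;");
}
```

With these signatures, both html_escape(test.substr(0, test.length())) and html_escape(sub) compile. The first moves the temporary in, the second copies sub and leaves it unchanged.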

I've written a mostly-complete implementation, but ran out of steam before finishing the last bits of support. It is currently missing lists, block-quote authors, and a few other things I haven't figured out yet from the spec.

Opened a PR: "Add incomplete ast-based Rust implementation" by daboross (Pull Request #4, pqmarkup/pqmarkup-lite on GitHub).

I probably won't work on this again soon, so if anyone else is interested in a relatively-clean parser from scratch based on parsing to an AST and then doing a few transformations before printing, feel free to take this the last mile. (in other words, grab my code, add your modifications, and PR as a completed thing).