General review for my first crate (may to be published)

Hey Rust-community,

I wrote a very small crate which provides a writer-type for markup-like languages, e.g. HTML, XML and JSON, and also the writers itself. It is a first small version to be extended in the future. For my personal training and to find in the world of open source, I would like to publish this, but not without a review in several point of views.

  • Is there an important code structuring issue, which should be improved?
  • Are there any beginner-coding-issues I can improve?
  • Is the toml-file ok (keywords, categories etc.) or should it be extended?
  • Is there a certain type of licence which could be added and is it recommended?

Thank you for any kind of suggestion for improvement :slight_smile:

Kind regards,
Michael

Hello,

You need a license if you want people to be able to use your crate freely and legally. The convention here seems to be MIT/Apache but it's really your (political) choice. (you can also choose GPL, AGPL, LGPL, BSD, public domain, etc.)

  • Public methods take &str then convert them to String. If the caller was forced to pass a reference to a String, this may lead to more allocations than needed. It may be better to ask for a String directly in this case. (but maybe not in certain cases... the more flexible would be a trait accepting both)
  • Maybe you should use format! instead of &["foo".to_string() + &bar].concat(). I'm not sure it's more efficient, but it sure is clearer.
  • I'm seeing some .len() > 0, which should have been detected by clippy. Have you run cargo clippy?
  • On line 200, you remove the end of a String using multiple pop in a loop. There are String methods to do that more efficiently in only one call.

Globally, each time you create a String or Vec, wonder if it's really necessary, or if &str or slice is enough. For example when comparing strings, it's cheaper to do &s == "blah" than s == "blah".to_string().

1 Like

You don't escape anything, so you'll get invalid strings with "realistic" inputs. It's not clear what the w prefix means on the all your methods. Every key and every value are not necesserarily strings, you shouldn't expect the user to convert everything to strings.

In general I don't find this API particularly ergonomic. Just looking at a test (kudos for having tests):

#[test]
fn test_html_mixed_entries() {
    let mut wr = HTMLWriter::new();
    wr.w_open_element("div");
    wr.w_property("class", "container");
    wr.w_lf_inc();
    wr.w_single_element("img");
    wr.w_property("style", "width: auto");
    wr.w_lf_dec();
    wr.w_close_element();
    assert_eq!(wr.core.content, "<div class=\"container\">\n    <img style=\"width: auto\">\n</div>")
}

This is probably error-prone and leaves too much exposed.

1 Like

Thank you very much!

  • Yes, I forgot to use clippy and now I did it and implemented all of its suggestions :+1:
  • About the way of string concatenation I somewhere read an article, but it is not clear to me, which way would be the best
  • The line 200 thing - I reimplemented it much more beautiful and easier to read :+1:
  • And I added now the default license, like you recommended, MIT+Apache2.0 - but I don't have any feeling about the differences betwenn several open-source licences...
  • Is there a certain trait, that forces to implement using both &str and String?
1 Like
  • What does escape mean? Can you give an example?
  • I tried to separate methods, which are writing to the content-String (edit the content of desire) and those methods, which only modify the setup of the writer... Would you recommend, to avoid this and just remove all prefixes?
  • What are kudos? And do you have an idea, how to implement it much better?

Thank you anyway :slight_smile: :+1:

https://www.fsf.org/blogs/licensing/licensing-resource-series-how-to-choose-a-license-for-your-own-work

1 Like

For example:

let mut wr = HTMLWriter::new();
wr.w_open_element("div><script>alert('hello');</script");
// with or without calling close_element
println!("{}", wr.core.content);

This will print <div><script>alert('hello');</script> which browsers will happily digest and run. Think also about quotation marks and many other "special" characters.

In general in Rust we really like APIs that can't be misused, where you can't forget to call methods like close (look at the File API for example).

Make an independant configuration struct (generally called a "builder") to build your writer, that builder should guarantee a valid state.

Kudos are praise, appreciation for something :slight_smile:

In general, your "writers" should build up a tree of objects and only when requested serialize that tree into a string (or into a Writer). You may be interested to look at the internals of serde and serde_json.

4 Likes

Ah, now I got your point :slight_smile: Thanks a lot, I will imrove the code with your inputs :+1:

And, I got some good ideas for the next version/level of this whole thing :slight_smile: :+1:

You're probably coming from JavaScript and read one of the several articles that claim that concatenating many strings using ["foo", "bar", "qux"].concat() is faster than repeatedly adding them. But:

  1. if you have two strings, then nobody cares, and clarity trumps performance.
  2. it's simply not true in Rust, because String::add() takes the LHS by value, so it's not quadratic.
3 Likes

There are many ways to do this, but the two main ones is to take arguments that implements either Into<String> or ToString.

The Into<String> variant mainly accepts String and &str (but also some variants such as boxes and cows). It takes something that is already a string (as in a sequence of bytes that is a utf8-encoded text) and you can convert it (by the foo.into() method) to a heap-allocated String.

The ToString variant is more general, it takes anything that is "formattable" (anyting that implements Display) and you can format it into a String with the foo.to_string() method.

Note that in many cases (e.g. if you need to do some escaping before storing the result in a String, or if you only store it by copying it into a longer String (or buffer)), it can be both more efficient and a nicer api to just take the argument as an &str anyway.

2 Likes

Thank you to all of you - every of your answers was helpful for me and I am refactoring the code with respect to every input. This gave me lot :+1: :slight_smile:

In my current branch:

  • I added a couple of input validations, e.g. check the tag-names for invalid characters
  • I separated the WriterCore from the MLLWriter-trait and rebuildt it to a functionality-only struct with interfaces for an independent String-reference
  • I replaced the [String &str String].concat()-parts by String.push('<'); String.push_str(tag); ...
  • Now I will separate the configuration of the WriterCore in also a separate struct and enable the user to configure more individual aspects
  • And I learned what escaping and kudos means

So, Kudos to all of you :slight_smile: