Let's talk about ecosystem documentation

Hey everyone,

The problem

Last week, someone said that they ended up not using Rust because they were frustrated with the documentation. I asked some questions, and it turns out they didn't mean for Rust itself: they meant that when they got out into the ecosystem, they got frustrated. So naturally, I tasked myself with finding a way to evaluate this.

The solution

I decided to do this: I parsed crates.io's index and put together a list of the top crates, ranked by how many other crates depend on them. I then graphed the counts to see the curve, and at about 40 crates there's a big drop-off in the number of dependents. Plus, 40 crates is enough to take some work to evaluate, but not so many that it'd take me forever.
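
For anyone who wants to repeat the ranking step, here is a minimal sketch of the counting in Rust. It assumes the index has already been parsed into crate names with their dependencies. The `IndexEntry` shape, the `top_reverse_deps` name, and the sample data are all made up for illustration; this is not the script behind the numbers in this post.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One crate from the index: its name and the crates it depends on.
/// (Hypothetical, simplified shape for this sketch.)
struct IndexEntry {
    name: &'static str,
    deps: &'static [&'static str],
}

/// Returns the `n` most depended-upon crates as (name, dependent count),
/// highest count first, ties broken alphabetically.
fn top_reverse_deps(index: &[IndexEntry], n: usize) -> Vec<(String, usize)> {
    // Map each dependency to the set of distinct crates that depend on it.
    let mut dependents: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for entry in index {
        for &dep in entry.deps {
            dependents.entry(dep).or_default().insert(entry.name);
        }
    }
    let mut ranked: Vec<(String, usize)> = dependents
        .into_iter()
        .map(|(name, set)| (name.to_string(), set.len()))
        .collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.truncate(n);
    ranked
}

/// Made-up sample data, not real index contents.
fn sample_index() -> Vec<IndexEntry> {
    vec![
        IndexEntry { name: "app-a", deps: &["libc", "log"] },
        IndexEntry { name: "app-b", deps: &["libc", "rand"] },
        IndexEntry { name: "rand", deps: &["libc"] },
        IndexEntry { name: "log", deps: &[] },
    ]
}

fn main() {
    for (name, count) in top_reverse_deps(&sample_index(), 2) {
        println!("{}: {} dependents", name, count);
    }
}
```

Plotting the full ranked list (rather than truncating it) is what would show the drop-off described above.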

The methodology

Then I had to decide how to do the grading. I've really enjoyed this blog post by Steve Losh, which breaks up docs into these categories:

  • First Contact - what and why
  • The Black Triangle - getting started
  • The Hairball - beginner -> intermediate
  • The Reference - complex docs for advanced and expert users

More in the post. I also decided to rank crates based on their README, since that's shown prominently on GitHub. This means that we now have five things to rate, and I scored each on a scale of 1-5, meaning a max "ecosystem doc score" of 5 * 5 * 40 = 1000.

Some weaknesses

Before I share the score, I want to mention some weaknesses of this methodology. First, there are some vital crates that aren't depended on by enough other crates.io crates to show up with this kind of ranking. Notably, only one or two crates from Piston made the cut, because I'm assuming that many of Piston's users don't end up on crates.io. Second, while I tried to come up with some objective criteria for the 1-5 rating, there's an element of subjectivity here.

As a result, please consider this a broad-strokes, big-picture thing, and don't get too hung up on the exact number or details.

The score

Overall, we got a 407 / 1000. The sums for each column were:

  • First Contact - 103 / 200
  • The Black Triangle - 94 / 200
  • The Hairball - 79 / 200
  • The Reference - 37 / 200
  • README - 94 / 200
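
As a sanity check on the arithmetic, the totals above can be recomputed from the column sums. This is only a sketch: the constant and function names are illustrative, and the numbers are copied from the list above.

```rust
/// Column totals as reported above, each out of 200 (40 crates x 5 points).
const COLUMN_TOTALS: [(&str, u32); 5] = [
    ("First Contact", 103),
    ("The Black Triangle", 94),
    ("The Hairball", 79),
    ("The Reference", 37),
    ("README", 94),
];

const CRATES: u32 = 40;
const MAX_PER_CRATE: u32 = 5;

/// Maximum possible total for a single column.
fn column_max() -> u32 {
    CRATES * MAX_PER_CRATE
}

/// Returns (overall score, overall maximum).
fn overall() -> (u32, u32) {
    let total: u32 = COLUMN_TOTALS.iter().map(|&(_, score)| score).sum();
    (total, column_max() * COLUMN_TOTALS.len() as u32)
}

fn main() {
    for &(name, score) in COLUMN_TOTALS.iter() {
        // Average per crate, out of 5, for this column.
        let average = f64::from(score) / f64::from(CRATES);
        println!("{:<20} {:>3} / {} (avg {:.2} / 5)", name, score, column_max(), average);
    }
    let (total, max) = overall();
    println!("Overall: {} / {}", total, max);
}
```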

I am not going to post more than that, for a few reasons:

  1. This isn't about shaming any particular crates or their authors. I will say that the rust-lang crates are generally very high up there, and aren't always the highest scorers themselves, frankly.
  2. I don't want to quibble over the exact scores I gave per-crate. Not productive, and won't move the needle that much.
  3. While I'd like to track this over time, I'm not even sure this is the correct methodology. See below for more.

Random observations

A few things that I noticed:

  1. Crates that are largely just FFI bindings generally have the worst docs. I think that this makes sense, as it's generally assumed that you're supposed to read the docs of the library being wrapped, but still.
  2. Smaller crates have better docs than larger crates, on average. This also makes sense: it's much easier to document a small, focused crate than a big, sprawling one.
  3. A lot of the current state of the docs boils down to "If you know Rust well, and there's a pile of method signatures, it's good enough at times." While this is true, we're likely to leave out beginners with this approach, and frankly, nice docs are nice, even if you can muddle your way through without them.

Where do we go from here?

So, what do we do with this information? I have some thoughts.

  1. I should start devoting some of my time to the ecosystem, and not just Rust itself.
  2. Rustdoc needs a lot of improvement, still. For example, I don't think it's a coincidence that The Reference was the worst scoring category when we don't really have a place to even put longform docs, currently. Even using rustdoc on Markdown files takes some Cargo hackery at the moment. And there are a number of other improvements that I am sure will help.
  3. I would like to do this on a semi-periodic basis, but I'd like your input on the methodology here. Can we find a way to overcome its shortcomings?
  4. What can I do to help foster a general culture of documentation in the Rust ecosystem? Writing docs is not fun, and maintaining crates is nobody's job (well, very few people's).

That's all I have for now. Thoughts?

Some thoughts I had in the past:

  • Would it be possible to have the README or something like it (ABSTRACT.mkd might work) shown on the crate's main page on crates.io? I'm usually more interested in those than the download graphs, since I first have to select some crates by viability before choosing based on activity and other factors.

  • Is there a way to see documentation coverage for projects? As in: How much of the public API has documentation attached? How many of those show examples? This might make it easier for people to at least have good API reference coverage.
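
To make the coverage idea concrete, here is a rough sketch of the two ratios asked about above. It assumes some tool has already listed the public items; `ApiItem`, `Coverage`, and `coverage` are hypothetical names for this example, not an existing rustdoc feature.

```rust
/// One public API item, as a hypothetical coverage tool might record it.
struct ApiItem {
    path: &'static str,
    has_docs: bool,
    has_example: bool,
}

/// Both ratios are fractions between 0.0 and 1.0.
#[derive(Debug, PartialEq)]
struct Coverage {
    /// Share of public items that have any documentation.
    documented: f64,
    /// Share of the documented items that also show an example.
    with_examples: f64,
}

/// Returns `None` for an empty API, where coverage is undefined.
fn coverage(items: &[ApiItem]) -> Option<Coverage> {
    if items.is_empty() {
        return None;
    }
    let documented = items.iter().filter(|i| i.has_docs).count();
    let with_examples = items
        .iter()
        .filter(|i| i.has_docs && i.has_example)
        .count();
    Some(Coverage {
        documented: documented as f64 / items.len() as f64,
        with_examples: if documented == 0 {
            0.0
        } else {
            with_examples as f64 / documented as f64
        },
    })
}

/// Made-up item list for illustration.
fn sample_api() -> Vec<ApiItem> {
    vec![
        ApiItem { path: "parse", has_docs: true, has_example: true },
        ApiItem { path: "Parser", has_docs: true, has_example: false },
        ApiItem { path: "Parser::new", has_docs: false, has_example: false },
        ApiItem { path: "Error", has_docs: false, has_example: false },
    ]
}

fn main() {
    let items = sample_api();
    if let Some(c) = coverage(&items) {
        println!("documented: {:.0}%", c.documented * 100.0);
        println!("documented items with examples: {:.0}%", c.with_examples * 100.0);
    }
    for item in items.iter().filter(|i| !i.has_docs) {
        println!("missing docs: {}", item.path);
    }
}
```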

I think maybe having a list on Crates.io of the crates with the best documentation (not sure how they would be rated, community votes???) might get some people more interested in making sure their docs are up to par.

Interesting stuff!

I think having an easy way for cargo to publish docs would be nice. Perhaps simply putting docs: yes or something in a toml file, which would auto-upload docs via crates.io (to rustdoc.org?), would be very helpful. We have Travis recipes for pushing things to gh-pages, but they're not perfect and need you to set up GitHub access tokens (which don't have per-repo permissions).

What can I do to help foster a general culture of documentation in the Rust ecosystem? Writing docs is not fun

Doc sprints can be quite fun if done as an online event, IMO. We've had tons of folks come together and work on Rust diagnostics; I don't think writing non-code is as boring as it sounds.

A thought on the structural issues with docs: why not combine rustdoc and rustbook?

By default, have rustdoc (or maybe just cargo doc) generate documentation containing sections comprised of:

  • README.md (with a default page to encourage people to put something there).
  • Chapters from the doc directory.
  • The API reference (i.e. code documentation).
  • CHANGELOG.md if it exists.
  • LICENSE.md if it exists.

The doc directory would more or less have the same structure as current rustbook.

So, with no additional effort, this would let people write up more introductory material without ending up with a massive lib.rs doc comment. Another advantage is that currently, there are good reasons to duplicate prose in both README.md and the crate docs (since only one of those two is visible, depending on where the user is coming from); this would alleviate that.

Also, once cargo install comes out, we can make a program to do gh-pages publishing, so hopefully, putting up docs will be reduced to something like: cargo gh-pages.
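
The section layout proposed in the list above could be sketched roughly like this. Everything here is hypothetical: `Section` and `plan_sections` are illustrative names, and ordering chapters by file name is an assumption of the sketch (a real tool might take the order from a summary file instead).

```rust
use std::collections::BTreeSet;

/// The sections of the generated documentation, in display order.
#[derive(Debug, PartialEq)]
enum Section {
    /// README.md, or a default page when the file is missing.
    Readme { placeholder: bool },
    /// One chapter from the doc directory.
    Chapter(String),
    /// The API reference generated from code documentation.
    ApiReference,
    Changelog,
    License,
}

/// Decides which sections to generate from the files present in a project.
fn plan_sections(files: &BTreeSet<&str>) -> Vec<Section> {
    let mut sections = Vec::new();
    // The README always gets a slot, to nudge people to write one.
    sections.push(Section::Readme {
        placeholder: !files.contains("README.md"),
    });
    // Chapters: every Markdown file under doc/ (BTreeSet iterates in sorted order).
    for f in files
        .iter()
        .filter(|f| f.starts_with("doc/") && f.ends_with(".md"))
    {
        sections.push(Section::Chapter(f.to_string()));
    }
    sections.push(Section::ApiReference);
    // Optional sections appear only if their file exists.
    if files.contains("CHANGELOG.md") {
        sections.push(Section::Changelog);
    }
    if files.contains("LICENSE.md") {
        sections.push(Section::License);
    }
    sections
}

fn main() {
    let files: BTreeSet<&str> = ["LICENSE.md", "doc/01-intro.md", "src/lib.rs"]
        .iter()
        .copied()
        .collect();
    for section in plan_sections(&files) {
        println!("{:?}", section);
    }
}
```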

NPM does this. We do have a description specifically for this, but I'm not sure it's better or worse.

I don't believe so, no. Rustdoc needs so much love...

yup, this is certainly something I've thought about, since rustbook is largely a wrapper over rustdoc anyway.

is the issue related here as well.

One thing that makes documenting your own work hard is that it is your work. You know how it works. This makes it hard to put yourself into the shoes of someone who doesn't know how it works, and that makes it hard to judge where the biggest pain points and the greatest ROI lies in your documentation.

People, when unsure of what to do, tend to take cues from their environment. This is where I would advocate leading by example. I think a few popular crates with exemplary documentation would be very helpful in guiding crate authors towards what they should do documentation-wise. That would also be helpful because I feel it is still a bit unclear in the community exactly how good crate documentation should be structured and what it should look like.

The description seems to be for short explanations and doesn't allow Markdown; I'd be more interested in a freeform document that can show code, for example. I'd think that for many small utility/convenience crates, this might even be the only documentation I'd need. For more complicated crates, this would be a perfect place for a quick lineup of a rationale, design goals, limitations and the like.

This was sorta discussed on internals but no consensus was reached. I thought the idea was okay but I wasn't going to implement it so I was trying to give solid ideas of what it could look like.

What stumps me is how, and if, the built-in doc systems could handle documenting something like iron, which would be really complicated. Compare with, say, express, whose docs really aren't that good but are still leaps and bounds better than iron's.

Maybe if the docs generated a landing page like that express site, with headers at the top dedicated to separating complicated topics into categories (such as lib docs, introductions, and further topics). Even if each category could host sections like the Rust book, would it be cohesive while still discussing things separately?


Most of your post basically seemed to suggest a lot of the documentation problems could (slowly) be fixed if someone documentation-minded just started submitting documentation PRs to every popular crate in existence. Aside from that, you stated rustdoc needs improvement, but do you have anything particularly in mind? They're two very different problems.

Well, I wanted to see what others thought about things, but yes. There are a number of smaller things, like the weird sizes of headings, but also the lack of long-form doc tooling for external stuff, which is sort of what you're bringing up at the start of your post.

Okay cool. I've gotten the impression that some of the rustdoc issues are partially because a good way forward isn't visible so progress isn't readily being made.

So that's why those NPM pages are so full. I go to one of those pages and all I want to find is the link to the docs or to the repository and I can hardly find either. I greatly prefer crates because it's easy to find things and it's really simple.

One idea would be to have a section for "Friends" or "Similar" which would list how they are different from a different library. Using Serde as an example:

  • rustc-serialize is a similar but older framework which we have now superseded. We have these 3 features which make serde so much better: a, b, c
  • capnp is different and targets a, b, and c which are outside of our scope

Also, having a line-graph (is that what these are called?) which detailed where each crate stood in the ecosystem would be useful. You'd look at a child crate and realize it's actually just one piece of a larger whole. Labelling things according to categories would be really important here, I should think.

                          - serde_macros
                         /
serialization -|- serde -
               |         \
               |          - serde_json
               |
               |- rustc-serialize
               |
               |          - capnp-rpc
               |         /
               |- capnp -
               |         \
               |          - capnpc
               |
               | - dbus-serialize
               | - rmp-serialize

[EDIT]

These two ideas are for crates in case it wasn't clear.
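
As a rough illustration of the data behind a map like the one drawn above, here is a small tree sketch. The `Node` type and `render` function are made up for this example, and the output is a plain indented outline rather than the branching diagram.

```rust
/// A node in the ecosystem map: a category, a crate, or a companion crate.
struct Node {
    name: &'static str,
    children: Vec<Node>,
}

impl Node {
    fn leaf(name: &'static str) -> Node {
        Node { name, children: Vec::new() }
    }

    fn with(name: &'static str, children: Vec<Node>) -> Node {
        Node { name, children }
    }
}

/// Appends the tree to `out` as an indented outline, two spaces per level.
fn render(node: &Node, depth: usize, out: &mut String) {
    out.push_str(&"  ".repeat(depth));
    out.push_str(node.name);
    out.push('\n');
    for child in &node.children {
        render(child, depth + 1, out);
    }
}

/// The same crates as in the diagram above.
fn serialization_map() -> Node {
    Node::with(
        "serialization",
        vec![
            Node::with(
                "serde",
                vec![Node::leaf("serde_macros"), Node::leaf("serde_json")],
            ),
            Node::leaf("rustc-serialize"),
            Node::with(
                "capnp",
                vec![Node::leaf("capnp-rpc"), Node::leaf("capnpc")],
            ),
            Node::leaf("dbus-serialize"),
            Node::leaf("rmp-serialize"),
        ],
    )
}

fn main() {
    let mut out = String::new();
    render(&serialization_map(), 0, &mut out);
    print!("{}", out);
}
```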

That quick little visualization would be immensely helpful, I think. I'm just starting to dive into Rust myself, and I find that one of the things that is painful is, like any new(ish) language, you almost don't know what you don't know. There have been many times starting out where I just didn't know that I had different, possibly better options available to me.

Some kind of tree like this would be an excellent way to explore what's out there.

Good initiative. This is one of the most important things we can do to improve the Rust experience.

I totally agree with this, but what about the opposite-- would you be willing to praise some particular crates or authors, maybe one from each type of documentation that earned a 5?

Basically, in your opinion, how are the crates that provide good documentation doing it? Could we look at those to see how we might encourage those behaviors in other crates? (ex: make them feel like the default, make them easy)

Some thoughts, in no particular order:

  • Centralized documentation hosting, à la CPAN or ReadTheDocs. It's not horrible to set up your own documentation builds with GH pages or RustCI, but it requires some effort. And you have to repeat the process for every crate you make.

  • Ideally, this would be tied into your Crates.io account. So you could just do cargo doc publish and it is autopublished

  • I hadn't heard about Rustbook until now. Perhaps publicize this more, or integrate it with cargo?

  • Rustdocs are not the right place for substantial examples and long-form tutorials (imo). Regular reference docblocks already eat up enough space in the code. When you provide an example-per-method, you often have a huge block of comments and a tiny public-facing method (e.g. an example from my own code)

  • Writing markdown inside of a comment is a pain, and rustdoc finds interesting and surprising ways to explode. I've been bitten by stray spaces before a ``` nuking the entire comment, as an example

  • Asciidoc would be a lovely alternative/addition to markdown, giving more expressiveness and styling while still being (relatively) easily tooled

  • Evil option: Don't let someone publish to crates.io unless they have documentation :smiling_imp: More realistically, provide some kind of social incentive to publish docs (rotating "spotlight" on crates.io front page for documented project, etc).

As to actually getting people to write documentation...I'm not sure. It's the eternal bogeyman of software: no one likes to write documentation. Reference docs are generally good for Rust because of the rustdoc integration (which is great!), but "how-to" and "beginner" documentation is sorely lacking. Perhaps making it simpler to add documentation (via crates.io hosting) would lower the barrier to entry, and more people would document?

Just as a note, this sounded like a big list of complaints...but I think Rust has a relatively healthy documentation ecosystem all things considered. And it has folks like you that are invested in making sure 3rd party libraries are documented, which is outstanding.

The fact that rustdoc is integrated means most libraries at least have reference docs...which is honestly more than many libraries in the wild. It's partially offset by the fact Rust is more complicated to get started with, but I wouldn't despair too heavily =)

Completely agree with @Manishearth's suggestion for built-in publishing support. Having your crate's docs automatically hosted and linked to from its crates.io page would reduce a lot of friction. It would also provide a single authoritative source for all crates. It may also provide encouragement for authors who, after seeing their hosted documentation, realize that they've missed important pieces.

Two things I really love about rustdoc are the built-in Markdown support and documentation tests. That the latter doubles as an example block for the associated method or function makes it even more enticing. I think we could extend this tooling to provide better support for the "first contact" and "black triangle" tasks by setting conventions about where this documentation should live. Building a landing page using README.md and GETTING_STARTED.md files at the root of the project (and auto-generating these as part of cargo new) would provide more complete coverage for these differing types of documentation.
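
For readers who haven't seen a documentation test, here is a minimal sketch of one, as it might appear in a library's `lib.rs`. The function and the `my_crate` name are placeholders invented for this example. The example inside the comment uses `~~~` fences, which Markdown treats the same as backtick fences, so that the snippet nests cleanly here.

```rust
/// Clamps a raw rating into the 1-5 range used for doc scores.
///
/// # Examples
///
/// ~~~
/// use my_crate::clamp_score; // `my_crate` is a placeholder crate name
///
/// assert_eq!(clamp_score(7), 5);
/// assert_eq!(clamp_score(3), 3);
/// ~~~
pub fn clamp_score(raw: i32) -> u8 {
    // After clamping, the value is between 1 and 5, so the cast is lossless.
    raw.max(1).min(5) as u8
}
```

In a library crate, `cargo test` compiles and runs the fenced example along with the regular tests, so the example a reader sees in the rendered docs is also checked against the code.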

The "hairball" section seems a little bit trickier, but we could leverage module-level docs to provide higher-level topic coverage and define a manifest that would enable an ordered TOC linking to said module docs. This certainly feels targeted to a much smaller demographic, but it would be really handy for larger, more heavily-used libraries and frameworks!

I personally enjoy writing good documentation. Libraries aren't complete without it IMO, and there are few things more satisfying than producing a complete library. A good API with good docs can read like a good story. :smile:

To a first approximation, I agree with the evaluation criteria. Specifically, the linked blog post provided some good context and I very much agree with the rationale there. To a second approximation, I'm not totally convinced that all sections need to be properly fleshed out. Specifically, smaller libraries probably don't need long form docs. A solid module level introduction with good API docs and examples is usually enough in my experience.

I'd like to echo some of others' thoughts on motivating documentation. In some ways, it's a social problem. What can we do to provide positive reinforcement for writing good documentation? Others have mentioned some good ideas. I really like the idea of showcasing well documented crates, although this does imply some form of manual curation. I would personally consider it important enough to be mentioned in any introductory guide to crates.io. Something like: "If you want others to use your code, then increase the likelihood of them doing so by teaching them how to use your code."

Also, I have personally come up with weird hacks to upload my docs to a web server somewhere. This saves me from having to do any kind of per-crate setup: all I have to do is add it to my "meta" Cargo.toml. But if this were done for me by Cargo/crates.io, then that'd be great.

I'd just like to say that I vehemently disagree that this is true. I've written a lot of Rust and I come across crates all the time that are just a pile of method signatures and traits and sub-module sprawl. I end up completely lost and uncertain how the author intends for me to use their code. In almost all cases, I end up having to read the source or the tests or find something that has that library as a dependency and use that as an example. Sometimes I can't find these things or they aren't enough, and in the end, I can't be sure that I understand how to use the library correctly. This is not a Rust-specific thing either.

I think my point here is: experienced Rust programmers need documentation too. We shouldn't enable the "this crate is only for experts anyway, so I don't need to write docs" line of thinking. :slight_smile:

OK, let me end on some positive notes:

  • I think the single biggest roadblock to writing good documentation is having easy to use tools for doing so. I think that rustdoc has solved this problem to a large extent right from the get-go. This is so not true in other ecosystems (to the extent that I've written my own documentation generators). There is tons of room for improvement of course, but I just want to echo that one of the major problems other ecosystems face is not present here. That's a really wonderful achievement.
  • crates.io and Cargo.toml put documentation at the front of the picture by showing links to docs. This helps let everyone know that we care about docs and you should too.
  • Providing a way to run tests inside documentation is wonderful because it gives us a way to show examples that we can be confident are correct.

I wonder, do we have a guide anywhere on "how to write docs for Rust crates"? I can think of a few things to say on that subject... (Document your type variables gosh darn it!)
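
As one possible illustration of documenting type variables, here is a small sketch. The function is invented for this example, and the "Type parameters" heading is just one convention that could be used, not an established rule.

```rust
/// Returns the item with the highest score, or `None` if `items` is empty.
///
/// # Type parameters
///
/// * `T` - the thing being ranked (for example, a crate name). It is only
///   borrowed, so no trait bounds are required.
/// * `F` - the scoring function. It is called once per item and returns a
///   `u32`, where higher means better.
pub fn best_by<T, F>(items: &[T], score: F) -> Option<&T>
where
    F: Fn(&T) -> u32,
{
    items.iter().max_by_key(|&item| score(item))
}

fn main() {
    let names = ["a", "bbb", "cc"];
    // Rank the names by length; prints Some("bbb").
    println!("{:?}", best_by(&names, |s| s.len() as u32));
}
```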

Well spoken as always.


Something else that could be done for crates is having topic landing pages of some form. Take algebra for example (I've never used any of them). There are 34 crates over 4 pages. It's pretty unlikely that I'll even look past page 2 for something. I'll just click the few most popular maybe, go on IRC to ask about what I'm looking for, then give up if I don't find what I'm looking for. Those could actually be a more general documentation resource which overviews the topic and available options. This would save individual crates from the hassle of having to document everything in the general sense even if they were willing.

Note: I don't know how well this would scale though. Consider this npm search for a sanity check. Maybe it would have one regular result with everything else specialized to some particular application I guess? I'm not really sure. It would at least guarantee the main result is the general one though.

Regarding documenting types and giving examples and such, there could be a doc lint for that. If you do generate docs and a function is missing an example or a type is exposed without being documented, there should/could be a warning.
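
Part of this already exists: rustc has a `missing_docs` lint, which is off by default and can be switched on to warn about public items without doc comments. The sketch below enables it for one module (a library would more commonly put `#![warn(missing_docs)]` at the crate root). It does not cover the "missing example" half of the suggestion, and the module contents are made up for illustration.

```rust
// Enable the existing `missing_docs` lint for this module only.
#[warn(missing_docs)]
pub mod api {
    //! Scoring helpers. This module-level comment documents the module itself.

    /// Highest score a single category can receive.
    pub const MAX_SCORE: u8 = 5;

    // This public function has no doc comment, so it is the kind of item
    // the lint is meant to flag when the crate is built as a library.
    pub fn is_valid(score: u8) -> bool {
        score >= 1 && score <= MAX_SCORE
    }
}
```

Because the lint is set to `warn` rather than `deny`, the code still compiles; the missing documentation just shows up in the build output.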