Please make an mdbook on Text Processing in Rust

Hello,

I have some experience in Rust, about 2 years now, but I still don't feel really comfortable doing heavy lifting with text processing in Rust compared with other programming languages. Text processing is hard!

Text processing is at the core of most work with computers. We read strings as streams from files, sockets or the web, transform them in many complicated ways, then write them back or send them elsewhere.
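As a rough illustration of that read, transform, write loop, here is a minimal sketch using only the standard library. The function name `shout_lines` and the particular transformation (trim, skip blank lines, upper-case) are made up for the example; the point is only that the same code works for any `BufRead` source and any `Write` sink.

```rust
use std::io::{self, BufRead, Write};

/// Reads text line by line, trims surrounding whitespace, skips blank lines,
/// and writes each remaining line upper-cased to `output`.
/// Returns the number of lines written.
/// (`shout_lines` is a hypothetical name used only for this illustration.)
pub fn shout_lines<R: BufRead, W: Write>(input: R, mut output: W) -> io::Result<usize> {
    let mut written = 0;
    for line in input.lines() {
        // `lines()` yields an error on I/O failure or on invalid UTF-8.
        let line = line?;
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue;
        }
        writeln!(output, "{}", trimmed.to_uppercase())?;
        written += 1;
    }
    Ok(written)
}
```

To run it over standard input and output, one could call `shout_lines(std::io::stdin().lock(), std::io::stdout().lock())` from `main`; a `BufReader` around a `File` or a `TcpStream` should work the same way.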

From the most junior developer to the most advanced, this is much of the work we do every day!

When the world was only ASCII it was really simple, but with UTF-8 and Unicode the world is a much more complex beast! But the core of computer science is still the same: it's text processing, binary data processing or numerical processing.

I saw the latest video with Jon Gjengset and Niko Matsakis about making things simpler for newcomers, and I'm thinking, spelled out plainly: make an authoritative mdbook on Text Processing in Rust, and everyone, newcomers and more experienced developers alike, will forever thank you!

2022-04-18 Cross Team Collaboration Fun Times (CTCFT)
https://www.youtube.com/watch?v=xOkI7xZ35fE

Go all in on the topic: don't assume any prior knowledge, start from the basics, explain how the string and regex APIs work and how they work in conjunction with related ones, fill the book with examples, and give common, useful and even advanced usage patterns for text processing in Rust. And exit in style with a long, resounding final example: a simple text-mode editor with ncurses or something like that. Show the thing working and give all readers and Rust enthusiasts the knowledge they need to make, in Rust, the things they already know how to make in other programming languages. And some new things too :)

See for example:

Kibi: A text editor in ≤1024 lines of code, written in Rust
https://github.com/ilai-deutel/kibi

The best free resource that I could find on the web about text processing in Rust was:

Text Processing in Rust
https://www.linuxjournal.com/content/text-processing-rust

It's well made, but it is not enough given how important and how real a problem text processing is for developers, a need that current resources do not fulfill.

Like I said, this would be very important for the newcomer and for the more experienced developer.

You could say, please write a normal paid book on the subject, right? But text processing is such a core subject for a programming language that, with a paid book, the community as a whole would not reap the real benefits that such a resource could give.

I have many paid books about Rust, but I think that there is currently nothing written, to my knowledge, that fulfills this real need.

Thank you very much!

The very best regards,
João

Text has both a long tail and broad application. Do you have a good idea of the limits of what would be covered?

For example, surely this wouldn't try to cover the details of Unicode, like dealing with user input with bidi content, or font fallback, or the shockingly complex details of word wrapping.

You also don't really mention formatting or parsing much, despite logging and serde being the most commonly encountered parts of text processing: do you feel they are already covered well enough?

I do think this is a good idea: text is far more complex than most people realize, and getting more information out there about that is great, I'm just not sure anyone's going to jump on a work-decade sized project if only a few dozen people actually need that much detail.

Hello @simonbuchan,

What I suggested was not a work-decade sized project. What I was suggesting is something that a couple of very knowledgeable and experienced Rust developers could do in a few dedicated weekends.

But that would still make a huge impact on how developers in general use Rust for text processing, both newcomers and intermediate Rust developers.

I'm not talking, in any way, about a project with Donald Knuth levels of dedication :) This would be a much more down-to-earth and much smaller kind of project.

And I'm not talking about explaining all the details of Unicode. That would be a huge amount of work, and not one that would suit most cases for junior or intermediate Rust developers.

I'm talking about explaining with examples, in an mdbook, how to do text processing with Rust in a way that covers 80% of the cases in 10% of the text. I think the examples, explaining the usage patterns that are specific to Rust and its APIs, would be especially useful: showing by example how to attack several problems. When I talk about making a small editor at the end, in the last chapter of the mdbook, I mean it as a consolidating example, something that would fill you with confidence that you could handle not-so-simple text processing tasks in Rust.

To build parsers for languages, even in Rust, I would suggest other, more specialized books on interpreters and compilers. That is a different kind of book.

Thank you,

The very best regards,
João

Hello @simonbuchan,

In the current situation, there is so little information about text processing in Rust in a single place that even a small effort in documenting and explaining it would make a huge impact, especially considering how core the subject is to programming in general.

Thank you,

The very best regards,
João

Sure, which is why I asked if you had a good idea of the limits. Even a "simple text editor" is potentially a huge amount of work to do right if you're not careful about scoping, and it's also too easy to go in the other direction and make a uselessly trivial implementation that doesn't cover anything useful. As I mentioned, input and display for bidi text is quite complex, but that's needed for several major world languages. Same for word wrapping. You can to some extent delegate to Unicode libraries for the implementation of these details, but that's still a lot to explain and integrate. Certainly much more than a handful-of-weekends sized project.

Do you have more specific topics you think would work? As yet I'm still not sure what you're thinking of when you say "text processing". The example of a simple text editor implies that it's natural text editing, for example word segmentation, spell checking, case transforms, language detection. But that's a pretty narrow slice of programming with text in general: most people aren't building text editors!
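To make the "trivial versus correct" tension concrete, here is a small sketch using only the standard library. The helper names `capitalize` and `naive_words` are invented for illustration. Even a case transform is not one-to-one in Unicode, and splitting on whitespace is only a crude stand-in for real word segmentation.

```rust
/// Upper-cases only the first character of `s`.
/// (`capitalize` is a hypothetical helper for this illustration.)
pub fn capitalize(s: &str) -> String {
    let mut chars = s.chars();
    match chars.next() {
        // `char::to_uppercase` returns an iterator, because a single
        // character can map to more than one (for example 'ß' becomes "SS").
        Some(first) => first.to_uppercase().chain(chars).collect(),
        None => String::new(),
    }
}

/// Naive word segmentation: splits on Unicode whitespace only.
/// This ignores punctuation and does nothing useful for scripts that do not
/// separate words with spaces; proper segmentation needs a Unicode library.
pub fn naive_words(s: &str) -> Vec<&str> {
    s.split_whitespace().collect()
}
```

For instance, `"straße".to_uppercase()` gives `"STRASSE"`, which has one more character than the input, so code that assumes case changes preserve length or character positions will be wrong for some text.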

There's also the issue that, in my experience, past logging and serde, most tricky text handling is in the GUI context, since fonts, formatting and layout are now major concerns. There's some great work going on to get Rust-native GUIs, but nothing that I would feel is ready to have a book talking about the Rust-specific issues in dealing with text in GUIs.

Just to be clear, I still don't think it's a bad idea in general!

This may sound odd, but you seem perfectly qualified to write such a book: you're experienced with Rust and have tried to tackle the subject matter already. As you go through the process of researching and writing the book, you'll naturally become an expert on text processing in Rust.

You also have a strong vision of what the book should be, and another author likely won’t share your enthusiasm for the subject.

Hello @simonbuchan,

When I say text processing, I'm thinking of something along these general lines:

  1. Explaining in general terms a little bit of what Unicode is, especially UTF-8, and how it's represented in memory.

  2. Explaining how Strings, &str references and slices work in memory, to pass these mental models on to developers so they have a framework to think with.

  3. Explain the String API, println!, format!, write! in conjunction with iterators and collections.

  4. Explain how the most common APIs and traits that deal with text in Rust work, and how they work together.

  5. Explain common patterns of text processing in Rust: something like a list of 30 to 50 simple to advanced patterns, with an example for each one.

  6. Final consolidating example. Implement a simple and small text editor in Rust, using whatever you can from the open Kibi project (link in the first post), because its philosophy is to stay small: no more than 1024 lines of Rust code.
    Spell check could use an integration library for Hunspell. Language detection would not be necessary; word segmentation would be, and Kibi already does that. Searching for strings, and maybe regex, would be good examples.
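To give a flavour of what points 1 to 3 and the string search in point 6 might look like in such a book, here is a minimal sketch using only the standard library. All four function names (`sizes`, `prefix_chars`, `numbered`, `find_all`) are invented for the example and are not part of any existing API.

```rust
use std::fmt::Write; // lets `write!` / `writeln!` target a `String`

/// Returns (length in bytes, number of chars): UTF-8 is variable-width,
/// so the two differ as soon as the text is not pure ASCII.
pub fn sizes(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Returns the first `n` chars of `s` as a borrowed slice (no allocation),
/// always cutting on a char boundary so the slice is valid UTF-8.
pub fn prefix_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // fewer than `n` chars: return the whole string
    }
}

/// Builds a numbered listing by combining an iterator with `writeln!`.
pub fn numbered(items: &[&str]) -> String {
    let mut out = String::new();
    for (i, item) in items.iter().enumerate() {
        // Writing into a `String` does not fail, so the result is ignored.
        let _ = writeln!(out, "{:>2}. {}", i + 1, item);
    }
    out
}

/// Byte offsets of every non-overlapping occurrence of `needle`.
pub fn find_all(haystack: &str, needle: &str) -> Vec<usize> {
    haystack.match_indices(needle).map(|(i, _)| i).collect()
}
```

The design point worth noticing is that `&str` indices are byte offsets, not character positions: slicing `"João"` with `&s[..3]` would panic because byte 3 falls inside `ã`, which is why `prefix_chars` goes through `char_indices` instead.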

Thank you,

The very best regards,
João

Hello @2e71828,

Me, simple João, writing a book... hehehe... I'm not the right person to do it. I have never written a book; the most I have written about Rust is a very simple thing. It's my guide "How to learn modern Rust", and that is more of a curated list of resources that I have found useful, with some tips and opinions. By no means an mdbook.

How to learn modern Rust
https://github.com/joaocarvalhoopen/How_to_learn_modern_Rust

Thank you,

The very best regards,
João

It doesn't have to be a book. You could start out with a series of blog posts. I agree that you are qualified enough to do this and would learn plenty in the process about a domain that interests you specifically.

Yes, simple you. Your "book" does not have to be finished or even entirely accurate. Start with...

I would suggest just making the 6 chapters you laid out. Add a chapter for the Introduction (Why I wrote this book!), and add an appendix or two (terms, references, etc.).

Make the layout of the book first.
Write the intro and the simplest form of each chapter.
Publish it as you did with your "How to Learn" book.

If you have read, or at least skimmed through, the material in "How to Learn" enough to know what to include in it, then you have for sure soaked up a bit into your brain. For the parts of the "Text Processing" book that you cannot quite explain well enough yourself, point us to the information somewhere else. For example, if there is some nuance of Unicode that you do not understand or do not have time to explain, you don't need to explain it; just point to someone else's explanation of that nuance.

If you make errors that matter, the internet will fix them. Someone will say, "this part about regex is wrong, you should have said ..." and we all learn from the mistake.

P.S. Is it because you do not know how to write an mdbook that you are holding back? mdbook is designed to be easy to write. Jump in.

These are interesting suggestions. There is less text processing than I expected you to point out, much more on how to use Rust. I'll try to answer some of them over the next few weeks.
