Textwrap 0.14.0 released with support for wrapping text without word separators

Hi all,

It's my pleasure to announce a new major release of textwrap: version 0.14.0. I would like to point out two highlights:

  • Textwrap now uses the Unicode line breaking algorithm by default, courtesy of the unicode-linebreak crate. Previously, words were found by splitting on spaces; now they are found using the Unicode line breaking properties. This allows breaking East-Asian text like "你好" into "你" and "好". It also allows breaking a string of emojis like "😂😍" into "😂" and "😍". The algorithm also prevents breaks in certain cases, such as in "Bonjour !", where French punctuation rules require a non-breaking space before "!".

    You can customize this by implementing the WordSeparator trait, and you can avoid the new dependency if separating words by ' ' is good enough. Support for wrapping using rust_icu is planned.

  • Thanks to the excellent support for WebAssembly in Rust, we now have a little demo which lets you try out Textwrap in your browser: Textwrap WebAssembly demo. It was a ton of fun to write this! 🙂

    The demo shows how Textwrap can be used to wrap both proportional and fixed-width text. In the demo, the text is rendered on an HTML canvas element, but it could just as well go to a PDF file, a GUI, or similar. The ability to wrap text outside of the terminal was added in version 0.13.0.
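
To illustrate the first highlight, here is a minimal sketch of what "splitting on space" means and why it cannot break text such as "你好". It uses only the standard library; `split_on_space` is a hypothetical helper written for this example and is not Textwrap's actual implementation or API.

```rust
/// Splits `text` into words on ' ', keeping trailing spaces attached to
/// the word before them. Hypothetical helper, not part of Textwrap.
fn split_on_space(text: &str) -> Vec<&str> {
    let mut words = Vec::new();
    let mut start = 0;
    let mut in_space = false;
    for (idx, ch) in text.char_indices() {
        // A new word starts at the first non-space after a run of spaces.
        if in_space && ch != ' ' {
            words.push(&text[start..idx]);
            start = idx;
        }
        in_space = ch == ' ';
    }
    // Push whatever remains as the final word.
    if start < text.len() {
        words.push(&text[start..]);
    }
    words
}

fn main() {
    // Space-separated text yields one word per break opportunity.
    println!("{:?}", split_on_space("Hello  wide world"));
    // Text without spaces stays a single unbreakable word.
    println!("{:?}", split_on_space("你好"));
}
```

With this approach "你好" comes back as a single word, so a wrapper built on it has nowhere to break the line. Using the Unicode line breaking properties instead is what provides a break opportunity between the two characters.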

In addition to these new features, Textwrap still supports linear-time wrapping of full paragraphs as well as machine hyphenation via the hyphenation crate. You also get indentation support.

Please give it a spin and let me know of any problems!

Congrats on the release.

I am considering using this library to wrap Markdown text.

The catch is that in Markdown, if a line ends with two spaces, the newline must be preserved. Is there some way to hint to Textwrap where a newline must be inserted?

Hi @amitu,

For that I would first look at what Comrak does. It's a full Markdown parser which can be used to normalize Markdown. I've used it in the past to reformat my Markdown files using:

comrak --width 80 --gfm --to commonmark your-file.md > reformatted.md

I think that would be the most solid solution, and Comrak could then use Textwrap internally if they want 🙂

Textwrap will preserve newlines when you give it a string to wrap. So if you can turn your "  \n" (two spaces followed by a newline) into "\n" and turn your other "\n" into " ", then you should be halfway there. If you know that your input is simple and well-behaved, then something hacky like this might work: Rust Playground.
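
The newline transformation described above can be sketched roughly as follows. This is a minimal standard-library sketch, not the code from the linked playground; `normalize_breaks` is a hypothetical name, and the sketch assumes plain paragraphs without code blocks, lists, or other Markdown structure.

```rust
/// Joins soft-wrapped lines with a single space, but keeps a newline
/// wherever the line ended in two or more spaces (a Markdown hard break).
/// Hypothetical helper, assuming simple and well-behaved input.
fn normalize_breaks(input: &str) -> String {
    let mut out = String::new();
    let mut lines = input.split('\n').peekable();
    while let Some(line) = lines.next() {
        let hard_break = line.ends_with("  ");
        // Drop trailing spaces so they do not leak into the output.
        out.push_str(line.trim_end_matches(' '));
        // Only add a separator between lines, not after the last one.
        if lines.peek().is_some() {
            out.push(if hard_break { '\n' } else { ' ' });
        }
    }
    out
}

fn main() {
    let input = "first line  \nsecond line\ncontinues here";
    let normalized = normalize_breaks(input);
    println!("{}", normalized);
}
```

The normalized string could then be handed to `textwrap::fill`, which keeps the remaining newlines in place while wrapping the rest. Blank lines between paragraphs and a trailing newline are not treated specially here, which is one more reason to prefer a real Markdown parser.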

I would still recommend a full Markdown parser, though.

If you enable the hyphenation Cargo feature, you get support for automatic hyphenation for about 70 languages via high-quality TeX hyphenation patterns.

Very nice. Word Hy-phen-a-tion by Com-put-er lives on.

Hehe, it does indeed! Old but still very useful technology 🙂 Credit of course goes to tapeinosyne on GitHub for packaging up the hyphenation patterns in an easy-to-use way.

Thanks for the help back in "One generic parameter vs multiple parameters" when I was wrestling with the new type parameters!

I hope the new unwieldy Options type won't cause problems for people. I ended up writing out the full type in my small test programs and that seems clear enough.

Breaking news! 🙂 I just learned about a brand new library for wrapping Markdown text: runwrap. It uses the pulldown_cmark crate for parsing and Textwrap for, well, wrapping. It might be just what you need!
