The state of font parsers, glyph shaping and text layout in Rust

Current font crates:

  • rusttype: a TrueType parser and rasterizer
  • font-kit: bindings to platform dependent parsers and rasterizers (and freetype)
  • font: unmaintained OpenType parser
  • font-rs: fast TrueType parser and rasterizer
  • fonterator: TrueType parser and line layout
  • victor: Vector graphics library with simple TrueType/OpenType font support
  • font: OpenType, TrueType, CFF, Type1, WOFF(2) parser (and primitive line layout)

The last one is my own take on this.

I would like to organize some kind of meeting, and maybe a working group, to reduce the duplication among the font crates and to unify the effort. If you have written a font parser (or want to write one), or are interested in glyph shaping (see HarfBuzz) and text layout, please comment!


I'd love to see a Rust replacement for HarfBuzz, because I am using a computer where I am not an admin and am hitting an error building HarfBuzz that requires admin privileges to fix.

This sounds fun but it's unlikely I'll have much time myself. Also to the list you can add the font code from victor. A HarfBuzz replacement is a huge effort, but the YesLogic people are at least looking at it.

Hi, maintainer / developer of Fonterator here. I'm very interested in glyph shaping in pure Rust.

Wouldn't it make more sense to create a GitHub issue in those repositories? It'd be more likely to be seen :wink:

@Luro02 contacted all authors via email, so I hope that everyone is aware of this.


I'm also eager to see tectonic (a Rust XeTeX engine/distribution) oxidized and using one of the crates mentioned above.

@SimonSapin mentioned on IRC that splitting the API into a (raw) font part and a shaper part is probably a good idea. I agree with this and would add a text layout API as well.

That would result in:

  • a font loader that gives access to the raw font information.
  • a shaper that, given some text, returns either an outline or a list of outlines with corresponding transformations.
  • a text layout crate implementing different layout algorithms
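
As a rough illustration of that three-way split, here is a minimal sketch in Rust. Every name in it (`Font`, `Shaper`, `Layout`, `PlacedOutline`, `NaiveShaper`, `ToyFont`) is hypothetical and not taken from any of the crates listed above; the shaper is a deliberately naive placeholder that maps one `char` to one glyph, which is nowhere near real shaping.

```rust
/// One segment of a glyph outline (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub enum Segment {
    MoveTo(f32, f32),
    LineTo(f32, f32),
    QuadTo(f32, f32, f32, f32),
    CubicTo(f32, f32, f32, f32, f32, f32),
    Close,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct GlyphId(pub u16);

/// Layer 1: raw font access, with no shaping logic.
pub trait Font {
    fn glyph_for_char(&self, c: char) -> Option<GlyphId>;
    fn advance(&self, glyph: GlyphId) -> f32;
    fn outline(&self, glyph: GlyphId) -> Vec<Segment>;
}

/// An outline plus the translation that places it relative to the text origin.
#[derive(Debug, Clone, PartialEq)]
pub struct PlacedOutline {
    pub outline: Vec<Segment>,
    pub dx: f32,
    pub dy: f32,
}

/// Layer 2: text in, positioned outlines out.
pub trait Shaper {
    fn shape(&self, font: &dyn Font, text: &str) -> Vec<PlacedOutline>;
}

/// Layer 3: splits shaped output into lines, returning index ranges into `glyphs`.
pub trait Layout {
    fn break_lines(&self, glyphs: &[PlacedOutline], max_width: f32)
        -> Vec<std::ops::Range<usize>>;
}

/// Placeholder shaper: one glyph per `char`, advanced left to right.
/// No ligatures, kerning, or complex-script handling.
pub struct NaiveShaper;

impl Shaper for NaiveShaper {
    fn shape(&self, font: &dyn Font, text: &str) -> Vec<PlacedOutline> {
        let mut x = 0.0_f32;
        let mut placed = Vec::new();
        for c in text.chars() {
            // Unmapped characters are skipped here; a real shaper would emit .notdef.
            if let Some(glyph) = font.glyph_for_char(c) {
                placed.push(PlacedOutline { outline: font.outline(glyph), dx: x, dy: 0.0 });
                x += font.advance(glyph);
            }
        }
        placed
    }
}

/// Toy font so the sketch can be exercised: printable ASCII only,
/// fixed advance of 10 units, and the same triangle outline for every glyph.
pub struct ToyFont;

impl Font for ToyFont {
    fn glyph_for_char(&self, c: char) -> Option<GlyphId> {
        if c.is_ascii_graphic() || c == ' ' {
            Some(GlyphId(c as u16))
        } else {
            None
        }
    }

    fn advance(&self, _glyph: GlyphId) -> f32 {
        10.0
    }

    fn outline(&self, _glyph: GlyphId) -> Vec<Segment> {
        vec![
            Segment::MoveTo(0.0, 0.0),
            Segment::LineTo(8.0, 0.0),
            Segment::LineTo(8.0, 8.0),
            Segment::Close,
        ]
    }
}
```

With these definitions, `NaiveShaper.shape(&ToyFont, "ab")` yields two placed outlines, the second offset by one advance. The point of the split is that a real shaper could replace `NaiveShaper` without the font or layout layers changing.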

@crlf0710 Actually… there is ReX. Right now it uses built-in data that was extracted from the MATH table of the STIX fonts, but that could be changed.
And I have previously built a typesetting engine implementing Knuth's paragraph algorithm.
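
For readers unfamiliar with the idea behind optimal paragraph breaking, here is a much simplified sketch. It is not Knuth's algorithm as used in TeX (there is no stretchable glue, no penalties, no hyphenation) and it is not the engine mentioned above; it only shows the "consider the whole paragraph" idea by minimising the sum of squared trailing space over all lines except the last. The function name and the cost function are assumptions made for illustration.

```rust
/// Breaks a paragraph into lines of at most `max_width`.
///
/// `widths` holds the width of each word and `space` the width of an
/// inter-word space. Returns the index of the first word of every line.
/// The cost of a line is its squared trailing space; the last line is free.
pub fn break_lines(widths: &[f32], space: f32, max_width: f32) -> Vec<usize> {
    let n = widths.len();
    // cost[i]: minimal cost of laying out words i..n.
    // next[i]: index of the first word of the line after the one starting at i.
    let mut cost = vec![f32::INFINITY; n + 1];
    let mut next = vec![n; n + 1];
    cost[n] = 0.0;

    for i in (0..n).rev() {
        let mut line_width = 0.0_f32;
        for j in i..n {
            line_width += widths[j] + if j > i { space } else { 0.0 };
            // A single overlong word is still allowed on a line of its own.
            if line_width > max_width && j > i {
                break;
            }
            let slack = (max_width - line_width).max(0.0);
            let line_cost = if j + 1 == n { 0.0 } else { slack * slack };
            let total = line_cost + cost[j + 1];
            if total < cost[i] {
                cost[i] = total;
                next[i] = j + 1;
            }
        }
    }

    // Follow the chain of best choices from the first word.
    let mut starts = Vec::new();
    let mut i = 0;
    while i < n {
        starts.push(i);
        i = next[i];
    }
    starts
}
```

For word widths `[3, 2, 2, 5]`, a space of 1 and a maximum width of 6, a greedy breaker would put the first two words on line one and leave the third word alone on a very loose second line. This version starts lines at words 0, 1 and 3 instead, trading a slightly loose first line for a much tighter second one.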

I feel that parsing (extracting the information in a given font file, nothing more) is relatively easy and therefore uninteresting. (As Raph mentioned I have one in Victor but it’s very primitive, only what I needed for that project and not intended for other uses.) But it seems that despite its title this thread is about more than that. To put words on them: some other components / steps of text rendering are:

  • Finding system fonts. font-kit is an abstraction over various platforms’ native libraries for this. You can get away with not doing this for example in a game where all strings are known in advance and you can make sure that one font (or a few) has all the characters needed. But to correctly render arbitrary text in any language, you’ll most likely need system fonts.
  • Segmenting text and picking fonts to use, if you’re using more than one. For example a single English sentence that contains emoji and someone’s name in Kanji might use three different fonts. Skribo aims to do this (and wrap HarfBuzz).
  • Shaping is glyph selection and positioning. It’s easy to do decently for Latin text, very hard to do correctly for complex scripts. As far as I know HarfBuzz (in C++) is the only open-source implementation that aims for completeness / correctness.
  • Rasterization turns vector-based glyph outlines into pixels. font-kit exposes the platform’s native rasterizer. Pathfinder is a GPU rasterizer in Rust + GL. rusttype and font-rs have CPU rasterizers in Rust. raqote is a CPU rasterizer in Rust for general-purpose 2D vectors rather than glyphs specifically, but it might work?
  • (Edit: forgot to add) Paragraph-level layout includes line breaking, bidirectional text, etc.
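
To make the segmentation step a bit more concrete, here is a minimal sketch of per-character font fallback. It assumes each font is described only by the code point ranges it covers (`FontCoverage` is a hypothetical type; a real implementation would read coverage from the font's cmap table) and it ignores script itemization, grapheme clusters and variation selectors, all of which a library like Skribo has to deal with.

```rust
use std::ops::RangeInclusive;

/// Hypothetical font description: a name plus the code point ranges it covers.
pub struct FontCoverage {
    pub name: &'static str,
    pub ranges: Vec<RangeInclusive<u32>>,
}

impl FontCoverage {
    pub fn covers(&self, c: char) -> bool {
        self.ranges.iter().any(|r| r.contains(&(c as u32)))
    }
}

/// A byte range of the input text assigned to one font.
/// `font` is an index into the font list, or `None` if no font covers the run.
#[derive(Debug, PartialEq, Eq)]
pub struct Run {
    pub font: Option<usize>,
    pub start: usize,
    pub end: usize,
}

/// Assigns each character to the first font in `fonts` that covers it and
/// merges neighbouring characters that ended up with the same font.
pub fn segment_by_font(text: &str, fonts: &[FontCoverage]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for (offset, c) in text.char_indices() {
        let font = fonts.iter().position(|f| f.covers(c));
        let end = offset + c.len_utf8();
        let same_font = runs.last().map_or(false, |run| run.font == font);
        if same_font {
            // Extend the current run to include this character.
            if let Some(run) = runs.last_mut() {
                run.end = end;
            }
        } else {
            runs.push(Run { font, start: offset, end });
        }
    }
    runs
}
```

With a Latin font, a CJK font and an emoji font in the list, a sentence mixing English, a Kanji name and an emoji comes out as alternating runs, each of which would then be handed to the shaper separately. Note that spaces fall back to the first font here and so split the runs; whether that is desirable is a design decision this sketch does not try to settle.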

Oh and of course there are higher-level libraries that do and abstract all of the above, but someone asking about font parsing likely wants more low-level control than they provide. DirectWrite is on Windows, CoreText is on macOS, Pango + cairo are cross-platform but not necessarily easy to ship everywhere.

There is also ttf-parser, which is the most complete TTF/OTF parser, afaik.

I'm also working on a HarfBuzz port to Rust, which will simplify deployment.

@SimonSapin

  • System fonts: I am not too keen on using system fonts as you never know what you will find (or rather not find).
  • System loaders: Similar problem. I don't know what bugs they have and what is supported. Bringing your own fonts, parser and shaper should give a much more predictable result.
  • Rasterization: I am using raqote (for Wasm) and pathfinder (for Linux) at the moment. It works, but the performance isn't the greatest. I am considering porting Raph's rasterizer from font-rs and extending it with cubic Bézier curves.
  • Paragraph layout: I have experience with left-to-right layout. Pure right-to-left should not be too difficult to add either. However, mixed-direction text is what gives me a headache. It totally breaks the paragraph layout algorithm.
  • Shaping: Definitely a big task, but if we manage to only write one Shaper instead of four, it will be less work for everyone.
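
On the cubic Bézier point: one common way to feed cubics (as found in CFF outlines) to a rasterizer that only understands straight lines is to flatten them first. The sketch below uses recursive de Casteljau subdivision with a flatness tolerance. It does not use the font-rs API; `Point`, `flatten_cubic` and the depth limit of 16 are all assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: f32,
    pub y: f32,
}

fn lerp(a: Point, b: Point, t: f32) -> Point {
    Point { x: a.x + (b.x - a.x) * t, y: a.y + (b.y - a.y) * t }
}

/// Distance from `p` to the line segment from `a` to `b`.
fn distance_to_segment(p: Point, a: Point, b: Point) -> f32 {
    let (dx, dy) = (b.x - a.x, b.y - a.y);
    let len_sq = dx * dx + dy * dy;
    let t = if len_sq == 0.0 {
        0.0
    } else {
        (((p.x - a.x) * dx + (p.y - a.y) * dy) / len_sq).clamp(0.0, 1.0)
    };
    let (cx, cy) = (a.x + t * dx, a.y + t * dy);
    ((p.x - cx).powi(2) + (p.y - cy).powi(2)).sqrt()
}

/// Appends line-segment end points approximating the cubic `curve`
/// (start point, two control points, end point) to `out`.
/// The start point itself is not pushed; the caller is assumed to have it.
pub fn flatten_cubic(curve: [Point; 4], tolerance: f32, out: &mut Vec<Point>) {
    flatten_rec(curve, tolerance, 0, out);
}

fn flatten_rec(curve: [Point; 4], tolerance: f32, depth: u32, out: &mut Vec<Point>) {
    let [p0, p1, p2, p3] = curve;
    // The curve lies inside the convex hull of its control points, so if both
    // control points are within `tolerance` of the chord, so is the curve.
    let flat = distance_to_segment(p1, p0, p3) <= tolerance
        && distance_to_segment(p2, p0, p3) <= tolerance;
    if flat || depth >= 16 {
        out.push(p3);
        return;
    }
    // de Casteljau split at t = 0.5 into two cubics.
    let p01 = lerp(p0, p1, 0.5);
    let p12 = lerp(p1, p2, 0.5);
    let p23 = lerp(p2, p3, 0.5);
    let p012 = lerp(p01, p12, 0.5);
    let p123 = lerp(p12, p23, 0.5);
    let mid = lerp(p012, p123, 0.5);
    flatten_rec([p0, p01, p012, mid], tolerance, depth + 1, out);
    flatten_rec([mid, p123, p23, p3], tolerance, depth + 1, out);
}
```

Every pushed point lies on the original curve, so the resulting polyline can be handed to a line-based accumulation rasterizer as is. Choosing the tolerance in device pixels (for example a fraction of a pixel) keeps the segment count proportional to the rendered size.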

There’s also the unicode-bidi crate that implements UAX #9: Unicode Bidirectional Algorithm.

Re system fonts, yeah it totally makes sense to ignore them in some use cases. But to support arbitrary text / characters, a set of fonts with good Unicode coverage (especially for CJK) quickly gets into the hundreds of megabytes, which could be prohibitive.


I made some measurements with the Noto font family:

  • Regular: 12,226,560 bytes uncompressed, 4,059,866 bytes using brotli. (Noto-Regular.tar.br)
  • Regular + Italic: 13,670,400 bytes uncompressed, 4,457,933 bytes using brotli.
  • Italic: 1,454,080 bytes uncompressed, 471,264 bytes using brotli.

I’m not sure how you chose which files to include in that archive, it’s missing at least CJK and color emoji (though you do have both serif and sans-serif). Maybe that’s fine for your app, and that’s my point: different use cases have different Unicode coverage requirements and different space constraints. Sometimes it makes sense to ship your own fonts and only rely on those, sometimes not.

On my system, uncompressed:

24M	/usr/share/fonts/noto-cjk/NotoSerifCJK-Regular.ttc
20M	/usr/share/fonts/noto-cjk/NotoSansCJK-Regular.ttc
7.5M	/usr/share/fonts/noto/NotoColorEmoji.ttf

The "Download all fonts" link on Noto’s home page goes to a 1.1 GB Noto-unhinted.zip (though there’s some duplication in there).


I completely missed the .otf fonts. I am going to blame bash for it. The CJK fonts are totally killing the overall size…

@RazrFalcon and @Aldarobot are you on IRC?
If so, I would suggest #rust-font on OFTC.net

If you want to statically include CJK fonts, DroidSansFallback is relatively small (3MB). That's what fonterator uses if you enable the "normal-font" feature.

I've had issues with IRC in the past, because my primary development machine is a laptop, and it disconnects when the lid is closed, so I can't see the conversation.

Nope. Only GitHub/email.

Probably not helping the fragmentation situation, but I put together fontdue mainly for games. I want to gradually support more OpenType tables, and eventually shaping.
