Parsing PDF Documents in Rust

I recently needed to hack something together that would let me extract information from a table inside a PDF document.

It turns out PDFs aren't as simple as they seem! I thought I'd write down my experience so other people can learn from it and see an example of using Rust in the real world.


"PDF is more like an interpreted programming language"

I believe it literally is one. PostScript (on which PDF is based) definitely is; so is TeX. (Not the point of your article, I know.)

I do enjoy your "programming Rust in the real world" articles (and agree they're a useful resource for everyone). Thank you for taking the time to write them!


Having recently watched Computerphile's series on PostScript and PDF, I found this super interesting to read.


Yeah, it wasn't until I got midway through reading the PDF spec and interpreting the operations that I remembered watching David Brailsford's many Computerphile videos on PDFs and PostScript.

PDF is somewhere in between, more like SVG but uglier. It does not include any branches in the core specification anyway…
There is a loophole, however: fonts. Those can contain PostScript.
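
To illustrate the "no branches" point: a page's content stream is a flat run of operands followed by operators, so a toy interpreter needs only a loop and an operand stack. Here is a minimal sketch in Rust, not the article's code. The `Token` type and the `tokenize` and `extract_text` functions are hypothetical names, and it assumes a heavily simplified syntax (no string escapes, nested parentheses, arrays, dictionaries, or hex strings).

```rust
/// One token of a (heavily simplified) PDF content stream.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
    Name(String),     // written as /F1 in the stream
    Text(String),     // written as (Hello) in the stream
    Operator(String), // e.g. BT, Tf, Td, Tj, ET
}

/// Split a content stream into tokens.
/// Assumption: strings contain no escapes or nested parentheses.
fn tokenize(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '(' {
            chars.next(); // skip the opening parenthesis
            let mut s = String::new();
            while let Some(ch) = chars.next() {
                if ch == ')' {
                    break;
                }
                s.push(ch);
            }
            tokens.push(Token::Text(s));
        } else {
            // Read one whitespace-delimited word.
            let mut word = String::new();
            while let Some(&ch) = chars.peek() {
                if ch.is_whitespace() || ch == '(' {
                    break;
                }
                word.push(ch);
                chars.next();
            }
            if let Some(name) = word.strip_prefix('/') {
                tokens.push(Token::Name(name.to_string()));
            } else if let Ok(n) = word.parse::<f64>() {
                tokens.push(Token::Number(n));
            } else {
                tokens.push(Token::Operator(word));
            }
        }
    }
    tokens
}

/// Walk the tokens in order: operands are pushed, an operator consumes them.
/// Only `Tj` (show text) is handled; every other operator just clears the stack.
fn extract_text(tokens: &[Token]) -> Vec<String> {
    let mut operands: Vec<Token> = Vec::new();
    let mut shown = Vec::new();
    for token in tokens {
        match token {
            Token::Operator(op) => {
                if op.as_str() == "Tj" {
                    if let Some(Token::Text(s)) = operands.pop() {
                        shown.push(s);
                    }
                }
                operands.clear();
            }
            operand => operands.push(operand.clone()),
        }
    }
    shown
}
```

Feeding it `BT /F1 12 Tf 72 712 Td (Hello) Tj ET` yields the single string `Hello`. Note that there is no jump or conditional anywhere in the input: the interpreter just moves forward one operator at a time.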

