Parsing PDF Documents in Rust

Michael-F-Bryan · January 31, 2021, 1:21pm

I recently needed to hack something together that would let me extract information from a table inside a PDF document.

It turns out PDFs aren't as simple as they seem! I thought I'd write down my experience so other people can learn from it and see an example of using Rust in the real world.

quinedot · January 31, 2021, 10:03pm

"PDF is more like an interpreted programming language" (quoted from the article)

I believe it literally is one. PostScript (on which PDF is based) definitely is; so is TeX. (Not the point of your article, I know.)

I do enjoy your "programming Rust in the real world" articles (and agree they're a useful resource for everyone). Thank you for taking the time to write them!

blonk · February 1, 2021, 2:21am

Having recently watched Computerphile's series on PostScript and PDF, I found this super interesting to read.

Michael-F-Bryan · February 1, 2021, 8:46am

Yeah, it wasn't until I was midway through reading the PDF spec and interpreting its operations that I remembered watching David Brailsford's many Computerphile videos on PDF and PostScript.

s3bk · February 1, 2021, 8:46am

PDF is somewhere in between: more like SVG, but uglier. The core specification doesn't include any branching, anyway.
There is a loophole, however: fonts, which can contain PostScript.
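To make the "interpreted language without branches" point concrete, here is a toy sketch (not the article's code) of interpreting a PDF content stream. Operands are pushed onto a stack and each operator consumes them, in one straight-line pass with no jumps. The operator names `BT`, `Tf`, `Td`, `Tj` and `ET` are real PDF text operators; the `interpret` function, the `TextRun` type and all the simplifications (no string escapes, no hex strings or arrays, only translation for the text matrix, no glyph-width advance after `Tj`) are assumptions made for illustration.

```rust
use std::iter::Peekable;
use std::str::Chars;

// Hypothetical operand types for this sketch; real PDF has more (arrays, dicts, hex strings...).
#[derive(Debug, Clone, PartialEq)]
enum Operand {
    Number(f64),
    Name(String),
    Str(String),
}

// A piece of text plus the start of the line it was shown on.
#[derive(Debug, PartialEq)]
struct TextRun {
    x: f64,
    y: f64,
    text: String,
}

/// Interprets a (heavily simplified) content stream and collects the text it shows.
/// Assumption: literal strings contain no nested parentheses or escape sequences.
fn interpret(stream: &str) -> Vec<TextRun> {
    let mut operands: Vec<Operand> = Vec::new();
    let mut runs = Vec::new();
    // Start of the current text line; a real interpreter tracks full matrices.
    let (mut line_x, mut line_y) = (0.0, 0.0);
    let mut chars = stream.chars().peekable();

    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '(' {
            chars.next();
            // take_while also consumes the closing ')'
            let s: String = chars.by_ref().take_while(|&ch| ch != ')').collect();
            operands.push(Operand::Str(s));
        } else if c == '/' {
            chars.next();
            operands.push(Operand::Name(take_token(&mut chars)));
        } else {
            let token = take_token(&mut chars);
            if let Ok(n) = token.parse::<f64>() {
                operands.push(Operand::Number(n));
                continue;
            }
            // Anything else is an operator that consumes the operands pushed before it.
            match token.as_str() {
                // BT resets the text position to the origin.
                "BT" => {
                    line_x = 0.0;
                    line_y = 0.0;
                }
                // Td moves to the next line, offset from the start of the current one.
                "Td" => {
                    if let [Operand::Number(tx), Operand::Number(ty)] = operands.as_slice() {
                        line_x += *tx;
                        line_y += *ty;
                    }
                }
                // Tj shows a string at the current line start (no width advance here).
                "Tj" => {
                    if let [Operand::Str(s)] = operands.as_slice() {
                        runs.push(TextRun { x: line_x, y: line_y, text: s.clone() });
                    }
                }
                // Tf, ET and every other operator are ignored in this sketch.
                _ => {}
            }
            operands.clear();
        }
    }
    runs
}

/// Reads a bare token (number, name body or operator) up to a delimiter.
fn take_token(chars: &mut Peekable<Chars<'_>>) -> String {
    let mut token = String::new();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() || c == '(' || c == '/' {
            break;
        }
        token.push(c);
        chars.next();
    }
    token
}
```

Feeding it `BT /F1 12 Tf 72 712 Td (Hello) Tj 0 -14 Td (World) Tj ET` yields two runs, "Hello" at (72, 712) and "World" at (72, 698). Extracting a table from a real document means doing this for every text operator and then grouping runs by their coordinates, which helps explain why it is harder than it first looks.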

system · May 2, 2021, 8:46am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
