Gimli: a blazing fast parser for DWARF debugging information


#1

Hi folks!

I’d like to announce gimli! gimli is a lazy, zero-copy parser for DWARF debugging information. We aim to be the fastest and most ergonomic library for working with DWARF!

As of now, we have implemented coverage for almost all of the standard. We have fast DIE tree traversal, line number table evaluation, a DWARF expressions evaluator, CFI unwinding evaluation (only .debug_frame at the moment, .eh_frame coming soon), and more.

Notable TODOs are support for .debug_macinfo, GNU non-standard extensions, and features from the upcoming 5th edition of the DWARF standard.

Once we have completed our low-level, raw APIs for the whole standard, we plan on building a higher-level “semantic” API on top of DIEs.

There have been a few of us contributing to gimli, so please take a look at the contributors list and give everyone their dues :slight_smile:

We hope that people find this library useful! Please don’t hesitate to ask for help, suggest API improvements, complain (constructively) about documentation, file an issue, or submit a pull request!

If you’re interested in contributing to gimli, please reach out to me and I can help mentor, find some stuff that would be a good match for you, etc.

Thanks!


#2

Oh, one thing I forgot to mention: we have clones of both dwarfdump and addr2line in the examples/ directory! Take a look at them if you’d like to see a more complex example of using the library.


#3

It’s also worth bragging that ruby-stacktrace uses gimli, as featured in Julia Evans’ RustConf keynote. Apparently this saved her from the complexity of libdwarf. :slight_smile:

I intend to write bindings for elfutils (when I find some round tuits), because I think its libdw is a lot nicer than libdwarf. But I think it’s great to see a pure-Rust DWARF consumer too, and I’ll be interested to eventually compare the performance.

You should at least get DW_LANG_Rust in there ASAP! (0x1c) :wink:


#4

It’s also worth bragging that ruby-stacktrace uses gimli, as featured in Julia Evans’ RustConf keynote. Apparently this saved her from the complexity of libdwarf. :slight_smile:

I couldn’t stop smiling during the talk :blush:

I intend to write bindings for elfutils (when I find some round tuits), because I think its libdw is a lot nicer than libdwarf. But I think it’s great to see a pure-Rust DWARF consumer too,

cough you could always contribute to gimli cough

I’ll be interested to eventually compare the performance.

philipc did some initial comparisons that were very promising. I’d like to get some benchmarks vs libdwarf and libdw into the library behind a feature flag or something (not sure if licenses would allow us to actually do this though). Either way, I’d like to do some more benchmarking between the libraries in the future.

You should at least get DW_LANG_Rust in there ASAP! (0x1c) :wink:

https://github.com/gimli-rs/gimli/pull/124 !!


#5

Touché! But I do think there’s value in coming at this from both directions.


#6

There are some very rough benchmarks at https://github.com/philipc/dwarf-bench/blob/master/benches/bench.rs.

Results for traversal of all the DIEs and their attributes:

test info_libdwarf   ... bench:  13,584,306 ns/iter (+/- 173,801)
test info_elfutils   ... bench:  11,835,148 ns/iter (+/- 698,127)
test info_gimli      ... bench:   2,246,020 ns/iter (+/- 327,789)

Unfortunately these benchmarks include file IO. For comparison, without IO gimli gets:

test info_gimli      ... bench:   1,511,751 ns/iter (+/- 34,696)

I haven’t done any investigation into why the difference is so large… maybe gimli isn’t doing stuff that it should be doing in order to make this a fair comparison. I’ve never used libdwarf or libdw before though, so I’d welcome any feedback or suggestions for better benchmarks. We’ll certainly investigate this more once gimli is closer to feature completion.


#7

I haven’t looked too closely at your benchmark code yet – I just wanted to see if I could replicate the huge difference. The target binary matters, of course, so I can’t reproduce your numbers exactly. I also couldn’t run both libdwarf and elfutils in a single build, as I got segfaults, probably because they collide on a few symbol names.

Running them separately on a binary with 1.4MB .debug_info, I get:

running 3 tests
test info_elfutils   ... bench:  14,244,924 ns/iter (+/- 335,220)
test info_gimli      ... bench:  13,937,991 ns/iter (+/- 915,131)
test info_rust_dwarf ... bench:  18,569,518 ns/iter (+/- 699,243)
running 3 tests
test info_gimli      ... bench:  14,444,390 ns/iter (+/- 625,092)
test info_libdwarf   ... bench:  79,633,148 ns/iter (+/- 1,235,416)
test info_rust_dwarf ... bench:  19,092,238 ns/iter (+/- 451,287)

On a larger binary with 17.5MB .debug_info, I get:

running 3 tests
test info_elfutils   ... bench: 174,539,411 ns/iter (+/- 12,811,498)
test info_gimli      ... bench: 167,672,494 ns/iter (+/- 21,524,331)
test info_rust_dwarf ... bench: 212,620,183 ns/iter (+/- 7,798,126)
running 3 tests
test info_gimli      ... bench: 168,343,463 ns/iter (+/- 17,859,240)
test info_libdwarf   ... bench: 1,005,658,211 ns/iter (+/- 73,538,307)
test info_rust_dwarf ... bench: 207,347,681 ns/iter (+/- 7,304,612)

So for me, gimli edges out elfutils, rust-dwarf is not too far behind, and libdwarf is way out there.

Well done! Hope you can hold on to that lead as you add more features… :slight_smile:


#8

Some Updates

A lot of progress has been made (relatively silently) on completeness, correctness, and performance. Here are two notable changes (there are many more little things as well):

  • .eh_frame parsing and evaluation is now supported (in addition to the already existing support for .debug_frame). This paves the way for a proper unwinding and stack walking library, which I am working on when I get a chance here and there. Eventually we hope to replace libunwind in std and remove some of the last C code in the distribution.

  • The addr2line example program has been split out into its own crate and is now both an executable and a library. HUGE thanks to https://github.com/jonhoo for coming in and doing this, along with a bunch of performance improvements and caching for when we are symbolicating many addresses at once. We are now an order of magnitude faster than the canonical binutils addr2line implementation! See https://github.com/gimli-rs/addr2line/pull/14#issuecomment-266141967 for more details and graphs of running time and memory usage across different configurations of our addr2line and the binutils version.

We are currently using the “binsearch-100” configuration by default which provides a pretty good trade off between memory usage and running time.

There’s more coming down the pipe, and if you want to get involved we have a few issues marked “help wanted” and “easy” plus others that aren’t marked as such but probably should be! Reach out to me if you’d like to help out! :grin:

Cheers!


#9

It could be useful to describe what DWARF is?
Or just add a link to another website about it?


#10

From http://dwarfstd.org/

DWARF is a debugging file format used by many compilers and debuggers to support source level debugging. It addresses the requirements of a number of procedural languages, such as C, C++, and Fortran, and is designed to be extensible to other languages. DWARF is architecture independent and applicable to any processor or operating system. It is widely used on Unix, Linux and other operating systems, as well as in stand-alone environments.

It is also used by anything that wants to turn an address into a (function name, file, line) such as a profiler.
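
To make that address lookup a bit more concrete, here is a minimal sketch in plain Rust. It is not gimli's API: the `Row` type and the `lookup` function are made up for illustration, and the sketch assumes the line table has already been decoded into rows sorted by address. Real line tables also carry end-of-sequence markers and say nothing about function names, both of which are ignored here.

```rust
/// One row of a hypothetical, already-decoded line table.
/// (Illustrative type, not part of gimli.)
#[derive(Debug, Clone, PartialEq)]
struct Row {
    address: u64,
    file: &'static str,
    line: u32,
}

/// Find the row that covers `addr`: the last row whose address is <= `addr`.
/// `rows` must be sorted by ascending address.
fn lookup(rows: &[Row], addr: u64) -> Option<&Row> {
    // Binary search: index of the first row whose address is > `addr`.
    let idx = rows.partition_point(|row| row.address <= addr);
    // The covering row, if any, is the one just before that index.
    idx.checked_sub(1).map(|i| &rows[i])
}
```

With rows at 0x1000, 0x1010 and 0x1040, looking up 0x1018 lands on the 0x1010 row, and anything below 0x1000 yields `None`.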

More good info: http://eli.thegreenplace.net/2011/02/07/how-debuggers-work-part-3-debugging-information/

I’ll make the text “DWARF debugging format” in the first sentence in the README into a link :slight_smile:

EDIT: and of course rustc itself generates DWARF :slight_smile:


#11

Can you give a pointer to where this code lives?


#12

It is still very much a WIP and I need to get more groundwork laid out before it is useful at all or even ready for other contributors, but: https://github.com/fitzgen/pancakes

Also (again, still WIP) https://github.com/fitzgen/findshlibs for finding the .eh_frame unwinding section that is already mapped into memory along with the executable bits.


#13

Announcing gimli v0.14.0

gimli is a blazing fast library for consuming the DWARF debugging format.

Upgrade to this release by updating your Cargo.toml:

gimli = "0.14.0"

Changelog

Added

  • All pub types now derive(Hash). #192

  • All the constants from DWARF 5 are now defined. #193

  • Added support for the DW_OP_GNU_parameter_ref GNU extension to parsing and
    evaluating DWARF opcodes. #208

  • Improved LEB128 parsing performance. #216

  • Improved .debug_{aranges,pubnames,pubtypes} parsing performance. #218

  • Added the ability to choose endianity dynamically at run time, rather than
    only statically at compile time. #219
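
For anyone who hasn't met LEB128, the variable-length integer encoding DWARF uses: each byte carries seven bits of payload, and the high bit says whether another byte follows. The sketch below is a plain, unoptimized decoder in standalone Rust. It is not gimli's implementation and has nothing to do with the performance work in #216; the function names are made up for illustration.

```rust
/// Decode an unsigned LEB128 value from the front of `bytes`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input is truncated or the value does not fit in a u64.
fn read_uleb128(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut result: u64 = 0;
    let mut shift: u32 = 0;
    for (i, &byte) in bytes.iter().enumerate() {
        let low = u64::from(byte & 0x7f);
        // Conservative overflow check: only one payload bit fits at shift 63,
        // and anything past that is rejected (including zero padding).
        if shift >= 64 || (shift == 63 && low > 1) {
            return None;
        }
        result |= low << shift;
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
        shift += 7;
    }
    None // ran out of input before the final byte
}

/// Decode a signed LEB128 value. Same return convention as above.
/// Simplification: overflowing encodings are not fully validated.
fn read_sleb128(bytes: &[u8]) -> Option<(i64, usize)> {
    let mut result: i64 = 0;
    let mut shift: u32 = 0;
    for (i, &byte) in bytes.iter().enumerate() {
        if shift >= 64 {
            return None;
        }
        result |= i64::from(byte & 0x7f) << shift;
        shift += 7;
        if byte & 0x80 == 0 {
            // Sign-extend if the sign bit (0x40) of the last byte is set.
            if shift < 64 && (byte & 0x40) != 0 {
                result |= -1i64 << shift;
            }
            return Some((result, i + 1));
        }
    }
    None
}
```

For example, the bytes `e5 8e 26` decode to 624485 as unsigned, and `c0 bb 78` decode to -123456 as signed.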

Changed

  • The biggest change of this release is that gimli no longer requires the
    object file’s sections to be fully loaded into memory. This enables using
    gimli on 32 bit platforms where there often isn’t enough contiguous virtual
    memory address space to load debugging information into. The default
    behavior is still geared for 64 bit platforms, where address space
    overfloweth, and you can still load whole sections of the object file (or
    the entire object file) into memory. This is abstracted over with the
    gimli::Reader trait. This manifests as small (but many) breaking changes to
    much of the public API. #182
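
To illustrate the general shape of that kind of abstraction, here is a rough standalone sketch. It is emphatically not the gimli::Reader trait: `ByteSource`, `SharedBytes`, and `read_u16_le` are hypothetical names invented for this example, and the real trait has a much larger surface. The only point is that parsing code written against a trait does not care whether the bytes come from a borrowed slice or from some other owner.

```rust
use std::rc::Rc;

/// Hypothetical, heavily simplified byte-source abstraction.
/// This is NOT gimli::Reader; the name and methods are invented.
trait ByteSource {
    /// Number of bytes left to read.
    fn remaining(&self) -> usize;
    /// Read one byte and advance, or return None at the end of input.
    fn next_u8(&mut self) -> Option<u8>;
}

/// Borrowed input: the data already sits in memory as one slice.
impl<'a> ByteSource for &'a [u8] {
    fn remaining(&self) -> usize {
        self.len()
    }

    fn next_u8(&mut self) -> Option<u8> {
        let current: &'a [u8] = *self;
        let (&first, rest) = current.split_first()?;
        *self = rest; // advance by shrinking the slice
        Some(first)
    }
}

/// Owned, reference-counted input with a cursor (illustrative only).
struct SharedBytes {
    data: Rc<[u8]>,
    pos: usize,
}

impl SharedBytes {
    fn new(data: Rc<[u8]>) -> Self {
        SharedBytes { data, pos: 0 }
    }
}

impl ByteSource for SharedBytes {
    fn remaining(&self) -> usize {
        self.data.len() - self.pos
    }

    fn next_u8(&mut self) -> Option<u8> {
        let byte = *self.data.get(self.pos)?;
        self.pos += 1;
        Some(byte)
    }
}

/// Parsing code written once, against the trait rather than a slice.
fn read_u16_le<R: ByteSource>(reader: &mut R) -> Option<u16> {
    if reader.remaining() < 2 {
        return None; // do not consume a partial value
    }
    let lo = reader.next_u8()?;
    let hi = reader.next_u8()?;
    Some(u16::from_le_bytes([lo, hi]))
}
```

Both `&[u8]` and `SharedBytes` can be handed to `read_u16_le` unchanged, which is the property that lets a library stop insisting on one particular way of holding the data.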

Fixed

  • The DW_END_* constants for defining endianity of a compilation unit were
    previously incorrect. #193

  • The DW_OP_addr opcode is relative to the base address of the .text section
    of the binary, but we were incorrectly treating it as an absolute value. #210

Friends

Thanks to everyone who contributed to this release!

  • Kyle Huey
  • Nick Fitzgerald
  • Philip Craig
  • Tom Tromey

Contributing

Want to join us? Check out our CONTRIBUTING.md and take a look
at some of these issues:


#14

Announcing gimli v0.15.0

gimli is a blazing fast library for consuming the DWARF debugging format.

Upgrade to this release by updating your Cargo.toml:

gimli = "0.15.0"

Changelog

Added

  • Added the EndianBuf::to_string() method. #233

  • Added more robust error handling in our example dwarfdump clone. #234

  • Added FrameDescriptionEntry::initial_address method. #237

  • Added FrameDescriptionEntry::len method. #237

  • Added the FrameDescriptionEntry::entry_len method. #241

  • Added the CommonInformationEntry::offset method. #241

  • Added the CommonInformationEntry::entry_len method. #241

  • Added the CommonInformationEntry::version method. #241

  • Added the CommonInformationEntry::augmentation method. #241

  • Added the CommonInformationEntry::code_alignment_factor method. #241

  • Added the CommonInformationEntry::data_alignment_factor method. #241

  • Added the CommonInformationEntry::return_address_register method. #241

  • Added support for printing .eh_frame sections to our example dwarfdump
    clone. #241

  • Added support for parsing the .eh_frame_hdr section. On Linux, the
    .eh_frame_hdr section provides a pointer to the already-mapped-in-memory
    .eh_frame data, so that it doesn’t need to be duplicated, and a binary
    search table of its entries for faster unwinding information lookups. #250

  • Added support for parsing DWARF 5 compilation unit headers. #257

  • Added support for DWARF 5’s DW_FORM_implicit_const. #257

Changed

  • Unwinding methods now give ownership of the unwinding context back to the
    caller if errors are encountered, not just on the success path. This allows
    recovering from errors in signal-safe code, where constructing a new unwinding
    context is not an option because it requires allocation. This is a breaking
    change affecting UnwindSection::unwind_info_for_address and
    UninitializedUnwindContext::initialize. #241

  • CfaRule and RegisterRule now expose their DW_OP expressions as
    Expression. This is a minor breaking change. #241

  • The Error::UnknownVersion variant now contains the unknown version
    number. This is a minor breaking change. #245

  • EvaluationResult::RequiresEntryValue requires an Expression instead of a
    Reader now. This is a minor breaking change. #256
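
The "give the context back on error" idea from the first item above can be sketched in a few lines of standalone Rust. Everything here is hypothetical (`Context`, `Initialized`, `UnwindError`, and the fake failure condition are invented for illustration and are not gimli's types or signatures); the sketch only shows the ownership pattern of returning the consumed value inside the error.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum UnwindError {
    NoInfoForAddress(u64),
}

/// Hypothetical context that owns preallocated scratch space.
#[derive(Debug)]
struct Context {
    scratch: Vec<u64>,
}

/// Hypothetical initialized state; it holds on to the context.
#[derive(Debug)]
struct Initialized {
    ctx: Context,
    address: u64,
}

impl Context {
    /// Allocate once, up front, outside of any signal handler.
    fn with_capacity(slots: usize) -> Self {
        Context { scratch: Vec::with_capacity(slots) }
    }

    /// Consumes the context. On failure the context travels back to the
    /// caller inside the error, so no new allocation is needed to retry.
    fn initialize(self, address: u64) -> Result<Initialized, (UnwindError, Context)> {
        // Stand-in failure: pretend address 0 has no unwinding information.
        if address == 0 {
            return Err((UnwindError::NoInfoForAddress(address), self));
        }
        Ok(Initialized { ctx: self, address })
    }
}

impl Initialized {
    /// Hand the context back after a successful use as well.
    fn into_context(self) -> Context {
        self.ctx
    }
}
```

A caller can match on the `Err((error, ctx))` case, note the error, and immediately reuse `ctx` for the next address without touching the allocator.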

Friends

Thanks to everyone who contributed to this release!

  • bluss
  • Brook Heisler
  • Dan Burkert
  • Kyle Huey
  • main()
  • Nick Fitzgerald
  • Nikolai Vazquez
  • Philip Craig
  • Steven Fackler

Contributing

Want to join us? Check out our CONTRIBUTING.md and take a look
at some of these issues:


#15

Announcing gimli 0.16.0

gimli is a blazing fast library for consuming the DWARF debugging format.

Upgrade to this release by updating your Cargo.toml:

gimli = "0.16.0"

Changelog

Added

  • Added support for building in #![no_std] environments, when the alloc crate is available. Disable the “std” feature and enable the “alloc” feature. #138 #271

  • Added support for DWARF 5 .debug_rnglists and .debug_loclists sections. #272

  • Added a dwarf-validate example program that checks the integrity of the given DWARF and its references between sections. #290

  • Added the EndianReader<T> type, an easy way to define a custom Reader implementation with a reference to a generic buffer of bytes and an associated endianity. #298 #302

Changed

Deprecated

  • EndianBuf has been renamed to EndianSlice, use that name instead. #295

Fixed

  • Evaluating the DW_CFA_restore_state opcode properly maintains the current location. Previously it would incorrectly restore the old location when popping from evaluation stack. #274
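
As a rough illustration of the behaviour described in that fix, here is a toy state machine in standalone Rust. It is only a sketch under simplifying assumptions: `Row` and `Machine` are invented for this example, a single CFA offset stands in for all the register rules, and none of this is gimli's actual code. The point is that restoring a remembered state brings back the saved rules while keeping the location the evaluation has already advanced to.

```rust
/// Toy unwind table row: a location plus one rule (illustrative only).
#[derive(Clone, Debug, PartialEq)]
struct Row {
    address: u64,
    cfa_offset: i64,
}

/// Toy evaluator with a stack of remembered rows.
struct Machine {
    current: Row,
    saved: Vec<Row>,
}

impl Machine {
    fn new(start: u64) -> Self {
        Machine {
            current: Row { address: start, cfa_offset: 0 },
            saved: Vec::new(),
        }
    }

    /// Models advancing the current location.
    fn advance_loc(&mut self, delta: u64) {
        self.current.address += delta;
    }

    /// Models changing a rule.
    fn def_cfa_offset(&mut self, offset: i64) {
        self.current.cfa_offset = offset;
    }

    /// Models DW_CFA_remember_state: push a copy of the current row.
    fn remember_state(&mut self) {
        self.saved.push(self.current.clone());
    }

    /// Models DW_CFA_restore_state: pop the saved rules, but keep the
    /// current location. Returns false if nothing was remembered.
    fn restore_state(&mut self) -> bool {
        match self.saved.pop() {
            Some(old) => {
                let address = self.current.address;
                self.current = old;
                // Overwriting this with the saved address is the kind of
                // bug the fix describes.
                self.current.address = address;
                true
            }
            None => false,
        }
    }
}
```

After remembering a state at 0x1000 with offset 8, advancing by 0x10, and setting the offset to 16, a restore leaves the machine at 0x1010 with offset 8.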

Friends

Thanks to everyone who contributed to this release!

  • Igor Gnatenko
  • Nick Fitzgerald
  • Philip Craig
  • Robert O’Callahan
  • Ted Mielczarek
  • Thadeus Fleming
  • whitequark
  • Xavier Martinez-Hidalgo

Contributing

Want to join us? Check out our CONTRIBUTING.md and take a look at some of these issues: