Gimli: a blazing fast parser for DWARF debugging information

Hi folks!

I'd like to announce gimli! gimli is a lazy, zero-copy parser for DWARF debugging information. We aim to be the fastest and most ergonomic library for working with DWARF!

As of now, we have implemented coverage for almost all of the standard. We have fast DIE tree traversal, line number table evaluation, a DWARF expressions evaluator, CFI unwinding evaluation (only .debug_frame at the moment, .eh_frame coming soon), and more.

Notable TODOs are support for .debug_macinfo, GNU non-standard extensions, and features from the upcoming 5th edition of the DWARF standard.

Once we have completed our low-level, raw APIs for the whole standard, we plan on building a higher level "semantic" API on top of DIEs.

There have been a few of us contributing to gimli, so please take a look at the contributors list and give everyone their dues :slight_smile:

We hope that people find this library useful! Please don't hesitate to ask for help, suggest API improvements, complain (constructively) about documentation, file an issue, or submit a pull request!

If you're interested in contributing to gimli, please reach out to me and I can help mentor, find some stuff that would be a good match for you, etc.

Thanks!

Oh, one thing I forgot to mention: we have clones of both dwarfdump and addr2line in the examples/ directory! Take a look at them if you'd like to see a more complex example of using the library.

It's also worth bragging that ruby-stacktrace uses gimli, as featured in Julia Evans' RustConf keynote. Apparently this saved her from the complexity of libdwarf. :slight_smile:

I intend to write bindings for elfutils (when I find some round tuits), because I think its libdw is a lot nicer than libdwarf. But I think it's great to see a pure-Rust DWARF consumer too, and I'll be interested to eventually compare the performance.

You should at least get DW_LANG_Rust in there ASAP! (0x1c) :wink:

It's also worth bragging that ruby-stacktrace uses gimli, as featured in Julia Evans' RustConf keynote. Apparently this saved her from the complexity of libdwarf. :slight_smile:

I couldn't stop smiling during the talk :blush:

I intend to write bindings for elfutils (when I find some round tuits), because I think its libdw is a lot nicer than libdwarf. But I think it's great to see a pure-Rust DWARF consumer too,

cough you could always contribute to gimli cough

I'll be interested to eventually compare the performance.

philipc did some initial comparisons that were very promising. I'd like to get some benchmarks vs libdwarf and libdw into the library behind a feature flag or something (not sure if licenses would allow us to actually do this though). Either way, I'd like to do some more benchmarking between the libraries in the future.

You should at least get DW_LANG_Rust in there ASAP! (0x1c) :wink:

Add the `DW_LANG_Rust` constant ;) by fitzgen · Pull Request #124 · gimli-rs/gimli · GitHub !!

Touché! But I do think there's value in coming at this from both directions.

There are some very rough benchmarks at https://github.com/philipc/dwarf-bench/blob/master/benches/bench.rs.

Results for traversal of all the DIEs and their attributes:

test info_libdwarf   ... bench:  13,584,306 ns/iter (+/- 173,801)
test info_elfutils   ... bench:  11,835,148 ns/iter (+/- 698,127)
test info_gimli      ... bench:   2,246,020 ns/iter (+/- 327,789)

Unfortunately these benchmarks include file IO. For comparison, without IO gimli gets:

test info_gimli      ... bench:   1,511,751 ns/iter (+/- 34,696)

I haven't done any investigation into why the difference is so large... maybe gimli isn't doing stuff that it should be doing in order to make this a fair comparison. I've never used libdwarf or libdw before though, so I'd welcome any feedback or suggestions for better benchmarks. We'll certainly investigate this more once gimli is closer to feature completion.

I haven't looked too closely at your benchmark code yet -- I just wanted to see if I could replicate the huge difference. The target binary matters, of course, so I can't reproduce your numbers exactly. I also couldn't run both libdwarf and elfutils in a single build, as I got segfaults, probably because they collide on a few symbol names.

Running them separately on a binary with 1.4MB .debug_info, I get:

running 3 tests
test info_elfutils   ... bench:  14,244,924 ns/iter (+/- 335,220)
test info_gimli      ... bench:  13,937,991 ns/iter (+/- 915,131)
test info_rust_dwarf ... bench:  18,569,518 ns/iter (+/- 699,243)
running 3 tests
test info_gimli      ... bench:  14,444,390 ns/iter (+/- 625,092)
test info_libdwarf   ... bench:  79,633,148 ns/iter (+/- 1,235,416)
test info_rust_dwarf ... bench:  19,092,238 ns/iter (+/- 451,287)

On a larger binary with 17.5MB .debug_info, I get:

running 3 tests
test info_elfutils   ... bench: 174,539,411 ns/iter (+/- 12,811,498)
test info_gimli      ... bench: 167,672,494 ns/iter (+/- 21,524,331)
test info_rust_dwarf ... bench: 212,620,183 ns/iter (+/- 7,798,126)
running 3 tests
test info_gimli      ... bench: 168,343,463 ns/iter (+/- 17,859,240)
test info_libdwarf   ... bench: 1,005,658,211 ns/iter (+/- 73,538,307)
test info_rust_dwarf ... bench: 207,347,681 ns/iter (+/- 7,304,612)

So for me, gimli edges out elfutils, rust-dwarf is not too far behind, and libdwarf is way out there.

Well done! Hope you can hold on to that lead as you add more features... :slight_smile:

Some Updates

A lot of progress has been made (relatively silently) on completeness, correctness, and performance. Here are two notable changes (there are many more little things as well):

  • .eh_frame parsing and evaluation is now supported (in addition to the already existing support for .debug_frame). This paves the way for a proper unwinding and stack walking library, which I am working on when I get a chance here and there. Eventually we hope to replace libunwind in std and remove some of the last C code in the distribution.

  • The addr2line example program has been split out into its own crate and is both an executable and a library now. HUGE thanks to https://github.com/jonhoo for coming in and doing this, along with a bunch of performance improvements and caching for when we are symbolicating many addresses at once. We are now an order of magnitude faster than the canonical binutils addr2line implementation! See https://github.com/gimli-rs/addr2line/pull/14#issuecomment-266141967 for more details and graphs of running time and memory usage between different configurations of our addr2line and the binutils version.

We are currently using the "binsearch-100" configuration by default which provides a pretty good trade off between memory usage and running time.

There's more coming down the pipe, and if you want to get involved we have a few issues marked "help wanted" and "easy" plus others that aren't marked as such but probably should be! Reach out to me if you'd like to help out! :grin:

Cheers!

It could be useful to describe what DWARF is?
Or just add a link to another website about it?

From http://dwarfstd.org/

DWARF is a debugging file format used by many compilers and debuggers to support source level debugging. It addresses the requirements of a number of procedural languages, such as C, C++, and Fortran, and is designed to be extensible to other languages. DWARF is architecture independent and applicable to any processor or operating system. It is widely used on Unix, Linux and other operating systems, as well as in stand-alone environments.

It is also used by anything that wants to turn an address into a (function name, file, line) such as a profiler.
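
For readers new to DWARF, the address-to-line lookup mentioned above can be sketched roughly like this. It is only an illustration, not gimli's or addr2line's API: the `LineEntry` type, the `lookup` function, and the sample addresses are all made up for the example, and function names (which come from the DIE tree rather than the line table) are left out.

```rust
/// One entry of a simplified, hypothetical line table: the first address
/// the entry covers, plus the source location it maps to.
#[derive(Debug, Clone, PartialEq, Eq)]
struct LineEntry {
    address: u64,
    file: &'static str,
    line: u32,
}

/// Finds the entry covering `addr` in a table sorted by ascending address.
/// `end` is the first address past the last entry (the end of the sequence).
fn lookup(entries: &[LineEntry], end: u64, addr: u64) -> Option<&LineEntry> {
    if addr >= end {
        return None;
    }
    // Index of the first entry whose address is greater than `addr`.
    let idx = entries.partition_point(|entry| entry.address <= addr);
    // The entry just before it, if there is one, covers `addr`.
    idx.checked_sub(1).map(|i| &entries[i])
}

/// Made-up sample data, sorted by address.
fn sample_table() -> Vec<LineEntry> {
    vec![
        LineEntry { address: 0x1000, file: "main.rs", line: 3 },
        LineEntry { address: 0x1008, file: "main.rs", line: 4 },
        LineEntry { address: 0x1020, file: "util.rs", line: 10 },
    ]
}
```

With this table, an address such as 0x100c falls between the second and third entries, so it resolves to main.rs line 4. Addresses before the first entry or at or past `end` resolve to nothing.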

More good info: How debuggers work: Part 3 - Debugging information - Eli Bendersky's website

I'll make the text "DWARF debugging format" in the first sentence in the README into a link :slight_smile:

EDIT: and of course rustc itself generates DWARF :slight_smile:

Can you give a pointer to where this code lives?

It is still very much a WIP and I need to get more groundwork laid out before it is useful at all or even ready for other contributors, but: GitHub - fitzgen/pancakes: Still a WIP

Also (again, still WIP) GitHub - fitzgen/findshlibs: Find the shared libraries loaded in the current process with a cross platform API for finding the .eh_frame unwinding section that is already mapped into memory along with the executable bits.

Announcing gimli v0.14.0

gimli is a blazing fast library for consuming the DWARF debugging format.

Upgrade to this release by updating your Cargo.toml:

gimli = "0.14.0"

Changelog

Added

  • All pub types now derive(Hash). #192

  • All the constants from DWARF 5 are now defined. #193

  • Added support for the DW_OP_GNU_parameter_ref GNU extension to parsing and
    evaluating DWARF opcodes. #208

  • Improved LEB128 parsing performance. #216

  • Improved .debug_{aranges,pubnames,pubtypes} parsing performance. #218

  • Added the ability to choose endianity dynamically at run time, rather than
    only statically at compile time. #219
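
For anyone wondering what the LEB128 parsing mentioned above involves: it is the variable-length integer encoding DWARF uses nearly everywhere, which is why its speed matters. Below is a straightforward, unoptimized sketch of a decoder. It is not gimli's implementation, and the function names `read_uleb128` and `read_sleb128` are just picked for this example.

```rust
/// Decodes an unsigned LEB128 value from the front of `bytes`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input ends early or the value does not fit in a `u64`.
fn read_uleb128(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut result: u64 = 0;
    let mut shift = 0u32;
    for (i, &byte) in bytes.iter().enumerate() {
        // Each byte carries 7 payload bits, least significant group first.
        let low = u64::from(byte & 0x7f);
        if shift >= 64 || (shift == 63 && low > 1) {
            return None; // would overflow a u64
        }
        result |= low << shift;
        // A clear high bit marks the final byte.
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
        shift += 7;
    }
    None // ran out of input before the final byte
}

/// Decodes a signed LEB128 value from the front of `bytes`.
/// Rejects encodings longer than ten bytes, but does not check for
/// overflow within the tenth byte.
fn read_sleb128(bytes: &[u8]) -> Option<(i64, usize)> {
    let mut result: i64 = 0;
    let mut shift = 0u32;
    for (i, &byte) in bytes.iter().enumerate() {
        if shift >= 64 {
            return None;
        }
        result |= i64::from(byte & 0x7f) << shift;
        shift += 7;
        if byte & 0x80 == 0 {
            // Sign-extend when bit 6 of the final byte is set.
            if shift < 64 && byte & 0x40 != 0 {
                result |= -1i64 << shift;
            }
            return Some((result, i + 1));
        }
    }
    None
}
```

For example, the bytes 0xe5 0x8e 0x26 decode to the unsigned value 624485, and 0xc0 0xbb 0x78 decode to the signed value -123456.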

Changed

  • The biggest change of this release is that gimli no longer requires the
    object file's sections to be fully loaded into memory. This enables using gimli
    on 32 bit platforms where there often isn't enough contiguous virtual memory
    address space to load debugging information into. The default behavior is
    still geared for 64 bit platforms, where address space overfloweth, and you
    can still load the whole sections of the object file (or the entire object
    file) into memory. This is abstracted over with the gimli::Reader
    trait. This manifests as small (but many) breaking changes to much of the
    public API. #182
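
To give a feel for the kind of abstraction described above, here is a heavily reduced sketch. It is not the real gimli::Reader trait: `ByteReader`, `SliceReader`, `SharedReader`, and `read_length_prefixed` are hypothetical names invented for this example, and the real trait has many more methods. The idea is just that parsing code asks for bytes through a trait, so it works the same whether the data is a borrowed slice or something else.

```rust
use std::rc::Rc;

/// Hypothetical, much-reduced stand-in for a reader abstraction.
trait ByteReader: Sized {
    /// Number of bytes left to read.
    fn remaining(&self) -> usize;
    /// Reads one byte and advances, or returns `None` at end of input.
    fn read_u8(&mut self) -> Option<u8>;
    /// Splits off the next `len` bytes as a new reader and advances past them.
    fn split(&mut self, len: usize) -> Option<Self>;
}

/// Zero-copy reader over a borrowed slice that is already in memory.
#[derive(Debug, Clone, Copy)]
struct SliceReader<'a>(&'a [u8]);

impl<'a> ByteReader for SliceReader<'a> {
    fn remaining(&self) -> usize {
        self.0.len()
    }
    fn read_u8(&mut self) -> Option<u8> {
        let (&first, rest) = self.0.split_first()?;
        self.0 = rest;
        Some(first)
    }
    fn split(&mut self, len: usize) -> Option<Self> {
        if len > self.0.len() {
            return None;
        }
        let (head, tail) = self.0.split_at(len);
        self.0 = tail;
        Some(SliceReader(head))
    }
}

/// Reader over a reference-counted buffer, with no borrowed lifetime.
#[derive(Debug, Clone)]
struct SharedReader {
    data: Rc<[u8]>,
    start: usize,
    end: usize,
}

impl ByteReader for SharedReader {
    fn remaining(&self) -> usize {
        self.end - self.start
    }
    fn read_u8(&mut self) -> Option<u8> {
        if self.start == self.end {
            return None;
        }
        let byte = self.data[self.start];
        self.start += 1;
        Some(byte)
    }
    fn split(&mut self, len: usize) -> Option<Self> {
        if len > self.remaining() {
            return None;
        }
        let head = SharedReader {
            data: Rc::clone(&self.data),
            start: self.start,
            end: self.start + len,
        };
        self.start += len;
        Some(head)
    }
}

/// Generic parsing code: a one-byte length followed by that many bytes.
fn read_length_prefixed<R: ByteReader>(input: &mut R) -> Option<R> {
    let len = usize::from(input.read_u8()?);
    input.split(len)
}
```

`read_length_prefixed` never mentions slices, which is the point: the same parsing function runs against either reader. A reader that fetches data on demand could implement the same trait, though that is beyond this sketch.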

Fixed

  • The DW_END_* constants for defining endianity of a compilation unit were
    previously incorrect. #193

  • The DW_OP_addr opcode is relative to the base address of the .text section
    of the binary, but we were incorrectly treating it as an absolute value. #210

Friends

Thanks to everyone who contributed to this release!

  • Kyle Huey
  • Nick Fitzgerald
  • Philip Craig
  • Tom Tromey

Contributing

Want to join us? Check out our CONTRIBUTING.md and take a look
at some of these issues:

Announcing gimli v0.15.0

gimli is a blazing fast library for consuming the DWARF debugging format.

Upgrade to this release by updating your Cargo.toml:

gimli = "0.15.0"

Changelog

Added

  • Added the EndianBuf::to_string() method. #233

  • Added more robust error handling in our example dwarfdump clone. #234

  • Added FrameDescriptionEntry::initial_address method. #237

  • Added FrameDescriptionEntry::len method. #237

  • Added the FrameDescriptionEntry::entry_len method. #241

  • Added the CommonInformationEntry::offset method. #241

  • Added the CommonInformationEntry::entry_len method. #241

  • Added the CommonInformationEntry::version method. #241

  • Added the CommonInformationEntry::augmentation method. #241

  • Added the CommonInformationEntry::code_alignment_factor method. #241

  • Added the CommonInformationEntry::data_alignment_factor method. #241

  • Added the CommonInformationEntry::return_address_register method. #241

  • Added support for printing .eh_frame sections to our example dwarfdump
    clone. #241

  • Added support for parsing the .eh_frame_hdr section. On Linux, the
    .eh_frame_hdr section provides a pointer to the already-mapped-in-memory
    .eh_frame data, so that it doesn't need to be duplicated, and a binary
    search table of its entries for faster unwinding information lookups. #250

  • Added support for parsing DWARF 5 compilation unit headers. #257

  • Added support for DWARF 5's DW_FORM_implicit_const. #257

Changed

  • Unwinding methods now give ownership of the unwinding context back to the
    caller if errors are encountered, not just on the success path. This allows
    recovering from errors in signal-safe code, where constructing a new unwinding
    context is not an option because it requires allocation. This is a breaking
    change
    affecting UnwindSection::unwind_info_for_address and
    UninitializedUnwindContext::initialize. #241

  • CfaRule and RegisterRule now expose their DW_OP expressions as
    Expression. This is a minor breaking change. #241

  • The Error::UnknownVersion variant now contains the unknown version
    number. This is a minor breaking change. #245

  • EvaluationResult::RequiresEntryValue requires an Expression instead of a
    Reader now. This is a minor breaking change. #256
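
The ownership change to the unwinding methods described above follows a general Rust pattern that can be sketched as below. This is a toy illustration, not gimli's actual signatures: `Context`, `UnwindError`, `unwind_info_for`, and the address arithmetic are all invented for the example.

```rust
/// Hypothetical stand-in for an unwinding context whose construction allocates.
#[derive(Debug)]
struct Context {
    stack: Vec<u64>,
}

#[derive(Debug, PartialEq)]
enum UnwindError {
    NoEntryForAddress(u64),
}

/// Consumes the context and hands it back on both the success and the error
/// path, so the caller can reuse it after a failure instead of allocating.
fn unwind_info_for(
    mut ctx: Context,
    address: u64,
) -> Result<(u64, Context), (UnwindError, Context)> {
    ctx.stack.clear(); // keeps the existing allocation
    // Toy rule: pretend only addresses from 0x1000 upward have unwind info.
    if address < 0x1000 {
        return Err((UnwindError::NoEntryForAddress(address), ctx));
    }
    ctx.stack.push(address);
    Ok((address - 0x1000, ctx))
}
```

A caller matches on the result and gets the context back in either arm, so a failed lookup does not force a fresh allocation before the next attempt.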

Friends

Thanks to everyone who contributed to this release!

  • bluss
  • Brook Heisler
  • Dan Burkert
  • Kyle Huey
  • main()
  • Nick Fitzgerald
  • Nikolai Vazquez
  • Philip Craig
  • Steven Fackler

Contributing

Want to join us? Check out our CONTRIBUTING.md and take a look
at some of these issues:

Announcing gimli 0.16.0

gimli is a blazing fast library for consuming the DWARF debugging format.

Upgrade to this release by updating your Cargo.toml:

gimli = "0.16.0"

Changelog

Added

  • Added support for building in #![no_std] environments, when the alloc crate is available. Disable the "std" feature and enable the "alloc" feature. #138 #271

  • Added support for DWARF 5 .debug_rnglists and .debug_loclists sections. #272

  • A dwarf-validate example program that checks the integrity of the given DWARF and its references between sections. #290

  • Added the EndianReader<T> type, an easy way to define a custom Reader implementation with a reference to a generic buffer of bytes and an associated endianity. #298 #302

Changed

Deprecated

  • EndianBuf has been renamed to EndianSlice, use that name instead. #295

Fixed

  • Evaluating the DW_CFA_restore_state opcode properly maintains the current location. Previously it would incorrectly restore the old location when popping from evaluation stack. #274
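
The intended behavior of the fix above can be sketched with a toy evaluator. This is an illustration under simplifying assumptions rather than gimli's code: `UnwindRow` and `Evaluator` are made-up types, and a single `cfa_offset` value stands in for the whole set of register rules. The key point is that the remember/restore stack holds rules only, so restoring never touches the current location.

```rust
/// A simplified unwind table row. `cfa_offset` is a hypothetical stand-in
/// for the full set of register rules a real row carries.
#[derive(Debug, Clone, PartialEq)]
struct UnwindRow {
    location: u64,
    cfa_offset: i64,
}

struct Evaluator {
    row: UnwindRow,
    /// Rules pushed by `remember_state`. Locations are never saved here.
    saved_rules: Vec<i64>,
}

impl Evaluator {
    fn new(location: u64, cfa_offset: i64) -> Self {
        Evaluator {
            row: UnwindRow { location, cfa_offset },
            saved_rules: Vec::new(),
        }
    }

    /// Moves the current location forward by `delta`.
    fn advance_loc(&mut self, delta: u64) {
        self.row.location += delta;
    }

    /// Changes the (stand-in) rule of the current row.
    fn set_cfa_offset(&mut self, offset: i64) {
        self.row.cfa_offset = offset;
    }

    /// Models DW_CFA_remember_state: push the current rules.
    fn remember_state(&mut self) {
        self.saved_rules.push(self.row.cfa_offset);
    }

    /// Models DW_CFA_restore_state: pop the saved rules into the current row
    /// while leaving the location alone. Returns false if nothing was saved.
    fn restore_state(&mut self) -> bool {
        match self.saved_rules.pop() {
            Some(rules) => {
                self.row.cfa_offset = rules;
                true
            }
            None => false,
        }
    }
}
```

After remembering at 0x1000, advancing to 0x1010, and restoring, the row has the old rule back but still sits at 0x1010.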

Friends

Thanks to everyone who contributed to this release!

  • Igor Gnatenko
  • Nick Fitzgerald
  • Philip Craig
  • Robert O'Callahan
  • Ted Mielczarek
  • Thadeus Fleming
  • whitequark
  • Xavier Martinez-Hidalgo

Contributing

Want to join us? Check out our CONTRIBUTING.md and take a look at some of these issues:

Announcing gimli 0.17.0

gimli is a blazing fast library for consuming the DWARF debugging format.

Upgrade to this release by updating your Cargo.toml:

gimli = "0.17.0"

The focus of this release has been on improving DWARF 5 support, and adding support for writing DWARF. The first user of the write support is rustc_codegen_cranelift.

Changelog

Breaking changes

  • Changed register values to a Register type instead of u8/u64.
    #328

  • Replaced BaseAddresses::set_cfi with set_eh_frame_hdr and set_eh_frame.
    Replaced BaseAddresses::set_data with set_got.
    You should now use the same BaseAddresses value for parsing both
    .eh_frame and .eh_frame_hdr.
    #351

  • Renamed many types and functions related to .debug_line.
    Renamed LineNumberProgram to LineProgram.
    Renamed IncompleteLineNumberProgram to IncompleteLineProgram.
    Renamed CompleteLineNumberProgram to CompleteLineProgram.
    Renamed LineNumberProgramHeader to LineProgramHeader.
    Renamed LineNumberRow to LineRow.
    Renamed StateMachine to LineRows.
    Renamed Opcode to LineInstruction.
    Renamed OpcodesIter to LineInstructions.
    Renamed LineNumberSequence to LineSequence.
    #359

  • Added Offset type parameter to AttributeValue, LineProgram, IncompleteLineProgram, CompleteLineProgram, LineRows, LineInstruction, and FileEntry.
    #324

  • Changed FileEntry::path_name, FileEntry::directory, and LineProgramHeader::directory to return an AttributeValue instead of a Reader.
    #366

  • Renamed FileEntry::last_modification to FileEntry::timestamp and renamed FileEntry::length to FileEntry::size.
    #366

  • Added an Encoding type. Changed many functions that previously accepted Format, version or address size parameters to accept an Encoding parameter instead. Notable changes are LocationLists::locations, RangeLists::ranges, and Expression::evaluation.
    #364

  • Changed return type of LocationLists::new and RangeLists::new.
    #370

  • Added parameters to LocationLists::locations and RangeLists::ranges to support .debug_addr.
    #358

  • Added more AttributeValue variants: DebugAddrBase, DebugAddrIndex, DebugLocListsBase, DebugLocListsIndex, DebugRngListsBase, DebugRngListsIndex, DebugStrOffsetsBase, DebugStrOffsetsIndex, DebugLineStrRef.
    #358

  • Changed AttributeValue::Data* attributes to native endian integers instead of byte arrays.
    #365

  • Replaced EvaluationResult::TextBase with EvaluationResult::RequiresRelocatedAddress. The handling of TextBase was incorrect.
    #335

  • Added EvaluationResult::IndexedAddress for operations that require an address from .debug_addr.
    #358

  • Added Reader::read_slice. Added a default implementation of Reader::read_u8_array which uses this.
    #358
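
As a rough picture of what the AttributeValue::Data* change above means for callers: the raw bytes of a fixed-size constant get converted to an integer once, using the file's byte order, instead of being handed out as a byte array. The sketch below assumes a hypothetical `FileEndian` enum chosen at run time. It is not gimli's endianity API, only the standard library's from_le_bytes and from_be_bytes.

```rust
/// Hypothetical run-time description of the byte order used by a file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FileEndian {
    Little,
    Big,
}

impl FileEndian {
    /// Converts two bytes in file order into a native integer.
    fn read_u16(self, bytes: [u8; 2]) -> u16 {
        match self {
            FileEndian::Little => u16::from_le_bytes(bytes),
            FileEndian::Big => u16::from_be_bytes(bytes),
        }
    }

    /// Converts four bytes in file order into a native integer.
    fn read_u32(self, bytes: [u8; 4]) -> u32 {
        match self {
            FileEndian::Little => u32::from_le_bytes(bytes),
            FileEndian::Big => u32::from_be_bytes(bytes),
        }
    }

    /// Converts eight bytes in file order into a native integer.
    fn read_u64(self, bytes: [u8; 8]) -> u64 {
        match self {
            FileEndian::Little => u64::from_le_bytes(bytes),
            FileEndian::Big => u64::from_be_bytes(bytes),
        }
    }
}
```

The same four bytes 0x78 0x56 0x34 0x12 come out as 0x12345678 for a little-endian file and as 0x78563412 for a big-endian one, regardless of the host's own byte order.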

Added

  • Added initial support for writing DWARF. This is targeted at supporting line number information only.
    #340 #344 #346 #361 #362 #365 #368 #382

  • Added read and write Cargo features. Both are enabled by default.
    #343

  • Added support for reading DWARF 5 .debug_line and .debug_line_str sections.
    #366

  • Added support for reading DWARF 5 .debug_str_offsets sections, including parsing DW_FORM_strx* attributes.
    #358

  • Added support for reading DWARF 5 .debug_addr sections, including parsing DW_FORM_addrx* attributes and evaluating DW_OP_addrx and DW_OP_constx operations.
    #358

  • Added support for reading DWARF 5 indexed addresses and offsets in .debug_loclists and .debug_rnglists, including parsing DW_FORM_rnglistx and DW_FORM_loclistx attributes.
    #358

  • Added high level Dwarf and Unit types. Existing code does not need to switch to using these types, but doing so will make DWARF 5 support simpler.
    #352 #380 #381

  • Added EhFrame::set_address_size and DebugFrame::set_address_size methods to allow parsing non-native CFI sections. The default address size is still the native size.
    #325

  • Added architecture specific definitions for Register values and names. Changed dwarfdump to print them.
    #328

  • Added support for reading relocatable DWARF sections.
    #337

  • Added parsing of DW_FORM_data16.
    #366

Fixed

  • Fixed parsing DWARF 5 ranges with start == end == 0.
    #323

  • Changed LineRows to be covariant in its Reader type parameter.
    #324

  • Fixed handling of empty units in dwarfdump.
    #330

  • Fixed UnitHeader::length_including_self for Dwarf64.
    #342

  • Fixed parsing of DW_CFA_set_loc.
    #355

  • Fixed handling of multiple headers in .debug_loclists and .debug_rnglists.
    #370

Friends

Thanks to everyone who contributed to this release!

  • Andre Bogus
  • Armin Ronacher
  • bjorn3
  • data-pup
  • est31
  • Gerd Zellweger
  • Igor Gnatenko
  • Nick Fitzgerald
  • Philip Craig
  • Robert O'Callahan
  • roblabla
  • Squirrel

Contributing

Want to join us? Check out our CONTRIBUTING.md and take a look at some of these issues:

Announcing gimli 0.18.0

gimli is a blazing fast library for consuming the DWARF debugging format.

Upgrade to this release by updating your Cargo.toml:

gimli = "0.18.0"

The focus of this release has been on improving support for reading CFI,
and adding support for writing CFI.

Changelog

Breaking changes

  • For types which have an Offset type parameter, the default Offset has changed from usize to R::Offset. #392

  • Added an Offset type parameter to the read::Unit type to allow variance. #393

  • Changed the UninitializedUnwindContext::initialize method to borrow self, and return &mut UnwindContext. Deleted the InitializedUnwindContext type. #395

  • Deleted the UnwindSection type parameters from the CommonInformationEntry, FrameDescriptionEntry, UninitializedUnwindContext, UnwindContext, and UnwindTable types. #399

  • Changed the signature of the get_cie callback parameter for various functions. The signature now matches the UnwindSection::cie_from_offset method, so that method can be used as the parameter. #400

  • Reduced the number of lifetime parameters for the UnwindTable type. #400

  • Updated fallible-iterator to version 0.2.0. #407

  • Added a parameter to the Error::UnexpectedEof enum variant. #408

Added

  • Update to 2018 edition. #391

  • Added the FrameDescriptionEntry::unwind_info_for_address method. #396

  • Added the FrameDescriptionEntry::rows method. #396

  • Added the EhHdrTable::unwind_info_for_address method. #400

  • Added the EhHdrTable::fde_for_address method and deprecated the EhHdrTable::lookup_and_parse method. #400

  • Added the EhHdrTable::pointer_to_offset method. #400

  • Added the UnwindSection::fde_for_address method. #396

  • Added the UnwindSection::fde_from_offset method. #400

  • Added the UnwindSection::partial_fde_from_offset method. #400

  • Added the Section::id method. #406

  • Added the Dwarf::load method, and corresponding methods for individual sections. #406

  • Added the Dwarf::borrow method, and corresponding methods for individual sections. #406

  • Added the Dwarf::format_error method. #408

  • Added the Dwarf::die_ranges method. #417

  • Added the Dwarf::unit_ranges method. #417

  • Added support for writing .debug_frame and .eh_frame sections. #412 #419

Fixed

  • The code_alignment_factor is now used when evaluating CFI instructions that advance the location. #401

  • Fixed parsing of pointers encoded with DW_EH_PE_funcrel. #402

  • Use the FDE address encoding from the augmentation when parsing DW_CFA_set_loc. #403

  • Fixed setting of .eh_frame base addresses in dwarfdump. #410
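
To illustrate the code_alignment_factor fix above: the "advance location" CFI instructions carry a delta that has to be multiplied by the CIE's code alignment factor before it is added to the current location. The sketch below shows that arithmetic. It is not gimli's implementation: `advance_location` is a name made up for this example, and it assumes little-endian operands for the two- and four-byte forms.

```rust
/// Applies one "advance location" CFI instruction found at the start of
/// `instr`. Returns the new location and the number of bytes consumed, or
/// `None` for any other opcode, truncated input, or arithmetic overflow.
fn advance_location(
    loc: u64,
    code_alignment_factor: u64,
    instr: &[u8],
) -> Option<(u64, usize)> {
    let (&opcode, operands) = instr.split_first()?;
    let (delta, len) = if opcode >> 6 == 0x1 {
        // DW_CFA_advance_loc: the delta is the low 6 bits of the opcode.
        (u64::from(opcode & 0x3f), 1)
    } else {
        match opcode {
            // DW_CFA_advance_loc1: one-byte delta.
            0x02 => (u64::from(*operands.first()?), 2),
            // DW_CFA_advance_loc2: two-byte delta (little-endian assumed).
            0x03 => {
                let b = operands.get(..2)?;
                (u64::from(u16::from_le_bytes([b[0], b[1]])), 3)
            }
            // DW_CFA_advance_loc4: four-byte delta (little-endian assumed).
            0x04 => {
                let b = operands.get(..4)?;
                (u64::from(u32::from_le_bytes([b[0], b[1], b[2], b[3]])), 5)
            }
            _ => return None,
        }
    };
    // The delta is scaled by the code alignment factor before being applied.
    let advance = delta.checked_mul(code_alignment_factor)?;
    Some((loc.checked_add(advance)?, len))
}
```

With a factor of 4, an instruction encoding a delta of 3 moves the location forward by 12 bytes rather than 3, which is the difference the fix is about.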

Friends

Thanks to everyone who contributed to this release!

  • bjorn3
  • data-pup
  • Nick Fitzgerald
  • Philip Craig
  • Yury Delendik

Contributing

Want to join us? Check out our CONTRIBUTING.md and take a look
at some of these issues:
