Gimli: a blazing fast parser for DWARF debugging information

fitzgen · September 20, 2016, 4:41pm

Hi folks!

I'd like to announce gimli! gimli is a lazy, zero-copy parser for DWARF debugging information. We aim to be the fastest and most ergonomic library for working with DWARF!

Repository: https://github.com/gimli-rs/gimli
On crates.io: https://crates.io/crates/gimli
Documentation: https://gimli-rs.github.io/gimli/gimli/index.html

As of now, we have implemented coverage for almost all of the standard. We have fast DIE tree traversal, line number table evaluation, a DWARF expressions evaluator, CFI unwinding evaluation (only .debug_frame at the moment, .eh_frame coming soon), and more.

Notable TODOs are support for .debug_macinfo, GNU non-standard extensions, and features from the upcoming 5th edition of the DWARF standard.

Once we have completed our low-level, raw APIs for the whole standard, we plan on building a higher level "semantic" API on top of DIEs.

There have been a few of us contributing to gimli, so please take a look at the contributors list and give everyone their dues.

We hope that people find this library useful! Please don't hesitate to ask for help, suggest API improvements, complain (constructively) about documentation, file an issue, or submit a pull request!

If you're interested in contributing to gimli, please reach out to me and I can help mentor, find some stuff that would be a good match for you, etc.

Thanks!

fitzgen · September 20, 2016, 5:29pm

Oh, one thing I forgot to mention: we have clones of both dwarfdump and addr2line in the examples/ directory! Take a look at them if you'd like to see a more complex example of using the library.

cuviper · September 20, 2016, 9:39pm

It's also worth bragging that ruby-stacktrace uses gimli, as featured in Julia Evans' RustConf keynote. Apparently this saved her from the complexity of libdwarf.

I intend to write bindings for elfutils (when I find some round tuits), because I think its libdw is a lot nicer than libdwarf. But I think it's great to see a pure-Rust DWARF consumer too, and I'll be interested to eventually compare the performance.

You should at least get DW_LANG_Rust in there ASAP! (0x1c)

fitzgen · September 20, 2016, 10:49pm

> It's also worth bragging that ruby-stacktrace uses gimli, as featured in Julia Evans' RustConf keynote. Apparently this saved her from the complexity of libdwarf.

I couldn't stop smiling during the talk

> I intend to write bindings for elfutils (when I find some round tuits), because I think its libdw is a lot nicer than libdwarf. But I think it's great to see a pure-Rust DWARF consumer too,

cough you could always contribute to gimli cough

> I'll be interested to eventually compare the performance.

philipc did some initial comparisons that were very promising. I'd like to get some benchmarks vs libdwarf and libdw into the library behind a feature flag or something (not sure if licenses would allow us to actually do this though). Either way, I'd like to do some more benchmarking between the libraries in the future.

> You should at least get DW_LANG_Rust in there ASAP! (0x1c)

Add the `DW_LANG_Rust` constant ;) (gimli-rs/gimli pull request #124) !!

cuviper · September 20, 2016, 11:10pm

Touché! But I do think there's value in coming at this from both directions.

philipc · September 21, 2016, 6:49am

There are some very rough benchmarks at https://github.com/philipc/dwarf-bench/blob/master/benches/bench.rs.

Results for traversal of all the DIEs and their attributes:

test info_libdwarf   ... bench:  13,584,306 ns/iter (+/- 173,801)
test info_elfutils   ... bench:  11,835,148 ns/iter (+/- 698,127)
test info_gimli      ... bench:   2,246,020 ns/iter (+/- 327,789)

Unfortunately these benchmarks include file IO. For comparison, without IO gimli gets:

test info_gimli      ... bench:   1,511,751 ns/iter (+/- 34,696)

I haven't done any investigation into why the difference is so large... maybe gimli isn't doing stuff that it should be doing in order to make this a fair comparison. I've never used libdwarf or libdw before though, so I'd welcome any feedback or suggestions for better benchmarks. We'll certainly investigate this more once gimli is closer to feature completion.

cuviper · September 21, 2016, 8:27am

I haven't looked too closely at your benchmark code yet -- I just wanted to see if I could replicate the huge difference. The target binary matters, of course, so I can't reproduce your numbers exactly. I also couldn't run both libdwarf and elfutils in a single build, as I got segfaults, probably because they collide on a few symbol names.

Running them separately on a binary with 1.4MB .debug_info, I get:

running 3 tests
test info_elfutils   ... bench:  14,244,924 ns/iter (+/- 335,220)
test info_gimli      ... bench:  13,937,991 ns/iter (+/- 915,131)
test info_rust_dwarf ... bench:  18,569,518 ns/iter (+/- 699,243)

running 3 tests
test info_gimli      ... bench:  14,444,390 ns/iter (+/- 625,092)
test info_libdwarf   ... bench:  79,633,148 ns/iter (+/- 1,235,416)
test info_rust_dwarf ... bench:  19,092,238 ns/iter (+/- 451,287)

On a larger binary with 17.5MB .debug_info, I get:

running 3 tests
test info_elfutils   ... bench: 174,539,411 ns/iter (+/- 12,811,498)
test info_gimli      ... bench: 167,672,494 ns/iter (+/- 21,524,331)
test info_rust_dwarf ... bench: 212,620,183 ns/iter (+/- 7,798,126)

running 3 tests
test info_gimli      ... bench: 168,343,463 ns/iter (+/- 17,859,240)
test info_libdwarf   ... bench: 1,005,658,211 ns/iter (+/- 73,538,307)
test info_rust_dwarf ... bench: 207,347,681 ns/iter (+/- 7,304,612)

So for me, gimli edges out elfutils, rust-dwarf is not too far behind, and libdwarf is way out there.

Well done! Hope you can hold on to that lead as you add more features...

fitzgen · December 13, 2016, 6:24pm

Some Updates

A lot of progress has been made (relatively silently) on completeness, correctness, and performance. Here are two notable changes (there are many more little things as well):

.eh_frame parsing and evaluation is now supported (in addition to the already existing support for .debug_frame). This paves the way for a proper unwinding and stack walking library, which I am working on when I get a chance here and there. Eventually we hope to replace libunwind in std and remove some of the last C code in the distribution.
The addr2line example program has been split out into its own crate and is both an executable and a library now. HUGE thanks to https://github.com/jonhoo for coming in and doing this, along with a bunch of performance improvements and caching for when we are symbolicating many addresses at once. We are now an order of magnitude faster than the canonical binutils addr2line implementation! See https://github.com/gimli-rs/addr2line/pull/14#issuecomment-266141967 for more details and graphs of running time and memory usage between different configurations of our addr2line and the binutils version.

We are currently using the "binsearch-100" configuration by default which provides a pretty good trade off between memory usage and running time.

There's more coming down the pipe, and if you want to get involved we have a few issues marked "help wanted" and "easy" plus others that aren't marked as such but probably should be! Reach out to me if you'd like to help out!

Cheers!

yanns · December 13, 2016, 7:28pm

It could be useful to describe what DWARF is?
Or just add a link to another website about it?

fitzgen · December 13, 2016, 7:43pm

From http://dwarfstd.org/

DWARF is a debugging file format used by many compilers and debuggers to support source level debugging. It addresses the requirements of a number of procedural languages, such as C, C++, and Fortran, and is designed to be extensible to other languages. DWARF is architecture independent and applicable to any processor or operating system. It is widely used on Unix, Linux and other operating systems, as well as in stand-alone environments.

It is also used by anything that wants to turn an address into a (function name, file, line) such as a profiler.

More good info: How debuggers work: Part 3 - Debugging information - Eli Bendersky's website

I'll make the text "DWARF debugging format" in the first sentence in the README into a link

EDIT: and of course rustc itself generates DWARF

vadimcn · December 13, 2016, 7:45pm

Can you give a pointer to where this code lives?

fitzgen · December 13, 2016, 7:53pm

It is still very much a WIP and I need to get more groundwork laid out before it is useful at all or even ready for other contributors, but: GitHub - fitzgen/pancakes: Still a WIP

Also (again, still WIP): fitzgen/findshlibs on GitHub ("Find the shared libraries loaded in the current process with a cross platform API"), for finding the .eh_frame unwinding section that is already mapped into memory along with the executable bits.

fitzgen · August 8, 2017, 6:41pm

Announcing `gimli` v0.14.0

gimli is a blazing fast library for consuming the DWARF debugging format.

Upgrade to this release by updating your Cargo.toml:

gimli = "0.14.0"

Changelog

Added

All pub types now derive(Hash). #192
All the constants from DWARF 5 are now defined. #193
Added support for the DW_OP_GNU_parameter_ref GNU extension to parsing and
evaluating DWARF opcodes. #208
Improved LEB128 parsing performance. #216
Improved .debug_{aranges,pubnames,pubtypes} parsing performance. #218
Added the ability to choose endianity dynamically at run time, rather than
only statically at compile time. #219
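
For readers unfamiliar with LEB128, the variable-length integer encoding that DWARF uses throughout, here is a minimal decoding sketch. It is not gimli's implementation; the function names `read_uleb128` and `read_sleb128` are made up for illustration, and the signed variant does not reject over-long encodings.

```rust
/// Decode an unsigned LEB128 value from the front of `bytes`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input ends before the final byte or the value does not fit in a `u64`.
/// (Hypothetical helper, not part of gimli's API.)
pub fn read_uleb128(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut result: u64 = 0;
    let mut shift: u32 = 0;
    for (i, &byte) in bytes.iter().enumerate() {
        let low = u64::from(byte & 0x7f);
        // Only one payload bit is left once we have shifted by 63.
        if shift >= 64 || (shift == 63 && low > 1) {
            return None;
        }
        result |= low << shift;
        // A clear high bit marks the last byte of the value.
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
        shift += 7;
    }
    None
}

/// Decode a signed LEB128 value from the front of `bytes`.
/// Bits beyond the 64th are silently dropped in this sketch.
pub fn read_sleb128(bytes: &[u8]) -> Option<(i64, usize)> {
    let mut result: i64 = 0;
    let mut shift: u32 = 0;
    for (i, &byte) in bytes.iter().enumerate() {
        if shift >= 64 {
            return None;
        }
        result |= i64::from(byte & 0x7f) << shift;
        shift += 7;
        if byte & 0x80 == 0 {
            // Bit 6 of the last byte is the sign bit; extend it upwards.
            if shift < 64 && byte & 0x40 != 0 {
                result |= -1i64 << shift;
            }
            return Some((result, i + 1));
        }
    }
    None
}
```

Each byte carries seven payload bits, least significant group first, and the high bit says whether another byte follows.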

Changed

The biggest change of this release is that gimli no longer requires the
object file's sections to be fully loaded into memory. This enables using gimli
on 32 bit platforms where there often isn't enough contiguous virtual memory
address space to load debugging information into. The default behavior is
still geared for 64 bit platforms, where address space overfloweth, and you
can still load the whole sections of the object file (or the entire object
file) into memory. This is abstracted over with the gimli::Reader
trait. This manifests as small (but many) breaking changes to much of the
public API. #182
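
To give a feel for what abstracting over the input buys you, here is a rough sketch of the idea. It is not the actual gimli::Reader trait: `ByteReader`, `Endian`, and `Chunked` are hypothetical names, and the real trait has a much larger surface. The sketch shows one parser-facing method working over both a fully loaded slice and a section split into non-contiguous chunks, with the byte order chosen at run time.

```rust
/// Byte order selected at run time (hypothetical type, not gimli's).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Endian {
    Little,
    Big,
}

/// Minimal reader abstraction: anything that can hand out the next bytes.
pub trait ByteReader {
    /// Fill `buf` with the next bytes and advance, or return `None` if
    /// not enough input remains.
    fn read_exact(&mut self, buf: &mut [u8]) -> Option<()>;

    /// Parsing code is written once against the trait.
    fn read_u32(&mut self, endian: Endian) -> Option<u32> {
        let mut buf = [0u8; 4];
        self.read_exact(&mut buf)?;
        Some(match endian {
            Endian::Little => u32::from_le_bytes(buf),
            Endian::Big => u32::from_be_bytes(buf),
        })
    }
}

/// The "whole section is in memory" case: a plain byte slice.
impl<'a> ByteReader for &'a [u8] {
    fn read_exact(&mut self, buf: &mut [u8]) -> Option<()> {
        if self.len() < buf.len() {
            return None;
        }
        let (head, tail) = self.split_at(buf.len());
        buf.copy_from_slice(head);
        *self = tail;
        Some(())
    }
}

/// A section held as separate chunks, so no single contiguous
/// allocation is needed.
pub struct Chunked {
    chunks: Vec<Vec<u8>>,
    chunk: usize,
    offset: usize,
}

impl Chunked {
    pub fn new(chunks: Vec<Vec<u8>>) -> Self {
        Chunked { chunks, chunk: 0, offset: 0 }
    }
}

impl ByteReader for Chunked {
    // Note: on failure this sketch leaves the reader partially advanced.
    fn read_exact(&mut self, buf: &mut [u8]) -> Option<()> {
        for slot in buf.iter_mut() {
            // Step over chunks that are exhausted (or empty).
            while self.chunk < self.chunks.len()
                && self.offset == self.chunks[self.chunk].len()
            {
                self.chunk += 1;
                self.offset = 0;
            }
            let chunk = self.chunks.get(self.chunk)?;
            *slot = chunk[self.offset];
            self.offset += 1;
        }
        Some(())
    }
}
```

With this shape, `let mut s: &[u8] = &data; s.read_u32(Endian::Little)` and `Chunked::new(parts).read_u32(Endian::Big)` go through the same default method.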

Fixed

The DW_END_* constants for defining endianity of a compilation unit were
previously incorrect. #193
The DW_OP_addr opcode is relative to the base address of the .text section
of the binary, but we were incorrectly treating it as an absolute value. #210

Friends

Thanks to everyone who contributed to this release!

Kyle Huey
Nick Fitzgerald
Philip Craig
Tom Tromey

Contributing

Want to join us? Check out our CONTRIBUTING.md and take a look
at some of these issues:

Issues labeled "easy"

fitzgen · December 1, 2017, 6:29pm

Announcing `gimli` v0.15.0

gimli is a blazing fast library for consuming the DWARF debugging format.

Upgrade to this release by updating your Cargo.toml:

gimli = "0.15.0"

Changelog

Added

Added the EndianBuf::to_string() method. #233
Added more robust error handling in our example dwarfdump clone. #234
Added FrameDescriptionEntry::initial_address method. #237
Added FrameDescriptionEntry::len method. #237
Added the FrameDescriptionEntry::entry_len method. #241
Added the CommonInformationEntry::offset method. #241
Added the CommonInformationEntry::entry_len method. #241
Added the CommonInformationEntry::version method. #241
Added the CommonInformationEntry::augmentation method. #241
Added the CommonInformationEntry::code_alignment_factor method. #241
Added the CommonInformationEntry::data_alignment_factor method. #241
Added the CommonInformationEntry::return_address_register method. #241
Added support for printing .eh_frame sections to our example dwarfdump
clone. #241
Added support for parsing the .eh_frame_hdr section. On Linux, the
.eh_frame_hdr section provides a pointer to the already-mapped-in-memory
.eh_frame data, so that it doesn't need to be duplicated, and a binary
search table of its entries for faster unwinding information lookups. #250
Added support for parsing DWARF 5 compilation unit headers. #257
Added support for DWARF 5's DW_FORM_implicit_const. #257
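
As an illustration of why the .eh_frame_hdr binary search table speeds up lookups, here is a small sketch. It assumes the table has already been decoded into (initial location, FDE offset) pairs sorted by initial location; `TableEntry` and `lookup` are hypothetical names, not gimli's API.

```rust
/// One decoded row of the search table (hypothetical type for illustration).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TableEntry {
    /// First address covered by the FDE.
    pub initial_location: u64,
    /// Where the FDE itself can be found in .eh_frame.
    pub fde_offset: u64,
}

/// Find the candidate FDE for `address`: the last entry whose initial
/// location is not greater than `address`. `table` must be sorted by
/// `initial_location`. The caller still has to parse the FDE and check
/// that its address range really contains `address`.
pub fn lookup(table: &[TableEntry], address: u64) -> Option<&TableEntry> {
    // Index of the first entry that starts after `address`.
    let idx = table.partition_point(|e| e.initial_location <= address);
    idx.checked_sub(1).map(|i| &table[i])
}
```

The search touches O(log n) table rows instead of walking every CIE and FDE in .eh_frame.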

Changed

Unwinding methods now give ownership of the unwinding context back to the
caller if errors are encountered, not just on the success path. This allows
recovering from errors in signal-safe code, where constructing a new unwinding
context is not an option because it requires allocation. This is a breaking
change affecting UnwindSection::unwind_info_for_address and
UninitializedUnwindContext::initialize. #241
CfaRule and RegisterRule now expose their DW_OP expressions as
Expression. This is a minor breaking change. #241
The Error::UnknownVersion variant now contains the unknown version
number. This is a minor breaking change. #245
EvaluationResult::RequiresEntryValue requires an Expression instead of a
Reader now. This is a minor breaking change. #256
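
The ownership change to the unwinding methods follows a general Rust pattern: return the consumed value inside the error. Below is a sketch of that pattern with made-up types (`UninitializedContext`, `InitializedContext`, `ParseError`); it is not gimli's actual signature, and the "parsing" is a stand-in.

```rust
/// Stand-in error type for the sketch.
#[derive(Debug, PartialEq)]
pub struct ParseError(pub &'static str);

/// Owns a preallocated buffer; building one allocates, so we want to
/// do that only once, outside of any signal handler.
pub struct UninitializedContext {
    scratch: Vec<u64>,
}

/// The same buffer, now filled in and ready to use.
pub struct InitializedContext {
    scratch: Vec<u64>,
}

impl UninitializedContext {
    pub fn new() -> Self {
        UninitializedContext { scratch: Vec::with_capacity(16) }
    }

    /// Consumes `self`. On failure the context is handed back next to
    /// the error, so the caller can retry without allocating again.
    pub fn initialize(
        mut self,
        cie_bytes: &[u8],
    ) -> Result<InitializedContext, (ParseError, Self)> {
        if cie_bytes.is_empty() {
            return Err((ParseError("empty CIE"), self));
        }
        // Stay within the preallocated capacity so nothing reallocates.
        let cap = self.scratch.capacity();
        self.scratch.clear();
        self.scratch
            .extend(cie_bytes.iter().take(cap).map(|&b| u64::from(b)));
        Ok(InitializedContext { scratch: self.scratch })
    }
}

impl InitializedContext {
    pub fn rules(&self) -> &[u64] {
        &self.scratch
    }
}
```

A caller can then write `match ctx.initialize(bytes) { Ok(ready) => ..., Err((err, ctx)) => ... }` and keep using `ctx` on the error path.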

Friends

Thanks to everyone who contributed to this release!

bluss
Brook Heisler
Dan Burkert
Kyle Huey
main()
Nick Fitzgerald
Nikolai Vazquez
Philip Craig
Steven Fackler

Contributing

Want to join us? Check out our CONTRIBUTING.md and take a look
at some of these issues:

Issues labeled "easy"

fitzgen · June 1, 2018, 7:49pm

Announcing `gimli` 0.16.0

gimli is a blazing fast library for consuming the DWARF debugging format.

Upgrade to this release by updating your Cargo.toml:

gimli = "0.16.0"

Changelog

Added

Added support for building in #![no_std] environments, when the alloc crate is available. Disable the "std" feature and enable the "alloc" feature. #138 #271
Added support for DWARF 5 .debug_rnglists and .debug_loclists sections. #272
A dwarf-validate example program that checks the integrity of the given DWARF and its references between sections. #290
Added the EndianReader<T> type, an easy way to define a custom Reader implementation with a reference to a generic buffer of bytes and an associated endianity. #298 #302

Changed

Various speed improvements for evaluating .debug_line line number programs. #276
The example dwarfdump clone is a whole lot faster now. #282 #284 #285

Deprecated

EndianBuf has been renamed to EndianSlice, use that name instead. #295

Fixed

Evaluating the DW_CFA_restore_state opcode properly maintains the current location. Previously it would incorrectly restore the old location when popping from evaluation stack. #274
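
To make the DW_CFA_restore_state fix concrete, here is a toy model of the behaviour described above. The `RowState` type and its methods are hypothetical and far simpler than gimli's unwind tables; register rules are reduced to (register, offset) pairs.

```rust
/// Toy CFI row state: a current address plus per-register rules.
pub struct RowState {
    address: u64,
    rules: Vec<(u16, i64)>,
    /// Stack used by remember_state / restore_state. Only the rules are
    /// saved; the address is deliberately not part of the snapshot.
    saved: Vec<Vec<(u16, i64)>>,
}

impl RowState {
    pub fn new(start: u64) -> Self {
        RowState { address: start, rules: Vec::new(), saved: Vec::new() }
    }

    pub fn address(&self) -> u64 {
        self.address
    }

    pub fn rules(&self) -> &[(u16, i64)] {
        &self.rules
    }

    /// Move the current location forward.
    pub fn advance(&mut self, delta: u64) {
        self.address += delta;
    }

    /// Set or replace the rule for one register.
    pub fn set_rule(&mut self, reg: u16, offset: i64) {
        if let Some(i) = self.rules.iter().position(|&(r, _)| r == reg) {
            self.rules[i].1 = offset;
        } else {
            self.rules.push((reg, offset));
        }
    }

    /// DW_CFA_remember_state: push a copy of the register rules.
    pub fn remember_state(&mut self) {
        self.saved.push(self.rules.clone());
    }

    /// DW_CFA_restore_state: pop the rules but keep the current address.
    /// Returns false if nothing was remembered.
    pub fn restore_state(&mut self) -> bool {
        match self.saved.pop() {
            Some(rules) => {
                self.rules = rules;
                true
            }
            None => false,
        }
    }
}
```

Snapshotting the whole row instead of just the rules would rewind the address on restore, which is the kind of mistake the changelog entry describes.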

Friends

Thanks to everyone who contributed to this release!

Igor Gnatenko
Nick Fitzgerald
Philip Craig
Robert O'Callahan
Ted Mielczarek
Thadeus Fleming
whitequark
Xavier Martinez-Hidalgo

Contributing

Want to join us? Check out our CONTRIBUTING.md and take a look at some of these issues:

Issues labeled "easy"

philipc · February 22, 2019, 1:43am

Announcing `gimli` 0.17.0

gimli is a blazing fast library for consuming the DWARF debugging format.

Upgrade to this release by updating your Cargo.toml:

gimli = "0.17.0"

The focus of this release has been on improving DWARF 5 support, and adding support for writing DWARF. The first user of the write support is rustc_codegen_cranelift.

Changelog

Breaking changes

Changed register values to a Register type instead of u8/u64.
#328
Replaced BaseAddresses::set_cfi with set_eh_frame_hdr and set_eh_frame.
Replaced BaseAddresses::set_data with set_got.
You should now use the same BaseAddresses value for parsing both
.eh_frame and .eh_frame_hdr.
#351
Renamed many types and functions related to .debug_line.
Renamed LineNumberProgram to LineProgram.
Renamed IncompleteLineNumberProgram to IncompleteLineProgram.
Renamed CompleteLineNumberProgram to CompleteLineProgram.
Renamed LineNumberProgramHeader to LineProgramHeader.
Renamed LineNumberRow to LineRow.
Renamed StateMachine to LineRows.
Renamed Opcode to LineInstruction.
Renamed OpcodesIter to LineInstructions.
Renamed LineNumberSequence to LineSequence.
#359
Added Offset type parameter to AttributeValue, LineProgram, IncompleteLineProgram, CompleteLineProgram, LineRows, LineInstruction, and FileEntry.
#324
Changed FileEntry::path_name, FileEntry::directory, and LineProgramHeader::directory to return an AttributeValue instead of a Reader.
#366
Renamed FileEntry::last_modification to FileEntry::timestamp and renamed FileEntry::length to FileEntry::size.
#366
Added an Encoding type. Changed many functions that previously accepted Format, version or address size parameters to accept an Encoding parameter instead. Notable changes are LocationLists::locations, RangeLists::ranges, and Expression::evaluation.
#364
Changed return type of LocationLists::new and RangeLists::new.
#370
Added parameters to LocationLists::locations and RangeLists::ranges to support .debug_addr.
#358
Added more AttributeValue variants: DebugAddrBase, DebugAddrIndex, DebugLocListsBase, DebugLocListsIndex, DebugRngListsBase, DebugRngListsIndex, DebugStrOffsetsBase, DebugStrOffsetsIndex, DebugLineStrRef.
#358
Changed AttributeValue::Data* attributes to native endian integers instead of byte arrays.
#365
Replaced EvaluationResult::TextBase with EvaluationResult::RequiresRelocatedAddress. The handling of TextBase was incorrect.
#335
Added EvaluationResult::IndexedAddress for operations that require an address from .debug_addr.
#358
Added Reader::read_slice. Added a default implementation of Reader::read_u8_array which uses this.
#358

Added

Added initial support for writing DWARF. This is targeted at supporting line number information only.
#340 #344 #346 #361 #362 #365 #368 #382
Added read and write Cargo features. Both are enabled by default.
#343
Added support for reading DWARF 5 .debug_line and .debug_line_str sections.
#366
Added support for reading DWARF 5 .debug_str_offsets sections, including parsing DW_FORM_strx* attributes.
#358
Added support for reading DWARF 5 .debug_addr sections, including parsing DW_FORM_addrx* attributes and evaluating DW_OP_addrx and DW_OP_constx operations.
#358
Added support for reading DWARF 5 indexed addresses and offsets in .debug_loclists and .debug_rnglists, including parsing DW_FORM_rnglistx and DW_FORM_loclistx attributes.
#358
Added high level Dwarf and Unit types. Existing code does not need to switch to using these types, but doing so will make DWARF 5 support simpler.
#352 #380 #381
Added EhFrame::set_address_size and DebugFrame::set_address_size methods to allow parsing non-native CFI sections. The default address size is still the native size.
#325
Added architecture specific definitions for Register values and names. Changed dwarfdump to print them.
#328
Added support for reading relocatable DWARF sections.
#337
Added parsing of DW_FORM_data16.
#366

Fixed

Fixed parsing DWARF 5 ranges with start == end == 0.
#323
Changed LineRows to be covariant in its Reader type parameter.
#324
Fixed handling of empty units in dwarfdump.
#330
Fixed UnitHeader::length_including_self for Dwarf64.
#342
Fixed parsing of DW_CFA_set_loc.
#355
Fixed handling of multiple headers in .debug_loclists and .debug_rnglists.
#370
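
Some background on why a "length including self" calculation depends on the format: in the DWARF standard, a 32-bit unit starts with a 4-byte length, while a 64-bit unit starts with the escape value 0xffffffff followed by an 8-byte length, 12 bytes in total. The sketch below illustrates that arithmetic only. It assumes little-endian input, the names are hypothetical, and it says nothing about what the original bug in gimli was.

```rust
/// DWARF offset format (names chosen for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Format {
    Dwarf32,
    Dwarf64,
}

/// Parse the initial length field at the start of a unit.
/// Assumes little-endian data. Returns the format and the unit length,
/// which does not count the initial length field itself.
pub fn parse_initial_length(bytes: &[u8]) -> Option<(Format, u64)> {
    let mut first = [0u8; 4];
    first.copy_from_slice(bytes.get(..4)?);
    match u32::from_le_bytes(first) {
        // Escape value: the real length follows as 8 bytes.
        0xffff_ffff => {
            let mut wide = [0u8; 8];
            wide.copy_from_slice(bytes.get(4..12)?);
            Some((Format::Dwarf64, u64::from_le_bytes(wide)))
        }
        // Values reserved by the standard.
        0xffff_fff0..=0xffff_fffe => None,
        n => Some((Format::Dwarf32, u64::from(n))),
    }
}

/// Total size of the unit, counting the initial length field.
pub fn length_including_self(format: Format, unit_length: u64) -> u64 {
    let initial_length_size = match format {
        Format::Dwarf32 => 4,
        Format::Dwarf64 => 12,
    };
    unit_length + initial_length_size
}
```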

Friends

Thanks to everyone who contributed to this release!

Andre Bogus
Armin Ronacher
bjorn3
data-pup
est31
Gerd Zellweger
Igor Gnatenko
Nick Fitzgerald
Philip Craig
Robert O'Callahan
roblabla
Squirrel

Contributing

Want to join us? Check out our CONTRIBUTING.md and take a look at some of these issues:

Issues labeled "easy"

philipc · April 25, 2019, 1:17am

Announcing `gimli` 0.18.0

gimli is a blazing fast library for consuming the DWARF debugging format.

Upgrade to this release by updating your Cargo.toml:

gimli = "0.18.0"

The focus of this release has been on improving support for reading CFI,
and adding support for writing CFI.

Changelog

Breaking changes

For types which have an Offset type parameter, the default Offset has changed from usize to R::Offset. #392
Added an Offset type parameter to the read::Unit type to allow variance. #393
Changed the UninitializedUnwindContext::initialize method to borrow self, and return &mut UnwindContext. Deleted the InitializedUnwindContext type. #395
Deleted the UnwindSection type parameters from the CommonInformationEntry, FrameDescriptionEntry, UninitializedUnwindContext, UnwindContext, and UnwindTable types. #399
Changed the signature of the get_cie callback parameter for various functions. The signature now matches the UnwindSection::cie_from_offset method, so that method can be used as the parameter. #400
Reduced the number of lifetime parameters for the UnwindTable type. #400
Updated fallible-iterator to version 0.2.0. #407
Added a parameter to the Error::UnexpectedEof enum variant. #408

Added

Update to 2018 edition. #391
Added the FrameDescriptionEntry::unwind_info_for_address method. #396
Added the FrameDescriptionEntry::rows method. #396
Added the EhHdrTable::unwind_info_for_address method. #400
Added the EhHdrTable::fde_for_address method and deprecated the EhHdrTable::lookup_and_parse method. #400
Added the EhHdrTable::pointer_to_offset method. #400
Added the UnwindSection::fde_for_address method. #396
Added the UnwindSection::fde_from_offset method. #400
Added the UnwindSection::partial_fde_from_offset method. #400
Added the Section::id method. #406
Added the Dwarf::load method, and corresponding methods for individual sections. #406
Added the Dwarf::borrow method, and corresponding methods for individual sections. #406
Added the Dwarf::format_error method. #408
Added the Dwarf::die_ranges method. #417
Added the Dwarf::unit_ranges method. #417
Added support for writing .debug_frame and .eh_frame sections. #412 #419

Fixed

The code_alignment_factor is now used when evaluating CFI instructions that advance the location. #401
Fixed parsing of pointers encoded with DW_EH_PE_funcrel. #402
Use the FDE address encoding from the augmentation when parsing DW_CFA_set_loc. #403
Fixed setting of .eh_frame base addresses in dwarfdump. #410
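
For context on the code_alignment_factor fix: the operand of an advance instruction is a factored delta, so it has to be multiplied by the CIE's code alignment factor before being added to the location. A minimal sketch of that arithmetic follows, with hypothetical function names; it is not gimli's code.

```rust
/// Extract the factored delta from a DW_CFA_advance_loc opcode byte.
/// The instruction is identified by the high two bits being 0b01, and
/// the delta lives in the low six bits.
pub fn decode_advance_loc(opcode: u8) -> Option<u64> {
    if opcode & 0xc0 == 0x40 {
        Some(u64::from(opcode & 0x3f))
    } else {
        None
    }
}

/// New location after an advance: location + delta * code_alignment_factor.
/// Returns `None` on arithmetic overflow.
pub fn advance_loc(
    location: u64,
    delta: u64,
    code_alignment_factor: u64,
) -> Option<u64> {
    delta
        .checked_mul(code_alignment_factor)
        .and_then(|scaled| location.checked_add(scaled))
}
```

With a factor of 1 the multiplication is invisible, so skipping it only shows up on targets whose CIEs use a larger factor.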

Friends

Thanks to everyone who contributed to this release!

bjorn3
data-pup
Nick Fitzgerald
Philip Craig
Yury Delendik

Contributing

Want to join us? Check out our CONTRIBUTING.md and take a look
at some of these issues:

Issues labeled "easy"

alice · January 25, 2021, 11:07am

A post was split to a new topic: How do I run the simple.rs gimli example?
