Svd2rust generates an enormous crate

I'm trying to bootstrap an embedded Rust toolchain for the Teensy 4.0 board. It uses the NXP MiMXRT1062 CPU. The crate generated by svd2rust from the linked SVD is huge.

932 KLOC.
After splitting with form, it's 3948 files.
On my Ryzen box, cargo build takes 7-8 minutes (with both "dev" and "release" profiles).
With the "dev" profile, the rustc process that builds this crate grows to more than 16GB RSS / 20GB VSIZE, and the target dir afterwards takes 4GB. The target dir is even larger when the library is split with the form utility (so for now I've just run rustfmt on the generated single lib.rs file).
(Interesting: with "release" profile, it's 160MB RSS and 228MB target dir).

sccache somehow doesn't help (when I run cargo build ; cargo clean ; cargo build on a local crate that uses the peripheral crate, both cargo build commands take minutes to complete). Incremental builds without cleaning the target dir don't help much either.

Any ideas how to work with that?

I thought about splitting the thing into multiple crates or using features to avoid compiling code for unused peripherals. Both approaches require new features in svd2rust or form, but I can try that. Any other ideas/pointers?

Thanks,
-- M

Link to generated crate: https://gitlab.com/teensy-rs/teensy-4/tree/master/mimxrt1062

Update: features seem to help (but they're still a workaround: it should be possible to build code able to use all possible peripherals).

As a proof of concept, I split the library with form and added a feature to enable (i.e. disabled by default) the biggest peripherals, judging by du -sh src/* | sort -h. A build with the dev profile is now down to less than 6GB RAM, and takes a bit over 2 minutes fresh, 37s for a rebuild. Still not great, but not terrible anymore.

If this approach holds at least for the "blinky" program, I'll try to figure out how to properly hide peripherals behind crate features, and generate the features in svd2rust. This is a workaround, not a solution, but it seems to work.
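
A minimal sketch of what hiding a peripheral behind a crate feature could look like. The module names, the `enet` feature name, and the base addresses are placeholders for illustration, not taken from the generated crate; the feature itself would need a matching `enet = []` entry under `[features]` in Cargo.toml.

```rust
// Always compiled: a small peripheral that every program is assumed to need.
pub mod gpio1 {
    // Placeholder address, for illustration only.
    pub const BASE_ADDRESS: usize = 0x4000_0000;
}

// Only compiled when the (hypothetical) `enet` feature is enabled,
// so a default build never type-checks this module at all.
#[cfg(feature = "enet")]
pub mod enet {
    // Placeholder address, for illustration only.
    pub const BASE_ADDRESS: usize = 0x4001_0000;
}

/// Lists the peripheral modules that were compiled into this build.
pub fn enabled_peripherals() -> Vec<&'static str> {
    let mut names = vec!["gpio1"];
    if cfg!(feature = "enet") {
        names.push("enet");
    }
    names
}
```

With this layout, `cargo build` skips the gated module entirely and `cargo build --features enet` brings it back.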

Isn't it very desirable to only include the parts of the library you actually need?

that svd file (10MB) is definitely on the top end of what the svd2rust methodology can handle

looking at the file there seems to be quite a lot of repetition; maybe some things could be converted to arrays, clusters, or at the least use derivedFrom

You're right, it is desirable to only include the parts of the library I actually need. This would be a job for the linker, though, not for the library's user. And the linker does that quite well, judging by the objdump of embedded code generated for simple platforms.

If I understand correctly: this would mean editing the input SVD file, right? I will look into that.

It also seems quite straightforward to modify svd2rust to generate a feature for each peripheral. This would also let board support crates select only peripherals that are actually used on a board (at least Teensy boards don't break out all the pins and leave some features unused).
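
As a rough sketch of the idea (this is not svd2rust's actual code, and the function names are made up for illustration), generating one feature per peripheral comes down to emitting two pieces of text from the list of peripheral names: a `[features]` table for Cargo.toml and feature-gated module declarations for lib.rs.

```rust
/// Renders a `[features]` table with one empty feature per peripheral.
/// Nothing is enabled by default, so users opt in to what they need.
pub fn render_features(peripherals: &[&str]) -> String {
    let mut out = String::from("[features]\ndefault = []\n");
    for name in peripherals {
        out.push_str(&format!("{} = []\n", name.to_lowercase()));
    }
    out
}

/// Renders feature-gated `pub mod` declarations, one per peripheral.
pub fn render_modules(peripherals: &[&str]) -> String {
    let mut out = String::new();
    for name in peripherals {
        let module = name.to_lowercase();
        out.push_str(&format!(
            "#[cfg(feature = \"{0}\")]\npub mod {0};\n",
            module
        ));
    }
    out
}
```

A board support crate could then depend on the peripheral crate with only the features for peripherals that are actually broken out on the board.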

Still, conditional compilation is only a workaround. It should be possible and practical to enable all peripherals, or at least generate docs for them. Maybe svd2rust could generate smarter code, or maybe splitting the generated code into multiple crates would help?

This was discussed and worked on before:

Also:

yes, it would be more involved, though it might be worth it: in my experience an SVD with little duplication not only makes for smaller generated code, it also makes the generated code more pleasant to use, e.g. being able to index into arrays instead of using registers reg23 reg24 reg25, using the same enum for the same kind of value, small things like that
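
To illustrate the array point, here is a small sketch comparing the two shapes of register block. The struct and field names are invented, and `Cell<u32>` stands in for the volatile register type a real PAC would use, so this runs on a desktop rather than touching hardware.

```rust
use std::cell::Cell;

/// What a duplicated SVD tends to produce: one field per register.
#[repr(C)]
pub struct FlatBlock {
    pub reg23: Cell<u32>,
    pub reg24: Cell<u32>,
    pub reg25: Cell<u32>,
}

/// What an SVD using a register array could produce instead.
#[repr(C)]
pub struct ArrayBlock {
    pub reg: [Cell<u32>; 3],
}

/// Clearing the flat block means naming every register by hand.
pub fn clear_flat(block: &FlatBlock) {
    block.reg23.set(0);
    block.reg24.set(0);
    block.reg25.set(0);
}

/// Clearing the array block is a loop, and stays the same for any length.
pub fn clear_array(block: &ArrayBlock) {
    for reg in block.reg.iter() {
        reg.set(0);
    }
}
```

Both structs have the same memory layout, so the array form costs nothing at runtime; it only changes how much code has to be generated and written.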

the features split sounds good too, it's quite rare for embedded software to use all peripherals, and it seems the C non-precompiled header approach has a slight advantage over rust's module system here, in that the definitions for peripherals that aren't used are automatically not compiled

Are you guys saying that zero cost abstraction can sometimes cost you more than it's worth?

:slight_smile:

Is there a tool for "cleaning" up SVD files before passing them to svd2rust?

Haven't found one, trying to write something in Python, but it will probably be something that's specific to Teensy 4 (or at least to this specific SVD file): https://gitlab.com/teensy-rs/teensy-4/blob/feature/make-mimxrt1062-thinner/mimxrt1062/process-svd.py

So far I managed to add derivedFrom to duplicate enumeratedValues, which seemed to be low-hanging fruit, but the generated code is the same. Trying to go deeper, but if there's any SVD-editing tool or library, I'm interested :)
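
For reference, the deduplication step can be sketched roughly like this. This is not the linked Python script and does not use the svd-parser types; `EnumeratedValues` here is a simplified stand-in, and `name` stands for whatever path a real derivedFrom attribute would have to reference.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an SVD enumeratedValues element.
#[derive(Debug, Clone, PartialEq)]
pub struct EnumeratedValues {
    pub name: String,
    /// (variant name, numeric value) pairs.
    pub values: Vec<(String, u32)>,
    /// Name of the identical enum this one is derived from, if any.
    pub derived_from: Option<String>,
}

/// Points every repeated enum at its first identical occurrence.
/// Returns how many enums were turned into derived ones.
pub fn dedup_enumerated_values(enums: &mut [EnumeratedValues]) -> usize {
    // Maps a value list to the name of the first enum that used it.
    let mut seen: HashMap<Vec<(String, u32)>, String> = HashMap::new();
    let mut replaced = 0;
    for e in enums.iter_mut() {
        if e.derived_from.is_some() {
            continue; // already derived, nothing to do
        }
        if let Some(canonical) = seen.get(&e.values).cloned() {
            e.derived_from = Some(canonical);
            e.values.clear(); // the values now live only in the original
            replaced += 1;
        } else {
            seen.insert(e.values.clone(), e.name.clone());
        }
    }
    replaced
}
```

Only exact matches are merged here (same variant names, same values, same order); anything fuzzier would need per-chip judgment.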

Another approach I'm considering is not focusing on the SVD XML itself, but parsing the SVD with the svd-parser crate, processing the resulting structure in Rust, and feeding it to svd2rust, which I can use as a library. It's easier to prototype the logic in Python first, though, so I'll start with that and maybe rewrite later.

I also went through the Teensy schematics & MCU pinout, and while there aren't many peripherals I can just disable with a feature, many of them are only partially broken out (e.g. just some GPIO or FlexIO pins are broken out – but all of the GPIO/FlexIO registers have some pins available to the end user). I'm considering writing a script that will remove unavailable pins from the SVD to generate a smaller peripheral access crate specifically for Teensy – this way I can trim down the PAC without writing an SVD optimizer, but the cost will be a board-specific PAC.

If you don't find an equivalent issue, have you thought about reporting an issue to rust-lang about this? Not that it is a functionality bug, but someone might find it interesting to use as a test case for slow compilation. Performance improvements often start with having something to measure.

I'd be curious just about what the -Ztime-passes output of this is and how it interacts with multiple codegen units and incremental compilation.

I haven't read deeply into the code, but I'm spotting one thing, that's the #[inline(always)] attributes. Unless those have already been shown to improve compile time, I'd like to try without them.

If I'm reading the crate correctly, the #[inline(always)] functions are not called from within the crate itself. The compiler only needs to codegen these when they are used, so as long as they are never used, shouldn't they improve compile-times?

Here it is: https://gitlab.com/snippets/1895030

What stands out:

  time: 175.061 coherence checking
  time: 50.221  wf checking
  time: 29.619  MIR borrow checking
  time: 0.777   item-types checking
  time: 161.925 item-bodies checking
[…]
        time: 19.001    encode dep-graph
      time: 30.181      persist dep-graph
    time: 32.263        serialize dep graph
  time: 34.189  codegen

As far as I understand, the inlining is needed to keep the abstraction zero-cost on embedded targets: it avoids call overhead for a register access that is essentially a write to a memory address. The whole value of this approach is that the compiler checks the validity of the constructs, while the generated code stays close to straightforward assembly or C written by a wizard who knows the MCU reference manual by heart and knows that this particular sequence of writes is legal.
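
A small sketch of why the attribute is there. This is not the code svd2rust generates; `Reg` is a made-up wrapper, but it shows the pattern: each accessor is a thin function around a single volatile load or store, and `#[inline(always)]` asks the compiler to remove the call even in unoptimized builds.

```rust
use core::ptr;

/// Hypothetical wrapper around one 32-bit memory-mapped register.
pub struct Reg {
    addr: *mut u32,
}

impl Reg {
    /// # Safety
    /// `addr` must be valid and aligned for volatile 32-bit access
    /// for as long as the returned value is used.
    pub unsafe fn new(addr: *mut u32) -> Self {
        Reg { addr }
    }

    /// Once inlined, this is a single volatile store.
    #[inline(always)]
    pub fn write(&self, value: u32) {
        unsafe { ptr::write_volatile(self.addr, value) }
    }

    /// Once inlined, this is a single volatile load.
    #[inline(always)]
    pub fn read(&self) -> u32 {
        unsafe { ptr::read_volatile(self.addr) }
    }

    /// Read-modify-write; the closure is inlined along with the accessors.
    #[inline(always)]
    pub fn modify(&self, f: impl FnOnce(u32) -> u32) {
        self.write(f(self.read()))
    }
}
```

On hardware the address would come from the SVD; for experimenting on a desktop it can point at an ordinary `u32` variable instead.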

For the record: it looks like a lot of the memory usage comes from incremental compilation. Setting CARGO_INCREMENTAL=0 shortens the build by about 1 minute, which is not much, but memory usage drops from >16GB to ~3.7GB with the dev profile.

Good point, that makes sense.

i don't think so ! making well-structured SVDs is manual work right now, maybe some work could be saved by sharing common peripherals between SoCs (where possible) instead of repeating work for dozens of chip variations

I believe that's what stm32-rs does (for STM32 chips), although I'm not familiar with all the details.

I think it would be a worthy project to extract stm32-rs's infrastructure into a separate project and make it usable for other chip families. I'm involved in lpc-rs and nrf-rs, and over there we're working like cavemen, with raw svd2rust and shell scripts :slight_smile:
