High CPU usage on Linux ARM (Debian)

#1

Hi! This is my first time on this forum and I’m very new to Rust.

I built an ARMv7 single-board computer (SBC) based on TI’s AM3358 (1 GHz, 512 MB RAM), compatible with BeagleBoard Linux images. I’ve been testing the board for some time now and it’s been working pretty well so far, so I decided to give Rust a shot on it.

I’m trying to create a simple application to play audio files (WAV, MP3, OGG, FLAC), and I’m exploring the Rodio library (https://github.com/tomaka/rodio), which seems up to the task and works very well on my desktop PC.

My SBC uses the latest BBB Linux IoT image (Debian 9.5 2018-10-07 4GB SD IoT) and I installed Rust on it. The installer program downloaded a few things (rustup, cargo, etc.) and installed armv7-unknown-linux-gnueabihf toolchain. No problems with the installation of Rust and its tools.

The next step was to download and build the Rodio library and its examples with:
cargo build --examples

The build took a very long time (more than 45 minutes), where it usually takes a few seconds on my desktop PC. Compilation used a steady ~98% (!!! :open_mouth:) of CPU time and only about 25% of total RAM.

After the build, all Rodio examples had their generated executables as expected, but when I tried to run them, they ran very slowly and again with high (~98%) CPU usage. Sound output was very laggy, totally unusable.

I then decided to try the “release” versions of the same executables, so I ran:
cargo build --examples --release

The release executables also took a very long time to build. But this time they performed normally, with just about 5% CPU usage and normal sound output.

So, to summarize, almost every “debug” program that I run on my SBC runs very slowly, hogging the entire CPU.

What could be causing this? Can I do something to improve it? And also, why does compilation take so much time on a seemingly performant CPU?

Thank you for any help!

Andy

EDIT: Kernel is “Linux beaglebone 4.14.71-ti-r80 #1 SMP PREEMPT Fri Oct 5 23:50:11 UTC 2018 armv7l GNU/Linux”

#2

AFAIK, debug builds lead to poorly performing executables everywhere, because idiomatic Rust code features plenty of assertions that are either optimized out or disabled in release builds. This regularly leads people to say that lightly optimized builds with debug assertions (i.e. the equivalent of -O1 -g -DDEBUG in C land) might be a better default compromise than the current “not optimized at all” approach.

I’m not sure how to go about optimizing compile times though. Maybe someone else knows tools for doing this? Would be nice to narrow down which specific crates take a long time to build (if the slowness is localized), and then narrow down why they take so much time…

#3

Hey HadrienG, thank you for your quick reply.

As far as I can tell, all crates take a long time to compile. I’ve been unable to pinpoint any particular one that takes more time. It’s very annoying.

But with “debug” executables using almost the entire CPU, it’s very hard to work with them, let alone actually debug them if needed.

#4

You can tune the “debug” build configuration in Cargo.toml. I would suggest tuning the profile.dev profile (used by debug builds), for example by increasing its opt-level a bit. I think you may be able to get a better performance/debuggability compromise for your use case this way.
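
A minimal sketch of what that tuning could look like. The opt-level values are illustrative starting points to experiment with, not measured recommendations, and the per-dependency override table assumes a Cargo release that supports profile overrides (1.41 or later), which is newer than the toolchains that were current when this was posted.

```toml
# Cargo.toml (sketch; the opt-level values are assumptions to experiment with)

[profile.dev]
# Light optimization for your own crate. Debug info, debug assertions
# and overflow checks stay enabled, as they are by default in dev builds.
opt-level = 1

# Optional, assumes Cargo 1.41+: fully optimize dependencies only
# (for example the audio decoders pulled in by Rodio), while your own
# code keeps the lighter setting above.
[profile.dev.package."*"]
opt-level = 3
```

With this in place, a plain cargo build --examples uses the tuned dev profile; no extra flags are needed.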

EDIT: In case nobody has good ideas for investigating / speeding-up the compile times, maybe another thing you can try is to cross-compile executables for your BeagleBoard on your desktop PC and then send them to the board? I’ve never done that myself, but rustup target sounds like a promising starting point and this repo’s README seems interesting too.

#5

+1, it is really strange to build software for an IoT device on the IoT device itself.
Use a normal PC and cross-compile.

cargo build --target=armv7-unknown-linux-gnueabihf does the trick,
and you only need to specify the linker in .cargo/config
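
As a rough sketch, the linker entry could look like the following. The linker name arm-linux-gnueabihf-gcc is an assumption (it is the name of the GCC cross-compiler as packaged by Debian and Ubuntu); substitute whatever cross toolchain is installed on the build machine. The target’s standard library also has to be installed first, for instance with rustup target add armv7-unknown-linux-gnueabihf.

```toml
# .cargo/config (in the project directory or in your home directory)

[target.armv7-unknown-linux-gnueabihf]
# Assumed cross-linker name; adjust it to your installed toolchain.
linker = "arm-linux-gnueabihf-gcc"
```

The resulting binaries end up under target/armv7-unknown-linux-gnueabihf/ and can then be copied to the board.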

#6

Thank you Dushistov!

Indeed, I usually cross-compile too. I was building directly on the board just for the sake of testing, but the slowness of the build and CPU utilization really surprised me.

Anyway, that doesn’t solve the other problem of “debug” executables being ULTRA SLOW and hogging my CPU time (almost unusable). Even if I cross-compile, what if I need to debug the application on the target?

#7

Set this in your Cargo.toml:

[profile.release]
debug=true

This will include debug symbols in your release build. It will be…somewhat confusing to debug because of Rust’s aggressive optimizations. Basically, don’t read too much into the line numbers it gives you, and consider plopping things into statics if you need to see them in the debugger, because access to locals in a release build is flaky.

This is a trade-off in all languages; I got really used to debugging optimized C++ on ARM because our app wouldn’t fit in memory at -O0.

You might also try the suggestion above, increasing the optimization level for debug builds just a little. Release defaults to (IIRC) 3; try 1.

#8

Thank you cbiffle! Will try that and see how it goes :slight_smile:

#9

It seems that this chip really is much slower than even older-gen Intel hardware, so maybe your builds are actually running as fast on it as they should…

#10

I assume all these ARM-based SBCs are not comparable to desktop Intel architectures, but things running so slowly just surprised me, especially with Rust being an “embedded friendly” language.

#11

Ah, sorry, I didn’t mean to suggest that executables running slow is normal. Although, of course, they will run slower than on your desktop, release Rust executables should be comparable to optimized C executables running on the same chip.

What I meant to say is that the differential in performance between a modern Intel chip and this SoC is huge, and thus it makes more sense to me that compiling things on the device will be so much slower, as you observed. It may not be rustc running abnormally slow on ARMv7.

#12

Oh, I know what you mean now and I totally agree :+1:

“release Rust executables should be comparable to optimized C executables running on the same chip”

Maybe I was too naive, but for a moment I thought that Rust debug executables would be as fast as debug (-O0) executables in C too. Now, with all the assertions and other things that you mentioned, the difference makes more sense to me.

Thank you!

#13

Trying to build on something like a BeagleBone will definitely give you headaches. You’re working with a single-core RISC processor with limited memory, running much slower than a PC. Your laptop can probably run 4-8x more parallel jobs with 4x more memory per job, at 3x the clock rate, while doing more per cycle and reading from cached drives. So that’s easily 100x faster.

Clearly cross-compiling is the way to go here. But, sadly, you might come across some crates that don’t cross-compile too easily, if at all.

Consider getting a more capable ARMv7 board to build on, and then copy the executables onto your board. I have a couple of NVidia TK1 boards that I use to build natively for BeagleBones. (I use Toradex boards, but the Jetson-TK1 is similar.) They’re quad-core ARMv7 running at 2GHz w/ 2GB RAM. Generally they’re about 4-5x slower than a good laptop, but they build Rust apps in a more reasonable amount of time than a BBB.
