Mutex and multicore

I'm writing an embedded program on my Pico (the original one) to turn on/off LEDs, read buttons, move an actuator to distinct positions, read fingerprints and write to the flash.

I'm using EVERY pin on the Pico (except two, which I'll get to eventually :slight_smile:), most of them as GPIO, but also UARTs (one for the debug probe and two - TX/RX - for the fingerprint scanner) and I2C.

The test program I wrote to test the control of the actuator works flawlessly. BUT, that program only uses the three ADC pins (ADC2, ADC_VREF and AGND), nothing else.

Moving over to the main program, it resets (as in, not a crash or hang which have to be dealt with by the Watchdog, and that I can see in the logs - just "boom" starts up again) as soon as I try to do anything (press a button for example). My guess is, that "something" is interfering with "something" else.

The worst thing, it's keep reseting a few times in a row, before settling down. Before it does it all again when I press (any) button.

These all run as individual tasks. One for the actuator controller, four button readers, four LED controllers, two for the hopefully-one-day CAN reader/writer - isn't actually doing anything yet other than out some info on the serial - then one for the Watchdog. All in, 12 task, that communicate through embassy_sync::channel::{Channel, Receiver}. Seems to work just fine.

However, I figured I should use the second core, it's getting a bit .. "sluggish" running all that on only one core.

That's when all hell broke loose, so to speak. And the error might indicate my theory that there's some interference somewhere:

error[E0382]: borrow of moved value: `flash`
   --> src/drive-by-wire.rs:235:9
    |
122 |     let flash = FLASH.init(Mutex::new(flash));
    |         ----- move occurs because `flash` has type `&mut Mutex<NoopRawMutex, Flash<'_, FLASH, Async, 2097152>>`, which does not implement the `Copy` trait
...
159 |         move || {
    |         ------- value moved into closure here
...
164 |                     flash,
    |                     ----- variable moved due to use in closure
...
235 |         flash,
    |         ^^^^^ value borrowed here after move
    |
note: consider changing this parameter type in function `actuator_control` to borrow instead if owning the value isn't necessary
   --> src/lib_actuator.rs:21:12
    |
 19 | pub async fn actuator_control(
    |              ---------------- in this function
 20 |     receiver: Receiver<'static, CriticalSectionRawMutex, Button, 64>,
 21 |     flash: &'static FlashMutex,
    |            ^^^^^^^^^^^^^^^^^^^ this parameter takes ownership of the value
    = note: the full name for the type has been written to '/Users/turbo/src/Mercedes SLK Drive Selector/drive-by-wire/code/target/thumbv6m-none-eabi/debug/deps/drive_by_wire-0a9890b4c527948e.long-type-11434618967207936082.txt'
    = note: consider using `--verbose` to print the full type name to the console

For more information about this error, try `rustc --explain E0382`.
error: could not compile `drive-by-wire` (bin "drive-by-wire") due to 1 previous error

I do wonder if the compiler didn't see some referencing/borrowing/owning/whatever (a concept I don't fully understand how it works in practice). I (think!) I understand it on a very theoretical and .. ideological POV, but not how it works in practice in Rust (with statics, mut's etc etc).

The main() code - src/drive-by-wire.rs:

use embassy_rp::{
    [...]
    flash::{Async as FlashAsync, Flash}
    [...]
};

#[embassy_executor::main]
async fn main(spawner: Spawner) {
    [...]
    let flash = Flash::<_, FlashAsync, FLASH_SIZE>::new(p.FLASH, p.DMA_CH3);
    static FLASH: StaticCell<FlashMutex> = StaticCell::new();
    let flash = FLASH.init(Mutex::new(flash));

Few lines later, before trying multicore:

    spawner.spawn(unwrap!(actuator_control(CHANNEL_ACTUATOR.receiver(), flash, actuator)));

Replaced with:

    spawn_core1(
        p.CORE1,
        unsafe { &mut *core::ptr::addr_of_mut!(CORE1_STACK) },
        move || {
            let executor0 = EXECUTOR0.init(Executor::new());
            executor0.run(|spawner| {
                spawner.spawn(unwrap!(actuator_control(
                    CHANNEL_ACTUATOR.receiver(),
                    flash,
                    actuator
                )))
            });
        },
    );

Later, I need to spawn the button readers:

    spawner.spawn(unwrap!(read_button(
        spawner,
        flash,
        fp_scanner,
        Button::P,
        p.PIN_2.into(),
        p.PIN_14.into(),
    ))); // button/P

Four of those, all called the same way, but with different pins etc. Same flash variable though!

The actuator_control() function - src/lib_actuator.rs:

pub async fn actuator_control(
    receiver: Receiver<'static, CriticalSectionRawMutex, Button, 64>,
    flash: &'static FlashMutex,
    mut actuator: Actuator<'static>,
) {
    [...]
}

The FlashMutex and FLASH_SIZE - src/lib_config.rs:

pub const FLASH_SIZE: usize = 2 * 1024 * 1024;
pub type FlashMutex = Mutex<NoopRawMutex, Flash<'static, FLASH, Async, FLASH_SIZE>>;

Sorry if I have included to much information (or, more likely, not enough! :slight_smile: ), or haven't explained the problem thorough, but that's probably because I have absolutely no idea where even begin to look for a solution :slight_smile:.

I have no idea what I'm doing, most of the code was gathered from bits and pieces on the 'Net, cobbled together and I somehow (very mysteriously!! :slight_smile: ) got it working.

Or rather, they all work independently, in separate programs! I have one for testing the actuator, for moving it forward or backward, writing to the flash, etc etc.

I can't post more than five links, so have a look at my code repo.

They all work just fine. So I'm not completely useless, and I do understand what I have done in those (mostly :slight_smile:). However, when I combine the code into "The Big Kahuna", that's when it all fails, and I'm not experienced enough with Rust, or embedded programming, to figure it out on my own.

Because the code works independently, I'm fairly certain that the code is (mostly) sound. But there must be an interference somewhere. Maybe I'm trying to do two (or more) things on the same "thing" (whatever that is).

It doesn't resets at the exact same place every time, but it's either when the actuator is moving, or when it's writing to the flash. It's not impossible that it's crashing when the actuator is moving and reading from the flash (it just resets before defmt have time to receive it - I do often get a (HOST) malformed frame skipped just before the "Starting" (which happens extremely early, one of the very first lines in main()). Or any combination of those.

Which is why I think it might have something to do with the flash, and how I'm using it.

I would very much appreciate some assist on this, if nothing else so recommendation and points to documentation.

Forgot to mention, when I disable all flash code, it ("The Big Kahuna") works just fine! Even in multicore.

No idea on your "resets" but the compile error looks like you just need to make it a shared reference.

let flash: &'static FlashMutex  = FLASH.init(Mutex::new(flash));

Thanx, but unfortunately that lead to something that looks even worse :slight_smile:.

error[E0277]: `*mut ()` cannot be shared between threads safely
   --> src/drive-by-wire.rs:170:9
    |
167 |       spawn_core1(
    |       ----------- required by a bound introduced by this call
...
170 | /         move || {
171 | |             let executor0 = EXECUTOR0.init(Executor::new());
172 | |             executor0.run(|spawner| {
173 | |                 spawner.spawn(unwrap!(actuator_control(
...   |
178 | |             });
179 | |         },
    | |_________^ `*mut ()` cannot be shared between threads safely
    |
    = help: within `NoopRawMutex`, the trait `Sync` is not implemented for `*mut ()`
note: required because it appears within the type `PhantomData<*mut ()>`
   --> /Users/turbo/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/marker.rs:822:12
    |
822 | pub struct PhantomData<T: PointeeSized>;
    |            ^^^^^^^^^^^
note: required because it appears within the type `NoopRawMutex`
   --> /Users/turbo/.cargo/git/checkouts/embassy-c08a80187403f815/ef01e2e/embassy-sync/src/blocking_mutex/raw.rs:70:12
    |
 70 | pub struct NoopRawMutex {
    |            ^^^^^^^^^^^^
    = note: required for `Mutex<NoopRawMutex, Flash<'static, FLASH, Async, 2097152>>` to implement `Sync`
    = note: required for `&Mutex<NoopRawMutex, Flash<'static, FLASH, Async, 2097152>>` to implement `Send`
note: required because it's used within this closure
   --> src/drive-by-wire.rs:170:9
    |
170 |         move || {
    |         ^^^^^^^
note: required by a bound in `spawn_core1`
   --> /Users/turbo/.cargo/git/checkouts/embassy-c08a80187403f815/ef01e2e/embassy-rp/src/multicore.rs:163:33
    |
161 | pub fn spawn_core1<F, const SIZE: usize>(_core1: Peri<'static, CORE1>, stack: &'static mut Stack<SIZE>, entry: F)
    |        ----------- required by a bound in this function
162 | where
163 |     F: FnOnce() -> bad::Never + Send + 'static,
    |                                 ^^^^ required by this bound in `spawn_core1`
    = note: the full name for the type has been written to '/Users/turbo/src/Mercedes SLK Drive Selector/drive-by-wire/code/target/thumbv6m-none-eabi/debug/deps/drive_by_wire-0a9890b4c527948e.long-type-3000125839523162046.txt'
    = note: consider using `--verbose` to print the full type name to the console

For more information about this error, try `rustc --explain E0277`.
error: could not compile `drive-by-wire` (bin "drive-by-wire") due to 1 previous error

I'm not saying this dismissively, just being honest: Its going to be very hard to help you here because you have a few different problems that just come from not really understanding the borrow checker or "sync" (which basically pertains to what data can be sent to different threads). I would just back up and read the rust book if I were you, its going to be really frustrating (and also impossible) to keep trying to move forward if you don't understand these things well in the first place. Multithreaded programming is hard even when you do understand these things!

error[E0382]: borrow of moved value: `flash`
   --> src/drive-by-wire.rs:235:9
    |
122 |     let flash = FLASH.init(Mutex::new(flash));
    |         ----- move occurs because `flash` has type `&mut Mutex<NoopRawMutex, Flash<'_, FLASH, Async, 2097152>>`, which does not implement the `Copy` trait
...
159 |         move || {
    |         ------- value moved into closure here
...
164 |                     flash,
    |                     ----- variable moved due to use in closure
...
235 |         flash,
    |         ^^^^^ value borrowed here after move

flash is a mutable reference to some piece of data, thats what &mut means. You are moving it into a closure which means that closure now owns flash instead, so you cannot safely refer to it from outside the closure - you have "given it away" (moved).

You then change some stuff and instead get the *mut () cannot be shared between threads safely error - this is because you are trying to let two different threads have a mutable reference to the same data, this is pretty much immediately a race condition if either of them ever mutate the data.

There isn't just some quick solution to this though, you just need to understand better what you're trying to do. If you don't understand why two threads getting a *mut or &mut to the same thing isn't going to work then you aren't going to get things working realistically (and again I'm not saying this dismissively).

4 Likes

Read.. The... Book... !?? READ!??

Since when have anyone actually read [the] documentation!?? Do you KNOW how many millons of pages that consists of!?? :smiley: :smiley: .

Joking aside, yeah I do get your point, but "somehow" it got fixed "by its own".

Not sure WHAT I did exactly, but the problem must have existed like this for a LOOOONG time! Because I've had issues with stability ever since I got all things merged into one.

But that multi-core error, I think that really forced the issue! The compiler had been perfectly happy until then, but then it "just gave up" :slight_smile: .

Trying many different things must have "somehow", "somewhen", "somewhere" fixed it! Because it's rock solid now!

And yes, one of several attempts WAS to get that borrowing etc working, and I've been looking over both the official Rust doc, as well as other HOWTOs etc..

I DID (!) have a .. "rough understanding" of ownership and borrowing, but because the compiler was happy, I THOUGHT I had understood it correctly.. Which turned out to be wrong with that multi-core issue..

I THINK I have an even better understanding of it now, but only trial-and-error and learn-on-the-job is going really push that understanding to the limit :slight_smile: .

Just for the record, I would say that tasks like those should easily run on a single core as they don't seem to do computations and just io. It is a really bad idea to share FLASH between two cores and it will lead you to horrible problems you don't want to tackle probably. If the "sluggishness" was due to FLASH writes taking the time from other Io tasks, you may try move the FLASH writer to another core, but that FLASH would be owned, not shared.

Those resets should be easy to catch with a debugger, then look at the registers to see what happened exactly.

1 Like

Unfortunately, once you dig REALLY (really, really!! :slight_smile: ) deep, you'll find that the flash can not be on CORE1. At all!

I never managed to dig that deep, and I was apparently only scratching the surface when I did this, but in one of my other questions around here, someone pointed that out to me, and it was undeniable :frowning:.

I didn't go multi-core because of performance, but because I have so many I/O tasks to do. They're not super-heavy I/O, and absolutely not all the time.

But it WAS getting sluggish, which I assume was because to much I/O "scheduling conflicts" (they simply wasn't given enough time to do "their thing", before another task wanted to do I/O). So use the second core, seemed the best option.

And, as it turned out, a blessing in disguise! It revealed this problem, that's been around for over a year, but was "hidden" in plain sight. As in, the compiler didn't see it (!). And if "it" didn't see it, then how would I, with very little Rust experience, see it!?

Ok, I'm making excuses, even I can hear it, but I stand by that anyway :slight_smile:.

Well, then move that pin management to the second core and use flash from the first one? Btw if abstractions of the libraries you use let some crashes happen, it is 100% their bug and you should open an issue - code should either panic / return an error / not compile instead of spurious reset.

1 Like

Yeah, that's what I ended up doing.