pjrc.com/teensy/ says it uses a Cortex-M4F.
https://web.eecs.umich.edu/~prabal/teaching/eecs373-f10/readings/ARMv7-M_ARM.pdf is the architecture manual; the section on the Memory Model Feature registers might be helpful.
Another thing you could try is to run a simple C program like this one https://godbolt.org/z/A3_GQk and see whether the atomics implementation, either in Rust or in hardware, is truly faulty, or whether something else is happening. Tip: also disassemble the binary you actually produced and check that the instructions show up as expected.
Something else I noticed taking a quick glance at the repo is the use of Relaxed ordering, which more or less says: yeah, each read and write is atomic and will be visible at some point, might be 2 seconds from now. Beyond that it gives you no guarantees about how those accesses are ordered relative to any other memory accesses. An experienced developer I know once put it this way: if you open a PR with non-sequentially-consistent atomics in it, he'll demand a formal proof that your algorithm is correct. While that might seem a little excessive, using non-default memory orders correctly is ludicrously tricky, and many have tried and gotten it wrong. So maybe the synchronization hoped for by using atomics here is voided by the Relaxed memory order. Try SeqCst and see if that helps.
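
To make the Relaxed vs. SeqCst point concrete, here is a minimal sketch of the usual "write data, then raise a flag" pattern. It is not taken from the repo: `publish_and_read`, `data` and `ready` are made-up names, and it runs on a desktop with std threads rather than on the Teensy. The flag uses SeqCst, so a reader that sees `true` is also guaranteed to see the data written before it. If both flag accesses were Relaxed, the memory model would allow the reader to see the flag set and still read the old value.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: a writer thread publishes `value` behind a flag,
/// and the calling thread returns what it read after seeing the flag.
fn publish_and_read(value: u32) -> u32 {
    let data = Arc::new(AtomicU32::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let (d, r) = (Arc::clone(&data), Arc::clone(&ready));
    let writer = thread::spawn(move || {
        // The payload itself can be Relaxed; the flag provides the ordering.
        d.store(value, Ordering::Relaxed);
        // SeqCst store: writes made before it become visible to any thread
        // that observes `true` through a SeqCst (or Acquire) load.
        r.store(true, Ordering::SeqCst);
    });

    // Spin until the writer has raised the flag.
    while !ready.load(Ordering::SeqCst) {
        std::hint::spin_loop();
    }
    // Ordered after the flag load, so this sees the published value.
    let seen = data.load(Ordering::Relaxed);

    writer.join().unwrap();
    seen
}

fn main() {
    println!("{}", publish_and_read(42));
}
```

If switching the flag accesses to SeqCst makes the problem go away, the ordering was the likely culprit; if not, the disassembly is the next place to look.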