Rust on ARMv7l with no NEON support

I'm trying to run Rust on an ARMv7l server provided by Scaleway (their C1 bare-metal ARM server with 4 cores).

However, when I try to run any of the Rust executables, the process gets killed with SIGILL (illegal instruction).

I suspect that this is because all the binaries available for Rust on armv7 (rustup-init, rust nightly, stable, etc.) are compiled with NEON support, but the CPU on the server does not feature NEON support.

root@cloudsearcher:~# file rustup-init 
rustup-init: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux 3.2.72, not stripped
Program received signal SIGILL, Illegal instruction.
0x7f887360 in std::rt::lang_start::h73c48c6a9af036ed ()
(gdb) x/8i 0x7f887360
=> 0x7f887360 <_ZN3std2rt10lang_start17h73c48c6a9af036edE+68>:	vmov.i32	q8, #0	; 0x00000000
   0x7f887364 <_ZN3std2rt10lang_start17h73c48c6a9af036edE+72>:	movw	r11, #7453	; 0x1d1d
   0x7f887368 <_ZN3std2rt10lang_start17h73c48c6a9af036edE+76>:	mov	r1, r0
   0x7f88736c <_ZN3std2rt10lang_start17h73c48c6a9af036edE+80>:	movt	r11, #7453	; 0x1d1d
   0x7f887370 <_ZN3std2rt10lang_start17h73c48c6a9af036edE+84>:	mov	r6, #0
   0x7f887374 <_ZN3std2rt10lang_start17h73c48c6a9af036edE+88>:	ldr	r4, [pc, #2864]	; 0x7f887eac <_ZN3std2rt10lang_start17h73c48c6a9af036edE+2960>
   0x7f887378 <_ZN3std2rt10lang_start17h73c48c6a9af036edE+92>:	vst1.64	{d16-d17}, [r1]!
   0x7f88737c <_ZN3std2rt10lang_start17h73c48c6a9af036edE+96>:	vst1.64	{d16-d17}, [r1]
root@cloudsearcher:~# cat /proc/cpuinfo | grep Features
Features	: half thumb fastmult vfp edsp thumbee vfpv3 tls idiva idivt vfpd32 lpae 
Features	: half thumb fastmult vfp edsp thumbee vfpv3 tls idiva idivt vfpd32 lpae 
Features	: half thumb fastmult vfp edsp thumbee vfpv3 tls idiva idivt vfpd32 lpae 
Features	: half thumb fastmult vfp edsp thumbee vfpv3 tls idiva idivt vfpd32 lpae 

So I'd like to ask: what's the easiest way of cross-compiling Rust without NEON support for the armv7l arch? And how would I go about compiling my projects without NEON?

Thanks in advance!

He he, as I explained in the Rust Web development hosting topic, I tried to do exactly the same :smiley:

The easiest way is to use the arm-unknown-linux-gnueabihf target instead. I guess the best way would be to have the std library for armv7 compiled without NEON, and to compile your code without NEON as well. In theory you can disable the NEON extension by setting RUSTFLAGS="-C target-feature=-neon" when invoking cargo, but I don't know if this works.

I wonder what is the advantage of NEON compared to previous floating-point implementations (such as VFPv3 that the ARM in Scaleway servers have) especially on Rust standard library? Would it make sense to provide an armv7 build without NEON support? (please :innocent:)

That won't quite work because the std crate has been compiled with +neon so it will contain NEON instructions that may end up in the final binary.

IMO, the armv7-unknown-linux-gnueabihf target shouldn't have NEON hard-enabled because, in my mind, targets should only provide lowest-common-denominator features to maximize the number of supported devices. Also, one can always enable device-specific optimizations via RUSTFLAGS.

cc @alexcrichton @PeteVine Thoughts on removing the NEON optimization from the armv7-unknown-linux-gnueabihf target? TL;DR Context: ARMv7 processors without NEON extensions exist (e.g. Scaleway servers) and they SIGILL when running binaries compiled for armv7-unknown-linux-gnueabihf.

I believe the armv7 target was primarily added for Servo which wants NEON support, although if it's not a widely available extension then it makes sense to me to disable by default.

I think NEON is fairly common on boards these days - off the top of my head, I know Raspberry Pi 2, Odroid C2 and OrangePi all support it - same goes for (at least some?) Exynos SoCs

It may not be the best idea to strip NEON support, as I imagine it will impact codegen speed by quite a lot, but I don't really know.

Maybe add support for both NEON and non-NEON armv7?

From Neon

NEON will give 60-150% performance boost on complex video codecs
Individual simple DSP algorithms can show larger performance boost (4x-8x)
Processor can sleep sooner, resulting in overall dynamic power saving

There's no performance penalty for using binaries linked against ARMv6 rustlibs (arm-unknown-linux-gnueabihf) compared to pure neonless armv7. The compiled code should be built with something like:
-C target-cpu=cortex-aX -C target-feature=+v7,+vfp3, but I was never able to find any difference outside of the CPU flag.

The armv7- target distinguishes itself from arm- by its inclusion of NEON which can make a big difference. If you disable it, there's no reason for this target's existence as the arm- target is fully compatible with ANY ARMv7 machine.

Secondly, if you look at the bootstrap process, it defaults to arm- on currently available distros so it's more of a question of misdetection.
$ cat /proc/cpuinfo should be consulted to decide which target to install, e.g:

Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4

You meant the whole line of Odroids before C2 which is already aarch64 (it sports an improved NEON which is part of the ISA).

If you need a nightly like this, though, (basic armv7) here's a nicely patched build.

So yeah, the answer, coupled with the above tweaks, is a simple --target=arm-unknown-linux-gnueabihf

Yes I realized that in the end :smiley:

Sounds good to me, I'll try that!

That said, the standard library contains nothing of the sort :slight_smile: Probably the current situation is good, as long as we document that the armv7 target means ARMv7 with NEON support. If you don't have NEON, just use the regular arm target and you'll be fine. The alternative of stripping NEON from armv7 would mean that everybody with a NEON-enabled ARMv7 processor (and as pointed out, this includes many popular platforms!) would need a target-feature flag. I don't think it's worth it.

Well, some Samsung Galaxy S3 phones are famously based on an ARMv7 + NEON SoC whose NEON unit is buggy, so one must avoid NEON instructions on those devices.

Ultimately, I think it's best to have libstd available for ARMv6 (without NEON), for ARMv7 with NEON, and for ARMv7 without NEON. And also for AArch32 (in addition to AArch64), with the ARMv8 instructions. In both -eabi and -androideabi configurations. And ARMv7 without float support at all. And ARMv7 optimized for Cortex-M.

In other words, it's worth thinking about a way to support LOTS of different ARM configurations of libstd. IMO, it would be best if we could simply compile libstd ourselves, similar to how we compile other libraries packaged as crates. Especially when you get into real embedded development, you really want the toolchain to target the exact capabilities and performance profile of the exact hardware you chose.

FYI, armv7-unknown-linux-gnueabihf is now built with -neon, and the armv7-unknown-linux-gnueabihf Rust toolchain runs great on Scaleway C1 instances.