Let's build a standard library (std) free of C code / dependencies for Linux!

TL;DR: steed

What the title says. Let's build a new (*) std with the same functionality and the same API as the current one, but one that doesn't use libc. Instead, this new std will interface directly with the Linux kernel via system calls.

(*) Not really "new" because most of the implementation will be the same as the current one.

"Why?" For me, because of:

  • Dead simple cross compilation. Cross compiling Rust code is very simple. It's the C code that complicates cross compiling Rust programs today: you need a cross C toolchain and cross-compiled C libraries, and you have to care about glibc versions on both the host and the targets.

  • Better optimization (inlining) due to everything being written in Rust.

  • It's a chance to audit, in the form of a rewrite, the C code that every Rust program depends on.

  • The usual Rust over C advantages: Panics instead of buffer overflows, less UB (which has bitten us before), etc.

I have laid out the foundation in this repository. It already contains:

  • IMO, the most important thing: Testing. On each pull request, the repo is tested on these architectures: AArch64, ARM, PowerPC, PowerPC64, x86 and x86_64. The test infra also supports MIPS, MIPS64 and SPARC64, but the tests are not passing, so those targets are currently disabled.
  • Standard I/O and File I/O support. Not at parity with std, but enough to build simple programs.

Completing this is going to take a lot of work. It's going to be equivalent to re-implementing chunks of libc, libpthread, libm, etc. in Rust. LoC-wise, however, it's mostly going to be copy-pasting code from std.

And that's where you come in! There's plenty to do. The issue tracker contains a list of things to do that have been classified according to (my estimate of) their difficulty. There's easy stuff that's only about copy-pasting code from std, and there's less easy stuff that's mostly about copy-pasting code from std plus adding one or two syscalls here and there.

In the short term, I'm going to add some magic to Xargo so we'll be able to build std programs against this new std to compare / benchmark them. Eventually, when PAL becomes a reality, I'd like to refactor the new std into a pal crate that can be plugged into std.

cc @briansmith


This project sounds awesome!

IMO, there is some functionality in libstd that shouldn't be in libstd, which should become clear when libstd is actually written in Rust. I hope this project results in some proposals for moving such functionality from libstd (all implementations) to some C library helper crate.

I also expect that this project will help identify any libstd APIs that are unnecessarily worse because they've been optimized for implementations based on libc. If any are found, I hope we can find a constructive way to improve such APIs.

Also, I think it would be useful to specify a current and target minimum Linux kernel version to support. I also recommend documenting how people should go about implementing workarounds for older kernels that don't support useful functionality of newer kernels.
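One common way to structure such workarounds is to try the newer syscall first and, if the running kernel answers ENOSYS ("function not implemented"), fall back to an older mechanism and remember that so later calls skip the failed attempt. getrandom(2), added in Linux 3.17, is a typical case (the fallback being /dev/urandom). The following is only a minimal sketch of that pattern: `with_fallback` is a hypothetical helper, the two closures stand in for the real syscall paths, and ENOSYS = 38 is assumed (that is the x86_64 / generic Linux value; some architectures such as MIPS use a different number).

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// errno for "function not implemented" on x86_64 and the generic Linux ABI
/// (assumption: other architectures may use a different value).
pub const ENOSYS: i32 = 38;

/// Hypothetical helper: tries `newer` unless an earlier call already found
/// that the running kernel lacks it. On ENOSYS it records that fact in
/// `unsupported` and uses `older` instead. Any other result (success or a
/// different errno) is returned as-is.
pub fn with_fallback<T>(
    unsupported: &AtomicBool,
    newer: impl FnOnce() -> Result<T, i32>,
    older: impl FnOnce() -> Result<T, i32>,
) -> Result<T, i32> {
    if !unsupported.load(Ordering::Relaxed) {
        match newer() {
            Err(ENOSYS) => unsupported.store(true, Ordering::Relaxed),
            other => return other,
        }
    }
    older()
}
```

Relaxed ordering should be enough here because the flag is only a cache: if two threads race, the worst case is that the newer syscall is attempted one extra time.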

Again, awesome project idea!


Does it need to rival libc in performance? Getting a fast memchr, for example, is going to be hard without stabilised SIMD. (BufRead::read_until uses memchr.)
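For a sense of what is possible without SIMD intrinsics, here is a sketch of the word-at-a-time ("SWAR") technique: broadcast the needle into every byte of a `usize`, XOR it against each word of the haystack so matching bytes become zero, and use a bit trick to detect whether a word contains a zero byte. Only the word that contains a match (or the short tail) is scanned bytewise. This is a portable illustration, not glibc's or any crate's implementation; it does not align its loads or unroll the loop.

```rust
/// 0x0101...01 and 0x8080...80 for the target's word size.
const LO: usize = usize::MAX / 255;
const HI: usize = LO << 7;
const WORD: usize = std::mem::size_of::<usize>();

/// True if any byte of `x` is zero (classic bit trick; exact as a yes/no test).
fn contains_zero_byte(x: usize) -> bool {
    (x.wrapping_sub(LO) & !x & HI) != 0
}

/// Returns the index of the first `needle` byte in `haystack`, like C's memchr.
pub fn memchr(needle: u8, haystack: &[u8]) -> Option<usize> {
    // Broadcast the needle into every byte, e.g. 0x61 -> 0x6161...61.
    let repeated = LO * needle as usize;
    let mut i = 0;
    // Skip whole words that cannot contain the needle.
    while i + WORD <= haystack.len() {
        let mut chunk = [0u8; WORD];
        chunk.copy_from_slice(&haystack[i..i + WORD]);
        // XOR turns bytes equal to `needle` into zero bytes.
        if contains_zero_byte(usize::from_ne_bytes(chunk) ^ repeated) {
            break;
        }
        i += WORD;
    }
    // Finish bytewise: either inside the word that matched or in the short tail.
    haystack[i..].iter().position(|&b| b == needle).map(|p| i + p)
}
```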

Here's another example of a glibc ball of hair (a racy getenv that has never been reproduced): https://github.com/rust-lang/rust/issues/27970

As a general rule in engineering: first you benchmark, and only if the numbers are better do you say it's faster.


@briansmith

IMO, there is some functionality in libstd that shouldn't be in libstd

Right, but it's going to be hard, or perhaps impossible, to remove stuff from std because of backward compatibility. We could add Cargo features to std that would let you remove some functionality, but that customized std would not work with every crate on crates.io due to API / functionality differences, and that's a big downside.

I also expect that this project will help identify any libstd APIs that are unnecessarily worse because they've been optimized for implementations based on libc.

Like io::Error::last_os_error? Given Rust's Result based error handling, I don't see much point in implementing errno in steed, except, well, to implement Error::last_os_error.
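To illustrate why errno becomes unnecessary: on Linux, a raw syscall reports failure by returning a negative errno value (in the range -4095..=-1), so a libc-free std can turn that return value directly into an `io::Result` without ever touching a thread-local errno. A minimal sketch; the helper name `cvt_raw` is hypothetical, while `io::Error::from_raw_os_error` is the real std API.

```rust
use std::io;

/// Hypothetical helper: converts a raw Linux syscall return value into an
/// `io::Result`. Values in -4095..=-1 encode `-errno`; anything else is a
/// success value (a byte count, a file descriptor, ...).
fn cvt_raw(ret: isize) -> io::Result<usize> {
    if (-4095..0).contains(&ret) {
        Err(io::Error::from_raw_os_error((-ret) as i32))
    } else {
        Ok(ret as usize)
    }
}
```

With this shape, the error travels inside the `Result`, and `Error::last_os_error` would only exist for API compatibility.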

@ruuda

Does it need to rival libc in performance?

Not at the beginning. Let's focus on achieving feature parity / API compatibility with std first, and then focus on optimization.

Getting a fast memchr for example, is going to be hard without having stabilised simd.

We can use unstable stuff in steed though. Or rather, we have to use unstable features anyway. For example, we need asm! to do the syscalls.
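To make "asm! to do the syscalls" concrete, here is a sketch of an untyped three-argument syscall for x86_64 Linux only. It is written against the `asm!` macro as stabilized in Rust 1.59, which postdates this thread (the unstable `asm!` of the time used a different syntax). Per the x86_64 syscall convention, the syscall number goes in rax, the arguments in rdi, rsi and rdx, the result comes back in rax, and the `syscall` instruction clobbers rcx and r11. The syscall number 1 for `write` is the x86_64 value; other architectures need their own registers and numbers. `syscall3` and `raw_write` are illustrative names, not steed's actual API.

```rust
#[cfg(not(all(target_os = "linux", target_arch = "x86_64")))]
compile_error!("this sketch only targets x86_64 Linux");

use std::arch::asm;

/// Syscall number of `write` on x86_64 Linux (differs on other architectures).
const SYS_WRITE: usize = 1;

/// Untyped three-argument syscall; returns the raw rax value,
/// which is a negative errno on failure.
unsafe fn syscall3(nr: usize, a1: usize, a2: usize, a3: usize) -> isize {
    let ret: usize;
    unsafe {
        asm!(
            "syscall",
            inlateout("rax") nr => ret,
            in("rdi") a1,
            in("rsi") a2,
            in("rdx") a3,
            // The `syscall` instruction overwrites rcx and r11.
            lateout("rcx") _,
            lateout("r11") _,
            options(nostack),
        );
    }
    ret as isize
}

/// write(2) without libc: no errno, just the raw return value.
fn raw_write(fd: usize, buf: &[u8]) -> isize {
    unsafe { syscall3(SYS_WRITE, fd, buf.as_ptr() as usize, buf.len()) }
}
```

For example, `raw_write(usize::MAX, b"x")` should return -9 (-EBADF), because the kernel truncates the fd to an `unsigned int` that no open file uses.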

@leonardo

I didn't say it is or would be faster, though. Just that inlining would be better. More optimization doesn't always result in faster code; e.g. -O2 can be faster than -O3 in some cases.

I see potential for smaller / faster binaries though, like the very small statically linked hello world (see examples in the repo):

$ size hello
   text    data     bss     dec     hex filename
    131       0       0     131      83 hello

I'm very excited to see where this goes! I'm not terribly skilled with low-level OS programming, but I'm game to get more testing infrastructure for you to throw builds against.

Want to make some issues for targets to test and I can see where I can fill them in?

Sorry if I am misinterpreting something, but I'm wondering about your emphasis on Linux as the target OS. Is that just your personal motivation, or something inherent in how the implementation would work out? Wouldn't a pure Rust std make cross-compilation between all targets easier?


Linux is noteworthy in having a stable kernel ABI. Windows is noteworthy in not having a stable kernel ABI. I don't know about other platforms, but that might be why.


Wow, that's encouraging! Compact, Rust-only, statically linked binaries: who wouldn't like that?


There's already rusl. Please avoid work duplication by sharing code around.


@RandomInsano

By testing infra, do you mean real non-x86 testing machines? Because, right now, steed is being continuously tested on ARM, AArch64, PowerPC, PowerPC64, x86 and x86_64 (we could be testing on MIPS, MIPS64 and SPARC64 as well, but the test suite is not passing, so those are disabled), using QEMU for the non-x86 architectures. But it would be great to test on real hardware, even if just sporadically.

Also, we don't have support for proper tests (i.e. the #[test] attribute) yet, but that's first on my TODO list, so right now we are just testing by running some executables.

@timClicks

Stable ABI, like @ssokolow said. Also, supporting other platforms multiplies the amount of work, and "only" supporting Linux is already a lot of work. Another problem is testing: there's no easy way to test e.g. *BSD programs within Travis CI, and not having tests is a no-go.

@ConnyOnny

I don't think steed can depend on rusl because rusl only supports x86_64, and steed attempts to support all the architectures that Rust supports. I would also prefer not to constrain steed's implementation to a libc API (e.g. errno-based error handling). I'm game for refactoring common code into crates that both rusl and steed can use: stuff like constants (e.g. EBADF) and typed syscalls (the syscall interface steed currently uses is untyped). Especially if they can be generated once (e.g. from the Linux kernel source code) and don't need further maintenance.
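As an illustration of "typed syscalls" layered over an untyped interface, here is a sketch for x86_64 Linux only. The `Fd` and `Errno` newtypes, the `EBADF` constant and the `write` wrapper are hypothetical stand-ins for what a shared, generated crate might provide; EBADF = 9 and the `write` syscall number 1 are the x86_64 values. The untyped layer uses the `asm!` syntax stabilized in Rust 1.59.

```rust
#[cfg(not(all(target_os = "linux", target_arch = "x86_64")))]
compile_error!("this sketch only targets x86_64 Linux");

use std::arch::asm;

/// A file descriptor; the kernel's `write` takes an `unsigned int` fd.
#[derive(Clone, Copy, Debug)]
pub struct Fd(pub u32);

/// An errno value, kept distinct from ordinary integers.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Errno(pub i32);

/// Constants like these could be generated once from the kernel headers.
pub const EBADF: Errno = Errno(9);
const SYS_WRITE: usize = 1;

/// Untyped layer: raw numbers in, raw number out.
unsafe fn syscall3(nr: usize, a1: usize, a2: usize, a3: usize) -> isize {
    let ret: usize;
    unsafe {
        asm!(
            "syscall",
            inlateout("rax") nr => ret,
            in("rdi") a1,
            in("rsi") a2,
            in("rdx") a3,
            lateout("rcx") _,
            lateout("r11") _,
            options(nostack),
        );
    }
    ret as isize
}

/// Typed layer: the signature rules out mixing up the fd and the length,
/// and failures come back as `Errno` instead of a magic negative number.
pub fn write(fd: Fd, buf: &[u8]) -> Result<usize, Errno> {
    let ret = unsafe { syscall3(SYS_WRITE, fd.0 as usize, buf.as_ptr() as usize, buf.len()) };
    if (-4095..0).contains(&ret) {
        Err(Errno((-ret) as i32))
    } else {
        Ok(ret as usize)
    }
}
```

Only the untyped `syscall3` is architecture-specific; the typed wrappers and the generated constants are what rusl and steed could plausibly share.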


Does this / could this project interact with brson's proposal for a more portable libstd?
The possibility of a libc-free Linux std was explicitly mentioned somewhere in there, IIRC. If this project wanted to try doing some of the refactoring described there internally, that'd be pretty cool :slight_smile:


@japaric Is it a goal (if it's even possible) to have this compatible with having libc in the same process? (So you could use *-sys crates?)

@kazimuth

Does this / could this project interact with brson's proposal for a more portable libstd?

Yes, our lives would be much easier if PAL already existed :smile:

If this project wanted to try doing some of the refactoring described there internally, that'd be pretty cool

If it's not too much work, sure. We are basing our std code on the latest stable release, so any such refactoring is going to take 12 weeks to reach us; we have to prioritize what we can use right now.

@parched

I see that as a stretch goal. Pure Rust is the main goal, then the possibility of statically linking to "simple" C code (just functions), then statically linking to more complex C code, e.g. with global constructors or .init / .fini stuff. Retaining the ability to call steed code from C programs would be somewhere in between those three.

Well, it's a question of what one considers "C-free". I understood it in the sense that the given program doesn't involve using a C library other than the core system.

Interesting project!
If we add infrastructure to support FreeBSD from the start, I can pitch in with FreeBSD stuff in parallel with Linux.

I have limited knowledge on the subject, but https://github.com/lrs-lang/lib seems to have similar goals.

UPDATE: You can now compile your crate, or any crate, against steed using
Xargo without having to modify it (previously you had to modify the signature of
main and/or add an extern crate steed). Check these instructions.

We also now have an IRC channel: #rust-steed on Mozilla's IRC network, if you
want to talk about steed's development.

@johalun

Supporting other OSes is an explicit non-goal right now due to the sheer amount
of extra work and lack of CI/testing infrastructure. We'll revisit that once we
are done with the Linux port.

@erszcz

There are some similarities, like both directly interfacing with the Linux kernel, but
lrs doesn't aim to maintain API compatibility with std, whereas being API
compatible with std is one of steed's main goals. We consider being able to support
as much of the crates.io ecosystem as possible very important, and being API
compatible with std is a requirement for that.


last_os_error() and io::Error don't place any requirements on the underlying OS in terms of what errors are returned, because they effectively can't. You could just return a static error each time last_os_error() is called and no user would be able to tell the difference.

This function reads the value of errno for the target platform (e.g. GetLastError on Windows) and will return a corresponding instance of Error for the error code.

Given that guarantee in the docs, the platform could simply use a fixed value for errno.