Stdx - The missing batteries of Rust

Woah, hey there. Here's a project that needs help. It's called stdx and it wants to collect all those libraries that everybody uses that aren't std into one place. You can think of it as both a guide to the crate ecosystem for newbies and an extended, 'batteries-included' standard library.

I'm looking for feedback: I think this is a sweet idea, but do you? What belongs in it? How bad is it to swear twice in the readme? I'm looking for help: maintainers to decide what goes in, write informative documentation, and keep it up to date each release of Rust.

Finally, I have a problem. Part of this project is to create a stdx 'facade', like the standard library, and reexport all these crates. This works fine but for macros. Since #[macro_reexport] is unstable I have to find another solution. What's the best way to make stdx provide the macros everybody wants like bitflags! and info! without doing a reexport?
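
One possible workaround, sketched under assumptions: the facade defines its own thin wrapper macros that forward through `$crate` to an item the facade re-exports. Everything named here (`backend`, `format_record`, this particular `info!`) is hypothetical and stands in for a real upstream crate; it is not stdx's actual layout, and a real wrapper would have to mirror each upstream macro's syntax by hand.

```rust
// Hypothetical facade crate: all names are illustrative only.

/// Stand-in for an upstream crate's implementation detail.
/// A real facade would re-export the upstream item here instead.
#[doc(hidden)]
pub mod backend {
    /// Format a record the way our pretend logger would emit it.
    pub fn format_record(level: &str, message: String) -> String {
        format!("[{}] {}", level, message)
    }
}

/// Wrapper macro defined in the facade itself. `$crate` resolves to the
/// crate that defines the macro, so callers only need to depend on the facade.
#[macro_export]
macro_rules! info {
    ($($arg:tt)*) => {
        $crate::backend::format_record("INFO", format!($($arg)*))
    };
}
```

A caller would write something like `info!("loaded {} crates", 3)` and never name the backend directly. The cost is that every wrapped macro has to be kept in sync with upstream manually.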

26 Likes

I think this is a really cool idea.

1 Like

+1. A curated collection of batteries is going to be very helpful. More things that could belong there:

  • a more feature complete concrete logging implementation (log4rs looks to be the most flexible)
  • some solution for signal handling (or at least an atexit() functionality that runs functions when the process is terminated)
1 Like

I think it should probably avoid multiple, competing crates that do the same thing. Otherwise, the question won't be "how do you create a fucking random number?", it'll be "which logging crate are you supposed to fucking use?"

Perhaps the way it should work is that we just point people to stdx and say "unless you're doing something special, if there's a thing in there that does what you want, use it." Every once in a while, (six months, year, whatever), those choices are re-evaluated and if the "most reasonable default choice" has changed, do a major release of stdx with a new set of crates.

That way, the easy answer to "which $TASK crate are you supposed to fucking use?" will more often than not just be "whatever is in the latest version of stdx."

5 Likes

Sure, I'm all in favor of that, but that means the stdx crates have to be reasonably feature complete. And while env_logger is nice and easy, it's far away from being a complete logging library. log4rs is not there yet either, but its core is powerful enough.

I don't like the fact that it includes flate2 because it's not pure Rust.

If you're a Windows user, you will have to install MinGW and the 32-bit rustc, or you won't be able to compile stdx, just because of flate2.

EDIT: Same for time. time used to need a C compiler and was recently changed to be pure Rust, but stdx still uses an old version. I think that there was even a problem with old versions where the C code wouldn't compile at all on Windows.

4 Likes

It's a nice idea for beginners who want to spend their time learning Rust instead of looking for libraries. It may also be an intermediate step between "random crates.io crate" and "part of std".

I'm a bit surprised you didn't add more (all) crates from github.com/rust-lang like uuid and threadpool.

Also, you might want to reexport some more of the smaller crates that are already (transitive) dependencies. Most of them are pretty good building blocks for more complex stuff. E.g., Hyper already depends on url, so why not depend on and export url as well? (I'd argue URL parsing is more fundamental than HTTP and at least as useful.) You already do that, sorry. That was too early for me. The only additional thing I'd export would be strsim (under a better, longer name probably).

I think this sounds like an excellent idea. The one additional thing I would wish for is hassle-free cross-platform support for everything in stdx.

I also think 1 crate per thing is the right rule, though as a learning resource the docs might sometimes indicate notable alternatives, as the docopt blurb does.

1 Like

Yeah, I definitely feel the pure-Rust concern. I put flate2 in because it's maintained by acrichto as part of cargo, but am open to reevaluating.

Yes! I'm very much a fan of having an officially curated set of fairly high quality, well designed, well maintained, and battle-tested libraries.

I like the fact that the standard library is staying small, so you don't get the "where libraries go to die" problems that, say, the Python standard library has. But it can also be good to have a separate set of additional libraries, also maintained by the core team, that covers a lot of the common tasks that people are going to want to do. In fact, I suggested this in a thread about what should go in the standard library, and what should be available on Playpen.

This is a good question. I'll start by talking about what I think std should include, in order to discuss where I see the differences between std and stdx, and from there what stdx should focus on.

In general, std should have the fundamental types and operations that all other code needs in order to interact. These are the baseline types that make the language a single, consistent platform on which higher-level libraries can easily interoperate. It should consist of those types without which every higher-level library might pick its own solution, and the ecosystem would then devolve into incompatibility that needs a ton of glue code to make work. An example is a Unicode string type; one of the major weaknesses of the C/C++ ecosystem is that it never standardized on a representation of Unicode strings, and so now you have a whole bunch of different incompatible string types that you need to deal with in any application that runs cross-platform and uses a few different libraries.

What std currently has is a good start; integers and floating point types, UTF-8 strings, smart pointers, basic containers and iterators, threads and basic concurrency primitives, error types, conversion traits, and file and network I/O; those are all pretty basic, lots of code has to deal with them and you want it to be able to do so compatibly between different libraries and frameworks.

The next few things that I think might be appropriate for std are asynchronous I/O (see, for instance, the divided ecosystems between Twisted and the new Python standard asyncio), standard multidimensional arrays (lots of people seem to wonder about how to handle multidimensional arrays, and good support is likely important as a basis for numerics and linear algebra libraries to interoperate), and internationalization and localization, as determining the current locale and extracting locale-specific resources should probably work consistently across libraries and frameworks, rather than having each invent their own.

So what, then, belongs in stdx? Well, I think that's all of the "batteries" that don't have such a strong need to be consistent across the ecosystem for interoperability. For example, libraries like flate for zlib compression; it's useful to a large number of applications and libraries, but there's not a particularly strong need for any interoperability between what one library chooses to use vs. another library, as there aren't many data structures that would need to be shared between them. XML and JSON are a little more on the edge, as sometimes you do want to have a general-purpose XML or JSON structure that can be passed around, but for the most part it's generally better to serialize and deserialize them between custom, application specific structs, and merely having de-facto standard libraries as suggested by stdx is probably sufficient to avoid too many problems with fragmentation.

The reason for wanting "batteries", in the form of a large standard library or something like stdx, is because it makes it a lot easier to evaluate dependencies you add to your project. If there's one overarching project that you can rely on being fairly well maintained and providing relatively complete support for whatever the task at hand is, then when you're going and looking for something like an HTTP library or JSON decoder, you don't have to spend a while evaluating several and wondering "is this going to be a well-maintained library that covers all of the fundamentals, or is this some flash in the pan weekend project that looks pretty cool but doesn't cover the tricky bits and won't have bug fixes after 6 months"?

Based on this, stdx should include those things that are so common that any reasonably sized project will likely need a good number of the libraries eventually. Besides the things already included, some other things that I think might be appropriate for stdx in the future:

  • Email handling; MIME composition and parsing, sending email (receiving email may be too complicated a job to have one obvious library to choose for everyone)
  • Some standardized XML infoset representation, and XML and HTML5 parsers that can parse into it (html5ever plus something like Kuchiki, not sure if that will wind up being the one chosen, but that basic idea).
  • rust-csv, maybe? Interacting with CSV is a very common thing to do
  • Dates and times. This is very common to need. All you really need in the standard library for interoperability are the fundamental instant and duration definitions (based purely on seconds/milliseconds/nanoseconds from some epoch according to TAI time), but in stdx there should be the common UTC/Gregorian calendar/local times and timezones support
  • Tools for manipulating common container file formats; zipfiles, tarballs, etc.
  • A common database interface for database-specific drivers to implement as a baseline (like ODBC/DB-API/DBI/etc), and database abstractions to build on top of
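
To make the dates-and-times split concrete, here is a minimal sketch of the "instants and durations only" layer using what std offers in `std::time`. The helper names `seconds_since_epoch` and `add_millis` are made up for illustration, and note that `SystemTime` follows the system clock rather than TAI, so this only approximates the idea described in that bullet. Calendars, time zones, and formatting are deliberately absent; that is the part that would live in a stdx-style crate.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Whole seconds elapsed since the Unix epoch for a given timestamp.
/// Returns `None` if the timestamp predates the epoch.
pub fn seconds_since_epoch(t: SystemTime) -> Option<u64> {
    t.duration_since(UNIX_EPOCH).ok().map(|d| d.as_secs())
}

/// Shift a timestamp forward by a number of milliseconds.
/// Pure duration arithmetic: no calendar or time zone knowledge involved.
pub fn add_millis(t: SystemTime, millis: u64) -> SystemTime {
    t + Duration::from_millis(millis)
}
```

Two libraries that exchange values like these can interoperate without agreeing on a calendar library, which is roughly the interoperability argument made above.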

In addition to these types of things, which really don't belong in the standard library but are so ubiquitous that almost every application needs at least one of them and so reducing the burden on users to sort through and find the one they need is quite useful, it might also be appropriate for stdx to be the staging area for things that might eventually go into the standard library, like mio as a potential standardized async library.

One set of things that I'm not quite sure where it belongs, but it would be nice to have some kind of official blessing, is more platform-specific stuff, like WinAPI bindings, more POSIX bindings, Cocoa bindings, and the like. Those are also things I tend to look for an officially blessed and well supported package for, as I really don't want to start using some half-implemented project that looks good at first but then doesn't do everything I need. In your description, you say that stdx should consist of cross-platform functionality, and I can see the value of that, but there's also value in good, official platform bindings, so I'm not sure if they should have an exception, or maybe there should be other separate projects for those.

I don't think it's all that bad, but it makes it sound a bit unprofessional, and I didn't think the swearing was necessary. Overusing swearing also dilutes its strength; better to keep it for those situations in which it's really necessary. Not a strongly held belief, but also not what I would choose for something that is meant to instill confidence that this is an official, well supported, well curated set of crates.

Furthermore, the uses of swearing in this description will sound out of place if this is adopted and widely used. Right now, there's the frustration of figuring out "man, every other language has some kind of rand in its standard library, why doesn't Rust?", but if this exists and becomes widely used and documented, then having that be part of the description will feel odd and out of place. It's a fairly useful way to describe the motivation for the library now, but will quickly become obsolete if it's successful.

The one big question I have is: what is the stability policy of stdx? What would be a good stability policy? Part of the benefit of not having things in std is that you can be more aggressive about iterating on APIs outside of it, but for something to be part of a blessed collection like this, I would probably want it to have stronger stability guarantees than just picking arbitrary crates would.

If a better library is created for a particular problem, for instance, if serde replaces rustc-serialize, will rustc-serialize be removed from future versions? Stay in but be deprecated somehow? I don't have a good answer for this, so I'm curious what other people think.

5 Likes

I would like to see an ApproxEq trait somewhere standard instead of multiple libraries implementing their own.
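
For illustration, a shared trait along these lines might look roughly like the following. This is only a guess at a shape: the trait name comes from the post above, but the `Epsilon` associated type, the `approx_eq` signature, and the choice of an absolute (rather than relative or ULP-based) tolerance are all assumptions.

```rust
/// Hypothetical trait: approximate equality within an absolute tolerance.
pub trait ApproxEq {
    /// Type used to express the tolerance.
    type Epsilon;

    /// Returns true when `self` and `other` differ by at most `epsilon`.
    fn approx_eq(&self, other: &Self, epsilon: Self::Epsilon) -> bool;
}

impl ApproxEq for f64 {
    type Epsilon = f64;

    fn approx_eq(&self, other: &f64, epsilon: f64) -> bool {
        // NaN never compares as approximately equal, since `NaN <= x` is false.
        (self - other).abs() <= epsilon
    }
}

impl ApproxEq for f32 {
    type Epsilon = f32;

    fn approx_eq(&self, other: &f32, epsilon: f32) -> bool {
        (self - other).abs() <= epsilon
    }
}
```

With one agreed definition, a check such as `(0.1f64 + 0.2).approx_eq(&0.3, 1e-9)` would mean the same thing in every library, which is the point of wanting it somewhere standard.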

1 Like

I think a good set of modules can be found in the Python Module Index (Python 3.10.6 documentation) :wink:

Honestly, I consider the Python "batteries" the most time-saving, awesome feature of Python.
My favorites would be:

  • easy macros/wrappers for error handling
  • something easy for string handling (not sure what this involves)
  • a really easy wrapper around HTTP (Python requests: http://docs.python-requests.org/en/latest/)
  • command-line argument parsing: docopt or something similar and easy
  • a yaml module
  • a csv module
  • a sqlite module
  • a uuid module
  • a configparser module (.ini style format)
  • datetime module
  • an email/mime module
  • an imap module
  • an smtp module
  • a pop3 module
  • an xml / html parser
  • a json module
2 Likes

We should choose a library that is nicer to use over one that is more feature-complete, but within reason of course.

1 Like

As for a time library, I'd like to propose the chrono crate as a complete date/time solution.

2 Likes

Sure, but what good is it to be nice to use if I can't log to a file, for example? Simplistic libraries are always nice to use, since they don't have a lot of complexity that needs careful API design.

The problem with log4rs in this context is that it's too complicated and hard to use. env_logger takes a single line of copy+pasted code and you're done. I spent over an hour trying to work out how to properly integrate log4rs.

It's much nicer, but given that stdx is probably going to be mostly used as a "don't care, need an $X" crutch, it should probably prefer simple and "good enough" over powerful but "where's the manual, how the hell do I even use this?" libraries.

(As an aside: I find it somewhat baffling that log4rs doesn't implement the same RUST_LOG config that env_logger does, with a similar "good enough defaults" function. I mean, then it'd be a complete no-brainer drop-in replacement. I'd hope it's just a case of the author having not gotten around to it yet.)
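
To give a sense of how small the configuration side is, here is a rough std-only sketch of parsing a `RUST_LOG`-style string such as `warn,my_crate::net=debug`. It is a deliberately simplified subset and not how env_logger or log4rs actually implement it: `Directive`, `parse_spec`, and `spec_from_env` are hypothetical names, and a bare item is always treated as a level here, which the real format does not guarantee.

```rust
/// One parsed directive: an optional target (module path) and a level name.
#[derive(Debug, PartialEq)]
pub struct Directive {
    pub target: Option<String>,
    pub level: String,
}

/// Parse a simplified `RUST_LOG`-style spec: comma-separated items, each
/// either `level` or `target=level`. Empty items are skipped.
pub fn parse_spec(spec: &str) -> Vec<Directive> {
    spec.split(',')
        .map(|part| part.trim())
        .filter(|part| !part.is_empty())
        .map(|part| {
            // Split on the first '=' only, so the target keeps any later '='.
            let mut pieces = part.splitn(2, '=');
            let first = pieces.next().unwrap_or("");
            match pieces.next() {
                Some(level) => Directive {
                    target: Some(first.to_string()),
                    level: level.to_lowercase(),
                },
                None => Directive {
                    target: None,
                    level: first.to_lowercase(),
                },
            }
        })
        .collect()
}

/// Read the spec from the `RUST_LOG` environment variable, if it is set.
pub fn spec_from_env() -> Vec<Directive> {
    std::env::var("RUST_LOG")
        .map(|s| parse_spec(&s))
        .unwrap_or_default()
}
```

A logger that accepted directives like these as its default configuration could then layer extra features, such as output destinations, on top without changing the one-line setup.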

I agree, the usability can be improved. But I don't see how env_logger could ever support writing to a file, or a file that rolls over, or to syslog, etc.

... someone could modify it to add those things? You're speaking like env_logger's implementation is carved in stone or something. I mean, before you mentioned log4rs, I'd considered my own logger that re-used env_logger's configuration, but added @thing clauses to specify where to send the output. Having now seen log4rs... I'm still considering exactly the same thing because damnit, it's handy! :slight_smile:

Well, by definition (of its name, at least) it is supposed to be configured via an environment variable. However, that is not practical for large logging configs, as they get long and are usually better stored in their own config file.