I want to talk about Rust’s failure modes. This discussion isn’t technical enough for GitHub and not interesting enough for Reddit, so I hope I can raise it here.
There is a drive within the Rust community to push for Rust on the Server, which is awesome for me and my business. However, I feel that one of the most important things about building server systems is being ignored, or at least de-prioritised: the story around building reliable systems with Rust. (My belief that it is being de-prioritised is based on the lack of any mention of the concept at RustConf 2016, in current GitHub issues, Reddit links, user groups, libraries, the setting-our-vision-for-the-2017-cycle thread, etc. I am not saying the community doesn’t think it’s important; I am coming from the position that we are all competent.)
As an example, it is not uncommon to stumble over an Erlang system with a year’s worth of uptime. I say "stumble over" because people forgot it existed, even though it’s a really important component (and organisations suck). The error log directory of such a node might be 2GB in size from all the error reports generated by bugs in some component, bugs that would have brought down a Rust, Java, etc. system. Regardless, the node kept servicing requests within its designed SLA. As far as the business is concerned, that is the perfect server.
I want to be able to do the same thing in Rust. There are ways, of course, but it’s not simple and it’s not awesome. I am looking for other people who are in this space and thinking about these issues, and I would like to help make it a reality. I am willing and able to devote a lot of my company’s time to this, because ultimately I want to use Rust’s compiler, which would let me have systems with years of uptime and a 0-byte error log directory. So if anyone has links or wants to chat, please feel free to get in touch.
In the meantime, I would like to give a shout-out to hansihe and the Rustler crate, which makes putting Rust into Erlang production systems a real joy.
Things that Rust gets right in the server space:
- Compiler that keeps bugs low in the first place.
- Types that encourage good error handling i.e.
- No runtime, and the removal of green threads: this is simpler, and simpler contributes to reliability because less can go wrong. Fast startup time also helps recovery.
- Safe(r) shared data
- No GC pauses
- Libraries with good design, like Tokio (e.g. backpressure)
- Plenty more…
Things Rust doesn’t have a good answer for:
- Fault Recovery - the currently accepted practice is to delegate to the OS.
- Fault Detection - my options are channels, thread.join or catch_unwind; these are not great primitives to work with.
- Process/Fault Isolation - Channels create explicit links between processes. This makes isolating faults difficult and makes recovery harder (if the sender dies, the system must restart; if the receiver dies, it can recover).
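To make the gap concrete, here is a minimal sketch of what hand-rolling isolation, detection and recovery looks like with only the standard library. It assumes the default `panic = "unwind"` strategy (with `panic = "abort"` the whole process dies and none of this applies), and `supervise`, `Outcome` and the two channel helpers are hypothetical names for illustration, not an existing crate or Erlang/OTP's API. Each child runs on its own thread (isolation), a panic shows up as `JoinHandle::join` returning `Err` (detection), and the supervisor restarts the child up to a limit (recovery). The two helpers show the explicit link a channel creates: when one end's thread dies, the other end only finds out on its next `send` or `recv`.

```rust
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

/// Result of supervising one child (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Outcome {
    Completed { restarts: u32 },
    GaveUp { restarts: u32 },
}

/// Hypothetical one-for-one style supervisor. `child` receives the attempt
/// number; a panic in the child is detected via `join` and triggers a restart.
fn supervise<F>(child: F, max_restarts: u32) -> Outcome
where
    F: Fn(u32) + Send + Sync + 'static,
{
    let child = Arc::new(child);
    let mut restarts = 0;
    loop {
        let c = Arc::clone(&child);
        let attempt = restarts;
        // Isolation: the child runs on its own thread, so a panic unwinds
        // only that thread's stack (assumes panic = "unwind").
        let handle = thread::spawn(move || (*c)(attempt));
        // Detection: `join` returns Err if the child thread panicked.
        match handle.join() {
            Ok(()) => return Outcome::Completed { restarts },
            // Recovery: restart while still under the limit.
            Err(_) if restarts < max_restarts => restarts += 1,
            Err(_) => return Outcome::GaveUp { restarts },
        }
    }
}

/// Returns true if `send` fails once the thread owning the Receiver has died.
fn send_after_receiver_panics() -> bool {
    let (tx, rx) = mpsc::channel::<u32>();
    let handle = thread::spawn(move || {
        let _rx = rx; // the receiving end lives (and dies) on this thread
        panic!("receiver died");
    });
    let _ = handle.join(); // wait for the thread to unwind, dropping `rx`
    tx.send(1).is_err()
}

/// Returns true if `recv` fails once the thread owning the Sender has died.
fn recv_after_sender_panics() -> bool {
    let (tx, rx) = mpsc::channel::<u32>();
    let handle = thread::spawn(move || {
        let _tx = tx; // the sending end lives (and dies) on this thread
        panic!("sender died");
    });
    let _ = handle.join(); // wait for the thread to unwind, dropping `tx`
    rx.recv().is_err()
}

fn main() {
    // A child that hits a "bug" on its first two attempts, then succeeds.
    let flaky = supervise(
        |attempt| {
            if attempt < 2 {
                panic!("simulated bug on attempt {attempt}");
            }
        },
        5,
    );
    println!("{flaky:?}"); // Completed { restarts: 2 }

    // A child that always fails: the supervisor gives up after 3 restarts.
    let broken = supervise(|_| panic!("persistent bug"), 3);
    println!("{broken:?}"); // GaveUp { restarts: 3 }

    println!("{}", send_after_receiver_panics()); // true
    println!("{}", recv_after_sender_panics()); // true
}
```

The default panic hook still prints every panic to stderr, which plays the role of that 2GB error log. Compared to an Erlang supervisor, a lot is still missing here: no restart intensity over a time window, no way to re-establish a restarted child's channels or state, and any messages still queued in a dead child's channel are lost with it.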