Situation
For some time now, my team has been developing an application consisting of multiple processes that communicate via JSON messages passed over pipes. There is one central process (the core), which starts the other processes and binds their stdin and stdout to itself. The core and some of the bound processes are written in Rust.
The messages themselves are delivered by address. At application startup the modules exchange their addresses, so that messages can be passed to the matching destinations.
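To make the address exchange concrete, here is a minimal sketch of what such a routing table could look like. The names Router, register and resolve, as well as the example addresses, are hypothetical and not taken from our code base; the sketch only illustrates that an address whose registration never arrived cannot be resolved later.

```rust
use std::collections::HashMap;

/// Hypothetical routing table: maps a message address to the name of the
/// module (bound process) that registered it at startup.
#[derive(Default)]
struct Router {
    routes: HashMap<String, String>,
}

impl Router {
    /// Record that `module` handles messages sent to `address`.
    fn register(&mut self, address: &str, module: &str) {
        self.routes.insert(address.to_string(), module.to_string());
    }

    /// Look up the module responsible for `address`, if one registered it.
    fn resolve(&self, address: &str) -> Option<&str> {
        self.routes.get(address).map(String::as_str)
    }
}

fn main() {
    let mut router = Router::default();
    router.register("logger/write", "logger");

    // A registered address resolves to its module.
    println!("{:?}", router.resolve("logger/write"));
    // An address whose registration message never arrived cannot be routed.
    println!("{:?}", router.resolve("app/shutdown"));
}
```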
Effects
Some time ago we started to notice that the application sometimes did not quit properly. The reason was that the address to which the shutdown command should be sent had not been passed to the core (in roughly one out of ten runs). Because the address was unknown to most of the modules, the shutdown command could not be delivered to the right one and the shutdown procedure got stuck.
Problem
Logging to files showed that messages sent from a bound process to the core can get lost. Messages sent from the core to bound processes seem to get delivered without any problem though.
Details
The logging mentioned above consists of log files that record each message successfully written to the pipe and each line successfully read at the other end of the pipe.
The problem occurs on both Windows and Linux (WSL). My impression is that it occurs more often on Linux.
The following picture shows two of those log files (Linux run): on the left is (a part of) the messages that were sent from a bound process (147), and on the right are the lines that were read at the core's end of the pipe (6!). To make matters worse, some of those lines are even incomplete, missing the front part of the message.
When trying to match the lines in the incoming log file with the lines in the outgoing log file, gaps in the communication become visible:
Unsuccessful fix
Because I suspected that the pipe could not hold all of the messages, I replaced the simple println!() on the bound-process side with a manual call to io::stdout().write_all(). This does not solve the problem, though I cannot say whether it improved things slightly.
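For reference, this is roughly what the manual write looks like, as a minimal sketch rather than our exact code. The helper name send_line and the example JSON payload are made up for illustration. write_all() does not flush by itself, so the sketch flushes explicitly after every message; whether missing flushes play any role in our problem is not established.

```rust
use std::io::{self, Write};

/// Write one newline-terminated message and flush the writer afterwards.
/// (`send_line` is a hypothetical helper name.)
fn send_line<W: Write>(out: &mut W, msg: &str) -> io::Result<()> {
    out.write_all(msg.as_bytes())?;
    // The reading side uses read_line(), so every message must end in '\n'.
    if !msg.ends_with('\n') {
        out.write_all(b"\n")?;
    }
    // Push the bytes out of any user-space buffer and into the pipe.
    out.flush()
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let mut handle = stdout.lock();
    // Example payload only; the real message format is not shown here.
    send_line(&mut handle, r#"{"address":"example","payload":1}"#)
}
```

Because the helper is generic over Write, it can be exercised against a Vec<u8> in a test instead of a real pipe.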
Implementation Detail
The successful communication direction works by using stdin.write_all(msg.as_bytes()) (where stdin is &mut ChildStdin) and io::stdin().read_line(&mut line).
The error-prone opposite direction works by using io::stdout().write_all(msg.as_bytes()) and stdout.read_line(&mut line). This last stdout is a BufReader that takes an instance of the following struct, which wraps the &mut ChildStdout. I access the ChildStdout via Arc<Mutex<T>>, because reading is done on a different thread than writing:
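use std::io::{self, Read};
use std::process::ChildStdout;

// Thin wrapper that forwards reads to a borrowed ChildStdout.
struct ChildStdoutProxy<'a> {
    stdout: &'a mut ChildStdout,
}

impl<'a> ChildStdoutProxy<'a> {
    fn new(stdout: &'a mut ChildStdout) -> ChildStdoutProxy<'a> {
        ChildStdoutProxy { stdout }
    }
}

impl<'a> Read for ChildStdoutProxy<'a> {
    // Delegate directly to the underlying pipe.
    fn read(&mut self, buffer: &mut [u8]) -> io::Result<usize> {
        self.stdout.read(buffer)
    }
}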
I do this to be able to use the read_line() method that BufReader provides. Because BufReader::new() takes its reader by value rather than by reference, I used this small trick.
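One thing worth checking in a setup like this is the lifetime of the BufReader. The sketch below uses an in-memory Cursor in place of the real ChildStdout, and the function names are hypothetical. It assumes, which may or may not match our code, that a new BufReader is built each time the mutex is locked to read one line. A BufReader reads ahead into its own buffer, so anything it has buffered beyond the returned line is discarded when it is dropped. The sketch also shows that a proxy type is not strictly needed for this, because the standard library implements Read for &mut R.

```rust
use std::io::{self, BufRead, BufReader, Cursor, Read};

/// Read one line using a BufReader that exists for this call only.
/// Whatever the reader buffered beyond the first line is dropped with it.
fn read_line_with_fresh_reader<R: Read>(source: &mut R) -> io::Result<String> {
    let mut line = String::new();
    // `&mut R` implements `Read`, so it can be handed to BufReader directly.
    BufReader::new(source).read_line(&mut line)?;
    Ok(line)
}

/// Read all lines through a single BufReader that lives for the whole stream.
fn read_all_lines<R: Read>(source: &mut R) -> io::Result<Vec<String>> {
    BufReader::new(source).lines().collect()
}

fn main() -> io::Result<()> {
    let data = "first\nsecond\nthird\n";

    // Variant 1: a new BufReader per line. The first reader pulls all of the
    // input into its buffer and takes the unread remainder with it on drop.
    let mut source = Cursor::new(data.as_bytes());
    let first = read_line_with_fresh_reader(&mut source)?;
    let next = read_line_with_fresh_reader(&mut source)?;
    println!("fresh reader per call: {:?} then {:?}", first, next);

    // Variant 2: one long-lived BufReader sees every line.
    let mut source = Cursor::new(data.as_bytes());
    println!("one long-lived reader: {:?}", read_all_lines(&mut source)?);
    Ok(())
}
```

With the in-memory input above, the second per-call read returns an empty string, while the long-lived reader returns all three lines. On a real pipe the amount of read-ahead depends on timing, so this is only a candidate explanation for the gaps and partial lines in the logs, not a confirmed diagnosis.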
Conclusion
I hope some of you can help us out of this situation. Maybe the problem lies in my specific implementation, but maybe it is a bug in Rust.
I hope to be able to share more implementation code if you need more insight.
Edit
Windows Version: Windows 10 Pro 1803
WSL System: Ubuntu 18.04.1 LTS
Rust Version: last tests on my Surface with 1.30.0 stable
Hardware: Surface Pro 4 (Problem also present on Surface Pro and my Desktop PC with FX8320)