Messages getting lost in process pipe

spaeger · October 27, 2018, 8:18pm

Situation

For some time my team is developing an application consisting of multiple processes which communicate via JSON messages passed over pipes. There is one central process (core) which starts the other processes and binds their stdin and stdout to itself. The core and some of the bound processes are written in Rust.
Messages itself get delivered by address. At the start of the application the modules exchange their addresses and so the messages can be passed to the matching destinations.

Effects

Some time ago we started to notice, that the application would not quit properly sometime.The reason was, that the address, were a shutdown command should be sent to, was not passed to the core (maybe about one time in ten runs or something like that). Because the address was unknown to most of the modules, this shutdown command could not get delivered to the right one and the shutdown procedure got stuck.

Problem

Logging to files showed that messages sent from a bound process to the core can get lost. Messages sent from the core to bound processes seem to get delivered without any problem though.

Details

The already mentioned logging consists of log files which show the messages getting written to the pipe succesfully and which show the lines getting read on the other end of the pipe successfully.
The problem happens both on Windows and Linux (WSL). On Linux I had the impression of the problem being more present.

The following picture shows two of those log files (Linux run): On the left side are (a part of) the messages, which got sent from a bound process (147), and on the right side the lines, which got read on the end of the pipe in the core (6!). To make matters worse, some of those lines are even incomplete, missing some front part of the message.

When trying to match the lines in the incomming log file with the lines in the outgoing log file, then something like gaps in the communication get visible:

Unsuccessful fix

Because of the suspicion of the pipe not being able to hold everything of the messages, I replaced on the outer process side the simple println!() with manually using stdout::write_all(). But this does not solve the problem - I do not know though, if it got better a bit.

Implementation Detail

The successful communication direction works by using stdin.write_all(msg.as_bytes()) (where stdin is &mut ChildStdin) and io::stdin().read_line(&mut line).

The error prone opposite direction works by using io::stdout().write_all(msg.as_bytes()) and stdout.read_line(&mut line). The last stdout is a BufReader which takes an instance of the following struct to wrap the &mut ChildStdout which I access via Arc<Mutex<T>> (reading is done on a different thread than writing):

struct ChildStdoutProxy<'a> {
    stdout: &'a mut ChildStdout,
}

impl<'a> ChildStdoutProxy<'a> {
    fn new(stdout: &'a mut ChildStdout) -> ChildStdoutProxy<'a> {
        ChildStdoutProxy { stdout }
    }
}

impl<'a> Read for ChildStdoutProxy<'a> {
    fn read(&mut self, buffer: &mut [u8]) -> io::Result<usize> {
        self.stdout.read(buffer)
    }
}

I do this to be able to make use of the read_line() method BufReader provides and since it does not take a reference for construction, I tricked a little bit.

Conclusion

I hope some of you are able to help us out of this situation. Maybe it is down to a problem in my specific implementation, but maybe it is a bug in Rust.
I hope I can share some more implementation code, if you need more insight.

Edit

Windows Version: Windows 10 Pro 1803
WSL System: Ubuntu 18.04.1 LTS
Rust Version: last tests on my surface with 1.30.0 stable
Hardware: Surface Pro 4 (Problem also present on Surface Pro and my Desktop PC with FX8320)

birkenfeld · October 28, 2018, 7:57am

It's probably impossible to say more about this without seeing more of the code involved. Especially if you suspect a bug in the stdlib, a complete minimal example that shows the behavior is required.

As an aside, BufReader should be able to take a &mut ChildStdout, since Read is implemented for &mut T where T: Read.

spaeger · October 28, 2018, 10:35am

Thank you for your reply and your advices.

You were right about BufReader taking a &mut ChildStdout, so I could at least remove this trick code.
Unfortunately this does not solve my problem.

I will try to follow your advice to find a minimal example that shows the unexpected behaviour.

NobbZ · October 28, 2018, 10:52am

Might be trivial, but have you made sure, that your sender application properly flushes? As far as I read documentation of Write::write_all and its implementation, there is no implicit Write::flush involved.

Also make sure the data you write really ends in a newline. As you are using write_all rather than something that writes lines explicitely.

Without knowing anything about your code besides what you said, I'm wondering, why you aren't simply using writeln!.

spaeger · October 28, 2018, 11:10am

Thank you for your suggestions.

I actually did not call flush. But I included it for testing and it does not seem to solve the problem.
The newline is added at the beginning of my method to the message string, so this should not be a problem.

I also tested using the writeln!() macro, but this does not seem to solve the problem, either.
Are there any advantages of writeln!() over println!()? Propably the Result it returns?
It is possible that I did not know writeln!().

spaeger · October 29, 2018, 8:45pm

Okay, today we finally found the reason for this behaviour.
When thinking about it, it just could be some kind of unfortunate behaviour, or maybe it is a bug. This someone with more knowledge might want to decide:

The problem actually is the BufReader.
My method to read from ChildStdout is called every time to read one line. Every call I constructed a new BufReader with the stdout and read one line from it. Then the scope ends and the BufReader gets dropped.
My thoughts now are, that the BufReader probably reads more off the stream than actually neccesary for one line. But only the first line is used and the remaining data gets dropped at the end of the scope, not being available to the future BufReader. This way so many messages could get lost.

The problem (?) can be reproduced by the following simple application and one file input.log with some lines of text in the execution directory. The read lines are written for comparison into output.log.

use std::fs::{File, OpenOptions};
use std::io::{prelude::*, BufReader};

fn main() -> std::io::Result<()> {
    let mut read = File::open("./input.log")?;
    let mut write = OpenOptions::new()
        .write(true)
        .truncate(true)
        .create(true)
        .open("./output.log")?;

    loop {
        let mut reader = BufReader::new(&mut read);
        let mut line = String::new();

        match reader.read_line(&mut line) {
            Ok(0) | Err(_) => break,
            Ok(_) => write!(write, "{}", line)?,
        }
    }

    Ok(())
}

I will probably change my code to preserve one BufReader instance instead of reconstructing it every time.

If this is totaly expected behaviour, then maybe a sentence of warning could be added to the documentation to prevent others from having the same bad idea.

troplin · October 29, 2018, 8:50pm

This is totally expected. If you use buffered input and discard the buffer, you lose data unless the buffer is empty. This is inevitable, the BufReader cannot possibly know, how much data you will read before discarding and the content of the buffer cannot be written back to the original stream.

Yes, this is the correct solution.

spaeger · October 29, 2018, 9:12pm

Okay, this behaviour is totally understandable.
And since prohibiting the use of referenced readers for constructing the BufReader would not solve the issue (I could bypass this restriction with the proxy struct explained above), this probably can not be changed.

Would a hint in the documentation be beneficial and welcommed to warn people of doing the same mistake? Or is this too much for something one could think of when taking the time?

Riateche · October 30, 2018, 2:01pm

I think it would definitely be beneficial to add a warning about this to the BufReader docs.

spaeger · October 30, 2018, 2:17pm

How does suggesting such a documentation change work?

Riateche · October 30, 2018, 2:33pm

You can make a pull request for the rust-lang repo. See the contributing guide.

Topic		Replies	Views
Need help with external program comunication! help	9	668	May 31, 2023
Communication with child process through pipe help	14	2393	November 22, 2023
[design suggestion] Websocket in Rust code review	2	114	July 8, 2025
How to get process exit/error message? help	8	553	January 1, 2024
[RustBook] 20.3 Multithreaded WebServer sometimes goes into deadlock when shutting down help	7	779	January 12, 2023