Unsafe Tokio and Rusoto

lbolla · July 12, 2020, 5:45am

Hi all, recently I stumbled upon this bug in Rusoto. The problem seems to be that the Tokio runtime is shut before the full object/resource/file is downloaded from AWS S3 (i.e. some future is resolved) .There are workarounds to avoid the issue, so my question is broader.

Code that compiles successfully causes a bug about some file being truncated upon write. It looks like some resource (e.g. Tokio runtime) is unsafely closed too early. I looked at rusoto and I couldn't find unsafe code, so it must be Tokio.

This leaves me a bit uneasy about Tokio in general (and any other async library built on top of it, like rusoto): how can such a bug arise at all? How can one be sure that async libraries built on top of Tokio are implemented correctly if the compiler is not catching bugs like this?

alice · July 12, 2020, 6:09am

You talk about unsafe code, but I doubt that unsoundness in unsafe code is at fault here. To be clear, it is possible to write incorrect code that compiles without the reason being unsafe code.

I don't know anything about how rusoto and into_blocking_read is implemented, but afaik Tokio's TcpStream is tied to the runtime it is created on, so it makes sense that it would cease to work if the runtime is dropped.

lbolla · July 12, 2020, 6:28am

@alice Thanks for your answer. You can peek at how rusoto implements into_blocking_read here: rusoto/stream.rs at 36445473e046f16bc48509fdda7165abd68876dd · rusoto/rusoto · GitHub

It seems to me that rusoto may be at fault, but I am not expert in Tokio, so I can't be sure: maybe rusoto is wrongly implementing that trait.

My point remains though: how can one rely on third party libraries to correctly implement required interfaces, if the interfaces (traits) themselves are not checked by the compiler? In other words, shouldn't the required traits (in Tokio) be defined in a way so that third party libs cannot fail to implement them (i.e. must typecheck)?

alice · July 12, 2020, 6:33am

It's always going to be possible to write incorrect code. Can you be a bit more specific about which thing you think could be better staticly verified?

2e71828 · July 12, 2020, 8:46am

This has been a problem since the invention of modular programming, and is the primary reason some experienced programmers are wary of external dependencies. In roughly increasing order of rigor, you can:

Hope it all works out
Use popularity as a proxy for quality
Spot-check the library code to assess overall quality
Read the developer’s public writing about the project
Ask the developer questions about potential issues
Sign a support contract with the project developer
Perform a 3rd-party audit of the library code

2e71828 · July 12, 2020, 9:10am

After reading the linked bug, I suspect that @lbolla wants some kind of interlock that ensures the stream source outlives the consumer, at least until it produces a positive end state like an error or end-of-file. By extension, that would also put a requirement on the life of the runtime.

ZiCog · July 12, 2020, 9:55am

One of the first things I learned about Rust is that "unsafe code" has a very specific meaning in the context of a discussion about Rust. Namely code that uses the "unsafe" keyword to enable things to be done that cannot be checked for memory safety by the compiler.

As such it's probably better not say "unsafe" when talking about a bug in an application, library or other dependency. Unless it is known to be a problem with "unsafe" code of course.

I don't understand the details of this problem but following the links around it looks like "expected behavior". Some async thread is fire up and then the run time is dropped before it has finished it's job.

Is that not the same as starting real threads and then dropping off the end of main() without joining the thread some how and waiting for it to complete? Things just stop when you do that.

kornel · July 12, 2020, 2:58pm

https://github.com/rusoto/rusoto/blob/36445473e046f16bc48509fdda7165abd68876dd/rusoto/signature/src/stream.rs#L138

Creating and destroying new runtime for every read() call is a poor use of tokio. The reader should keep an instance of a runtime, or a handle, to ensure the runtime lives for as long as the reader lives.

lbolla · July 12, 2020, 5:58pm

Yes, some kind of typing/lifetime indication to ensure that the source outlives the consumer is what I had in mind.

lbolla · July 12, 2020, 6:03pm

@ZiCog You are right, I realize that I misused the term "unsafe". Better would have been "unsound" I guess, given that I am not sure where the bug actually is.
My hunch would be that lower level libraries like Tokio should have APIs that make it really hard (if not impossible) for higher level libraries to misuse them. In this specific case, it looks like rusoto is able to open and close a runtime without having actually received any data. I guess because tokio::io::AsyncRead::poll_read can return some form of "pending" future that does not mean that data has been fully consumed.

alice · July 12, 2020, 6:15pm

From what I understand, you did in fact use the word unsafe in the sense typically used in the context of Rust, although perhaps unsound would be more accurate. In any case, I still do not believe unsoundness is at fault here.

I do not think Tokio is a particularly low-level library. As for poll_read, it does not return a future at all, rather the poll_read method is of the same nature as the Future::poll method. The IO resource itself is more closely comparable to a future here.

Tokio has made the design choice that it should be possible to start multiple runtimes, and that it should be possible to shut down a runtime. Given these design choices, using lifetimes to ensure that the IO resource does not outlast the runtime is nearly impossible, and the few ways in which it is possible would involve introducing an extra parameter to every function that wants to, somewhere in its call-stack, create a new IO resource of any kind. It would also litter the code using Tokio with lifetime annotations. I do not believe that this is worth it, given that Tokio is often used by end users directly.

The other runtimes available have made the design choice that you can only ever have one runtime, and that it cannot ever be shut down. This sidesteps the issues, as IO resources can then never outlive the runtime.

lbolla · July 12, 2020, 6:33pm

@alice That's for your insight. So, in your opinion, where in this case, if any, is the fault? Is this simply a case of rusoto misusing Tokio?
If so, given the popularity of Tokio, which is used by more and more other libraries, is it unreasonable to worry that many other will run in similar issues?

alice · July 12, 2020, 7:22pm

To be honest, I need to understand what exactly caused the bug in question in more detail to really answer that. If TcpStream really does silently truncate the connection when a runtime is dropped, Tokio is at fault. Rusoto may also be at fault, depending on the exact details.

The snippet that @kornel linked is definitely a case of rusoto misusing Tokio, even if it is not necessarily the source of the bug. Creating a new runtime on every call to read is pretty damn expensive, considering that creating a runtime includes spawning a full thread pool, usually with 8 threads (depending on the number of cpu cores) just to perform a single read syscall on a TcpStream.

Generally, I think it is a bad idea to provide the async and blocking apis mixed together in the way rusoto does. The reqwest crate provides a very good example of what I would recommend regarding design of blocking apis to codebases using async/await. If you need to call block_on directly to use the blocking api of the library, then that library is doing something incorrectly.

It can also be something else, for example I notice that the linked bug has

reader.read_to_string(&mut string);

without any unwrap or question mark on that fallible operation, silently ignoring any errors. The snippet in question does not compile as is, but if that is also the case in the real code, then the author of that code is at fault for ignoring the compiler's #[must_use] warnings on fallible operations, assuming the read was not silently truncated, but instead returned an actual error.

As for worrying, Tokio and its ecosystem is actually very robust in most cases, and the vast majority of rough edges lie at the boundary between async and sync code. That said, if you avoid the following two things, you should be ok.

Using Tokio's tools outside of a Tokio runtime.
Moving IO resources between Tokio runtimes.

Prefer to finish your full IO operation in async code, instead of that half-block_on, half-blocking IO stuff that rusoto seems to want you to do.

alice · July 12, 2020, 7:57pm

Having looked at the relevant piece of source code, Tokio's TcpStream should indeed report an error if the runtime it is tied to is dropped.

system · October 10, 2020, 7:57pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
Request for review of a single line of unsafe code in async_executors help	3	414	May 22, 2020
Does Tokio::net::TcpStream::connect really block? help	10	283	February 2, 2024
Async newbie: help me improve my code! help	3	586	January 12, 2023
Cannot `shutdown` TcpStream after upgrading Tokio from 0.x to 1.x help	6	635	August 2, 2021
Understanding graceful shutdown and Drop	4	964	December 1, 2022

Unsafe Tokio and Rusoto

Related Topics