On tokio reform and hyper 0.12

I'm trying to make body-image fully asynchronous-friendly for use with tokio. Tokio reform and hyper 0.12 (initially, the master branch) offer more of what I need to do this. Below I provide some rough notes on progress with some embedded questions. Warning: long.

http headers

In most cases the byte-oriented http crate HeaderMap is fine for body-image, but it does need to parse Content-Length, Content-Encoding and Transfer-Encoding headers, for which I was relying on the typed hyper::header module as of hyper 0.11. The entire module was removed in hyper 0.12. There was a promise made to split it out as part of the next release, and then a redaction. I asked about this again two weeks ago but didn't hear back. So, to ease my migration and testing, I forked the header module into a new hyperx crate:

https://github.com/dekellum/hyperx

For extra-snarky humor value, pronounce it /hī ˈpərks/.

Its released on crates.io and you may find it helpful if you are in a similar situation.

tokio reform

This was/is tokio-core's run method:

pub fn run<F>(&mut self, f: F) -> Result<F::Item, F::Error>
where F: Future

The signature losses a return value and the Future associated types in tokio reform, at least with the default runtime and executor:

pub fn run<F>(future: F)
where F: Future<Item = (), Error = ()> + Send + 'static

I'm sure there must be good reason for this. Anyone know what it is? The Tokio reform RFC doesn't mention it, and I can't find any other conversation about it. (See below, I later discovered the more convenient tokio::runtime::current_thread::Runtime.)

hyper 0.12 client

I'd like to point out one thing not explained in this terse example and also here. If the Client::new() line is moved out of the closure, up to the beginning of main (like anyone would try in order to reuse it), the run method never returns. I suspect I will not be the last one to burn some time on that.

Here is my first reasonable working version of the upgrade, using a one-shot channel, and also with my first public use of stable impl Future. Regarding the later, I have a private function that remains returning a Box<Future> as apparently impl Future can currently only hide one type at a time?

Later I found it was even cleaner to use the current_thread::Runtime and its block_on method, at least in my tests. However, I suspect I will need to revert or replace this for cases where BodyImage is using tokio annotated blocking I/O. (A cool new method of annotating blocking I/O or slow computations without blocking the tokio event loop, by simply moving the event handling to another thread.)

timeout support

This area seems sketchy with hyper 0.11, so I left it as a FIXME. Here I was able to add timeouts using tokio-timer, which appears to work as expected and is much cleaner then the example originally given for the hyper 0.11 guide. I have two tunable timeout intervals, one for any response and another for a complete body. Any suggestions for improving this?

HTTPS dependencies

cargo tree --all-features -d shows I now have some duplication of dependencies via openssl(-sys) and (hyper(tokio(native)))-tls.

bitflags v0.9.1
└── openssl v0.9.24
    └── native-tls v0.1.5
        ├── hyper-tls v0.2.0
        │   └── body-image v0.3.0 (file:///home/david/src/body-image)
        └── tokio-tls v0.1.4
            └── hyper-tls v0.2.0 (*)

bitflags v1.0.3
└── clap v2.31.2
    └── body-image v0.3.0 (file:///home/david/src/body-image)

lazy_static v0.2.11
└── native-tls v0.1.5
    ├── hyper-tls v0.2.0
    │   └── body-image v0.3.0 (file:///home/david/src/body-image)
    └── tokio-tls v0.1.4
        └── hyper-tls v0.2.0 (*)

lazy_static v1.0.1
├── crossbeam-epoch v0.4.1
│   └── crossbeam-deque v0.3.1
│       └── tokio-threadpool v0.1.3
│           ├── tokio v0.1.6
│           │   ├── body-image v0.3.0 (file:///home/david/src/body-image)
│           │   ├── hyper v0.12.1
│           │   │   ├── body-image v0.3.0 (file:///home/david/src/body-image) (*)
│           │   │   └── hyper-tls v0.2.0
│           │   │       └── body-image v0.3.0 (file:///home/david/src/body-image) (*)
│           │   └── tokio-core v0.1.17
│           │       └── tokio-tls v0.1.4
│           │           └── hyper-tls v0.2.0 (*)
│           └── tokio-fs v0.1.0
│               └── tokio v0.1.6 (*)
└── openssl v0.9.24
    └── native-tls v0.1.5
        ├── hyper-tls v0.2.0 (*)
        └── tokio-tls v0.1.4 (*)

Output of cargo tree --all-features -r hyper-tls is (elided) below. Am I using the wrong crate(s) for HTTPS at this point? Why do I still have a dependency on tokio-core and older versions of lazy_static and openssl? Where and what PR's should I submit to help clean this up?

hyper-tls v0.2.0
├── native-tls v0.1.5
│   ├── lazy_static v0.2.11
│   └── openssl v0.9.24
│       ├── bitflags v0.9.1
│       ├── lazy_static v1.0.1 (*)
│       └── openssl-sys v0.9.32
└── tokio-tls v0.1.4
    ├── native-tls v0.1.5 (*)
    ├── tokio-core v0.1.17
    │   ├── scoped-tls v0.1.2
    │   ├── tokio v0.1.6 (*)
    │   ├── tokio-executor v0.1.2 (*)
    │   ├── tokio-io v0.1.6 (*)
    │   ├── tokio-reactor v0.1.1 (*)
    │   └── tokio-timer v0.2.3 (*)
    └── tokio-io v0.1.6 (*)

As this post might appear a bit critical: In general, the crates involved appear to be rapidly maturing and I'm excited about tokio reform, the hyper 0.12 release, and to have gotten this far with my project. Thanks for all this work!

1 Like

I'm preparing an 0.2 release of native-tls which will upgrade to openssl 0.10, which will bump the old lazy_static and bitflags versions coming from there. tokio-tls hasn't been updated to remove the tokio-core dependency, but we can do that when pulling in the native-tls bump as well.

3 Likes

This is because the future is used only to bootstrap the (default) runtime - the future is otherwise scheduled just like any other that may get spawned later. The function doesn’t return until the entire runtime is idle/quiesced. It wouldn’t make sense to return anything.

In tokio_core, the root future actually keeps the event loop running - if it completes, the loop stops. In that case, it made sense to return the future’s result.

3 Likes

Thanks @vitalyd, upon rereading the rustdoc for (reform's) tokio::runtime::run I see it does make the same points reasonable clear. I'm still left thinking that the Hyper 0.12 client examples (previously linked) are at least deceptive if not contra-educational.

With the the hyper.rs guide of 0.11 (wayback machine or github commits), the error-handling free, happy-path-only examples that contributed to my epic confusion on hyper's error design and the availability of Future::map_err. I've learned some more rust since then! :slight_smile:

Shouldn't new hyper 0.12 client examples stop flirting with the halting problem by using run with very un-intuitive, implicit Client drop and the entire application in one future? These should instead use either current_thread::Runtime::block_on or some combination of (default) Runtime::spawn and shutdown_on_idle, with an explicit drop(client) if it remains necessary?

Otherwise, these happy-path examples are more fit for marketing brochures and less like guides for beginner-to-intermediate rust engineers.

A related point of confusion, question: I don't yet really understand the purpose (or need for) future::lazy(|| {}), as seen in many of these examples. As I was making my updates I ended up adding it in and taking it back out several times, with no visible effect. What's up with that? Or where is that better explained?

How is the halting problem related to Hyper's examples? rt::run does exactly what you proposed - Runtime::spawn and shutdown_on_idle: mod.rs.html -- source.

At first I thought you were saying hyper::rt::run wasn't a plain wrapper for tokio::runtime::run. The fact that I had to go check (it is a plain wrapper) shows the cost of the indirection.

Its not exactly what I propose because its hidden behind two functions in the example with no opportunity for comments, e.g. "this blocks", "this doesn't block". Also I suspect that a drop(client) is needed between the last spawn and the shutdown, where some foolish-user might attempt to re-use the client and, for example, benefit from connection keep-alive.

Some futures may start doing work immediately after creation. If you want to prevent that, you can wrap them in lazy() so that the future is created only when it's polled (the first time) by some executor. So Lazy is sort of like a "holder" for a ... future Future :slight_smile:.

I agree that this isn't immediately clear in the docs, particularly when futures are typically referred to as being lazy and have a #[must_use] on them.

1 Like

If you can bare with me @sfackler, I have a branch to further this:

  • 421ea809 demonstrates the halting problem
  • 81024459 fixes the halt by being explicit, less magical, and more educational.
  • e688e39a improves the associated client doctest in the alternative and more compact way, using current_thread::Runtime and block_on

I'll hope to get some feedback here before attempting the PR gauntlet. :blush:

@vitalyd, followup questions per your explanation of futures lazy:

  1. Should one expect, when reviewing code using lazy, that there is some order of execution issue that is being avoided by its use? In which case, (1.1) should there not typically be a compile-time error if it was removed? Or lacking that (1.2) a runtime panic or error return?
  2. Are there cases where there might be a performance advantage in delaying with lazy? Any common scenarios?
  3. If none of the above hold true, would it be fair to say a use of lazy is a needless complication or poor style?

It might be fair to say that lazy() is fairly niche. What are some examples that you've seen using it?

The hyper 0.12 client example uses lazy for reasons not commented. In ef7df797 I drop the lazy to see what happens. In this case, it compiles, outputs the entire body, then halts. I don't understand why. The Client should still be dropped?

Per my other linked branch and changes, I advocate for a less magical use of futures and tokio. This anomaly tells me there are more unknown unknowns.

This is really a question for @seanmonstar, but there's a difference in order of operations there. With lazy, the Client isn't constructed (or any of its futures) until the default tokio runtime is running. Without lazy, the Client stack is started first, and then triggers default tokio runtime initialization. There's likely some "ambient" difference in how the client hooks into the runtime between the two cases. Without the lazy, the default runtime continues running after the response is received, presumably because no "completion" is triggered and so it thinks there are active background tasks. But I'm just speculating ...

1 Like

It would be nice to get some high level guidance before submitting a PR for proposed docs/guide changes, but I'm not going to hold my breath. Thanks!

If I were you, I'd submit GH issues for hyper on whatever examples/guidance/docs you find inadequate. @seanmonstar and other hyper maintainers can then work with you (if necessary) on making them better. For instance, explaining why lazy is needed in the discussed example.

It's sometimes hard to know what to document, particularly if you know the code/project well, because it's easy to lose sense of what a beginner would find difficult. So while you still have that perspective, I'm sure the hyper project would find it useful for you to itemize.

3 Likes

I agree it can be pretty confusing, and I racked my brain trying to come up with a better solution. I'll explain the problem first:

  • hyper::Client::new() by itself doesn't do anything.
  • The Client has a connection pool inside it, which at initialization is just empty.
  • You need to run a request future on a runtime. Once the request and response are completed, if the connection could be used again, it is inserted into the connection pool inside the Client.
  • The "connection" is actually more a handle to an HTTP state machine Future that was spawned into the runtime, since it needs to listen for hangups (and a whole lot more if it is HTTP2), even if the user isn't polling the Client.
  • That HTTP state machine also knows that when the Client is dropped, it can shutdown, since a user can't possibly use it again.
  • The Tokio runtime will run until idle, which means that all spawned futures have completed.
  • The kept-alive HTTP state machine isn't completed yet, since it could be used to start a new request.
  • While you can visually verify that you don't send any more requests on the Client, there compiler isn't able to determine that. If the Client is created in the same function scope that the runtime is blocking, the function won't exit until the runtime is idle. This prevents the Client from being dropped, and so they just sit there, blocking.

By making use of future::lazy, the scope of the Client is moved, so that it doesn't live as long as the runtime. Rather, in the example, once the request future is created and returned, the Client is dropped (because the closure exits). Once the request future completes, the client HTTP state machine can also drop, since it knows there's no more connection pool. Then, the runtime can exit, since it is idle.

4 Likes

Makes perfect sense, if not being unfortunate - thanks for explaining Sean.

I'm sure you've thought of this, but perhaps Client::get should consume self and return it back in the ResponseFuture? Since it's Clone, a caller that wants to send multiple requests concurrently can just clone the instances (not sure how expensive that is, though). Alternatively, maybe there can be a get_multi() method that takes an iterator of Uris and returns a Stream representing their completion, but still consumes self.

Thank you very much @seanmonstar for your work and detailed response here (pleasantly surprised). It would of course be super cool if rustc was able to verify that a user doesn't get the Client drop vs. Runtime shutdown order wrong (compiler error in this case). While I don't think I have any better ideas on a compile-time fix, I do believe that there is room for rustdoc and guide improvements that will help users to avoid this particular footgun as they do things like try to reuse the Client for multiple requests. I will submit a PR with hyper and hope you will consider it then.

As of tokio 0.1.7, the default Runtime gets a block_on method as well. So apparently others found the signature changes of reform's run/spawn restrictive. The implementation is similar to my initial, so now I'm using that: 82a98378

A few updates on progress:

HTTPS

hyper-tls 0.3.0 was released today and native-tls previously, with updates resolving the prior dependency duplicates and old tokio-core:

hyper-tls v0.3.0
├── native-tls v0.2.0
│   ├── openssl v0.10.10
│   │   ├── bitflags v1.0.3
│   │   ├── cfg-if v0.1.3 (*)
│   │   ├── foreign-types v0.3.2
│   │   │   └── foreign-types-shared v0.1.1
│   │   ├── lazy_static v1.0.1 (*)
│   │   ├── libc v0.2.42 (*)
│   │   └── openssl-sys v0.9.33
│   │       └── libc v0.2.42 (*)
│   ├── openssl-probe v0.1.2
│   └── openssl-sys v0.9.33 (*)
└── tokio-io v0.1.7 (*)

Thanks @sfackler and others! Current tokio development, suggests that this setup code is also due a change, not yet made in the above updates:

let connector = hyper_tls::HttpsConnector::new(13 /*DNS threads*/)?;
let client = hyper::Client::builder().build(connector);

Namely, the HttpsConnector should be able to use tokio's threadpool rather than its own dedicated pool, even if it needs to use blocking calls (via backup threads). Perhaps this awaits tokio #363 for such an overhaul?

hyper 0.12 client

Thus far I haven't succeeded to improve the hyper client documentation, for the next developer, based on issues encountered during upgrade and described in this thread. See hyper #1571 and referenced PRs.

body-image and tokio reform

I've just released body-image 0.3.0, which besides all the upgrades, includes a more composable and impl Future returning request_dialog and a fully asynchronous (blocking annotated) AsyncBodySink adapter.

Thank you!!!! I spent couple of days trying to figure out how actually do post request and return some result. All the current hyper examples and guides are not doing anything interesting with result, except of printing it out, you examples are so much better!