Announcing the `http` crate

Really happy to see this \o/.

I will test-drive it a little more tonight, but I integrated it into a lib I'm writing yesterday and it was quite simple.

On the subject of "http" being more of a standard set of types than a client: it'd be good to have a section in the documentation that lists clients and software built on top of it. It might be confusing for users at first that http is not a client or a server, but we can make a virtue out of that: people look for "http" and immediately get an overview of what's available.


The query is not required to consist of key-value pairs, so any map could only be a secondary representation anyway. The raw string needs to be available, parsing of any specific form can be added on top of that.
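A minimal sketch of what "parsing added on top of the raw string" could look like; `parse_query` is a hypothetical helper, not part of the crate:

```rust
// Hypothetical helper: key-value parsing layered on top of the raw
// query string, rather than baked into the URI type itself.
fn parse_query(raw: &str) -> Vec<(&str, &str)> {
    raw.split('&')
        .filter(|pair| !pair.is_empty())
        .map(|pair| match pair.split_once('=') {
            Some((k, v)) => (k, v),
            None => (pair, ""), // bare keys are legal in a query
        })
        .collect()
}

fn main() {
    // The raw string stays available; the map view is derived from it.
    let raw = "a=1&b=2&flag";
    assert_eq!(parse_query(raw), vec![("a", "1"), ("b", "2"), ("flag", "")]);

    // Not every query is key-value at all, so the raw form stays primary.
    let opaque = "some,opaque,payload";
    assert_eq!(parse_query(opaque), vec![("some,opaque,payload", "")]);
}
```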


Is it possible to iterate through all the headers without using the HeaderMap?

I'd like to forward something like the following without messing up the order:

FOO: 1
BAR: 1
FOO: 2

Using the HeaderMap I'll end up with this (if I understand correctly):

BAR: 1
FOO: 1
FOO: 2
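A toy model of why grouping changes the order, assuming nothing about the real HeaderMap internals beyond "same-named values are kept together":

```rust
// Toy grouped map, not the real HeaderMap (which uses a hash table):
// same-named headers are grouped together, so the *total* wire order
// is lost even though values under one name keep their relative order.
fn group_headers(wire: &[(&str, &str)]) -> Vec<(String, String)> {
    let mut names: Vec<&str> = Vec::new();          // first-seen key order
    let mut values: Vec<(&str, &str)> = Vec::new(); // side storage
    for &(name, value) in wire {
        if !names.contains(&name) {
            names.push(name);
        }
        values.push((name, value));
    }
    let mut out = Vec::new();
    for name in names {
        for &(n, v) in &values {
            if n == name {
                out.push((n.to_string(), v.to_string()));
            }
        }
    }
    out
}

fn main() {
    let wire = [("FOO", "1"), ("BAR", "1"), ("FOO", "2")];
    // FOO's two values stay in order, but BAR no longer sits between them.
    assert_eq!(
        group_headers(&wire),
        vec![
            ("FOO".to_string(), "1".to_string()),
            ("FOO".to_string(), "2".to_string()),
            ("BAR".to_string(), "1".to_string()),
        ]
    );
}
```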

Not yet, but it will be by RustConf (it doesn't quite work yet). You can ping me on IRC if you want more info.

Yes, the order there would be changed. Is there a situation where this is important? While the order of headers with the same name matters, order of separate headers should not matter at all.


I can wait for the conference :slight_smile: Thanks for the info.

Implemented in minttp.

I'm very excited to finally have this!

Any powers that be that are reading: Is there any hope of making forward progress on the stabilization of the TryFrom trait? I've been trying to get attention on it for over a year, and it's becoming increasingly frustrating to see high profile crates like this have to use workarounds, e.g.:

We're mildly grumpy about it too, but when it does stabilize, we can just deprecate the internal trait and use the new one.

Is there buy-in from all the major HTTP client/server implementations to move to a shared dependency, and will that be a breaking change for those of us who are downstream users of Reqwest or Hyper?


Thanks for taking the time for such a detailed writeup! For any of your points, if you'd like to pursue them further, I highly suggest opening an issue on the repo. For now, I'll try to address the points inline.

  1. This would result in an 'explosion' of lifetimes, which would be very unfortunate. Even when reading with synchronous IO, you'll have needed to copy the bytes out of the socket into a buffer somewhere. And having done that, if you hope to reuse the same buffer to read the body, you would invalidate the lifetimes in a Request. So you will most likely need a separate buffer, or a copy. In both cases, Bytes is optimized to make that as cheap as possible: it has small-string optimization (SSO, or SBO) built in, and can share a single buffer across all the values.

  2. There is Bytes::from(String) which is free. The bytes aren't copied, it just assumes ownership of the buffer used by the string.

  3. I'm sorry, I don't really understand what this is.

  4. The T body is specifically generic because we don't want to declare where a body has to come from, or even what shape a body is. It could be a TcpStream, or a futures::Stream, or some parsed type, like Comments. Being generic allows it more flexibility, both around what frameworks can pop up, and how people can use them internally in their own apps.

  5. I believe websockets are out of scope of the crate. As long as it doesn't do anything to prevent usage, how exactly websockets are handled can be done by another crate. Is there something you see in the current design that would prevent people from using websockets?
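The generic-body idea in point 4 can be sketched in a few lines; this mirrors the shape of the API, not its exact code:

```rust
// Minimal sketch of a generic-body request type: the request doesn't
// care what the body is, and `map` swaps one body type for another
// without touching the rest of the request.
struct Request<T> {
    uri: String,
    body: T,
}

impl<T> Request<T> {
    fn map<U, F: FnOnce(T) -> U>(self, f: F) -> Request<U> {
        Request { uri: self.uri, body: f(self.body) }
    }
}

fn main() {
    // Body starts as raw bytes ("10" in ASCII)...
    let raw = Request { uri: "/comments".to_string(), body: vec![49u8, 48] };
    // ...and is mapped into a parsed type without rebuilding the request.
    let parsed: Request<u32> = raw.map(|bytes| {
        String::from_utf8(bytes).unwrap().parse().unwrap()
    });
    assert_eq!(parsed.body, 10);
    assert_eq!(parsed.uri, "/comments");
}
```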

I can tell you there is 'buy in' from hyper and reqwest :slight_smile:

It will mean a breaking change unfortunately, but it's for the best long term.


HTTP proxies using this crate might want to preserve header order.

Edit: Is it possible to use just the header-parsing part?

I understand the desire. Still, the exact order of headers with different names is defined as insignificant in the relevant specs. Keeping the exact order would likely require sacrifices in the performance of the hash map. Libraries in other languages tend to all have the same behavior, too.

Proxies should also be filtering headers, so they shouldn't just pipe the raw stream through...

What part is this? Parsing headers from a byte stream? That doesn't exist in this crate, as it is specific to HTTP versions. If you take something like httparse, it doesn't use a hash map, and the exact order is kept. It just gives you essentially an array of key-value pairs.
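An order-preserving parse in the spirit of what httparse hands back, written as a simplified std-only sketch (this is not httparse itself):

```rust
// Order-preserving header parse: the result is a plain list of
// name/value pairs, no hash map involved, so exact wire order
// survives, duplicates and all.
fn parse_headers(raw: &str) -> Vec<(&str, &str)> {
    raw.lines()
        .filter(|line| !line.is_empty())
        .filter_map(|line| line.split_once(':'))
        .map(|(name, value)| (name.trim(), value.trim()))
        .collect()
}

fn main() {
    let raw = "FOO: 1\r\nBAR: 1\r\nFOO: 2\r\n";
    assert_eq!(
        parse_headers(raw),
        vec![("FOO", "1"), ("BAR", "1"), ("FOO", "2")]
    );
}
```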

Yes. I thought the crate translated header values into Rust types but looking at the docs I see that it keeps them as Bytes.

An example use case is where you are passed a borrowed reference to a slice containing a bunch of URLs, and then want to perform HTTP requests for them using a synchronous API.

Currently, you would first need to copy from the subslices of the slice to a Bytes object, while with lifetimes you could just pass the borrowed subslice as the URL.

It's debatable whether the additional complexity is worth it, but not having this does limit the "universality" of the HTTP types.
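The lifetime-based alternative being described might look like this; `BorrowedUri` and `request_all` are hypothetical names, not crate types:

```rust
// A URI wrapper that borrows from the caller's slice, so no copy
// into an owned buffer is needed.
struct BorrowedUri<'a> {
    raw: &'a str,
}

fn request_all<'a>(urls: &'a [&'a str]) -> Vec<BorrowedUri<'a>> {
    // Each request borrows its URL straight out of the input slice.
    urls.iter().map(|&u| BorrowedUri { raw: u }).collect()
}

fn main() {
    let urls = ["http://example.com/a", "http://example.com/b"];
    let reqs = request_all(&urls);
    assert_eq!(reqs.len(), 2);
    assert_eq!(reqs[0].raw, "http://example.com/a");
    // The trade-off: every type holding a BorrowedUri now carries the
    // 'a lifetime, which is the "explosion" of lifetimes noted earlier.
}
```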

That works for String, but not for other arbitrary containers, since the ownership model, memory layout, and memory allocator all need to be the ones expected by ByteStr.

Using a generic parameter in place of ByteStr would allow usage of any type that implements AsRef.

Again, it's debatable whether the additional complexity is worth it, but not having this does limit the "universality" of the HTTP types.
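A sketch of the generic-container idea, assuming `AsRef<str>` as the bound; the `Uri` type and its mini-parse are illustrative only:

```rust
use std::rc::Rc;

// Instead of a concrete ByteStr, the type takes any T: AsRef<str>,
// so String, &str, Rc<str>, or a custom buffer all work.
struct Uri<T: AsRef<str>> {
    raw: T,
}

impl<T: AsRef<str>> Uri<T> {
    fn path(&self) -> &str {
        let raw = self.raw.as_ref();
        // Hypothetical mini-parse: everything after the authority.
        match raw.find('/') {
            Some(i) => &raw[i..],
            None => "/",
        }
    }
}

fn main() {
    let owned = Uri { raw: String::from("example.com/index.html") };
    let borrowed = Uri { raw: "example.com/index.html" };
    let shared = Uri { raw: Rc::<str>::from("example.com/index.html") };
    assert_eq!(owned.path(), "/index.html");
    assert_eq!(borrowed.path(), "/index.html");
    assert_eq!(shared.path(), "/index.html");
}
```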

It was about the fact that to be truly generic, the HTTP types need to preserve EVERYTHING about a parsed request or response, so that parsing and encoding results in an output that is bit-identical to the input to the parser, for any valid request or response.

This means that the order of headers, the amount of whitespace between the method and the URL, and all other details with no semantic meaning need to be recorded.

The main reason to have this is that it prevents detection of transparent proxies that manipulate HTTP connections, and allows a new program to successfully masquerade as a different pre-existing program.

For example, the people running an HTTP server may desire to only allow connections from their own client "app", and making connections from other software (such as a transparent proxy partially manipulating the "app"'s requests without its knowledge, or a clone of the "app") might require replicating exactly the requests they make including non-semantic details like header ordering and whitespace, to prevent detection and banning from the server.

Likewise, in general, it is very good practice for a generic transparent proxy (for example, one that re-encodes most images to a lower quality and size, possibly changing their MIME type, to save bandwidth) to modify the HTTP stream as little as possible, reducing the chance of being detected and of introducing breakage in clients or servers that assume the non-semantic information is in a certain format.

Obviously, though, all this should be optional, so there should probably be two versions of the data structures: one with and one without the non-semantic information.

It's great to support any body at the lowest layer, but interoperability requires agreement between e.g. web server implementations and web applications, so at some point it needs to be specified what traits the body should support for such interoperable usage.

Overall, a decision needs to be made on the scope of this crate.

The current design seems quite good for HTTP types designed for use solely with Tokio-based APIs that only care about semantically-meaningful details, but it is not general enough to provide HTTP-related types for any possible use.

So one needs either to document the current limited scope (and perhaps name the crate "tokio-http" or similar) or to extend the design.

Thanks for clarifying! Again, if you'd like to pursue any of these, it'd be best to open a specific issue on the repository, so that it doesn't get lost and people can focus on discussing that single point.

As for being useful for any possible use, it becomes impossible to balance everything. We certainly have tokio use cases in mind, but also tried to be sure it wasn't favored. It should be a great fit for most cases, including just throwing a parser into the mix with blocking TcpStreams.

I've been doing this (repo is still private, hoping to make it public soon) and it works well :+1:

Perhaps use indexmap (bluss/indexmap on GitHub, "a hash table with consistent order and fast iteration; access items by key or sequence index") to get a hash map that retains the order?

Actually, the implementation of HeaderMap is heavily based on OrderMap. So, distinct keys will maintain order. However, the multi map behavior is implemented by having side storage for multi values... that is why the order is not maintained.

It would be possible to maintain total order, however it would definitely add some cost (I don't know how much).

If you are interested in pursuing this, you should champion an issue on the repo :slight_smile:

The good news is that adding a total order guarantee would be a backwards compatible change (right now, only deterministic order is guaranteed), so it could happen at any time.
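What a total-order guarantee means can be illustrated with the simplest structure that has the property, at the cost the posts above mention; `OrderedHeaders` is a toy, not a proposed implementation:

```rust
// The simplest total-order structure is just a Vec of pairs, trading
// the hash map's O(1) lookup for exact wire-order iteration.
struct OrderedHeaders {
    entries: Vec<(String, String)>,
}

impl OrderedHeaders {
    fn new() -> Self {
        OrderedHeaders { entries: Vec::new() }
    }

    fn append(&mut self, name: &str, value: &str) {
        self.entries.push((name.to_string(), value.to_string()));
    }

    // Lookup is now a linear scan: the performance cost of total order.
    fn get_all<'a>(&'a self, name: &str) -> Vec<&'a str> {
        self.entries
            .iter()
            .filter(|(n, _)| n.as_str() == name)
            .map(|(_, v)| v.as_str())
            .collect()
    }
}

fn main() {
    let mut h = OrderedHeaders::new();
    h.append("FOO", "1");
    h.append("BAR", "1");
    h.append("FOO", "2");
    // Total wire order preserved, unlike a grouped multimap.
    let order: Vec<&str> = h.entries.iter().map(|(n, _)| n.as_str()).collect();
    assert_eq!(order, vec!["FOO", "BAR", "FOO"]);
    assert_eq!(h.get_all("FOO"), vec!["1", "2"]);
}
```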
