What does it take to make an HTTP request

Hello,

Every so often, I have code that needs to make an HTTP request. To me, making an HTTP request feels like something that is very common and very basic.

So I'm looking on crates.io, and it seems one of the most popular crates for HTTP requests is reqwest. Since it is async, which I heard is a good thing for networking, I also need an async runtime, and it looks like the most popular is tokio.

So I'm adding reqwest and tokio to my project, and now suddenly Cargo pulls in what looks to me like a very long list of dependencies (even if I only use the "rt" feature for tokio). And all I wanted was to make an HTTP request, which I thought was a very basic and very common thing to do.

So how should I think about this?

  1. "Yeah, making an HTTP request in Rust is just awkward. There should be a popular choice with far less overhead, but we aren't there yet."
  2. "What do you mean HTTP is simple? It's very challenging to do it right! Be glad that the smart people from reqwest and tokio worked so hard to make it happen the right way!"
  3. "Well, that's your own fault you are choosing such heavy-weight solutions. Simply use something light-weight, such as ..."
  4. "There's no reason to worry about the dependencies. They won't burden you in any significant way. Just relax and keep using reqwest and tokio and all will be perfectly fine."
  5. "What do you mean many dependencies? That's nothing! The average Rust project has many more!"

So what do people think? Thanks!

8 Likes

Directly after reqwest comes the also very popular ureq, which seems to be what you want.

So 3 :smile:

6 Likes

I think this is the place where you're most likely going astray. Async is good, excellent even, and people using Rust often need async because of the kinds of things people attracted to Rust are doing or find interesting. But if you just want to make an HTTP request and think of it as something common and basic, you most likely don't fall into that category.

In other words, if you need an async runtime, you probably know you need an async runtime, and have some need for tokio / its dependencies, beyond just making HTTP requests.

I believe ureq is frequently recommended as a simple, blocking HTTP client implementation. Its dependencies, while not exactly tiny, are significantly more limited than reqwest's, and mostly pretty understandable. (I haven't used it non-trivially myself.)

5 Likes

I tried hyper first, and it's really difficult. Then reqwest, which is easier anyway. Nowadays io_uring might be a really cool choice, but it's Linux-only.

Let's get back to the question.

Tokio's deps:
bytes: a convenient [u8] abstraction, maybe. Although many say it's a redundant abstraction nowadays.
pin-project: to pin data and make self-references safe.
libc: unavoidable, as its name suggests.
mio: makes I/O operations easier.
parking_lot: faster concurrency primitives.
signal: as its name says, signal handling.
tokio_macros: pulls in syn and the rest of the proc-macro machinery, i.e. compiler-ish things like lexical and syntax analysis. Those are quite hard things.
tracing: for logging and debugging.
windows_sys: Rust bindings to the Windows API.

reqwest's deps:
encoding and decoding crates like serde (for struct transformation) and base64; surely many crates for string handling.
http_*: makes things convenient so you don't have to hand-write socket.write(b"HTTP/1.1 200 OK\r\n\r\n...").
futures_*: Future::poll and friends; many runtimes are built on top of this.
hyper: the HTTP implementation itself, handling the exchange between endpoints.
*_tls: network security.
mime: media-type headers in the HTTP protocol.
quinn: the cool QUIC protocol.
slab: fewer memory allocations.
cookie: not only parses responses, but also stores the cookies. It also pulls in sha256, and then ring and other cryptography crates.
time: string conversion and parsing.
tower: complex things around the Service trait, which is also used in axum.

I think they are unavoidable and they make maintenance easier, especially given everything that's involved: complex protocols, an async runtime, string parsing, macros, allocation, network security, and multi-platform support.

Well, it really depends on what you need from that request.

Are you happy with oneshot HTTP 1.1, or do you need keepalive? Are you expecting to get HTTP/2 server pushes? Are you trying to do multiple QUIC streams on one connection with HTTP/3?

HTTP can be easy, but it can also be really hard.

11 Likes

Note that if you use ureq specifically to reduce dependencies (and the cookies feature is enabled), you may want to consider turning off IDNA support in idna_adapter. This is something you can do if your use case doesn't need support for internationalized domains.

(It may be worthwhile doing this for reqwest, too. But the overall impact is smaller compared to its other dependencies.)

If you want the ghetto approach, you can open a TCP socket, hardcode an HTTP 1.0 request with a b"string", ignore error handling, hope you get back a success, and look for the body by scanning for the first \r\n\r\n. No dependencies needed, but all the bugs are your own.
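
A minimal sketch of that zero-dependency approach, using only the standard library and assuming a plain-HTTP (no TLS) server at example.com:

use std::io::{Read, Write};
use std::net::TcpStream;

fn main() -> std::io::Result<()> {
    // Plain TCP, no TLS: this only works against http:// servers.
    let mut stream = TcpStream::connect("example.com:80")?;

    // Hardcoded HTTP/1.0 request. 1.0 means the server closes the
    // connection after responding, so we can simply read to EOF.
    stream.write_all(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")?;

    let mut response = Vec::new();
    stream.read_to_end(&mut response)?;

    // "Parse" by scanning for the first blank line that separates
    // headers from body. All the bugs are our own.
    if let Some(pos) = response.windows(4).position(|w| w == b"\r\n\r\n") {
        println!("{}", String::from_utf8_lossy(&response[pos + 4..]));
    }
    Ok(())
}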

Alas, it stops being so easy when the other side insists on TLS.

Are you happy with oneshot HTTP 1.1

1.1 already allows chunked transfer encoding, which complicates things. For dead-simple, I'd stick to 1.0.
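
For illustration, here's roughly what a chunked 1.1 response looks like on the wire (CRLFs written out explicitly; each chunk is prefixed by its size in hex, and a zero-size chunk ends the body):

HTTP/1.1 200 OK\r\n
Transfer-Encoding: chunked\r\n
\r\n
4\r\n
Wiki\r\n
5\r\n
pedia\r\n
0\r\n
\r\n

With 1.0 there's none of this: headers, a blank line, then the raw body until the connection closes.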

9 Likes

Rust likes to do things well, and making "a request" properly is actually a massively complicated job.

Nowadays you have to have HTTPS, and that's a whole bunch of cryptography, protocol management, certificate parsing, access to CA cert store, etc.

There are 3 major versions of HTTP, you may want to support at least two of them. That's another bunch of parsers, state machines, and connection management.

To make a request you need a URL, and there's like a dozen RFCs for this. IPv6 is a thing now, hopefully. It can appear in URLs too.

Compression on HTTP level can be helpful, and there are a few options to choose from.

And then you may want a nice interface for reading those responses, possibly with a streamed body, and standard structures for getting and setting headers.

When you're fetching text formats, they can come in many different character encodings, which requires a bunch of conversion routines and data tables.

You may need authentication, and cookies (a very messy spec – one that needs a whole database of domains requiring special treatment).

There are internationalized domains, with their own special Unicode rules.

Every decent HTTP library has it all, and more. The main difference is that when you use pre-built libraries from other ecosystems you don't see all of this stuff being compiled.

libcurl has all this stuff, and on top gives you support for 3 different e-mail protocols even though you got it just for HTTP requests.

17 Likes

I don’t think this complexity is unique to Rust or any specific language. Nor is it solely an issue with HTTP. However, HTTP serves as a good example to explore the root causes of this complexity, often criticized as "bloat" or "dependency hell."

In my opinion, much of this complexity arises from not adhering to Alan Kay’s famous principle:

Simple things should be simple, complex things should be possible.

In this context, maybe the question is: should a request library really handle the content of a response?

For instance, Rust has standardized on UTF-8 for strings. Therefore, a request library only needs to handle raw bytes ([u8]) and UTF-8 strings, nothing more. If a user needs to handle a different character encoding, the library should leave the decoding to the user.
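
A sketch of what that split could look like, assuming the encoding_rs crate on the caller's side (fetch_bytes is a hypothetical stand-in for whatever the client returns):

// The library hands back raw bytes; decoding is the caller's job.
fn fetch_bytes() -> Vec<u8> {
    b"caf\xE9".to_vec() // "café" encoded as Windows-1252, not valid UTF-8
}

fn main() {
    let raw = fetch_bytes();
    // The caller picks the decoder, so the HTTP library never has to
    // ship conversion tables for every legacy encoding.
    let (text, _encoding, had_errors) = encoding_rs::WINDOWS_1252.decode(&raw);
    assert!(!had_errors);
    println!("{text}"); // prints "café"
}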

This approach avoids bundling large code tables and unnecessary transitive dependencies. It also empowers users to choose the right tool for their needs, whether it's a lightweight crate for a specific encoding or a comprehensive library with all the bells and whistles.

The same logic applies to features like IDN handling, authentication, cookies, and so on. A request library should focus entirely on its core responsibility, handling requests and responses, while giving users the flexibility to extend it as needed.

Another benefit of this approach is that library authors aren't forced into tough decisions when choosing between competing implementations. For example, when deciding between libz and a pure-Rust alternative for compression, the library can remain agnostic. Otherwise, you end up introducing feature flags or locking users into a choice. And what happens when a newer, faster compression library arrives? The cycle repeats.
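
For illustration, the feature-flag route tends to look something like this in a library's Cargo.toml (the crate choices here are just examples):

[dependencies]
flate2 = { version = "1", optional = true }
zstd = { version = "0.13", optional = true }

[features]
default = ["gzip"]
gzip = ["dep:flate2"]  # one compression backend...
zstd = ["dep:zstd"]    # ...and every alternative becomes another flag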

Of course, this approach isn’t without challenges. There's always the temptation to support every niche use case, and it can be difficult to say "no". Adding a dependency often feels as simple as running cargo add, making it easy to overlook the implications.

2 Likes

I think you make a very good point wrt complexity. HTTP has five versions: 0.9, 1.0, 1.1, 2, and 3.

Of these, the first two (0.9 and 1.0) are nearly universally uninteresting to support, and doing so adds incidental complexity. The same can be said of SSL 1.0, 2.0, and 3.0, and of TLS 1.0 and 1.1 (and a very long tail of deprecated cipher suites).

If simplicity is the goal and you control both the client and server code, CoAP might be a better protocol choice.

But would that flexibility satisfy the users who write code that needs to make an HTTP request?

In practice, when users talk about "something simple", 99% of the time they assume that they get "something simple" plus a bazillion related things that go with it.

That's how we endlessly go in circles: every single thing that starts out focused "on its core responsibility" inevitably gets lots of stuff that users simply have no idea they need!

The only theoretically possible way out would be to stop hiding complexity behind these implicit layers everywhere… but that's not possible either, because people naturally prefer things that "just work" over things that don't, and your "perfect" simple world of libraries that do simple things would just end up unknown and unused.

HTTP is just one example of a thing that sounds really simple, yet is hideously complex "under the hood"… try adding support for "rich text" to your program one day (in the form most users would perceive as simple: "simply an MS Word document" or "simply a WordPerfect document" or "simply an iWork document"). HTTP is trivial by comparison…

fn main() -> Result<(), ureq::Error> {
    let body: String = ureq::get("http://example.com")
        .header("Example-Header", "header value")
        .call()?
        .body_mut()
        .read_to_string()?;
    println!("{body}");
    Ok(())
}

That's really about it for a simple blocking HTTP request. There are tons of options if you need them.

Saying things should be simple is like saying there should be peace in the world. Of course it should be! Sadly, the real world is messy and complicated.

When you have to interact with complex systems, such as random servers on the Web, simplicity may not even be an option. The complex mess already exists out there whether you like it or not. If you choose to ignore it, you'll end up sacrificing something – compatibility, performance, reliability, security.

It's easy to selfishly draw the line where it matters to you, and call everything else unnecessary. Maybe you don't need compatibility, maybe you don't need performance, etc.
In reality, what is necessary is subjective and context-dependent. Lack of support for various things will create problems and complexity for others.

There are buggy web servers (if you don't parse Content-Length as a comma-separated list, your client will get stuck). Some support only old protocols. Some assume new "browser-grade" clients, or send whatever works in Chrome.

Lack of support for non-UTF-8 encodings or IDN may mean the library is unreliable by default. Encoding is chosen by the server. A library that reads mojibake creates problems with text processing down the line. That adds complexity!

There are servers and proxies that will send gzipped content even when the client didn't indicate support for it. This is a protocol violation, but browsers were forgiving it for 20 years, so that's how HTTP works now.

If you're downloading a bit of JSON, a blocking buffering API will work for you, but it won't work for video, or real-time event streams, or for low-latency pass-through proxies.

You may have a specific use-case in mind and think that the other ones you don't need should obviously be optional… but that's not universal.
Even if something is needed rarely, leaving it unimplemented or disabled by default can create systemic problems. HTTP/1 had to give up on pipelining, because too many servers and proxies didn't support it. Text encoding of headers is a lost cause too. Trailers are unusable.

5 Likes

Oh right, I didn't consider until just now that you can return an IDN in a 302 Found's Location header, which then needs to get punycoded. This violates the "headers must be all ASCII" rule, but I'm sure real-world systems are doing it. That explains why a request library needs to have the code for that.
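
For illustration, that punycode conversion, assuming the idna crate (the one underneath url and therefore reqwest):

fn main() {
    // An internationalized domain, like one a server might put in a
    // Location header; the IDNA rules map it to plain ASCII.
    let ascii = idna::domain_to_ascii("bücher.example").unwrap();
    assert_eq!(ascii, "xn--bcher-kva.example");
}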

If I understand correctly, reqwest does provide a synchronous interface.

Using that doesn't reduce its dependencies. It's just a sync API wrapper around the async implementation.

3 Likes

I've done the ghetto HTTP client and server both; it's pretty trivial if it's just a dev tool, you know what the other end is, and you can get away with HTTP/1 and no TLS (e.g. localhost).

I wouldn't use it for anything serious, though!

Thanks for the hint, I might consider ureq for my next project.

Yes, that's probably the central question: do I really need async?

If you say I probably need it only if I know I need it, that sounds like a very strong hint that I don't need it?

To take what would probably be my worst case scenario: let's say I want to submit ten HTTP requests in parallel, and I want to start processing partial responses as soon as I receive them. Would that be something where async is needed? Or at least better? Or would it be just as good or even better if I spawn a separate thread for each request?

If that scenario doesn't call for async, then I'm curious what does. A high rate of requests? I suppose my network bandwidth may be the limiting factor, but maybe it is not, or maybe other people have higher bandwidths than me.

Async only matters when the number of concurrent things you're doing is ⋙ the number of cores on the machine.

If you're on a 4-core machine that's serving 2000 concurrent requests that are all making HTTP calls out to, say, CosmosDB, then you really want async.

If you're on an 8-core desktop and need to make 10 HTTP calls, whatever, it doesn't matter; stay synchronous and make your life easier.
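
For the ten-requests scenario from earlier, a minimal sketch of the synchronous, thread-per-request approach (reusing the ureq API shown above; the URLs are placeholders):

use std::thread;

fn main() {
    let urls = ["https://example.com/a", "https://example.com/b"];

    // One OS thread per request; each blocks on its own response.
    // Ten threads is nothing for the scheduler, so no async runtime
    // is needed at this scale.
    thread::scope(|s| {
        for url in urls {
            s.spawn(move || match ureq::get(url).call() {
                Ok(mut resp) => {
                    let body = resp.body_mut().read_to_string().unwrap_or_default();
                    println!("{url}: {} bytes", body.len());
                }
                Err(e) => eprintln!("{url}: {e}"),
            });
        }
    });
}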

2 Likes