My current thought is to keep track of how much load (say Memory x time + Bandwidth + CPU) a given IP address has created, and if it goes over some limit deny or abort requests. Periodically the load figure could be divided by 2, so recent load is given more weight.
The application level could adjust the limit in some circumstances (say on successful login or login failure), perhaps also depending on how heavily loaded the server is overall.
For some services, particular requests can be extremely expensive (e.g. ones which trigger queries that aren't backed by indices; though these are usually restricted to logged-in users, I guess).
Ok, well the drawback there is that if an attacker sends 1,000 requests per second, a valid request from a genuine client will be denied 99 times out of 100, which makes the website practically unusable.
The key to defending against attacks is to somehow reject attack requests but continue to respond to valid requests normally.
As has already been discussed, one method of DOS protection is to limit the number of requests submitted over a certain time period. This implicitly equates "load" with "# requests".
Another way you could handle this is by equating "load" with "response time". For example, you might say that each user gets to use 10 CPU-seconds every minute and add a middleware layer to your backend which times how long a request takes to complete.
You could also tweak this to measure the number of bytes transmitted in the request/response and combine these in weird and wonderful ways to come up with some metric that gets rate limited (e.g. each IP gets n "units" per minute, where unit = num_requests² + 5×cumulative_request_time).
Your rate-limiting algorithm could be made to favour recent activity by doing some sort of exponential moving average (essentially your "periodically divide load by 2" idea) or using a leaky bucket.
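To make the leaky-bucket idea concrete, here's a minimal std-only sketch. The names (`LeakyBucket`, `try_consume`) and the numbers are illustrative, not from any crate: the bucket drains at a fixed rate, each request adds its "cost" (whatever load metric you settle on), and a request is rejected when the bucket would overflow.

```rust
use std::time::Duration;

/// Minimal leaky-bucket sketch: `level` drains at `drain_rate` units/sec,
/// each request adds its cost, and a request is rejected when the bucket
/// would overflow `capacity`. All names here are illustrative.
struct LeakyBucket {
    level: f64,
    capacity: f64,
    drain_rate: f64, // units per second
}

impl LeakyBucket {
    fn new(capacity: f64, drain_rate: f64) -> Self {
        Self { level: 0.0, capacity, drain_rate }
    }

    /// `elapsed` is the time since the last call; `cost` is the load
    /// units of this request (e.g. num_requests² + 5×request_time).
    fn try_consume(&mut self, elapsed: Duration, cost: f64) -> bool {
        // Drain the bucket for the time that has passed.
        self.level = (self.level - self.drain_rate * elapsed.as_secs_f64()).max(0.0);
        if self.level + cost > self.capacity {
            return false; // over budget: reject (or delay) the request
        }
        self.level += cost;
        true
    }
}
```

In practice you'd keep one bucket per IP (or per user) in a map and record the timestamp of the last call, but the gist is the same.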
For 99.99% of websites I probably wouldn't bother going this far though. Chances are, implementing something like this could cause developers to go down a rabbit-hole of increasingly complex solutions, which would not only add developer/maintenance overheads, but you might end up DOSing yourself because the DOS protection code triggers more database interaction and business logic than the code it's trying to protect (or at least, you'll be adding a non-zero amount of response overhead and infrastructure costs). Imagine having different rulesets for different types of users or setting up a scripting engine so you can tweak things on the fly or instrumenting your backend code with the concept of "gas".
In particular, trying to measure the memory usage associated with a request will probably require a lot of intrusive changes. You could probably make a custom allocator which associates allocations with a particular request via thread-locals or async-aware magic (e.g. like how tracing::instrument maintains spans in async functions), but it sounds like a pain and wouldn't be helpful for things outside your control (e.g. a Postgres database).
Yes, my thoughts are still a bit vague; what I really had in mind is the size of the request and the size of the response. One way attackers can consume resources is by sending a request slowly, so multiplying the size of the request by the time taken to receive it is a measure of load. This can be monitored as the request is read (using a timeout calculated to expire when the budget is met).
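That "size × time" budget can be turned into a read deadline directly: if the load budget is expressed in byte-seconds, the time you're willing to spend reading a body of a given size falls out of the equation. A sketch under those assumptions (function name and the byte-seconds unit are made up for illustration):

```rust
use std::time::Duration;

/// Given a per-request load budget in byte-seconds and the declared body
/// size, compute the deadline after which a slow upload should be aborted.
/// size × time ≤ budget  ⇒  time ≤ budget / size.
fn read_deadline(budget_byte_secs: u64, content_length: u64) -> Duration {
    if content_length == 0 {
        return Duration::from_secs(0); // nothing to read
    }
    Duration::from_secs_f64(budget_byte_secs as f64 / content_length as f64)
}
```

So with a budget of 1,000,000 byte-seconds, a 100 KB upload gets 10 seconds before the read is aborted, while a 2 MB upload only gets half a second; a slowloris-style client trickling bytes in hits the deadline rather than tying up a connection indefinitely.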
Yeah, that's kinda where I was going with the "rabbit-hole of increasingly complex solutions" comment.
We have this vague, unquantifiable notion called "load" and are trying to use different numbers combined in various ways to quantify it for the purpose of rate-limiting.
What's stopping you from saying load = request_size × time_taken + num_requests because you know your application is particularly vulnerable to a slowloris attack and want the time taken to have a much greater weight? This equation could become arbitrarily complex as you tweak it and add new parameters over time.
That said, if this level of protection was critical to me, I would probably go with a gas-based approach where my future (assuming an async backend) will automatically be cancelled whenever a request goes over a certain limit. That way your code is free to use custom logic which weighs something like the 2nd failed login attempt as more "expensive" than refreshing the homepage... I'd hate to be the guy trying to figure out how much gas each operation "costs", though. It's also very intrusive, so you'd probably want abstractions like an attribute macro (e.g. so you can easily say "calling this function costs 2 gas, and if it returns an error we add on an extra 5") and a database connection wrapper which automatically consumes an "appropriate" amount of gas from the current request.
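Stripped of the async-cancellation and macro machinery, the core of a gas-based approach is just a per-request meter that operations draw from until it runs dry. A toy sketch (the `GasMeter`/`OutOfGas` names and costs are invented; a real backend would cancel the request's future rather than return an error):

```rust
/// Toy "gas" meter: operations charge gas against a per-request budget,
/// and the request is aborted (here: an Err) once the budget runs out.
struct GasMeter {
    remaining: u64,
}

#[derive(Debug, PartialEq)]
struct OutOfGas;

impl GasMeter {
    fn new(budget: u64) -> Self {
        Self { remaining: budget }
    }

    /// Charge `cost` gas, failing when the budget would be exceeded.
    /// Note the budget is left untouched on failure, so cheaper
    /// operations can still proceed after an expensive one is refused.
    fn charge(&mut self, cost: u64) -> Result<(), OutOfGas> {
        if cost > self.remaining {
            return Err(OutOfGas);
        }
        self.remaining -= cost;
        Ok(())
    }
}
```

The application-specific weighting lives entirely in the costs: charge 2 gas for calling some function, an extra 5 when it errors, more for the second failed login than the first, and so on. Deciding those numbers is the hard part.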
My idea is not to have anything complicated, just limits on how much load an IP can generate on the server, with the limit being adjustable. Load can arise in three ways:
(1) Memory load reading a request. Depends on the size of the request and how long it takes.
(2) CPU load processing a request. Simply the time taken to process the request.
(3) Memory load sending a response. Similar to (1).
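As a rough sketch of how those three components might combine into a single figure (the struct, field names, and especially the weight on CPU time are placeholder assumptions to be tuned, not a recommendation):

```rust
use std::time::Duration;

/// The three load sources above: memory load is modelled as
/// bytes × seconds held, CPU load as seconds of processing.
struct RequestLoad {
    request_bytes: u64,   // (1) size of the request...
    read_time: Duration,  //     ...and how long it took to read
    cpu_time: Duration,   // (2) time taken to process the request
    response_bytes: u64,  // (3) size of the response...
    write_time: Duration, //     ...and how long it took to send
}

impl RequestLoad {
    fn total(&self) -> f64 {
        let read_mem = self.request_bytes as f64 * self.read_time.as_secs_f64();
        let write_mem = self.response_bytes as f64 * self.write_time.as_secs_f64();
        let cpu = self.cpu_time.as_secs_f64();
        // Arbitrary weight so a CPU-second counts like a megabyte-second.
        read_mem + write_mem + 1_000_000.0 * cpu
    }
}
```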
I have been wondering whether, instead of refusing a request entirely (because the recent load generated by the IP address is too high), processing could be delayed. So (as a very simple example) a client could initially perform 50 queries in a minute, but then (due to the load created) subsequent queries would be limited to, say, 10 per minute.
Generally I would say you'd be better served focusing on making your request handling as efficient as possible (both in terms of resource usage and time) and caching expensive operations over trying to roll your own DDoS protection.
Tracking IP usage adds another shared resource that a DDoS attack would increase contention on, which would increase load during an attack. Balancing how expensive the tracking is with how much good it can do is tricky, and also fairly hard to test reliably.
Certainly a serious attacker could have a network of compromised computers (or perhaps simply owns them); I think that is a fundamental assumption. Ultimately, if the network is large enough, there will be degraded performance for good-faith clients, as it will not be possible to identify the malicious clients quickly enough. As an example, suppose our server can serve 1,000 requests per second but an attacker has a network of 10,000 computers: then the attacker can overwhelm us for at least 10 seconds, since there is no way to distinguish the good-faith clients in that time.
But still, I don't think this means we should adopt no protective measures.
After several days I have something working (although it needs more testing and tidying up). I decided to keep 4 counters rather than a single measure of load, namely request count, read load, CPU time for the query and write load.
The load counters are associated with the logged-in user, or the IP address if there is no logged-in user. The trickiest part has been establishing the user id (rather than the IP address) before reading the request body; this is desirable as logged-in users will typically have bigger upload limits than users whose identity is unknown.
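The "user id if known, otherwise IP" keying plus the four counters might look something like this (type and field names are my own invention, not from the actual implementation):

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Counters are keyed by user id when known, otherwise by IP address.
#[derive(Hash, PartialEq, Eq, Clone)]
enum Client {
    User(u64),
    Ip(IpAddr),
}

/// The four counters described: request count, read load,
/// CPU for the query, and write load.
#[derive(Default, Clone, Copy)]
struct Counters {
    requests: u64,
    read_load: u64,
    cpu_micros: u64,
    write_load: u64,
}

struct Tracker {
    map: HashMap<Client, Counters>,
}

impl Tracker {
    fn new() -> Self {
        Self { map: HashMap::new() }
    }

    fn record(&mut self, who: Client, read: u64, cpu_micros: u64, write: u64) {
        let c = self.map.entry(who).or_default();
        c.requests += 1;
        c.read_load += read;
        c.cpu_micros += cpu_micros;
        c.write_load += write;
    }

    /// Periodically halve all counters so recent load carries more weight,
    /// as in the "divide load by 2" idea earlier in the thread.
    fn decay(&mut self) {
        for c in self.map.values_mut() {
            c.requests /= 2;
            c.read_load /= 2;
            c.cpu_micros /= 2;
            c.write_load /= 2;
        }
    }
}
```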
I think some web servers deal with the issue by having a fixed limit on request body size, meaning either nobody can upload large files, or you have a potential DoS issue.
Incidentally, I found a "slowloris" crate which is useful for testing. I don't yet have a test for a client that is (deliberately) slow reading the response.