Hello, thanks in advance for any review and feedback you may have; I’m new here.
The limit_req
module seems like a great piece of work. Especially in a ‘two-stage’ setup it gives amazing capability and flexibility to rate limit requests, so I have no concerns with the technical workings of the module.
What I’m trying to do:
Understanding the exact behavior of the ngx limit_req module, so I can actually use it.
The challenge is to come up with parameter values for a given use case and to predict the resulting behavior.
Where I’m stuck:
I’m not really stuck, but a confirmation or correction of my current understanding would be great. This might help others as well, especially if currently available information turns out to be wrong.
What I’ve already tried:
I’ve run into problems with:
- Unclear documentation on the actual behavior for given parameters. I guess the module’s configuration directive reference is not the right place to outline exactly what these parameters do, but ideally there is documentation somewhere that clarifies this. I could not find it.
- A blog post with incomplete/wrong information?
While trying to understand how to leverage the capabilities of this module, I could not find concise information on how it works. The documentation outlines which parameters can be set and the syntax of the module, but understanding the resulting behavior was challenging for me.
As a result, I’ve performed a deep dive to come up with my own understanding (which might be wrong).
Eventually, I’ve tried:
- Understanding the source code of the module:
https://github.com/nginx/nginx/blob/master/src/http/modules/ngx_http_limit_req_module.c
- Modelling (uniformly spaced) request patterns myself, for given rate, burst and delay parameters.
- Reviewing the Community Blog post scenario.
It all started with the image from the blog post (see above), which models ~8 requests/second with a setting of `rate=5r/s`, `burst=12`, `delay=8`, over a 3s period. Sorry, as a newbie I can’t include this image in my post.
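For reference, a configuration matching that scenario would look roughly like this (the zone name, zone size and proxied upstream are placeholders of mine, not taken from the blog):

```nginx
limit_req_zone $binary_remote_addr zone=mylimit:10m rate=5r/s;

server {
    location / {
        limit_req zone=mylimit burst=12 delay=8;
        proxy_pass http://app_upstream;
    }
}
```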
This raised 3 main questions for me:
- The model covers 3s, so it should allow at least 15 requests given the rate of 5r/s. Essentially: how is ‘request decay’ modeled/calculated? I had to conclude this diagram does not model it at all.
- How is the delay duration calculated? It looks like the delay is equally divided to match the rate limit. In practice this is impossible, because incoming requests cannot be predicted. I had to conclude this must be done in a different way.
- The diagram seems to show ‘buckets’ of 1s, where the last requests in each bucket are rejected, and in the next bucket requests may be delayed again. This did not make sense to me; intuitively, the bucket is about the requests, not time-binning.
Reviewing the source code, my understanding is that:
- Request decay is applied using a sliding-window approximation, calculated at millisecond precision, based on the time since the last accounted request for the same key.
- This decay updates an `excess` counter, which represents how many requests are effectively “in the leaky bucket” beyond the configured rate.
- This `excess` is then compared to the `delay` and `burst` thresholds:
  - If `excess <= delay`: the request is allowed immediately.
  - If `excess > delay` but `<= burst`: the request is delayed, and the delay (in milliseconds) is computed as: `delay_ms = (excess - delay) * 1000 / rate`
  - If `excess > burst`: the request is rejected.
- The delay grows linearly as `excess` increases between the `delay` and `burst` thresholds.
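Under this reading of the source, the per-key decision can be sketched in Python. This is a simplified floating-point model of my understanding, not the actual implementation (nginx, for instance, stores `excess` in integer thousandths of a request and tracks it per key in shared memory):

```python
def limit_req(arrivals_ms, rate, burst, delay):
    """Approximate limit_req decisions for one key.

    arrivals_ms: sorted request arrival times in milliseconds.
    rate: requests per second; burst/delay: as in the directive.
    Returns a list of (time_ms, decision, delay_ms) tuples.
    """
    excess = 0.0       # requests "in the bucket" beyond the configured rate
    last_ms = None     # time of the last accounted request for this key
    out = []
    for t in arrivals_ms:
        if last_ms is not None:
            # request decay: the bucket leaks at `rate` req/s since the
            # last accounted request
            excess = max(excess - rate * (t - last_ms) / 1000.0, 0.0)
        candidate = excess + 1.0          # this request enters the bucket
        if candidate > burst:
            out.append((t, "rejected", 0.0))   # excess is not accounted
            continue
        excess, last_ms = candidate, t
        if excess > delay:
            # delay_ms = (excess - delay) * 1000 / rate
            out.append((t, "delayed", (excess - delay) * 1000.0 / rate))
        else:
            out.append((t, "accepted", 0.0))
    return out

# The blog-post scenario: 8 uniformly spaced req/s for 3 s,
# with rate=5r/s, burst=12, delay=8.
res = limit_req([i * 125 for i in range(24)], rate=5, burst=12, delay=8)
print(sum(1 for _, d, _ in res if d == "rejected"))  # 0 -> no rejections
print(sum(1 for _, d, _ in res if d == "delayed"))   # 5 -> last 5 delayed
print(res[19])                                       # (2375, 'delayed', 25.0)
```

In this model, each request adds 1 to `excess` while each 125 ms gap drains 0.625, so `excess` climbs by 0.375 per request and only crosses the `delay=8` threshold at the 20th request; it never reaches `burst=12`.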
New model
Modeling the scenario of the blog post (using exactly 8 req/s, uniformly spaced), I came up with a very different result:
It is not as pretty as the blog post, but the objective is to show the correct behavior.
- The green line and dots show all the requests (the virtual excess). Requests above the red dotted line (burst limit) are rejected.
- Blue dots show the delay in ms (secondary Y-axis); the corresponding requests are delayed.
- This happens when the green line is between the orange (delay) and red (burst) dotted lines.
Comparing this to the blog, all of the requests of the first 3s should be handled (no rejections).
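A quick back-of-the-envelope check of that claim, assuming the decay model described above and perfectly uniform spacing:

```python
rate, burst = 5, 12          # r/s and burst from the blog scenario
n, interval_ms = 24, 125     # ~8 req/s, uniformly spaced, over 3 s

# Each request adds 1 to excess; each 125 ms gap drains rate * 0.125 = 0.625,
# so excess grows by a net 0.375 per request.
net_per_request = 1 - rate * interval_ms / 1000.0   # 0.375
peak_excess = 1 + (n - 1) * net_per_request
print(peak_excess)   # 9.625 -> stays below burst=12, so no rejections
```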
I have linked a temporary file-bin with an ods spreadsheet with the calculations.
Remaining questions
- Does my understanding make sense? Corrections are very welcome.
- Did I miss documentation/resources that explain how it works?
- Is there a way to clarify this for others?
- Is a blog post update/fix required?