Understanding `limit_req` module & documentation

Hello, thanks in advance for any review and feedback you may have, I’m new here.
The limit_req module seems like a great piece of work; especially in a ‘Two-Stage’ setup it gives amazing capability and flexibility for rate limiting requests. So I have no concerns with the technical workings of the module.

What I’m trying to do:

Understanding the exact behavior of the NGINX limit_req module, so that I can actually use it.
The challenge is coming up with parameter values for a given use case and predicting the resulting behavior.

Where I’m stuck:

I’m not really stuck, but a confirmation or correction of my current understanding would be great :slight_smile: . This might help others as well. Especially if currently available information does turn out to be wrong.

What I’ve already tried:

I’ve run into problems with:

  • Unclear documentation of the actual behavior resulting from the provided parameters. I guess the configuration directive reference is not the right place to outline exactly what these parameters control, but ideally there would be documentation somewhere that clarifies this. I could not find it.
  • A blog post with incomplete/incorrect information?

While trying to understand how to leverage the capabilities of this module, I could not find concise information on how it works. The documentation outlines which parameters can be set and the syntax of the module, but understanding the resulting behavior was challenging for me.
As a result, I’ve performed a deep dive to come up with my own understanding (which might be wrong).

Eventually, I’ve tried:

  • Understanding the source code of the module: https://github.com/nginx/nginx/blob/master/src/http/modules/ngx_http_limit_req_module.c
  • Modelling (uniformly spaced) request patterns myself, for any given rate, burst and delay parameters.
  • Reviewing the Community Blog post scenario.

It all started with the image from the blog post, modeling ~8 requests/second against a setting of rate=5/s, burst=12, delay=8, over a 3s period. Sorry, as a newbie I can’t include this image in my post.

This raised 3 main questions for me:

  • The model covers 3s, so it should allow for at least 15 requests given the rate of 5r/s. Essentially: how is ‘request decay’ being modeled/calculated? I had to conclude this diagram does not model it at all.
  • How is the delay duration calculated? It looks like the delay is divided equally to match the rate limit. In practice this is impossible, because incoming requests cannot be predicted. I had to conclude this must be done in a different way.
  • The diagram seems to show 1s ‘buckets’, where the last requests in each bucket are rejected, while in the next bucket requests might be delayed again. This did not make sense to me. Intuitively, I was thinking the bucket is about requests, not time bins.

Reviewing the source code, my understanding is that:

  • Request decay is applied using a sliding-window approximation, calculated at millisecond precision, based on the time since the last accounted request for the same key.

  • This decay updates an excess counter, which represents how many requests are effectively “in the leaky bucket” beyond the configured rate.

  • This excess is then compared to the delay and burst thresholds:

    • If excess <= delay: the request is allowed immediately.
    • If excess > delay but <= burst: the request is delayed, and the delay (in milliseconds) is computed as: delay_ms = (excess - delay) * 1000 / rate
    • If excess > burst: the request is rejected.

  • The delay grows linearly as excess increases between the delay and burst thresholds.
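
Assuming this reading is right, the whole per-request decision can be sketched in a few lines of C. This is my own illustration, not the module’s actual code: the function name `decide` and the pointer-based state are made up, and all values use the module’s ×1000 fixed-point convention (so rate=5r/s becomes 5000, and one request is 1000 units).

```c
typedef enum { PASSED, DELAYED, REJECTED } verdict_t;

/* lr_excess: stored excess for this key (1 request = 1000 units).
 * rate, burst, delay_thr: configured values, pre-scaled by 1000.
 * ms: milliseconds since the last accounted request for this key. */
verdict_t decide(long *lr_excess, long rate, long burst, long delay_thr,
                 long ms, long *delay_ms)
{
    /* decay the stored excess, then add 1 request (1000 units) */
    long excess = *lr_excess - rate * ms / 1000 + 1000;
    if (excess < 0) {
        excess = 0;
    }

    if (excess > burst) {
        return REJECTED;            /* stored state is left untouched */
    }

    *lr_excess = excess;            /* request is accounted */

    if (excess <= delay_thr) {
        *delay_ms = 0;
        return PASSED;
    }

    *delay_ms = (excess - delay_thr) * 1000 / rate;
    return DELAYED;
}
```

For example, with rate=5000, burst=12000, delay_thr=8000: a stored excess of 7750 and a 125ms gap gives 7750 - 625 + 1000 = 8125, so the request is accepted but delayed by (8125 - 8000) * 1000 / 5000 = 25ms.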

New model
Modeling the scenario of the blog post (while using exactly 8req/s uniformly spaced), I came up with a very different result:

It is not as pretty as the blog post, but the objective is to show the correct behavior.

  • The green line and dots show all the requests (virtual excess). Requests above the red dotted line (burst limit) are rejected.
  • Blue dots show delay in ms (secondary Y-axis). This means corresponding requests are delayed.
  • This happens when the green line is between the orange (delay) and red (burst) dotted lines.

Comparing this to the blog, all of the requests of the first 3s should be handled (no rejections).
I have linked a temporary file-bin with an ods spreadsheet with the calculations.

Remaining questions

  • Does my understanding make sense? Corrections are very welcome.
  • Did I miss documentation/resources that explain how this works?
  • Is there a way to clarify this for others?
    • Is a blog post update/fix required?

Maybe this helps clarify my point further: when trying to reproduce the blog post’s request pattern (an estimation), I think it should actually come out as:


Hi @Jort! Correct me if I am mistaken, but from what I am seeing in your calculations, you are assuming that both the burst and delay values also get reset every second, right?


Hi @alessandro, thank you for looking into this and your quick feedback — much appreciated!

No, I’m not assuming that burst or delay get reset per second. Or at least they shouldn’t — I’ve reviewed the calculations again to verify, but if you spot something off in the logic, please do point it out.

My understanding is that both are fixed thresholds, and excess is what evolves over time via precise millisecond-based decay (not per-second resets). So for each request, the evaluation follows this sequence:

  • Apply decay to the existing excess based on time elapsed since the last accounted request (using rate)
  • Then simulate adding 1 unit to the excess for the current request — this addition is only stored if the request is not rejected
  • Then compare the simulated excess against the delay and burst thresholds to determine whether the request is passed, delayed, or rejected

So nothing in the model “resets” delay or burst on a time interval — instead, the excess naturally rises or falls over time depending on how requests are spaced and whether they’re accepted.

Practically, in the blog scenario:

  • The first request has no existing excess, so it starts at 0. Then 1 unit is added (or 1000 in code).
  • The second request comes in 0.125s later (8r/s), so 0.125 * 5r/s = 0.625 decay is applied. 1 - 0.625 = 0.375 is the excess before this request. Then 1 is added, giving 1.375, which is still below the delay threshold.
  • This pattern continues, and at request 20 (2.375s), the excess before the request is 7.125, and after adding the request it becomes 8.125, which triggers a delay (e.g. 25ms).
  • Rejections would occur only when the post-decay +1 would exceed the burst threshold — and such requests do not contribute to excess.

Also worth noting: in the blog post’s graph, only 21 requests arrive over ~2.7s, which keeps the excess just under the delay threshold — so no delay would be applied in that case.
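
These hand calculations can be checked with a short stand-alone C sketch of the uniform-spacing scenario. Again, this is my own model (the function name and structure are mine), using the ×1000 fixed-point scaling and assuming the first request simply stores an excess of 1000:

```c
/* Walks a uniform stream (one request every gap_ms) against the given
 * rate (x1000 units/s) and returns the 1-based index of the first
 * delayed request, writing its delay into *delay_ms; 0 if none. */
int first_delayed(int n_reqs, long gap_ms, long rate, long delay_thr,
                  long *delay_ms)
{
    long excess = 0;
    int n;

    for (n = 1; n <= n_reqs; n++) {
        /* decay since the previous request, then add this request */
        long e = (n == 1) ? 1000 : excess - rate * gap_ms / 1000 + 1000;
        if (e < 0) {
            e = 0;
        }
        excess = e;

        if (e > delay_thr) {
            *delay_ms = (e - delay_thr) * 1000 / rate;
            return n;
        }
    }

    *delay_ms = 0;
    return 0;
}
```

With 24 requests spaced 125ms apart (3s at 8r/s), rate=5000 and delay_thr=8000, this reports request #20 as the first delayed one, with a 25ms delay, matching the hand calculation above.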

Hope this clears it up — and if I misunderstood what you meant by “resetting,” could you let me know specifically what behavior you’re referring to?

Speaking about the blog specifically, and not the actual codebase, which I am not very familiar with, the graph seems to be working as intended to me. Before delving into it, it’s probably also helpful to note that there is no guarantee that those 8r/s come in spaced 0.125s apart. You could be getting 8 requests within the first 0.1s and then nothing until 1s.

Going over the blog specifically – between 0 and 1s, 7 requests come in. Given the 5r/s limit, the burst value of 12, and delay value of 8, these 7 are processed immediately. Come the period between 1s and 2s, another 7 requests come in. At this point NGINX is still in its burst processing phase, since the incoming rate of requests has not dropped below the predefined 5r/s, so it processes 1 more request without delay, and then proceeds to process another 4 (to reach the burst value of 12) at a delayed rate so as to keep the rate of 5r/s. The remaining 3 requests are rejected. Between the 2s and 3s period, the incoming rate of requests still hasn’t dropped below 5r/s, and there are 6 incoming requests. NGINX proceeds to process those incoming requests at such a rate as to maintain 5r/s and rejects any other requests. The burst and delay values will not come into play until the average r/s drops below 5r/s.


Hi @alessandro, thanks for expanding on your interpretation! I think we’re getting to the heart of it. :slightly_smiling_face:

First off, I agree with your comment about request spacing — there are no guarantees, and requests may come in bursts or irregular patterns. Fortunately, limit_req handles this gracefully.

From what I’ve seen in the code and confirmed through modeling, NGINX does not operate based on maintaining an average r/s, nor does it wait for traffic to “drop below 5r/s” before engaging burst or delay. Instead, each request is evaluated in isolation based on the current state of the excess counter.

Here’s how it works for each request:

  • The previous excess value is decayed linearly over time, based on the milliseconds since the last accounted request.
  • Then, +1 is added for the current request — unless the request is ultimately rejected.
  • The resulting excess is then compared to the burst and delay thresholds to determine whether the request is passed, delayed, or rejected.

Linking this to code, we can see at L454:

excess = lr->excess - ctx->rate * ms / 1000 + 1000;

Where:

  • lr->excess is the last stored excess for this key (typically based on client IP)
  • ctx->rate * ms / 1000 is the decay, calculated in milliseconds
  • +1000 simulates the addition of the current request (1 unit)

This computes the projected excess value assuming the request is accepted.

Then, a few lines down at L462–464, we have:

if ((ngx_uint_t) excess > limit->burst) {
    return NGX_BUSY;
}
  • If the resulting excess exceeds the burst, the request is immediately rejected
  • No delay is applied, and no further updates to internal state occur

Then we see on L466-467:

if (account) {
  lr->excess = excess;
  • account is true only if the request was accepted (either passed or delayed)
  • If rejected, lr->excess is not updated
  • This ensures rejected requests don’t affect future decay or state

Finally, if the request is accepted, the module calculates the delay (if applicable). At L547–553:

    if ((ngx_uint_t) excess <= (*limit)->delay) {
        max_delay = 0;

    } else {
        ctx = (*limit)->shm_zone->data;
        max_delay = (excess - (*limit)->delay) * 1000 / ctx->rate;
    }
  • If excess is at or below the delay threshold → the request proceeds immediately
  • If excess is above the threshold → a delay is calculated using a linear function based on how far excess exceeds the delay limit.

So rather than smoothing traffic to “hold at 5r/s,” NGINX makes per-request decisions using a leaky-bucket model, where excess decays continuously and the fixed delay and burst thresholds are checked for each request.
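
To close the loop on the reject path, here is one more sketch (mine, not nginx code) that adds the rule quoted above from L466–467: a rejected request leaves both the stored excess and the last-accounted timestamp untouched, so the next request decays over a longer interval:

```c
/* Returns the 1-based index of the first rejected request in a uniform
 * stream (one request every gap_ms), or 0 if none within n_reqs.
 * rate and burst are pre-scaled by 1000. */
int first_rejected(int n_reqs, long gap_ms, long rate, long burst)
{
    long excess = 0;       /* stored excess (x1000) */
    long last_ms = 0;      /* arrival time of the last accounted request */
    int n;

    for (n = 1; n <= n_reqs; n++) {
        long now_ms = (long) (n - 1) * gap_ms;
        long e = (n == 1) ? 1000
                          : excess - rate * (now_ms - last_ms) / 1000 + 1000;
        if (e < 0) {
            e = 0;
        }

        if (e > burst) {
            return n;      /* rejected: excess and last_ms stay as-is */
        }

        excess = e;        /* accounted */
        last_ms = now_ms;
    }

    return 0;
}
```

With rate=5000, burst=12000 and a 125ms gap, the first rejection only occurs at request #31 (around t=3.75s); within the first 3s (24 requests) nothing is rejected, consistent with my model above.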


Thanks for the extra clarification! I’ve always assumed my explanation was the intended way limit_req works, and by the looks of it so did the author of the blog post you shared; the directive documentation seems to imply as much too.

The data you have collected is quite interesting though, and given my lack of expertise around the source code, I’m going to share this thread with the core team. Hopefully we can get a definite answer!
