Why does NGF hardcode pod IPs in upstream instead of using dynamic endpoint tracking?

anwer_shahith · April 14, 2026, 5:03am

My issue:
The new NGINX Gateway Fabric hardcodes pod IPs directly in upstream blocks. When a pod restarts and receives a new IP, the upstream config becomes stale, causing 502 Bad Gateway errors. The upstream is not dynamically updated to reflect the new pod IP.

How I encountered the problem:
We started receiving 502 Bad Gateway errors from our service. Investigation revealed that the nginx upstream block had a hardcoded pod IP (10.4.149.153), but the pod had restarted and been assigned a new IP (10.4.150.240). Traffic was routed to a dead IP until a manual nginx reload was triggered.

Upstream config observed in the new fabric:

upstream default_formula1-care-portal-productqa_80 {
  random two least_conn;
  zone default_formula1-care-portal-productqa_80 512k;
  server 10.4.149.153:80;
  keepalive 16;
}
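To make the behaviour concrete, here is a minimal sketch, assuming a controller that renders upstreams from a list of already-resolved pod addresses. The render_upstream helper is hypothetical and only mirrors the shape of the block above; it is not NGF's template code. What it shows is that each address is written into the config as a literal server line, so NGINX keeps using it until the config is rendered again and reloaded.

```python
def render_upstream(name: str, endpoints: list[tuple[str, int]],
                    zone_size: str = "512k", keepalive: int = 16) -> str:
    """Render a static NGINX upstream block from resolved endpoint addresses.

    Hypothetical helper: it mirrors the config observed above, not NGF's
    actual template.
    """
    lines = [
        f"upstream {name} {{",
        "  random two least_conn;",
        f"  zone {name} {zone_size};",
    ]
    # Each endpoint becomes a literal "server IP:port;" line. Nothing in the
    # rendered text re-resolves it later; only a new render plus reload does.
    for ip, port in sorted(endpoints):  # sorted so the output is deterministic
        lines.append(f"  server {ip}:{port};")
    lines.append(f"  keepalive {keepalive};")
    lines.append("}")
    return "\n".join(lines)
```

For example, render_upstream("default_formula1-care-portal-productqa_80", [("10.4.149.153", 80)]) reproduces the block above line for line.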

Our older nginx ingress controller routes via proxy_pass http://upstream_balancer (Lua-based dynamic balancer) and does not have this issue.
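As a conceptual contrast, a dynamic balancer keeps the endpoint list in memory and picks a peer per request, so a new pod IP can take effect without a config reload. The DynamicBalancer class below is a hypothetical, simplified model of that idea using round-robin selection; it is not the Lua code ingress-nginx actually runs.

```python
import itertools
import threading


class DynamicBalancer:
    """Hypothetical model of request-time endpoint selection.

    The endpoint list lives in memory and is swapped as a whole on update,
    so no config file has to be re-rendered or reloaded.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._cycle = None  # round-robin iterator over the current endpoints

    def update(self, endpoints):
        # Replace the whole list at once; a pick sees the old or the new list,
        # never a partial mix of both.
        with self._lock:
            self._cycle = itertools.cycle(list(endpoints)) if endpoints else None

    def pick(self):
        # Called per request: selection uses whatever list is current right now.
        with self._lock:
            if self._cycle is None:
                raise LookupError("no endpoints available")
            return next(self._cycle)
```

The trade-off is that the selection logic moves out of stock NGINX config and into custom code, which is the Lua layer the NGF maintainers describe below.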

Solutions I’ve tried:

Manual nginx reload on the fabric controller — this temporarily resolves the issue by re-syncing the pod IP
Confirmed pod is healthy (1/1 Running, 0 restarts) — issue is purely with stale upstream IP in the fabric config

Version of NGF

NGF version: [nginx-gateway-fabric:2.4.2]

Deployment environment:

AWS EKS
Kubernetes version: 1.33
Namespace: default

I see a similar issue on Stack Overflow: https://stackoverflow.com/questions/54019648/kubernetes-nginx-refresh-ip-address-when-upstream-service-ip-changes/54030796

sjberman · April 14, 2026, 1:55pm

Hi @anwer_shahith, hardcoded IPs in the upstream are how NGINX is configured; this is not unique to NGINX Gateway Fabric. We also use proxy_pass http://upstream, which proxies the request to the server IPs defined in the upstream. ingress-nginx used a lot of Lua to change default NGINX behavior, which we do not use.

Whenever Pod IPs change, our controller updates the nginx config with the new IP addresses, the same way we update the config when any other policy or route changes. If that's not happening, you may be hitting a bug similar to the one described here, though that bug produces 504s rather than 502s.
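The update path described above can be sketched as a reconcile step, assuming the controller sees each endpoint's address and readiness (the information an EndpointSlice carries). Endpoint, desired_servers and reconcile are hypothetical names for illustration, not NGF functions. The design choice shown is to trigger a reload only when the rendered server set actually changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    ip: str
    port: int
    ready: bool


def desired_servers(endpoints):
    """Server addresses that should appear in the upstream: ready endpoints only."""
    return sorted(f"{e.ip}:{e.port}" for e in endpoints if e.ready)


def reconcile(current_servers, endpoints):
    """Return (new_servers, needs_reload) for one upstream.

    Hypothetical sketch: a reload is needed only when the server set changes,
    e.g. when a recreated pod comes up with a new IP.
    """
    new_servers = desired_servers(endpoints)
    return new_servers, new_servers != sorted(current_servers)
```

If this step runs late, or runs against a stale view of the endpoints, NGINX keeps the old server line, which is the symptom reported in this thread.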

I’d be curious if you see any error logs in either the control plane or the data plane that you can share that might help us figure out why the addresses are not being updated. Also please share if you see any error statuses on your various resources (kubectl describe gateway, kubectl describe httproute, and so on).

anwer_shahith · April 15, 2026, 6:21am

Hi @sjberman, this seems similar to the issue mentioned in PR #4697 and appears relevant to our case.
Here is the diagnostic log data you asked for:

2026/04/13 19:10:38 [error] 59646#59646: *419451 connect() failed (113: Host is unreachable) while connecting to upstream, client: 34.217.241.79, server: api-server-uat.company.com, request: "GET /oauth/token?grant_type=client_credentials&scope=all HTTP/1.1", upstream: "http://10.4.139.19:8080/oauth/token?grant_type=client_credentials&scope=all", host: "api-server-uat.company.com"
{ "message": "502 GET http://api-server-uat.company.com/oauth/token?grant_type=client_credentials&scope=all","host": "api-server-uat.company.com",
2026/04/13 19:10:41 [error] 59646#59646: *419451 connect() failed (113: Host is unreachable) while connecting to upstream, client: 34.217.241.79, server: api-server-uat.company.com, request: "GET /oauth/token?grant_type=client_credentials&scope=all HTTP/1.1", upstream: "http://10.4.139.19:8080/oauth/token?grant_type=client_credentials&scope=all", host: "api-server-uat.company.com"
{ "message": "502 GET http://api-server-uat.company.com/oauth/token?grant_type=client_credentials&scope=all","host": "api-server-uat.company.com",

Logs from kubectl describe gateway and kubectl describe httproute may not be relevant at this stage, as the issue occurred a few days ago and the associated events are no longer available.
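With the Kubernetes events gone, the NGINX error log is the remaining evidence. A minimal sketch, assuming the standard NGINX error-log format shown above and a set of live pod IPs gathered separately (for example from kubectl get pods -o wide), can flag upstream addresses NGINX is still dialing that no longer belong to any pod. stale_upstreams and its arguments are hypothetical.

```python
import re

# Matches the 'upstream: "http://IP:PORT/..."' field of an NGINX error-log line.
UPSTREAM_RE = re.compile(r'upstream: "https?://(\d{1,3}(?:\.\d{1,3}){3}):(\d+)')


def stale_upstreams(log_lines, live_pod_ips):
    """Return sorted "ip:port" upstreams from connect() failures whose IP is not live.

    Hypothetical helper: live_pod_ips is assumed to be collected outside this
    function, e.g. from the pod list of the backing Service.
    """
    stale = set()
    for line in log_lines:
        if "connect() failed" not in line:
            continue  # only connection failures point at unreachable upstreams
        match = UPSTREAM_RE.search(line)
        if match and match.group(1) not in live_pod_ips:
            stale.add(f"{match.group(1)}:{match.group(2)}")
    return sorted(stale)
```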

sodonova · April 17, 2026, 3:00pm

Hi @anwer_shahith, thanks for these details. I'm trying to find a way to reliably reproduce this bug.

Do you know what happened in your environment when you encountered this? Did you perform a rolling-restart/scale on your deployments or were the deployments deleted and re-created?

anwer_shahith · April 19, 2026, 4:57pm

Hi, thanks for looking into this.

From my observation, this issue occurred when the application pod was recreated. At that time, there was only one backend pod running.

However, across the nginx replicas, I noticed inconsistent upstream configurations:

Some nginx pods were still routing traffic to the old pod IP
Others had already updated to the new pod IP

It seems that when the application pod was recreated and assigned a new IP, the update was not propagated consistently across all nginx replicas. Only a subset of nginx pods picked up the new endpoint, while others continued using the stale IP.
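To check for the per-replica drift described above, one approach is to capture nginx -T from every NGINX pod (for example with kubectl exec <pod> -- nginx -T) and compare the server lines of each upstream. The sketch assumes that output is already collected into a dict keyed by pod name; upstream_servers and find_drift are hypothetical helpers, and the regex parsing only handles simple upstream blocks like the one in the original post.

```python
import re

# An upstream block: name, then the body up to the first closing brace.
UPSTREAM_BLOCK_RE = re.compile(r"^\s*upstream\s+(\S+)\s*\{(.*?)\}", re.DOTALL | re.MULTILINE)
# A "server ADDRESS ...;" line inside an upstream body.
SERVER_RE = re.compile(r"^\s*server\s+([^\s;]+)", re.MULTILINE)


def upstream_servers(nginx_t_output):
    """Map upstream name -> frozenset of server addresses in one nginx -T dump."""
    return {
        name: frozenset(SERVER_RE.findall(body))
        for name, body in UPSTREAM_BLOCK_RE.findall(nginx_t_output)
    }


def find_drift(configs_by_replica):
    """Return {upstream: {replica: servers}} for upstreams whose servers differ.

    Note: an upstream missing entirely from some replicas is not flagged here.
    """
    per_upstream = {}
    for replica, text in configs_by_replica.items():
        for name, servers in upstream_servers(text).items():
            per_upstream.setdefault(name, {})[replica] = servers
    return {name: seen for name, seen in per_upstream.items()
            if len(set(seen.values())) > 1}
```

Run against the scenario above, one replica still listing 10.4.149.153:80 and another listing 10.4.150.240:80 would be reported as drift for that upstream.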

To fix this, I had to explicitly perform a rolling restart of nginx.

Let me know if you need additional details. I have attached a configuration snapshot from nginx -T.

nginx-log.txt (1.3 KB)

sodonova · April 20, 2026, 8:18am

Thanks @anwer_shahith for attaching the logs; that should really help.

Right now, there are a few avenues we’re exploring.

From a Kubernetes perspective, it could be:

The Kubernetes API server not updating EndpointSlices in time
The local controller cache not being updated with slow build time

Once we've reliably reproduced the bug, we'll update you here.

One other thing I forgot to ask: do you know roughly how many NGINX replicas you had? We will test with varying numbers of replicas, but it would help to know exactly how many you were running.

Thanks!

anwer_shahith · April 24, 2026, 5:57am

We were running NGINX as a DaemonSet, so there were at least 10 NGINX pods.
