Description
It would be good to discuss what guarantees implementors can provide during an update of an xRoute. Specifically, in Knative we've seen ingresses drop traffic when transitioning from one backend to another, even when both backends have healthy endpoints.
The situations where I've seen this occur are:
1. Ingress doesn't track backend endpoints that aren't reachable via routes
You have two healthy backends, A and B.
When a route is updated from pointing to backend A to pointing to backend B, the proxy has to page in backend B's endpoints. Unfortunately, this races against the route config being rolled out. The ingress drops traffic because backend B's endpoints aren't present yet and the route no longer points to backend A.
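To make the ordering problem concrete, here is a toy model of the race. It is only a sketch: `LazyProxy`, `switch`, the event names, and the IP addresses are hypothetical and do not correspond to any real ingress implementation. It assumes a proxy that returns 503 when the routed backend has no known endpoints.

```python
from dataclasses import dataclass, field


@dataclass
class LazyProxy:
    """Toy proxy that only learns endpoints for a backend after a route references it."""

    route_backend: str
    endpoints: dict = field(default_factory=dict)  # backend name -> list of addresses

    def apply_route(self, backend: str) -> None:
        # The route config lands; nothing is known yet about the new backend's endpoints.
        self.route_backend = backend

    def load_endpoints(self, backend: str, addrs: list) -> None:
        # Endpoint discovery for the backend completes.
        self.endpoints[backend] = addrs

    def handle_request(self) -> int:
        # 200 if the routed backend has at least one known endpoint, otherwise 503.
        return 200 if self.endpoints.get(self.route_backend) else 503


def switch(order: list) -> list:
    """Replay a switch from A to B in the given event order, probing after each event."""
    proxy = LazyProxy(route_backend="A", endpoints={"A": ["10.0.0.1"]})
    statuses = [proxy.handle_request()]
    for event in order:
        if event == "route":
            proxy.apply_route("B")
        elif event == "endpoints":
            proxy.load_endpoints("B", ["10.0.0.2"])
        statuses.append(proxy.handle_request())
    return statuses


racy = switch(["route", "endpoints"])  # route config wins the race
safe = switch(["endpoints", "route"])  # endpoints are present before the route flips
print(racy, safe)
```

In this model the only difference between the two runs is the order in which the proxy processes the same two events, and both backends are healthy the whole time.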
2. Newly deployed healthy backends aren't safe for Routes to reference
Even if an ingress solution tracks all endpoints, there's a small window where traffic can be dropped when switching to a newly deployed backend C.
In Knative we have a test where we continuously probe an HTTPRoute while performing the following steps:
- Spin up a deployment and service
- Wait for a healthy endpoint to appear (by polling the API server)
- Update an HTTPRoute to point to the new backend.
- Repeat the three steps above about 10 times
The race here is that the test observes the healthy endpoint and updates the HTTPRoute's backend faster than the ingress proxy can process and track the new endpoint.
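The size of that window can be reasoned about with simple arithmetic. The sketch below is illustrative only: `drop_window_ms`, its parameters, and the timings are hypothetical and are not measurements from the Knative test. All times are relative to the moment the endpoint becomes ready in the API server.

```python
def drop_window_ms(poll_delay_ms: int, route_lag_ms: int, endpoint_lag_ms: int) -> int:
    """Milliseconds during which the proxy routes to the new backend without endpoints.

    poll_delay_ms:   how long the test takes to notice the ready endpoint
    route_lag_ms:    how long the HTTPRoute update takes to reach the proxy
    endpoint_lag_ms: how long the proxy takes to learn about the new endpoint
    """
    route_active_at = poll_delay_ms + route_lag_ms  # test reacts, then the route propagates
    endpoint_known_at = endpoint_lag_ms             # proxy starts tracking the endpoint
    return max(0, endpoint_known_at - route_active_at)


# Hypothetical timings in milliseconds, for illustration only.
fast_test = drop_window_ms(poll_delay_ms=5, route_lag_ms=20, endpoint_lag_ms=100)
slow_test = drop_window_ms(poll_delay_ms=200, route_lag_ms=20, endpoint_lag_ms=100)
print(fast_test, slow_test)
```

Under these assumed numbers, a test that reacts quickly leaves a 75 ms window in which requests can be dropped, while a slower client would never notice the problem. That is why a tight polling loop exposes the race far more reliably than a manual rollout does.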