Describe the bug
The FlowExporter / FlowAggregator currently don't support "External-to-Pod" traffic, i.e. traffic coming from outside the cluster and targeting a NodePort or LoadBalancer Service.
This is mentioned in the documentation:
Currently, the Flow Exporter feature provides visibility for Pod-to-Pod, Pod-to-Service and Pod-to-External network flows along with the associated statistics such as data throughput (bits per second), packet throughput (packets per second), cumulative byte count and cumulative packet count. Pod-To-Service flow visibility is supported only when Antrea Proxy enabled, which is the case by default starting with Antrea v0.11. In the future, we will enable the support for External-To-Service flows.
However, when looking at NodePort Service traffic, it seems that not all cases are handled uniformly. In some cases, flows are actually exported to the FlowAggregator (with some misleading log messages), while in other cases, flows are not exported. More details below:
Case 1: NodePort with default externalTrafficPolicy
In this case, connections are ignored, thanks to this code:
antrea/pkg/agent/flowexporter/connections/conntrack.go, lines 55 to 61 in d6766cc:
```go
// Consider Pod-to-Pod, Pod-To-Service and Pod-To-External flows.
if srcIP == gwIPv4 || dstIP == gwIPv4 {
	continue
}
if srcIP == gwIPv6 || dstIP == gwIPv6 {
	continue
}
```
In my opinion, this matches the documented behavior, even though ideally we would support this type of connection.
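For context, with the default externalTrafficPolicy, kube-proxy masquerades external NodePort traffic, so by the time the connection reaches the backend Pod its source is the Antrea gateway IP and the filter above matches. A minimal standalone illustration (the gateway address below is an assumption; the client IP is from my tests, and this is not Antrea code):
```go
package main

import (
	"fmt"
	"net"
)

func main() {
	gwIPv4 := net.ParseIP("10.10.1.1") // assumed antrea-gw0 address on the node
	// With the default externalTrafficPolicy, the source seen in conntrack is
	// the gateway IP (post-SNAT), so the connection is skipped by the filter.
	srcDefault := gwIPv4
	// With externalTrafficPolicy=Local (Case 2 below), the client IP is
	// preserved and the filter no longer matches.
	srcLocal := net.ParseIP("192.168.77.1")
	for _, src := range []net.IP{srcDefault, srcLocal} {
		if src.Equal(gwIPv4) {
			fmt.Printf("%s: skipped by the gateway filter\n", src)
		} else {
			fmt.Printf("%s: exported\n", src)
		}
	}
}
```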
Case 2: NodePort with externalTrafficPolicy=Local
In this case, the client source IP is preserved (externalTrafficPolicy=Local disables SNAT), so filtering based on the source IP "fails", and connections are actually exported.
For example, the following may be logged by the FlowAggregator (when using the flow logger sink):
1699997594,1699997598,192.168.77.1,10.10.1.5,56171,80,TCP,,,,nginx-hvhjs,default,k8s-node-worker-1,0.0.0.0,0,,,,,,,,,,,,,
192.168.77.1 is the IP address of my local machine, from which I am accessing the NodePort Service. 10.10.1.5 is the IP address of the Pod implementing the Service.
So there is already a discrepancy with case 1. We also see the following warning in the Agent logs:
W1114 21:35:08.430174 1 exporter.go:615] Source IP: 192.168.77.1 doesn't exist in PodCIDRs
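The warning comes from a check that only expects Pod source IPs. Here is a minimal sketch of that kind of check, with illustrative names and an assumed Pod CIDR (this is not the actual exporter code): a source that falls outside the Pod CIDRs, like 192.168.77.1 here, could be classified as external instead of triggering a warning.
```go
package main

import (
	"fmt"
	"net"
)

// srcInPodCIDRs reports whether the source IP belongs to any of the cluster's
// Pod CIDRs; the name and signature are illustrative, not Antrea code.
func srcInPodCIDRs(srcIP net.IP, podCIDRs []*net.IPNet) bool {
	for _, cidr := range podCIDRs {
		if cidr.Contains(srcIP) {
			return true
		}
	}
	return false
}

func main() {
	_, podCIDR, _ := net.ParseCIDR("10.10.1.0/24") // assumed Pod CIDR for this node
	if !srcInPodCIDRs(net.ParseIP("192.168.77.1"), []*net.IPNet{podCIDR}) {
		// Instead of the current warning, the flow could be marked External-to-Pod.
		fmt.Println("source 192.168.77.1 is outside the Pod CIDRs: external client")
	}
}
```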
Case 3: NodePort with Antrea proxyAll enabled
This is similar to case 1, but this time we remove kube-proxy and enable proxyAll in AntreaProxy to handle NodePort traffic.
In this case, the flow is exported:
1699996576,1699996577,192.168.77.1,10.10.1.6,55418,80,TCP,,,,nginx-2zskx,default,k8s-node-worker-1,0.0.0.0,0,,,,,,,,,,,,,
and we see the following logs:
I1114 21:19:37.036002 1 connections.go:128] "Could not retrieve the Service info from antrea-agent-proxier" serviceStr="169.254.0.252:31749/TCP"
W1114 21:19:37.141838 1 exporter.go:615] Source IP: 192.168.77.1 doesn't exist in PodCIDRs
169.254.0.252 is a link-local address used by the proxyAll implementation to redirect traffic from the host network to OVS.
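The serviceStr in the first message is built from the connection's destination tuple, so a destination-keyed Service lookup cannot succeed when that destination is the link-local redirect address rather than a Service address. The sketch below is purely illustrative (the map, key format and entry are assumptions, not the actual antrea-agent-proxier data structures):
```go
package main

import "fmt"

func main() {
	// Hypothetical destination-keyed Service lookup; the entry is an assumed
	// ClusterIP mapping, not taken from a real cluster.
	serviceByAddr := map[string]string{
		"10.96.153.77:80/TCP": "default/nginx",
	}
	// With proxyAll, the connection's original destination is the link-local
	// redirect address, so the key never matches and a message like
	// "Could not retrieve the Service info from antrea-agent-proxier" is logged.
	key := fmt.Sprintf("%s:%d/%s", "169.254.0.252", 31749, "TCP")
	if _, ok := serviceByAddr[key]; !ok {
		fmt.Printf("no Service entry for %s\n", key)
	}
}
```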
Case 4: NodePort with Pod as source
This is not a common use case by any means, but I thought I would also test this edge case.
The behavior depends on the value of externalTrafficPolicy. When using the default, the connection is treated as "Pod-to-External":
1699999841,1699999844,10.10.1.6,192.168.77.101,52136,30415,TCP,toolbox-pr96d,default,k8s-node-worker-1,antrea-agent-xwf6w,kube-system,k8s-node-worker-1,0.0.0.0,0,,,,,,,,,,,,,
When using externalTrafficPolicy=Local, the connection is actually exported twice:
1700000109,1700000114,10.10.1.6,10.10.1.5,37750,80,TCP,toolbox-pr96d,default,k8s-node-worker-1,nginx-hvhjs,default,k8s-node-worker-1,0.0.0.0,0,,,,,,,,,,,,,
1700000109,1700000114,10.10.1.6,192.168.77.101,37750,30415,TCP,toolbox-pr96d,default,k8s-node-worker-1,antrea-agent-xwf6w,kube-system,k8s-node-worker-1,0.0.0.0,0,,,,,,,,,,,,,
Once as Pod-to-External, and once as a Pod-to-Pod connection.
There is no log message from the FlowExporter in this case.
Note that if we enable proxyAll, results may differ yet again.
Other cases
There are potentially other cases to consider. In particular, I did not test with LoadBalancer Services. I imagine we have very similar issues.
Versions:
ToT version, which includes this PR: #5592
What should we do?
The main issue here IMO is the lack of consistency across all cases, based on whether externalTrafficPolicy is set to Local (which determines whether SNAT is needed) and whether proxyAll is enabled.
We should try to add support for "External-to-Pod" traffic and handle all possible cases consistently. We should include the Service information in the exported flow record, and avoid flooding the logs with warnings.
Because accessing a Service using NodePort or LoadBalancer from a Pod is not common, we don't have to handle this case for now. But ideally, we would handle this case gracefully (e.g., treat it as Pod-to-External, or not export it at all).
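To make the "consistent handling" point concrete, here is a rough sketch of the kind of uniform decision I have in mind. Every name below is illustrative (this is not proposed FlowExporter code), and it assumes we can already tell whether each endpoint is a Pod and whether the original destination resolved to a Service:
```go
package main

import "fmt"

// classify is an illustrative decision table only: the flow type is derived
// from whether each endpoint is a Pod and whether the original destination
// was a Service, independently of externalTrafficPolicy or proxyAll.
func classify(srcIsPod, dstIsPod, dstIsService bool) string {
	switch {
	case srcIsPod && dstIsService:
		return "Pod-to-Service"
	case srcIsPod && dstIsPod:
		return "Pod-to-Pod"
	case srcIsPod:
		return "Pod-to-External"
	case dstIsService || dstIsPod:
		return "External-to-Pod" // Service info, when known, stays in the record
	default:
		return "unknown" // should not normally be seen
	}
}

func main() {
	// Case 2 above: external client reaching a NodePort backend Pod.
	fmt.Println(classify(false, true, true)) // "External-to-Pod"
}
```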