
[FlowExporter] Need uniform handling for "External-to-Pod" traffic #5706

@antoninbas

Description

Describe the bug
The FlowExporter / FlowAggregator currently don't support "External-to-Pod" traffic, i.e. traffic coming from outside the cluster and targeting a NodePort or LoadBalancer Service.

This is mentioned in the documentation:

Currently, the Flow Exporter feature provides visibility for Pod-to-Pod, Pod-to-Service and Pod-to-External network flows along with the associated statistics such as data throughput (bits per second), packet throughput (packets per second), cumulative byte count and cumulative packet count. Pod-To-Service flow visibility is supported only when Antrea Proxy is enabled, which is the case by default starting with Antrea v0.11. In the future, we will enable the support for External-To-Service flows.

However, when looking at NodePort Service traffic, it seems that not all cases are handled uniformly. In some cases, flows are actually exported to the FlowAggregator (with some misleading log messages), while in other cases, flows are not exported. More details below:

Case 1: NodePort with default externalTrafficPolicy

In this case, the traffic is SNATed (masqueraded) to the Antrea gateway IP before being forwarded to the Pod, so connections are ignored thanks to this code:

// Consider Pod-to-Pod, Pod-To-Service and Pod-To-External flows.
if srcIP == gwIPv4 || dstIP == gwIPv4 {
	continue
}
if srcIP == gwIPv6 || dstIP == gwIPv6 {
	continue
}

In my opinion, this matches the documented behavior, even though ideally we would support this type of connection.

Case 2: NodePort with externalTrafficPolicy=Local

In this case, the client source IP is preserved (no SNAT is applied), so filtering based on the source IP "fails" and connections are actually exported.
For example, the following may be logged by the FlowAggregator (when using the flow logger sink):

1699997594,1699997598,192.168.77.1,10.10.1.5,56171,80,TCP,,,,nginx-hvhjs,default,k8s-node-worker-1,0.0.0.0,0,,,,,,,,,,,,,

192.168.77.1 is the IP address of my local machine, from which I am accessing the NodePort Service. 10.10.1.5 is the IP address of the Pod implementing the Service.

So there is already a discrepancy with case 1. We also see the following warning in the Agent logs:

W1114 21:35:08.430174       1 exporter.go:615] Source IP: 192.168.77.1 doesn't exist in PodCIDRs
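
For reference, here is a minimal sketch (in Go, with hypothetical helper names; this is not the actual FlowExporter code) of the kind of check that could classify such a source instead of emitting the warning: an IP outside all Pod CIDRs would simply be tagged as external.

package main

import (
	"fmt"
	"net"
)

// classifySource is a hypothetical helper: it reports whether an IP belongs to
// one of the cluster's Pod CIDRs. A source outside all Pod CIDRs (such as
// 192.168.77.1 above) would be treated as external rather than logged as a warning.
func classifySource(ip net.IP, podCIDRs []*net.IPNet) string {
	for _, cidr := range podCIDRs {
		if cidr.Contains(ip) {
			return "pod"
		}
	}
	return "external"
}

func main() {
	_, podCIDR, _ := net.ParseCIDR("10.10.1.0/24")
	fmt.Println(classifySource(net.ParseIP("192.168.77.1"), []*net.IPNet{podCIDR})) // external
	fmt.Println(classifySource(net.ParseIP("10.10.1.5"), []*net.IPNet{podCIDR}))    // pod
}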

Case 3: NodePort with Antrea proxyAll enabled

This is similar to case 1, but this time we remove kube-proxy and enable proxyAll in AntreaProxy to handle NodePort traffic.
In this case, the flow is exported:

1699996576,1699996577,192.168.77.1,10.10.1.6,55418,80,TCP,,,,nginx-2zskx,default,k8s-node-worker-1,0.0.0.0,0,,,,,,,,,,,,,

and we see the following logs:

I1114 21:19:37.036002       1 connections.go:128] "Could not retrieve the Service info from antrea-agent-proxier" serviceStr="169.254.0.252:31749/TCP"
W1114 21:19:37.141838       1 exporter.go:615] Source IP: 192.168.77.1 doesn't exist in PodCIDRs

169.254.0.252 is a link-local address used by the proxyAll implementation to redirect traffic from the host network to OVS.
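
Purely as an illustration (the map and helper below are assumptions, not the antrea-agent-proxier API), the proxyAll case could resolve the Service by NodePort when the pre-DNAT destination is the link-local virtual IP, instead of failing the lookup:

package main

import (
	"fmt"
	"net"
)

// virtualNodePortIP is the link-local address that proxyAll uses to steer
// NodePort traffic from the host network into OVS (169.254.0.252 in the logs above).
var virtualNodePortIP = net.ParseIP("169.254.0.252")

// nodePortServices is a hypothetical map from NodePort number to Service name,
// standing in for whatever lookup the proxier could expose for this purpose.
var nodePortServices = map[uint16]string{31749: "default/nginx"}

// serviceForDestination resolves Service info for a pre-DNAT destination: when
// the destination is the proxyAll virtual IP, the NodePort is used as the key;
// otherwise the caller would fall back to the usual ClusterIP lookup.
func serviceForDestination(dstIP net.IP, dstPort uint16) (string, bool) {
	if dstIP.Equal(virtualNodePortIP) {
		svc, ok := nodePortServices[dstPort]
		return svc, ok
	}
	return "", false
}

func main() {
	if svc, ok := serviceForDestination(net.ParseIP("169.254.0.252"), 31749); ok {
		fmt.Println("resolved Service:", svc) // resolved Service: default/nginx
	}
}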

Case 4: NodePort with Pod as source

This is not a common use case by any means, but I thought I would also test this edge case.

The behavior depends on the value of externalTrafficPolicy. When using the default, the connection is treated as "Pod-to-External":

1699999841,1699999844,10.10.1.6,192.168.77.101,52136,30415,TCP,toolbox-pr96d,default,k8s-node-worker-1,antrea-agent-xwf6w,kube-system,k8s-node-worker-1,0.0.0.0,0,,,,,,,,,,,,,

When using externalTrafficPolicy=Local, the connection is actually exported twice:

1700000109,1700000114,10.10.1.6,10.10.1.5,37750,80,TCP,toolbox-pr96d,default,k8s-node-worker-1,nginx-hvhjs,default,k8s-node-worker-1,0.0.0.0,0,,,,,,,,,,,,,
1700000109,1700000114,10.10.1.6,192.168.77.101,37750,30415,TCP,toolbox-pr96d,default,k8s-node-worker-1,antrea-agent-xwf6w,kube-system,k8s-node-worker-1,0.0.0.0,0,,,,,,,,,,,,,

Once as Pod-to-External, and once as a Pod-to-Pod connection.

There is no log message from the FlowExporter in this case.

Note that if we enable proxyAll, results may differ yet again.
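
One possible way to avoid the double export, sketched here with a made-up data model (not the FlowExporter's actual connection store): deduplicate records that share the same original source tuple within one export cycle.

package main

import "fmt"

// connKey identifies a connection by its original source tuple and protocol,
// which is shared by both records above (37750 -> 80 and 37750 -> 30415).
type connKey struct {
	srcIP   string
	srcPort uint16
	proto   string
}

type record struct {
	key   connKey
	dstIP string
}

// dedupe keeps the first record seen for each source tuple within one export
// cycle, so a connection hairpinning through a NodePort is exported only once.
func dedupe(records []record) []record {
	seen := make(map[connKey]bool)
	var out []record
	for _, r := range records {
		if seen[r.key] {
			continue
		}
		seen[r.key] = true
		out = append(out, r)
	}
	return out
}

func main() {
	recs := []record{
		{connKey{"10.10.1.6", 37750, "TCP"}, "10.10.1.5"},      // the Pod-to-Pod view
		{connKey{"10.10.1.6", 37750, "TCP"}, "192.168.77.101"}, // the Pod-to-External view
	}
	fmt.Println(len(dedupe(recs))) // 1
}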

Other cases

There are potentially other cases to consider. In particular, I did not test with LoadBalancer Services. I imagine we have very similar issues.

Versions:
ToT version, which includes PR #5592

What should we do?
The main issue here IMO is the lack of consistency across all cases, based on whether externalTrafficPolicy is set to Local (which determines whether SNAT is needed) and whether proxyAll is enabled.
We should try to add support for "External-to-Pod" traffic and handle all possible cases consistently. We should include the Service information in the exported flow record, and avoid flooding the logs with warnings.
Because accessing a Service using NodePort or LoadBalancer from a Pod is not common, we don't have to handle this case for now. But ideally, we would handle this case gracefully (e.g., treat it as Pod-to-External, or not export it at all).
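
To make the expectation concrete, here is a minimal sketch (illustrative only; the flowType values and helper are not existing FlowExporter code) of the kind of uniform classification we could aim for, based solely on whether the source and destination fall inside the Pod CIDRs:

package main

import (
	"fmt"
	"net"
)

type flowType string

const (
	podToPod      flowType = "PodToPod"
	podToExternal flowType = "PodToExternal"
	externalToPod flowType = "ExternalToPod"
)

// classifyFlow assigns a single flow type based on whether the source and
// destination IPs fall inside the Pod CIDRs, so that the same connection is
// labeled consistently regardless of externalTrafficPolicy or proxyAll.
func classifyFlow(srcIP, dstIP net.IP, podCIDRs []*net.IPNet) flowType {
	inPodCIDR := func(ip net.IP) bool {
		for _, cidr := range podCIDRs {
			if cidr.Contains(ip) {
				return true
			}
		}
		return false
	}
	switch {
	case inPodCIDR(srcIP) && inPodCIDR(dstIP):
		return podToPod
	case inPodCIDR(srcIP):
		return podToExternal
	default:
		return externalToPod
	}
}

func main() {
	_, podCIDR, _ := net.ParseCIDR("10.10.1.0/24")
	cidrs := []*net.IPNet{podCIDR}
	fmt.Println(classifyFlow(net.ParseIP("192.168.77.1"), net.ParseIP("10.10.1.5"), cidrs)) // ExternalToPod
}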

Labels

area/flow-visibility/exporter: Issues or PRs related to the Flow Exporter functions in the Agent
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
