fix: calico dnat network policy for egress api server#383
fix: calico dnat network policy for egress api server#383cloud-j-luna wants to merge 8 commits intomainfrom
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughDiscover Kubernetes API server endpoints and add per-service egress NetworkPolicy generation for services that declare read permissions when deployment network policies are enabled. Changes
Sequence DiagramsequenceDiagram
participant Run as doRunCmd
participant KubeClient as KubeClient
participant Endpoints as "kubernetes Endpoints"
participant Settings as "builder.Settings"
participant Builder as "NetPol Builder"
Run->>KubeClient: create client from context
Run->>Endpoints: GET "kubernetes" Endpoints (default ns)
Endpoints-->>Run: return subsets with addresses & ports
Run->>Settings: append each {IP,port} as net.TCPAddr
Run->>Builder: call ValidateSettings / Create netpols
Builder->>Builder: for each service with read perms
Builder->>Endpoints: allow egress to each APIServerEndpoint (IP/32) on deduplicated TCP ports
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
cluster/kube/builder/settings.go
Outdated
| // (from the "kubernetes" endpoints in the default namespace, not the ClusterIP). | ||
| // This is needed for network policies because CNIs like Calico evaluate | ||
| // egress rules after DNAT, so the ClusterIP is not what gets matched. | ||
| APIServerEndpointIP string |
There was a problem hiding this comment.
replace APIServerEndpointIP and APIServerEndpointIP with KubeAPI url.URL
There was a problem hiding this comment.
I was thinking about using the TCPAddr type rather https://pkg.go.dev/net#TCPAddr after refactoring as it holds IP and Port only without scheme and all the other URL fields. Thoughts?
There was a problem hiding this comment.
are endpoints always ip addresses tho?
There was a problem hiding this comment.
Yes, although the type definition has a Hostname field, it is optional. IP Address is always present
Signed-off-by: Joao Luna <7607329+cloud-j-luna@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cluster/kube/builder/settings.go (1)
57-68:⚠️ Potential issue | 🟠 MajorValidate
APIServerEndpointwhen network policies are enabled.
ValidateSettingsstill ignores this field, so a discovery miss incmd/provider-services/cmd/run.gopasses startup and only fails later whencluster/kube/builder/netpol.gobuilds the API-server egress rule. Reject empty IPs and ports outside1..65535here so misconfiguration fails fast.Proposed fix
func ValidateSettings(settings Settings) error { + if settings.NetworkPoliciesEnabled { + if len(settings.APIServerEndpoint.IP) == 0 || + settings.APIServerEndpoint.Port <= 0 || + settings.APIServerEndpoint.Port > 65535 { + return fmt.Errorf("%w: invalid API server endpoint", ErrSettingsValidation) + } + } + if settings.DeploymentIngressStaticHosts { if settings.DeploymentIngressDomain == "" { return fmt.Errorf("%w: empty ingress domain", ErrSettingsValidation) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cluster/kube/builder/settings.go` around lines 57 - 68, ValidateSettings currently ignores Settings.APIServerEndpoint so invalid or empty API server endpoints slip through; update ValidateSettings to validate APIServerEndpoint on Settings when network policies are enabled (or whenever DeploymentNetworkPolicies flag is true) by parsing the endpoint into host:port, rejecting empty host/IP and ports not in range 1..65535, and returning fmt.Errorf("%w: ...", ErrSettingsValidation) on failure; use the Settings.APIServerEndpoint symbol and ErrSettingsValidation in the new checks so misconfiguration fails fast before netpol.go's API-server egress rule is built.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cluster/kube/builder/netpol.go`:
- Around line 244-247: The code only checks APIServerEndpoint.Port for range but
allows Port==0 and doesn't validate the IP; update the validation for
b.settings.APIServerEndpoint so that APIServerEndpoint.IP is non-empty (and
optionally a valid IP string) and APIServerEndpoint.Port is in 1..65535 (reject
0); return an error early from the same function instead of proceeding to the
CIDR construction (the code that builds the CIDR from APIServerEndpoint.IP and
the rule-appending logic that uses APIServerEndpoint.Port should only run after
these checks pass).
In `@cmd/provider-services/cmd/run.go`:
- Around line 551-560: The current endpoint discovery only captures the first
backend and assigns it to kubeSettings.APIServerEndpoint (using
kubeAPIServerEndpointName), which breaks HA control planes; modify the discovery
loop to collect all subset.Addresses (each IP from subset.Addresses[*].IP) into
a slice stored on the settings/builder (e.g., add a field like
APIServerEndpoints []net.TCPAddr or []net.IP on kubeSettings/builder.Settings)
instead of overwriting kubeSettings.APIServerEndpoint, and then update netpol.go
(the code that generates egress peers around the lines that create single /32
peers) to iterate that slice and emit one egress peer per backend address (or
emit a broader CIDR if preferred) so all control-plane backends are authorized.
---
Outside diff comments:
In `@cluster/kube/builder/settings.go`:
- Around line 57-68: ValidateSettings currently ignores
Settings.APIServerEndpoint so invalid or empty API server endpoints slip
through; update ValidateSettings to validate APIServerEndpoint on Settings when
network policies are enabled (or whenever DeploymentNetworkPolicies flag is
true) by parsing the endpoint into host:port, rejecting empty host/IP and ports
not in range 1..65535, and returning fmt.Errorf("%w: ...",
ErrSettingsValidation) on failure; use the Settings.APIServerEndpoint symbol and
ErrSettingsValidation in the new checks so misconfiguration fails fast before
netpol.go's API-server egress rule is built.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: dedf9751-4b4d-4b03-b2fc-5fef04da5586
📒 Files selected for processing (3)
cluster/kube/builder/netpol.gocluster/kube/builder/settings.gocmd/provider-services/cmd/run.go
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/provider-services/cmd/run.go`:
- Around line 556-559: The code appends net.TCPAddr using net.ParseIP(addr.IP)
which can return nil and later cause a panic when ep.IP.String() is called (see
kubeSettings.APIServerEndpoints and net.TCPAddr usage); update the block that
builds kubeSettings.APIServerEndpoints to defensively validate the parsed IP
(using the result of net.ParseIP) and skip or log any endpoints with invalid IPs
instead of appending a TCPAddr with a nil IP, ensuring downstream code that
calls ep.IP.String() cannot panic.
- Around line 544-569: When deploymentNetworkPoliciesEnabled is true and
endpoint discovery via fromctx.KubeClientFromCtx/Get for the "kubernetes"
Endpoints fails, do not silently continue with an empty
kubeSettings.APIServerEndpoints; instead propagate an error so the process fails
fast. Replace the logger.Error-only branch (the else after
kc.CoreV1().Endpoints(...).Get) with code that returns or wraps the discovery
error (e.g., return fmt.Errorf("discover API server endpoints for network
policies: %w", err)) from the enclosing function so callers see the failure;
keep successful population of kubeSettings.APIServerEndpoints as-is. This
ensures services with Permissions.Read (see netpol.go) won’t be left silently
non-functional.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 43e24af5-2f0b-475d-bc5c-3855fbe50108
📒 Files selected for processing (3)
cluster/kube/builder/netpol.gocluster/kube/builder/settings.gocmd/provider-services/cmd/run.go
🚧 Files skipped from review as they are similar to previous changes (1)
- cluster/kube/builder/netpol.go
This is WIP with messy code.