Switch to Istio CNI by default #3135

Merged
google-oss-prow[bot] merged 6 commits into kubeflow:master from madmecodes:istio-cni-default on May 27, 2025

@madmecodes

Contributor

Switch to Istio CNI by default

This PR changes the default Istio installation to use Istio CNI instead of standard Istio.
Key benefits include:

  • Eliminates the need for privileged Istio init containers
  • Improves compatibility with Pod Security Standards (PSS)
  • Enables native sidecars support for better init container network access

Changes include:

  • Updated example/kustomization.yaml to use Istio CNI paths
  • Updated README.md to reflect new installation paths
  • Added note about Ray operator being configured for Istio CNI compatibility

This change is part of the broader Rootless Kubeflow initiative #2528
and follows up on previous work #3061.

@madmecodes

Contributor Author

@juliusvonkohout For GCP hostPath check, https://github.com/kubeflow/manifests/blob/9825950c84fd5d29617bc8ad6c0d15a6432e7635/common/istio-cni-1-24/istio-install/base/install.yaml#L3009

Am I supposed to spin up a cluster in GCP/GKE and check the paths? Will that cost me?

@juliusvonkohout

Member

I know the paths, and the GCP one is correct for GCP. The question is how we can enable both CNI directories at the same time, since the default ones are correct for Azure, Kind, and others.

@juliusvonkohout linked an issue on May 17, 2025 that may be closed by this pull request
7 tasks
@juliusvonkohout

Member

See #3061 (comment). We need to support both paths at the same time somehow.

@madmecodes

Contributor Author

Okay, looking into this.

@madmecodes

Contributor Author

Why This Multi-Path Approach Would Work

The patch adds both the standard path (/opt/cni/bin) and the GCP-specific path (/home/kubernetes/bin) as separate volume mounts in the Istio CNI DaemonSet.

  1. The CNI installer in the container will attempt to install the CNI binary to both paths
  2. Regardless of whether the node is running on standard Kubernetes or GCP/GKE, one of the paths will exist and be the correct one.
  3. If a path doesn't exist, mounting a non-existent hostPath simply results in an empty directory, which causes no harm.
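
As an illustration of the volume additions described above, here is a minimal sketch that prints a JSON patch for the DaemonSet. The volume name and mount path follow the `kubectl describe` output reported later in this thread; the helper name `gcp_cni_patch` and the container index 0 are assumptions for illustration, and this is not the patch that the PR itself ships.

```shell
# Sketch only: print a JSON patch that adds the GCP CNI bin directory as a
# second hostPath volume and mounts it into the first container.
# Assumption: the installer is container index 0 and already has volumeMounts.
gcp_cni_patch() {
  cat <<'EOF'
[
  {"op": "add", "path": "/spec/template/spec/volumes/-",
   "value": {"name": "cni-bin-dir-gcp",
             "hostPath": {"path": "/home/kubernetes/bin"}}},
  {"op": "add", "path": "/spec/template/spec/containers/0/volumeMounts/-",
   "value": {"name": "cni-bin-dir-gcp",
             "mountPath": "/host/home/kubernetes/bin"}}
]
EOF
}
```

It could be applied with something like `kubectl patch daemonset istio-cni-node -n kube-system --type=json -p "$(gcp_cni_patch)"`. A kustomize patch in the manifests would be the more durable form, since a live patch is lost on the next apply.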

@juliusvonkohout what do you think, can this be a potential approach?

@juliusvonkohout

Member

Yes, that could work. Do you mind testing it on GCP? I think there is a small free 4GB node available by default if you have a Gmail address. You just need to install Istio, so it should be enough. I can then later also test on some GCP clusters.

@juliusvonkohout

Member

Do you mind fixing the error "python3: can't open file '/home/runner/work/manifests/manifests/tests/gh-actions/test_pipeline.py': [Errno 2] No such file or directory" from https://github.com/kubeflow/manifests/actions/runs/15084697684/job/42405783624?pr=3135 in a separate PR? I think the file has just been renamed since we now have v1 and v2 KFP tests. CC @kunal-511 to help.

@kunal-511

Contributor

@madmecodes The test_pipeline.py has been changed to test_pipeline_v2.py in #3129

@madmecodes

madmecodes commented May 19, 2025

Contributor Author

This is updated in #3136.

@madmecodes

Contributor Author

GKE Istio CNI Multi-Path Testing Report

Problem Statement

Istio CNI fails on Google Kubernetes Engine (GKE) because /opt/cni/bin is mounted read-only, preventing the CNI installer from writing the istio-cni binary.

Solution Approach

Implemented multi-path support by mounting both standard (/opt/cni/bin) and GCP-specific (/home/kubernetes/bin) CNI binary directories.

Testing Environment

  • Platform: Google Kubernetes Engine (GKE)
  • Cluster: istio-cni-test (2 nodes, e2-standard-2)
  • Kubernetes: v1.32.3-gke.1927009
  • Istio CNI: 1.24.3

Commands Used and Results

1. Cluster Setup

gcloud container clusters create istio-cni-test \
  --zone us-central1-a \
  --num-nodes 2 \
  --machine-type e2-standard-2 \
  --disk-size 20GB

2. Install Kubeflow with Istio CNI

while ! kustomize build example | kubectl apply --server-side --force-conflicts -f -; do 
  echo "Retrying..."; sleep 10; 
done
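
The loop above retries forever. A bounded variant might look like the sketch below; the `retry` helper and its argument order (attempts, delay in seconds, then the command) are illustrative assumptions, not something from the Kubeflow manifests.

```shell
# Sketch only: run a command until it succeeds, at most a fixed number of times.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "Giving up after $n attempts" >&2
      return 1
    fi
    echo "Retrying ($n/$attempts)..." >&2
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example: `retry 10 10 sh -c 'kustomize build example | kubectl apply --server-side --force-conflicts -f -'`.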

3. Observed Failure

kubectl logs -n kube-system istio-cni-node-8fc7j

Output:

CNIBinTargetDirs: /host/opt/cni/bin
error: failed file copy of /opt/cni/bin/istio-cni to /host/opt/cni/bin: 
open /host/opt/cni/bin/istio-cni.tmp.586267941: read-only file system
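
To spot this failure across several pods, the log can be filtered for the read-only error. The following is a rough sketch that assumes the message format shown above, which may differ between Istio versions; `readonly_targets` is a hypothetical helper name.

```shell
# Sketch only: read istio-cni-node logs on stdin and print each target
# directory whose temp-file write failed with "read-only file system".
# Assumption: the message looks like
#   open <dir>/istio-cni.tmp.<digits>: read-only file system
readonly_targets() {
  sed -n 's|.*open \(.*\)/istio-cni\.tmp\.[0-9]*: read-only file system.*|\1|p' \
    | sort -u
}
```

For example: `kubectl logs -n kube-system istio-cni-node-8fc7j | readonly_targets`.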

4. Verified Multi-Path Volume Mounts

kubectl describe pod -n kube-system istio-cni-node-8fc7j

Output:

Mounts:
  /host/opt/cni/bin from cni-bin-dir (rw)
  /host/home/kubernetes/bin from cni-bin-dir-gcp (rw)

Volumes:
  cni-bin-dir:
    Type: HostPath (bare host directory volume)
    Path: /opt/cni/bin
  cni-bin-dir-gcp:
    Type: HostPath (bare host directory volume)  
    Path: /home/kubernetes/bin

5. Attempted Configuration Fix

kubectl patch daemonset istio-cni-node -n kube-system --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/env/-", 
       "value": {"name": "CNI_BIN_TARGET_DIRS", 
                "value": "/host/opt/cni/bin,/host/home/kubernetes/bin"}}]'

Result: Environment variable set but ignored by CNI installer.

6. Verified GCP Path Functionality

kubectl debug node/gke-istio-cni-test-default-pool-6c013e50-k2bx -it --image=busybox

Commands in debug pod:

ls -la /host/home/kubernetes/bin/    # Shows CNI binaries present
touch /host/home/kubernetes/bin/test-file && rm /host/home/kubernetes/bin/test-file
echo "GCP path is writable"

Output: GCP path is writable ✓
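
The manual touch/rm check above can be generalized to probe several candidate directories in order. This is a sketch under the assumption that a successful `touch` is a good enough writability test; `first_writable_dir` and the probe file name are made up for illustration.

```shell
# Sketch only: print the first argument that is an existing directory and
# accepts a write, then clean up the probe file. Returns 1 if none qualifies.
first_writable_dir() {
  for dir in "$@"; do
    probe="$dir/.cni-write-probe.$$"
    if [ -d "$dir" ] && touch "$probe" 2>/dev/null; then
      rm -f "$probe"
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}
```

Inside the node debug pod this could be run as `first_writable_dir /host/opt/cni/bin /host/home/kubernetes/bin`.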

Key Findings

✅ What Works

  1. Multi-path volume mounts: Both /opt/cni/bin and /home/kubernetes/bin successfully mounted
  2. GCP path accessibility: /home/kubernetes/bin is writable and contains CNI binaries
  3. Infrastructure setup: All volume configurations applied correctly

❌ Current Limitation

Istio CNI installer hardcoded behavior: Despite the CNI_BIN_TARGET_DIRS environment variable being set, the installer only attempts the standard path:

CNIBinTargetDirs: /host/opt/cni/bin  (single path only)

Root Cause Analysis

  • GKE mounts /opt/cni/bin as read-only for security
  • Istio CNI installer lacks native multi-directory support
  • Installer ignores CNI_BIN_TARGET_DIRS environment variable

@madmecodes

Contributor Author
Screenshot 2025-05-25 at 11 10 32 PM

The only configured path is /host/home/kubernetes/bin,
but the installer is still trying /host/opt/cni/bin/istio-cni.tmp.3670064388

Screenshot 2025-05-25 at 11 11 11 PM

The issue is that the Istio CNI installer's target path is hardcoded and it doesn't respect the CNI_BIN_TARGET_DIRS variable.

… instead of standard Istio.

Signed-off-by: madmecodes <ayushguptadev1@gmail.com>
Signed-off-by: madmecodes <ayushguptadev1@gmail.com>
…ctories. Currently it only uses /host/opt/cni/bin

Signed-off-by: madmecodes <ayushguptadev1@gmail.com>
Signed-off-by: madmecodes <ayushguptadev1@gmail.com>
Signed-off-by: madmecodes <ayushguptadev1@gmail.com>
@madmecodes force-pushed the istio-cni-default branch from 22ac963 to 4502462 on May 27, 2025 14:08
Signed-off-by: madmecodes <ayushguptadev1@gmail.com>
@madmecodes

Contributor Author

@juliusvonkohout are the comments correct?

@juliusvonkohout

Member

/lgtm
/approve

thank you

@google-oss-prow


[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: juliusvonkohout


@google-oss-prow Bot merged commit 0191cbe into kubeflow:master on May 27, 2025
25 checks passed