kserve enforce authentication#3180

Merged
google-oss-prow[bot] merged 25 commits into kubeflow:master from madmecodes:kserve-auth-new on Jul 28, 2025

@madmecodes

Contributor

✏️ Summary of Changes

KServe JWT Authentication PR Analysis

Core Security Features

  • JWT-based authentication for cluster-local-gateway: Added RequestAuthentication and AuthorizationPolicy to secure the gateway that KServe uses by default.

  • Two authentication overlays:

    • m2m-auth: Basic JWT authentication requiring valid Kubernetes service account tokens.
    • m2m-auth-strict: Enhanced version with namespace-level isolation.
  • Comprehensive test suite: Multiple test scripts to validate authentication scenarios.

  • Documentation: Added KSERVE_JWT_AUTHENTICATION.md with a detailed implementation guide.

Technical Implementation Details

  1. RequestAuthentication: Validates JWT tokens from Kubernetes API server with proper issuer configuration.
  2. AuthorizationPolicy (DENY): Blocks all requests without valid JWT principals, except health checks.
  3. AuthorizationPolicy (ALLOW): Permits requests with valid JWT principals and exempts health check endpoints.
  4. External access support: Example configurations for secure external access via istio-ingressgateway.
  5. Namespace isolation examples: Templates for restricting access to same-namespace or explicit cross-namespace.
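
The first two pieces in the list above (a RequestAuthentication plus a DENY AuthorizationPolicy) might look roughly like the manifests printed by the shell sketch below. It is an illustration only, not the content merged in this PR: the resource names, the `app: cluster-local-gateway` selector label, the issuer and JWKS URLs, and the `/healthz` and `/ready` paths are all assumptions, and `render_gateway_jwt_policies` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: prints manifests resembling the policies described above.
# Names, labels, URLs and health-check paths are assumptions for illustration.

render_gateway_jwt_policies() {
  cat <<'EOF'
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: cluster-local-gateway-jwt          # hypothetical name
  namespace: istio-system
spec:
  selector:
    matchLabels:
      app: cluster-local-gateway           # assumed gateway label
  jwtRules:
  - issuer: https://kubernetes.default.svc.cluster.local
    jwksUri: https://kubernetes.default.svc.cluster.local/openid/v1/jwks
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: cluster-local-gateway-require-jwt  # hypothetical name
  namespace: istio-system
spec:
  selector:
    matchLabels:
      app: cluster-local-gateway           # assumed gateway label
  action: DENY
  rules:
  - from:
    - source:
        notRequestPrincipals: ["*"]        # request carries no valid JWT principal
    to:
    - operation:
        notPaths: ["/healthz", "/ready"]   # assumed health-check paths stay reachable
EOF
}
```

On a cluster with the Istio CRDs installed, the output could be piped into `kubectl apply -f -`. The DENY rule only matches when both conditions hold: the request has no JWT principal and the path is not one of the exempted health-check paths.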

How It Addresses the Core Security Issue

The implementation directly addresses issue #2811 by:

  • Closing the authentication gap: Previously, cluster-local-gateway had no authentication while istio-ingressgateway had oauth2-proxy.
  • Consistent security model: Both gateways now require authentication (JWT for cluster-local, oauth2-proxy for ingress).
  • Default secure configuration: Authentication is enforced by default in the knative-cni install script.
  • Flexible authorization: Supports both permissive (any valid JWT) and strict (namespace-isolated) modes.

The PR represents a significant security improvement but leaves room for further enhancements based on community feedback and production usage patterns.
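
The difference between the permissive and strict modes comes down to which token claims are checked. A Kubernetes service account token carries a `sub` claim of the form `system:serviceaccount:<namespace>:<name>`, so the caller's namespace can be read from the token. The sketch below decodes a token payload for inspection only. It does not verify the signature (Istio does that against the JWKS), it assumes a `base64` that accepts `-d`, and the helper names `jwt_payload` and `jwt_namespace` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: decode a JWT payload to inspect the claims the policies rely on.
# No signature verification happens here.

# Decode the base64url payload (second dot-separated segment) of a JWT.
jwt_payload() {
  local payload
  payload="$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')"
  # base64url omits '=' padding; restore it before decoding.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d
}

# Extract the namespace from a "system:serviceaccount:<namespace>:<name>" subject.
jwt_namespace() {
  jwt_payload "$1" |
    sed -n 's/.*"sub" *: *"system:serviceaccount:\([^:"]*\):[^"]*".*/\1/p'
}
```

With a real cluster, something like `jwt_namespace "$(kubectl create token default -n team-a)"` would be one way to check which namespace a token belongs to, where `team-a` is a placeholder namespace.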

📦 Dependencies

None

🐛 Related Issues

#2811

✅ Contributor Checklist

  • I have tested these changes with kustomize. See Installation Prerequisites.
  • All commits are signed-off to satisfy the DCO check.
  • I have considered adding my company to the adopters page to support Kubeflow and help the community, since I expect help from the community for my issue (see 1. and 2.).

You can join the CNCF Slack and access our meetings at the Kubeflow Community website. Our channel on the CNCF Slack is here #kubeflow-platform.

Comment thread examples/kserve-external-access.yaml Outdated
@@ -0,0 +1,66 @@
# KServe External Access Configuration

Member

Let's move this to the other kserve test files. Do not create a new /examples folder.

Comment thread tests/knative-cni_install.sh Outdated
Comment on lines +13 to +14
# Apply cluster-local-gateway with JWT authentication (with retry)
echo "Applying cluster-local-gateway JWT authentication policies..."
for ((i=1; i<=3; i++)); do
  if kustomize build common/istio/cluster-local-gateway/overlays/m2m-auth | kubectl apply -f -; then
    echo "cluster-local-gateway JWT auth applied successfully"
    break
  else
    echo "Attempt $i failed to apply cluster-local-gateway JWT auth, retrying..."
    sleep 5
  fi
done

@juliusvonkohout juliusvonkohout Jul 8, 2025

Member

Why are you adding this?
kustomize build common/istio/cluster-local-gateway/overlays/m2m-auth | kubectl apply -f - should be enough instead of 12 lines.

Comment thread tests/knative_auth_test.sh Outdated
@@ -0,0 +1,209 @@
#!/bin/bash

Member

tests/knative_authENTICATION_test.sh

Comment thread tests/kserve_complete_auth_test.sh Outdated
# Wait for InferenceService to be ready
log_info "Waiting for InferenceService to be ready..."
kubectl wait --for=condition=Ready inferenceservice/secure-sklearn -n $PRIMARY_NAMESPACE --timeout=300s || {
  log_info "InferenceService not ready, continuing with tests..."

Member

Let's remove most of the logging stuff. We just want to fail directly if something is wrong.
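
One common way to get this fail-directly behavior in bash is strict mode (`set -euo pipefail`). The sketch below is not taken from the PR's test scripts. It uses `false | cat` as a stand-in for a pipeline such as `kustomize build ... | kubectl apply -f -` whose first stage fails, and the helper names `run_strict` and `run_lenient` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: compares a script with and without pipefail when the first
# stage of a pipeline fails. Each variant runs in its own bash process.

run_strict() {
  # With -e and pipefail, the failing first stage aborts the script,
  # so the echo is never reached and the exit status is non-zero.
  bash -c 'set -euo pipefail; false | cat; echo "unreachable"'
}

run_lenient() {
  # Without pipefail, the pipeline reports the status of cat (0),
  # so the script carries on despite the failure.
  bash -c 'set -eu; false | cat; echo "continued"'
}
```

With strict mode at the top of a test script, a single pipeline line is enough: if either stage fails, the script stops with a non-zero status and no extra logging or retry logic is needed.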

Comment thread tests/kserve_complete_auth_test.sh Outdated

# Wait for InferenceService to be ready
log_info "Waiting for InferenceService to be ready..."
kubectl wait --for=condition=Ready inferenceservice/secure-sklearn -n $PRIMARY_NAMESPACE --timeout=300s || {

Member

300s is a bit much; let's use 180s.

Comment thread tests/kserve_complete_auth_test.sh Outdated
@@ -0,0 +1,257 @@
#!/bin/bash

Member

Please fix the filename here as well

@@ -0,0 +1,148 @@
#!/bin/bash

@juliusvonkohout juliusvonkohout Jul 8, 2025

Member

What is the difference from the other tests? I also think the filename is not clear enough to explain what you are doing.

@@ -0,0 +1,58 @@
#!/bin/bash

Member

tests/kserve_setup_external_access.sh

@juliusvonkohout

Member

Please also fix the tests.

@juliusvonkohout marked this pull request as ready for review on July 15, 2025, 15:09

@orfeas-k

Does this also secure from this issue mentioned in KServe docs?

> Currently when activator is on the request path, it is not able to check the originated namespace or original identity due to the net-istio issue.

My understanding is that this is actually not feasible to tackle on kubeflow's side, so users will still be able to access ISVCs in other users' namespaces, when activator intercepts them.

@madmecodes

Contributor Author

> Does this also secure from this issue mentioned in KServe docs?
>
> > Currently when activator is on the request path, it is not able to check the originated namespace or original identity due to the net-istio issue.
>
> My understanding is that this is actually not feasible to tackle on kubeflow's side, so users will still be able to access ISVCs in other users' namespaces, when activator intercepts them.

I agree. The core problem: when Knative's activator is in the request path (for example during cold starts after scale-to-zero), namespace-based JWT checks are bypassed, because the activator has broad access permissions and the original request identity is lost.

From the issue: knative-extensions/net-istio#554 "services in different namespace will be able to communicate with each other whenever activator is in path, because activator is allowed to access any service"

I also think this is not fixable at the manifests level. It's an architectural limitation that requires changes to Knative/KServe itself.

@madmecodes

Contributor Author

> Does this also secure from this issue mentioned in KServe docs?
>
> > Currently when activator is on the request path, it is not able to check the originated namespace or original identity due to the net-istio issue.
>
> My understanding is that this is actually not feasible to tackle on kubeflow's side, so users will still be able to access ISVCs in other users' namespaces, when activator intercepts them.

So namespace-level isolation won't be possible, but we can try enforcing JWT; at least that is doable. However, I'm facing errors, and it's hard to find what's causing the tests to fail.

@orfeas-k


IMO it'd be good to mention this architectural limitation in the new setup/overlay provided, given that people using it should be aware of this quite important limitation.

@madmecodes

Contributor Author

> IMO it'd be good to mention this architectural limitation in the new setup/overlay provided, given that people using it should be aware of this quite important limitation.

Like in the manifest Readme for Istio, we should update that?

@madmecodes

Contributor Author

@juliusvonkohout can you please re-run these 2 tests?

@juliusvonkohout

Member

> @juliusvonkohout can you please re-run these 2 tests?

You should be able to do so with

/retest

If not, we have to add you as a member of Kubeflow. Do you mind creating a PR to add the 4 GSoC students as members of the Kubeflow organization, @madmecodes?

@orfeas-k


> Like in the manifest Readme for Istio, we should update that?

Yep, exactly, preferably with something like a warning box, if that's aligned with the style of the READMEs in this repo.

madmecodes and others added 21 commits July 28, 2025 10:46
Signed-off-by: madmecodes <ayushguptadev1@gmail.com> (12 commits)
Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> (9 commits)

@juliusvonkohout

Member

Thank you, that significantly improves the status quo.

/lgtm
/approve

@google-oss-prow


[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: juliusvonkohout

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow[bot] merged commit 560c60a into kubeflow:master on Jul 28, 2025
29 checks passed

Development

Successfully merging this pull request may close these issues.

Enable and document for Kubeflow 1.10 Kserve secure inferencing from inside and outside the cluster with tokens

3 participants