test(kserve): add raw deployment test, trim authentication and authorization documentation, print prediction output#3354
Conversation
|
Welcome to the Kubeflow Manifests Repository Thanks for opening your first PR. Your contribution means a lot to the Kubeflow community. Before making more PRs: Community Resources:
Thanks again for helping to improve Kubeflow. |
|
Also for pytest a bit more than could be useful. Can you log the prediction results ? |
|
Also looks like its failing or outputting the oauth2-proxy login page for some reason. |
|
And i see so the inference service might be missing a proper security context. |
juliusvonkohout
left a comment
There was a problem hiding this comment.
Thank you for the PR, I added comments
This is expected behavior. The Sign In HTML is the oauth2-proxy response for The full HTML body in the logs can be reduced by truncating the response |
Yes. The I'll add the proper |
Done -- added explicit prediction logging in kserve_sklearn_test.py INFO Prediction result: [1, 1] (expected [1, 1]) |
There was a problem hiding this comment.
Pull request overview
This PR addresses follow-up items from PR #3352, focusing on improving KServe test infrastructure, documentation, and security hardening. The changes include adding a raw deployment mode test, improving test output readability, trimming verbose documentation, and hardening security contexts.
Changes:
- Added Test 7 to verify KServe raw deployment mode with host-based routing and authentication enforcement
- Improved CI log readability by truncating curl response bodies to 200 characters and logging HTTP codes for all 13 test requests
- Trimmed ~70 lines of verbose content from oauth2-proxy README while preserving essential information
- Added security context to Knative Service test fixture and fixed cleanup gaps
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/kserve_test.sh | Added raw deployment test (Test 7), unified curl response handling with body truncation for all tests, added security context to Knative Service fixture, fixed missing cleanup commands |
| tests/kserve_sklearn_test.py | Added prediction result logging before assertion for better debugging, fixed pytest log level flag |
| common/oauth2-proxy/README.md | Trimmed verbose sections, added CI workflow links, added concise note about routing modes and CI workarounds |
Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
- Rename UNAUTH_TOKEN to UNAUTHORIZED_TOKEN: this test validates authorization (RBAC), not authentication (JWT). The token is valid but the ServiceAccount lacks namespace-scoped permissions. - Remove --wait=false from cleanup deletes: deletion is part of the test -- verifying that resources tear down cleanly matters. Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>
Julius: cleanup must succeed and should complete in <60s. Sequential deletes took ~5 min (sum of finalizer waits). Parallel deletes bring total to ~30-45s (max of individual). All 6 resources are independent — no ordering dependencies. Bump kserve_test.yaml timeout from 25m to 30m per Julius. Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>
… app timeout to 30m Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>
The sleep was reduced from 60s to 30s in d31918d, but Envoy in the KinD CI environment needs ~45-60s to pick up the new AuthorizationPolicy. With only 30s, Test 2b (host-based with token) fires before the ALLOW rule propagates, causing 403 RBAC: access denied. Restore 60s for both Test 2 and Test 7 (raw deployment). Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>
…nPolicy The two sleep 60 calls add 2 minutes of dead time, causing the job to exceed the 30-minute timeout by ~15 seconds. Replace with polling loops that exit as soon as the AuthorizationPolicy propagates through Envoy (typically 10-30s). Each loop polls every 5 seconds for up to 120 seconds, providing both faster execution and more robust handling of slow environments. Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>
…h-based route The previous polling loop used the path-based URL, which returns 404 immediately in CI before the route is configured. Since 404 != 403, the loop exited after 5 seconds thinking the AuthorizationPolicy had propagated — but it had not. Test 2b host-based then failed with 403. Fix: poll the host-based URL instead, which correctly returns 403 until the AuthorizationPolicy propagates through Envoy. Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>
The test suite grew with Test 7 (raw deployment) and active polling loops. Total runtime is ~30m18s with 30m timeout. Bump to 35m to validate all tests pass before finalizing the timeout value. Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>
Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
Co-authored-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
KServe v0.16 renamed RawDeployment to Standard (and Serverless to Knative). Using the legacy name causes the validation webhook to reject status updates during deletion, leaving the finalizer in place and the ISVC stuck in a terminating state indefinitely. Update the kserve_test.sh annotation and comments to use Standard. Ref: kserve/kserve#4710, kserve/kserve#4798 Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>
|
Thank you |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: juliusvonkohout The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
…ization documentation, print prediction output (kubeflow#3354) * test(kserve): add raw deployment test, trim auth docs, print prediction output Addresses follow-up items from PR kubeflow#3352 as requested by @juliusvonkohout: 1. Trim KServe authentication documentation (README.md): - Remove verbose subsections: CI fallback YAML, routing table, Models Web App section, CI Test Coverage table (~70 lines removed) - Add concise 4-line note about CI rules: [{}] workaround - Preserve core architecture and Architecture Analysis sections 2. Add Test 7 for raw deployment mode (kserve_test.sh): - Deploy sklearn model with RawDeployment annotation - AuthorizationPolicy uses serving.kserve.io/inferenceservice label (works for both serverless and raw modes, unlike Knative labels) - Host-based routing only (path-based requires kserve/kserve#5090, not yet merged upstream) - Verifies auth enforcement: rejects unauthenticated, accepts authenticated requests - Reference: inferenceservice.yaml ingressClassName config 3. Print prediction results to console (kserve_test.sh): - All 12 curl commands now capture and print response body + HTTP status code for easier debugging of test failures - Pattern: curl -s -w '\n%{http_code}' + head/tail split 4. Fix pre-existing cleanup gap: - Add missing kubectl delete for allow-isvc-sklearn AuthorizationPolicy (was created but never cleaned up) References: - Follow-up to: kubeflow#3352 - Raw deployment ref: kserve/kserve#5090 - Config ref: kserve/kserve@b8fa0fb/config/configmap/inferenceservice.yaml#L389 - Reviewer comment: kubeflow#3352 (comment) Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com> * address review: remove Architecture Analysis, add securityContext, truncate response body Changes addressing juliusvonkohout review comments: - Remove speculative Architecture Analysis section and orphaned [^6] footnote - Add CI workflow and test script links to KServe Authentication heading - Add host-based and path-based routing sentence with kserve#5090 reference - Add securityContext to secure-model-predictor Knative Service fixture - Truncate response body to 200 characters in echo statements for readable CI logs - Log prediction results in kserve_sklearn_test.py before assertion - Fix pre-existing em dashes in comments (Unicode to ASCII) Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com> * fix(test): add runAsUser to secure-model-predictor securityContext The gcr.io/knative-samples/helloworld-go image runs as root UID 0 by default. Setting runAsNonRoot: true without runAsUser causes kubelet to reject the pod, leading to ksvc timeout in CI. Fix: runAsUser 65534 forces non-root execution. Go binaries are statically linked and run at any UID. Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com> * fix(test): expand ksvc abbreviation, enable live pytest logging Address review: replace ksvc with Knative Service in echo statements. Switch --log-level info to --log-cli-level=INFO so prediction results stream to console during CI runs instead of being captured by pytest. Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com> * fix(test): force-kill port-forward to prevent script hang The kubectl port-forward background process ignores SIGTERM when connections are active. Bash waits for all child processes before exiting, causing the test step to hang indefinitely after all tests pass. Fix by sending SIGTERM, waiting 2s for graceful shutdown, then SIGKILL as fallback, and finally reaping with wait. Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com> * ci(kserve): add 60-min timeout and log collection on failure Add timeout-minutes: 60 per maintainer guidance -- no test should run longer than 60 minutes. Add 'Collect Logs on Failure' and 'Upload Diagnostic Logs' steps modeled after full_kubeflow_integration_test.yaml to capture pod descriptions, events, and container logs when the job fails, enabling root-cause analysis instead of blind timeout bumps. Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com> * fix(ci): collect logs on timeout, skip cleanup waits Two bugs found in run kubeflow#901 (cancelled at 60m, all tests passed): 1. Log collection used if: failure() but timeout triggers cancelled(), not failure(). Logs were never collected. Fix: failure() || cancelled(). 2. Cleanup kubectl-delete commands waited for Kubernetes finalizer reconciliation (default --wait=true). Each InferenceService delete blocked 3-5 min for Knative revision teardown on the 2-vCPU runner, adding ~15 min to the test step. The runner is destroyed after the job, so waiting is pointless. Fix: --wait=false on all deletes. Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com> * Reduce timeout for kserve test job to 25 minutes Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> * address review: rename UNAUTH_TOKEN, remove --wait=false - Rename UNAUTH_TOKEN to UNAUTHORIZED_TOKEN: this test validates authorization (RBAC), not authentication (JWT). The token is valid but the ServiceAccount lacks namespace-scoped permissions. - Remove --wait=false from cleanup deletes: deletion is part of the test -- verifying that resources tear down cleanly matters. Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com> * fix(ci): parallelize cleanup deletes, bump timeout to 30m Julius: cleanup must succeed and should complete in <60s. Sequential deletes took ~5 min (sum of finalizer waits). Parallel deletes bring total to ~30-45s (max of individual). All 6 resources are independent — no ordering dependencies. Bump kserve_test.yaml timeout from 25m to 30m per Julius. Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com> * fix(ci): reduce AuthorizationPolicy sleep 60s to 30s, bump models web app timeout to 30m Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com> * fix(kserve): restore 60s sleep for AuthorizationPolicy propagation The sleep was reduced from 60s to 30s in d31918d, but Envoy in the KinD CI environment needs ~45-60s to pick up the new AuthorizationPolicy. With only 30s, Test 2b (host-based with token) fires before the ALLOW rule propagates, causing 403 RBAC: access denied. Restore 60s for both Test 2 and Test 7 (raw deployment). Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com> * fix(kserve): replace blind sleep with active polling for AuthorizationPolicy The two sleep 60 calls add 2 minutes of dead time, causing the job to exceed the 30-minute timeout by ~15 seconds. Replace with polling loops that exit as soon as the AuthorizationPolicy propagates through Envoy (typically 10-30s). Each loop polls every 5 seconds for up to 120 seconds, providing both faster execution and more robust handling of slow environments. Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com> * fix(kserve): poll host-based URL to avoid false-positive 404 from path-based route The previous polling loop used the path-based URL, which returns 404 immediately in CI before the route is configured. Since 404 != 403, the loop exited after 5 seconds thinking the AuthorizationPolicy had propagated — but it had not. Test 2b host-based then failed with 403. Fix: poll the host-based URL instead, which correctly returns 403 until the AuthorizationPolicy propagates through Envoy. Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com> * ci(kserve): bump timeout to 35m for expanded test suite The test suite grew with Test 7 (raw deployment) and active polling loops. Total runtime is ~30m18s with 30m timeout. Bump to 35m to validate all tests pass before finalizing the timeout value. Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com> * Update kserve_test.sh Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> * Update kserve_test.sh Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> * Apply suggestion from @juliusvonkohout Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> * Update kserve_test.sh Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> * Apply suggestion from @juliusvonkohout Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> * fix(kserve): use Standard instead of deprecated RawDeployment annotation KServe v0.16 renamed RawDeployment to Standard (and Serverless to Knative). Using the legacy name causes the validation webhook to reject status updates during deletion, leaving the finalizer in place and the ISVC stuck in a terminating state indefinitely. Update the kserve_test.sh annotation and comments to use Standard. Ref: kserve/kserve#4710, kserve/kserve#4798 Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com> --------- Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com> Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> Co-authored-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
✏️ Summary of Changes
Addresses the three follow-up items from PR #3352 as requested by @juliusvonkohout in this comment:
1. Trim KServe authentication documentation (common/oauth2-proxy/README.md)
[^6]footnote (review)rules: [{}]workaround2. Add test for raw deployment mode (tests/kserve_test.sh)
serving.kserve.io/deploymentMode: RawDeploymentannotationserving.kserve.io/inferenceservicelabel (works for both serverless and raw deployment modes, unlike Knative-specific labels)ingressPathTemplatewhich is not yet merged upstream3. Improve test output and CI log readability
HTTP <code> | <truncated body>format4. Security hardening and cleanup
securityContextblock tosecure-model-predictorKnative Service fixture (allowPrivilegeEscalation: false,runAsNonRoot: true,seccompProfile: RuntimeDefault,capabilities: drop: ["ALL"]) (review)kubectl deleteforallow-isvc-sklearnAuthorizationPolicy cleanup (pre-existing gap from refactor(tests): consolidate KServe test infrastructure into two-file pattern #3352)📦 Dependencies / Related Issues
✅ Contributor Checklist