Skip to content

refactor(tests): consolidate KServe test infrastructure into two-file pattern#3352

Merged
google-oss-prow[bot] merged 8 commits into
kubeflow:masterfrom
Raakshass:cleanup/kserve-test-infrastructure
Feb 24, 2026
Merged

refactor(tests): consolidate KServe test infrastructure into two-file pattern#3352
google-oss-prow[bot] merged 8 commits into
kubeflow:masterfrom
Raakshass:cleanup/kserve-test-infrastructure

Conversation

@Raakshass

Copy link
Copy Markdown
Contributor

✏️ Summary of Changes

Consolidates scattered KServe test scripts into the canonical kserve_install.sh + kserve_test.sh two-file pattern, matching every other component's test structure in this repository.

Consolidated tests/kserve_test.sh

Merged three separate test scripts into a single structured script with 6 tests:

# Test Source
1 Model inference via KServe SDK (pytest) existing kserve_test.sh
2 Ingress gateway security — AuthorizationPolicy enforcement existing kserve_test.sh (removed manual VirtualService)
3 KServe Models Web Application API — XSRF + auth + unauthorized access kserve_models_web_application_test.sh
4 Knative Service authentication via cluster-local-gateway existing kserve_test.sh
5 Namespace isolation — cross-namespace token rejection kserve_complete_authentication_test.sh
6 Unauthenticated request rejection existing kserve_test.sh
Key changes to kserve_test.sh:

Deleted obsolete files

File Reason
tests/kserve_complete_authentication_test.sh Logic merged into kserve_test.sh
tests/kserve_models_web_application_test.sh Logic merged into kserve_test.sh
tests/kserve_setup_external_access.sh Obsoleted by #3348 native pathTemplate support
tests/kserve/kserve-external-access.yaml Obsoleted by #3348 native pathTemplate support
tests/kserve_test.yaml Unused CI artifact (not to be confused with the workflow)

Updated CI workflows

  • .github/workflows/kserve_models_web_application_test.yaml:
    • Calls consolidated kserve_test.sh instead of deleted kserve_models_web_application_test.sh
    • Added setup-python@v4 step (required by pytest in consolidated script)
    • Added tests/kserve/** path trigger
    • Increased timeout-minutes from 25 to 45 (now runs all 6 tests)
  • .github/workflows/full_kubeflow_integration_test.yaml:
    • Removed redundant Test KServe Models Web Application API step (already covered by kserve_test.sh on the preceding step)

Updated dependencies

  • tests/kserve/requirements.txt: kserve>=0.15.0kserve>=0.16.0

📦 Dependencies

Depends-On: #3348

This PR assumes #3348 is merged first. It removes the manual VirtualService creation that #3348 replaces with native pathTemplate support, and updates prediction paths from /kserve/ to /serving/.

🐛 Related Issues

  • Addresses test infrastructure fragmentation across multiple KServe test scripts
  • Aligns with the two-file pattern (*_install.sh + *_test.sh) used by all other components

✅ Contributor Checklist

  • I have tested these changes with kustomize. See Installation Prerequisites.
  • All commits are signed-off to satisfy the DCO check.
  • I have considered adding my company to the adopters page to support Kubeflow and help the community, since I expect help from the community for my issue (see 1. and 2.).
    /cc @juliusvonkohout

@github-actions

Copy link
Copy Markdown

Welcome to the Kubeflow Manifests Repository

Thanks for opening your first PR. Your contribution means a lot to the Kubeflow community.

Before making more PRs:
Please ensure your PR follows our Contributing Guide.
Please also be aware that many components are synchronizes from upstream via the scripts in /scripts.
So in some cases you have to fix the problem in the upstream repositories first, but you can use a PR against kubeflow/manifests to test the platform integration.

Community Resources:

Thanks again for helping to improve Kubeflow.

@Raakshass Raakshass force-pushed the cleanup/kserve-test-infrastructure branch from 96606f1 to c4732b8 Compare February 18, 2026 18:06
Comment thread tests/kserve/requirements.txt Outdated
Comment thread .github/workflows/kserve_models_web_application_test.yaml Outdated
Comment thread tests/kserve_test.sh Outdated
@Raakshass Raakshass force-pushed the cleanup/kserve-test-infrastructure branch from c4732b8 to 93b6c08 Compare February 19, 2026 17:44
@Raakshass Raakshass force-pushed the cleanup/kserve-test-infrastructure branch from 93b6c08 to bef94d9 Compare February 20, 2026 09:49
@google-oss-prow google-oss-prow Bot added size/XL and removed size/L labels Feb 20, 2026
Comment thread tests/kserve_sklearn_test.py Outdated
@juliusvonkohout

Copy link
Copy Markdown
Member
image The test must fail if the a single part fails, also the python part.

Comment thread tests/kserve_test.sh Outdated
Comment thread tests/kserve_test.sh Outdated
Comment thread tests/kserve_test.sh
Comment thread tests/kserve_sklearn_test.py Outdated
Comment thread tests/kserve_test.sh Outdated
Comment thread tests/kserve_test.sh Outdated
Comment thread tests/kserve_sklearn_test.py
@Raakshass Raakshass force-pushed the cleanup/kserve-test-infrastructure branch from efeffdd to 0165de3 Compare February 20, 2026 14:33
Comment thread tests/kserve/test_sklearn.py
@Raakshass Raakshass force-pushed the cleanup/kserve-test-infrastructure branch 3 times, most recently from 5fc4d38 to 1775bec Compare February 22, 2026 07:57
Comment thread tests/kserve_test.sh Outdated
@Raakshass Raakshass force-pushed the cleanup/kserve-test-infrastructure branch from f93858e to e1903db Compare February 23, 2026 16:52
@juliusvonkohout

Copy link
Copy Markdown
Member

I think the raw deployment mode is for a follow up PR. This one here is long-lived enough.

"an additional test for a raw deployment would also be interesting, see kserve/kserve#5090
see also https://github.com/kserve/kserve/blob/b8fa0fbf8310c07b775e5c7407043253a2a0e6d0/config/configmap/inferenceservice.yaml#L389"

…ncipals

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>
@Raakshass Raakshass force-pushed the cleanup/kserve-test-infrastructure branch from 42d00eb to 9300633 Compare February 23, 2026 16:58
juliusvonkohout and others added 4 commits February 23, 2026 18:00
Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>
…redictor pod

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>
Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>
@juliusvonkohout

Copy link
Copy Markdown
Member

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>
Comment thread common/oauth2-proxy/README.md
@juliusvonkohout

juliusvonkohout commented Feb 24, 2026

Copy link
Copy Markdown
Member

Thank you, lets cut down the documentation (removing code and documentation is often more important than adding.) and add tests for the raw deployment in a follow up PR. #3352 (comment)

And please print actual prediction result on the console, do not hide them, such that from to test output its easier to see results.

/lgtm
/approve

@google-oss-prow

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: juliusvonkohout

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow Bot merged commit c589121 into kubeflow:master Feb 24, 2026
18 checks passed
@Raakshass

Copy link
Copy Markdown
Contributor Author

Thank you for the thorough review and merge, @juliusvonkohout. Noted the follow-up items:

Trim the KServe auth documentation — remove redundant code blocks
Add raw deployment mode test (ref: kserve/kserve#5090)
Print prediction results to console instead of suppressing them
I'll open the follow-up PR against the updated master.

Raakshass added a commit to Raakshass/manifests that referenced this pull request Feb 24, 2026
…on output

Addresses follow-up items from PR kubeflow#3352 as requested by @juliusvonkohout:

1. Trim KServe authentication documentation (README.md):
   - Remove verbose subsections: CI fallback YAML, routing table,
     Models Web App section, CI Test Coverage table (~70 lines removed)
   - Add concise 4-line note about CI rules: [{}] workaround
   - Preserve core architecture and Architecture Analysis sections

2. Add Test 7 for raw deployment mode (kserve_test.sh):
   - Deploy sklearn model with RawDeployment annotation
   - AuthorizationPolicy uses serving.kserve.io/inferenceservice label
     (works for both serverless and raw modes, unlike Knative labels)
   - Host-based routing only (path-based requires kserve/kserve#5090,
     not yet merged upstream)
   - Verifies auth enforcement: rejects unauthenticated, accepts
     authenticated requests
   - Reference: inferenceservice.yaml ingressClassName config

3. Print prediction results to console (kserve_test.sh):
   - All 12 curl commands now capture and print response body + HTTP
     status code for easier debugging of test failures
   - Pattern: curl -s -w '\n%{http_code}' + head/tail split

4. Fix pre-existing cleanup gap:
   - Add missing kubectl delete for allow-isvc-sklearn
     AuthorizationPolicy (was created but never cleaned up)

References:
- Follow-up to: kubeflow#3352
- Raw deployment ref: kserve/kserve#5090
- Config ref: kserve/kserve@b8fa0fb/config/configmap/inferenceservice.yaml#L389
- Reviewer comment: kubeflow#3352 (comment)

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>
Raakshass added a commit to Raakshass/manifests that referenced this pull request Mar 7, 2026
…on output

Addresses follow-up items from PR kubeflow#3352 as requested by @juliusvonkohout:

1. Trim KServe authentication documentation (README.md):
   - Remove verbose subsections: CI fallback YAML, routing table,
     Models Web App section, CI Test Coverage table (~70 lines removed)
   - Add concise 4-line note about CI rules: [{}] workaround
   - Preserve core architecture and Architecture Analysis sections

2. Add Test 7 for raw deployment mode (kserve_test.sh):
   - Deploy sklearn model with RawDeployment annotation
   - AuthorizationPolicy uses serving.kserve.io/inferenceservice label
     (works for both serverless and raw modes, unlike Knative labels)
   - Host-based routing only (path-based requires kserve/kserve#5090,
     not yet merged upstream)
   - Verifies auth enforcement: rejects unauthenticated, accepts
     authenticated requests
   - Reference: inferenceservice.yaml ingressClassName config

3. Print prediction results to console (kserve_test.sh):
   - All 12 curl commands now capture and print response body + HTTP
     status code for easier debugging of test failures
   - Pattern: curl -s -w '\n%{http_code}' + head/tail split

4. Fix pre-existing cleanup gap:
   - Add missing kubectl delete for allow-isvc-sklearn
     AuthorizationPolicy (was created but never cleaned up)

References:
- Follow-up to: kubeflow#3352
- Raw deployment ref: kserve/kserve#5090
- Config ref: kserve/kserve@b8fa0fb/config/configmap/inferenceservice.yaml#L389
- Reviewer comment: kubeflow#3352 (comment)

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>
google-oss-prow Bot pushed a commit that referenced this pull request Mar 12, 2026
…ization documentation, print prediction output (#3354)

* test(kserve): add raw deployment test, trim auth docs, print prediction output

Addresses follow-up items from PR #3352 as requested by @juliusvonkohout:

1. Trim KServe authentication documentation (README.md):
   - Remove verbose subsections: CI fallback YAML, routing table,
     Models Web App section, CI Test Coverage table (~70 lines removed)
   - Add concise 4-line note about CI rules: [{}] workaround
   - Preserve core architecture and Architecture Analysis sections

2. Add Test 7 for raw deployment mode (kserve_test.sh):
   - Deploy sklearn model with RawDeployment annotation
   - AuthorizationPolicy uses serving.kserve.io/inferenceservice label
     (works for both serverless and raw modes, unlike Knative labels)
   - Host-based routing only (path-based requires kserve/kserve#5090,
     not yet merged upstream)
   - Verifies auth enforcement: rejects unauthenticated, accepts
     authenticated requests
   - Reference: inferenceservice.yaml ingressClassName config

3. Print prediction results to console (kserve_test.sh):
   - All 12 curl commands now capture and print response body + HTTP
     status code for easier debugging of test failures
   - Pattern: curl -s -w '\n%{http_code}' + head/tail split

4. Fix pre-existing cleanup gap:
   - Add missing kubectl delete for allow-isvc-sklearn
     AuthorizationPolicy (was created but never cleaned up)

References:
- Follow-up to: #3352
- Raw deployment ref: kserve/kserve#5090
- Config ref: kserve/kserve@b8fa0fb/config/configmap/inferenceservice.yaml#L389
- Reviewer comment: #3352 (comment)

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* address review: remove Architecture Analysis, add securityContext, truncate response body

Changes addressing juliusvonkohout review comments:

- Remove speculative Architecture Analysis section and orphaned [^6] footnote

- Add CI workflow and test script links to KServe Authentication heading

- Add host-based and path-based routing sentence with kserve#5090 reference

- Add securityContext to secure-model-predictor Knative Service fixture

- Truncate response body to 200 characters in echo statements for readable CI logs

- Log prediction results in kserve_sklearn_test.py before assertion

- Fix pre-existing em dashes in comments (Unicode to ASCII)

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* fix(test): add runAsUser to secure-model-predictor securityContext

The gcr.io/knative-samples/helloworld-go image runs as root UID 0 by default. Setting runAsNonRoot: true without runAsUser causes kubelet to reject the pod, leading to ksvc timeout in CI.

Fix: runAsUser 65534 forces non-root execution. Go binaries are statically linked and run at any UID.
Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* fix(test): expand ksvc abbreviation, enable live pytest logging

Address review: replace ksvc with Knative Service in echo statements.

Switch --log-level info to --log-cli-level=INFO so prediction results

stream to console during CI runs instead of being captured by pytest.

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* fix(test): force-kill port-forward to prevent script hang

The kubectl port-forward background process ignores SIGTERM when
connections are active. Bash waits for all child processes before
exiting, causing the test step to hang indefinitely after all tests
pass. Fix by sending SIGTERM, waiting 2s for graceful shutdown, then
SIGKILL as fallback, and finally reaping with wait.

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* ci(kserve): add 60-min timeout and log collection on failure

Add timeout-minutes: 60 per maintainer guidance -- no test should run
longer than 60 minutes. Add 'Collect Logs on Failure' and 'Upload
Diagnostic Logs' steps modeled after full_kubeflow_integration_test.yaml
to capture pod descriptions, events, and container logs when the job
fails, enabling root-cause analysis instead of blind timeout bumps.

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* fix(ci): collect logs on timeout, skip cleanup waits

Two bugs found in run #901 (cancelled at 60m, all tests passed):

1. Log collection used if: failure() but timeout triggers cancelled(),
   not failure(). Logs were never collected. Fix: failure() || cancelled().

2. Cleanup kubectl-delete commands waited for Kubernetes finalizer
   reconciliation (default --wait=true). Each InferenceService delete
   blocked 3-5 min for Knative revision teardown on the 2-vCPU runner,
   adding ~15 min to the test step. The runner is destroyed after the
   job, so waiting is pointless. Fix: --wait=false on all deletes.

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* Reduce timeout for kserve test job to 25 minutes

Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>

* address review: rename UNAUTH_TOKEN, remove --wait=false

- Rename UNAUTH_TOKEN to UNAUTHORIZED_TOKEN: this test validates
  authorization (RBAC), not authentication (JWT). The token is valid
  but the ServiceAccount lacks namespace-scoped permissions.

- Remove --wait=false from cleanup deletes: deletion is part of the
  test -- verifying that resources tear down cleanly matters.

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* fix(ci): parallelize cleanup deletes, bump timeout to 30m

Julius: cleanup must succeed and should complete in <60s.
Sequential deletes took ~5 min (sum of finalizer waits).
Parallel deletes bring total to ~30-45s (max of individual).
All 6 resources are independent — no ordering dependencies.
Bump kserve_test.yaml timeout from 25m to 30m per Julius.

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* fix(ci): reduce AuthorizationPolicy sleep 60s to 30s, bump models web app timeout to 30m

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* fix(kserve): restore 60s sleep for AuthorizationPolicy propagation

The sleep was reduced from 60s to 30s in d31918d, but Envoy in the KinD CI environment needs ~45-60s to pick up the new AuthorizationPolicy. With only 30s, Test 2b (host-based with token) fires before the ALLOW rule propagates, causing 403 RBAC: access denied. Restore 60s for both Test 2 and Test 7 (raw deployment).

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* fix(kserve): replace blind sleep with active polling for AuthorizationPolicy

The two sleep 60 calls add 2 minutes of dead time, causing the job to exceed the 30-minute timeout by ~15 seconds. Replace with polling loops that exit as soon as the AuthorizationPolicy propagates through Envoy (typically 10-30s). Each loop polls every 5 seconds for up to 120 seconds, providing both faster execution and more robust handling of slow environments.

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* fix(kserve): poll host-based URL to avoid false-positive 404 from path-based route

The previous polling loop used the path-based URL, which returns 404 immediately in CI before the route is configured. Since 404 != 403, the loop exited after 5 seconds thinking the AuthorizationPolicy had propagated — but it had not. Test 2b host-based then failed with 403. Fix: poll the host-based URL instead, which correctly returns 403 until the AuthorizationPolicy propagates through Envoy.

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* ci(kserve): bump timeout to 35m for expanded test suite

The test suite grew with Test 7 (raw deployment) and active polling loops. Total runtime is ~30m18s with 30m timeout. Bump to 35m to validate all tests pass before finalizing the timeout value.

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* Update kserve_test.sh

Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>

* Update kserve_test.sh

Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>

* Apply suggestion from @juliusvonkohout

Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>

* Update kserve_test.sh

Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>

* Apply suggestion from @juliusvonkohout

Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>

* fix(kserve): use Standard instead of deprecated RawDeployment annotation

KServe v0.16 renamed RawDeployment to Standard (and Serverless to Knative). Using the legacy name causes the validation webhook to reject status updates during deletion, leaving the finalizer in place and the ISVC stuck in a terminating state indefinitely.

Update the kserve_test.sh annotation and comments to use Standard.

Ref: kserve/kserve#4710, kserve/kserve#4798
Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

---------

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>
Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
Co-authored-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
Raakshass added a commit to Raakshass/manifests that referenced this pull request Mar 27, 2026
… pattern (kubeflow#3352)

* refactor(tests): consolidate KServe test infrastructure into two-file pattern

Merge scattered KServe test scripts into the canonical *_install.sh +
*_test.sh pattern matching all other components. Inline utils.py and
iris_input.json into kserve_sklearn_test.py, removing the tests/kserve/
subfolder entirely.

Key changes:
- Consolidate 3 test scripts into kserve_test.sh (6 tests)
- Add pathTemplate ConfigMap patch in kustomization.yaml
- Use source.namespaces mTLS AuthorizationPolicy
- Both path-based and host-based routing tested via curl
- Python SDK prediction test runs independently via pytest
- Remove accidental build artifacts (kubeflow-all.yaml)

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* Apply suggestion from @juliusvonkohout

Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>

* fix: address reviewer feedback — inline deps, revert AP to requestPrincipals

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* Apply suggestions from code review

Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>

* fix: bootstrap pytest before python -m pytest in kserve_test.sh

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* fix: use permissive allow-all AuthorizationPolicy (rules: [{}]) for predictor pod

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* fix: use allow-all AuthorizationPolicy in bash test (same sidecar issue)

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* docs: update KServe Authentication section in oauth2-proxy README

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

---------

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>
Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
Co-authored-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
Raakshass added a commit to Raakshass/manifests that referenced this pull request Mar 27, 2026
…ization documentation, print prediction output (kubeflow#3354)

* test(kserve): add raw deployment test, trim auth docs, print prediction output

Addresses follow-up items from PR kubeflow#3352 as requested by @juliusvonkohout:

1. Trim KServe authentication documentation (README.md):
   - Remove verbose subsections: CI fallback YAML, routing table,
     Models Web App section, CI Test Coverage table (~70 lines removed)
   - Add concise 4-line note about CI rules: [{}] workaround
   - Preserve core architecture and Architecture Analysis sections

2. Add Test 7 for raw deployment mode (kserve_test.sh):
   - Deploy sklearn model with RawDeployment annotation
   - AuthorizationPolicy uses serving.kserve.io/inferenceservice label
     (works for both serverless and raw modes, unlike Knative labels)
   - Host-based routing only (path-based requires kserve/kserve#5090,
     not yet merged upstream)
   - Verifies auth enforcement: rejects unauthenticated, accepts
     authenticated requests
   - Reference: inferenceservice.yaml ingressClassName config

3. Print prediction results to console (kserve_test.sh):
   - All 12 curl commands now capture and print response body + HTTP
     status code for easier debugging of test failures
   - Pattern: curl -s -w '\n%{http_code}' + head/tail split

4. Fix pre-existing cleanup gap:
   - Add missing kubectl delete for allow-isvc-sklearn
     AuthorizationPolicy (was created but never cleaned up)

References:
- Follow-up to: kubeflow#3352
- Raw deployment ref: kserve/kserve#5090
- Config ref: kserve/kserve@b8fa0fb/config/configmap/inferenceservice.yaml#L389
- Reviewer comment: kubeflow#3352 (comment)

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* address review: remove Architecture Analysis, add securityContext, truncate response body

Changes addressing juliusvonkohout review comments:

- Remove speculative Architecture Analysis section and orphaned [^6] footnote

- Add CI workflow and test script links to KServe Authentication heading

- Add host-based and path-based routing sentence with kserve#5090 reference

- Add securityContext to secure-model-predictor Knative Service fixture

- Truncate response body to 200 characters in echo statements for readable CI logs

- Log prediction results in kserve_sklearn_test.py before assertion

- Fix pre-existing em dashes in comments (Unicode to ASCII)

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* fix(test): add runAsUser to secure-model-predictor securityContext

The gcr.io/knative-samples/helloworld-go image runs as root UID 0 by default. Setting runAsNonRoot: true without runAsUser causes kubelet to reject the pod, leading to ksvc timeout in CI.

Fix: runAsUser 65534 forces non-root execution. Go binaries are statically linked and run at any UID.
Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* fix(test): expand ksvc abbreviation, enable live pytest logging

Address review: replace ksvc with Knative Service in echo statements.

Switch --log-level info to --log-cli-level=INFO so prediction results

stream to console during CI runs instead of being captured by pytest.

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* fix(test): force-kill port-forward to prevent script hang

The kubectl port-forward background process ignores SIGTERM when
connections are active. Bash waits for all child processes before
exiting, causing the test step to hang indefinitely after all tests
pass. Fix by sending SIGTERM, waiting 2s for graceful shutdown, then
SIGKILL as fallback, and finally reaping with wait.

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* ci(kserve): add 60-min timeout and log collection on failure

Add timeout-minutes: 60 per maintainer guidance -- no test should run
longer than 60 minutes. Add 'Collect Logs on Failure' and 'Upload
Diagnostic Logs' steps modeled after full_kubeflow_integration_test.yaml
to capture pod descriptions, events, and container logs when the job
fails, enabling root-cause analysis instead of blind timeout bumps.

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* fix(ci): collect logs on timeout, skip cleanup waits

Two bugs found in run kubeflow#901 (cancelled at 60m, all tests passed):

1. Log collection used if: failure() but timeout triggers cancelled(),
   not failure(). Logs were never collected. Fix: failure() || cancelled().

2. Cleanup kubectl-delete commands waited for Kubernetes finalizer
   reconciliation (default --wait=true). Each InferenceService delete
   blocked 3-5 min for Knative revision teardown on the 2-vCPU runner,
   adding ~15 min to the test step. The runner is destroyed after the
   job, so waiting is pointless. Fix: --wait=false on all deletes.

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* Reduce timeout for kserve test job to 25 minutes

Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>

* address review: rename UNAUTH_TOKEN, remove --wait=false

- Rename UNAUTH_TOKEN to UNAUTHORIZED_TOKEN: this test validates
  authorization (RBAC), not authentication (JWT). The token is valid
  but the ServiceAccount lacks namespace-scoped permissions.

- Remove --wait=false from cleanup deletes: deletion is part of the
  test -- verifying that resources tear down cleanly matters.

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* fix(ci): parallelize cleanup deletes, bump timeout to 30m

Julius: cleanup must succeed and should complete in <60s.
Sequential deletes took ~5 min (sum of finalizer waits).
Parallel deletes bring total to ~30-45s (max of individual).
All 6 resources are independent — no ordering dependencies.
Bump kserve_test.yaml timeout from 25m to 30m per Julius.

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* fix(ci): reduce AuthorizationPolicy sleep 60s to 30s, bump models web app timeout to 30m

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* fix(kserve): restore 60s sleep for AuthorizationPolicy propagation

The sleep was reduced from 60s to 30s in d31918d, but Envoy in the KinD CI environment needs ~45-60s to pick up the new AuthorizationPolicy. With only 30s, Test 2b (host-based with token) fires before the ALLOW rule propagates, causing 403 RBAC: access denied. Restore 60s for both Test 2 and Test 7 (raw deployment).

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* fix(kserve): replace blind sleep with active polling for AuthorizationPolicy

The two sleep 60 calls add 2 minutes of dead time, causing the job to exceed the 30-minute timeout by ~15 seconds. Replace with polling loops that exit as soon as the AuthorizationPolicy propagates through Envoy (typically 10-30s). Each loop polls every 5 seconds for up to 120 seconds, providing both faster execution and more robust handling of slow environments.

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* fix(kserve): poll host-based URL to avoid false-positive 404 from path-based route

The previous polling loop used the path-based URL, which returns 404 immediately in CI before the route is configured. Since 404 != 403, the loop exited after 5 seconds thinking the AuthorizationPolicy had propagated — but it had not. Test 2b host-based then failed with 403. Fix: poll the host-based URL instead, which correctly returns 403 until the AuthorizationPolicy propagates through Envoy.

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* ci(kserve): bump timeout to 35m for expanded test suite

The test suite grew with Test 7 (raw deployment) and active polling loops. Total runtime is ~30m18s with 30m timeout. Bump to 35m to validate all tests pass before finalizing the timeout value.

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

* Update kserve_test.sh

Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>

* Update kserve_test.sh

Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>

* Apply suggestion from @juliusvonkohout

Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>

* Update kserve_test.sh

Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>

* Apply suggestion from @juliusvonkohout

Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>

* fix(kserve): use Standard instead of deprecated RawDeployment annotation

KServe v0.16 renamed RawDeployment to Standard (and Serverless to Knative). Using the legacy name causes the validation webhook to reject status updates during deletion, leaving the finalizer in place and the ISVC stuck in a terminating state indefinitely.

Update the kserve_test.sh annotation and comments to use Standard.

Ref: kserve/kserve#4710, kserve/kserve#4798
Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>

---------

Signed-off-by: Siddhant Jain <siddhantjainofficial26@gmail.com>
Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
Co-authored-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants