Skip to content

Conversation

@varunrsekar
Copy link
Collaborator

DRA is GA in k8s v1.34! This change bumps up the consumed DRA APIs from v1beta2 -> v1 api version.

Changes:

  1. Bump go mod dependencies for k8s libs to v0.34
  2. Pin kserve libs to a commit in the master branch that supports k8s v0.34 libs (https://github.com/kserve/kserve/tree/1cb39eba0f1a/) - this is a temporary change until kserve v0.17.0 is released.
  3. Tweaks in all kserve-related code to support the above version

Note:

  1. With this change, NIM Operator will no longer support k8s versions below 1.34 for consuming DRA.
  2. NIM Operator will no longer support DRA resourceclaims/resourceclaimtemplates of api version v1beta2

Testing:

  1. Verified w/ DRA and w/o DRA with k8s cluster version v1.34
  2. Verified w/o DRA with k8s cluster version v1.33

@copy-pr-bot
Copy link

copy-pr-bot bot commented Dec 10, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@varunrsekar
Copy link
Collaborator Author

/cc @xieshenzh

@xieshenzh
Copy link
Contributor

xieshenzh commented Dec 10, 2025

/cc @xieshenzh

I think this should be updated as well:

mode, annotated := spec.Annotations["serving.kserve.org/deploymentMode"]
// If the annotation is absent, kserve defaults to serverless.
serverless := !annotated || strings.EqualFold(mode, "serverless")
// When Spec.InferencePlatform is "kserve" and used in "serverless" mode:
if platformIsKServe && serverless {

By the way, I believe the annotation should be changed to serving.kserve.io/deploymentMode.

@xieshenzh
Copy link
Contributor

xieshenzh commented Dec 10, 2025

These test cases should be updated, since the default mode is supposed to be read from a configmap when the annotation is absent:

{
name: "standalone platform – no errors",
modify: func(ns *appsv1alpha1.NIMService) {
ns.Spec.InferencePlatform = appsv1alpha1.PlatformTypeStandalone
},
wantErrs: 0,
wantWarnings: 0,
},
{
name: "kserve serverless (annotation absent) – valid",
modify: func(ns *appsv1alpha1.NIMService) {
ns.Spec.InferencePlatform = appsv1alpha1.PlatformTypeKServe
// No annotation ⇒ serverless by default.
},
wantErrs: 0,
wantWarnings: 0,
},
{
name: "kserve serverless (annotation present) – autoscaling set",
modify: func(ns *appsv1alpha1.NIMService) {
ns.Spec.InferencePlatform = appsv1alpha1.PlatformTypeKServe
ns.Spec.Annotations = map[string]string{"serving.kserve.org/deploymentMode": "Serverless"}
ns.Spec.Scale.Enabled = &trueVal
},
wantErrs: 1,
wantWarnings: 0,
},
{
name: "kserve serverless – ingress set",
modify: func(ns *appsv1alpha1.NIMService) {
ns.Spec.InferencePlatform = appsv1alpha1.PlatformTypeKServe
ns.Spec.Expose.Router.Ingress = &appsv1alpha1.RouterIngress{
IngressClass: "nginx",
}
},
wantErrs: 1,
wantWarnings: 0,
},
{
name: "kserve serverless – servicemonitor set",
modify: func(ns *appsv1alpha1.NIMService) {
ns.Spec.InferencePlatform = appsv1alpha1.PlatformTypeKServe
ns.Spec.Metrics.Enabled = &trueVal
},
wantErrs: 1,
wantWarnings: 0,
},
{
name: "kserve serverless – all prohibited set",
modify: func(ns *appsv1alpha1.NIMService) {
ns.Spec.InferencePlatform = appsv1alpha1.PlatformTypeKServe
ns.Spec.Scale.Enabled = &trueVal
ns.Spec.Expose.Router.Ingress = &appsv1alpha1.RouterIngress{
IngressClass: "nginx",
}
ns.Spec.Metrics.Enabled = &trueVal
},
wantErrs: 3,
wantWarnings: 0,
},
{
name: "kserve rawdeployment – allowed autoscaling, but multidnode forbidden",
modify: func(ns *appsv1alpha1.NIMService) {
ns.Spec.InferencePlatform = appsv1alpha1.PlatformTypeKServe
ns.Spec.Annotations = map[string]string{"serving.kserve.org/deploymentMode": "RawDeployment"}
ns.Spec.Scale.Enabled = &trueVal // should be fine
ns.Spec.MultiNode = &appsv1alpha1.NimServiceMultiNodeConfig{Parallelism: &appsv1alpha1.ParallelismSpec{Pipeline: ptr.To(uint32(1))}}
},
wantErrs: 1, // only multiNode should trigger
wantWarnings: 0,
},
{
name: "kserve – multidnode alone",
modify: func(ns *appsv1alpha1.NIMService) {
ns.Spec.InferencePlatform = appsv1alpha1.PlatformTypeKServe
ns.Spec.MultiNode = &appsv1alpha1.NimServiceMultiNodeConfig{Parallelism: &appsv1alpha1.ParallelismSpec{Pipeline: ptr.To(uint32(2))}}
},
wantErrs: 1,
wantWarnings: 0,

@xieshenzh
Copy link
Contributor

These examples should be updated as well, since Serverless is not supposed to be the default mode when the annotation is absent:
https://github.com/NVIDIA/k8s-nim-operator/tree/f628f6cf905082e0ce6a4db47586a1731c3abd60/config/samples/nim/serving/kserve

@shivamerla
Copy link
Collaborator

Overall looks good other than KServe comments. We need to update the sample here too: https://github.com/NVIDIA/k8s-nim-operator/blob/main/config/samples/nim/serving/advanced/dra/manual/llm.yaml

@varunrsekar
Copy link
Collaborator Author

These test cases should be updated, since the default mode is supposed to be read from a configmap when the annotation is absent:

@xieshenzh Do you mean that we need to setup a configmap with default deployment mode as KNative in these tests?

What's the expected behavior if the configmap is empty and we don't set any annotation in the ISvc? this testcase indicates the configmap being empty is valid.

Signed-off-by: Varun Ramachandra Sekar <[email protected]>
Signed-off-by: Varun Ramachandra Sekar <[email protected]>
Signed-off-by: Varun Ramachandra Sekar <[email protected]>
@xieshenzh
Copy link
Contributor

xieshenzh commented Dec 16, 2025

These test cases should be updated, since the default mode is supposed to be read from a configmap when the annotation is absent:

@xieshenzh Do you mean that we need to setup a configmap with default deployment mode as KNative in these tests?

What's the expected behavior if the configmap is empty and we don't set any annotation in the ISvc? this testcase indicates the configmap being empty is valid.

As far as I know, serverless deployment is no longer supported on RHOAI 3.x. So, the kserve community is changing the default deployment mode from serverless to standard/rawdeployment.
There is a constant for the default deployment mode, which is standard, but unfortunately it is not used anywhere in the code.

With the current kserve code, if the configmap is empty, the deploymentMode will be empty.
Then, I don't think the inferenceservice will be properly reconciled. Because the code sometimes expects knative/serverless is explicitly set, sometimes assumes knative/serverless is the default deployment mode.

I think the expected behavior if the configmap is empty is to use the standard mode.
If to use standard mode, the reconciler of NIMService should set the annotation explicitly when creating the inferenceservice, in order to avoid any potential issues from the kserve controller.

Otherwise, it is also acceptable to return an error and avoid creating an inferenceservice, if the configmap is empty.

@varunrsekar
Copy link
Collaborator Author

I think the expected behavior if the configmap is empty is to use the standard mode.
If to use standard mode, the reconciler of NIMService should set the annotation explicitly when creating the inferenceservice, in order to avoid any potential issues from the kserve controller.

I see here that the predictor reconciler defaults to knative even though we're expected to exit early in case of missing knative CRDs

There is a constant for the default deployment mode, which is standard, but unfortunately it is not used anywhere in the code.

isvcutils.GetDeploymentMode seems to provide the default deployment. Isn't this behavior sufficient? Or do you think we should set this as an annotation explicitly from the NIMService?

@xieshenzh
Copy link
Contributor

isvcutils.GetDeploymentMode seems to provide the default deployment. Isn't this behavior sufficient? Or do you think we should set this as an annotation explicitly from the NIMService?

For RHOAI, it is sufficient. The configmap is not supposed to be empty, if the kserve is installed with RHOAI.

But I am not sure if the configmap could be empty, when installing the community version of kserve.
If the configmap is empty, It is not sufficient, which happens in the unit test you mentioned earlier.
In this case, it would be safer to set the annotation explicitly, if isvcutils.GetDeploymentMode returns an empty string.

@visheshtanksale
Copy link
Collaborator

@varunrsekar What will be upgrade story when user who are on < v1.34 want to just upgrade the NIM Operator?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants