@evberrypi

This pull request does the following:

  1. Upgrades the GPU Operator from 23.3.2 to the latest release, 23.6.1.

  2. Upgrades the NVIDIA driver from 525 to 535.

  3. Adds new variables:

  • nvaie
  • nvaie_gpu_operator_version
  • nvaie_gpu_driver_version (EKS and GKE only; you can't select the driver version on AKS)

     If you set nvaie to true (the default is false), you get the stable versions of the GPU Operator and driver from the latest NVIDIA AI Enterprise (NVAIE) release.

  4. Updates the root-level README software matrix, the module-level documentation, and terraform.tfvars to include the new variables.
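
The version switch described above could be sketched in Terraform roughly as follows. This is a minimal illustration, not the module's actual code: the variable names come from this pull request, but the local names (`effective_gpu_operator_version`, `effective_gpu_driver_version`) and the use of a conditional expression are assumptions.

```hcl
# Hypothetical sketch; the real module may wire these variables differently.
variable "nvaie" {
  type        = bool
  default     = false # the default stated in this pull request
  description = "Use the GPU Operator and driver versions from the latest NVAIE release"
}

variable "gpu_operator_version" { type = string }
variable "gpu_driver_version" { type = string }
variable "nvaie_gpu_operator_version" { type = string }
variable "nvaie_gpu_driver_version" { type = string } # EKS and GKE only

locals {
  # Assumed local names, used here only to show the selection logic.
  effective_gpu_operator_version = var.nvaie ? var.nvaie_gpu_operator_version : var.gpu_operator_version
  effective_gpu_driver_version   = var.nvaie ? var.nvaie_gpu_driver_version : var.gpu_driver_version
}
```

On AKS the driver version cannot be selected, so only the operator half of this selection would apply there.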

Validation:

  1. Run terraform plan with nvaie set to false and check that the planned versions of the GPU Operator and driver match the gpu_operator_version and gpu_driver_version variables.
  2. Set nvaie to true and rerun terraform plan. The plan will now show different versions of the GPU Operator and driver, matching the values of nvaie_gpu_operator_version and nvaie_gpu_driver_version.

Note: on AKS you cannot set a driver version, so you can only validate that the GPU Operator version changes when you change the value of nvaie.
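
For the second validation step, a terraform.tfvars fragment might look like the sketch below. The angle-bracket values are placeholders, not real release numbers; substitute the versions documented in the README software matrix. On AKS, the two driver lines would be left out.

```hcl
# terraform.tfvars (placeholder values, for illustration only)
nvaie = true # set to false to validate the first step

gpu_operator_version       = "<gpu-operator-version>"
gpu_driver_version         = "<gpu-driver-version>" # EKS and GKE only
nvaie_gpu_operator_version = "<nvaie-gpu-operator-version>"
nvaie_gpu_driver_version   = "<nvaie-gpu-driver-version>" # EKS and GKE only
```

With nvaie = true, the plan should reference the two nvaie_ values. Flipping it back to false should return the plan to the non-NVAIE versions.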

@evberrypi evberrypi merged commit fd9f145 into NVIDIA:main Sep 27, 2023
@evberrypi evberrypi deleted the elacey/nvidia-gpu-driver branch September 27, 2023 01:23