-
Notifications
You must be signed in to change notification settings - Fork 9
Make NVDRIVER configurable from env file #280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make NVDRIVER configurable from env file #280
Conversation
| with_retry 3 10s sudo apt-get update | ||
| install_packages_with_retry cuda-drivers | ||
| install_packages_with_retry cuda-drivers{{if .Version}}={{.Version}}{{end}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a matter of interest, what are valid values for Version here then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the package names these are the "branch numbers": 550, 560, 570, for example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not checking for validity; if the user sets a wrong version, it will fail.
Valid versions are the ones provided by apt list -a cuda-drivers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valid versions are the ones provided by apt list -a cuda-drivers
Which versions are those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ubuntu@ip-10-0-0-196:~$ apt list -a cuda-drivers
Listing... Done
cuda-drivers/unknown 570.86.15-0ubuntu1 amd64
cuda-drivers/unknown 570.86.10-0ubuntu1 amd64
cuda-drivers/unknown 565.57.01-0ubuntu1 amd64
cuda-drivers/unknown 560.35.05-0ubuntu1 amd64
cuda-drivers/unknown 560.35.03-1 amd64
cuda-drivers/unknown 560.28.03-1 amd64
cuda-drivers/unknown 555.42.06-1 amd64
cuda-drivers/unknown 555.42.02-1 amd64
cuda-drivers/unknown 550.144.03-0ubuntu1 amd64
cuda-drivers/unknown 550.127.08-0ubuntu1 amd64
cuda-drivers/unknown 550.127.05-0ubuntu1 amd64
cuda-drivers/unknown 550.90.12-0ubuntu1 amd64
cuda-drivers/unknown 550.90.07-1 amd64
cuda-drivers/unknown 550.54.15-1 amd64
cuda-drivers/unknown 550.54.14-1 amd64
cuda-drivers/unknown 545.23.08-1 amd64
cuda-drivers/unknown 545.23.06-1 amd64
cuda-drivers/unknown 535.230.02-0ubuntu1 amd64
cuda-drivers/unknown 535.216.03-0ubuntu1 amd64
cuda-drivers/unknown 535.216.01-0ubuntu1 amd64
cuda-drivers/unknown 535.183.06-1 amd64
cuda-drivers/unknown 535.183.01-1 amd64
cuda-drivers/unknown 535.161.08-1 amd64
cuda-drivers/unknown 535.161.07-1 amd64
cuda-drivers/unknown 535.154.05-1 amd64
cuda-drivers/unknown 535.129.03-1 amd64
cuda-drivers/unknown 535.104.12-1 amd64
cuda-drivers/unknown 535.104.05-1 amd64
cuda-drivers/unknown 535.86.10-1 amd64
cuda-drivers/unknown 535.54.03-1 amd64
cuda-drivers/unknown 530.30.02-1 amd64
cuda-drivers/unknown 525.147.05-1 amd64
cuda-drivers/unknown 525.125.06-1 amd64
cuda-drivers/unknown 525.105.17-1 amd64
cuda-drivers/unknown 525.85.12-1 amd64
cuda-drivers/unknown 525.60.13-1 amd64
cuda-drivers/unknown 520.61.05-1 amd64
...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Let's at least add a string to the Version field that indicates that this is (currently) appended to the cuda-drivers package name and installed using apt?
| ` | ||
|
|
||
| type NvDriver struct { | ||
| Version string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Version string | |
| // Version -- if specified -- indicates the version of the `cuda-drivers` package to install. | |
| Version string |
| if env.Spec.NVIDIADriver.Version == "" { | ||
| return &NvDriver{} | ||
| } | ||
| return &NvDriver{ | ||
| Version: env.Spec.NVIDIADriver.Version, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if env.Spec.NVIDIADriver.Version == "" { | |
| return &NvDriver{} | |
| } | |
| return &NvDriver{ | |
| Version: env.Spec.NVIDIADriver.Version, | |
| } | |
| return &NvDriver{ | |
| Version: env.Spec.NVIDIADriver.Version, | |
| } |
The check for the empty string is redundant since golang uses "" as a zero value for strings anyway.
| install_packages_with_retry cuda-drivers | ||
| install_packages_with_retry cuda-drivers{{if .Version}}={{.Version}}{{end}} | ||
| nvidia-smi -L |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nit here, but we should probably drop the -L so that we can also see the driver version being applied.
2a7950a to
571b07a
Compare
README.md
Outdated
| - 192.168.1.0/26 | ||
| image: | ||
| architecture: amd64 | ||
| architecture: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this change required?
|
|
||
| ``` | ||
| - 570.86.15-0ubuntu1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| - 570.86.15-0ubuntu1 | |
| Where `<version>` can be a prefix of any package version. The following are example package versions: | |
| - 570.86.15-0ubuntu1 |
README.md
Outdated
| nvidiaDriver: | ||
| install: true | ||
| version: <version> | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| nvidiaDriver: | |
| install: true | |
| version: <version> | |
| nvidiaDriver: | |
| install: true | |
| version: <version> |
pkg/provisioner/templates/test.bash
Outdated
| @@ -0,0 +1,207 @@ | |||
| #!/bin/bash | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this file have been committed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no
| with_retry 3 10s sudo apt-get update | ||
| install_packages_with_retry cuda-drivers | ||
| install_packages_with_retry cuda-drivers{{if .Version}}={{.Version}}{{end}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: do we add --no-install-recommends here somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, here's the install_packages_with_retry function
install_packages_with_retry() {
local packages=("$@")
local max_retries=5
local retry_delay=5
for ((i=1; i<=$max_retries; i++)); do
echo "Attempt $i to install packages: ${packages[@]}"
# Attempt to install packages
sudo apt-get install -y "${packages[@]}"
# Check if the last command failed and the error is related to unsigned repository
if [ $? -ne 0 ] && grep -q 'NO_PUBKEY' <<< "$(tail -n 1 /var/lib/dpkg/status 2>/dev/null)"; then
echo "Error: Unsigned repository. Retrying in $retry_delay seconds..."
sleep $retry_delay
else
# Exit loop if installation is successful or the error is not related to unsigned repository
break
fi
done
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added to the function
571b07a to
a4bc41d
Compare
Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
a4bc41d to
8c347ce
Compare
No description provided.