Skip to content

Conversation

@ArangoGutierrez
Copy link
Collaborator

No description provided.

with_retry 3 10s sudo apt-get update
install_packages_with_retry cuda-drivers
install_packages_with_retry cuda-drivers{{if .Version}}={{.Version}}{{end}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a matter of interest, what are valid values for Version here then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the package names these are the "branch numbers": 550, 560, 570, for example?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not checking for validity; if the user sets a wrong version, it will fail.
Valid versions are the ones provided by apt list -a cuda-drivers

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid versions are the ones provided by apt list -a cuda-drivers

Which versions are those?

Copy link
Collaborator Author

@ArangoGutierrez ArangoGutierrez Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ubuntu@ip-10-0-0-196:~$ apt list -a cuda-drivers
Listing... Done
cuda-drivers/unknown 570.86.15-0ubuntu1 amd64
cuda-drivers/unknown 570.86.10-0ubuntu1 amd64
cuda-drivers/unknown 565.57.01-0ubuntu1 amd64
cuda-drivers/unknown 560.35.05-0ubuntu1 amd64
cuda-drivers/unknown 560.35.03-1 amd64
cuda-drivers/unknown 560.28.03-1 amd64
cuda-drivers/unknown 555.42.06-1 amd64
cuda-drivers/unknown 555.42.02-1 amd64
cuda-drivers/unknown 550.144.03-0ubuntu1 amd64
cuda-drivers/unknown 550.127.08-0ubuntu1 amd64
cuda-drivers/unknown 550.127.05-0ubuntu1 amd64
cuda-drivers/unknown 550.90.12-0ubuntu1 amd64
cuda-drivers/unknown 550.90.07-1 amd64
cuda-drivers/unknown 550.54.15-1 amd64
cuda-drivers/unknown 550.54.14-1 amd64
cuda-drivers/unknown 545.23.08-1 amd64
cuda-drivers/unknown 545.23.06-1 amd64
cuda-drivers/unknown 535.230.02-0ubuntu1 amd64
cuda-drivers/unknown 535.216.03-0ubuntu1 amd64
cuda-drivers/unknown 535.216.01-0ubuntu1 amd64
cuda-drivers/unknown 535.183.06-1 amd64
cuda-drivers/unknown 535.183.01-1 amd64
cuda-drivers/unknown 535.161.08-1 amd64
cuda-drivers/unknown 535.161.07-1 amd64
cuda-drivers/unknown 535.154.05-1 amd64
cuda-drivers/unknown 535.129.03-1 amd64
cuda-drivers/unknown 535.104.12-1 amd64
cuda-drivers/unknown 535.104.05-1 amd64
cuda-drivers/unknown 535.86.10-1 amd64
cuda-drivers/unknown 535.54.03-1 amd64
cuda-drivers/unknown 530.30.02-1 amd64
cuda-drivers/unknown 525.147.05-1 amd64
cuda-drivers/unknown 525.125.06-1 amd64
cuda-drivers/unknown 525.105.17-1 amd64
cuda-drivers/unknown 525.85.12-1 amd64
cuda-drivers/unknown 525.60.13-1 amd64
cuda-drivers/unknown 520.61.05-1 amd64
...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Let's at least add a string to the Version field that indicates that this is (currently) appended to the cuda-drivers package name and installed using apt?

`

type NvDriver struct {
Version string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Version string
// Version -- if specified -- indicates the version of the `cuda-drivers` package to install.
Version string

Comment on lines 47 to 50
if env.Spec.NVIDIADriver.Version == "" {
return &NvDriver{}
}
return &NvDriver{
Version: env.Spec.NVIDIADriver.Version,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if env.Spec.NVIDIADriver.Version == "" {
return &NvDriver{}
}
return &NvDriver{
Version: env.Spec.NVIDIADriver.Version,
}
return &NvDriver{
Version: env.Spec.NVIDIADriver.Version,
}

The check for the empty string is redundant since golang uses "" as a zero value for strings anyway.

install_packages_with_retry cuda-drivers
install_packages_with_retry cuda-drivers{{if .Version}}={{.Version}}{{end}}
nvidia-smi -L
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit here, but we should probably drop the -L so that we can also see the driver version being applied.

README.md Outdated
- 192.168.1.0/26
image:
architecture: amd64
architecture:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this change required?


```
- 570.86.15-0ubuntu1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- 570.86.15-0ubuntu1
Where `<version>` can be a prefix of any package version. The following are example package versions:
- 570.86.15-0ubuntu1

README.md Outdated
Comment on lines 123 to 126
nvidiaDriver:
install: true
version: <version>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
nvidiaDriver:
install: true
version: <version>
nvidiaDriver:
install: true
version: <version>

@@ -0,0 +1,207 @@
#!/bin/bash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this file have been committed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no

with_retry 3 10s sudo apt-get update
install_packages_with_retry cuda-drivers
install_packages_with_retry cuda-drivers{{if .Version}}={{.Version}}{{end}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: do we add --no-install-recommends here somewhere?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, here's the install_packages_with_retry function

install_packages_with_retry() {
    local packages=("$@")
    local max_retries=5
    local retry_delay=5

    for ((i=1; i<=$max_retries; i++)); do
        echo "Attempt $i to install packages: ${packages[@]}"
        
        # Attempt to install packages
        sudo apt-get install -y "${packages[@]}"

        # Check if the last command failed and the error is related to unsigned repository
        if [ $? -ne 0 ] && grep -q 'NO_PUBKEY' <<< "$(tail -n 1 /var/lib/dpkg/status 2>/dev/null)"; then
            echo "Error: Unsigned repository. Retrying in $retry_delay seconds..."
            sleep $retry_delay
        else
            # Exit loop if installation is successful or the error is not related to unsigned repository
            break
        fi
    done
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added to the function

Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
@ArangoGutierrez ArangoGutierrez merged commit 9a28600 into NVIDIA:main Feb 12, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants