Use sysroot 2.17 on CUDA < 11.8 #745
/merge
jakirkham left a comment:
Thanks Bradley! 🙏
I think CUDA 11 on ARM still needs 2.28. I have added a suggestion below.
As the CI failure was only on x86_64, I think this would be OK.
```yaml
- matrix:
    arch: aarch64
    cuda: "11.[2456]"
  packages:
    - sysroot_linux-aarch64==2.17
```
ARM is not compatible with GLIBC 2.17. For example, `libcurand` references GLIBC 2.27 symbols:
```shell
$ strings /opt/conda/envs/cuda11/lib/libcurand.so | grep GLIBC
GLIBC_2.17
GLIBC_2.27
GLIBC_2.17
```
So I think we should stick to GLIBC 2.28 here.
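The `strings | grep GLIBC` check above can be sketched in Python. This is a minimal sketch under one assumption: the highest `GLIBC_x.y` tag found in a library's raw bytes approximates the minimum glibc version, and therefore the minimum sysroot, that the library needs. `min_glibc_required` and `sysroot_satisfies` are hypothetical helpers. They are not part of any RAPIDS or conda-forge tooling.

```python
import re
import sys
from typing import Optional, Tuple

Version = Tuple[int, ...]

# Matches versioned glibc tags such as GLIBC_2.17 or GLIBC_2.2.5.
# GLIBC_PRIVATE and GLIBCXX_* tags do not match, because digits must follow "GLIBC_".
_GLIBC_TAG = re.compile(rb"GLIBC_(\d+(?:\.\d+)+)")


def min_glibc_required(data: bytes) -> Optional[Version]:
    """Return the highest GLIBC_x.y tag in `data` as a tuple, or None if absent.

    Heuristic (assumption): like `strings lib.so | grep GLIBC`, this scans raw
    bytes rather than parsing the ELF version-needs section.
    """
    versions = [
        tuple(int(part) for part in match.split(b"."))
        for match in _GLIBC_TAG.findall(data)
    ]
    return max(versions) if versions else None


def sysroot_satisfies(required: Optional[Version], sysroot: str) -> bool:
    """True if a sysroot version such as "2.17" provides at least `required`."""
    if required is None:
        return True
    return tuple(int(p) for p in sysroot.split(".")) >= required


if __name__ == "__main__":
    # Hypothetical usage: python check_glibc.py /opt/conda/envs/cuda11/lib/libcurand.so
    with open(sys.argv[1], "rb") as f:
        needed = min_glibc_required(f.read())
    label = ".".join(map(str, needed)) if needed else "none"
    print(f"highest GLIBC tag: {label}")
    for sysroot in ("2.17", "2.28"):
        print(f"sysroot {sysroot} sufficient: {sysroot_satisfies(needed, sysroot)}")
```

For the output shown above, the highest tag is 2.27, so a 2.17 sysroot would not satisfy it and a 2.28 sysroot would. This covers only the library side. As the thread goes on to note, the `nvcc` metapackage's own `sysroot` constraint is a separate limit.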
Suggested change:
```diff
 - matrix:
     arch: aarch64
     cuda: "11.[2456]"
   packages:
-    - sysroot_linux-aarch64==2.17
+    - sysroot_linux-aarch64==2.28
```
Good call. We can equivalently just erase this and let it use the matrix entry below.
Agree that looks cleaner. Let's do it 👍
Oops: this is not correct. https://anaconda.org/conda-forge/nvcc_linux-aarch64/files?version=11.4
The aarch64 packages for nvcc 11.4 (up to 11.7) require sysroot 2.17. The current state of the PR (using sysroot 2.17) is correct for all nvcc versions < 11.8, and is the same on x86 and ARM.
I agree that the CUDA libraries require glibc versions newer than 2.17. However, the nvcc compiler metapackage for CUDA 11 is only installable with sysroot 2.17 until recent builds of nvcc 11.8, which permit newer sysroots.
@jakirkham I am going to merge this as-is so that we can unblock nightly CI.
I think there's nothing else we should do here, but if you want to lift the restrictions for old CUDA compilers, a follow-up PR would be welcome.
So I had looked into this recently. Here is what I observed:
- The overly strict `sysroot` constraint was flagged as a bug
- A fix was made to relax it
- The old versions don't have this fix, but could benefit from the same treatment
Typically that is enough to justify a repodata patch, and one makes sense here.
I also noticed that the pinnings for `nvcc_{{ target_platform }}` could be handled more systematically; that is now fixed.
Following this, I submitted a repodata patch.
The repodata patch landed before last night's nightly run, which passed on CUDA 11.4 (the affected job).
Note that there is a CUDA 12.5 failure in that nightly, but it is due to actual test failures:
```
The following tests FAILED:
570 - install_relocatable-with-gtest_discover_tests-ninja_configure (Failed)
571 - install_relocatable-with-gtest_discover_tests-ninja (Not Run) ninja test
Errors while running CTest
```
So AFAICT this issue is already fixed.
Wonderful. I apologize, I didn't see those PRs, just the end of the Slack conversation, and thought this might have been the best we could do. I appreciate you taking the extra steps, which now allow us to simplify things.
I filed a follow-up to simplify the dependencies.yaml and use sysroot 2.28 everywhere in #749.
I'm sorry as well. I meant to update this thread with those details and just hadn't had time to do so. The CUDA 12.8 rollout started very soon after.
If you are feeling stuck, please feel free to DM me. I am here to help.
Merging to unblock nightly CI; a follow-up may be created later.
This removes some special cases added in #745 around sysroot pinnings. Those were needed for CUDA 11.4 until a recent repodata patch. Thanks @jakirkham!

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- https://github.com/jakirkham

URL: #749
Description
Nightly CI was failing on CUDA 11.4 due to #741. This fixes it.
Old versions of `nvcc` for CUDA 11 (before CUDA 11.8) do not support sysroot 2.28.

Checklist
- `cmake-format.json` is up to date with these changes.
- … (`include_guard(GLOBAL)`)