[lit] Refactor available `ptxas` features #154439

justinfargnoli · 2025-08-19T23:30:03Z

ToT lit currently assumes that a given ptxas version supports all capabilities of prior ptxas releases. This approach was flexible enough to handle the removal of 32-bit address compilation from ptxas in CUDA 12.1, but it struggles with the removal of compilation for Volta and earlier architectures in CUDA 13.0.

To deal with this, this PR refactors how lit defines the set of features available for a given ptxas version. It invokes ptxas not just to get its version, but also to get the list of supported SMs, supported PTX ISA versions, and support for 32-bit compilation.

This approach should be flexible enough to deal with the changing support matrix of ptxas as it goes forward. One obvious downside is that this relies on parsing the stdout of ptxas, something that's inherently unstable. But, IMO, this is something that we can fix as needed.
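
As a rough illustration of the approach described above, the sketch below turns text describing ptxas capabilities into lit feature names. The feature names (`ptxas-sm_90`, `ptxas-isa-v7.8`, `ptxas-32`) follow the ones used in the updated tests; the function name `ptxas_features` and the sample input are hypothetical, and this is neither the PR's implementation nor real ptxas output.

```python
import re


def ptxas_features(sm_listing, isa_listing, supports_32bit):
    """Build lit feature names from text describing ptxas capabilities.

    Assumption: the listings contain tokens such as "sm_90" and "8.0"
    somewhere in free-form text; the real ptxas output may differ.
    """
    features = {"ptxas"}
    # Architecture names look like sm_80, sm_90a, ...
    for sm in re.findall(r"\bsm_\d+[a-z]?\b", sm_listing):
        features.add("ptxas-" + sm)
    # PTX ISA versions look like 6.4, 7.8, 8.0, ...
    for major, minor in re.findall(r"\b(\d+)\.(\d+)\b", isa_listing):
        features.add("ptxas-isa-v{}.{}".format(major, minor))
    if supports_32bit:
        features.add("ptxas-32")
    return features


# Hypothetical sample input, not captured from a real ptxas.
sample = ptxas_features("sm_80 sm_90 sm_90a", "7.8 8.0", supports_32bit=False)
print(sorted(sample))
```

In a lit configuration, each returned name would then be passed to `config.available_features.add(...)`, the same call shown for the version feature in `llvm/test/lit.cfg.py`.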

github-actions · 2025-08-19T23:34:42Z

✅ With the latest revision this PR passed the Python code formatter.

Copilot

Pull Request Overview

This PR refactors the ptxas usage in LLVM's lit configuration by replacing the hardcoded version-based feature detection with dynamic capability detection. Instead of relying on specific ptxas version numbers, the new approach queries ptxas directly for its supported SM architectures, ISA versions, and other capabilities.

- Replaced version-based ptxas feature detection with dynamic capability queries
- Added new functions to query ptxas for supported SM architectures, ISA versions, and features
- Updated test conditions from version-specific (e.g., ptxas-12.7) to capability-specific (e.g., ptxas-sm_90 && ptxas-isa-v8.0)
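
The mapping from a RUN line to its new guard can be sketched as follows, based on the pattern visible in the diff: an `-arch=sm_NN` passed to `%ptxas-verify` yields `ptxas-sm_NN`, `-mattr=+ptxNN` yields `ptxas-isa-vN.N`, and a 32-bit `-mtriple=nvptx` yields `ptxas-32`. The helper name `ptxas_guard` is hypothetical, and the rule is inferred from the updated tests rather than taken from a script in the PR.

```python
import re


def ptxas_guard(run_line):
    """Infer a capability-based %if condition for a ptxas-verify RUN line."""
    terms = []
    # -arch=sm_NN on %ptxas-verify requires that SM to be supported.
    arch = re.search(r"%ptxas-verify\s+-arch=(sm_\d+[a-z]?)", run_line)
    if arch:
        terms.append("ptxas-" + arch.group(1))
    # -mattr=+ptx78 requires PTX ISA 7.8 (the last digit is the minor version).
    isa = re.search(r"-mattr=\+ptx(\d+)(\d)\b", run_line)
    if isa:
        terms.append("ptxas-isa-v{}.{}".format(isa.group(1), isa.group(2)))
    # A bare "nvptx" triple (not nvptx64) means 32-bit compilation.
    if re.search(r"-mtriple=nvptx(?!64)", run_line):
        terms.append("ptxas-32")
    # With no specific requirement, fall back to the plain "ptxas" feature.
    return " && ".join(terms) or "ptxas"


line = "llc < %s -mtriple=nvptx -mcpu=sm_80 -mattr=+ptx70 | %ptxas-verify -arch=sm_80"
print(ptxas_guard(line))  # prints "ptxas-sm_80 && ptxas-isa-v7.0 && ptxas-32"
```

A RUN line with no `-arch`, no `-mattr=+ptxNN`, and a 64-bit triple keeps the plain `ptxas` guard under this rule.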

Reviewed Changes

Copilot reviewed 172 out of 172 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| llvm/test/lit.cfg.py | Core refactoring: replaced `ptxas_version` and `enable_ptxas` functions with new capability detection functions |
| llvm/test/CodeGen/NVPTX/*.ll, *.py | Updated test RUN conditions to use new capability-based features instead of version numbers |


justinfargnoli · 2025-08-21T16:50:15Z

llvm/test/lit.cfg.py

+    major_version, minor_version = ptxas_version(ptxas_executable)
+    config.available_features.add("ptxas-{}.{}".format(major_version, minor_version))


I'm not aware of a test case that checks for this feature.

My intention in keeping it around was to be able to guard tests from bugs in ptxas.

What do we think about keeping/removing it?
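
For reference, a version feature of this form could be derived roughly as below. The regular expression assumes a `release X.Y` token in the version banner, and the sample string is illustrative; neither is taken from the PR's `ptxas_version` implementation, and `parse_ptxas_version` is a hypothetical name.

```python
import re


def parse_ptxas_version(banner):
    """Extract (major, minor) from a version banner.

    Assumption: the banner contains a token like "release 13.0".
    """
    match = re.search(r"\brelease (\d+)\.(\d+)\b", banner)
    if match is None:
        raise ValueError("could not find a release number in the banner")
    return int(match.group(1)), int(match.group(2))


# Illustrative banner text, not captured from a real ptxas run.
major, minor = parse_ptxas_version("Cuda compilation tools, release 13.0, V13.0.48")
feature = "ptxas-{}.{}".format(major, minor)
print(feature)  # prints "ptxas-13.0"
```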

justinfargnoli · 2025-08-21T16:57:51Z

Note: I've only tested this PR on public CUDA 13.0 and internal ToT ptxas.

Trying to see if I can trigger a build with the public build bot.

justinfargnoli · 2025-08-21T17:02:32Z

> Trying to see if I can trigger a build with the public build bot.

https://lab.llvm.org/buildbot/#/buildrequests/5249048?redirect_to_build=true

llvmbot · 2025-08-21T18:12:40Z

@llvm/pr-subscribers-backend-nvptx

Author: Justin Fargnoli (justinfargnoli)

Patch is 154.79 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/154439.diff

172 Files Affected:

(modified) llvm/test/CodeGen/NVPTX/access-non-generic.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/activemask.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/addrspacecast-ptx64.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/addrspacecast.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/alias.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/annotations.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/applypriority.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/arithmetic-fp-sm20.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/arithmetic-int.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/async-copy.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/atomicrmw-expand.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/atomics-sm60.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/atomics-sm70.ll (+3-3)
(modified) llvm/test/CodeGen/NVPTX/atomics-sm90.ll (+3-3)
(modified) llvm/test/CodeGen/NVPTX/atomics-with-scope.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/b52037.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/barrier.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/bf16-instructions.ll (+3-3)
(modified) llvm/test/CodeGen/NVPTX/bf16x2-instructions-approx.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/bmsk.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/bswap.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/byval-arg-vectorize.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/byval-const-global.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/calling-conv.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/cluster-dim.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/clusterlaunchcontrol-multicast.ll (+6-6)
(modified) llvm/test/CodeGen/NVPTX/clusterlaunchcontrol.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/cmpxchg-sm60.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/cmpxchg-sm70.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/cmpxchg-sm90.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/cmpxchg.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/combine-mad.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/combine-min-max.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/common-linkage.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/compare-int.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/convert-call-to-indirect.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/convert-fp.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/convert-int-sm20.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/convert-sm100.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/convert-sm100a.ll (+3-3)
(modified) llvm/test/CodeGen/NVPTX/convert-sm80.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/convert-sm89.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/convert-sm90.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-s2g-sm100.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-1cta.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-2cta.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-cta-sm100.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-cta-sm100a.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-cta-sm90.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-gather4.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-im2colw.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-im2colw128.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-prefetch-sm100a.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-prefetch.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-reduce.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-s2g-scatter4.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-s2g.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/cp-async-bulk.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/discard.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/distributed-shared-cluster.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/dynamic_stackalloc.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/elect.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/f16-abs.ll (+4-4)
(modified) llvm/test/CodeGen/NVPTX/f16-ex2.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/f16-instructions.ll (+4-4)
(modified) llvm/test/CodeGen/NVPTX/f16x2-instructions.ll (+3-3)
(modified) llvm/test/CodeGen/NVPTX/f32-ex2.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/f32-lg2.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/f32x2-instructions.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/fabs-intrinsics.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/fence-cluster.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/fence-nocluster.ll (+3-3)
(modified) llvm/test/CodeGen/NVPTX/fence-proxy-tensormap.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/fexp2.ll (+3-3)
(modified) llvm/test/CodeGen/NVPTX/flog2.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/fma-disable.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/fminimum-fmaximum.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/fns.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/fold-movs.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/fp-contract-f32x2.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/global-addrspace.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/global-ordering.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/griddepcontrol.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/i16x2-instructions.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/idioms.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/indirect_byval.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/inline-asm-b128-test1.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/inline-asm-b128-test2.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/inline-asm-b128-test3.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/intrinsic-old.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/intrinsics-sm90.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/intrinsics.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/kernel-param-align.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/ld-addrspace.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/ld-generic.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/ld-st-addrrspace.py (+1-1)
(modified) llvm/test/CodeGen/NVPTX/ldg-invariant-256.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/load-store-256-addressing-invariant.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/load-store-256-addressing.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/load-store-scalars.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/load-store-sm-70.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/load-store-sm-90.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/load-store-vectors-256.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/local-stack-frame.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/managed.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/match.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/math-intrins-sm53-ptx42.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/math-intrins-sm80-ptx70-autoupgrade.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/math-intrins-sm80-ptx70.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/math-intrins-sm86-ptx72-autoupgrade.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/math-intrins-sm86-ptx72.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/math-intrins.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/mbarrier.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/nanosleep.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/nofunc.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/noreturn.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/packed-aggr.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/param-overalign.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/pr126337.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/pr13291-i1-store.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/prefetch-inferas-test.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/prefetch.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/reduction-intrinsics.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/redux-sync-f32.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/redux-sync.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/reg-types.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/setmaxnreg-sm100a.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/setmaxnreg.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/sext-setcc.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/shfl-sync-p.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/shfl-sync.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/short-ptr.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/simple-call.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/st-addrspace.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/st-generic.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/st-param-imm.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/st_bulk.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/stacksaverestore.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/surf-tex.py (+1-1)
(modified) llvm/test/CodeGen/NVPTX/symbol-naming.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/szext.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/tanhf.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/tcgen05-alloc.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/tcgen05-commit.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/tcgen05-cp.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/tcgen05-fence.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/tcgen05-ld.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/tcgen05-shift.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/tcgen05-st.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/trunc-setcc.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/trunc-tofp.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/unreachable.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/vaargs.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/variadics-backend.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/vector-compare.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/vector-select.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/vote.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/weak-global.ll (+2-2)
(modified) llvm/test/CodeGen/NVPTX/wgmma-sm90a-fence.ll (+1-1)
(modified) llvm/test/CodeGen/NVPTX/wmma-ptx60-sm70.py (+1-1)
(modified) llvm/test/CodeGen/NVPTX/wmma-ptx63-sm72.py (+1-1)
(modified) llvm/test/CodeGen/NVPTX/wmma-ptx63-sm75.py (+1-1)
(modified) llvm/test/CodeGen/NVPTX/wmma-ptx64-sm70.py (+1-1)
(modified) llvm/test/CodeGen/NVPTX/wmma-ptx65-sm75.py (+1-1)
(modified) llvm/test/CodeGen/NVPTX/wmma-ptx71-sm80.py (+1-1)
(modified) llvm/test/CodeGen/NVPTX/wmma-ptx78-sm90.py (+1-1)
(modified) llvm/test/CodeGen/NVPTX/wmma-ptx86-sm100a.py (+1-1)
(modified) llvm/test/CodeGen/NVPTX/wmma-ptx86-sm101a.py (+1-1)
(modified) llvm/test/CodeGen/NVPTX/wmma-ptx86-sm120a.py (+1-1)
(modified) llvm/test/lit.cfg.py (+66-64)

diff --git a/llvm/test/CodeGen/NVPTX/access-non-generic.ll b/llvm/test/CodeGen/NVPTX/access-non-generic.ll
index 601a35288f54d..be8d00a10108a 100644
--- a/llvm/test/CodeGen/NVPTX/access-non-generic.ll
+++ b/llvm/test/CodeGen/NVPTX/access-non-generic.ll
@@ -2,7 +2,7 @@
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 | FileCheck %s --check-prefix PTX
 ; RUN: opt -mtriple=nvptx-- < %s -S -passes=infer-address-spaces | FileCheck %s --check-prefix IR
 ; RUN: opt -mtriple=nvptx64-- < %s -S -passes=infer-address-spaces | FileCheck %s --check-prefix IR
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
+; RUN: %if ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
 ; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 | %ptxas-verify %}
 
 @array = internal addrspace(3) global [10 x float] zeroinitializer, align 4
diff --git a/llvm/test/CodeGen/NVPTX/activemask.ll b/llvm/test/CodeGen/NVPTX/activemask.ll
index aa3c5819d7f91..96df904290e68 100644
--- a/llvm/test/CodeGen/NVPTX/activemask.ll
+++ b/llvm/test/CodeGen/NVPTX/activemask.ll
@@ -1,5 +1,5 @@
 ; RUN: llc < %s -mtriple=nvptx64 -O2 -mcpu=sm_52 -mattr=+ptx62 | FileCheck %s
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_52 -mattr=+ptx62 | %ptxas-verify %}
+; RUN: %if ptxas-isa-v6.2 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_52 -mattr=+ptx62 | %ptxas-verify %}
 
 declare i32 @llvm.nvvm.activemask()
 
diff --git a/llvm/test/CodeGen/NVPTX/addrspacecast-ptx64.ll b/llvm/test/CodeGen/NVPTX/addrspacecast-ptx64.ll
index 00b17896d2c9e..1b9dc6ab1e122 100644
--- a/llvm/test/CodeGen/NVPTX/addrspacecast-ptx64.ll
+++ b/llvm/test/CodeGen/NVPTX/addrspacecast-ptx64.ll
@@ -1,8 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | FileCheck %s -check-prefixes=NOPTRCONV
 ; RUN: llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 --nvptx-short-ptr | FileCheck %s -check-prefixes=PTRCONV
-; RUN: %if ptxas-12.8 %{ llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
-; RUN: %if ptxas-12.8 %{ llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 --nvptx-short-ptr | %ptxas-verify -arch=sm_90 %}
+; RUN: %if ptxas-sm_90 && ptxas-isa-v7.8 %{ llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
+; RUN: %if ptxas-sm_90 && ptxas-isa-v7.8 %{ llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 --nvptx-short-ptr | %ptxas-verify -arch=sm_90 %}
 
 ; ALL-LABEL: conv_shared_cluster_to_generic
 define i32 @conv_shared_cluster_to_generic(ptr addrspace(7) %ptr) {
diff --git a/llvm/test/CodeGen/NVPTX/addrspacecast.ll b/llvm/test/CodeGen/NVPTX/addrspacecast.ll
index 86008a1b70058..546d22ca0e691 100644
--- a/llvm/test/CodeGen/NVPTX/addrspacecast.ll
+++ b/llvm/test/CodeGen/NVPTX/addrspacecast.ll
@@ -1,7 +1,7 @@
 ; RUN: llc -O0 < %s -mtriple=nvptx -mcpu=sm_20 | FileCheck %s -check-prefixes=ALL,CLS32
 ; RUN: llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_20 | FileCheck %s -check-prefixes=ALL,NOPTRCONV,CLS64
 ; RUN: llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_20 --nvptx-short-ptr | FileCheck %s -check-prefixes=ALL,PTRCONV,CLS64
-; RUN: %if ptxas && !ptxas-12.0 %{ llc -O0 < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
+; RUN: %if ptxas-32 %{ llc -O0 < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
 ; RUN: %if ptxas %{ llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_20 | %ptxas-verify %}
 ; RUN: %if ptxas %{ llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_20 --nvptx-short-ptr | %ptxas-verify %}
 
diff --git a/llvm/test/CodeGen/NVPTX/alias.ll b/llvm/test/CodeGen/NVPTX/alias.ll
index 01761c21ab103..4b22df225e33f 100644
--- a/llvm/test/CodeGen/NVPTX/alias.ll
+++ b/llvm/test/CodeGen/NVPTX/alias.ll
@@ -1,5 +1,5 @@
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_30 -mattr=+ptx64 | FileCheck %s
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_30 -mattr=+ptx64 | %ptxas-verify %}
+; RUN: %if ptxas-isa-v6.4 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_30 -mattr=+ptx64 | %ptxas-verify %}
 
 define i32 @a() { ret i32 0 }
 @b = internal alias i32 (), ptr @a
diff --git a/llvm/test/CodeGen/NVPTX/annotations.ll b/llvm/test/CodeGen/NVPTX/annotations.ll
index 5360e8988777b..e4aa0552e8420 100644
--- a/llvm/test/CodeGen/NVPTX/annotations.ll
+++ b/llvm/test/CodeGen/NVPTX/annotations.ll
@@ -1,6 +1,6 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_20 | FileCheck %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 | FileCheck %s
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
+; RUN: %if ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
 ; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 | %ptxas-verify %}
 
 @texture = internal addrspace(1) global i64 0, align 8
diff --git a/llvm/test/CodeGen/NVPTX/applypriority.ll b/llvm/test/CodeGen/NVPTX/applypriority.ll
index 23b1bda9a32bf..9ac52c8bfe188 100644
--- a/llvm/test/CodeGen/NVPTX/applypriority.ll
+++ b/llvm/test/CodeGen/NVPTX/applypriority.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx74| FileCheck --check-prefixes=CHECK-PTX64 %s
-; RUN: %if ptxas-11.4 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx74| %ptxas-verify -arch=sm_80 %}
+; RUN: %if ptxas-sm_80 && ptxas-isa-v7.4 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx74| %ptxas-verify -arch=sm_80 %}
 
 target triple = "nvptx64-nvidia-cuda"
 
diff --git a/llvm/test/CodeGen/NVPTX/arithmetic-fp-sm20.ll b/llvm/test/CodeGen/NVPTX/arithmetic-fp-sm20.ll
index ce71d3a78c0de..e88d0396f0858 100644
--- a/llvm/test/CodeGen/NVPTX/arithmetic-fp-sm20.ll
+++ b/llvm/test/CodeGen/NVPTX/arithmetic-fp-sm20.ll
@@ -1,6 +1,6 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_20 -fp-contract=fast | FileCheck %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 -fp-contract=fast | FileCheck %s
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 -fp-contract=fast | %ptxas-verify %}
+; RUN: %if ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 -fp-contract=fast | %ptxas-verify %}
 ; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 -fp-contract=fast | %ptxas-verify %}
 
 ;; These tests should run for all targets
diff --git a/llvm/test/CodeGen/NVPTX/arithmetic-int.ll b/llvm/test/CodeGen/NVPTX/arithmetic-int.ll
index 1fbfd0a987d7a..9e41e9e240902 100644
--- a/llvm/test/CodeGen/NVPTX/arithmetic-int.ll
+++ b/llvm/test/CodeGen/NVPTX/arithmetic-int.ll
@@ -1,6 +1,6 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_20 | FileCheck %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 | FileCheck %s
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
+; RUN: %if ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
 ; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 | %ptxas-verify %}
 
 ;; These tests should run for all targets
diff --git a/llvm/test/CodeGen/NVPTX/async-copy.ll b/llvm/test/CodeGen/NVPTX/async-copy.ll
index cefb8ede9fa58..97b6b5d4d0097 100644
--- a/llvm/test/CodeGen/NVPTX/async-copy.ll
+++ b/llvm/test/CodeGen/NVPTX/async-copy.ll
@@ -1,7 +1,7 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_80 -mattr=+ptx70 | FileCheck -check-prefixes=CHECK,CHECK_PTX32 %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx70 | FileCheck -check-prefixes=CHECK,CHECK_PTX64 %s
-; RUN: %if ptxas-11.0 && ! ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_80 -mattr=+ptx70 | %ptxas-verify -arch=sm_80 %}
-; RUN: %if ptxas-11.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx70 | %ptxas-verify -arch=sm_80 %}
+; RUN: %if ptxas-sm_80 && ptxas-isa-v7.0 && ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_80 -mattr=+ptx70 | %ptxas-verify -arch=sm_80 %}
+; RUN: %if ptxas-sm_80 && ptxas-isa-v7.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx70 | %ptxas-verify -arch=sm_80 %}
 
 declare void @llvm.nvvm.cp.async.wait.group(i32)
 
diff --git a/llvm/test/CodeGen/NVPTX/atomicrmw-expand.ll b/llvm/test/CodeGen/NVPTX/atomicrmw-expand.ll
index 94b3f0a2e1c3e..88fae7a3f78a0 100644
--- a/llvm/test/CodeGen/NVPTX/atomicrmw-expand.ll
+++ b/llvm/test/CodeGen/NVPTX/atomicrmw-expand.ll
@@ -1,7 +1,7 @@
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_30 | FileCheck %s --check-prefixes=ALL,SM30
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_60 | FileCheck %s --check-prefixes=ALL,SM60
 ; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_30 | %ptxas-verify %}
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
+; RUN: %if ptxas-sm_60 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
 
 ; CHECK-LABEL: fadd_double
 define void @fadd_double(ptr %0, double %1) {
diff --git a/llvm/test/CodeGen/NVPTX/atomics-sm60.ll b/llvm/test/CodeGen/NVPTX/atomics-sm60.ll
index 2e11323d1b3e1..5a7a1823cb2a0 100644
--- a/llvm/test/CodeGen/NVPTX/atomics-sm60.ll
+++ b/llvm/test/CodeGen/NVPTX/atomics-sm60.ll
@@ -1,7 +1,7 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_60 | FileCheck %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_60 | FileCheck %s
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
+; RUN: %if ptxas-sm_60 && ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
+; RUN: %if ptxas-sm_60 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
 
 ; CHECK-LABEL: .func test(
 define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, double %d) {
diff --git a/llvm/test/CodeGen/NVPTX/atomics-sm70.ll b/llvm/test/CodeGen/NVPTX/atomics-sm70.ll
index f710d7f883a1b..e1a69d2e3db20 100644
--- a/llvm/test/CodeGen/NVPTX/atomics-sm70.ll
+++ b/llvm/test/CodeGen/NVPTX/atomics-sm70.ll
@@ -2,9 +2,9 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_70 -mattr=+ptx63 | FileCheck %s --check-prefixes=CHECK
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_70 -mattr=+ptx63 | FileCheck %s --check-prefixes=CHECK64
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_70 -mattr=+ptx62 | FileCheck %s --check-prefixes=CHECKPTX62
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_70 -mattr=+ptx63 | %ptxas-verify -arch=sm_70 %}
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_70 -mattr=+ptx63 | %ptxas-verify -arch=sm_70 %}
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_70 -mattr=+ptx62 | %ptxas-verify -arch=sm_70 %}
+; RUN: %if ptxas-sm_70 && ptxas-isa-v6.3 && ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_70 -mattr=+ptx63 | %ptxas-verify -arch=sm_70 %}
+; RUN: %if ptxas-sm_70 && ptxas-isa-v6.3 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_70 -mattr=+ptx63 | %ptxas-verify -arch=sm_70 %}
+; RUN: %if ptxas-sm_70 && ptxas-isa-v6.2 && ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_70 -mattr=+ptx62 | %ptxas-verify -arch=sm_70 %}
 
 target triple = "nvptx64-nvidia-cuda"
 
diff --git a/llvm/test/CodeGen/NVPTX/atomics-sm90.ll b/llvm/test/CodeGen/NVPTX/atomics-sm90.ll
index f96fd30019025..79e12025ba614 100644
--- a/llvm/test/CodeGen/NVPTX/atomics-sm90.ll
+++ b/llvm/test/CodeGen/NVPTX/atomics-sm90.ll
@@ -2,9 +2,9 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_90 -mattr=+ptx78 | FileCheck %s --check-prefixes=CHECK
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | FileCheck %s --check-prefixes=CHECK64
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_86 -mattr=+ptx71 | FileCheck %s --check-prefixes=CHECKPTX71
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_86 -mattr=+ptx71 | %ptxas-verify -arch=sm_86 %}
+; RUN: %if ptxas-sm_90 && ptxas-isa-v7.8 && ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
+; RUN: %if ptxas-sm_90 && ptxas-isa-v7.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
+; RUN: %if ptxas-sm_86 && ptxas-isa-v7.1 && ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_86 -mattr=+ptx71 | %ptxas-verify -arch=sm_86 %}
 
 target triple = "nvptx64-nvidia-cuda"
 
diff --git a/llvm/test/CodeGen/NVPTX/atomics-with-scope.ll b/llvm/test/CodeGen/NVPTX/atomics-with-scope.ll
index e6636d706b49d..9e30519b31cc3 100644
--- a/llvm/test/CodeGen/NVPTX/atomics-with-scope.ll
+++ b/llvm/test/CodeGen/NVPTX/atomics-with-scope.ll
@@ -1,7 +1,7 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_60 | FileCheck %s -check-prefixes=CHECK,CHECK32
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_60 | FileCheck %s
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
+; RUN: %if ptxas-sm_60 && ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
+; RUN: %if ptxas-sm_60 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
 
 ; CHECK-LABEL: .func test_atomics_scope(
 define void @test_atomics_scope(ptr %fp, float %f,
diff --git a/llvm/test/CodeGen/NVPTX/b52037.ll b/llvm/test/CodeGen/NVPTX/b52037.ll
index b6317dfb28597..268a8972ebd22 100644
--- a/llvm/test/CodeGen/NVPTX/b52037.ll
+++ b/llvm/test/CodeGen/NVPTX/b52037.ll
@@ -4,7 +4,7 @@
 ; https://bugs.llvm.org/show_bug.cgi?id=52037 for the gory details.
 ;
 ; RUN: llc -mtriple=nvptx64-nvidia-cuda -mcpu=sm_70 -O3 -o - %s | FileCheck %s
-; RUN: %if ptxas %{ llc -mtriple=nvptx64-nvidia-cuda -mcpu=sm_70 -O3 -o - %s | %ptxas-verify -arch=sm_70 %}
+; RUN: %if ptxas-sm_70 %{ llc -mtriple=nvptx64-nvidia-cuda -mcpu=sm_70 -O3 -o - %s | %ptxas-verify -arch=sm_70 %}
 
 ; CHECK-LABEL: .visible .entry barney(
 ; CHECK-NOT:  .local{{.*}}__local_depot
diff --git a/llvm/test/CodeGen/NVPTX/barrier.ll b/llvm/test/CodeGen/NVPTX/barrier.ll
index a3b0d21f098f2..f209bdd0cfae7 100644
--- a/llvm/test/CodeGen/NVPTX/barrier.ll
+++ b/llvm/test/CodeGen/NVPTX/barrier.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_30 -mattr=+ptx60 | FileCheck %s
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_30 -mattr=+ptx60 | %ptxas-verify %}
+; RUN: %if ptxas-isa-v6.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_30 -mattr=+ptx60 | %ptxas-verify %}
 
 declare void @llvm.nvvm.bar.warp.sync(i32)
 declare void @llvm.nvvm.barrier.cta.sync.aligned.all(i32)
diff --git a/llvm/test/CodeGen/NVPTX/bf16-instructions.ll b/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
index aee58a044a986..835e09b9a38e0 100644
--- a/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
+++ b/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
@@ -3,9 +3,9 @@
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | FileCheck --check-prefixes=CHECK,SM80 %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 -denormal-fp-math-f32=preserve-sign | FileCheck --check-prefixes=CHECK,SM80-FTZ %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | FileCheck --check-prefixes=CHECK,SM90 %s
-; RUN: %if ptxas-11.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | %ptxas-verify -arch=sm_80 %}
-; RUN: %if ptxas-11.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 -denormal-fp-math-f32=preserve-sign | %ptxas-verify -arch=sm_80 %}
-; RUN: %if ptxas-11.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
+; RUN: %if ptxas-sm_80 && ptxas-isa-v7.1 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | %ptxas-verify -arch=sm_80 %}
+; RUN: %if ptxas-sm_80 && ptxas-isa-v7.1 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 -denormal-fp-math-f32=preserve-sign | %ptxas-verify -arch=sm_80 %}
+; RUN: %if ptxas-sm_90 && ptxas-isa-v7.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
 
 target triple = "nvptx64-nvidia-cuda"
 
diff --git a/llvm/test/CodeGen/NVPTX/bf16x2-instructions-approx.ll b/llvm/test/CodeGen/NVPTX/bf16x2-instructions-approx.ll
index e1d4ef1073a78..60a5abf03e19f 100644
--- a/llvm/test/CodeGen/NVPTX/bf16x2-instructions-approx.ll
+++ b/llvm/test/CodeGen/NVPTX/bf16x2-instructions-approx.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | FileCheck --check-prefixes=CHECK %s
-; RUN: %if ptxas-11.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | %ptxas-verify -arch=sm_80 %}
+; RUN: %if ptxas-sm_80 && ptxas-isa-v7.1 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | %ptxas-verify -arch=sm_80 %}
 
 target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
 
diff --git a/llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll b/llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll
index b4641d01eb927..4ea8ffc727b56 100644
--- a/llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll
+++ b/llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll
@@ -1,8 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | FileCheck --check-prefixes=CHECK,SM80 %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | FileCheck --check-prefixes=CHECK,SM90 %s
-; RUN: %if ptxas-11.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | %ptxas-verify -arch=sm_80 %}
-; RUN: %if ptxas-11.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
+; RUN: %if ptxas-sm_80 && ptxas-isa-v7.1 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | %ptxas-verify -arch=sm_80 %}
+; RUN: %if ptxas-sm_90 && ptxas-isa-v7.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
 
 target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
 
diff --git a/llvm/test/CodeGen/NVPTX/bmsk.ll b/llvm/test/CodeGen/NVPTX/bmsk.ll
index d5b278657bd52..b9404f2a160cd 100644
--- a/llvm/test/CodeGen/NVPTX/bmsk.ll
+++ b/llvm/test/CodeGen/NVPTX/bmsk.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc -o - < %s -mcpu=sm_70 -mattr=+ptx76 | FileCheck %s
-; RUN: %if ptxas %{ llc < %s -mcpu=sm_70 -mattr=+ptx76 | %ptxas-verify -arch=sm_70 %}
+; RUN: %if ptxas-sm_70 && ptxas-isa-v7.6 %{ llc < %s -mcpu=sm_70 -mattr=+ptx76 | %ptxas-verify -arch=sm_70 %}
 
 target triple = "nvptx64-unknown-cuda"
 
diff --git a/llvm/test/CodeGen/NVPTX/bswap.ll b/llvm/test/CodeGen/NVPTX/bswap.ll
index 0d1d6da4ba2b6..b913b9a03c553 100644
--- a/llvm/test/CodeGen/NVPTX/bswap.ll
+++ b/llvm/test/CodeGen/NVPTX/bswap.ll
@@ -1,9 +1,9 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 -mattr=+ptx70 | FileCheck -check-prefixes CHECK,PTX70 %s
 ; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 | %ptxas-verify %}
-; RUN: %if ptxas-11.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 -mattr=+ptx70 | %ptxas-verify %}
+; RUN: %if ptxas-isa-v7.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 -mattr=+ptx70 | %ptxas-verify %}
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 -mattr=+ptx71 | FileCheck -check-prefixes CHECK,PTX71 %s
-; RUN: %if ptxas-11.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 -mattr=+ptx71 | %ptxas-verify %}
+; RUN: %if ptxas-isa-v7.1 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 -mattr=+ptx71 | %ptxas-verify %}
 
 target triple = "nvptx64-nvidia-cuda"
 
diff --git a/llvm/test/CodeGen/NVPTX/byval-arg-vectorize.ll b/llvm/test/CodeGen/NVPTX/byval-arg-vectorize.ll
index 9988d5b122cc1..e7f71f4ad52ea 100644
--- a/llvm/test/CodeGen/NVPTX/byval-arg-vectorize.ll
+++ b/llvm/test/CodeGen/NVPTX/byval-arg-vectorize.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc < %s -mcpu=sm_70 | FileCheck %s
-; RUN: %if ptxas %{ llc < %s -mcpu=sm_70 | %ptxas-verify -arch=sm_70 %}
+; RUN: %if ptxas-sm_70 %{ llc < %s -mcpu=sm_70 | %ptxas-verify -arch=sm_70 %}
 
 target triple = "nvptx64-nvidia-cuda"
 
diff --git a/llvm/test/CodeGen/NVPTX/byval-const-global.ll b/llvm/test/CodeGen/NVPTX/byval-const-global.ll
index b4934e1a94d1b..81e7edfd8602e 100644
--- a/llvm/test/CodeGen/NVPTX/byval-const-global.ll
+++ b/llvm/test/CodeGen/NVPTX/byval-const-global.ll
@@ -1,6 +1,6 @@
...
[truncated]
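
The updated RUN lines in the hunks above appear to follow one pattern: the SM feature mirrors the `-arch` passed to `%ptxas-verify`, and the ISA feature mirrors llc's `-mattr=+ptxNN`. Below is a minimal Python sketch of that mapping, inferred only from the hunks shown here. The helper name `ptxas_condition` is hypothetical and not part of the PR, and the feature spellings are the ones in this revision of the diff.

```python
import re


def ptxas_condition(run_line: str) -> str:
    """Guess the lit %if condition for a RUN line that pipes llc into %ptxas-verify.

    Hypothetical helper: it mirrors the pattern visible in the diff, nothing more.
    """
    features = []
    # SM requirement: taken from the -arch flag passed to %ptxas-verify, if any.
    arch = re.search(r"%ptxas-verify\b[^|]*?-arch=(sm_\d+[af]?)", run_line)
    if arch:
        features.append(f"ptxas-{arch.group(1)}")
    # ISA requirement: taken from llc's -mattr=+ptxNN (last digit is the minor version).
    ptx = re.search(r"-mattr=\+ptx(\d+)(\d)\b", run_line)
    if ptx:
        features.append(f"ptxas-isa-v{ptx.group(1)}.{ptx.group(2)}")
    # With neither flag present, the plain "ptxas" feature is all that is needed.
    return " && ".join(features) if features else "ptxas"


print(ptxas_condition(
    "llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | %ptxas-verify -arch=sm_80"
))
```

Note that `-mcpu` alone does not add an SM feature in this sketch, which matches the bswap.ll hunk where `%ptxas-verify` is invoked without `-arch`.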

AlexMaclean

This will be very useful. I've personally run afoul of issues with versions of ptxas not working with a given configuration with %ptxas-verify several times. I would consider tweaking the naming of these a little bit. I'd recommend that we remove the "v" in the ISA features and rename "32" -> "ptr32". I'd also lean towards using () or {} to represent these (i.e. ptxas(sm_90), ptxas(isa-8.4)) but that is more a matter of personal taste so if you or others disagree I'm happy to leave as is.

llvm/test/lit.cfg.py

AlexMaclean · 2025-08-21T18:49:01Z

    config.available_features.add("ptxas")
    tools.extend(
        [
            ToolSubst("%ptxas", ptxas_executable),
-            ToolSubst("%ptxas-verify", "{} -arch=sm_60 -c -".format(ptxas_executable)),
+            ToolSubst("%ptxas-verify", f"{ptxas_executable} -c -"),


What impact will removing "sm_60" have?

We'll now use the default architecture assumed by ptxas, which seems to always be the oldest supported architecture.

More to your point, though, assuming you're using a CUDA 13.0+ ptxas executable, there shouldn't be any impact.

However, if you're using CUDA 12.9, where ptxas defaults to sm_52 and a test was implicitly relying on ptxas-verify setting the arch to sm_60 (e.g. to test atom.add.f64), that test will begin to fail.
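
To find tests that could be affected by this, one option is to scan for RUN lines that call `%ptxas-verify` without an explicit `-arch`, since those now depend on whatever architecture ptxas defaults to. This is a rough sketch only: `runs_relying_on_default_arch` is a hypothetical helper, not something in the PR, and it ignores RUN lines continued with a trailing backslash.

```python
import re
from typing import List


def runs_relying_on_default_arch(test_source: str) -> List[str]:
    """Return RUN lines that invoke %ptxas-verify without an explicit -arch flag."""
    flagged = []
    for line in test_source.splitlines():
        if "RUN:" not in line or "%ptxas-verify" not in line:
            continue
        # Only inspect the arguments that follow %ptxas-verify itself,
        # so llc's -mcpu=sm_NN is not mistaken for a ptxas architecture.
        verify_args = line.split("%ptxas-verify", 1)[1]
        if not re.search(r"-arch=sm_\d+", verify_args):
            flagged.append(line.strip())
    return flagged


# Two RUN lines copied from the diff in this PR.
SAMPLE_TEST = """\
; RUN: %if ptxas-isa-v7.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 -mattr=+ptx70 | %ptxas-verify %}
; RUN: %if ptxas-sm_70 %{ llc < %s -mcpu=sm_70 | %ptxas-verify -arch=sm_70 %}
"""

for run in runs_relying_on_default_arch(SAMPLE_TEST):
    print(run)
```

Only the first sample line is reported, because the second one pins `-arch=sm_70`.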

llvm/test/lit.cfg.py

Artem-B · 2025-08-21T22:06:40Z

+        text=True,
+        check=True,
+    )
+    supported_sms = re.findall(r"'sm_(\d+(?:[af]?))'", result.stdout)


Nit: SM variants may be mentioned in other contexts.

E.g. ptxas-13 mentions sm_70, even though it no longer supports it. We're only saved by the fact that the pattern above matches the quoted string and the mention is unquoted:

--legacy-bar-warp-wide-behavior (-legacy-bar-warp-wide-behavior) ... sm_70 or higher.This is a deprecated option and it will be removed in future

You may want to limit the input to the "Allowed values for this option:" subsection of the --gpu-arch option.
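
A sketch of what that restriction could look like, reusing the regex from the patch. The help-text layout in `SAMPLE_HELP` is an assumption made for illustration (abridged and invented, not captured from a real ptxas), and `sms_from_allowed_values` is a hypothetical name. The option name is passed in by the caller rather than hardcoded.

```python
import re
from typing import List

SM_PATTERN = r"'sm_(\d+(?:[af]?))'"  # regex taken from the patch under review


def sms_from_allowed_values(help_text: str, option: str) -> List[str]:
    """Collect SMs only from the 'Allowed values' list that follows `option`."""
    marker = "Allowed values for this option:"
    start = help_text.find(option)  # assumes the first mention is the option heading
    begin = help_text.find(marker, start) if start != -1 else -1
    if begin == -1:
        return []
    values: List[str] = []
    for line in help_text[begin + len(marker):].splitlines():
        if not line.strip() and not values:
            continue  # tolerate the list starting on the line after the marker
        # A list line holds nothing but quoted values, commas and a final period.
        if not re.fullmatch(r"\s*(?:'[^']*'[\s,]*)+\.?\s*", line):
            break
        values.extend(re.findall(SM_PATTERN, line))
    return values


# Hypothetical help output. The sm_70 mention is quoted here on purpose to show
# the failure mode; per the review, real ptxas 13 output leaves it unquoted.
SAMPLE_HELP = """\
--gpu-arch <gpu name>
        Specify the GPU to generate code for.
        Allowed values for this option:  'sm_75','sm_80',
        'sm_90','sm_90a'.
        Default value:  'sm_75'.

--legacy-bar-warp-wide-behavior
        Applies to 'sm_70' or higher.
"""

print(sms_from_allowed_values(SAMPLE_HELP, "--gpu-arch"))
```

Matching against the whole sample text would also pick up `70` from the unrelated option and a duplicate `75` from the default-value line; limiting the search to the list avoids both.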

justinfargnoli · 2025-08-25T22:00:52Z

> I'd also lean towards using () or {} to represent these (i.e. ptxas(sm_90), ptxas(isa-8.4)) but that is more a matter of personal taste so if you or others disagree I'm happy to leave as is.

It looks like lit doesn't like the use of (), {}, or <>, so I went back to using -.
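
One plausible explanation, stated as an assumption rather than a description of lit's code: `%if` conditions are boolean expressions in which parentheses group sub-expressions, so a feature name has to stick to a plain identifier alphabet. The sketch below models that with a character set of letters, digits and `-+=._`; both the pattern and the helper name `is_plain_feature_name` are illustrative.

```python
import re

# Assumed alphabet for a plain feature name in a lit boolean expression.
# This is a simplified model for illustration, not lit's actual tokenizer.
PLAIN_FEATURE = re.compile(r"[-+=._a-zA-Z0-9]+")


def is_plain_feature_name(name: str) -> bool:
    """True if `name` uses only characters the assumed alphabet allows."""
    return PLAIN_FEATURE.fullmatch(name) is not None


for candidate in ["ptxas-sm_90", "ptxas-isa-v8.0", "ptxas(sm_90)", "ptxas{sm_90}", "ptxas<sm_90>"]:
    print(candidate, is_plain_feature_name(candidate))
```

Under this model the `-` spellings pass and the bracketed ones do not, which is consistent with the outcome reported above.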

justinfargnoli requested a review from Copilot August 19, 2025 23:30

justinfargnoli self-assigned this Aug 20, 2025

justinfargnoli requested a review from Copilot August 20, 2025 23:57

Copilot AI reviewed Aug 20, 2025

squash: all changes on dev/jf/upstream-lit

813da96

justinfargnoli force-pushed the dev/jf/upstream-lit branch from 2b19c39 to 813da96 Compare August 21, 2025 16:29

justinfargnoli changed the title ~~[lit] Refactor ptxas usage~~ [lit] Refactor available ptxas features Aug 21, 2025

justinfargnoli requested review from Artem-B, AlexMaclean, schwarzschild-radius and durga4github August 21, 2025 16:47

justinfargnoli commented Aug 21, 2025

justinfargnoli marked this pull request as ready for review August 21, 2025 18:11

llvmbot added the backend:NVPTX label Aug 21, 2025

AlexMaclean reviewed Aug 21, 2025

AlexMaclean approved these changes Aug 21, 2025

Artem-B reviewed Aug 21, 2025

justinfargnoli added 3 commits August 22, 2025 22:29

Address review comments

26548bd

Manually set isa version pre CUDA 13.0

82efb6b

Use - instead of ()

6146027

Use format strings

c54ec75

		major_version, minor_version = ptxas_version(ptxas_executable)
		config.available_features.add("ptxas-{}.{}".format(major_version, minor_version))
