@justinfargnoli justinfargnoli commented Aug 19, 2025

Tip-of-tree (ToT) lit currently assumes that a given ptxas version supports every capability of all prior ptxas releases. That assumption was flexible enough to handle the removal of 32-bit address compilation from ptxas in CUDA 12.1, but it struggles with the removal of compilation for Volta and earlier architectures in CUDA 13.0.

To deal with this, this PR refactors how lit defines the set of features available for a given ptxas binary. It invokes ptxas not just to get its version, but also to get the list of supported SMs, the supported PTX ISA versions, and whether 32-bit compilation is supported.

This approach should be flexible enough to handle ptxas's changing support matrix going forward. One obvious downside is that it relies on parsing the stdout of ptxas, which is inherently unstable. But, IMO, this is something that we can fix as needed.
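
As a rough illustration of the capability-based approach described above, the sketch below turns a set of detected capabilities into lit-style feature names. The helper names `parse_sm_names` and `ptxas_features`, the regular expression, and the sample text are all assumptions made for illustration; the actual parsing in `llvm/test/lit.cfg.py` and the real ptxas output may look different. Only the feature-name patterns (`ptxas-sm_NN`, `ptxas-isa-vX.Y`, `ptxas-32`) are taken from the updated RUN lines in this PR.

```python
import re


def parse_sm_names(text):
    """Collect distinct sm_NN / sm_NNa style names from free-form tool output."""
    return sorted(set(re.findall(r"\bsm_\d+[a-z]?\b", text)))


def ptxas_features(sm_names, isa_versions, supports_32bit):
    """Build lit-style feature names from already-detected capabilities.

    sm_names:       iterable of strings such as "sm_90"
    isa_versions:   iterable of (major, minor) tuples such as (7, 8)
    supports_32bit: True if 32-bit address compilation is available
    """
    features = {"ptxas"}
    features.update("ptxas-{}".format(sm) for sm in sm_names)
    features.update(
        "ptxas-isa-v{}.{}".format(major, minor) for major, minor in isa_versions
    )
    if supports_32bit:
        features.add("ptxas-32")
    return features


# Hypothetical text fragment, NOT copied from a real ptxas release.
sample = "Allowed values for this option: 'sm_75','sm_80','sm_90','sm_90a'."
features = ptxas_features(
    parse_sm_names(sample), [(7, 8), (8, 0)], supports_32bit=False
)
```

In a lit configuration, each resulting name would presumably be registered with `config.available_features.add(...)`, the same call the existing version feature uses.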

@justinfargnoli justinfargnoli requested a review from Copilot August 19, 2025 23:30

github-actions bot commented Aug 19, 2025

✅ With the latest revision this PR passed the Python code formatter.

@justinfargnoli justinfargnoli self-assigned this Aug 20, 2025
@justinfargnoli justinfargnoli requested a review from Copilot August 20, 2025 23:57
@Copilot Copilot AI left a comment

Pull Request Overview

This PR refactors the ptxas usage in LLVM's lit configuration by replacing the hardcoded version-based feature detection with dynamic capability detection. Instead of relying on specific ptxas version numbers, the new approach queries ptxas directly for its supported SM architectures, ISA versions, and other capabilities.

  • Replaced version-based ptxas feature detection with dynamic capability queries
  • Added new functions to query ptxas for supported SM architectures, ISA versions, and features
  • Updated test conditions from version-specific (e.g., ptxas-12.7) to capability-specific (e.g., ptxas-sm_90 && ptxas-isa-v8.0)
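
The mapping from old to new test conditions follows a visible pattern in the diff: the `-arch=` value passed to `%ptxas-verify` becomes a `ptxas-sm_NN` term, the `+ptxNN` attribute becomes a `ptxas-isa-vN.N` term, and a 32-bit `nvptx` triple adds `ptxas-32`. The sketch below encodes that pattern. The helper `ptxas_guard` is hypothetical and is not part of the PR; nothing in the PR says the conditions were generated this way.

```python
import re


def ptxas_guard(verify_arch=None, mattr=None, is_32bit=False):
    """Compose a lit %if condition from the capabilities a RUN line needs.

    verify_arch: the -arch= value given to %ptxas-verify, e.g. "sm_90"
    mattr:       the llc -mattr= string, e.g. "+ptx78"
    is_32bit:    True when the RUN line uses the 32-bit nvptx triple
    """
    terms = []
    if verify_arch:
        terms.append("ptxas-{}".format(verify_arch))
    match = re.search(r"\+ptx(\d)(\d)\b", mattr or "")
    if match:
        terms.append("ptxas-isa-v{}.{}".format(match.group(1), match.group(2)))
    if is_32bit:
        terms.append("ptxas-32")
    # With no specific requirement, fall back to the plain "ptxas" feature.
    return " && ".join(terms) if terms else "ptxas"


# Mirrors the updated RUN line in addrspacecast-ptx64.ll.
print(ptxas_guard(verify_arch="sm_90", mattr="+ptx78"))
# Mirrors the 32-bit RUN line in async-copy.ll.
print(ptxas_guard(verify_arch="sm_80", mattr="+ptx70", is_32bit=True))
# Mirrors the 32-bit RUN line in access-non-generic.ll.
print(ptxas_guard(is_32bit=True))
```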

Reviewed Changes

Copilot reviewed 172 out of 172 changed files in this pull request and generated no comments.

File | Description
llvm/test/lit.cfg.py | Core refactoring: replaced ptxas_version and enable_ptxas functions with new capability detection functions
llvm/test/CodeGen/NVPTX/*.ll, *.py | Updated test RUN conditions to use new capability-based features instead of version numbers

@justinfargnoli justinfargnoli changed the title from "[lit] Refactor ptxas usage" to "[lit] Refactor available ptxas features" Aug 21, 2025
Comment on lines 359 to 360
major_version, minor_version = ptxas_version(ptxas_executable)
config.available_features.add("ptxas-{}.{}".format(major_version, minor_version))

I'm not aware of a test case that checks for this feature.

My intention in keeping it around was to be able to guard tests against bugs in specific ptxas versions.

What do we think about keeping/removing it?
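
For context on the feature under discussion, here is a minimal sketch of how a `ptxas-X.Y` feature name could be derived from version text. The helper `version_feature`, the regular expression, and the sample string are illustrative assumptions; they are not taken from this PR, and the `ptxas_version` function referenced above may work differently.

```python
import re


def version_feature(version_text):
    """Return a "ptxas-MAJOR.MINOR" feature name, or None if no version is found."""
    # Assumes the version appears as a token like "V12.4.131" somewhere in the text.
    match = re.search(r"\bV(\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return "ptxas-{}.{}".format(int(match.group(1)), int(match.group(2)))


# Illustrative input only; real tool output may be formatted differently.
feature = version_feature("Cuda compilation tools, release 12.4, V12.4.131")
```

A test that trips over a bug in one specific ptxas release could then, in principle, be guarded with a negated condition on that single feature name.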

@justinfargnoli

Note: I've only tested this PR on public CUDA 13.0 and internal ToT ptxas.

Trying to see if I can trigger a build with the public build bot.

@justinfargnoli

> Trying to see if I can trigger a build with the public build bot.

https://lab.llvm.org/buildbot/#/buildrequests/5249048?redirect_to_build=true

@justinfargnoli justinfargnoli marked this pull request as ready for review August 21, 2025 18:11

llvmbot commented Aug 21, 2025

@llvm/pr-subscribers-backend-nvptx

Author: Justin Fargnoli (justinfargnoli)

Patch is 154.79 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/154439.diff

172 Files Affected:

  • (modified) llvm/test/CodeGen/NVPTX/access-non-generic.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/activemask.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/addrspacecast-ptx64.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/addrspacecast.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/alias.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/annotations.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/applypriority.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/arithmetic-fp-sm20.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/arithmetic-int.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/async-copy.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/atomicrmw-expand.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/atomics-sm60.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/atomics-sm70.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/atomics-sm90.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/atomics-with-scope.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/b52037.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/barrier.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/bf16-instructions.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/bf16x2-instructions-approx.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/bmsk.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/bswap.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/byval-arg-vectorize.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/byval-const-global.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/calling-conv.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/cluster-dim.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/clusterlaunchcontrol-multicast.ll (+6-6)
  • (modified) llvm/test/CodeGen/NVPTX/clusterlaunchcontrol.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cmpxchg-sm60.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/cmpxchg-sm70.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/cmpxchg-sm90.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/cmpxchg.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/combine-mad.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/combine-min-max.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/common-linkage.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/compare-int.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/convert-call-to-indirect.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/convert-fp.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/convert-int-sm20.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/convert-sm100.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/convert-sm100a.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/convert-sm80.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/convert-sm89.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/convert-sm90.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-s2g-sm100.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-1cta.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-2cta.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-cta-sm100.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-cta-sm100a.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-cta-sm90.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-gather4.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-im2colw.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s-im2colw128.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-prefetch-sm100a.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-prefetch.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-reduce.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-s2g-scatter4.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-s2g.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/cp-async-bulk.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/discard.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/distributed-shared-cluster.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/dynamic_stackalloc.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/elect.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/f16-abs.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/f16-ex2.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/f16-instructions.ll (+4-4)
  • (modified) llvm/test/CodeGen/NVPTX/f16x2-instructions.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/f32-ex2.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/f32-lg2.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/f32x2-instructions.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/fabs-intrinsics.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/fence-cluster.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/fence-nocluster.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/fence-proxy-tensormap.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/fexp2.ll (+3-3)
  • (modified) llvm/test/CodeGen/NVPTX/flog2.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/fma-disable.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/fminimum-fmaximum.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/fns.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/fold-movs.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/fp-contract-f32x2.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/global-addrspace.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/global-ordering.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/griddepcontrol.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/i16x2-instructions.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/idioms.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/indirect_byval.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/inline-asm-b128-test1.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/inline-asm-b128-test2.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/inline-asm-b128-test3.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/intrinsic-old.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/intrinsics-sm90.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/intrinsics.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/kernel-param-align.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/ld-addrspace.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/ld-generic.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/ld-st-addrrspace.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/ldg-invariant-256.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/load-store-256-addressing-invariant.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/load-store-256-addressing.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/load-store-scalars.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/load-store-sm-70.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/load-store-sm-90.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/load-store-vectors-256.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/local-stack-frame.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/managed.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/match.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/math-intrins-sm53-ptx42.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/math-intrins-sm80-ptx70-autoupgrade.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/math-intrins-sm80-ptx70.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/math-intrins-sm86-ptx72-autoupgrade.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/math-intrins-sm86-ptx72.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/math-intrins.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/mbarrier.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/nanosleep.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/nofunc.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/noreturn.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/packed-aggr.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/param-overalign.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/pr126337.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/pr13291-i1-store.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/prefetch-inferas-test.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/prefetch.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/reduction-intrinsics.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/redux-sync-f32.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/redux-sync.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/reg-types.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/setmaxnreg-sm100a.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/setmaxnreg.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/sext-setcc.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/shfl-sync-p.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/shfl-sync.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/short-ptr.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/simple-call.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/st-addrspace.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/st-generic.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/st-param-imm.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/st_bulk.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/stacksaverestore.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/surf-tex.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/symbol-naming.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/szext.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/tanhf.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/tcgen05-alloc.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/tcgen05-commit.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/tcgen05-cp.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/tcgen05-fence.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/tcgen05-ld.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/tcgen05-shift.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/tcgen05-st.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/trunc-setcc.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/trunc-tofp.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/unreachable.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/vaargs.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/variadics-backend.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/vector-compare.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/vector-select.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/vote.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/weak-global.ll (+2-2)
  • (modified) llvm/test/CodeGen/NVPTX/wgmma-sm90a-fence.ll (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/wmma-ptx60-sm70.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/wmma-ptx63-sm72.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/wmma-ptx63-sm75.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/wmma-ptx64-sm70.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/wmma-ptx65-sm75.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/wmma-ptx71-sm80.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/wmma-ptx78-sm90.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/wmma-ptx86-sm100a.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/wmma-ptx86-sm101a.py (+1-1)
  • (modified) llvm/test/CodeGen/NVPTX/wmma-ptx86-sm120a.py (+1-1)
  • (modified) llvm/test/lit.cfg.py (+66-64)
diff --git a/llvm/test/CodeGen/NVPTX/access-non-generic.ll b/llvm/test/CodeGen/NVPTX/access-non-generic.ll
index 601a35288f54d..be8d00a10108a 100644
--- a/llvm/test/CodeGen/NVPTX/access-non-generic.ll
+++ b/llvm/test/CodeGen/NVPTX/access-non-generic.ll
@@ -2,7 +2,7 @@
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 | FileCheck %s --check-prefix PTX
 ; RUN: opt -mtriple=nvptx-- < %s -S -passes=infer-address-spaces | FileCheck %s --check-prefix IR
 ; RUN: opt -mtriple=nvptx64-- < %s -S -passes=infer-address-spaces | FileCheck %s --check-prefix IR
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
+; RUN: %if ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
 ; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 | %ptxas-verify %}
 
 @array = internal addrspace(3) global [10 x float] zeroinitializer, align 4
diff --git a/llvm/test/CodeGen/NVPTX/activemask.ll b/llvm/test/CodeGen/NVPTX/activemask.ll
index aa3c5819d7f91..96df904290e68 100644
--- a/llvm/test/CodeGen/NVPTX/activemask.ll
+++ b/llvm/test/CodeGen/NVPTX/activemask.ll
@@ -1,5 +1,5 @@
 ; RUN: llc < %s -mtriple=nvptx64 -O2 -mcpu=sm_52 -mattr=+ptx62 | FileCheck %s
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_52 -mattr=+ptx62 | %ptxas-verify %}
+; RUN: %if ptxas-isa-v6.2 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_52 -mattr=+ptx62 | %ptxas-verify %}
 
 declare i32 @llvm.nvvm.activemask()
 
diff --git a/llvm/test/CodeGen/NVPTX/addrspacecast-ptx64.ll b/llvm/test/CodeGen/NVPTX/addrspacecast-ptx64.ll
index 00b17896d2c9e..1b9dc6ab1e122 100644
--- a/llvm/test/CodeGen/NVPTX/addrspacecast-ptx64.ll
+++ b/llvm/test/CodeGen/NVPTX/addrspacecast-ptx64.ll
@@ -1,8 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | FileCheck %s -check-prefixes=NOPTRCONV
 ; RUN: llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 --nvptx-short-ptr | FileCheck %s -check-prefixes=PTRCONV
-; RUN: %if ptxas-12.8 %{ llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
-; RUN: %if ptxas-12.8 %{ llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 --nvptx-short-ptr | %ptxas-verify -arch=sm_90 %}
+; RUN: %if ptxas-sm_90 && ptxas-isa-v7.8 %{ llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
+; RUN: %if ptxas-sm_90 && ptxas-isa-v7.8 %{ llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 --nvptx-short-ptr | %ptxas-verify -arch=sm_90 %}
 
 ; ALL-LABEL: conv_shared_cluster_to_generic
 define i32 @conv_shared_cluster_to_generic(ptr addrspace(7) %ptr) {
diff --git a/llvm/test/CodeGen/NVPTX/addrspacecast.ll b/llvm/test/CodeGen/NVPTX/addrspacecast.ll
index 86008a1b70058..546d22ca0e691 100644
--- a/llvm/test/CodeGen/NVPTX/addrspacecast.ll
+++ b/llvm/test/CodeGen/NVPTX/addrspacecast.ll
@@ -1,7 +1,7 @@
 ; RUN: llc -O0 < %s -mtriple=nvptx -mcpu=sm_20 | FileCheck %s -check-prefixes=ALL,CLS32
 ; RUN: llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_20 | FileCheck %s -check-prefixes=ALL,NOPTRCONV,CLS64
 ; RUN: llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_20 --nvptx-short-ptr | FileCheck %s -check-prefixes=ALL,PTRCONV,CLS64
-; RUN: %if ptxas && !ptxas-12.0 %{ llc -O0 < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
+; RUN: %if ptxas-32 %{ llc -O0 < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
 ; RUN: %if ptxas %{ llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_20 | %ptxas-verify %}
 ; RUN: %if ptxas %{ llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_20 --nvptx-short-ptr | %ptxas-verify %}
 
diff --git a/llvm/test/CodeGen/NVPTX/alias.ll b/llvm/test/CodeGen/NVPTX/alias.ll
index 01761c21ab103..4b22df225e33f 100644
--- a/llvm/test/CodeGen/NVPTX/alias.ll
+++ b/llvm/test/CodeGen/NVPTX/alias.ll
@@ -1,5 +1,5 @@
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_30 -mattr=+ptx64 | FileCheck %s
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_30 -mattr=+ptx64 | %ptxas-verify %}
+; RUN: %if ptxas-isa-v6.4 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_30 -mattr=+ptx64 | %ptxas-verify %}
 
 define i32 @a() { ret i32 0 }
 @b = internal alias i32 (), ptr @a
diff --git a/llvm/test/CodeGen/NVPTX/annotations.ll b/llvm/test/CodeGen/NVPTX/annotations.ll
index 5360e8988777b..e4aa0552e8420 100644
--- a/llvm/test/CodeGen/NVPTX/annotations.ll
+++ b/llvm/test/CodeGen/NVPTX/annotations.ll
@@ -1,6 +1,6 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_20 | FileCheck %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 | FileCheck %s
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
+; RUN: %if ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
 ; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 | %ptxas-verify %}
 
 @texture = internal addrspace(1) global i64 0, align 8
diff --git a/llvm/test/CodeGen/NVPTX/applypriority.ll b/llvm/test/CodeGen/NVPTX/applypriority.ll
index 23b1bda9a32bf..9ac52c8bfe188 100644
--- a/llvm/test/CodeGen/NVPTX/applypriority.ll
+++ b/llvm/test/CodeGen/NVPTX/applypriority.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx74| FileCheck --check-prefixes=CHECK-PTX64 %s
-; RUN: %if ptxas-11.4 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx74| %ptxas-verify -arch=sm_80 %}
+; RUN: %if ptxas-sm_80 && ptxas-isa-v7.4 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx74| %ptxas-verify -arch=sm_80 %}
 
 target triple = "nvptx64-nvidia-cuda"
 
diff --git a/llvm/test/CodeGen/NVPTX/arithmetic-fp-sm20.ll b/llvm/test/CodeGen/NVPTX/arithmetic-fp-sm20.ll
index ce71d3a78c0de..e88d0396f0858 100644
--- a/llvm/test/CodeGen/NVPTX/arithmetic-fp-sm20.ll
+++ b/llvm/test/CodeGen/NVPTX/arithmetic-fp-sm20.ll
@@ -1,6 +1,6 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_20 -fp-contract=fast | FileCheck %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 -fp-contract=fast | FileCheck %s
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 -fp-contract=fast | %ptxas-verify %}
+; RUN: %if ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 -fp-contract=fast | %ptxas-verify %}
 ; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 -fp-contract=fast | %ptxas-verify %}
 
 ;; These tests should run for all targets
diff --git a/llvm/test/CodeGen/NVPTX/arithmetic-int.ll b/llvm/test/CodeGen/NVPTX/arithmetic-int.ll
index 1fbfd0a987d7a..9e41e9e240902 100644
--- a/llvm/test/CodeGen/NVPTX/arithmetic-int.ll
+++ b/llvm/test/CodeGen/NVPTX/arithmetic-int.ll
@@ -1,6 +1,6 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_20 | FileCheck %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 | FileCheck %s
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
+; RUN: %if ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
 ; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 | %ptxas-verify %}
 
 ;; These tests should run for all targets
diff --git a/llvm/test/CodeGen/NVPTX/async-copy.ll b/llvm/test/CodeGen/NVPTX/async-copy.ll
index cefb8ede9fa58..97b6b5d4d0097 100644
--- a/llvm/test/CodeGen/NVPTX/async-copy.ll
+++ b/llvm/test/CodeGen/NVPTX/async-copy.ll
@@ -1,7 +1,7 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_80 -mattr=+ptx70 | FileCheck -check-prefixes=CHECK,CHECK_PTX32 %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx70 | FileCheck -check-prefixes=CHECK,CHECK_PTX64 %s
-; RUN: %if ptxas-11.0 && ! ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_80 -mattr=+ptx70 | %ptxas-verify -arch=sm_80 %}
-; RUN: %if ptxas-11.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx70 | %ptxas-verify -arch=sm_80 %}
+; RUN: %if ptxas-sm_80 && ptxas-isa-v7.0 && ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_80 -mattr=+ptx70 | %ptxas-verify -arch=sm_80 %}
+; RUN: %if ptxas-sm_80 && ptxas-isa-v7.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx70 | %ptxas-verify -arch=sm_80 %}
 
 declare void @llvm.nvvm.cp.async.wait.group(i32)
 
diff --git a/llvm/test/CodeGen/NVPTX/atomicrmw-expand.ll b/llvm/test/CodeGen/NVPTX/atomicrmw-expand.ll
index 94b3f0a2e1c3e..88fae7a3f78a0 100644
--- a/llvm/test/CodeGen/NVPTX/atomicrmw-expand.ll
+++ b/llvm/test/CodeGen/NVPTX/atomicrmw-expand.ll
@@ -1,7 +1,7 @@
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_30 | FileCheck %s --check-prefixes=ALL,SM30
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_60 | FileCheck %s --check-prefixes=ALL,SM60
 ; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_30 | %ptxas-verify %}
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
+; RUN: %if ptxas-sm_60 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
 
 ; CHECK-LABEL: fadd_double
 define void @fadd_double(ptr %0, double %1) {
diff --git a/llvm/test/CodeGen/NVPTX/atomics-sm60.ll b/llvm/test/CodeGen/NVPTX/atomics-sm60.ll
index 2e11323d1b3e1..5a7a1823cb2a0 100644
--- a/llvm/test/CodeGen/NVPTX/atomics-sm60.ll
+++ b/llvm/test/CodeGen/NVPTX/atomics-sm60.ll
@@ -1,7 +1,7 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_60 | FileCheck %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_60 | FileCheck %s
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
+; RUN: %if ptxas-sm_60 && ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
+; RUN: %if ptxas-sm_60 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
 
 ; CHECK-LABEL: .func test(
 define void @test(ptr %dp0, ptr addrspace(1) %dp1, ptr addrspace(3) %dp3, double %d) {
diff --git a/llvm/test/CodeGen/NVPTX/atomics-sm70.ll b/llvm/test/CodeGen/NVPTX/atomics-sm70.ll
index f710d7f883a1b..e1a69d2e3db20 100644
--- a/llvm/test/CodeGen/NVPTX/atomics-sm70.ll
+++ b/llvm/test/CodeGen/NVPTX/atomics-sm70.ll
@@ -2,9 +2,9 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_70 -mattr=+ptx63 | FileCheck %s --check-prefixes=CHECK
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_70 -mattr=+ptx63 | FileCheck %s --check-prefixes=CHECK64
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_70 -mattr=+ptx62 | FileCheck %s --check-prefixes=CHECKPTX62
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_70 -mattr=+ptx63 | %ptxas-verify -arch=sm_70 %}
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_70 -mattr=+ptx63 | %ptxas-verify -arch=sm_70 %}
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_70 -mattr=+ptx62 | %ptxas-verify -arch=sm_70 %}
+; RUN: %if ptxas-sm_70 && ptxas-isa-v6.3 && ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_70 -mattr=+ptx63 | %ptxas-verify -arch=sm_70 %}
+; RUN: %if ptxas-sm_70 && ptxas-isa-v6.3 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_70 -mattr=+ptx63 | %ptxas-verify -arch=sm_70 %}
+; RUN: %if ptxas-sm_70 && ptxas-isa-v6.2 && ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_70 -mattr=+ptx62 | %ptxas-verify -arch=sm_70 %}
 
 target triple = "nvptx64-nvidia-cuda"
 
diff --git a/llvm/test/CodeGen/NVPTX/atomics-sm90.ll b/llvm/test/CodeGen/NVPTX/atomics-sm90.ll
index f96fd30019025..79e12025ba614 100644
--- a/llvm/test/CodeGen/NVPTX/atomics-sm90.ll
+++ b/llvm/test/CodeGen/NVPTX/atomics-sm90.ll
@@ -2,9 +2,9 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_90 -mattr=+ptx78 | FileCheck %s --check-prefixes=CHECK
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | FileCheck %s --check-prefixes=CHECK64
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_86 -mattr=+ptx71 | FileCheck %s --check-prefixes=CHECKPTX71
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_86 -mattr=+ptx71 | %ptxas-verify -arch=sm_86 %}
+; RUN: %if ptxas-sm_90 && ptxas-isa-v7.8 && ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
+; RUN: %if ptxas-sm_90 && ptxas-isa-v7.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
+; RUN: %if ptxas-sm_86 && ptxas-isa-v7.1 && ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_86 -mattr=+ptx71 | %ptxas-verify -arch=sm_86 %}
 
 target triple = "nvptx64-nvidia-cuda"
 
diff --git a/llvm/test/CodeGen/NVPTX/atomics-with-scope.ll b/llvm/test/CodeGen/NVPTX/atomics-with-scope.ll
index e6636d706b49d..9e30519b31cc3 100644
--- a/llvm/test/CodeGen/NVPTX/atomics-with-scope.ll
+++ b/llvm/test/CodeGen/NVPTX/atomics-with-scope.ll
@@ -1,7 +1,7 @@
 ; RUN: llc < %s -mtriple=nvptx -mcpu=sm_60 | FileCheck %s -check-prefixes=CHECK,CHECK32
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_60 | FileCheck %s
-; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -mtriple=nvptx -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
+; RUN: %if ptxas-sm_60 && ptxas-32 %{ llc < %s -mtriple=nvptx -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
+; RUN: %if ptxas-sm_60 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_60 | %ptxas-verify -arch=sm_60 %}
 
 ; CHECK-LABEL: .func test_atomics_scope(
 define void @test_atomics_scope(ptr %fp, float %f,
diff --git a/llvm/test/CodeGen/NVPTX/b52037.ll b/llvm/test/CodeGen/NVPTX/b52037.ll
index b6317dfb28597..268a8972ebd22 100644
--- a/llvm/test/CodeGen/NVPTX/b52037.ll
+++ b/llvm/test/CodeGen/NVPTX/b52037.ll
@@ -4,7 +4,7 @@
 ; https://bugs.llvm.org/show_bug.cgi?id=52037 for the gory details.
 ;
 ; RUN: llc -mtriple=nvptx64-nvidia-cuda -mcpu=sm_70 -O3 -o - %s | FileCheck %s
-; RUN: %if ptxas %{ llc -mtriple=nvptx64-nvidia-cuda -mcpu=sm_70 -O3 -o - %s | %ptxas-verify -arch=sm_70 %}
+; RUN: %if ptxas-sm_70 %{ llc -mtriple=nvptx64-nvidia-cuda -mcpu=sm_70 -O3 -o - %s | %ptxas-verify -arch=sm_70 %}
 
 ; CHECK-LABEL: .visible .entry barney(
 ; CHECK-NOT:  .local{{.*}}__local_depot
diff --git a/llvm/test/CodeGen/NVPTX/barrier.ll b/llvm/test/CodeGen/NVPTX/barrier.ll
index a3b0d21f098f2..f209bdd0cfae7 100644
--- a/llvm/test/CodeGen/NVPTX/barrier.ll
+++ b/llvm/test/CodeGen/NVPTX/barrier.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_30 -mattr=+ptx60 | FileCheck %s
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_30 -mattr=+ptx60 | %ptxas-verify %}
+; RUN: %if ptxas-isa-v6.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_30 -mattr=+ptx60 | %ptxas-verify %}
 
 declare void @llvm.nvvm.bar.warp.sync(i32)
 declare void @llvm.nvvm.barrier.cta.sync.aligned.all(i32)
diff --git a/llvm/test/CodeGen/NVPTX/bf16-instructions.ll b/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
index aee58a044a986..835e09b9a38e0 100644
--- a/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
+++ b/llvm/test/CodeGen/NVPTX/bf16-instructions.ll
@@ -3,9 +3,9 @@
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | FileCheck --check-prefixes=CHECK,SM80 %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 -denormal-fp-math-f32=preserve-sign | FileCheck --check-prefixes=CHECK,SM80-FTZ %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | FileCheck --check-prefixes=CHECK,SM90 %s
-; RUN: %if ptxas-11.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | %ptxas-verify -arch=sm_80 %}
-; RUN: %if ptxas-11.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 -denormal-fp-math-f32=preserve-sign | %ptxas-verify -arch=sm_80 %}
-; RUN: %if ptxas-11.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
+; RUN: %if ptxas-sm_80 && ptxas-isa-v7.1 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | %ptxas-verify -arch=sm_80 %}
+; RUN: %if ptxas-sm_80 && ptxas-isa-v7.1 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 -denormal-fp-math-f32=preserve-sign | %ptxas-verify -arch=sm_80 %}
+; RUN: %if ptxas-sm_90 && ptxas-isa-v7.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
 
 target triple = "nvptx64-nvidia-cuda"
 
diff --git a/llvm/test/CodeGen/NVPTX/bf16x2-instructions-approx.ll b/llvm/test/CodeGen/NVPTX/bf16x2-instructions-approx.ll
index e1d4ef1073a78..60a5abf03e19f 100644
--- a/llvm/test/CodeGen/NVPTX/bf16x2-instructions-approx.ll
+++ b/llvm/test/CodeGen/NVPTX/bf16x2-instructions-approx.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | FileCheck --check-prefixes=CHECK %s
-; RUN: %if ptxas-11.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | %ptxas-verify -arch=sm_80 %}
+; RUN: %if ptxas-sm_80 && ptxas-isa-v7.1 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | %ptxas-verify -arch=sm_80 %}
 
 target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
 
diff --git a/llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll b/llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll
index b4641d01eb927..4ea8ffc727b56 100644
--- a/llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll
+++ b/llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll
@@ -1,8 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | FileCheck --check-prefixes=CHECK,SM80 %s
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | FileCheck --check-prefixes=CHECK,SM90 %s
-; RUN: %if ptxas-11.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | %ptxas-verify -arch=sm_80 %}
-; RUN: %if ptxas-11.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
+; RUN: %if ptxas-sm_80 && ptxas-isa-v7.1 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | %ptxas-verify -arch=sm_80 %}
+; RUN: %if ptxas-sm_90 && ptxas-isa-v7.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | %ptxas-verify -arch=sm_90 %}
 
 target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
 
diff --git a/llvm/test/CodeGen/NVPTX/bmsk.ll b/llvm/test/CodeGen/NVPTX/bmsk.ll
index d5b278657bd52..b9404f2a160cd 100644
--- a/llvm/test/CodeGen/NVPTX/bmsk.ll
+++ b/llvm/test/CodeGen/NVPTX/bmsk.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc -o - < %s -mcpu=sm_70 -mattr=+ptx76 | FileCheck %s
-; RUN: %if ptxas %{ llc < %s -mcpu=sm_70 -mattr=+ptx76 | %ptxas-verify -arch=sm_70 %}
+; RUN: %if ptxas-sm_70 && ptxas-isa-v7.6 %{ llc < %s -mcpu=sm_70 -mattr=+ptx76 | %ptxas-verify -arch=sm_70 %}
 
 target triple = "nvptx64-unknown-cuda"
 
diff --git a/llvm/test/CodeGen/NVPTX/bswap.ll b/llvm/test/CodeGen/NVPTX/bswap.ll
index 0d1d6da4ba2b6..b913b9a03c553 100644
--- a/llvm/test/CodeGen/NVPTX/bswap.ll
+++ b/llvm/test/CodeGen/NVPTX/bswap.ll
@@ -1,9 +1,9 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 -mattr=+ptx70 | FileCheck -check-prefixes CHECK,PTX70 %s
 ; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 | %ptxas-verify %}
-; RUN: %if ptxas-11.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 -mattr=+ptx70 | %ptxas-verify %}
+; RUN: %if ptxas-isa-v7.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 -mattr=+ptx70 | %ptxas-verify %}
 ; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 -mattr=+ptx71 | FileCheck -check-prefixes CHECK,PTX71 %s
-; RUN: %if ptxas-11.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 -mattr=+ptx71 | %ptxas-verify %}
+; RUN: %if ptxas-isa-v7.1 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_20 -mattr=+ptx71 | %ptxas-verify %}
 
 target triple = "nvptx64-nvidia-cuda"
 
diff --git a/llvm/test/CodeGen/NVPTX/byval-arg-vectorize.ll b/llvm/test/CodeGen/NVPTX/byval-arg-vectorize.ll
index 9988d5b122cc1..e7f71f4ad52ea 100644
--- a/llvm/test/CodeGen/NVPTX/byval-arg-vectorize.ll
+++ b/llvm/test/CodeGen/NVPTX/byval-arg-vectorize.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc < %s -mcpu=sm_70 | FileCheck %s
-; RUN: %if ptxas %{ llc < %s -mcpu=sm_70 | %ptxas-verify -arch=sm_70 %}
+; RUN: %if ptxas-sm_70 %{ llc < %s -mcpu=sm_70 | %ptxas-verify -arch=sm_70 %}
 
 target triple = "nvptx64-nvidia-cuda"
 
diff --git a/llvm/test/CodeGen/NVPTX/byval-const-global.ll b/llvm/test/CodeGen/NVPTX/byval-const-global.ll
index b4934e1a94d1b..81e7edfd8602e 100644
--- a/llvm/test/CodeGen/NVPTX/byval-const-global.ll
+++ b/llvm/test/CodeGen/NVPTX/byval-const-global.ll
@@ -1,6 +1,6 @@
...
[truncated]

Member

@AlexMaclean left a comment

This will be very useful. I've personally hit cases several times where a given ptxas version did not work with the configuration used by %ptxas-verify. I would consider tweaking the naming of these a little bit. I'd recommend that we remove the "v" in the ISA features and rename "32" -> "ptr32". I'd also lean towards using () or {} to represent these (i.e. ptxas(sm_90), ptxas(isa-8.4)) but that is more a matter of personal taste so if you or others disagree I'm happy to leave as is.
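
To make the naming discussion concrete, here is a minimal sketch of how feature names of the form used in the updated RUN lines (`ptxas-sm_80`, `ptxas-isa-v7.1`) could be derived from a list of detected capabilities. The function name `ptxas_feature_names` and its `isa_prefix` parameter are hypothetical and are not taken from the PR; the inputs are assumed to come from whatever probing the lit config performs.

```python
def ptxas_feature_names(sms, isa_versions, isa_prefix="v"):
    """Map detected ptxas capabilities to lit feature names.

    `sms` holds SM suffixes such as "80" or "90a"; `isa_versions` holds
    PTX ISA versions such as "7.1". Both are assumed inputs for this sketch.
    """
    features = {"ptxas"}
    features.update(f"ptxas-sm_{sm}" for sm in sms)
    features.update(f"ptxas-isa-{isa_prefix}{v}" for v in isa_versions)
    return features


# Naming as it appears in the diff above.
current = ptxas_feature_names(["80", "90"], ["7.1", "7.8"])
# With isa_prefix="", the ISA features follow the suggested spelling without "v".
suggested = ptxas_feature_names(["80", "90"], ["7.1", "7.8"], isa_prefix="")
```

Keeping the prefix in one place would make a rename like the one suggested here a one-line change in the config, although every RUN line that uses the old spelling would still need updating.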

 config.available_features.add("ptxas")
 tools.extend(
     [
         ToolSubst("%ptxas", ptxas_executable),
-        ToolSubst("%ptxas-verify", "{} -arch=sm_60 -c -".format(ptxas_executable)),
+        ToolSubst("%ptxas-verify", f"{ptxas_executable} -c -"),
Member

What impact will removing "sm_60" have?

Contributor Author

We'll now use the default architecture assumed by ptxas, which seems to always be the oldest supported architecture.

Contributor Author

More to your point, though, assuming you're using a CUDA 13.0+ ptxas executable, there shouldn't be any impact.

However, if you're using CUDA 12.9, where ptxas defaults to sm_52, any test that implicitly relied on %ptxas-verify setting the arch to sm_60 (e.g. to test atom.add.f64) will begin to fail.
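
One way to keep such a test independent of the ptxas default is to pass -arch explicitly and guard the RUN line on the matching features, as the updated tests in this PR do. The helper below is purely illustrative: `guarded_verify_run_line` is a hypothetical name, and the sketch assumes the PTX version is given as the integer used in `-mattr=+ptxNN`.

```python
def guarded_verify_run_line(sm, ptx=None, triple="nvptx64"):
    """Build a RUN line that runs %ptxas-verify only when ptxas supports
    the requested SM (and, if given, the requested PTX ISA version)."""
    features = [f"ptxas-sm_{sm}"]
    llc = f"llc < %s -mtriple={triple} -mcpu=sm_{sm}"
    if ptx is not None:
        major, minor = divmod(ptx, 10)  # e.g. 71 -> (7, 1) -> "7.1"
        features.append(f"ptxas-isa-v{major}.{minor}")
        llc += f" -mattr=+ptx{ptx}"
    guard = " && ".join(features)
    # The arch is passed explicitly, so the ptxas default no longer matters.
    return f"; RUN: %if {guard} %{{ {llc} | %ptxas-verify -arch=sm_{sm} %}}"


line_sm80 = guarded_verify_run_line(80, ptx=71)
line_sm60 = guarded_verify_run_line(60)
```

With these inputs, `line_sm80` reproduces the sm_80 RUN line added to bf16-instructions.ll above, and `line_sm60` shows the shape a test relying on sm_60 could take.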

text=True,
check=True,
)
supported_sms = re.findall(r"'sm_(\d+(?:[af]?))'", result.stdout)
Member

Nit: SM variants may be mentioned in other contexts.

E.g. ptxas-13 mentions sm_70, even though it no longer supports it. We're only saved by the fact that the pattern above matches the quoted string and the mention is unquoted:

--legacy-bar-warp-wide-behavior                     (-legacy-bar-warp-wide-behavior)
...
        sm_70 or higher.This is a deprecated option and it will be removed in future

You may want to limit the input to the "Allowed values for this option:" subsection of the --gpu-arch option.
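
A rough sketch of that restriction is shown below. It first isolates one option's section of the help text, then keeps only the "Allowed values for this option:" list before applying the quoted-SM pattern. The sample text is a hypothetical excerpt shaped like ptxas help output, not a captured one; the option spelling `--gpu-name`, the "Default value" terminator, and the function name `supported_sms_from_help` are all assumptions to check against the real output.

```python
import re

# Hypothetical excerpt; the real `ptxas --help` text may differ.
SAMPLE_HELP = """\
--gpu-name <gpu name>                               (-arch)
        Specify name of NVIDIA GPU to generate code for.
        Allowed values for this option:  'compute_75','compute_80','sm_75',
        'sm_80','sm_90','sm_90a','sm_100f'.
        Default value:  'sm_75'.

--legacy-bar-warp-wide-behavior                     (-legacy-bar-warp-wide-behavior)
        Applies to 'sm_70' or higher.
"""


def supported_sms_from_help(help_text, option="--gpu-name"):
    """Return SM suffixes listed as allowed values for `option` only."""
    # Isolate the option's section: from its header line up to the next
    # line that starts a new option, or the end of the text.
    section = re.search(
        rf"^{re.escape(option)}\b.*?(?=^--|\Z)", help_text, re.M | re.S
    )
    if section is None:
        return []
    # Keep only the allowed-values list, assumed to end at "Default value"
    # or at the end of the section.
    allowed = re.search(
        r"Allowed values for this option:(.*?)(?=Default value|\Z)",
        section.group(0),
        re.S,
    )
    if allowed is None:
        return []
    return re.findall(r"'sm_(\d+[af]?)'", allowed.group(1))


sms = supported_sms_from_help(SAMPLE_HELP)
```

In the sample, a quoted 'sm_70' under another option and the quoted default value are both ignored, which the unrestricted pattern would not guarantee.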

@justinfargnoli
Contributor Author

> I'd also lean towards using () or {} to represent these (i.e. ptxas(sm_90), ptxas(isa-8.4)) but that is more a matter of personal taste so if you or others disagree I'm happy to leave as is.

It looks like lit doesn't accept (), {}, or <> in feature names, so I went back to using - as the separator.
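
A small check along these lines can flag problematic names before they reach a RUN line. The character set below is an assumption modeled on the identifier characters that lit's boolean-expression tokenizer appears to accept; it should be verified against the lit sources in your checkout. `is_plain_feature_name` is a hypothetical helper, not part of lit.

```python
import re

# Assumed identifier characters for a lit feature name; verify against lit.
_FEATURE_NAME = re.compile(r"[-+=._a-zA-Z0-9]+")


def is_plain_feature_name(name):
    """Return True if `name` uses only the assumed identifier characters."""
    return _FEATURE_NAME.fullmatch(name) is not None


candidates = ["ptxas-sm_90", "ptxas-isa-v8.4", "ptxas(sm_90)", "ptxas{sm_90}", "ptxas<sm_90>"]
accepted = [name for name in candidates if is_plain_feature_name(name)]
```

Under that assumption, only the dash-separated spellings pass, which is consistent with the behavior described above.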
