
[AutoBump] Merge with 08195f31 (Jan 23) (18) #556


Merged
283 commits merged into bump_to_7e622b61 from bump_to_08195f31 on Jun 26, 2025

Conversation

jorickert

No description provided.

HanKuanChen and others added 30 commits January 23, 2025 09:47
This cannot happen. Also simplify the LaneBitmask check from !none
to any.
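
As a side note, a minimal sketch of the simplification, assuming LLVM's `llvm::LaneBitmask` (its `none()`/`any()` accessors are real; the wrapper function is made up for illustration):

```cpp
#include "llvm/MC/LaneBitmask.h"

// Illustrative only: the two forms are equivalent; the second states the
// intent ("does this mask cover any lane?") directly.
static bool coversAnyLane(llvm::LaneBitmask LM) {
  // Before: return !LM.none();
  return LM.any();
}
```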
This was allocating tiny helper classes for every instruction
visited. We can just dispatch over the cases in the visitor
function instead.
…#123587)

This commit completed four tasks:
- Add `-mrelax/-mno-relax` options support for LoongArch in clang
driver.
- Print error for `-gsplit-dwarf` with LoongArch linker relaxation
(`-mrelax`).
- Pass `-X` to linker to discard a plethora of `.L` symbols due to
linker relaxation.
- Forward `--no-relax` option to linker.
)

This is looking at operand 0 of a REG_SEQUENCE, which can never
have a subregister index.
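
A hedged sketch of the invariant relied on here (`isRegSequence()` and `getSubReg()` are existing MachineInstr/MachineOperand accessors; the helper itself is hypothetical):

```cpp
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineOperand.h"

// Hypothetical check: operand 0 of a REG_SEQUENCE is the full-register def,
// so it never carries a subregister index and this always returns false.
static bool regSequenceDefHasSubReg(const llvm::MachineInstr &MI) {
  return MI.isRegSequence() && MI.getOperand(0).getSubReg() != 0;
}
```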
…ing (llvm#123474)

`handleIntrinsicByApplyingToShadow` (introduced in
llvm#114490) requires that the
intrinsic supports integer-ish operands; this is not the case for all
intrinsics. This patch generalizes the function to bitcast the shadow
arguments to be the same type as the original intrinsic, thus
guaranteeing that the intrinsic exists. Additionally, it casts the
computed shadow to be an appropriate shadow type.

This function assumes that the intrinsic will handle arbitrary
bit-patterns (for example, if the intrinsic accepts floats for var1, we
assume that it works normally even if inputs are NaNs etc.).
…lvm#123911)

We already bail if the user is tied in checkUsers, which is true for all
passthrus. Remove the check in getOperandLog2EEW so that it only worries
about computing the OperandInfo, and leaves the passthru correctness to
checkUsers.
…124046)

Reverts llvm#123853

The introduction of `reflect-error.ll` surfaced a bug with the use of
`report_fatal_error` in `SPIRVInstructionSelector` that was propagated
into the PR. This has caused a build-bot breakage, and the work to solve
the underlying issue is tracked here:
llvm#124045. We can re-apply this
commit when the underlying issue is resolved.
…lvm#123701)

Currently, an `emitc.switch` with an `emitc.expression` argument wouldn't emit
its argument to cpp. This patch fixes it.
…ng it LLVM dialect (llvm#123840)

With these changes, CUF atomic operations are handled as cudadevice
intrinsics and are converted straight to the LLVM dialect with the
`llvm.atomicrmw` operation.

I am only submitting changes for `atomicadd` to gather feedback. If we
are to proceed with these changes I will add support for all other
applicable atomic operations following this pattern.
Increases the minimum CAS size from 16 bits to 32 bits, for better SASS
codegen.

When atomics are emulated using atom.cas.b16, the generated SASS
includes two (nested) emulation loops. When emulated using an atom.cas.b32
loop, the SASS has only a single emulation loop. Using a 32-bit CAS thus
results in better codegen.
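
A plain-C++ illustration of the idea, not the NVPTX lowering itself: a 16-bit atomic add emulated with a single 32-bit compare-and-swap loop on the containing word (the function and its half-word addressing are assumptions of the sketch):

```cpp
#include <atomic>
#include <cstdint>

// Emulate a 16-bit fetch_add using one 32-bit CAS loop; returns the old value.
static uint16_t atomic_add_u16(std::atomic<uint32_t> &word, bool high_half,
                               uint16_t val) {
  const uint32_t shift = high_half ? 16 : 0;
  uint32_t old = word.load(std::memory_order_relaxed);
  for (;;) {
    const uint16_t cur = uint16_t(old >> shift);
    const uint32_t desired = (old & ~(0xFFFFu << shift)) |
                             (uint32_t(uint16_t(cur + val)) << shift);
    // On failure, `old` is refreshed and we retry: a single emulation loop.
    if (word.compare_exchange_weak(old, desired, std::memory_order_relaxed))
      return cur;
  }
}
```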
Looking at the slowest lit tests, I'm seeing these four tests take two to
eight minutes. Test coverage on Linux should be sufficient, given that the
functionality is not really useful on Windows at all.

This was observed when hacking on the new premerge in a Windows VM.
…vm#116771)

Two options for clang:
  -mno-scq:                Disable the sc.q instruction.
  -mscq:                   Enable the sc.q instruction.
The default is -mno-scq.
…lvm#123881)

This extension adds eight 48-bit load/store instructions.

The current spec can be found at:
https://github.com/quic/riscv-unified-db/releases/latest

This patch adds assembler-only support.

---------

Co-authored-by: Harsh Chandel <[email protected]>
https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744
proposes to partition static data sections.

This patch introduces a codegen pass that records jump table hotness in the
in-memory state (machine jump table info and entries). Target lowering and
the asm printer consume this state and produce a `.hot` section suffix. The
follow-up PR llvm#122215 implements those changes.

---------

Co-authored-by: Ellis Hoag <[email protected]>
…#118656)

This patch is an extension to llvm#115128.

After profiling the LLVM test-suite, I see a lot of loop nests with depth
greater than `MaxLoopNestDepth`, which is 10. Exiting early for them saves
compile time, as it avoids computing DependenceInfo and CacheCost.

Please see 'bound-max-depth' branch on compile-time-tracker.
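
A minimal sketch of the early-exit idea, assuming the `MaxLoopNestDepth` limit of 10 mentioned above (the helper and its argument type are illustrative, not the pass's actual code):

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Analysis/LoopInfo.h"

constexpr unsigned MaxLoopNestDepth = 10;

// Bail out before computing DependenceInfo and CacheCost for loop nests
// that are too deep to be worth considering.
static bool isLoopNestWorthConsidering(llvm::ArrayRef<llvm::Loop *> LoopNest) {
  return LoopNest.size() <= MaxLoopNestDepth;
}
```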
Fixes llvm#113191

Issue: [flang][OpenMP] Runtime segfault when an allocatable variable is
used with copyin

Root cause: The value of the threadprivate variable is not being copied
from the primary thread to the other threads within a parallel region.
As a result, a null pointer is accessed inside the parallel region,
which causes the segfault.

Fix: When allocatables are used with the copyin clause, we need to ensure
that, on entry to any parallel region, each thread's copy of the variable
first acquires the allocation status of the primary thread's copy, before
the value of the primary thread's threadprivate variable is copied to the
threadprivate variable of each other member of the team.
When a `try_table`'s catch clause's destination has a return type, as in
the case of `catch` with a concrete tag, `catch_ref`, and `catch_all_ref`,
the generated code can be invalid. For example:
```wasm
block exnref
  try_table (catch_all_ref 0)
    ...
  end_try_table
end_block
... use exnref ...
```

This code is not valid because the block's body type is not exnref. So
we add an unreachable after the 'end_try_table' to make the code valid
here:
```wasm
block exnref
  try_table (catch_all_ref 0)
    ...
  end_try_table
  unreachable                    ;; Newly added
end_block
```
Because 'unreachable' is a terminator we also need to split the BB.

---

We need to handle the same thing for unwind mismatch handling. In the
code below, we create a "trampoline BB" that will be the destination for
the nested `try_table`~`end_try_table` added to fix an unwind mismatch:
```wasm
try_table (catch ... )
  block exnref
    ...
    try_table (catch_all_ref N)
      some code
    end_try_table
    ...
  end_block                      ;; Trampoline BB
  throw_ref
end_try_table
```
While the `block` added for the trampoline BB has the return type
`exnref`, its body, which contains the nested `try_table` and other
code, doesn't have the `exnref` return type. Most of the time this isn't a
problem because the block's body ends with something like `br`
or `return`, but that may not always be the case, especially when there
is a loop. So we add an `unreachable` to make the code valid here too:
```wasm
try_table (catch ... )
  block exnref
    ...
    try_table (catch_all_ref N)
      some code
    end_try_table
    ...
    unreachable                  ;; Newly added
  end_block                      ;; Trampoline BB
  throw_ref
end_try_table
```
In this case we just append the `unreachable` at the end of the layout
predecessor BB. (This was tricky to do in the first (non-mismatch) case
because there `end_try_table` and `end_block` were added in the
beginning of an EH pad in `placeTryTableMarker` and moving
`end_try_table` and the new `unreachable` to the previous BB caused
other problems.)

---

This change adds many `unreachable`s to the output, but the tests check for
the added `unreachable`s in only a few places, to verify that this is
working. The FileCheck lines in `exception.ll` and `cfg-stackify-eh.ll` are
already heavily redacted to leave only the important control-flow
instructions, so I don't think it's worth checking for `unreachable`s
everywhere.
Resubmit; the previous PR had compilation issues.
davemgreen and others added 28 commits January 24, 2025 09:51
Most of this is miscompiling with +fullfp16 and should be disabled for GISel.
Assign register banks to virtual registers. This does not use the generic
RegBankSelect. After register bank selection, all register operands of
G_ instructions have LLTs and register banks exclusively. If they had a
register class, reassign the appropriate register bank.

Assign register banks using machine uniformity analysis:
Sgpr - uniform values and some lane masks
Vgpr - divergent, non-S1 values
Vcc  - divergent S1 values (lane masks)

AMDGPURegBankSelect does not consider the available instructions and, in
some cases, G_ instructions with some register bank assignment can't be
inst-selected. This is solved in RegBankLegalize.

Exceptions when uniformity analysis does not work:
S32/S64 lane masks:
- need to end up with sgpr register class after instruction selection
- In most cases Uniformity analysis declares them as uniform
  (forced by tablegen) resulting in sgpr S32/S64 reg bank
- When Uniformity analysis declares them as divergent (some phis),
  use intrinsic lane mask analyzer to still assign sgpr register bank
temporal divergence copy:
- COPY to vgpr with implicit use of $exec inside of the cycle
- this copy is declared as uniform by uniformity analysis
- make sure that assigned bank is vgpr
Note: uniformity analysis does not consider that registers with vgpr def
are divergent (you can have uniform value in vgpr).
- TODO: implicit use of $exec could be implemented as indicator
  that instruction is divergent
…m#123900)

... for the dynamic blocks created for operator new calls. This way we
get the type of memory allocated right. As a side-effect, the
diagnostics now point to the std::allocator calls, which is an
improvement.
)

As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago, but to ensure some type safety we'd like to
have all calls to moveBefore use iterators.

This patch adds a (guaranteed dereferenceable) iterator-taking
moveBefore, and changes a bunch of call-sites where it's obviously safe
to change to use it by just calling getIterator() on an instruction
pointer. A follow-up patch will contain less-obviously-safe changes.

We'll eventually deprecate and remove the instruction-pointer
insertBefore, but not before adding concise documentation of what
considerations are needed (very few).
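
A sketch of the call-site pattern described above (the wrapper function is illustrative): pass an iterator obtained via `getIterator()` rather than a raw `Instruction *`, so the debug-info bit travels with the position.

```cpp
#include "llvm/IR/Instruction.h"

// Move I immediately before InsertPt using the iterator-taking overload.
static void moveBeforeInsertPt(llvm::Instruction *I,
                               llvm::Instruction *InsertPt) {
  // Before: I->moveBefore(InsertPt);
  I->moveBefore(InsertPt->getIterator());
}
```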
When generating `arm_neon.h`, NeonEmitter outputs code that
violates strict aliasing rules (C23 6.5 Expressions #7,
C++23 7.2.1 Value category [basic.lval] #11), for example:

    bfloat16_t __reint = __p0;
    uint32_t __reint1 = (uint32_t)(*(uint16_t *) &__reint) << 16;
    __ret = *(float32_t *) &__reint1;

This patch fixes the offending code by replacing it with
a call to `__builtin_bit_cast`.
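
A hedged sketch of what the bit-cast-based form looks like (types simplified; the real code is generated into `arm_neon.h` by NeonEmitter):

```cpp
#include <cstdint>

using float32_t = float;
using bfloat16_t = __bf16; // assumption for this sketch: clang's bfloat type

static float32_t bf16_bits_to_f32(bfloat16_t p0) {
  // Value-based bit casts instead of pointer punning, so no strict-aliasing
  // violation: take the bf16 bits, widen into the high half, reinterpret.
  const uint32_t bits = (uint32_t)__builtin_bit_cast(uint16_t, p0) << 16;
  return __builtin_bit_cast(float32_t, bits);
}
```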
This patch adds SM and PTX versions for SM
101, 120 and their arch-accelerated variants.

All these are supported in cuda-12.8.
sm120/120a requires ptx8.7 and the rest require ptx8.6.

Signed-off-by: Durgadoss R <[email protected]>
Lower G_ instructions that can't be inst-selected with register bank
assignment from AMDGPURegBankSelect based on uniformity analysis.
- Lower instruction to perform it on assigned register bank
- Put uniform value in vgpr because SALU instruction is not available
- Execute divergent instruction in SALU - "waterfall loop"

Given the LLTs on all operands after the legalizer, some register bank
assignments require lowering while others do not.
Note: cases where all register bank assignments would require lowering
are lowered in the legalizer.

AMDGPURegBankLegalize goals:
- Define Rules: when and how to perform lowering
- The goal of defining Rules is to provide a brief, high-level, table-like
  overview of how to lower generic instructions based on available
  target features and uniformity info (uniform vs divergent).
- Fast search of Rules; depends on how complicated Rule.Predicate is
- For some opcodes there would be too many Rules that are essentially
  all the same, just for different combinations of types and banks.
  For those, write a custom function that handles all cases.
- Rules are made from enum IDs that correspond to each operand.
  Names of IDs are meant to give a brief description of what lowering does
  for each operand or the whole instruction.
- AMDGPURegBankLegalizeHelper implements the lowering algorithms

Since this is the first patch that actually enables -new-reg-bank-select,
here is a summary of the regression tests that were added earlier:
- if an instruction is uniform, always select a SALU instruction if available
- eliminate back to back vgpr to sgpr to vgpr copies of uniform values
- fast rules: small differences for standard and vector instruction
- enabling Rule based on target feature - salu_float
- how to specify lowering algorithm - vgpr S64 AND to S32
- on G_TRUNC in reg, it is up to user to deal with truncated bits
  G_TRUNC in reg is treated as no-op.
- dealing with truncated high bits - ABS S16 to S32
- sgpr S1 phi lowering
- new opcodes for vcc-to-scc and scc-to-vcc copies
- lowering for vgprS1-to-vcc copy (formally this is vgpr-to-vcc G_TRUNC)
- S1 zext and sext lowering to select
- uniform and divergent S1 AND(OR and XOR) lowering - inst-selected into
  SALU instruction
- divergent phi with uniform inputs
- divergent instruction with temporal divergent use, source instruction
  is defined as uniform(AMDGPURegBankSelect) - missing temporal
  divergence lowering
- uniform phi, because of undef incoming, is assigned to vgpr. Will be
  fixed in AMDGPURegBankSelect via another fix in machine uniformity
  analysis.
…vm#117939)

Canonicalize gathers/scatters with contiguous (i.e. [0, 1, 2, ...])
offsets into vector masked load/store ops.
…#123958)

`TimerGroup` doesn't need to be a field of `ClangTidyProfiling`;
we can construct it locally during destruction.
…23454)

Skip header files before registering AST matchers; this avoids matching
lots of AST nodes when linting header files.
Add IDs for bit widths that cover multiple LLTs: B32, B64, etc.
"Predicate" wrapper class for bool predicate functions, used to
write pretty rules. Predicates can be combined using &&, || and !.
Lowering for splitting and widening loads.
Write rules for loads so as not to change existing MIR tests from the old
regbankselect.
…l coroutine clones (llvm#118628)

Summary:
CoroCloner, by calling into CloneFunctionInto, does a lot of repeated
work priming DIFinder and building a list of common module-level debug
info metadata. For programs compiled with full debug info this can get
very expensive.

This diff builds the data once and shares it between all clones.

Anecdata for a sample cpp source file compiled with full debug info:

|                 | Baseline | IdentityMD set | Prebuilt CommonDI (cur.) |
|-----------------|----------|----------------|--------------------------|
| CoroSplitPass   | 306ms    | 221ms          | 68ms                     |
| CoroCloner      | 101ms    | 72ms           | 0.5ms                    |
| CollectCommonDI | -        | -              | 63ms                     |
| Speed up        | 1x       | 1.4x           | 4.5x                     |

Note that CollectCommonDebugInfo happens once *per coroutine* rather than per clone.

Test Plan:
ninja check-llvm-unit
ninja check-llvm

Compiled a sample internal source file, checked time trace output for scope timings.
…2866)

Change the existing code for G_PHI to match what the LLVM-IR version does
via PHINode::hasConstantOrUndefValue. This is not safe for a regular PHI,
since it may appear with an undef operand and getVRegDef can fail.
Most notably, this improves the number of values that can be allocated
to sgpr in AMDGPURegBankSelect.
A common case here is phis that appear in structurize-cfg lowering
for cycles with multiple exits:
the undef incoming value comes from the block that reached the cycle-exit
condition; if the other incoming value is uniform, keep the phi uniform,
despite the fact that it joins values from a pair of blocks entered
via a divergent conditional branch.
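
For reference, the LLVM-IR helper being mirrored, `PHINode::hasConstantOrUndefValue`, is an existing API; a trivial usage sketch (the wrapper is illustrative, and the G_PHI version walks machine operands via getVRegDef instead):

```cpp
#include "llvm/IR/Instructions.h"

// True when every non-undef incoming value of the phi is the same value,
// which is what lets the analysis treat the phi as trivially uniform.
static bool isSameValueOrUndefPhi(const llvm::PHINode &Phi) {
  return Phi.hasConstantOrUndefValue();
}
```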
This is the behavior expected by DWARF. It also requires some fixups to
algorithms which were storing the addresses of some objects (Blocks and
Variables) relative to the beginning of the function.

There are plenty of things that still don't work in this setup, but
this change is sufficient for the expression evaluator to correctly
recognize the entry point of a function in this case.
…llvm#123745)

Add the following workflows:

- `fullbuild` on aarch64 ubuntu
- `overlay` on windows 2025
- `overlay` on aarch64 ubuntu

`ccache` variant is used on `aarch64` due to
hendrikmuhs/ccache-action#279
…ot (llvm#121463)

In handleMFLOSlot, we may get LastInstInFunction returned as true from
getNextMachineInstr while IInSlot is null, which would trigger an assert.
So we need to skip this case.

Fix llvm#118223.
With the removal of mlir-vulkan-runner (as part of llvm#73457) in
e7e3c45, mlir-cpu-runner is now the
only runner for all CPU and GPU targets, and the "cpu" name has been
misleading for some time already. This commit renames it to mlir-runner.
[AutoBump] Merge with fixes of eb206e9 (Jan 24) (21)
[AutoBump] Merge with b4e81fd (Jan 24) (20)
[AutoBump] Merge with fixes of 8388040 (Jan 23) (19)
@jorickert jorickert merged commit 91fab1b into bump_to_7e622b61 Jun 26, 2025
10 of 11 checks passed
@jorickert jorickert deleted the bump_to_08195f31 branch June 26, 2025 08:10