[lldb] Revive TestSimulatorPlatform.py (#142244) #1

Closed
wants to merge 851 commits into from

arjunUpatel
Owner

[lldb] Revive TestSimulatorPlatform.py (llvm#142244)

This test was incorrectly disabled and has bitrotted since then. This PR
fixes up the test and re-enables it.

  • Build against the system libc++ (which can target the simulator)
  • Bump the deployment target for iOS and tvOS on Apple Silicon
  • Skip backdeploying to pre-Apple Silicon OS on Apple Silicon.

[llvm-rc] Add support for multiplication and division in expressions (llvm#143373)

This is supported by GNU windres. MS rc.exe does accept these
expressions but doesn't evaluate them correctly; it only returns the
left-hand side.

This fixes one aspect of
llvm#143157.

[LV] Remove unused LoopBypassBlocks from ILV (NFC).

After recent refactorings to move parts of skeleton creation,
LoopBypassBlocks isn't used any more. Remove it.

[Clang] [Cygwin] va_list must be treated like normal Windows (llvm#143115)

On Cygwin, va_list must be handled the same way as in a normal
Windows environment.

The existing test test/CodeGen/ms_abi.c seems relevant, but it
contains __attribute__((sysv_abi)), which is not supported on Cygwin.
The new test is based on the __attribute__((ms_abi)) portion of that
test.


Co-authored-by: jeremyd2019 [email protected]

[SeparateConstOffsetFromGEP] Decompose constant xor operand if possible (llvm#135788)

Try to transform XOR(A, B+C) into XOR(A,C) + B, where XOR(A,C) becomes
the base for memory operations. This transformation is valid under the
following conditions:
Check 1 - B and C are disjoint.
Check 2 - XOR(A,C) and B are disjoint.

This transformation is particularly beneficial for GEPs because
disjoint OR operations often map better to addressing modes than XOR.
This can enable further optimizations in the GEP offset folding pipeline.
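
As a worked illustration of the two checks above, here is a minimal sketch of the arithmetic identity on plain 64-bit integers. The helper name decomposeXor is hypothetical and this is not the pass's implementation; it only shows why the two disjointness checks make the rewrite sound.

```cpp
#include <cstdint>

// Hypothetical helper, not code from SeparateConstOffsetFromGEP.
// On success, stores XOR(A, C) + B in 'result' and returns true.
// Returns false when either disjointness check fails.
bool decomposeXor(std::uint64_t a, std::uint64_t b, std::uint64_t c,
                  std::uint64_t &result) {
  // Check 1: B and C share no set bits, so B + C == B | C == B ^ C.
  if ((b & c) != 0)
    return false;
  // Check 2: XOR(A, C) and B share no set bits, so XOR-ing B into the
  // base is the same as adding it.
  const std::uint64_t base = a ^ c;
  if ((base & b) != 0)
    return false;
  result = base + b; // equals a ^ (b + c) under both checks
  return true;
}
```

For example, with A = 0x10, B = 0x20 and C = 0x03 both checks pass and the result equals A ^ (B + C).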

[BOLT] Expose external entry count for functions (llvm#141674)

Record the number of function invocations from external code - code
outside the binary, which may include JIT code and DSOs. Accounting
external entry counts improves the fidelity of call graph flow
conservation analysis.

Test Plan: updated shrinkwrapping.test

[flang][runtime] Replace recursion with iterative work queue (llvm#137727)

Recursion, both direct and indirect, prevents accurate stack size
calculation at link time for GPU device code. Restructure these
recursive (often mutually so) routines in the Fortran runtime with new
implementations based on an iterative work queue with
suspendable/resumable work tickets: Assign, Initialize, initializeClone,
Finalize, and Destroy.

Default derived type I/O is also recursive, but already disabled. It can
be added to this new framework later if the overall approach succeeds.

Note that derived type FINAL subroutine calls, defined assignments, and
defined I/O procedures all perform callbacks into user code, which may
well reenter the runtime library. This kind of recursion is not handled
by this change, although it may be possible to do so in the future using
thread-local work queues.

The effects of this restructuring on CPU performance are yet to be
measured.
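
To illustrate the general idea of replacing recursion with suspendable work tickets, here is a minimal sketch. Node, Ticket and visitIteratively are hypothetical names invented for this example; the runtime's actual ticket types and scheduling are not shown here and may differ substantially.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a derived-type value with nested components.
struct Node {
  int value = 0;
  std::vector<Node> components;
};

// One unit of work: which node is being processed and where to resume.
struct Ticket {
  const Node *node;
  std::size_t nextComponent; // resume point once a nested ticket completes
};

// Visits components before their parent without recursion. Nested work is
// pushed onto an explicit work list instead of the call stack, so the call
// depth stays constant regardless of how deeply the data is nested.
std::vector<int> visitIteratively(const Node &root) {
  std::vector<int> order;
  std::vector<Ticket> work;
  work.push_back(Ticket{&root, 0});
  while (!work.empty()) {
    Ticket &top = work.back();
    if (top.nextComponent < top.node->components.size()) {
      // Suspend this ticket and start a new one for its next component.
      const Node *child = &top.node->components[top.nextComponent++];
      work.push_back(Ticket{child, 0}); // 'top' is not used after this
    } else {
      // All components are done: finish this node and retire its ticket.
      order.push_back(top.node->value);
      work.pop_back();
    }
  }
  return order;
}
```

The resume index stored in each ticket plays the role of the saved program counter that a recursive call would otherwise keep on the stack.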

[flang][NFC] Clean up code in two new functions (llvm#142037)

Two recently-added functions in Semantics/tools.h need some cleaning up
to conform to the coding style of the project. One of them should
actually be in Parser/tools.{h,cpp}, the other doesn't need to be
defined in the header.

[flang] Ensure overrides of special procedures (llvm#142465)

When a derived type declares a generic procedure binding of interest to
the runtime library, such as for ASSIGNMENT(=), it overrides any binding
that might have been present for the parent type.

Fixes llvm#142414.

[IR] Simplify scalable vector handling in ShuffleVectorInst::getShuffleMask. NFC (llvm#143596)

Combine the scalable vector UndefValue check with the earlier
ConstantAggregateZero handling for fixed and scalable vectors.

Assert that the rest of the code is only reached for fixed vectors.

Use append instead of resize since we know the size is increasing.

[IR2Vec] Exposing Embedding as an data type wrapped around std::vector (llvm#143197)

Currently Embedding is std::vector<double>. This PR makes it a data type wrapped around std::vector<double> to overload basic arithmetic operators and expose comparison operations. It simplifies the usage here and in the passes where operations on Embedding would be performed.

(Tracking issue - llvm#141817)
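
A rough sketch of what such a wrapper could look like follows. The member names and the exact set of operators are assumptions made for illustration; the class in the patch may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch of a thin wrapper around std::vector<double>.
class Embedding {
  std::vector<double> Data;

public:
  Embedding() = default;
  explicit Embedding(std::vector<double> V) : Data(std::move(V)) {}

  std::size_t size() const { return Data.size(); }
  double operator[](std::size_t I) const { return Data[I]; }

  // Element-wise addition; both operands must have the same dimension.
  Embedding &operator+=(const Embedding &RHS) {
    assert(size() == RHS.size() && "dimension mismatch");
    for (std::size_t I = 0; I < Data.size(); ++I)
      Data[I] += RHS.Data[I];
    return *this;
  }

  // Scales every element by Factor.
  Embedding &operator*=(double Factor) {
    for (double &D : Data)
      D *= Factor;
    return *this;
  }

  // Exact element-wise comparison; a tolerance-based comparison would be
  // a separate design choice.
  friend bool operator==(const Embedding &L, const Embedding &R) {
    return L.Data == R.Data;
  }
};
```

Keeping the vector private means callers combine embeddings through the overloaded operators rather than hand-written loops.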

[RISCV][TTI] Allow partial reduce with mismatched extends (llvm#143608)

This depends on the recently added partial_reduce_sumla node for lowering,
but at this point we have all the parts.

[lldb] Fix target stop-hook add help output

The help output for target stop-hook add references a nonexistent
option, --one-line-command. The correct option is --one-liner:

-o <one-line-command> ( --one-liner <one-line-command> )
     Add a command for the stop hook.  Can be specified more than once,
     and commands will be run in the order they appear.

This commit fixes the help text.

rdar://152730660

[HWASAN] Disable LSan test on Android (llvm#143625)

Android HWASan does not support LSan.

[flang][cuda] Fix CUDA generic resolution for VALUE arguments in device procedures (llvm#140952)

For actual arguments that have VALUE attribute inside device routines, treat them as if they have device attribute.

Disable prctl test when building for arm or riscv. (llvm#143627)

I'm setting up a buildbot for arm32 using qemu and qemu doesn't support
PR_GET_THP_DISABLE.
Disable the test for now while we figure out what to do about that.

Also disable for riscv because we may do the same for riscv buildbots.

Revert "[SeparateConstOffsetFromGEP] Decompose constant xor operand if possible (llvm#135788)"

This reverts commit 13ccce2.

The tests are on non-canonical IR, and the patch adds an extra unrelated
pre-processing step to the pass. I'm assuming this is a workaround
for the known-bits recursion depth limit in instcombine.

[CIR] Upstream support for calling constructors (llvm#143579)

This change adds support for calling C++ constructors. The support for
actually defining a constructor is still missing and will be added in a
later change.

Revert "[CI] Migrate to runtimes build" (llvm#143612)

Reverts llvm#142696

See llvm#143610 for details; I
believe this PR causes CI builders to build LLVM in a way that's been
broken for a while. To keep CI green, if this is the correct culprit,
those tests should be fixed or skipped.

[TySan][CMake] Depend on tysan for check-tysan in runtimes build (llvm#143597)

The runtimes build expects libclang_rt.tysan.a to be available, but the
check-tysan target does not actually depend on it when built using a
runtimes build with LLVM_ENABLE_RUNTIMES pointing at ./llvm. This means
we get test failures when running check-compiler-rt due to the missing
static archive.

This patch also makes check-tysan depend on tysan when we are using the
runtimes build.

This is causing premerge failures currently since we recently migrated
to the runtimes build.

[PGO][Offload] Fix offload coverage mapping (llvm#143490)

This pull request fixes coverage mapping on GPU targets.

  • It adds an address space cast to the coverage mapping generation pass.
  • It reads the profiled function names from the ELF directly. Reading it
    from public globals was causing issues in cases where multiple
    device-code object files are linked together.

[RISCV][NFC] Factor out VLEN in the SiFive7 scheduling model (llvm#143629)

In preparation for reusing SiFive7Model for sifive-x390, which has a VLEN
of 1024, it's better (and less chaotic) to factor out the VLEN parameter
from various places first. The plan is to do a major overhaul of this
file in which all the WriteRes are encapsulated in a big
multiclass, with VLEN as one of its template arguments, so that we
can instantiate different scheduling models with different VLENs.

Before that happens, a placeholder defvar SiFive7VLEN is used instead
in this patch.

NFC.

Co-authored-by: Michael Maitland [email protected]

Revert "[SelectionDAG] Make (a & x) | (~a & y) -> (a & (x ^ y)) ^ y available for all targets" (llvm#143648)

[NFC] get rid of undef in avx512vl-intrinsics.ll test (llvm#143641)

[AMDGPU][True16] remove AsmVOP3OpSel (llvm#143465)

This is NFC. Clean up the AsmVOP3OpSel field, and use Vop3Base instead.

[flang][runtime] Fix build bot flang-runtime-cuda-gcc errors (llvm#143650)

Adjust default parent class accessibility to attempt to work around what
appears to be an old GCC's interpretation.

[RISCV][NFC] Improve test coverage for xtheadcondmov and xmipscmov (llvm#143567)

Co-authored-by: Harsh Chandel [email protected]

[flang][cuda] Add option to disable warp function in semantic (llvm#143640)

These functions are not available in some lower compute capabilities.
Add option in the language feature to enforce the semantic check on
these.

[RISCV] Select signed bitfield insert for XAndesPerf (llvm#143356)

This patch is similar to llvm#142737

The XAndesPerf extension includes the signed bitfield extraction
instruction NDS.BFOS, which can extract the bits from 0 to Len - 1,
place them starting at bit Lsb, zero-fill the bits from 0 to Lsb - 1,
and sign-extend the result.

When Lsb == Msb, it is a special case where the Lsb will be set to 0
instead of being equal to the Msb.
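
The behaviour described above can be modelled on 64-bit values roughly as follows. This is an illustrative sketch of the prose description only, not the instruction's formal definition; the function name and the lsb/len parameterisation are assumptions.

```cpp
#include <cstdint>

// Illustrative model, assuming 1 <= len and lsb + len <= 64.
// Takes the low 'len' bits of 'src', places them at bit 'lsb', leaves the
// bits below 'lsb' as zero, and sign-extends from the top bit of the field.
std::int64_t signedBitfieldInsert(std::uint64_t src, unsigned lsb,
                                  unsigned len) {
  const unsigned width = lsb + len; // bits occupied after placement
  const std::uint64_t fieldMask =
      len == 64 ? ~0ull : ((1ull << len) - 1);
  std::uint64_t value = (src & fieldMask) << lsb; // low bits zero-filled
  if (width < 64) {
    const std::uint64_t signBit = 1ull << (width - 1);
    value = (value ^ signBit) - signBit; // sign-extend from bit width - 1
  }
  return static_cast<std::int64_t>(value);
}
```

For instance, inserting the 3-bit field 0b101 at bit 4 yields the 7-bit pattern 1010000, whose top bit is set, so the sign-extended result is negative.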

[Clang][NFC] Move UntypedParameters instead of copy (llvm#143646)

Static analysis flagged that UntypedParameters could be moved instead of
copied. This would avoid copying a large object.

[libc++] Add missing C++20 [time.point.arithmetic] (llvm#143165)

This was part of https://wg21.link/p0355r7, but apparently never
implemented.


Co-authored-by: MarcoFalke *~=`'#}+{/-|&$^[email protected]
Co-authored-by: Hristo Hristov [email protected]

[X86] Add test coverage showing failure to merge "zero input passthrough" behaviour for BSR instructions on x86_64 targets

[X86] combineConcatVectorOps - ensure we're only concatenating v2f64 generic shuffles into vXf64 vshufpd

Identified while triaging llvm#143606 - we can't concat v4f64 lhs/rhs subvecs and then expect the v2f64 operands to be in the correct place for VSHUFPD

Test coverage will follow

[test][AArch64] Adjust vector insertion lit tests (llvm#143101)

The test cases test_insert_v16i8_insert_2_undef_base and
test_insert_v16i8_insert_2_undef_base_different_valeus in
CodeGen/AArch64/arm64-vector-insertion.ll were leaving element 8 in the
vector as "undef" without any real explanation. It looked like a
typo, as the input IR looked like this:
%v.8 = insertelement <16 x i8> %v.7, i8 %a, i32 8
%v.10 = insertelement <16 x i8> %v.7, i8 %a, i32 10
leaving %v.8 as unused.

This patch cleans up the tests a bit by adding separate test cases
to validate what happens when the insert at index 8 is skipped, while
amending the original test cases to use %v.8 instead of %v.7 when
creating %v.10.

[BOLT][AArch64] Fix adr-relaxation.s test (llvm#143151)

On some AArch64 machines the splitting was inconsistent.
This causes cold foo to have a mov instruction before adrp.

<foo.cold.0>:
  mov     x0, #0x0                // =0
  adrp    x1, 0x600000 <_start>
  add     x1, x1, #0x14
  ret

This patch removes the mov instruction right above .L2, making
splitting deterministic.

[CI] Add mention of LLVM Developer Policy in email-check message (NFC) (llvm#143300)

For now, it may be hard for people to find the authoritative answer in a
long Discourse discussion, so a link to the official document may be
enough to convince them to change their email from private to public.

[X86] Add test coverage showing failure to merge "zero input passthrough" behaviour for BSF instructions on x86_64 targets

[X86] add test coverage for llvm#143606

[X86] bmi-select-distrib.ll - remove unused check prefixes and pull out PR comments above tests. NFC

Revert "[AArch64][GlobalISel] Expand 64bit extracts to 128bit to allow more patterns (llvm#142904)"

This reverts commit 61cdba6 due to verifier
issues.

[coro][NFC] Move switch basic block to beginning of coroutine (llvm#143626)

This makes the code flow when reading the LLVM IR of a split coroutine a
bit more natural. It does not change anything from an end-user
perspective but makes debugging the CoroSplit pass slightly easier.

Reland "[SelectionDAG] Make (a & x) | (~a & y) -> (a & (x ^ y)) ^ y available for all targets" (llvm#143651)
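
The identity in the title can be checked bit by bit: where a bit of a is 1 both sides yield the bit of x, and where it is 0 both sides yield the bit of y. A small sketch follows; the function names are hypothetical and this is unrelated to the DAG combine's actual code.

```cpp
#include <cstdint>

// Bitwise select written with two ANDs, a NOT and an OR.
constexpr std::uint32_t selectOriginal(std::uint32_t a, std::uint32_t x,
                                       std::uint32_t y) {
  return (a & x) | (~a & y);
}

// Equivalent form from the commit title: one AND and two XORs.
constexpr std::uint32_t selectFolded(std::uint32_t a, std::uint32_t x,
                                     std::uint32_t y) {
  return (a & (x ^ y)) ^ y;
}

// Compile-time spot check on one mixed mask.
static_assert(selectOriginal(0xF0F0F0F0u, 0x12345678u, 0x9ABCDEF0u) ==
                  selectFolded(0xF0F0F0F0u, 0x12345678u, 0x9ABCDEF0u),
              "both forms must agree");
```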

[flang] Enable delayed localization by default for do concurrent (llvm#142567)

This PR aims to make it easier and more self-contained to revert the
switch/flag if we discover any problems with enabling it by default.

[OpenMP 6.0 ]Codegen for Reduction over private variables with reduction clause (llvm#134709)

Codegen support for reduction over private variables with the reduction
clause. See Section 7.6.10 of the OpenMP 6.0 spec.

  • An internal shared copy is initialized with an initializer value.
  • The shared copy is updated by combining its value with the values from
    the private copies created by the clause.
  • Once an encountering thread verifies that all updates are complete,
    its original list item is updated by merging its value with that of the
    shared copy and then broadcast to all threads.

Sample Test Case from OpenMP 6.0 Example

#include <assert.h>
#include <omp.h>
#define N 10

void do_red(int n, int *v, int &sum_v)
{
    sum_v = 0; // sum_v is private
    #pragma omp for reduction(original(private),+: sum_v)
    for (int i = 0; i < n; i++) 
    {
        sum_v += v[i];
    }
}

int main(void)
{
    int v[N];
    for (int i = 0; i < N; i++)
        v[i] = i;
    #pragma omp parallel num_threads(4)
    {
        int s_v; // s_v is private
        do_red(N, v, s_v);
        assert(s_v == 45);
    }
    return 0;
}

Expected Codegen:

 // A shared global/static variable is introduced for the reduction result.
 // This variable is initialized (e.g., using memset or a UDR initializer)
 // e.g., .omp.reduction.internal_private_var

 // Barrier before any thread performs combination
  call void @__kmpc_barrier(...)

 // Initialization block (executed by thread 0)
 // e.g., call void @llvm.memset.p0.i64(...) or call @udr_initializer(...)

  call void @__kmpc_critical(...)
    // Inside critical section:
    // Load the current value from the shared variable
    // Load the thread-local private variable's value
    // Perform the reduction operation 
    // Store the result back to the shared variable

  call void @__kmpc_end_critical(...)
  // Barrier after all threads complete their combinations

  call void @__kmpc_barrier(...)
 // Broadcast phase:
 // Load the final result from the shared variable
 // Store the final result to the original private variable in each thread
 // Final barrier after broadcast

  call void @__kmpc_barrier(...)

Co-authored-by: Chandra Ghale [email protected]

[flang][OpenMP] Map basic local specifiers to private clauses (llvm#142735)

Starts the effort to map do concurrent locality specifiers to OpenMP
clauses. This PR adds support for basic specifiers (no init or copy
regions yet).

[MemCpyOpt] handle memcpy from memset in more cases (llvm#140954)

This aims to reduce the divergence between the initial checks in this
function and processMemCpyMemCpyDependence (in particular, adding
handling of offsets), with the goal to eventually reduce duplication
there and improve this pass in other ways.

[AArch64][Clang] Update new Neon vector element types. (llvm#142760)

This updates the element types used in the new __Int8x8_t types added in
llvm#126945, mostly to allow C++ name mangling in ItaniumMangling
mangleAArch64VectorBase to work correctly. Char is replaced by
SignedCharTy or UnsignedCharTy as required and Float16Ty is better using
HalfTy to match the vector types. Same for Long types.

[mlir][async][nfc] Fix typo in async op description (llvm#143621)

[flang][Driver] Enable support for -mmacos-version-min= (llvm#143508)

So far as I can tell this option is driver-only so we can just re-use
what already exists for clang. I've added a unit test based on clang's
unit test to demonstrate that the option is handled.

Still TODO is to ensure that flang-rt is built with the same macOS
minimum version as compiler-rt. At the moment, setting the flang minimum
version to older than the macOS version on which flang was built will
lead to link warnings, because flang-rt is built for the version of macOS
on which flang was built rather than the oldest supported version (as
compiler-rt is).

[C++20][Modules] Fix false compilation error with constexpr (llvm#143168)

Use declaresSameEntity when evaluating constexpr to avoid resetting
computed union value due to using different instances of the merged
field decl.

[libunwind] Remove checks for -nostdlib++ (llvm#143162)

libunwind uses a C linker, so it's never even trying to link against any
C++ libraries. This removes the code which tries to drop C++ libraries,
which makes the CMake configuration simpler and allows for upgrading
GCC.

[LLVM][SROA] Teach SROA how to "bitcast" between fixed and scalable vectors. (llvm#130973)

For functions whose vscale_range is limited to a single value, we can size
scalable vectors. This aids SROA by allowing scalable vector load and
store operations to be considered for replacement, whereby bitcasts
through memory can be replaced by vector insert or extract operations.

LLVM Buildbot failure on openmp runtime test (llvm#143674)

The error looks to be missing includes for complex number support on some
systems. Removing the test for now.
Relevant PR: PR-134709

 .---command stderr------------
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:78:42: error: use of undeclared identifier 'I'
# |    78 |   double _Complex expected = 0.0 + 0.0 * I;
# |       |                                          ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:79:40: error: use of undeclared identifier 'I'
# |    79 |   double _Complex result = 0.0 + 0.0 * I;
# |       |                                        ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:84:22: error: use of undeclared identifier 'I'
# |    84 |     arr[i] = i - i * I;
# |       |                      ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:92:19: error: use of undeclared identifier 'creal'
# |    92 |       real_sum += creal(arr[i]);
# |       |                   ^~~~~
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:93:19: error: use of undeclared identifier 'cimag'
# |    93 |       imag_sum += cimag(arr[i]);
# |       |                   ^~~~~
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:96:36: error: use of undeclared identifier 'I'
# |    96 |     result = real_sum + imag_sum * I;
# |       |                                    ^
# | /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/worksharing/for/omp_for_private_reduction.cpp:97:9: error: use of undeclared identifier 'cabs'
# |    97 |     if (cabs(result - expected) > 1e-6) {
# |       |         ^~~~
# | 7 errors generated.

Co-authored-by: Chandra Ghale [email protected]

[DebugInfo][RemoveDIs] Remove scoped-dbg-format-setter (llvm#143450)

This was a utility for flipping between intrinsic and debug record mode
-- we don't need it any more. The "IsNewDbgInfoFormat" should be true
everywhere.

[AArch64] Consider negated powers of 2 when calculating throughput cost (llvm#143013)

Negated powers of 2 have similar (or, in the case of remainder, identical)
codegen to powers of 2 when lowering sdiv. In the case of sdiv, it just
negates the result at the end anyway, so there is nothing dissimilar at all.

[clang][AArch64] test -cc1 -print-enabled-extensions (llvm#143570)

This adds tests that document how -cc1 and -print-enabled-extensions
interact. The current behaviour looks wrong, and is caused by the fact
that --print-enabled-extensions uses the MC subtarget feature API to
determine the list of extensions to print, whereas the frontend uses the
TargetParser API. The latter does no dependency expansion for the
-target-feature flags but the MC API does.

This doesn't fix anything but at least it documents the current
behaviour, and will serve as a pre-commit test for any future fixes.

[ConstantFolding] Fold sqrt poison -> poison (llvm#141821)

I noticed this when a sqrt produced by VectorCombine with a poison
operand wasn't getting folded away to poison.

Most intrinsics in general could probably be folded to poison if one of
their arguments is poison too. Are there any exceptions to this we need
to be aware of?

[doc] Use ISO nomenclature for 1024 byte units (llvm#133148)

Increase specificity by using the correct unit sizes. KBytes is an
abbreviation for kB, 1000 bytes, and the hardware industry as well as
several operating systems have now switched to using 1000 byte kBs.
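
To make the difference concrete, here is a small sketch of the two unit families as constants. The names are chosen for this example only and do not come from the documentation being changed.

```cpp
#include <cstdint>

// SI (decimal) units.
constexpr std::uint64_t kB = 1000;
constexpr std::uint64_t MB = 1000 * kB;

// IEC (binary) units.
constexpr std::uint64_t KiB = 1024;
constexpr std::uint64_t MiB = 1024 * KiB;

// Converts a count of the given unit into bytes.
constexpr std::uint64_t bytesIn(std::uint64_t count, std::uint64_t unit) {
  return count * unit;
}
```

With these definitions, 64 KiB is 65536 bytes while 64 kB is 64000 bytes, and the gap widens at each larger prefix.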

If this change is acceptable: GitHub sometimes mangles merges to use the
original email of the account, and $dayjob asks that contributions carry
my work email. Thanks!

[mlir][vector] Fix attaching write effects on transfer_write's base (llvm#142940)

This fixes an issue with TransferWriteOp's implementation of the
MemoryEffectOpInterface where the write effect was attached to the
stored value rather than the base.

This had the effect that when asking for the memory effects for the
input memref buffer using getEffectsOnValue(...), the function would
return no-effects (as the effect would have been attached to the stored
value rather than the input buffer).

[flang][OpenMP] Extend locality spec to OMP claues (init and dealloc regions) (llvm#142795)

Extends support for locality specifiers to OpenMP translation by adding
support for translating localizers that have init and dealloc regions.

[debuginfo][coro] Fix linkage name for clones of coro functions (llvm#141889)

So far, the DW_AT_linkage_name of the coroutine resume, destroy,
cleanup and noalloc function clones were incorrectly set to the
original function name instead of the updated function names.

With this commit, we now update the DW_AT_linkage_name to the correct
name. This has multiple benefits:

  1. it's easier for me (and other toolchain developers) to understand the
    output of llvm-dwarf-dump when coroutines are involved.
  2. When hitting a breakpoint, both LLDB and GDB now tell you which clone
    of the function you are in. E.g., GDB now prints "Breakpoint 1.2,
    coro_func(int) [clone .resume] (v=43) at ..." instead of "Breakpoint
    1.2, coro_func(int) (v=43) at ...".
  3. GDB's info line coro_func command now allows you to distinguish the
    multiple different clones of the function.

In Swift, the linkage names of the clones were already updated. The
comment right above the relevant code in CoroSplit.cpp already hinted
that the linkage name should probably also be updated in C++. This
comment was added in commit 6ce76ff, and back then the
corresponding DW_AT_specification (i.e., SP->getDeclaration()) was
not updated, yet, which led to problems for C++. In the meantime, commit
ca1a5b3 added code to also update SP->getDeclaration, as such
there is no reason anymore to not update the linkage name for C++.

Note that most test cases used inconsistent function names for the LLVM
function vs. the DISubprogram linkage name. clang would never emit such
LLVM IR. This confused me initially, and hence I fixed it while updating
the test case.

Drive-by fix: The change in CGVTables.cpp is purely stylistic, NFC.
When looking for other usages of replaceWithDistinct, I got initially
confused because CGVTables.cpp was calling a static function via an
object instance.

MSP430: Add tests for fcmp (llvm#142706)

The existing coverage is thin. libcalls.ll seems to be the main fcmp
test, and it doesn't cover all the condition types, and runs with -O0.

Test all conditions for f32 and f64

[RISCV][FPEnv] Lowering of fpenv intrinsics (llvm#141498)

The change implements custom lowering of get_fpenv, set_fpenv and
reset_fpenv for RISCV target.

[lldb] Show coro_frame in std::coroutine_handle pretty printer (llvm#141516)

This commit adjusts the pretty printer for std::coroutine_handle based
on recent personal experiences with debugging C++20 coroutines:

  1. It adds the coro_frame member. This member exposes the complete
    coroutine frame contents, including the suspension point id and all
    internal variables which the compiler decided to persist into the
    coroutine frame. While this data is highly compiler-specific, inspecting
    it can help identify the internal state of suspended coroutines.
  2. It includes the promise and coro_frame members, even if
    devirtualization failed and we could not infer the promise type / the
    coro_frame type. Having them available as void* pointers can still be
    useful to identify, e.g., which two coroutine handles have the same
    frame / promise pointers.

MSP430: Stop using setCmpLibcallCC (llvm#142708)

This appears to only be useful for the eq/ne cases, and only for
ARM libcalls. This is setting it to the default values, and there's
no change in the new fcmp test output.

MSP430: Partially move runtime libcall config out of TargetLowering (llvm#142709)

RuntimeLibcalls needs to be correct outside of codegen contexts.

[HLSL][SPIR-V] Change SPV AS map for groupshared (llvm#143519)

The previous mapping set the hlsl_groupshared AS to 0, which
translated to either Generic or Function.
Change this to 3, which translates to Workgroup.

Related to llvm#142804

[HLSL][SPIR-V] Handle SV_Position builtin in PS (llvm#141759)

This commit is using the same mechanism as vk::ext_builtin_input to
implement the SV_Position semantic input.
The HLSL signature is not yet ready for DXIL, hence this commit only
implements the SPIR-V side.

This is incomplete as it doesn't allow the semantic on hull/domain and
other shaders, but it's a first step to validate the overall
input/output semantic logic.

Fixes llvm#136969

[libc++] Fix constraints in __countr_zero and __popcount

Currently these two functions are constrained on is_unsigned, which is
more permissive than what is required by the standard for their public
counterparts. This fixes the constraints to match the public functions
by using __libcpp_is_unsigned_integer instead.
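
The difference between the two constraints can be seen with a stand-in trait. The is_unsigned_integer template below is a hypothetical illustration covering only the standard unsigned integer types; it is not libc++'s internal definition, which may also cover extended integer types.

```cpp
#include <type_traits>

// Hypothetical trait: true only for the standard unsigned integer types,
// ignoring cv-qualifiers. Unlike std::is_unsigned, it rejects bool and
// the character types.
template <class T, class U = typename std::remove_cv<T>::type>
struct is_unsigned_integer
    : std::integral_constant<
          bool, std::is_same<U, unsigned char>::value ||
                    std::is_same<U, unsigned short>::value ||
                    std::is_same<U, unsigned int>::value ||
                    std::is_same<U, unsigned long>::value ||
                    std::is_same<U, unsigned long long>::value> {};

// std::is_unsigned is the more permissive of the two.
static_assert(std::is_unsigned<bool>::value, "is_unsigned accepts bool");
static_assert(std::is_unsigned<char16_t>::value,
              "is_unsigned accepts char16_t");
static_assert(!is_unsigned_integer<bool>::value, "bool is rejected");
static_assert(!is_unsigned_integer<char16_t>::value,
              "char16_t is rejected");
static_assert(is_unsigned_integer<unsigned int>::value,
              "unsigned int is accepted");
```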

[libc++] Refactor signed/unsigned integer traits (llvm#142750)

This patch does a few things:

  • __libcpp_is_signed_integer and __libcpp_is_unsigned_integer are
    refactored to be variable templates instead of class templates.
  • the two traits are merged into a single header
    <__type_traits/integer_traits.h>.
  • __libcpp_signed_integer, __libcpp_unsigned_integer and
    __libcpp_integer are moved into the same header.
  • The above mentioned concepts are renamed to __signed_integer,
    __unsigned_integer and __signed_or_unsigned_integer respectively.

[libc++][NFC] Move __libcpp_is_integral into the else branch (llvm#142556)

This makes it clear that __libcpp_is_integral is an implementation
detail of is_integral if we don't have __is_integral and not its own
utility.

[gn build] Port 3c56437

[DebugInfo][RemoveDIs] Use autoupgrader to convert old debug-info (llvm#143452)

By chance, two things have prevented the autoupgrade path being
exercised much so far:

  • LLParser setting the debug-info mode to "old" on seeing intrinsics,
  • The test in AutoUpgrade.cpp wanting to upgrade into a "new" debug-info
    block.

In practice, this appears to mean this code path hasn't seen the various
invalid inputs that can come its way. This commit does a number of
things:

  • Tolerates the various illegal inputs that can be written with
    debug-intrinsics, and that must be tolerated until the Verifier runs,
  • Printing illegal/null DbgRecord fields must succeed,
  • Verifier errors need to localise the function/block where the error
    is,
  • Tests that now see debug records will print debug-record errors,

Plus a few new tests for other intrinsic-to-debug-record failure modes
I found. There are also two edge cases:

  • Some of the unit tests switch back and forth between intrinsic and
    record modes at will; I've deleted coverage and some assertions to
    tolerate this as intrinsic support is now Gone (TM),
  • In sroa-extract-bits.ll, the order of debug records flips. This is
    because the autoupgrader upgrades in the opposite order to the basic
    block conversion routines... which doesn't change the record order, but
    does change the use list order in Metadata! This should (TM) have no
    consequence to the correctness of LLVM, but will change the order of
    various records and the order of DWARF record output too.

I tried to reduce this patch to a smaller collection of changes, but
they're all intertwined, sorry.

[mlir][spirv] Add lowering of multiple math trig/hypb functions (llvm#143604)

Add Math to SPIRV lowering for tan, asin, acos, sinh, cosh, asinh, acosh
and atanh. This completes the lowering of all trigonometric and
hyperbolic functions from math to SPIRV.

[flang][OpenMP] Consider previous DSA for static duration variables (llvm#143601)

Symbols that have a pre-existing DSA set in the enclosing context should
not be made shared based on them being static duration variables.

Suggested-by: Leandro Lupori [email protected]


Signed-off-by: Kajetan Puchalski [email protected]

[flang][runtime] Another try to fix build failure (llvm#143702)

Tweak accessibility to try to get code past whatever gcc is being used
by the flang-runtime-cuda-gcc build bot.

[mlir][spirv] Include SPIRV_AnyImage in SPIRV_Type (llvm#143676)

This change is triggered by encountering the following error:

<unknown>:0: error: 'spirv.Load' op result #0 must be void
or bool or 8/16/32/64-bit integer or 16/32/64-bit float or
vector of bool or 8/16/32/64-bit integer or 16/32/64-bit
float values of length 2/3/4/8/16 or any SPIR-V pointer type
or any SPIR-V array type or any SPIR-V run time array type
or any SPIR-V struct type or any SPIR-V cooperative matrix
type or any SPIR-V matrix type or any SPIR-V sampled image
type, but got '!spirv.image<f32, Dim2D, NoDepth, NonArrayed,
SingleSampled, NoSampler, Rgba8>'
<unknown>:0: note: see current operation:
%126 = "spirv.Load"(%125) {relaxed_precision} : (!spirv.ptr<!spirv.image<f32, Dim2D, NoDepth, NonArrayed, SingleSampled, NoSampler, Rgba8>, UniformConstant>) -> !spirv.image<f32, Dim2D, NoDepth, NonArrayed, SingleSampled, NoSampler, Rgba8>

[Clang] default-movable should be based on the first declaration (llvm#143661)

When the definition of a special member function was defaulted we would
not consider it user-provided, even when the first declaration was not
defaulted.

Fixes llvm#143599

[DebugInfo][RemoveDIs] Remove some debug intrinsic-only codepaths (llvm#143451)

These are opportunistic deletions as more places that make use of the
IsNewDbgInfoFormat flag are removed. It should (TM)(R) all be dead code
now that IsNewDbgInfoFormat should be true everywhere.

  • FastISel: we don't need to do debug-aware instruction counting any more,
    because there are no debug instructions.
  • Autoupgrade: you can no longer avoid autoupgrading of intrinsics to
    records.
  • DIBuilder: Delete the code for creating debug intrinsics (!)
  • LoopUtils: No need to handle debug instructions, they don't exist.

[flang] Add David Truby as maintainer for Flang on Windows (llvm#142619)

Revert "[DebugInfo][RemoveDIs] Remove some debug intrinsic-only codepaths (llvm#143451)"

This reverts commit c71a2e6.

/me squints -- this is hitting an assertion I thought had been deleted,
will revert and investigate for a bit.

[mlir][spirv] Truncate Literal String size at max number words (llvm#142916)

If not truncated the SPIRV serialization would not fail but instead
produce an invalid SPIR-V module.
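
As a rough illustration of the truncation idea, the sketch below packs a string into 32-bit words the way SPIR-V literal strings are laid out (four octets per word, first octet in the lowest-order byte, NUL-terminated) and cuts it off at a word limit. The function name `encodeLiteralString` and the `MaxWords` parameter are assumptions made for this example, not the serializer's actual API, and the sketch does not try to avoid splitting a multi-byte UTF-8 sequence.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encode Str as a NUL-terminated literal string that
// occupies at most MaxWords 32-bit words, truncating if necessary.
inline std::vector<uint32_t> encodeLiteralString(const std::string &Str,
                                                 std::size_t MaxWords) {
  assert(MaxWords >= 1 && "need at least one word for the terminator");
  // One byte is always reserved for the terminating NUL.
  std::size_t MaxChars = MaxWords * 4 - 1;
  std::size_t Len = std::min(Str.size(), MaxChars);
  // Len / 4 + 1 words always leave room for at least one zero byte.
  std::vector<uint32_t> Words(Len / 4 + 1, 0);
  for (std::size_t I = 0; I < Len; ++I)
    Words[I / 4] |= static_cast<uint32_t>(static_cast<uint8_t>(Str[I]))
                    << (8 * (I % 4));
  return Words;
}
```

With `MaxWords` set to 1, any string longer than three bytes is cut to its first three bytes plus the terminator.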


Signed-off-by: Davide Grohmann [email protected]

[X86][BreakFalseDeps] Using reverse order for undef register selection (llvm#137569)

BreakFalseDeps picks the best register for undef operands if
instructions have a false dependency. The problem is that if the
instruction is close to the beginning of the function,
ReachingDefAnalysis is overly optimistic about the unused registers,
which results in collisions with registers just defined in the caller.

This patch changes the selection of the undef register to reverse order,
which reduces the probability of register collisions between caller and
callee. It brings improvements in some of our internal benchmarks with
negligible effect on other benchmarks.

[AArch64] Expand llvm.histogram intrinsic to support umax, umin, and uadd.sat operations (llvm#138447)

This patch extends the llvm.histogram intrinsic to support additional
update operations beyond the existing add. Specifically, the new
supported operations are:

  • umax: unsigned maximum

  • umin: unsigned minimum

  • uadd.sat: unsigned saturated addition
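
A scalar reference model may help pin down what each update operation does to a bucket. This is only a sketch of the arithmetic on 32-bit buckets; the enum and function names are hypothetical and do not correspond to the intrinsic's actual lowering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

enum class HistogramOp { Add, UMax, UMin, UAddSat };

// Applies one update to a single bucket value.
inline uint32_t applyUpdate(HistogramOp Op, uint32_t Old, uint32_t Val) {
  switch (Op) {
  case HistogramOp::Add:
    return Old + Val; // wraps around on overflow
  case HistogramOp::UMax:
    return std::max(Old, Val);
  case HistogramOp::UMin:
    return std::min(Old, Val);
  case HistogramOp::UAddSat: {
    uint32_t Sum = Old + Val;
    // Unsigned overflow happened iff the sum wrapped below Old.
    return Sum < Old ? std::numeric_limits<uint32_t>::max() : Sum;
  }
  }
  return Old;
}

// Updates Buckets[Idx] once per index; repeated indices are applied in order.
inline void histogramUpdate(HistogramOp Op, std::vector<uint32_t> &Buckets,
                            const std::vector<std::size_t> &Indices,
                            uint32_t Val) {
  for (std::size_t Idx : Indices)
    Buckets[Idx] = applyUpdate(Op, Buckets[Idx], Val);
}
```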

Based on the discussion from:

https://discourse.llvm.org/t/rfc-expanding-the-experimental-histogram-intrinsic/84673

[flang][acc] Ensure all acc.loop get a default parallelism determination mode (llvm#143623)

This PR updates the flang lowering to explicitly implement the OpenACC
rules:

  • As per OpenACC 3.3 standard section 2.9.6 independent clause: A loop
    construct with no auto or seq clause is treated as if it has the
    independent clause when it is an orphaned loop construct or its parent
    compute construct is a parallel construct.
  • As per OpenACC 3.3 standard section 2.9.7 auto clause: When the parent
    compute construct is a kernels construct, a loop construct with no
    independent or seq clause is treated as if it has the auto clause.
  • Loops in serial regions are seq if they have no other parallelism
    marking such as gang, worker, vector.

For now the acc.loop verifier has not yet been updated to enforce
this.

[HLSL][Driver] Make vk1.3 the default. (llvm#143384)

The HLSL driver currently defaults the triple to an unversioned os and
subarch when targeting SPIR-V. This means the SPIR-V backend decides the
default value. That is not a great option because a change the backend
could cause a change in Clang.

Now that we want to choose the default we need to consider the best
option. DXC currently defaults to Vulkan1.0. We are planning on not
supporting Vulkan1.0 in the Clang HLSL compiler because it is newer
versions of Vulkan are commonly supported on nearly all hardware, so
users do not use it.

Since we have to change from DXC anyway, we are using VK1.3. It has been
out long enough to be commonly available, and the initial implementation
of SPIR-V features for HLSL assumes Vulkan 1.3.


Co-authored-by: Nathan Gauër [email protected]

[BasicAA][ValueTracking] Use MaxLookupSearchDepth constant (NFC)

Use MaxLookupSearchDepth in all places limiting an underlying
object walk, instead of hardcoding 6 in various places.

Revert runtime work queue patch, it breaks some tests that need investigation (llvm#143713)

Revert "[flang][runtime] Another try to fix build failure"

This reverts commit 13869ca.

Revert "[flang][runtime] Fix build bot flang-runtime-cuda-gcc errors
(llvm#143650)"

This reverts commit d75e284.

Revert "[flang][runtime] Replace recursion with iterative work queue
(llvm#137727)"

This reverts commit 163c67a.

[mlir][spirv] Add definition for GL Exp2 (llvm#143678)

[Clang][ByteCode][NFC] Move APInt into pushInteger since it is being passed by value (llvm#143578)

Static analysis flagged that we could move APInt instead of copy, indeed
it has a move constructor and so we should move into values for APInt.
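
The pattern can be shown with a small stand-in type that counts copies and moves. This is a sketch under assumptions: `Tracked` and `ValueStack` are hypothetical and only mimic a by-value sink parameter, they are not the bytecode interpreter's classes.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Stand-in for an expensive-to-copy value; counts copies and moves.
struct Tracked {
  static int Copies;
  static int Moves;
  Tracked() = default;
  Tracked(const Tracked &) { ++Copies; }
  Tracked(Tracked &&) noexcept { ++Moves; }
};
int Tracked::Copies = 0;
int Tracked::Moves = 0;

struct ValueStack {
  std::vector<Tracked> Items;
  // The parameter is already a private copy, yet it is copied again here.
  void pushCopy(Tracked V) { Items.push_back(V); }
  // Moving out of the by-value parameter avoids the second copy.
  void pushMove(Tracked V) { Items.push_back(std::move(V)); }
};
```

When the caller passes an lvalue, `pushCopy` costs two copies while `pushMove` costs one copy and one move.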

[flang][OpenMP] Overhaul implementation of ATOMIC construct (llvm#137852)

The parser will accept a wide variety of illegal attempts at forming an
ATOMIC construct, leaving it to the semantic analysis to diagnose any
issues. This consolidates the analysis into one place and allows us to
produce more informative diagnostics.

The parser's outcome will be parser::OpenMPAtomicConstruct object
holding the directive, parser::Body, and an optional end-directive. The
prior variety of OmpAtomicXyz classes, as well as OmpAtomicClause have
been removed. READ, WRITE, etc. are now proper clauses.

The semantic analysis consistently operates on "evaluation"
representations, mainly evaluate::Expr (as SomeExpr) and
evaluate::Assignment. The results of the semantic analysis are stored in
a mutable member of the OpenMPAtomicConstruct node. This follows a
precedent of having typedExpr member in parser::Expr, for example.
This allows the lowering code to avoid duplicated handling of AST nodes.

Using a BLOCK construct containing multiple statements for an ATOMIC
construct that requires multiple statements is now allowed. In fact, any
nesting of such BLOCK constructs is allowed.

This implementation will parse, and perform semantic checks for both
conditional-update and conditional-update-capture, although no MLIR will
be generated for those. Instead, a TODO error will be issued prior to
lowering.

The allowed forms of the ATOMIC construct were based on the OpenMP 6.0
spec.

[flang][Driver] Guard check for pic/pie settings without driver flags (llvm#143530)

The default relocation model for clang depends on the cmake flag
CLANG_DEFAULT_PIE_ON_LINUX. By default it is set to ON, but when it's
OFF, the default relocation model will be "static".
The outcome of the test running clang without any PIC/PIE flags will
depend on the cmake flag, so make sure it only runs when the flag is ON.

[PowerPC][AIX] xfail atan-intrinsic to unblock bot (llvm#143723)

Testcase from llvm#143416 is
causing the AIX bot to be red. XFAIL for now till issue can be resolved.

[LTO] Fix used before intialised warning (llvm#143705)

For whatever reason I can't reproduce this locally but I can on Compiler
Explorer (https://godbolt.org/z/nfv4b83q6) and on our flang gcc bot
(https://lab.llvm.org/buildbot/#/builders/130/builds/13683/steps/5/logs/stdio).

In file included from ../llvm-project/llvm/include/llvm/LTO/LTO.h:33,
from
../llvm-project/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:29:
../llvm-project/llvm/include/llvm/Transforms/IPO/FunctionImport.h: In
constructor ‘llvm::FunctionImporter::ImportListsTy::ImportListsTy()’:
../llvm-project/llvm/include/llvm/Transforms/IPO/FunctionImport.h:275:33:
warning: member ‘llvm::FunctionImporter::ImportListsTy::ImportIDs’ is
used uninitialized [-Wuninitialized]
275 | ImportListsTy() : EmptyList(ImportIDs) {}
| ^~~~~~~~~
../llvm-project/llvm/include/llvm/Transforms/IPO/FunctionImport.h: In
constructor
‘llvm::FunctionImporter::ImportListsTy::ImportListsTy(size_t)’:

../llvm-project/llvm/include/llvm/Transforms/IPO/FunctionImport.h:276:44:
warning: member ‘llvm::FunctionImporter::ImportListsTy::ImportIDs’ is
used uninitialized [-Wuninitialized]
276 | ImportListsTy(size_t Size) : EmptyList(ImportIDs), ListsImpl(Size)
{}
| ^~~~~~~~~

ImportIDs was being used during construction of EmptyList, before
ImportIDs itself had been constructed.
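
Non-static data members are initialized in declaration order, regardless of the order in the constructor's initializer list. The sketch below shows a layout where the dependency is declared first; the types are simplified stand-ins that borrow the member names from the warning, and this is not claimed to be the exact change made in FunctionImport.h.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which the members below are constructed.
inline std::vector<std::string> &constructionLog() {
  static std::vector<std::string> Log;
  return Log;
}

struct IDTable {
  IDTable() { constructionLog().push_back("IDTable"); }
};

struct ListView {
  const IDTable &Table;
  explicit ListView(const IDTable &T) : Table(T) {
    constructionLog().push_back("ListView");
  }
};

struct ImportListsSketch {
  IDTable ImportIDs;  // declared first, so it is constructed first
  ListView EmptyList; // can safely refer to the constructed ImportIDs
  ImportListsSketch() : EmptyList(ImportIDs) {}
};
```

If `EmptyList` were declared before `ImportIDs`, the same initializer list would hand `ListView` a reference to a not-yet-constructed object, which is what the warning reports.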

[flang] Fix warnings

This patch fixes:

flang/lib/Lower/OpenMP/OpenMP.cpp:3904:9: error: unused variable
'action0' [-Werror,-Wunused-variable]

flang/lib/Lower/OpenMP/OpenMP.cpp:3905:9: error: unused variable
'action1' [-Werror,-Wunused-variable]

[NFC][PowerPC] Rename xxevalPattern to adhere to naming convention. (llvm#143675)

Rename class xxevalPattern to adhere to naming convention listed in
the coding guideline and used for all other classes in the td file.

[libc++] Make forward_list constexpr as part of P3372R3 (llvm#129435)

Fixes llvm#128658

[llvm] annotate interfaces in llvm/TargetParser for DLL export (llvm#143616)

Purpose

This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the llvm/TargetParser
library. These annotations currently have no meaningful impact on the
LLVM build; however, they are a prerequisite to support an LLVM Windows
DLL (shared library) build.

Background

This effort is tracked in llvm#109483. Additional context is provided in
this discourse, and documentation for LLVM_ABI and related annotations is
found in the LLVM repo here.

Most of these changes were generated automatically using the Interface
Definition Scanner (IDS) tool, followed by formatting with git
clang-format.

Additionally, I manually removed the redundant declaration of
getCanonicalArchName from
llvm/include/llvm/TargetParser/ARMTargetParser.h because IDS only
auto-annotates the first declaration it encounters, and the second
un-annotated declaration results in an MSVC warning.

Validation

Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:

  • Windows with MSVC
  • Windows with Clang
  • Linux with GCC
  • Linux with Clang
  • Darwin with Clang

[llvm] annotate interfaces in llvm/SandboxIR for DLL export (llvm#142863)

Purpose

This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the llvm/SandboxIR library.
These annotations currently have no meaningful impact on the LLVM build;
however, they are a prerequisite to support an LLVM Windows DLL (shared
library) build.

Background

This effort is tracked in llvm#109483. Additional context is provided in
this discourse, and documentation for LLVM_ABI and related annotations is
found in the LLVM repo here.

The bulk of these changes were generated automatically using the
Interface Definition Scanner (IDS) tool, followed by formatting with git
clang-format.

The following manual adjustments were also applied after running IDS on
Linux:

  • Remove explicit GlobalWithNodeAPI::LLVMGVToGV::operator() template
    function instantiations that were previously added for the dylib build.
    Instead, directly annotate the LLVMGVToGV::operator() method with
    LLVM_ABI. This is done so the DLL build works with both MSVC and
    clang-cl.
  • Explicitly #include "llvm/SandboxIR/Value.h" in Tracker.h so that
    the symbol is available for exported templates in this file. These
    templates get fully instantiated on DLL export, so they require the full
    definition of Value.
  • Add extern template instantiation declarations for GlobalWithNodeAPI
    template types in Constants.h and annotate them with
    LLVM_TEMPLATE_ABI.
  • Add LLVM_EXPORT_TEMPLATE to GlobalWithNodeAPI template
    instantiations in Constants.cpp.

Validation

Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:

  • Windows with MSVC
  • Windows with Clang
  • Linux with GCC
  • Linux with Clang
  • Darwin with Clang

[llvm] annotate interfaces in llvm/TextAPI for DLL export (llvm#143447)

Purpose

This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the llvm/TextAPI library.
These annotations currently have no meaningful impact on the LLVM build;
however, they are a prerequisite to support an LLVM Windows DLL (shared
library) build.

Background

This effort is tracked in llvm#109483. Additional context is provided in
this discourse, and documentation for LLVM_ABI and related annotations is
found in the LLVM repo here.

These changes were generated automatically using the Interface
Definition Scanner (IDS) tool, followed by formatting with git
clang-format.

Validation

Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:

  • Windows with MSVC
  • Windows with Clang
  • Linux with GCC
  • Linux with Clang
  • Darwin with Clang

[TableGen] Simplify computeUberWeights. NFC. (llvm#143716)

Using RegUnitIterator made the code more complicated than having two
nested loops over each register and each register's regunits.

[CIR] Upstream minimal builtin function call support (llvm#142981)

This patch adds all bits required to implement builtin function calls to
ClangIR. It doesn't actually implement any of the builtins except those
that fold to a constant ahead of CodeGen
(__builtin_is_constant_evaluated() being one example).

[clang][analyzer] Correct SMT Layer for _BitInt cases refutations (llvm#143310)

Since _BitInt was added later, ASTContext did not comprehend getting a
type by bitwidth that's not a power of 2, and the SMT layer also did not
comprehend this. This led to unexpected crashes using Z3 refutation
during randomized testing. The assertion and a redacted, summarized
crash stack are shown here.

clang:
../../clang/include/clang/StaticAnalyzer/Core/PathSensitive/SMTConv.h:103:
static llvm::SMTExprRef
clang::ento::SMTConv::fromBinOp(llvm::SMTSolverRef &,
const llvm::SMTExprRef &, const BinaryOperator::Opcode, const
llvm::SMTExprRef &, bool):
Assertion `*Solver->getSort(LHS) == *Solver->getSort(RHS) && "AST's must
have the same sort!"' failed.
...

clang::ento::SMTConv::fromBinOp(std::shared_ptr&, llvm::SMTExpr const* const&, clang::BinaryOperatorKind, llvm::SMTExpr const* const&, bool) SMTConstraintManager.cpp clang::ASTContext&, llvm::SMTExpr const* const&, clang::QualType, clang::BinaryOperatorKind, llvm::SMTExpr const* const&, clang::QualType, clang::QualType*) SMTConstraintManager.cpp clang::ASTContext&, clang::ento::SymExpr const*, llvm::APSInt const&, llvm::APSInt const&, bool) SMTConstraintManager.cpp clang::ento::ExplodedNode const*, clang::ento::PathSensitiveBugReport&)

Co-authored-by: Vince Bridgers [email protected]

[MLIR][Transform] apply_registered_pass op's options as a dict (llvm#143159)

Improve ApplyRegisteredPassOp's support for taking options by taking
them as a dict (vs a list of string-valued key-value pairs).

Values of options are provided as either static attributes or as params
(which pass in attributes at interpreter runtime). In either case, the
keys and value attributes are converted to strings and a single
options-string, in the format used on the commandline, is constructed to
pass to the addToPipeline-pass API.
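
The conversion step described above can be sketched as joining stringified key/value pairs into one options string. The helper name `buildOptionsString` is hypothetical, and the space-separated `key=value` layout is an assumption about the command-line format rather than something stated in the commit message.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: join already-stringified options into a single
// "key=value key=value" string (std::map iterates in sorted key order).
inline std::string
buildOptionsString(const std::map<std::string, std::string> &Options) {
  std::string Result;
  for (const auto &KV : Options) {
    if (!Result.empty())
      Result += ' ';
    Result += KV.first;
    Result += '=';
    Result += KV.second;
  }
  return Result;
}
```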

Reapply 76197ea after removing an assertion

Specifically this is the assertion in BasicBlock.cpp. Now that we're not
examining or setting that flag consistently (because it'll be deleted in
about an hour) there's no need to keep this assertion.

Original commit title:

[DebugInfo][RemoveDIs] Remove some debug intrinsic-only codepaths (llvm#143451)

[libc][NFC] Remove template from GPU allocator reference counter

Summary:
We don't need this to be generic, precommit for
llvm#143607

[DLCov][NFC] Annotate intentionally-blank DebugLocs in existing code (llvm#136192)

Following the work in PR llvm#107279, this patch applies the annotative
DebugLocs, which indicate that a particular instruction is intentionally
missing a location for a given reason, to existing sites in the compiler
where their conditions apply. This is NFC in ordinary LLVM builds (each
function DebugLoc::getFoo() is inlined as DebugLoc()), but marks the
instruction in coverage-tracking builds so that it will be ignored by
Debugify, allowing only real errors to be reported. From a developer
standpoint, it also communicates the intentionality and reason for a
missing DebugLoc.

Some notes for reviewers:

  • The difference between I->dropLocation() and
    I->setDebugLoc(DebugLoc::getDropped()) is that the former may decide
    to keep some debug info alive, while the latter will always be empty; in
    this patch, I always used the latter (even if the former could
    technically be correct), because the former could result in some
    (barely) different output, and I'd prefer to keep this patch purely NFC.
  • I've generally documented the uses of DebugLoc::getUnknown(), with
    the exception of the vectorizers - in summary, they are a huge cause of
    dropped source locations, and I don't have the time or the domain
    knowledge currently to solve that, so I've plastered it all over them as
    a form of "fixme".

[libc] Add NULL macro definitions to header files (llvm#142764)

By the C standard, <locale.h>, <stddef.h> <stdio.h>, <stdlib.h>,
<string.h>, <time.h>, and <wchar.h> require NULL to be defined.

[X86] Don't emit ENDBR for asm goto branch targets (llvm#143439)

Similarly to llvm#141562, which disabled BTI generation for ARM asm goto
branch targets, drop unnecessary ENDBRs from IsInlineAsmBrIndirectTarget
machine basic blocks.

[lldb][nfc] Factor out code checking if Variable is in scope (llvm#143572)

This is useful for checking whether a variable is in scope inside a
specific block.

[CIR] Upstream splat op for VectorType (llvm#139827)

This change adds support for splat op for VectorType

Issue llvm#136487

[flang] silence bogus error with BIND(C) variable in hermetic module (llvm#143737)

The global name semantic check was firing in a bogus way when BIND(C)
variables are in a hermetic module.

Do not raise the error if one of the symbols with the conflicting global
name is a "hermetic variant" of the other.

Squelch an unused-function warning

After removing some debug-intrinsic creation code, this function is now
unused (and un-necessary)

[Clang][Tooling][NFC] Use move to avoid copies of large objects (llvm#143603)

Static analysis flagged these cases in which we can use std::move and
avoid copies of large objects.

[IR] Fix warnings (llvm#143752)

This patch fixes:

llvm/lib/IR/DIBuilder.cpp:1072:18: error: unused function
'getDeclareIntrin' [-Werror,-Wunused-function]

llvm/include/llvm/IR/DIBuilder.h:51:15: error: private field
'DeclareFn' is not used [-Werror,-Wunused-private-field]

llvm/include/llvm/IR/DIBuilder.h:52:15: error: private field
'ValueFn' is not used [-Werror,-Wunused-private-field]

llvm/include/llvm/IR/DIBuilder.h:53:15: error: private field
'LabelFn' is not used [-Werror,-Wunused-private-field]

llvm/include/llvm/IR/DIBuilder.h:54:15: error: private field
'AssignFn' is not used [-Werror,-Wunused-private-field]

[GISelValueTracking] Add test case for G_PTRTOINT

While we can only reason about the index/address, the G_PTRTOINT
operation returns all representation bits, so we can't assume the
remaining ones are all zeroes. This behaviour was clarified as part of
the discussion in https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54.
The LangRef semantics of ptrtoint being a full representation bitcast
were documented in llvm#139349.

Prior to 77c8d21 we were incorrectly
assuming known zeroes beyond the index size even if the input was
completely unknown. This commit adds a test case for G_PTRTOINT which
was omitted from that change.

See llvm#139598

Reviewed By: arsenm

Pull Request: llvm#139608

[OpenMP][Offload] Update the Logic for Configuring Auto Zero-Copy (llvm#143638)

Summary:

Currently the Auto Zero-Copy is enabled by checking every initialized
device to ensure that no dGPU is attached to an APU. However, an APU is
designed to comprise a homogeneous set of GPUs, therefore, it should be
sufficient to check any device for configuring Auto Zero-Copy. In this
PR, it checks the first initialized device in the list.

The changes in this PR are to clearly reflect the design and logic of
enabling the feature, further improving readability.

[SPIRV] FIX print the symbolic operand for opcode for the operation OpSpecConstantOp (llvm#135756)

The current implementation prints the opcode as an immediate, but
spirv-tools requires the name of the operation without the "Op" prefix
for the instruction OpSpecConstantOp.
That is, if the opcode is OpBitcast, the instruction must be
%1 = OpSpecConstantOp %6 Bitcast %17
instead of
%1 = OpBitcast %6 124 %17

Refer to this commit for more info.


Co-authored-by: Dmitry Sidorov [email protected]
Co-authored-by: Ebin-McW [email protected]

[libc++] Upgrade to GCC 15 (llvm#138293)

[RISCV] Guard the alternative static chain register use on ILP32E/LP64E (llvm#142715)

Asserts on the use of t3 (x28) as the static chain register when branch control flow protection is enabled with ILP32E/LP64E, because that register is not present in the ABI.

[NFC][PowerPC] Pre-commit test case for exploitation of xxeval for the pattern ternary(A,X,or(B,C)) (llvm#143693)

Pre-commit test case for exploitation of xxeval for ternary operations
of the pattern ternary(A,X,or(B,C)).
Exploitation of xxeval to be added later.

Co-authored-by: Tony Varghese [email protected]

Update BUILD.bazel

Add missing dependency after llvm#142916.

[libc++] Simplify the implementation of __next_prime a bit (llvm#143512)

Make clang/test/Frontend/aarch64-print-enabled-extensions-cc1.c write output file to temp dir

[libc++] Remove static_assert from hash.cpp that fires unconditionally

[Clang][OpenMP] Fix mapping of arrays of structs with members with mappers (llvm#142511)

This builds upon llvm#101101 from @jyu2-git, which used compiler-generated
mappers when mapping an array-section of structs with members that have
user-defined default mappers.

Now we do the same when mapping arrays of structs.

[OpenACC][CIR] Add parallelism determ. to all acc.loops (llvm#143751)

PR llvm#143720 adds a requirement to the ACC dialect that every acc.loop
must have a seq, independent, or auto attribute for the 'default'
device_type. The standard has rules for how this can be intuited:

orphan/parallel/parallel loop: independent
kernels/kernels loop: auto
serial/serial loop: seq, unless there is a gang/worker/vector, at which
point it should be 'auto'.

This patch implements all of this rule as a 'cleanup' step on the IR
generation for combined/loop operations. Note that the test impact is
much less since I inadvertently have my 'operation' terminating curly
matching the end curly from 'attribute' instead of the front of the
line, so I've added sufficient tests to ensure I captured the above.
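
The rule listed above can be written as a small decision function. This is a sketch with hypothetical enum and function names; it only applies to loops that carry no explicit seq, independent, or auto clause, and it is not the CIR implementation.

```cpp
#include <cassert>

enum class ParentConstruct { Orphan, Parallel, Kernels, Serial };
enum class LoopMode { Seq, Independent, Auto };

// Picks the implicit parallelism-determination mode for a loop that has no
// explicit seq/independent/auto clause.
inline LoopMode defaultLoopMode(ParentConstruct Parent,
                                bool HasGangWorkerOrVector) {
  switch (Parent) {
  case ParentConstruct::Orphan:
  case ParentConstruct::Parallel:
    return LoopMode::Independent;
  case ParentConstruct::Kernels:
    return LoopMode::Auto;
  case ParentConstruct::Serial:
    // seq, unless a gang/worker/vector clause is present.
    return HasGangWorkerOrVector ? LoopMode::Auto : LoopMode::Seq;
  }
  return LoopMode::Seq;
}
```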

[bazel] Port fe7bf4b

[libc] Reduce direct use of errno in src/stdlib and src/__support tests. (llvm#143767)

  • Get rid of libc_errno assignments in str_to_* __support tests, since
    those APIs have been migrated to return errors in a struct instead.
  • Migrate tests for atof and strto* functions from <stdlib.h> and for
    strdup from <string.h> to use the ErrnoCheckingTest harness.

[SystemZ][z/OS] Refactor AutoConvert.h to remove large MVS guard (llvm#143174)

This AutoConvert.h header frequently gets mislabeled as an unused
include because it is guarded by MVS internally and every usage is also
guarded. This change removes the guard and instead makes
these functions a no-op on non-z/OS platforms.

[acc] acc.loop verifier now requires parallelism determination flag (llvm#143720)

The OpenACC specification for acc loop describes that a loop's
parallelism determination mode is either auto, independent, or seq. The
rules are as follows.

  • As per OpenACC 3.3 standard section 2.9.6 independent clause: A loop
    construct with no auto or seq clause is treated as if it has the
    independent clause when it is an orphaned loop construct or its parent
    compute construct is a parallel construct.
  • As per OpenACC 3.3 standard section 2.9.7 auto clause: When the parent
    compute construct is a kernels construct, a loop construct with no
    independent or seq clause is treated as if it has the auto clause.
  • Additionally, loops marked with gang, worker, or vector are not
    guaranteed to be parallel. Specifically noted in 2.9.7 auto clause: If
    not, or if it is unable to make a determination, it must treat the auto
    clause as if it is a seq clause, and it must ignore any gang, worker, or
    vector clauses on the loop construct.

The verifier for acc.loop was updated to enforce this marking because
the context in which a loop appears is not trivially determined once IR
transformations begin. For example, orphaned loops are implicitly
independent, but after inlining into an acc.kernels region they
would be implicitly considered auto. Thus now the verifier requires
that a frontend specifically generates acc dialect with this marking
since it knows the context.

[NVPTX] Misc table-gen cleanup (NFC) (llvm#142877)

[VPlan] Always verify VPCanonicalIVPHIRecipe placement (NFC).

Loop regions are dissolved since dcef154, remove the
check for VerifyLate and corresponding TODO.

[SandboxVectorizer] Use llvm::find (NFC) (llvm#143724)

llvm::find allows us to pass a range.
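
The convenience of a range-taking find can be sketched with the standard library alone. `findInRange` is a hypothetical stand-in written for this example; it is not llvm::find itself and assumes C++14 for the deduced return type.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Range-based wrapper: callers pass the container once instead of a
// begin/end iterator pair.
template <typename Range, typename T>
auto findInRange(Range &&R, const T &Val) {
  return std::find(std::begin(R), std::end(R), Val);
}
```

A call such as `std::find(V.begin(), V.end(), 2)` then shrinks to `findInRange(V, 2)`.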

[Format] Use llvm::min_element (NFC) (llvm#143725)

llvm::min_element allows us to pass a range.

[lld] Use std::tie to implement comparison operators (NFC) (llvm#143726)

std::tie facilitates lexicographical comparisons through std::tuple's
built-in operator< and operator>.
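
A minimal sketch of the idiom, using a hypothetical `Version` struct rather than any type from lld:

```cpp
#include <cassert>
#include <tuple>

struct Version {
  int Major;
  int Minor;
  int Patch;
};

// std::tie builds tuples of references, and tuple comparison is
// lexicographic: Major first, then Minor, then Patch.
inline bool operator<(const Version &A, const Version &B) {
  return std::tie(A.Major, A.Minor, A.Patch) <
         std::tie(B.Major, B.Minor, B.Patch);
}

inline bool operator>(const Version &A, const Version &B) {
  return std::tie(A.Major, A.Minor, A.Patch) >
         std::tie(B.Major, B.Minor, B.Patch);
}
```

This replaces hand-written chains of field-by-field comparisons, which are easy to get subtly wrong.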

[llvm] Use std::tie to implement operator< (NFC) (llvm#143728)

std::tie facilitates lexicographical comparisons through std::tuple's
built-in operator<.

[mlir] Simplify calls to *Map::{insert,try_emplace} (NFC) (llvm#143729)

This patch simplifies code by removing the values from
insert/try_emplace. Note that default values inserted by try_emplace
are immediately overridden in all these cases.
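
The simplification can be sketched with std::unordered_map, whose try_emplace (C++17) behaves similarly for this purpose. `getOrAssignID` is a hypothetical helper for illustration, not code from the patch.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// try_emplace(Key) with no value argument value-initializes the mapped
// value (0 for int); the real value is written right after insertion.
inline int getOrAssignID(std::unordered_map<std::string, int> &IDs,
                         const std::string &Name) {
  auto Result = IDs.try_emplace(Name);
  if (Result.second) // newly inserted: overwrite the default immediately
    Result.first->second = static_cast<int>(IDs.size());
  return Result.first->second;
}
```

Passing an explicit placeholder such as `try_emplace(Name, 0)` would add nothing here, because the placeholder is overwritten before anyone reads it.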

[llvm] Add a tool to check mustache compliance against the public spec (llvm#142813)

This is a CLI tool that tests the conformance of LLVM's mustache
implementation against the public Mustache spec, hosted at
https://github.com/mustache/spec. This is a revised version of the
patches in llvm#111487.

Co-authored-by: Peter Chou [email protected]

[SelectionDAG] Add ISD::VSELECT to SelectionDAG::canCreateUndefOrPoison. (llvm#143760)

[LV] Use GeneratedRTChecks to check if safety checks were added (NFC).

Directly check via GeneratedRTChecks if any checks have been added,
instead of needing to go through ILV. This simplifies the code and
enables further refactoring in follow-up patches.

[bazel] port 5dafe9d

[libc] Character converter skeleton class (llvm#143619)

Made CharacterConverter class skeleton

[lldb][RPC] Upstream LLDB to RPC conversion Python script (llvm#138028)

As part of upstreaming LLDB RPC, this commit adds a python script that
is used by LLDB RPC to modify the public lldb header files for use with
RPC.

https://discourse.llvm.org/t/rfc-upstreaming-lldb-rpc/85804

[flang] Don't duplicate hermetic module file dependencies (llvm#143605)

When emitting the modules on which a module depends under the
-fhermetic-module-files options, eliminate duplicates by name rather
than by symbol addresses. This way, when a dependent module is in the
symbol table more than once due to the use of a nested hermetic module,
it doesn't get emitted multiple times to the new module file.
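
Deduplicating by name instead of by address can be sketched as follows. `ModuleSymbol` and `uniqueByName` are hypothetical stand-ins for the real symbol-table types; the point is only that two distinct objects with the same name collapse to one entry.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct ModuleSymbol {
  std::string Name;
};

// Keeps the first occurrence of each module name, in original order.
inline std::vector<const ModuleSymbol *>
uniqueByName(const std::vector<const ModuleSymbol *> &Deps) {
  std::set<std::string> Seen;
  std::vector<const ModuleSymbol *> Result;
  for (const ModuleSymbol *M : Deps)
    if (Seen.insert(M->Name).second)
      Result.push_back(M);
  return Result;
}
```

A set of pointers would treat the two same-named symbols as different and emit the module twice.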

[libc] Switched calls to inline_memcpy to __builtin_memcpy for wide char utilities (llvm#143011)

Switched calls to inline_memcpy to __builtin_memcpy for wide char
utilities
Removed unnecessary wctype_utils dependencies from the cmake file

[MLIR][Transform] apply_registered_op fixes: arg order & python options auto-conversion (llvm#143779)

[libc] Move libc_errno.h to libc/src/__support and make LIBC_ERRNO_MODE_SYSTEM to be header-only. (llvm#143187)

This is the first step in preparation for:
https://discourse.llvm.org/t/rfc-make-clang-builtin-math-functions-constexpr-with-llvm-libc-to-support-c-23-constexpr-math-functions/86450

[libc][obvious] Changed incorrect type (llvm#143780)

After changing mbstate_t to mbstate we forgot to change the
character_converter files to reflect it.

Co-authored-by: Sriya Pratipati [email protected]

[GlobalOpt] Bail out on non-ConstExprs in isSimpleEnoughtToCommit. (llvm#143400)

Bail out for non ConstantExpr constants in
isSimpleEnoughValueToCommitHelper to prevent crash for non-ConstantExpr
constants

PR: llvm#143400

[Clang][NFC] Move HeadingAndSpellings to avoid copying (llvm#143611)

Static analysis flagged that we could move HeadingAndSpellings and avoid
a copy of a large object.

[Clang] fix missing source location for errors in macro-expanded (llvm#143460)

Fixes llvm#143216


This patch fixes diagnostic locations for tokens from macro expansions.

Workaround MSVC Linker Issue when Cross-Compiling for ARM64EC (llvm#143659)

This MR presents a temporary workaround for the issue described at
llvm#143575. While an upstream MSVC bug is reported, it makes sense to
apply a workaround in LLVM code to quickly unblock anyone affected.

[Clang] [NFC] Move diagnostics emitting code from DiagnosticIDs into DiagnosticsEngine (llvm#143517)

It makes more sense for this functionality to be all in one place rather
than split up across two files—at least it caused me a bit of a headache
to try and find all places where we were actually forwarding the
diagnostic to the DiagnosticConsumer. Moreover, moving these functions
into DiagnosticsEngine simplifies the code quite a bit since we access
members of DiagnosticsEngine more frequently than those of
DiagnosticIDs. There was also a duplicated code snippet that I’ve
moved out into a new function.

[mlir] Fix ComposeExpandOfCollapseOp for dynamic case (llvm#142663)

Changes findCollapsingReassociation to return nullopt in all cases
where the source shape has >=2 dynamic dims. expand(collapse) can
reshape to any valid output shape, but a collapse can only collapse
contiguous dimensions. When there are >=2 dynamic dimensions it is
impossible to determine if it can be simplified to a collapse or if it
is performing a more advanced reassociation.

This problem was uncovered by
llvm#137963


Signed-off-by: Ian Wood [email protected]

[LOH] Don't emit AdrpAddStr when register could be clobbered (llvm#142849)

llvm@b783aa8
added a check to ensure an AdrpAddLdr LOH isn't created when there is
an instruction between the add and ldr

https://github.com/llvm/llvm-project/blob/50c5704dc000cc0af41a511aa44db03233edf0af/llvm/lib/Target/AArch64/AArch64CollectLOH.cpp#L419-L431

We need a similar check for AdrpAddStr. Although this technically
isn't implemented in LLD, it could be in the future.

https://github.com/llvm/llvm-project/blob/50c5704dc000cc0af41a511aa44db03233edf0af/lld/MachO/Arch/ARM64.cpp#L699-L702

[mlir][generate-test-checks] Do not emit the autogenerated note if it exists (llvm#143750)

Prior to this PR, the script removed the already existing autogenerated
note if we came across a line that was equal to the note. But the
default note is multiple lines, so there would never be a match.
Instead, check to see if the current line is a substring of the
autogenerated note.
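
The difference between the two checks can be sketched in C++. The helper name and the note text used in testing are hypothetical, and the empty-line guard is an addition for this example that the commit message does not mention.

```cpp
#include <cassert>
#include <string>

// An equality test against a multi-line note never matches a single line,
// so test whether the line occurs inside the note instead.
inline bool isPartOfNote(const std::string &Line, const std::string &Note) {
  // Assumption of this sketch: blank lines are never treated as note text.
  if (Line.empty())
    return false;
  return Note.find(Line) != std::string::npos;
}
```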

Co-authored-by: Michael Maitland [email protected]

[mlir][generate-test-checks] Emit attributes with rest of CHECK lines (llvm#143759)

Prior to this patch, generating test checks in place put the ATTR
definitions at the very top of the file, above the RUN lines and
autogenerated note. All CHECK lines should be below the RUN lines and
autogenerated note.

This change ensures that the attribute definitions are emitted with the
rest of the CHECK lines.


Co-authored-by: Michael Maitland [email protected]

[ConstantFolding] Add folding for [de]interleave2, insert and extract (llvm#141301)

The change adds folding for 4 vector intrinsics: interleave2,
deinterleave2, vector_extract and vector_insert. For the last 2
intrinsics the change does not use ShuffleVector fold mechanism as
it's much simpler to construct result vector explicitly.
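
Building the result vector explicitly is straightforward, as this scalar sketch of the two interleave operations suggests. It works on plain `std::vector<int>` and is not the ConstantFolding code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// interleave2(A, B) = A0, B0, A1, B1, ...
inline std::vector<int> interleave2(const std::vector<int> &A,
                                    const std::vector<int> &B) {
  assert(A.size() == B.size() && "operands must have equal length");
  std::vector<int> Result;
  Result.reserve(A.size() * 2);
  for (std::size_t I = 0; I < A.size(); ++I) {
    Result.push_back(A[I]);
    Result.push_back(B[I]);
  }
  return Result;
}

// deinterleave2(V) = (even-indexed elements, odd-indexed elements)
inline std::pair<std::vector<int>, std::vector<int>>
deinterleave2(const std::vector<int> &V) {
  assert(V.size() % 2 == 0 && "input must have even length");
  std::vector<int> Even, Odd;
  for (std::size_t I = 0; I < V.size(); I += 2) {
    Even.push_back(V[I]);
    Odd.push_back(V[I + 1]);
  }
  return {Even, Odd};
}
```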

[libc] Perform bitfield zero initialization wave-parallel (llvm#143607)

Summary:
We need to set the bitfield memory to zero because the system does not
guarantee zeroed out memory. Even if fresh pages are zero, the system
allows re-use so we would need a kfd level API to skip this step.

Because we can't, this patch updates the logic to perform the zero
initialization wave-parallel. This reduces the amount of time it takes
to allocate a fresh by up to a tenth.

This has the unfortunate side effect that

kazutakahirata and others added 30 commits June 16, 2025 08:59
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
Prepare for removing AVRMCExpr. Adopt the new naming convention (S_
instead of VK_; the relocation specifier was previously named
`VariantKind`) used by most other targets.

Make AVRMCAsmInfo.h include AVRMCExpr.h and change .cpp files to include
AVRMCAsmInfo.h. We will eventually remove AVRMCExpr.h.
The attributor conservatively marks pointers whose loads are eligible to
be marked as `!invariant.load`.
It does so by identifying:
1. Pointers marked `noalias` and `readonly`
2. Pointers whose underlying objects are all eligible for invariant
loads.

The attributor then manifests this attribute at non-atomic non-volatile
load instructions.
Although taskgroup is a privatizing construct, because of
task_reduction clause, a new scope was not being created for it.
This could cause an extra privatization of variables when
taskgroup was lowered, because its scope would be the same as of
the parent privatizing construct.

This fixes regressions in tests 1052_0201 and 1052_0205, from
Fujitsu testsuite.

This issue didn't happen before because implicit symbols were
being created in a different way before llvm#142154.
A canonicalization pattern from `spirv.GL.Length` to `spirv.GL.FAbs` for scalar operands is also added.
This patch enhances the PtrReplacer as follows:
1. Users are now collected iteratively to be generous on the stack. In
the case of PHIs with incoming values which have not yet been visited,
they are pushed back into the stack for reconsideration.
2. Replace users of the pointer root in a reverse-postorder traversal,
instead of a simple traversal over the collected users. This reordering
ensures that the operands of an instruction are replaced before
replacing the instruction itself.
3. During the replacement of PHI, use the same incoming value if it does
not have a replacement.

This patch specifically fixes the case when an incoming value of a PHI
is addrspacecasted.
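
The ordering guarantee in point 2 can be sketched with a small iterative reverse-postorder walk over a def-use graph. This is only an illustration under assumptions: the graph is a plain dict from a value to its users, and the value names in the usage note are made up.

```python
def reverse_postorder(root, users):
    """Return values reachable from `root` in reverse postorder.

    `users` maps a value to the list of values that use it. The walk is
    iterative (explicit stack), so deep use chains do not recurse.
    """
    visited = {root}
    postorder = []
    stack = [(root, iter(users.get(root, ())))]
    while stack:
        node, pending = stack[-1]
        for nxt in pending:
            if nxt not in visited:
                visited.add(nxt)
                stack.append((nxt, iter(users.get(nxt, ()))))
                break
        else:
            # All users of `node` have been handled.
            postorder.append(node)
            stack.pop()
    return postorder[::-1]
```

For an acyclic graph such as `{"alloca": ["gep", "cast"], "gep": ["phi"], "cast": ["phi"], "phi": ["load"]}`, every value appears before its users, so `gep` and `cast` are both replaced before the `phi` that consumes them.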
Putting the function name in the disassembly instruction messes up the
alignment, making it less readable. Put it with the comment instead.

This also aligns the opcodes and instruction to the left, matching the
CLI.
To migrate away from the legacy
XXXMCExpr::printImpl/evaluateAsRelocatableImpl overrides and align with
other targets.

While the AArch64MCAsmInfoXXX hooks introduce some duplication, they
enable better separation for object file formats.

Note: While AArch64MCAsmInfoDarwin uses the `@specifier` notation, it
might use AArch64MCExpr with specifier VK_ABS.
test/tools/llvm-mca/AArch64/Exynos/zero-latency-move.s abuses a parser
behavior that :lo12: is also parsed for Mach-O (though it will fail for
-filetype=obj).
…en if the element type isn't a legal scalar type. (llvm#144007)

This fixes an inconsistency in i64 vector handling between RV32 and
RV64. Even if i64 isn't legal as a scalar, we should still be able
to split a large i64 vector to get down to a legal vector type. We only
need to give up if we need to split a vscale x 1 vector.
Directly modeled after what we do for vector.reverse, but with
restrictions on EVL and mask added.
This contains two closely related changes:
1) Explicitly recurse on the i1 case - "3" happens to be the right
   magic constant at m1, but is not otherwise correct, and we're
   better off deferring this to existing logic.
2) Match the lowering for high LMUL shuffles - we've switched to using
   a linear number of m1 vrgather instead of a single big vrgather.
   This results in substantially faster (but also larger) code for
   reverse shuffles larger than m1.  Note that fixed vectors need
   a slide at the end, but scalable ones don't.

This will have the effect of biasing the vectorizer towards larger
(particularly scalable larger) vector factors. This increases VF for the
s112 and s1112 loops from TSVC_2 (in all configurations).

We could refine the high LMUL estimates a bit more, but I think getting
the linear scaling right is probably close enough for the moment.
This change adds a folder for the VecCmpOp

Issue llvm#136487
Removed strcmp, strlen, and memset calls from table.h and replaced them
with internal functions.

---------

Co-authored-by: Sriya Pratipati <[email protected]>
## Purpose

This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/Passes` library and
other pass-related headers. These annotations currently have no
meaningful impact on the LLVM build; however, they are a prerequisite to
support an LLVM Windows DLL (shared library) build.

## Background

This effort is tracked in llvm#109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).

The bulk of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed by formatting with `git clang-format`.

The following manual adjustments were also applied after running IDS on
Linux:
- Remove the redundant declaration of the `initializeKCFIPass` function
from llvm/include/llvm/InitializePasses.h because IDS only
auto-annotates the first declaration it encounters, and the second
un-annotated declaration results in an MSVC warning
- Add `LLVM_ABI` to a number of private `AnalysisKey` fields in classes
that extend the `AnalysisInfoMixin` template class.
- Add `LLVM_ABI` to the `ChangeReporter` and `TextChangeReporter`
template class definitions in
llvm/include/llvm/Passes/StandardInstrumentations.h and remove the
extern template instantiations. This is the only way I've found to get
everything compiling warning-free when building a DLL because both
template classes have methods implemented out-of-line.

## Validation

Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:

- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
## Purpose

This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/XRay` library.
These annotations currently have no meaningful impact on the LLVM build;
however, they are a prerequisite to support an LLVM Windows DLL (shared
library) build.

## Background

This effort is tracked in llvm#109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).

The bulk of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed by formatting with `git clang-format`.

Additionally, I manually added `LLVM_ABI_FRIEND` to friend member
functions declared with `LLVM_ABI`.

## Validation

Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:

- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
…sor::createPadHighOp` (llvm#144397)

Use `ValueRange` instead of `SmallVector` in `tensor::createPadHighOp`
for the `dynOutDims` arg.
…3763)

## Purpose

This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/ObjectYAML`
library. These annotations currently have no meaningful impact on the
LLVM build; however, they are a prerequisite to support an LLVM Windows
DLL (shared library) build.

## Background

This effort is tracked in llvm#109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).

These were generated automatically using the [Interface Definition
Scanner (IDS)](https://github.com/compnerd/ids) tool, followed by
formatting with `git clang-format`.

## Validation

Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:

- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
This change adds support for ComplexType ImaginaryLiteral

llvm#141365
…144391)

Change in llvm#107278 modified the CMake CACHE variable with values
that are not supported for it as documented. This patch renames the
derived vars so that they do not conflict with the CACHE variable.
This change introduces a ConstantLValueEmitter class, which will be
needed for emitting CIR for non-trivial constant pointers. This change
introduces the class with most branches reaching an NYI diagnostic. The
only path that is currently implemented is the case where an absolute
pointer (usually a null pointer) is emitted. This corresponds to the
existing handler for emitting l-value constants.
…Signature Elements (llvm#144106)

It was pointed out
[here](llvm#143198 (comment))
that we may be able to use `llvm::EnumEntry` so that we can re-use the
printing logic across enumerations.

- Enables re-use of `printEnum` and `printFlags` methods via templates
- Allows easy definition of `getEnumName` function for enum-to-string
conversion, eliminating the need to use a string stream for constructing
the Name SmallString

- Also, does a small fix-up of the operands for descriptor table clause
to be consistent with other `Build*` methods

For reference, these are the
[test-cases](https://github.com/llvm/llvm-project/blob/main/llvm/unittests/Frontend/HLSLRootSignatureDumpTest.cpp)
whose expected output must not change.
arsenm and others added 28 commits June 18, 2025 08:19
…o a read of vlenb. (llvm#144571)

We know that vlenb is a multiple of RVVBytesPerBlock so we aren't
shifting out any non-zero bits.
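
The arithmetic behind this claim can be checked with a small sketch. It assumes `RVVBytesPerBlock` is 8 and that the shift amount is its base-2 logarithm; both are assumptions for the illustration, and `shift_is_exact` is a made-up helper name.

```python
# Assumed value: one RVV block is 64 bits, i.e. 8 bytes.
RVV_BYTES_PER_BLOCK = 8
# log2 of the block size in bytes (3 for 8 bytes).
SHIFT = RVV_BYTES_PER_BLOCK.bit_length() - 1


def shift_is_exact(vlenb):
    """True if shifting right by SHIFT drops only zero bits of vlenb."""
    return (vlenb >> SHIFT) << SHIFT == vlenb
```

Any `vlenb` that is a multiple of 8 (for example 8, 16 or 32) makes `shift_is_exact` return true, which is the property the fold relies on.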
…#143762)

CFIPrograms' most common uses are within debug frames, but it is not
their only use. For example, some assembly writers encode them by hand
into .cfi_escape directives. This PR extracts printing code for them
into its own files, which avoids the need for the main class to depend
on DWARFUnit, sections, and similar.

One in a series of NFC DebugInfo/DWARF refactoring changes to layer it
more cleanly, so that binary CFI parsing can be used from low-level
code, (such as byte strings created via .cfi_escape) without circular
dependencies. The final goal is to make a more limited dwarf library
usable from lower-level code.

More information can be found at
https://discourse.llvm.org/t/rfc-debuginfo-dwarf-refactor-into-to-lower-and-higher-level-libraries/86665
…llvm#142713)

We already evaluate the initializers for all global variables, as
required by the standard. Leverage that evaluation instead of trying to
separately validate static class members.

This has a few benefits:

- Improved diagnostics; we now get notes explaining what failed to
evaluate.
- Improved correctness: is_constant_evaluated is handled correctly.

The behavior follows the proposed resolution for CWG1721.

Fixes llvm#88462. Fixes llvm#99680.
…3820)

Implement Xtensa Interrupt, HighInterrupts, Exception, Debug Options.
Also implement small Xtensa Options like PRID, Coprocessor and Timers.
…in a specified address space to local (llvm#144287)

Currently, the `EliminateAvailableExternallyPass` only converts certain
available externally functions to local if `avail-extern-to-local` is
set or in contextual profiling mode. For global variables, it only drops
their initializers.

This PR adds an option to allow the pass to convert global variables in
a specified address space to local. The motivation for this change is to
correctly support lowering of LDS variables (`__shared__` variables, in
more generic terminology) when ThinLTO is enabled for AMDGPU.

A `__shared__` variable is lowered to a hidden global variable in a
particular address space by the frontend, which is roughly the same as a
`static` local variable. To properly lower it in the backend, the
compiler needs to check all its uses. Enabling ThinLTO currently breaks
this when a function containing a `__shared__` variable is imported from
another module. Even though the global variable is imported along with
its associated function, and the function is privatized by the
`EliminateAvailableExternallyPass`, the global variable itself is not.

It's safe to privatize such global variables, because they're _local_ to
their associated functions. If the function itself is privatized, its
associated global variables should also be privatized accordingly.
…ter' (llvm#92467)

Clarify that these functions are no-ops when linking to LLVM as a shared object.
## Purpose

This patch makes minor changes to LLVM and Clang so that LLVM can be
built as a Windows DLL with `clang-cl`. These changes were not required
for building a Windows DLL with MSVC.

## Background

The Windows DLL effort is tracked in llvm#109483. Additional context is
provided in [this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).

## Overview
Specific changes made in this patch:
- Remove `constexpr` fields that reference DLL exported symbols. These
symbols cannot be resolved at compile time when building a Windows DLL
using `clang-cl`, so they cannot be `constexpr`. Instead, they are made
`const` and initialized in the implementation file rather than at
declaration in the header.
- Annotate symbols now defined out-of-line with `LLVM_ABI` so they are
exported when building as a shared library.
- Explicitly add default copy assignment operator for `ELFFile` to
resolve a compiler warning.

## Validation

Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:

- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
On July 14, 2024, I landed a change to update progress reporting when
loading kernel/firmware binaries:
llvm#98845
In DynamicLoader::LoadBinaryWithUUIDAndAddress I removed code that
was setting the ModuleSpec to the provided name, if the name provided
is that of a file on disk.  With this code missing, if a filepath
name is passed in, this code will fail to find that binary on the local
disk.  Nothing in that PR or its intent called for this change; it was
unintentional.
Test cleanup:
1) separate layout.mlir from ops.mlir for layout related tests
2) remove lane layout for ops working at work item scope.
3) remove redundant tests in create_tdesc/update_tdesc/prefetch.
4) remove "test_" from all test function names.
…ement and fptrunc users (llvm#141758)

Currently we only support D16 folding for `image sample` instructions with a
single user: a `fptrunc` to half.
However, we can actually support D16 folding for image.sample
instructions with multiple users,
as long as each user follows the pattern of extractelement followed by
fptrunc to half.
For example:
```
  %sample = call <4 x float> @llvm.amdgcn.image.sample
  %e0 = extractelement <4 x float> %sample, i32 0
  %h0 = fptrunc float %e0 to half
  %e1 = extractelement <4 x float> %sample, i32 1
  %h1 = fptrunc float %e1 to half
  %e2 = extractelement <4 x float> %sample, i32 2
  %h2 = fptrunc float %e2 to half
```
This change enables D16 folding for such cases and avoids generating
`v_cvt_f16_f32_e32` instructions.
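
The user pattern described above can be sketched as a check over a toy IR. The `Inst` class and `can_fold_to_d16` are hypothetical names for this illustration, and the exact conditions in the patch may differ (for example in how many users each `extractelement` may have).

```python
from dataclasses import dataclass, field


@dataclass
class Inst:
    """Toy instruction: an opcode, a result type and its users."""
    opcode: str
    type: str
    users: list = field(default_factory=list)


def can_fold_to_d16(sample):
    """True if every user of `sample` is an extractelement whose users
    are all fptrunc-to-half instructions."""
    if not sample.users:
        return False
    for user in sample.users:
        if user.opcode != "extractelement" or not user.users:
            return False
        for inner in user.users:
            if inner.opcode != "fptrunc" or inner.type != "half":
                return False
    return True
```

A sample whose extracted elements all feed `fptrunc ... to half` passes the check, while a single extracted element that feeds any other instruction makes it fail.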
… alias.

The motivation for this is that it causes the jump table entry's symbol
to have an st_size equal to the jump table entry size, instead of being
equal to the size of the entire jump table, which is incorrect and can
lead to unexpected behavior in binary analysis tools that rely on the
size field such as Bloaty.

Reviewers: fmayer

Reviewed By: fmayer

Pull Request: llvm#144462
…tension (llvm#144320)

The spec can be found at:
https://github.com/andestech/andes-v5-isa/releases/tag/ast-v5_4_0-release.

This patch only supports the assembler. The instructions are similar to
`Zvfbfmin` and the only difference with `Zvfbfmin` is that
`XAndesVBFHCvt` doesn't have a mask variant.
…mory ranges (llvm#136040)

Recently I was debugging a Minidump with a few thousand ranges, and came
across the (now deleted) comment:

```
  // I don't have a sense of how frequently this is called or how many memory
  // ranges a Minidump typically has, so I'm not sure if searching for the
  // appropriate range linearly each time is stupid.  Perhaps we should build
  // an index for faster lookups.
```

Blaming this comment shows it's 9 years old! This simple fix with a range
data vector is much overdue.

I had to add a default constructor to Range in order to implement the
RangeDataVector, but otherwise this is just a replacement of the lookup logic.
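
The idea of replacing a linear scan with a sorted index can be sketched as below. This is an illustration only, not LLDB's `RangeDataVector`: the `RangeIndex` class is a made-up name, and it assumes the ranges do not overlap.

```python
import bisect


class RangeIndex:
    """Sorted index over (base, size, data) ranges, assumed non-overlapping."""

    def __init__(self, ranges):
        self._ranges = sorted(ranges, key=lambda r: r[0])
        self._bases = [r[0] for r in self._ranges]

    def find(self, addr):
        """Return the data of the range containing `addr`, or None."""
        # Index of the last range whose base is <= addr.
        i = bisect.bisect_right(self._bases, addr) - 1
        if i < 0:
            return None
        base, size, data = self._ranges[i]
        return data if addr < base + size else None
```

Each lookup is a binary search, so a dump with a few thousand ranges needs roughly a dozen comparisons per query instead of a scan over all of them.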
These instructions count leading/trailing ones in the register.

Currently these are only generated when we have `Zbb` enabled (along
with `Xqcibm`) since it contains the `CTTZ/CTLZ` instructions.
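
One way to see the connection to `CTTZ/CTLZ` is that counting ones is the same as counting zeros of the bitwise complement. The sketch below assumes a 32-bit register and uses made-up helper names; it illustrates the arithmetic only and is not the instruction selection code.

```python
XLEN = 32  # assumed register width
MASK = (1 << XLEN) - 1


def ctlz(x):
    """Count leading zeros of an XLEN-bit value (XLEN for x == 0)."""
    return XLEN - x.bit_length()


def cttz(x):
    """Count trailing zeros of an XLEN-bit value (XLEN for x == 0)."""
    if x == 0:
        return XLEN
    return (x & -x).bit_length() - 1


def count_leading_ones(x):
    """Leading ones of x are the leading zeros of its complement."""
    return ctlz(~x & MASK)


def count_trailing_ones(x):
    """Trailing ones of x are the trailing zeros of its complement."""
    return cttz(~x & MASK)
```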
Some of these incorrectly call the l suffixed version of libm
functions and others assert.
…vm#144382)

This wasn't setting the correct libcall names, which default to the
l suffixed libm names.
Previously, the loop would only add labels in the set of sections that were closest to Target. It is a set of sections because multiple sections can have the same address, so all of their symbols would be added to the set of candidate symbols. With the following changes, we loop down from the sections closest to Target and populate the set of candidate symbols with symbols from the first set of sections that does contain symbols.
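
The new search order can be sketched as follows. This is a simplified model with assumptions: sections are given as `(address, symbols)` pairs, "closest to Target" is taken to mean the highest section address not above Target, and `candidate_symbols` is a hypothetical name.

```python
def candidate_symbols(sections, target):
    """Collect symbols from the closest group of sections that has any.

    `sections` is a list of (address, [symbol names]); sections sharing
    an address form one group.
    """
    by_addr = {}
    for addr, symbols in sections:
        if addr <= target:
            by_addr.setdefault(addr, []).extend(symbols)
    # Walk down from the closest address until a group has symbols.
    for addr in sorted(by_addr, reverse=True):
        if by_addr[addr]:
            return by_addr[addr]
    return []
```

With the old behavior, a closest group made only of symbol-less sections would have produced no candidates; here the search continues to the next lower address.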