Skip to content

Conversation

chorman0773
Copy link

@chorman0773 chorman0773 commented Sep 13, 2018

Implements <span> from C++2a. This header provides std::span and all associated features in <span> as described by https://en.cppreference.com/w/cpp/header/span. The Pull Request is intended to add these to libstd++ which is currently not provided by GCC.
This header comes with the guarentee that it follows the specification described in the above resource. It is not guarenteed to be Complient with the latest version of the C++2a standard working draft, or any later version, should the header and/or class specification change.

Implements <span> from C++2a. This header provides std::span and all associated features in <span> as described by https://en.cppreference.com/w/cpp/header/span.
@DesWurstes
Copy link
Contributor

Send this to [email protected].

@jwakely
Copy link
Contributor

jwakely commented Oct 2, 2018

This is an unofficial mirror that nobody from the GCC project is involved with. Sending pull requests here is a waste of time.

Please see https://gcc.gnu.org/contribute.html for how to contribute to GCC, thanks.

@chorman0773 chorman0773 closed this Oct 2, 2018
nstester pushed a commit to nstester/gcc that referenced this pull request Aug 19, 2021
As suggested in the PR, the following patch adds two new clrsb
expansion possibilities if target doesn't have clrsb_optab for the
requested nor wider modes, but does have clz_optab for the requested
mode.
One expansion is
clrsb (op0)
expands as
clz (op0 ^ (((stype)op0) >> (prec-1))) - 1
which is usable if CLZ_DEFINED_VALUE_AT_ZERO is 2 with value
of prec, because the clz argument can be 0 and clrsb should give
prec-1 in that case.
The other expansion is
clz (((op0 << 1) ^ (((stype)op0) >> (prec-1))) | 1)
where the clz argument is never 0, but it is one operation longer.
E.g. on x86_64-linux with -O2 -mno-lzcnt, this results for
int foo (int x) { return __builtin_clrsb (x); }
in
-       subq    $8, %rsp
-       movslq  %edi, %rdi
-       call    __clrsbdi2
-       addq    $8, %rsp
-       subl    $32, %eax
+       leal    (%rdi,%rdi), %eax
+       sarl    $31, %edi
+       xorl    %edi, %eax
+       orl     $1, %eax
+       bsrl    %eax, %eax
+       xorl    $31, %eax
and with -O2 -mlzcnt:
+       movl    %edi, %eax
+       sarl    $31, %eax
+       xorl    %edi, %eax
+       lzcntl  %eax, %eax
+       subl    $1, %eax
On armv7hl-linux-gnueabi with -O2:
-       push    {r4, lr}
-       bl      __clrsbsi2
-       pop     {r4, pc}
+       @ link register save eliminated.
+       eor     r0, r0, r0, asr gcc-mirror#31
+       clz     r0, r0
+       sub     r0, r0, #1
+       bx      lr
As it (at least usually) will make code larger, it is
disabled for -Os or cold instructions.

2021-08-19  Jakub Jelinek  <[email protected]>

	PR middle-end/101950
	* optabs.c (expand_clrsb_using_clz): New function.
	(expand_unop): Use it as another clrsb expansion fallback.

	* gcc.target/i386/pr101950-1.c: New test.
	* gcc.target/i386/pr101950-2.c: New test.
xionghul pushed a commit to xionghul/gcc that referenced this pull request Jan 16, 2023
This recognises the patterns of the form:
  while (n & 1) { n >>= 1 }

Unfortunately there are currently two issues relating to this patch.

Firstly, simplify_using_initial_conditions does not recognise that
	(n != 0) and ((n & 1) == 0) implies that ((n >> 1) != 0).

This preconditions arise following the loop copy-header pass, and the
assumptions returned by number_of_iterations_exit_assumptions then
prevent final value replacement from using the niter result.

I'm not sure what is the best way to fix this - one approach could be to
modify simplify_using_initial_conditions to handle this sort of case,
but it seems that it basically wants the information that ranger could
give anway, so would something like that be a better option?

The second issue arises in the vectoriser, which is able to determine
that the niter->assumptions are always true.
When building with -march=armv8.4-a+sve -S -O3, we get this codegen:

foo (unsigned int b) {
    int c = 0;

    if (b == 0)
      return PREC;

    while (!(b & (1 << (PREC - 1)))) {
        b <<= 1;
        c++;
    }

    return c;
}

foo:
.LFB0:
        .cfi_startproc
        cmp     w0, 0
        cbz     w0, .L6
        blt     .L7
        lsl     w1, w0, 1
        clz     w2, w1
        cmp     w2, 14
        bls     .L8
        mov     x0, 0
        cntw    x3
        add     w1, w2, 1
        index   z1.s, #0, gcc-mirror#1
        whilelo p0.s, wzr, w1
.L4:
        add     x0, x0, x3
        mov     p1.b, p0.b
        mov     z0.d, z1.d
        whilelo p0.s, w0, w1
        incw    z1.s
        b.any   .L4
        add     z0.s, z0.s, gcc-mirror#1
        lastb   w0, p1, z0.s
        ret
        .p2align 2,,3
.L8:
        mov     w0, 0
        b       .L3
        .p2align 2,,3
.L13:
        lsl     w1, w1, 1
.L3:
        add     w0, w0, 1
        tbz     w1, gcc-mirror#31, .L13
        ret
        .p2align 2,,3
.L6:
        mov     w0, 32
        ret
        .p2align 2,,3
.L7:
        mov     w0, 0
        ret
        .cfi_endproc

In essence, the vectoriser uses the niter information to determine
exactly how many iterations of the loop it needs to run. It then uses
SVE whilelo instructions to run this number of iterations. The original
loop counter is also vectorised, despite only being used in the final
iteration, and then the final value of this counter is used as the
return value (which is the same as the number of iterations it computed
in the first place).

This vectorisation is obviously bad, and I think it exposes a latent
bug in the vectoriser, rather than being an issue caused by this
specific patch.

gcc/ChangeLog:

	* tree-ssa-loop-niter.cc (number_of_iterations_cltz): New.
	(number_of_iterations_bitcount): Add call to the above.
	(number_of_iterations_exit_assumptions): Add EQ_EXPR case for
	c[lt]z idiom recognition.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/cltz-max.c: New test.
	* gcc.dg/tree-ssa/clz-char.c: New test.
	* gcc.dg/tree-ssa/clz-int.c: New test.
	* gcc.dg/tree-ssa/clz-long-long.c: New test.
	* gcc.dg/tree-ssa/clz-long.c: New test.
	* gcc.dg/tree-ssa/ctz-char.c: New test.
	* gcc.dg/tree-ssa/ctz-int.c: New test.
	* gcc.dg/tree-ssa/ctz-long-long.c: New test.
	* gcc.dg/tree-ssa/ctz-long.c: New test.
XYenChi pushed a commit to XYenChi/gcc that referenced this pull request Mar 6, 2024
This recognises the patterns of the form:
  while (n & 1) { n >>= 1 }

Unfortunately there are currently two issues relating to this patch.

Firstly, simplify_using_initial_conditions does not recognise that
	(n != 0) and ((n & 1) == 0) implies that ((n >> 1) != 0).

This preconditions arise following the loop copy-header pass, and the
assumptions returned by number_of_iterations_exit_assumptions then
prevent final value replacement from using the niter result.

I'm not sure what is the best way to fix this - one approach could be to
modify simplify_using_initial_conditions to handle this sort of case,
but it seems that it basically wants the information that ranger could
give anway, so would something like that be a better option?

The second issue arises in the vectoriser, which is able to determine
that the niter->assumptions are always true.
When building with -march=armv8.4-a+sve -S -O3, we get this codegen:

foo (unsigned int b) {
    int c = 0;

    if (b == 0)
      return PREC;

    while (!(b & (1 << (PREC - 1)))) {
        b <<= 1;
        c++;
    }

    return c;
}

foo:
.LFB0:
        .cfi_startproc
        cmp     w0, 0
        cbz     w0, .L6
        blt     .L7
        lsl     w1, w0, 1
        clz     w2, w1
        cmp     w2, 14
        bls     .L8
        mov     x0, 0
        cntw    x3
        add     w1, w2, 1
        index   z1.s, #0, #1
        whilelo p0.s, wzr, w1
.L4:
        add     x0, x0, x3
        mov     p1.b, p0.b
        mov     z0.d, z1.d
        whilelo p0.s, w0, w1
        incw    z1.s
        b.any   .L4
        add     z0.s, z0.s, #1
        lastb   w0, p1, z0.s
        ret
        .p2align 2,,3
.L8:
        mov     w0, 0
        b       .L3
        .p2align 2,,3
.L13:
        lsl     w1, w1, 1
.L3:
        add     w0, w0, 1
        tbz     w1, gcc-mirror#31, .L13
        ret
        .p2align 2,,3
.L6:
        mov     w0, 32
        ret
        .p2align 2,,3
.L7:
        mov     w0, 0
        ret
        .cfi_endproc

In essence, the vectoriser uses the niter information to determine
exactly how many iterations of the loop it needs to run. It then uses
SVE whilelo instructions to run this number of iterations. The original
loop counter is also vectorised, despite only being used in the final
iteration, and then the final value of this counter is used as the
return value (which is the same as the number of iterations it computed
in the first place).

This vectorisation is obviously bad, and I think it exposes a latent
bug in the vectoriser, rather than being an issue caused by this
specific patch.

gcc/ChangeLog:

	* tree-ssa-loop-niter.cc (number_of_iterations_cltz): New.
	(number_of_iterations_bitcount): Add call to the above.
	(number_of_iterations_exit_assumptions): Add EQ_EXPR case for
	c[lt]z idiom recognition.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/cltz-max.c: New test.
	* gcc.dg/tree-ssa/clz-char.c: New test.
	* gcc.dg/tree-ssa/clz-int.c: New test.
	* gcc.dg/tree-ssa/clz-long-long.c: New test.
	* gcc.dg/tree-ssa/clz-long.c: New test.
	* gcc.dg/tree-ssa/ctz-char.c: New test.
	* gcc.dg/tree-ssa/ctz-int.c: New test.
	* gcc.dg/tree-ssa/ctz-long-long.c: New test.
	* gcc.dg/tree-ssa/ctz-long.c: New test.
cooljeanius referenced this pull request in cooljeanius/gcc Sep 20, 2024
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
cooljeanius referenced this pull request in cooljeanius/gcc Sep 20, 2024
Fix code scanning alert #31: Incorrect conversion between integer types
iains referenced this pull request in NinaRanns/gcc Oct 29, 2024
…-opt

Contracts nonattr add eval semantic opt
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants