Skip to content

u128::midpoint's implementation could be better #124790

@scottmcm

Description

@scottmcm

Today it uses the hackers delight trick

pub const fn midpoint(self, rhs: $SelfT) -> $SelfT {
// Use the well known branchless algorithm from Hacker's Delight to compute
// `(a + b) / 2` without overflowing: `((a ^ b) >> 1) + (a & b)`.
((self ^ rhs) >> 1) + (self & rhs)
}

but

#![feature(num_midpoint)]
#[no_mangle] pub fn demo(x: u128, y: u128) -> u128 { x.midpoint(y) }

gives

demo:
        mov     rax, rdx
        xor     rax, rdi
        mov     r8, rcx
        xor     r8, rsi
        shrd    rax, r8, 1
        shr     r8
        and     rcx, rsi
        and     rdx, rdi
        add     rax, rdx
        adc     r8, rcx
        mov     rdx, r8
        ret

Whereas if it was instead

#[no_mangle] pub fn demo(x: u128, y: u128) -> u128 {
    let (z, c) = x.overflowing_add(y);
    (z >> 1) | (u128::from(c) << 127)
}

that compiles to

demo:
        mov     rax, rdi
        add     rax, rdx
        adc     rsi, rcx
        setb    cl
        shrd    rax, rsi, 1
        movzx   edx, cl
        shld    rdx, rsi, 63
        ret

which seems strictly better.

(Alternatively you could try making it an intrinsic with a fallback body, since the obvious LLVM -- https://llvm.godbolt.org/z/c56YrMz8c -- seems to work decently.)

Metadata

Metadata

Assignees

No one assigned

    Labels

    C-optimizationCategory: An issue highlighting optimization opportunities or PRs implementing suchI-slowIssue: Problems and improvements with respect to performance of generated code.T-libsRelevant to the library team, which will review and decide on the PR/issue.requires-nightlyThis issue requires a nightly compiler in some way.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions