-
Notifications
You must be signed in to change notification settings - Fork 13.6k
Open
Labels
C-optimizationCategory: An issue highlighting optimization opportunities or PRs implementing suchCategory: An issue highlighting optimization opportunities or PRs implementing suchI-slowIssue: Problems and improvements with respect to performance of generated code.Issue: Problems and improvements with respect to performance of generated code.T-libsRelevant to the library team, which will review and decide on the PR/issue.Relevant to the library team, which will review and decide on the PR/issue.requires-nightlyThis issue requires a nightly compiler in some way.This issue requires a nightly compiler in some way.
Description
Today it uses the hackers delight trick
rust/library/core/src/num/mod.rs
Lines 127 to 131 in 96f1da8
pub const fn midpoint(self, rhs: $SelfT) -> $SelfT { | |
// Use the well known branchless algorithm from Hacker's Delight to compute | |
// `(a + b) / 2` without overflowing: `((a ^ b) >> 1) + (a & b)`. | |
((self ^ rhs) >> 1) + (self & rhs) | |
} |
but
#![feature(num_midpoint)]
#[no_mangle] pub fn demo(x: u128, y: u128) -> u128 { x.midpoint(y) }
gives
demo:
mov rax, rdx
xor rax, rdi
mov r8, rcx
xor r8, rsi
shrd rax, r8, 1
shr r8
and rcx, rsi
and rdx, rdi
add rax, rdx
adc r8, rcx
mov rdx, r8
ret
Whereas if it was instead
#[no_mangle] pub fn demo(x: u128, y: u128) -> u128 {
let (z, c) = x.overflowing_add(y);
(z >> 1) | (u128::from(c) << 127)
}
that compiles to
demo:
mov rax, rdi
add rax, rdx
adc rsi, rcx
setb cl
shrd rax, rsi, 1
movzx edx, cl
shld rdx, rsi, 63
ret
which seems strictly better.
(Alternatively you could try making it an intrinsic with a fallback body, since the obvious LLVM -- https://llvm.godbolt.org/z/c56YrMz8c -- seems to work decently.)
Metadata
Metadata
Assignees
Labels
C-optimizationCategory: An issue highlighting optimization opportunities or PRs implementing suchCategory: An issue highlighting optimization opportunities or PRs implementing suchI-slowIssue: Problems and improvements with respect to performance of generated code.Issue: Problems and improvements with respect to performance of generated code.T-libsRelevant to the library team, which will review and decide on the PR/issue.Relevant to the library team, which will review and decide on the PR/issue.requires-nightlyThis issue requires a nightly compiler in some way.This issue requires a nightly compiler in some way.