Skip to content

Conversation

@camshaft
Copy link
Contributor

@camshaft camshaft commented Jun 27, 2023

Description of changes:

This change optimized the checksum routines for smaller payloads. This is especially important for the IPv4 header, which only checksums 18 bytes (20 if you include the zeroed checksum itself).

With these optimizations I was able to improve things by about 20%.

Testing:

I've updated the benchmarks to include the 20 bytes:

inet/s2n/checksum/20    time:   [3.8394 ns 3.8579 ns 3.8828 ns]
                        thrpt:  [4.7972 GiB/s 4.8282 GiB/s 4.8514 GiB/s]
                 change:
                        time:   [-18.044% -17.376% -16.737%] (p = 0.00 < 0.05)
                        thrpt:  [+20.102% +21.030% +22.017%]
                        Performance has improved.
inet/s2n/checksum/1500  time:   [38.249 ns 38.280 ns 38.309 ns]
                        thrpt:  [36.466 GiB/s 36.494 GiB/s 36.523 GiB/s]
                 change:
                        time:   [-0.4633% -0.3641% -0.2679%] (p = 0.00 < 0.05)
                        thrpt:  [+0.2687% +0.3654% +0.4655%]
                        Change within noise threshold.
inet/s2n/checksum/9000  time:   [195.10 ns 195.26 ns 195.40 ns]
                        thrpt:  [42.895 GiB/s 42.927 GiB/s 42.961 GiB/s]
                 change:
                        time:   [+0.0146% +0.4505% +1.1789%] (p = 0.17 > 0.05)
                        thrpt:  [-1.1652% -0.4484% -0.0146%]
                        No change in performance detected.
inet/s2n/checksum/65536 time:   [1.3706 µs 1.3709 µs 1.3712 µs]
                        thrpt:  [44.511 GiB/s 44.523 GiB/s 44.531 GiB/s]
                 change:
                        time:   [+0.0143% +0.0411% +0.0705%] (p = 0.00 < 0.05)
                        thrpt:  [-0.0704% -0.0411% -0.0143%]
                        Change within noise threshold.

I've also added a few new kani proofs to show functional equivalence of the optimized code and the RFC code.

Before

s2n_quic_core::inet::ipv4::Header::update_checksum:
 movzx   eax, word, ptr, [rdi]
 movzx   ecx, word, ptr, [rdi, +, 2]
 add     ecx, eax
 movzx   eax, word, ptr, [rdi, +, 4]
 movzx   edx, word, ptr, [rdi, +, 6]
 add     edx, eax
 add     edx, ecx
 movzx   eax, word, ptr, [rdi, +, 8]
 movzx   ecx, word, ptr, [rdi, +, 12]
 add     ecx, eax
 movzx   eax, word, ptr, [rdi, +, 14]
 add     eax, ecx
 add     eax, edx
 movzx   ecx, word, ptr, [rdi, +, 16]
 movzx   edx, word, ptr, [rdi, +, 18]
 add     edx, ecx
 add     edx, eax
 movzx   eax, dx
 shr     edx, 16
 add     edx, eax
 movzx   eax, dx
 shr     edx, 16
 add     edx, eax
 mov     eax, edx
 shr     eax, 16
 add     eax, edx
 cmp     ax, -1
 je      .LBB731_1
 not     eax
 rol     ax, 8
 rol     ax, 8
 mov     word, ptr, [rdi, +, 10], ax
 ret
.LBB731_1:
 mov     ax, -1
 rol     ax, 8
 mov     word, ptr, [rdi, +, 10], ax
 ret

After

s2n_quic_core::inet::ipv4::Header::update_checksum:
 mov     word, ptr, [rdi, +, 10], 0
 mov     eax, dword, ptr, [rdi]
 mov     ecx, dword, ptr, [rdi, +, 4]
 add     rcx, rax
 mov     eax, dword, ptr, [rdi, +, 8]
 mov     edx, dword, ptr, [rdi, +, 12]
 add     rdx, rax
 add     rdx, rcx
 mov     eax, dword, ptr, [rdi, +, 16]
 add     rax, rdx
 mov     rcx, rax
 shr     rcx, 16
 add     cx, ax
 adc     cx, 0
 shr     rax, 32
 add     ax, cx
 adc     ax, 0
 cmp     ax, -1
 not     eax
 mov     ecx, 65535
 cmovne  ecx, eax
 mov     word, ptr, [rdi, +, 10], cx
 ret

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@camshaft camshaft force-pushed the camshaft/checksum-opt branch from cb9b3bc to fa13acb Compare June 27, 2023 19:25
@camshaft camshaft force-pushed the camshaft/checksum-opt branch from fa13acb to 7f3d29b Compare June 27, 2023 20:07
@camshaft camshaft marked this pull request as ready for review June 27, 2023 20:39
checksum.write(header.as_bytes());
header.checksum_mut().set(checksum.finish());
}
header.update_checksum();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the cached_checksum still updated somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it was just to reduce the amount of probing performed

// stores a copy of Checksum so we don't have to probe the platform function every time
cached_checksum: Checksum,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the checksum implements Copy so it's being copied there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, the probing is happening in the impl Default for Checksum

@camshaft camshaft merged commit 3ebebc6 into main Jun 30, 2023
@camshaft camshaft deleted the camshaft/checksum-opt branch June 30, 2023 00:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants