
Conversation

FelixMcFelix (Contributor)

This PR bumps the viona receive queue length to a larger value (8192), matching what can be observed in GCP. One piece of this is that we now allow a list of virtqueues to be created with different lengths, each of which must be validated to fit in a u16 and to be a power of two.

We don't necessarily want that particular value at this time, since that's a lot of physically contiguous pages needed to create the Rx virtqueue. But this gives us the tools to investigate Rx and Tx queue lengths separately for #930, ideally on a racklette.
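The validation described above can be sketched as follows. This is an illustrative standalone version, not the actual propolis code; `validate_queue_size` and `validate_queue_sizes` are hypothetical names. The virtio spec requires a queue size to be a nonzero power of two that fits in 16 bits:

```rust
// Each virtqueue length must fit in a u16 and be a nonzero power of two.
// Illustrative sketch only -- names do not correspond to the propolis API.
fn validate_queue_size(size: u32) -> Result<u16, String> {
    if size == 0 || !size.is_power_of_two() {
        return Err(format!("queue size {size} is not a nonzero power of two"));
    }
    u16::try_from(size).map_err(|_| format!("queue size {size} does not fit in a u16"))
}

// Per-queue lengths may now differ, so validate the whole list.
fn validate_queue_sizes(sizes: &[u32]) -> Result<Vec<u16>, String> {
    sizes.iter().map(|&s| validate_queue_size(s)).collect()
}

fn main() {
    assert_eq!(validate_queue_size(8192), Ok(8192u16));
    assert!(validate_queue_size(3000).is_err()); // not a power of two
    assert!(validate_queue_size(65536).is_err()); // power of two, but exceeds u16
    assert!(validate_queue_sizes(&[8192, 256]).is_ok()); // mixed Rx/Tx lengths
    println!("ok");
}
```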

FelixMcFelix and others added 4 commits August 22, 2025 13:09
FelixMcFelix (Contributor, Author)

Some testing notes on berlin with the Rx queue size set to 8192 and the Tx queue size kept at 256. Images are the built-in Alpine and Ubuntu 24.04 (noble). alpine0/alpine1 and noble0/noble2 are anti-affinity groups.

  • alpine1 has its Rx queue at 0xfffffd3a03a81b90, noble0 has its Rx queue at 0xfffffd3aba2f8090 on the same sled.
  • Each is acting as an iperf server.

Looking at the number of free queue slots at packet delivery time:

BRM06240029 # dtrace -n 'viona_ring_num_avail:entry/arg0 == 0xfffffd3a03a81b90/{this->v=1} viona_ring_num_avail:return/this->v/{@["space"] = lquantize(arg1, 0, 8192, 256); this->v=0}'
dtrace: description 'viona_ring_num_avail:entry' matched 2 probes
^C

  space
           value  ------------- Distribution ------------- count
            6656 |                                         0
            6912 |                                         4
            7168 |                                         4972
            7424 |                                         27500
            7680 |@@@@@                                    534309
            7936 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      3727360
         >= 8192 |                                         1903

BRM06240029 # dtrace -n 'viona_ring_num_avail:entry/arg0 == 0xfffffd3aba2f8090/{this->v=1} viona_ring_num_avail:return/this->v/{@["space"] = lquantize(arg1, 0, 8192, 256); this->v=0}'
dtrace: description 'viona_ring_num_avail:entry' matched 2 probes
^C

  space
           value  ------------- Distribution ------------- count
            6912 |                                         0
            7168 |                                         18407
            7424 |                                         50041
            7680 |@@@                                      480239
            7936 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     5552156
         >= 8192 |                                         12264
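As a rough reduction of the two distributions above (assumption: treating the lower edge of the lowest non-empty lquantize bucket as the worst-case free count), peak ring occupancy rounds up to a power of two as follows. `min_rx_ring_size` is an illustrative helper, not a viona function:

```rust
// Back-of-envelope: the lowest non-empty "free slots" bucket bounds peak
// ring occupancy from above; the minimum viable ring is the next power of
// two at or above that peak.
fn min_rx_ring_size(ring_size: u32, lowest_free_bucket: u32) -> u32 {
    let peak_occupancy = ring_size - lowest_free_bucket;
    peak_occupancy.next_power_of_two()
}

fn main() {
    // alpine1: lowest non-empty bucket starts at 6912 free slots,
    // so peak occupancy <= 1280 -> a 2048-entry ring suffices.
    assert_eq!(min_rx_ring_size(8192, 6912), 2048);
    // noble0: lowest non-empty bucket starts at 7168 -> 1024 would do.
    assert_eq!(min_rx_ring_size(8192, 7168), 1024);
    println!("ok");
}
```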

It seems an Rx queue size of 2048 would suffice at the current Tx queue length. Confirming that errors on these rings are low:

> 0xfffffd3a03a81b90::print viona_vring_t vr_size vr_stats vr_err_stats
vr_size = 0x2000
vr_stats = {
    vr_stats.vts_packets = 0x16d6432
    vr_stats.vts_bytes = 0x355ffd9de3
    vr_stats.vts_errors = 0x4
    vr_stats.vts_drops = 0x4
}
vr_err_stats = {
    vr_err_stats.rs_ndesc_too_high = 0
    vr_err_stats.rs_bad_idx = 0
    vr_err_stats.rs_indir_bad_len = 0
    vr_err_stats.rs_indir_bad_nest = 0
    vr_err_stats.rs_indir_bad_next = 0
    vr_err_stats.rs_no_space = 0
    vr_err_stats.rs_too_many_desc = 0
    vr_err_stats.rs_desc_bad_len = 0
    vr_err_stats.rs_len_overflow = 0
    vr_err_stats.rs_bad_ring_addr = 0
    vr_err_stats.rs_fail_hcksum = 0
    vr_err_stats.rs_fail_hcksum6 = 0
    vr_err_stats.rs_fail_hcksum_proto = 0
    vr_err_stats.rs_bad_rx_frame = 0
    vr_err_stats.rs_rx_merge_overrun = 0x4
    vr_err_stats.rs_rx_merge_underrun = 0
    vr_err_stats.rs_rx_pad_short = 0x22
    vr_err_stats.rs_rx_mcast_check = 0
    vr_err_stats.rs_rx_drop_over_mtu = 0
    vr_err_stats.rs_rx_gro_fallback = 0
    vr_err_stats.rs_rx_gro_fallback_fail = 0
    vr_err_stats.rs_too_short = 0
    vr_err_stats.rs_tx_absent = 0
    vr_err_stats.rs_tx_gso_fail = 0
    vr_err_stats.rs_rx_hookdrop = 0
    vr_err_stats.rs_tx_hookdrop = 0
}
> 0xfffffd3aba2f8090::print viona_vring_t vr_size vr_stats vr_err_stats
vr_size = 0x2000
vr_stats = {
    vr_stats.vts_packets = 0xa59bf1
    vr_stats.vts_bytes = 0x1bb8f244fb
    vr_stats.vts_errors = 0x12
    vr_stats.vts_drops = 0x12
}
vr_err_stats = {
    vr_err_stats.rs_ndesc_too_high = 0
    vr_err_stats.rs_bad_idx = 0
    vr_err_stats.rs_indir_bad_len = 0
    vr_err_stats.rs_indir_bad_nest = 0
    vr_err_stats.rs_indir_bad_next = 0
    vr_err_stats.rs_no_space = 0
    vr_err_stats.rs_too_many_desc = 0
    vr_err_stats.rs_desc_bad_len = 0
    vr_err_stats.rs_len_overflow = 0
    vr_err_stats.rs_bad_ring_addr = 0
    vr_err_stats.rs_fail_hcksum = 0
    vr_err_stats.rs_fail_hcksum6 = 0
    vr_err_stats.rs_fail_hcksum_proto = 0
    vr_err_stats.rs_bad_rx_frame = 0
    vr_err_stats.rs_rx_merge_overrun = 0x12
    vr_err_stats.rs_rx_merge_underrun = 0
    vr_err_stats.rs_rx_pad_short = 0x5
    vr_err_stats.rs_rx_mcast_check = 0
    vr_err_stats.rs_rx_drop_over_mtu = 0
    vr_err_stats.rs_rx_gro_fallback = 0
    vr_err_stats.rs_rx_gro_fallback_fail = 0
    vr_err_stats.rs_too_short = 0
    vr_err_stats.rs_tx_absent = 0
    vr_err_stats.rs_tx_gso_fail = 0
    vr_err_stats.rs_rx_hookdrop = 0
    vr_err_stats.rs_tx_hookdrop = 0
}
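To put those counters in perspective (in both rings above, vts_drops and vts_errors line up with rs_rx_merge_overrun), the drop rates work out to well under one in a million packets. A quick arithmetic sketch, using the hex values from the mdb output:

```rust
// Drop rate from the viona ring stats above: drops as a fraction of
// total packets delivered.
fn drop_rate(packets: u64, drops: u64) -> f64 {
    drops as f64 / packets as f64
}

fn main() {
    // alpine1: vts_packets = 0x16d6432, vts_drops = 0x4
    let alpine1 = drop_rate(0x16d6432, 0x4);
    // noble0: vts_packets = 0xa59bf1, vts_drops = 0x12
    let noble0 = drop_rate(0xa59bf1, 0x12);
    assert!(alpine1 < 1e-6);
    assert!(noble0 < 1e-5);
    println!("alpine1: {alpine1:e}, noble0: {noble0:e}");
}
```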

Let's compare dogfood with berlin in this case, using the Alpine VMs as an example.

dogfood:

localhost:~# iperf3 -c 172.30.0.11 -Z 
Connecting to host 172.30.0.11, port 5201
[  5] local 172.30.0.10 port 57366 connected to 172.30.0.11 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.31 GBytes  11.2 Gbits/sec  3674    655 KBytes       
[  5]   1.00-2.00   sec  1.32 GBytes  11.3 Gbits/sec  3676    925 KBytes       
[  5]   2.00-3.00   sec  1.33 GBytes  11.5 Gbits/sec  5431    587 KBytes       
[  5]   3.00-4.00   sec  1.37 GBytes  11.8 Gbits/sec  4573    699 KBytes       
[  5]   4.00-5.00   sec  1.28 GBytes  11.0 Gbits/sec  7280    461 KBytes       
[  5]   5.00-6.00   sec  1.25 GBytes  10.7 Gbits/sec  7039    601 KBytes       
[  5]   6.00-7.00   sec  1.30 GBytes  11.1 Gbits/sec  7263    598 KBytes       
[  5]   7.00-8.00   sec  1.28 GBytes  11.0 Gbits/sec  5528    737 KBytes       
[  5]   8.00-9.00   sec  1.32 GBytes  11.3 Gbits/sec  5059    996 KBytes       
[  5]   9.00-10.00  sec  1.25 GBytes  10.7 Gbits/sec  4091    629 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  13.0 GBytes  11.2 Gbits/sec  53614             sender
[  5]   0.00-10.00  sec  13.0 GBytes  11.2 Gbits/sec                  receiver

iperf Done.

berlin:

localhost:~# iperf3 -c 172.30.0.6 -Z
Connecting to host 172.30.0.6, port 5201
[  5] local 172.30.0.5 port 49066 connected to 172.30.0.6 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.69 GBytes  14.5 Gbits/sec  661   1.52 MBytes       
[  5]   1.00-2.00   sec  1.56 GBytes  13.4 Gbits/sec  850   1.08 MBytes       
[  5]   2.00-3.00   sec  1.56 GBytes  13.4 Gbits/sec  585   1.80 MBytes       
[  5]   3.00-4.00   sec  1.73 GBytes  14.8 Gbits/sec  1108   1.80 MBytes       
[  5]   4.00-5.00   sec  1.73 GBytes  14.9 Gbits/sec  860   1.44 MBytes       
[  5]   5.00-6.00   sec  1.62 GBytes  13.9 Gbits/sec  670   1.90 MBytes       
[  5]   6.00-7.00   sec  1.36 GBytes  11.7 Gbits/sec  736   1.33 MBytes       
[  5]   7.00-8.00   sec  1.74 GBytes  14.9 Gbits/sec  827   1.66 MBytes       
[  5]   8.00-9.00   sec  1.66 GBytes  14.3 Gbits/sec  1125   1.78 MBytes       
[  5]   9.00-10.00  sec  1.70 GBytes  14.6 Gbits/sec  655   1.61 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  16.3 GBytes  14.0 Gbits/sec  8077             sender
[  5]   0.00-10.00  sec  16.3 GBytes  14.0 Gbits/sec                  receiver

iperf Done.

A bunch of that difference will be due to packet loaning and the new cxgbe reclamation work, but it seems we haven't accounted for all of the retransmissions. These appear to be zero now on sled-local traffic, at least.
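Normalizing the iperf3 retransmit counts by bytes transferred makes the comparison concrete. A quick calculation over the totals above (`retrans_per_gbyte` is just an illustrative helper):

```rust
// Normalize iperf3 retransmit counts by data transferred so the two runs
// can be compared on equal footing.
fn retrans_per_gbyte(retrans: u64, gbytes: f64) -> f64 {
    retrans as f64 / gbytes
}

fn main() {
    // dogfood: 53614 retransmits over 13.0 GBytes
    let dogfood = retrans_per_gbyte(53614, 13.0);
    // berlin: 8077 retransmits over 16.3 GBytes
    let berlin = retrans_per_gbyte(8077, 16.3);
    let improvement = dogfood / berlin;
    // roughly an 8x reduction in retransmits per GByte moved
    assert!(improvement > 8.0);
    println!("dogfood: {dogfood:.0}/GB, berlin: {berlin:.0}/GB, ~{improvement:.1}x fewer");
}
```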
