Improve pure Go speed in low shard count #306
Conversation
Walkthrough
Adds benchmark profiling flags and a -fast1 option; introduces deterministic RNG prefill for benchmarks; adds generic unsafe/safe byte load/store helpers with updated build tags; implements 8-byte/16-bit table-based Galois multiply/XOR paths and a runtime skip2B switch; and adds an internal generic integer constraint (indexer).
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant User
    participant Bench as benchmark/main
    participant Prof as pprof/trace
    participant RS as reedsolomon
    User->>Bench: run with flags (-cpu.profile, -mem.profile, -trace.profile, -fast1, ...)
    alt cpu.profile set
        Bench->>Prof: StartCPUProfile(file)
        note right of Prof #DDEEFF: deferred StopCPUProfile
    end
    alt trace.profile set
        Bench->>Prof: trace.Start(file)
        note right of Prof #DDEEFF: deferred trace.Stop
    end
    Bench->>RS: getOptions(..., include WithFastOneParityMatrix if -fast1)
    loop benchmark
        Bench->>RS: Encode/Decode shards (prefilled via deterministic RNG)
        RS-->>Bench: results
    end
    alt mem.profile set
        Bench->>Prof: WriteHeapProfile on exit
    end
```
```mermaid
sequenceDiagram
    autonumber
    participant Caller
    participant Galois as galMulSlice/galMulSliceXor
    participant Table as mt / mt16
    participant SIMD
    Caller->>Galois: request multiply/XOR
    alt AVX2 or SSSE3 available
        Galois->>SIMD: use SIMD path
        SIMD-->>Galois: write outputs
    else neither SIMD nor skip2B true
        Galois->>Table: use 16-bit table (mt16) for 8-byte blocks
        Table-->>Galois: produce 16-bit results -> store16 writes
    else skip2B == true
        Galois->>Table: use 8-bit scalar table per byte
        Table-->>Galois: write bytes
    end
    Galois-->>Caller: done (tail bytes handled by byte fallback)
```
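For orientation, here is a minimal Go sketch of the pure-Go dispatch the second diagram describes. It is a sketch under assumptions, not the PR's exact code: the SIMD call is a placeholder, getMulTable16 is assumed to return the 64K-entry table for multiplier c, and the 8-bit lookup uses an assumed mulTable name.

```go
// Sketch of the dispatch between SIMD, the 16-bit (mt16) path, and the
// per-byte fallback described in the diagram above.
func galMulSliceSketch(c byte, in, out []byte, o *options) {
	if o.useAVX2 || o.useSSSE3 {
		galMulSliceSIMD(c, in, out, o) // placeholder for the assembly kernels
		return
	}
	if !o.skip2B {
		mt16 := getMulTable16(c) // assumed: 65536 uint16 entries for multiplier c
		for len(in) >= 8 {
			store16(out, mt16[load16(in, 0)], 0)
			store16(out, mt16[load16(in, 2)], 2)
			store16(out, mt16[load16(in, 4)], 4)
			store16(out, mt16[load16(in, 6)], 6)
			in, out = in[8:], out[8:]
		}
	}
	mt := mulTable[c][:256] // assumed 8-bit table; handles skip2B and the tail
	for n, input := range in {
		store8(out, mt[input], n)
	}
}
```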
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 1
🧹 Nitpick comments (11)
reedsolomon.go (1)
1692-1695: Generic constraint should be more permissive (+consider uintptr)
Current type set excludes user-defined integer types and uintptr. Since these helpers are used with unsafe.Add (which takes a uintptr), allowing uintptr (and using tilde to accept aliases) improves ergonomics and future-proofs the constraint.
Apply:

```diff
-type indexer interface {
-	int | int8 | int16 | int32 | int64 | uint | uint8 | uint16 | uint32 | uint64
-}
+type indexer interface {
+	~int | ~int8 | ~int16 | ~int32 | ~int64 |
+		~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 |
+		~uintptr
+}
```

benchmark/main.go (2)
115-122: Close CPU profile file and check StartCPUProfile error
Minor robustness fix: StartCPUProfile returns an error; also make sure the profile file is closed. Defer order should ensure StopCPUProfile runs before Close.
```diff
-	if cpuprofile != "" {
-		f, err := os.Create(cpuprofile)
-		if err != nil {
-			log.Fatal(err)
-		}
-		pprof.StartCPUProfile(f)
-		defer pprof.StopCPUProfile()
-	}
+	if cpuprofile != "" {
+		f, err := os.Create(cpuprofile)
+		if err != nil {
+			log.Fatal(err)
+		}
+		if err := pprof.StartCPUProfile(f); err != nil {
+			f.Close()
+			log.Fatal(err)
+		}
+		defer pprof.StopCPUProfile()
+		defer f.Close()
+	}
```
457-459: Flag wiring for fast1 is good; confirm behavior/documentation
The fast1 flag correctly appends WithFastOneParityMatrix. Consider printing a note in non-CSV mode when fast1 is active to aid reproducibility of benchmark output.
unsafe.go (2)
17-43: AllocAligned implementation looks solid; minor clarity nit
The alignment math is correct. If you want to make intent clearer, name the misalignment variable "misalign" and document that each shard starts at a 64B boundary due to eachAligned stride.
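For readers without the diff at hand, here is a minimal sketch of 64-byte-aligned shard allocation along these lines; the rounding arithmetic and the eachAligned/misalign names are illustrative, not the PR's exact code.

```go
import "unsafe"

// allocAlignedSketch returns n shards of `each` bytes, each starting on a
// 64-byte boundary, carved out of one over-allocated backing slice.
func allocAlignedSketch(n, each int) [][]byte {
	const align = 64
	eachAligned := ((each + align - 1) / align) * align // round the stride up
	total := make([]byte, eachAligned*n+align)
	// Trim the front so the first shard is 64B-aligned.
	addr := uintptr(unsafe.Pointer(&total[0]))
	if misalign := int(addr & (align - 1)); misalign != 0 {
		total = total[align-misalign:]
	}
	res := make([][]byte, n)
	for i := range res {
		res[i] = total[:each:eachAligned]
		total = total[eachAligned:]
	}
	return res
}
```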
45-50: Unnecessary commented code can be removed
The commented alternatives using binary/unsafe are no longer needed and can be removed to reduce noise.
galois_nopshufb_amd64.go (1)
74-87: XOR variant: same benefits; validate table initialization costs
The XOR path mirrors the non-XOR case well. Given the 32 MB table, consider noting (in code or PR description) the amortization threshold for mt16 creation to guide users on memory/latency trade-offs when assembly isn't available.
galois_noasm.go (4)
26-29: Tail loop: tiny readability nit — direct indexing is simpler here.
If store8/load8 resolve to plain indexing on pure-Go builds, consider direct assignment for clarity in this simple per-byte tail. If the helpers are needed for build-tag uniformity, feel free to ignore.
```diff
-	for n, input := range in {
-		store8(out, mt[input], n)
-	}
+	for n, input := range in {
+		out[n] = mt[input]
+	}
```
38-47: XOR 8-byte mt16 path looks solid; please add an in-place test.
Semantics preserve in-place correctness (in==out) since each lane is read (load16(out, off), load16(in, off)) before being written. To guard against future refactors, add a test that exercises in==out for various lengths (including <8, =8, 8k+tail).
I can provide a table-driven unit test covering in==out with random data and multiple c values (including c=0,1, and random) if you’d like.
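A sketch of such a table-driven test follows; it assumes in-package access to galMulSliceXor(c, in, out, *options) and a scalar galMultiply reference, so the exact names and the zero options value may need adjusting to the package's API.

```go
import (
	"bytes"
	"math/rand"
	"testing"
)

func TestGalMulSliceXorInPlace(t *testing.T) {
	rng := rand.New(rand.NewSource(0))
	lengths := []int{0, 1, 7, 8, 9, 16, 65, 1024 + 5} // <8, =8, and 8k+tail cases
	cs := []byte{0, 1, byte(rng.Intn(256))}
	var o options // zero options: exercises the pure-Go path on noasm builds
	for _, n := range lengths {
		for _, c := range cs {
			in := make([]byte, n)
			rng.Read(in)
			// Reference: byte-by-byte XOR of the product into the original data.
			want := make([]byte, n)
			copy(want, in)
			for i := range want {
				want[i] ^= galMultiply(c, in[i])
			}
			// Exercise in == out by passing the same slice twice.
			buf := make([]byte, n)
			copy(buf, in)
			galMulSliceXor(c, buf, buf, &o)
			if !bytes.Equal(buf, want) {
				t.Errorf("len=%d c=%d: in-place result mismatch", n, c)
			}
		}
	}
}
```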
49-51: Tail XOR: optional simplification for readability.
Same nit as above: if helpers aren't strictly required here, the direct form is a bit easier to read.
```diff
-	for n := range in {
-		store8(out, load8(out, n)^mt[load8(in, n)], n)
-	}
+	for n := range in {
+		out[n] ^= mt[in[n]]
+	}
```
15-24: Fast-path c==0 to avoid table work (pure-Go path).
Two small short-circuit wins:
- galMulSlice: when c==0, the output is all zeros. Zero the slice and return.
- galMulSliceXor: when c==0, out ^= 0 → no-op. Just return.
These save the mt16 fetch and loops on common code paths.
Example patch (outside the shown changed lines):
```go
// At the top of galMulSlice, after the c==1 check.
if c == 0 {
	out = out[:len(in)]
	for i := range out {
		out[i] = 0
	}
	return
}

// At the top of galMulSliceXor, after the c==1 check.
if c == 0 {
	// out ^= 0 → no-op
	return
}
```

Also applies to: 38-47
galois_amd64.go (1)
139-149: XOR non-SIMD mt16 fallback mirrors noasm path correctly; consider dedup.
Logic matches the noasm variant and remains safe for in-place use. To reduce duplication across files, consider factoring the 8-byte mt16 loops into a small shared helper (guarded by build tags) that both amd64 and noasm variants can call. This keeps behavior in one place.
Example sketch:
```go
//go:build !pshufb || (amd64 && !avx2_ssse3_runtime)

func galMulSliceGo(c byte, in, out []byte) { /* mt16 loop + tail */ }

// ... in both files ...
if !o.useAVX2 && !o.useSSSE3 {
	galMulSliceGo(c, in, out) // then tail or return depending on helper contract
}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (8)
benchmark/main.go (5 hunks)
galois.go (2 hunks)
galois_amd64.go (2 hunks)
galois_noasm.go (2 hunks)
galois_nopshufb_amd64.go (2 hunks)
reedsolomon.go (1 hunks)
unsafe.go (3 hunks)
unsafe_disabled.go (3 hunks)
🔥 Files not summarized due to errors (1)
- galois.go: Error: Server error: no LLM provider could handle the message
👮 Files not reviewed due to content moderation or server errors (1)
- galois.go
🧰 Additional context used
🧬 Code graph analysis (1)
benchmark/main.go (1)
options.go (1)
WithFastOneParityMatrix(258-262)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: build (1.25.x, ubuntu-latest)
- GitHub Check: build (1.24.x, windows-latest)
- GitHub Check: build (1.25.x, windows-latest)
- GitHub Check: build (1.25.x, macos-latest)
- GitHub Check: build (1.24.x, macos-latest)
- GitHub Check: build (1.23.x, ubuntu-latest)
- GitHub Check: build (1.24.x, ubuntu-latest)
- GitHub Check: build (1.23.x, macos-latest)
- GitHub Check: build (1.23.x, windows-latest)
- GitHub Check: build-special
🔇 Additional comments (4)
unsafe_disabled.go (1)
11-14: Fallback helpers look correct and endian-consistent
Little-endian loads/stores and direct 8-bit access are straightforward and safe for all architectures. Signatures align with the new indexer constraint.
Also applies to: 29-55
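For context, the portable (non-unsafe) versions of these helpers can be written with plain indexing; the sketch below shows that shape (little-endian 16-bit load/store plus byte access) using the indexer constraint discussed earlier, and is illustrative rather than the file's exact code.

```go
// Little-endian 16-bit load/store and single-byte access, generic over the
// integer type used for the offset (the `indexer` constraint from reedsolomon.go).
func load16[I indexer](b []byte, idx I) uint16 {
	return uint16(b[idx]) | uint16(b[idx+1])<<8
}

func store16[I indexer](b []byte, v uint16, idx I) {
	b[idx] = byte(v)
	b[idx+1] = byte(v >> 8)
}

func load8[I indexer](b []byte, idx I) byte { return b[idx] }

func store8[I indexer](b []byte, v byte, idx I) { b[idx] = v }
```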
galois_nopshufb_amd64.go (1)
51-65: 8-byte, 16-bit table path is a nice win
Switching to getMulTable16 + 8-byte stride should materially reduce loop overhead on the pure Go path. Tail handling remains correct. LGTM.
galois_noasm.go (1)
15-24: 8-byte mt16 path is correct and alias-safe (in-place ok).
The 4× store16 with prior load16 per 2-byte lane ensures we always read before we overwrite the same lane, so in==out in-place operation remains correct. Chunking by 8 bytes with in/out advancement is clean and bounds-safe.
galois_amd64.go (1)
84-95: Non-SIMD fallback using mt16 is correct; race instrumentation not needed.
Good addition. The pure-Go loop is only used when AVX2/SSSE3 aren't active, and being Go code, it's already race-instrumented by the toolchain (unlike the asm paths), so the absence of raceReadSlice/raceWriteSlice here is appropriate. The in/out advancement and 8-byte chunking mirror the noasm implementation.
Add a 32MB lookup table that allows looking up 2 values at once. Will only be used when no assembly functions are available. Benchmarks: (with no assembly) ``` benchmark old ns/op new ns/op delta BenchmarkGalois128K-32 24802 15883 -35.96% BenchmarkGalois1M-32 202043 123711 -38.77% BenchmarkGaloisXor128K-32 29152 18226 -37.48% BenchmarkGaloisXor1M-32 230820 148170 -35.81% BenchmarkEncode2x1x1M-32 25016 25261 +0.98% BenchmarkEncode800x200/64-32 127666 103062 -19.27% BenchmarkEncode800x200/256-32 470906 354406 -24.74% BenchmarkEncode800x200/1024-32 1835598 1377017 -24.98% BenchmarkEncode800x200/4096-32 7423581 5238705 -29.43% BenchmarkEncode800x200/16384-32 30795690 21454123 -30.33% BenchmarkEncode800x200/65536-32 120715900 96913158 -19.72% BenchmarkEncode800x200/262144-32 490054033 368235100 -24.86% BenchmarkEncode800x200/1048576-32 2011852300 1549867400 -22.96% BenchmarkEncode1K/4+4/cauchy-32 4894 2764 -43.52% BenchmarkEncode1K/8+8/cauchy-32 17621 11426 -35.16% BenchmarkEncode1K/16+16/cauchy-32 66220 58349 -11.89% BenchmarkEncode1K/32+32/cauchy-32 257469 239701 -6.90% BenchmarkEncode1K/64+64/cauchy-32 1041104 967883 -7.03% BenchmarkEncode1K/128+128/cauchy-32 4065196 4178904 +2.80% BenchmarkDecode1K/4+4/cauchy-32 5928 3833 -35.34% BenchmarkDecode1K/4+4/cauchy-inv-32 5853 3487 -40.42% BenchmarkDecode1K/4+4/cauchy-single-32 1790 1193 -33.35% BenchmarkDecode1K/4+4/cauchy-single-inv-32 1448 904 -37.58% BenchmarkDecode1K/8+8/cauchy-32 20207 13253 -34.41% BenchmarkDecode1K/8+8/cauchy-inv-32 19062 11996 -37.07% BenchmarkDecode1K/8+8/cauchy-single-32 3579 2252 -37.08% BenchmarkDecode1K/8+8/cauchy-single-inv-32 2609 1605 -38.48% BenchmarkDecode1K/16+16/cauchy-32 119255 70093 -41.22% BenchmarkDecode1K/16+16/cauchy-inv-32 92758 59753 -35.58% BenchmarkDecode1K/16+16/cauchy-single-32 7827 5863 -25.09% BenchmarkDecode1K/16+16/cauchy-single-inv-32 6115 3981 -34.90% BenchmarkDecode1K/32+32/cauchy-32 320032 294357 -8.02% BenchmarkDecode1K/32+32/cauchy-inv-32 277815 239258 -13.88% BenchmarkDecode1K/32+32/cauchy-single-32 15596 13458 -13.71% BenchmarkDecode1K/32+32/cauchy-single-inv-32 9237 7938 -14.06% BenchmarkDecode1K/64+64/cauchy-32 1401801 1369309 -2.32% BenchmarkDecode1K/64+64/cauchy-inv-32 1067298 955657 -10.46% BenchmarkDecode1K/64+64/cauchy-single-32 41705 35822 -14.11% BenchmarkDecode1K/64+64/cauchy-single-inv-32 19323 15517 -19.70% BenchmarkDecode1K/128+128/cauchy-32 7225691 7160856 -0.90% BenchmarkDecode1K/128+128/cauchy-inv-32 4159486 3818808 -8.19% BenchmarkDecode1K/128+128/cauchy-single-32 124440 109424 -12.07% BenchmarkDecode1K/128+128/cauchy-single-inv-32 36524 31136 -14.75% BenchmarkEncode10x2x10000-32 50440 42083 -16.57% BenchmarkEncode100x20x10000-32 1801528 1106006 -38.61% BenchmarkEncode17x3x1M-32 1288167 1001932 -22.22% BenchmarkEncode10x4x16M-32 13472620 10327386 -23.35% BenchmarkEncode5x2x1M-32 310911 185120 -40.46% BenchmarkEncode10x2x1M-32 589206 410960 -30.25% BenchmarkEncode10x4x1M-32 1015838 866408 -14.71% BenchmarkEncode50x20x1M-32 22557354 19183256 -14.96% BenchmarkEncode17x3x16M-32 16104608 13499597 -16.18% BenchmarkEncode_8x4x8M-32 5519244 3932924 -28.74% BenchmarkEncode_12x4x12M-32 12346553 9079752 -26.46% BenchmarkEncode_16x4x16M-32 21485559 17850644 -16.92% BenchmarkEncode_16x4x32M-32 42595970 35871009 -15.79% BenchmarkEncode_16x4x64M-32 82783715 69571462 -15.96% BenchmarkEncode_8x5x8M-32 7074967 5185001 -26.71% BenchmarkEncode_8x6x8M-32 8509701 6225369 -26.84% BenchmarkEncode_8x7x8M-32 9515548 7190973 -24.43% BenchmarkEncode_8x9x8M-32 12594933 10125813 -19.60% 
BenchmarkEncode_8x10x8M-32 13865411 11618479 -16.21% BenchmarkEncode_8x11x8M-32 14907551 13268491 -10.99% BenchmarkEncode_8x8x05M-32 969947 684114 -29.47% BenchmarkEncode_8x8x1M-32 1626195 1284539 -21.01% BenchmarkEncode_8x8x8M-32 11010901 8424452 -23.49% BenchmarkEncode_8x8x32M-32 42695493 32035889 -24.97% BenchmarkEncode_24x8x24M-32 96542217 79041486 -18.13% BenchmarkEncode_24x8x48M-32 188859267 155747871 -17.53% BenchmarkVerify800x200/64-32 135567 109423 -19.28% BenchmarkVerify800x200/256-32 481634 347095 -27.93% BenchmarkVerify800x200/1024-32 1869884 1322751 -29.26% BenchmarkVerify800x200/4096-32 7491368 5123890 -31.60% BenchmarkVerify800x200/16384-32 29535987 20752858 -29.74% BenchmarkVerify800x200/65536-32 125985833 85297623 -32.30% BenchmarkVerify800x200/262144-32 508379267 355721767 -30.03% BenchmarkVerify800x200/1048576-32 2068388300 1449036800 -29.94% BenchmarkVerify10x2x10000-32 53561 42944 -19.82% BenchmarkVerify50x5x100000-32 907225 783960 -13.59% BenchmarkVerify10x2x1M-32 640749 536002 -16.35% BenchmarkVerify5x2x1M-32 396362 304214 -23.25% BenchmarkVerify10x4x1M-32 1212746 1091746 -9.98% BenchmarkVerify50x20x1M-32 24401410 20788040 -14.81% BenchmarkVerify10x4x16M-32 17572573 14192901 -19.23% BenchmarkReconstruct10x2x10000-32 35819 40261 +12.40% BenchmarkReconstruct800x200/64-32 910172 847324 -6.91% BenchmarkReconstruct800x200/256-32 1950439 1606723 -17.62% BenchmarkReconstruct800x200/1024-32 5781554 4567305 -21.00% BenchmarkReconstruct800x200/4096-32 21559582 16228687 -24.73% BenchmarkReconstruct800x200/16384-32 84484767 61475983 -27.23% BenchmarkReconstruct800x200/65536-32 352869233 257830060 -26.93% BenchmarkReconstruct800x200/262144-32 1505845200 1137826500 -24.44% BenchmarkReconstruct800x200/1048576-32 6090028900 4613878500 -24.24% BenchmarkReconstruct50x5x50000-32 689686 465919 -32.44% BenchmarkReconstruct10x2x1M-32 514134 376113 -26.85% BenchmarkReconstruct5x2x1M-32 309217 163760 -47.04% BenchmarkReconstruct10x4x1M-32 751008 634086 -15.57% BenchmarkReconstruct50x20x1M-32 11040695 8815629 -20.15% BenchmarkReconstruct10x4x16M-32 8192061 7168362 -12.50% BenchmarkReconstructData10x2x10000-32 37946 38289 +0.90% BenchmarkReconstructData800x200/64-32 981698 846979 -13.72% BenchmarkReconstructData800x200/256-32 1962800 1593880 -18.80% BenchmarkReconstructData800x200/1024-32 5818608 4503410 -22.60% BenchmarkReconstructData800x200/4096-32 22954233 15814243 -31.11% BenchmarkReconstructData800x200/16384-32 88836892 63010456 -29.07% BenchmarkReconstructData800x200/65536-32 384783267 246166020 -36.02% BenchmarkReconstructData800x200/262144-32 1265891200 1127835600 -10.91% BenchmarkReconstructData800x200/1048576-32 5463190100 4630147200 -15.25% BenchmarkReconstructData50x5x50000-32 667775 465496 -30.29% BenchmarkReconstructData10x2x1M-32 513945 369514 -28.10% BenchmarkReconstructData5x2x1M-32 282178 151657 -46.25% BenchmarkReconstructData10x4x1M-32 724144 593335 -18.06% BenchmarkReconstructData50x20x1M-32 11100261 9621467 -13.32% BenchmarkReconstructData10x4x16M-32 8161166 6423180 -21.30% BenchmarkReconstructP10x2x10000-32 2975 2759 -7.26% BenchmarkReconstructP10x5x20000-32 10975 11836 +7.85% BenchmarkSplit10x4x160M-32 2171627 2003794 -7.73% BenchmarkSplit5x2x5M-32 68340 62255 -8.90% BenchmarkSplit10x2x1M-32 11724 10878 -7.22% BenchmarkSplit10x4x10M-32 127757 126308 -1.13% BenchmarkSplit50x20x50M-32 631860 606853 -3.96% BenchmarkSplit17x3x272M-32 1603306 1504980 -6.13% BenchmarkParallel_8x8x64K-32 81963 62322 -23.96% BenchmarkParallel_8x8x05M-32 657356 493039 -25.00% 
BenchmarkParallel_20x10x05M-32 2035771 1701756 -16.41% BenchmarkParallel_8x8x1M-32 1312086 989650 -24.57% BenchmarkParallel_8x8x8M-32 12395383 8855818 -28.56% BenchmarkParallel_8x8x32M-32 145119543 46900712 -67.68% BenchmarkParallel_8x3x1M-32 488483 346840 -29.00% BenchmarkParallel_8x4x1M-32 652976 466834 -28.51% BenchmarkParallel_8x5x1M-32 847350 599749 -29.22% BenchmarkStreamEncode10x2x10000-32 53168 44803 -15.73% BenchmarkStreamEncode100x20x10000-32 1967911 1206515 -38.69% BenchmarkStreamEncode17x3x1M-32 1825471 1604931 -12.08% BenchmarkStreamEncode10x4x16M-32 20421472 17869211 -12.50% BenchmarkStreamEncode5x2x1M-32 512446 434971 -15.12% BenchmarkStreamEncode10x2x1M-32 931308 847344 -9.02% BenchmarkStreamEncode10x4x1M-32 1385173 1276586 -7.84% BenchmarkStreamEncode50x20x1M-32 24626658 22235222 -9.71% BenchmarkStreamEncode17x3x16M-32 28870364 25197182 -12.72% BenchmarkStreamVerify10x2x10000-32 59623 46435 -22.12% BenchmarkStreamVerify50x5x50000-32 1349335 1316518 -2.43% BenchmarkStreamVerify10x2x1M-32 1037307 934257 -9.93% BenchmarkStreamVerify5x2x1M-32 649599 551300 -15.13% BenchmarkStreamVerify10x4x1M-32 1694152 1497405 -11.61% BenchmarkStreamVerify50x20x1M-32 26931902 23227298 -13.76% BenchmarkStreamVerify10x4x16M-32 25068808 20562626 -17.98% benchmark old MB/s new MB/s speedup BenchmarkGalois128K-32 5284.82 8252.31 1.56x BenchmarkGalois1M-32 5189.86 8475.99 1.63x BenchmarkGaloisXor128K-32 4496.17 7191.61 1.60x BenchmarkGaloisXor1M-32 4542.83 7076.83 1.56x BenchmarkEncode2x1x1M-32 125747.73 124529.14 0.99x BenchmarkEncode800x200/64-32 501.31 620.99 1.24x BenchmarkEncode800x200/256-32 543.63 722.34 1.33x BenchmarkEncode800x200/1024-32 557.86 743.64 1.33x BenchmarkEncode800x200/4096-32 551.76 781.87 1.42x BenchmarkEncode800x200/16384-32 532.02 763.68 1.44x BenchmarkEncode800x200/65536-32 542.89 676.23 1.25x BenchmarkEncode800x200/262144-32 534.93 711.89 1.33x BenchmarkEncode800x200/1048576-32 521.20 676.56 1.30x BenchmarkEncode1K/4+4/cauchy-32 1673.81 2963.98 1.77x BenchmarkEncode1K/8+8/cauchy-32 929.79 1433.96 1.54x BenchmarkEncode1K/16+16/cauchy-32 494.84 561.59 1.13x BenchmarkEncode1K/32+32/cauchy-32 254.54 273.41 1.07x BenchmarkEncode1K/64+64/cauchy-32 125.90 135.42 1.08x BenchmarkEncode1K/128+128/cauchy-32 64.48 62.73 0.97x BenchmarkDecode1K/4+4/cauchy-32 1381.93 2137.44 1.55x BenchmarkDecode1K/4+4/cauchy-inv-32 1399.72 2349.22 1.68x BenchmarkDecode1K/4+4/cauchy-single-32 4575.70 6869.25 1.50x BenchmarkDecode1K/4+4/cauchy-single-inv-32 5657.40 9062.93 1.60x BenchmarkDecode1K/8+8/cauchy-32 810.82 1236.21 1.52x BenchmarkDecode1K/8+8/cauchy-inv-32 859.53 1365.73 1.59x BenchmarkDecode1K/8+8/cauchy-single-32 4577.48 7275.14 1.59x BenchmarkDecode1K/8+8/cauchy-single-inv-32 6279.80 10205.55 1.63x BenchmarkDecode1K/16+16/cauchy-32 274.77 467.49 1.70x BenchmarkDecode1K/16+16/cauchy-inv-32 353.26 548.39 1.55x BenchmarkDecode1K/16+16/cauchy-single-32 4186.28 5588.65 1.33x BenchmarkDecode1K/16+16/cauchy-single-inv-32 5358.52 8230.70 1.54x BenchmarkDecode1K/32+32/cauchy-32 204.78 222.64 1.09x BenchmarkDecode1K/32+32/cauchy-inv-32 235.90 273.91 1.16x BenchmarkDecode1K/32+32/cauchy-single-32 4202.16 4869.76 1.16x BenchmarkDecode1K/32+32/cauchy-single-inv-32 7095.27 8255.89 1.16x BenchmarkDecode1K/64+64/cauchy-32 93.50 95.72 1.02x BenchmarkDecode1K/64+64/cauchy-inv-32 122.81 137.15 1.12x BenchmarkDecode1K/64+64/cauchy-single-32 3142.82 3658.95 1.16x BenchmarkDecode1K/64+64/cauchy-single-inv-32 6783.14 8446.98 1.25x BenchmarkDecode1K/128+128/cauchy-32 36.28 36.61 1.01x 
BenchmarkDecode1K/128+128/cauchy-inv-32 63.02 68.65 1.09x BenchmarkDecode1K/128+128/cauchy-single-32 2106.58 2395.66 1.14x BenchmarkDecode1K/128+128/cauchy-single-inv-32 7177.38 8419.32 1.17x BenchmarkEncode10x2x10000-32 2379.08 2851.50 1.20x BenchmarkEncode100x20x10000-32 666.10 1084.98 1.63x BenchmarkEncode17x3x1M-32 16280.13 20931.08 1.29x BenchmarkEncode10x4x16M-32 17433.95 22743.51 1.30x BenchmarkEncode5x2x1M-32 23608.18 39650.03 1.68x BenchmarkEncode10x2x1M-32 21355.70 30618.31 1.43x BenchmarkEncode10x4x1M-32 14451.19 16943.58 1.17x BenchmarkEncode50x20x1M-32 3253.94 3826.27 1.18x BenchmarkEncode17x3x16M-32 20835.30 24855.88 1.19x BenchmarkEncode_8x4x8M-32 18238.60 25595.03 1.40x BenchmarkEncode_12x4x12M-32 16306.30 22173.14 1.36x BenchmarkEncode_16x4x16M-32 15617.20 18797.32 1.20x BenchmarkEncode_16x4x32M-32 15754.74 18708.38 1.19x BenchmarkEncode_16x4x64M-32 16213.06 19292.07 1.19x BenchmarkEncode_8x5x8M-32 15413.77 21032.18 1.36x BenchmarkEncode_8x6x8M-32 13800.78 18864.83 1.37x BenchmarkEncode_8x7x8M-32 13223.53 17498.20 1.32x BenchmarkEncode_8x9x8M-32 11322.52 14083.45 1.24x BenchmarkEncode_8x10x8M-32 10890.04 12996.10 1.19x BenchmarkEncode_8x11x8M-32 10691.46 12012.18 1.12x BenchmarkEncode_8x8x05M-32 8648.52 12262.00 1.42x BenchmarkEncode_8x8x1M-32 10316.85 13060.88 1.27x BenchmarkEncode_8x8x8M-32 12189.53 15931.92 1.31x BenchmarkEncode_8x8x32M-32 12574.42 16758.42 1.33x BenchmarkEncode_24x8x24M-32 8341.49 10188.40 1.22x BenchmarkEncode_24x8x48M-32 8528.11 10341.15 1.21x BenchmarkVerify800x200/64-32 472.09 584.89 1.24x BenchmarkVerify800x200/256-32 531.52 737.55 1.39x BenchmarkVerify800x200/1024-32 547.63 774.14 1.41x BenchmarkVerify800x200/4096-32 546.76 799.39 1.46x BenchmarkVerify800x200/16384-32 554.71 789.48 1.42x BenchmarkVerify800x200/65536-32 520.19 768.32 1.48x BenchmarkVerify800x200/262144-32 515.65 736.94 1.43x BenchmarkVerify800x200/1048576-32 506.95 723.64 1.43x BenchmarkVerify10x2x10000-32 2240.42 2794.32 1.25x BenchmarkVerify50x5x100000-32 6062.44 7015.67 1.16x BenchmarkVerify10x2x1M-32 19637.81 23475.50 1.20x BenchmarkVerify5x2x1M-32 18518.52 24127.82 1.30x BenchmarkVerify10x4x1M-32 12104.82 13446.41 1.11x BenchmarkVerify50x20x1M-32 3008.04 3530.89 1.17x BenchmarkVerify10x4x16M-32 13366.34 16549.19 1.24x BenchmarkReconstruct10x2x10000-32 3350.18 2980.57 0.89x BenchmarkReconstruct800x200/64-32 70.32 75.53 1.07x BenchmarkReconstruct800x200/256-32 131.25 159.33 1.21x BenchmarkReconstruct800x200/1024-32 177.12 224.20 1.27x BenchmarkReconstruct800x200/4096-32 189.99 252.39 1.33x BenchmarkReconstruct800x200/16384-32 193.93 266.51 1.37x BenchmarkReconstruct800x200/65536-32 185.72 254.18 1.37x BenchmarkReconstruct800x200/262144-32 174.08 230.39 1.32x BenchmarkReconstruct800x200/1048576-32 172.18 227.27 1.32x BenchmarkReconstruct50x5x50000-32 7974.64 11804.62 1.48x BenchmarkReconstruct10x2x1M-32 24473.99 33455.17 1.37x BenchmarkReconstruct5x2x1M-32 23737.47 44822.02 1.89x BenchmarkReconstruct10x4x1M-32 19547.15 23151.54 1.18x BenchmarkReconstruct50x20x1M-32 6648.16 8326.16 1.25x BenchmarkReconstruct10x4x16M-32 28671.79 32766.35 1.14x BenchmarkReconstructData10x2x10000-32 3162.37 3134.05 0.99x BenchmarkReconstructData800x200/64-32 65.19 75.56 1.16x BenchmarkReconstructData800x200/256-32 130.43 160.61 1.23x BenchmarkReconstructData800x200/1024-32 175.99 227.38 1.29x BenchmarkReconstructData800x200/4096-32 178.44 259.01 1.45x BenchmarkReconstructData800x200/16384-32 184.43 260.02 1.41x BenchmarkReconstructData800x200/65536-32 170.32 266.23 1.56x 
BenchmarkReconstructData800x200/262144-32 207.08 232.43 1.12x BenchmarkReconstructData800x200/1048576-32 191.93 226.47 1.18x BenchmarkReconstructData50x5x50000-32 8236.31 11815.36 1.43x BenchmarkReconstructData10x2x1M-32 24482.98 34052.56 1.39x BenchmarkReconstructData5x2x1M-32 26012.05 48398.94 1.86x BenchmarkReconstructData10x4x1M-32 20272.31 24741.62 1.22x BenchmarkReconstructData50x20x1M-32 6612.49 7628.81 1.15x BenchmarkReconstructData10x4x16M-32 28780.33 36567.72 1.27x BenchmarkReconstructP10x2x10000-32 40334.81 43493.13 1.08x BenchmarkReconstructP10x5x20000-32 27335.36 25347.36 0.93x BenchmarkParallel_8x8x64K-32 12793.36 16825.03 1.32x BenchmarkParallel_8x8x05M-32 12761.14 17014.08 1.33x BenchmarkParallel_20x10x05M-32 7726.13 9242.59 1.20x BenchmarkParallel_8x8x1M-32 12786.67 16952.67 1.33x BenchmarkParallel_8x8x8M-32 10828.04 15155.88 1.40x BenchmarkParallel_8x8x32M-32 3699.51 11446.97 3.09x BenchmarkParallel_8x3x1M-32 23612.54 33255.54 1.41x BenchmarkParallel_8x4x1M-32 19270.09 26953.70 1.40x BenchmarkParallel_8x5x1M-32 16087.19 22728.65 1.41x BenchmarkStreamEncode10x2x10000-32 1880.84 2231.98 1.19x BenchmarkStreamEncode100x20x10000-32 508.15 828.83 1.63x BenchmarkStreamEncode17x3x1M-32 9765.04 11106.89 1.14x BenchmarkStreamEncode10x4x16M-32 8215.48 9388.90 1.14x BenchmarkStreamEncode5x2x1M-32 10231.09 12053.39 1.18x BenchmarkStreamEncode10x2x1M-32 11259.17 12374.86 1.10x BenchmarkStreamEncode10x4x1M-32 7570.00 8213.91 1.09x BenchmarkStreamEncode50x20x1M-32 2128.95 2357.92 1.11x BenchmarkStreamEncode17x3x16M-32 9879.08 11319.23 1.15x BenchmarkStreamVerify10x2x10000-32 1677.21 2153.54 1.28x BenchmarkStreamVerify50x5x50000-32 3705.53 3797.90 1.02x BenchmarkStreamVerify10x2x1M-32 10108.64 11223.64 1.11x BenchmarkStreamVerify5x2x1M-32 8070.95 9510.04 1.18x BenchmarkStreamVerify10x4x1M-32 6189.39 7002.62 1.13x BenchmarkStreamVerify50x20x1M-32 1946.72 2257.21 1.16x BenchmarkStreamVerify10x4x16M-32 6692.47 8159.08 1.22x ```
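To make the table's memory footprint concrete: one 65,536-entry uint16 table per multiplier, for all 256 multipliers, is 256 x 65536 x 2 bytes = 32 MiB, and each lookup multiplies two input bytes at once. A build sketch is below; it assumes a scalar GF(2^8) multiply (galMultiply) and uses an eagerly initialized package variable for brevity, whereas the real code presumably builds the table lazily behind getMulTable16.

```go
// mulTable16[c][x] packs galMultiply(c, low byte of x) and
// galMultiply(c, high byte of x) little-endian, so one lookup covers 2 bytes.
var mulTable16 [256][65536]uint16 // 256 * 65536 * 2 bytes = 32 MiB

func buildMulTable16() {
	for c := 0; c < 256; c++ {
		for x := 0; x < 65536; x++ {
			lo := galMultiply(byte(c), byte(x))    // product for the low byte
			hi := galMultiply(byte(c), byte(x>>8)) // product for the high byte
			mulTable16[c][x] = uint16(lo) | uint16(hi)<<8
		}
	}
}
```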
Force-pushed from 5d7e142 to fd9ea30
This reverts commit fd9ea30.
Actionable comments posted: 0
🧹 Nitpick comments (5)
options.go (1)
28-28: New skip2B flag: confirm default semantics and add a tuning hook (optional).
- Default false enables the 2-byte (16-bit) table path in pure-Go. Please confirm this is intended for all shard sizes unless overridden at runtime.
- Consider adding an Option to toggle this explicitly for benchmarking/regressions without touching internal code.
Proposed Option for explicit control:
```diff
+// WithTwoByteTables enables/disables the 2-byte lookup-table path on pure-Go.
+// Default is enabled (skip2B=false).
+func WithTwoByteTables(enabled bool) Option {
+	return func(o *options) {
+		o.skip2B = !enabled
+	}
+}
```

If you already gate this at runtime based on shard counts (e.g., large K+M), ignore the Option and consider documenting the threshold near the field.
galois_noasm.go (2)
15-26: 8-byte path looks correct; pre-load to improve alias safety and ILP (optional).
The logic is sound and will significantly cut per-byte overhead. For extra robustness when in/out alias in unusual ways and to improve instruction-level parallelism, pre-load the 4 x 16-bit lookups before any stores.
Apply this within the loop:
```diff
-	for len(in) >= 8 {
-		store16(out, mt16[load16(in, 0)], 0)
-		store16(out, mt16[load16(in, 2)], 2)
-		store16(out, mt16[load16(in, 4)], 4)
-		store16(out, mt16[load16(in, 6)], 6)
+	for len(in) >= 8 {
+		v0 := mt16[load16(in, 0)]
+		v1 := mt16[load16(in, 2)]
+		v2 := mt16[load16(in, 4)]
+		v3 := mt16[load16(in, 6)]
+		store16(out, v0, 0)
+		store16(out, v1, 2)
+		store16(out, v2, 4)
+		store16(out, v3, 6)
```

Minor style nit: using an index loop (i += 8) can avoid repeated slice header adjustments, but slicing is fine and clear.
40-51: XOR 8-byte path correct; consider pre-loading to limit read-after-write concerns (optional).
Semantics match the scalar path. As above, pre-loading out/in values before any stores can slightly improve ILP and makes aliasing unquestionably safe.
Suggested tweak inside the loop:
```diff
-	for len(in) >= 8 {
-		store16(out, load16(out, 0)^mt16[load16(in, 0)], 0)
-		store16(out, load16(out, 2)^mt16[load16(in, 2)], 2)
-		store16(out, load16(out, 4)^mt16[load16(in, 4)], 4)
-		store16(out, load16(out, 6)^mt16[load16(in, 6)], 6)
+	for len(in) >= 8 {
+		o0, o1 := load16(out, 0), load16(out, 2)
+		o2, o3 := load16(out, 4), load16(out, 6)
+		v0, v1 := mt16[load16(in, 0)], mt16[load16(in, 2)]
+		v2, v3 := mt16[load16(in, 4)], mt16[load16(in, 6)]
+		store16(out, o0^v0, 0)
+		store16(out, o1^v1, 2)
+		store16(out, o2^v2, 4)
+		store16(out, o3^v3, 6)
```

benchmark/main.go (2)
43-43: New -fast1 flag: nice toggle for 1-parity optimization.
Note: When codec=xor, WithFastOneParityMatrix() is already included; adding -fast1 is redundant but harmless.
115-142: Close CPU profile file to avoid descriptor leak (ordering matters).
You start CPU profiling but never close the file. While StopCPUProfile flushes data, closing the file is good hygiene. Ensure StopCPUProfile runs before Close.
Apply this diff:
```diff
 	if cpuprofile != "" {
 		f, err := os.Create(cpuprofile)
 		if err != nil {
 			log.Fatal(err)
 		}
 		pprof.StartCPUProfile(f)
-		defer pprof.StopCPUProfile()
+		// Ensure Stop runs before closing the file.
+		defer f.Close()
+		defer pprof.StopCPUProfile()
 	}
```

Trace and heap profile blocks already defer Stop/Write before Close correctly. No change needed there.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (9)
benchmark/main.go (5 hunks)
galois.go (2 hunks)
galois_amd64.go (2 hunks)
galois_noasm.go (2 hunks)
galois_nopshufb_amd64.go (2 hunks)
options.go (1 hunks)
reedsolomon.go (2 hunks)
unsafe.go (3 hunks)
unsafe_disabled.go (3 hunks)
🔥 Files not summarized due to errors (1)
- galois.go: Error: Server error: no LLM provider could handle the message
🚧 Files skipped from review as they are similar to previous changes (5)
- reedsolomon.go
- galois_nopshufb_amd64.go
- galois_amd64.go
- unsafe_disabled.go
- unsafe.go
👮 Files not reviewed due to content moderation or server errors (1)
- galois.go
🧰 Additional context used
🧬 Code graph analysis (1)
benchmark/main.go (1)
options.go (1)
WithFastOneParityMatrix(259-263)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: build (1.25.x, macos-latest)
- GitHub Check: build (1.24.x, macos-latest)
- GitHub Check: build (1.25.x, ubuntu-latest)
- GitHub Check: build (1.23.x, windows-latest)
- GitHub Check: build (1.24.x, windows-latest)
- GitHub Check: build (1.25.x, windows-latest)
- GitHub Check: build (1.24.x, ubuntu-latest)
- GitHub Check: build (1.23.x, ubuntu-latest)
- GitHub Check: build (1.23.x, macos-latest)
- GitHub Check: build-special
🔇 Additional comments (6)
galois_noasm.go (2)
28-31: Tail write with store8: LGTM.
Replacing the per-byte assignment with store8 maintains semantics and consistency with the unsafe helpers. No concerns.
53-55: Tail XOR store with load8/store8: LGTM.
The change preserves behavior and aligns with the helper API. Looks good.
benchmark/main.go (4)
9-16: Profiling imports: good addition.
The added imports are used correctly below. No issues.
52-52: Grouping profiling vars: LGTM.
Clear and conventional naming.
71-75: Flag plumbing for profiles: LGTM.
Flag names are sensible and discoverable.
457-459: Hooking fast1 into getOptions: LGTM.
Option application is idempotent; works across codecs. No concerns.
Actionable comments posted: 0
🧹 Nitpick comments (5)
benchmark/main.go (5)
9-10: Tidy imports: you can drop io/log after the refactors below
If you switch the prefill to rng.Read and unify error handling to use exitErr, the io and log imports become unused. Safe to remove.
- "io" - "log" "math" "math/rand" "os" "runtime" "runtime/debug" - "runtime/pprof" - "runtime/trace" + "runtime/pprof" + "runtime/trace"Also applies to: 16-17
44-44: Clarify -fast1 flag help to set user expectations
Fast-one-parity only applies when m=1. Suggest making that explicit in the help to avoid confusion.
```diff
-	fast1 = flag.Bool("fast1", false, "Use fast 1 parity encoder (if available)")
+	fast1 = flag.Bool("fast1", false, "Use fast 1 parity encoder (only when m=1; ignored otherwise)")
```
116-143: Profile lifecycle polish: close files, handle Start errors, and GC before heap profile
- CPU: currently the file handle isn’t explicitly closed; close it after Stop to avoid descriptor leaks and ensure flush.
- CPU/Trace: check Start errors and close the file on failure.
- Mem: call runtime.GC() before writing the heap profile so it reflects up-to-date stats.
- Consistency: prefer exitErr over log.Fatal to match the rest of the tool’s error handling.
```diff
 	if cpuprofile != "" {
-		f, err := os.Create(cpuprofile)
-		if err != nil {
-			log.Fatal(err)
-		}
-		pprof.StartCPUProfile(f)
-		defer pprof.StopCPUProfile()
+		f, err := os.Create(cpuprofile)
+		if err != nil {
+			exitErr(err)
+		}
+		if err := pprof.StartCPUProfile(f); err != nil {
+			_ = f.Close()
+			exitErr(err)
+		}
+		defer func() {
+			pprof.StopCPUProfile()
+			_ = f.Close()
+		}()
 	}
 	if memprofile != "" {
-		f, err := os.Create(memprofile)
-		if err != nil {
-			log.Fatal(err)
-		}
-		defer f.Close()
-		defer pprof.WriteHeapProfile(f)
+		f, err := os.Create(memprofile)
+		if err != nil {
+			exitErr(err)
+		}
+		defer func() { _ = f.Close() }()
+		defer func() {
+			// Ensure the profile reflects up-to-date heap stats.
+			runtime.GC()
+			_ = pprof.WriteHeapProfile(f)
+		}()
 	}
 	if traceprofile != "" {
-		f, err := os.Create(traceprofile)
-		if err != nil {
-			log.Fatal(err)
-		}
-		defer f.Close()
-		err = trace.Start(f)
-		if err != nil {
-			log.Fatal(err)
-		}
-		defer trace.Stop()
+		f, err := os.Create(traceprofile)
+		if err != nil {
+			exitErr(err)
+		}
+		if err := trace.Start(f); err != nil {
+			_ = f.Close()
+			exitErr(err)
+		}
+		defer func() {
+			trace.Stop()
+			_ = f.Close()
+		}()
 	}
```
150-154: Simplify prefill: use rng.Read and drop io.ReadFull
math/rand.Rand.Read always fills the buffer and returns a nil error. You can simplify the loop and, if you adopt this, remove the io import.
```diff
-	for j := range data[i] {
-		// We fill with random data, otherwise lookups may not use the full tables
-		_, err := io.ReadFull(rng, data[i][j])
-		exitErr(err)
-	}
+	for j := range data[i] {
+		// We fill with random data, otherwise lookups may not use the full tables
+		_, _ = rng.Read(data[i][j])
+	}
```
464-466: Gate -fast1 to m=1 and warn when ignored
The option is a no-op for m>1 in reedsolomon.New, but making it explicit avoids user confusion. Optional: print a warning when ignored.
```diff
-	if *fast1 {
-		o = append(o, reedsolomon.WithFastOneParityMatrix())
-	}
+	if *fast1 && *mShards == 1 {
+		o = append(o, reedsolomon.WithFastOneParityMatrix())
+	} else if *fast1 && *mShards != 1 {
+		fmt.Fprintln(os.Stderr, "WARN: -fast1 only applies when -m=1; ignoring")
+	}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
benchmark/main.go (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
benchmark/main.go (4)
reedsolomon.go (1)
New(413-582)
unsafe.go (1)
AllocAligned(19-43)
unsafe_disabled.go (1)
AllocAligned(17-27)
options.go (1)
WithFastOneParityMatrix(259-263)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: build (1.23.x, macos-latest)
- GitHub Check: build (1.25.x, ubuntu-latest)
- GitHub Check: build (1.25.x, windows-latest)
- GitHub Check: build (1.23.x, windows-latest)
- GitHub Check: build (1.24.x, macos-latest)
- GitHub Check: build (1.25.x, macos-latest)
- GitHub Check: build (1.24.x, windows-latest)
- GitHub Check: build (1.24.x, ubuntu-latest)
- GitHub Check: build (1.23.x, ubuntu-latest)
- GitHub Check: build-special
🔇 Additional comments (3)
benchmark/main.go (3)
53-53: LGTM: profile flag variables
Declaring cpuprofile, memprofile, and traceprofile as package-level strings is fine for flag wiring.
73-76: LGTM: profiling flags are wired correctly
Flags parse into the declared vars and are optional. Names are consistent with common conventions (cpu.profile, mem.profile, trace.profile).
147-147: LGTM: deterministic RNG for reproducible benches
Seeding with 0 is a good call for reproducibility across runs and platforms.
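For illustration, a deterministic prefill along the lines discussed here can be as small as the sketch below; the 10+4 shard count and 1 MiB shard size are assumed values, not taken from the benchmark.

```go
rng := rand.New(rand.NewSource(0)) // fixed seed: identical shard data every run
shards := reedsolomon.AllocAligned(10+4, 1<<20)
for _, shard := range shards {
	rng.Read(shard) // math/rand's Read always fills the slice and returns a nil error
}
```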
Add a 32MB lookup table that allows looking up 2 values at once. Will only be used when no assembly functions are available.
Benchmarks: (with no assembly)
Disabled when shard count gets high as speed degenerates.