LoongArch64: add lsx support #976

Merged
merged 1 commit into from
Jul 10, 2025

Xiao-Tao
Contributor

Binutils 2.41 and GCC 14.1.0 provide complete LSX and LASX support.
More details: https://github.com/loongson/build-tools/wiki
With LSX enabled, performance improves by 5-10%.

Reference documents for LSX on the LoongArch64 architecture:
LoongArch SX Vector Intrinsics: https://gcc.gnu.org/onlinedocs/gcc/LoongArch-SX-Vector-Intrinsics.html
Unofficial LoongArch Intrinsics Guide: https://jia.je/unofficial-loongarch-intrinsics-guide/
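
For readers unfamiliar with these intrinsics, here is a minimal sketch of the kind of byte scan that LSX can speed up in a URL parser. It is not taken from this PR's diff: the helper name `contains_tab_or_newline` is hypothetical, and the vector path assumes the `<lsxintrin.h>` header and the `__loongarch_sx` macro that GCC provides when building with `-mlsx`. On any other target the scalar fallback is used.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

#if defined(__loongarch_sx)
#include <lsxintrin.h>  // LSX intrinsics; the macro is set when building with -mlsx
#endif

// Hypothetical helper for illustration (not code from this PR): reports
// whether `input` contains a tab, line feed, or carriage return.
inline bool contains_tab_or_newline(std::string_view input) noexcept {
#if defined(__loongarch_sx)
  if (input.size() >= 16) {
    // Broadcast each target byte to all 16 lanes of a 128-bit register.
    const __m128i tab = __lsx_vrepli_b('\t');
    const __m128i lf = __lsx_vrepli_b('\n');
    const __m128i cr = __lsx_vrepli_b('\r');
    // Lanes of the result are all-ones wherever a target byte was seen.
    auto match = [&](__m128i block) {
      return __lsx_vor_v(__lsx_vseq_b(block, tab),
                         __lsx_vor_v(__lsx_vseq_b(block, lf),
                                     __lsx_vseq_b(block, cr)));
    };
    __m128i found = __lsx_vrepli_b(0);
    std::size_t i = 0;
    for (; i + 16 <= input.size(); i += 16) {
      // 128-bit load of the next 16 input bytes.
      found = __lsx_vor_v(found, match(__lsx_vld(input.data() + i, 0)));
    }
    if (i < input.size()) {
      // Cover the tail by re-reading the last 16 bytes; the overlap with
      // bytes already scanned is harmless because results are OR-ed.
      found = __lsx_vor_v(
          found, match(__lsx_vld(input.data() + input.size() - 16, 0)));
    }
    return __lsx_bnz_v(found) != 0;  // non-zero if any bit of `found` is set
  }
#endif
  // Scalar path: short inputs, or builds without LSX.
  return std::any_of(input.begin(), input.end(), [](char c) {
    return c == '\t' || c == '\n' || c == '\r';
  });
}
```

Calling `contains_tab_or_newline("https://example.com/\tpath")` returns `true` on either path. Accumulating matches with OR and testing once at the end keeps the loop free of branches, at the cost of not exiting early.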

It currently passes all tests on LoongArch 2K2000 (which supports only LSX), 3C5000, and 3A6000 machines.

LoongArch64 2K2000 machine benchmark test results

main:
Loading /home/zhoumt/ada/build/_deps/url-dataset-src/out.txt
2025-07-10T09:46:19+08:00
Running ./benchmarks/benchdata
Run on (2 X 1400 MHz CPU s)
CPU Caches:
  L1 Instruction 64 KiB (x2)
  L1 Data 64 KiB (x2)
  L2 Unified 2048 KiB (x1)
Load Average: 0.30, 0.20, 0.18
ada spec: Ada follows whatwg/url
bad urls: ---------------------
ada---count of bad URLs       26
servo/url---count of bad URLs 26
whatwg---count of bad URLs    26
-------------------------------

bytes/URL: 86.859205
curl : OMITTED
input bytes: 8688092
number of URLs: 100025
performance counters: Enabled
rust version : 1.82.0
zuri : OMITTED
--------------------------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------
BasicBench_AdaURL_href             120769756 ns    120679607 ns            6 GHz=1.3967 cycle/byte=19.54 cycles/url=1.69723k instructions/byte=43.5421 instructions/cycle=2.22835 instructions/ns=3.11235 instructions/url=3.78203k ns/url=1.21517k speed=71.993M/s time/byte=13.8902ns time/url=1.20649us url/s=828.848k/s
BasicBench_AdaURL_aggregator_href   95760157 ns     95627690 ns            7 GHz=1.39657 cycle/byte=15.2875 cycles/url=1.32786k instructions/byte=34.3532 instructions/cycle=2.24714 instructions/ns=3.1383 instructions/url=2.98389k ns/url=950.801 speed=90.8533M/s time/byte=11.0068ns time/url=956.038ns url/s=1.04598M/s
BasicBench_AdaURL_CanParse          63078339 ns     63040431 ns           11 GHz=1.39484 cycle/byte=10.1341 cycles/url=880.236 instructions/byte=23.955 instructions/cycle=2.36382 instructions/ns=3.29715 instructions/url=2.08072k ns/url=631.065 speed=137.818M/s time/byte=7.25596ns time/url=630.247ns url/s=1.58668M/s


LSX:
--------------------------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------
BasicBench_AdaURL_href             107859129 ns    107757937 ns            6 GHz=1.39661 cycle/byte=17.4756 cycles/url=1.51792k instructions/byte=38.1925 instructions/cycle=2.18548 instructions/ns=3.05226 instructions/url=3.31737k ns/url=1.08686k speed=80.626M/s time/byte=12.4029ns time/url=1.07731us url/s=928.238k/s
BasicBench_AdaURL_aggregator_href   83822883 ns     83799530 ns            8 GHz=1.39607 cycle/byte=13.4636 cycles/url=1.16944k instructions/byte=29.004 instructions/cycle=2.15425 instructions/ns=3.00749 instructions/url=2.51926k ns/url=837.663 speed=103.677M/s time/byte=9.64533ns time/url=837.786ns url/s=1.19362M/s
BasicBench_AdaURL_CanParse          51950801 ns     51938152 ns           13 GHz=1.39482 cycle/byte=8.34759 cycles/url=725.065 instructions/byte=18.6054 instructions/cycle=2.22883 instructions/ns=3.10883 instructions/url=1.61605k ns/url=519.825 speed=167.278M/s time/byte=5.97808ns time/url=519.252ns url/s=1.92585M/s

LoongArch64 3C5000 machine benchmark test results

main:

Loading /home/zhoumt/ada/build/_deps/url-dataset-src/out.txt
2025-07-09T20:47:58+08:00
Running ./benchmarks/benchdata
Run on (64 X 2200 MHz CPU s)
CPU Caches:
  L1 Instruction 64 KiB (x64)
  L1 Data 64 KiB (x64)
  L2 Unified 256 KiB (x64)
  L3 Unified 16384 KiB (x4)
Load Average: 0.00, 0.00, 0.00
ada spec: Ada follows whatwg/url
bad urls: ---------------------
ada---count of bad URLs       26
servo/url---count of bad URLs 26
whatwg---count of bad URLs    26
-------------------------------

bytes/URL: 86.859205
curl : OMITTED
input bytes: 8688092
number of URLs: 100025
performance counters: Enabled
rust version : 1.82.0
zuri : OMITTED
--------------------------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------
BasicBench_AdaURL_href              70237332 ns     70219095 ns           10 GHz=2.19717 cycle/byte=17.9387 cycles/url=1.55814k instructions/byte=43.1562 instructions/cycle=2.40577 instructions/ns=5.28588 instructions/url=3.74852k ns/url=709.157 speed=123.728M/s time/byte=8.08222ns time/url=702.015ns url/s=1.42447M/s
BasicBench_AdaURL_aggregator_href   52536374 ns     52522557 ns           13 GHz=2.19698 cycle/byte=13.2874 cycles/url=1.15414k instructions/byte=34.1956 instructions/cycle=2.57353 instructions/ns=5.65398 instructions/url=2.97021k ns/url=525.33 speed=165.416M/s time/byte=6.04535ns time/url=525.094ns url/s=1.90442M/s
BasicBench_AdaURL_CanParse          33083323 ns     33083240 ns           21 GHz=2.19776 cycle/byte=8.37497 cycles/url=727.443 instructions/byte=23.6957 instructions/cycle=2.82935 instructions/ns=6.21822 instructions/url=2.05819k ns/url=330.993 speed=262.613M/s time/byte=3.80788ns time/url=330.75ns url/s=3.02343M/s


LSX:
--------------------------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------
BasicBench_AdaURL_href              61362122 ns     61356556 ns           11 GHz=2.19716 cycle/byte=15.6198 cycles/url=1.35673k instructions/byte=37.825 instructions/cycle=2.4216 instructions/ns=5.32066 instructions/url=3.28545k ns/url=617.49 speed=141.6M/s time/byte=7.06214ns time/url=613.412ns url/s=1.63023M/s
BasicBench_AdaURL_aggregator_href   45419574 ns     45418890 ns           15 GHz=2.1973 cycle/byte=11.4936 cycles/url=998.327 instructions/byte=28.8651 instructions/cycle=2.51141 instructions/ns=5.51831 instructions/url=2.5072k ns/url=454.342 speed=191.288M/s time/byte=5.22772ns time/url=454.075ns url/s=2.20228M/s
BasicBench_AdaURL_CanParse          26432103 ns     26432046 ns           26 GHz=2.19764 cycle/byte=6.69288 cycles/url=581.338 instructions/byte=18.3647 instructions/cycle=2.74392 instructions/ns=6.03016 instructions/url=1.59515k ns/url=264.528 speed=328.695M/s time/byte=3.04233ns time/url=264.254ns url/s=3.78423M/s

LoongArch64 3A6000 machine benchmark test results

main:
Loading /home/zhoumt/ada/build/_deps/url-dataset-src/out.txt
2025-07-09T20:28:22+08:00
Running ./benchmarks/benchdata
Run on (8 X 2500 MHz CPU s)
CPU Caches:
  L1 Instruction 64 KiB (x8)
  L1 Data 64 KiB (x8)
  L2 Unified 256 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 0.25, 0.06, 0.02
ada spec: Ada follows whatwg/url
bad urls: ---------------------
ada---count of bad URLs       26
servo/url---count of bad URLs 26
whatwg---count of bad URLs    26
-------------------------------

bytes/URL: 86.859205
curl : OMITTED
input bytes: 8688092
number of URLs: 100025
performance counters: Enabled
rust version : 1.85.0
zuri : OMITTED
--------------------------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------
BasicBench_AdaURL_href              47189615 ns     47152658 ns           15 GHz=2.49793 cycle/byte=13.7227 cycles/url=1.19194k instructions/byte=38.2253 instructions/cycle=2.78555 instructions/ns=6.95809 instructions/url=3.32022k ns/url=477.174 speed=184.255M/s time/byte=5.42727ns time/url=471.409ns url/s=2.1213M/s
BasicBench_AdaURL_aggregator_href   30084600 ns     30060977 ns           23 GHz=2.49791 cycle/byte=8.65737 cycles/url=751.972 instructions/byte=27.1609 instructions/cycle=3.13732 instructions/ns=7.83674 instructions/url=2.35918k ns/url=301.041 speed=289.016M/s time/byte=3.46002ns time/url=300.535ns url/s=3.3274M/s
BasicBench_AdaURL_CanParse          21901334 ns     21884383 ns           32 GHz=2.498 cycle/byte=6.29039 cycles/url=546.378 instructions/byte=20.2599 instructions/cycle=3.22077 instructions/ns=8.04546 instructions/url=1.75976k ns/url=218.727 speed=397M/s time/byte=2.51889ns time/url=218.789ns url/s=4.57061M/s


LSX:
--------------------------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------
BasicBench_AdaURL_href              44818268 ns     44785516 ns           16 GHz=2.49801 cycle/byte=13.0466 cycles/url=1.13322k instructions/byte=35.7832 instructions/cycle=2.74272 instructions/ns=6.85134 instructions/url=3.1081k ns/url=453.648 speed=193.993M/s time/byte=5.15482ns time/url=447.743ns url/s=2.23342M/s
BasicBench_AdaURL_aggregator_href   29356161 ns     29334797 ns           24 GHz=2.49795 cycle/byte=8.44307 cycles/url=733.358 instructions/byte=25.3689 instructions/cycle=3.0047 instructions/ns=7.50558 instructions/url=2.20352k ns/url=293.584 speed=296.17M/s time/byte=3.37644ns time/url=293.275ns url/s=3.40977M/s
BasicBench_AdaURL_CanParse          18897944 ns     18883385 ns           37 GHz=2.49848 cycle/byte=5.43229 cycles/url=471.845 instructions/byte=18.1011 instructions/cycle=3.33212 instructions/ns=8.32526 instructions/url=1.57224k ns/url=188.852 speed=460.092M/s time/byte=2.17348ns time/url=188.787ns url/s=5.29698M/s
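
To relate the raw timings to a relative improvement, the change can be computed directly from the Time column of the tables above. The sketch below does only that arithmetic. The struct and function names are made up for illustration, and the numbers are the wall-clock values copied from the logs.

```cpp
#include <cstdio>

// One benchmark row: wall-clock "Time" in ns, for main and for the LSX build.
struct Run {
  const char* machine;
  const char* bench;
  double main_ns;
  double lsx_ns;
};

// Values copied from the benchmark logs in this PR description.
constexpr Run kRuns[] = {
    {"2K2000", "href", 120769756, 107859129},
    {"2K2000", "aggregator_href", 95760157, 83822883},
    {"2K2000", "CanParse", 63078339, 51950801},
    {"3C5000", "href", 70237332, 61362122},
    {"3C5000", "aggregator_href", 52536374, 45419574},
    {"3C5000", "CanParse", 33083323, 26432103},
    {"3A6000", "href", 47189615, 44818268},
    {"3A6000", "aggregator_href", 30084600, 29356161},
    {"3A6000", "CanParse", 21901334, 18897944},
};

// Speedup factor: values above 1 mean the LSX build is faster.
constexpr double speedup(double main_ns, double lsx_ns) {
  return main_ns / lsx_ns;
}

// Share of the original run time that the LSX build saves, in percent.
constexpr double time_saved_percent(double main_ns, double lsx_ns) {
  return 100.0 * (main_ns - lsx_ns) / main_ns;
}

// Prints one summary line per benchmark row.
inline void print_summary() {
  for (const Run& r : kRuns) {
    std::printf("%-7s %-16s speedup=%.3fx time saved=%.1f%%\n", r.machine,
                r.bench, speedup(r.main_ns, r.lsx_ns),
                time_saved_percent(r.main_ns, r.lsx_ns));
  }
}
```

Calling `print_summary()` from `main` gives a per-machine, per-benchmark view. Note that speedup and time saved are different measures of the same change, so it helps to say which one a quoted percentage refers to.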

@Xiao-Tao Xiao-Tao mentioned this pull request Jul 10, 2025
@anonrig anonrig requested review from lemire and anonrig July 10, 2025 11:08
@lemire
Member

lemire commented Jul 10, 2025

@Xiao-Tao I see that CI was merged from a separate PR.

Could you sync with our main branch?

@Xiao-Tao
Contributor Author

Thank you. CI has been re-triggered and all tests have passed. @lemire @anonrig

@anonrig anonrig merged commit dd38444 into ada-url:main Jul 10, 2025
48 checks passed
@anonrig
Member

anonrig commented Jul 10, 2025

Thanks for the contribution @Xiao-Tao

@Xiao-Tao Xiao-Tao deleted the lsx-support branch July 11, 2025 00:51