samhatfield
Collaborator

Draft PR demonstrating a potential method for pre-planning FFTs, which takes the planning out of the hot loop.
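
To illustrate the general idea of pre-planning, here is a minimal Python sketch. It is not the ecTrans implementation: the class `FFTPlanCache`, its methods, and the use of a twiddle-factor table as a stand-in for an expensive FFT plan are all hypothetical, and it assumes the set of transform lengths is known at setup time.

```python
import cmath


class FFTPlanCache:
    """Hypothetical plan cache. A "plan" here is just a twiddle-factor table,
    standing in for whatever expensive object a real FFT library would build."""

    def __init__(self):
        self._plans = {}
        self.plans_built = 0  # counts how often the expensive step ran

    def _build_plan(self, n):
        # Stand-in for the costly planning call.
        self.plans_built += 1
        return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]

    def preplan(self, lengths):
        # Setup stage: build one plan per distinct length, up front.
        for n in set(lengths):
            if n not in self._plans:
                self._plans[n] = self._build_plan(n)

    def execute(self, x):
        # Hot loop: only a dictionary lookup, never any planning.
        # Raises KeyError if the length was not pre-planned.
        n = len(x)
        w = self._plans[n]
        # Naive O(n^2) DFT using the cached twiddles (illustration only).
        return [sum(x[j] * w[(j * k) % n] for j in range(n)) for k in range(n)]


cache = FFTPlanCache()
cache.preplan([4, 8, 8])  # setup: two distinct lengths, so two plans

for _ in range(3):  # hot loop reuses the cached plan every iteration
    spectrum = cache.execute([1.0, 0.0, 0.0, 0.0])
```

The trade-off is the one this PR explores: planning cost moves from the first pass through the hot loop into setup, at the price of a longer initialisation.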

@samhatfield
Collaborator Author

Tested on LUMI-G.

develop branch:

ectrans_benchmark initialisation, on 8 tasks, took      5.43 sec

Running for 100 iterations with 3 extra warm-up iterations

time step      1 took******** | zspvor max err= 0.238E-06 | zspdiv max err= 0.238E-06 | zspsc3a max err= 0.238E-06 | zspsc2 max err= 0.238E-06
time step      2 took  1.4104 | zspvor max err= 0.477E-06 | zspdiv max err= 0.477E-06 | zspsc3a max err= 0.477E-06 | zspsc2 max err= 0.477E-06
time step      3 took  1.2918 | zspvor max err= 0.775E-06 | zspdiv max err= 0.775E-06 | zspsc3a max err= 0.715E-06 | zspsc2 max err= 0.715E-06
time step      4 took  1.2331 | zspvor max err= 0.101E-05 | zspdiv max err= 0.101E-05 | zspsc3a max err= 0.954E-06 | zspsc2 max err= 0.954E-06
time step      5 took  1.2333 | zspvor max err= 0.101E-05 | zspdiv max err= 0.101E-05 | zspsc3a max err= 0.107E-05 | zspsc2 max err= 0.107E-05
time step      6 took  1.2330 | zspvor max err= 0.119E-05 | zspdiv max err= 0.119E-05 | zspsc3a max err= 0.131E-05 | zspsc2 max err= 0.131E-05
time step      7 took  1.2335 | zspvor max err= 0.137E-05 | zspdiv max err= 0.137E-05 | zspsc3a max err= 0.131E-05 | zspsc2 max err= 0.131E-05
time step      8 took  1.2333 | zspvor max err= 0.137E-05 | zspdiv max err= 0.137E-05 | zspsc3a max err= 0.143E-05 | zspsc2 max err= 0.143E-05
time step      9 took  1.2336 | zspvor max err= 0.143E-05 | zspdiv max err= 0.143E-05 | zspsc3a max err= 0.191E-05 | zspsc2 max err= 0.191E-05
time step     10 took  1.2331 | zspvor max err= 0.137E-05 | zspdiv max err= 0.137E-05 | zspsc3a max err= 0.226E-05 | zspsc2 max err= 0.226E-05

preplan_fft branch:

ectrans_benchmark initialisation, on 8 tasks, took   1335.98 sec

Running for 100 iterations with 3 extra warm-up iterations

time step      1 took 43.9510 | zspvor max err= 0.238E-06 | zspdiv max err= 0.238E-06 | zspsc3a max err= 0.238E-06 | zspsc2 max err= 0.238E-06
time step      2 took  1.2594 | zspvor max err= 0.477E-06 | zspdiv max err= 0.477E-06 | zspsc3a max err= 0.477E-06 | zspsc2 max err= 0.477E-06
time step      3 took  1.2524 | zspvor max err= 0.775E-06 | zspdiv max err= 0.775E-06 | zspsc3a max err= 0.715E-06 | zspsc2 max err= 0.715E-06
time step      4 took  1.2518 | zspvor max err= 0.101E-05 | zspdiv max err= 0.101E-05 | zspsc3a max err= 0.954E-06 | zspsc2 max err= 0.954E-06
time step      5 took  1.2517 | zspvor max err= 0.101E-05 | zspdiv max err= 0.101E-05 | zspsc3a max err= 0.107E-05 | zspsc2 max err= 0.107E-05
time step      6 took  1.2513 | zspvor max err= 0.119E-05 | zspdiv max err= 0.119E-05 | zspsc3a max err= 0.131E-05 | zspsc2 max err= 0.131E-05
time step      7 took  1.2518 | zspvor max err= 0.137E-05 | zspdiv max err= 0.137E-05 | zspsc3a max err= 0.131E-05 | zspsc2 max err= 0.131E-05
time step      8 took  1.2517 | zspvor max err= 0.137E-05 | zspdiv max err= 0.137E-05 | zspsc3a max err= 0.143E-05 | zspsc2 max err= 0.143E-05
time step      9 took  1.2518 | zspvor max err= 0.143E-05 | zspdiv max err= 0.143E-05 | zspsc3a max err= 0.191E-05 | zspsc2 max err= 0.191E-05
time step     10 took  1.2515 | zspvor max err= 0.137E-05 | zspdiv max err= 0.137E-05 | zspsc3a max err= 0.226E-05 | zspsc2 max err= 0.226E-05

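Something is still taking a lot of time in the first step (about 44 s), but it's dramatically faster than before: on develop the first-step time overflows the output field. Most of the FFT initialisation cost has been shifted to the actual initialisation stage in SETUP_TRANS, which is why initialisation now takes 1335.98 s rather than 5.43 s.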
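
For comparing the two runs beyond the first step, a small Python sketch like the following could extract the per-step timings from the benchmark output. The helper `step_times` and the choice to average steps 4 onwards are my own assumptions (I am assuming the first three printed steps are the warm-up iterations); the log lines are the ones pasted above, truncated after the timing column.

```python
import re
import statistics

# Matches "time step N took X |", including the overflowed "********" field.
STEP_RE = re.compile(r"time step\s+(\d+) took\s*(\S+?)\s*\|")


def step_times(log_text):
    """Return {step: seconds}, with None where the field overflowed."""
    times = {}
    for m in STEP_RE.finditer(log_text):
        step, raw = int(m.group(1)), m.group(2)
        times[step] = None if set(raw) == {"*"} else float(raw)
    return times


DEVELOP = """
time step      1 took******** |
time step      2 took  1.4104 |
time step      3 took  1.2918 |
time step      4 took  1.2331 |
time step      5 took  1.2333 |
time step      6 took  1.2330 |
time step      7 took  1.2335 |
time step      8 took  1.2333 |
time step      9 took  1.2336 |
time step     10 took  1.2331 |
"""

PREPLAN = """
time step      1 took 43.9510 |
time step      2 took  1.2594 |
time step      3 took  1.2524 |
time step      4 took  1.2518 |
time step      5 took  1.2517 |
time step      6 took  1.2513 |
time step      7 took  1.2518 |
time step      8 took  1.2517 |
time step      9 took  1.2518 |
time step     10 took  1.2515 |
"""


def steady_mean(times, first_step=4):
    # Average only the steps from first_step on, skipping overflowed entries.
    vals = [t for s, t in times.items() if s >= first_step and t is not None]
    return statistics.mean(vals)


times_dev = step_times(DEVELOP)
times_pre = step_times(PREPLAN)
dev_mean = steady_mean(times_dev)
pre_mean = steady_mean(times_pre)
print(f"develop: {dev_mean:.4f} s/step, preplan_fft: {pre_mean:.4f} s/step")
```

On the ten steps shown, this gives roughly 1.2333 s per step for develop and 1.2517 s per step for preplan_fft over steps 4 to 10.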

Pinging @PaulMullowney for his interest.

@PaulMullowney
Contributor

Thanks @samhatfield, I will pull this branch and start playing around with it.

@samhatfield changed the title from "Pre-plan FFTs" to "[DRAFT] Pre-plan FFTs" on Aug 6, 2025
@samhatfield
Collaborator Author

Branch not intended for merging so I will close this.
