F90FLAGS = -FR -fpp -r8 -O3 -xAVX -shared_intel -I$(MKLROOT)/include -I$(MKLROOT)/include/fftw
Name | Description |
---|---|
# of Cores | Number of used CPU cores |
# of Processes | Number of MPI processes |
# of Threads | Number of threads for each process |
$l_{max}$ | Truncation level for spherical harmonics |
$N_{C}$ | Truncation level for Chebyshev polynomials |
$(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
Elapsed | Elapsed (wall-clock) time for one time step |
Nonlinear | Elapsed (wall-clock) time for nonlinear terms (including communications) |
Solver | Elapsed (wall-clock) time for linear calculation |
Comm. | Elapsed (wall-clock) time for data communication |
Efficiency | Parallel efficiency |
SUs | Service units (core hours) for $10^{4}$ time steps |
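
The SUs values in the tables on this page appear to follow directly from the elapsed time per step and the number of cores. The following is a minimal sketch in Python, assuming SUs = elapsed time per step × $10^{4}$ steps × cores / 3600 seconds per hour. The function name `service_units` is hypothetical and is not part of the benchmarked code.

```python
def service_units(elapsed_per_step, cores, steps=10_000):
    """Estimate core hours consumed by `steps` time steps.

    elapsed_per_step: wall-clock seconds for one time step
    cores:            number of CPU cores used
    steps:            number of time steps (10^4 in the tables)

    Assumption: SUs are plain core hours, i.e. wall-clock hours
    multiplied by the number of cores.
    """
    wall_clock_hours = elapsed_per_step * steps / 3600.0
    return wall_clock_hours * cores


# Strong scaling row with 64 cores: Elapsed = 5.9551 s
print(round(service_units(5.9551, 64), 2))  # 1058.68
```

Under this assumption the sketch reproduces the tabulated SUs, for example 1058.68 for the 64-core strong scaling run.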
$N_{C}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
---|---|---|---|---|---|---|---|
48 | 47 | ( 73,72,144) | 1.604797 | 1.56274 | 0.042059 | 0.508469 | 71.3243 |
$N_{C}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
---|---|---|
(not listed) | 255 | (512,384,768) |
# of Cores | # of Processes | # of Threads | Elapsed | Nonlinear | Solver | Comm. | Efficiency | SUs |
---|---|---|---|---|---|---|---|---|
64 | 64 | 1 | 5.9551 | 2.97054 | 0.77414 | 2.21042 | 1 | 1058.68 |
128 | 128 | 1 | 3.12294 | 1.67361 | 0.436457 | 1.01287 | 0.953444 | 1110.38 |
256 | 256 | 1 | 1.49993 | 0.793621 | 0.224334 | 0.481976 | 0.992562 | 1066.62 |
512 | 512 | 1 | 0.901664 | 0.441342 | 0.145236 | 0.315087 | 0.82557 | 1282.37 |
1024 | 1024 | 1 | 0.457387 | 0.219177 | 0.0752801 | 0.16293 | 0.813738 | 1301.01 |
2048 | 2048 | 1 | 0.322376 | 0.136652 | 0.0529017 | 0.132822 | 0.577267 | 1833.96 |
4096 | 4096 | 1 | 0.185785 | 0.0691138 | 0.0287718 | 0.0878993 | 0.500839 | 2113.82 |
Elapsed (wall-clock) time for strong scaling. The numbers indicate the number of OpenMP threads. Ideal scaling is plotted as a dotted line.
Parallel efficiency for strong scaling. The numbers indicate the number of OpenMP threads.
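
The Efficiency column of the strong scaling table is consistent with measuring each run against the 64-core run. The following is a minimal sketch in Python, assuming efficiency = (reference cores × reference elapsed time) / (cores × elapsed time). The names `STRONG_SCALING` and `parallel_efficiency` are hypothetical; the data are copied from the table.

```python
# (cores, elapsed seconds per step) copied from the strong scaling table
STRONG_SCALING = [
    (64, 5.9551),
    (128, 3.12294),
    (256, 1.49993),
    (512, 0.901664),
    (1024, 0.457387),
    (2048, 0.322376),
    (4096, 0.185785),
]


def parallel_efficiency(ref, run):
    """Strong scaling efficiency of `run` relative to `ref`.

    Both arguments are (cores, elapsed) tuples. A value of 1 means
    the elapsed time dropped in exact proportion to the added cores.
    Assumption: the smallest run is taken as the reference.
    """
    ref_cores, ref_elapsed = ref
    cores, elapsed = run
    return (ref_cores * ref_elapsed) / (cores * elapsed)


baseline = STRONG_SCALING[0]
for cores, elapsed in STRONG_SCALING:
    eff = parallel_efficiency(baseline, (cores, elapsed))
    print(f"{cores:5d} cores: efficiency = {eff:.4f}")
```

With the 64-core run as the reference, this gives about 0.50 at 4096 cores, in agreement with the tabulated value.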
# of Cores | # of Processes | # of Threads | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
---|---|---|---|---|---|---|---|---|---|
16 | 4 | 4 | 31 | (513,48,96) | 0.345327 | 0.334976 | 0.0103503 | 0.0346996 | 15.3479 |
64 | 16 | 4 | 63 | (513,96,192) | 0.377983 | 0.367511 | 0.0104712 | 0.0701478 | 67.197 |
256 | 64 | 4 | 127 | (513,192,384) | 0.506746 | 0.496344 | 0.0104008 | 0.215548 | 360.352 |
1024 | 256 | 4 | 255 | (513,384,768) | 0.523838 | 0.513385 | 0.0104525 | 0.175344 | 1490.03 |
4096 | 1024 | 4 | 511 | (513,768,1536) | 0.744473 | 0.733799 | 0.010673 | 0.386788 | 8470.44 |
Elapsed time for weak scaling in the horizontal resolution. Results with 4 OpenMP threads are shown. The ideal scaling for the Legendre transform is plotted as a dotted line.
# of Cores | # of Processes | # of Threads | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
---|---|---|---|---|---|---|---|---|---|
128 | 32 | 4 | 255 | (33,384,768) | 0.253204 | 0.248189 | 0.00501417 | 0.100608 | 90.0281 |
256 | 64 | 4 | 255 | (65,384,768) | 0.261203 | 0.256194 | 0.00500897 | 0.0883585 | 185.744 |
512 | 128 | 4 | 255 | (129,384,768) | 0.266168 | 0.261061 | 0.00510643 | 0.0962178 | 378.549 |
1024 | 256 | 4 | 255 | (257,384,768) | 0.303394 | 0.298234 | 0.00515566 | 0.145423 | 760.043 |
2048 | 512 | 4 | 255 | (513,384,768) | 0.276864 | 0.271508 | 0.00535592 | 0.118606 | 1575.05 |
4096 | 1024 | 4 | 255 | (1025,384,768) | 0.279425 | 0.27406 | 0.00536459 | 0.127257 | 3179.23 |
Elapsed time for weak scaling in the radial resolution. Results with 4 OpenMP threads are shown.