Compile options
F90OPTFLAGS = -r8 -i4 -ftz -IPF_fma -IPF_fltacc -WB -O2
Definition of columns
Name | Definition |
# of Cores | Number of used CPU cores |
# of Processes | Number of MPI processes |
# of Threads | Number of OpenMP threads per process (the "# of SMP" column in the tables) |
$N_{c}$ | Truncation level for Chebyshev polynomials |
$l_{max}$ | Truncation level for spherical harmonics |
$(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
Elapsed | Elapsed (wall clock) time for one time step |
Nonlinear | Elapsed (wall clock) time for the nonlinear terms (including communication) |
Solver | Elapsed (wall clock) time for the linear calculation |
Comm. | Elapsed (wall clock) time for data communication |
Efficiency | Parallel efficiency relative to the run with the fewest cores in each table |
SUs | Service units for $10^{4}$ time steps (core hours) |
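The SUs column can be reproduced from the Elapsed column. The sketch below assumes that SUs are core hours for $10^{4}$ time steps and that whole nodes of 16 cores are charged. The node size is not stated on this page; it is inferred from the tabulated values, and the function name is hypothetical.

```python
import math

STEPS = 10**4          # number of time steps the SUs column refers to
CORES_PER_NODE = 16    # assumption: inferred from the tables, not stated on this page


def service_units(n_cores, elapsed_per_step):
    """Core hours for STEPS time steps, charging whole nodes (assumed)."""
    nodes = math.ceil(n_cores / CORES_PER_NODE)
    charged_cores = nodes * CORES_PER_NODE
    return charged_cores * elapsed_per_step * STEPS / 3600.0


if __name__ == "__main__":
    # 32 cores, 0.719657 s per step (a strong scaling run)
    print(round(service_units(32, 0.719657), 4))
    # 5 cores occupy one full node under the assumption above
    print(round(service_units(5, 0.252394), 4))
```

Under this assumption a run on 5 cores is charged the same number of cores as a run on 16, which would explain why SUs do not grow in proportion to the core count for the small runs.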
Two Processor Result
$N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
| | | | | | | |
Strong Scaling Results
$N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
192 | 170 | (193,256,256) |
# of Cores | # of Processes | # of SMP | Elapsed | Nonlinear | Solver | Comm. | Efficiency | SUs |
32 | 32 | 1 | 0.719657 | 0.518729 | 0.200927 | 0.182378 | 1 | 63.9695 |
64 | 64 | 1 | 0.545835 | 0.336805 | 0.209030 | 0.21079 | 0.659226 | 97.0373 |
128 | 128 | 1 | 0.285552 | 0.19960 | 0.085952 | 0.174127 | 0.630058 | 101.529 |
$N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
256 | 341 | (257,512,512) |
# of Cores | # of Processes | # of SMP | Elapsed | Nonlinear | Solver | Comm. | Efficiency | SUs |
86 | 86 | 1 | 2.47102 | 1.88142 | 0.589604 | 0.379399 | 1 | 658.939 |
128 | 128 | 1 | 2.32883 | 1.91045 | 0.418383 | 0.857392 | 0.712896 | 828.03 |
129 | 129 | 1 | 2.09683 | 1.68765 | 0.409184 | 0.663364 | 0.785635 | 838.734 |
256 | 256 | 1 | 1.41293 | 1.13385 | 0.279076 | 0.675334 | 0.58751 | 1004.75 |
257 | 257 | 1 | 1.33908 | 1.06534 | 0.273739 | 0.621596 | 0.617497 | 1011.75 |
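The efficiencies of the first strong scaling table (32, 64, and 128 cores) can be reproduced with the usual strong scaling definition, $E = (T_{ref} N_{ref}) / (T_{N} N)$, taking the 32-core run as the reference. A minimal sketch under that assumption (the function name is hypothetical):

```python
def parallel_efficiency(ref_cores, ref_elapsed, cores, elapsed):
    """Strong scaling efficiency relative to a reference run."""
    return (ref_cores * ref_elapsed) / (cores * elapsed)


if __name__ == "__main__":
    ref = (32, 0.719657)  # reference run: 32 cores, elapsed seconds per step
    for cores, elapsed in [(32, 0.719657), (64, 0.545835), (128, 0.285552)]:
        print(cores, round(parallel_efficiency(*ref, cores, elapsed), 6))
```

An efficiency of 1 means the elapsed time falls in exact proportion to the number of cores added.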
Elapsed (wall clock) time for strong scaling. The numbers indicate the number of OpenMP threads. The dotted line shows ideal scaling.
Parallel efficiency for strong scaling.
Weak Scaling Results
# of Cores | # of Processes | # of SMP | $N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
4 | 4 | 1 | 256 | 42 | (257,64,64) | 0.322194 | 0.165516 | 0.156679 | 0.0525075 | 14.3197 |
16 | 16 | 1 | 256 | 85 | (257,128,128) | 0.406717 | 0.204322 | 0.202394 | 0.0716188 | 18.0763 |
64 | 64 | 1 | 256 | 170 | (257,256,256) | 0.743692 | 0.44779 | 0.295902 | 0.312471 | 132.212 |
256 | 256 | 1 | 256 | 341 | (257,512,512) | 1.41293 | 1.13385 | 0.279076 | 0.675334 | 1004.75 |
| | | | | | | | | | |
5 | 5 | 1 | 256 | 42 | (257,64,64) | 0.252394 | 0.127396 | 0.124998 | 0.0395718 | 11.2175 |
17 | 17 | 1 | 256 | 85 | (257,128,128) | 0.359973 | 0.180849 | 0.179124 | 0.0811399 | 31.9976 |
65 | 65 | 1 | 256 | 170 | (257,256,256) | 0.639472 | 0.363557 | 0.275915 | 0.227634 | 142.105 |
257 | 257 | 1 | 256 | 341 | (257,512,512) | 1.33908 | 1.06534 | 0.273739 | 0.621596 | 1011.75 |
Elapsed time for weak scaling in the horizontal directions. The dotted line shows $O(N_{core}^{1/2})$ scaling (ideal scaling for the Legendre transform).
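The $O(N_{core}^{1/2})$ reference can be compared with the measured times as sketched below. The sketch assumes the reference curve is anchored at the smallest run; the page does not say how the plotted curve is normalized, and the function name is hypothetical. In these runs $l_{max}$ roughly doubles each time the core count quadruples.

```python
import math


def sqrt_reference(cores, ref_cores, ref_elapsed):
    """Elapsed time expected from O(Ncore^(1/2)) growth, anchored at a reference run."""
    return ref_elapsed * math.sqrt(cores / ref_cores)


if __name__ == "__main__":
    # Measured (cores, elapsed) pairs from the first block of the table above
    measured = [(4, 0.322194), (16, 0.406717), (64, 0.743692), (256, 1.41293)]
    ref_cores, ref_elapsed = measured[0]  # assumption: anchor at the 4-core run
    for cores, elapsed in measured:
        reference = sqrt_reference(cores, ref_cores, ref_elapsed)
        print(cores, elapsed, round(reference, 6))
```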
# of Cores | # of Processes | # of SMP | $N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
8 | 8 | 1 | 16 | 341 | (17,512,512) | 1.28171 | 1.20689 | 0.0748194 | 0.372245 | 56.9651 |
16 | 16 | 1 | 32 | 341 | (33,512,512) | 1.52159 | 1.42569 | 0.0958951 | 0.315524 | 67.6261 |
32 | 32 | 1 | 64 | 341 | (65,512,512) | 1.61663 | 1.45842 | 0.158208 | 0.389299 | 143.7 |
64 | 64 | 1 | 128 | 341 | (129,512,512) | 1.75436 | 1.50559 | 0.248772 | 0.453414 | 311.887 |
128 | 128 | 1 | 256 | 341 | (257,512,512) | 2.32883 | 1.91045 | 0.418383 | 0.857392 | 828.03 |
| | | | | | | | | | |
9 | 9 | 1 | 16 | 341 | (17,512,512) | 1.05288 | 0.993848 | 0.0590345 | 0.207514 | 46.7948 |
17 | 17 | 1 | 32 | 341 | (33,512,512) | 1.03108 | 0.94969 | 0.0813889 | 0.194925 | 91.6515 |
33 | 33 | 1 | 64 | 341 | (65,512,512) | 1.25166 | 1.10257 | 0.149091 | 0.218563 | 166.888 |
65 | 65 | 1 | 128 | 341 | (129,512,512) | 1.43519 | 1.19399 | 0.241198 | 0.262446 | 318.931 |
129 | 129 | 1 | 256 | 341 | (257,512,512) | 2.09683 | 1.68765 | 0.409184 | 0.663364 | 838.734 |
Elapsed time for weak scaling in the radial direction. The dotted line shows $O(N_{core}^{1/2})$ scaling (ideal scaling for the Legendre transform).