F90OPTFLAGS = -r8 -i4 -ftz -IPF_fma -IPF_fltacc -WB -O2
Name | Description |
---|---|
# of Cores | Number of used CPU cores |
# of Processes | Number of MPI processes |
# of Threads (# of SMP) | Number of OpenMP threads for each process |
$N_{c}$ | Truncation level for Chebyshev polynomials |
$l_{max}$ | Truncation level for spherical harmonics |
$(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
Elapsed | Elapsed (wall clock) time in seconds for one time step |
Nonlinear | Elapsed (wall clock) time in seconds for nonlinear terms (including communications) |
Solver | Elapsed (wall clock) time in seconds for linear calculation |
Comm. | Elapsed (wall clock) time in seconds for data communication |
Efficiency | Parallel efficiency |
SUs | Service units for $10^{4}$ time steps (core hours) |
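
The Efficiency and SUs columns can be reproduced from the Elapsed column. The sketch below is inferred from the tabulated numbers rather than taken from the benchmark scripts: it assumes Elapsed is in seconds, that SUs are core hours for $10^{4}$ steps, and that efficiency is measured against the smallest (4-core) run. The function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600.0


def service_units(elapsed_per_step, n_cores, n_steps=10_000):
    """Core hours consumed by n_steps time steps (assumes elapsed is in seconds)."""
    return elapsed_per_step * n_steps * n_cores / SECONDS_PER_HOUR


def parallel_efficiency(elapsed, n_cores, ref_elapsed, ref_cores):
    """Efficiency relative to a reference run; 1.0 means ideal strong scaling."""
    return (ref_elapsed * ref_cores) / (elapsed * n_cores)


# Elapsed times from the (N_c, l_max) = (128, 128) table:
# the 4-core run is used as the reference, the 8-core run is compared against it.
su_4 = service_units(13.6562, 4)
eff_8 = parallel_efficiency(5.27784, 8, ref_elapsed=13.6562, ref_cores=4)
print(f"SUs at 4 cores: {su_4:.3f}, efficiency at 8 cores: {eff_8:.5f}")
```

With this definition, an efficiency above 1 means the run scaled better than linearly relative to the 4-core reference.
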
$N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
---|---|---|---|---|---|---|---|
47 | 42 | (48,64,129) | 0.460322 | 0.378743 | 0.0565741 | 0.005765 |
$N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
---|---|---|
128 | 128 | (129,192,385) |
# of Cores | # of Processes | # of SMP | Elapsed | Nonlinear | Solver | Comm. | Efficiency | SUs |
---|---|---|---|---|---|---|---|---|
4 | 4 | 1 | 13.6562 | 12.3256 | 1.19259 | 8.13115 | 1 | 151.736 |
8 | 8 | 1 | 5.27784 | 4.71791 | 0.559926 | 1.8018 | 1.29373 | 117.285 |
16 | 16 | 1 | 2.83756 | 2.56503 | 0.240695 | 1.55375 | 1.20317 | 126.114 |
32 | 32 | 1 | 1.41826 | 1.2796 | 0.12139 | 1.18198 | 1.20361 | 126.067 |
64 | 64 | 1 | 2.73806 | 2.66596 | 0.0632037 | 2.87671 | 0.311722 | 486.766 |
128 | 128 | 1 | 6.82779 | 6.79328 | 0.0298972 | 7.1671 | 0.0625029 | 2427.66 |
$N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
---|---|---|
192 | 128 | (193,192,385) |
# of Cores | # of Processes | # of SMP | Elapsed | Nonlinear | Solver | Comm. | Efficiency | SUs |
---|---|---|---|---|---|---|---|---|
4 | 4 | 1 | 21.845 | 19.1889 | 2.41306 | 12.8516 | 1 | 242.723 |
8 | 8 | 1 | 8.69087 | 7.22496 | 1.34667 | 2.81284 | 1.25678 | 193.13 |
16 | 16 | 1 | 4.43835 | 3.84567 | 0.535587 | 2.32362 | 1.23047 | 197.26 |
32 | 32 | 1 | 2.31319 | 2.01689 | 0.266718 | 1.90409 | 1.18046 | 205.617 |
64 | 64 | 1 | 4.14431 | 3.99401 | 0.135566 | 4.28234 | 0.329443 | 736.767 |
128 | 128 | 1 | 10.8246 | 10.7533 | 0.0646424 | 11.3395 | 0.0630654 | 3848.74 |
Elapsed (wall clock) time in the strong-scaling test for the $(N_{c}, l_{max}) = (192, 128)$ case. The numbers indicate the number of OpenMP threads. The dotted line shows ideal scaling.
Parallel efficiency in the strong-scaling test for the $(N_{c}, l_{max}) = (192, 128)$ case.
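
For readers without the figures, the ideal-scaling line can be approximated from the table. This is a minimal sketch, assuming the ideal curve is anchored at the 4-core run, which is the run whose tabulated efficiency is 1; `ideal_elapsed` is a hypothetical helper name.

```python
def ideal_elapsed(ref_elapsed, ref_cores, n_cores):
    """Elapsed time expected under perfect strong scaling from a reference run."""
    return ref_elapsed * ref_cores / n_cores


# (cores, measured elapsed seconds per step) from the (192, 128) table.
measured = [
    (4, 21.845),
    (8, 8.69087),
    (16, 4.43835),
    (32, 2.31319),
    (64, 4.14431),
    (128, 10.8246),
]

ref_cores, ref_elapsed = measured[0]
rows = [(p, t, ideal_elapsed(ref_elapsed, ref_cores, p)) for p, t in measured]

# Core count with the shortest measured time per step.
fastest_cores = min(measured, key=lambda row: row[1])[0]

for p, t, ideal in rows:
    print(f"{p:4d} cores: measured {t:8.5f} s, ideal {ideal:8.5f} s")
print(f"fastest measured run: {fastest_cores} cores")
```

Comparing the two columns shows where the measured time falls below or rises above the ideal line, and which core count gives the shortest time per step.
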
# of Cores | # of Processes | # of SMP | $N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
---|---|---|---|---|---|---|---|---|---|---|
32 | 32 | 1 | 192 | 128 | (193,192,385) | 2.31319 | 2.01689 | 0.266718 | 1.90409 | 205.617 |