Compile options
F90OPTFLAGS = -r8 -i4 -ftz -IPF_fma -IPF_fltacc -WB -O3 -xhosts
Definition of columns
Name | Definition |
# of Cores | Number of used CPU cores |
# of Processes | Number of MPI processes |
# of Threads | Number of OpenMP threads for each process (listed as # of SMP in the tables) |
$N_{c}$ | Truncation level for Chebyshev polynomials |
$l_{max}$ | Truncation level for spherical harmonics |
$(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
Elapsed | Elapsed (wall clock) time for one time step |
Legendre | Elapsed (wall clock) time for the Legendre transform |
Implicit | Elapsed (wall clock) time for the linear calculation |
Efficiency | Parallel efficiency |
SUs | Service units for $10^{4}$ time steps (core hours) |
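The Efficiency and SUs columns can be sketched as simple functions of the core count and the elapsed time per step. The formulas below are an assumption inferred from the tabulated values, not a definition given on this page: SUs are taken as cores times elapsed seconds times $10^{4}$ steps, converted to hours, and efficiency is measured against the smallest run in each table. The function names are hypothetical.

```python
def service_units(cores, elapsed, steps=10_000):
    """Core hours consumed by `steps` time steps at `elapsed` seconds per step."""
    return cores * elapsed * steps / 3600.0


def parallel_efficiency(ref_cores, ref_elapsed, cores, elapsed):
    """Strong-scaling efficiency relative to a reference run (1.0 is ideal)."""
    return (ref_cores * ref_elapsed) / (cores * elapsed)


if __name__ == "__main__":
    # 16-core reference and 32-core run from the first strong scaling table.
    print(round(service_units(16, 7.8559), 3))
    print(round(parallel_efficiency(16, 7.8559, 32, 4.4581), 6))
```

With these assumptions the 16-core and 32-core rows of the first strong scaling table are reproduced. Other rows may have been computed differently, for example by charging whole nodes rather than individual cores.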
Single Processor Result
$N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Legendre | Implicit | LUdecomp | SUs |
71 | 47 | (73,72,144) | 0.96659 | 0.57970 | 0.13313 | 0.010014 | 2.6849 |
Strong Scaling Results
$N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
191 | 255 | (193,384,768) |
# of Cores | # of Processes | # of SMP | Elapsed | Legendre | Implicit | Efficiency | SUs |
16 | 16 | 1 | 7.8559 | 3.1132 | 1.0993 | 1.0 | 349.151 |
32 | 32 | 1 | 4.4581 | 1.5484 | 0.67073 | 0.881082 | 396.276 |
64 | 64 | 1 | 3.4032 | 0.77098 | 0.68621 | 0.577097 | 605.013 |
128 | 128 | 1 | 1.0696 | 0.37643 | 0.14921 | 0.918089 | 380.302 |
$N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
255 | 511 | (257,768,1536) |
# of Cores | # of Processes | # of SMP | Elapsed | Legendre | Implicit | Efficiency | SUs |
64 | 64 | 1 | 13.018 | 4.7327 | 1.9132 | 1.0 | 414.015 |
128 | 128 | 1 | 8.7973 | 2.3534 | 1.7398 | 0.555322 | 745.541 |
256 | 256 | 1 | 8.678 | 1.1378 | 4.3325 | 0.412058 | 1004.75 |
Elapsed (wall clock) time for the strong scaling. The numbers indicate the number of OpenMP threads. The dotted line shows ideal scaling.
Parallel efficiency for the strong scaling.
Weak Scaling Results
# of Cores | # of Processes | # of SMP | $N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Legendre | Implicit | SUs |
2 | 2 | 1 | 255 | 63 | (257,96,192) | 3.3257 | 1.8103 | 0.47442 | 147.809 |
8 | 8 | 1 | 255 | 127 | (257,192,384) | 4.0801 | 1.8754 | 0.51211 | 181.338 |
32 | 32 | 1 | 255 | 255 | (257,384,768) | 5.8172 | 2.0489 | 0.8497 | 517.084 |
128 | 128 | 1 | 255 | 511 | (257,768,1536) | 9.5023 | 2.3534 | 1.7398 | 3378.6 |
| | | | | | | | | |
3 | 3 | 1 | 255 | 63 | (257,96,192) | 2.3095 | 1.1811 | 0.3029 | 102.644 |
9 | 9 | 1 | 255 | 127 | (257,192,384) | 3.5408 | 1.7574 | 0.46589 | 157.369 |
33 | 33 | 1 | 255 | 255 | (257,384,768) | 5.528 | 1.7904 | 0.77925 | 737.067 |
129 | 129 | 1 | 255 | 511 | (257,768,1536) | 8.9802 | 1.9042 | 1.7902 | 3592.1 |
Elapsed time for the weak scaling in the horizontal resolution. The dotted line shows $O(N_{core}^{1/2})$ scaling.
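A power-law reference curve like the dotted line can be sketched as follows. This is a minimal illustration that assumes the curve is anchored at the smallest run (2 cores), which the page does not state; `reference_time` is a hypothetical helper.

```python
def reference_time(base_cores, base_elapsed, cores, exponent):
    """Elapsed time predicted by t = t0 * (N / N0)**exponent."""
    return base_elapsed * (cores / base_cores) ** exponent


# (cores, measured elapsed seconds) from the first block of the table above
runs = [(2, 3.3257), (8, 4.0801), (32, 5.8172), (128, 9.5023)]
base_cores, base_elapsed = runs[0]  # assumed anchor point of the curve

for cores, measured in runs:
    ref = reference_time(base_cores, base_elapsed, cores, 0.5)
    print(f"{cores:4d} cores: measured {measured:.4f} s, reference {ref:.4f} s")
```

Passing `exponent=1.0` instead gives a linear reference of the same form.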
# of Cores | # of Processes | # of SMP | $N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Legendre | Implicit | SUs |
16 | 16 | 1 | 31 | 511 | (33,768,1536) | 5.1154 | 2.5566 | 0.49331 | 227.351 |
32 | 32 | 1 | 63 | 511 | (65,768,1536) | 6.0620 | 2.4221 | 0.76595 | 538.844 |
64 | 64 | 1 | 127 | 511 | (129,768,1536) | 6.9065 | 2.3967 | 1.1075 | 1227.82 |
128 | 128 | 1 | 255 | 511 | (257,768,1536) | 9.5023 | 2.3534 | 1.7398 | 3378.6 |
| | | | | | | | | |
17 | 17 | 1 | 31 | 511 | (33,768,1536) | 4.2943 | 2.4266 | 0.40772 | 381.716 |
33 | 33 | 1 | 63 | 511 | (65,768,1536) | 5.6936 | 2.2617 | 0.67252 | 759.147 |
65 | 65 | 1 | 127 | 511 | (129,768,1536) | 6.7351 | 2.1191 | 1.0571 | 1496.69 |
129 | 129 | 1 | 255 | 511 | (257,768,1536) | 8.9802 | 1.9042 | 1.7902 | 3592.08 |
Elapsed time for the weak scaling in the radial resolution. The dotted line shows $O(N_{core})$ scaling.
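One way to compare the measured times with a power-law reference is to fit the exponent $p$ in Elapsed $\propto N_{core}^{p}$ by least squares in log-log space. The sketch below does this for the 16 to 128 core rows of the radial table. It is an illustrative analysis, not part of the original benchmark procedure, and `fit_scaling_exponent` is a hypothetical helper.

```python
import math


def fit_scaling_exponent(cores, elapsed):
    """Least-squares slope of log(elapsed) versus log(cores)."""
    xs = [math.log(n) for n in cores]
    ys = [math.log(t) for t in elapsed]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


# Radial weak scaling rows with 16, 32, 64, and 128 cores
cores = [16, 32, 64, 128]
elapsed = [5.1154, 6.0620, 6.9065, 9.5023]

exponent = fit_scaling_exponent(cores, elapsed)
print(f"fitted exponent: {exponent:.2f}")
```

A fitted exponent close to 1 would follow the $O(N_{core})$ reference, while a smaller value means the elapsed time grows more slowly than that reference.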