F90OPTFLAGS = -r8 -i4 -ftz -IPF_fma -IPF_fltacc -WB -O3 -xhosts
Name | Description |
---|---|
# of Cores | Number of used CPU cores |
# of Processes | Number of MPI processes |
# of Threads | Number of threads for each process |
$N_{c}$ | Truncation level for Chebyshev polynomials |
$l_{max}$ | Truncation level for spherical harmonics |
$(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
Elapsed | Elapsed (wall clock) time for one time step |
Legendre | Elapsed (wall clock) time for Legendre transform |
Implicit | Elapsed (wall clock) time for linear calculation |
Efficiency | Parallel efficiency |
SUs | Service units for $10^{4}$ time steps (core hours) |
$N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Legendre | Implicit | LUdecomp | SUs |
---|---|---|---|---|---|---|---|
71 | 47 | (73,72,144) | 0.96659 | 0.57970 | 0.13313 | 0.010014 | 2.6849 |
$N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
---|---|---|
191 | 255 | (193,384,768) |
# of Cores | # of Processes | # of Threads | Elapsed | Legendre | Implicit | Efficiency | SUs |
---|---|---|---|---|---|---|---|
16 | 16 | 1 | 7.8559 | 3.1132 | 1.0993 | 1.0 | 349.151 |
32 | 32 | 1 | 4.4581 | 1.5484 | 0.67073 | 0.881082 | 396.276 |
64 | 64 | 1 | 3.4032 | 0.77098 | 0.68621 | 0.577097 | 605.013 |
128 | 128 | 1 | 1.0696 | 0.37643 | 0.14921 | 0.918089 | 380.302 |
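
The Efficiency and SUs columns of the table above can be reproduced from the Cores and Elapsed columns. The following is a minimal sketch, assuming that efficiency is measured relative to the smallest run (16 cores) and that SUs are core hours for $10^{4}$ time steps; the function names `parallel_efficiency` and `service_units` are hypothetical and are not part of the benchmarked code.

```python
def parallel_efficiency(base_cores, base_elapsed, cores, elapsed):
    """Strong-scaling efficiency relative to a baseline run (assumed definition)."""
    return (base_cores * base_elapsed) / (cores * elapsed)


def service_units(cores, elapsed, steps=10_000):
    """Core hours for `steps` time steps, with `elapsed` in seconds per step."""
    return cores * elapsed * steps / 3600.0


# (cores, elapsed seconds per step) taken from the table above
rows = [(16, 7.8559), (32, 4.4581), (64, 3.4032), (128, 1.0696)]
base_cores, base_elapsed = rows[0]

for cores, elapsed in rows:
    eff = parallel_efficiency(base_cores, base_elapsed, cores, elapsed)
    sus = service_units(cores, elapsed)
    print(f"{cores:4d} cores: efficiency = {eff:.6f}, SUs = {sus:.3f}")
```

Under these assumptions the printed values agree with the tabulated Efficiency and SUs to the displayed precision for this table; the other tables on this page may use a different baseline or core count.
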
$N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
---|---|---|
255 | 511 | (257,768,1536) |
# of Cores | # of Processes | # of Threads | Elapsed | Legendre | Implicit | Efficiency | SUs |
---|---|---|---|---|---|---|---|
64 | 64 | 1 | 13.018 | 4.7327 | 1.9132 | 1.0 | 414.015 |
128 | 128 | 1 | 8.7973 | 2.3534 | 1.7398 | 0.555322 | 745.541 |
256 | 256 | 1 | 8.678 | 1.1378 | 4.3325 | 0.412058 | 1004.75 |
Elapsed (wall clock) time for strong scaling. The numbers indicate the number of OpenMP threads. Ideal scaling is plotted as a dotted line.
# of Cores | # of Processes | # of Threads | $N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Legendre | Implicit | SUs |
---|---|---|---|---|---|---|---|---|---|
2 | 2 | 1 | 255 | 63 | (257,96,192) | 3.3257 | 1.8103 | 0.47442 | 147.809 |
8 | 8 | 1 | 255 | 127 | (257,192,384) | 4.0801 | 1.8754 | 0.51211 | 181.338 |
32 | 32 | 1 | 255 | 255 | (257,384,768) | 5.8172 | 2.0489 | 0.8497 | 517.084 |
128 | 128 | 1 | 255 | 511 | (257,768,1536) | 9.5023 | 2.3534 | 1.7398 | 3378.6 |
3 | 3 | 1 | 255 | 63 | (257,96,192) | 2.3095 | 1.1811 | 0.3029 | 102.644 |
9 | 9 | 1 | 255 | 127 | (257,192,384) | 3.5408 | 1.7574 | 0.46589 | 157.369 |
33 | 33 | 1 | 255 | 255 | (257,384,768) | 5.528 | 1.7904 | 0.77925 | 737.067 |
129 | 129 | 1 | 255 | 511 | (257,768,1536) | 8.9802 | 1.9042 | 1.7902 | 3592.1 |
Elapsed time for weak scaling in the horizontal resolution. Scaling of $O(N_{core}^{1/2})$ is plotted as a dotted line.
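
To compare the measured weak scaling with the $1/2$ reference exponent, one option is to fit a power law to the Elapsed column. The sketch below is an illustration only: it uses the first four rows of the table above (2, 8, 32, and 128 cores) and an ordinary least-squares fit in log-log space, and the helper name `fit_power_law_exponent` is hypothetical.

```python
import math


def fit_power_law_exponent(cores, elapsed):
    """Least-squares slope of log(elapsed) against log(cores)."""
    xs = [math.log(c) for c in cores]
    ys = [math.log(t) for t in elapsed]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


# Cores and elapsed seconds per step from the first four rows of the table above
cores = [2, 8, 32, 128]
elapsed = [3.3257, 4.0801, 5.8172, 9.5023]

exponent = fit_power_law_exponent(cores, elapsed)
print(f"fitted exponent: {exponent:.2f} (reference line: 0.50)")
```

A fitted exponent below $1/2$ means the elapsed time grows more slowly with core count than the dotted reference line over this range.
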
# of Cores | # of Processes | # of Threads | $N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Legendre | Implicit | SUs |
---|---|---|---|---|---|---|---|---|---|
16 | 16 | 1 | 31 | 511 | (33,768,1536) | 5.1154 | 2.5566 | 0.49331 | 227.351 |
32 | 32 | 1 | 63 | 511 | (65,768,1536) | 6.0620 | 2.4221 | 0.76595 | 538.844 |
64 | 64 | 1 | 127 | 511 | (129,768,1536) | 6.9065 | 2.3967 | 1.1075 | 1227.82 |
128 | 128 | 1 | 255 | 511 | (257,768,1536) | 9.5023 | 2.3534 | 1.7398 | 3378.6 |
17 | 17 | 1 | 31 | 511 | (33,768,1536) | 4.2943 | 2.4266 | 0.40772 | 381.716 |
33 | 33 | 1 | 63 | 511 | (65,768,1536) | 5.6936 | 2.2617 | 0.67252 | 759.147 |
65 | 65 | 1 | 127 | 511 | (129,768,1536) | 6.7351 | 2.1191 | 1.0571 | 1496.69 |
129 | 129 | 1 | 255 | 511 | (257,768,1536) | 8.9802 | 1.9042 | 1.7902 | 3592.08 |
Elapsed time for weak scaling in the radial resolution. Scaling of $O(N_{core})$ is plotted as a dotted line.