F90OPTFLAGS = -O3 -warn all -g -xhost -openmp
Name | Description |
---|---|
# of Cores | Number of used CPU cores |
# of Processes | Number of MPI processes |
# of SMP | Number of OpenMP threads for each MPI process |
$N_{C}$ | Truncation level for Chebyshev polynomials |
$l_{max}$ | Truncation level for spherical harmonics |
$(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
Elapsed | Elapsed (wall clock) time for one time step, in seconds |
Nonlinear | Elapsed (wall clock) time for nonlinear terms, including communications |
Solver | Elapsed (wall clock) time for linear calculation |
Comm. | Elapsed (wall clock) time for data communication |
Efficiency | Parallel efficiency relative to the fastest 16-core (one node) result |
SUs | Service units (core hours) for $10^{4}$ time steps |
$N_{C}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
---|---|---|---|---|---|---|---|
72 | 47 | ( 73,72,144) | 0.353396 | 0.249238 | 0.965605 | 0.00720803 | 15.7065 |
$N_{C}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
---|---|---|
192 | 255 | (192,384,768) |
# of Cores | # of Processes | # of SMP | Elapsed | Nonlinear | Solver | Comm. | Efficiency | SUs |
---|---|---|---|---|---|---|---|---|
8 | 2 | 4 | 9.98327 | 6.70605 | 2.44661 | 0.413657 | 1.3892 | 221.85 |
8 | 4 | 2 | 9.21265 | 6.62681 | 2.19377 | 0.230439 | 1.5054 | 204.726 |
16 | 2 | 8 | 7.94273 | 5.06533 | 2.13615 | 0.538236 | 0.873047 | 353.01 |
16 | 4 | 4 | 6.93438 | 4.67925 | 1.80959 | 0.298254 | 1 | 308.195 |
16 | 8 | 2 | 7.12601 | 5.21069 | 1.70288 | 0.245879 | 0.973108 | 316.712 |
32 | 4 | 8 | 3.91464 | 2.45256 | 1.01179 | 0.220971 | 0.885697 | 347.968 |
32 | 8 | 4 | 3.65146 | 2.36565 | 0.96631 | 0.243386 | 0.949535 | 324.574 |
32 | 16 | 2 | 3.76134 | 2.62796 | 0.861258 | 0.165907 | 0.921796 | 334.342 |
64 | 8 | 8 | 2.01431 | 1.22438 | 0.465491 | 0.146756 | 0.860641 | 358.099 |
64 | 16 | 4 | 1.942 | 1.18073 | 0.465075 | 0.188362 | 0.892685 | 345.245 |
64 | 32 | 2 | 2.0975 | 1.33229 | 0.360439 | 0.108833 | 0.826505 | 372.889 |
128 | 16 | 8 | 1.15653 | 0.633651 | 0.241426 | 0.119156 | 0.749479 | 411.212 |
128 | 32 | 4 | 1.0925 | 0.591954 | 0.182997 | 0.0788043 | 0.793407 | 388.445 |
128 | 64 | 2 | 1.46869 | 0.662761 | 0.175857 | 0.0781756 | 0.590182 | 522.202 |
256 | 32 | 8 | 0.713037 | 0.318539 | 0.103698 | 0.0459037 | 0.607821 | 507.049 |
256 | 64 | 4 | 0.799634 | 0.298149 | 0.0985314 | 0.0466445 | 0.541996 | 568.629 |
512 | 64 | 8 | 0.597079 | 0.156069 | 0.0687971 | 0.0383883 | 0.362932 | 849.18 |
Elapsed (wall clock) time for the strong scaling. The numbers indicate the number of OpenMP threads. Ideal scaling is plotted as a dotted line.
Parallel efficiency for the strong scaling. The fastest result with 16 cores (one node) is chosen as the reference. The numbers indicate the number of OpenMP threads.
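The Efficiency and SUs columns of the strong scaling table can be recomputed from the Elapsed column. The sketch below is one way to do that, not the script used to build the table. It assumes that efficiency is the core-seconds of the 16-core reference run (4 processes, 4 threads, 6.93438 s per step) divided by the core-seconds of the run in question, and that SUs are core hours for $10^{4}$ steps. The function names are hypothetical.

```python
# Reference: fastest 16-core result in the strong scaling table
# (4 MPI processes x 4 OpenMP threads).
REF_CORES = 16
REF_ELAPSED = 6.93438  # seconds per time step


def parallel_efficiency(cores, elapsed,
                        ref_cores=REF_CORES, ref_elapsed=REF_ELAPSED):
    """Ratio of reference core-seconds to this run's core-seconds."""
    return (ref_elapsed * ref_cores) / (elapsed * cores)


def service_units(cores, elapsed, steps=10_000):
    """Core hours needed for `steps` time steps (assumed definition of SUs)."""
    return cores * elapsed * steps / 3600.0


if __name__ == "__main__":
    # (cores, elapsed seconds per step) taken from the strong scaling table
    rows = [(8, 9.98327), (32, 3.65146), (128, 1.46869), (512, 0.597079)]
    for cores, elapsed in rows:
        eff = parallel_efficiency(cores, elapsed)
        sus = service_units(cores, elapsed)
        print(f"{cores:4d} cores: efficiency = {eff:.4f}, SUs = {sus:.2f}")
```

Under these assumptions, the rows used here agree with the tabulated Efficiency and SUs values to the printed precision. An efficiency above 1 for the 8-core runs means they use fewer core-seconds per step than the 16-core reference.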
# of Cores | # of Processes | # of SMP | $N_{C}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
---|---|---|---|---|---|---|---|---|---|---|
4 | 1 | 4 | 256 | 31 | (257,48,96) | 0.239627 | 0.117565 | 0.108877 | 0.01385 | 22.2638 |
16 | 4 | 4 | 256 | 31 | (257,96,192) | 0.265171 | 0.128896 | 0.0825426 | 0.000955493 | 22.2638 |
64 | 16 | 4 | 256 | 63 | (257,192,384) | 0.410822 | 0.223191 | 0.110366 | 0.0318171 | 106.733 |
256 | 64 | 4 | 256 | 127 | (257,384,768) | 1.06998 | 0.398631 | 0.134563 | 0.0538146 | 519.207 |
Elapsed time for the weak scaling in the horizontal resolution. Results with 4 OpenMP threads are shown. The $O(N_{core}^{1/2})$ scaling (ideal scaling for the Legendre transform) is plotted as a dotted line.
# of Cores | # of Processes | # of SMP | $N_{C}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
---|---|---|---|---|---|---|---|---|---|---|
32 | 8 | 4 | 32 | 255 | (33,384,768) | 0.525404 | 0.389331 | 0.0906778 | 0.0499037 | 46.7025 |
64 | 16 | 4 | 64 | 255 | (65,384,768) | 0.586558 | 0.396212 | 0.100678 | 0.0558089 | 104.277 |
128 | 32 | 4 | 128 | 255 | (129,384,768) | 0.694737 | 0.396308 | 0.105363 | 0.0500511 | 247.018 |
256 | 64 | 4 | 256 | 255 | (257,384,768) | 1.06998 | 0.398631 | 0.134563 | 0.0538146 | 760.877 |
Elapsed time for the weak scaling in the radial resolution. Results with 4 OpenMP threads are shown. The $O(N_{core})$ scaling is plotted as a dotted line.
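The dotted reference lines in the two weak scaling plots are power laws in the number of cores. The sketch below shows how such a line can be generated. It assumes the line is anchored at the smallest run of each table, which the captions do not state, and the function name `reference_time` is hypothetical.

```python
def reference_time(cores, base_cores, base_elapsed, exponent):
    """Power-law reference: base_elapsed * (cores / base_cores) ** exponent."""
    return base_elapsed * (cores / base_cores) ** exponent


if __name__ == "__main__":
    # Horizontal resolution: O(N_core^(1/2)), anchored (by assumption)
    # at the 4-core run with 0.239627 s per step.
    for cores in (4, 16, 64, 256):
        t = reference_time(cores, 4, 0.239627, 0.5)
        print(f"horizontal, {cores:3d} cores: reference = {t:.4f} s")

    # Radial resolution: O(N_core), anchored (by assumption)
    # at the 32-core run with 0.525404 s per step.
    for cores in (32, 64, 128, 256):
        t = reference_time(cores, 32, 0.525404, 1.0)
        print(f"radial,     {cores:3d} cores: reference = {t:.4f} s")
```

Measured Elapsed values below the reference line indicate that the run scales better than the stated power law.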