F90OPTFLAGS = -O3 -warn all -g -xhost -openmp
Nonlinear terms are calculated twice for each step
All processes have the full matrix for all spherical harmonic degrees
LU decomposition is done for the full matrix
Time integration is done by a solver for banded matrices
Name | Description |
---|---|
# of Cores | Number of used CPU cores |
# of Processes | Number of MPI processes |
# of Threads | Number of threads for each process |
$l_{max}$ | Truncation level for spherical harmonics |
$(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
Elapsed | Elapsed (wall clock) time for one time step |
Nonlinear | Elapsed (wall clock) time for nonlinear terms (including communications) |
Solver | Elapsed (wall clock) time for linear calculation |
Comm. | Elapsed (wall clock) time for data communication |
Init. | Elapsed (wall clock) time for initialization |
Efficiency | Parallel efficiency relative to the run with the fewest cores |
SUs | Service units for $10^{4}$ time steps (core hours) |
$l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | Init. | SUs |
---|---|---|---|---|---|---|---|
47 | ( 73,72,144) | 0.678760 | 0.488277 | 0.190479 | 0.029903 | 2.4152 | 30.1671 |
$l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
---|---|
255 | (256,384,768) |
# of Cores | # of Processes | # of SMP | Elapsed | Nonlinear | Solver | Comm. | Init. | Efficiency | SUs |
---|---|---|---|---|---|---|---|---|---|
64 | 8 | 8 | 6.40703 | 5.89877 | 0.508254 | 1.07863 | 3554.5 | 1 | 1139.03 |
128 | 16 | 8 | 3.54131 | 3.2940 | 0.247309 | 0.890222 | 3552.51 | 0.904612 | 1259.13 |
256 | 32 | 8 | 1.86101 | 1.7352 | 0.125808 | 0.475738 | 3550.63 | 0.860692 | 1323.39 |
1024 | 64 | 8 | 1.04298 | 0.977307 | 0.0656672 | 0.361399 | 3552.23 | 0.383937 | 2966.7 |
Elapsed (wall clock) time for the strong scaling test. Ideal scaling is plotted as a dotted line.
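
The Efficiency and SUs columns of the strong scaling table can be reproduced from the elapsed time and the number of cores. The following is a minimal sketch in Python, assuming that efficiency is measured relative to the 64-core run and that SUs count core hours for $10^{4}$ time steps without the initialization time. The function names `service_units` and `parallel_efficiency` are illustrative and are not part of the benchmarked code.

```python
# Sketch: reproduce the Efficiency and SUs columns of the strong scaling table.
# Assumptions (not stated in the source): efficiency is relative to the
# smallest run, and SUs exclude the initialization time.

def service_units(elapsed_s, n_cores, n_steps=10_000):
    """Core hours needed for n_steps time steps."""
    return elapsed_s * n_steps * n_cores / 3600.0


def parallel_efficiency(t_ref, cores_ref, t, cores):
    """Strong scaling efficiency relative to a reference run."""
    return (t_ref * cores_ref) / (t * cores)


# (cores, elapsed seconds per step) from the first three rows of the table
runs = [(64, 6.40703), (128, 3.54131), (256, 1.86101)]
ref_cores, ref_time = runs[0]

for cores, elapsed in runs:
    eff = parallel_efficiency(ref_time, ref_cores, elapsed, cores)
    sus = service_units(elapsed, cores)
    print(f"{cores:5d} cores: efficiency = {eff:.4f}, SUs = {sus:.2f}")
```

Under these assumptions the printed values agree with the tabulated Efficiency and SUs for these three rows to within rounding.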
# of Cores | # of Processes | # of SMP | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | Init. | SUs |
---|---|---|---|---|---|---|---|---|---|---|
2 | 1 | 2 | 31 | (256,48,96) | 0.551937 | 0.391925 | 0.160007 | 0.0214999 | 68.4944 | 15.3479 |
8 | 1 | 8 | 63 | (256,96,192) | 1.12191 | 0.939082 | 0.182825 | 0.0476302 | 92.4576 | 67.197 |
32 | 4 | 8 | 127 | (256,192,384) | 1.81109 | 1.62482 | 0.186259 | 0.27049 | 271.017 | 360.352 |
128 | 16 | 8 | 255 | (256,384,768) | 2.62543 | 2.43587 | 0.189552 | 0.624057 | 969.922 | 1490.03 |
512 | 64 | 8 | 511 | (256,768,1536) | 4.65903 | 4.45921 | 0.199815 | 2.01773 | 3545.45 | 8470.44 |
Elapsed time for the weak scaling in the horizontal resolution. Elapsed time for each time step is plotted in black, and initialization time is plotted in red. Scaling of $O(N_{core}^{1/2})$ is plotted as dotted lines.
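
To compare the measured times with the $O(N_{core}^{1/2})$ reference line, one option is to estimate the scaling exponent as the least-squares slope of log(time) against log(cores). The sketch below does this in plain Python for the Elapsed and Init. columns of the table above. The helper `scaling_exponent` is a hypothetical name introduced for illustration, and a single power law is only an assumption about how the data behave.

```python
import math

# Sketch: estimate p in time ~ cores**p by a least-squares fit in log-log space.
# The helper name and the power-law model are assumptions for illustration.

def scaling_exponent(cores, times):
    """Return the slope of log(times) versus log(cores)."""
    xs = [math.log(c) for c in cores]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Values copied from the weak scaling table (horizontal resolution)
cores = [2, 8, 32, 128, 512]
elapsed = [0.551937, 1.12191, 1.81109, 2.62543, 4.65903]
init = [68.4944, 92.4576, 271.017, 969.922, 3545.45]

print("exponent for elapsed time per step:", scaling_exponent(cores, elapsed))
print("exponent for initialization time:  ", scaling_exponent(cores, init))
```

An exponent close to 0.5 would indicate that a column follows the dotted reference line; a larger value means the time grows faster than that reference.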
# of Cores | # of Processes | # of SMP | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | Init. | SUs |
---|---|---|---|---|---|---|---|---|---|---|
96 | 12 | 8 | 383 | (64,576,1152) | 2.76226 | 2.59632 | 0.165937 | 0.509965 | 43.1944 | 736.604 |
192 | 24 | 8 | 383 | (128,576,1152) | 2.81134 | 2.63732 | 0.174011 | 0.613411 | 484.779 | 1499.38 |
384 | 48 | 8 | 383 | (256,576,1152) | 3.63671 | 3.44881 | 0.187897 | 1.27930 | 6423.87 | 3879.16 |
Elapsed time for the weak scaling in the radial resolution. The results with 4 OpenMP threads are shown.