F90OPTFLAGS = -O3 -xhost
More than two processes are required
A time-step adjustment routine is implemented
Name | Description |
---|---|
# of Cores | Number of CPU cores used |
# of Processes | Number of MPI processes |
# of Threads | Number of threads per process |
$l_{max}$ | Truncation level for spherical harmonics |
$(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
Elapsed | Elapsed (wall clock) time for one time step |
Nonlinear | Elapsed (wall clock) time for nonlinear terms (including communications) |
Solver | Elapsed (wall clock) time for linear calculation |
CFL | Elapsed (wall clock) time for CFL condition check |
Efficiency | Parallel efficiency |
SUs | Service units (core hours) for $10^{4}$ time steps |
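
Most SUs values in the tables below appear consistent with charging whole 16-core nodes: the elapsed time per step, times $10^{4}$ steps, converted to hours, times the core count rounded up to a multiple of 16. This rule is inferred from the tabulated numbers and is not stated on this page. The rows of the $l_{max}=47$ strong-scaling table do not follow it. The 16-core node size and the function name in this minimal Python sketch are therefore assumptions:

```python
import math


def service_units(elapsed_per_step, cores, steps=10_000, cores_per_node=16):
    """Estimate core hours for `steps` time steps, charging whole nodes.

    cores_per_node=16 is an assumption inferred from the tabulated SUs,
    not a value given in the source.
    """
    # Round the requested cores up to a whole number of nodes.
    charged_cores = math.ceil(cores / cores_per_node) * cores_per_node
    # Seconds per step -> hours for all steps -> core hours.
    return elapsed_per_step * steps / 3600.0 * charged_cores


# 32-core run with l_max = 63 and grid (124,96,192)
print(round(service_units(0.765525, 32), 4))
```

Under these assumptions the 32-core run gives about 68.05 core hours, which agrees with the tabulated 68.0467.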
$l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | SUs |
---|---|---|---|---|---|
47 | (72,72,144) | 1.32979 | 0.510249 | 0.819544 | 46.7169 |
$l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
---|---|
47 | (72,72,144) |
# of Cores | # of Processes | # of SMP | Elapsed | Nonlinear | Solver | Efficiency | SUs |
---|---|---|---|---|---|---|---|
2 | 1 | 1 | 1.32979 | 0.510249 | 0.819544 | 1 | 46.7169 |
4 | 3 | 1 | 0.720192 | 0.2817 | 0.438491 | 0.923222 | 32.7459 |
8 | 7 | 1 | 0.375957 | 0.155108 | 0.220847 | 0.884273 | 29.8152 |
16 | 15 | 1 | 0.248631 | 0.104707 | 0.143922 | 0.668557 | 35.9047 |
25 | 24 | 1 | 0.158468 | 0.0812083 | 0.0772578 | 0.671326 | 35.9047 |
$l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
---|---|
63 | (124,96,192) |
# of Cores | # of Processes | # of SMP | Elapsed | Nonlinear | Solver | Efficiency | SUs |
---|---|---|---|---|---|---|---|
4 | 3 | 1 | 4.49747 | 0.942879 | 3.55459 | 1.23093 | 199.888 |
8 | 7 | 1 | 2.2319 | 0.59711 | 1.63479 | 1.24022 | 99.1957 |
16 | 15 | 1 | 1.38402 | 0.407052 | 0.97697 | 1 | 61.5122 |
32 | 31 | 1 | 0.765525 | 0.245143 | 0.52038 | 0.903971 | 68.0467 |
64 | 63 | 1 | 0.500593 | 0.152387 | 0.348203 | 0.691192 | 88.9943 |
126 | 125 | 1 | 0.300067 | 0.123445 | 0.176621 | 0.585699 | 106.69 |
$l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
---|---|
63 | (128,96,192) |
# of Cores | # of Processes | # of SMP | Elapsed | Nonlinear | Solver | Efficiency | SUs |
---|---|---|---|---|---|---|---|
4 | 3 | 1 | 4.33759 | 0.982332 | 3.35526 | 1.40591 | 192.782 |
8 | 7 | 1 | 2.38804 | 0.611198 | 1.77684 | 1.27684 | 106.135 |
16 | 15 | 1 | 1.52457 | 0.473646 | 1.05092 | 1 | 67.7585 |
32 | 31 | 1 | 0.839238 | 0.257673 | 0.581563 | 0.908304 | 74.599 |
64 | 63 | 1 | 0.5472 | 0.179547 | 0.367651 | 0.696531 | 97.28 |
128 | 127 | 1 | 0.335634 | 0.148279 | 0.187353 | 0.567794 | 119.336 |
Elapsed (wall clock) time for the strong scaling tests. Ideal scaling is plotted as a dotted line.
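
The Efficiency column in the strong-scaling tables is reproduced, to within rounding of the printed times, by the usual definition relative to a reference run, $E = (T_{ref} N_{ref}) / (T N)$, where $N$ is the number of cores and the reference is the row whose efficiency is 1. The page does not spell this formula out, so the sketch below is a reading of the numbers, and the function name is hypothetical:

```python
def parallel_efficiency(t_ref, n_ref, t, n):
    """Strong-scaling efficiency of a run (t, n) relative to (t_ref, n_ref).

    t and t_ref are elapsed times per step; n and n_ref are core counts.
    """
    return (t_ref * n_ref) / (t * n)


# l_max = 63, grid (124,96,192): the 16-core run is the reference.
t_ref, n_ref = 1.38402, 16
for cores, elapsed in [(4, 4.49747), (32, 0.765525), (126, 0.300067)]:
    print(cores, round(parallel_efficiency(t_ref, n_ref, elapsed, cores), 3))
```

Because the reference is the 16-core run rather than the smallest one, the 4- and 8-core rows come out above 1.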
# of Cores | # of Processes | # of SMP | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | SUs |
---|---|---|---|---|---|---|---|---|
3 | 2 | 1 | 15 | (124,24,48) | 0.26251 | 0.0325999 | 0.229909 | 11.6671 |
9 | 8 | 1 | 31 | (124,48,96) | 0.383009 | 0.066082 | 0.316925 | 17.0226 |
32 | 31 | 1 | 63 | (124,96,192) | 0.765525 | 0.245143 | 0.52038 | 68.0467 |
125 | 124 | 1 | 127 | (124,192,384) | 1.55669 | 0.793595 | 0.763088 | 553.488 |
Elapsed time for the weak scaling in the horizontal resolution.
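
In the horizontal weak-scaling runs above, $N_{\theta} N_{\phi}$ grows by a factor of four from one row to the next while the number of MPI processes grows by roughly the same factor, so the number of grid points per process stays nearly constant. A short Python check of that bookkeeping, using the table values (the names are illustrative only):

```python
# (number of MPI processes, (N_r, N_theta, N_phi)) taken from the table
runs = [
    (2, (124, 24, 48)),
    (8, (124, 48, 96)),
    (31, (124, 96, 192)),
    (124, (124, 192, 384)),
]


def points_per_process(processes, grid):
    """Return the number of grid points handled per MPI process."""
    nr, ntheta, nphi = grid
    return nr * ntheta * nphi / processes


for processes, grid in runs:
    print(processes, grid, round(points_per_process(processes, grid)))
```

The per-process workload varies by only about 3% across the four runs (71424 to 73728 points), so the growth in elapsed time reflects scaling overhead rather than a larger share of the grid.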
# of Cores | # of Processes | # of SMP | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | SUs |
---|---|---|---|---|---|---|---|---|
16 | 15 | 1 | 63 | (30,96,192) | 0.164901 | 0.117507 | 0.0473918 | 7.32893 |
32 | 31 | 1 | 63 | (62,96,192) | 0.238942 | 0.138128 | 0.100811 | 21.2393 |
64 | 63 | 1 | 63 | (124,96,192) | 0.500593 | 0.152387 | 0.348203 | 88.9943 |
127 | 126 | 1 | 63 | (248,96,192) | 1.19851 | 0.235588 | 0.962918 | 426.137 |
Elapsed time for the weak scaling in the radial resolution. Ideal scaling for the linear solver after LU decomposition is plotted as a dotted line.