F90OPTFLAGS = -O3 -warn all -g -xhost -openmp
Name | Description |
---|---|
# of Cores | Number of CPU cores used |
# of Processes | Number of MPI processes |
# of Threads | Number of threads for each process |
$N_{c}$ | Truncation level for Chebyshev polynomials |
$l_{max}$ | Truncation level for spherical harmonics |
$(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
Elapsed time | Elapsed (wall clock) time for one time step |
Efficiency | Parallel efficiency |
SUs | Service units for $10^{4}$ time steps (core hours) |
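The Efficiency and SUs columns in the tables on this page are consistent with the definitions sketched below. Here $n$ is the number of cores and $T_{n}$ is the elapsed time for one time step on $n$ cores; the symbols $E_{n}$ and $T_{n}$ are notation introduced here for illustration. The factor of 16 charged cores and the use of seconds for $T_{n}$ are inferred from the tabulated values and are not stated in the source.

$$E_{n} = \frac{T_{1}}{n\,T_{n}}, \qquad \mathrm{SUs} = 16 \times \frac{10^{4}\,T_{n}}{3600}$$

For example, the single-core run with $T_{1} = 0.35723$ gives $16 \times 3572.3 / 3600 \approx 15.877$, which matches the tabulated 15.8769.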
$N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed time |
---|---|---|---|
47 | 48 | (72,72,144) | 0.35723 |
In the present test, the spatial resolution is fixed and the parallelization is varied. In ideal scaling, the elapsed time is inversely proportional to the number of cores.
$N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
---|---|---|
47 | 48 | (72,72,144) |
# of Cores | # of Processes | # of Threads | Elapsed time | Efficiency | SUs |
---|---|---|---|---|---|
1 | 1 | 1 | 0.35723 | 1.0 | 15.8769 |
2 | 1 | 2 | 0.18729 | 0.953681 | 8.32400 |
4 | 1 | 4 | 0.10288 | 0.868074 | 4.57244 |
8 | 1 | 8 | 0.05784 | 0.772022 | 2.57067 |
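The Efficiency and SUs columns of the table above can be reproduced from the elapsed times alone. The following is a minimal sketch, assuming that the elapsed time is in seconds and that service units are charged for 16 cores regardless of how many are used; both assumptions are inferred from the tabulated values, and the function names are hypothetical.

```python
# Strong-scaling rows for (N_c, l_max) = (47, 48): (cores, elapsed time per step).
ROWS = [(1, 0.35723), (2, 0.18729), (4, 0.10288), (8, 0.05784)]

CORES_CHARGED = 16  # assumption: inferred from the SUs column, not stated in the text
STEPS = 10**4       # the SUs column is defined for 10^4 time steps


def ideal_time(t1, cores):
    """Elapsed time under ideal scaling: inversely proportional to the core count."""
    return t1 / cores


def parallel_efficiency(t1, cores, t):
    """Ratio of the ideal elapsed time on `cores` cores to the measured time."""
    return ideal_time(t1, cores) / t


def service_units(t, steps=STEPS, cores_charged=CORES_CHARGED):
    """Core hours for `steps` time steps, assuming `t` is in seconds."""
    return t * steps / 3600.0 * cores_charged


if __name__ == "__main__":
    t1 = ROWS[0][1]
    for cores, t in ROWS:
        print(cores, t, ideal_time(t1, cores),
              parallel_efficiency(t1, cores, t), service_units(t))
```

The same functions apply to the $(N_{c},l_{max}) = (97,170)$ case by replacing `ROWS` with the elapsed times from that table.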
$N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
---|---|---|
97 | 170 | (145,256,512) |
# of Cores | # of Processes | # of Threads | Elapsed time | Efficiency | SUs |
---|---|---|---|---|---|
1 | 1 | 1 | 18.3200 | 1.0 | 814.222 |
2 | 1 | 2 | 8.69336 | 1.05368 | 386.372 |
4 | 1 | 4 | 4.51397 | 1.01463 | 200.621 |
8 | 1 | 8 | 2.53104 | 0.904766 | 112.491 |
16 | 1 | 16 | 2.16658 | 0.528483 | 96.2924 |
Elapsed (wall clock) time for the strong scaling in the $(N_{c},l_{max}) = (97,170)$ case. The labels give the number of OpenMP threads. Ideal scaling is plotted as a dotted line.
Parallel efficiency for the strong scaling in the $(N_{c},l_{max}) = (97,170)$ case. The labels give the number of OpenMP threads.
In the present benchmark, the radial resolution is fixed, and the horizontal resolution is increased together with the number of cores.
# of Cores | # of Processes | # of Threads | $N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed time | SUs |
---|---|---|---|---|---|---|---|
1 | 1 | 1 | 97 | 42 | (145,64,128) | 0.61665 | 27.4067 |
4 | 1 | 4 | 97 | 84 | (145,128,256) | 0.623898 | 27.564 |
16 | 1 | 16 | 97 | 170 | (145,256,512) | 2.16658 | 96.2924 |
Elapsed (wall clock) time for the weak scaling in the horizontal resolution. The labels give the number of OpenMP threads.
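As a quick check on the weak-scaling setup, the sketch below verifies that the number of grid points per core stays constant across the three runs and computes a weak-scaling efficiency. The efficiency definition used here, $T_{1}/T_{n}$, is a common convention and an assumption; the source does not report an efficiency for this test. The function names are hypothetical.

```python
# Weak-scaling rows in the horizontal resolution:
# (cores, (N_r, N_theta, N_phi), elapsed time per step).
ROWS = [
    (1, (145, 64, 128), 0.61665),
    (4, (145, 128, 256), 0.623898),
    (16, (145, 256, 512), 2.16658),
]


def grid_points_per_core(cores, grid):
    """Total number of grid points divided by the number of cores."""
    n_r, n_theta, n_phi = grid
    return n_r * n_theta * n_phi / cores


def weak_efficiency(t1, t):
    """Assumed definition: single-core elapsed time over the measured elapsed time."""
    return t1 / t


if __name__ == "__main__":
    t1 = ROWS[0][2]
    for cores, grid, t in ROWS:
        print(cores, grid_points_per_core(cores, grid), weak_efficiency(t1, t))
```

With this definition, a value near 1 means the elapsed time per step stayed flat as the problem size and the core count grew together.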
In the present benchmark, the horizontal resolution is fixed, and the radial resolution is increased together with the number of cores.
# of Cores | # of Processes | # of Threads | $N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed time | SUs |
---|---|---|---|---|---|---|---|
1 | 1 | 1 | 9 | 170 | (8,256,512) | 0.65988 | 29.328 |
2 | 1 | 2 | 17 | 170 | (16,256,512) | 0.74609 | 33.1596 |
4 | 1 | 4 | 33 | 170 | (32,256,512) | 0.79330 | 35.2578 |
8 | 1 | 8 | 63 | 170 | (64,256,512) | 1.01657 | 45.1809 |
16 | 1 | 16 | 129 | 170 | (128,256,512) | 1.85728 | 82.5458 |
Elapsed (wall clock) time for the weak scaling in the radial resolution. The labels give the number of OpenMP threads.