module swap mvapich2 impi/4.1.0.030
F90OPTFLAGS = -O3 -r8 -cpp -openmp -xhost
At least 3 MPI processes are required
Each MPI process needs at least 4 radial levels
Elapsed time is measured by inserting MPI_Wtime() calls in parody.f90
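
The two limits on the MPI decomposition stated above can be checked before submitting a job. The following is a minimal sketch, not code from the benchmarked program: the function names are hypothetical, and it assumes the radial levels are divided as evenly as possible among the processes, so that the smallest share is `n_r // n_proc`.

```python
def check_decomposition(n_r, n_proc, min_procs=3, min_levels=4):
    """Return True if a run satisfies both limits quoted in the notes.

    n_r    : number of radial grid levels
    n_proc : number of MPI processes
    """
    if n_proc < min_procs:
        return False
    # Assumption: the least-loaded process holds n_r // n_proc radial levels.
    return n_r // n_proc >= min_levels


def max_processes(n_r, min_levels=4):
    """Largest process count that still leaves min_levels levels per process."""
    return n_r // min_levels


print(check_decomposition(512, 128))  # 4 levels per process
print(check_decomposition(512, 256))  # only 2 levels per process
print(max_processes(512))
```

Under this assumption, a grid with 512 radial levels admits at most 128 MPI processes, so larger core counts have to come from additional OpenMP threads per process.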
Name | Description |
---|---|
# of Cores | Number of used CPU cores |
# of Processes | Number of MPI processes |
# of Threads | Number of threads for each process |
$l_{max}$ | Truncation level for spherical harmonics |
$(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
Elapsed | Elapsed (wall-clock) time for one time step, in seconds |
Nonlinear | Elapsed (wall-clock) time for evaluation of the nonlinear terms |
Solver | Elapsed (wall-clock) time for the linear solver (including communications) |
Efficiency | Parallel efficiency |
SUs | Service units (core hours) for $10^{4}$ time steps |
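
The Efficiency and SUs columns can be reproduced from the Elapsed column. The sketch below rests on assumptions inferred from the tabulated values rather than stated by the source: Elapsed is in seconds, SUs are core hours for $10^{4}$ steps, and efficiency is measured relative to the smallest run of a table. The function names are illustrative only.

```python
def service_units(elapsed_s, n_cores, n_steps=10_000):
    """Core hours consumed by n_steps time steps.

    elapsed_s : wall-clock seconds per time step (assumed unit)
    n_cores   : number of CPU cores used
    """
    return elapsed_s * n_steps * n_cores / 3600.0


def parallel_efficiency(elapsed_s, n_cores, ref_elapsed_s, ref_cores):
    """Strong-scaling efficiency relative to a reference run.

    Equals 1.0 when the time per step drops in proportion to the core count.
    """
    return (ref_elapsed_s * ref_cores) / (elapsed_s * n_cores)


# Reference run: 16 cores, 12.54290 s per step.
print(round(service_units(12.54290, 16), 3))
# 32 cores (4 processes x 8 threads), 6.805739 s per step.
print(round(parallel_efficiency(6.805739, 32, 12.54290, 16), 6))
```

With the 16-core run as the reference, these two calls give values matching the 16-core SUs entry and the 32-core (4 processes, 8 threads) Efficiency entry of the strong-scaling table.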
$l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | SUs |
---|---|---|---|---|---|
47 | ( 73,72,144) | 0.269091 | 0.257912 | 0.00424973 | 0.747475 |
$l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
---|---|
255 | (512,384,768) |
# Cores | # Processes | # Threads | Elapsed | Nonlinear | Solver | Efficiency | SUs |
---|---|---|---|---|---|---|---|
16 | 4 | 4 | 12.54290 | 11.87211 | 0.287119 | 1.000000 | 557.462 |
32 | 4 | 8 | 6.805739 | 6.191985 | 0.254288 | 0.921494 | 604.955 |
32 | 8 | 4 | 6.363801 | 6.005817 | 0.163213 | 0.985488 | 565.671 |
64 | 8 | 8 | 3.432315 | 3.104396 | 0.144843 | 0.913589 | 610.189 |
64 | 16 | 4 | 3.209374 | 2.992346 | 0.116619 | 0.977052 | 570.555 |
128 | 16 | 8 | 1.754511 | 1.551827 | 0.110802 | 0.893618 | 623.826 |
128 | 32 | 4 | 1.685379 | 1.503336 | 0.127453 | 0.930273 | 599.246 |
128 | 64 | 2 | 1.836404 | 1.561923 | 0.226236 | 0.853768 | 652.944 |
128 | 128 | 1 | 2.535049 | 1.993672 | 0.481132 | 0.618474 | 901.351 |
256 | 32 | 8 | 0.951109 | 0.779863 | 0.122069 | 0.824229 | 676.344 |
256 | 64 | 4 | 0.997783 | 0.755018 | 0.191324 | 0.785673 | 709.535 |
256 | 128 | 2 | 1.193223 | 0.779913 | 0.380951 | 0.656986 | 848.514 |
512 | 64 | 8 | 0.6191725 | 0.393483 | 0.194016 | 0.633048 | 880.601 |
512 | 128 | 4 | 0.736604 | 0.380441 | 0.333829 | 0.532125 | 1047.61 |
1024 | 128 | 8 | 0.564325 | 0.199389 | 0.342527 | 0.347287 | 1605.19 |
2048 | 128 | 16 | 0.575268 | 0.217482 | 0.336122 | 0.170340 | 3272.64 |
Elapsed (wall-clock) time for strong scaling. The numbers next to the data points give the number of OpenMP threads. Ideal scaling is plotted as a dotted line.
Parallel efficiency for strong scaling. The numbers next to the data points give the number of OpenMP threads.
# Cores | # Processes | # Threads | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | SUs |
---|---|---|---|---|---|---|---|---|
4 | 4 | 1 | 15 | (512,24,48) | 0.03682030 | 0.03474719 | 0.00099214 | 1.63646 |
16 | 4 | 4 | 31 | (512,48,96) | 0.05962245 | 0.05093837 | 0.00317335 | 2.64989 |
16 | 8 | 2 | 31 | (512,48,96) | 0.05066221 | 0.04486143 | 0.00221673 | 2.25165 |
64 | 4 | 16 | 63 | (512,96,192) | 0.2261694 | 0.1915030 | 0.01298851 | 40.2079 |
64 | 8 | 8 | 63 | (512,96,192) | 0.0879787 | 0.0693352 | 0.00814595 | 15.6407 |
64 | 16 | 4 | 63 | (512,96,192) | 0.0764195 | 0.0639300 | 0.00679417 | 13.5857 |
64 | 32 | 2 | 63 | (512,96,192) | 0.0811522 | 0.0682890 | 0.00898229 | 14.4271 |
64 | 64 | 1 | 63 | (512,96,192) | 0.0872758 | 0.0677864 | 0.01562347 | 15.5157 |
256 | 32 | 8 | 127 | (512,192,384) | 0.151465 | 0.109548 | 0.03015987 | 107.708 |
256 | 64 | 4 | 127 | (512,192,384) | 0.158937 | 0.105684 | 0.04617718 | 113.022 |
256 | 128 | 2 | 127 | (512,192,384) | 0.194962 | 0.103681 | 0.08474707 | 138.640 |
1024 | 128 | 8 | 255 | (512,384,768) | 0.564325 | 0.199389 | 0.342527 | 1605.19 |
Elapsed (wall-clock) time for weak scaling in the horizontal resolution. The numbers next to the data points give the number of OpenMP threads. Ideal scaling for the Legendre transform ($O(N_{core}^{1/2})$) is plotted as a dotted line.
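
In the horizontal weak-scaling runs above, $l_{max}+1$ doubles each time the core count quadruples, which is why the reference curve grows as $N_{core}^{1/2}$ rather than staying flat. A minimal sketch of that reference curve follows; the function name is hypothetical, and anchoring the curve at the 4-core run is an assumption, since the source does not say which point the plotted line passes through.

```python
import math


def ideal_legendre_time(n_cores, ref_elapsed_s, ref_cores):
    """Reference time per step if cost grows as the square root of the core count."""
    return ref_elapsed_s * math.sqrt(n_cores / ref_cores)


# Assumed anchor: the 4-core run (0.03682030 s per step).
for n_cores in (4, 16, 64, 256, 1024):
    print(n_cores, ideal_legendre_time(n_cores, 0.03682030, 4))
```

With this anchor the reference time at 1024 cores is about 0.589 s per step, against a measured 0.564325 s.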
# Cores | # Processes | # Threads | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | SUs |
---|---|---|---|---|---|---|---|---|
64 | 8 | 8 | 255 | (32,384,768) | 0.23631219 | 0.196183 | 0.0222231 | 42.0111 |
128 | 16 | 8 | 255 | (64,384,768) | 0.2575840 | 0.196314 | 0.043000 | 91.5854 |
256 | 32 | 8 | 255 | (128,384,768) | 0.2998254 | 0.196222 | 0.084590 | 213.209 |
512 | 64 | 8 | 255 | (256,384,768) | 0.3866510 | 0.197542 | 0.169289 | 549.904 |
1024 | 128 | 8 | 255 | (512,384,768) | 0.564325 | 0.199389 | 0.342527 | 1605.19 |
2048 | 128 | 16 | 255 | (1024,384,768) | 0.82399 | 0.427062 | 0.368039 | 4687.59 |
2048 | 256 | 8 | 255 | (1024,384,768) | 0.903354 | 0.203544 | 0.675003 | 5139.08 |
Elapsed (wall-clock) time for weak scaling in the radial resolution. Results with 8 OpenMP threads are shown.
Ideal scaling for the Legendre transform ($O(N_{core}^{1/2})$) is plotted as a dotted line.