Compile options

F90OPTFLAGS = -O3 -warn all -g -xhost -openmp

Definition of columns

| Column | Definition |
| --- | --- |
| # of Cores | Number of CPU cores used |
| # of Processes | Number of MPI processes |
| # of Threads | Number of threads for each process |
| $N_{c}$ | Truncation level for Chebyshev polynomials |
| $l_{max}$ | Truncation level for spherical harmonics |
| $(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
| Elapsed time | Elapsed (wall clock) time for one time step |
| Efficiency | Parallel efficiency |
| SUs | Service units for $10^{4}$ time steps (core hours) |
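
The SU column can be reproduced from the elapsed time. The following is a minimal Python sketch under two assumptions that are inferred from the tabulated numbers rather than stated on this page: the elapsed time is given in seconds, and every run is charged for 16 cores regardless of how many cores it actually uses. The function name `service_units` and its parameters are hypothetical.

```python
def service_units(elapsed_per_step_s, n_steps=10_000, charged_cores=16):
    """Return core hours for n_steps time steps.

    charged_cores=16 and the unit (seconds) are assumptions inferred
    from the tables, not values stated in the source.
    """
    wall_hours = elapsed_per_step_s * n_steps / 3600.0
    return wall_hours * charged_cores


# Elapsed times per step taken from the strong scaling tables.
for t in (0.35723, 0.18729, 18.3200):
    print(f"{t:8.5f} per step -> {service_units(t):.4f} SUs")
```

Under these assumptions the sketch gives values that agree with the tabulated SUs for these three runs to the printed precision.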

Single Processor Result

| $N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed time |
| --- | --- | --- | --- |
| 47 | 48 | (72,72,144) | 0.35723 |

Strong Scaling Results

In the present test, the spatial resolution is fixed and the number of cores is varied. With ideal scaling, the elapsed time is inversely proportional to the number of cores.

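| $N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
| --- | --- | --- |
| 47 | 48 | (72,72,144) |

| # of Cores | # of Processes | # of Threads | Elapsed time | Efficiency | SUs |
| --- | --- | --- | --- | --- | --- |
| 1 | 1 | 1 | 0.35723 | 1.0 | 15.8769 |
| 2 | 1 | 2 | 0.18729 | 0.953681 | 8.32400 |
| 4 | 1 | 4 | 0.10288 | 0.868074 | 4.57244 |
| 8 | 1 | 8 | 0.05784 | 0.772022 | 2.57067 |

| $N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
| --- | --- | --- |
| 97 | 170 | (145,256,512) |

| # of Cores | # of Processes | # of Threads | Elapsed time | Efficiency | SUs |
| --- | --- | --- | --- | --- | --- |
| 1 | 1 | 1 | 18.3200 | 1.0 | 814.222 |
| 2 | 1 | 2 | 8.69336 | 1.05368 | 386.372 |
| 4 | 1 | 4 | 4.51397 | 1.01463 | 200.621 |
| 8 | 1 | 8 | 2.53104 | 0.904766 | 112.491 |
| 16 | 1 | 16 | 2.16658 | 0.528483 | 96.2924 |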
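
The Efficiency column is consistent with the usual definition of strong-scaling parallel efficiency, $T_1 / (p\,T_p)$, where $T_1$ is the single-core elapsed time and $T_p$ the elapsed time on $p$ cores. A minimal Python sketch for the $(N_{c},l_{max}) = (97,170)$ case follows; the function and variable names are illustrative only.

```python
def parallel_efficiency(t_serial, t_parallel, n_cores):
    """Strong-scaling efficiency: T_1 / (p * T_p)."""
    return t_serial / (n_cores * t_parallel)


# (cores, elapsed time per step) for the (N_c, l_max) = (97, 170) case
runs = [(1, 18.3200), (2, 8.69336), (4, 4.51397), (8, 2.53104), (16, 2.16658)]

t1 = runs[0][1]
efficiencies = {}
for n_cores, elapsed in runs:
    efficiencies[n_cores] = parallel_efficiency(t1, elapsed, n_cores)
    speedup = t1 / elapsed
    print(f"{n_cores:2d} cores: speedup {speedup:5.2f}, "
          f"efficiency {efficiencies[n_cores]:.4f}")
```

An efficiency above 1, as at 2 and 4 cores in this case, means the measured speedup is larger than the number of cores.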

Elapsed (wall clock) time for strong scaling in the $(N_{c},l_{max}) = (97,170)$ case. The numbers indicate the number of OpenMP threads. The dotted line shows ideal scaling.

Parallel efficiency for strong scaling in the $(N_{c},l_{max}) = (97,170)$ case. The numbers indicate the number of OpenMP threads.

Weak Scaling Results

Weak Scaling in horizontal direction

In the present benchmark, the radial resolution is fixed and the horizontal resolution is increased together with the number of cores.

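| # of Cores | # of Processes | # of Threads | $N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed time | SUs |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 | 1 | 97 | 42 | (145,64,128) | 0.61665 | 27.4067 |
| 4 | 1 | 4 | 97 | 84 | (145,128,256) | 0.623898 | 27.564 |
| 16 | 1 | 16 | 97 | 170 | (145,256,512) | 2.16658 | 96.2924 |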
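
To make "increased with parallelization" concrete, the sketch below checks that the number of grid points per core stays constant across the three runs. It is a minimal Python illustration with hypothetical names. The grid point count is only a rough proxy for the work per core, since the cost of the spectral transforms need not grow linearly with the number of grid points.

```python
from math import prod

# (cores, (N_r, N_theta, N_phi)) from the horizontal weak scaling table
runs = [
    (1, (145, 64, 128)),
    (4, (145, 128, 256)),
    (16, (145, 256, 512)),
]


def points_per_core(n_cores, grid):
    """Total number of grid points divided by the number of cores."""
    return prod(grid) / n_cores


per_core = [points_per_core(n, grid) for n, grid in runs]
for (n, grid), ppc in zip(runs, per_core):
    print(f"{n:2d} cores, grid {grid}: {ppc:.0f} grid points per core")
```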

Elapsed (wall clock) time for weak scaling in the horizontal resolution. The numbers indicate the number of OpenMP threads.

Weak Scaling in radial direction

In the present benchmark, the horizontal resolution is fixed and the radial resolution is increased together with the number of cores.

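| # of Cores | # of Processes | # of Threads | $N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed time | SUs |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 | 1 | 9 | 170 | (8,256,512) | 0.65988 | 29.328 |
| 2 | 1 | 2 | 17 | 170 | (16,256,512) | 0.74609 | 33.1596 |
| 4 | 1 | 4 | 33 | 170 | (32,256,512) | 0.79330 | 35.2578 |
| 8 | 1 | 8 | 63 | 170 | (64,256,512) | 1.01657 | 45.1809 |
| 16 | 1 | 16 | 129 | 170 | (128,256,512) | 1.85728 | 82.5458 |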
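
The table gives no efficiency column for weak scaling. One common measure, used here as an assumption rather than as the definition adopted on this page, is the ratio $T_1 / T_p$ of the single-core elapsed time to the elapsed time on $p$ cores, which equals 1 for ideal weak scaling. A minimal Python sketch with illustrative names:

```python
# (cores, elapsed time per step) from the radial weak scaling table
runs = [(1, 0.65988), (2, 0.74609), (4, 0.79330), (8, 1.01657), (16, 1.85728)]


def weak_scaling_efficiency(t_one_core, t_p_cores):
    """Weak-scaling efficiency T_1 / T_p; 1.0 means ideal scaling."""
    return t_one_core / t_p_cores


t1 = runs[0][1]
weak_eff = {n: weak_scaling_efficiency(t1, t) for n, t in runs}
for n, eff in weak_eff.items():
    print(f"{n:2d} cores: weak-scaling efficiency {eff:.3f}")
```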

Elapsed (wall clock) time for weak scaling in the radial resolution. The numbers indicate the number of OpenMP threads.
