Compile options

F90OPTFLAGS = -O3 -g -xhost

Definition of columns

| Name | Definition |
| --- | --- |
| # of Cores | Number of CPU cores used |
| # of parallel FEM | Number of subdomains in the meridional plane |
| # of parallel FFT | Degree of parallelization for the FFT |
| $N_{med}$ | Number of nodes for the fluid in a meridional plane |
| $N_{\phi}$ | Number of nodes (modes) in the longitudinal direction |
| Elapsed time | Elapsed (wall clock) time for one time step |
| Solver time | Elapsed (wall clock) time for the linear solver (including communication) |
| Comm. time | Elapsed (wall clock) time for data communication |
| Efficiency | Parallel efficiency |
| SUs | Service units (core hours) for $10^{4}$ time steps |

Elapsed time is averaged over 100 time steps and over all cores, using the values in “fort.702”.
Solver time is averaged over 100 time steps and over all cores, using the values in “fort.705”.
Comm. time is averaged over 100 time steps and over all cores, using the values in “fort.703”.
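
The SUs and Efficiency columns can be reproduced from the elapsed time and the core count. The sketch below is not taken from the benchmark scripts; it assumes that times are in seconds, that SUs are elapsed time × cores × $10^{4}$ steps converted to hours, and that efficiency is measured against a reference run as $T_{ref} P_{ref} / (T P)$. Both assumptions agree with the tabulated values to within rounding. The function names are hypothetical.

```python
def service_units(elapsed_per_step, n_cores, n_steps=10_000):
    """Core hours for n_steps time steps (elapsed time assumed in seconds)."""
    return elapsed_per_step * n_cores * n_steps / 3600.0


def parallel_efficiency(elapsed, n_cores, ref_elapsed, ref_cores):
    """Strong-scaling efficiency relative to an assumed reference run."""
    return (ref_elapsed * ref_cores) / (elapsed * n_cores)


# 64-core run with 8 x 8 decomposition (N_med = 53280, N_phi = 32)
su = service_units(3.26451, 64)

# 1024-core run (32 x 32) against the 512-core run (16 x 32)
# that is listed with efficiency 1 (N_med = 132587, N_phi = 128)
eff = parallel_efficiency(2.30393, 1024, ref_elapsed=4.55822, ref_cores=512)

print(f"SUs: {su:.3f}")
print(f"Efficiency: {eff:.4f}")
```

For the smaller problem ($N_{med}$ = 53280) no listed run has efficiency 1, so its reference run is not among the tabulated rows and would have to be supplied separately.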

Strong Scaling Results

$N_{med}$ = 53280, $N_{\phi}$ = 32

| # of Cores | # of parallel FEM | # of parallel FFT | Elapsed time | Solver time | Comm. time | Efficiency | SUs |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 64 | 8 | 8 | 3.26451 | 0.112633 | 0.0521764 | 0.958659 | 580.357 |
| 64 | 16 | 4 | 5.39355 | 0.134547 | 0.0933418 | 0.580239 | 958.854 |
| 128 | 8 | 16 | 1.85212 | 0.108784 | 0.0344249 | 0.844854 | 658.533 |
| 128 | 16 | 8 | 1.84533 | 0.0709971 | 0.0290995 | 0.847966 | 656.116 |
| 256 | 8 | 32 | 1.18584 | 0.427434 | 0.0484521 | 0.659775 | 843.263 |
| 256 | 16 | 16 | 1.04266 | 0.0631451 | 0.023694 | 0.750375 | 741.448 |
| 256 | 32 | 8 | 1.20359 | 0.0506269 | 0.0211725 | 0.650045 | 855.886 |
| 512 | 16 | 32 | 0.707405 | 0.248459 | 0.0314082 | 0.552998 | 1006.09 |
| 512 | 32 | 16 | 0.732251 | 0.0445155 | 0.0205402 | 0.534234 | 1041.42 |
| 512 | 64 | 8 | 0.649872 | 0.119073 | 0.00713214 | 0.601955 | 924.262 |

$N_{med}$ = 132587, $N_{\phi}$ = 128

| # of Cores | # of parallel FEM | # of parallel FFT | Elapsed time | Solver time | Comm. time | Efficiency | SUs |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 512 | 32 | 16 | 5.347 | 0.195319 | 0.162323 | 0.852482 | 7604.62 |
| 512 | 16 | 32 | 4.55822 | 0.225011 | 0.18032 | 1 | 6482.8 |
| 1024 | 64 | 16 | 3.99642 | 0.162858 | 0.117593 | 0.570288 | 11367.6 |
| 1024 | 32 | 32 | 2.30393 | 0.126079 | 0.107139 | 0.989228 | 6553.4 |
| 1024 | 16 | 64 | 3.2378 | 0.214095 | 0.144102 | 0.703906 | 9209.75 |
| 2048 | 64 | 32 | 1.40748 | 0.0818107 | 0.0677741 | 0.809641 | 8007.01 |
| 2048 | 32 | 64 | 1.51652 | 0.111759 | 0.0867726 | 0.75143 | 8627.29 |
| 2048 | 16 | 128 | 2.39379 | 0.338092 | 0.107656 | 0.476046 | 13618 |
| 4096 | 128 | 32 | 1.02445 | 0.0618993 | 0.0500832 | 0.55618 | 11655.9 |
| 4096 | 64 | 64 | 0.954898 | 0.0737486 | 0.0606541 | 0.59669 | 10864.6 |
| 4096 | 32 | 128 | 1.12072 | 0.180406 | 0.0659313 | 0.508405 | 12751.3 |


Elapsed (wall clock) time for the strong scaling tests. The numbers in the plot give the degree of FFT parallelization (# of parallel FFT).


Parallel efficiency for the strong scaling tests. The numbers in the plot give the degree of FFT parallelization (# of parallel FFT).
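
Because the elapsed time at a fixed core count depends strongly on how the cores are split between the FEM and FFT directions, it can help to pick out the fastest decomposition for each core count. The following is a minimal sketch that does this for the $N_{med}$ = 132587, $N_{\phi}$ = 128 runs, with the values copied from the strong scaling table; the variable names are illustrative only.

```python
# (cores, n_fem, n_fft, elapsed time per step) for N_med = 132587, N_phi = 128
rows = [
    (512, 32, 16, 5.347),
    (512, 16, 32, 4.55822),
    (1024, 64, 16, 3.99642),
    (1024, 32, 32, 2.30393),
    (1024, 16, 64, 3.2378),
    (2048, 64, 32, 1.40748),
    (2048, 32, 64, 1.51652),
    (2048, 16, 128, 2.39379),
    (4096, 128, 32, 1.02445),
    (4096, 64, 64, 0.954898),
    (4096, 32, 128, 1.12072),
]

# Keep the decomposition with the smallest elapsed time for each core count.
best = {}
for cores, n_fem, n_fft, elapsed in rows:
    if cores not in best or elapsed < best[cores][2]:
        best[cores] = (n_fem, n_fft, elapsed)

for cores in sorted(best):
    n_fem, n_fft, elapsed = best[cores]
    print(f"{cores:5d} cores: FEM {n_fem:3d} x FFT {n_fft:3d} -> {elapsed} per step")
```

For this problem size the fastest splits are those with 32 or 64 FFT processes; the most lopsided decompositions at each core count are the slowest.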

Weak Scaling Results

| # of Cores | # of parallel FEM | # of parallel FFT | $N_{med}$ | $N_{\phi}$ | Elapsed time | Solver time | Comm. time | SUs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 128 | 32 | 4 | 132587 | 16 | 4.45475 | 0.100929 | 0.0287522 | 1583.91 |
| 256 | 32 | 8 | 132587 | 32 | 2.82371 | 0.101547 | 0.0439264 | 2007.97 |
| 512 | 32 | 16 | 132587 | 64 | 2.41257 | 0.117023 | 0.0525979 | 3431.22 |
| 1024 | 32 | 32 | 132587 | 128 | 2.30393 | 0.126079 | 0.107139 | 6553.4 |
| 2048 | 32 | 64 | 132587 | 256 | 2.39006 | 0.126997 | 0.149923 | 13596.8 |


Elapsed time for the weak scaling in the zonal direction.
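
In a weak scaling test the work per core should stay fixed while the core count grows. The sketch below checks this for the zonal series by computing the number of grid points per core, taken here as $N_{med} \times N_{\phi}$ divided by the core count (an assumed measure of work, not one defined on this page), and lists the elapsed time relative to the 128-core run.

```python
# (cores, N_med, N_phi, elapsed time per step) from the zonal weak scaling table
rows = [
    (128, 132587, 16, 4.45475),
    (256, 132587, 32, 2.82371),
    (512, 132587, 64, 2.41257),
    (1024, 132587, 128, 2.30393),
    (2048, 132587, 256, 2.39006),
]

# Grid points per core: constant if the problem grows in step with the cores.
points_per_core = [n_med * n_phi / cores for cores, n_med, n_phi, _ in rows]

# Elapsed time relative to the first (128-core) run.
t0 = rows[0][3]
relative_time = [elapsed / t0 for *_, elapsed in rows]

for (cores, *_), ppc, rel in zip(rows, points_per_core, relative_time):
    print(f"{cores:5d} cores: {ppc:.3f} points/core, relative time {rel:.3f}")
```

The points per core are identical for all five runs, and the elapsed time is nearly flat from 512 to 2048 cores.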

| # of Cores | # of parallel FEM | # of parallel FFT | $N_{med}$ | $N_{\phi}$ | Elapsed time | Solver time | Comm. time | SUs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 256 | 16 | 16 | 7620 | 64 | 0.498492 | 0.0331437 | 0.0233114 | 354.484 |
| 1024 | 64 | 16 | 30667 | 64 | 0.734665 | 0.0432489 | 0.0237405 | 2089.71 |
| 2304 | 144 | 16 | 67412 | 64 | 1.05416 | 0.0557186 | 0.0527788 | 6746.65 |
| 4096 | 256 | 16 | 119590 | 64 | 1.26638 | 0.0689046 | 0.0430469 | 14408.5 |


Elapsed time for the weak scaling in the meridional direction. The $O(N_{core}^{1/2})$ scaling is plotted as a dotted line.
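
The growth of the elapsed time with core count can be compared against the $O(N_{core}^{1/2})$ reference line by fitting a power law to the tabulated values. This is a rough sketch using an ordinary least-squares fit in log-log space on only four points; the core count is taken as the product of the FEM and FFT parallelization, which is consistent with the SUs column.

```python
import math

# (n_fem, n_fft, elapsed time per step) from the meridional weak scaling table
rows = [
    (16, 16, 0.498492),
    (64, 16, 0.734665),
    (144, 16, 1.05416),
    (256, 16, 1.26638),
]

# Fit log(elapsed) = exponent * log(cores) + const by least squares.
xs = [math.log(n_fem * n_fft) for n_fem, n_fft, _ in rows]
ys = [math.log(elapsed) for _, _, elapsed in rows]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
exponent = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)

print(f"fitted exponent: {exponent:.2f}")
```

For these four runs the fitted exponent is about 0.34, so the elapsed time grows somewhat more slowly than $N_{core}^{1/2}$.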
