GeoFEM Performance Results


Compile options

Intel Fortran compile flags used for the benchmark runs:

F90OPTFLAGS = -O3 -warn all -g -xhost -openmp

Definition of columns

^ Name ^ Definition ^
| # of Cores | Number of CPU cores used |
| # of Processes | Number of MPI processes |
| # of Threads | Number of threads for each process |
| $N_{r}$ | Number of nodes in the radial direction |
| $N_{sph}$ | Number of nodes in a sphere |
| Elapsed time | Elapsed (wall-clock) time in seconds for one time step |
| Solver time | Elapsed (wall-clock) time in seconds for the linear solver (including communications) |
| Comm. time | Elapsed (wall-clock) time in seconds for data communication |
| Efficiency | Parallel efficiency |
| SUs | Service units for $10^{4}$ time steps (core-hours) |
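
The Efficiency and SUs columns can be reproduced from the elapsed time and core count: Efficiency is measured relative to the run marked 1.00 in each table, and SUs is elapsed time × cores × $10^{4}$ steps, converted to hours. A minimal Python sketch of this arithmetic (the reference values below are read off the first strong-scaling table; both formulas are consistent with the tabulated numbers):

<code python>
# Reproduce the Efficiency and SUs columns from elapsed time and core count.
# Reference run: the row tagged Efficiency = 1.00 in the first strong-scaling
# table (32 cores, 139.417 s per step).

def efficiency(elapsed_s, cores, ref_elapsed_s=139.417, ref_cores=32):
    """Parallel efficiency relative to the reference run."""
    return (ref_elapsed_s * ref_cores) / (elapsed_s * cores)

def service_units(elapsed_s, cores, steps=10_000):
    """Core-hours consumed by `steps` time steps."""
    return elapsed_s * cores * steps / 3600.0

# 64 cores, 32 processes x 2 threads: matches the tabulated 1.14157 / 10855.8
print(efficiency(61.0636, 64), service_units(61.0636, 64))
</code>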

Strong Scaling Results

$N_{r}$ = 65, $N_{sph}$ = 31106

^ # of Cores ^ # of Processes ^ # of Threads ^ Elapsed time (s) ^ Solver time (s) ^ Comm. time (s) ^ Efficiency ^ SUs ^
| 32 | 32 | 1 | 140.320 | 136.035 | 5.63721 | 0.993565 | 12472.9 |
| 64 | 64 | 1 | 62.3732 | 60.2832 | 5.83084 | 1.1176 | 11088.6 |
| 256 | 256 | 1 | 25.7511 | 22.0492 | 2.21876 | 0.676753 | 18311.9 |
| 32 | 16 | 2 | 139.417 | 134.658 | 2.90400 | 1.00 | 12392.6 |
| 64 | 32 | 2 | 61.0636 | 58.9420 | 2.62792 | 1.14157 | 10855.8 |
| 128 | 64 | 2 | 26.2981 | 25.2837 | 1.96978 | 1.32535 | 9350.44 |
| 256 | 128 | 2 | 18.6194 | 17.9958 | 8.38487 | 0.935966 | 13240.5 |
| 512 | 256 | 2 | 18.5565 | 18.1882 | 13.8237 | 0.469569 | 26391.5 |
| 32 | 8 | 4 | 141.713 | 134.061 | 8.63714 | 0.983798 | 12596.7 |
| 64 | 16 | 4 | 61.3219 | 58.9329 | 2.28762 | 1.13676 | 10901.7 |
| 128 | 32 | 4 | 26.7351 | 25.6872 | 1.89901 | 1.30369 | 9505.81 |
| 256 | 64 | 4 | 12.0594 | 11.5573 | 1.29915 | 1.44511 | 8575.57 |
| 512 | 128 | 4 | 6.50411 | 6.19388 | 0.930524 | 1.3397 | 9250.29 |
| 1024 | 256 | 4 | 3.61803 | 3.40549 | 1.27324 | 1.20419 | 10291.3 |
| 256 | 32 | 8 | 12.6620 | 12.1131 | 1.49653 | 1.37633 | 9004.09 |
| 512 | 64 | 8 | 6.31781 | 6.05579 | 1.326885 | 1.37921 | 8985.33 |
| 1024 | 128 | 8 | 4.00455 | 3.83623 | 1.61223 | 1.08796 | 11390.7 |
| 2048 | 256 | 8 | 2.24999 | 2.17732 | 1.06132 | 0.968178 | 12799.9 |
$N_{r}$ = 129, $N_{sph}$ = 124418

^ # of Cores ^ # of Processes ^ # of Threads ^ Elapsed time (s) ^ Solver time (s) ^ Comm. time (s) ^ Efficiency ^ SUs ^
| 512 | 512 | 1 | 188.6315 | 185.9239 | 79.4327 | 0.703179 | 268276 |
| 512 | 256 | 2 | 155.7737 | 150.9981 | 48.0654 | 0.851503 | 221545 |
| 1024 | 512 | 2 | 116.1871 | 114.8309 | 68.0962 | 0.570811 | 330488 |
| 512 | 128 | 4 | 132.6418 | 131.1039 | 13.0843 | 1.00 | 188646 |
| 1024 | 256 | 4 | 62.7093 | 60.6450 | 14.2009 | 1.05759 | 178373 |
| 2048 | 512 | 4 | 29.1312 | 28.4619 | 7.7951 | 1.13831 | 165724 |
| 1024 | 128 | 8 | 60.5976 | 59.1307 | 7.7176 | 1.09445 | 172367 |
| 2048 | 256 | 8 | 31.4802 | 30.6008 | 8.3909 | 1.05337 | 179087 |
| 4096 | 512 | 8 | 17.7632 | 17.4155 | 7.1339 | 0.933403 | 202106 |
| 2048 | 128 | 16 | 129.3082 | 127.3400 | 78.7965 | 0.256445 | 735620 |
| 4096 | 256 | 16 | 84.6423 | 83.7060 | 60.9705 | 0.195886 | 963041 |


Elapsed (wall-clock) time for the strong scaling. The number of OpenMP threads is indicated by the labels.


Parallel efficiency for the strong scaling. The number of OpenMP threads is indicated by the labels.
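
Both figures can be regenerated from the tables. A minimal matplotlib sketch for the efficiency plot of the $N_{r}$ = 65 case, with one curve per OpenMP thread count as in the figure (values copied from the first strong-scaling table):

<code python>
import matplotlib.pyplot as plt

# Strong-scaling efficiency for N_r = 65, N_sph = 31106,
# grouped by OpenMP thread count (values from the table above).
runs = {
    1: [(32, 0.993565), (64, 1.1176), (256, 0.676753)],
    2: [(32, 1.00), (64, 1.14157), (128, 1.32535),
        (256, 0.935966), (512, 0.469569)],
    4: [(32, 0.983798), (64, 1.13676), (128, 1.30369),
        (256, 1.44511), (512, 1.3397), (1024, 1.20419)],
    8: [(256, 1.37633), (512, 1.37921), (1024, 1.08796), (2048, 0.968178)],
}

for threads, points in sorted(runs.items()):
    cores, eff = zip(*points)
    plt.plot(cores, eff, marker="o", label=f"{threads} threads")

plt.xscale("log", base=2)          # core counts double between runs
plt.xlabel("# of Cores")
plt.ylabel("Parallel efficiency")
plt.legend()
plt.show()
</code>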

Weak Scaling Results

^ # of Cores ^ # of Processes ^ # of Threads ^ $N_{r}$ ^ $N_{sph}$ ^ Elapsed time (s) ^ Solver time (s) ^ Comm. time (s) ^ Iterations for $d{\bf A}/dt$ ^ SUs ^
| 8 | 4 | 2 | 17 | 1946 | 1.27525 | 0.995820 | 0.051852 | 180.75 | 28.3389 |
| 16 | 4 | 4 | 33 | 1946 | 2.13157 | 1.89065 | 0.104110 | 229.25 | 94.7364 |
| 32 | 8 | 4 | 17 | 7778 | 2.91725 | 2.64922 | 0.176285 | 309.0 | 259.311 |
| 64 | 8 | 8 | 33 | 7778 | 3.29256 | 3.03684 | 0.356617 | 358.5 | 585.344 |
| 128 | 32 | 4 | 65 | 7778 | 4.02430 | 3.77557 | 0.680323 | 404.25 | 1430.86 |
| 256 | 32 | 8 | 33 | 31106 | 5.94088 | 5.67232 | 0.916604 | 661.7 | 4224.63 |
| 512 | 64 | 8 | 65 | 31106 | 6.31781 | 6.05579 | 1.32689 | 683.5 | 8985.33 |
| 1024 | 256 | 4 | 129 | 31106 | 9.44298 | 9.15932 | 2.59881 | 814.0 | 26860 |
| 2048 | 256 | 8 | 65 | 124418 | 15.0371 | 14.7224 | 4.4994 | 1354.1 | 85544.4 |
| 4096 | 512 | 8 | 129 | 124418 | 17.7632 | 17.4155 | 7.1339 | 1355.8 | 202106 |


Elapsed time for the weak scaling (red line). The best result among runs with the same number of cores is chosen for plotting. The average iteration count for $d{\bf A}/dt$ is also plotted (black line).
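
A minimal sketch of the selection rule described in the caption, keeping the fastest run for each core count (rows copied from the weak-scaling table as (cores, elapsed seconds per step, $d{\bf A}/dt$ iterations) tuples):

<code python>
# Weak-scaling rows: (cores, elapsed s per step, iterations for dA/dt),
# copied from the table above.
rows = [
    (8, 1.27525, 180.75), (16, 2.13157, 229.25), (32, 2.91725, 309.0),
    (64, 3.29256, 358.5), (128, 4.02430, 404.25), (256, 5.94088, 661.7),
    (512, 6.31781, 683.5), (1024, 9.44298, 814.0),
    (2048, 15.0371, 1354.1), (4096, 17.7632, 1355.8),
]

# Keep the fastest run for each core count, as done for the red line.
best = {}
for cores, elapsed, iters in rows:
    if cores not in best or elapsed < best[cores][0]:
        best[cores] = (elapsed, iters)

for cores in sorted(best):
    elapsed, iters = best[cores]
    print(f"{cores:5d} cores: {elapsed:8.4f} s/step, {iters:7.1f} iterations")
</code>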

Back to performance benchmark lists
