[[wg:dynamo:Performance_results|Back to performance benchmark lists]] \\

===== Compile options =====

F90OPTFLAGS = -O3 -warn all -g -xhost -openmp

===== Definition of columns =====

^ Name ^ Definition ^
| # of Cores | Number of CPU cores used |
| # of Processes | Number of MPI processes |
| # of Threads | Number of threads per process |
| $N_{r}$ | Number of nodes in the radial direction |
| $N_{sph}$ | Number of nodes on a sphere |
| Elapsed time | Elapsed (wall clock) time for one time step |
| Solver time | Elapsed (wall clock) time for the linear solver (including communications) |
| Comm. time | Elapsed (wall clock) time for data communication |
| Efficiency | Parallel efficiency |
| SUs | Service units for $10^{4}$ time steps (core hours) |

===== Strong Scaling Results =====

^ $N_{r}$ ^ $N_{sph}$ ^
| 65 | 31106 |

^ # of Cores ^ # of Processes ^ # of Threads ^ Elapsed time ^ Solver time ^ Comm. time ^ Efficiency ^ SUs ^
| 32 | 32 | 1 | 140.320 | 136.035 | 5.63721 | 0.993565 | 12472.9 |
| 64 | 64 | 1 | 62.3732 | 60.2832 | 5.83084 | 1.1176 | 11088.6 |
| 256 | 256 | 1 | 25.7511 | 22.0492 | 2.21876 | 0.676753 | 18311.9 |
| 32 | 16 | 2 | 139.417 | 134.658 | 2.90400 | 1.00 | 12392.6 |
| 64 | 32 | 2 | 61.0636 | 58.9420 | 2.62792 | 1.14157 | 10855.8 |
| 128 | 64 | 2 | 26.2981 | 25.2837 | 1.96978 | 1.32535 | 9350.44 |
| 256 | 128 | 2 | 18.6194 | 17.9958 | 8.38487 | 0.935966 | 13240.5 |
| 512 | 256 | 2 | 18.5565 | 18.1882 | 13.8237 | 0.469569 | 26391.5 |
| 32 | 8 | 4 | 141.713 | 134.061 | 8.63714 | 0.983798 | 12596.7 |
| 64 | 16 | 4 | 61.3219 | 58.9329 | 2.28762 | 1.13676 | 10901.7 |
| 128 | 32 | 4 | 26.7351 | 25.6872 | 1.89901 | 1.30369 | 9505.81 |
| 256 | 64 | 4 | 12.0594 | 11.5573 | 1.29915 | 1.44511 | 8575.57 |
| 512 | 128 | 4 | 6.50411 | 6.19388 | 0.930524 | 1.3397 | 9250.29 |
| 1024 | 256 | 4 | 3.61803 | 3.40549 | 1.27324 | 1.20419 | 10291.3 |
| 256 | 32 | 8 | 12.6620 | 12.1131 | 1.49653 | 1.37633 | 9004.09 |
| 512 | 64 | 8 | 6.31781 | 6.05579 | 1.326885 | 1.37921 | 8985.33 |
| 1024 | 128 | 8 | 4.00455 | 3.83623 | 1.61223 | 1.08796 | 11390.7 |
| 2048 | 256 | 8 | 2.24999 | 2.17732 | 1.06132 | 0.968178 | 12799.9 |

^ $N_{r}$ ^ $N_{sph}$ ^
| 129 | 124418 |

^ # of Cores ^ # of Processes ^ # of Threads ^ Elapsed time ^ Solver time ^ Comm. time ^ Efficiency ^ SUs ^
| 512 | 512 | 1 | 188.6315 | 185.9239 | 79.4327 | 0.703179 | 268276 |
| 512 | 256 | 2 | 155.7737 | 150.9981 | 48.0654 | 0.851503 | 221545 |
| 1024 | 512 | 2 | 116.1871 | 114.8309 | 68.0962 | 0.570811 | 330488 |
| 512 | 128 | 4 | 132.6418 | 131.1039 | 13.0843 | 1.00 | 188646 |
| 1024 | 256 | 4 | 62.7093 | 60.6450 | 14.2009 | 1.05759 | 178373 |
| 2048 | 512 | 4 | 29.1312 | 28.4619 | 7.7951 | 1.13831 | 165724 |
| 1024 | 128 | 8 | 60.5976 | 59.1307 | 7.7176 | 1.09445 | 172367 |
| 2048 | 256 | 8 | 31.4802 | 30.6008 | 8.3909 | 1.05337 | 179087 |
| 4096 | 512 | 8 | 17.7632 | 17.4155 | 7.1339 | 0.933403 | 202106 |
| 2048 | 128 | 16 | 129.3082 | 127.3400 | 78.7965 | 0.256445 | 735620 |
| 4096 | 256 | 16 | 84.6423 | 83.7060 | 60.9705 | 0.195886 | 963041 |

{{wg:dynamo:Performance_results:GeoFEM:GeoFEM_elapsed.png?480}}\\
Elapsed (wall clock) time for the strong scaling. The numbers indicate the number of OpenMP threads.

{{wg:dynamo:Performance_results:GeoFEM:GeoFEM_efficiency.png?480}}\\
Parallel efficiency for the strong scaling. The numbers indicate the number of OpenMP threads.

===== Weak Scaling Results =====

^ $N_{r}$ ^ $N_{sph}$ ^
| 1 | 16 |

^ # of Cores ^ # of Processes ^ # of Threads ^ $N_{r}$ ^ $N_{sph}$ ^ Elapsed time ^ Solver time ^ Comm. time ^ Iterations for $d{\bf A}/dt$ ^ SUs ^
| 8 | 4 | 2 | 17 | 1946 | 1.27525 | 0.995820 | 0.051852 | 180.75 | 28.3389 |
| 16 | 4 | 4 | 33 | 1946 | 2.13157 | 1.89065 | 0.104110 | 229.25 | 94.7364 |
| 32 | 8 | 4 | 17 | 7778 | 2.91725 | 2.64922 | 0.176285 | 309.0 | 259.311 |
| 64 | 8 | 8 | 33 | 7778 | 3.29256 | 3.03684 | 0.356617 | 358.5 | 585.344 |
| 128 | 32 | 4 | 65 | 7778 | 4.02430 | 3.77557 | 0.680323 | 404.25 | 1430.86 |
| 256 | 32 | 8 | 33 | 31106 | 5.94088 | 5.67232 | 0.916604 | 661.7 | 4224.63 |
| 512 | 64 | 8 | 65 | 31106 | 6.31781 | 6.05579 | 1.32689 | 683.5 | 8985.33 |
| 1024 | 256 | 4 | 129 | 31106 | 9.44298 | 9.15932 | 2.59881 | 814.0 | 26860 |
| 2048 | 256 | 8 | 65 | 124418 | 15.0371 | 14.7224 | 4.4994 | 1354.1 | 85544.4 |
| 4096 | 512 | 8 | 129 | 124418 | 17.7632 | 17.4155 | 7.1339 | 1355.8 | 202106 |

{{wg:dynamo:Performance_results:GeoFEM:GeoFEM_weak.png?480}}\\
Elapsed time for the weak scaling (red line). For each core count, the best result among the runs with that number of cores is plotted. The average iteration count for $d{\bf A}/dt$ is also plotted (black line).

[[wg:dynamo:Performance_results:GeoFEM:files|files]]
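
The Efficiency and SUs columns in the tables above can be reproduced from the elapsed times. The sketch below is not taken from the benchmark scripts: the formulas are inferred from the tabulated values, assuming that SUs are core hours for $10^{4}$ time steps and that efficiency is measured relative to the run listed with efficiency 1.00 in each strong scaling table. The function names are hypothetical.

```python
# Assumed formulas, inferred from the tabulated values on this page.

def service_units(elapsed_per_step, cores, steps=10_000):
    """Core hours for `steps` time steps, given seconds per time step."""
    return elapsed_per_step * cores * steps / 3600.0


def parallel_efficiency(elapsed, cores, ref_elapsed, ref_cores):
    """Efficiency relative to a reference run: ratio of core-seconds per step."""
    return (ref_elapsed * ref_cores) / (elapsed * cores)


if __name__ == "__main__":
    # Reference run of the first strong scaling table:
    # 32 cores, 16 processes, 2 threads (efficiency listed as 1.00).
    ref_elapsed, ref_cores = 139.417, 32

    # Run with 64 cores, 64 processes, 1 thread.
    print(service_units(62.3732, 64))
    print(parallel_efficiency(62.3732, 64, ref_elapsed, ref_cores))
```

With these assumptions, the run on 64 cores (64 processes, 1 thread) gives values that agree with the tabulated 11088.6 SUs and efficiency 1.1176 to the printed precision. Efficiencies above 1 mean the run used fewer core-seconds per step than the reference run.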