[[wg:dynamo:Performance_results|Back to performance benchmark lists]]

===== Compile options =====

  F90OPTFLAGS = -O3 -warn all -g -xhost -openmp

===== Definition of columns =====

^ Name ^ Definition ^
| # of Cores | Number of CPU cores used |
| # of Processes | Number of MPI processes |
| # of Threads | Number of OpenMP threads per process |
| $N_{C}$ | Truncation level for Chebyshev polynomials |
| $l_{max}$ | Truncation level for spherical harmonics |
| $(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
| Elapsed | Elapsed (wall clock) time for one time step |
| Nonlinear | Elapsed (wall clock) time for nonlinear terms (including communications) |
| Solver | Elapsed (wall clock) time for linear calculation |
| Comm. | Elapsed (wall clock) time for data communication |
| Efficiency | Parallel efficiency relative to the fastest 16-core (one node) result |
| SUs | Service units for $10^{4}$ time steps (core hours) |

===== Single Processor Result =====

^ $N_{C}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ SUs ^
| 72 | 47 | ( 73,72,144) | 0.353396 | 0.249238 | 0.965605 | 0.00720803 | 15.7065 |

===== Strong Scaling Results =====

^ $N_{C}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^
| 192 | 255 | (192,384,768) |

^ # of Cores ^ # of Processes ^ # of Threads ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ Efficiency ^ SUs ^
| 8 | 2 | 4 | 9.98327 | 6.70605 | 2.44661 | 0.413657 | 1.3892 | 221.85 |
| 8 | 4 | 2 | 9.21265 | 6.62681 | 2.19377 | 0.230439 | 1.5054 | 204.726 |
| 16 | 2 | 8 | 7.94273 | 5.06533 | 2.13615 | 0.538236 | 0.873047 | 353.01 |
| 16 | 4 | 4 | 6.93438 | 4.67925 | 1.80959 | 0.298254 | 1 | 308.195 |
| 16 | 8 | 2 | 7.12601 | 5.21069 | 1.70288 | 0.245879 | 0.973108 | 316.712 |
| 32 | 4 | 8 | 3.91464 | 2.45256 | 1.01179 | 0.220971 | 0.885697 | 347.968 |
| 32 | 8 | 4 | 3.65146 | 2.36565 | 0.96631 | 0.243386 | 0.949535 | 324.574 |
| 32 | 16 | 2 | 3.76134 | 2.62796 | 0.861258 | 0.165907 | 0.921796 | 334.342 |
| 64 | 8 | 8 | 2.01431 | 1.22438 | 0.465491 | 0.146756 | 0.860641 | 358.099 |
| 64 | 16 | 4 | 1.942 | 1.18073 | 0.465075 | 0.188362 | 0.892685 | 345.245 |
| 64 | 32 | 2 | 2.0975 | 1.33229 | 0.360439 | 0.108833 | 0.826505 | 372.889 |
| 128 | 16 | 8 | 1.15653 | 0.633651 | 0.241426 | 0.119156 | 0.749479 | 411.212 |
| 128 | 32 | 4 | 1.0925 | 0.591954 | 0.182997 | 0.0788043 | 0.793407 | 388.445 |
| 128 | 64 | 2 | 1.46869 | 0.662761 | 0.175857 | 0.0781756 | 0.590182 | 522.202 |
| 256 | 32 | 8 | 0.713037 | 0.318539 | 0.103698 | 0.0459037 | 0.607821 | 507.049 |
| 256 | 64 | 4 | 0.799634 | 0.298149 | 0.0985314 | 0.0466445 | 0.541996 | 568.629 |
| 512 | 64 | 8 | 0.597079 | 0.156069 | 0.0687971 | 0.0383883 | 0.362932 | 849.18 |

{{wg:dynamo:Performance_results:MagIC5:MagIC5_Elapsed.png?480}}\\
Elapsed (wall clock) time for the strong scaling. The numbers indicate the number of OpenMP threads. The dotted line shows ideal scaling.

{{wg:dynamo:Performance_results:MagIC5:MagIC5_efficiency.png?480}}\\
Parallel efficiency for the strong scaling. The fastest result with 16 cores (one node) is used as the reference. The numbers indicate the number of OpenMP threads.

===== Weak Scaling Results =====

^ # of Cores ^ # of Processes ^ # of Threads ^ $N_{C}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ SUs ^
| 4 | 1 | 4 | 256 | 31 | (257,48,96) | 0.239627 | 0.117565 | 0.108877 | 0.01385 | 22.2638 |
| 16 | 4 | 4 | 256 | 31 | (257,96,192) | 0.265171 | 0.128896 | 0.0825426 | 0.000955493 | 22.2638 |
| 64 | 16 | 4 | 256 | 63 | (257,192,384) | 0.410822 | 0.223191 | 0.110366 | 0.0318171 | 106.733 |
| 256 | 64 | 4 | 256 | 127 | (257,384,768) | 1.06998 | 0.398631 | 0.134563 | 0.0538146 | 519.207 |

{{wg:dynamo:Performance_results:MagIC5:MagIC5_weak_sph.png?480}}\\
Elapsed time for the weak scaling in the horizontal resolution. Results with 4 OpenMP threads are shown. The dotted line shows $O(N_{core}^{1/2})$ scaling (ideal scaling for the Legendre transform).

^ # of Cores ^ # of Processes ^ # of Threads ^ $N_{C}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ SUs ^
| 32 | 8 | 4 | 32 | 255 | (33,384,768) | 0.525404 | 0.389331 | 0.0906778 | 0.0499037 | 46.7025 |
| 64 | 16 | 4 | 64 | 255 | (65,384,768) | 0.586558 | 0.396212 | 0.100678 | 0.0558089 | 104.277 |
| 128 | 32 | 4 | 128 | 255 | (129,384,768) | 0.694737 | 0.396308 | 0.105363 | 0.0500511 | 247.018 |
| 256 | 64 | 4 | 256 | 255 | (257,384,768) | 1.06998 | 0.398631 | 0.134563 | 0.0538146 | 760.877 |

{{wg:dynamo:Performance_results:MagIC5:MagIC5_weak_r.png?480}}\\
Elapsed time for the weak scaling in the radial resolution. Results with 4 OpenMP threads are shown. The dotted line shows $O(N_{core})$ scaling.

[[wg:dynamo:Performance_results:MagIC5:files|files]]
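
The Efficiency and SUs columns of the strong scaling table can be recomputed from the Elapsed column. The sketch below is not taken from the benchmark scripts. It assumes two formulas inferred from the tabulated numbers: efficiency is the reference cost (elapsed time × cores for the 16-core, 4 processes × 4 threads run) divided by the cost of the run in question, and SUs are elapsed time × cores × $10^{4}$ steps converted to core hours. The function and constant names are hypothetical.

```python
# Sketch: recompute Efficiency and SUs from the Elapsed column.
# ASSUMPTIONS (inferred from the table values, not stated by the page authors):
#   efficiency = (T_ref * N_ref) / (T * N)
#   SUs        = T * N * steps / 3600   (core hours for 10^4 time steps)

STEPS = 10_000              # time steps per SU figure
SECONDS_PER_HOUR = 3600.0

# Reference run from the strong scaling table: 16 cores, 4 processes x 4 threads.
REF_CORES = 16
REF_ELAPSED = 6.93438       # seconds per time step


def parallel_efficiency(cores, elapsed,
                        ref_cores=REF_CORES, ref_elapsed=REF_ELAPSED):
    """Core-seconds of the reference run divided by core-seconds of this run."""
    return (ref_elapsed * ref_cores) / (elapsed * cores)


def service_units(cores, elapsed, steps=STEPS):
    """Core hours needed for `steps` time steps at `elapsed` seconds per step."""
    return elapsed * steps * cores / SECONDS_PER_HOUR


# (cores, processes, threads, elapsed) copied from the strong scaling table.
rows = [
    (8, 2, 4, 9.98327),
    (16, 4, 4, 6.93438),
    (64, 16, 4, 1.942),
    (512, 64, 8, 0.597079),
]

for cores, procs, threads, elapsed in rows:
    eff = parallel_efficiency(cores, elapsed)
    sus = service_units(cores, elapsed)
    print(f"{cores:4d} cores ({procs:2d} x {threads}): "
          f"efficiency = {eff:.4f}, SUs = {sus:.3f}")
```

Under these assumptions, efficiencies above 1 for the 8-core runs simply mean that those runs cost fewer core-seconds per step than the 16-core reference.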