[[wg:dynamo:Performance_results|Back to performance benchmark lists]] \\

===== compile options =====

F90OPTFLAGS = -O3 -warn all -g -xhost   -openmp

===== Definition of columns =====

^  Column  ^  Description  ^
|  # of Cores  |  Number of used CPU cores  |
|  # of Processes  |  Number of MPI processes  |
|  # of Threads (# of SMP)  |  Number of OpenMP threads per MPI process  |
|  $N_{C}$  |  Truncation level for the Chebyshev polynomials  |
|  $l_{max}$  |  Truncation level for the spherical harmonics  |
|  $(N_{r},N_{\theta},N_{\phi})$  |  Number of grid points in the spherical coordinates $(r,\theta,\phi)$  |
|  Elapsed  |  Elapsed wall-clock time per time step  |
|  Nonlinear  |  Wall-clock time spent on the nonlinear terms (including communication)  |
|  Solver  |  Wall-clock time spent on the linear solver  |
|  Comm.  |  Wall-clock time spent on data communication  |
|  Efficiency  |  Parallel efficiency relative to the fastest 16-core run  |
|  SUs  |  Service units (core hours) for $10^{4}$ time steps  |
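The SUs column can be reproduced from the other columns as cores × Elapsed × $10^{4}$ / 3600. A minimal sketch, assuming Elapsed is in wall-clock seconds per time step (the tables do not state the unit, but this assumption reproduces the tabulated SUs); the function name is hypothetical.

```python
def service_units(n_cores: int, elapsed_s: float, n_steps: int = 10_000) -> float:
    """Core hours for n_steps time steps, assuming elapsed_s is wall-clock seconds per step."""
    return n_cores * elapsed_s * n_steps / 3600.0

# Strong-scaling row: 8 cores (2 processes x 4 threads), Elapsed = 9.98327
print(round(service_units(8, 9.98327), 2))  # prints 221.85, matching the table
```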

===== Single Processor Result =====

^  $N_{C}$  ^  $l_{max}$  ^  $(N_{r},N_{\theta},N_{\phi})$  ^  Elapsed  ^  Nonlinear  ^  Solver  ^  Comm.  ^  SUs  ^
|  72  |  47  |  (73,72,144)  |  0.353396  |  0.249238  |  0.965605  |  0.00720803  |  15.7065  |
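The horizontal grid sizes on this page appear to follow the 3/2 dealiasing rule, $N_{\theta} = 3(l_{max}+1)/2$ and $N_{\phi} = 2N_{\theta}$ (for example, $l_{max}=47$ gives $(N_{\theta},N_{\phi})=(72,144)$). The hypothetical helper below checks this relation for truncations used on this page; it is an observation from the tables, not code taken from MagIC.

```python
def grid_from_lmax(l_max: int) -> tuple[int, int]:
    """Hypothetical helper: (N_theta, N_phi) under the 3/2 dealiasing rule."""
    n_theta = 3 * (l_max + 1) // 2
    n_phi = 2 * n_theta
    return n_theta, n_phi

for l_max in (31, 47, 255):
    print(l_max, grid_from_lmax(l_max))
```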

===== Strong Scaling Results =====

^  $N_{C}$  ^  $l_{max}$  ^  $(N_{r},N_{\theta},N_{\phi})$  ^
|  192  |  255  |  (192,384,768)  |

^  # of Cores  ^  # of Processes  ^  # of SMP  ^  Elapsed  ^  Nonlinear  ^  Solver  ^  Comm.  ^  Efficiency  ^  SUs  ^
|  8  |  2  |  4  |  9.98327  |  6.70605  |  2.44661  |  0.413657  |  1.3892  |  221.85  |
|  8  |  4  |  2  |  9.21265  |  6.62681  |  2.19377  |  0.230439  |  1.5054  |  204.726  |
|  16  |  2  |  8  |  7.94273  |  5.06533  |  2.13615  |  0.538236  |  0.873047  |  353.01  |
|  16  |  4  |  4  |  6.93438  |  4.67925  |  1.80959  |  0.298254  |  1  |  308.195  |
|  16  |  8  |  2  |  7.12601  |  5.21069  |  1.70288  |  0.245879  |  0.973108  |  316.712  |
|  32  |  4  |  8  |  3.91464  |  2.45256  |  1.01179  |  0.220971  |  0.885697  |  347.968  |
|  32  |  8  |  4  |  3.65146  |  2.36565  |  0.96631  |  0.243386  |  0.949535  |  324.574  |
|  32  |  16  |  2  |  3.76134  |  2.62796  |  0.861258  |  0.165907  |  0.921796  |  334.342  |
|  64  |  8  |  8  |  2.01431  |  1.22438  |  0.465491  |  0.146756  |  0.860641  |  358.099  |
|  64  |  16  |  4  |  1.942  |  1.18073  |  0.465075  |  0.188362  |  0.892685  |  345.245  |
|  64  |  32  |  2  |  2.0975  |  1.33229  |  0.360439  |  0.108833  |  0.826505  |  372.889  |
|  128  |  16  |  8  |  1.15653  |  0.633651  |  0.241426  |  0.119156  |  0.749479  |  411.212  |
|  128  |  32  |  4  |  1.0925  |  0.591954  |  0.182997  |  0.0788043  |  0.793407  |  388.445  |
|  128  |  64  |  2  |  1.46869  |  0.662761  |  0.175857  |  0.0781756  |  0.590182  |  522.202  |
|  256  |  32  |  8  |  0.713037  |  0.318539  |  0.103698  |  0.0459037  |  0.607821  |  507.049  |
|  256  |  64  |  4  |  0.799634  |  0.298149  |  0.0985314  |  0.0466445  |  0.541996  |  568.629  |
|  512  |  64  |  8  |  0.597079  |  0.156069  |  0.0687971  |  0.0383883  |  0.362932  |  849.18  |


{{wg:dynamo:Performance_results:MagIC5:MagIC5_Elapsed.png?480}}\\
Elapsed (wall-clock) time per time step for the strong scaling. The numbers in the plot give the number of OpenMP threads per process, and the dotted line shows ideal scaling.

{{wg:dynamo:Performance_results:MagIC5:MagIC5_efficiency.png?480}}\\
Parallel efficiency for the strong scaling, relative to the fastest 16-core (one node) run, which uses 4 processes × 4 threads. The numbers in the plot give the number of OpenMP threads per process.
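The Efficiency column is consistent with $E = 16\,T_{16}/(N_{core}\,T)$, where $T_{16} = 6.93438$ is the elapsed time of the 16-core run with 4 processes × 4 threads. A minimal sketch of that calculation (the constant and function names are hypothetical):

```python
# Reference: fastest 16-core (one node) run, 4 processes x 4 threads
REF_CORES = 16
REF_ELAPSED = 6.93438

def strong_efficiency(n_cores: int, elapsed_s: float) -> float:
    """Speedup over the reference run divided by the relative increase in cores."""
    return (REF_ELAPSED * REF_CORES) / (elapsed_s * n_cores)

print(round(strong_efficiency(8, 9.98327), 4))     # 8 cores, 2 x 4 (table value: 1.3892)
print(round(strong_efficiency(512, 0.597079), 4))  # 512 cores, 64 x 8 (table value: 0.362932)
```

Values above 1 at 8 cores simply mean those runs used their cores more efficiently than the 16-core reference.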


===== Weak Scaling Results =====

^  # of Cores  ^  # of Processes  ^  # of SMP  ^  $N_{C}$  ^  $l_{max}$  ^  $(N_{r},N_{\theta},N_{\phi})$  ^  Elapsed  ^  Nonlinear  ^  Solver  ^  Comm.  ^  SUs  ^
|  4  |  1  |  4  |  256  |  31  |  (257,48,96)  |  0.239627  |  0.117565  |  0.108877  |  0.01385  |  22.2638  |
|  16  |  4  |  4  |  256  |  63  |  (257,96,192)  |  0.265171  |  0.128896  |  0.0825426  |  0.000955493  |  22.2638  |
|  64  |  16  |  4  |  256  |  127  |  (257,192,384)  |  0.410822  |  0.223191  |  0.110366  |  0.0318171  |  106.733  |
|  256  |  64  |  4  |  256  |  255  |  (257,384,768)  |  1.06998  |  0.398631  |  0.134563  |  0.0538146  |  519.207  |

{{wg:dynamo:Performance_results:MagIC5:MagIC5_weak_sph.png?480}}\\
Elapsed time for the weak scaling in the horizontal resolution, using 4 OpenMP threads per process. The dotted line shows $O(N_{core}^{1/2})$ scaling, the ideal scaling for the Legendre transform.
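One way to see why $O(N_{core}^{1/2})$ is the ideal weak scaling here, assuming a direct (non-fast) Legendre transform whose cost per radial level is $O(l_{max}^{3})$ (for each of $O(l_{max})$ orders $m$, a sum over $O(l_{max})$ degrees at $O(l_{max})$ latitudes), with $N_r$ fixed and the number of cores increased in proportion to the horizontal grid size, as in the table above:

```latex
W_{\mathrm{Leg}} \propto N_r \, l_{max}^{3}, \qquad
N_{core} \propto N_{\theta} N_{\phi} \propto l_{max}^{2}
\quad \Longrightarrow \quad
T \propto \frac{W_{\mathrm{Leg}}}{N_{core}} \propto N_r \, l_{max} \propto N_{core}^{1/2}
```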

\\

^  # of Cores  ^  # of Processes  ^  # of SMP  ^  $N_{C}$  ^  $l_{max}$  ^  $(N_{r},N_{\theta},N_{\phi})$  ^  Elapsed  ^  Nonlinear  ^  Solver  ^  Comm.  ^  SUs  ^
|  32  |  8  |  4  |  32  |  255  |  (33,384,768)  |  0.525404  |  0.389331  |  0.0906778  |  0.0499037  |  46.7025  |
|  64  |  16  |  4  |  64  |  255  |  (65,384,768)  |  0.586558  |  0.396212  |  0.100678  |  0.0558089  |  104.277  |
|  128  |  32  |  4  |  128  |  255  |  (129,384,768)  |  0.694737  |  0.396308  |  0.105363  |  0.0500511  |  247.018  |
|  256  |  64  |  4  |  256  |  255  |  (257,384,768)  |  1.06998  |  0.398631  |  0.134563  |  0.0538146  |  760.877  |

{{wg:dynamo:Performance_results:MagIC5:MagIC5_weak_r.png?480}}\\
Elapsed time for the weak scaling in the radial resolution, using 4 OpenMP threads per process. The dotted line shows $O(N_{core})$ scaling.
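To read the radial weak-scaling plot numerically, the sketch below divides each measured time by an $O(N_{core})$ reference line. It assumes the reference line passes through the 32-core run, since the figure does not state where the dotted line is anchored; the function name is hypothetical.

```python
# (cores, elapsed seconds per step) copied from the radial weak-scaling table
runs = [(32, 0.525404), (64, 0.586558), (128, 0.694737), (256, 1.06998)]

def ratios_to_linear(runs):
    """Measured time divided by an O(N_core) line through the first run (assumed anchor)."""
    base_cores, base_time = runs[0]
    return [t / (base_time * c / base_cores) for c, t in runs]

for (cores, _), ratio in zip(runs, ratios_to_linear(runs)):
    print(f"{cores:4d} cores: measured / O(N_core) = {ratio:.2f}")
```

A ratio below 1 means the run is faster than the $O(N_{core})$ line; with the table values the ratio falls from 1.00 at 32 cores to about 0.25 at 256 cores.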


[[wg:dynamo:Performance_results:MagIC5:files|files]]