[[wg:dynamo:Performance_results|Back to performance benchmark lists]] \\

===== Compile options =====

F90OPTFLAGS = -r8 -i4 -ftz -IPF_fma -IPF_fltacc -WB -O2

===== Definition of columns =====

^  Name  ^  Definition  ^
|  # of Cores  |  Number of used CPU cores  |
|  # of Processes  |  Number of MPI processes  |
|  # of Threads (# of SMP)  |  Number of OpenMP threads for each process  |
|  $N_{c}$  |  Truncation level for Chebyshev polynomials  |
|  $l_{max}$  |  Truncation level for spherical harmonics  |
|  $(N_{r},N_{\theta},N_{\phi})$  |  Number of grid points in spherical coordinates  |
|  Elapsed  |  Elapsed (wall clock) time for one time step  |
|  Nonlinear  |  Elapsed (wall clock) time for nonlinear terms (including communications)  |
|  Solver  |  Elapsed (wall clock) time for linear calculation  |
|  Comm.  |  Elapsed (wall clock) time for data communication  |
|  Efficiency  |  Parallel efficiency  |
|  SUs  |  Service units (core hours) for $10^{4}$ time steps  |
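
This page does not give formulas for the Efficiency and SUs columns. The tabulated values are consistent with $E(P) = T_{ref} P_{ref} / (T_{P} P)$, taking the smallest run of each series as the reference, and with SUs $= T_{P} \times P \times 10^{4} / 3600$, where $T_{P}$ is the elapsed time per step on $P$ cores. The sketch below assumes those two formulas. The function names are illustrative and are not part of the benchmark code.

```python
STEPS = 10**4              # time steps covered by one SU figure
SECONDS_PER_HOUR = 3600.0


def service_units(elapsed, cores, steps=STEPS):
    """Core hours for `steps` time steps, given seconds per step."""
    return elapsed * cores * steps / SECONDS_PER_HOUR


def parallel_efficiency(elapsed, cores, ref_elapsed, ref_cores):
    """Efficiency relative to a reference run (assumed: the smallest run)."""
    return (ref_elapsed * ref_cores) / (elapsed * cores)


# Values taken from the (N_c, l_max) = (128, 128) strong scaling table.
ref_elapsed, ref_cores = 13.6562, 4
elapsed_8, cores_8 = 5.27784, 8

print(round(service_units(elapsed_8, cores_8), 3))
print(round(parallel_efficiency(elapsed_8, cores_8, ref_elapsed, ref_cores), 5))
```

With these inputs the two printed values match the 8-core row of that table (117.285 and 1.29373), which is the basis for the assumed formulas.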

===== Single Processor Result =====

^  $N_{c}$  ^  $l_{max}$  ^  $(N_{r},N_{\theta},N_{\phi})$  ^  Elapsed  ^  Nonlinear  ^  Solver  ^  Comm.  ^  SUs  ^
|  47  |  42  |  (48,64,129)  |  0.460322  |  0.378743  |  0.0565741  |  0.005765  |

===== Strong Scaling Results =====

^  $N_{c}$  ^  $l_{max}$  ^  $(N_{r},N_{\theta},N_{\phi})$  ^
|  128  |  128  |  (129,192,385)  |

^  # of Cores  ^  # of Processes  ^  # of SMP  ^  Elapsed  ^  Nonlinear  ^  Solver  ^  Comm.  ^  Efficiency  ^  SUs  ^
|  4  |  4  |  1  |  13.6562  |  12.3256  |  1.19259  |  8.13115  |  1  |  151.736  |
|  8  |  8  |  1  |  5.27784  |  4.71791  |  0.559926  |  1.8018  |  1.29373  |  117.285  |
|  16  |  16  |  1  |  2.83756  |  2.56503  |  0.240695  |  1.55375  |  1.20317  |  126.114  |
|  32  |  32  |  1  |  1.41826  |  1.2796  |  0.12139  |  1.18198  |  1.20361  |  126.067  |
|  64  |  64  |  1  |  2.73806  |  2.66596  |  0.0632037  |  2.87671  |  0.311722  |  486.766  |
|  128  |  128  |  1  |  6.82779  |  6.79328  |  0.0298972  |  7.1671  |  0.0625029  |  2427.66  |


^  $N_{c}$  ^  $l_{max}$  ^  $(N_{r},N_{\theta},N_{\phi})$  ^
|  192  |  128  |  (193,192,385)  |

^  # of Cores  ^  # of Processes  ^  # of SMP  ^  Elapsed  ^  Nonlinear  ^  Solver  ^  Comm.  ^  Efficiency  ^  SUs  ^
|  4  |  4  |  1  |  21.845  |  19.1889  |  2.41306  |  12.8516  |  1  |  242.723  |
|  8  |  8  |  1  |  8.69087  |  7.22496  |  1.34667  |  2.81284  |  1.25678  |  193.13  |
|  16  |  16  |  1  |  4.43835  |  3.84567  |  0.535587  |  2.32362  |  1.23047  |  197.26  |
|  32  |  32  |  1  |  2.31319  |  2.01689  |  0.266718  |  1.90409  |  1.18046  |  205.617  |
|  64  |  64  |  1  |  4.14431  |  3.99401  |  0.135566  |  4.28234  |  0.329443  |  736.767  |
|  128  |  128  |  1  |  10.8246  |  10.7533  |  0.0646424  |  11.3395  |  0.0630654  |  3848.74  |


{{wg:dynamo:Performance_results:Busse:Busse_elapsed.png?480}}\\
Elapsed (wall clock) time for strong scaling in the $(N_{c}, l_{max}) = (192, 128)$ case. The numbers in the plot give the number of OpenMP threads. The dotted line shows ideal scaling.
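
The page does not say how the ideal-scaling line is computed. A common choice, assumed here, is $T_{ideal}(P) = T_{ref} P_{ref} / P$ anchored at the smallest run. The sketch below tabulates that curve next to the measured elapsed times of the $(192, 128)$ case; `ideal_elapsed` is a hypothetical helper name.

```python
# Measured elapsed times (seconds per step) from the (192, 128) table.
measured = {4: 21.845, 8: 8.69087, 16: 4.43835,
            32: 2.31319, 64: 4.14431, 128: 10.8246}


def ideal_elapsed(cores, ref_cores, ref_elapsed):
    """Elapsed time under perfect strong scaling from the reference run."""
    return ref_elapsed * ref_cores / cores


ref_cores = min(measured)            # assumed anchor: the smallest run
ref_elapsed = measured[ref_cores]

for cores, elapsed in sorted(measured.items()):
    ideal = ideal_elapsed(cores, ref_cores, ref_elapsed)
    print(f"{cores:4d}  measured={elapsed:8.4f}  ideal={ideal:8.4f}")
```

Under this assumption the measured times lie below the ideal curve from 8 to 32 cores and above it at 64 and 128 cores, in line with the Efficiency column of the table.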

{{wg:dynamo:Performance_results:Busse:Busse_efficiency.png?480}}\\
Parallel efficiency for strong scaling in the $(N_{c}, l_{max}) = (192, 128)$ case.


===== Weak Scaling Results =====

^  # of Cores  ^  # of Processes  ^  # of SMP  ^  $N_{c}$  ^  $l_{max}$  ^  $(N_{r},N_{\theta},N_{\phi})$  ^  Elapsed  ^  Nonlinear  ^  Solver  ^  Comm.  ^  SUs  ^
|  32  |  32  |  1  |  192  |  128  |  (193,192,385)  |  2.31319  |  2.01689  |  0.266718  |  1.90409  |  205.617  |


[[wg:dynamo:Performance_results|Back to performance benchmark lists]] \\

[[wg:dynamo:Performance_results:Busse:files|files]]