[[wg:dynamo:Performance_results|Back to performance benchmark lists]] \\

===== Modules for libraries =====

module swap mvapich2 impi/4.1.0.030

===== Compile options =====

F90OPTFLAGS = -O3 -r8 -cpp -openmp -xhost

===== Notes =====
At least 3 MPI processes are required. \\
Each MPI process needs at least 4 radial levels. \\
Elapsed time is measured by inserting MPI_Wtime() calls in parody.f90.
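
The two constraints above bound the usable number of MPI processes for a given radial resolution. The following is a minimal sketch, assuming the radial levels are shared out among the processes so that each one holds about $N_{r}$ divided by the process count. That distribution is an assumption, not something stated on this page, and the function names are hypothetical.

```python
# Limits taken from the notes on this page.
MIN_PROCESSES = 3           # at least 3 MPI processes are required
MIN_LEVELS_PER_PROCESS = 4  # each MPI process needs at least 4 radial levels


def max_mpi_processes(n_r):
    """Largest process count that still leaves 4 radial levels per process.

    Assumes (hypothetically) that radial levels are split among processes.
    """
    return n_r // MIN_LEVELS_PER_PROCESS


def is_valid_process_count(n_r, n_proc):
    """Check a process count against both limits from the notes."""
    return MIN_PROCESSES <= n_proc <= max_mpi_processes(n_r)


if __name__ == "__main__":
    for n_r, n_proc in [(512, 128), (512, 256), (1024, 256), (73, 4)]:
        print(n_r, n_proc, is_valid_process_count(n_r, n_proc))
```

Under this assumption, $N_{r} = 512$ allows at most 128 processes, which would explain why the larger core counts in the tables below are reached by adding threads rather than processes.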

===== Definition of columns =====

^  Name  ^  Description  ^
|  # of Cores  |  Number of CPU cores used  |
|  # of Processes  |  Number of MPI processes  |
|  # of Threads  |  Number of threads per process  |
|  $l_{max}$  |  Truncation level for spherical harmonics  |
|  $(N_{r},N_{\theta},N_{\phi})$  |  Number of grid points in spherical coordinates  |
|  Elapsed  |  Wall-clock time for one time step  |
|  Nonlinear  |  Wall-clock time for evaluation of the nonlinear terms  |
|  Solver  |  Wall-clock time for the linear solver (including communications)  |
|  Efficiency  |  Parallel efficiency  |
|  SUs  |  Service units for $10^{4}$ time steps (core hours)  |
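
The page does not spell out how Efficiency and SUs are computed. The sketch below uses formulas inferred from the strong scaling table: SUs as elapsed time per step times cores times $10^{4}$ steps, converted from seconds to hours, and efficiency relative to the 16-core run. Both formulas and the function names are assumptions; they reproduce the tabulated strong scaling values but are not confirmed by the source.

```python
SECONDS_PER_HOUR = 3600.0


def service_units(elapsed, n_cores, n_steps=10**4):
    """Core hours for n_steps time steps (assumed definition of SUs)."""
    return elapsed * n_cores * n_steps / SECONDS_PER_HOUR


def parallel_efficiency(elapsed, n_cores, ref_elapsed, ref_cores):
    """Strong scaling efficiency relative to a reference run (assumed definition)."""
    return (ref_elapsed * ref_cores) / (elapsed * n_cores)


if __name__ == "__main__":
    # Reference: the 16-core row of the strong scaling table.
    ref_elapsed, ref_cores = 12.54290, 16
    # Compare with the 32-core, 4-process, 8-thread row.
    print(service_units(6.805739, 32))
    print(parallel_efficiency(6.805739, 32, ref_elapsed, ref_cores))
```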

===== Four-Processor Result =====

^  $l_{max}$  ^  $(N_{r},N_{\theta},N_{\phi})$  ^  Elapsed  ^  Nonlinear  ^  Solver  ^  SUs  ^
|  47  |  ( 73,72,144)  |  0.269091  |  0.257912  |  0.00424973  |  0.747475  |

===== Strong Scaling Results =====

^  $l_{max}$  ^  $(N_{r},N_{\theta},N_{\phi})$  ^
|  255  |  (512,384,768)  |

^  # Cores  ^  # Processes  ^  # Threads  ^  Elapsed  ^  Nonlinear  ^  Solver  ^  Efficiency  ^  SUs  ^
|  16  |   4  |  4  |  12.54290  |  11.87211  |  0.287119  |  1.000000  |  557.462  |
|  32  |   4  |  8  |  6.805739  |  6.191985  |  0.254288  |  0.921494  |  604.955  |
|  32  |   8  |  4  |  6.363801  |  6.005817  |  0.163213  |  0.985488  |  565.671  |
|  64  |   8  |  8  |  3.432315  |  3.104396  |  0.144843  |  0.913589  |  610.189  |
|  64  |  16  |  4  |  3.209374  |  2.992346  |  0.116619  |  0.977052  |  570.555  |
|  128  |  16  |  8  |  1.754511  |  1.551827  |  0.110802  |  0.893618  |  623.826  |
|  128  |  32  |  4  |  1.685379  |  1.503336  |  0.127453  |  0.930273  |  599.246  |
|  128  |  64  |  2  |  1.836404  |  1.561923  |  0.226236  |  0.853768  |  652.944  |
|  128  |  128  |  1  |  2.535049  |  1.993672  |  0.481132  |  0.618474  |  901.351  |
|  256  |  32  |  8  |  0.951109  |  0.779863  |  0.122069  |  0.824229  |  676.344  |
|  256  |  64  |  4  |  0.997783  |  0.755018  |  0.191324  |  0.785673  |  709.535  |
|  256  |  128  |  2  |  1.193223  |  0.779913  |  0.380951  |  0.656986  |  848.514  |
|  512  |  64  |  8  |  0.6191725  |  0.393483  |  0.194016  |  0.633048  |  880.601  |
|  512  |  128  |  4  |  0.736604  |  0.380441  |  0.333829  |  0.532125  |  1047.61  |
|  1024  |  128  |  8  |  0.564325  |  0.199389  |  0.342527  |  0.347287  |  1605.19  |
|  2048  |  128  |  16  |  0.575268  |  0.217482  |  0.336122  |  0.170340  |  3272.64  |

{{wg:dynamo:Performance_results:IPGP:strong_scale_parody.png?480}}\\
Elapsed (wall-clock) time for strong scaling. The numbers in the plot indicate the number of OpenMP threads. The dotted line shows ideal scaling.

{{wg:dynamo:Performance_results:IPGP:strong_efficiency_parody.png?480}}\\
Parallel efficiency for strong scaling. The numbers in the plot indicate the number of OpenMP threads.

===== Weak Scaling Results =====

^  # Cores  ^  # Processes  ^  # Threads  ^  $l_{max}$  ^  $(N_{r},N_{\theta},N_{\phi})$  ^  Elapsed  ^  Nonlinear  ^  Solver  ^  SUs  ^
|  4  |  4  |  1  |  15  |  (512,24,48)  |  0.03682030  |  0.03474719  |  0.00099214  |  1.63646  |
|  16  |  4  |  4  |  31  |  (512,48,96)  |  0.05962245  |  0.05093837  |  0.00317335  |  2.64989  |
|  16  |  8  |  2  |  31  |  (512,48,96)  |  0.05066221  |  0.04486143  |  0.00221673  |  2.25165  |
|  64  |  4  |  16  |  63  |  (512,96,192)  |  0.2261694  |  0.1915030  |  0.01298851  |  40.2079  |
|  64  |  8  |  8  |  63  |  (512,96,192)  |  0.0879787  |  0.0693352  |  0.00814595  |  15.6407  |
|  64  |  16  |  4  |  63  |  (512,96,192)  |  0.0764195  |  0.0639300  |  0.00679417  |  13.5857  |
|  64  |  32  |  2  |  63  |  (512,96,192)  |  0.0811522  |  0.0682890  |  0.00898229  |  14.4271  |
|  64  |  64  |  1  |  63  |  (512,96,192)  |  0.0872758  |  0.0677864  |  0.01562347  |  15.5157  |
|  256  |  32  |  8  |  127  |  (512,192,384)  |  0.151465  |  0.109548  |  0.03015987  |  107.708  |
|  256  |  64  |  4  |  127  |  (512,192,384)  |  0.158937  |  0.105684  |  0.04617718  |  113.022  |
|  256  |  128  |  2  |  127  |  (512,192,384)  |  0.194962  |  0.103681  |  0.08474707  |  138.640  |
|  1024  |  128  |  8  |  255  |  (512,384,768)  |  0.564325  |  0.199389  |  0.342527  |  1605.19  |

{{wg:dynamo:Performance_results:IPGP:parody_weak_sph.png?480}}\\
Elapsed (wall-clock) time for weak scaling in the horizontal resolution. The numbers in the plot indicate the number of OpenMP threads. The dotted line shows the ideal scaling for the Legendre transform, $O(N_{core}^{1/2})$.
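
One way to read the $N_{core}^{1/2}$ reference: in the table above, each fourfold increase in cores comes with roughly a doubling of $l_{max}$. A Legendre transform costs $O(l_{max}^{3})$ operations, so the work per core grows like $l_{max}$, that is, like $N_{core}^{1/2}$. The sketch below evaluates such a reference curve. Anchoring it at the 4-core run is an assumption (the plot may use a different reference point), and the function name is hypothetical.

```python
import math


def ideal_weak_elapsed(n_cores, ref_cores, ref_elapsed):
    """Reference time growing as the square root of the core count."""
    return ref_elapsed * math.sqrt(n_cores / ref_cores)


if __name__ == "__main__":
    # Assumed anchor: the 4-core row of the weak scaling table.
    ref_cores, ref_elapsed = 4, 0.03682030
    for n_cores in (4, 16, 64, 256, 1024):
        print(n_cores, ideal_weak_elapsed(n_cores, ref_cores, ref_elapsed))
```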

^  # Cores  ^  # Processes  ^  # Threads  ^  $l_{max}$  ^  $(N_{r},N_{\theta},N_{\phi})$  ^  Elapsed  ^  Nonlinear  ^  Solver  ^  SUs  ^
|  64  |  8  |  8  |  255  |  (32,384,768)  |  0.23631219  |  0.196183  |  0.0222231  |  42.0111  |
|  128  |  16  |  8  |  255  |  (64,384,768)  |  0.2575840  |  0.196314  |  0.043000  |  91.5854  |
|  256  |  32  |  8  |  255  |  (128,384,768)  |  0.2998254  |  0.196222  |  0.084590  |  213.209  |
|  512  |  64  |  8  |  255  |  (256,384,768)  |  0.3866510  |  0.197542  |  0.169289  |  549.904  |
|  1024  |  128  |  8  |  255  |  (512,384,768)  |  0.564325  |  0.199389  |  0.342527  |  1605.19  |
|  2048  |  128  |  16  |  255  |  (1024,384,768)  |  0.82399  |  0.427062  |  0.368039  |  4687.59  |
|  2048  |  256  |  8  |  255  |  (1024,384,768)  |  0.903354  |  0.203544  |  0.675003  |  5139.08  |

{{wg:dynamo:Performance_results:IPGP:parody_weak_r.png?480}}\\
Elapsed (wall-clock) time for weak scaling in the radial resolution. Only the results with 8 OpenMP threads are shown.
The dotted line shows the ideal scaling for the Legendre transform, $O(N_{core}^{1/2})$.

[[wg:dynamo:Performance_results|Back to performance benchmark lists]] \\
[[wg:dynamo:Performance_results:IPGP:files|files]]