[[wg:dynamo:Performance_results|Back to performance benchmark lists]] \\

===== Compile options =====

F90OPTFLAGS = -r8 -i4 -ftz -IPF_fma -IPF_fltacc -WB -O2

===== Definition of columns =====

^ Name ^ Definition ^
| # of Cores | Number of CPU cores used |
| # of Processes | Number of MPI processes |
| # of Threads | Number of threads per process |
| $N_{c}$ | Truncation level for Chebyshev polynomials |
| $l_{max}$ | Truncation level for spherical harmonics |
| $(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
| Elapsed | Elapsed (wall clock) time for one time step |
| Nonlinear | Elapsed (wall clock) time for nonlinear terms (including communications) |
| Solver | Elapsed (wall clock) time for linear calculation |
| Comm. | Elapsed (wall clock) time for data communication |
| Efficiency | Parallel efficiency |
| SUs | Service units for $10^{4}$ time steps (core hours) |

===== Single Processor Result =====

^ $N_{c}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ SUs ^
| 47 | 42 | (48,64,129) | 0.460322 | 0.378743 | 0.0565741 | 0.005765 | |

===== Strong Scaling Results =====

^ $N_{c}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^
| 128 | 128 | (129,192,385) |

^ # of Cores ^ # of Processes ^ # of Threads ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ Efficiency ^ SUs ^
| 4 | 4 | 1 | 13.6562 | 12.3256 | 1.19259 | 8.13115 | 1 | 151.736 |
| 8 | 8 | 1 | 5.27784 | 4.71791 | 0.559926 | 1.8018 | 1.29373 | 117.285 |
| 16 | 16 | 1 | 2.83756 | 2.56503 | 0.240695 | 1.55375 | 1.20317 | 126.114 |
| 32 | 32 | 1 | 1.41826 | 1.2796 | 0.12139 | 1.18198 | 1.20361 | 126.067 |
| 64 | 64 | 1 | 2.73806 | 2.66596 | 0.0632037 | 2.87671 | 0.311722 | 486.766 |
| 128 | 128 | 1 | 6.82779 | 6.79328 | 0.0298972 | 7.1671 | 0.0625029 | 2427.66 |

^ $N_{c}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^
| 192 | 128 | (193,192,385) |

^ # of Cores ^ # of Processes ^ # of Threads ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ Efficiency ^ SUs ^
| 4 | 4 | 1 | 21.845 | 19.1889 | 2.41306 | 12.8516 | 1 | 242.723 |
| 8 | 8 | 1 | 8.69087 | 7.22496 | 1.34667 | 2.81284 | 1.25678 | 193.13 |
| 16 | 16 | 1 | 4.43835 | 3.84567 | 0.535587 | 2.32362 | 1.23047 | 197.26 |
| 32 | 32 | 1 | 2.31319 | 2.01689 | 0.266718 | 1.90409 | 1.18046 | 205.617 |
| 64 | 64 | 1 | 4.14431 | 3.99401 | 0.135566 | 4.28234 | 0.329443 | 736.767 |
| 128 | 128 | 1 | 10.8246 | 10.7533 | 0.0646424 | 11.3395 | 0.0630654 | 3848.74 |

{{wg:dynamo:Performance_results:Busse:Busse_elapsed.png?480}}\\
Elapsed (wall clock) time for strong scaling in the $(N_{c}, l_{max}) = (192, 128)$ case. The numbers indicate the number of OpenMP threads. The dotted line shows ideal scaling.

{{wg:dynamo:Performance_results:Busse:Busse_efficiency.png?480}}\\
Parallel efficiency for strong scaling in the $(N_{c}, l_{max}) = (192, 128)$ case.

===== Weak Scaling Results =====

^ # of Cores ^ # of Processes ^ # of Threads ^ $N_{c}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ SUs ^
| 32 | 32 | 1 | 192 | 128 | (193,192,385) | 2.31319 | 2.01689 | 0.266718 | 1.90409 | 205.617 |

[[wg:dynamo:Performance_results:Busse:files|files]]
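
The Efficiency and SUs columns in the strong scaling tables appear to follow directly from the Elapsed column. The tabulated values are reproduced by taking the 4-core run as the baseline for efficiency and by converting elapsed seconds per step into core hours for $10^{4}$ steps. The Python sketch below recomputes both columns for the $(N_{c}, l_{max}) = (128, 128)$ case. The formulas are inferred from the tabulated numbers and are not stated on this page, and the function names are hypothetical.

```python
# Sketch: recompute the Efficiency and SUs columns from Elapsed.
# ASSUMPTION: both formulas are inferred from the tabulated values;
# the page itself does not state them. All names are hypothetical.

SECONDS_PER_HOUR = 3600.0
STEPS = 10**4  # SUs are quoted for 10^4 time steps


def parallel_efficiency(elapsed, cores, base_elapsed, base_cores=4):
    """Efficiency relative to the baseline run (4 cores in the tables)."""
    return (base_elapsed * base_cores) / (elapsed * cores)


def service_units(elapsed, cores, steps=STEPS):
    """Core hours needed for `steps` time steps at `elapsed` seconds per step."""
    return elapsed * cores * steps / SECONDS_PER_HOUR


# (cores, elapsed seconds per step) for (N_c, l_max) = (128, 128)
runs = [
    (4, 13.6562),
    (8, 5.27784),
    (16, 2.83756),
    (32, 1.41826),
    (64, 2.73806),
    (128, 6.82779),
]

base_elapsed = runs[0][1]
for cores, elapsed in runs:
    eff = parallel_efficiency(elapsed, cores, base_elapsed)
    sus = service_units(elapsed, cores)
    print(f"{cores:4d} cores  efficiency = {eff:.6g}  SUs = {sus:.6g}")
```

Under this reading, an efficiency above 1 means the run scaled better than linearly relative to the 4-core baseline, and the SUs column grows whenever the elapsed time falls more slowly than the core count rises.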