[[wg:dynamo:Performance_results|Back to performance benchmark lists]] \\

===== Compile Options =====

F90OPTFLAGS = -r8 -i4 -ftz -IPF_fma -IPF_fltacc -WB -O2

===== Definition of columns =====

^ Name ^ Definition ^
| # of Cores | Number of used CPU cores |
| # of Processes | Number of MPI processes |
| # of Threads | Number of threads for each process |
| $N_{c}$ | Truncation level for Chebyshev polynomials |
| $l_{max}$ | Truncation level for spherical harmonics |
| $(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
| Elapsed | Elapsed (wall clock) time for one time step |
| Nonlinear | Elapsed (wall clock) time for nonlinear terms (including communications) |
| Solver | Elapsed (wall clock) time for linear calculation |
| Comm. | Elapsed (wall clock) time for data communication |
| Efficiency | Parallel efficiency |
| SUs | Service units for $10^{4}$ time steps (core hours) |

===== Two Processor Result =====

^ $N_{c}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ SUs ^
| | | | | | | | |

===== Strong Scaling Results =====

^ $N_{c}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^
| 192 | 170 | (193,256,256) |

^ # of Cores ^ # of Processes ^ # of SMP ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ Efficiency ^ SUs ^
| 32 | 32 | 1 | 0.719657 | 0.518729 | 0.200927 | 0.182378 | 1 | 63.9695 |
| 64 | 64 | 1 | 0.545835 | 0.336805 | 0.209030 | 0.21079 | 0.659226 | 97.0373 |
| 128 | 128 | 1 | 0.285552 | 0.19960 | 0.085952 | 0.174127 | 0.630058 | 101.529 |

^ $N_{c}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^
| 256 | 341 | (257,512,512) |

^ # of Cores ^ # of Processes ^ # of SMP ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ Efficiency ^ SUs ^
| 86 | 86 | 1 | 2.47102 | 1.88142 | 0.589604 | 0.379399 | 1 | 658.939 |
| 128 | 128 | 1 | 2.32883 | 1.91045 | 0.418383 | 0.857392 | 0.712896 | 828.03 |
| 129 | 129 | 1 | 2.09683 | 1.68765 | 0.409184 | 0.663364 | 0.785635 | 838.734 |
| 256 | 256 | 1 | 1.41293 | 1.13385 | 0.279076 | 0.675334 | 0.58751 | 1004.75 |
| 257 | 257 | 1 | 1.33908 | 1.06534 | 0.273739 | 0.621596 | 0.61991 | 1011.75 |

{{wg:dynamo:Performance_results:Gary:Gary_1_elapsed.png?480}}\\
Elapsed (wall clock) time for strong scaling. The numbers show the number of OpenMP threads. Ideal scaling is plotted as a dotted line.

{{wg:dynamo:Performance_results:Gary:Gary_1_efficiency.png?480}}\\
Parallel efficiency for strong scaling.

===== Weak Scaling Results =====

^ # of Cores ^ # of Processes ^ # of SMP ^ $N_{c}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ SUs ^
| 4 | 4 | 1 | 256 | 42 | (257,64,64) | 0.322194 | 0.165516 | 0.156679 | 0.0525075 | 14.3197 |
| 16 | 16 | 1 | 256 | 85 | (257,128,128) | 0.406717 | 0.204322 | 0.202394 | 0.0716188 | 18.0763 |
| 64 | 64 | 1 | 256 | 170 | (257,256,256) | 0.743692 | 0.44779 | 0.295902 | 0.312471 | 132.212 |
| 256 | 256 | 1 | 256 | 341 | (257,512,512) | 1.41293 | 1.13385 | 0.279076 | 0.675334 | 1004.75 |
| | | | | | | | | | | |
| 5 | 5 | 1 | 256 | 42 | (257,64,64) | 0.252394 | 0.127396 | 0.124998 | 0.0395718 | 11.2175 |
| 17 | 17 | 1 | 256 | 85 | (257,128,128) | 0.359973 | 0.180849 | 0.179124 | 0.0811399 | 31.9976 |
| 65 | 65 | 1 | 256 | 170 | (257,256,256) | 0.639472 | 0.363557 | 0.275915 | 0.227634 | 142.105 |
| 257 | 257 | 1 | 256 | 341 | (257,512,512) | 1.33908 | 1.06534 | 0.273739 | 0.621596 | 1011.75 |

{{wg:dynamo:Performance_results:Gary:Gary_1_weak_sph.png?480}}\\
Elapsed time for weak scaling in the horizontal directions. The $O(N_{core}^{1/2})$ scaling (ideal for the Legendre transform) is plotted as a dotted line.
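
The Efficiency and SUs columns in the tables on this page can be reproduced from the core counts and elapsed times. The Python sketch below is not part of the benchmark code; the function names are hypothetical. It assumes that efficiency is measured relative to the smallest run in each strong scaling table, and that service units are charged in whole blocks of 16 cores. The block size is not stated on this page; it is inferred from the tabulated values (for example, the 17-core run is charged as 32 cores).

```python
import math

# Assumption: cores are charged in blocks of 16 (inferred from the tabulated SUs,
# not stated in the source).
CORES_PER_BLOCK = 16
TIME_STEPS = 10**4  # SUs are quoted for 10^4 time steps


def parallel_efficiency(ref_cores, ref_elapsed, cores, elapsed):
    """Strong scaling efficiency relative to a reference run."""
    return (ref_elapsed * ref_cores) / (elapsed * cores)


def service_units(cores, elapsed, steps=TIME_STEPS, block=CORES_PER_BLOCK):
    """Core hours for `steps` time steps, charging whole blocks of cores."""
    charged_cores = math.ceil(cores / block) * block
    return elapsed * steps * charged_cores / 3600.0


# (cores, elapsed seconds per step) from the first strong scaling table
strong_scaling = [(32, 0.719657), (64, 0.545835), (128, 0.285552)]
ref_cores, ref_elapsed = strong_scaling[0]

for cores, elapsed in strong_scaling:
    eff = parallel_efficiency(ref_cores, ref_elapsed, cores, elapsed)
    sus = service_units(cores, elapsed)
    print(f"{cores:4d} cores: efficiency = {eff:.4f}, SUs = {sus:.2f}")
```

Under these assumptions the computed values agree with the tabulated Efficiency and SUs columns to the precision shown, including the runs whose core count is not a multiple of 16.
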

^ # of Cores ^ # of Processes ^ # of SMP ^ $N_{c}$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ Comm. ^ SUs ^
| 8 | 8 | 1 | 16 | 341 | (17,512,512) | 1.28171 | 1.20689 | 0.0748194 | 0.372245 | 56.9651 |
| 16 | 16 | 1 | 32 | 341 | (33,512,512) | 1.52159 | 1.42569 | 0.0958951 | 0.315524 | 67.6261 |
| 32 | 32 | 1 | 64 | 341 | (65,512,512) | 1.61663 | 1.45842 | 0.158208 | 0.389299 | 143.7 |
| 64 | 64 | 1 | 128 | 341 | (129,512,512) | 1.75436 | 1.50559 | 0.248772 | 0.453414 | 311.887 |
| 128 | 128 | 1 | 256 | 341 | (257,512,512) | 2.32883 | 1.91045 | 0.418383 | 0.857392 | 828.03 |
| | | | | | | | | | | |
| 9 | 9 | 1 | 16 | 341 | (17,512,512) | 1.05288 | 0.993848 | 0.0590345 | 0.207514 | 46.7948 |
| 17 | 17 | 1 | 32 | 341 | (33,512,512) | 1.03108 | 0.94969 | 0.0813889 | 0.194925 | 91.6515 |
| 33 | 33 | 1 | 64 | 341 | (65,512,512) | 1.25166 | 1.10257 | 0.149091 | 0.218563 | 166.888 |
| 65 | 65 | 1 | 128 | 341 | (129,512,512) | 1.43519 | 1.19399 | 0.241198 | 0.262446 | 318.931 |
| 129 | 129 | 1 | 256 | 341 | (257,512,512) | 2.09683 | 1.68765 | 0.409184 | 0.663364 | 838.734 |

{{wg:dynamo:Performance_results:Gary:Gary_1_weak_r.png?480}}\\
Elapsed time for weak scaling in the radial direction. The $O(N_{core}^{1/2})$ scaling (ideal for the Legendre transform) is plotted as a dotted line.

[[wg:dynamo:Performance_results|Back to performance benchmark lists]] \\
[[wg:dynamo:Performance_results:Gary:files|files]]