[[wg:dynamo:Performance_results|Back to performance benchmark lists]] \\

===== Compile options =====

F90OPTFLAGS = -O3 -xhost

===== Notes =====

More than two processes are required \\
A time step adjustment routine is implemented \\

===== Definition of columns =====

^ Name ^ Description ^
| # of Cores | Number of CPU cores used |
| # of Processes | Number of MPI processes |
| # of Threads | Number of threads for each process |
| $l_{max}$ | Truncation level for spherical harmonics |
| $(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
| Elapsed | Elapsed (wall clock) time for one time step |
| Nonlinear | Elapsed (wall clock) time for nonlinear terms (including communications) |
| Solver | Elapsed (wall clock) time for linear calculation |
| CFL | Elapsed (wall clock) time for CFL condition check |
| Efficiency | Parallel efficiency |
| SUs | Service units for $10^{4}$ time steps (core hours) |

===== Two Processes Result =====

^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ SUs ^
| 47 | (72,72,144) | 1.32979 | 0.510249 | 0.819544 | 46.7169 |

===== Strong Scaling Results =====

^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^
| 47 | (72,72,144) |

^ # of Cores ^ # of Processes ^ # of SMP ^ Elapsed ^ Nonlinear ^ Solver ^ Efficiency ^ SUs ^
| 2 | 1 | 1 | 1.32979 | 0.510249 | 0.819544 | 1 | 46.7169 |
| 4 | 3 | 1 | 0.720192 | 0.2817 | 0.438491 | 0.923222 | 32.7459 |
| 8 | 7 | 1 | 0.375957 | 0.155108 | 0.220847 | 0.884273 | 29.8152 |
| 16 | 15 | 1 | 0.248631 | 0.104707 | 0.143922 | 0.668557 | 35.9047 |
| 25 | 24 | 1 | 0.158468 | 0.0812083 | 0.0772578 | 0.671326 | 35.9047 |

^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^
| 63 | (124,96,192) |

^ # of Cores ^ # of Processes ^ # of SMP ^ Elapsed ^ Nonlinear ^ Solver ^ Efficiency ^ SUs ^
| 4 | 3 | 1 | 4.49747 | 0.942879 | 3.55459 | 1.23093 | 199.888 |
| 8 | 7 | 1 | 2.2319 | 0.59711 | 1.63479 | 1.24022 | 99.1957 |
| 16 | 15 | 1 | 1.38402 | 0.407052 | 0.97697 | 1 | 61.5122 |
| 32 | 31 | 1 | 0.765525 | 0.245143 | 0.52038 | 0.903971 | 68.0467 |
| 64 | 63 | 1 | 0.500593 | 0.152387 | 0.348203 | 0.691192 | 88.9943 |
| 126 | 125 | 1 | 0.300067 | 0.123445 | 0.176621 | 0.585699 | 106.69 |

^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^
| 63 | (128,96,192) |

^ # of Cores ^ # of Processes ^ # of SMP ^ Elapsed ^ Nonlinear ^ Solver ^ Efficiency ^ SUs ^
| 4 | 3 | 1 | 4.33759 | 0.982332 | 3.35526 | 1.40591 | 192.782 |
| 8 | 7 | 1 | 2.38804 | 0.611198 | 1.77684 | 1.27684 | 106.135 |
| 16 | 15 | 1 | 1.52457 | 0.473646 | 1.05092 | 1 | 67.7585 |
| 32 | 31 | 1 | 0.839238 | 0.257673 | 0.581563 | 0.908304 | 74.599 |
| 64 | 63 | 1 | 0.5472 | 0.179547 | 0.367651 | 0.696531 | 97.28 |
| 128 | 127 | 1 | 0.335634 | 0.148279 | 0.187353 | 0.567794 | 119.336 |

{{wg:dynamo:Performance_results:Goddard:mMosFFT_Elapsed.png?480}}\\
Elapsed (wall clock) time for the strong scaling. Ideal scaling is plotted as a dotted line.

{{wg:dynamo:Performance_results:Goddard:mMosFFT_efficiency.png?480}}\\
Parallel efficiency for the strong scaling.

===== Weak Scaling Results =====

^ # of Cores ^ # of Processes ^ # of SMP ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ SUs ^
| 3 | 2 | 1 | 15 | (124,24,48) | 0.26251 | 0.0325999 | 0.229909 | 11.6671 |
| 9 | 8 | 1 | 31 | (124,48,96) | 0.383009 | 0.066082 | 0.316925 | 17.0226 |
| 32 | 31 | 1 | 63 | (124,96,192) | 0.765525 | 0.245143 | 0.52038 | 68.0467 |
| 125 | 124 | 1 | 127 | (124,192,384) | 1.55669 | 0.793595 | 0.763088 | 553.488 |

{{wg:dynamo:Performance_results:Goddard:mMosFFT_weak_sph.png?480}}\\
Elapsed time for the weak scaling in the horizontal resolution.
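
The Efficiency and SUs columns in the tables above can be cross-checked with a short script. The sketch below is not taken from the benchmark code. It assumes that efficiency is measured against the row whose Efficiency is 1, using core counts, and that service units are charged per whole node of 16 cores. The 16-core node size and the function names are assumptions inferred from the tabulated numbers; this rule reproduces the $l_{max} = 63$ and weak scaling rows checked here, but the page does not confirm it and it may not hold for every row.

```python
import math


def parallel_efficiency(t_ref, cores_ref, t, cores):
    """Efficiency relative to a reference run: (t_ref * cores_ref) / (t * cores)."""
    return (t_ref * cores_ref) / (t * cores)


def service_units(elapsed_per_step, cores, steps=10_000, cores_per_node=16):
    """Core hours for `steps` time steps.

    ASSUMPTION: cores are charged in whole nodes of `cores_per_node` cores.
    The value 16 is inferred from the tables, not stated on the page.
    """
    charged_cores = math.ceil(cores / cores_per_node) * cores_per_node
    return elapsed_per_step * steps * charged_cores / 3600.0


if __name__ == "__main__":
    # Strong scaling, l_max = 47: reference is the 2-core row.
    print(parallel_efficiency(1.32979, 2, 0.720192, 4))
    # Strong scaling, l_max = 63, (124,96,192): reference is the 16-core row.
    print(parallel_efficiency(1.38402, 16, 0.300067, 126))
    # Service units for the 32-core and 126-core rows of the same table.
    print(service_units(0.765525, 32))
    print(service_units(0.300067, 126))
```

Efficiency values above 1 in the $l_{max} = 63$ tables follow from this definition: the runs on 4 and 8 cores use fewer core-seconds per step than the 16-core reference run.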

^ # of Cores ^ # of Processes ^ # of SMP ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ SUs ^
| 16 | 15 | 1 | 63 | (30,96,192) | 0.164901 | 0.117507 | 0.0473918 | 7.32893 |
| 32 | 31 | 1 | 63 | (62,96,192) | 0.238942 | 0.138128 | 0.100811 | 21.2393 |
| 64 | 63 | 1 | 63 | (124,96,192) | 0.500593 | 0.152387 | 0.348203 | 88.9943 |
| 127 | 126 | 1 | 63 | (248,96,192) | 1.19851 | 0.235588 | 0.962918 | 426.137 |

{{wg:dynamo:Performance_results:Goddard:mMosFFT_weak_r.png?480}}\\
Elapsed time for the weak scaling in the radial resolution. Ideal scaling for the linear solver after LU decomposition is plotted as a dotted line.\\

[[wg:dynamo:Performance_results|Back to performance benchmark lists]] \\
[[wg:dynamo:Performance_results:Goddard:files|files]]