[[wg:dynamo:Performance_results|Back to performance benchmark lists]] \\

===== Modules for libraries =====
module swap mvapich2 impi/4.1.0.030 \\
module load phdf5 netcdf fftw3

===== Compile options =====
F90OPTFLAGS = -cpp -c -O3 -ip -ipo -xhost

===== Notes =====
The nonlinear terms (spherical transforms) are evaluated twice in each time step. \\
Elapsed times are measured by inserting MPI_wtime() calls in main.f90. \\
Each time is an average over 100 time steps.

===== Definition of columns =====
^ Column ^ Description ^
| # of Cores | Number of CPU cores used |
| # of Processes | Number of MPI processes |
| # of PE for $r$ | Number of MPI processes in the radial direction |
| $l_{max}$ | Truncation level for the spherical harmonics |
| $(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
| Elapsed | Elapsed (wall clock) time for one time step |
| Nonlinear | Elapsed (wall clock) time for evaluating the nonlinear terms |
| Solver | Elapsed (wall clock) time for the linear solver (including communication) |
| Efficiency | Parallel efficiency |
| SUs | Service units (core hours) for $10^{4}$ time steps |

===== Single Processor Result =====
^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ SUs ^
| 48 | (73,72,144) | 0.774053 | 0.647997 | 0.109058 | 34.4023 |

===== Strong Scaling Results =====
^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^
| 48 | (73,72,144) |

^ # Cores ^ # Processes ^ # of PE for $r$ ^ Elapsed ^ Nonlinear ^ Solver ^ Efficiency ^ SUs ^
| 1 | 1 | 1 | 0.774053 | 0.647997 | 0.109058 | 1.00 | 34.4023 |
| 2 | 2 | 2 | 0.385259 | 0.320167 | 0.0551415 | 1.00459 | 17.1226 |
| 4 | 4 | 2 | 0.209187 | 0.174623 | 0.0281849 | 0.925072 | 9.2972 |
| 8 | 8 | 4 | 0.119906 | 0.100789 | 0.0145561 | 0.806939 | 5.32914 |
| 16 | 16 | 4 | 0.0788227 | 0.0672625 | 0.00788815 | 0.613761 | 3.50323 |

^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^
| 256 | (512,384,768) |

^ # Cores ^ # Processes ^ # of PE for $r$ ^ Elapsed ^ Nonlinear ^ Solver ^ Efficiency ^ SUs ^
| 64 | 64 | 8 | 15.1231 | 14.6107 | 0.341912 | 1.00 | 2688.55 |
| 128 | 128 | 8 | 7.7779 | 7.46396 | 0.174534 | 0.972183 | 2765.47 |
| 256 | 256 | 16 | 4.08026 | 3.85867 | 0.0918966 | 0.926600 | 2901.52 |
| 512 | 512 | 16 | 2.22781 | 2.05299 | 0.048329 | 0.848538 | 3168.45 |
| 1024 | 1024 | 32 | 1.3467 | 1.17748 | 0.0416313 | 0.70186 | 3830.60 |
| 2048 | 2048 | 32 | 0.852113 | 0.689238 | 0.0302918 | 0.554617 | 4847.58 |
| 4096 | 4096 | 32 | 0.374556 | 0.249321 | 0.0101704 | 0.501042 | 4261.62 |

{{wg:dynamo:Performance_results:LSD:LSD_elapsed.png?480}}\\
Elapsed (wall clock) time for the strong scaling test with $l_{max} = 256$, $(N_{r}, N_{\theta}, N_{\phi}) = (512,384,768)$. The numbers in the plot give the number of subdomains in the radial direction. The dotted line shows ideal scaling.

{{wg:dynamo:Performance_results:LSD:LSD_efficiency.png?480}}\\
Parallel efficiency for the strong scaling test with $l_{max} = 256$, $(N_{r}, N_{\theta}, N_{\phi}) = (512,384,768)$. The numbers in the plot give the number of subdomains in the radial direction.

===== Weak Scaling Results =====
^ # Cores ^ # Processes ^ # of PE for $r$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ SUs ^
| 64 | 64 | 32 | 32 | (512,48,96) | 0.0775688 | 0.0565 | 0.00624115 | 13.79 |
| 256 | 64 | 32 | 64 | (512,96,192) | 0.142538 | 0.106911 | 0.00610947 | 101.361 |
| 1024 | 1024 | 32 | 128 | (512,192,384) | 0.203346 | 0.137757 | 0.00715795 | 578.407 |
| 4096 | 4096 | 32 | 256 | (512,384,768) | 0.374556 | 0.249321 | 0.0101704 | 4261.62 |

{{wg:dynamo:Performance_results:LSD:LSD_weak_sph.png?480}}\\
Elapsed (wall clock) time for the weak scaling test in the horizontal resolution. The number of processes in the radial direction is fixed to 32. The dotted lines show ideal scaling for the Legendre transform ($O(l_{max}^{3})$).
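
One way to build an ideal curve for the Legendre transform in the horizontal weak scaling test is sketched below. This is a minimal sketch, assuming that the cost per time step grows as $l_{max}^{3}$ and is divided perfectly over the number of cores (the # Cores column, not # Processes), with the 64-core run as the reference. The function name ideal_legendre_time is hypothetical and not part of the benchmarked code.

```python
# Hypothetical sketch: ideal weak-scaling time if the cost per step grows as
# O(l_max^3) (Legendre transform) and is spread perfectly over the cores.
# ASSUMPTION: the reference point is the first (64-core) row of the table.

def ideal_legendre_time(t_ref, l_ref, cores_ref, l_max, cores):
    """Scale a reference elapsed time by (l_max/l_ref)^3 * (cores_ref/cores)."""
    return t_ref * (l_max / l_ref) ** 3 * (cores_ref / cores)


# (cores, l_max, measured elapsed [s]) from the horizontal weak-scaling table
rows = [
    (64, 32, 0.0775688),
    (256, 64, 0.142538),
    (1024, 128, 0.203346),
    (4096, 256, 0.374556),
]

cores_ref, l_ref, t_ref = rows[0]

for cores, l_max, measured in rows:
    ideal = ideal_legendre_time(t_ref, l_ref, cores_ref, l_max, cores)
    print(f"{cores:5d} cores  l_max={l_max:3d}  "
          f"measured={measured:.4f} s  ideal={ideal:.4f} s")
```

Under these assumptions the ideal time doubles each time $l_{max}$ doubles and the core count quadruples, since $2^{3}/4 = 2$.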

^ # Cores ^ # Processes ^ # of PE for $r$ ^ $l_{max}$ ^ $(N_{r},N_{\theta},N_{\phi})$ ^ Elapsed ^ Nonlinear ^ Solver ^ SUs ^
| 256 | 256 | 2 | 255 | (32,384,768) | 0.221005 | 0.206752 | 0.0071895 | 157.159 |
| 512 | 512 | 4 | 255 | (64,384,768) | 0.231582 | 0.211148 | 0.0067766 | 329.361 |
| 1024 | 1024 | 8 | 255 | (128,384,768) | 0.268191 | 0.231129 | 0.0070692 | 762.854 |
| 2048 | 2048 | 16 | 255 | (256,384,768) | 0.45460 | 0.374649 | 0.0152046 | 2586.17 |
| 4096 | 4096 | 32 | 255 | (512,384,768) | 0.374556 | 0.249321 | 0.0101704 | 4261.62 |

{{wg:dynamo:Performance_results:LSD:LSD_weak_r.png?480}}\\
Elapsed (wall clock) time for the weak scaling test in the radial resolution. The number of processes for the spherical harmonics is fixed to 128. The dotted lines show ideal scaling.
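
The Efficiency and SUs columns defined above can be reproduced from the Elapsed column. The sketch below assumes that strong-scaling efficiency is measured relative to the smallest run in each series, and that SUs equal elapsed time per step times charged cores times $10^{4}$ steps divided by 3600 s. The minimum charge of 16 cores is an inference from the 1 to 16 core rows (their SUs are all consistent with 16 charged cores), not something stated on this page. The weak-scaling efficiency (reference time over measured time) is a common convention that the tables above do not report. All function names are hypothetical.

```python
# Hypothetical sketch of the "Efficiency" and "SUs" column formulas.
# ASSUMPTION: runs on fewer than 16 cores are charged for 16 cores (one node);
# this is inferred from the tabulated SUs, not stated on the benchmark page.
SECONDS_PER_HOUR = 3600.0
STEPS_PER_SU = 1.0e4
MIN_CHARGED_CORES = 16  # hypothetical minimum charge inferred from the table


def service_units(elapsed_per_step, cores):
    """Core hours for 10^4 time steps."""
    charged = max(cores, MIN_CHARGED_CORES)
    return elapsed_per_step * charged * STEPS_PER_SU / SECONDS_PER_HOUR


def strong_efficiency(t_ref, cores_ref, t, cores):
    """Parallel efficiency relative to the reference (smallest) run."""
    return (t_ref * cores_ref) / (t * cores)


def weak_efficiency(t_ref, t):
    """Weak-scaling efficiency when the work per core is held fixed."""
    return t_ref / t


# Rows taken from the strong-scaling tables above
print(service_units(0.774053, 1))                     # 1 core, l_max = 48
print(service_units(15.1231, 64))                     # 64 cores, l_max = 256
print(strong_efficiency(0.774053, 1, 0.385259, 2))    # 2 cores, l_max = 48
print(strong_efficiency(15.1231, 64, 7.7779, 128))    # 128 cores, l_max = 256
# Radial weak-scaling series: 256-core run as reference, 1024-core run
print(weak_efficiency(0.221005, 0.268191))
```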