Environment modules:

```sh
module swap mvapich2 impi/4.1.0.030
module load phdf5 netcdf fftw3
```

Fortran optimization flags:

```make
F90OPTFLAGS = -cpp -c -O3 -ip -ipo -xhost
```
The nonlinear terms (spherical transform) are evaluated twice per time step.
Elapsed time is measured by inserting MPI_Wtime() calls in main.f90.
Each reported time is the average over 100 time steps.
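A minimal sketch of this timing scheme, written in Python rather than Fortran. `time.perf_counter` stands in for `MPI_Wtime()`, and `average_step_time` and `step` are hypothetical names, not routines from main.f90. The clock is read once before and once after the loop, and the difference is divided by the number of steps.

```python
import time
from typing import Callable


def average_step_time(step: Callable[[], None], n_steps: int = 100,
                      clock: Callable[[], float] = time.perf_counter) -> float:
    """Return the mean wall-clock seconds per call of `step` over `n_steps` calls.

    `clock` plays the role of MPI_Wtime(); time.perf_counter is the default so
    the sketch runs without an MPI installation.
    """
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    start = clock()  # like t0 = MPI_Wtime() before the time loop
    for _ in range(n_steps):
        step()
    return (clock() - start) / n_steps  # total elapsed divided by number of steps


# Usage with a fake clock so the result is deterministic
elapsed = [0.0]


def fake_step() -> None:
    elapsed[0] += 0.75  # pretend each step takes 0.75 s


print(average_step_time(fake_step, clock=lambda: elapsed[0]))  # 0.75
```

Passing the clock as a parameter keeps the averaging logic separate from the timer, so the same loop works with a real MPI timer or a test double.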
Column | Description |
---|---|
# of Cores | Number of CPU cores used |
# of Processes | Number of MPI processes |
# of PE for $r$ | Number of MPI processes in the radial direction |
$l_{max}$ | Truncation level for spherical harmonics |
$(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
Elapsed | Elapsed (wall-clock) time per time step, in seconds |
Nonlinear | Elapsed (wall-clock) time per time step for evaluating the nonlinear terms, in seconds |
Solver | Elapsed (wall-clock) time per time step for the linear solver, including communication, in seconds |
Efficiency | Parallel efficiency relative to the smallest run in the table |
SUs | Service units for $10^{4}$ time steps (core hours) |
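The SU and efficiency columns can be recomputed from the other columns. A minimal sketch, assuming SUs = elapsed time × charged cores × $10^{4}$ / 3600, where runs on fewer than 16 cores are charged for a full 16-core node. The 16-core node size is inferred from the tabulated values (the 1-core run is listed at 34.4023 SUs, which matches 16 charged cores), not stated in the source. Efficiency is taken as $E = N_{ref}T_{ref}/(NT)$ against the row listed at 1.00. The function names are hypothetical.

```python
import math

# Assumption inferred from the SU column: 16 cores per node, partial nodes charged in full.
CORES_PER_NODE = 16


def service_units(elapsed_s: float, cores: int, n_steps: int = 10_000,
                  cores_per_node: int = CORES_PER_NODE) -> float:
    """Core hours for `n_steps` time steps at `elapsed_s` seconds per step."""
    charged_cores = math.ceil(cores / cores_per_node) * cores_per_node
    return elapsed_s * charged_cores * n_steps / 3600.0  # 3600 s per hour


def parallel_efficiency(ref_cores: int, ref_elapsed_s: float,
                        cores: int, elapsed_s: float) -> float:
    """Strong-scaling efficiency (N_ref * T_ref) / (N * T) relative to a reference run."""
    return (ref_cores * ref_elapsed_s) / (cores * elapsed_s)


# 2-core row of the l_max = 48 table, against the 1-core reference
print(round(service_units(0.385259, 2), 4))                      # 17.1226
print(round(parallel_efficiency(1, 0.774053, 2, 0.385259), 5))   # 1.00459
```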
$l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | SUs |
---|---|---|---|---|---|
48 | (73,72,144) | 0.774053 | 0.647997 | 0.109058 | 34.4023 |
$l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
---|---|
48 | (73,72,144) |
# Cores | # Processes | # of PE for $r$ | Elapsed | Nonlinear | Solver | Efficiency | SUs |
---|---|---|---|---|---|---|---|
1 | 1 | 1 | 0.774053 | 0.647997 | 0.109058 | 1.00 | 34.4023 |
2 | 2 | 2 | 0.385259 | 0.320167 | 0.0551415 | 1.00459 | 17.1226 |
4 | 4 | 2 | 0.209187 | 0.174623 | 0.0281849 | 0.925072 | 9.2972 |
8 | 8 | 4 | 0.119906 | 0.100789 | 0.0145561 | 0.806939 | 5.32914 |
16 | 16 | 4 | 0.0788227 | 0.0672625 | 0.00788815 | 0.613761 | 3.50323 |
$l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ |
---|---|
256 | (512,384,768) |
# Cores | # Processes | # of PE for $r$ | Elapsed | Nonlinear | Solver | Efficiency | SUs |
---|---|---|---|---|---|---|---|
64 | 64 | 8 | 15.1231 | 14.6107 | 0.341912 | 1.00 | 2688.55 |
128 | 128 | 8 | 7.7779 | 7.46396 | 0.174534 | 0.972183 | 2765.47 |
256 | 256 | 16 | 4.08026 | 3.85867 | 0.0918966 | 0.926600 | 2901.52 |
512 | 512 | 16 | 2.22781 | 2.05299 | 0.048329 | 0.848538 | 3168.45 |
1024 | 1024 | 32 | 1.3467 | 1.17748 | 0.0416313 | 0.70186 | 3830.60 |
2048 | 2048 | 32 | 0.852113 | 0.689238 | 0.0302918 | 0.554617 | 4847.58 |
4096 | 4096 | 32 | 0.374556 | 0.249321 | 0.0101704 | 0.501042 | 4261.62 |
Elapsed (wall-clock) time for strong scaling of the $l_{max} = 256$, $(N_{r}, N_{\theta}, N_{\phi}) = (512,384,768)$ case. Numbers in the plot indicate the number of subdomains in the radial direction. The dotted line shows ideal scaling.
Parallel efficiency for strong scaling of the $l_{max} = 256$, $(N_{r}, N_{\theta}, N_{\phi}) = (512,384,768)$ case. Numbers in the plot indicate the number of subdomains in the radial direction.
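Recomputing the efficiency column from the elapsed times is a quick consistency check. The sketch below uses a hypothetical helper name, copies the rows from the $l_{max} = 256$ table, and applies $E = N_{ref}T_{ref}/(NT)$ with the 64-core run as reference. Every row agrees with the listed value to within $10^{-3}$ except the 4096-core row, where the listed elapsed time gives $E \approx 0.63$ rather than 0.501042. The SU value in that row is consistent with the listed elapsed time; the source does not indicate which value is in error.

```python
# (cores, elapsed seconds per step, listed efficiency) from the l_max = 256 strong-scaling table
ROWS = [
    (64, 15.1231, 1.00),
    (128, 7.7779, 0.972183),
    (256, 4.08026, 0.926600),
    (512, 2.22781, 0.848538),
    (1024, 1.3467, 0.70186),
    (2048, 0.852113, 0.554617),
    (4096, 0.374556, 0.501042),
]


def efficiency_mismatches(rows, tol=1e-3):
    """Return (cores, recomputed, listed) for rows whose listed efficiency differs by more than tol."""
    ref_cores, ref_elapsed, _ = rows[0]  # first row is the reference run
    mismatches = []
    for cores, elapsed, listed in rows:
        recomputed = (ref_cores * ref_elapsed) / (cores * elapsed)
        if abs(recomputed - listed) > tol:
            mismatches.append((cores, round(recomputed, 4), listed))
    return mismatches


print(efficiency_mismatches(ROWS))  # [(4096, 0.6309, 0.501042)]
```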
# Cores | # Processes | # of PE for $r$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | SUs |
---|---|---|---|---|---|---|---|---|
64 | 64 | 32 | 32 | (512,48,96) | 0.0775688 | 0.0565 | 0.00624115 | 13.79 |
256 | 64 | 32 | 64 | (512,96,192) | 0.142538 | 0.106911 | 0.00610947 | 101.361 |
1024 | 1024 | 32 | 128 | (512,192,384) | 0.203346 | 0.137757 | 0.00715795 | 578.407 |
4096 | 4096 | 32 | 256 | (512,384,768) | 0.374556 | 0.249321 | 0.0101704 | 4261.62 |
Elapsed (wall-clock) time for weak scaling in the horizontal resolution. The number of processes in the radial direction is fixed at 32. Dotted lines show ideal scaling for the Legendre transform, whose cost is $O(l_{max}^{3})$.
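If the cost is dominated by the $O(l_{max}^{3})$ Legendre transform at a fixed radial grid, an ideal reference time can be obtained by scaling the smallest run by the growth in work and dividing by the growth in core count, $T_{ideal} = T_{ref}\,(l_{max}/l_{ref})^{3}\,(N_{ref}/N)$. A minimal sketch with a hypothetical function name; how the plotted dotted lines were normalized is not stated in the source.

```python
def ideal_elapsed(l_max: int, cores: int,
                  l_ref: int = 32, cores_ref: int = 64, t_ref: float = 0.0775688) -> float:
    """Ideal time per step if the work grows as l_max**3 and is shared evenly over `cores`.

    Defaults are the smallest run in the horizontal weak-scaling table.
    """
    work_ratio = (l_max / l_ref) ** 3   # O(l_max^3) Legendre transform work
    core_ratio = cores / cores_ref      # more cores share the work
    return t_ref * work_ratio / core_ratio


for l_max, cores, measured in [(64, 256, 0.142538), (128, 1024, 0.203346), (256, 4096, 0.374556)]:
    print(l_max, cores, round(ideal_elapsed(l_max, cores), 4), measured)
```

Because the core count in this table grows as $l_{max}^{2}$ (64, 256, 1024, 4096), the ideal time per step grows linearly with $l_{max}$.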
# Cores | # Processes | # of PE for $r$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | SUs |
---|---|---|---|---|---|---|---|---|
256 | 256 | 2 | 255 | (32,384,768) | 0.221005 | 0.206752 | 0.0071895 | 157.159 |
512 | 512 | 4 | 255 | (64,384,768) | 0.231582 | 0.211148 | 0.0067766 | 329.361 |
1024 | 1024 | 8 | 255 | (128,384,768) | 0.268191 | 0.231129 | 0.0070692 | 762.854 |
2048 | 2048 | 16 | 255 | (256,384,768) | 0.45460 | 0.374649 | 0.0152046 | 2586.17 |
4096 | 4096 | 32 | 255 | (512,384,768) | 0.374556 | 0.249321 | 0.0101704 | 4261.62 |
Elapsed time for weak scaling in the radial resolution. The number of processes for the spherical harmonics is fixed at 128. Dotted lines show ideal scaling.
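In this table the radial grid and the radial process count grow together while the horizontal decomposition stays fixed, so each process keeps roughly the same amount of work and the ideal elapsed time is constant. Weak-scaling efficiency can then be taken as $T_{ref}/T$, with the 256-core run as reference. A minimal sketch; the function name is hypothetical, and this efficiency definition is an assumption since the table does not list one.

```python
# (cores, elapsed seconds per step) from the radial weak-scaling table
ROWS = [(256, 0.221005), (512, 0.231582), (1024, 0.268191),
        (2048, 0.45460), (4096, 0.374556)]


def weak_efficiency(t_ref: float, t: float) -> float:
    """Weak-scaling efficiency when the ideal time per step is constant."""
    return t_ref / t


t_ref = ROWS[0][1]  # smallest run is the reference
for cores, elapsed in ROWS:
    print(cores, round(weak_efficiency(t_ref, elapsed), 3))
```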