wg:dynamo:performance_results:busse

Back to performance benchmark lists

compile options

F90OPTFLAGS = -r8 -i4 -ftz -IPF_fma -IPF_fltacc -WB -O2

Definition of columns

| Name | Description |
|---|---|
| # of Cores | Number of CPU cores used |
| # of Processes | Number of MPI processes |
| # of Threads | Number of threads for each process |
| $N_{c}$ | Truncation level for Chebyshev polynomials |
| $l_{max}$ | Truncation level for spherical harmonics |
| $(N_{r},N_{\theta},N_{\phi})$ | Number of grid points in spherical coordinates |
| Elapsed | Elapsed (wall clock) time for one time step |
| Nonlinear | Elapsed (wall clock) time for nonlinear terms (including communications) |
| Solver | Elapsed (wall clock) time for linear calculation |
| Comm. | Elapsed (wall clock) time for data communication |
| Efficiency | Parallel efficiency |
| SUs | Service units for $10^{4}$ time steps (core hours) |
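The SUs column can be reproduced from the elapsed time per step. The formula below (cores × elapsed time × $10^{4}$ steps, converted to hours) is inferred from the tabulated values and is not stated explicitly on this page. It also assumes the elapsed times are in seconds. The function name `service_units` is hypothetical. A minimal sketch:

```python
def service_units(cores, elapsed_per_step, steps=10_000):
    """Core hours for `steps` time steps.

    Assumes elapsed_per_step is wall clock seconds per step; the formula
    is inferred from the benchmark tables, not taken from the source text.
    """
    return cores * elapsed_per_step * steps / 3600.0


# 4-core run of the (N_c, l_max) = (128, 128) strong scaling case
print(round(service_units(4, 13.6562), 3))
```

For the 4-core run with an elapsed time of 13.6562, this gives the tabulated 151.736 core hours.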

Single Processor Result

| $N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
|---|---|---|---|---|---|---|---|
| 47 | 42 | (48,64,129) | 0.460322 | 0.378743 | 0.0565741 | 0.005765 | |

Strong Scaling Results

$N_{c}$ = 128, $l_{max}$ = 128, $(N_{r},N_{\theta},N_{\phi})$ = (129,192,385)

| # of Cores | # of Processes | # of SMP | Elapsed | Nonlinear | Solver | Comm. | Efficiency | SUs |
|---|---|---|---|---|---|---|---|---|
| 4 | 4 | 1 | 13.6562 | 12.3256 | 1.19259 | 8.13115 | 1 | 151.736 |
| 8 | 8 | 1 | 5.27784 | 4.71791 | 0.559926 | 1.8018 | 1.29373 | 117.285 |
| 16 | 16 | 1 | 2.83756 | 2.56503 | 0.240695 | 1.55375 | 1.20317 | 126.114 |
| 32 | 32 | 1 | 1.41826 | 1.2796 | 0.12139 | 1.18198 | 1.20361 | 126.067 |
| 64 | 64 | 1 | 2.73806 | 2.66596 | 0.0632037 | 2.87671 | 0.311722 | 486.766 |
| 128 | 128 | 1 | 6.82779 | 6.79328 | 0.0298972 | 7.1671 | 0.0625029 | 2427.66 |

$N_{c}$ = 192, $l_{max}$ = 128, $(N_{r},N_{\theta},N_{\phi})$ = (193,192,385)

| # of Cores | # of Processes | # of SMP | Elapsed | Nonlinear | Solver | Comm. | Efficiency | SUs |
|---|---|---|---|---|---|---|---|---|
| 4 | 4 | 1 | 21.845 | 19.1889 | 2.41306 | 12.8516 | 1 | 242.723 |
| 8 | 8 | 1 | 8.69087 | 7.22496 | 1.34667 | 2.81284 | 1.25678 | 193.13 |
| 16 | 16 | 1 | 4.43835 | 3.84567 | 0.535587 | 2.32362 | 1.23047 | 197.26 |
| 32 | 32 | 1 | 2.31319 | 2.01689 | 0.266718 | 1.90409 | 1.18046 | 205.617 |
| 64 | 64 | 1 | 4.14431 | 3.99401 | 0.135566 | 4.28234 | 0.329443 | 736.767 |
| 128 | 128 | 1 | 10.8246 | 10.7533 | 0.0646424 | 11.3395 | 0.0630654 | 3848.74 |
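The Efficiency column appears to be measured relative to the smallest (4-core) run, i.e. $(p_0 T_0)/(p T)$ with $p$ the core count and $T$ the elapsed time per step. This definition is inferred from the tabulated values rather than stated on this page, and the function name `parallel_efficiency` is hypothetical. A minimal sketch using the $(N_{c}, l_{max}) = (192, 128)$ data:

```python
def parallel_efficiency(base_cores, base_elapsed, cores, elapsed):
    """Strong scaling efficiency relative to a baseline run.

    Computed as (p0 * T0) / (p * T); this definition is inferred from
    the benchmark tables, not taken from the source text.
    """
    return (base_cores * base_elapsed) / (cores * elapsed)


# (cores, elapsed time per step) for the (N_c, l_max) = (192, 128) case
runs = [(4, 21.845), (8, 8.69087), (16, 4.43835),
        (32, 2.31319), (64, 4.14431), (128, 10.8246)]

p0, t0 = runs[0]  # the 4-core run is the baseline (efficiency 1)
for p, t in runs:
    print(p, round(parallel_efficiency(p0, t0, p, t), 5))
```

Values above 1 correspond to the rows where the elapsed time drops by more than the factor by which the core count increases.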


Elapsed (wall clock) time for strong scaling in the $(N_{c}, l_{max}) = (192, 128)$ case. The numbers indicate the number of OpenMP threads. The dotted line shows ideal scaling.


Parallel efficiency for strong scaling in the $(N_{c}, l_{max}) = (192, 128)$ case.

Weak Scaling Results

| # of Cores | # of Processes | # of SMP | $N_{c}$ | $l_{max}$ | $(N_{r},N_{\theta},N_{\phi})$ | Elapsed | Nonlinear | Solver | Comm. | SUs |
|---|---|---|---|---|---|---|---|---|---|---|
| 32 | 32 | 1 | 192 | 128 | (193,192,385) | 2.31319 | 2.01689 | 0.266718 | 1.90409 | 205.617 |


files

wg/dynamo/performance_results/busse.txt · Last modified: 2018/11/28 21:53 (external edit)