Using OpenMPI on AMD EPYC and Intel
OpenMPI is available in a variety of versions and can be combined with different compilers. Select an appropriate one after inspecting all combinations via
module avail openmpi
Please be aware that all versions and combinations have to match. OpenMPI programs compiled with the Intel compiler need a corresponding openmpi-intel-version module of OpenMPI. After selecting a version, do not forget to load it:
module load openmpi/your-version
Some versions of OpenMPI show reduced support for the networks in our cluster. We therefore recommend not using OpenMPI version 4.1, but rather version 4.0 or, even preferable, version 5.
The well-known compiler wrappers come with the MPI library, so you may use any of mpic++, mpicc or mpicxx, as in:
mpicc -O3 -o hello_world hello_world.c
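The source file itself is not part of this page; purely as an illustration, a minimal hello_world.c could look like the following sketch:

/* hello_world.c - minimal MPI program (illustrative sketch, not provided by the cluster) */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);                  /* initialize the MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* id of this MPI task            */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of MPI tasks      */
    printf("Hello world from task %d of %d\n", rank, size);
    MPI_Finalize();                          /* shut down MPI cleanly          */
    return 0;
}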
The best practice for starting a program in the batch environment is to use a so-called job script that contains parameters for the batch system. You may copy the given example script into a file called openmpi_job.sh and submit it to the batch system:
sbatch openmpi_job.sh
To list the available node types (CPU features) you may use (in one line)
scontrol show node | grep Features | awk -F'=' '{print $2}' | awk -F',' '{print $1}' | sort | uniq
This has to be done only once.
Pure MPI
Pure MPI implies that you start and use only MPI tasks. Combining tasks and threads is called hybrid parallelization and is considered below.
The following script may be copied; the generic lines, such as the partition, the account, the job name and, of course, the CPU type, may then be modified according to your needs.
#!/bin/bash
#SBATCH -p partition            # select a partition like skylake-96
#SBATCH --account project       # specify your project for prioritization
#SBATCH --mail-type=END         # send a mail notification at the end of the job
#SBATCH -J jobname              # name of the job
#SBATCH -o jobname.%j.out       # output: %j expands to the job id
#SBATCH -e jobname.%j.err       # error: %j expands to the job id
#SBATCH -L ompi                 # request a license for OpenMPI
#SBATCH --nodes=2               # request 2 nodes (identical to -N 2)
#SBATCH --ntasks=4              # request 4 MPI tasks (identical to -n 4)
#SBATCH --ntasks-per-node=2     # 2 MPI tasks will be started per node
#SBATCH -C XEON_SP_6126         # selects a so-called Skylake CPU
# Skylake nodes host 24 CPU cores, thus recommended are
# --ntasks=4/8/12/24 in combination with --nodes=1
# or --ntasks=i*24 with a corresponding number of nodes
## ##SBATCH is a comment - so Skylake (above) is used; otherwise comment the line above and uncomment one below
##SBATCH -C XEON_E5_2640v3      # selects a so-called Haswell CPU
# Haswell nodes host 16 CPU cores, thus recommended are
# --ntasks=4/8/16 in combination with --nodes=1
# or --ntasks=i*16 with a corresponding number of nodes
##SBATCH -C EPYC_7262           # selects an AMD CPU; these host 16 CPU cores, thus
# --ntasks=4/8/16 in combination with --nodes=1
# or --ntasks=i*16 with a corresponding number of nodes

### generic for all types of CPUs, Intel and AMD
### ============================================
module purge                    # remove all module versions and modules loaded interactively(!)
VERSION="5.0.2"                 # an explicit version is preferable to just "latest",
                                # as "latest" may change
module add openmpi/$VERSION

INTEL="--mca btl_openib_allow_ib 1"
AMD="--mca pml cm --mca btl self --mca btl_ofi_mode 1"

# the following is a recommendation in case the distribution of tasks is identical on all nodes
BEST_PRACTICE="-np $SLURM_NTASKS -N $SLURM_NTASKS_PER_NODE"

# select the command to be used
F=$(scontrol show nodes $SLURMD_NODENAME | grep -i Availablefeatures)
if [ "${F/XEON/}" != "$F" ] ; then      # this is an Intel CPU
    CMD="mpirun $INTEL $BEST_PRACTICE ./my_executable"
else                                    # has to be an AMD
    CMD="mpirun $AMD $BEST_PRACTICE ./my_executable"
fi
$CMD
Hybrid OpenMP and MPI
An example may look like the following (first the AMD variant, followed by the Intel variant):
#!/bin/bash -l                  # required in the first line (the -l is needed!)
#SBATCH -p partition            # select a partition like skylake-96
#SBATCH --account project       # specify your project for prioritization
#SBATCH --mail-type=END         # send a mail notification at the end of the job
#SBATCH -J jobname              # name of the job
#SBATCH -o jobname.%j.out       # output: %j expands to the job id
#SBATCH -e jobname.%j.err       # error: %j expands to the job id
#SBATCH -L ompi                 # request a license for OpenMPI
#SBATCH --nodes=2               # request 2 nodes (identical to -N 2)
#SBATCH --ntasks=4              # request 4 MPI tasks (identical to -n 4)
#SBATCH --ntasks-per-node=2     # 2 MPI tasks will be started per node
#SBATCH --cpus-per-task=3       # each MPI task starts 3 OpenMP threads

### use the generic part given for Intel to select the OpenMPI version

# compile the hybrid program using MPI and OpenMP
mpicc -o your_program -O3 -fopenmp your_program.c

# AMD specific:
MAP=""
OMP=""
SLOT=""
if [ "$SLURM_CPUS_PER_TASK" != "" ]; then       # is this a hybrid job?
    if [ $SLURM_CPUS_PER_TASK -ge 2 ]; then     # are at least 2 threads used per task?
        SLOT="slot"
    fi
    OMP="-x OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK"   # set the environment variable to be used
    MAP="--map-by $SLOT:PE=$SLURM_CPUS_PER_TASK"    # organize the threads on the cores of the node
else
    MAP="--map-by $SLOT:PE=1"
fi

# to start the MPI tasks in a symmetric (recommended!) layout
SNT=""
if [ "$SLURM_NTASKS_PER_NODE" != "" ]; then
    SNT="-N $SLURM_NTASKS_PER_NODE"
fi

HGN="--mca mtl psm2"            # communication option for the network

OPTS="${MAP} ${OMP} ${SNT} ${HGN}"
mpirun -np $SLURM_NTASKS $OPTS ./your_program
This example covers both the hybrid OpenMP/MPI and the pure MPI case. For a pure MPI job, the following SBATCH line has to be commented out (additional # at the beginning) or erased:
#SBATCH --cpus-per-task=3       # each MPI task starts 3 OpenMP threads
and, of course, the -fopenmp option has to be removed from the compilation.
#!/bin/bash                     # required in the first line (without -l)
# - same SBATCH lines as for AMD (except the partition)

### generic for all types of CPUs, Intel and AMD
### ============================================
module purge                    # remove all loaded module versions

# choose the version of the openmpi module (either or - keep only one of the two lines)
VERSION="4.1"                   # based on gcc
VERSION="4.1-intel-2022"        # based on the Intel compiler

OPENMPI=${VERSION/-*/}          # extract the OpenMPI version from it
if [ "$VERSION" != "$OPENMPI" ]; then       # in case an Intel-based VERSION is used
    C=${VERSION/"$OPENMPI-"/}               # extract the compiler string from the openmpi version
    COMP=${C/-*/}                           # extract the compiler
    YEAR=${C/*-/}                           # extract the year of the compiler version
    if [ $YEAR -gt 2021 ] && [ "$COMP" == "intel" ]; then   # special case for newer versions
        COMP="oneapi"
    fi
    module load $COMP/$YEAR     # load the appropriate compiler for the openmpi version
else
    module load gcc/latest      # VERSION=4.1 - the newest gcc is needed as well
fi
module load openmpi/$VERSION    # load your openmpi version
### end of generic part

# compile the hybrid program using MPI and OpenMP
mpicc -o your_program -O3 -fopenmp your_program.c

# specific for Intel CPUs
# =======================
OPT=""
if [ ${OPENMPI/\./} -ge 41 ]; then
    OPT="--mca btl openib"
    if [ "$COMP" == "" ]; then                              # special case for gcc, depending on the hardware
        if [ "$(/usr/sbin/ibstat -l | grep hfi)" != "" ]; then   # if the network is Omni-Path
            OPT="-mca mtl psm2 -mca pml cm -mca btl_openib_allow_ib 1"
        fi
    fi
fi
mpirun $OPT -x "OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK" ./your_program
This script also takes into account that different OpenMPI versions require additional (and changing) options for the high-bandwidth networks.
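For orientation only: the your_program.c compiled in the scripts above is not part of this page. A minimal sketch of such a hybrid MPI/OpenMP program, assuming the file name used above, could look like this:

/* your_program.c - sketch of a hybrid MPI/OpenMP program (illustrative assumption) */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;
    /* request thread support; FUNNELED is enough if only the master thread calls MPI */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* OMP_NUM_THREADS (exported in the job script from SLURM_CPUS_PER_TASK) sets the team size */
    #pragma omp parallel
    {
        printf("task %d, thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}

With --cpus-per-task=3 and the -x OMP_NUM_THREADS setting from the job scripts, each MPI task would then run three OpenMP threads.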