Using OpenMPI on AMD EPYC and Intel

OpenMPI is available in a variety of versions and can be combined with different compilers. Select an appropriate combination after inspecting all available ones via

module avail openmpi
Please be aware that compiler and MPI library have to match: OpenMPI programs compiled with the Intel compiler need a corresponding Intel-based OpenMPI module. After selecting a version, do not forget to load it:
module load openmpi/your-version

Some versions of OpenMPI offer only reduced support for the networks in our cluster. We therefore recommend avoiding OpenMPI 4.1 in favour of version 4.0 or, preferably, version 5.
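
For example, loading a specific combination could look like this (the module names below follow the pattern used in the scripts further down this page; check the output of module avail openmpi for the names that actually exist on the cluster):

module load openmpi/5.0.2             # a GCC-based build
# module load openmpi/4.1-intel-2022  # an Intel-compiler-based build, for Intel-compiled code
module list                           # verify the loaded combination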

The usual compiler wrappers are shipped with the MPI library, so you may use mpicc, mpicxx or mpic++ as in:

mpicc -O3 -o hello_world hello_world.c
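
The wrappers only add the MPI include and library paths to the underlying compiler; which compiler and flags a wrapper uses can be inspected with the OpenMPI option --showme. C++ code is compiled analogously (the file names here are placeholders):

mpicc --showme                          # print the full compiler command line the wrapper would use
mpicxx -O3 -o my_solver my_solver.cpp   # C++ compilation, analogous to the C case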

Best practice for starting a program in the batch environment is a job script that contains the parameters for the batch system. You may copy one of the example scripts below into a file called openmpi_job.sh and submit it to the batch system:

sbatch openmpi_job.sh
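
After submission, the job can be followed with the usual Slurm commands (the job ID is printed by sbatch):

squeue -u $USER             # list your pending and running jobs
scontrol show job <jobid>   # detailed information on a single job
scancel <jobid>             # cancel a job if necessary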

To list the available node types (CPU features) you may use the following one-liner; this has to be done only once:

scontrol show node | grep Features | awk -F= '{print $2}' | awk -F, '{print $1}' | sort | uniq
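
An equivalent, somewhat shorter query uses the feature column of sinfo (assuming an unmodified Slurm installation):

sinfo -o "%30N %f"          # node list together with the available features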

Pure MPI

Pure MPI means that you start and use MPI tasks only. Combining tasks and threads is called hybrid parallelization and is treated below.

The following script may be copied; the generic lines such as the partition, the account, the job name and, of course, the type of CPU can then be adapted to your needs.

#!/bin/bash
#SBATCH -p partition        # select a partition like skylake-96
#SBATCH --account project   # specify your project for priorization
#SBATCH --mail-type=END     # want a mail notification at end of job
#SBATCH -J jobname          # name of the job
#SBATCH -o jobname.%j.out   # Output: %j expands to jobid
#SBATCH -e jobname.%j.err   # Error: %j expands to jobid
#SBATCH -L ompi             # request a license for openmpi
#SBATCH --nodes=2           # requesting 2 nodes (identical -N 2)
#SBATCH --ntasks=4          # requesting 4 MPI tasks (identical -n 4)
#SBATCH --ntasks-per-node=2 # 2 MPI tasks will be started per node
#SBATCH -C XEON_SP_6126     # will select a so-called Skylake CPU
                            # Skylake nodes host 24 CPU cores, thus recommended are
                            # --ntasks=4/8/12/24 in combination with --nodes=1
                            # or --ntasks=i*24 with a corresponding no. of nodes
## a doubled # turns the SBATCH line into a plain comment - so Skylake (above) is used;
## to switch, comment out the line above and uncomment one of the lines below
##SBATCH -C XEON_E5_2640v3  # will select a so-called Haswell CPU
                            # Haswell nodes host 16 CPU cores, thus recommended are
                            # --ntasks=4/8/16 in combination with --nodes=1
                            # or --ntasks=i*16 with a corresponding no. of nodes
##SBATCH -C EPYC_7262       # will select an AMD CPU; these host 16 CPU cores, thus
                            # --ntasks=4/8/16 in combination with --nodes=1
                            # or --ntasks=i*16 with a corresponding no. of nodes

### generic for all type of CPUs Intel and AMD
### ==========================================
module purge                # removes all loaded modules, including those loaded interactively(!)
VERSION="5.0.2"             # an explicit version is preferable to just "latest",
                            # as "latest" may change
module add openmpi/$VERSION

INTEL="--mca btl_openib_allow_ib 1"
AMD="--mca pml cm --mca btl self --mca btl_ofi_mode 1"
# the following is a recommendation in case the distribution of tasks is identical to all
# nodes
BEST_PRACTICE="-np $SLURM_NTASKS -N $SLURM_NTASKS_PER_NODE"   

# select the command to be used
F=$(scontrol show nodes $SLURMD_NODENAME | grep -i Availablefeatures)
if [ "${F/XEON/}" != "$F" ] ; then
  # this is an Intel CPU
  CMD="mpirun $INTEL $BEST_PRACTICE ./my_executable"
else
  # has to be an AMD
  CMD="mpirun $AMD $BEST_PRACTICE ./my_executable"
fi

$CMD
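
With the directives above (--nodes=2, --ntasks=4, --ntasks-per-node=2) the variables expand so that, depending on the CPU type, the script effectively runs one of the following commands (a sketch of the expanded command lines; my_executable is a placeholder):

# on an Intel node:
mpirun --mca btl_openib_allow_ib 1 -np 4 -N 2 ./my_executable
# on an AMD node:
mpirun --mca pml cm --mca btl self --mca btl_ofi_mode 1 -np 4 -N 2 ./my_executable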
 

Hybrid OpenMP and MPI

An example may look like the following; the first script is the variant for AMD CPUs, the second one the variant for Intel CPUs:

#!/bin/bash -l              # required in first line (the -l is needed!)
#SBATCH -p partition        # select a partition like skylake-96
#SBATCH --account project   # specify your project for priorization
#SBATCH --mail-type=END     # want a mail notification at end of job
#SBATCH -J jobname          # name of the job
#SBATCH -o jobname.%j.out   # Output: %j expands to jobid
#SBATCH -e jobname.%j.err   # Error: %j expands to jobid
#SBATCH -L ompi             # request a license for openmpi
#SBATCH --nodes=2           # requesting 2 nodes (identical -N 2)
#SBATCH --ntasks=4          # requesting 4 MPI tasks (identical -n 4)
#SBATCH --ntasks-per-node=2 # 2 MPI tasks will be started per node
#SBATCH --cpus-per-task=3   # each MPI task starts 3 OpenMP threads

### use the generic part given in the Intel script below to select the OpenMPI version

# the hybrid program using MPI and OpenMP may also be compiled within the job script
mpicc -o your_program -O3 -fopenmp your_program.c

# AMD specific:
MAP=""
OMP=""
SLOT=""
if [ "$SLURM_CPUS_PER_TASK" != "" ]; then       # Is this a hybrid job?
  if [ $SLURM_CPUS_PER_TASK -ge 2 ]; then       # Are at least 2 threads used per task?
    SLOT="slot"
  fi
  OMP="-x OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK" # export the number of OpenMP threads to the tasks
  MAP="--map-by $SLOT:PE=$SLURM_CPUS_PER_TASK"  # bind that many cores to each task on the node
else
  MAP="--map-by $SLOT:PE=1"                     # pure MPI: one core per task
fi

# start the MPI tasks in a symmetric layout (recommended!)
SNT=""
if [ "$SLURM_NTASKS_PER_NODE" != "" ]; then
   SNT="-N $SLURM_NTASKS_PER_NODE"
fi
HGN="--mca mtl psm2"                            # communication option for the high-speed network

OPTS="${MAP} ${OMP} ${SNT} ${HGN}"

mpirun -np $SLURM_NTASKS $OPTS ./your_program
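
With the directives above (4 MPI tasks, 2 tasks per node, 3 CPUs per task) the variables resolve to MAP="--map-by slot:PE=3", OMP="-x OMP_NUM_THREADS=3" and SNT="-N 2", so the script effectively runs (a sketch of the expanded command line):

mpirun -np 4 --map-by slot:PE=3 -x OMP_NUM_THREADS=3 -N 2 --mca mtl psm2 ./your_program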

This example covers both the hybrid OpenMP/MPI case and the pure MPI case. For a pure MPI run, the following SBATCH line has to be commented out (additional # at the beginning) or removed:

#SBATCH --cpus-per-task=3       # each MPI task starts 3 OpenMP threads
and of course the -fopenmp option has to be removed for the compilation.

#!/bin/bash                 # required in first line (without -l)
# - same SBATCH lines as for AMD (except the partition)

### generic for all type of CPUs Intel and AMD
### ==========================================
module purge                            # removes all loaded modules

# choose the version of the openmpi module (keep only one of the two lines)
VERSION="4.1"                           # based on gcc
VERSION="4.1-intel-2022"                # based on the intel compiler

OPENMPI=${VERSION/-*/}                  # extract the OpenMPI version
if [ "$VERSION" != "$OPENMPI" ]; then   # in case an intel-based VERSION is used
  C=${VERSION/"$OPENMPI-"/}             # extract the compiler string from the openmpi version
  COMP=${C/-*/}                         # extract the compiler
  YEAR=${C/*-/}                         # extract the year of the compiler version
  if [ $YEAR -gt 2021 ] && [ "$COMP" == "intel" ]; then  # special case for newer versions
    COMP="oneapi"
  fi
  module load $COMP/$YEAR               # load the appropriate compiler for the openmpi version
else
  module load gcc/latest                # VERSION=4.1 - needs a recent gcc as well
fi
module load openmpi/$VERSION            # load your openmpi version
### end of generic part

# the hybrid program using MPI and OpenMP may also be compiled within the job script
mpicc -o your_program -O3 -fopenmp your_program.c

# specific for Intel CPUs
# =======================
OPT=""
if [ ${OPENMPI/\./} -ge 41 ]; then
  OPT="--mca btl openib"
  if [ "$COMP" == "" ]; then            # special case for gcc, depending on hardware
    if [ "$(/usr/sbin/ibstat -l | grep hfi)" != "" ]; then   # if the network is Omni-Path
      OPT="-mca mtl psm2 -mca pml cm -mca btl_openib_allow_ib 1"
    fi
  fi
fi

mpirun $OPT -x "OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK" ./your_program

This script also takes into account that, as OpenMPI versions change, additional (and changing) options for the high-bandwidth networks are required.
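
To illustrate the string handling in the generic part: for VERSION="4.1-intel-2022" the parameter expansions resolve as follows (a trace for illustration only, not part of the job script):

VERSION="4.1-intel-2022"
OPENMPI=${VERSION/-*/}          # -> "4.1"
C=${VERSION/"$OPENMPI-"/}       # -> "intel-2022"
COMP=${C/-*/}                   # -> "intel" (later changed to "oneapi" because YEAR > 2021)
YEAR=${C/*-/}                   # -> "2022"
# resulting module commands:
#   module load oneapi/2022
#   module load openmpi/4.1-intel-2022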