Batch Usage at RPTU Kaiserslautern-Landau

Batch System Concepts

A cluster consists of a set of tightly connected computers, called nodes, that are presented as a single system. The nodes are connected through a high-speed local network and have access to several shared file systems.
A job is the execution of user-defined workflows by the batch system, that is, without the user's intervention.
The resource manager is the software that manages the execution of jobs. It is responsible for managing the resources of a cluster, such as nodes and memory, and ensures that jobs do not overlap on these resources.
The scheduler is the software that controls users' jobs and the resource manager. It handles job submissions and puts jobs into partitions. Among other things, it offers user commands to start and stop jobs, interfaces to define workflows, and interfaces for monitoring and accounting.
The batch system is the combination of a scheduler and a resource manager. The batch system used on the cluster 'Elwetritsch' is SLURM.

Batch Model

Users should usually not submit their jobs to a partition but specify resource limits and projects instead. Partitions are then selected automatically to guarantee the best throughput.
Jobs are scheduled according to priorities. The jobs with the highest priorities are scheduled next. A user's priority depends on the size of the project applied for.
The scheduler checks the pending queue and may even schedule jobs with lower priorities if they fit into gaps that arise while resources are being freed for the next highest-priority jobs. This so-called backfill scheduling increases utilization and depends strongly on meaningful wall time limits in the job description.
Nodes are shared among users to allow non-parallel programs to run efficiently. This means that the resources of a node are shared, and jobs on the same node may influence each other. The requested number of cores is guaranteed for a job, that is, cores in nodes are not shared.
The priority is reduced while jobs are running.
Jobs of users without an active project are only started on a small subset of the cluster. This subset currently consists of only 24 nodes. By far the larger part of the cluster is reserved for projects.
Jobs submitted to the partition idle may utilize the whole cluster as well, but these jobs may be suspended or terminated as soon as a job in another partition needs the resources.
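
Because backfilling depends on realistic wall time limits, it can help to derive the limit from a measured run time rather than guessing. The following is a minimal sketch and not part of the cluster's tooling: the function name walltime_limit and the 20 % safety margin are assumptions chosen for illustration. It prints a limit in the HH:MM:SS form accepted by the -t option of sbatch.

```shell
#!/bin/bash
# Hypothetical helper: convert a measured run time in seconds into a
# wall time limit (HH:MM:SS), adding a 20 % safety margin.
walltime_limit() {
    local seconds=$1
    local padded=$(( seconds + seconds / 5 ))   # integer arithmetic, rounds down
    printf '%02d:%02d:%02d\n' \
        $(( padded / 3600 )) $(( padded % 3600 / 60 )) $(( padded % 60 ))
}

# A test run that took 3000 s leads to a request of one hour.
walltime_limit 3000
```

The result could then be passed on submission, for example with sbatch -t "$(walltime_limit 3000)" jobscript.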

Characteristics of SLURM

SLURM is an open-source project developed and documented by SchedMD.

Slurm groups the compute nodes into partitions. Appropriate partitions are automatically selected according to the resource requirements specified.

Slurm - User Commands

salloc is used to request interactive job allocations. The best choice for interactive jobs is rz-launch.
sbatch is used to submit a batch script.
scancel is used to cancel a job or job step.
scontrol provides some functionality to manage jobs and query information.
sinfo is used to retrieve information about partitions, reservations and nodes.
smap graphically displays the state of the partitions. We recommend our web pages instead.
sprio displays job priorities.
squeue is used to query the list of pending and running jobs.
srun is used to initiate job steps from within a job or start an interactive job in conjunction with salloc.
sshare displays fair share information for each user.
sstat queries status information about a running job.
sview is a graphical user interface for jobs, partitions and nodes.
sacct retrieves accounting information. We recommend rz-accounting instead.
sacctmgr offers additional possibilities to query accounting information.
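
As an illustration of how these commands fit together, the sketch below looks up the state of a job, first with squeue and, once the job has left the queue, with sacct. The function name job_state is hypothetical. The options used (-h, -j, -o for squeue and -n, -X, -j, -o for sacct) are standard Slurm options, but the exact output may differ with the local configuration, and rz-accounting remains the recommended tool for accounting on this cluster.

```shell
#!/bin/bash
# Hypothetical helper: print the state of a job given its job ID.
job_state() {
    local jobid=$1 state
    # Pending and running jobs are listed by squeue; %T prints the state name.
    state=$(squeue -h -j "$jobid" -o '%T' 2>/dev/null)
    if [ -z "$state" ]; then
        # Jobs that have left the queue are only known to the accounting database.
        state=$(sacct -n -X -j "$jobid" -o State 2>/dev/null | awk 'NR == 1 { print $1 }')
    fi
    echo "${state:-UNKNOWN}"
}

# Usage: job_state <jobid>
```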

System Usage

The installed software of the cluster is organized through modules. Please contact us if additional software is required.
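
A typical way to work with modules inside a job script is sketched below. It assumes that the cluster provides the usual module shell command (as Environment Modules and Lmod do) and that module load reports failure through its exit status. The function name load_module and any module name passed to it are hypothetical, so check the output of module avail for the names that actually exist.

```shell
#!/bin/bash
# Hypothetical helper: load a module and fail clearly if that is not possible.
load_module() {
    local name=$1
    if ! type module >/dev/null 2>&1; then
        echo "module command not available on this host" >&2
        return 1
    fi
    if module load "$name"; then
        echo "loaded $name"
    else
        echo "could not load $name" >&2
        return 1
    fi
}

# Usage (the module name is a placeholder): load_module <name>
```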

Compilers are of special importance. We offer the GNU suites, the commercial Intel compilers and a set of additional tools, the compilers of the Portland Group (PGI), and a compiler for GPUs (nvcc, NVIDIA).

Job Scripts

Instead of passing options to sbatch on the command line, it is better to specify them inside a job script. Here is a simple example. The job has the name "TestJob", requests 30 minutes of run time with 1 core on 1 node, and writes its output to the specified files. An email is sent when the job has finished. The file may be written with the help of a graphical editor like "kate" or "xedit" or ...

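#!/bin/bash
# Job name
#SBATCH -J TestJob
# Number of nodes
#SBATCH -N 1
# Files for standard output and standard error; %j is replaced by the job ID
#SBATCH -o TestJob-%j.out
#SBATCH -e TestJob-%j.err
# Wall time limit in minutes
#SBATCH -t 30
# Send an email when the job ends
#SBATCH --mail-type=END

echo "Executing on $HOSTNAME"

sleep 5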

This script is submitted with the command

sbatch jobscript
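
When a submission is part of a larger workflow, it is convenient to capture the job ID, for instance to find the output files, since %j in the file names of the example script is replaced by the job ID. The following is a minimal sketch, assuming the job script is stored in a file called jobscript as in the command above. The function name submit_job is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: submit a job script and print only the job ID.
submit_job() {
    local script=$1 jobid
    # --parsable makes sbatch print the job ID, optionally followed by
    # ";<cluster name>", instead of "Submitted batch job <id>".
    jobid=$(sbatch --parsable "$script") || return 1
    echo "${jobid%%;*}"    # strip a possible ";<cluster name>" suffix
}

# Only submit when sbatch and the job script are actually present.
if command -v sbatch >/dev/null 2>&1 && [ -f jobscript ]; then
    jobid=$(submit_job jobscript) &&
        echo "Output: TestJob-${jobid}.out, errors: TestJob-${jobid}.err"
fi
```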

SLURM - Job Submission Examples

Submit a job requesting 2 nodes for 1 hour with 8 tasks per node:

#!/bin/bash
#SBATCH -N 2
#SBATCH -t 1:00:00
#SBATCH --ntasks-per-node=8
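
The directives above only describe the resources; the commands to run follow below them in the same script. A possible complete script is sketched here, assuming an MPI program called ./my_program (a placeholder). SLURM_JOB_NUM_NODES and SLURM_NTASKS_PER_NODE are environment variables that Slurm sets inside a job; the fallback value 1 only serves to keep the snippet runnable outside of a job.

```shell
#!/bin/bash
#SBATCH -N 2
#SBATCH -t 1:00:00
#SBATCH --ntasks-per-node=8

# Total number of tasks = nodes x tasks per node (1 x 1 outside of Slurm).
total_tasks() {
    echo $(( ${SLURM_JOB_NUM_NODES:-1} * ${SLURM_NTASKS_PER_NODE:-1} ))
}
echo "Starting $(total_tasks) tasks"

# srun starts one copy of the program per task; ./my_program is a placeholder.
if command -v srun >/dev/null 2>&1 && [ -x ./my_program ]; then
    srun ./my_program
fi
```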

Submit a job requesting 2 nodes for 30 minutes with 1 task per node, where each task spawns 8 threads (hybrid MPI with OpenMP):

#!/bin/bash
#SBATCH -N 2
#SBATCH -t 30
#SBATCH --cpus-per-task=8
#SBATCH --ntasks-per-node=1
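
For the hybrid case, the number of OpenMP threads usually has to be set explicitly in the script body. The sketch below assumes a program called ./hybrid_program (a placeholder). SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is given; the fallback to 1 is only there so that the snippet also runs outside of a job.

```shell
#!/bin/bash
#SBATCH -N 2
#SBATCH -t 30
#SBATCH --cpus-per-task=8
#SBATCH --ntasks-per-node=1

# One OpenMP thread per requested core.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
echo "Using $OMP_NUM_THREADS OpenMP threads per task"

# Passing --cpus-per-task to srun explicitly is harmless and is needed on newer
# Slurm versions, where srun no longer inherits the value from sbatch.
if command -v srun >/dev/null 2>&1 && [ -x ./hybrid_program ]; then
    srun --cpus-per-task="$OMP_NUM_THREADS" ./hybrid_program
fi
```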

Submit a job requesting 1 core of architecture XEON_SP_6126 for 2 hours (Skylake architecture and partitions):

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH -t 2:00:00
#SBATCH --constraint=XEON_SP_6126

Submit a job requesting 1 GPU of type V100 in project "WILLI":

#!/bin/bash
#SBATCH --account=WILLI
#SBATCH --ntasks=1
#SBATCH --gres=gpu:v100:1
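
Inside such a job it can be useful to check which GPUs were actually assigned. The sketch below relies on the CUDA_VISIBLE_DEVICES variable, which Slurm normally sets for jobs that request GPUs through --gres. Whether this holds on a given cluster depends on its configuration, so treat it as an assumption. The function name gpu_count is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count the GPUs listed in CUDA_VISIBLE_DEVICES
# (a comma-separated list such as "0,1"); prints 0 if the variable is empty.
gpu_count() {
    local devices=${CUDA_VISIBLE_DEVICES:-}
    if [ -z "$devices" ]; then
        echo 0
    else
        echo "$devices" | awk -F, '{ print NF }'
    fi
}

echo "GPUs assigned to this job: $(gpu_count)"
```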

More examples ....

Questions?

Of course we have forgotten to mention and clarify your problem. Please inform us.
