Batch Usage at RPTU Kaiserslautern-Landau
Batch System Concepts
- A cluster consists of a set of tightly connected computers, called nodes, that are presented as a single system. The nodes are connected through a high-speed local network and have access to several shared file systems.
- A job is the execution of user-defined workflows by the batch system, that is, without the user's intervention.
- Resource Manager is the software that manages the execution of jobs. It is responsible for managing the resources of a cluster, such as nodes and memory, and ensures that jobs do not overlap on these resources.
- Scheduler is the software that controls users' jobs and the resource manager. It handles job submissions and puts jobs into queues (partitions). Among other things, it offers user commands to start and stop jobs, interfaces to define workflows, and interfaces for monitoring and accounting.
- Batch System is the combination of a scheduler and a resource manager. The batch system used on cluster 'Elwetritsch' is SLURM.
Batch Model
- Users should usually not submit their jobs to a specific batch queue (partition) but instead specify resource limits and projects. Queues are then selected automatically to guarantee the best throughput.
- Jobs are scheduled according to priorities. The jobs with the highest priorities will be scheduled next. A user's priority depends on the size of the project that was applied for.
- The scheduler checks the queues and may also schedule lower-priority jobs that fit into gaps created while resources are being freed for the next highest-priority jobs. This so-called backfill scheduling increases utilization and depends strongly on meaningful wall time limits in the job description.
- Nodes are shared among users to allow non-parallel programs to run efficiently. As a result, the resources of a node are shared, and jobs on the same node may influence each other. The requested number of cores is guaranteed for a job, that is, cores within a node are not shared.
- The priority is reduced while jobs are running.
- Jobs of users without an active project will only be started on a small subset of the cluster. This subset currently consists of only 24 nodes. By far the larger part of the cluster is reserved for projects.
- Jobs submitted to the idle queue may utilize the whole cluster as well. However, these jobs may be suspended or terminated as soon as a job in another queue is scheduled.
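Because backfilling depends on meaningful wall time limits, one possible approach is to time a short test run and request that time plus a margin. The following is a minimal sketch of this idea; the helper name walltime_with_margin and the 20 % margin are illustrative assumptions, not site recommendations.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): turn a measured runtime in seconds
# into a Slurm time limit string, adding a safety margin given in percent.
walltime_with_margin() {
    local runtime_s=$1
    local margin_pct=${2:-20}
    # Integer arithmetic; the result is rounded up to a full minute.
    local total_s=$(( runtime_s * (100 + margin_pct) / 100 ))
    local total_min=$(( (total_s + 59) / 60 ))
    local days=$(( total_min / 1440 ))
    local hours=$(( (total_min % 1440) / 60 ))
    local mins=$(( total_min % 60 ))
    if (( days > 0 )); then
        # days-hours:minutes:seconds
        printf '%d-%02d:%02d:00\n' "$days" "$hours" "$mins"
    else
        # hours:minutes:seconds
        printf '%02d:%02d:00\n' "$hours" "$mins"
    fi
}

# Example: a test run took 25 minutes (1500 s); request 20 % more.
limit=$(walltime_with_margin 1500 20)
echo "sbatch -t $limit jobscript"
```

A limit that is only slightly above the real runtime gives the scheduler a better chance to fit the job into a gap, while a limit that is too tight causes the job to be terminated before it finishes.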
Characteristics of SLURM
SLURM is an open-source project developed and documented by SchedMD. Slurm groups the compute nodes into partitions, which replace the queues of other batch systems. Appropriate queues are selected automatically according to the specified resource requirements.
Slurm - User Commands
- salloc is used to request interactive job allocations. The best choice for interactive jobs is rz-launch.
- sbatch is used to submit a batch script.
- scancel is used to cancel a job or job step.
- scontrol provides some functionality to manage jobs and query information.
- sinfo is used to retrieve information about partitions, reservations and nodes.
- smap graphically displays the state of the partitions. We recommend our web pages instead.
- sprio displays job priorities.
- squeue is used to query the list of pending and running jobs.
- srun is used to initiate job steps from within a job or start an interactive job in conjunction with salloc.
- sshare displays fair share information for each user.
- sstat is used to query status information about a running job.
- sview is a graphical user interface for jobs, partitions and nodes.
- sacct retrieves accounting information. We recommend rz-accounting instead.
- sacctmgr offers additional ways to query accounting information.
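Several of the commands above are typically used together. The following is a minimal sketch of a submit-and-monitor cycle; the file name "jobscript" and the helper job_id_from_parsable are placeholders chosen for illustration, and the guard lets the script run harmlessly on a machine without Slurm.

```shell
#!/bin/bash
# Sketch of a submit-and-monitor cycle.
# "jobscript" is a placeholder file name for a batch script.

# `sbatch --parsable` prints "jobid" or "jobid;cluster"; keep only the ID.
job_id_from_parsable() {
    printf '%s\n' "${1%%;*}"
}

if command -v sbatch >/dev/null 2>&1 && [ -f jobscript ]; then
    if out=$(sbatch --parsable jobscript); then
        jobid=$(job_id_from_parsable "$out")
        squeue -j "$jobid"            # state of the job in the queue
        sprio -j "$jobid"             # priority while the job is pending
        scontrol show job "$jobid"    # full job record
        # scancel "$jobid"            # uncomment to cancel the job
    fi
else
    echo "sbatch or jobscript not found; run this on a cluster login node." >&2
fi
```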
System Usage
The software installed on the cluster is organized through modules. Please contact us if additional software is required. Compilers are of special importance. We offer the GNU suites, the commercial Intel compilers and a set of additional tools, the compilers of the Portland Group (PGI), and a compiler for GPUs (nvcc, NVIDIA).
Job Scripts
Instead of passing options to sbatch on the command line, it is better to specify them inside a job script. Here is a simple example. The job is named "TestJob", requests 30 minutes of run time with 1 core on 1 node, and writes its output to the specified files. An email is sent when the job has finished. The file may be written with the help of a graphical editor like "kate" or "xedit" or ...
#!/bin/bash
#SBATCH -J TestJob
#SBATCH -N 1
#SBATCH -o TestJob-%j.out
#SBATCH -e TestJob-%j.err
#SBATCH -t 30
#SBATCH --mail-type=END
echo "Executing on $HOSTNAME"
sleep 5
This script is submitted with the command
sbatch jobscript
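Inside a running job, Slurm exports environment variables such as SLURM_JOB_ID, SLURM_JOB_NODELIST and SLURM_NTASKS, which a job script can use to document its own run. The following is a small sketch of that idea; the job name "EnvDemo" and the fallback values are assumptions added so that the script also runs outside of a job.

```shell
#!/bin/bash
#SBATCH -J EnvDemo
#SBATCH -N 1
#SBATCH -t 5
#SBATCH -o EnvDemo-%j.out

# Slurm sets these variables inside a job. The fallbacks after ":-" are
# illustrative defaults so the script can also be tried outside of Slurm.
job_id=${SLURM_JOB_ID:-none}
nodes=${SLURM_JOB_NODELIST:-${HOSTNAME:-localhost}}
ntasks=${SLURM_NTASKS:-1}

summary="job=$job_id nodes=$nodes tasks=$ntasks"
echo "$summary"
```

The #SBATCH lines are ordinary comments to bash, so the script can be tested directly with bash before it is submitted.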
SLURM - Job Submission Examples
Submit a job requesting 2 nodes for 1 hour with 8 tasks per node:
#!/bin/bash
#SBATCH -N 2
#SBATCH -t 1:00:00
#SBATCH --ntasks-per-node=8
Submit a job requesting 2 nodes for 30 minutes with 1 task per node, where each task spawns 8 threads (hybrid MPI with OpenMP):
#!/bin/bash
#SBATCH -N 2
#SBATCH -t 30
#SBATCH --cpus-per-task=8
#SBATCH --ntasks-per-node=1
Submit a job requesting 1 core of architecture XEON_SP_6126 for 2 hours (Skylake architecture and partitions):
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH -t 2:00:00
#SBATCH --constraint=XEON_SP_6126
Submit a job requesting 1 GPU of type V100 in project "WILLI":
#!/bin/bash
#SBATCH --account=WILLI
#SBATCH --ntasks=1
#SBATCH --gres=gpu:v100:1
More examples ....
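The examples above show only the resource requests. As a hedged sketch of how the body of the hybrid MPI with OpenMP job might look, the script below sets the OpenMP thread count from the cores reserved per task and starts the program with srun. The program name ./hybrid_program is a placeholder, and the fallback to 1 thread is an assumption so that the script also runs outside of Slurm.

```shell
#!/bin/bash
#SBATCH -N 2
#SBATCH -t 30
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8

# One OpenMP thread per core reserved for each task.
# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is given;
# outside of a job the value falls back to 1.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
echo "Using $OMP_NUM_THREADS threads per task"

# ./hybrid_program is a placeholder for the user's own executable.
if command -v srun >/dev/null 2>&1 && [ -x ./hybrid_program ]; then
    # Pass the core count on explicitly so each task gets its cores.
    srun --cpus-per-task="$OMP_NUM_THREADS" ./hybrid_program
fi
```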