Job State Codes

Each job in the Slurm system has a state assigned to it. How the job state is displayed depends on the method used to identify the state.

Overview

In the Slurm code, there are base states and state flags. Each job has a base state and may have additional state flags set. When using the REST API, both the base state and current flag(s) will be returned.

When the squeue and sacct commands report a job state, they represent it as a single state. Both recognize all base states but not all state flags. If a recognized flag is present, it is reported instead of the base state. Refer to the relevant command documentation for details.
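The reporting rule described above can be sketched in a few lines of Python. This is only an illustration, not the logic Slurm itself uses: the function name displayed_state, the RECOGNIZED_FLAGS tuple, and the order in which flags are checked are all assumptions made for the example. The set of flags each command actually recognizes is given in its own documentation.

```python
from typing import Iterable, Sequence

# Hypothetical list: which flags a given tool recognizes, and in what
# order it checks them, is an assumption made for illustration only.
RECOGNIZED_FLAGS: Sequence[str] = ("COMPLETING", "CONFIGURING", "RESIZING")


def displayed_state(base_state: str, flags: Iterable[str],
                    recognized: Sequence[str] = RECOGNIZED_FLAGS) -> str:
    """Return the single state name a command-style report would show."""
    active = set(flags)
    for flag in recognized:
        if flag in active:
            return flag  # a recognized flag replaces the base state
    return base_state    # no recognized flag is set: show the base state


print(displayed_state("RUNNING", ["COMPLETING"]))  # COMPLETING
print(displayed_state("PENDING", ["UPDATE_DB"]))   # PENDING
print(displayed_state("RUNNING", []))              # RUNNING
```

In the second call the UPDATE_DB flag is set but is not in the recognized list, so the base state is shown unchanged.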

This page lists all job states and flags that are defined in the code. The names given are the string representations used in user-facing output. For most, the name used in the code is identical apart from a JOB_ prefix. For more visibility into job states and flags, set DebugFlags=TraceJobs and SlurmctldDebug=verbose (or higher) in slurm.conf.
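As a configuration fragment, the two settings named above would look like this in slurm.conf. Only these two lines come from this page; where they sit relative to the rest of the file, and any other logging options already present there, depend on the site.

```conf
# Trace job state and flag changes in the slurmctld log
DebugFlags=TraceJobs
# "verbose" is the minimum level; a higher level also works
SlurmctldDebug=verbose
```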

Job states

Each job known to the system will have one of the following states:

Name            Description
BOOT_FAIL       terminated due to node boot failure
CANCELLED       cancelled by user or administrator
COMPLETED       completed execution successfully; finished with an exit code of zero on all nodes
DEADLINE        terminated due to reaching the latest acceptable start time specified for the job
FAILED          completed execution unsuccessfully; non-zero exit code or other failure condition
NODE_FAIL       terminated due to node failure
OUT_OF_MEMORY   experienced out of memory error
PENDING         queued and waiting for initiation; will typically have a reason code specifying why it has not yet started
PREEMPTED       terminated due to preemption; may transition to another state based on the configured PreemptMode and job characteristics
RUNNING         allocated resources and executing
SUSPENDED       allocated resources but execution suspended, such as from preemption or a direct request from an authorized user
TIMEOUT         terminated due to reaching the time limit, such as those configured in slurm.conf or specified for the individual job
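For scripts that only need to know roughly where a job is in its life cycle, the base states in the table above can be grouped into coarse phases. The sketch below is one possible grouping, based on a reading of the descriptions rather than on anything Slurm defines: the phase names "queued", "active", and "ended" and the function phase are hypothetical. Note that PREEMPTED is treated as ended here even though, as the table says, it may transition to another state.

```python
# Base state names copied from the job states table.
BASE_STATES = frozenset({
    "BOOT_FAIL", "CANCELLED", "COMPLETED", "DEADLINE", "FAILED",
    "NODE_FAIL", "OUT_OF_MEMORY", "PENDING", "PREEMPTED", "RUNNING",
    "SUSPENDED", "TIMEOUT",
})

# Assumption: these groupings are an interpretation of the descriptions.
QUEUED = frozenset({"PENDING"})
ACTIVE = frozenset({"RUNNING", "SUSPENDED"})  # both hold allocated resources


def phase(state: str) -> str:
    """Map a base state name to a coarse, hypothetical life-cycle phase."""
    if state not in BASE_STATES:
        raise ValueError(f"not a base job state: {state!r}")
    if state in QUEUED:
        return "queued"
    if state in ACTIVE:
        return "active"
    return "ended"  # every remaining base state describes a finished job


print(phase("PENDING"))    # queued
print(phase("SUSPENDED"))  # active
print(phase("TIMEOUT"))    # ended
```

Passing a flag name such as COMPLETING raises ValueError, since flags are not base states.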

Job flags

Jobs may have additional flags set:

Name            Description
COMPLETING      job has finished or been cancelled and is performing cleanup tasks, including the epilog script if present
CONFIGURING     job has been allocated nodes and is waiting for them to boot or reboot
LAUNCH_FAILED   failed to launch on the chosen node(s); includes prolog failure and other failure conditions
POWER_UP_NODE   job has been allocated powered down nodes and is waiting for them to boot
RECONFIG_FAIL   node configuration for job failed
REQUEUED        job is being requeued, such as from preemption or a direct request from an authorized user
REQUEUE_FED     requeued due to conditions of its sibling job in a federated setup
REQUEUE_HOLD    same as REQUEUED but will not be considered for scheduling until it is released
RESIZING        the size of the job is changing; prevents conflicting job changes from taking place
RESV_DEL_HOLD   held due to deleted reservation
REVOKED         revoked due to conditions of its sibling job in a federated setup
SIGNALING       outgoing signal to job is pending
SPECIAL_EXIT    same as REQUEUE_HOLD but used to identify a special situation that applies to this job
STAGE_OUT       staging out data (burst buffer)
STOPPED         received SIGSTOP to suspend the job without releasing resources
UPDATE_DB       sending an update about the job to the database
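When a client receives both the base state and the flags together, as the overview says the REST API returns them, the names on this page are enough to tell the two apart. The following sketch assumes the state arrives as a flat list of name strings. That input shape, and the function name split_state, are assumptions for illustration, since the exact layout of a REST API response is not described on this page.

```python
from typing import List, Tuple

# Names copied from the job states and job flags tables on this page.
BASE_STATES = frozenset({
    "BOOT_FAIL", "CANCELLED", "COMPLETED", "DEADLINE", "FAILED",
    "NODE_FAIL", "OUT_OF_MEMORY", "PENDING", "PREEMPTED", "RUNNING",
    "SUSPENDED", "TIMEOUT",
})
FLAGS = frozenset({
    "COMPLETING", "CONFIGURING", "LAUNCH_FAILED", "POWER_UP_NODE",
    "RECONFIG_FAIL", "REQUEUED", "REQUEUE_FED", "REQUEUE_HOLD", "RESIZING",
    "RESV_DEL_HOLD", "REVOKED", "SIGNALING", "SPECIAL_EXIT", "STAGE_OUT",
    "STOPPED", "UPDATE_DB",
})


def split_state(names: List[str]) -> Tuple[str, List[str]]:
    """Split a list of state names into (base state, flags in input order)."""
    unknown = [n for n in names if n not in BASE_STATES and n not in FLAGS]
    if unknown:
        raise ValueError(f"unrecognized state names: {unknown}")
    base = [n for n in names if n in BASE_STATES]
    if len(base) != 1:
        # Each job has exactly one base state, per the overview.
        raise ValueError(f"expected exactly one base state, got {base}")
    flags = [n for n in names if n in FLAGS]
    return base[0], flags


print(split_state(["RUNNING", "COMPLETING"]))  # ('RUNNING', ['COMPLETING'])
print(split_state(["PENDING"]))                # ('PENDING', [])
```

Rejecting unknown names outright is a deliberate choice in this sketch: it makes a mismatch between the hard-coded tables and a newer Slurm release visible instead of silently dropping a flag.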

Last modified 01 October 2024