Trackable RESources (TRES)
A TRES is a resource whose usage can be tracked or against which limits can be enforced. A TRES is a combination of a Type and a Name. Types are predefined. Current TRES Types are:
- BB (burst buffers)
- Billing
- CPU
- Energy
- FS (filesystem)
- GRES
- IC (interconnect)
- License
- Mem (Memory)
- Node
- Pages
- VMem (Virtual Memory/Size)
The Billing TRES is calculated from a partition's TRESBillingWeights. Though TRES weights on a partition may be defined as doubles, the Billing TRES values for a job are stored as integers. The exception is the calculation of a job's fairshare, where the value is treated as a double.
Valid 'FS' TRES are 'disk' (local disk) and 'lustre'. These are primarily there for reporting usage, not limiting access.
The only valid 'IC' TRES is 'ofed'. It is primarily there for reporting usage, not limiting access.
slurm.conf settings
- AccountingStorageTRES
Used to define which TRES are to be tracked on the system. By default Billing, CPU, Energy, Memory, Node, FS/Disk, Pages and VMem are tracked. These default TRES cannot be disabled; the list can only be appended to. The following example:
AccountingStorageTRES=gres/gpu,license/iop1
will track billing, cpu, energy, memory, nodes, fs/disk, pages and vmem along with a GRES called gpu, as well as a license called iop1. Whenever these resources are used on the cluster they are recorded. TRES are automatically set up in the database when slurmctld starts.
The TRES that require associated names are BB, GRES, and License. As seen in the above example, GRES and License are typically different on each system. The BB TRES is named the same as the burst buffer plugin being used; the above example does not track one, but a system using the Cray burst buffer plugin would name its BB TRES after that plugin.
When including a specific GRES with a subtype, it is also recommended to include its generic type, otherwise a request with only the generic one won't be accounted for. For example, if we want to account for gres/gpu:tesla, we would also include gres/gpu for accounting gpus in requests like srun --gres=gpu:1.
AccountingStorageTRES=gres/gpu,gres/gpu:tesla
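The recommendation above can be checked mechanically. The following is a minimal Python sketch, not part of Slurm: the helper name missing_generic_gres is hypothetical, and comparing names case-insensitively is an assumption made here for convenience.

```python
def missing_generic_gres(accounting_storage_tres: str) -> list[str]:
    """Return generic GRES entries (e.g. 'gres/gpu') that are implied by a
    subtype entry (e.g. 'gres/gpu:tesla') but absent from the list.

    Hypothetical helper for illustration; it only inspects the string.
    """
    entries = [e.strip().lower() for e in accounting_storage_tres.split(",") if e.strip()]
    present = set(entries)
    missing = []
    for entry in entries:
        tres_type, _, name = entry.partition("/")
        if tres_type == "gres" and ":" in name:
            # 'gpu:tesla' -> generic type 'gres/gpu'
            generic = "gres/" + name.split(":", 1)[0]
            if generic not in present and generic not in missing:
                missing.append(generic)
    return missing


print(missing_generic_gres("gres/gpu:tesla,license/iop1"))
print(missing_generic_gres("gres/gpu,gres/gpu:tesla"))
```

An empty result means every subtype in the list is accompanied by its generic type.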
- PriorityWeightTRES
A comma-separated list of TRES Types and weights that sets the degree to which each TRES Type contributes to the job's priority.
PriorityWeightTRES=CPU=1000,Mem=2000,GRES/gpu=3000
Applicable only if PriorityType=priority/multifactor and if AccountingStorageTRES is configured with each TRES Type. The default values are 0.
The Billing TRES is not available for priority calculations because the number isn't generated until after the job has been allocated resources, and it can differ from one partition to another.
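The two conditions on PriorityWeightTRES (each TRES must be tracked, and Billing cannot be used) can be sketched as a small consistency check. This is an illustrative Python sketch, not Slurm code: the function names are hypothetical, the default set is transcribed from the AccountingStorageTRES description above, and case-insensitive matching is an assumption.

```python
# TRES tracked by default, per the AccountingStorageTRES description.
DEFAULT_TRACKED = {"billing", "cpu", "energy", "mem", "node", "fs/disk", "pages", "vmem"}


def parse_weights(spec: str) -> dict[str, float]:
    """Parse 'CPU=1000,Mem=2000' into {'cpu': 1000.0, 'mem': 2000.0}."""
    weights = {}
    for pair in spec.split(","):
        name, _, value = pair.strip().partition("=")
        weights[name.lower()] = float(value)
    return weights


def unusable_priority_tres(priority_weight_tres: str, accounting_storage_tres: str) -> list[str]:
    """Return weighted TRES that are untracked or are the Billing TRES."""
    configured = {e.strip().lower() for e in accounting_storage_tres.split(",") if e.strip()}
    tracked = DEFAULT_TRACKED | configured
    return sorted(
        name
        for name in parse_weights(priority_weight_tres)
        if name not in tracked or name == "billing"
    )


print(unusable_priority_tres("CPU=1000,Mem=2000,GRES/gpu=3000", "gres/gpu,license/iop1"))
print(unusable_priority_tres("CPU=1000,Mem=2000,GRES/gpu=3000", ""))
```

With gres/gpu tracked the check returns an empty list; without it, gres/gpu is reported.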
- TRESBillingWeights
For each partition this option is used to define the billing weights of each TRES type that will be used in calculating the usage of a job.
Billing weights are specified as a comma-separated list of TRES=Weight pairs.
Any TRES Type is available for billing. Note that the base unit for memory and burst buffers is megabytes.
By default the billing of TRES is calculated as the sum of all TRES types multiplied by their corresponding billing weight.
The weighted amount of a resource can be adjusted by adding a suffix of K, M, G, T or P after the billing weight. For example, a memory weight of "mem=.25" on a job allocated 8GB will be billed 2048 (8192MB * .25) units. A memory weight of "mem=.25G" on the same job will be billed 2 (8192MB * (.25/1024)) units.
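The suffix arithmetic in the memory example can be reproduced with a short Python sketch. It is an illustration only: effective_weight is a hypothetical helper, only the G divisor (1024 relative to the megabyte base unit) comes from the text, the M, T and P divisors are extrapolated assumptions, and the K suffix is deliberately left out because the text does not show how it is applied.

```python
import re

# Divisors relative to the megabyte base unit. Only the G entry is taken from
# the example in the text; M, T and P are extrapolated assumptions.
DIVISOR_FROM_MB = {"": 1, "M": 1, "G": 1024, "T": 1024**2, "P": 1024**3}


def effective_weight(weight: str) -> float:
    """Convert a weight such as '.25' or '.25G' into a per-megabyte weight."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]?)\s*", weight)
    if match is None:
        raise ValueError(f"cannot parse weight {weight!r}")
    number, suffix = match.groups()
    suffix = suffix.upper()
    if suffix not in DIVISOR_FROM_MB:
        raise ValueError(f"suffix {suffix!r} is not covered by this sketch")
    return float(number) / DIVISOR_FROM_MB[suffix]


allocated_mb = 8 * 1024  # a job allocated 8GB, expressed in megabytes
print(allocated_mb * effective_weight(".25"))   # 2048.0
print(allocated_mb * effective_weight(".25G"))  # 2.0
```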
When a job is allocated 1 CPU and 8 GB of memory on a partition configured with:
TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0,license/licA=1.5"
the billable TRES will be:
(1*1.0) + (8*0.25) + (0*2.0) + (0*1.5) = 3.0
If PriorityFlags=MAX_TRES is configured, the billable TRES is calculated as the MAX of individual TRESs on a node (e.g. cpus, mem, gres) plus the sum of all global TRESs (e.g. licenses). Using the same example above, the billable TRES will be:
MAX(1*1.0, 8*0.25, 0*2.0) + (0*1.5) = 2.0
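Both calculations above can be expressed in a few lines of Python. This is a sketch under stated assumptions rather than Slurm's implementation: billable_tres and its parameters are hypothetical, memory is expressed in GB so that the 0.25G weight applies directly, and which TRES count as global (here only the license) is supplied by the caller.

```python
def billable_tres(allocated, weights, global_tres=frozenset(), max_tres=False):
    """Compute the billable TRES for one job.

    allocated   -- amount of each TRES allocated to the job (missing means 0)
    weights     -- billing weight per TRES, in units matching `allocated`
    global_tres -- TRES names treated as global (e.g. licenses)
    max_tres    -- mimic PriorityFlags=MAX_TRES when True
    """
    weighted = {name: allocated.get(name, 0) * w for name, w in weights.items()}
    if not max_tres:
        # Default: sum of every TRES multiplied by its weight.
        return sum(weighted.values())
    # MAX_TRES: largest per-node TRES plus the sum of all global TRES.
    per_node = [v for name, v in weighted.items() if name not in global_tres]
    global_sum = sum(v for name, v in weighted.items() if name in global_tres)
    return max(per_node, default=0.0) + global_sum


# Weights from the example; the memory weight is per GB (Mem=0.25G).
weights = {"cpu": 1.0, "mem": 0.25, "gres/gpu": 2.0, "license/licA": 1.5}
allocated = {"cpu": 1, "mem": 8}  # 1 CPU and 8 GB of memory

print(billable_tres(allocated, weights))  # 3.0
print(billable_tres(allocated, weights, global_tres={"license/licA"}, max_tres=True))  # 2.0
```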
If TRESBillingWeights is not defined then the job is billed against the total number of allocated CPUs.
NOTE: TRESBillingWeights is only used when calculating fairshare and doesn't affect job priority directly, because it is currently not used for the size of the job. If you want TRESs to play a role in the job's priority, refer to the PriorityWeightTRES option.
NOTE: As with PriorityWeightTRES, only TRES defined in AccountingStorageTRES are available for TRESBillingWeights.
NOTE: Jobs can be limited based on the calculated TRES billing value. See the Resource Limits documentation for more information.
NOTE: If a weight is defined for the Billing TRES, it is ignored.
NOTE: Setting gres/gpu will also set gres/gpumem and gres/gpuutil. gres/gpumem and gres/gpuutil can be set individually when gres/gpu is not set.
sacct
sacct can be used to view the TRES of each job by adding "tres" to the --format option.
sacctmgr
sacctmgr is used to view the various TRES available globally in the system. To list them, run sacctmgr show tres.
sreport
sreport reports on different TRES. Passing a comma-separated list to the --tres= option makes sreport generate reports for the requested TRES types. More information about these reports can be found on the sreport manpage.
In sreport, the "Reported" Billing TRES is calculated from the largest Billing TRES of each node multiplied by the time frame. For example, if a node is part of multiple partitions and each has a different TRESBillingWeights defined, the Billing TRES for the node will be the highest among those partitions. If TRESBillingWeights is not defined on any partition for a node, then the Billing TRES will be equal to the number of CPUs on the node.
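The rule for the "Reported" Billing TRES of a single node can be sketched as follows. This Python snippet is an illustration, not sreport's code: reported_billing is a hypothetical name, the per-partition billing values are assumed to be already computed, and the time frame is in whatever unit the report uses.

```python
def reported_billing(node_cpus, partition_billing, time_frame):
    """Reported Billing TRES for one node over a time frame.

    node_cpus         -- number of CPUs on the node
    partition_billing -- the node's Billing TRES under each partition that
                         defines TRESBillingWeights (empty if none does)
    time_frame        -- length of the reporting period
    """
    # Highest value across partitions, or the CPU count when no partition
    # containing the node defines TRESBillingWeights.
    per_node = max(partition_billing) if partition_billing else node_cpus
    return per_node * time_frame


# A 16-CPU node in two partitions with different (hypothetical) billing values.
print(reported_billing(16, [32.0, 48.0], 3600))  # 172800.0
# The same node when no partition defines TRESBillingWeights.
print(reported_billing(16, [], 3600))  # 57600
```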
Last modified 16 August 2024