
Management of Resources

The batch system manages resources such as CPUs and memory, allocates them to waiting jobs, and uses a method called fairshare scheduling to ensure that computing time is shared fairly among all users; fairshare also contributes to the priority of jobs. Because the memory on the nodes is shared among the jobs, it is very important to estimate memory requirements as accurately as possible. A further mechanism, called backfill, allows short-running jobs to fill gaps in the scheduling plan and thus optimise resource usage.

The priority of jobs is regulated by fairshare scheduling. Whether job A is started before job B depends mainly on two factors:

  • when the job was submitted
  • how many shares are available to the user

In general, a job which is submitted to the batch system before another will also be started first. However, a user consumes so-called shares when his or her jobs run, in proportion to the CPU time, GPU time and RAM used. The more shares a user has, the more his or her jobs will be preferred. As time passes, a user is given new shares. In this manner, equitable access to resources is achieved.

This mechanism only plays a role when jobs compete for resources. If enough resources are available, a job will also be started, even if the owner no longer has any shares.
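The interplay of shares, consumption and decay can be sketched as follows. This is an illustrative model only, not the scheduler's actual code; the classic Slurm fair-share formula F = 2^(-U/S) (usage U normalised against share S) is assumed here, and the half-life decay is a simplification of how past consumption gradually stops counting.

```python
# Illustrative sketch of fairshare scheduling (NOT the site's actual code).
# Assumes Slurm's classic fair-share formula F = 2^(-U/S): factor 1.0 for an
# unused allocation, 0.5 when usage exactly matches the share, approaching 0
# with heavy use.

def fairshare_factor(usage, shares):
    """Return a priority factor in (0, 1] from normalised usage and share."""
    return 2.0 ** (-usage / shares)

def decay_usage(usage, half_life_periods):
    """Past consumption counts less as time passes: usage halves per period,
    which is how a user effectively 'is given new shares' over time."""
    return usage * 0.5 ** half_life_periods

print(fairshare_factor(0.0, 0.25))   # unused allocation -> 1.0
print(fairshare_factor(0.25, 0.25))  # usage equals share -> 0.5
```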

The priority of a job is mainly determined by the following factors, in order of importance:

  • User Shares - The shares a user has in the sense of fairshare scheduling constitute the most important factor. The more shares a user has, the higher the priority of his or her jobs will be. Consuming CPU time, GPU time or RAM reduces the number of shares a user has and thus lowers the priority of any waiting jobs.
  • Age of Job - How long the job has been waiting in the queue is another important factor. However, this value is capped and reaches its maximum after a certain waiting time.
  • Size of Job - Larger jobs, in terms of the number of nodes requested, will have their priority increased slightly.
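The weighting of these factors can be sketched as a simple sum, modelled on Slurm's multifactor priority plugin. The weights below are hypothetical placeholders; the real values are configured per site.

```python
# Hedged sketch of multi-factor job priority (weights are HYPOTHETICAL;
# real values are site-specific Slurm configuration).

WEIGHT_FAIRSHARE = 10000  # most important factor
WEIGHT_AGE = 1000
WEIGHT_JOBSIZE = 100

def job_priority(fairshare, age, jobsize):
    """Each factor is normalised to [0, 1]. The age contribution is capped,
    so waiting longer stops helping after a certain point."""
    age = min(age, 1.0)
    return (WEIGHT_FAIRSHARE * fairshare
            + WEIGHT_AGE * age
            + WEIGHT_JOBSIZE * jobsize)

print(job_priority(1.0, 0.0, 0.0))  # fresh shares, no wait -> 10000
print(job_priority(0.5, 2.0, 0.0))  # age saturates at 1.0  -> 6000
```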

The factors affecting the priority of the currently waiting jobs can be viewed with the following command, which sorts the results according to total priority:

sprio -Sy

The amount of memory a job requires should be given in the job control script. Care must be taken to specify the value as accurately as possible.
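In a SLURM job control script, the memory requirement is specified with an #SBATCH directive; --mem requests memory per node. The value 4G and the program name below are examples only.

```shell
#!/bin/bash
#SBATCH --mem=4G          # request 4 GiB per node (example value: measure your job!)
#SBATCH --time=01:00:00   # run-time limit of one hour (example value)

srun ./my_program         # my_program is a placeholder for your application
```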

If the value given is too high, then

  • the job may have to wait longer than necessary before enough memory is available.
  • once the job starts, other jobs may have to wait, because although memory is available, it has been reserved for the running job. In the illustration below, although enough cores are available, the job shown cannot start because the queueing system cannot reserve enough memory on the node.

[Figure: memory overestimation]

If the value is too low, then

  • the job will be terminated once its memory requirement exceeds the amount of memory requested

Backfill is the mechanism whereby a lower-priority job can be started before a higher-priority job in such a way that the start of the higher-priority job is not delayed.

In the diagram below, assume Job A has just started. Job B, which requires the node currently occupied by Job A plus some additional nodes, will have to wait until Job A has completed. If Job C has a shorter maximum run-time than Job A, it can start before Job B, even though Job B has the higher priority, and without delaying the start of Job B. In this way, "gaps" in the scheduling plan can be filled and the throughput of the cluster increased.

[Figure: backfill]
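The backfill decision described above can be condensed into a simple check. This is an illustrative sketch, not Slurm's implementation: a lower-priority job fits into a gap only if enough nodes are currently free and its time limit guarantees it finishes before the higher-priority job's reserved start.

```python
# Minimal sketch of the backfill test (illustrative, NOT Slurm's code).
# A lower-priority job C may start in a "gap" only if its requested nodes
# are free now AND its time limit ends before the start reserved for the
# higher-priority job B.

def can_backfill(free_nodes, nodes_needed, now, time_limit, reservation_start):
    """True if the job fits in the idle nodes and is guaranteed (by its
    time limit) to finish before the blocked job is due to start."""
    fits_nodes = nodes_needed <= free_nodes
    fits_time = now + time_limit <= reservation_start
    return fits_nodes and fits_time

# Job C needs 2 free nodes for at most 3 hours; Job B is reserved at t=4h.
print(can_backfill(free_nodes=2, nodes_needed=2, now=0,
                   time_limit=3, reservation_start=4))  # True
```

The shorter the requested time limit, the easier it is for the second condition to hold, which is exactly why specifying a tight run-time helps a job benefit from backfilling.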

Note that a job can only benefit from backfilling if the run-time it needs is significantly less than the default maximum and this shorter limit is specified explicitly. As an example, for SLURM the following line

#SBATCH --time=2-12:00:00 

would set a limit of 2 days and 12 hours. The shorter the time given, the more the job can benefit from backfilling. However, if the job reaches its time limit, it will be killed by the queueing system. See man sbatch for more details.
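To see what such a limit amounts to, the D-HH:MM:SS form can be converted to seconds. A small sketch (it handles only the D-HH:MM:SS and HH:MM:SS forms; SLURM also accepts other formats such as MM and MM:SS, not covered here):

```python
# Sketch: convert a SLURM-style "D-HH:MM:SS" (or "HH:MM:SS") time limit
# to seconds. Other accepted SLURM formats are deliberately not handled.

def walltime_seconds(spec):
    days, _, clock = spec.partition("-")
    if not clock:              # no "-" present: the whole spec is the clock part
        days, clock = "0", spec
    h, m, s = (int(x) for x in clock.split(":"))
    return int(days) * 86400 + h * 3600 + m * 60 + s

print(walltime_seconds("2-12:00:00"))  # 216000 seconds, i.e. 2.5 days
```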