CPUs (Cores)
If your program uses some form of parallelism internally, such as multithreading, OpenMP or MPI, the job will in general need to request one core for each thread or process.
Please note that, strictly speaking, a single CPU contains a number of cores. However, the two terms are often used interchangeably to refer to a single processing unit. In particular, the Slurm documentation often uses "CPU" where "core" is meant.
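As an illustration of the one-core-per-thread-or-process rule, the following minimal Python sketch builds the corresponding Slurm options. The helper names `sbatch_core_options` and `total_cores` are hypothetical; `--ntasks` and `--cpus-per-task` are standard `sbatch` options, the latter being an example of Slurm saying "CPU" where a core is meant. It assumes that each process (for example an MPI rank) is a Slurm task and that each thread within a process needs its own core.

```python
from typing import List


def sbatch_core_options(processes: int = 1, threads_per_process: int = 1) -> List[str]:
    """Return sbatch options that request one core per thread or process.

    Hypothetical helper: 'processes' maps to Slurm tasks (e.g. MPI ranks),
    'threads_per_process' to the cores given to each task (e.g. OpenMP threads).
    """
    if processes < 1 or threads_per_process < 1:
        raise ValueError("processes and threads_per_process must be at least 1")
    return [f"--ntasks={processes}", f"--cpus-per-task={threads_per_process}"]


def total_cores(processes: int = 1, threads_per_process: int = 1) -> int:
    """Total number of cores requested by the options above."""
    return processes * threads_per_process


# A purely multithreaded program with 8 threads: one task with 8 cores.
print(sbatch_core_options(processes=1, threads_per_process=8))

# A hybrid MPI/OpenMP program: 4 processes with 8 threads each.
print(sbatch_core_options(processes=4, threads_per_process=8), total_cores(4, 8))
```

Whether a particular program maps onto tasks and threads in exactly this way depends on how it is parallelised, so the mapping should be checked against the program's own documentation.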
In order to determine the optimum number of cores for a job, the following two main aspects should be considered:
- Scaling - How much faster does the program run when the number of cores is increased?
- Wait-time - How much longer will the job have to wait if more cores are requested?
Scaling
With regard to scaling, it is good practice to run the same job with different numbers of cores so that you can get an idea of how the run-time decreases with the degree of parallelisation. It may well be that a job with 32 cores does not run twice as fast as one with only 16 but, say, only 1.2 times as fast.
Therefore, before a large number of similar jobs is run, a scaling experiment should be performed in which the same job is run with, say, 1, 2, 4, 8, 16 and 32 cores in order to determine the point at which increasing the number of cores begins to have a weaker effect on the speed of the job.
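The results of such a scaling experiment can be summarised as speedup and parallel efficiency. The following is a minimal Python sketch. The run-times in `example_run_times` are invented for illustration and are not measurements, and the function names and the efficiency threshold of 0.5 are assumptions rather than recommendations.

```python
def scaling_table(run_times):
    """Map each core count to (speedup, efficiency).

    run_times: dict of core count -> measured run-time (any consistent unit).
    Values are relative to the smallest core count that was measured.
    """
    base_cores = min(run_times)
    base_time = run_times[base_cores]
    table = {}
    for cores in sorted(run_times):
        speedup = base_time / run_times[cores]
        # Efficiency is the achieved speedup divided by the ideal speedup.
        efficiency = speedup * base_cores / cores
        table[cores] = (speedup, efficiency)
    return table


def largest_efficient_core_count(run_times, min_efficiency=0.5):
    """Largest core count whose efficiency is still at least min_efficiency."""
    table = scaling_table(run_times)
    return max(c for c, (_, eff) in table.items() if eff >= min_efficiency)


# Hypothetical run-times in seconds for 1, 2, 4, 8, 16 and 32 cores.
example_run_times = {1: 1000, 2: 520, 4: 280, 8: 160, 16: 110, 32: 92}

for cores, (speedup, efficiency) in scaling_table(example_run_times).items():
    print(f"{cores:3d} cores: speedup {speedup:5.2f}, efficiency {efficiency:4.2f}")

print("Largest efficient core count:", largest_efficient_core_count(example_run_times))
```

In this made-up data set, going from 16 to 32 cores makes the job only about 1.2 times faster, so the efficiency drops sharply at 32 cores.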
Wait-time
A job will have to wait if the resources it has requested are not available. If a job has to wait, in general the more resources it has requested, the longer it will have to wait. This is particularly the case if MPI parallelism is not used and all the cores must be within a single node. It is therefore important to ensure that the probable wait-time is appropriate compared with the actual run-time of the job.
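One way to weigh wait-time against run-time is to compare the total turnaround time (wait-time plus run-time) for each candidate number of cores. The following minimal Python sketch does this for invented figures. The function names and all numbers in `example_estimates` are hypothetical. In practice the wait-times would have to be estimated, for instance from `squeue --start`, which reports Slurm's estimated start time for pending jobs, and such estimates can change as other jobs are submitted or finish.

```python
def turnaround(wait_time, run_time):
    """Total time from submission to completion."""
    return wait_time + run_time


def best_core_count(estimates):
    """Core count with the smallest estimated turnaround.

    estimates: dict of core count -> (estimated wait-time, run-time),
    both in the same unit.
    """
    return min(estimates, key=lambda cores: turnaround(*estimates[cores]))


# Hypothetical (wait-time, run-time) pairs in minutes.
example_estimates = {
    8: (5, 160),
    16: (20, 110),
    32: (120, 92),
}

for cores, (wait, run) in sorted(example_estimates.items()):
    print(f"{cores:3d} cores: wait {wait:4d} + run {run:4d} = {turnaround(wait, run):4d} min")

print("Shortest estimated turnaround with", best_core_count(example_estimates), "cores")
```

With these made-up numbers the 32-core job has the shortest run-time but the longest turnaround, because the additional wait outweighs the time saved.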