Using the GPU nodes with Slurm

Several nodes of the mesocentre are equipped with NVIDIA GPU cards suitable for GPU computing.

To submit a job via Slurm to one of the machines equipped with a GPU card, you have to specify the name of a partition dedicated to GPU computing.

Here is the list of available partitions dedicated to GPU computing:

Partition name | Associated node(s) | Memory per CPU (MB)

You also have to specify the number of GPU cards you want to allocate for your job. This is done with an option such as --gres=gpu:2, which in this example allocates 2 GPU cards for the job.

On a given GPU machine, each GPU card has an ID number associated with it, running from 0 to 3 (or 4, on a few nodes). By default, a typical program will use only the GPU accelerator with GPU_ID 0. The OS does not attribute a GPU to a user, so when several users run jobs on the same node, the GPU card with GPU_ID 0 can end up 100% occupied and shared between jobs while the other GPUs on the node sit idle. It is therefore essential to indicate the correct GPU_ID in your program. To do so, use the environment variable $CUDA_VISIBLE_DEVICES, which contains the list of GPU IDs attributed to a given job.
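For instance, a job script can inspect this variable to see which card(s) it was given. A minimal sketch (the value 0,1 is hard-coded here to stand in for what Slurm would export for a 2-GPU job):

```shell
# Inside a real job, Slurm exports this variable; hard-coded here for illustration.
CUDA_VISIBLE_DEVICES="0,1"
# Count the attributed GPUs (comma-separated list of IDs) and grab the first one.
NUM_GPUS=$(echo "$CUDA_VISIBLE_DEVICES" | awk -F, '{print NF}')
FIRST_GPU=$(echo "$CUDA_VISIBLE_DEVICES" | cut -d, -f1)
echo "job was given $NUM_GPUS GPU(s), first id: $FIRST_GPU"
```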

Here is an example for a Fortran code: replace cudaSetDevice(0) in the .cuf file with the following, so that the GPU ID can be injected at compile time through the GPUID preprocessor macro:

#ifdef GPUID
      istat = cudaSetDevice(GPUID)
#else
      istat = cudaSetDevice(0)
#endif

Here is an example of a Slurm batch script using the code above:

#!/bin/bash
#SBATCH -p kepler
#SBATCH --gres=gpu:2

module load PGI/14.9
# Pass the GPU ID(s) attributed by Slurm to the preprocessor as the GPUID macro.
pgf90 -Mpreprocess -DGPUID=$CUDA_VISIBLE_DEVICES -fast -o exec exec.cuf
./exec
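As a sketch of what the -DGPUID flag does (assuming a single-GPU job, so that $CUDA_VISIBLE_DEVICES holds one ID such as 0), the shell expands the variable before the compiler runs, so the macro receives the attributed ID:

```shell
# CUDA_VISIBLE_DEVICES is set by Slurm inside the job; hard-coded here for illustration.
CUDA_VISIBLE_DEVICES="0"
# The shell substitutes the variable, so the compiler would see -DGPUID=0.
COMPILE_CMD="pgf90 -Mpreprocess -DGPUID=$CUDA_VISIBLE_DEVICES -fast -o exec exec.cuf"
echo "$COMPILE_CMD"
```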


Here is an example of an interactive job on a GPU-capable node:

srun -p volta --gres=gpu:1 --pty bash -i
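Once the interactive shell opens on the GPU node, you can check which card(s) Slurm attributed to the job (a sketch; outside a job the variable is empty, hence the "unset" fallback):

```shell
# Show the GPU ID(s) attributed to this job; "unset" means no job allocation.
GPUS="${CUDA_VISIBLE_DEVICES:-unset}"
echo "attributed GPUs: $GPUS"
```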

Last updated: 2 December 2021
