Using the GPU nodes with Slurm
Several nodes of the mesocentre are equipped with NVIDIA GPU cards suitable for GPU computing.
To submit a job via SLURM to one of the machines equipped with a GPU card, you have to specify the name of a partition dedicated to GPU computing.
Here is the list of available partitions dedicated to GPU computing:
Partition name | Associated node(s) | Memory per CPU (MB)
---|---|---
kepler | gpu[004-010] | 13430
pascal | gpu[011-012] | 11512
volta | gpu[013-017] | 11500
You also have to specify the number of GPU cards you want to allocate for your job. This is done with an option such as --gres=gpu:2, which in this example allocates 2 GPU cards for the job.
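For example, in the header of a batch script this could look as follows (the partition kepler here is only an illustration; pick one from the table above):

```
#SBATCH -p kepler
#SBATCH --gres=gpu:2
```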
On a given GPU node, each GPU card has its own ID number, starting from 0 and going up to 3 (or 4 on a few nodes). By default, a typical program will only use the GPU accelerator with GPU_ID 0. The operating system does not restrict which GPU a user's job can access, so when several users run jobs on the same node, the card with GPU_ID 0 may end up 100% occupied and shared between jobs while the other GPUs on the node stay idle. It is therefore essential to select the correct GPU_ID in your program. To do so, use the environment variable CUDA_VISIBLE_DEVICES, which contains the list of GPU IDs attributed to your job.
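As a quick check, you can print this variable from inside a job to see which card(s) were attributed to it (the values in the comment are only examples):

```
echo $CUDA_VISIBLE_DEVICES    # e.g. "2" for one allocated card, "0,1" for two
```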
Here is an example for a Fortran code: replace cudaSetDevice(0) in the .cuf file by the following:

```
#ifdef GPUID
      cudaSetDevice(GPUID)
#else
      cudaSetDevice(0)
#endif
```
Here is an example of a SLURM batch script using part of the code above:
```
#!/bin/bash
#SBATCH -p kepler
#SBATCH --gres=gpu:2
module load PGI/14.9
pgf90 -Mpreprocess -DGPUID=$CUDA_VISIBLE_DEVICES -fast -o exec exec.cuf
./exec
```
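Once saved in a file, the script is submitted in the usual way (the file name job_gpu.slurm is just an example):

```
sbatch job_gpu.slurm
```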
Here is an example of an interactive job on a GPU-capable node:

```
srun -p volta --gres=gpu:1 --pty bash -i
```
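Once the interactive shell is open on the GPU node, you can check which card(s) were attributed, for example with:

```
echo $CUDA_VISIBLE_DEVICES    # id(s) of the GPU(s) attributed to this job
nvidia-smi                    # overview of the GPUs present on the node
```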