Using the GPU nodes with Slurm
Several nodes of the mesocentre are equipped with NVIDIA GPU cards suitable for GPU computing.
To submit a job via SLURM to one of the machines equipped with a GPU card, you have to specify the name of a partition dedicated to GPU computing. The most generic partition, which covers all the nodes with GPU accelerators, is called gpu and is selected with the option -p gpu.
Here is the list of available partitions dedicated to GPU computing:
| Partition name | Associated node(s) | Memory per CPU (MB) |
You also have to specify the number of GPU cards to allocate for your job, with an option like --gres=gpu:2; this example allocates 2 GPU cards to the job.
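A minimal sketch of a batch script header that combines the partition and GPU-count options. The helper check_gpu_count is hypothetical (it is not a Slurm command); it assumes, as described for $CUDA_VISIBLE_DEVICES on this page, that the allocated GPU IDs arrive as a comma-separated list such as 0,1, and simply compares how many IDs the job received with how many were requested.

```shell
#!/bin/bash
#SBATCH -p gpu             # generic partition covering all GPU nodes
#SBATCH --gres=gpu:2       # allocate 2 GPU cards for this job

# Hypothetical helper (not part of Slurm): succeed only if
# $CUDA_VISIBLE_DEVICES lists exactly as many GPU IDs as requested ($1).
check_gpu_count() {
    expected=$1
    ids=${CUDA_VISIBLE_DEVICES:-}
    if [ -z "$ids" ]; then
        echo "CUDA_VISIBLE_DEVICES is not set" >&2
        return 1
    fi
    # Put one ID per line and count the non-empty lines.
    count=$(printf '%s\n' "$ids" | tr ',' '\n' | grep -c .)
    if [ "$count" -ne "$expected" ]; then
        echo "expected $expected GPU(s), got $count: $ids" >&2
        return 1
    fi
    return 0
}

# In the job body, before starting the program:
#   check_gpu_count 2 || exit 1
```

Running the check early makes a mismatch between the request and the allocation visible in the job's error log instead of surfacing later as a GPU contention problem.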
The nodes gpu001-003 are interconnected by an Ethernet network, and we strongly discourage using them for multi-node jobs.
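To keep a job on a single node, the node count can be requested explicitly with --nodes=1 (short form -N 1). The sketch below shows that directive together with a hypothetical guard, require_single_node, which reads SLURM_JOB_NUM_NODES (set by Slurm in the job environment) and stops if the allocation spans several nodes.

```shell
#!/bin/bash
#SBATCH --nodes=1          # add to the other #SBATCH lines: stay on one node

# Hypothetical guard (not part of Slurm): fail if the allocation spans
# more than one node. SLURM_JOB_NUM_NODES is set by Slurm inside a job;
# outside Slurm it is unset and treated as 1.
require_single_node() {
    nodes=${SLURM_JOB_NUM_NODES:-1}
    if [ "$nodes" -gt 1 ]; then
        echo "job spans $nodes nodes; run it on a single GPU node" >&2
        return 1
    fi
    return 0
}

# In the job body:
#   require_single_node || exit 1
```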
On a given GPU machine, each GPU card has its own ID number, ranging from 0 to 3 (or to 4 on a few nodes). By default, a typical program uses only one GPU accelerator, the one with GPU_ID 0. The OS does not assign GPUs to users by itself, so when several users run jobs on the same node, the GPU with GPU_ID 0 may end up 100% occupied and shared between the jobs while the other GPUs of the node sit idle. It is therefore essential to indicate the correct GPU_ID in your program. To do so, use the environment variable $CUDA_VISIBLE_DEVICES, which contains the list of GPU IDs allocated to a given job.
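As an illustration, assuming the list in $CUDA_VISIBLE_DEVICES is comma-separated (for example 2,3 for a job holding the third and fourth cards of a node), the hypothetical helper gpu_id_at below prints the ID of the job's n-th GPU, counting from 0, and falls back to 0, the default device, when the variable is unset.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm or CUDA): print the GPU ID at
# position $1 (0-based) in $CUDA_VISIBLE_DEVICES. Prints 0, the default
# device, if the variable is unset or empty, and nothing if $1 is out of range.
gpu_id_at() {
    ids=${CUDA_VISIBLE_DEVICES:-}
    if [ -z "$ids" ]; then
        echo 0
        return 0
    fi
    # One ID per line, then print line number $1 + 1.
    printf '%s\n' "$ids" | tr ',' '\n' | sed -n "$(( $1 + 1 ))p"
}

# Example: with CUDA_VISIBLE_DEVICES=2,3, "gpu_id_at 0" prints 2
# and "gpu_id_at 1" prints 3.
```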
Here is an example for CUDA Fortran code:
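Replace the call to cudaSetDevice(0) in the .cuf file with the following. In CUDA Fortran, cudaSetDevice is an integer function, so its return status must be assigned to an integer variable (here istat, declared with integer :: istat):

#ifdef GPUID
  istat = cudaSetDevice(GPUID)
#else
  istat = cudaSetDevice(0)
#endif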
Here is an example of a SLURM batch script that compiles the code above with GPUID set from $CUDA_VISIBLE_DEVICES and then runs it:
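#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:1
# GPUID must be a single integer, so request one GPU:
# $CUDA_VISIBLE_DEVICES then holds exactly one ID.
module load PGI/14.9
pgf90 -Mpreprocess -DGPUID=$CUDA_VISIBLE_DEVICES -fast -o exec exec.cuf
./exec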
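If such a script is run outside a GPU allocation, $CUDA_VISIBLE_DEVICES is empty, and -DGPUID= typically still defines GPUID (with an empty value), so the #ifdef GPUID branch is compiled without a device number. A hedged sketch of a hypothetical helper, gpuid_flag, that emits the define only when an ID is available, letting the #else branch fall back to device 0:

```shell
#!/bin/bash
# Hypothetical helper (not part of PGI or Slurm): print -DGPUID=<id> for
# the first ID in $CUDA_VISIBLE_DEVICES, or nothing if it is unset or
# empty, so that the #else branch (device 0) is compiled instead.
gpuid_flag() {
    ids=${CUDA_VISIBLE_DEVICES:-}
    if [ -n "$ids" ]; then
        # ${ids%%,*} drops everything from the first comma onwards.
        printf '%s\n' "-DGPUID=${ids%%,*}"
    fi
}

# Possible use in the batch script; $(gpuid_flag) is left unquoted on
# purpose so that an empty result adds no argument:
#   pgf90 -Mpreprocess $(gpuid_flag) -fast -o exec exec.cuf
```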
Here is an example of an interactive job on a GPU-capable node:
srun -p gpu --gres=gpu:1 --pty bash -i
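Once the interactive shell is open, it can help to confirm which GPU was allocated before starting work. A small sketch with a hypothetical function name, show_gpu_allocation: it prints $CUDA_VISIBLE_DEVICES and, if nvidia-smi is installed on the node, lists the GPUs the driver reports with nvidia-smi -L.

```shell
#!/bin/bash
# Hypothetical helper: report the GPU IDs exported for this job and,
# if nvidia-smi is available, list the GPUs the driver reports.
show_gpu_allocation() {
    echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi -L || echo "nvidia-smi failed" >&2
    else
        echo "nvidia-smi not found on this node" >&2
    fi
}

show_gpu_allocation
```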