SLURM Cheatsheet¶
This page summarizes commonly used SLURM directives for batch jobs on the sciCORE cluster. Add these lines near the top of your submission script, after the `#!/bin/bash` line and before the commands that run your program.
```bash
#!/bin/bash

#SBATCH --job-name=my_job
#SBATCH --time=06:00:00
#SBATCH --qos=6hours
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --output=my_job.%j.out
#SBATCH --error=my_job.%j.err

ml <software/version>

<your_command>
```
Warning
SLURM reads `#SBATCH` directives only before the first non-comment, non-empty shell command. Put all directives at the beginning of the script.
Essential directives¶
| Directive | Meaning |
|---|---|
| `--job-name` | Sets a recognizable name for the job in `squeue`, `sacct`, and email notifications. Example: `#SBATCH --job-name=analysis01` |
| `--time` | Sets the maximum wall-clock time. The job is stopped when this limit is reached. Example: `#SBATCH --time=06:00:00` |
| `--qos` | Selects the Quality of Service. The requested `--time` must fit within the selected QoS. Example: `#SBATCH --qos=6hours` |
| `--cpus-per-task` | Reserves CPU cores for one task. Use this for threaded applications such as OpenMP, Python multiprocessing within one process, or tools with a threads option. Example: `#SBATCH --cpus-per-task=4` |
| `--ntasks` | Starts multiple tasks, usually for MPI jobs. Example: `#SBATCH --ntasks=16` |
| `--mem-per-cpu` | Requests memory per allocated CPU core. Example: `#SBATCH --mem-per-cpu=4G` |
| `--mem` | Requests total memory for the job. Use either `--mem` or `--mem-per-cpu`, not both. Example: `#SBATCH --mem=32G` |
| `--output` | Writes standard output to a file. `%x` is replaced by the job name and `%j` by the job ID. Example: `#SBATCH --output=logs/%x.%j.out` |
| `--error` | Writes standard error to a file. Example: `#SBATCH --error=logs/%x.%j.err` |
Tip
Use the smallest realistic `--time`, `--qos`, CPU, and memory request. Smaller jobs usually start sooner and leave more resources available for other users.
Warning
If you use a directory in `--output` or `--error`, the directory must already exist. SLURM will not create it for you.
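For example, if your directives write to logs/, create that directory once before submitting (`my_job.sh` stands for your own script):

```bash
mkdir -p logs
sbatch my_job.sh
```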
Time and QoS¶
| Runtime needed | Time example | Typical QoS |
|---|---|---|
| Up to 30 minutes | `#SBATCH --time=00:30:00` | `#SBATCH --qos=30min` |
| Up to 6 hours | `#SBATCH --time=06:00:00` | `#SBATCH --qos=6hours` |
| Up to 1 day | `#SBATCH --time=1-00:00:00` | `#SBATCH --qos=1day` |
| Up to 1 week | `#SBATCH --time=7-00:00:00` | `#SBATCH --qos=1week` |
| Up to 2 weeks | `#SBATCH --time=14-00:00:00` | `#SBATCH --qos=2weeks` |
Time can be written as `HH:MM:SS` or `D-HH:MM:SS`.
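For example, a 36-hour limit can be written either way (use only one `--time` directive per script):

```bash
#SBATCH --time=36:00:00
#SBATCH --time=1-12:00:00
```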
Additional information
Current QoS limits and partition details are listed in Batch Computing (SLURM). You can also check current limits on the cluster with usage.
CPU jobs¶
Single-core job
Use this for commands that run on one CPU core.
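A minimal sketch of the directives (`./my_program` is a placeholder for your command):

```bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1

./my_program
```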
Threaded job
Use this for software that runs several threads within one process. Match the program’s own thread setting to `--cpus-per-task`.
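A minimal sketch, assuming an OpenMP-style program (`./my_threaded_program` is a placeholder):

```bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# Let the program use exactly the allocated cores
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_threaded_program
```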
MPI job
Use this for applications launched with `srun` or `mpirun` that start several MPI ranks.
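A minimal sketch (`./my_mpi_program` is a placeholder):

```bash
#SBATCH --ntasks=16

# srun launches one MPI rank per task
srun ./my_mpi_program
```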
Hybrid MPI and threaded job
Use this only if the application is explicitly configured to use both MPI ranks and threads.
```bash
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=8

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./my_hybrid_program
```
Warning
Requesting more CPUs does not automatically make software run faster. The application must be configured to use the allocated CPUs.
Warning
The CPU job examples above do not include memory directives; SLURM then applies the default allocation of 1 GB per CPU. Adapt the memory request with `--mem` or `--mem-per-cpu` to the needs of your job.
GPU jobs¶
GPU jobs must request a GPU partition, a GPU-compatible QoS, and the number of GPUs.
```bash
#SBATCH --partition=a100
#SBATCH --qos=a100-6hours
#SBATCH --gres=gpu:1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G
```
Common GPU-related directives are:
| Directive | Meaning |
|---|---|
| `--partition` | Selects the GPU node group. Example: `#SBATCH --partition=a100` |
| `--qos` | Selects the matching GPU QoS. Example: `#SBATCH --qos=a100-6hours` |
| `--gres` | Requests GPUs. Increase the number only if your application can use multiple GPUs. Example: `#SBATCH --gres=gpu:1` |
Warning
CPU QoS values such as `6hours` cannot be used for GPU jobs. Use the QoS that matches the requested GPU partition, for example `a100-6hours`, `rtx4090-6hours`, `l40s-6hours`, `h200-6hours`, or `titan-6hours`.
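Putting these together with the general directives, a minimal GPU job script could look like this sketch (job name, module, and command are placeholders):

```bash
#!/bin/bash

#SBATCH --job-name=gpu_job
#SBATCH --time=06:00:00
#SBATCH --partition=a100
#SBATCH --qos=a100-6hours
#SBATCH --gres=gpu:1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G
#SBATCH --output=gpu_job.%j.out
#SBATCH --error=gpu_job.%j.err

ml <software/version>

<your_command>
```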
Job arrays¶
Job arrays are useful when you need to run the same script many times with different inputs.
Inside the script, use `$SLURM_ARRAY_TASK_ID` to differentiate between array tasks.
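For example, a sketch that processes numbered input files (`./my_program` and the file naming are illustrative):

```bash
#SBATCH --array=1-100

./my_program input_${SLURM_ARRAY_TASK_ID}.dat
```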
Limit how many array tasks run at the same time with a `%` suffix on the `--array` range, for example:
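```bash
#SBATCH --array=1-100%10
```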
This submits 100 array tasks but runs at most 10 simultaneously.
Email notifications¶
| Directive | Meaning |
|---|---|
| `--mail-type` | Selects which events trigger an email; `END,FAIL` sends mail when the job ends or fails. Other common values are `BEGIN`, `TIME_LIMIT`, and `ALL`. Example: `#SBATCH --mail-type=END,FAIL` |
| `--mail-user` | Sets the email address for notifications. Example: `#SBATCH --mail-user=firstname.lastname@unibas.ch` |
Storage and temporary files¶
| Directive or variable | Meaning |
|---|---|
| `--tmp` | Requests a node with at least this amount of free local scratch space. Example: `#SBATCH --tmp=50G` |
| `$TMPDIR` | Points to the job-specific local scratch directory on the compute node. Example: `cp input.dat $TMPDIR/` |
Tip
For I/O-heavy jobs, copy input files to $TMPDIR, run the calculation there, and copy final results back to your project or home directory before the job ends.
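A sketch of this pattern (input and result file names are placeholders):

```bash
# Stage input data onto the node-local scratch
cp input.dat $TMPDIR/
cd $TMPDIR

# Run the calculation on the local copy
<your_command> input.dat

# Copy results back before the job ends; local scratch is job-specific
# and is not kept after the job finishes
cp results.dat $SLURM_SUBMIT_DIR/
```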
Useful SLURM variables¶
| Variable | Meaning |
|---|---|
| `$SLURM_JOB_ID` | Job ID assigned by SLURM. |
| `$SLURM_JOB_NAME` | Job name. |
| `$SLURM_SUBMIT_DIR` | Directory from which the job was submitted. |
| `$SLURM_CPUS_PER_TASK` | Number of CPUs requested per task. |
| `$SLURM_NTASKS` | Number of tasks requested. |
| `$SLURM_ARRAY_TASK_ID` | Current task index in a job array. |
| `$SLURM_JOB_NODELIST` | Nodes allocated to the job. |
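These variables can be used directly in scripts, for example to log basic job information:

```bash
echo "Job $SLURM_JOB_ID ($SLURM_JOB_NAME) was submitted from $SLURM_SUBMIT_DIR"
echo "Running on node(s): $SLURM_JOB_NODELIST"
```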
Complete examples¶
Python script¶
```bash
#!/bin/bash

#SBATCH --job-name=python_analysis
#SBATCH --time=02:00:00
#SBATCH --qos=6hours
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=8G
#SBATCH --output=python_analysis.%j.out
#SBATCH --error=python_analysis.%j.err

ml Python/<version>

python my_script.py inputdata.txt
```
Multithreaded application¶
```bash
#!/bin/bash

#SBATCH --job-name=threads
#SBATCH --time=06:00:00
#SBATCH --qos=6hours
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=4G
#SBATCH --output=threads.%j.out
#SBATCH --error=threads.%j.err

ml <software/version>

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_program
```
Array job¶
```bash
#!/bin/bash

#SBATCH --job-name=array_analysis
#SBATCH --time=01:00:00
#SBATCH --qos=6hours
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --array=1-50%5
#SBATCH --output=array_%A_%a.out
#SBATCH --error=array_%A_%a.err

ml Python/<version>

python analyze.py sample_${SLURM_ARRAY_TASK_ID}.txt
```
In array output filenames, `%A` is replaced by the array job ID and `%a` by the array task ID.
Common mistakes¶
- Using spaces around `=` in directives, for example `#SBATCH --time = 06:00:00`.
- Requesting CPUs or GPUs that the program does not use.
- Selecting a QoS that is shorter than the requested `--time`.
- Using a GPU partition without a matching GPU QoS.
- Writing output or error files to a directory that does not exist.
- Forgetting to copy results from `$TMPDIR` before the job finishes.
Additional information
For a longer explanation of SLURM scripts, monitoring commands, queues, partitions, and GPU usage, see Batch Computing (SLURM).