Slurm User Guide

Our new server rayl3 uses Slurm, a highly configurable open-source workload manager. This guide provides the information you need to submit, monitor, and control batch jobs in this shared cluster computing environment.

Slurm is responsible for accepting, scheduling, dispatching, and managing compute jobs on clusters. It allows for flexible management of computational resources and can be tailored to suit various research needs.

Getting Started with Rayl3

Checking Cluster and Node Status

  • State of the Cluster: Use sinfo to see the overall state of nodes and partitions.
  • Detailed Node Information: scontrol show nodes provides detailed information on each node.
  • Job Status by User: squeue -u username lists the jobs belonging to a specific user.

Submitting Jobs

  1. Basic Job Submission: Submit a job script using sbatch: sbatch job_script.sh
  2. Options & Parameters: Options can also be given on the command line: sbatch --time=02:00:00 --nodes=2 job_script.sh
  3. Interactive Jobs: Allocate resources for interactive use with salloc, or start an interactive shell on a compute node with: srun --pty bash
  4. Array Jobs: Submit an array of jobs: sbatch --array=1-10 job_script.sh
  5. Job Dependencies: Set dependencies between jobs: sbatch --dependency=afterok:12345 job_script.sh (start only after job 12345 completes successfully)
  6. Job Hold and Release: Submit a job in a held state with sbatch --hold job_script.sh, then release it with scontrol release JOBID
  7. Quality of Service (QoS): sbatch --qos=high job_script.sh
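The dependency option above combines naturally into small pipelines. Below is a minimal sketch of building the dependency flag; the job ID 12345 and the script names step1.sh/step2.sh are placeholders (on the cluster you would capture the real ID with jobid=$(sbatch --parsable step1.sh)):

```shell
# Build a --dependency flag for a follow-up job.
# 12345 stands in for the ID printed by: jobid=$(sbatch --parsable step1.sh)
jobid=12345
dep="--dependency=afterok:${jobid}"   # afterok: start only if the job exits successfully
echo "sbatch ${dep} step2.sh"         # on the cluster, run: sbatch ${dep} step2.sh
```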

Monitoring Jobs

  1. Viewing the Job Queue: squeue
  2. Detailed Information: scontrol show job JOBID
  3. Real-time Monitoring: sstat --jobs=JOBID
  4. Historical Information: sacct -j JOBID
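For finished jobs, sacct also accepts a --format option to select which columns are shown. A minimal sketch of composing such a query (the job ID 12345 is a placeholder):

```shell
# Compose a sacct query for a finished job's runtime, final state, and peak memory.
jobid=12345                                # placeholder job ID
fmt="JobID,JobName,Elapsed,State,MaxRSS"   # column names sacct accepts
echo "sacct -j ${jobid} --format=${fmt}"   # on the cluster, run this command directly
```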

Cancelling Jobs

Cancel a job using: scancel JOBID. You can also cancel all of your own jobs with scancel -u username, or only your pending jobs with scancel --state=PENDING -u username.

Common Slurm Scripts

Basic batch script

Here is a basic script to run a job on Slurm:

#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --output=result.txt
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=01:00:00

module load my_application
my_application < input_file > output_file

Parallel job script

If you need to run a parallel job, you can use a script like this:

#!/bin/bash
#SBATCH --job-name=parallel_job
#SBATCH --output=result.txt
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=02:00:00

module load mpi
mpirun my_parallel_application

Array job script

For submitting an array of jobs, you can use the following script:

#!/bin/bash
#SBATCH --job-name=array_job
#SBATCH --output=result_%a.txt
#SBATCH --array=1-10
#SBATCH --time=00:30:00

./my_array_application $SLURM_ARRAY_TASK_ID
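Inside each array task, Slurm sets SLURM_ARRAY_TASK_ID, which is commonly used to pick a per-task input file. A minimal sketch (the input_N.dat naming scheme is an assumption for illustration, not a cluster convention):

```shell
# Map the array task ID to a per-task input file.
# Outside of an array job SLURM_ARRAY_TASK_ID is unset, so fall back to 1 here.
task_id=${SLURM_ARRAY_TASK_ID:-1}
input_file="input_${task_id}.dat"   # hypothetical per-task input naming
echo "${input_file}"
```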


Remember to load the CUDA environment before running your application:

#!/bin/bash
#SBATCH --partition=gpu      # Request the GPU partition
#SBATCH -o %j.out            # Redirect standard output to a file named with the Job ID
#SBATCH -e %j.err            # Redirect standard error to a file named with the Job ID

# Define paths to required software and libraries
export CUDA_HOME=/home/apps/cuda-11.7
export LD_LIBRARY_PATH=/home/apps/cuda-11.7/lib64:/home/apps/openmpi/4.1.5/lib:$LD_LIBRARY_PATH
export PATH=/home/apps/cuda-11.7/bin:/home/apps/openmpi/4.1.5/bin:$PATH

A standard script for running Amber22

Here is an example script for running a production MD simulation with Amber22:

#!/bin/bash
#SBATCH --partition=gpu      # Request the GPU partition for the job
#SBATCH -o %j.out            # Redirect standard output to a file named with the Job ID
#SBATCH -e %j.err            # Redirect standard error to a file named with the Job ID

# Define paths to required software and libraries
export CUDA_HOME=/home/apps/cuda-11.7
export AMBERHOME=/home/apps/amber22
export LD_LIBRARY_PATH=/home/apps/cuda-11.7/lib64:/home/apps/amber22/lib:/home/apps/openmpi/4.1.5/lib:$LD_LIBRARY_PATH
export PATH=/home/apps/cuda-11.7/bin:/home/apps/amber22/bin:/home/apps/openmpi/4.1.5/bin:$PATH

# Define the name of the molecule (update according to your need;
# my_molecule and the _Prod.in input file name below are placeholders)
pdb_name=my_molecule

# Define the job name dynamically with the process ID
job_name="${pdb_name}_job_$$"
echo "${job_name}"

# Specify the Molecular Dynamics (MD) engine: pmemd with CUDA in single-precision (SPFP) mode
pmemd=${AMBERHOME}/bin/pmemd.cuda_SPFP

# Print environment variables for debugging
printenv

# Run the Molecular Dynamics (MD) simulation with Amber
${pmemd} \
        -O \
        -i ${pdb_name}_Prod.in \
        -p ${pdb_name}.prmtop \
        -c ${pdb_name}_Equil.rst \
        -o ${pdb_name}_Prod.out \
        -r ${pdb_name}_Prod.ncrst \
        -x ${pdb_name}_Prod.nc