Our new server rayl3 uses Slurm, a highly configurable open-source workload manager. This guide will provide you with the necessary information to manage and control batch jobs on this shared cluster computing environment.
Slurm is responsible for accepting, scheduling, dispatching, and managing compute jobs on clusters. It allows for flexible management of computational resources and can be tailored to suit various research needs.
Getting Started with Rayl3
Checking States and Nodes:
- State of the Cluster: Use sinfo to see the overall state of nodes and partitions.
- Detailed Node Information: scontrol show nodes provides detailed information on each node.
- Jobs by User: squeue -u username lists the queued and running jobs of a specific user (see the combined example after this list).
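Taken together, a quick first look at the cluster might be the following; nodename is a placeholder for an actual node name reported by sinfo:
sinfo -N -l                  # One line per node, with state, CPUs, and memory
scontrol show node nodename  # Full details for a single node
squeue -u $USER -l           # Long-format listing of your own jobs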
Submitting Jobs
- Basic Job Submission: Submit a job script with sbatch: sbatch myscript.sh
- Options & Parameters: Specify options directly on the command line: sbatch --time=02:00:00 --nodes=2 myscript.sh
- Interactive Jobs: Allocate resources for interactive use with salloc, or open an interactive shell on a compute node with srun --pty bash
- Array Jobs: Submit an array of jobs: sbatch --array=1-10 myscript.sh
- Job Dependencies: Make a job wait until another job has finished successfully: sbatch --dependency=afterok:12345 myscript.sh (see the chaining example after this list)
- Job Hold and Release: Submit a job in a held state with sbatch --hold myscript.sh, then release it later with scontrol release JOBID
- Quality of Service (QoS): Request a specific QoS: sbatch --qos=high myscript.sh
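When chaining dependent jobs, it helps to capture each job ID automatically rather than copying it by hand. Below is a minimal sketch using sbatch --parsable, which prints only the job ID; step1.sh and step2.sh are placeholder script names, not files that exist on rayl3.
#!/bin/bash
# Submit the first job and keep its job ID
jid1=$(sbatch --parsable step1.sh)
# Submit the second job so it starts only if the first completes successfully
jid2=$(sbatch --parsable --dependency=afterok:${jid1} step2.sh)
echo "Submitted jobs ${jid1} and ${jid2}"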
Monitoring Jobs
- Viewing the Job Queue: squeue
- Detailed Information: scontrol show job JOBID
- Real-time Monitoring (running jobs): sstat --jobs=JOBID
- Historical Information (completed jobs): sacct -j JOBID (an example with explicit output fields follows this list)
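sacct is most useful with an explicit field list. A minimal sketch is shown below; JOBID is a placeholder and the field names are standard sacct format options:
# Show run time, memory high-water mark, and final state for a finished job
sacct -j JOBID --format=JobID,JobName,Partition,Elapsed,MaxRSS,State,ExitCode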
Cancelling Jobs
Cancel a job using: scancel JOBID
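scancel also accepts filters, which is handy for clearing out many jobs at once. A short sketch (the job name my_job is a placeholder):
# Cancel every job belonging to you
scancel -u $USER
# Cancel only your jobs that are still waiting in the queue
scancel -u $USER --state=PENDING
# Cancel jobs by name
scancel --name=my_job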
Common Slurm Scripts
Basic batch script
Here is a basic script to run a job on Slurm:
#!/bin/bash
#SBATCH --job-name=my_job          # Name shown in the queue
#SBATCH --output=result.txt        # File for standard output
#SBATCH --nodes=1                  # Number of nodes
#SBATCH --ntasks-per-node=1        # Tasks (processes) per node
#SBATCH --time=01:00:00            # Wall-clock time limit (hh:mm:ss)

# Load the software environment and run the program
module load my_application
my_application < input_file > output_file
Parallel job script
If you need to run a parallel job, you can use a script like this:
#!/bin/bash
#SBATCH --job-name=parallel_job
#SBATCH --output=result.txt
#SBATCH --nodes=2                  # Two nodes
#SBATCH --ntasks-per-node=4        # Four MPI ranks per node (8 in total)
#SBATCH --time=02:00:00

# Load the MPI environment and launch the program across the allocated ranks
module load mpi
mpirun my_parallel_application
Array job script
For submitting an array of jobs, you can use the following script:
#!/bin/bash
#SBATCH --job-name=array_job
#SBATCH --output=result_%a.txt     # %a expands to the array task ID
#SBATCH --array=1-10               # Run ten tasks, indexed 1 to 10
#SBATCH --time=00:30:00

# Each task receives its own index in SLURM_ARRAY_TASK_ID
./my_array_application $SLURM_ARRAY_TASK_ID
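A common pattern is to map the array index onto a list of input files. The sketch below assumes a plain-text file inputs.txt with one input filename per line; both inputs.txt and my_array_application are placeholders.
#!/bin/bash
#SBATCH --job-name=array_from_list
#SBATCH --output=result_%a.txt
#SBATCH --array=1-10
#SBATCH --time=00:30:00

# Pick the line of inputs.txt that matches this task's index
input=$(sed -n "${SLURM_ARRAY_TASK_ID}p" inputs.txt)
./my_array_application "${input}"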
Using a GPU
Remember to set up the CUDA environment before running your application:
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH -o %j.out
#SBATCH -e %j.err
# Define paths to required software and libraries
export CUDA_HOME=/home/apps/cuda-11.7
export LD_LIBRARY_PATH=/home/apps/cuda-11.7/lib64:/home/apps/openmpi/4.1.5/lib:$LD_LIBRARY_PATH
export PATH=/home/apps/cuda-11.7/bin:/home/apps/openmpi/4.1.5/bin:$PATH
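The snippet above only prepares the environment. A complete GPU job also needs to request a GPU and run something. Below is a minimal sketch, assuming the gpu partition hands out GPUs through Slurm's generic-resource (gres) mechanism and that my_gpu_application is a placeholder for your own program.
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1               # Request one GPU (assumption: adjust to the site's gres setup)
#SBATCH --time=01:00:00
#SBATCH -o %j.out
#SBATCH -e %j.err

export CUDA_HOME=/home/apps/cuda-11.7
export LD_LIBRARY_PATH=/home/apps/cuda-11.7/lib64:$LD_LIBRARY_PATH
export PATH=/home/apps/cuda-11.7/bin:$PATH

nvidia-smi                         # Confirm that a GPU is visible to the job
./my_gpu_application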
A standard script for running Amber22
Here is an example script for running a production MD simulation with Amber22:
#!/bin/bash
#SBATCH --partition=gpu # Request the GPU partition for the job
#SBATCH -o %j.out # Redirect standard output to a file named with the Job ID
#SBATCH -e %j.err # Redirect standard error to a file named with the Job ID
# Define paths to required software and libraries
export CUDA_HOME=/home/apps/cuda-11.7
export AMBERHOME=/home/apps/amber22
export LD_LIBRARY_PATH=/home/apps/cuda-11.7/lib64:/home/apps/amber22/lib:/home/apps/openmpi/4.1.5/lib:$LD_LIBRARY_PATH
export PATH=/home/apps/cuda-11.7/bin:/home/apps/amber22/bin:/home/apps/openmpi/4.1.5/bin:$PATH
# Define the job name dynamically with the process ID
job=user_$$
# Specify the Molecular Dynamics (MD) engine, pmemd with CUDA in single-precision mode
pmemd=$AMBERHOME/bin/pmemd.cuda_SPFP
# Print environment variables for debugging
echo "AMBERHOME = $AMBERHOME"
echo "LD_LIBRARY_PATH = $LD_LIBRARY_PATH"
echo "HOSTNAME = $HOSTNAME"
# Define the name of the molecule (Update according to your need)
pdb_name=myProtein
echo "pdb_name = ${pdb_name}"
echo "job = ${job}"
# Run the Molecular Dynamics (MD) simulation with Amber
${pmemd} \
-O \
-i md.in \
-p ${pdb_name}.prmtop \
-c ${pdb_name}_Equil.rst \
-o ${pdb_name}_Prod.out \
-r ${pdb_name}_Prod.ncrst \
-x ${pdb_name}_Prod.nc