Mod:Hunt Research Group/Running jobs on NeSI
Using NeSI
getting started
- contact Tricia to get an account
- log in online at my.nesi.org.nz using your VUW credentials
- configure your 2-factor authentication as described here
- set up your ~/.ssh/config file as described here
- type ssh mahuika, provide the required passwords, and you are in!
[tricia@]/Users/tricia $ ssh mahuika
(huntpa@lander.nesi.org.nz) Login Password (First Factor):
(huntpa@lander.nesi.org.nz) Authenticator Code (Second Factor):
(huntpa@login.mahuika.nesi.org.nz) Login Password:
- you should only need to authenticate once each time you log in; thereafter, ssh mahuika from new terminals will go straight to your home directory
- you will want to edit either the .bashrc or .bash_profile to your liking
- take a look at the slides here
- the general top-level support page is here
- information on the partitions is here
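A sketch of ~/.ssh/config entries that make ssh mahuika work, assuming the hostnames shown in the login transcript above (lander.nesi.org.nz as the jump host, login.mahuika.nesi.org.nz as the login node). The helper name nesi_ssh_config is hypothetical, and the ControlMaster/ControlPersist settings (which let new terminals reuse the first authenticated connection) are an assumption; check them against NeSI's current instructions before use.

```shell
# Hypothetical helper: print ~/.ssh/config entries for NeSI.
# Hostnames come from the login transcript; verify them against NeSI's docs.
nesi_ssh_config() {
    local user="${1:?usage: nesi_ssh_config <nesi_username>}"
    cat <<EOF
Host lander mahuika
    User ${user}
    # Share one authenticated connection between terminals (assumption)
    ControlMaster auto
    ControlPath ~/.ssh/sockets/%r@%h:%p
    ControlPersist 1

Host lander
    HostName lander.nesi.org.nz

Host mahuika
    HostName login.mahuika.nesi.org.nz
    # Connect through the lander jump host
    ProxyJump lander
EOF
}

# Usage (the ControlPath directory must exist first):
#   mkdir -p ~/.ssh/sockets
#   nesi_ssh_config huntpa >> ~/.ssh/config
```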
directories and moving files to and from mahuika
- directories are
- /home/username 20GB
- /nesi/project/vuw04056 100GB
- /nesi/nobackup/vuw04056 10TB
- to copy a file to mahuika, run this on your local computer
scp <path/filename> mahuika:<path/filename>
- for example
[tricia@]/Volumes/Tricia_Home/work/jobs/sangoro $ scp water.inp mahuika:.
water.inp 100% 218 13.8KB/s 00:00
- further information (including for Windows) is here
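The advice on this page (run jobs in your own folder under nobackup, then copy finished results into your folder under project) can be scripted. A minimal sketch; the function names and the NOBACKUP_BASE/PROJECT_BASE variables are hypothetical, with defaults taken from the directories listed above.

```shell
# Hypothetical helpers for the group's directory layout. The defaults are the
# group's NeSI paths; override NOBACKUP_BASE / PROJECT_BASE to use elsewhere.
NOBACKUP_BASE="${NOBACKUP_BASE:-/nesi/nobackup/vuw04056}"
PROJECT_BASE="${PROJECT_BASE:-/nesi/project/vuw04056}"

# Create your run folder (nobackup) and your results folder (project).
setup_user_dirs() {
    local name="${1:-$USER}"
    mkdir -p "${NOBACKUP_BASE}/${name}" "${PROJECT_BASE}/${name}"
}

# Copy one finished job directory from nobackup into your project folder.
copy_back() {
    local name="$1" jobdir="$2"
    cp -r "${NOBACKUP_BASE}/${name}/${jobdir}" "${PROJECT_BASE}/${name}/"
}

# Usage on mahuika:
#   setup_user_dirs tricia
#   copy_back tricia workdir_52598705
```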
TEST creating a batch script and checking you can run jobs
- NeSI uses Slurm, like Raapoi, so you will need a batch script
- test that you can submit jobs with the script below
- copy it into run.sh and type
sbatch run.sh
#!/bin/bash -e
#SBATCH --job-name=SerialJob   # job name (shows up in the queue)
#SBATCH --time=00:01:00        # Walltime (HH:MM:SS)
#SBATCH --mem=512MB            # Memory in MB
#SBATCH --qos=debug            # debug QOS for high priority job tests
pwd                            # Prints working directory
- you should see a slurm-*.out file containing the output of pwd (your working directory)
huntpa@mahuika01 /home/huntpa $ cat slurm-52598016.out
/home/huntpa
- check you can run a parallel job
#!/bin/bash -e
#SBATCH --job-name=MPIJob      # job name (shows up in the queue)
#SBATCH --time=00:01:00        # Walltime (HH:MM:SS)
#SBATCH --mem-per-cpu=512MB    # Memory per logical CPU
#SBATCH --cpus-per-task=4      # 4 logical CPUs = 2 physical cores per task
#SBATCH --ntasks=2             # number of tasks (e.g. MPI)
srun pwd                       # Prints working directory, once per task
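In the parallel test, srun starts one copy of pwd per task, so with --ntasks=2 the slurm-*.out file should contain the directory twice. A small check, assuming the output file contains nothing else (check_task_lines is a hypothetical name).

```shell
# Hypothetical check: srun runs the command once per task, so the output file
# of the parallel test should have exactly one line per task.
check_task_lines() {
    local outfile="$1" ntasks="$2" nlines
    nlines=$(wc -l < "$outfile")
    if (( nlines == ntasks )); then
        echo "OK: ${nlines} lines for ${ntasks} tasks"
    else
        echo "MISMATCH: ${nlines} lines, expected ${ntasks}" >&2
        return 1
    fi
}

# Usage: check_task_lines slurm-<jobid>.out 2
```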
REAL batch script
- modify the script below to run your jobs
- I call mine runorcaP.sh
- you should be running jobs in our nobackup directory /nesi/nobackup/vuw04056
- create your own directory in this folder and run your jobs there
- eg mine is /nesi/nobackup/vuw04056/tricia
- copy the completed job files back into our shared project directory /nesi/project/vuw04056/your_name
- set %maxcore (MB per core, in the ORCA input) to about (2/3)*(mem/ntasks)
- e.g. the script below has 1G per task, so I set %maxcore 800 (a little above 2/3, because it's a small job)
- a node has 128 cores and each core has 2 logical CPUs; in the script we request cores
- we do NOT want jobs to run across nodes, hence --nodes=1
- we want ORCA to use 8 or 16 cores, hence --ntasks = number of cores (match this to nprocs in the %pal block of the input)
- in the sacct output you will see 16 or 32 CPUs allocated (because there are 2 logical CPUs per core)
- partition selection set on advice from NeSI support
- if you want to submit more than 50 linked jobs (i.e. the same script with different geometries), use an array job
#!/bin/bash -e
#SBATCH --job-name=XX
#SBATCH --time=05:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --mem=8G
#SBATCH --error=./workdir_%j/slurm_%j.err
#SBATCH --output=./workdir_%j/XX.out
#SBATCH --partition=milan,large,long,bigmem
echo "slurm job ID: ${SLURM_JOBID}" > ./workdir_${SLURM_JOB_ID}/XX.minfo
echo "start time " >> ./workdir_${SLURM_JOB_ID}/XX.minfo
date >> ./workdir_${SLURM_JOB_ID}/XX.minfo
cd ./workdir_${SLURM_JOB_ID}
cp ${SLURM_SUBMIT_DIR}/XX.inp .
module --quiet purge
module load ORCA/5.0.4-OpenMPI-4.1.5
# ORCA under MPI requires that it be called via its full absolute path
orca_exe=$(which orca)
# Don't use "srun" as ORCA does that itself when launching its MPI process.
${orca_exe} XX.inp
# we are already inside workdir_${SLURM_JOB_ID}, so use a relative path here
echo "finish time " >> XX.minfo
date >> XX.minfo
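The template uses XX as a placeholder for the job name, so a per-job script can be generated with sed, and the %maxcore rule of thumb above can be computed from --mem and --ntasks. A sketch under those assumptions; make_orca_job and suggest_maxcore are hypothetical helpers, and a job name containing / or & would need escaping before it is passed to sed.

```shell
# Hypothetical helper: build a job script from the runorcaP.sh template by
# replacing every XX placeholder with the job name (the input's base name).
make_orca_job() {
    local name="$1" template="${2:-runorcaP.sh}"
    sed "s/XX/${name}/g" "$template" > "run_${name}.sh"
    echo "run_${name}.sh"
}

# Rule of thumb from this page: %maxcore ~ (2/3) * (mem / ntasks), in MB.
# Bash arithmetic is integer-only, so the result is rounded down.
suggest_maxcore() {
    local mem_mb="$1" ntasks="$2"
    echo $(( 2 * mem_mb / (3 * ntasks) ))
}

# Usage:
#   suggest_maxcore 8192 8            # --mem=8G with --ntasks=8 gives 682
#   sbatch "$(make_orca_job water)"   # expects water.inp in this directory
```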
checking jobs
- key commands
- squeue: all jobs
- squeue --me: my jobs
- scancel jobID: kill the named job
- sacct -X: my jobs since midnight today (one line per job, no job steps)
- we "pay" for NeSI usage, so it is important to make sure you are using the system efficiently
- nn_seff jobID: summary of CPU and memory efficiency
- e.g. the water input file
huntpa@mahuika01 /home/huntpa $ cat water.inp
!PBE opt numfreq def2-SVP def2/J smallprint TightSCF NoPop xyzfile
%maxcore 2000
%pal nprocs 8 end
%elprop Polar 1 end
* xyz 0 1
O 0.0000 0.0000 0.0626
H -0.7920 0.0000 -0.4973
H 0.7920 0.0000 -0.4973
*
- the water job, submitted with the basic submit script below and a 10 minute time limit, only just failed to complete:
huntpa@mahuika01 /home/huntpa $ nn_seff 52598705
Cluster: mahuika
Job ID: 52598705
State: TIMEOUT
Cores: 8
Tasks: 8
Nodes: 2
Job Wall-time: 103.2% 00:10:19 of 00:10:00 time limit
CPU Efficiency: 34.4% 00:28:25 of 01:22:32 core-walltime
Mem Efficiency: 1.8% 590.79 MB (0.00 MB to 100.31 MB / task) of 32.00 GB (4.00 GB/task)
- check the memory usage after an example job
huntpa@mahuika01 /home/huntpa/workdir_52598705 $ grep -i 'Memory' water.out
Shared memory : Shared parallel matrices
Maximum memory used throughout the entire GTOINT-calculation: 8.7 MB
Maximum memory used throughout the entire SCF-calculation: 6.2 MB
Maximum memory used throughout the entire SCFGRAD-calculation: 4.9 MB
Maximum memory used throughout the entire GTOINT-calculation: 8.7 MB
Maximum memory used throughout the entire SCF-calculation: 6.2 MB
Maximum memory used throughout the entire SCFGRAD-calculation: 5.0 MB
Maximum memory used throughout the entire GTOINT-calculation: 8.7 MB
Maximum memory used throughout the entire SCF-calculation: 6.2 MB
Maximum memory used throughout the entire SCFGRAD-calculation: 4.9 MB
Maximum memory used throughout the entire GTOINT-calculation: 8.7 MB
Maximum memory used throughout the entire SCF-calculation: 6.2 MB
Maximum memory used throughout the entire SCFGRAD-calculation: 5.1 MB
Maximum memory used throughout the entire GTOINT-calculation: 8.7 MB
Maximum memory used throughout the entire SCF-calculation: 6.2 MB
Memory available ... 1996.8 MB
Memory needed per perturbation ... 0.0 MB
- for a per-step breakdown of CPU time and memory, use
sacct --format="JobID,JobName,Elapsed,AveCPU,MinCPU,TotalCPU,Alloc,NTask,MaxRSS,State" -j <jobid>
huntpa@mahuika01 /home/huntpa $ sacct --format="JobID,JobName,Elapsed,AveCPU,MinCPU,TotalCPU,Alloc,NTask,MaxRSS,State" -j 52598705
       JobID    JobName    Elapsed     AveCPU     MinCPU   TotalCPU  AllocCPUS   NTasks     MaxRSS      State
------------ ---------- ---------- ---------- ---------- ---------- ---------- -------- ---------- ----------
52598705          water   00:10:19                        28:24.740         16                        TIMEOUT
52598705.ba+      batch   00:10:20   00:00:03   00:00:03  00:02.601         12        1     87444K  CANCELLED
52598705.ex+     extern   00:10:19   00:00:00   00:00:00  00:00.001         16        2          0  COMPLETED
52598705.0   orca_gtoi+   00:00:18   00:00:06   00:00:04  00:56.724         16        8    101592K  COMPLETED
52598705.1   orca_scf_+   00:02:27   00:00:43   00:00:26  05:51.947         16        8    100872K  COMPLETED
52598705.2   orca_scfg+   00:00:06   00:00:04   00:00:03  00:35.225         16        8     98876K   COMPLETE
... and more
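The grep output above repeats the same lines for every optimisation cycle. To pull out the single largest value (useful when choosing %maxcore), a small awk sketch, assuming ORCA keeps the "Maximum memory used throughout the entire ...: N MB" wording shown above (orca_peak_mem is a hypothetical name).

```shell
# Hypothetical helper: print the largest "Maximum memory used ..." value that
# ORCA wrote to its output, in MB, for comparison with %maxcore.
orca_peak_mem() {
    awk '/Maximum memory used throughout the entire/ {
             v = $(NF-1) + 0          # number before the trailing "MB"
             if (v > max) max = v
         }
         END { printf "%.1f MB\n", max }' "$1"
}

# Usage: orca_peak_mem workdir_52598705/water.out
```

For the water run above this would report 8.7 MB, far below the %maxcore 2000 set in the input.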
check our usage
- check core usage by the group
nn_corehour_usage vuw04056
huntpa@mahuika01 /home/huntpa $ nn_corehour_usage vuw04056
Note: Fair Share rankings will only be shown for the current cluster, mahuika.
Project vuw04056
================
Project vuw04056 on the mahuika cluster
---------------------------------------
Fair share score on mahuika: 0.998855 out of 1.0
Ranked 158th of 622 active projects (behind 25.24% of active projects)
Usage period                                CPU core hours  P100 GPU device hours  A100 GPU device hours  GB-hours of RAM  Compute units
------------                                --------------  ---------------------  ---------------------  ---------------  -------------
2025-01-14T15:00:00 to 2025-01-15T15:00:00               0                      0                      0                0              0
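nn_seff's figures for the water job can be reproduced by hand: core-walltime is the wall time multiplied by the cores (00:10:19 × 8 = 01:22:32) and CPU efficiency is the CPU time divided by that (00:28:25 / 01:22:32 = 34.4%). A sketch that does the same arithmetic, assuming core-hours are counted the same way; whether NeSI's accounting counts physical cores or logical CPUs is not stated here, so treat core_hours as an estimate. All function names are hypothetical.

```shell
# Convert a Slurm time ([D-]HH:MM:SS, HH:MM:SS or MM:SS[.sss]) to whole seconds.
to_seconds() {
    local t="${1%%.*}" days=0 h=0 m=0 s=0 a b c
    if [[ "$t" == *-* ]]; then days="${t%%-*}"; t="${t#*-}"; fi
    IFS=: read -r a b c <<< "$t"
    if [[ -n "$c" ]]; then h=$a; m=$b; s=$c; else m=$a; s=$b; fi
    # 10# forces base 10 so fields such as "08" are not read as octal
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# CPU efficiency (%) = CPU time / (wall time * cores), as nn_seff reports it.
cpu_efficiency() {
    local wall cpu
    wall=$(to_seconds "$1"); cpu=$(to_seconds "$3")
    awk -v c="$cpu" -v w="$wall" -v n="$2" 'BEGIN { printf "%.1f\n", 100 * c / (w * n) }'
}

# Core-hours for one job, assuming core-hours = wall time * cores.
core_hours() {
    local wall
    wall=$(to_seconds "$1")
    awk -v w="$wall" -v n="$2" 'BEGIN { printf "%.2f\n", w * n / 3600 }'
}

# Usage (numbers from the nn_seff output above):
#   cpu_efficiency 00:10:19 8 00:28:25   # 34.4
#   core_hours 00:10:19 8                # 1.38
```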
running an orca job
- note that you cannot use Gaussian or GaussView on NeSI
- we will be using ORCA
- check that a code is installed and available using
module spider "code"
- for example
module spider orca
----------------------------------------------------------------------------------
ORCA:
----------------------------------------------------------------------------------
Description:
ORCA is a flexible, efficient and easy-to-use general purpose tool for quantum chemistry with specific
emphasis on spectroscopic properties of open-shell molecules. It features a wide variety of standard quantum
chemical methods ranging from semiempirical methods to DFT to single- and multireference correlated ab initio
methods. It can also treat environmental and relativistic effects.
Versions:
ORCA/4.0.1-OpenMPI-2.0.2
ORCA/4.2.1-OpenMPI-3.1.4
ORCA/5.0.1-OpenMPI-4.1.1
ORCA/5.0.3-OpenMPI-4.1.1
ORCA/5.0.4-OpenMPI-4.1.5
----------------------------------------------------------------------------------
For detailed information about a specific "ORCA" module (including how to load the modules) use the module's full name.
For example:
$ module spider ORCA/5.0.4-OpenMPI-4.1.5
----------------------------------------------------------------------------------
- here is a very basic submit script
#!/bin/bash -e
#SBATCH --job-name=water
#SBATCH --time=00:10:00
#SBATCH --ntasks=8
#SBATCH --mem-per-cpu=2G
#SBATCH --error=./workdir_%j/slurm_%j.err
#SBATCH --output=./workdir_%j/water.out
echo "slurm job ID: ${SLURM_JOBID}" > ./workdir_${SLURM_JOB_ID}/water.minfo
echo "start time " >> ./workdir_${SLURM_JOB_ID}/water.minfo
date >> ./workdir_${SLURM_JOB_ID}/water.minfo
cd ./workdir_${SLURM_JOB_ID}
cp ${SLURM_SUBMIT_DIR}/water.inp .
module --quiet purge
module load ORCA/5.0.4-OpenMPI-4.1.5
# ORCA under MPI requires that it be called via its full absolute path
orca_exe=$(which orca)
# Don't use "srun" as ORCA does that itself when launching its MPI process.
${orca_exe} water.inp
# we are already inside workdir_${SLURM_JOB_ID}, so use a relative path here
echo "finish time " >> water.minfo
date >> water.minfo
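A finished ORCA run ends its output with the line ****ORCA TERMINATED NORMALLY****, while a job killed at its time limit (like the TIMEOUT example above) does not. A quick check for many work directories, assuming that marker line (orca_finished is a hypothetical name).

```shell
# Hypothetical helper: report whether an ORCA output file reached the normal
# termination line; returns non-zero if it did not.
orca_finished() {
    local out="$1"
    if grep -q "ORCA TERMINATED NORMALLY" "$out"; then
        echo "finished: $out"
    else
        echo "NOT finished: $out (check the time limit and slurm_*.err)" >&2
        return 1
    fi
}

# Usage, from the directory you submitted from:
#   for f in workdir_*/water.out; do orca_finished "$f"; done
```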