Introduction

The aim of this wiki is to help new users get started running jobs on Victoria's HPC Raapoi and to take you through:

What changes you need in your .com files
Setting up your directories and naming files
Creating a run script
Running your first job

An important resource is the Raapoi wiki

Your com file

A .com file created on your Mac will not run on the HPC as it stands; it needs some additional information.
Add the following at the top of your .com file:

- %mem= for how much memory is required

- %nprocs= for how many processors are required

as a default you should use %mem=30GB and %nprocs=16
the following is the first part of a test.com file set up for the HPC

%nprocs=16
%mem=30GB
%chk=test.chk
# hf/3-21g geom=connectivity

Title Card Required

0 1
 C

See the page on memory and disk for further information on non-default usage of CPUs and memory
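As a minimal sketch of the step above, the following bash function prepends the two resource lines to a .com file that does not yet have them. The function name add_hpc_header is hypothetical, and the defaults (16 processors, 30GB) are simply the ones suggested above; it is not part of Gaussian or of Raapoi.

```bash
#!/bin/bash
# Sketch only: add_hpc_header is a hypothetical helper, not a Raapoi tool.
# Usage: add_hpc_header FILE [NPROCS] [MEM]
add_hpc_header() {
    local com_file=$1 nprocs=${2:-16} mem=${3:-30GB} header=""
    # Only add a line the file does not already set (case-insensitive match).
    grep -qi '^%nproc' "$com_file" || header+="%nprocs=${nprocs}"$'\n'
    grep -qi '^%mem=' "$com_file" || header+="%mem=${mem}"$'\n'
    [ -z "$header" ] && return 0
    local body
    # The trailing "x" stops the shell stripping the blank line(s) that end
    # the input file; it is removed again when the file is written back.
    body=$(cat "$com_file"; printf x)
    printf '%s%s' "$header" "${body%x}" > "$com_file"
}
```

For example, add_hpc_header test.com uses the defaults, while add_hpc_header test.com 8 16GB requests 8 processors and 16GB. Running it a second time on the same file changes nothing.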

Set up your directories

You have two directories on Raapoi: one on /nfs/home and one on /nfs/scratch

- nfs stands for Network File System

your "home" directory is on the login node

- use your home directory to stage your jobs and back-up for completed jobs

- the home directory is backed-up

your "scratch" directory is on a huge shared disk

- use your scratch directory to run your jobs

- the scratch directory is NOT BACKED-UP

- this means you should assume that all the files on scratch could potentially be lost

organise your files (YES DO IT NOW)

- each molecule should have its own folder

- inside each folder have ALL the different conformers, use the filename to differentiate them, e.g. add _xx.com etc. where xx designates the conformer

- NEVER delete a *.log unless Tricia says it's OK

- I add an _1 _2 _3 etc for each run of a particular conformer

- in the file name indicate if the job is

opt (optimisation)

fopt (frequency and optimisation)

freq (frequency only)

pop (some form of population analysis)

nmr (nmr computation)

some examples for a water dimer: dimer A took 3 runs to fully optimise, then a frequency analysis was carried out (confirming a minimum), followed by a population analysis. Dimer B has only just been started and has run once.

water_dimer_A_opt_1.com
water_dimer_A_opt_2.com
water_dimer_A_opt_3.com
water_dimer_B_opt_1.com
water_dimer_A_freq.com
water_dimer_A_pop.com
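The run-numbering convention above can be sketched as a small bash function that prints the next unused file name for a conformer and job type. The name next_run_name is hypothetical; the sketch assumes it is run inside the molecule's folder and that the file names follow the pattern shown in the examples.

```bash
#!/bin/bash
# Sketch only: next_run_name is a hypothetical helper.
# Usage: next_run_name STEM JOBTYPE   (e.g. next_run_name water_dimer_A opt)
next_run_name() {
    local stem=$1 jobtype=$2 n=1
    # Count upwards until a name is found that is not already in use.
    while [ -e "${stem}_${jobtype}_${n}.com" ]; do
        n=$((n + 1))
    done
    printf '%s_%s_%d.com\n' "$stem" "$jobtype" "$n"
}
```

With the six example files above in the folder, next_run_name water_dimer_A opt prints water_dimer_A_opt_4.com and next_run_name water_dimer_B opt prints water_dimer_B_opt_2.com.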

Runscript

You need to submit your job through a batch queuing system or scheduler
copy the runscript given below into a file called rung16.sh
place rung16.sh file in the same directory as your com file
change the following

- abc=your job name

- username=your user name

you can use vi global replace to change all abc and all username

- in command mode

- :%s/search_string/replacement_string/g

then you will need to run the script which will submit your job to the batch processing facility

- type sbatch rung16.sh

- if successful you will see something like Submitted batch job 295704
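If you would rather not edit the file by hand in vi, the same global replace can be sketched with sed. The function name customise_runscript and the user name jsmith in the usage line are hypothetical, and the sketch assumes the job name and user name contain only letters, digits, underscores and hyphens, and that the job name does not itself contain the text "username".

```bash
#!/bin/bash
# Sketch only: customise_runscript is a hypothetical helper.
# Usage: customise_runscript TEMPLATE JOBNAME USER > NEW_RUNSCRIPT
customise_runscript() {
    local template=$1 job=$2 user=$3
    # Same effect as :%s/abc/JOBNAME/g followed by :%s/username/USER/g in vi;
    # the template itself is left untouched and the result goes to stdout.
    sed -e "s/abc/${job}/g" -e "s/username/${user}/g" "$template"
}
```

For example, customise_runscript rung16.sh water_dimer_A_opt_1 jsmith > run_water_dimer_A_opt_1.sh writes a ready-to-submit copy and keeps rung16.sh as a reusable template.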

runscript

#!/bin/bash
#SBATCH --job-name=abc
#SBATCH --cpus-per-task=16
#SBATCH --mem=32GB
#SBATCH --partition=quicktest
#SBATCH --time=01:00:00
#SBATCH -o /nfs/scratch/username/abc.log
#SBATCH -e /nfs/scratch/username/abc.err

# stage the input file from home to scratch
cp /nfs/home/username/abc.com /nfs/scratch/username/abc.com

# stage the checkpoint file too, if one exists from an earlier run
if [ -r /nfs/home/username/abc.chk ]
then
  cp /nfs/home/username/abc.chk /nfs/scratch/username/abc.chk
fi

# run Gaussian 16 from the scratch directory
cd /nfs/scratch/username/ || exit 1
module --quiet purge
module load gaussian/g16
g16 abc.com

# copy the results back to home
if [ -r /nfs/scratch/username/abc.log ]
then
  cp /nfs/scratch/username/abc.log /nfs/home/username/abc.log
fi

if [ -r /nfs/scratch/username/abc.chk ]
then
  cp /nfs/scratch/username/abc.chk /nfs/home/username/abc.chk
fi
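The runscript above requests the same number of CPUs as the example .com file (16) and slightly more memory (32GB against %mem=30GB). Assuming that relationship is intended, a quick pre-submission check can be sketched as below. The function name check_resources is hypothetical, and the sketch only understands the lowercase %nprocs= and %mem= spellings and memory values written in GB, as in the examples on this page.

```bash
#!/bin/bash
# Sketch only: check_resources is a hypothetical helper.
# Usage: check_resources FILE.com RUNSCRIPT.sh
check_resources() {
    local com_file=$1 runscript=$2
    local com_cpus com_mem slurm_cpus slurm_mem
    # Pull the first matching number out of each file.
    com_cpus=$(sed -n 's/^%nprocs=\([0-9]*\).*/\1/p' "$com_file" | head -n 1)
    com_mem=$(sed -n 's/^%mem=\([0-9]*\)GB.*/\1/p' "$com_file" | head -n 1)
    slurm_cpus=$(sed -n 's/^#SBATCH --cpus-per-task=\([0-9]*\).*/\1/p' "$runscript" | head -n 1)
    slurm_mem=$(sed -n 's/^#SBATCH --mem=\([0-9]*\)GB.*/\1/p' "$runscript" | head -n 1)
    if [ -z "$com_cpus" ] || [ -z "$com_mem" ] || [ -z "$slurm_cpus" ] || [ -z "$slurm_mem" ]; then
        echo "could not read all four values" >&2
        return 2
    fi
    if [ "$com_cpus" -ne "$slurm_cpus" ]; then
        echo "mismatch: %nprocs=${com_cpus} but --cpus-per-task=${slurm_cpus}" >&2
        return 1
    fi
    if [ "$com_mem" -gt "$slurm_mem" ]; then
        echo "mismatch: %mem=${com_mem}GB exceeds --mem=${slurm_mem}GB" >&2
        return 1
    fi
    echo "ok: ${com_cpus} CPUs, %mem=${com_mem}GB within --mem=${slurm_mem}GB"
}
```

One way to use it is check_resources abc.com rung16.sh && sbatch rung16.sh, so that the job is only submitted when the two files agree.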

running the job on a specific node

- add the following line in your runscript

#SBATCH --nodelist=node_name

- node_name can refer to any node, such as itl02n01, itl02n02, etc.

- you can specify more than one node in the list, separating them with commas

- the job scheduler will allocate only the nodes explicitly listed in the option

- if the specified nodes are not available, the job will wait in the queue until those nodes become available
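As a small illustration of the comma-separated form, the sketch below builds the directive from a list of node names. The function name nodelist_line is hypothetical; the node names in the usage line are the ones mentioned above.

```bash
#!/bin/bash
# Sketch only: nodelist_line is a hypothetical helper.
# Usage: nodelist_line NODE [NODE...]
nodelist_line() {
    # With IFS set to a comma, "$*" joins all arguments with commas.
    local IFS=,
    printf '#SBATCH --nodelist=%s\n' "$*"
}
```

For example, nodelist_line itl02n01 itl02n02 prints #SBATCH --nodelist=itl02n01,itl02n02, which can then be pasted in with the other #SBATCH lines of the runscript.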

Monitoring your Job

Now that your job has been submitted you can monitor it using the command vuw-myjobs. This gives you the status of your jobs in the queues. Useful commands include:

vuw-myjobs to get your jobs that are running

vuw-alljobs to get a list of all the jobs that are running

vuw-job-history to get a quick view of all the jobs completed within the last 5 days

To delete a job from the queue you can use the command:

scancel [jobID]

The jobID is the first number in each row when you view the running jobs; for example, the jobIDs of the two jobs below are 472471 and 472470

QUICK TEST PARTITION QUEUE: quicktest (Default partition) 
          JOBID             NAME       USER    CPUS MIN_MEM         TIME    TIME_LEFT      STATE NODELIST(REASON)
         472471         DOS_Full   holmeswi      64    100G        17:03        42:57    RUNNING itl02n02
         472470         DOS_Full   holmeswi      64    100G        17:07        42:53    RUNNING itl02n01

To delete all your jobs from the queue, where [user] is your username:

scancel -u [user]
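When you want to cancel some jobs but not all of them, it can help to pull the jobIDs out of the listing first. The sketch below assumes the listing has the layout of the example output above, with the jobID as the first column of each job row; the function name job_ids is hypothetical.

```bash
#!/bin/bash
# Sketch only: job_ids is a hypothetical helper.
# Reads a job listing on standard input and prints one jobID per line.
job_ids() {
    # Keep only rows whose first column is entirely digits; this skips the
    # queue title and the column-header row.
    awk '$1 ~ /^[0-9]+$/ { print $1 }'
}
```

For example, vuw-myjobs | job_ids prints the jobIDs one per line, and each of them can then be passed to scancel individually.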

Keep checking your job until it has run.

If your job has run and "completed", the .log file should be copied back to your working directory; check this file to see if your job was successful. Your job can finish and NOT complete successfully; it is up to you to check each "completed" job to make sure it is OK. The job can "complete" with an error message; see the "checking your job" section if there has been a problem.

You will also find a file with the extension .err. If there has been an error it will be detailed in this file, along with the resources requested and used by your job.
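A first check of a returned .log file can be sketched as follows. Gaussian writes a line containing "Normal termination" at the end of a job that finished cleanly and a line containing "Error termination" when it stopped on an error; the function name check_log is hypothetical. A normal termination only means Gaussian finished without crashing, so you still need to read the log to confirm the result is what you wanted.

```bash
#!/bin/bash
# Sketch only: check_log is a hypothetical helper.
# Usage: check_log FILE.log
check_log() {
    local log=$1
    if [ ! -r "$log" ]; then
        echo "missing: $log"
        return 2
    fi
    # Look only at the end of the file, so an earlier step that finished
    # normally does not hide a later failure.
    if tail -n 5 "$log" | grep -q 'Normal termination'; then
        echo "ok: $log"
    elif grep -q 'Error termination' "$log"; then
        echo "error: $log"
        return 1
    else
        echo "incomplete: $log"
        return 1
    fi
}
```

To check every log in a molecule's folder, a loop such as for f in *.log; do check_log "$f"; done prints one status line per file.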

Mod:Hunt Research Group/Running jobs on the HPC
