Running CASTEP with gLite JDL
From ScotGrid
Introduction
ScotGrid-Glasgow supports several submission mechanisms, all through the EGEE middleware called gLite. We do not support direct submission to the batch system via qsub, as we operate a Grid cluster rather than a batch system. This page describes how to use the gLite command line tools with job descriptions written in JDL, which gives you all the flexibility required to run CASTEP both serially and with MPI. The page goes through two examples of using JDL, one for a serial run and one for an MPI run using OPENMPI. Using this method you can submit jobs from your home area $HOME or from $CLUSTER_SHARED: /cluster/share/gla*** on the ScotGrid UI svr020.
The example below was created using sample tao_cr_5.cell and tao_cr_5.param files, which require O_00.recpot and Ta_00.recpot. These would normally have been copied onto svr020 via scp and be available in the directory that you wish to submit from.
Using CASTEP with MPI (built against gfortran44/OPENMPI)
Check out our general MPI page: MPI at Glasgow
General Principles
We're using a package called mpi-start to manage the use of MPI applications. The purpose of this package is to manage some of the complexity of getting MPI to work with the heterogeneous installations across the grid. The idea is that with mpi-start the job that you submit is the same, independent of the site you submit to, and that it works equally well in all cases. In reality, there are a couple of little wrinkles left, but it's pretty good at what it does.
CASTEP Example
There are four components to the tao_cr_5 run example:
- The executable (a pre-compiled binary in this case)
- The JDL: to tell the gLite middleware what to do.
- The mpi-start-wrapper: a short launcher script
- mpi-hooks: used to set the application paths before the run and to copy the output back afterwards
executable
Since CASTEP runs using a pre-compiled binary, you just need to know where it is installed.
/expsoft/ssp/castep/5.0.1/castep-openmpi/castep-openmpi
tao_cr_5.jdl
CpuNumber = 24;
Executable = "mpi-start-wrapper.sh";
Arguments = "";
StdOutput = "tao_cr_5.stdout";
StdError = "tao_cr_5.stderr";
InputSandbox = {"tao_cr_5.param", "tao_cr_5.cell","O_00.recpot","Ta_00.recpot", "mpi-start-wrapper.sh", "mpi-hooks.sh" };
OutputSandbox = {"tao_cr_5.stdout", "tao_cr_5.stderr"};
OutputSandboxDestURI = {
"gsiftp://svr008.gla.scotgrid.ac.uk/clusterhome/home/gla0**/castep/mpi/tao2/tao_cr_5.stdout",
"gsiftp://svr008.gla.scotgrid.ac.uk/clusterhome/home/gla0**/castep/mpi/tao2/tao_cr_5.stderr"
} ;
VirtualOrganisation = "vo.ssp.ac.uk";
Requirements = other.GlueSiteUniqueID == "UKI-SCOTGRID-GLASGOW"
&& other.GlueCEImplementationName == "CREAM"
&& other.GlueCEUniqueID == "svr014.gla.scotgrid.ac.uk:8443/cream-pbs-mpi";
ShallowRetryCount = -1;
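Since the job name appears in several places in the JDL, it can be convenient to generate the file rather than edit it by hand. The following is a minimal sketch, not part of the ScotGrid tooling: the function name make_jdl and its argument order are hypothetical, and the destination URI is passed in because it depends on your own account.

```shell
#!/bin/sh
# Hypothetical helper: print a JDL like the one above for a given job name.
# Usage: make_jdl NAME CPUS DEST_URI POTENTIAL_FILE...
make_jdl() {
    name=$1; cpus=$2; dest=$3
    shift 3
    # Build the quoted, comma-separated list of pseudopotential files.
    pots=""
    for p in "$@"; do
        pots="$pots\"$p\", "
    done
    cat <<EOF
CpuNumber = $cpus;
Executable = "mpi-start-wrapper.sh";
Arguments = "";
StdOutput = "$name.stdout";
StdError = "$name.stderr";
InputSandbox = {"$name.param", "$name.cell", ${pots}"mpi-start-wrapper.sh", "mpi-hooks.sh"};
OutputSandbox = {"$name.stdout", "$name.stderr"};
OutputSandboxDestURI = {
"$dest/$name.stdout",
"$dest/$name.stderr"
};
VirtualOrganisation = "vo.ssp.ac.uk";
Requirements = other.GlueSiteUniqueID == "UKI-SCOTGRID-GLASGOW"
&& other.GlueCEImplementationName == "CREAM"
&& other.GlueCEUniqueID == "svr014.gla.scotgrid.ac.uk:8443/cream-pbs-mpi";
ShallowRetryCount = -1;
EOF
}
```

For the example on this page it could be called as make_jdl tao_cr_5 24 "gsiftp://svr008.gla.scotgrid.ac.uk/clusterhome/home/gla0**/castep/mpi/tao2" O_00.recpot Ta_00.recpot > tao_cr_5.jdl, with gla0** replaced by your own account. Note that the job name also appears in the wrapper and hooks, which this sketch does not touch.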
mpi-start-wrapper.sh
The mpi-start-wrapper is a generic launcher script. In principle, this script can be used without modification in all cases, with the necessary changes placed into the hooks and JDL. For running CASTEP with OPENMPI you can use this example. I2G_MPI_APPLICATION and I2G_MPI_APPLICATION_ARGS are just dummy values in this case: due to a local Glasgow bug, we override them in the pre-run hook in mpi-hooks.sh.
#!/bin/bash
# Flavor name in lowercase, as expected by mpi-start.
MPI_FLAVOR_LOWER=openmpi
# Pull out the correct paths for the requested flavor.
eval MPI_PATH=`printenv MPI_OPENMPI_PATH`
# Ensure the prefix is correctly set. Don't rely on the defaults.
eval I2G_OPENMPI_PREFIX=$MPI_PATH
export I2G_OPENMPI_PREFIX
# Setup for mpi-start.
# Dummy value for the moment - needs to be a real file to get the distribution working.
export I2G_MPI_APPLICATION=mpi-start-wrapper.sh
export I2G_MPI_APPLICATION_ARGS="`pwd`/tao_cr_5"
export I2G_MPI_TYPE=$MPI_FLAVOR_LOWER
# Both hooks live in mpi-hooks.sh, which is shipped in the InputSandbox.
export I2G_MPI_PRE_RUN_HOOK=mpi-hooks.sh
export I2G_MPI_POST_RUN_HOOK=mpi-hooks.sh
# If these are set then you will get more debugging information.
export I2G_MPI_START_VERBOSE=1
export I2G_MPI_START_DEBUG=0
# Invoke mpi-start.
$I2G_MPI_START
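The wrapper above hard-codes the openmpi flavor and reads its install prefix from MPI_OPENMPI_PATH. If you ever need the same lookup for another flavor, it might be sketched as below. This assumes the site exports variables named MPI_<FLAVOR>_PATH in the same pattern as MPI_OPENMPI_PATH; the function name mpi_flavor_path and the example prefix are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the value of MPI_<FLAVOR>_PATH for a flavor name.
mpi_flavor_path() {
    # Upper-case the flavor so that "openmpi" selects MPI_OPENMPI_PATH.
    upper=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    # printenv prints nothing and fails if the variable is not exported.
    printenv "MPI_${upper}_PATH"
}

# Example with a made-up install prefix; on a worker node the site sets this.
MPI_OPENMPI_PATH=/opt/openmpi-example
export MPI_OPENMPI_PATH
mpi_flavor_path openmpi
```

Because the function fails when the variable is missing, a wrapper could stop early with a clear message instead of handing an empty prefix to mpi-start.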
mpi-hooks.sh
The hooks are there to perform set up and tear down activities - some MPI scenarios require more sophisticated environments than a single process job. In some cases, you might need to use the post-hook in order to send data back - this is something that is currently under investigation. (It appears that for some scenarios the gLite OutputSandbox doesn't quite do what might be expected, so other arrangements might be needed. More to follow, but for the moment, if you have difficulty getting output back, drop us a line).
The example shown below uses the pre-hook to set the application paths correctly for running CASTEP using MPI, both when running all processes on one node and when running over multiple nodes. This pre-hook is always required when running at Glasgow. The post-hook can be tailored to suit your requirements. This post-hook tars up the directory contents and copies the archive back to the user's home directory.
Notice the use of tao_cr_5 in the following file. This would need to be changed to match the name of your cell/param files.
#!/bin/sh
pre_run_hook () {
    # Determine if mpi-start is going to change to a shared path.
    # This is straight out of mpi-start itself (0.0.52).
    # Get the node count: unique remote hosts plus this host, counted once.
    COPY_NP=$( (cat $MPI_START_MACHINEFILE | sort -u | grep -v $(hostname) | grep -v $(hostname -f) | grep -v $(hostname -s); echo $(hostname -s)) | wc -l)
    # Check if there are several hosts or not.
    if [ "x$COPY_NP" = "x1" ] ; then
        echo Single physical node detected
        # no application munging
        export I2G_MPI_APPLICATION=/expsoft/ssp/castep/5.0.1/castep-openmpi/castep-openmpi
        export I2G_MPI_APPLICATION_ARGS="`pwd`/tao_cr_5"
        # no fsp path munging
    else
        echo multiple physical nodes detected
        # More than one physical node - mpi-start munges the application path
        # FIXME: Should back step exactly the number of paths in $MPI_SHARED_HOME - this is probably not correct outside of Glasgow
        export I2G_MPI_APPLICATION=../../../expsoft/ssp/castep/5.0.1/castep-openmpi/castep-openmpi
        # This is where the FILENAME file will end up
        export I2G_MPI_APPLICATION_ARGS="$MPI_SHARED_HOME/`pwd`/tao_cr_5"
    fi
    return 0
}
post_run_hook () {
    export GLOBUS_LOCATION=/opt/globus
    export OUTPUT_PATTERN="*"
    export OUTPUT_ARCHIVE=${PBS_JOBID}.tao_cr_5.tar.gz
    echo "compressing output data"
    cd ..
    tar -cvzf $OUTPUT_ARCHIVE $OUTPUT_PATTERN
    echo "copying output back from ${PWD}"
    cmd="${GLOBUS_LOCATION}/bin/globus-url-copy file:///${PWD}/${OUTPUT_ARCHIVE} gsiftp://svr008.gla.scotgrid.ac.uk/clusterhome/home/gla0**/castep/mpi/tao2/${OUTPUT_ARCHIVE}"
    echo $cmd
    $cmd
    # Report a failed copy, but do not fail the job because of it.
    if [ $? -ne 0 ]; then
        echo "${PWD}/${OUTPUT_ARCHIVE} not available"
    fi
    return 0
}
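The node count in the pre-run hook is the least obvious line of the script, so here is a simplified sketch of what it does. It is not the mpi-start code itself: the local hostname is passed in as an argument rather than taken from hostname, the match is on whole lines, and the function name count_nodes and the node names are invented for illustration.

```shell
#!/bin/sh
# Simplified version of the node count from the pre-run hook.
# Usage: count_nodes MACHINEFILE LOCAL_HOSTNAME
count_nodes() {
    {
        # Unique hosts in the machinefile, excluding the local node...
        sort -u "$1" | grep -v -x -F "$2"
        # ...then count the local node exactly once.
        echo "$2"
    } | wc -l | tr -d ' '
}

# Hypothetical machinefiles: one line per allocated slot.
single=$(mktemp)
multi=$(mktemp)
printf 'node001\nnode001\nnode001\n' > "$single"
printf 'node001\nnode001\nnode002\nnode002\n' > "$multi"

count_nodes "$single" node001   # all slots on the local node
count_nodes "$multi" node001    # slots spread over two nodes
rm -f "$single" "$multi"
```

A result of 1 corresponds to the "Single physical node detected" branch of the hook, and anything larger to the multiple-node branch where the application path has to be adjusted.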
Running the example
- Create the files above.
- Determine the MPI queue to submit your job to. Do not submit your MPI job to another queue or it will die early.
- Generate a Proxy.
- Submit your job.
- Obtain your output.
Create the files above
Log into svr020 and create a working directory:
mkdir -p castep/mpi/tao2/
Create the files in this directory, either with vim or by copying them here from your own machine with scp.
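A job with a missing input file will only fail once it reaches the worker node, so it may be worth checking the directory before you submit. The following is a rough sketch under our own assumptions: check_inputs is a hypothetical helper, not a gLite command, and the pseudopotential files have to be listed by hand because they depend on your cell file.

```shell
#!/bin/sh
# Hypothetical pre-submission check: report any missing input file.
# Usage: check_inputs DIR NAME [EXTRA_FILE...]
check_inputs() {
    dir=$1; name=$2
    shift 2
    missing=0
    for f in "$name.cell" "$name.param" "$name.jdl" \
             mpi-start-wrapper.sh mpi-hooks.sh "$@"; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    # Non-zero exit status if anything was missing.
    return $missing
}
```

For this example you would run check_inputs castep/mpi/tao2 tao_cr_5 O_00.recpot Ta_00.recpot && echo "all inputs present".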
Determine the MPI Queue
Run the following command to determine the queues that support your VO. For MPI, you want the MPI queues.
For reference, there are two different types of endpoint: svr021/svr026 are lcg-CEs and svr014 is CREAM. More on this later.
-bash-3.00$ lcg-infosites --vo vo.ssp.ac.uk voview
Running Waiting Total Free ResponseTime WorstResponseTime
----------------------------------------------------------
....
0 0 0 0 858 11836800 svr014.gla.scotgrid.ac.uk:8443/cream-pbs-mpi
....
Since we have a specific MPI queue you should target this directly either in your JDL or from the command line submission.
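If the listing is long, the MPI queue can be picked out with a small filter. This is only a sketch: it assumes the CE identifier is the last field of each row, as in the sample above, and that MPI queues can be recognised by a name ending in -mpi, which is true of the Glasgow queue shown here but is not guaranteed elsewhere. The function name mpi_queues is our own.

```shell
#!/bin/sh
# Hypothetical filter: print CE identifiers whose queue name ends in "-mpi".
# Assumes the CE identifier is the last whitespace-separated field of a row.
mpi_queues() {
    awk '$NF ~ /-mpi$/ { print $NF }'
}

# Sample row copied from the listing above.
echo "0 0 0 0 858 11836800 svr014.gla.scotgrid.ac.uk:8443/cream-pbs-mpi" | mpi_queues
```

On the UI the same filter would be fed from the real command: lcg-infosites --vo vo.ssp.ac.uk voview | mpi_queues.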
Generate a Proxy
-bash-3.2$ voms-proxy-init -voms vo.ssp.ac.uk --valid 96:00
Enter GRID pass phrase:
Your identity: /C=UK/O=eScience/OU=Glasgow/L=Compserv/CN=douglas mcnab
Creating temporary proxy ........................ Done
Contacting svr029.gla.scotgrid.ac.uk:15006 [/C=UK/O=eScience/OU=Glasgow/L=Compserv/CN=svr029.gla.scotgrid.ac.uk/Email=grid-certificate@physics.gla.ac.uk] "vo.ssp.ac.uk" Done
Creating proxy ....................................................................... Done
Your proxy is valid until Tue Mar 2 13:35:57 2010
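A long CASTEP run can outlive a proxy that was created some time ago, so it is sensible to compare the remaining lifetime with the expected run time before submitting. The sketch below only does the arithmetic; proxy_has_hours is a hypothetical helper that takes the remaining lifetime in seconds as its first argument.

```shell
#!/bin/sh
# Hypothetical check: succeed if SECONDS_LEFT covers at least MIN_HOURS.
# Usage: proxy_has_hours SECONDS_LEFT MIN_HOURS
proxy_has_hours() {
    [ "$1" -ge $(( $2 * 3600 )) ]
}

# Example: a fresh 96-hour proxy comfortably covers a 24-hour job.
if proxy_has_hours $((96 * 3600)) 24; then
    echo "proxy lifetime is sufficient"
fi
```

On the UI the number of seconds left can usually be read with voms-proxy-info --timeleft; we are assuming here that your installed VOMS client supports that option.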
Submit the Job
There are various ways to submit your job using raw JDL: via the WMS brokering system or directly to a CREAM CE. At Glasgow we have two types of CE, lcg-CE and CREAM. lcg-CE is a legacy Globus mechanism and CREAM is the web services mechanism, which understands JDL. The WMS brokering system also understands JDL and is useful for full grid submission to other sites. This example uses brokering through the WMS, so we use the wms commands for submission and for monitoring status. The JDL above contains a requirement for a CREAM CE and, more specifically, for the MPI queue. This tells the WMS how to broker the job to our site.
In future, when gqsub supports CREAM/MPI submissions, this will all revert to the gqsub style of commands, which will be far easier. Until then:
glite-wms-job-submit -a -o tao_cr_5.jid tao_cr_5.jdl
Note: the -o tao_cr_5.jid option saves the WMS job identifier to a file. This allows you to check the status of the job later.
Check the Status
If your job has been submitted through a WMS:
glite-wms-job-status -i tao_cr_5.jid
======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:
Status info for the Job : https://svr022.gla.scotgrid.ac.uk:9000/qjhIy9cjOfzrr3WzkQMLPQ
Current Status: Ready
Destination: svr014.gla.scotgrid.ac.uk:8443/cream-pbs-mpi
Submitted: Fri Feb 26 15:07:18 2010 GMT
==========================================================================
- Check the status of the job on the batch system: pbswebmon (https://svr031.gla.scotgrid.ac.uk/pbswebmon/)
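If you only want the state of the job, for instance in a script that waits for completion, the Current Status line can be pulled out of the status report. This is a sketch based on the sample output above; job_state is a hypothetical helper and it assumes the report always contains a line starting with "Current Status:".

```shell
#!/bin/sh
# Hypothetical helper: extract the value of "Current Status:" from
# glite-wms-job-status output read on standard input.
job_state() {
    awk '/^[ \t]*Current Status:/ {
        sub(/^[ \t]*Current Status:[ \t]*/, "")   # drop the label
        sub(/[ \t]*$/, "")                        # drop trailing blanks
        print
        exit
    }'
}

# Sample lines copied from the status output above.
printf '%s\n' \
    'Status info for the Job : https://svr022.gla.scotgrid.ac.uk:9000/qjhIy9cjOfzrr3WzkQMLPQ' \
    'Current Status: Ready' \
    'Destination: svr014.gla.scotgrid.ac.uk:8443/cream-pbs-mpi' | job_state
```

With a real job this would be run as glite-wms-job-status -i tao_cr_5.jid | job_state.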
Obtaining your output
Using the standard WMS command glite-wms-job-output -i tao_cr_5.jid is not required in this case, as we have requested that stderr and stdout be returned automatically with gridftp when the job completes or fails.
All other outputs are returned by the post-hook section of mpi-hooks.sh. This has the advantage of giving you control over where your output goes, be it to your home directory or to storage, but it is unfortunately also necessary for MPI jobs at Glasgow. This is because we do not allow passwordless ssh between our worker nodes; instead we use a shared file system. This has the unfortunate side effect of confusing the gLite middleware, so you cannot get your output back in the standard way, i.e. via your JDL. On the plus side, the post-hook can be used in inventive ways to tar everything up and put it on a storage node, or just gridftp it back to your UI. The post-hook can be essentially the same for every MPI job you run.
BEWARE: CASTEP's output can be huge! tao_cr_5, for example, creates a 400MB gzipped tar file. If you are doing many runs of CASTEP, be careful to manage your user account disk space and remove any data you don't need. If you want to store the output elsewhere, we can give you another recipe for using the storage nodes at Glasgow. If you are unsure, just ask.
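One way to avoid keeping both the archive and its unpacked contents is to unpack each returned archive into its own directory and delete the archive only when that has succeeded. The following is a sketch of that idea; unpack_run is a hypothetical helper, and deleting the archive after unpacking is our own choice, so drop the rm line if you would rather keep it.

```shell
#!/bin/sh
# Hypothetical helper: report an archive's size, unpack it into a directory
# named after the archive, and delete the archive only if unpacking worked.
# Usage: unpack_run ARCHIVE.tar.gz
unpack_run() {
    archive=$1
    # Directory name is the archive name without the .tar.gz suffix.
    dest=${archive%.tar.gz}
    du -h "$archive"
    mkdir -p "$dest" &&
        tar -xzf "$archive" -C "$dest" &&
        rm -f "$archive"
}
```

It would be called on an archive returned by the post-hook, for example unpack_run followed by the name of one of the .tao_cr_5.tar.gz files in castep/mpi/tao2.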
