Submitting User Jobs to the grid

Introduction

When your anamipp jobs get too long to run on your local workstation, or you just crave more statistics for your latest analysis, you may want to process data on the grid, getting results for tens or hundreds of runs in the time it takes to do one run on your computer.
gridSubmitUserJob is a script that lets you do this. It is mentioned on the grid page, but that page lacks step-by-step instructions, so here they are.

When to run on the grid

Of course you don't have to run on the grid if the dsts contain enough information for your analysis. Submit gridUserJobs if you need to re-reconstruct some of the data or need access to the hit information.
If you want to run e907mc jobs, you can use gridSubmitE907MC. It can write to either volatile or tape-backed dCache. Most of the information here applies to gridSubmitE907MC, too.

What a gridUserJob does

A gridUserJob runs anamipp with your custom job description on the runs you specify, producing histogram/dst output in volatile dCache. Once the files are there, you can use them directly or dccp them to local disk space. You can run on MC runs, but you may have to tweak the script for non-standard file locations.
The gridUserJob runs on the grid, so it will use a grid-release of the MIPP software. A grid-release is just a tarball of all the MIPP executables, shared libraries, geometry files, and Bfield files.
gridUserJobs can run on any site in the Open Science Grid. The instructions here are tailored for the fngp-osg cluster, the FermiNational GeneralPurpose - OpenScienceGrid cluster. Running on other sites is similar.
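
To illustrate the dccp step mentioned above, here is a minimal sketch of a loop that copies job output to local disk. It is not part of the MIPP scripts. The function name copy_results, the COPY_CMD variable, the placeholder source directory, and the assumption that the output files end in .root are all made up for this example; it also assumes dCache is visible as a regular directory on the machine where you run it.

```shell
#!/bin/sh
# Sketch only: copy job output files from a dCache directory to local disk.
# copy_results and COPY_CMD are hypothetical names introduced for this example.
# COPY_CMD defaults to dccp; set COPY_CMD=cp to try the loop on ordinary files.
copy_results() {
    src_dir=$1
    dest_dir=$2
    copy_cmd=${COPY_CMD:-dccp}

    mkdir -p "$dest_dir"
    # Assumption: the job output files end in .root.
    for f in "$src_dir"/*.root; do
        [ -e "$f" ] || continue          # nothing matched the pattern
        dest="$dest_dir/$(basename "$f")"
        [ -e "$dest" ] && continue       # already copied, skip it
        "$copy_cmd" "$f" "$dest" || echo "failed to copy $f" >&2
    done
}

# Example call (the source path is a placeholder, not a real MIPP location):
# copy_results /pnfs/path/to/your/output ./grid-results
```

Skipping files that already exist locally means you can rerun the loop while more jobs are still finishing.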

Step by Step

These instructions have a lot of detail, but the procedure is much easier than the length of this page may lead you to believe.
  1. Get ready to submit the job:
    1. Decide what release you want to use:
      A tagged release allows you to work with dsts later. If you use the development release and your job creates dsts, the dsts may become unusable once the development release changes. This is not a problem if you only generate histograms. An older tagged release may not contain your job description or job module. If you need to tag a new release, feel free to do so.
    2. Test that your job works:
      Run anamipp with the job description you plan to use from the release you plan to use: setrel myrelease; anamipp -x myJob.xml -n 100 -i one-of-the-runs-you-want-to-use.root
      Fix problems and repeat until there are no problems, i.e. the job does not crash and your histograms/dsts get filled.
    3. Put a grid-release from the release you want to use (tagged or development) into the release directory on the MIPP web site:
      If you want to run with a tagged release, e.g. R08.03.14_Linux2.6, it may be there already. (If a development release is there, it may be out of date.)
      Otherwise see the details on how to do this.
      For this example we use release R08.03.14_Linux2.6. You may have development_Linux2.6 instead.
    4. Make a runlist:
      If you want to submit only one or two runs, just remember the run numbers. Otherwise make a text file with a single run number per line. I assume below that you call it my-runlist.txt.
      Copy the runlist to the system you will use to submit the job:
      scp my-runlist.txt e907@fnpcsrv1.fnal.gov:
      You need to have a Kerberos ticket for this. You also need to be listed in the .k5login file.
  2. Submit the job:
    You will log in to the head node of the grid cluster (fnpcsrv1) and execute gridSubmitUserJob. That script will submit your jobs to the grid worker nodes. Each worker node will run anamipp on one run and copy the results to dCache.
  3. Use the results:
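
For the runlist in step 1.4 above, a minimal sketch of how you might build and check the file is shown here. The run numbers are placeholders and not real MIPP runs. If your runs are not contiguous, just write the numbers into the file by hand and keep only the check.

```shell
#!/bin/sh
# Sketch only: build a runlist with a single run number per line.
# 15000-15004 are placeholder run numbers chosen for this example.
first_run=15000
last_run=15004
seq "$first_run" "$last_run" > my-runlist.txt

# Sanity check: complain if any line is not a bare run number.
if grep -qvE '^[0-9]+$' my-runlist.txt; then
    echo "my-runlist.txt contains lines that are not plain run numbers" >&2
else
    echo "my-runlist.txt: $(wc -l < my-runlist.txt | tr -d ' ') runs"
fi

# Then copy it to the submission host as described in step 1.4
# (needs a Kerberos ticket):
# scp my-runlist.txt e907@fnpcsrv1.fnal.gov:
```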

How to tag a new release

Test that all your changes work, then run make-release.pl. The detailed instructions are here.

How to make a grid release

See here.

Holger Meyer