Grid-compatible MIPP Software


The Open Science Grid is widely seen as "the way of the future" in computing, so it is advantageous to have MIPP software that can run on the grid. This page primarily documents the scripts in BatchProc/grid.

Grid environment

One can assume a reasonable flavor of Linux, but beyond that one cannot make any assumptions about what is available on a node! Therefore, one needs a release that packages all the necessary libraries. Imperfect as it is, our approach is to bring along every library on which our software depends.

make-grid-rel

Once a MIPP release is compiled, one can create a release for use on the grid with the make-grid-rel script. The most recent releases will be kept here. Frozen releases should also be put into Enstore (/pnfs/e907/releases).
By default, symbols are stripped from the release libraries, which shrinks the file size by a factor of 2. Stripping makes the software impossible to debug, so if you want useful core dumps, run make-grid-rel --no-strip.

How portable is a "grid" release?

The answer is: not portable at all!
Although most system libraries are packaged into the release, it does not port well between different flavors of Linux. A release created on FermiLinux 4.2 (kernel 2.6.9) could not run on FermiLinux 3.x (kernel 2.4.21) or on Fedora 5 (kernel 2.6.17). Most tools depend on glibc, which makes the code hard to port. Therefore, it is highly recommended to compile the release on the same flavor of Linux that it will run on.
Note: if you know how to make a release more portable, please let me know or update make-grid-rel.
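
One way to act on the recommendation above is to record the glibc and kernel versions of the build host and compare them with those of the node before running. The following is only a sketch: the function names are hypothetical and are not part of make-grid-rel, and a matching glibc is at best a necessary condition, not a guarantee that the release will run.

```python
import platform


def describe_host():
    """Return the libc and kernel identifiers of the machine this runs on."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "libc": ("%s %s" % (libc_name, libc_version)).strip(),
        "kernel": platform.release(),
    }


def likely_compatible(build_host, this_host):
    """Assumption: treat hosts as compatible only when their libc strings match.

    The kernel string is kept for the log but is not compared, since the
    text above reports failures even between two 2.6 kernels.
    """
    return build_host["libc"] == this_host["libc"]
```

The dictionary returned by describe_host() on the build machine could be stored next to the release and checked on the node before the release is unpacked.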

Environment variables

One can count on the following environment variables being set on a grid node:

Grid scripts

grid-ana

MIPP grid scripts are designed so that most jobs are started through grid-ana. This script does the following:
  1. Creates a temporary directory $OSG_WN_TMP/<job name>.<run>
  2. Sources the grid setup script ($OSG_GRID/setup.sh)
  3. Sets up the environment to access mass storage
  4. Imports the release to be run into $OSG_APP/mipp from the mass storage releases. If the release is not there, tries wget from the MIPP page.
  5. Sources the release setup script
  6. Verifies that it can connect to the MIPP database
  7. Executes a binary (typically another script) requested by the user
  8. Removes the temporary directory created in step 1.
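
The outer shape of this wrapper (steps 1, 7 and 8) can be sketched as follows. This is not the grid-ana source: the function name and arguments are hypothetical, the fallback to the system temporary directory when $OSG_WN_TMP is unset is an assumption, and the site-specific steps 2 through 6 are left out.

```python
import os
import shutil
import subprocess
import tempfile


def run_in_scratch(job_name, run, command, tmp_root=None):
    """Run `command` inside <tmp_root>/<job_name>.<run> and always clean up."""
    # Assumption: fall back to the system temp dir if OSG_WN_TMP is not set.
    if tmp_root is None:
        tmp_root = os.environ.get("OSG_WN_TMP", tempfile.gettempdir())
    workdir = os.path.join(tmp_root, "%s.%s" % (job_name, run))
    os.makedirs(workdir)  # step 1: create the temporary directory
    try:
        # step 7: execute the requested binary from inside the scratch area
        return subprocess.call(command, cwd=workdir)
    finally:
        # step 8: remove the directory even if the command failed
        shutil.rmtree(workdir, ignore_errors=True)
```

Putting the removal in a finally clause means the scratch area is cleaned up even when the job itself fails, which matters on shared worker nodes.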

gridPass

gridPass is one of the scripts designed to be called by grid-ana. It runs passes through the data that do not require ROOT output to be saved, but do require that histograms and calib DB constants be saved and that a log entry be made in the corresponding batchproc DB table.
Currently, pass1, calign, and pass2 are to be executed this way. The script does the following:
  1. Verifies that environment is correctly set up
  2. Queries the number of subruns for that run from DB
  3. Copies subruns from mass storage to local folder
  4. Runs anamipp
  5. Collects statistics (how long the job took, the size of the histogram file) and logs them to the batchproc DB
  6. Copies histogram file to mass storage if anamipp didn't crash
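
Steps 4 through 6 amount to running the job, measuring it, and deciding whether the histogram file is worth copying. A minimal sketch of that logic follows; the function name and the returned keys are hypothetical, and treating a non-zero exit code as a crash is an assumption about how gridPass detects failure. The batchproc DB logging and the copy to mass storage themselves are not shown.

```python
import os
import subprocess
import time


def run_and_summarize(command, hist_path):
    """Run `command`, then report timing, histogram size and a copy decision."""
    start = time.time()
    exit_code = subprocess.call(command)
    elapsed_s = time.time() - start
    # A missing histogram file is reported as size 0 rather than raising.
    hist_bytes = os.path.getsize(hist_path) if os.path.isfile(hist_path) else 0
    return {
        "exit_code": exit_code,
        "elapsed_s": elapsed_s,
        "hist_bytes": hist_bytes,
        # Assumption: only copy when the job exited cleanly and wrote output.
        "copy_hist": exit_code == 0 and hist_bytes > 0,
    }
```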

gridSubmitPass

The script is installed as a binary, but it is intended to be run from the BatchProc/grid directory, where gridPass.cndr is found.
It requires 3 arguments
gridSubmitPass will redirect job log files to the ~/scratch/log/<pass>-<month><day> directory. It will refuse to submit jobs if that directory does not exist.
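
Because submission is refused when the log directory is missing, it can help to compute the expected path ahead of time. The sketch below assumes that <month><day> means two-digit numeric month and day (for example 0113); the actual format used by gridSubmitPass is not stated here, and both function names are hypothetical.

```python
import datetime
import os


def log_dir_for(pass_name, today=None, home=None):
    """Build ~/scratch/log/<pass>-<month><day> (assumed format: MMDD)."""
    if today is None:
        today = datetime.date.today()
    if home is None:
        home = os.path.expanduser("~")
    return os.path.join(home, "scratch", "log",
                        "%s-%s" % (pass_name, today.strftime("%m%d")))


def require_log_dir(path):
    """Mirror the documented behaviour: stop if the directory is missing."""
    if not os.path.isdir(path):
        raise SystemExit("log directory %s does not exist; create it first" % path)
    return path
```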
Two more options are found in the body of the script, since they don't change that frequently:
Note that gridPass appends run/1000 to the directory name, so make sure the right directory structure exists before you submit a pass. Otherwise, because srmcp does not overwrite files, you will not have any histogram files from the pass.
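
A quick pre-submission check can list which run/1000 subdirectories are still missing. This sketch assumes run/1000 is integer division (so run 15432 goes to subdirectory 15) and that the output area is visible as an ordinary mounted path; the function names and the base directory in the usage note are hypothetical.

```python
import os


def hist_subdir(base_dir, run):
    """Directory for a run's histogram file: <base_dir>/<run // 1000>."""
    return os.path.join(base_dir, str(run // 1000))


def missing_subdirs(base_dir, runs):
    """Return the sorted, de-duplicated subdirectories that do not exist yet."""
    wanted = {hist_subdir(base_dir, run) for run in runs}
    return sorted(path for path in wanted if not os.path.isdir(path))
```

For example, missing_subdirs("/some/output/area", [14999, 15000, 15432]) would report at most two paths, one ending in 14 and one ending in 15.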

gridUserJob / gridSubmitUserJob

These are similar to gridPass and gridSubmitPass, except that no log is made in the batchproc DB and, by default, histogram files are not saved. Any xml file in the release can be run.
Ideally, histogram files should go to the volatile dCache area.

Choosing grid sites

The site a job goes to is selected by setting GlobusScheduler in the condor job description (e.g. gridPass.cndr).
Other sites can be identified by going here. At this time, I don't know how to find out the jobmanager name. Sending to atlas.iu.edu/jobmanager worked.
Before you send your jobs to a job manager, find out how many slots you may get. On the GP farm, the MIPP quota is 100 as of Jan, 2007. Other sites may limit the number of jobs that VOs can run simultaneously.
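
Retargeting a job description to another site comes down to rewriting a single line. The helper below is a sketch that edits the text of a description in memory; the function name is hypothetical, and the contents of gridPass.cndr are not reproduced here, so any sample text passed to it is illustrative only.

```python
import re


def set_globus_scheduler(description, scheduler):
    """Return `description` with its GlobusScheduler value replaced.

    The key is matched case-insensitively; a ValueError is raised when the
    description has no GlobusScheduler line, so a typo cannot pass silently.
    """
    pattern = re.compile(r"^([ \t]*globusscheduler[ \t]*=[ \t]*).*$",
                         re.IGNORECASE | re.MULTILINE)
    # A function is used as the replacement so that backslashes in
    # `scheduler` are never interpreted as regex escapes.
    updated, count = pattern.subn(lambda m: m.group(1) + scheduler, description)
    if count == 0:
        raise ValueError("no GlobusScheduler line found")
    return updated
```

Calling set_globus_scheduler(text, "atlas.iu.edu/jobmanager") would point the description at the site mentioned above while leaving every other line untouched.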


Andre Lebedev - January 13, 2006