The Open Science Grid (OSG) is a consortium of research communities that promotes science by sharing computing resources.
The resources accessible through the OSG are contributed by the community, organized by the OSG, and governed by the OSG Consortium. Cores that are idle in the OSG shared pool are made available to users. These resources are opportunistic: the number of freely available cores varies over time.
High throughput workflows with simple system and data dependencies are a good fit for OSG Connect. Typically these workflows can be decomposed into multiple tasks that can be carried out independently. Ideally, these tasks will download data for input, run some computation on it and then return results (which may be used by future tasks).
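The decomposition described above can be sketched in a few lines of shell. This is only an illustration: the `run_task` function, the file names, and the summing computation are hypothetical stand-ins, and the loop runs locally rather than submitting jobs to OSG Connect.

```shell
# Hypothetical task: sum the numbers in one input file and write the result.
run_task() {
    input="$1"
    output="$2"
    awk '{ s += $1 } END { print s + 0 }' "$input" > "$output"
}

workdir="$(mktemp -d)"

# Three small inputs standing in for data a real task would download.
printf '1\n2\n3\n' > "$workdir/in_0.txt"
printf '4\n5\n'    > "$workdir/in_1.txt"
printf '6\n'       > "$workdir/in_2.txt"

# Each task reads only its own input and writes only its own output,
# so the tasks could run in any order, or as separate jobs.
for i in 0 1 2; do
    run_task "$workdir/in_$i.txt" "$workdir/out_$i.txt"
done

# A later step combines the per-task results.
total="$(cat "$workdir"/out_*.txt | awk '{ s += $1 } END { print s + 0 }')"
echo "total=$total"

rm -rf "$workdir"
```

Because no task depends on another task's output, the loop body is the unit that would become one job in a high throughput workflow.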
Jobs submitted to OSG Connect are executed on machines at several remote physical clusters. The computing environment on these machines may differ from that of the submit node. It is therefore important that jobs be as self-contained as possible, relying on generic binaries and data that can either be carried with the job or staged on demand. Please consider the following guidelines:
The following are examples of computations that are NOT good matches for OSG Connect:
## How to get help using OSG Connect
Please contact user support staff at connect-support@uchicago.edu.
Commonly used software and libraries on the Open Science Grid are available in a central repository (known as OASIS) and accessed via the module command. We will see how to search for, load, and use software packages.
We will also cover the usage of the built-in tutorial command. Using tutorial, we load a variety of job templates that cover basic usage, specific use cases, and best practices.
Log in to OSG Connect with secure shell:
$ ssh username@login.osgconnect.net
The first step in using the module command is to initialize the module system. This step consists of sourcing a shell-specific file that adds the module command to your environment. For example, the module system is initialized for bash as follows:
$ source /cvmfs/oasis.opensciencegrid.org/osg/modules/lmod/5.6.2/init/bash
For other shells such as sh, zsh, tcsh, csh, etc., you would replace bash with the shell name (e.g. zsh).
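As a minimal sketch of this substitution for Bourne-style shells (sh, bash, zsh), the helper below builds the init path from a shell name and sources it only if the file is readable. The function name `lmod_init_path` is hypothetical, and the sketch assumes the init directory holds one file per shell name, as described above.

```shell
# Hypothetical helper: build the Lmod init path for a given shell name.
# Defaults to bash when no name is given.
lmod_init_path() {
    echo "/cvmfs/oasis.opensciencegrid.org/osg/modules/lmod/5.6.2/init/${1:-bash}"
}

# Source the init file for bash only if it is readable on this machine.
init_file="$(lmod_init_path bash)"
if [ -r "$init_file" ]; then
    . "$init_file"
else
    echo "Lmod init file not found: $init_file" >&2
fi
```

The readability check is useful on machines where the repository is not mounted, since sourcing a missing file would otherwise fail with a less helpful error. For csh or tcsh the path is built the same way, but the file must be sourced from that shell rather than through this function.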
Once the distributed environment modules system is initialized, you can check the available modules:
$ module avail
--------------------------- /cvmfs/oasis.opensciencegrid.org/osg/modules/modulefiles/Core ----------------------------
atlas fftw/fftw-3.3.4-gromacs lapack lmod/5.6.2 (D) python/3.4
blast gromacs/4.6.5 lmod/SiteHook namd/2.9 settarg/5.6.2
blender jpeg lmod/SitePackage python/2.7 (D)
Where:
(D): Default Module
Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
To load a module, run "module load [modulename]". For example, to load the R package:
$ module load R
This sets up the R package for you. Now you can do some test calculations with R.
$ R # invoke R package
> cos(45) # cosine of 45 radians (R's trigonometric functions take radians)
[1] 0.525322
If you want to unload a module, type
$ module unload R
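In scripts, it can help to confirm that the module command exists before calling it, since it is only defined after the initialization step. The following is a sketch under that assumption. The function name `load_module_if_available` is hypothetical; only `module load` itself comes from the module system.

```shell
# Hypothetical guard: load a module only if the module command is defined.
load_module_if_available() {
    name="$1"
    if command -v module >/dev/null 2>&1; then
        module load "$name"
    else
        echo "module command not found; source the Lmod init file first" >&2
        return 1
    fi
}

# Example use: try to load R and report if that was not possible.
load_module_if_available R || echo "R module was not loaded" >&2
```

After a successful load, running `module list` shows the modules currently loaded in your environment.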
The built-in tutorial command assists a user in getting started on OSG. To see the list of existing tutorials, type
$ tutorial # prints a list of tutorials
For example, if you are interested in learning how to run R scripts on OSG, the tutorial command sets up the R tutorial for you.
$ tutorial R # prints the following message:
Application Example - R (statistical analysis)
This tutorial will introduce you to using the R statistical programming
language on OSG Connect. By the end of the tutorial:
* You will have set up R from the OSG OASIS service on the submit host
* You will know how to use the HAS_CVMFS_oasis_opensciencegrid_org job steering requirement.
Tutorial 'R' is set up. To begin:
cd ~/osg-R
The "tutorial R" command creates a directory "osg-R" containing the necessary scripts and input files.
mcpi.R # The example R script file
R-wrapper.sh # The job execution file
R.submit # The job submission file (discussed later in the lesson on HTCondor scripts)
Let's focus on the "mcpi.R" and "R-wrapper.sh" scripts. The details of the "R.submit" script will be discussed later when we learn HTCondor scripts.
The file "mcpi.R" is an R script that calculates the value of pi using the Monte Carlo method. The "R-wrapper.sh" script loads the R module and runs the "mcpi.R" script.
#!/bin/bash
# Use bash as the interpreter for this script.
source /cvmfs/oasis.opensciencegrid.org/osg/modules/lmod/5.6.2/init/bash
module load R # Loads the module
Rscript mcpi.R # Execution of the R script
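For readers unfamiliar with the Monte Carlo method that the R script is described as using, here is a rough sketch of the idea in shell with awk. It is not the tutorial's R code. The function name `estimate_pi`, the sample count, and the seed are all illustrative assumptions.

```shell
# Hypothetical helper: estimate pi from random points in the unit square.
# $1 = number of samples, $2 = random seed.
estimate_pi() {
    awk -v n="$1" -v seed="$2" 'BEGIN {
        srand(seed)
        inside = 0
        for (i = 0; i < n; i++) {
            x = rand()
            y = rand()
            # Count points that fall inside the quarter circle of radius 1.
            if (x * x + y * y <= 1) inside++
        }
        # The quarter circle covers pi/4 of the square, so scale by 4.
        printf "%.4f\n", 4 * inside / n
    }'
}

pi_estimate="$(estimate_pi 100000 42)"
echo "pi is approximately $pi_estimate"
```

The estimate improves slowly as the sample count grows. Runs with different seeds are independent of one another, which is what makes this kind of calculation a natural fit for many separate jobs.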
Similar to the R tutorial, there are other tutorials available on OSG. The available tutorials serve as templates to develop your own scripts and run the calculations on OSG.