Scaling up the computational resources is a big advantage for doing certain large scale calculations on OSG. Consider the extensive sampling for a multi-dimensional Monte Carlo integration or molecular dynamics simulation with several initial conditions. These type of calculations require submitting lot of jobs.
In one of the previous example, we submitted the job to a single worker machine. About a million CPU hours per day are available to OSG users on an opportunistic basis. Learning how to scale up and control large number of jobs is important to realizing the full potential of distributed high throughput computing on the OSG.
In this section, we will see how to scale up the calculations with simple examples. The examples are based on R and MATLAB Runtime. Once we understand the basic HTCondor script, it is easy to scale up.
$ tutorial ScalingUp-R
$ cd tutorial-ScalingUp-R
As we discussed in the previous section on HTCondor scripts, we need to prepare the job execution script and the job submission file.
If we want to submit several jobs, we need to track log, out and error files for each job. An easy way to do this is to add the $(Cluster) and $(Process) macros to the HTCondor submit file.
universe = vanilla
Executable = R-wrapper.sh
arguments = mcpi.R $(Process)
transfer_input_files = mcpi.R # mcpi.R is the R program we want to run
output = Log/job.out.$(Cluster).$(Process)
error = Log/job.error.$(Cluster).$(Process)
log = Log/job.log.$(Cluster).$(Process)
requirements = (HAS_CVMFS_oasis_opensciencegrid_org =?= TRUE) # Checks if OASIS available
queue 100
Note the Queue 100
. This tells Condor to enqueue 100 copies of this job
as one cluster.
Let us take a look at the execution script, R-wrapper.sh
#!/bin/bash
source /cvmfs/oasis.opensciencegrid.org/osg/modules/lmod/current/init/bash
module load R
Rscript $1 > mcpi.$2.out
The wrapper loads the R module and then executes the script with Rscript utility. From the submit file described above, the first argument is the name of the R program - mcpi.R and the second argument is the process number. The process number is a sequence of integers and used here to name the output files.
You'll see something like the following upon submission:
$ condor_submit R.submit
Submitting job(s).........................
100 job(s) submitted to cluster 837.
Now you have submitted the an ensemble of 100 jobs. The jobs should be finished quickly (less than few minutes). You can check the status of the submitted job by using the condor_q
command as follows
$ condor_q username # The status of the job is printed on the screen. Here, username is your login name.
Before we continue, let's look at a URL: your Duke CI Connect home page. If you have not signed in, you'll be redirected back to the main site. Sign In as you did the first time you signed up, and then click again on the your Duke CI Connect home link.
You see a number of graphs and plots here showing things happening in OSG Connect. We'll go over these briefly, then return later.
We're waiting on 100 jobs. Let's use connect watch
to
watch for job completions. As soon as you see some jobs enter R state
(running), press control-C, and let's introduce a new command:
$ connect histogram
Val |Ct (Pct) Histogram
unl.edu |46 (68.66%) ████████████████████████████████▏
bu.edu |13 (19.40%) █████████▏
uconn.edu |2 (2.99%) █▌
CRUSH-OSG-10-5-220-34 |1 (1.49%) ▊
ufhpc |1 (1.49%) ▊
LAW-D-SBA01-S2-its-c6-osg-20141013|1 (1.49%) ▊
CRUSH-OSG-10-5-10-33 |1 (1.49%) ▊
iu.edu |1 (1.49%) ▊
vt.edu |1 (1.49%) ▊
This command gives us a simple histogram of where on the grid our jobs are running. The column on the left is (for the most) a list of sites that OSG jobs run on. At times we don't correctly group job locations together. For example, the two rows for CRUSH-* above are really the same site, but histogram doesn't know about that site (yet) so it displays as two. But most of the big sites are mapped correctly. You see that in my case, 67 of my 100 jobs have begun running, and among them 69% (46 of 67) are running at University of Nebraska at Lincoln.
connect histogram
gives metrics on current jobs. As jobs complete,
they no longer appear. How to see where jobs have already run? connect
histogram --last
shows the run sites of your last job cluster.
$ connect histogram --last
Val |Ct (Pct) Histogram
uc3 |49 (49.00%) ████████████████████████████████▏
bu.edu |21 (21.00%) █████████████▊
uconn.edu |11 (11.00%) ███████▎
unl.edu |9 (9.00%) ██████
mwt2.org |3 (3.00%) ██
c5a-s22.ufhpc |3 (3.00%) ██
LCS-215-021-S2-its-c6-osg-20141013|3 (3.00%) ██
cinvestav.mx |1 (1.00%) ▊
Once the jobs are completed, you might want to invoke the script
$./mcpi_ave.bash
to compute the average value of pi from all the available outputs.
MATLAB® is a licensed high level language and interactive modeling and development toolkit. The MATLAB Compiler™ lets you share MATLAB programs as standalone applications. All applications created with MATLAB Compiler use MATLAB Runtime™ (MCR), which enables royalty-free deployment and use.
MATLAB Compiler is invoked with mcc
. Most toolboxes and user-developed interfaces are supported. For more details, check the list of supported toolboxes and⋅
ineligible programs. MATLAB Runtime is available on all OSG
sites using the OASIS software service using module
commands.
MATLAB's Optimization Toolbox tackles wide variety of solvers such as linear programming, mixed-integer linear programming, quadratic programming, nonlinear optimization. In this tutorial, we learn how to use simulated annealing to find the minimum of a function. The test function is the well known Rosenbrock function.
Fig.1. Two dimensional Rosenbrock function along x-y plane.⋅⋅
It is easiest to start with the tutorial
command. In the command prompt, type
$ tutorial tutorial-matlab-SimulatedAnnealing # Copies input and script files to the directory tutorial-matlab-SimulatedAnnealing.
This will create a directory tutorial-matlab-SimulatedAnnealing
. Inside the directory, you will see the following files
SA_Opt.m # matlab script - simulated annealing optimization of the function `simple_objective.m`
simple_objective.m # matlab script - defines the actual objective function
SA_Opt # matlab compiled binary
SA_Opt.submit # Condor job description file
SA_Opt.sh # Executable file
Log/ # Directory to copy standard output, error and log files from condor jobs.
Lets take a look at the objective function that takes an argument of an array x. The size of the array is two.
function y = simple_objective(x)
y = (1-x(1))^2 + 100*(x(2)-x(1)^2)^2;
The objective function is defined in the matlab file simple_objective.m
. The two dimensional Rosenbrock
function defined here has a minimum at (1,1)
. Rosenbrock function is one of the test function used to test
the robustness of an optimization method. Now let us see how the simulated annealing performs.
The simulated annealing script calls the objective function and optimizes via simulannealbnd
.⋅
%Simulated Annealing optimization of a given function
function SA_Optimization(fnumber)
filenumber = num2str(fnumber);
outfilename = sprintf ( '%s%s%s', 'rosen-sa-opt', filenumber, '.dat' );
fileID = fopen(outfilename,'w');
ObjectiveFunction = @simple_objective;
rng('shuffle');
for n = 1:1:5
rx = rand(1:2)*100.0;
X0 = [2.0 2.0] + rx; % Starting point
[x,fval,exitFlag,output] = simulannealbnd(ObjectiveFunction,X0) %
fprintf(fileID,'f= %12.6f x= %9.5f y= %9.5f x0= %9.5f y0=%9.5f\n', fval, x, X0 );
end
fclose(fileID);
In the above script, the optimizatoin is repeated for five times with random intial conditions.⋅
We need to compile the matlab script on a machine with license. At present, we don't have license
for matlab on OSG-Conect. On a⋅machine with matlab license, invoke the compiler mcc
. It is
important to turn off all⋅ graphical options (-nodisplay), disable Java (-nojvm), and instruct MATLAB to run this⋅
program as a single-threaded application (-singleCompThread). The flag -m means c
language
translation during compilation.
mcc -m -R -singleCompThread -R -nodisplay -R -nojvm SA_Opt.m
would produce the files: SA_Opt, run_SA_Opt.sh, mccExcludedFiles.log and readme.txt
. The file SA_Opt
⋅
is the compiled binary file that we would like to run on OSG Connect.⋅
Check further details of compilation process in the [lesson on basics of MATLAB compilation] (https://support.opensciencegrid.org/support/solutions/articles/5000660751-basics-of-compiled-matlab-applications-hello-world-example), we need to compile the matlab script on a machine with license.
Let us take a look at SA_Opt.submit
file:⋅
Universe = vanilla # One OSG Connect vanilla, the prefered job universe is "vanilla"
Executable = SA_Opt.sh # Job execution file which is transfered to worker machine
Arguments = $(Process) # process ID passed as an argument
transfer_input_files = SA_Opt # list of file(s) need be transfered to the remote worker machine⋅
Output = Log/job.$(Process).out⋅ # standard ouput⋅
Error = Log/job.$(Process).err # standard error
Log = Log/job.$(Process).log # log information about job execution
requirements = HAS_CVMFS_oasis_opensciencegrid_org =?= True && OpSysMajorVer > 5 # Check if the worker machine has CVMFS⋅
queue 10 # Submit 10 jobs
The above job description instructs condor to submit 10 jobs. Each job would start with different random⋅ intial conditions.⋅
The executable is a wrapper⋅ script SA_Opt.sh
#!/bin/bash⋅
source /cvmfs/oasis.opensciencegrid.org/osg/modules/lmod/current/init/bash
module load matlab/2014b
chmod +x SA_Opt
./SA_Opt $1
that loads the module matlab/2014b
and executes the MATLAB compiled binary SA_Opt
. The only required⋅
argument is a numerical label that would be attached with the name of the output file.
We submit the job using condor_submit
command as follows
$ condor_submit SA_Opt.submit # Submit the condor job description file "SA_Opt.submit"
Now you have submitted the an ensemble of 10 jobs. The jobs should be finished quickly (less than few minutes). You can check the status of the submitted job by using the condor_q
command as follows
$ condor_q username # The status of the job is printed on the screen. Here, username is your login name.
Each job produce rosen-sa-opt$(Process).dat file, where $(Process) is the process ID that runs from 0 to 9.⋅
After all jobs finished, we want to gather the output data. The script post-script.bash
gathers the⋅
output values and numerically sort them according to function values.⋅
$ ./post-script.bash
For assistance or questions, please email the OSG User Support team at user-support@opensciencegrid.org or visit the help desk and community forums.
connect histogram
gives a nice plot of resource assignments.