Distributed High Throughput Computing

Registration: Please visit the OIT website

Place: The Edge Workshop Room, Bostock Library, Duke University

Time: 9:00 AM - 12:00 PM (Part-1) and 1:30 PM - 4:30 PM (Part-2), Oct 29th, 2015

This Duke-OSG event is run by the Open Science Grid (OSG) in collaboration with Software Carpentry and Duke Research Computing. The OSG is a national-scale distributed infrastructure for scientific computing. Software Carpentry's mission is to help scientists and engineers become more productive by teaching them basic lab skills for computing, such as program design, version control, data management, and task automation. Duke Research Computing offers services that support research computing “as it is practiced” across Duke, often in collaboration with researchers at other institutions.

Come to this session to learn about distributed high throughput computing (DHTC) and the basics of grid computing. Topics include:

  • What distributed high throughput computing is
  • Best practices for DHTC
  • How to split your computation into many independent jobs
  • How to manage a scientific workflow
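
As a rough illustration of what splitting a computation into many independent jobs can look like, here is a minimal Python sketch. It is not taken from the workshop material: the function name split_into_jobs and the example inputs are hypothetical, and the sketch assumes the work items can be processed in any order without sharing state.

```python
def split_into_jobs(items, n_jobs):
    """Split items into at most n_jobs contiguous, near-equal chunks.

    Each chunk could then be handed to one independent job.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    items = list(items)
    # Every chunk gets `base` items; the first `extra` chunks get one more.
    base, extra = divmod(len(items), n_jobs)
    chunks = []
    start = 0
    for i in range(n_jobs):
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # fewer items than jobs: stop rather than emit empty chunks
        chunks.append(items[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    # Hypothetical example: ten work items spread over three jobs.
    for job_id, chunk in enumerate(split_into_jobs(range(10), 3)):
        print(job_id, chunk)
```

Because no chunk depends on another, each one can run on a different machine, which is the property that makes a workload a good fit for DHTC.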

Setup Instructions

  • We will do all the exercises on login.duke.ci-connect.net.
  • If you do not have an account on duke.ci-connect, please sign up.
  • If you have an account on duke.ci-connect but forgot the password, click here.
  • You also need SSH installed on your laptop. For details, follow this link.

Part-1 (9:00 AM - 12:00 PM)

  1. Introduction to Open Science Grid - Emelie
  2. Job Scheduling with HTCondor - Mats
  3. Troubleshooting Failed Jobs - Mats
  4. Connecting the Campus to the Grid Resources - David

Part-2 (1:30 PM - 4:30 PM)

  1. Handling Data - Suchandra
  2. Scaling Up Computing Resources - R and MATLAB runtime examples - Bala
  3. Handling Job Dependencies - DAGMan - Bala
  4. Large Scale Computation with Pegasus - Mats