MyCluster
MyCluster is an NSF-funded SDCI (Software Development for CyberInfrastructure) project led by Edward Walker of the Texas Advanced Computing Center (TACC). Jeff Gardner of UW is a project member. MyCluster allows a user, with only user-level permissions, to carve out pieces of clusters throughout the world and integrate them into a single private "virtual" cluster.
More specifically, MyCluster is a system that builds personal Condor, OpenPBS, or SGE clusters on demand. The system uses the concept of a job proxy, which is submitted to remote host server clusters in lieu of the actual user job. These job proxies, when dispatched by the schedulers on the host server clusters, provision CPUs into personal clusters created for the user. Depending on when job proxies are dispatched and terminated, the scientist sees a personal cluster that expands and shrinks over time. Most importantly, user jobs are submitted, managed, and controlled in these dynamic personal clusters through a single, uniform job management interface of the scientist's choice.
Currently, MyCluster is in the alpha development stage. The PI, Edward Walker, has been kind enough to devote some of his time to getting it running in limited form on the Athena cluster. At present, it allows you to use Condor to schedule small serial jobs within a larger, multi-node PBS job. For example, let us say you have 1000 serial jobs that take 5 minutes each. Rather than submitting 1000 PBS/Torque jobs, you can have MyCluster submit a single PBS/Torque job (for, say, 32 cores for a few hours). When that job runs, MyCluster will start Condor daemons on all the cores, and they will report back to your MyCluster session running on athena0. You can then condor_submit your 1000 jobs, and Condor will manage those jobs within the 32 cores that you "own."
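As a rough sketch of that scenario (the executable name here is hypothetical), a single Condor submit description could queue all 1000 serial jobs at once, and Condor would then cycle them through whatever cores the single PBS/Torque job proxy provides:

# Hypothetical submit description for 1000 short serial jobs
universe     = vanilla
executable   = my_serial_task            # your own program (name is illustrative)
arguments    = $(Process)                # unique index 0..999 for each instance
output       = task.$(Process).out
error        = task.$(Process).err
notification = Never
queue 1000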
For now, you must install it yourself in your home directory (a system-wide installation will follow in a few weeks' time). It currently works only on Athena, and only for a personal Condor cluster (not OpenPBS or SGE). In short, MyCluster allows you to carve out a piece of Athena and turn it into your own personal Condor cluster. This is quite advantageous if:
- Your existing application uses Condor, or
- You have a large number of serial jobs that you do not wish to submit as individual PBS/Torque jobs to the Athena scheduler.
We encourage people to try out MyCluster on Athena by following the guide below. Please be patient if you encounter bugs or other difficulties as this is an alpha version. Email Jeff Gardner if you encounter any problems.
Installing MyCluster
Before installing, you may wish to read the MyCluster webpage at [1], the MyCluster User Guide at [2] as well as the "install snapshot" at [3]. This will give you a good overview. If you use the NSF TeraGrid, you may also wish to peruse the TeraGrid-specific documentation at [4].
Download
You will need to download two packages: MyCluster and MyRt. From your home directory on athena0:
% wget http://www.tacc.utexas.edu/~ewalker/dist/myrt-0.5.6.tar.gz
% wget http://www.tacc.utexas.edu/~ewalker/dist/mycluster-alpha-2.0.4.tar.gz
MyRt Install
MyRt will need to be compiled:
% tar xzvf myrt-0.5.6.tar.gz
% cd myrt-0.5.6
% ./configure
% make
% make install
% cd ..
MyCluster Install
MyCluster uses only scripts and a binary Condor installation (which is downloaded by the install script). Therefore, just execute the script "siteinstaller.csh" and configure it for "PBS" (Athena's scheduler) and "Condor" (your personal scheduler). Here is an example of Jeff Gardner's install session. Do the same thing, except that everywhere "gardnerj" appears, replace it with your own username:
% tar xzvf mycluster-alpha-2.0.4.tar.gz
% cd mycluster-alpha-2.0.4
% ./siteinstaller.csh
Please select the agent you would like installed for provisioning CPUs:
1. Load Sharing Facility (LSF)
2. Portable Batch System (PBS)
3. IBM LoadLeveler (LL)
4. Sun Grid Engine (SGE)
5. Amazon Elastic Compute Cloud (EC2)
6. Client submission only (Globus/SSH)
7. Quit
(Choice) 2
MYCLUSTER_LOCATION </share/home/gardnerj/mycluster-alpha-2.0.4>: /share/home/gardnerj/mycluster
Accept /share/home/gardnerj/mycluster? [Y/N] y
Cleaning up old files ...
Setting up mycluster default location ...
Downloading http://www.tacc.utexas.edu/~ewalker/mycluster_dist/mycluster_bin/linux-x86_64.tar.gz ...
Altix? [N/Y] n
Do you require RMS support? [N/Y] n
Version 2.0.4 Created on Wed Sep 24 13:15:51 CDT 2008
Please select the personal job management interface you would like installed:
1. Condor
2. Sun Grid Engine (SGE)
3. Condor and SGE
4. Quit
(Choice) 1
Use MYCLUSTER_LOCATION /share/home/gardnerj/mycluster? [Y/N] y
Installing Condor version default
Please specify a directory for this program to install a new (or modify an existing)
Condor installation. The installer needs permission to modify the Condor installation.
_CONDOR_WORK_DIR </share/home/gardnerj/condor>:
Accept /share/home/gardnerj/condor? [Y/N] y
Downloading Condor ...
Downloading http://www.tacc.utexas.edu/~ewalker/mycluster_dist/condor-default/linux-x86_64.tar.gz ...
/share/home/gardnerj/condor
Untarring Condor ...
/share/home/gardnerj/mycluster
setting up binaries ...
/share/home/gardnerj/condor/sbin
/share/home/gardnerj/condor/bin
/share/home/gardnerj/mycluster
Please source the following files to your MyCluster/Condor startup scripts:
/share/home/gardnerj/mycluster/mycluster.csh
/share/home/gardnerj/mycluster/mycluster.sh
Site- and user-specific configuration
Now, do one last site-specific piece of configuring (NOTE: "ewalker" here really does mean "ewalker"):
% mkdir ~/template
% cp ~ewalker/template/* ~/template
Now, you need to put a couple of lines in your .cshrc or .bashrc file.
(t)csh users: At the end of your .cshrc file, insert the following lines:
if ( -e $HOME/mycluster/mycluster.csh ) then
    source $HOME/mycluster/mycluster.csh
    setenv _MYCLUSTER_TEMPLATE_DIR $HOME/template/
endif
bash users: At the end of your .bashrc file, insert the following lines:
if [ -e $HOME/mycluster/mycluster.sh ]
then
    source $HOME/mycluster/mycluster.sh
    export _MYCLUSTER_TEMPLATE_DIR=$HOME/template/
fi
Everyone: Log out, then log back in and check that these files were really sourced as they should have been (again, "gardnerj" should be replaced by your username):
% echo $MYCLUSTER_LOCATION
/share/home/gardnerj/mycluster
% echo $_MYCLUSTER_SUBMIT_AGENT
/share/home/gardnerj/mycluster/pbs-agent.sh -user %U -resource %R %E
% echo $_MYCLUSTER_TEMPLATE_DIR
/share/home/gardnerj/template/
Using MyCluster
MyCluster functions by submitting one or more "job proxies" on your behalf when you start a MyCluster session. On Athena, a job proxy is a PBS/Torque job that is submitted to the Athena queue. In practice this job proxy will span many nodes/cores. Once your proxy actually starts running on Athena, it launches Condor startd daemons on every core of every node in your job, thus converting your job proxy into your own personal Condor pool.
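Conceptually, each job proxy is just an ordinary PBS/Torque batch job. The sketch below is only an illustration of its general shape (the actual script MyCluster generates is the tsubmit.TAG_* file described later, and its contents differ):

#!/bin/sh
# Illustrative sketch only -- not the real MyCluster-generated proxy script
#PBS -q debug                  # queue chosen with -q
#PBS -l nodes=2:ppn=8          # proxy size chosen with -n (here 16 cores)
#PBS -l walltime=01:00:00      # wall clock limit chosen with -W

# When the proxy starts, MyCluster launches one condor_startd per core;
# each startd joins the personal collector/negotiator on athena0, so the
# cores show up in condor_status until the wall clock limit expires.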
To start MyCluster, use the command vcluster_starter. This is similar to the older vo-login command documented in the MyCluster User Guide. Here is a command summary for vcluster_starter:
submission args: [-d] [-H <host>] [-P <port>] [-A <IP>] [-B <event port>] [-D <advertised event port>]
                 [-K <sitename>] [-L <hostname prefix>] [-s] [-V] [-W <minutes>] [-n <jobs:size>]
                 [-M <host>] [-k] [-J] [-l] [-E <file>] [-I] [-v <subnet>] [-r <router>] [-t <route_tab>]
                 [-C <list>] [-N <list>] [-u] [-T] [-a] [-S] [-p <port>] [-G] [-m] [-h] [-e <envlist>]
                 [-c <instances>] [-f]
 -d : debug mode
 -q <queue name> : job proxy submit queue
 -H <host> : watcher host
 -P <port> : watcher port
 -A <IP> : host IP for event monitor
 -B <event port> : event monitor port
 -D <advertised event port> : advertised event monitor port
 -K <sitename> : master assigned sitename
 -L <hostname prefix> : prefix all hostnames with this in UVPN
 -s : silent mode (default)
 -V : verbose mode
 -W <minutes> : job proxy wall time limit
 -n <jobs:size> : # job proxies and their size
 -M <host> : Central manager
 -k : checkpoint event_monitor
 -J : join an existing pool
 -l : use /bin/bash and not gridshell in job wrapper
 -E <file> : put MyCluster environment in a file
 -I : non-batch mode
 -v <subnet> : UVPN subnet
 -r <router> : UVPN router
 -t <route table> : UVPN route file (located in /share/home/gardnerj/.gtcsh_submit)
 -G : job proxies only
 -m : master processes only
 -h : help (this message)
 -e <envlist> : semi-colon separated list of env-value pairs
 -c <instances> : number of instances of startd (default: INF)
 -f : foreground processes
 -a : do not use full path in submit file
Condor (default) Specific Options:
 -C <list> : Condor collector list
 -N <list> : Condor negotiator list
 -u : use condor UDP updates
 -T : disable TimeToLive option in Condor
The most relevant options are:
- -q <queue name> : job proxy submit queue.
- -n <jobs:size> : specifies the number of PBS/Torque job proxies to run and the size of each job that is submitted at each site; e.g. -n 2:16 specifies 2 job proxies of 16 processor cores each. Default: -n 1:1.
- -W <mins> : specifies the wall clock limit, in minutes, for the PBS/Torque job proxies when they are submitted.
- -T : do not set the TimeToLive option in the Condor starter. The TimeToLive option allows the Condor starter to advertise the TimeToLive ClassAd, indicating the time left in the wall clock limit set for that job.
- -d : prints additional debug messages to standard error.
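For example, a request for two 16-core job proxies with a two-hour wall clock limit and debug messages enabled would look like this (substitute a real queue name for the placeholder):

% vcluster_starter -n 2:16 -W 120 -q <queue name> -d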
Example
The best way to describe MyCluster is to walk through an example. Let's say I want to request a single job proxy of 1 hour on 16 cores (i.e. 2 Athena nodes of 8 cores each) in the debug queue:
% vcluster_starter -n 1:16 -q debug -W 60
MyCluster Copyright (C) 2008 The University of Texas at Austin
This program comes with ABSOLUTELY NO WARRANTY; for details type show w.
This is free software, and you are welcome to redistribute it
under certain conditions; type show c for details.

Welcome to your MyCluster/Condor environment
To shutdown environment, type "exit"
%
I have now started MyCluster. Three things happened when I did this:
- vcluster_starter submitted a single 16-core PBS job to the Athena queue (if you do qstat -u <username> you will be able to see that job).
- It started up condor_master, condor_schedd, condor_collector, and condor_negotiator daemons on athena0.
- It dropped me at a command prompt within the MyCluster shell.
Note that you must interact with all Condor daemons from within the MyCluster shell. If you type any Condor commands outside the shell, they will not be able to connect to the appropriate daemons. Think of the MyCluster shell as a special "portal" into your personal cluster.
From within the MyCluster shell, I can still interact with the Athena scheduler and look at my PBS job proxy:
% qstat -u gardnerj

athena0.npl.washington.edu:
                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1432.athena0.npl.was gardnerj debug    tsubmit.TA    --      2  --    --  01:00 Q   --
There it is: a one-hour job for 2 Athena nodes (16 cores). It's still in the "queued" state, so nothing is running.
I can also verify that the local Condor daemons are running on Athena0:
% ps aux | grep condor_
gardnerj  8011  0.0  0.0  70452  6652 pts/2  S   15:54  0:00 /usr/bin/perl /share/home/gardnerj/mycluster/jstarter -I -m -v /share/home/gardnerj/condor/sbin/condor_master.hd.gtcsh -d -dyn -f
gardnerj  8017  0.0  0.0  22444  3944 ?      Ss  15:54  0:00 /share/home/gardnerj/condor/sbin/condor_master -d -dyn -f
gardnerj  8049  0.0  0.0  22932  4308 ?      Ss  15:54  0:00 condor_collector -f
gardnerj  8053  0.0  0.0  22340  3900 ?      Ss  15:54  0:00 condor_negotiator -f
gardnerj  8056  0.0  0.0  23544  4256 ?      Ss  15:54  0:00 condor_schedd -f
gardnerj  9015  0.0  0.0  51064   628 pts/2  S+  16:15  0:00 grep condor_
Good. They are all there. I can do a "condor_status" as well:
% condor_status
But nothing is printed, because there are no condor_startd daemons running anywhere yet. On the plus side, this means that Condor is properly configured; otherwise I would have gotten an error message. Let's check on my job again:
% qstat -u gardnerj

athena0.npl.washington.edu:
                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
1432.athena0.npl.was gardnerj debug    tsubmit.TA  8560      2  --    --  01:00 R 00:01
It's running now! That means that the nodes should be part of my Condor pool:
% condor_status

Name          OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

10938@yoyo--1 LINUX      X86_64 Unclaimed Idle     5.550  7970  0+00:00:04
10940@yoyo--1 LINUX      X86_64 Unclaimed Idle     5.550  7970  0+00:00:04
10942@yoyo--1 LINUX      X86_64 Unclaimed Idle     5.550  7970  0+00:00:04
10944@yoyo--1 LINUX      X86_64 Unclaimed Idle     5.550  7970  0+00:00:04
10946@yoyo--1 LINUX      X86_64 Unclaimed Idle     5.550  7970  0+00:00:04
10952@yoyo--1 LINUX      X86_64 Unclaimed Idle     5.550  7970  0+00:00:04
10954@yoyo--1 LINUX      X86_64 Unclaimed Idle     5.550  7970  0+00:00:04
10956@yoyo--1 LINUX      X86_64 Unclaimed Idle     5.550  7970  0+00:00:04
8793@yoyo--1- LINUX      X86_64 Unclaimed Idle     6.220  7970  0+00:00:06
8795@yoyo--1- LINUX      X86_64 Unclaimed Idle     6.220  7970  0+00:00:05
8797@yoyo--1- LINUX      X86_64 Unclaimed Idle     6.070  7970  0+00:00:04
8799@yoyo--1- LINUX      X86_64 Unclaimed Idle     6.220  7970  0+00:00:06
8801@yoyo--1- LINUX      X86_64 Unclaimed Idle     6.220  7970  0+00:00:05
8802@yoyo--1- LINUX      X86_64 Unclaimed Idle     6.220  7970  0+00:00:05
8805@yoyo--1- LINUX      X86_64 Unclaimed Idle     6.220  7970  0+00:00:05
8807@yoyo--1- LINUX      X86_64 Unclaimed Idle     6.220  7970  0+00:00:05

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX    16     0       0        16       0          0        0

               Total    16     0       0        16       0          0        0
There they are. 16 unclaimed cores, ready for me to submit my Condor jobs to. I wrote a little Condor command script called "sub.cmd." It submits 10 instances of the command "hostname":
% more sub.cmd
universe     = vanilla
executable   = /bin/hostname
output       = hostname.$(Process).out
error        = hostname.$(Process).err
notify_user  = gardnerj@phys.washington.edu
notification = Complete
queue 10
Condor sends email by default for all job events (e.g. start, stop, error, etc.). Note that in order to get email from Condor on Athena, I must set the "notify_user" field to my email address; if I don't, I will get email errors. I could also have set "notification = Never" to receive no email at all. The options for "notification" are "All", "Complete", "Error", and "Never".
Now I submit my Condor job:
% condor_submit sub.cmd
Submitting job(s)..........
10 job(s) submitted to cluster 1.
% condor_q

-- Submitter: yohoho.mycluster.org : <189.1.0.2:50404> : yohoho.mycluster.org
 ID      OWNER          SUBMITTED     RUN_TIME ST PRI SIZE CMD
   1.0   gardnerj      9/24 14:00   0+00:00:02 R  0   9.8  hostname
   1.1   gardnerj      9/24 14:00   0+00:00:00 R  0   9.8  hostname
   1.2   gardnerj      9/24 14:00   0+00:00:00 R  0   9.8  hostname
   1.3   gardnerj      9/24 14:00   0+00:00:00 R  0   9.8  hostname
   1.4   gardnerj      9/24 14:00   0+00:00:00 R  0   9.8  hostname
   1.5   gardnerj      9/24 14:00   0+00:00:00 R  0   9.8  hostname
   1.6   gardnerj      9/24 14:00   0+00:00:00 R  0   9.8  hostname
   1.7   gardnerj      9/24 14:00   0+00:00:00 R  0   9.8  hostname
   1.8   gardnerj      9/24 14:00   0+00:00:00 R  0   9.8  hostname
   1.9   gardnerj      9/24 14:00   0+00:00:00 R  0   9.8  hostname

10 jobs; 0 idle, 10 running, 0 held
And there they are, already running! Now I wait 1-2 minutes, then check again:
% condor_q

-- Submitter: yohoho.mycluster.org : <189.1.0.2:50404> : yohoho.mycluster.org
 ID      OWNER          SUBMITTED     RUN_TIME ST PRI SIZE CMD

0 jobs; 0 idle, 0 running, 0 held
All done. Now I can exit MyCluster:
% exit
When I exit, my job proxies are canceled automatically and all Condor daemons shut down (note that it takes 1-2 minutes for your job to be canceled and up to 10 minutes for your Condor daemons to exit):
% qstat -u gardnerj
% ps aux | grep condor_
gardnerj  9870  0.0  0.0  51064   628 pts/2  S+  16:44  0:00 grep condor_
If I check my home directory, I can see the output files generated by my Condor jobs:
% ls -al hostname*
-rw-r--r-- 1 gardnerj physics  0 Sep 24 14:17 hostname.0.err
-rw-r--r-- 1 gardnerj physics 19 Sep 24 14:17 hostname.0.out
-rw-r--r-- 1 gardnerj physics  0 Sep 24 14:17 hostname.1.err
-rw-r--r-- 1 gardnerj physics 19 Sep 24 14:17 hostname.1.out
-rw-r--r-- 1 gardnerj physics  0 Sep 24 14:17 hostname.2.err
-rw-r--r-- 1 gardnerj physics 19 Sep 24 14:17 hostname.2.out
-rw-r--r-- 1 gardnerj physics  0 Sep 24 14:17 hostname.3.err
-rw-r--r-- 1 gardnerj physics 19 Sep 24 14:17 hostname.3.out
-rw-r--r-- 1 gardnerj physics  0 Sep 24 14:18 hostname.4.err
-rw-r--r-- 1 gardnerj physics 19 Sep 24 14:18 hostname.4.out
-rw-r--r-- 1 gardnerj physics  0 Sep 24 14:18 hostname.5.err
-rw-r--r-- 1 gardnerj physics 19 Sep 24 14:18 hostname.5.out
-rw-r--r-- 1 gardnerj physics  0 Sep 24 14:17 hostname.6.err
-rw-r--r-- 1 gardnerj physics 19 Sep 24 14:17 hostname.6.out
-rw-r--r-- 1 gardnerj physics  0 Sep 24 14:18 hostname.7.err
-rw-r--r-- 1 gardnerj physics 19 Sep 24 14:18 hostname.7.out
-rw-r--r-- 1 gardnerj physics  0 Sep 24 14:17 hostname.8.err
-rw-r--r-- 1 gardnerj physics 19 Sep 24 14:17 hostname.8.out
-rw-r--r-- 1 gardnerj physics  0 Sep 24 14:18 hostname.9.err
-rw-r--r-- 1 gardnerj physics 19 Sep 24 14:18 hostname.9.out
The directory in which all of your runtime files are stored is .gtcsh_submit:
% ls .gtcsh_submit
checkpoint/  condor/  jobtag/  log/  throttle/
tsubmit.TAG_21800*  tsubmit.TAG_318470*  tsubmit.TAG_80160*
The "tsubmit*" files are the PBS scripts used for each of your job proxies (in my case, I have submitted 3 proxies so far). You can look at them to see the PBS script that MyCluster actually submits. MyCluster logs are stored in the "log" directory. Condor logs are stored in the "condor/personal/TAG_*/daemon/" directories, where "TAG_*" is the file extension of the job proxy submit file in .gtsch_submit (e.g. for tsubmit.TAG_21800, Condor logs are located in condor/personal/TAG_21800/daemon).
You may wish to monitor the size of the .gtcsh_submit directory and delete it if it gets to be too large (just make sure to do this while no MyCluster session is running). Nothing in this directory is needed when no MyCluster sessions are running.
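Standard shell commands are enough for this housekeeping (shown here only as a suggestion; the rm is safe only when no MyCluster session is running):

% du -sh ~/.gtcsh_submit        # check how much space the runtime files use
% rm -rf ~/.gtcsh_submit        # remove them -- only while no session is active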
Using Condor
More information on Condor is available at the Condor website. In particular, see Section 2.5 of the Condor manual for how to submit Condor jobs.
Athena-specific Condor parameters/concerns
In general, Athena should support all Condor functionality that is available on the x86_64 architecture. A few helpful suggestions:
- Your Condor command file should explicitly specify the "vanilla" universe.
- Your Condor command file should either turn off email notification by setting "notification = Never", or include your proper email address with "notify_user = <email address>".
- At this point, one can only specify the queue that one wishes to submit to, but not any extra options like "qos=physics". If you really need to run at a particular priority level, please email Jeff Gardner. This capability will be added soon.
- Not all Condor client commands are enabled yet. The ones that are include condor_submit, condor_rm, condor_status, condor_q, and condor_reschedule (a brief usage sketch follows this list). More will come later.
- For Condor experts: Broadcasting of the Condor TimeToLive variable is automatically activated by MyCluster. If you wish to deactivate this for any reason, use the -T option when launching MyCluster. Activation of this option should not affect anything.
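As a brief sketch of those enabled client commands in action, run from within the MyCluster shell (the job IDs are illustrative):

% condor_q                 # list the jobs in your personal pool
% condor_rm 1.3            # remove job 3 of cluster 1
% condor_rm -all           # remove every job you have queued
% condor_reschedule        # prompt the schedd to update the central manager now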
Here is the most basic Condor command file that can be run on Athena. Please build upon it for your Condor jobs:
universe = vanilla

# Set notification to All, Complete, Error, or Never
notification = Never

# Uncomment below and include your email address to send emails
#notify_user = <your email>
Here is an example Condor command file that executes 10 instances of the script "hello.csh". hello.csh prints a message containing the Condor process id and the node it is running on.
hello.cmd:
# Example Condor command file
universe     = vanilla
notify_user  = gardnerj@phys.washington.edu
notification = Complete
executable   = hello.csh
arguments    = $(Process)
output       = hello.$(Process).out
error        = hello.$(Process).err
queue 10
hello.csh:
#!/bin/csh
echo Hi!
echo I am process $1 running on node `/bin/hostname`
echo Bye!
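To try this example, make the script executable and submit it from within your MyCluster shell (assuming both files are in the current directory); the hello.$(Process).out files will appear there as the jobs finish:

% chmod +x hello.csh
% condor_submit hello.cmd
% condor_q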