Mark A. Baker, Geoffrey C. Fox and Hon W. Yau
Northeast Parallel Architectures Center
111 College Place Syracuse University New York 13244-4100 USA
(mab@npac.syr.edu)
16 November 1995
Version 1.1
In the past decade there has been a dramatic shift from mainframe or `host-centric' computing to a distributed `client-server' approach. In the next few years this trend is likely to continue with further shifts towards `network-centric' computing becoming apparent. All these trends were set in motion by the invention of the mass-reproducible microprocessor by Ted Hoff of Intel some twenty-odd years ago.
The present generation of RISC microprocessors is now more than a match for mainframes in terms of cost and performance. The long-foreseen day when collections of RISC microprocessors assembled together as a parallel computer could outperform the vector supercomputers has finally arrived.
Such high-performance parallel computers incorporate proprietary interconnection networks allowing low-latency, high bandwidth inter-processor communications. However, for certain types of applications such interconnect optimisation is unnecessary and conventional LAN technology is sufficient. This has led to the realisation that clusters of high-performance workstations can be realistically used for a variety of applications either to replace mainframes, vector supercomputers and parallel computers or to better manage already installed collections of workstations. Whilst it is clear that `cluster computers' have limitations, many institutions and companies are exploring this option.
Software to manage such clusters is at an early stage of development and this report reviews the current state-of-the-art. Cluster computing is a rapidly maturing technology that seems certain to play an important part in the `network-centric' computing future.
Tony Hey The University of Southampton
1. Cluster Computing Review
1.1 Summary of Conclusions
· The use of clusters of workstations to increase the throughput of user applications is becoming increasingly commonplace throughout the US and Europe.
· There now exists a significant number of Cluster Management Software (CMS) packages and more than twenty are mentioned in this review. These packages almost all originate from research projects, but many have now been taken up or adopted by commercial vendors.
· The importance of cluster software can be seen by both the commercial take-up of these products and also by the widespread installation of this software at most of the major computing facilities around the world.
· It is not clear that CMS is being used to take advantage of spare CPU cycles, but it is evident that much effort is being expended to increase throughput on networks of workstations by load balancing the work that needs to be done.
· Six current CMS packages, two public domain and four commercial, out of the nineteen reviewed, have been identified as being worth serious investigation. The advertised functionality of these six packages is very similar and an informed choice would need installation and detailed testing.
· If finances permit, it is probably wise to choose one of the commercial packages. This will minimise the efforts of the on-site staff and leave the onus on vendors to ensure that their software is installed, used and supported properly.
· A number of packages were identified that were either newly established CMS projects at an early stage of development or which could be considered as complete Cluster Computing Environments (CCE). In a CCE the cluster is used in a fashion similar to a distributed shared memory environment.
· Nearly all the CMS packages are designed to run on Unix workstations and MPP systems. Some of the public domain packages support Linux, which runs on PCs. One commercial package, Codine, supports Linux. JP1[1] from Hitachi Ltd. is designed to run on Windows NT platforms. No CMS package supports Windows-95.
· The software being developed for CCE is potentially the most viable means of utilising clusters of workstations efficiently to run user applications in the future.
· WWW software and HTTP protocols could clearly be used as part of an integrated CMS package. Little software of this type has been developed so far, but several of the packages reviewed use a WWW browser as an alternative GUI.
This review was commissioned by the Joint Information Systems Committee (JISC) New Technology Sub Committee (NTSC) and follows two other reports: a historic review of Cluster Computing produced by the Manchester Computing Centre [1], and a critical review of Cluster Computing projects funded by the NTSC [2] and produced by the University of Southampton.
The overall aim of this review is to guide the reader through the mine-field that surrounds distributed and cluster computing software. In particular it aims to provide sufficient information for staff in a typical university Computing Service to:
- Identify the potential benefits and costs associated with Cluster Computing.
- Understand the important features of CMS.
- Provide an overview of the CMS packages presently available.
- Act as a guide through the maze of CMS packages.
- Be used as an aid for making decisions about which CMS package to use.
- Provide references, technical details and contact points.
In addition the review provides information about CCE. These are predominantly new projects which should provide indicators to show the future direction of CMS.
The material contained in this report has been put together using information gathered from numerous World Wide Web (WWW) sites, product descriptions supplied by vendors or included in software releases, and from various cluster review documents. Owing to limited time, the software discussed in this document (nearly thirty packages) has been reviewed only for functionality (a paper exercise); no product was installed and tested for quality or for other factors, such as ease of use by administrators and users.
1.3 Organisation of this Review Document
This report is organised as follows:
Chapter 1 - introduces and then briefly discusses the scope and range of the review.
Chapter 2 - describes and discusses the criteria which will be used to judge and evaluate the functionality of the various Cluster Management Software (CMS) packages. This entails judging what the user of such a system wants against what a particular package provides.
Chapter 3 - describes briefly each CMS package and its functionality.
Chapter 4 - in this chapter the CMS packages mentioned in the previous chapter are evaluated against the criteria discussed in chapter 2.
Chapter 5 - in this chapter each of the Cluster Computing Environments (CCE) is described.
Chapter 6 - includes comments about CMS, a step-by-step guide for choosing a CMS package, a short discussion about CMS and some views about the future of CMS.
Chapter 7 - provides a glossary of the terms used throughout this review.
Chapter 8 - includes a list of references and a table of contents.
1.4 Comments about this Review
Whilst gathering the information to produce this review it became evident that software for cluster computing fell into one of two camps:
1. Cluster Management Software (CMS) - this software is designed to administer and manage application jobs submitted to workstation clusters. It encompasses the traditional batch and queuing systems.
2. Cluster Computing Environments (CCE) - with this software the cluster is typically used as an applications environment, similar in many ways to distributed shared memory systems. This category also includes environments which are at an early stage of development.
This review is primarily aimed at CMS, which is reviewed in Chapter 4. The CCE packages and ones in early stages of development or implementation are briefly described in Chapter 5.
The authors wish, in particular, to acknowledge the work of Kaplan & Nelson [3 & 4]; the evaluation criteria (Chapter 2, page 6) used in this review are based on their work. The authors acknowledge the help and assistance of vendors and numerous staff tasked with supporting the various packages. The authors would like to thank the JISC-NTI, and in particular Tom Franklin, for sponsoring this review. Finally, we wish to thank Tony Hey for his comments on this review.
1.5 Cluster Software and Its Interaction With the Operating System
In order to understand cluster software it is necessary to know how it interacts with the operating systems of a particular platform.
The CMS described in this review works completely outside the kernel and on top of a machine's existing operating system. This means that its installation does not require modification of the kernel, and so the CMS package is installed like any other software package on the machine.
The situation is very different with the CCE. Here, typically, a micro-based kernel with customised services needs to be installed instead of the existing kernel to support the desired environment. The reason for this more radical solution with CCE is the need to support functionality, such as virtual shared memory. This could not be achieved efficiently outside the kernel.
The BSP and PVM packages can exist in two forms. The public domain versions of these packages are installed on top of an existing operating system, whereas the vendor versions of the software are often integrated into the kernel to optimise their performance.
1.6 Some Words About Cluster Computing
So-called cluster computing systems, and the means of managing and scheduling applications to run on them, are becoming increasingly commonplace. CMS packages have come about for a number of reasons, including load balancing, utilising spare CPU cycles, providing fault-tolerant systems, managing access to powerful systems, and so on. Overall, however, the main reason for their existence is their ability to provide an increased, and reliable, throughput of user applications on the systems they manage.
The systems reviewed in this report are the more commonly known ones; they are not all the systems that exist, nor necessarily the best. Approximately twenty CMS systems are briefly reviewed. At least two more packages are known to exist, but have been excluded because the authors had great difficulty obtaining further information on these products.
This review attempts to gauge and assess the features and functionality of each CMS package against a set of criteria deemed to be highly desirable or useful additions. The criteria adopted for assessing the CMS packages are based heavily upon those devised by Kaplan & Nelson [3 & 4] at NASA. However, the criteria in this review have been broadened and modified to reflect the needs of the JISC-NTI and the knowledge and experience of the authors [5, 6, 7, 8, 9 & 10].
1.7 The Workings of Typical Cluster Management Software
To run a job on a CMS batch system it is usual to produce some type of resource description file. This file is generally an ASCII text file (produced using a normal text editor or with the aid of a GUI) which contains a set of keywords to be interpreted by the CMS. The nature and number of keywords available depends on the CMS package, but will at least include the job name, the maximum runtime and the desired platform.
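As an illustration only, the following sketch parses a resource description file of the kind described above. The `keyword = value` syntax and the keyword names `job_name`, `max_runtime` and `platform` are assumptions made for this example; each CMS package defines its own keywords and file format.

```python
# Hypothetical keyword names; real CMS packages each define their own set.
REQUIRED_KEYWORDS = {"job_name", "max_runtime", "platform"}


def parse_job_description(text):
    """Parse 'keyword = value' lines into a dict, skipping blanks and # comments."""
    job = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'keyword = value', got: {raw!r}")
        job[key.strip().lower()] = value.strip()
    missing = REQUIRED_KEYWORDS - set(job)
    if missing:
        raise ValueError(f"missing keywords: {sorted(missing)}")
    job["max_runtime"] = int(job["max_runtime"])  # assumed to be in seconds
    return job


EXAMPLE = """
# resource description file for one batch job
job_name    = ocean_model
max_runtime = 3600
platform    = sparc-solaris
"""
```

Calling `parse_job_description(EXAMPLE)` yields a dictionary that client software could then send to the master scheduler.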
Once completed, the job description file is sent by the client software resident on the user's workstation, to a master scheduler. The master scheduler, as its name implies, is the part of the CMS that has an overall view of the cluster resources available: the queues that have been configured and the computational load on the workstations that it is managing. On each of the cluster workstations daemons are present that communicate their state at regular intervals to the master scheduler. One of the tasks of the master scheduler is to evenly balance the load on the resources that it is managing. So, when a new job is submitted it not only has to match the requested resources with those that are available, but also needs to ensure that the resources being used are load balanced.
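The matching and load-balancing step can be sketched as follows. This is a minimal illustration, not the algorithm of any particular package: the `Workstation` fields, the `memory_mb` job keyword and the rule "choose the least-loaded suitable machine" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Workstation:
    name: str
    platform: str
    free_memory_mb: int
    load: float  # most recent load figure reported by the workstation's daemon


def choose_workstation(job, workstations):
    """Return the least-loaded workstation that satisfies the job, or None."""
    candidates = [
        w for w in workstations
        if w.platform == job["platform"] and w.free_memory_mb >= job["memory_mb"]
    ]
    if not candidates:
        return None  # the job stays queued until a suitable resource is available
    return min(candidates, key=lambda w: w.load)
```

The first filter matches requested resources against those available; the `min` over the reported load is the simplest possible load-balancing rule.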
Typically a batch system will have multiple queues, each being appropriate for a different type of job. For example, a queue may be set up for a homogeneous cluster which is primarily used to service parallel jobs; alternatively, queues may be set up on a powerful server for CPU intensive jobs, or there may be a queue for jobs that need a rapid turnaround. The number of possible queue configurations is large and will depend on the typical throughput of jobs on the system being used.
The master scheduler is also tasked with the responsibility of ensuring that jobs complete successfully. It does this by monitoring jobs until they successfully finish. However, if a job fails, due to problems other than an application runtime error, it will reschedule the job to run again.
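The rescheduling rule in the paragraph above can be sketched as follows. The two exception classes, the `run_once` callback and the limit of three attempts are hypothetical names and values introduced for this example; they do not come from any CMS package.

```python
class ApplicationError(Exception):
    """The job itself failed, e.g. a runtime error in the user's program."""


class ResourceFailure(Exception):
    """The workstation or network failed underneath the job."""


def run_until_complete(job, run_once, max_attempts=3):
    """Run a job, rescheduling after resource failures.

    Returns (attempts_used, result). An ApplicationError is not caught,
    so an application runtime error is reported rather than retried.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt, run_once(job)
        except ResourceFailure:
            continue  # a real CMS would select another workstation here
    raise ResourceFailure(f"{job!r} failed on {max_attempts} attempts")
```

Only failures outside the application trigger a rerun, which mirrors the distinction drawn in the text.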
1.8 Clusters of Workstations: The Ownership Hurdle
Generally a workstation will be "owned" by, for example, an individual, a group, a department, or an organisation, and is dedicated to the exclusive use of its "owner". This ownership often brings problems when attempting to form a cluster of workstations. Typically, there are three types of "owner":
· Ones who use their workstations for sending and receiving mail or preparing papers, such as administrative staff, librarians, theoreticians, etc.
· Ones involved in software development, where the usage of the workstation revolves around the edit, compile, debug and test cycle.
· Ones involved with running large numbers of simulations often requiring powerful computational resources.
It is the last type of "owner" that needs additional compute resources, and it is possible to fulfil their needs by fully utilising spare CPU cycles from the former two types of "owner". However, this may be easier said than done and often requires delicate negotiation to become reality.
This chapter defines the functionality and features (criteria) that a potential system administrator or user of CMS would require. The criteria used are based on those originally set out by Kaplan and Nelson [3 & 4], but have been modified to reflect the views and experiences of the authors. Apart from when explicitly mentioned, the criteria used are all deemed to be highly desirable features of a CMS package.
2.2 Computing Environments Supported
2.2.1 COMMERCIAL/RESEARCH
Is it a commercial or a research/academic product? This factor will determine the cost and level of support that can be expected - as most users of commercial and public domain software will understand.
Comments
A commercial site is likely to require a higher degree of software stability, robustness and more comprehensive software support services than an academic site. The software is likely to be managing workstations which are not only crucial to internal productivity but may be critical for services that are being provided to a commercial entity. In the case of research sites there is often more leeway and flexibility about the levels of service that are provided or are expected. It is also possible that a research site does not have the funds available to purchase expensive "turn-key" software with high levels of user support.
Does the software support homogeneous or heterogeneous clusters?
What are the hardware platforms supported?
Vendor operating systems supported.
2.2.5 ADDITIONAL HARDWARE/SOFTWARE
Is there any need for additional hardware or software to be able to run the CMS package? For example, additional diskspace or software such as AFS or DCE (see glossary for explanations of terms).
Comments
It is important that the CMS chosen by a site fully supports the platforms that it is intended for. For example, problems are bound to occur if you have Sparc platforms running Solaris 2.4 and the Cluster Management Software only supports SunOS 4.1.3.
Note - It is assumed that software such as NIS and NFS is available on the clusters being used.
2.3 Application support
2.3.1 BATCH JOBS
Are batch submissions of jobs supported?
Are jobs that would normally be run interactively supported? For example, a debugging session or a job that requires user command-line input.
Is there support for running parallel programs on the cluster, for example PVM, MPI or HPF?
Comments
The provision of support for parallel computing is probably more relevant to research sites, at the moment, than commercial sites. However, support for parallel computing will often mean that the CMS is more flexible in how it can be configured than one that only supports sequential jobs.
Note should also be taken of the parallel software packages supported by the Cluster Management Software. As a minimum acceptable criterion, it should support PVM 3.3.x and at least have plans to support the common industry interface MPI.
Are multiple, configurable, queues supported? This feature is necessary for managing large multi-vendor clusters where jobs ranging from short interactive sessions to compute intensive parallel applications need to run.
Comments
The configuration of queues within a managed cluster is a matter that should be considered carefully. The number and configuration will determine how effectively and efficiently a cluster can be utilised.
2.4 Job Scheduling and Allocation Policy
2.4.1 DISPATCHING POLICY
Is there a configurable dispatching policy that allows for factors such as system load, the resources available (CPU type, computational load, memory, disk-space) and the resources required?
2.4.2 IMPACT ON WORKSTATION OWNER
What is the impact of the CMS on the owner of the workstation? Is it possible to minimise the impact on a workstation? It should be possible to configure the CMS to, for example, suspend jobs when an owner is using his/her workstation or set jobs to have a low priority (nice) value.
Comments
Ownership of a workstation and its resources can be a rather problematic matter. To utilise CPU cycles of workstations physically distributed around a site which are owned by individuals, groups and departments can become a point of aggravation and heated debate. It is vital that the CMS is able to minimise and manage the impact of running jobs on remote workstations.
Alternatively, CMS software to manage a "headless" workstation cluster does not need to be aware of the owner. Then it is just a matter of allocating resources to jobs and ensuring that throughput is carried out efficiently and effectively.
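One way of limiting the impact on an owner can be sketched as follows, assuming a Unix system. The function names, the idea of measuring the owner's idle time in seconds and the five-minute threshold are assumptions made for this example; only `os.kill`, `signal.SIGSTOP` and `signal.SIGCONT` are standard facilities.

```python
import os
import signal


def owner_action(owner_idle_s, job_suspended, idle_threshold_s=300):
    """Decide what to do with a guest job, given how long the owner has been idle."""
    owner_active = owner_idle_s < idle_threshold_s
    if owner_active and not job_suspended:
        return "suspend"
    if not owner_active and job_suspended:
        return "resume"
    return "none"


def apply_action(pid, action):
    """Carry out the decision using standard Unix job-control signals."""
    if action == "suspend":
        os.kill(pid, signal.SIGSTOP)
    elif action == "resume":
        os.kill(pid, signal.SIGCONT)
```

Separating the decision from the signalling keeps the policy easy to configure. A less intrusive alternative is to leave the job running at a low priority, for example by calling `os.nice` in the child process before it starts the user's program.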
2.4.3 IMPACT ON THE WORKSTATION
What is the impact of running the CMS package on a workstation? There will be an obvious impact when a job is running, but there also may be an undesirable impact when a job is suspended, checkpointed or migrated to another workstation. For example, process migration requires that a job saves its state and then is physically moved over the local network to another workstation. This will impact on the workstation (CPU/memory and diskspace) while the state is saved and then on the network bandwidth when tens of Mbytes of data are transferred across the network.
The CMS should load balance the resources that it is managing. It is useful if the system administrator can customise the default configuration to suit the local conditions in the light of experience of running the CMS.
This is a means of saving a job's state at regular intervals during its execution. If the workstation fails then the job can be restarted at its last checkpointed position.
Comments
This is a useful means of saving a job's state in case of failure while a job is running. Its usefulness needs to be weighed carefully against the costs in additional resources required to support it. For small jobs (ones that do not take long to run), or ones that are not time critical, checkpointing is unnecessary. If the jobs are time critical, for example at a commercial site where results equate directly to income, then checkpointing would be absolutely necessary.
The main cost of checkpointing is in the need for hardware to support the activity. Saving a job's state at regular intervals, for even relatively small jobs, may require tens of Mbytes of diskspace per workstation to achieve. This may mean that:
· Additional diskspace per workstation is needed.
· Home filestore may be remotely mounted; this will have an impact on NFS performance and the network bandwidth.
· Many existing clusters will not have the physical resources (local diskspace) to support checkpointing.
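The idea of checkpoint and restart can be illustrated at the application level as follows. This is a deliberately simplified sketch: the CMS packages reviewed typically save the state of the whole process, whereas here a small dictionary is saved with Python's `pickle` module. The function names, the toy workload and the `fail_at` parameter (which simulates a workstation failure) are assumptions made for the example.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temporary file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)  # a crash mid-write leaves the previous checkpoint intact


def load_checkpoint(path, default):
    """Return the last saved state, or default if no checkpoint exists."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run_job(path, total_steps, checkpoint_every, fail_at=None):
    """Sum 0..total_steps-1, checkpointing as it goes and resuming if restarted."""
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated workstation failure")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["total"]
```

If a run fails, calling `run_job` again with the same path resumes from the last checkpoint; only the work done since that checkpoint is repeated. The checkpoint interval trades lost work against the diskspace and I/O costs discussed above.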
This is a means of migrating an executing job from one workstation to another. It is often used when an owner takes back control of his/her workstation. Here the job running on the workstation will be suspended first, and then migrated onto another workstation after a certain time interval. Another use for job migration is to move jobs around to load balance the cluster.
Comments
Like checkpointing, process migration can be a very useful feature of a CMS package. Typically it is used to minimise the impact on an owner's workstation (a job will be suspended and eventually migrated on to a different resource when the workstation is used by the owner) and also as a means of load balancing a cluster (migrating processes off heavily loaded workstations and running them on lightly loaded ones). The impact of using process migration is similar to that of checkpointing, but has the additional disadvantage that large state files will be moved around the network connecting the cluster. This can have a serious impact on users of the network.
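The network cost of moving a state file can be estimated with simple arithmetic. The sketch below assumes 1 Mbyte = 8 Mbit and takes the link speed and an `efficiency` factor (the fraction of the nominal bandwidth actually obtained) as inputs; the figures used afterwards, a 30 Mbyte state file and a 10 Mbit/s LAN, are illustrative assumptions and not measurements from this review.

```python
def transfer_seconds(state_mbytes, link_mbit_per_s, efficiency=1.0):
    """Estimate the time, in seconds, to move a job's saved state across the network."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return state_mbytes * 8 / (link_mbit_per_s * efficiency)
```

Under these assumptions `transfer_seconds(30, 10)` gives 24 seconds at full bandwidth, and 48 seconds if only half of the bandwidth is obtained, during which other users share a heavily loaded network.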
2.4.7 JOB MONITORING AND RESCHEDULING
The CMS should monitor running jobs and, in the event of a job failure, should reschedule the job to run again.
2.4.8 SUSPENSION/RESUMPTION OF JOBS
The ability to suspend and then resume jobs is highly desirable. This feature is particularly useful to minimise the impact of jobs on the owner of a workstation, but may also be useful in the event of a system or network wide problem.
2.5 Configurability
2.5.1 RESOURCE ADMINISTRATION
The cluster administrator should have control over the resources available. The administrator should be able to, for example, control who has access to what resources and also what resources are used (CPU load, diskspace, memory).
The CMS should enforce job runtime limits, otherwise it will be difficult to fairly allocate resources amongst users.
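A runtime limit can be enforced on a single job as in the following sketch, which uses the `timeout` argument of Python's standard `subprocess.run`. The function name `run_with_limit` and the convention of returning `None` when the limit is exceeded are assumptions made for this example.

```python
import subprocess
import sys


def run_with_limit(command, max_runtime_s):
    """Run a command; return its exit status, or None if the limit was exceeded."""
    try:
        return subprocess.run(command, timeout=max_runtime_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child process at this point
        return None


if __name__ == "__main__":
    # A job that sleeps for 30 seconds is stopped after a 1 second limit.
    status = run_with_limit([sys.executable, "-c", "import time; time.sleep(30)"], 1)
    print("limit exceeded" if status is None else f"exit status {status}")
```

Note that this simple approach only stops the process that was started directly; any further processes that the job has forked are not covered.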
It is common for a job to fork child processes. The CMS should be capable of managing and accounting for these processes.
Comments
Control over forked child processes is not common under most Unix operating systems. A parent process can spawn child processes which are not managed, cannot be accounted/charged for, and can potentially have a serious impact on the load balancing ability of the CMS.
The CMS should be able to configure the available resources to be either shared or exclusive to a given job.
Comments
Efficient use of resources may require close control over the number of processes running on a workstation; it may even be desirable to allow exclusive access to workstations by a particular job. It should also be possible to control the priority of jobs running on a workstation (nice) to help load balancing and minimise the impact of jobs on the owner of the workstation.
The user and/or administrator should be able to schedule when a job will be run.
What user interface do users and administrators of the CMS have?
Comments
The interface of a software package, for both users and administrators, will often determine its popularity. In general, a GUI based on Motif is the standard. However, there has been a dramatic increase in usage and popularity of the HTTP protocol and the WWW, so a GUI based on this technology seems likely to be a common standard in the future.
How easy and/or intuitive is it for users and administrators to use the CMS?
Can a user specify the resources that they require? For example, the machine type, job length and diskspace.
Can a user query the status of their job? For example, to find out if it is pending/running or perhaps how long before it completes.
Are statistics provided to the user and administrator about the jobs that have run?
2.6 Dynamics of Resources
2.6.1 RUNTIME CONFIGURATION
Can the resources available, queues and other configurable features of the CMS be reconfigured during runtime, i.e. without it being necessary to restart the CMS?
Is it possible to add and withdraw resources (workstations) dynamically during runtime?
2.6.3 SINGLE POINT OF FAILURE (SPF)
Is there a particular part of the CMS that will act as an SPF? For example, if the master scheduler fails does the CMS need restarting from scratch?
Comments
If the CMS has an SPF then there is no guarantee that a job submitted will complete. A typical SPF is only being able to run one master scheduler, so if it then fails the whole CMS fails. Ideally, there should be a backup scheduler which takes over if the original master scheduler fails. An SPF will also mean that the operators of the cluster will need to monitor closely the master scheduler in case of failure.
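The backup-scheduler arrangement can be sketched as a heartbeat check. This is an illustration under stated assumptions, not the mechanism of any reviewed package: the scheduler names, the dictionary of last heartbeat times and the 30 second timeout are all hypothetical.

```python
def active_scheduler(last_heartbeat, priority, now, timeout_s=30.0):
    """Return the first scheduler, in priority order, whose heartbeat is recent.

    last_heartbeat maps a scheduler name to the time (in seconds) at which
    it was last heard from; None is returned if every scheduler is silent.
    """
    for name in priority:
        seen = last_heartbeat.get(name)
        if seen is not None and now - seen <= timeout_s:
            return name
    return None
```

While the master scheduler keeps reporting, it remains in charge; once its heartbeat is older than the timeout, the next scheduler in the priority list takes over, so the failure of one machine no longer stops the whole CMS.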
Is there fault tolerance built in to the CMS package? For example, does it check that resources are available before submitting jobs to them, and will it try to rerun a job after a workstation has crashed?
Comments
The CMS should be able to guarantee that a job will complete. So, if while a job is running the machine that it is running on fails then the CMS should not only notice that the machine/s are unavailable, but it should also reschedule the job that did not complete as soon as it is feasible.
Also, if a machine running a queue or CMS scheduler fails, the CMS should be able to recover and continue to run.
The real need for fault tolerance is determined by the level of service that is being provided by the cluster. However, fault tolerance is a useful feature in any system.
What security features are provided?
Comments
The CMS should provide at least normal Unix security features. In addition it is desirable that it takes advantage of NIS and other industry standard packages.
| Ref No. | Package Name | Vendor |
| 1. | Codine | GENIAS GmbH, Germany |
| 2. | Connect:Queue | Sterling Corp., USA |
| 3. | CS1/JP1 | Hitachi Inc, USA |
| 4. | Load Balancer | Unison Software, USA |
| 5. | Load Leveler | IBM Corp., USA |
| 6. | LSF | Platform Computing, Canada |
| 7. | NQE - Network Queuing Envn. | Craysoft Corp., USA |
| 8. | Task Broker | Hewlett-Packard Corp. |
| Ref No. | Package Name | Institution |
| 1. | Batch | UCSF, USA |
| 2. | CCS | Paderborn, Germany |
| 3. | Condor | University of Wisconsin-Madison, USA |
| 4. | DJM | MSCI, USA |
| 5. | DQS 3.x | SCRI, FSU, USA |
| 6. | EASY | ANL, USA |
| 7. | far | University of Liverpool, UK |
| 8. | Generic NQS | University of Sheffield, UK |
| 9. | MDQS | ABRL, USA |
| 10. | NOW | Berkeley, UC, USA |
| 11. | PBS | NASA Ames Research Center, USA |
| 12. | PRM | ISI, UC, USA |
| 13. | QBATCH | Vita Ltd, USA |
The aim of this chapter is to provide a brief description of each of the CMS packages, listed in the table above. The descriptions consist of information coalesced from vendor publicity, user guides and on-line (WWW) documents.
At least two other CMS packages exist: Balens (VXM Technologies Inc., USA) and J1 (Hitachi Inc.), formerly known as the NC Toolset. It has proved difficult to obtain further information about these packages; when it is found, it will be added to this review.
URL - http://www.genias.de/genias/english/codine/
Main Features of Codine
· Support for batch, interactive and parallel (Express, p4 and PVM)
jobs.
· Support for multiple queues.
· Support for checkpointing.
· Static load balancing.
· Dynamic load balancing by checkpointing and job migration.
· NQS interface - used for integration with existing NQS-based systems.
· Accounting and utilisation statistics.
· X11 Motif GUI, command-line and script interfaces for administrators and
users.
· POSIX compliance.
· Support for DCE technology.
3.2 Commercial Packages
3.2.1 CODINE
URL http://www.sterling.com/
The package provides a Unix batch and device queuing facility capable of supporting a wide range of Unix-based platforms. It provides three queue types:
· Batch queue, providing a percentage mechanism for scheduling and running jobs;
· Device queue, which provides batch access to physical devices;
· Pipe queue, which transports requests to other batch, device or pipe queues at remote locations.
An intelligent batch job scheduling system provides job load balancing across the workstation clusters being managed. Load balancing is based on a scheduling algorithm which uses three statistics:
· Percentage of physical memory used.
· CPU utilisation.
· Queue job limit.
Load Balancer attempts to optimise the use of computer resources by
distributing workloads to available UNIX systems across the network. Load
Balancer determines availability based on the:
· Capacity and current workload of each computer.
· Resources required by the submitted job.
· User defined and other constraints
Load Balancer tries to increase the overall job throughput by making use of
idle computer resources. At the same time, it attempts to prevent systems
becoming overloaded by distributing the load evenly across the available
computers.
Major Features
· No modification to applications or kernel.
· Central control and administration.
· Policy and constraint options.
· Job/application distribution criteria include:
- CPU Speed.
- Current load average.
- Time of day.
- RAM and Swap space.
- CPU type (HP, IBM, etc.).
- Presence and absence of interactive user(s).
- Other user-defined criteria.
· Robust and scalable.
· Works with both batch jobs and interactive applications.
· Security features are built in to restrict unauthorised access.
URL http://lscftp.kng.ibm.com/pps/products/loadlev.html
Features
· Distributed job scheduler.
· Serial/parallel batch and interactive workload.
· Workload balancing.
· Heterogeneous workstation support.
· Compatibility with NQS.
· Central point of control.
· Scalability (queues and master schedulers).
When a job is scheduled, its requirements are compared to all the resources
available to LoadLeveler. Job requirements might be a combination of memory,
disk space, architecture, operating system and application programs.
LoadLeveler's central manager collects resource information and dispatches the
job as soon as it locates the suitable resources. LoadLeveler accepts shell
scripts written for NQS so that jobs can be run under LoadLeveler or under
NQS-based systems. LoadLeveler also provides a user or system-initiated
checkpoint/restart capability for certain types of Fortran or C jobs linked to
the LoadLeveler libraries.
Interactive session support - LoadLeveler's interactive session support
feature allows remote TCP/IP applications to connect to the least loaded
cluster resource.
Individual control - Users can specify to LoadLeveler when their
workstation resources are available and how they are to be used.
Central control - From a system management perspective, LoadLeveler
allows a system administrator to control all the jobs running on a cluster. Job
and machine status are always available, providing administrators with the
information needed to make adjustments to job classes and changes to
LoadLeveler controlled resources.
Scalability - As workstations are added, LoadLeveler automatically
scales upward so the additional resources are transparent to the user.
LoadLeveler has a command line interface and a Motif-based GUI. Users can:
· Submit and cancel jobs.
· Monitor job status.
· Set and change job priorities.
URL http://www.platform.com/products/overview.html
Specifically, LSF provides the following:
· Batch job queuing and scheduling.
· Interactive job distribution across the network.
· Load balancing across hosts.
· Transparent access to heterogeneous resources across network.
· Site resource sharing policy enforcement
· Network-wide load monitoring.
Other LSF Features
· Comprehensive load and resource information.
· Transparent remote execution. The same execution environment (user ID,
umask, resource limits, environment variables, working directory, task exit
status, signals, terminal parameters, etc.) is maintained when jobs are sent
to execute remotely.
· Batch and interactive processing.
· Sequential and parallel processing.
· Heterogeneous operation.
· Application Programming Interface (API) - allowing software to be developed
for new distributed applications.
· Fault tolerance - LSF is available as long as one host is up and no job
is lost even if all hosts are down.
· Site policy options - provides configuration options for sites to
customise their own resource sharing policies.
· Job accounting data and analysis tools.
URL http://www.cray.com/PUBLIC/product-info/craysoft/WLM/NQE/NQE.html
· Automatic load balancing across an entire network.
· Parallel Virtual Machine (PVM) support.
· Job-dependency capability that allows users to specify the criteria that
must be met before a specific job may be initiated.
· Automatic job routing to systems that match user-specified attributes.
· Unattended and secure file transfer to/from any client using the File
Transfer Agent (FTA).
· Batch job submission from remote locations, using the NQE client
software.
· Compatibility with other UNIX systems running public domain versions of
NQS.
· Security features and multiple authentication mechanisms.
· Motif GUI for network system and NQE job, installation, and on-line
documentation.
· Automatic placement of jobs in wait queues for systems not available.
· Multiple job and administration logs.
· Application programming interfaces (APIs) for customising applications
to take advantage of NQE's functionality.
· World-wide web interface to submit NQE jobs, view job status, and
receive back output.
URL http://www.hp.com/
· The ability to balance computational load among a group of
computer systems.
· Intelligent targeting of specialised jobs so that they run on the most
appropriate server for the specific task. For example, a graphics application
may be most efficiently run on a machine with a graphics accelerator - the
user need have no machine-specific knowledge.
· DCE inter-operability.
Task Broker automates the distribution of computations by:
- Gathering machine-specific knowledge from the user.
- Analysing machine-specific information and selecting the most available
server.
- Connecting to a selected server via telnet, rsh (remote
shell), or crp (create remote process).
- Copying program and data files to a selected server via ftp or
NFS.
- Invoking applications over the network
- Copying the resulting data files back from the server via ftp or
NFS.
Each of the above steps is done automatically by Task Broker without the user
needing to be aware of, or having to deal with, the details of server selection
and data movement.
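The sequence of steps above can be laid out as a short Python sketch. It only builds an ordered plan and executes nothing; the `availability` score, the dictionary layout and the function names are hypothetical and are not part of Task Broker.

```python
def most_available(servers):
    """Pick the server with the highest availability score (hypothetical metric)."""
    return max(servers, key=lambda server: server["availability"])


def plan_task(task, servers):
    """Return the ordered actions a broker would perform for one task.

    Nothing is executed here; each tuple only names one of the steps.
    """
    host = most_available(servers)["host"]
    plan = [("connect", host)]
    plan += [("copy_to_server", host, path) for path in task["inputs"]]
    plan.append(("invoke", host, task["command"]))
    plan += [("copy_back", host, path) for path in task["outputs"]]
    return plan


servers = [
    {"host": "hp1", "availability": 0.2},
    {"host": "hp2", "availability": 0.9},
]
task = {
    "command": "render scene.dat",
    "inputs": ["scene.dat"],
    "outputs": ["scene.out"],
}
plan = plan_task(task, servers)
for step in plan:
    print(step)
```

Keeping the plan as data makes the ordering explicit: files are copied only after a server has been selected, and results are copied back only after the application has been invoked.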
Task Broker Features:
· Runs without requiring application code to be recompiled.
· Adds servers and clients to a network simply by hooking them to a LAN,
updating the central configuration file to include the new client or server, and
starting the Task Broker daemon.
· Provides an accounting of services used on a given server.
· Allows limiting of the number of applications running on a computer at
any one time. This prevents degradation of server performance.
· Permits control of the use of computers as servers. For example, users
can specify that their workstations be accessed only at night.
· GUI provides a visual interface to most of the Task Broker command set
and configuration information. In addition, task status monitoring and control
are provided for the user.
· An on-line, context-sensitive help system.
URL http://cornelius.ucsf.edu/~srp/batch/ucsf/batchusr_toc.html
The capabilities include:
· Start and suspend running jobs at specific times and/or days.
· Start and suspend running jobs at specific machine loads.
· Start and suspend jobs based on whether someone is using the console.
· Logging to the system log at varying levels. Statistics and database
information are also available via signals sent to the daemon process.
· Accounting of jobs for each queue.
· Remote queues. Queues may be localised to a specific machine.
· Load balanced queues. Given a set of hosts supporting the same queue,
the machines (batch daemons) decide amongst themselves which host will run the
current job, based on load averages, currently running jobs and other details.
· Job runtime limitations.
· Run jobs with specific nice values.
· Limited console user control over the spooled jobs to minimise the
effect of batch jobs on interactive response during work hours.
· Ancillary programs for job and queue inspection, removal or
manipulation.
· A GUI (using the forms library) is also available for the job status and
removal tools.
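The time, load and console conditions listed above might be combined as in the following Python sketch. Requiring all three conditions at once is an assumption made for illustration, as are the default window (18:00 to 08:00), the load threshold and the function name; the package's actual rules are set per queue.

```python
from datetime import time


def should_run(now, load, console_busy,
               window=(time(18, 0), time(8, 0)), max_load=1.5):
    """Decide whether a batch job may run at this moment.

    now          -- current time of day (datetime.time)
    load         -- current machine load average
    console_busy -- True if someone is using the console
    window       -- (start, end) of the permitted period; may wrap past midnight
    """
    start, end = window
    if start <= end:
        in_window = start <= now < end
    else:  # the window wraps past midnight, e.g. 18:00 to 08:00
        in_window = now >= start or now < end
    return in_window and load <= max_load and not console_busy


print(should_run(time(23, 0), 0.3, False))  # night, idle machine, empty console
print(should_run(time(12, 0), 0.3, False))  # midday, outside the window
```

A daemon built around this check would start or resume a job when the function returns true and suspend it when it returns false.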
URL http://www.uni-paderborn.de/pcpc/work/ccs/
The Port-Manager (PM) is a daemon that connects the MS, Queue-Manager (QM) and
the Machine-Managers (MM) together. The MS can be started manually by the user
or automatically by the surrounding Unix system as the user's default login
shell. In either case a connection to the PM is established and data
identifying the user is transferred. The PM uses this information to perform an
initial authorisation; on failure the user session is aborted immediately.
Otherwise, the user has the whole command language of the MS at his/her
disposal.
For example, a user might request a VHE consisting of a number of processors in a
certain configuration and want exclusive usage for one hour. The number of VHEs a user
can handle simultaneously is only restricted by the limitations of the
metacomputer and the restrictions set up by the administrator or given by the
operating system. When a VHE is ordered from the MS side the PM checks the
user's limitations first, i.e. the maximum number and kind of resources allowed
for the requesting user or project. If the request validation is successful,
the VHE is sent to the QM.
The QM administers several queues. Depending on priority, time, resource
requirements, and the application mode (batch or interactive), a queue for the
request is chosen. If the scheduler of the QM decides that a certain VHE should
be created, it is sent to the PM. The PM configures the request in co-operation
with appropriate MMs and supervises the time limits. In addition, the PM
generates operating and accounting data.
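The two decisions described above, validation of a request against the user's limits and selection of a queue, can be sketched in Python as follows. The limits table, queue names and size thresholds are invented for the example and do not come from CCS.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """A request for a VHE (hypothetical fields)."""
    user: str
    processors: int
    minutes: int
    interactive: bool


# Hypothetical per-user limits, of the kind an administrator might configure.
LIMITS = {"alice": {"max_processors": 64, "max_minutes": 120}}


def validate(request):
    """Check the request against the user's limits, as the PM is said to do."""
    limit = LIMITS.get(request.user)
    if limit is None:
        return False
    return (request.processors <= limit["max_processors"]
            and request.minutes <= limit["max_minutes"])


def choose_queue(request):
    """Pick a queue from the application mode and the size of the request."""
    if request.interactive:
        return "interactive"
    if request.processors <= 16 and request.minutes <= 60:
        return "batch-small"
    return "batch-large"


def submit(request):
    """Validate first; only valid requests are passed on to a queue."""
    if not validate(request):
        raise PermissionError("request exceeds the limits for " + request.user)
    return choose_queue(request)


print(submit(Request("alice", 32, 60, False)))
```

The sketch omits priority and time of day, which the text also names as inputs to queue selection.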
The user is allowed to start arbitrary applications within his/her VHE. The upper
level of the three-level optimisation policy used by CCS corresponds to
efficient hardware request scheduling (QM). The mid-level maps requests onto
the metacomputer (PM) and the third level handles the system-dependent
configuration software, optimising request placements onto the system
architectures (MMs). This three-level policy leads to high-level load
balancing within the metacomputer.
URL http://www.cs.wisc.edu/condor/
The Condor software monitors the activity on all the participating workstations
in the local network. Those machines which are determined to be idle are placed
into a resource pool. Machines are then allocated from the pool for the
execution of jobs. The pool is a dynamic entity - workstations enter when they
become idle, and leave again when they get busy.
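The dynamic pool can be pictured with a small Python sketch. The `IdlePool` class and its method names are hypothetical; how Condor actually detects idleness and what happens to a job when an owner returns are not modelled here.

```python
class IdlePool:
    """A pool that workstations enter when idle and leave when busy."""

    def __init__(self):
        self._idle = set()

    def report(self, machine, idle):
        """Record the latest observation of one workstation."""
        if idle:
            self._idle.add(machine)
        else:
            self._idle.discard(machine)

    def allocate(self):
        """Remove and return one idle machine, or None if the pool is empty."""
        if not self._idle:
            return None
        machine = min(self._idle)  # deterministic choice, for the example only
        self._idle.remove(machine)
        return machine

    def __len__(self):
        return len(self._idle)


pool = IdlePool()
pool.report("ws1", idle=True)
pool.report("ws2", idle=True)
pool.report("ws1", idle=False)   # the owner of ws1 has returned
target = pool.allocate()
print(target, len(pool))
```

An allocated machine is taken out of the pool so that it cannot be handed to a second job at the same time.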
Design Features
· The user only needs to relink their code to run under Condor.
· The local execution environment is preserved for remotely executing
processes.
· The Condor software is responsible for locating and allocating idle
workstations.
· "Owners" of workstations have complete priority over their own
machines.
· Jobs are guaranteed to complete eventually.
· Local disk space is not used by remotely executing Condor jobs.
· Condor works outside the kernel, and is compatible with BSD and other
Unix derivatives.
URL http://www.msc.edu/
Main features:
· Pack jobs into the MPP in order to minimise the number of idle
processing elements (PEs). DJM uses a packing algorithm that attempts to select
the best physical location and order that the jobs should be run on the MPP.
· Ensure that jobs complete. DJM forecasts future job activity to prevent
jobs from being killed because of scheduled events, such as a maintenance
shutdown.
· Select jobs to run based on local site policy. DJM can be configured to
determine the order in which jobs are run, including the time of day, the class
of user, and the size of the job.
· Reserve the MPP for real-time applications, heterogeneous applications,
or live demonstrations.
· Manage access to the MPP by both interactive and batch users.
· Runs transparently underneath NQS (Network Queuing System).
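To make the idea of packing concrete, the Python sketch below selects jobs for one scheduling round so that few processing elements are left idle. It uses a simple largest-first greedy heuristic as a stand-in; DJM's own packing algorithm, which also considers physical location and ordering, is not described in enough detail here to reproduce. The job names and sizes are invented.

```python
def pack(jobs, total_pes):
    """Greedily choose jobs for one round, largest first.

    jobs      -- mapping of job name to the number of PEs it needs
    total_pes -- number of PEs in the machine
    Returns (selected job names, number of PEs left idle).
    """
    selected = []
    free = total_pes
    for name, pes in sorted(jobs.items(), key=lambda item: item[1], reverse=True):
        if pes <= free:
            selected.append(name)
            free -= pes
    return selected, free


jobs = {"a": 64, "b": 32, "c": 48, "d": 16}
selected, idle = pack(jobs, total_pes=128)
print(selected, idle)
```

A greedy pass of this kind is not guaranteed to find the packing with the fewest idle PEs; it is shown only because it is short and easy to follow.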
URL http://www.scri.fsu.edu/~pasko/dqs.html
NOTE: Dynamic Network Queuing System (DNQS) has now been superseded by DQS 3.x.
DNQS was based on an earlier version of DQS produced by Tom Green at Florida
State University in 1990. DNQS was essentially re-written by people at McGill
University in Montreal to increase the functionality and make the software
easier to use and install. DNQS is still available; information about it can be
found at the following URL:
URL http://www.physics.mcgill.ca/cmpdocs/dnqs.html
· GUI-based interface.
· GUI-based accounting.
· Parallel Make Utility.
· Parallel package support - PVM, TCGMSG, P4/P5 and MPI (planned).
· Interactive support.
· Scheduler based on Queue-Complexes.
· No single point of failure, redundant queue masters are supported.
Qmon - Qmon is a GUI to DQS based on X/Xt. The top
module has menus for executing DQS commands and other utility functions. An
icon window displays the current states of the queues, and a text output
window records the responses of DQS commands launched from Qmon.
Qusage - This is an Xt-based accounting package provided by DQS.
Accounting information in a variety of forms can be retrieved via
Qusage; it features on-line help and PostScript output. All accounting
information is stored in one place, making retrieval of accounting information
quick and easy.
Dmake - Distributed Make is a generic parallel make utility designed to
speed up the process of compiling large packages. Dmake is designed
for use with DQS, but can also be easily used as a standalone parallel make
utility (separate from DQS). Dmake was developed with simplicity in
mind; no daemons or other modifications to network configurations are required.
URL http://info.mcs.anl.gov/Projects/sp/scheduler/scheduler.html
While there are currently no limits to the number or size of jobs that can be
submitted, the scheduler uses a public algorithm to determine when batch or
interactive time is actually provided. Any modifications to this algorithm will
be made public. Argonne has also implemented an allocation policy as a separate
part of the scheduler. The intent of the policy is to ensure all users some set
amount of resource time and to prevent people from using more than their share
of resources.
URL http://www.liv.ac.uk/HPC/farHomepage.html
· Detect lightly loaded workstations.
· Provide automatic load balancing.
· Allow a flexible workstation access policy.
· Allow users to request dedicated clusters of free workstations.
· Support a general PVM service.
far provides an environment in which the user can rlogin to, or issue a
command (via the Unix commands at or rsh) on, the most lightly-loaded
workstation in the network, which is selected automatically. The implementation
of the system is based on a managed database of current workstation usage which
can be inspected to find a suitable workstation. far also supports the
exploitation of a network for running message passing parallel programs, i.e.
as a loosely-coupled distributed-memory parallel computer. Functionality such
as checkpointing and process migration has been deliberately omitted from far.
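Selecting a workstation from a table of current loads can be sketched in Python as below. The dictionary stands in for far's managed database, whose real format is not described here; the function names are hypothetical, and the command line is only constructed, not executed.

```python
def least_loaded(loads, allowed=None):
    """Return the host with the lowest recorded load.

    loads   -- mapping of host name to current load (stand-in for the database)
    allowed -- optional set of hosts the user is permitted to use
    """
    candidates = {
        host: load for host, load in loads.items()
        if allowed is None or host in allowed
    }
    if not candidates:
        raise LookupError("no workstation available")
    return min(candidates, key=candidates.get)


def remote_command(loads, command):
    """Build an rsh command line targeting the most lightly loaded host."""
    return ["rsh", least_loaded(loads)] + list(command)


loads = {"ws1": 0.9, "ws2": 0.1, "ws3": 0.4}
argv = remote_command(loads, ["make", "all"])
print(argv)
```

The `allowed` argument hints at how an access policy could restrict the candidates before the load comparison is made.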
far release 1.0 has the following features:
· A managed database of workstation loads at any given instant.
· Ability to run remote commands on the most lightly loaded workstations.
· Support for PVM and other message passing libraries.
· System administrator control of remote access to workstations.
· Facilities for the dynamic configuration of workstation clusters
· A set of system administrator controls.
URL http://www.shef.ac.uk/uni/projects/nqs/Product/Generic-NQS/v3.4x/
· Provide for the full support of both batch and device requests. Here a
batch request is defined as a shell script containing commands not requiring
the direct services of some physical device. In contrast, a device request is
defined as a set of independent instructions requiring the direct services of a
specific device for execution (e.g. a line printer request).
· Support all of the resource quotas enforceable by the underlying UNIX
kernel implementation that are relevant to any particular batch request, and
its corresponding batch queue.
· Support the remote queuing and routing of batch and device requests
throughout the network of machines running NQS.
· Modularise all of the request scheduling algorithms so that the NQS
request schedulers can be easily modified on an installation by installation
basis, if necessary.
· Support queue access restrictions whereby the right to submit a batch or
device request to a particular queue can be controlled, in the form of a user
and group access list for any queue.
· Support networked output return, whereby the stdout and
stderr files of any batch request can be returned to a possibly remote
machine.
· Allow for the mapping of accounts across machine boundaries.
· Provide a friendly mechanism whereby the NQS configuration on any
particular machine can be modified without having to resort to the editing of
obscure configuration files.
· Support status operations across the network so that a user on one
machine can obtain information relevant to the NQS on another machine, without
requiring the user to log in on the target remote machine.
· Provide a design for the future implementation of file staging, whereby
several files or file hierarchies can be staged in or out of the machine that
eventually executes a particular batch request.
NQS (modified by Monsanto) has been superseded by Generic NQS 3.4, which in
turn is being further developed and supported by the University of Sheffield -
URL http://www.shef.ac.uk/uni/projects/nqs
URL ftp://ftp.arl.mil/arch/
MDQS is designed around a central queue which is managed by a single privileged
daemon. Requests, delayed or immediate, are queued by non-privileged programs.
Once queued, requests can be listed, modified or deleted. When the requested
device or job stream becomes available, the daemon executes an appropriate
server process to handle the request. Once activated, the request can still be
canceled or restarted if needed.
MDQS can serve as a delayed-execution/batch system. MDQS provides the system manager
with a number of tools for managing the queuing system. Queues can be created,
modified, or deleted without the loss of requests. MDQS recognises and supports
both multiple devices per queue and multiple queues per device by mapping input
for a logical device to an appropriate physical output device. Anticipating the
inevitable, MDQS also provides for crash recovery.
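The relationship between queues and devices can be illustrated with a short Python sketch. The queue and device names are invented and the table layout is an assumption; it shows one queue served by two devices and one device shared by two queues.

```python
# Hypothetical mapping from logical queues to the physical devices serving them.
QUEUE_DEVICES = {
    "print": ["lp0", "lp1"],    # one queue, multiple devices
    "plots": ["plotter0"],
    "drafts": ["lp1"],          # a second queue sharing device lp1
}


def pick_device(queue, busy):
    """Return the first device serving the queue that is not busy, or None."""
    for device in QUEUE_DEVICES[queue]:
        if device not in busy:
            return device
    return None


def queues_for(device):
    """List every queue that can send work to the given device."""
    return sorted(q for q, devices in QUEUE_DEVICES.items() if device in devices)


print(pick_device("print", busy={"lp0"}))
print(queues_for("lp1"))
```

A request that finds no free device simply remains queued until one of the devices serving its queue becomes available.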
The MDQS system was developed at the U.S. Army Ballistic Research
Laboratory to support the work of the laboratory and is available to other UNIX
sites upon request.
URL http://www.nas.nasa.gov/NAS/Projects/pbs/
PBS's independent scheduling module allows the system administrator to define
what types of resources, and how much of each resource, can be used by each
job. The scheduling module has full knowledge of the available queued jobs,
running jobs, and system resource usage. Using one of several procedural
languages, the scheduling policies can easily be modified to suit the computing
requirements and goals of any site. The batch system allows a site to define
and implement policy as to what types of resources and how much of each
resource can be used by different jobs. PBS also provides a mechanism which
allows users to specify unique resources required for a job to complete.
A forerunner of PBS was Cosmic NQS, which was also developed by the NAS
program. It became the early standard for batch systems under Unix. However,
Cosmic NQS had several limitations and was difficult to maintain and
enhance.
PBS Provides:
· User commands - for manipulating jobs, i.e. submission, deletion, status,
etc.
· Operator commands - for setting up, starting and stopping PBS.
· Administration commands - for managing PBS (setting up, start/stop
queues).
· Applications Programming Interface - design applications to run with
PBS.
· Resource Management - for manipulating PBS to best use the resources
available.
· Job routing - the process of moving jobs from one destination to
another.
· Job initiation - the process of selecting and running certain jobs.
· File staging - process of managing disk space and moving files to and
from the system executing the job.
· Job tracking - monitoring the status of jobs submitted remotely.
· History File - log maintained by PBS of job resource usage statistics.
· Error detection and recovery.
· Client software can be installed on all cluster workstations.
URL http://nii-server.isi.edu/gost-group/products/prm/
PRM enables users to run sequential or parallel jobs on a network of
workstations. Sequential jobs may be offloaded to lightly loaded workstations
while parallel jobs can make use of a collection of workstations. PRM supports
the message passing libraries CMMD and a PVM interface (V3.3.5). PRM also
supports terminal and file I/O activity by its tasks, such as keyboard input,
printing to a terminal or access to files that may be on a filesystem not
mounted by the host on which the task is running. Furthermore, the components
of an application may span multiple administrative domains and hardware
platforms, without imposing on the user the responsibility of mapping
individual components to nodes.
PRM selects the processors on which the jobs will run, starts the job, supports
communication between the tasks that make up the job, and directs input and
output to and from the terminal and files on the user's workstation. At the job
level, location transparency is achieved through a dynamic address translation
mechanism that translates task identifiers to physical workstation addresses.
PRM's resource allocation functions are distributed across three entities: the
system manager, the job manager, and the node manager. The system manager
controls access to a collection of processing resources and allocates them to
jobs as requested by job managers. Large systems may employ multiple system
managers, each managing a subset of resources. The job manager is the principal
entity through which a job acquires processing resources to execute its tasks.
The job manager acquires resources from one or more system managers and
initiates tasks on these workstations through the node manager. A node manager
runs on each workstation in the PRM environment. It initiates and monitors
tasks on the workstation on which it is running.
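The division of labour between the three entities can be sketched in Python as follows. The class and method names are hypothetical and the allocation rule (take free nodes from each system manager in turn) is an assumption; PRM's real protocol between managers is not shown.

```python
class NodeManager:
    """Runs on one workstation; starts and records tasks there."""

    def __init__(self, host):
        self.host = host
        self.tasks = []

    def start(self, task):
        self.tasks.append(task)


class SystemManager:
    """Controls a set of nodes and allocates them on request."""

    def __init__(self, nodes):
        self.free = list(nodes)

    def allocate(self, count):
        """Grant up to `count` free nodes."""
        granted, self.free = self.free[:count], self.free[count:]
        return granted

    def release(self, nodes):
        self.free.extend(nodes)


class JobManager:
    """Acquires nodes from system managers and starts a job's tasks on them."""

    def __init__(self, system_managers):
        self.system_managers = system_managers

    def run(self, tasks):
        grants = []                      # (system manager, nodes it granted)
        needed = len(tasks)
        for manager in self.system_managers:
            if needed == 0:
                break
            nodes = manager.allocate(needed)
            grants.append((manager, nodes))
            needed -= len(nodes)
        if needed:                       # not enough nodes: hand everything back
            for manager, nodes in grants:
                manager.release(nodes)
            raise RuntimeError("not enough free nodes for this job")
        all_nodes = [node for _, nodes in grants for node in nodes]
        placement = {}
        for task, node in zip(tasks, all_nodes):
            node.start(task)             # tasks are started via the node manager
            placement[task] = node.host
        return placement


site_a = SystemManager([NodeManager("a1")])
site_b = SystemManager([NodeManager("b1"), NodeManager("b2")])
placement = JobManager([site_a, site_b]).run(["t0", "t1", "t2"])
print(placement)
```

Here one job draws nodes from two system managers, mirroring the statement that a job manager may acquire resources from one or more of them.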
URL http://gatekeeper.dec.com/pub/usenet/comp.sources.misc/volume25/QBATCH/
Features:
There can be as many queues as the system can support. Queues are named, and
run asynchronously. The processing of jobs running in one queue is totally
independent of that in any other (subject, of course, to the independence of
potentially shared resources such as data files and devices).
· Queues can be started and stopped independently.
· The priority of all running jobs is defined when the queue is
created.
· Jobs running in a queue consist of shell scripts. The jobs run with the
uid and gid of the submitter, and run in the environment
ruling when the job was submitted.
· stdout and stderr of running jobs are redirected to a
'monitor'. This will normally be the queue monitor, but may be specified for
individual jobs.
· The status and contents of queues can be listed at any time.
· If qbatch is configured in the system init script, all
queues which were running when the system went down will be restarted. Any jobs
present in the queues will also be restarted.
Facilities:
- Inform user when job is complete.
- Cancel jobs in queues, or kill running jobs.
- Change the running order of jobs in queues.
- Suspend and resume processing of jobs.
- Start and stop the queue process engine at any time.
- Disable submissions to a queue, and re-enable them.
- Stop a queue, and wait until the current job has finished.
- List the status and contents of a queue.
- Repeat (or kill and repeat) the currently running job in a queue.
- Examine or edit jobs in a queue, and examine the monitors.
- Monitor (and perhaps account) the times taken by jobs in the queue.
- Prevent CPU hogging jobs from swamping the system.
- Separate queue for large jobs that is started and stopped by cron
out of working hours.
- Protect sensitive applications and data with file and directory permissions.
In this chapter we compare the evaluation criteria laid out in chapter 2 with
each CMS package.
Contact: Andreas Schwierskott
GENIAS Software GmbH
Erzgebirgstr. 2
D-93073 Neutraubling, Germany
Tel: +49 9401 9200-33
Email: andy@genias.de
Systems Supported:
DEC Alpha - OSF/1 v1.3, v2.x, v3.0, DEC SMM - OSF/1 v3.0
HP - HP-UX v9.0
IBM - AIX 3.2.5, IBM SP1, SP2 - AIX 3.2.5
INTEL - LINUX
SGI - IRIX v4.0.5, v5.x, SGI Power Challenge - IRIX v6.0
SUN Sparc - SunOS 4.1, Solaris 2.1, SUN SMM - Solaris 2.1
Upon request, the software can be ported to other platforms.
Contact: Peter Johnson
Sterling Software
Communications Software Division
5215 North O'Connor Boulevard, Suite 1500
Irving, TX 75039
Tel: +1 (800) 700 5599
Email: connect@sterling.com
Systems Supported:
Data General - DG/UX
DEC - OSF/1 & Ultrix
IBM - Aix/ESA
HP - HP-UX
SGI - Irix
Amdahl - UTS
Sun - Solaris & SunOS
Contact:
Unison Software
9-11 Vaughn Road, Harpenden
Hertfordshire AL5 4HU
United Kingdom
Tel: +44 1582 462424
Email: -
URL http://www.unison.com/main-menu/contacts/mailsupport.html
Systems Supported:
IBM - Aix
Sun - SunOS & Solaris
DEC - Ultrix
HP - HP-UX
SGI - Irix
Contact: Joe Banas
IBM UK Ltd
1 New Square, Bedfont Lakes
Feltham
Middlesex TW14 8HB, UK
Tel: +44 181 818 4000
Email: banas@vnet.ibm.com
Systems Supported:
IBM RS/6000 & SP - AIX 4.1.3
Sun Sparc - SunOS 4.1.2 & Solaris 2.3
SGI - IRIX 5.2
HP Apollo 9000 Series 700 - HP-UX 9.01
Contact:
Platform Computing Corporation
5001 Yonge St., #1401
North York, ON, M2N 6P6
Canada
Tel: +1 (416) 512-9587
Email: info@platform.com
Systems Supported:
CONVEX - C-Series systems running ConvexOS 10 or 11
Digital - Ultrix 4.2 - 4.4, ALPHA/AXP - Digital UNIX 2.x & 3.x
HP 9000/300 - HP-UX 8.x, 9.x 9000/700 systems & 9000/800 systems 9.x &
10.x
IBM RS/6000 - AIX 3.2.x & 4.1
SGI - IRIX 4.x, 5.2, 5.3 & 6.01
SUN - SunOS 4.1.x, Solaris 2.3 & 2.4
Contact:
Cray Research, Inc.
655 Lone Oak Drive
Eagan, Minnesota 55121
Tel: +1 (612)452-6650
Email: craysoft@cray.com
Systems Supported:
Sun - Solaris & SunOS
SGI - IRIX
IBM - AIX
HP - HP-UX
DEC - OSF/1
Cray - UNICOS
Contact: Terry Graff
Hewlett-Packard Corp.
300 Apollo Drive
Chelmsford
MA 01824 - USA
Tel: +1 (508) 436 5931
Email:
Systems Supported:
HP Task Broker runs on HP 9000 Series 300, 400, 600, 700, and 800 computers
running the HP-UX operating system, and the HP Apollo workstations DN2500,
DN3500, DN4500, DN5500, and DN10000 running Domain/OS. In addition, Science
Applications International Corporation (SAIC) has ported Task Broker to the
Sun3, Sun4, and SPARCstation platforms.
3.2.5 LOAD SHARING FACILITY (LSF)
3.2.6 NETWORK QUEUING ENVIRONMENT (NQE)
3.3 Research Packages
3.3.1 BATCH
3.3.2 COMPUTING CENTRE SOFTWARE (CCS)
3.3.4 DISTRIBUTED JOB MANAGER (DJM)
3.3.5 DISTRIBUTED QUEUING SYSTEM (DQS 3.X)
3.3.6 EXTENSIBLE ARGONNE SCHEDULER SYSTEM (EASY)
3.3.7 FAR - A TOOL FOR EXPLOITING SPARE WORKSTATION CAPACITY
3.3.8 GENERIC NETWORK QUEUING SYSTEM (GNQS)
3.3.9 MULTIPLE DEVICE QUEUING SYSTEM (MDQS)
3.3.10 PORTABLE BATCH SYSTEM (PBS)
3.3.11 THE PROSPERO RESOURCE MANAGER (PRM)
4. Assessment of Evaluation Criteria
Num.  Package                                       Vendor                      Version
1.    Codine - Computing in Distrib. Network Envn.  GENIAS GmbH, Germany        3.3.2
2.    Connect:Queue                                 Sterling Corp., USA         October 95
3.    Load Balancer                                 Unison Software, USA        June 95
4.    LoadLeveler                                   IBM Corp., USA              1.2.1
5.    LSF - Load Sharing Facility                   Platform Computing, Canada  2.1
6.    NQE                                           Craysoft Corp., USA         2.0
7.    Task Broker                                   Hewlett-Packard Corp.       1.2
4.1.1 CODINE
Functionality                    Supported     Comments
Commercial/Research              Commercial    Fully supported commercial package by Genias GmbH
Cost                             Yes           There are academic and commercial rates.
User Support                     Yes           tel. & Email - seems to be very responsive.
Heterogeneous                    Yes
Platforms                        Most Major    Convex, Cray, DEC, HP, IBM, SGI and Sun
Operating Systems                Most Major    ConvexOS, UniCOS, OSF/1, Ultrix, HP-UX, AIX, Irix, SunOS and Solaris
Additional Hardware/Software     No
Batch Jobs                       Yes
Interactive Support              Yes
Parallel Support                 Yes           PVM and Express
Queue Types                      Yes
Dispatching policy               Yes           Configurable.
Impact on Cluster Users          Yes           Can be set up to minimise impact
Impact on Cluster w/s            Yes           If process Migration and checkpointing used
Load Balancing                   Yes           No details.
Check Pointing                   Yes           Optional - Relink code with Codine libraries
Process Migration                Yes           Optional
Job Monitoring and Rescheduling  Yes
Suspension/Resumption of Jobs    Yes           BSD Signals supported
Resource Administration          Yes           Yes, via a master scheduler
Job Runtime Limits               Yes
Forked Child Management          No            No migration or checkpointing of these processes.
Job Scheduling Priority          Yes
Process Management               Yes           Runtime limits and exclusive CPU usage
Job Scheduling Control           Yes
GUI/Command-line                 Yes
Ease of Use                      Unknown
User Allocation of Job           Yes
User Job Status Query            Yes
Job Statistics                   Yes
Runtime Configuration            Yes           Dynamic
Dynamic Resource Pool            Yes
Single Point of Failure          Yes           Being worked upon.
Fault Tolerance                  Some          Master scheduler will reschedule a job
Security Issues                  Yes           Normal Unix - but seems to have been thought out.
4.1.2 CONNECT:QUEUE
Functionality                    Supported     Comments
Commercial/Research              Commercial    Sterling Software - Dallas office
Cost                             ?
User Support                     Yes           By Sterling
Heterogeneous                    Yes
Platforms                        Most Major    Data General, Digital, IBM, HP, SGI, Amdahl, Meiko, and Sun
Operating Systems                Most Major    DG/UX, OSF/1, Ultrix, Aix/ESA, HP-UX, Irix, UTS, SunOS/Solaris
Additional Hardware/Software     No
Batch Jobs                       Yes
Interactive Support              No            Planned for the future but will redirect interactive i/o
Parallel Support                 Yes           PVM and Linda
Queue Types                      Yes           Multiple queue types
Dispatching policy               Yes           CPU/Memory/Queue based
Impact on w/s Owner              Yes           Owner has no control over w/s resources
Impact on Cluster w/s            Yes           CPU/Memory/Swap-space
Load Balancing                   Yes           CPU/Memory/Queue based
Check Pointing                   No            Planned for the future
Process Migration                No            Planned for the future
Job Monitoring and Rescheduling  Yes
Suspension/Resumption of Jobs    No
Resource Administration          Yes
Job Runtime Limits               Yes
Forked Child Management          No
Job Scheduling Priority          Yes
Process Management               Yes           Need to configure whole machine
Job Scheduling Control           Yes
GUI/Command-line                 Both          GUI for user and administrator
Ease of Use                      Unknown
User Allocation of Job           Limited       User can only specify which queue
User Job Status Query            Yes
Job Statistics                   Yes
Runtime Configuration            Yes           Administrator can modify queues and their policies on the "fly"
Dynamic Resource Pool            Yes           Probably yes
Single Point of Failure          Minimised     Multiple master schedulers are run.
Fault Tolerance                  Yes           Scheduler will attempt to restart jobs after a failure
Security Issues                  Yes           Uses normal Unix security
4.1.3 LOAD BALANCER
Functionality                    Supported     Comments
Commercial/Research              Commercial    Unison Software Solution
Cost                             POA
User Support                     Yes           Email/Tel
Heterogeneous                    Yes
Platforms                        Most Major    Sun, HP, IBM, DEC, SGI
Operating Systems                Most Major    Ultrix, HP-UX, Aix, Irix, SunOS, Solaris
Additional Hardware/Software     No
Batch Jobs                       Yes
Interactive Support              Yes
Parallel Support                 No
Queue Types                      Multiple
Dispatching policy               Multiple      Numerous factors can be taken into account
Impact on w/s Owner              Configurable  Users have rights over their w/s
Impact on Cluster w/s            Yes           CPU/Memory/Swap
Load Balancing                   Yes           Tries to distribute load evenly.
Check Pointing                   Yes
Process Migration                Yes
Job Monitoring and Rescheduling  Yes
Suspension/Resumption of Jobs    Yes
Resource Administration          Yes           Numerous factors can be taken into account
Job Runtime Limits               Yes
Forked Child Management          No
Job Scheduling Priority          Yes
Process Management               Yes           Exclusive CPU access available
Job Scheduling Control           Yes
GUI/Command-line                 Both
Ease of Use                      Unknown
User Allocation of Job           Yes           Submit to one of many queues
User Job Status Query            Yes
Job Statistics                   Yes
Runtime Configuration            Yes
Dynamic Resource Pool            Yes           Master scheduler checks resource table regularly
Single Point of Failure          Yes           New release will overcome this with multiple master schedulers
Fault Tolerance                  Partially     If machine crashes, scheduler will try to re-run the job.
Security Issues                  Yes           Unix security
4.1.4 LOADLEVELER
Functionality                    Supported     Comments
Commercial/Research              Commercial    IBM Kingston v 1.2.1
Cost                             POA
User Support                     Yes           IBM type - depends on level purchased
Heterogeneous                    Yes
Platforms                        Most Major    SP, RS/6000, Sun SPARC, SGI workstation, HP 9000
Operating Systems                Most Major    AIX 4.1.3, SunOS 4.1.2, Solaris 2.3, IRIX 5.2, HP-UX 9.01
Additional Hardware/Software     No
Batch Jobs                       Yes
Interactive Support              Yes           Only on Spx, planned for clusters in future release
Parallel Support                 Yes           Only on Spx, planned for clusters in future release - PVMe
Queue Types                      Multiple
Dispatching policy               Configurable
Impact on w/s Owner              Configurable  User determines when their machine is available
Impact on Cluster w/s            Yes           CPU/RAM/Swap
Load Balancing                   Yes           based on fair and optimal use of resources
Check Pointing                   Yes           Need to relink with LL libraries
Process Migration                Yes           Need to relink with LL libraries
Job Monitoring and Rescheduling  Yes
Suspension/Resumption of Jobs    Yes
Resource Administration          Yes           Administrator can configure this
Job Runtime Limits               Yes
Forked Child Management          No
Job Scheduling Priority          Yes
Process Management               Yes           Can have exclusive CPU usage
Job Scheduling Control           Yes
GUI/Command-line                 Both          NQS compatible
Ease of Use                      Unknown
User Allocation of Job           Yes
User Job Status Query            Yes
Job Statistics                   Yes
Runtime Configuration            Yes
Dynamic Resource Pool            Yes
Single Point of Failure          Limited       Master schedulers - but does have checkpointing
Fault Tolerance                  Limited       Schedulers will try and re-run jobs.
Security Issues                  Yes           Normal Unix Security
4.1.5 LSF
Functionality                    Supported     Comments
Commercial/Research              Commercial    Platform Computing Corp.
Cost                             POA
User Support                     Yes
Heterogeneous                    Yes
Platforms                        Most Major    Convex, Digital, HP, IBM, SGI and Sun
Operating Systems                Most Major    ConvexOS, Ultrix, HP-UX, Aix, Irix, SCO, SunOS and Solaris
Additional Hardware/Software     No
Batch Jobs                       Yes
Interactive Support              Yes
Parallel Support                 Yes           PVM, P4, TCGMSG, DSM and Linda
Queue Types                      Multiple
Dispatching policy               Configurable  Satisfy job resource and system requirements
Impact on w/s Owner              Yes           LSF will migrate jobs on and off w/s
Impact on Cluster w/s            Yes           CPU/RAM/Swap
Load Balancing                   Yes           Tries to distribute work load evenly across cluster
Check Pointing                   No            Planned in future releases
Process Migration                Yes
Job Monitoring and Rescheduling  Yes           Migration of jobs
Suspension/Resumption of Jobs    Yes
Resource Administration          Yes
Job Runtime Limits               Yes
Forked Child Management          No
Job Scheduling Priority          Yes
Process Management               Yes           Exclusive CPU usage possible
Job Scheduling Control           Yes
GUI/Command-line                 Both
Ease of Use                      Unknown
User Allocation of Job           Yes           Limited
User Job Status Query            Yes
Job Statistics                   Yes
Runtime Configuration            Yes
Dynamic Resource Pool            Yes?          Not sure but seems likely.
Single Point of Failure          No            Master scheduler re-elected by slave schedulers
Fault Tolerance                  Yes           jobs restarted
Security Issues                  Yes           Normal Unix, trusted hosts and Kerberos support
4.1.6
NQE
Functionality Supported Comments
Commercial/Research Commercial CraySoft Corp.
Cost Yes $2875 per 10-user network - 27th Sep 95
User Support Yes email, tel, etc
Heterogeneous Yes
Platforms Most Major SPARC, IBM RS/6000, SGI, Digital Alpha, HP and Cray Research
Operating Systems Most Major Solaris, SunOS, AIX, IRIX, DEC OSF/1, HP-UX and UNICOS (latest)
Additional Hardware/Software No
Batch Jobs Yes
Interactive Support Yes
Parallel Support Yes PVM
Queue Types Multiple Batch and pipe queues
Dispatching policy Yes Configurable by administrators
Impact on w/s Owner Configurable w/s owner can configure impact
Impact on Cluster w/s Yes CPU/RAM/Swap
Load Balancing Yes
Check Pointing No Possible on Cray if NQS available
Process Migration No Possible on Cray if NQS available
Job Monitoring and Rescheduling Yes
Suspension/Resumption of Jobs Yes
Resource Administration Yes
Job Runtime Limits Yes
Forked Child Management No Only under UNICOS
Job Scheduling Priority Yes
Process Management Yes Workstation can be configured to allow exclusive CPU
Job Scheduling Control Yes
GUI/Command-line Both WWW interface
Ease of Use Unknown
User Allocation of Job Yes To one of a number of queues
User Job Status Query Yes
Job Statistics Yes
Runtime Configuration Yes
Dynamic Resource Pool Yes
Single Point of Failure No Client forwards jobs to master scheduler, multiple schedulers
Fault Tolerance Some Jobs are not sent to queues that are not responding
Security Issues High Has US DoD B1 security rating, plus Cray Multi-level security
4.1.7 TASK BROKER
Functionality Supported Comments
Commercial/Research Commercial
Cost Yes Doc & Media = $315, 10 w/s License ~ $5k
User Support Yes tel/email - depends on how much you pay
Heterogeneous No Version supplied by HP only works on HP platforms.
Platforms One HP - 3rd Party versions available.
Operating Systems One HP-UX
Additional Hardware/Software No
Batch Jobs Yes
Interactive Support Limited X supported for I/O, stdin/stderr need special w/s configuration
Parallel Support No
Queue Types Many Configurable Queues
Dispatching policy Yes
Impact on w/s Owner Yes Can be achieved, but need to "edit" local configuration file.
Impact on Cluster w/s Yes CPU/RAM/Disk
Load Balancing Yes Scheme where each w/s "bids" for job
Check Pointing No
Process Migration No
Job Monitoring and Rescheduling Yes & No No automatic rescheduling of jobs
Suspension/Resumption of Jobs Yes Supports all BSD signals
Resource Administration Yes Local and global configuration files.
Job Runtime Limits No None built in
Forked Child Management No See above...
Job Scheduling Priority Yes
Process Management Yes w/s can be configured to provide exclusive or shared access to jobs
Job Scheduling Control Yes
GUI/Command-line Both Motif based GUI
Ease of Use Unknown
User Allocation of Job Yes
User Job Status Query Yes
Job Statistics Yes Admin and user statistics
Runtime Configuration Yes Need to "edit" configuration file on each w/s - static
Dynamic Resource Pool Yes Need to "edit" local configuration file to add/withdraw w/s.
Single Point of Failure No Schedulers distributed - jobs are not guaranteed to complete.
Fault Tolerance Yes Jobs are not guaranteed to complete.
Security Issues Yes Normal Unix
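The "Load Balancing" entry above describes a scheme in which each workstation "bids" for a job. The report gives no detail on how a bid is computed, so the Python sketch below is only an illustration under stated assumptions: the bid is derived from the load average (an assumed metric), a workstation may decline by returning no bid, and the highest bid wins. The names `Workstation`, `bid` and `place_job` are hypothetical and do not come from Task Broker.

```python
from dataclasses import dataclass


@dataclass
class Workstation:
    name: str
    load: float                 # current load average (assumed bid input)
    accepts_jobs: bool = True   # owner may withdraw the w/s from bidding

    def bid(self):
        """Return a bid in (0, 1]; higher means more willing. None declines."""
        if not self.accepts_jobs:
            return None
        return 1.0 / (1.0 + self.load)


def place_job(workstations):
    """Collect bids and return the name of the highest bidder, or None."""
    bids = [(ws.bid(), ws.name) for ws in workstations]
    bids = [(value, name) for value, name in bids if value is not None]
    if not bids:
        return None             # nobody bid, so the job cannot be placed
    return max(bids)[1]         # ties are broken by workstation name


# Hypothetical cluster: hp3 is idle but its owner has withdrawn it.
cluster = [
    Workstation("hp1", load=2.0),
    Workstation("hp2", load=0.5),
    Workstation("hp3", load=0.0, accepts_jobs=False),
]
print(place_job(cluster))
```

In this sketch the placement decision is made from the bids alone, so no central table of machine loads is needed, which is consistent with the distributed schedulers noted under "Single Point of Failure".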
4.2 Research Packages
Num.  Package                           Vendor                                 Version
1.    Batch                             UCSF, USA                              4.0
2.    CCS - Computing Centre Software   Paderborn, Germany                     ?
3.    Condor                            University of Wisconsin-Madison, USA   5.5.3
4.    DJM - Distributed Job Manager     Minnesota Supercomputing Center        ??
5.    DQS 3.x                           Florida State University, USA          3.1
6.    EASY                              Argonne National Lab, USA              1.0
7.    far                               University of Liverpool, UK            1.0
8.    MDQS                              ARL, USA                               ??
9.    Generic NQS                       University of Sheffield, UK            3.4
10.   Portable Batch System             NASA Ames & LLNL, USA                  1.1
11.   PRM - Prospero Resource Manager   University of Southern California      1.0
12.   QBATCH                            Vita Services Ltd., USA                ??
4.2.1 BATCH
Functionality Supported Comments
Commercial/Research Research Unclear, IPR seems to belong to UCSF
Cost PD GNU-type license
User Support Yes Email author
Heterogeneous Yes
Platforms Many IBM RS/6000, SPARC, MIPS, Alpha
Operating Systems Many AIX 3.x, IRIX 3.x, 4.x, & 5.x, SunOS 4.x and Ultrix 4.1
Additional Hardware/Software No NFS
Batch Jobs Yes
Interactive Support Yes
Parallel Support No
Queue Types Yes Multiple Queues
Dispatching policy Configurable
Impa