Erol Akarsu,
Geoffrey Fox, Tomasz Haupt, Alexey Kalinichenko, Kang-Seok Kim, Praveen
Sheethaalnath, and Choon-Han Youn
Northeast Parallel Architectures Center, Syracuse University
Abstract:
In this paper, we discuss the use of Gateway for seamless desktop access to high performance resources. We illustrate our ideas with two Gateway applications that require access to remote resources: the Landscape Management System (LMS) and Quantum Simulations (QS). For LMS we use Gateway to retrieve data from many different sources as well as to allocate remote computational resources needed to solve the problem at hand. Gateway controls the necessary data transfer between hosts transparently for the user. Quantum Simulations requires access to HPCC resources, and therefore we layered Gateway on top of the Globus metacomputing toolkit. This way Gateway plays the role of a job broker for Globus.
The visual HPDC framework introduced by this project offers an intuitive Web browser based interface and a uniform point of interactive control for a variety of computational modules and applications, running at various labs on different platforms. Our technology goal is to build a high-level user friendly commodity software based visual programming and runtime environment for HPDC [1,2].

Figure 1: Three-Tier Architecture of Gateway.
Gateway offers a particular programming paradigm implemented over a virtual Web accessible metacomputer. A Gateway application is given by a computational graph visually edited by end-users, using Java applets. Modules are written by module developers, people who have only limited knowledge of the system on which the modules will run. They need not concern themselves with issues such as allocating and running the modules on various machines, creating connections among the modules, sending and receiving data across these connections, or running several modules concurrently on one machine. The Gateway system hides these management and coordination functions from the developers, allowing them to concentrate on the modules being developed.
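The computational-graph paradigm described above can be illustrated with a minimal sketch. It is written in Python for brevity (Gateway itself is built on Java and CORBA), and the names `Module`, `Graph`, `add_module`, `connect`, and `run` are hypothetical, not part of the Gateway API. The point is only that the module author supplies a function, while the framework decides when to run it and how to deliver its inputs.

```python
from collections import deque


class Module:
    """Hypothetical stand-in for a Gateway module: a named unit of work."""

    def __init__(self, name, func):
        self.name = name
        self.func = func  # callable that receives a list of upstream outputs


class Graph:
    """A computational graph: modules are nodes, connections are edges."""

    def __init__(self):
        self.modules = {}
        self.inputs = {}  # module name -> list of upstream module names

    def add_module(self, module):
        self.modules[module.name] = module
        self.inputs[module.name] = []

    def connect(self, source, target):
        self.inputs[target].append(source)

    def run(self):
        """Run each module once all of its upstream modules have finished."""
        pending = {name: len(srcs) for name, srcs in self.inputs.items()}
        ready = deque(name for name, count in pending.items() if count == 0)
        results = {}
        while ready:
            name = ready.popleft()
            args = [results[src] for src in self.inputs[name]]
            results[name] = self.modules[name].func(args)
            # Release any module whose last missing input was just produced.
            for other, srcs in self.inputs.items():
                if name in srcs:
                    pending[other] -= srcs.count(name)
                    if pending[other] == 0:
                        ready.append(other)
        if len(results) != len(self.modules):
            raise ValueError("graph contains a cycle")
        return results


graph = Graph()
graph.add_module(Module("source", lambda args: 3))
graph.add_module(Module("double", lambda args: 2 * args[0]))
graph.add_module(Module("report", lambda args: "result=%d" % args[0]))
graph.connect("source", "double")
graph.connect("double", "report")
outputs = graph.run()
```

In this sketch the three lambda functions play the role of the module developer's code; scheduling and data delivery live entirely in `Graph.run`.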
The implementation of Gateway is described in detail in our previous papers [3, 4, 5, 6]. Here, we briefly recall that Gateway is implemented as a modern three-tier system, as shown in Fig. 1. Tier 1 is a high-level front-end for visual programming, steering, run-time data analysis and visualization, and collaboration built on top of the Web and OO commodity standards. A distributed, object-based, scalable, and reusable Web server and object broker middleware forms Tier 2. Back-end services comprise Tier 3.
In our current prototype, the middle tier is given by a network of Gateway servers (GS), as shown in Fig. 2.

Figure 2: Architecture of the Gateway’s middle-tier
Secure access to the system is facilitated by a dedicated gatekeeper server, which comprises three logical components: a (secure) Web server, the AKENTI [7] server, and a CORBA-based Gateway server. The user accesses the Gateway system through a portal web page from the gatekeeper web server. The portal implements the first component of the Gateway security: user authentication and generation of the user credentials that eventually will be used to grant access to resources. The authorization process is controlled by the AKENTI server. For each authorized user, the web server creates a session (that is, it instantiates the user context in the Gateway server, as described below) and gives permission to download the front-end applet. The applet is used to create or restore, run, and control user applications.
The main function of the Gateway server is managing user sessions. A session is established automatically, by creating a user context, after the authorized user connects to the gatekeeper. The user context is a container object that stores the user applications. The application is another container object that stores components of the user application. The application component is either a single Gateway module or another, finer grain application context. This way, the Gateway server can manage many sessions simultaneously, and within each session, the user can define many applications hierarchically composed of many modules distributed over the network of servers.
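The container hierarchy described above (server, user context, application context, module) can be sketched as follows. This is an illustration only, written in Python rather than as CORBA objects; all class and method names are hypothetical and do not come from the Gateway implementation.

```python
class GatewayModule:
    """Hypothetical leaf component: a single module."""

    def __init__(self, name):
        self.name = name

    def count_modules(self):
        return 1


class ApplicationContext:
    """Container for modules and nested, finer grain application contexts."""

    def __init__(self, name):
        self.name = name
        self.components = []

    def add(self, component):
        self.components.append(component)
        return component

    def count_modules(self):
        # Recurse through nested contexts down to the leaf modules.
        return sum(c.count_modules() for c in self.components)


class UserContext:
    """Container for the applications that belong to one user session."""

    def __init__(self, user):
        self.user = user
        self.applications = {}

    def create_application(self, name):
        app = ApplicationContext(name)
        self.applications[name] = app
        return app


class GatewayServer:
    """Keeps one user context per active session."""

    def __init__(self):
        self.sessions = {}

    def open_session(self, user):
        # Reuse the existing context if the user already has a session.
        if user not in self.sessions:
            self.sessions[user] = UserContext(user)
        return self.sessions[user]


server = GatewayServer()
context = server.open_session("alice")
app = context.create_application("qs")
app.add(GatewayModule("input_editor"))
chain = app.add(ApplicationContext("chain"))
chain.add(GatewayModule("step1"))
chain.add(GatewayModule("step2"))
total = app.count_modules()
```

Because an application context may hold either modules or further contexts, operations such as `count_modules` apply uniformly at every level of the hierarchy.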
The Gateway modules are CORBA objects conforming to the JavaBeans model. The functionality of a module can be implemented directly in the body of the module. However, typically the module serves as a proxy of specific backend services, such as DBMS or HPCC services.
The middle-tier also provides a number of generic services that further simplify development of distributed applications, such as access to databases and remote file systems. The file service supports a variety of file operations, including deleting, copying, and moving of remote files owned by the user, as well as file transfer between systems.
The most prominent service implemented by the Gateway system is the job service that provides secure access to high performance compute servers. To implement it we follow the database access model, in which the Gateway middle-tier service acts as a client that sends requests to the back-end DBMS through the JDBC API. Analogously, to access HPCC resources, the middle-tier sends requests to the back-end systems through a metacomputing API. We use the Globus [8] metacomputing toolkit as the API, and we plan to follow the emerging standards being developed by DATORR [9] and the NCSA Alliance National Machine Room initiatives.
In order to run Gateway over Globus there must be at least one Gateway node capable of executing Globus commands, such as globusrun. In other words, there must be at least one host on which both Globus and Gateway server are installed. This host serves as a “bridge” between two domains: a network of Gateway servers and a network of resources controlled by Globus: the Grid. The jobs that require computational power of massively parallel computers are directed to the Globus domain, while others can be launched on much more modest platforms, such as the user’s desktop or even a laptop running Windows NT.
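The routing rule described above can be sketched as a small broker. This is an illustration under stated assumptions: the `needs_hpc` flag, the `Broker` class, and the host names are hypothetical, and no actual Globus command is invoked.

```python
class JobRequest:
    """Hypothetical job description; needs_hpc marks jobs for the Grid."""

    def __init__(self, name, needs_hpc):
        self.name = name
        self.needs_hpc = needs_hpc


class Broker:
    def __init__(self, nodes):
        # nodes maps a Gateway host name to True if Globus is installed there.
        self.nodes = nodes

    def bridge_hosts(self):
        """Hosts that run both a Gateway server and Globus."""
        return sorted(h for h, has_globus in self.nodes.items() if has_globus)

    def route(self, job):
        """Return (domain, bridge host) for a job."""
        if not job.needs_hpc:
            return ("gateway", None)
        bridges = self.bridge_hosts()
        if not bridges:
            raise RuntimeError("no Gateway node can execute Globus commands")
        return ("globus", bridges[0])


broker = Broker({"laptop": False, "bridge.example.org": True})
hpc_route = broker.route(JobRequest("simulation", needs_hpc=True))
local_route = broker.route(JobRequest("file_editing", needs_hpc=False))
```

If no node qualifies as a bridge, the sketch refuses HPC jobs outright, which mirrors the requirement that at least one host carry both installations.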
Both Globus and Gateway gain from this symbiotic coexistence. From the Gateway perspective, Globus is an optional, high performance (and secure) back-end, while Gateway serves as a high-level web accessible visual interface and a job broker for Globus. Together they cover a much wider application domain: Globus adds the HPCC world to Gateway, while Gateway adds commodity software, such as databases.
The LMS project was sponsored by the U.S. Army Corps of Engineers Waterways Experiment Station (CEWES) Major Shared Resource Center (MSRC) at Vicksburg, MS, under the DoD HPC Modernization Program, Programming Environment and Training (PET). The first, pilot phase of the project can be described as follows. A decision maker (the end user of the system) wants to evaluate changes in vegetation in some geographical region over a long time period caused by short term disturbances such as a fire or human activities. One of the critical parameters of the vegetation model is the soil condition at the time of the disturbance. This in turn is dominated by any rainfall that occurs at that time. Consequently, the implementation of this project requires:
- Data retrieval from remote sources, including digital elevation model (DEM) data, land use maps, soil textures, dominating flora species, and their growing characteristics, to name a few. The data are available from many different sources, for example from public services such as USGS web servers, or from proprietary databases. The data come in different formats, and with different spatial resolutions. Without Gateway, the data must be manually prefetched.
- Data preprocessing to prune and convert the raw data to a format expected by the simulation software. This preprocessing is performed interactively using the WMS [10] (Watershed Modeling System) package.
- Execution of two simulation programs: EDYS [10] for vegetation simulation including the disturbances, and CASC2D [10] for watershed simulations during rainfalls. The latter generates maps of the soil condition after the rainfall. The initial conditions for CASC2D are set by EDYS just before the rainfall event, and the output of CASC2D after the event is used to update parameters of EDYS. We used test data sets for time periods with at least two rainfalls. Consequently, the data transfer between the two codes had to be performed several times during one simulation. EDYS is not CPU demanding, and it is implemented only for Windows 95/98/NT systems. On the other hand, CASC2D is very computationally intensive and is typically run on powerful UNIX compute servers.
- Visualization of the results of the simulation. Again, WMS is used for this purpose.
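The coupling between the two simulation codes in the list above can be sketched as a simple loop. The functions `edys_step` and `casc2d_rainfall` are stubs with toy integer arithmetic; they stand in for the real EDYS and CASC2D codes, whose interfaces are not described here, and the state variables are assumptions made for illustration.

```python
def edys_step(state):
    """Stub for one vegetation-model time step (toy arithmetic only)."""
    return {
        "vegetation": state["vegetation"] + state["soil_moisture"],
        "soil_moisture": max(state["soil_moisture"] - 1, 0),
    }


def casc2d_rainfall(soil_moisture, rainfall):
    """Stub for a watershed simulation of one rainfall event."""
    return soil_moisture + rainfall


def simulate(state, steps, rainfall_events):
    """Alternate between the two codes; rainfall_events maps step -> amount."""
    transfers = 0
    for step in range(steps):
        if step in rainfall_events:
            # The vegetation model supplies the initial soil condition, and
            # the watershed model returns the condition after the event.
            updated = casc2d_rainfall(state["soil_moisture"],
                                      rainfall_events[step])
            state = dict(state, soil_moisture=updated)
            transfers += 2  # one transfer in each direction
        state = edys_step(state)
    return state, transfers


final_state, transfers = simulate(
    {"vegetation": 0, "soil_moisture": 2}, steps=4, rainfall_events={1: 3, 3: 2}
)
```

With two rainfall events the sketch performs four transfers, two per event, which is why the data exchange between hosts has to be automated rather than done by hand.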
The purpose of this project was to demonstrate feasibility of implementing a system that would allow launching and controlling the complete simulation from a networked laptop. We successfully implemented it using Gateway with WMS and EDYS encapsulated as Gateway modules running locally on the laptop and CASC2D executed by Gateway on remote hosts, either at CEWES in Vicksburg, MS or NPAC in Syracuse, NY. We demonstrated the system using a Pentium II based laptop running Windows NT located in Washington, DC (with CASC2D running at Vicksburg) and in Orlando, FL during Supercomputing ’98 (with CASC2D running alternatively at Syracuse and Vicksburg).

Figure 3. Retrieving DEM and Land Use data from USGS.
For this project we developed a custom front-end that allows the user to interactively select the region of interest by drawing a rectangle on a map, select the data type to be retrieved, launch WMS to preprocess the data and make visualizations, and finally launch the simulation with CASC2D running on a host of choice. An example screendump is given in Fig. 3.
The QS project [11] is carried out within Alliance Team B, and its primary purpose is to demonstrate the feasibility of layering Gateway on top of the Globus metacomputing toolkit. This way Gateway serves as a job broker for Globus, while Globus (or more precisely, GRAM-keeper) takes responsibility for the actual resource allocation, which includes authentication and authorization of the Gateway user to use computational resources under Globus control.
This application can be characterized as follows. A chain of high performance applications (both commercial packages such as GAUSSIAN or GAMESS and custom developed codes) is run repeatedly for different data sets. Each application can be run on several different (multiprocessor) platforms, and consequently, input and output files must be moved between machines. Output files are visually inspected by the researcher; if necessary, applications are rerun with modified input parameters. The output file of one application in the chain is the input of the next one, after a suitable format conversion. The logical structure of the application is shown in Fig. 4.

Figure 4. Logical structure of the QS application
GAUSSIAN and GAMESS are run as Globus jobs on Origin2000 or Convex Exemplar at NCSA, while all file editing and format conversion are performed on the user’s desktop.
Unlike LMS, for QS we are using the Gateway editor as the front-end. The Gateway editor provides an intuitive environment to visually compose (click-drag-and-drop) a chain of data-flow computations from preexisting modules (as shown in Fig. 5). In the edit mode, modules can be added to or removed from the existing network, and connections between the modules can be updated. Once created, the network can be saved (on the server side) to be restored at a later time. The workload can be distributed among several Gateway nodes (Gateway servers), with the interprocessor communications taken care of by the middle-tier services. Moreover, thanks to the interface to the Globus system in the back-end, execution of particular modules can be delegated to powerful HPCC systems. In the run mode, the metaapplication represented by the visually constructed graph is passed to the middle-tier by sending a series of requests (module instantiation, intermodule communications) to the middle-tier services.
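The chain structure of the QS application can be sketched as follows: each stage runs on a remote host, and its output is converted on the desktop before it becomes the input of the next stage. The stage functions here are toy string operations, and the stage names, host labels, and `run_chain` helper are hypothetical; nothing in this sketch reflects the actual input or output formats of the chemistry packages.

```python
def run_chain(stages, converters, initial_input):
    """Run stages in order, converting data between consecutive stages.

    stages:     list of (name, host, func) tuples
    converters: dict mapping (previous name, next name) -> conversion func
    """
    data = initial_input
    log = []
    previous = None
    for name, host, func in stages:
        if previous is not None:
            # Format conversion happens on the desktop between two stages.
            convert = converters.get((previous, name), lambda x: x)
            data = convert(data)
            log.append("convert %s -> %s on desktop" % (previous, name))
        data = func(data)
        log.append("run %s on %s" % (name, host))
        previous = name
    return data, log


stages = [
    ("app_a", "host1", lambda text: text + "|a"),
    ("app_b", "host2", lambda text: text + "|b"),
]
converters = {("app_a", "app_b"): str.upper}
result, log = run_chain(stages, converters, "input")
```

Rerunning a stage with modified parameters, as the researcher does after inspecting an output file, would amount to calling `run_chain` again with a different `initial_input` or a shortened stage list.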
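The run-mode step, in which the visually constructed graph is turned into a series of requests, can be sketched as below. The request tuples and the `build_requests` helper are hypothetical; the actual requests in Gateway are CORBA invocations on middle-tier services, whose signatures are not given here.

```python
def build_requests(modules, connections):
    """Translate a graph into an ordered list of middle-tier requests.

    modules:     dict mapping module name -> Gateway server that hosts it
    connections: list of (source module, target module) pairs
    """
    for source, target in connections:
        for end in (source, target):
            if end not in modules:
                raise ValueError("connection refers to unknown module: " + end)
    # All modules are instantiated before any connection is made.
    requests = [("instantiate", name, host) for name, host in modules.items()]
    requests += [("connect", source, target) for source, target in connections]
    return requests


requests = build_requests(
    {"simulate": "node1", "convert": "node2"},
    [("simulate", "convert")],
)
```

Emitting every instantiation request before any connection request ensures that both endpoints of a connection exist by the time the connection is requested.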

Module execution is controlled not only by sending relevant data through the input ports of the module. The majority of the modules we have developed so far require additional parameters that can be entered via "Module Controls" (similar to systems such as AVS [12]). The module controls are Java applets displayed in a card panel of the main Gateway applet. The communication channels between the back-end implementation of a module and its front-end Module Controls are generated automatically during the instantiation of the module.
Not all applications closely follow the data flow paradigm. Therefore it is necessary to define an interface so that different front-end packages can be "plugged in" to the middle-tier, giving the user a chance to use the front-end that best fits the application at hand. Currently we offer visual editors based on GEF [13] and VGJ [14]. In the future, we will add an editor based on the UML [14] standard, and we will provide an API for creating custom ones.
Several other projects address the problem of seamless access to remote resources. A comprehensive list of these is available from the DATORR web site [9]. Here we mention the three that are most closely related to this project.
The UNICORE project [15] introduces an excellent model for the Abstract Task Descriptor that most likely will strongly influence the DATORR standard, and consequently we are taking a very similar approach. The UNICORE middle-tier is given by a network of Java web servers (Jigsaw). The WebSubmit project [16] implements web access to remote high performance resources through CGI scripts. Both projects use the HTTPS protocol for user authentication (as we do), and implement custom solutions for access control. The ARCADE project [17] is in a very early stage, and its designers intend to use CORBA to implement the middleware. As of now, there is no available description of the ARCADE security model.
We use Gateway to provide seamless access to remote resources for two applications that require it. For LMS we use Gateway to retrieve data from many different sources as well as to allocate remote computational resources needed to solve the problem at hand. Gateway controls the necessary data transfer between hosts transparently for the user. Quantum Simulations requires access to HPCC resources, and therefore we layered Gateway on top of the Globus metacomputing toolkit. We admit that Gateway is not yet a complete solution for seamless access to remote resources. Many issues, notably security, remain to be solved. We expect to build on the recent DATORR (Desktop Access to Remote Resources) initiative of the Java Grande Forum when tackling these issues.
1. G. Fox and W. Furmanski, "HPcc as High Performance Commodity Computing", chapter for the "Building National Grid" book by I. Foster and C. Kesselman, http://www.npac.syr.edu/users/gcf/HPcc/HPcc.html
2. G. Fox, W. Furmanski and T. Haupt, SC97 handout: High Performance Commodity Computing (HPcc), http://www.npac.syr.edu/users/haupt/SC97/HPccdemos.html
3. D. Bhatia, V. Burzevski, M. Camuseva, G. C. Fox, W. Furmanski and G. Premchandran, "WebFlow - a visual programming paradigm for Web/Java based coarse grain distributed computing", Concurrency: Practice and Experience, Vol. 9 (6), pp. 555-577, June 1997. See also WebFlow Home Page at NPAC: http://osprey7.npac.syr.edu:1998/iwt98/products/webflow.
4. Erol Akarsu, Tomasz Haupt, Geoffrey Fox and Wojtek Furmanski, “WebFlow - High-Level Programming Environment and Visual Authoring Toolkit for High Performance Distributed Computing”, Supercomputing'98, November 1998.
5. Tomasz Haupt, Erol Akarsu and Geoffrey Fox, “Web Based Metacomputing”, Special Issue on MetaComputing for the FGCS International Journal on Future Generation Computing Systems.
6. T. Haupt, E. Akarsu, G. Fox, A. Kalinichenko, K-S. Kim, P. Sheethalnath, C-H. Youn, “The Gateway System: Uniform Web Based Access to Remote Resources”, High Performance Computing and Networking ’99, Amsterdam, April 1999.
7. S. S. Mudumbai, W. Johnston, M. R. Thompson, A. Essiari, G. Hoo, K. Jackson, Akenti – A Distributed Access Control System, home page: http://www-itg.lbl.gov/Akenti
8. I. Foster and C. Kesselman, "Globus: A Metacomputing Infrastructure Toolkit", International Journal of Supercomputer Applications, 1997. See also Globus Home Page: http://www.globus.org
9. DATORR (Desktop Access to Remote Resources), http://www.javagrande.org
10. WMS, EDYS and CASC2D codes have been made available to us by CEWES. EDYS is written by Michael Childress and CASC2D is written by Fred Ogden, http://www.wes.hpc.mil
11. Quantum Simulations, http://www.ncsa.uiuc.edu/Apps/CMP/cmp-homepage.html
12. Advanced Visualization System, http://www.avs.com/
13. GEF: The Graph Editing Framework home page http://www.ics.uci.edu/pub/arch/gef/
14. UML Home Page http://www.rational.com/uml
15. UNICORE: Uniform Access to Computing Resources, home page: http://www.fz-juelich.de/unicore
16. WebSubmit: A Web-based Interface to High-Performance Computing Resources, home page: http://www.itl.nist.gov/div895/sasg/websubmit/websubmit.html
17. ARCADE, home page: http://www.icase.edu:8080