Landscape Management System
A WebFlow Application
Tomasz Haupt[1],
Erol Akarsu, Geoffrey C. Fox
NPAC at Syracuse University
Abstract
This paper describes a pilot Web-based implementation of
the Landscape Management System (LMS). The Web-based implementation extends the
Watershed Modeling System by adding the capability to download the input data
directly from the Internet, and execute the simulation codes on a remote
high-performance host. This makes it possible to run LMS anywhere from a
networked laptop. Furthermore, our system allows for constructing complex
simulations by coupling several independently developed codes into a single,
distributed application. The Web-based LMS is implemented as a WebFlow
application. WebFlow is a modern three-tier commodity standards-based High
Performance Distributed Computing (HPDC) system that integrates a high-level
graphical user interface (Tier 1), distributed
scalable object broker middleware (Tier 2), and a high-performance back end
(Tier 3).
Programming tools that are simultaneously sustainable,
highly functional, robust and easy to use have been hard to come by in the HPCC
arena. This is partially due to the difficulty in developing sophisticated
customized systems for what is a relatively small part of the worldwide
computing enterprise. Thus we have developed a new strategy, termed HPcc (High
Performance Commodity Computing)[1], that builds HPCC programming tools on top
of the remarkable new software infrastructure being built for the commercial
web and distributed object areas. This
leverage of a huge industry investment naturally delivers tools with the
desired properties with the one (albeit critical) exception that high
performance is not guaranteed. Our approach automatically gives the user access
to the full range of commercial capabilities (e.g., databases and compute
servers), pervasive access from all platforms and natural incremental
enhancement as the industry software juggernaut continues to deliver software
systems of rapidly increasing power.
Our research addresses the need for high-level programming
environments and tools to support distance computing on heterogeneous
distributed commodity platforms and high-speed networks spanning across labs
and facilities. More specifically, we are developing a portable system, based on industry standards and
commodity software components, that provides seamless access to remote
resources through Web-based user interfaces or customized application
GUIs.
We have successfully employed preliminary versions of WebFlow for two classes of applications: Quantum Monte Carlo simulations[2] (QS) developed within NCSA Alliance Team B (described in Section 3.1), and the Landscape Management System (LMS), the primary subject of this report. In addition, WebFlow is used for the implementation of the Gateway system at ASC MSRC, as described in Section 3.2.
As described above,
one of the most important goals of the WebFlow system is to provide seamless
access to remote resources. That is, our goal is to create an illusion that all
resources needed to complete the user tasks are available locally (an analogy:
an NFS mounted disk or a network printer). In particular, an authorized user
can allocate the needed resources without explicit login to the host
controlling the resources. And by “resources” we mean all hardware and software
components needed to complete the task - including but not limited to - compute
engines from workstations to supercomputers, storage, databases, instruments,
codes, libraries, and licenses. A critical issue in providing access to remote
resources is security of access (user authentication and authorization), which
is a primary concern of the Gateway system implementation and is discussed
in Section 3.2.
Another of our goals
is to provide user-friendly, high-level graphical user interfaces (WebFlow
front end) for the user applications. We simultaneously follow several
approaches, as the requirements for different applications vary greatly. For
Quantum Simulations we provide a Web-accessible front end (a Java applet) that
allows the user to visually compose an application from pre-existing modules
following a dataflow paradigm. For the Gateway projects we will use a Web-based
Problem Solving Environment[3] (an expert system) that allows the user to
interactively define a problem, identify resources (both software and
hardware), submit the task for execution and, finally, analyze the results.
For LMS we face
different requirements. LMS applications are composed of independent modules
selected by the application developers, and the end user only selects the
application that best suits the problem at hand. We refer to this as the
“navigate and choose” paradigm. Another unique feature of the LMS front end is
that it automates the retrieval of necessary input data from Internet
repositories such as the USGS Web site.
To satisfy the
requirements of supporting multiple front ends and a variety of back end
platforms, we implement WebFlow as a three-tier system, as shown in Figure 1.
The heart of the system is the middle tier that provides a mapping
between the application-independent Abstract Task Specification (ATS) and the
platform-independent Resource Specification (RS). In addition, the middle tier
offers a number of services that simplify the development of applications, and
it provides support for interactions between usually distributed components of
the application.
WebFlow applications are composed of independently developed modules. Module developers do not need to concern themselves with issues such as allocating and running modules on various machines, creating connections among modules, sending and receiving data across these connections, or running several modules concurrently on one machine. WebFlow's middle tier hides these management and coordination functions from the developers, who may have only a limited knowledge of the system on which the modules will run.
Following our HPcc
model, we build WebFlow in strict conformance to industry standards. We avoid
custom protocols and follow the general trends, and whenever feasible we use
commodity software components. We believe that such an approach will simplify
maintenance of our system and will let us take advantage of the new
developments introduced by industry. However, our solution is not unique. Web
technologies are being developed at an unprecedented pace, and competition between
vendors often leads to multiple standards. We had to make choices between the
competing standards (for example, we chose CORBA and not Microsoft’s DCOM or
Sun’s Java RMI to implement the distributed objects), and it remains to be seen
whether we made the right choice. Nevertheless, some of our choices, such as
https (http over SSL) and XML, seem to be non-controversial. When there are no standards available (for
example, for the ATS or RS), we participate in the creation of such standards (DATORR[4],
NCSA Alliance).
Figure 2: Example of WebFlow object hierarchy. See text for explanations.
WebFlow consists of
objects that interact with each other through event notifications. The basic
object is a WebFlow server, which is a CORBA container object (also referred to
as a context). The server contains other containers and modules. The hierarchy
of the containers and modules is composed to fit the application at hand. An
example is shown in Figure 2. In this particular configuration we have three
WebFlow servers, each running as a separate process (typically on three
different hosts), one of which serves as a master, the two others as its
slaves. The master server publishes its reference (IOR – Interoperable Object
Reference) so that clients can connect to it and request its services. In
addition, the master keeps references to its slaves (slave server proxies),
which are returned to the client upon request. Within a slave WebFlow context,
each user creates his or her own context, which in turn contains the
user's applications. Each application consists of modules. In this way a number of
users can create several applications, each made of independent (possibly
hierarchically composed) modules and WebFlow middle-tier services (cf. Figure
3). Modules and services are executables written in Java. There is no difference
between the module and the service, other than that the module is provided by
the user (or the application developer) and placed within the user’s
application context, while the service is provided by the WebFlow system itself
and implements a standard task such as job submission or access to mass storage
or file manipulation. Applications differ in how the modules interact with each
other. In Figure 3 we illustrate some of the possibilities.
Figure 3: WebFlow server
In application 1 of user 1, modules interact in the most general way: each
module can invoke any public method of the other modules. In application 1 of
user 2 a data-flow model is used: the modules exchange data through their input
and output ports. Application 2 of user 1 consists of two modules that do not
interact. They can run concurrently, and they can exchange messages
“privately”, that is, without WebFlow's help (say, using MPI). Actually, a single
module may represent a data parallel application implemented in HPF, or C++
with MPI. In LMS we use yet another way
of communicating between the modules, as described in Section 4 below.
The modules (and
services) are technically CORBA objects implemented in Java. However, that does
not mean that the actual functionality of the module must be implemented in
Java. Legacy applications can be easily encapsulated as CORBA objects and thus
used as WebFlow modules (as we did in the case of LMS, see Section 4). But typically a module serves merely as
a proxy for services rendered by the back end. As an example, the middle tier
provides the service of submitting a job using Globus[5]. To submit a job the
service acts as a client to the GRAM (Globus Resource Allocation Manager)
server. More specifically, it sends a request expressed in the Globus RSL
(Resource Specification Language) that defines the target machine, location of
executable and input files, as well as instructions for dealing with the
standard output and standard error streams. Optionally, through GASS (Global
Access to Secondary Storage) both executable and input data sets can be staged
prior to the execution of the job, and output files can be uploaded to a
specified location after the job is completed. The Java code of the WebFlow
proxy module generates the RSL command, and the WebFlow module developer never
needs to see the actual application, let alone make an attempt to rewrite the
application in Java.
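The role of such a proxy module can be illustrated with a minimal sketch. It assumes a hypothetical helper class, RslBuilder (not part of WebFlow or Globus), that assembles a request string in the general shape of Globus RSL: a conjunction of (attribute=value) relations introduced by an ampersand. The quoting rule used here (values containing whitespace, parentheses, or quotes are wrapped in double quotes, with embedded quotes doubled) is an assumption and should be checked against the Globus documentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that assembles an RSL-style request string. */
class RslBuilder {
    // LinkedHashMap keeps the relations in the order they were added.
    private final Map<String, String> attributes = new LinkedHashMap<>();

    /** Records one (name=value) relation; attribute names are not validated. */
    RslBuilder set(String name, String value) {
        attributes.put(name, value);
        return this;
    }

    /** Renders the conjunction of all relations: &(a=b)(c=d)... */
    String build() {
        StringBuilder sb = new StringBuilder("&");
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            sb.append('(').append(e.getKey()).append('=')
              .append(quote(e.getValue())).append(')');
        }
        return sb.toString();
    }

    /** Assumed quoting rule: quote anything that is not a plain token. */
    private static String quote(String value) {
        if (value.matches("[^\\s()\"]+")) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }
}
```

For example, new RslBuilder().set("executable", "/bin/hostname").set("stdout", "out.txt").build() yields the string &(executable=/bin/hostname)(stdout=out.txt); the attribute names and paths are illustrative only. The point of the design is that the module developer manipulates only named attributes, and the request text is produced by the proxy.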
WebFlow servers are
started by the WebFlow administrator using the command
java WebFlow.Server file.conf, where file.conf is the server configuration file.
Objects within WebFlow server contexts (such as an application context or a module)
are created using the WebFlow server methods addContext(contextName) and addModule(moduleName).
Interactions between modules can be defined using the WebFlow server method attachEvent(ORES,EventName,ORT,MethodName),
where ORES is the object reference of the event source, EventName is the
name of the event fired by the module
that serves as the event source, ORT is the object reference of the event
target, and MethodName is the name of the method of the target module that is
to be invoked in response to the event. This information is stored
dynamically by the event adapter, so the event target need not be implemented
as a Java or JavaBeans event listener; doing so would violate our
design goal that all modules be developed independently of each other and be
reusable by different applications. Instead, we use dynamic invocation of
methods of remote objects through both the DII (Dynamic Invocation Interface) and
the DSI (Dynamic Skeleton Interface). Events (which, like everything else in
WebFlow, are CORBA objects) encapsulate data to be sent from one module to the
other.
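The idea of an event adapter that binds sources to targets by name can be sketched in plain Java. This is only an illustration under stated assumptions: it uses java.lang.reflect for the dynamic call, in place of the CORBA DII/DSI mechanisms used by WebFlow, and the classes EventAdapter and Recorder are hypothetical names introduced here.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical adapter: stores (source, event) -> (target, method) bindings. */
class EventAdapter {
    private static final class Binding {
        final Object target;
        final String methodName;
        Binding(Object target, String methodName) {
            this.target = target;
            this.methodName = methodName;
        }
    }

    // Sources are compared by identity, as object references would be.
    private final Map<Object, Map<String, List<Binding>>> bindings = new IdentityHashMap<>();

    /** Mirrors the argument order of attachEvent(ORES, EventName, ORT, MethodName). */
    void attachEvent(Object source, String eventName, Object target, String methodName) {
        bindings.computeIfAbsent(source, s -> new HashMap<>())
                .computeIfAbsent(eventName, n -> new ArrayList<>())
                .add(new Binding(target, methodName));
    }

    /** Looks up each bound method by name at call time; returns the number of targets invoked. */
    int fire(Object source, String eventName, Object data) {
        List<Binding> targets = bindings.getOrDefault(source, Map.of())
                                        .getOrDefault(eventName, List.of());
        for (Binding b : targets) {
            try {
                Method m = b.target.getClass().getMethod(b.methodName, Object.class);
                m.setAccessible(true);
                m.invoke(b.target, data);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot deliver " + eventName, e);
            }
        }
        return targets.size();
    }
}

/** Hypothetical target module; it knows nothing about the event source. */
class Recorder {
    Object last;
    public void receiveData(Object data) {
        last = data;
    }
}
```

Because the binding is held by the adapter, the target class implements no listener interface and has no compile-time dependency on the source, which is the property the design goal above asks for.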
The middle tier is given by a mesh of WebFlow servers. If the client is implemented as a Java applet, one of the WebFlow servers, namely, the master server, plays the role of gatekeeper. It is accompanied by a (secure, i.e., SSL based) Web server, as shown in Figure 4.
Figure 4: Architecture of the WebFlow
middle tier
The user downloads the
applet from the Web server (and since the https protocol is used, she is
authenticated in this process and, optionally, receives authorization through
the Akenti server to use the WebFlow resources; see Section 3.2 for more
details on WebFlow security). The applet reads the master server IOR, which is
posted as an html document, and establishes communication with WebFlow.
Because of the Java sandbox security mechanism, the applet can communicate
exclusively with the master WebFlow server, which runs on the same host as the
Web server. This is the reason we
introduced the proxy servers maintained by the master. The client (i.e., the
applet) communicates with the remote slave WebFlow server through these proxies
and constructs the WebFlow context hierarchy as needed using WebFlow server
methods. In particular, it builds an application within a selected context.
Simple applications can be run on a single WebFlow server. We have introduced a distributed middle tier, a mesh of WebFlow servers, because, in general, different hosts may have access to different resources (hardware, operating system, software, access control, etc.). As shown in Figure 5, each WebFlow server manages a particular set of user modules and services. The communication between modules on different hosts is managed transparently by the middle-tier for the user.
Figure 5: Mesh of WebFlow servers implemented as CORBA objects that
manage and coordinate distributed
computation.

The QS project is a part of Alliance Team B, and its primary purpose is to demonstrate the feasibility of layering WebFlow on top of the Globus metacomputing toolkit. WebFlow thus serves as a job broker for Globus, while Globus (or more precisely, GRAM, the Globus Resource Allocation Manager) takes responsibility for the actual resource allocation, which includes authentication and authorization of the WebFlow user to use computational resources under Globus control (Figure 6).
Figure 6: WebFlow over Globus. The system consists of two networks: one controlled by
WebFlow (workstations) and the other controlled by Globus (HPCC
resources). There must be at least one node with both a WebFlow server and
a GRAM client installed.

This application can be characterized as follows. A chain of high-performance applications (both commercial packages such as GAUSSIAN or GAMESS or custom developed) is run repeatedly for different data sets. Each application can be run on several different (multiprocessor) platforms and, consequently, input and output files must be moved between machines. Output files are visually inspected by the researcher; if necessary, applications are rerun with modified input data sets. The output file of one application in the chain is the input of the next one, after a suitable format conversion.
GAUSSIAN and GAMESS are run as Globus jobs on the Origin2000 or Convex Exemplar at NCSA, while all file editing and format conversion is performed on the user’s desktop.
For QS we are using the WebFlow editor applet as the front end (Figure 7). The WebFlow editor provides an intuitive environment to visually compose (click-drag-and-drop) a chain of data-flow computations from preexisting modules. In the edit mode, modules can be added to or removed from the existing network, and connections between the modules can be updated. Once created, the network can be saved (on the server side) to be restored at a later time. The workload can be distributed among several WebFlow nodes (WebFlow servers) with interprocessor communications being taken care of by the middle-tier services. Thanks to the interface to the Globus system in the back end, execution of particular modules can be delegated to powerful HPCC systems.
The visual
representation of the application is translated into the Abstract Task
Specification and sent to the middle tier as an XML document. The ATS itself is
defined by a DTD (Document Type Definition), which is listed in
Figure 8. XML documents conforming to this specification provide all necessary
information to create the WebFlow context hierarchy (see Figure 9 for a simple
example); the <taskspec> tag provides references for the user context, and
the <task> tag describes the application context. The computational graph
visually created by the user is represented by the sequence of the
<module> and <connection> tags.
Figure 7: WebFlow data-flow editor applet used as the
front end for Quantum Monte Carlo simulations. It allows the user to perform
interactive, visual composing of the application from pre-existing modules.
The XML document is
parsed by a middle-tier service and is used to generate Java code on the fly
that serves as an internal client for the WebFlow servers (Figure 10). This means
that once the application is committed by pressing the run button nothing can
be changed until the application is completed or aborted. However, since each
module comes with its own front-end control panel, the parameters that the
module developer made visible to the user can be modified, even at runtime.
<!ELEMENT taskspec (task)+>
<!ATTLIST taskspec
UserContextRef CDATA #REQUIRED
UserName CDATA #REQUIRED
>
<!ELEMENT task ((task | module)*,connection*) >
<!ATTLIST task
AppName CDATA #REQUIRED
>
<!ELEMENT module (#PCDATA) >
<!ATTLIST module
modulename CDATA #REQUIRED
host CDATA #REQUIRED
>
<!ELEMENT connection (out,in)>
<!ELEMENT in EMPTY>
<!ELEMENT out EMPTY>
<!ATTLIST out
modulename CDATA #REQUIRED
eventname CDATA #REQUIRED
>
<!ATTLIST in
modulename CDATA #REQUIRED
method CDATA #REQUIRED
>
Figure 8: XML Document
Type Definition used as the Abstract Task Specification.
<taskspec
UserContextRef="IOR:000000000000001649444c3a576562466c6f772f50726f78793a312e30000000000000010000000000000030000100000000000f3132382e3233302e32312e323132000004740000000000100000000036f66c3800088f6800000002"
UserName="haupt">
<task AppName="SimpleTest">
  <module modulename="FileBrowser.1" host="localhost">
  </module>
  <module modulename="SaveAs.2" host="localhost">
  </module>
  <connection>
    <out modulename="FileBrowser.1" eventname="Event101"/>
    <in modulename="SaveAs.2" method="Method1"/>
  </connection>
</task>
</taskspec>
Figure 9: A simple XML
document conforming to the DTD specification of Figure 8.
Figure 10: A visual
representation of the application is translated into an XML document and sent
to the middle tier. An XML parser, implemented as a service, processes the
document and generates the Java code that serves as a client for the WebFlow
server to create the context hierarchy, instantiate modules, and bind events.
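The translation step can be sketched as follows, under stated assumptions. The class TaskSpecReader is hypothetical, it uses the standard Java DOM parser rather than the actual WebFlow parsing service, and instead of generating client code it merely lists, as text, the calls that such a client would make for a document shaped like the one in Figure 9. Nested tasks are flattened for brevity.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical reader that turns a task specification into a list of planned calls. */
class TaskSpecReader {

    static List<String> plan(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed task specification", e);
        }
        List<String> steps = new ArrayList<>();
        // One application context per <task>.
        for (Element task : elements(doc.getElementsByTagName("task"))) {
            steps.add("addContext " + task.getAttribute("AppName"));
        }
        // One module instance per <module>, placed on the named host.
        for (Element m : elements(doc.getElementsByTagName("module"))) {
            steps.add("addModule " + m.getAttribute("modulename") + " on " + m.getAttribute("host"));
        }
        // One event binding per <connection>: <out> is the source, <in> the target.
        for (Element c : elements(doc.getElementsByTagName("connection"))) {
            Element out = (Element) c.getElementsByTagName("out").item(0);
            Element in = (Element) c.getElementsByTagName("in").item(0);
            steps.add("attachEvent " + out.getAttribute("modulename") + "." + out.getAttribute("eventname")
                    + " -> " + in.getAttribute("modulename") + "." + in.getAttribute("method"));
        }
        return steps;
    }

    private static List<Element> elements(NodeList nodes) {
        List<Element> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add((Element) nodes.item(i));
        }
        return result;
    }
}
```

Applied to the document of Figure 9, this sketch would list one context (SimpleTest), two modules, and one binding from Event101 of FileBrowser.1 to Method1 of SaveAs.2.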
This project, performed in collaboration with ASC MSRC and the Ohio Supercomputing Center, is at a very early stage (it started Jan 1, 1999). Our goal is to combine the CCM Problem Solving Environment under development at OSC with secure, seamless access to resources provided by the WebFlow system. Security is the most important new aspect of WebFlow to be addressed in this project. Gateway is required to operate in an environment protected by Kerberos5 in conjunction with SecurID technology.
Our suggested strategy is to run the client (an applet) in a kerberized environment that will allow the user to generate the Kerberos ticket locally without sending cleartext passwords through the network. Then the user connects to the Web server to download the front-end applet. The Web server creates the user context in a new process on a selected host that runs with the user’s credentials. The applet will communicate with the WebFlow servers using CORBA security services implemented on top of Kerberos5. Access to the back-end services is provided by the Globus GRAM gatekeeper linked against the Kerberos5 libraries rather than the default SSL used by Globus.
This project is sponsored by CEWES MSRC at Vicksburg, under the DoD HPC Modernization Program, Programming Environment and Training (PET). The pilot phase of the project can be described as follows. A decision-maker (the end user of the system) wants to evaluate changes in vegetation in some geographical region over a long time period caused by some short-term disturbances such as a fire or human activity. One of the critical parameters of the vegetation model is the soil condition at the time of the disturbance. This, in turn, is dominated by any rainfall that occurs at that time. Consequently, the implementation of this project requires (cf. Figure 11):
- Data retrieval from many different remote sources (web sites, databases)
- Data preprocessing to prune and convert the raw data to a format expected by the simulation software.
- Execution of two simulation programs: EDYS for vegetation simulation including the disturbances and CASC2D for watershed simulations during rainfalls. The latter results in generating maps of the soil condition after the rainfall. The initial conditions for CASC2D are set by EDYS just before the rainfall, and the output of CASC2D after the event is used to update parameters of EDYS.
- Visualizations of the results.

Figure 11: Logical structure of the LMS simulations implemented by this project
The purpose of this project was to demonstrate the feasibility of implementing a system that would allow launching and controlling the complete simulation from a networked laptop. We successfully implemented it using WebFlow with WMS and EDYS encapsulated as WebFlow modules running locally on the laptop and CASC2D executed by WebFlow on remote hosts.
Casc2d simulates watersheds. It runs in a loop over rainfall events, and in each iteration of the loop the program simulates water flow in the area of interest. Once the simulation of the rainfall event is completed (according to some predefined criteria) the simulation switches to a “dry” mode in which the program simulates the condition of soil in the absence of precipitation. A new rainfall starts a new iteration.
Edys simulates the evolution of vegetation taking into account the soil condition at the beginning of simulation. During the simulation, it uses averaged precipitation data rather than data describing actual rainfall, event after event. Edys runs for a specified time period, and before it exits it saves its state on a disk. This means that the simulation can be resumed later.
The accuracy of the Edys simulation can be improved by coupling it with casc2d, that is, by feeding Edys with accurate data on soil condition after each rainfall. We implemented the coupling in the following way (cf. Figure 12). We start the simulation with casc2d (on a host running Unix). It reads its input files and determines the time of the first rainfall. It writes the data to the disk and starts the loop over events. However, in each iteration, before proceeding with the simulations, casc2d waits until new data on the soil condition generated by Edys are available. Technically, every ten seconds it checks the modification time of its input files*. In the meantime, the data written by casc2d are sent to the host running Edys (a laptop running Windows NT). The received data include the date of the next rainfall. The Edys simulation is launched and continued until that date. The simulation program exits, and its output files are sent to the host of casc2d. Casc2d detects the arrival of the new data and resumes the simulations. As soon as the current event is completed, casc2d saves the results to a file and begins a new iteration. The results are sent to the laptop, Edys is run until the next event, and its output is sent to the Unix host to let casc2d continue. This pattern is repeated until all rainfall events are processed. Then casc2d exits, and the final run of Edys is performed. The run terminates at a predefined date, typically 20 years after the first rainfall.

Figure 12: Exchange of data between casc2d (left-hand side) and Edys (right-hand side). It is important to note that casc2d is run only once. It pauses while waiting for the new data and quits only after all events are processed. In contrast, Edys is launched each time the data are needed.
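The waiting step on the casc2d side, checking the modification time of a file at fixed intervals, can be sketched in Java as follows. This is an illustration only: casc2d itself is not a Java program, and the class ModificationWatcher and its parameters are hypothetical. A ten-second interval as in the text would correspond to pollMillis = 10000.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

/** Hypothetical helper that polls a file's modification time. */
class ModificationWatcher {

    /**
     * Returns true as soon as the file's 'last modified' time is later than lastSeen,
     * or false if that does not happen within maxPolls checks.
     */
    static boolean waitForUpdate(Path file, FileTime lastSeen, long pollMillis, int maxPolls) {
        try {
            for (int i = 0; i < maxPolls; i++) {
                if (Files.getLastModifiedTime(file).compareTo(lastSeen) > 0) {
                    return true;
                }
                Thread.sleep(pollMillis);
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report that no update was seen.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Polling is cheap at this interval, and it requires no cooperation from the writer beyond updating the file, which is why it suits the coupling of two programs whose sources cannot be changed.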
This application requires two computational modules: one encapsulating EDYS to be run on a Windows NT box and the other encapsulating CASC2D to be run on a Unix workstation. Consequently, we need two application contexts, one on each machine. As usual, we also need the master server, which we place on the Windows NT box, as shown in Figure 13, because we run the client application there. In addition, we run Web servers on both machines. They are needed primarily to exchange the data between the modules. We also publish the master IOR on the Web server on the Windows NT side.
The servers are started manually using the java WebFlow.Server file.conf command, with a different configuration file for each server. For the master server, we specify the full path of the file where the IOR is to be stored. For the slave server, we specify the URL at which the IOR is available, and define the modules under the server's control (see Figure 14). At this time the configuration files must be generated manually. We believe that in future releases of the WebFlow system we will be able to generate the configuration files automatically from the IDL specifications.
Figure 13: WebFlow implementation of LMS
The servers are accessed by the client code that is a part of the LMS front end. Figure 15 shows the relevant part of the code. First, the client initializes the ORB object (line 1), and then reads the IOR of the master server from the URL (line 3). From the IOR it creates a CORBA object obj (line 4) and narrows it to the correct type, WebFlowContext (line 5). Now it can call methods of this object. In lines 9 and 10 it uses the method getWFServer(serverName) to retrieve references to both slave servers. It adds module “runEdys” to the ntserver context (line 11) and module “runCasc2d” to the osprey4 context (line 12). In line 13 it narrows the object p2 to type runCasc2d, as it needs to invoke one of its methods (in line 16). Then it connects the modules: event “EdysDone”, fired by module runEdys, will invoke method “runAgain()” of module runCasc2d (line 14), and event “Casc2dDone”, fired by runCasc2d, will invoke method “receiveData()” of runEdys (line 15). Now everything is ready to start the execution. It is triggered by invoking method run() of module runCasc2d (line 16).

A)
Server name = master
File=D:\Jigsaw\Jigsaw\WWW\Gateway\IOR\master.ref
URL=none
Modules:==================================
B)
Server name = ntserver
File=none
URL=http://maine.npac.syr.edu:8001/Gateway/IOR/master.txt
Modules:==================================
runEdys lms.idl WebFlow.lms.runEdysImpl
C)
Server name = osprey4
File=none
URL=http://maine.npac.syr.edu:8001/Gateway/IOR/master.txt
Modules:==================================
runCasc2d lms.idl WebFlow.lms.runCasc2dImpl
Figure 14: Configuration files for the A) master, B) Windows NT slave, and C) Unix slave
WebFlow servers. Each module is described by its name, its IDL file, and the name of the
Java class that implements it.
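How a server might read such a file can be sketched as follows. The class ServerConfig is hypothetical and is not the WebFlow implementation; the sketch assumes that each entry occupies a single line, that the value none means "not set", and that every line after the Modules: separator names one module by three whitespace-separated fields.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory form of one server configuration file. */
class ServerConfig {
    String serverName;
    String iorFile;   // null when the file says "none"
    String iorUrl;    // null when the file says "none"
    final List<String[]> modules = new ArrayList<>();  // {name, IDL file, Java class}

    static ServerConfig parse(String text) {
        ServerConfig config = new ServerConfig();
        boolean inModules = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("Modules:")) {
                inModules = true;          // everything below is a module line
                continue;
            }
            if (inModules) {
                String[] parts = line.split("\\s+");
                if (parts.length != 3) {
                    throw new IllegalArgumentException("Bad module line: " + line);
                }
                config.modules.add(parts);
                continue;
            }
            int eq = line.indexOf('=');    // split at the first '=' only
            if (eq < 0) {
                throw new IllegalArgumentException("Expected key=value: " + line);
            }
            String key = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            String setting = "none".equals(value) ? null : value;
            switch (key) {
                case "Server name": config.serverName = value; break;
                case "File": config.iorFile = setting; break;
                case "URL": config.iorUrl = setting; break;
                default: throw new IllegalArgumentException("Unknown key: " + key);
            }
        }
        return config;
    }
}
```

Under these assumptions a master file has a File entry and no modules, while a slave file has a URL entry and one module line per module under its control.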
1.  ORB orb = ORB.init(args, new java.util.Properties());
2.  String masterURL = args[0];
3.  String ref = getIORFromURL(masterURL);
4.  org.omg.CORBA.Object obj = orb.string_to_object(ref);
5.  WebFlowContext master = WebFlowContextHelper.narrow(obj);
6.  WebFlowContext slave1, slave2;
7.  try {
8.    org.omg.CORBA.Object p1, p2;
9.    slave1 = WebFlowContextHelper.narrow(master.getWFServer("ntserver"));
10.   slave2 = WebFlowContextHelper.narrow(master.getWFServer("osprey4"));
11.   p1 = slave1.addNewModule("runEdys");      // Edys runs on the Windows NT server
12.   p2 = slave2.addNewModule("runCasc2d");    // casc2d runs on the Unix server
13.   runCasc2d rc = runCasc2dHelper.narrow(p2);
14.   master.attachEvent(p1, "EdysDone", p2, "runAgain");
15.   master.attachEvent(p2, "Casc2dDone", p1, "receiveData");
16.   rc.run();
17. } catch (Exception e) { e.printStackTrace(); }
Figure 15: Fragment of the client code
Adding new modules to
the client is simply a matter of adding a few lines of code: a call to addNewModule,
a call to narrow if the methods of the module are to be invoked directly from the
client, and calls to attachEvent for the events of the module, if any.
In the next releases
we plan to reuse the XML parser that we developed for the Quantum Simulations
project, after which there will be no need to provide a client code in Java at
all. Instead, the application will be defined by a static XML document,
available from a Web server, and instantiated dynamically at runtime. More, the
modification of the application will be possible just by editing the XML file
without introducing any changes to the code.
Figures 16 and 17 show
the pseudocode of the two modules, runCasc2d and runEdys, respectively. The
simulation starts when the client invokes the run() method of runCasc2d.
Figure 16: Pseudocode of the runCasc2d module
Module runCasc2d is started from the front end by invoking its run() method, which creates a new Java thread that runs the casc2d code in a separate process. It then invokes the waitForData() method. This method waits until casc2d generates the first data set for Edys, copies the files to a location seen by the Web server, and fires the event “Casc2dDone”, which invokes the receiveData() method of runEdys.
When Edys fires the “EdysDone” event, the runAgain() method of runCasc2d is invoked. This method receives data from Edys (using the Java URLConnection class to access files from the Web server on the Edys host), and executes the UNIX touch command on a selected control file. By “touching” this file we change the ‘last modified’ property of that file, and this triggers casc2d to resume its operations. Control then passes to the waitForData() method, described above.
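The two actions of runAgain() can be sketched in Java as follows. The class EdysDataReceiver is a hypothetical name, the sketch updates the ‘last modified’ time through java.nio.file instead of executing the UNIX touch command, and the URL and file locations are supplied by the caller rather than taken from the LMS code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;

/** Hypothetical helper with the two steps performed when new Edys data are announced. */
class EdysDataReceiver {

    /** Downloads the document at the given URL into dest, replacing any previous copy. */
    static void fetch(URL source, Path dest) {
        try {
            URLConnection connection = source.openConnection();
            try (InputStream in = connection.getInputStream()) {
                Files.copy(in, dest, StandardCopyOption.REPLACE_EXISTING);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sets the 'last modified' time of the control file to now, creating the file if needed. */
    static void touch(Path controlFile) {
        try {
            if (Files.notExists(controlFile)) {
                Files.createFile(controlFile);
            }
            Files.setLastModifiedTime(controlFile, FileTime.fromMillis(System.currentTimeMillis()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The order matters: the data files should be completely written before the control file is touched, since the changed timestamp is the signal on which the waiting program resumes.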
Figure 17: Pseudocode of the runEdys module
The runEdys module is triggered by the Casc2dDone event that invokes the receiveData() method. First, a file options.txt is generated. This file defines input parameters: the start date of the simulation (StartDay), the number of days to be run (DayDiff), parameter 3, which is a toggle to switch the Edys visualizations on and off, and parameter 4, which defines disturbances, if any. Then the data from the Web server of the casc2d host are downloaded. Finally, the Edys code is launched. After it is completed, its output is copied to a location seen by the Web server, and the “EdysDone” event is fired.
The sendData() method (which is identical in both modules) actually does not send any data. Instead, it copies data from the Edys (or casc2d) working directory to a document directory of the Web server (Figure 18). This step could be avoided by letting the codes read and write their input and output files directly in the document directory of the Web server. However, this would require slight modifications of these codes, and we had no access to their sources.
Figure 18: Data exchange between runEdys (left-hand side) and runCasc2d (right-hand side) modules. Events are exchanged through IIOP, data sets are downloaded from Web servers.
This pilot implementation of the Web-based LMS does not require any powerful computational resources, and consequently we provided only limited support for back-end services. In particular, we simply used the Java Runtime class to run the WebFlow modules on the same host on which the WebFlow server runs. As discussed in Section 3 above, we are prepared to provide secure access to remote, high-performance resources when it is needed.
For this project we developed a custom front end implemented as a Java application (as opposed to Web-accessible Java applets used in projects described in Section 3). There are several reasons for that. One is that we were explicitly asked to do so. Second, we did it for performance reasons. Finally, the front end is an extension to the WMS system that needs to be installed on the client side, anyway. Therefore, it does not really matter if the extensions to WMS are to be downloaded as an applet each time the LMS is run, or downloaded once and stored permanently on the client machine.
The WMS program (Watershed Modeling System) is a rich collection of tools for data pre- and post-processing. Furthermore, it allows us to run the simulation locally and visualize the results. WMS is available on many platforms, including Windows 95/98/NT and numerous flavors of Unix.
We made WMS the centerpiece of our front end. We enhanced it by adding the capability to import raw data sets directly from the Internet, to submit simulations to remote hosts and, as described above, to make different simulations interact with each other. Consequently, the LMS front end consists of three parts: data wizard, WMS, and job submission. Each part is accessible by pressing the corresponding button on the LMS main panel (Figure 19).

Figure 19: LMS front end main panel
The data wizard panel shown in Figure 20 allows us to select the data type to be retrieved (currently we support DEM and Land Use maps) and to define the region of interest. This can be done either by directly typing coordinates of the bounding box into the provided text fields, or by drawing boundaries of the region on a map. In the latter case, the position of the rectangle is automatically translated into coordinates.

Figure 20: LMS front end data wizard panel
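The translation of a rectangle drawn on the map into bounding-box coordinates can be illustrated with a short Java sketch. It assumes that the map image covers a known geographic extent and that pixel positions map linearly to longitude and latitude; the projection actually used by the data wizard is not stated in the text, and the class and method names are hypothetical.

```java
public class BoundingBox {
    public final double west, south, east, north;

    public BoundingBox(double west, double south, double east, double north) {
        this.west = west;
        this.south = south;
        this.east = east;
        this.north = north;
    }

    /**
     * Converts a pixel rectangle (x, y, w, h; origin at the top-left corner,
     * y growing downward) on an image of imgW x imgH pixels into geographic
     * coordinates, assuming the image spans [mapWest, mapEast] in longitude
     * and [mapSouth, mapNorth] in latitude with linear scaling.
     */
    public static BoundingBox fromPixels(int x, int y, int w, int h,
                                         int imgW, int imgH,
                                         double mapWest, double mapSouth,
                                         double mapEast, double mapNorth) {
        double lonPerPixel = (mapEast - mapWest) / imgW;
        double latPerPixel = (mapNorth - mapSouth) / imgH;
        double west = mapWest + x * lonPerPixel;
        double east = mapWest + (x + w) * lonPerPixel;
        double north = mapNorth - y * latPerPixel;        // top edge of the rectangle
        double south = mapNorth - (y + h) * latPerPixel;  // bottom edge of the rectangle
        return new BoundingBox(west, south, east, north);
    }
}
```

For example, on a 360 x 180 pixel image covering the whole globe, a 90 x 45 pixel rectangle whose top-left corner is at pixel (90, 45) corresponds to longitudes -90 to 0 and latitudes 0 to 45.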
Next, the coordinates are translated into names of the corresponding set of maps available from the USGS web site, and the selected maps are downloaded, uncompressed, and saved in a directory accessible by the WMS package.
Figure 20: A screendump of a WMS session: the central window displays the DEM data that have just been downloaded. The raw data must now be pre-processed, which includes selecting a watershed region, smoothing, and format conversion.

The WMS button on the main panel starts the WMS program on the local host in a separate thread, using the Java Runtime class. The WMS controls are available during the entire LMS session.
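Starting an external program from a separate thread with the Java Runtime class can be sketched as follows. The class name ExternalLauncher and the way the exit code is reported are illustrative assumptions, not the LMS source; the command to run (for example, the path of the WMS executable) is supplied by the caller.

```java
import java.io.IOException;

public class ExternalLauncher {
    /**
     * Starts the given command in a background thread so that the caller
     * (for example, a GUI event handler) is not blocked. When the process
     * ends, its exit code is stored in exitCode[0]; -1 marks a failure to
     * start or an interruption. The started thread is returned.
     */
    public static Thread launch(String[] command, int[] exitCode) {
        Thread t = new Thread(() -> {
            try {
                Process p = Runtime.getRuntime().exec(command);
                // Note: a program that prints a lot of output would also need
                // its output streams drained here to avoid blocking.
                exitCode[0] = p.waitFor();
            } catch (IOException e) {
                exitCode[0] = -1;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                exitCode[0] = -1;
            }
        });
        t.start();
        return t;
    }
}
```

Because the process runs in its own thread, the rest of the front end remains responsive while the external program is open.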
The functionality of the job submission part can be deduced from Section 4.3 above. Figure 21 shows the front-end panel that is used to start the simulations.
Figure 21: LMS front-end simulation panel. The controls allow for selecting the simulation mode: Casc2d alone, Edys alone, or both simulations coupled, as described in Section 4.3. In the latter case, the user also provides the end date of the Edys run, the Edys visualization toggle and disturbances, and the directory that contains the casc2d input data files, and finally selects the host on which the casc2d simulation is to be run.

This part of the front end acts as a client to the middle-tier services, and it communicates with the WebFlow servers using CORBA IIOP.
This report describes a pilot implementation of the web-based LMS. We successfully demonstrated the feasibility of our approach. Nevertheless, the system is not mature enough to be used in production mode, and it requires many improvements. The most important areas of future development include:
· Secure access to resources
· Access to high-performance resources
· Resource detection
· Fault tolerance
· Simplified installation
· Extensibility; adding new application specific modules and reusable middle-tier services
· More general ways of implementing different kinds of interactions between modules
· Customization of the front-end
· Increased functionality of the front end
The obvious next steps are to take advantage of our experience with WebFlow in other projects: Quantum Simulations and Gateway. In particular, the Gateway project is focused on fault tolerance and on providing secure access to resources in an environment protected by Kerberos 5 in conjunction with SecurID technology. It seems to us that solutions developed for the Gateway project will be directly applicable to LMS. Within the Quantum Simulations project we have already developed methods of providing access to high-performance resources that can be used for this project.
Resource detection is an important feature that is missing in the current implementation. There are several commercial (such as Jini[8] by Sun Microsystems, Inc.) and academic (such as Ninja[9], developed at UC Berkeley) technologies available that could be adopted for LMS.
Ease of installation and extensibility are other key features for turning this academic prototype into a product that can be disseminated to end users. Currently, the system is built from many freeware components coming from various vendors. This is in line with our HPcc philosophy, and we will adhere to this approach because we expect it to pay off in the long term through easier maintenance and future upgrades. The only area of improvement here is simplifying the configuration of these components so that they interoperate with each other. In particular, some parameters of the components developed in-house (WebFlow servers) are hardwired in the code, while they should be part of the configuration files.
Extensibility is far less trivial. We expect to add many new modules to LMS in order to support numerous applications and simulation codes and thus to truly support the idea of “navigate and choose” resources (hardware and software) to solve the problem at hand. Currently, the middle-tier services and the front-end client are tightly coupled in the sense that adding a new middle-tier module (or just modifying its interface) requires modifications in the front end. This means that the module developer has to be capable of editing and compiling front-end Java classes. Admittedly, this is a serious drawback. Fortunately, we can reuse the ideas we implemented in the Quantum Simulations project. As discussed in Section 3.1, the QS front end allows composing an application from modules on the fly. The basic idea is to replace the Java client code by an XML document. The advantages of that are:
· Composing an application is reduced to generating a single XML document.
· Adding a new module does not require any modifications of the front-end code.
· The XML document that defines the application can serve a dual purpose: in the front end it can be converted to HTML – the application documentation, and in the middle tier it can be used to generate the WebFlow client code on the fly.
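To make the idea of an XML application description more concrete, the following Java sketch parses a small, purely hypothetical descriptor and lists the modules it names. The element and attribute names (application, module, connection, name, event, from, to) are invented for this illustration and are not the schema used by WebFlow or the QS front end; only the module and event names are taken from the coupled simulation described earlier.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class AppDescriptor {
    /** Hypothetical descriptor of the coupled Casc2d/Edys application. */
    public static final String SAMPLE =
        "<application name='coupled'>"
        + "<module name='runCasc2d'/>"
        + "<module name='runEdys'/>"
        + "<connection event='Casc2dDone' from='runCasc2d' to='runEdys'/>"
        + "<connection event='EdysDone' from='runEdys' to='runCasc2d'/>"
        + "</application>";

    /** Returns the names of all modules declared in the descriptor, in document order. */
    public static List<String> moduleNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList modules = doc.getElementsByTagName("module");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < modules.getLength(); i++) {
                names.add(((Element) modules.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a valid application descriptor", e);
        }
    }
}
```

With such a descriptor, adding a module would amount to adding one element to the document; a generic front end could read the list of modules at run time instead of being recompiled.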
Further, it makes sense to convert the data wizard to a WebFlow module, too. This way we will decouple the front-end graphical interface from the functionality (such as downloading the data from the Internet) that logically belongs to the middle tier. This will allow us to implement more powerful methods to search for and download data, including on-the-fly format conversions, from Web sites and other data repositories, including databases and mass storage devices. Also, when encapsulated as a WebFlow module, the data wizard will be capable of sending high-volume data directly from their source to the high-performance computer that will run the simulation.
Finally, we need to generalize the way in which the modules interact with each other. Our preference is to adhere to the object-oriented approach (even when the applications are written in a procedural language such as Fortran). As an example, we suggest a different encapsulation of the casc2d code as a CORBA object. Instead of a loop over rainfall events, it should implement a method to process a single event. Then we could add another object, a time manager, that would dispatch the work between Edys and casc2d as needed (driven by the rainfall events). Moreover, yet another object could resolve differences in the temporal and spatial resolutions of the maps used by these codes.
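The proposed division of labor can be sketched in plain Java, leaving CORBA aside. The interfaces RunoffModel and VegetationModel, their method names, and the representation of rainfall events as day numbers are all assumptions made for this illustration; the sketch only shows how a time manager could interleave single-event casc2d calls with Edys runs.

```java
import java.util.List;

public class TimeManagerSketch {
    /** Hypothetical wrapper around casc2d: processes exactly one rainfall event. */
    public interface RunoffModel {
        void processEvent(int day);
    }

    /** Hypothetical wrapper around Edys: advances the simulation over a span of days. */
    public interface VegetationModel {
        void advance(int fromDay, int toDay);
    }

    /**
     * Dispatches work between the two models. Edys is advanced up to each
     * rainfall day, then casc2d processes that single event. The list of
     * rainfall days is assumed to be sorted in ascending order and to lie
     * within [startDay, endDay].
     */
    public static void run(int startDay, int endDay, List<Integer> rainfallDays,
                           RunoffModel casc2d, VegetationModel edys) {
        int current = startDay;
        for (int day : rainfallDays) {
            if (day > current) {
                edys.advance(current, day);   // simulate the dry period
            }
            casc2d.processEvent(day);         // simulate the single rainfall event
            current = day;
        }
        if (endDay > current) {
            edys.advance(current, endDay);    // finish the remaining days
        }
    }
}
```

In this arrangement neither model needs to know about the other; the control flow that is currently hidden inside the casc2d event loop lives in one place.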
To summarize, we used WebFlow - a scalable, high-level, commodity standards-based HPDC system that integrates:
· High-level front ends for visual programming, steering, and data visualization built on top of the Web and OO commodity standards (Tier 1).
· Distributed object-based, scalable, and reusable Web server and Object broker Middleware (Tier 2).
· High-performance back end implemented using the metacomputing toolkit of GLOBUS [2] (Tier 3).
We used this system to implement the Landscape Management System, extending its original WMS interface. Our extensions allow downloading data directly from the Internet and launching coupled simulations on remote hosts. The pilot implementation successfully proved the concept of the Web-based interface. We also outlined the next steps toward turning this academic prototype into a powerful “navigate and choose” tool that can boost the productivity of end users and, most importantly, provide access to the necessary software and hardware anytime, anywhere, given a connection to the Internet.
Acknowledgments
This work has been funded by the DoD High Performance Computing Modernization Program CEWES Major Shared Resource Center through Programming Environment and Training (PET), supported by Contract Number DAHC 94-96-C-0002 with Nichols Research Corp.
The work presented in this report has been done in collaboration with Jeffrey Holland, Billy Johnson, and Clay LaHatte (CEWES, Vicksburg, MS), Fred L. Ogden (University of Connecticut), and W. Michael Childress (Shepherd Miller, Inc.).
1. G. Fox and W. Furmanski, "HPcc as High Performance Commodity Computing", chapter for the "Building National Grid" book by I. Foster and C. Kesselman, http://www.npac.syr.edu/uses/gcf/HPcc/HPcc.html
2. Contact person: Lubos Mitas, NCSA, http://www.ncsa.uiuc.edu/Apps/CMP/cmp-homepage.html
3. Contact person: Ken Flurchick, Ohio Supercomputer Center, http://www.osc.edu/~kenf/Gateway
4. DATORR: Desktop Access to Remote Resources, home page http://www-fp.mcs.anl.gov/~gregor/datorr/
5. Globus Metacomputing Toolkit, home page http://www.globus.org
6. casc2d has been written by Fred Ogden, University of Connecticut, ogden@eng2.uconn.edu
7. Edys has been written by Michael Childress, Shepherd Miller, Inc., mchildress@shepmill.com
8. Sun Microsystems, Inc., JINI Connection Technology, home page http://www.sun.com/jini/index.html
9. UC Berkeley, Ninja Project, home page http://ninja.cs.berkeley.edu/
List of acronyms
ATS – Abstract Task Specification
ASC – Aeronautical Systems Center
CEWES – US Army Corps of Engineers Waterways Experiment Station
CORBA – Common Object Request Broker Architecture (Object Management Group)
DCOM – Distributed Component Object Model (Microsoft, Inc.)
DII – Dynamic Invocation Interface (a CORBA term)
DSI – Dynamic Skeleton Interface (a CORBA term)
DTD – Document Type Definition (an XML term)
GASS – Global Access to Secondary Storage (a part of the Globus toolkit)
GRAM – Globus Resource Allocation Manager (a part of the Globus toolkit)
GUI – Graphical User Interface
HPCC – High Performance Computing and Communications
HPcc – High Performance Commodity Computing
HPDC – High Performance Distributed Computing
HPF – High Performance Fortran
IDL – Interface Definition Language (a CORBA term)
IIOP – Internet Inter-ORB Protocol (a CORBA term)
IOR – Interoperable Object Reference (a CORBA term)
LMS – Landscape Management System
MPI – Message Passing Interface
MSRC – Major Shared Resource Center
NCSA – National Center for Supercomputing Applications
NPAC – Northeast Parallel Architecture Center at Syracuse University
ORB – Object Request Broker
QS – Quantum Simulations (a WebFlow application)
OSC – Ohio Supercomputer Center
PSE – Problem Solving Environment
RSL – Resource Specification Language (used by Globus)
RS – Resource Specification
SSL – Secure Sockets Layer
USGS – US Geological Survey
WMS – Watershed Modeling System
XML – eXtensible Markup Language