Meiko World

Status:

Active hardware manufacturer.

Overview of Organization:

Meiko Scientific was founded in 1985 after Inmos management recommended a delay in the introduction of the transputer. The transputer development team headed by Miles Chesney resigned from Inmos and formed Meiko to exploit the new parallel processing technology. Nine weeks later, in July 1985, they demonstrated a transputer system at the SIGGRAPH graphics trade show in San Francisco. The Computing Surface, which developed from this demonstration system, became commercially available in 1986.

Meiko currently has more than 100 employees in its Bristol Research and Development centre, and offices and distributors throughout Europe, Asia and the United States.

Important customers and end users for Meiko include Shell Exploration and Production; British Aerospace; Chemical Design; the Defence Research Agency, Malvern; Applied Geophysical Software; and the UK Atomic Energy Authority.

Meiko have produced a range of products from workstation add-on boards to large stand-alone parallel systems with hundreds of processors.

Platforms Documented:

The Meiko Computing Surface (CS1); the Meiko Concerto; the Meiko CS2.

Contact Address:

Meiko
Reservoir Place
1601 Trapelo Road
Waltham MA 02154, USA.
Tel (617) 890 7676
Fax (617) 890 5042

or

Meiko
650 Aztec West
Almondsbury
Bristol BS12 4SD, UK.
Tel 0454 616171
Fax 0454 618188

See Also:

Meiko's own WWW server.

The Meiko Computing Surface

Overview of Platform:

The original Computing Surface from Meiko was built from a large number of transputer processing nodes. These nodes could be connected together into logical processing domains, giving each user a dedicated resource drawn from the pool of nodes in the whole machine. The machine was originally built solely from transputers, but more exotic "flavours" later became available, and SPARC and i860 nodes may now be intermixed to form a heterogeneous system. See the description of the Concerto below.

The Computing Surface was first known simply as the "CS" but is now sometimes referred to as Meiko's CS1, in contradistinction to their CS2 product. The operating system for the early CS models was a UNIX-like subset known as MeikOS. Meiko quickly realised the advantages of using a pre-existing operating system with which their customer base would be more familiar, and provided a version of SunOS. Later models of the CS also provided a Sun-compatible host or front-end processor. A number of boards allowed a CS to be mounted in a Sun workstation and, latterly, the CS was offered as a stand-alone machine with a SPARC node as an internal "log-on" module.

The early CS was difficult to debug, but later models were provided with tdb, a transputer debugger, and pdb, a more general parallel debugger.

Compute Hardware:

The CS was purchased as a cabinet and power supply and a kit of boards and software modules that could be configured to customer requirements. Each board had at its heart a transputer, and the original MK009 compute board and MK015 graphics board each had T414 integer transputers as the node CPUs. When the T800 series of floating point transputers became available from Inmos, Meiko supplied a range of compute, mass storage, interface and graphics boards.

Interconnect / Communications System:

The topology of the early CS models was configured with twisted wire pairs in the back of the cabinet, and the machine could be reconfigured manually in under an hour. Later models of the CS had a switchable backplane that allowed the topology to be reconfigured in software. This was a very valuable feature at a time when switching technology was still primitive by present standards: to obtain the best performance from an application it was vital to use the optimum topology to maximise data locality, and the Meiko CS proved an invaluable tool for the experimental investigation of different topologies. In addition to the 4 nearest-neighbour links on each transputer node, the Computing Surface Network (CSN) communications architecture allowed point-to-point communication within the heterogeneous node system.
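The value of choosing a good topology can be illustrated with a small sketch. The node counts and wirings below are purely illustrative, not an actual CS configuration: the same sixteen four-link nodes are compared wired as a ring and as a 4x4 grid, and the average number of link hops a message must make between two nodes is computed by breadth-first search.

```python
# Sketch: why interconnect topology mattered on the CS1.
# Sixteen four-link transputer-style nodes, wired two ways.
from collections import deque

def avg_hops(adj):
    """Mean BFS distance over all ordered pairs of distinct nodes."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

n = 16
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

side = 4
grid = {(x, y): [(x + dx, y + dy)
                 for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= x + dx < side and 0 <= y + dy < side]
        for x in range(side) for y in range(side)}

print(f"ring: {avg_hops(ring):.2f} hops, grid: {avg_hops(grid):.2f} hops")
# → ring: 4.27 hops, grid: 2.67 hops
```

On these illustrative figures the grid wiring roughly halves the average hop count relative to the ring, which is exactly the kind of gain a reconfigurable backplane made accessible in software.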

Memory System:

Memory in the CS1 was entirely distributed - both physically and logically. Early T4 transputer nodes came with 256kBytes of memory, but later T8 compute nodes typically had between 1 and 8 MBytes of memory. The machine was programmed as a real-memory machine, in that there was no virtual memory; consequently much of the art of programming a CS was in knowing how to minimise code and data size.

Benchmarks / Compute and data transfer performance:

The compute performance of the early T4 transputers was quite low, at typically less than 1MIPS per node. The T8 nodes turned out to be very well balanced indeed, typically yielding around 1MFLOPS per node.

The transputer links were designed to support a communications rate of 5, 10 or 20 Mbits per second, depending on the model. The links are bidirectional, in principle doubling the peak speed. In practice, for application-level programming, the effective rate of communications per node is around 1MByte per second.
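The gap between the quoted link rate and the achieved application rate can be narrowed with a line of arithmetic. The sketch below assumes the commonly documented Inmos link framing of 11 transmitted bits per data byte; software overheads account for the remaining drop towards the ~1MByte per second observed at application level.

```python
# Back-of-envelope check of the transputer link figures quoted above.
# Assumption: each data byte on an Inmos link is framed as 11 bits
# (start bits, 8 data bits, stop bit), as commonly documented.
link_mbits = 20.0                   # fastest quoted link rate
raw_mbytes = link_mbits / 8         # peak raw rate, one direction
framed_mbytes = link_mbits / 11     # usable rate after 11-bit framing

print(f"raw: {raw_mbytes:.2f} MBytes/s, after framing: {framed_mbytes:.2f} MBytes/s")
# → raw: 2.50 MBytes/s, after framing: 1.82 MBytes/s
```

Framing alone thus reduces a 20 Mbit/s link from 2.5 to about 1.8 MBytes per second in one direction, with protocol and software costs taking it the rest of the way towards 1MByte per second.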

The fast I/O boards on the CS1 allowed up to 80MBytes/second transfer rate.

Operating System Software and Environment:

The CS was originally programmed using Occam, with Occam channels as the only available communications mechanism between processors. This quickly presented software engineering difficulties, since only a limited amount of Occam software was available worldwide. Meiko rapidly provided C and Fortran compilers, which could communicate in a message-passing paradigm, initially using Meiko's own CS Tools package. CS Tools provides a port-based set of explicit message-passing primitives and is in many ways superior to some of the other message-passing systems in use worldwide. Unfortunately, CS Tools is not available on any platform other than Meiko's, and so is likely to go the way of all small-market proprietary software. PARMACS was also available on the later CS models.
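Occam's channel model is worth a brief illustration. The CS Tools API is not reproduced here; instead, the sketch below is a rough Python analogy (all names are illustrative) in which two concurrent processes communicate only over a point-to-point channel, as Occam processes do. True Occam channels are synchronising rendezvous, which a bounded queue only approximates.

```python
# Rough analogy to Occam's channel model: two concurrent processes
# communicating only through a point-to-point channel. This is NOT
# the CS Tools or Occam API; names here are purely illustrative.
import threading
import queue

channel = queue.Queue(maxsize=1)   # point-to-point "channel"

def producer():
    for i in range(5):
        channel.put(i * i)         # Occam analogue: channel ! i*i
    channel.put(None)              # end-of-stream marker

def consumer(results):
    while True:
        item = channel.get()       # Occam analogue: channel ? item
        if item is None:
            break
        results.append(item)

results = []
t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer, args=(results,))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)                     # → [0, 1, 4, 9, 16]
```

The port-based style of CS Tools differs in that messages are addressed to named ports rather than to statically wired channels, but the underlying discipline of explicit, point-to-point message passing is the same.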

Networkability/ I/O System / Integrability / Reliability / Scalability:

Special I/O nodes were available as separate boards to be integrated into the system. Two examples are the Mass Store Element (MK021) and the Data Port Element (MK040). The Mass Store Element provided a memory mapped SCSI interface at 3MBytes/s which could be connected to a 100MByte or 500MByte Winchester disk or a 1-2GByte laser disk. The Data Port Element provided an I/O link capable of 80MBytes/s data transfer rate.

The early CS, with its non-compatible operating system and its Occam source files held in binary format, was difficult to integrate into a general-purpose computing environment. VAX systems were used in an early attempt to map the files on the Meiko system onto a filing system and environment that users would find more familiar.

Later models of the CS with the SunOS operating system were, however, well integrated and could cross-mount file systems with user workstations.

One of the largest Computing Surfaces built from T800s was the Edinburgh Concurrent Supercomputer, which had over 450 T800 transputer nodes. Although this machine was nearly always partitioned by users into about a dozen domains, experiments done at Edinburgh did prove that applications could be made to run on the full machine. Scaling experiments inevitably showed that the 4 nearest-neighbour links between transputer nodes were not sufficient for many algorithms and applications. Nevertheless, for many then state-of-the-art application problem sizes, 16, 64 or 128 transputers proved a very well balanced compute resource.

Notable Applications / Customers / Market Sectors:

Meiko had a number of noteworthy CS installations worldwide. As noted above, the largest CS1 installation was at the Edinburgh Parallel Computing Centre in Scotland. A number of universities still operate CS1 hardware, and it was a particularly good platform in terms of being able to buy hardware incrementally and still have a useful machine at each stage. The UK Atomic Energy Authority used Meiko CS hardware for computational fluid dynamics and engineering simulations. Another major application area for Meiko has been on-line transaction processing (OLTP), and special-purpose Meiko Oracle server machines have been configured.

Overall Comments:

The Meiko CS1 was effectively the pioneering platform for MIMD and SPMD computing in Europe. Many of the ideas and methods seen in current commercial platforms owe their origins to the early CS and those who worked on it. Perhaps its most noteworthy feature is that of balance between compute and communications.


Meiko Concerto

Overview of Platform:

In response to increased compute performance on competitor machines, Meiko designed an enhancement to their Computing Surface based on hybrid nodes, each using two T800 transputers and an Intel i860 chip. This machine was perhaps not marketed very well and was sold under a variety of different names. It is usually referred to as the Concerto (reflecting that the transputers and i860s acted in concert) but is also sometimes just called Meiko's i860 CS.

This machine proved interesting since it was one of the earliest to incorporate vector nodes in a distributed-memory MIMD machine. Although Meiko successfully increased the compute performance of the hardware, the machine was let down by immature compiler technology and a poor balance between compute and communications performance.

Compute Hardware:

The Concerto had two Inmos T8 transputers and one Intel i860 chip on each node. These three chips communicated together via a shared memory bus system.

Interconnect / Communications System:

The Concerto nodes used the four links of each T8 transputer to form an eight-link hybrid node. A software-switchable backplane allowed these nodes to be configured at will in any desired topology.

Memory System:

The memory shared between the three chips on each node was accessible to the user as a real distributed memory system, and typically nodes had 8, 16 or 32 MBytes of memory. There was no virtual memory system, so it was important to configure the machine with enough memory for the application.

Benchmarks / Compute and data transfer performance:

The i860 chip is notionally capable of in excess of 60MFLOPS, and indeed hand-crafted assembler-coded applications have achieved in excess of 50MFLOPS per node in highly vectorisable sections. More typically, well-written vector Fortran applications achieved between 7 and 12 MFLOPS per node. This is partly due to there being insufficient bandwidth between nodes to keep the i860 busy, but also to the difficulty of building a compiler smart enough to make good use of the many facilities on the i860 chip.
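A line of arithmetic puts these figures in perspective, using only the numbers quoted above:

```python
# Fraction of the i860's nominal 60MFLOPS peak achieved in practice,
# using the per-node figures quoted in the text above.
peak_mflops = 60.0
sustained = {
    "hand-coded assembler": 50.0,
    "vector Fortran (low)": 7.0,
    "vector Fortran (high)": 12.0,
}
efficiency = {label: mflops / peak_mflops for label, mflops in sustained.items()}

for label, frac in efficiency.items():
    print(f"{label}: {frac:.0%} of peak")
```

Even well-written vector Fortran thus reached only around 12-20% of nominal peak, quantifying the compiler and bandwidth limitations just described.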

The communications system was only slightly better than that of the single-T8 CS, at around a few MBytes per second per link achievable with careful programming.

Operating System Software and Environment:

The Concerto ran the SunOS operating system, and the native message-passing system was the Meiko CS Tools package, a port-based system, much the same as on the later models of the CS.

Networkability/ I/O System / Integrability / Reliability / Scalability:

I/O on the Concerto was implemented through the T8 link communications structure, so that each node could only access the filing system through an effective bottleneck of one T8 link. This proved something of a handicap for general applications codes, although applications rewritten specially for this machine were able to make use of the large real memory system and in some cases achieved superlinear speedups by avoiding the paging costs of virtual memory. The first model of the Concerto suffered from teething troubles in the board design: the machine was slightly ahead of its time, and its design pushed board integration technology close to its limits. This problem was solved in later Concerto models by modifications to the board layout.

Considering the Edinburgh machine was run 24 hours a day, 7 days a week for nearly two years on two demanding applications, it proved remarkably reliable.

Notable Applications / Customers / Market Sectors:

The most notable installation of this machine was the Grand Challenge Machine installed at the Edinburgh Parallel Computing Centre for the QCD and Car-Parrinello simulation projects. This machine had 64 of the hybrid nodes and was run as a dedicated resource for those two application codes. Latterly, as the teething troubles were overcome, EPCC was successful in porting industrial applications to this machine, including computational fluid dynamics simulations.

Several other sites still have smaller machines in the form of one or two boards as accelerators to CS1 machines, or as in-Sun boards used for development purposes.

Incremental scaling is possible by adding boards to the system, although it appears that 64 nodes is an effective practical limit due to the filing system communications bottleneck.

Overall Comments:

This machine pioneered some interesting ideas in terms of hardware balance. It is probably most noteworthy for indicating the role of vector computing in the new high-performance platforms: (physically) distributed memory systems are probably the only way a machine can be made scalable, with vector technology relegated to the node level of a parallel machine, where it can fulfil a valuable role if well implemented.


Meiko CS2

Overview of Platform:

The Meiko CS2, like the CS1, has a modular hardware and software configuration, so a system can be chosen according to customer needs and budget.

Compute Hardware:

Each compute node in a CS2 is a superscalar SPARC microprocessor, with an optional attached Fujitsu vector processing unit. A specific installation will contain some mix of scalar and vector nodes. There are also two variants of the scalar nodes: one optimised for I/O-intensive applications, and one for computationally intensive applications.

Interconnect / Communications System:

Nodes communicate via a multi-stage, multi-level switch which, unlike the nearest-neighbour transputer links of the CS1, offers low-latency, anywhere-to-anywhere connectivity.

Memory System:

Each node typically has between 64 and 128 MBytes of memory, but through a Direct Memory Access (DMA) facility between nodes the system has the support mechanism for virtual global memory.

Benchmarks / Compute and data transfer performance:

Meiko's peak performance figure is 40MFLOPS (double precision) per scalar node, and early application results suggest that sustained figures in excess of half of this are achievable.

The vector nodes are reported by Meiko as capable of 200MFLOPS in double precision for peak performance.

Meiko's figure of 100MBytes per second bidirectionally between nodes is probably realistic, and the architecture components are designed ultimately to sustain 800MBytes per second.

The error corrected memory system is organised into 16 independent banks with an aggregate bandwidth of 3.2GBytes per second.

Operating System Software and Environment:

The CS2 runs a multiple-instance Solaris UNIX operating system to provide multi-user access. Unlike the CS1 and Concerto platforms, the CS2 supports arbitrary user logins on the nodes.

Meiko envisage the CS2 being used as a multiple UNIX system, for explicit message-passing programs, and for data-parallel programming. The CS Tools port-based message-passing environment is provided, as are the PVM and PARMACS portable message-passing systems, and Meiko also provide an Intel NX/2 look-alike message-passing interface.

Meiko provide array-extended Fortran compilers, which are supported by native CS Tools communications calls.

A toolset including performance analysis tools and a multi-process debugger is supplied.

Networkability/ I/O System / Integrability / Reliability / Scalability:

The scalar nodes optimised for I/O have peripheral interfaces for Ethernet, two SCSI-2 disk controllers and three SBus slots. Nodes optimised for computational performance do not have this direct peripheral I/O capability.

The attached I/O devices have facilities for striped, mirrored and RAID filestores to give a total storage capacity of over 4TBytes using commodity disks.

Networking is provided by multiple Ethernet and FDDI connections running standard protocols and services such as TCP/IP, FTP, Telnet, NFS and UNIX streams and sockets. Multiple HiPPI connections for framestores and mass storage devices are also provided.

Notable Applications / Customers / Market Sectors:

Notable customers at present are the Lawrence Livermore National Laboratory (USA), which has ordered a large vector-node system; CERFACS (France) and CERN (Switzerland), which have both ordered hybrid scalar/vector node systems; and Southampton University, which already has an early model of the CS2.

Meiko are clearly targeting their existing market in database systems and ORACLE users with the I/O-optimised scalar nodes. Other important market sectors include computational electromagnetics, computational fluid dynamics, molecular dynamics and engineering simulations.

Overall Comments:

This machine has some interesting hardware developments, specifically the switching system and direct memory access mechanism.


hawick@npac.syr.edu
saleh@npac.syr.edu