Meiko World
Status:
Active hardware manufacturer.
Overview of Organization:
Meiko Scientific was founded in
1985 after Inmos management recommended a delay in the introduction of
the transputer. The transputer development team headed by Miles
Chesney resigned from Inmos and formed Meiko to exploit the new
parallel processing technology. Nine weeks later they demonstrated a
transputer system at the SIGGRAPH graphics trade show in San Francisco
in July 1985. The Computing Surface which developed from it became
commercially available in 1986.
Meiko currently has more than 100 employees in its Bristol Research
and Development centre, and offices and distributors throughout
Europe, Asia and the United States.
Important customers and end users for Meiko include Shell Exploration
and Production, British Aerospace, Chemical Design, the Defense
Research Agency Malvern, Applied Geophysical Software, and the UK
Atomic Energy Authority.
Meiko have produced a range of products from workstation add-on boards
to large stand-alone parallel systems with hundreds of processors.
Platforms Documented:
Contact Address:
Meiko
Reservoir Place
1601 Trapelo Road
Waltham MA 02154, USA.
Tel (617) 890 7676
Fax (617) 890 5042
or
Meiko
650 Aztec West
Almondsbury
Bristol BS12 4SD, UK.
Tel 0454 616171
Fax 0454 618188
See Also:
Meiko's own WWW server.
The Meiko Computing Surface
Overview of Platform:
The original Computing Surface from Meiko was built from a large
number of transputer processing nodes. These nodes could be connected
together to form logical processing domains to give each user a
dedicated resource from the pool of nodes in the whole machine. The
original Computing Surface was built solely from transputers, but
more exotic "flavours" are now available, and both SPARC and i860
nodes may be intermixed to form a heterogeneous system. See the
description of the Concerto below.
The Computing Surface was first known as just the "CS" but is now
sometimes referred to as Meiko's CS1 in contradistinction to their CS2
product.
The operating system for the early CS models was a UNIX-like subset
known as MeikOS. Meiko quickly realised the advantages of using a
pre-existing operating system with which their customer base would be
more familiar, and provided a version of SunOS. Later models of the
CS also provided a Sun-compatible host or front-end processor. A
number of boards were provided which allowed a CS to be mounted on a
Sun workstation and, latterly, to run as a stand-alone machine with a
SPARC node as an internal "log-on" module.
The early CS was difficult to debug, but later models were provided
with tdb, a transputer debugger, and pdb, a more general parallel
debugger.
Compute Hardware:
The CS was purchased as a cabinet and power supply and a kit of boards
and software modules that could be configured to customer
requirements. Each board had at its heart a transputer, and the
original MK009 compute board and MK015 graphics board each had T414
integer transputers as the node CPUs. When the T800 series of
floating point transputers became available from Inmos, Meiko supplied
a range of compute, mass storage, interface and graphics boards based
on them.
Interconnect / Communications System:
The early CS models
had a topology that was configured with a number of twisted wire
pairs in the back of the cabinet, and the machine could be manually
reconfigured in less than an hour. Later models of the CS had
a switchable backplane that allowed the topology to be reconfigured
in software. At the time this was a very valuable feature:
switching technology was still primitive compared to the present, and
to obtain the best performance on an application it was vital to use
the optimum topology to maximise data locality. The Meiko CS proved
an invaluable tool for the experimental investigation of different
topologies. In addition to the 4 nearest neighbour links on each
transputer node, the Computing Surface Network (CSN) communications
architecture allowed point-to-point communication within the
heterogeneous node system.
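To illustrate the constraint that four links per node places on the
choice of topology, the following minimal sketch wires up a
two-dimensional torus, one topology that fits within four links. The
function names and the (row, column) node labelling are hypothetical,
chosen purely for illustration; this is not Meiko configuration
software.

```python
def torus_links(rows, cols):
    """Return {node: set_of_neighbours} for a rows x cols 2D torus.

    Nodes are labelled (row, col). Illustrative only: the labelling and
    this function are assumptions, not part of any Meiko tool.
    """
    links = {}
    for r in range(rows):
        for c in range(cols):
            links[(r, c)] = {
                ((r - 1) % rows, c),  # north, wrapping at the edge
                ((r + 1) % rows, c),  # south, wrapping at the edge
                (r, (c - 1) % cols),  # west, wrapping at the edge
                (r, (c + 1) % cols),  # east, wrapping at the edge
            }
    return links


def max_links_used(links):
    """Largest number of distinct neighbours that any node needs."""
    return max(len(neighbours) for neighbours in links.values())


if __name__ == "__main__":
    wiring = torus_links(4, 4)
    # Report how many of a node's links this topology consumes.
    print(max_links_used(wiring))
```

A topology whose nodes need more neighbours than there are physical
links cannot be wired directly and must route some traffic through
intermediate nodes.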
Memory System:
Memory in the CS1 was entirely distributed -
both physically and logically. Early T4 transputer nodes came with
256kBytes of memory, but later T8 compute nodes typically had between
1 and 8 MBytes of memory. The machine was programmed as a real memory
machine, in that there was no virtual memory; consequently a lot of
the art of programming a CS was in knowing how to minimise code and
data size.
Benchmarks / Compute and data transfer performance:
The compute performance of the early T4 transputers was quite low at
typically less than 1MIPS per node. The T8 nodes turned out to be very
well balanced indeed and typically yielded around 1MFLOPS per node.
The transputer links are designed to support a communications rate of
5, 10 or 20 Mbits per second, depending on the model. In principle
each link can carry data in both directions at once, doubling the
peak rate. In practice, for applications level programming, the
effective rate of communications per node is around 1MByte per second.
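As a rough check on these figures, the sketch below converts the
quoted link rates from Mbits to MBytes per second. It assumes 8 bits
per byte and ignores all link protocol overhead, so the results are
raw upper bounds rather than achievable rates; the function name is
hypothetical.

```python
BITS_PER_BYTE = 8


def raw_link_mbytes_per_s(mbits_per_s, bidirectional=False):
    """Raw link rate in MBytes/s, ignoring all protocol overhead.

    Assumption: a plain bits-to-bytes conversion, doubled when both
    directions of the link are counted.
    """
    rate = mbits_per_s / BITS_PER_BYTE
    return 2 * rate if bidirectional else rate


if __name__ == "__main__":
    for mbits in (5, 10, 20):
        one_way = raw_link_mbytes_per_s(mbits)
        two_way = raw_link_mbytes_per_s(mbits, bidirectional=True)
        print(f"{mbits} Mbit/s: {one_way} MB/s one way, "
              f"{two_way} MB/s two way")
```

On this reckoning a 20 Mbit link has a raw ceiling of 2.5 MBytes per
second in one direction, so the effective figure of around 1MByte per
second quoted above is well under half of the raw rate of a single
link.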
The fast I/O boards on the CS1 allowed up to 80MBytes/second
transfer rate.
Operating System Software and Environment:
The CS was
originally programmed using Occam, with Occam channels as the only
available communications mechanism between processors. This was
quickly seen to present software engineering difficulties, with a
limited amount of Occam software being available worldwide. Meiko
rapidly provided C and Fortran compilers, and these could communicate
in a message passing paradigm using initially Meiko's own CS Tools
package. CS Tools provides a port based set of explicit message
passing primitives and is in many ways superior to some of the other
message passing systems used worldwide. Unfortunately, CS Tools is
not available on any other organisation's platforms but Meiko's and so
is likely to go the way of all small-market proprietary software.
PARMACS was also available on the later CS models.
Networkability/ I/O System / Integrability / Reliability /
Scalability:
Special I/O nodes were available as separate boards
to be integrated into the system. Two examples are the Mass Store
Element (MK021) and the Data Port Element (MK040). The Mass Store
Element provided a memory mapped SCSI interface at 3MBytes/s which
could be connected to a 100MByte or 500MByte Winchester disk or a
1-2GByte laser disk. The Data Port Element provided an I/O link
capable of 80MBytes/s data transfer rate.
The early CS was difficult to integrate into a general purpose
computing environment, with its non-compatible operating system and
binary source files of Occam. VAX systems were used as an early attempt to
map the files on the Meiko system onto a filing system and
environment that users would find more familiar.
Later models of the CS with the SunOS operating system were however
well integrated and could cross mount file systems with user
workstations.
One of the largest computing surfaces built from T800s was the
Edinburgh Concurrent Supercomputer which had over 450 T800 transputer
nodes. Although this machine was nearly always partitioned up by
users into about a dozen domains, experiments done at Edinburgh did
prove that applications could be made to run on the full machine.
Scaling experiments inevitably showed that the 4 nearest neighbour
links between transputer nodes were not sufficient for many algorithms
and applications. Nevertheless, for many then state-of-the-art
application problem sizes, 16 or 64 or 128 transputers proved a very
well balanced compute resource.
Notable Applications / Customers / Market Sectors:
Meiko had
a number of noteworthy CS installations worldwide. As noted above,
the largest CS1 installation was at the Edinburgh Parallel Computing
Centre in Scotland. A number of universities still operate CS1
hardware, and it was a particularly good platform in terms of being
able to buy hardware incrementally and still have a useful machine at
each stage. The UK Atomic Energy Authority used Meiko CS hardware for
computational fluid dynamics and engineering simulations. Another
major application area for Meiko has been OLTP, for which special
purpose Meiko Oracle server machines have been configured.
Overall Comments:
The Meiko CS1 was effectively the
pioneering platform for MIMD and SPMD computing in Europe. Many of
the ideas and methods seen in current commercial platforms owe their
origins to the early CS and those who worked on it. Perhaps its most
noteworthy feature is that of balance between compute and
communications.
Meiko Concerto
Overview of Platform:
In response to increased compute
performance on competitor machines, Meiko designed an enhancement to
their Computing Surface that involved making hybrid nodes, using two
T800 transputers and an Intel i860 chip. This machine was perhaps not
marketed very well and went under a variety of different names. It is
usually referred to as the Concerto (reflecting that the transputers
and i860s acted in concert) but is also sometimes just called
Meiko's i860 CS.
This machine proved interesting since it was one of the earliest to
incorporate vector nodes in a distributed memory MIMD machine.
Although Meiko successfully increased the compute performance of the
hardware, this machine was let down by immature compiler technology
and a poor balance between compute and communications performance.
Compute Hardware:
The Concerto had two Inmos T8 transputers and one Intel i860 chip on each node.
These three chips communicated together via a shared memory bus system.
Interconnect / Communications System:
The Concerto nodes
used the four links per T8 transputer to form an eight-link hybrid
node. A software switchable backplane allowed these nodes to be
configured at will in any desired topology.
Memory System:
The memory shared between the three chips on
each node was accessible to the user as a real distributed memory
system, and typically nodes had 8, 16 or 32 MBytes of memory. There
was no virtual memory system, so it was important to configure the
machine with enough memory for the application.
Benchmarks / Compute and data transfer performance:
The i860 chip is notionally capable of in excess of 60MFLOPS, and
indeed hand-crafted assembler coded applications have achieved in
excess of 50MFLOPS per node in highly vectorizable sections. More
typically, well written vector Fortran applications achieved between 7
and 12 MFLOPS per node. This is partially due to there being
insufficient bandwidth between nodes to keep the i860 busy, but also
because of the difficulties in building a compiler smart enough to
make good use of the many facilities on the i860 chip.
The communications system was only slightly better than that of the
single T8 CS, at around a few MBytes per second per link achievable
with careful programming.
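To put the compute figures above in proportion, the following sketch
expresses each achieved rate as a fraction of the notional peak. It
assumes a peak of exactly 60MFLOPS; since the chip is described as
capable of "in excess of" that, the fractions are slight
overestimates. The function name is hypothetical.

```python
NOTIONAL_PEAK_MFLOPS = 60.0  # assumption: the text says "in excess of 60"


def fraction_of_peak(achieved_mflops, peak_mflops=NOTIONAL_PEAK_MFLOPS):
    """Achieved rate as a fraction of the notional peak rate."""
    return achieved_mflops / peak_mflops


if __name__ == "__main__":
    cases = [
        ("hand-crafted assembler", 50),
        ("vector Fortran, upper figure", 12),
        ("vector Fortran, lower figure", 7),
    ]
    for label, mflops in cases:
        share = fraction_of_peak(mflops)
        print(f"{label}: {mflops} MFLOPS is {share:.0%} of peak")
```

On these assumptions, well written Fortran reached roughly 12 to 20
percent of peak, against more than 80 percent for hand-crafted
assembler in vectorizable sections.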
Operating System Software and Environment:
The Concerto ran
the SunOS operating system, and the native message passing system was
the Meiko CS Tools package, a port based system, much the same as the
later models of the CS.
Networkability/ I/O System / Integrability / Reliability /
Scalability:
I/O on the Concerto was implemented through the T8
links communications structure, so that each node could only access
the filing system through an effective bottleneck of one T8 link.
This proved somewhat of a handicap for general applications codes,
although applications rewritten specially for this machine were able
to make use of the large real memory system and achieve superlinear
speedups in some cases by avoiding the paging costs of virtual memory.
The first model of the Concerto suffered from teething troubles in the
board design. This machine was slightly ahead of its time, and the
design pushed board integration technology close to its limits. This
problem was solved in the later Concerto models by modifications to
the board layout.
Considering the Edinburgh machine was run 24 hours a day, 7 days a
week for nearly two years on two demanding applications, it proved
remarkably reliable.
Notable Applications / Customers / Market Sectors:
The most
notable installation of this machine was the Grand Challenge Machine
installed at Edinburgh Parallel Computing Centre, for the QCD and
Car-Parrinello simulation projects. This machine had 64 of the hybrid
nodes and was run as a dedicated resource for those two application
codes. Latterly, as the teething troubles were overcome, EPCC was
successful in porting industrial applications to this machine,
including computational fluid dynamics simulations.
Several other sites still have smaller machines in the form of one or
two boards as accelerators to CS1 machines, or as in-Sun boards used
for development purposes.
Incremental scaling is possible by adding boards to the system,
although it appears that 64 nodes is an effective practical limit due
to the filing system communications bottleneck.
Overall Comments:
This machine pioneered some interesting
ideas in terms of hardware balance. It is probably most noteworthy as
indicating the role of vector computing in the new high performance
platforms. (Physically) distributed memory systems are probably the
only way a machine can be made scalable, and vector technology is
relegated to the node level of a parallel machine where it can fulfill
a valuable role if well implemented.
Meiko CS2
Overview of Platform:
The Meiko CS2, like the CS1, has a
modular hardware and software configuration, and therefore a system
can be chosen according to customer needs and budget.
Compute Hardware:
Each compute node in a CS2 is a SPARC
superscalar microprocessor, with an optional attached Fujitsu vector
processing unit. A specific installation will be some mix of scalar
and vector nodes. There are also two variants of the scalar nodes,
one optimised for I/O intensive applications, and one for
computationally intensive applications.
Interconnect / Communications System:
Nodes communicate via
a multi-stage, multi-level switch which, unlike the nearest neighbour
transputer links of the CS1, offers low latency anywhere-to-anywhere
connectivity.
Memory System:
Each node will typically have between 64 and
128 MBytes of memory, but by the use of a Direct Memory
Access (DMA) facility between nodes, the system has the support
mechanism for virtual global memory.
Benchmarks / Compute and data transfer performance:
Meiko
peak performance figures are 40MFLOPS (double precision) per scalar
node, and early application results suggest that sustained figures
in excess of half of this may be achievable.
The vector nodes are reported by Meiko as capable of 200MFLOPS in
double precision for peak performance.
Meiko figures of 100MBytes per second bidirectionally between nodes
are probably realistic, and the architecture components are designed
to be able to sustain 800MBytes per second ultimately.
The error corrected memory system is organised into 16 independent
banks with an aggregate bandwidth of 3.2GBytes per second.
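The quoted figures allow two simple derived quantities to be worked
out, as in the sketch below. It assumes that the aggregate memory
bandwidth is shared equally among the 16 banks, that 3.2GBytes is
3200MBytes, and that the 100MBytes per second internode figure and the
40MFLOPS scalar peak can be taken at face value; the function names
are hypothetical.

```python
def per_bank_mbytes_per_s(aggregate_mbytes_per_s, banks):
    """Bandwidth of one bank, assuming an equal share for every bank."""
    return aggregate_mbytes_per_s / banks


def bytes_per_flop(mbytes_per_s, mflops):
    """Bytes moved per floating point operation at the quoted rates."""
    return mbytes_per_s / mflops


if __name__ == "__main__":
    # 3.2 GBytes/s aggregate over 16 independent banks.
    print(per_bank_mbytes_per_s(3200, 16), "MBytes/s per bank")
    # 100 MBytes/s between nodes against a 40 MFLOPS scalar peak.
    print(bytes_per_flop(100, 40), "bytes per flop between nodes")
```

Under these assumptions each bank supplies 200MBytes per second, and
a scalar node running at its peak rate could exchange 2.5 bytes with
other nodes per floating point operation.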
Operating System Software and Environment:
The CS2 uses a
multiple instance Solaris UNIX operating system, to provide a multi
user access system. Unlike the CS1 and Concerto platforms, the CS2
supports arbitrary user logins on the nodes.
Meiko envisage that the CS2 can be used as a multiple-instance UNIX
system, for explicit message passing programs, and for data parallel
programming. The CS Tools port-based message passing environment is
provided, as are the PVM and PARMACS portable message passing systems,
and Meiko also provide an Intel NX/2 look-alike message passing
interface.
Meiko provide array extended Fortran compilers which are supported by
native CS Tools communications calls.
A toolset including performance analysis and multi-process debugger is
supplied.
Networkability/ I/O System / Integrability / Reliability /
Scalability:
The scalar nodes optimised for I/O have peripheral
interfaces for ethernet and for two SCSI-2 disk controllers, and three
SBus slots. Nodes optimised for computational performance do not have
this direct peripheral I/O capability.
The attached I/O devices have facilities for striped, mirrored and
RAID filestores to give a total storage capacity of over 4TBytes using
commodity disks.
Networking is provided by multiple ethernet and FDDI connections
running standard protocols such as FTP, TCP/IP, UNIX streams and
sockets, Telnet and NFS. Multiple HiPPI connections for framestores
and mass storage devices are also provided.
Notable Applications / Customers / Market Sectors:
Notable
customers at present are the Lawrence Livermore National Laboratory
(USA), which have ordered a large vector node system, and CERFACS
(France) and CERN (Switzerland), which have both ordered hybrid
scalar/vector node systems. Southampton University already have an
early model of the CS2.
Meiko are clearly targeting their existing market in database systems
and ORACLE users with the I/O optimised scalar nodes. Other important
market sectors include computational electromagnetics, computational
fluid dynamics, molecular dynamics and engineering simulations.
Overall Comments:
This machine has some interesting hardware
developments, specifically the switching system and direct memory
access mechanism.
hawick@npac.syr.edu
saleh@npac.syr.edu