# Cambridge Parallel Processing Inc. (CPP)

### Status

Active hardware manufacturer.

### Overview of Organization

Cambridge Parallel Processing Inc. was formerly known as Active Memory Technology Ltd (AMT), a British company spun off from ICL in 1986. ICL had been involved with fine-grained parallel machines for some years without producing a commercially successful machine. AMT encountered financial difficulties in 1992, and CPP now manufactures the Distributed Array Processor (DAP), a massively parallel SIMD computer. The target markets for this platform are the academic, industrial and defence sectors.

As of 1998 the company's main hardware product is the Gamma II Plus. Commercially available since 1986, the Gamma II Plus is a massively parallel workstation based on the DAP.

There are over 115 DAPs installed worldwide, excluding those in embedded systems (the number of embedded installations is not widely known).

### Platforms Documented

Cambridge Parallel Processing,
England,
UK.
Fax: 0344 305544.


# The DAP 500, 600, 500c and 600c Series

### Overview of Platform

The AMT DAP is a lockstep SIMD machine that operates on many data items in parallel, one bit at a time. Variable-length arithmetic is supported in software. The processing elements are configured as a grid with nearest-neighbour connections and row/column data highways. These highways allow efficient global fetches and broadcasts, giving the system the properties of an associative processor.
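To make the idea of bit-serial, variable-length arithmetic concrete, the following minimal Python sketch adds two arrays of integers one bit-plane at a time, keeping one carry bit per element. It is an illustration only, not DAP code: the function names (`to_bit_planes`, `bit_serial_add`, `add_values`) are hypothetical, and each list position merely stands in for one processing element.

```python
from typing import List

Planes = List[List[int]]


def to_bit_planes(values: List[int], width: int) -> Planes:
    """Split each value into `width` bit-planes, least significant first."""
    return [[(v >> b) & 1 for v in values] for b in range(width)]


def from_bit_planes(planes: Planes) -> List[int]:
    """Reassemble integers from bit-planes (least significant first)."""
    count = len(planes[0])
    return [sum(planes[b][i] << b for b in range(len(planes))) for i in range(count)]


def bit_serial_add(a_planes: Planes, b_planes: Planes) -> Planes:
    """Add two sets of bit-planes one plane per step.

    Every element handles a single bit per step and keeps its own carry
    bit between steps, like a ripple-carry adder unrolled in time.
    """
    carry = [0] * len(a_planes[0])
    result = []
    for a_bits, b_bits in zip(a_planes, b_planes):
        sum_bits = [a ^ b ^ c for a, b, c in zip(a_bits, b_bits, carry)]
        carry = [(a & b) | (c & (a ^ b)) for a, b, c in zip(a_bits, b_bits, carry)]
        result.append(sum_bits)
    result.append(carry)  # the final carry becomes an extra top plane
    return result


def add_values(xs: List[int], ys: List[int], width: int) -> List[int]:
    """Add two lists of non-negative integers using `width`-bit operands."""
    return from_bit_planes(bit_serial_add(to_bit_planes(xs, width), to_bit_planes(ys, width)))
```

Because the word length is just the number of planes processed, the same routine works for 8-bit, 12-bit or any other operand width, which is the sense in which the arithmetic is variable-length.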

Architecture: The DAP is hosted by a front-end computer (typically a Sun workstation). The machines are based on single-bit Processing Elements (PEs), which are custom designed using CMOS technology. The 500c and 600c machines come with an 8-bit co-processor per PE to speed up floating point and integer operations. Code and data stores are separate, and the processors have access to a high speed data channel. The control structure consists of a Master Control Unit (MCU) which reads instructions from a code store and issues them to the PEs. The MCU also acts as a high speed scalar processor. An application consists of two parts: one running on the front end and a separately compiled part running on the DAP itself.

In the DAP naming convention, the first digit gives the edge size as a power of 2, the next two digits give the clock speed in MHz, and the letter 'c' indicates the presence of co-processors. A DAP 510c therefore has 32 x 32 = 1024 processors, a 10 MHz clock and 8-bit co-processors.
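The naming rule can be written out as a small decoder. This is a sketch under the convention stated above; the `DapModel` type and `decode_dap_model` function are hypothetical names introduced for illustration, and the clock digits are assumed to be in MHz.

```python
import re
from typing import NamedTuple


class DapModel(NamedTuple):
    edge: int          # processors along one side of the square array
    processors: int    # total number of processing elements
    clock_mhz: int     # clock speed taken from the two middle digits
    coprocessor: bool  # True when the name carries the 'c' suffix


def decode_dap_model(name: str) -> DapModel:
    """Decode a model name such as 'DAP 510c' or '610'."""
    match = re.fullmatch(r"(?:DAP\s*)?(\d)(\d{2})(c?)", name.strip())
    if match is None:
        raise ValueError(f"not a recognised DAP model name: {name!r}")
    exponent, clock, suffix = match.groups()
    edge = 2 ** int(exponent)  # first digit is the edge size as a power of 2
    return DapModel(edge, edge * edge, int(clock), suffix == "c")
```

For example, `decode_dap_model("DAP 510c")` yields an edge of 32, 1024 processors, a 10 MHz clock and the co-processor flag set, while `decode_dap_model("610")` yields a 64 x 64 array of 4096 processors without co-processors.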

The major differences from the ILLIAC IV are: bit processors; row/column highways; much larger memory per processor; high input/output capability.

AMT offers two models of the DAP. The DAP 510 is a 32 x 32 array of processors, and the DAP 610 is a 64 x 64 array of processors. The DAP array is constructed from custom CMOS VLSI chips, each of which contains 64 processor elements. Both models currently operate with a 100 nsec cycle time. A real-time graphics display interface is available for the DAP systems. The following table summarizes the characteristics of the two DAP models.

The development environment (cross-compilers and run-time debugging aids) is supplied running under UNIX. The DAP is linked to the host as a peripheral via a 1.5 Mbyte/sec parallel interface.

| Model | Memory bandwidth | I/O data rate | Processing elements | Memory configurations |
|---|---|---|---|---|
| DAP 510 | 1.2 Gbytes/sec | 50 Mbytes/sec | 1024 | 4, 8, 16 Mbytes |
| DAP 610 | 4.8 Gbytes/sec | 100 Mbytes/sec | 4096 | 16, 32, 64 Mbytes |


### Compute Hardware

Each node of the machine effectively consists of a single bit processor, optional floating point accelerator and node memory.

The DAP 510 is small enough to fit under a desk, while the DAP 610 is housed in a standard EIA rack cabinet. Both DAP models can be hosted by Sun or DEC VAX computers and workstations. The DAP can be connected to a Sun host via the SCSI interface. Connection to DEC VAX systems is via DR11W or DRB32 interfaces. Connection to the Aptec IOC is supported as well as direct connection to VME bus.

| Characteristic | DAP 510 | DAP 610 |
|---|---|---|
| Array size | 32 x 32 | 64 x 64 |
| Array memory | 8 Mbytes | 16 Mbytes (max. of 128 or 512 Mbytes) |
| Code store | 512 Kbytes | 512 Kbytes (max. of 4 Mbytes) |
| Instruction rate | 10 MHz | 10 MHz |
| Host | Sun or VAX | Sun or VAX |
| Size | 17 x 13 x 20 in. | 45 x 25 x 38 in. |

The present DAP systems are third-generation machines which started with a 64 x 64 array originally installed at QMC (Queen Mary College, University of London). The QMC machine, which had an effective cycle time of 250 nsec, proved highly adaptable to a wide range of numerical problems based on partial differential equations. The performance on large-scale Monte Carlo simulations in lattice gauge theory and molecular dynamics was found to be exceptional and, in some specialized applications such as the Ising model, the DAP outperformed a CRAY-1 by a factor of 10.

### Interconnect / Communications System

The interconnection is to the four nearest neighbours in a 2-D grid, together with an additional bus system connecting processors by rows and columns.
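As a rough illustration of how row and column buses can support associative-style lookups, the sketch below broadcasts a key to every element of a grid, lets each element compute a one-bit match, and then collects the match bits along rows and columns. This models the idea only; the function names are hypothetical and nothing here reflects the actual DAP instruction set.

```python
from typing import List

Grid = List[List[int]]


def broadcast_match(grid: Grid, key: int) -> Grid:
    """Broadcast `key` to every element and return a grid of 1/0 match bits."""
    return [[1 if value == key else 0 for value in row] for row in grid]


def row_responders(match_bits: Grid) -> List[int]:
    """OR the match bits along each row, as a row bus might collect them."""
    return [1 if any(row) else 0 for row in match_bits]


def column_responders(match_bits: Grid) -> List[int]:
    """OR the match bits along each column, as a column bus might collect them."""
    return [1 if any(column) else 0 for column in zip(*match_bits)]
```

Every element performs the same comparison at the same time, so the cost of the search does not grow with the number of elements held in the array.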

### Memory System

Each bit processor has its own part of the array memory. Processors access the data of neighbouring PEs by carrying out hardware-implemented SHIFT operations. Array memory can also be addressed conventionally by the MCU processor. A fast channel is provided to allow data to be fed into one edge of the square torus of PEs, and can be used to control an additional data storage system.
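The effect of a SHIFT on a torus of PEs can be sketched in a few lines of Python. The example assumes cyclic (wrap-around) edges, consistent with the torus described above; the `shift` and `neighbour_sum` names and the direction convention (positive offsets move data down and to the right) are assumptions made for illustration.

```python
from typing import List

Grid = List[List[int]]


def shift(grid: Grid, d_row: int, d_col: int) -> Grid:
    """Move every value by (d_row, d_col) with wrap-around at the edges."""
    rows, cols = len(grid), len(grid[0])
    return [
        [grid[(r - d_row) % rows][(c - d_col) % cols] for c in range(cols)]
        for r in range(rows)
    ]


def neighbour_sum(grid: Grid) -> Grid:
    """For each element, sum the values held by its four nearest neighbours."""
    rows, cols = len(grid), len(grid[0])
    shifted = [shift(grid, 1, 0), shift(grid, -1, 0), shift(grid, 0, 1), shift(grid, 0, -1)]
    return [[sum(s[r][c] for s in shifted) for c in range(cols)] for r in range(rows)]
```

Four whole-array shifts followed by element-wise additions are enough to gather all nearest-neighbour data, which is the access pattern behind the grid-based numerical problems mentioned elsewhere in this description.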

### Benchmarks / Compute and data transfer performance

Performance: The DAP 510c has a peak of 140 MFLOPS, while the DAP 610c has a peak of 560 MFLOPS. The DAP 610 can achieve 40,000 MIPS for Boolean operations (1992 figures).

Data Transfer: The high speed data channel can operate at 70 Mbyte/sec. Transfer between memory and processors is 1280 Mbyte/s for the DAP 510 and 5120 Mbyte/s for the DAP 610.
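The quoted memory-to-processor rates and the Boolean operation rate can be checked with simple arithmetic. The sketch below assumes that each PE transfers one bit and performs one 1-bit operation per clock cycle; that assumption is not stated in the figures above, but it reproduces them. The function names are hypothetical.

```python
def memory_bandwidth_mbytes_per_sec(edge: int, clock_mhz: int) -> int:
    """Aggregate memory bandwidth, assuming one bit per PE per clock cycle."""
    bits_per_cycle = edge * edge          # one bit for each PE in the array
    return bits_per_cycle * clock_mhz // 8  # Mbit/s converted to Mbyte/s


def boolean_mips(edge: int, clock_mhz: int) -> int:
    """Millions of 1-bit operations per second, one per PE per clock cycle."""
    return edge * edge * clock_mhz
```

Under this assumption a 32 x 32 array at 10 MHz gives 1280 Mbyte/s and a 64 x 64 array gives 5120 Mbyte/s, matching the transfer rates above, while the 64 x 64 array delivers 40,960 million 1-bit operations per second, in line with the quoted 40,000 MIPS.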

### Operating System Software and Environment

The DAP relies on whatever operating system runs on the host (front end), typically UNIX or VMS. The internal DAP operating system that interfaces the MCU and the processor array is not normally visible to the user.

Languages: A parallel Fortran (FORTRAN-PLUS), similar to Fortran 90, is available. A full Fortran 90 and a parallel C are ultimate goals.

Programming Environment: The development tools on the front end can be used. A simulator running on Suns and VAXes is available to assist development. Extensive libraries exist for image processing, numerical calculations, signal processing, and text and search processing.

The principal programming language is FORTRAN-PLUS, an augmented Fortran that includes most of the array features proposed for Fortran 8X. APAL, an assembler language, is also available.

### Networkability/ I/O System / Integrability / Reliability / Scalability

I/O: Node processors can return data to the broadcast processor along fast data highways for collation and return to the front end. In addition, a fast bus along one edge of the processor array can drive disk drives or a high resolution graphical device.

Scalability: Two sizes are available: the DAP 510 with 32 x 32 processing elements (PEs) and the DAP 610 with 64 x 64 processing elements. Both sizes run at 10 MHz. The two machines are also available with an 8-bit co-processor per PE.

Fault Tolerance: The control structure consists of a Master Control Unit (MCU) which reads instructions from a code store and issues them to the PEs. Each PE has an associated slave which performs the same operation; errors are reported to the MCU.
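The duplicate-and-compare scheme described here can be sketched as follows. This is an illustrative model only: `run_checked` is a hypothetical name, and the `slave_faults` parameter exists purely to inject a bit error into a slave result so that the comparison has something to detect.

```python
from typing import Callable, Dict, List, Optional, Tuple


def run_checked(
    op: Callable[[int], int],
    inputs: List[int],
    slave_faults: Optional[Dict[int, int]] = None,
) -> Tuple[List[int], List[int]]:
    """Run `op` on a master and a slave for each element and compare results.

    Returns the master results and the indices where master and slave
    disagree, standing in for the errors reported to the MCU.
    """
    slave_faults = slave_faults or {}
    results: List[int] = []
    errors: List[int] = []
    for index, value in enumerate(inputs):
        master = op(value)
        # XOR with a fault mask simulates a corrupted slave (0 means no fault).
        slave = op(value) ^ slave_faults.get(index, 0)
        if master != slave:
            errors.append(index)
        results.append(master)
    return results, errors
```

With no injected faults the error list is empty; injecting a mask of 1 at index 1 makes that index appear in the error list while the master results are unchanged.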

### Notable Applications / Customers / Market Sectors

The variable length arithmetic capabilities of the DAP make it particularly well adapted to large scale signal and image processing applications.

AMT provides libraries of algorithms in subroutine form to support image and signal processing application development. A general-purpose algorithm library is also available. Major application areas include scientific and engineering computing, image processing, signal processing, defense applications, and database applications.

Initial shipments of the DAP 510 began in February 1988. Shipments of the DAP 610 began in November 1988. At year end of 1988, 60 DAP 510 and 5 DAP 610 machines were installed in the United States and Europe.

The application areas targeted are defence, scientific and engineering. Typical examples are neural networks, fluid flow, DNA sequencing, molecular modelling, signal processing and speech recognition. Because of the DAP's well-established position in the market (over ten years if the ICL time is counted), the products are generally regarded as stable and the software base is extensive.

A recent market area successfully targeted by CPP is fast text database searching. Reuters, the news agency, is a major customer of CPP, using DAP text searching servers for databases of news articles. CPP provides the hardware and software to integrate DAP text server systems into existing database systems, preserving existing user interfaces.