Processors: The PEs are custom designed by MasPar. They are RISC-like and grouped into clusters of 16 on the chips. Each cluster has the PE memories and connections to the communications network. Instructions are issued by the Array Control Unit, which is a RISC-like processor based on standard chips from Texas Instruments.
Topology: Grid connections allow communication to 8 nearest neighbors.
Operating System: Supplied with UNIX front-end.
Languages: The supported languages are ANSI-compatible C and MasPar Fortran (MPF), an in-house version of Fortran 90.
Programming Environment: MasPar has licensed a version of the Fortran conversion package VAST-2 from Pacific-Sierra Research Corporation. This product converts from scalar Fortran 77 source code to parallel MPF source. The conversion can also be done in reverse.
Performance: 1.2 GFLOPS (2.6 GIPS) for a 16384 PE machine.
Data Transfer: Nearest neighbor 18 Gbyte/sec for a 16384-PE machine, and 1300 Mbyte/sec using the global router (manufacturer's figures).
Scalability: Scales from 1024 -- 16384 processing elements.
Fault Tolerance: The manufacturer claims a mean time between failures of over 8,000 hours. No fault-tolerant features.
Price Performance: With an estimated \pounds500,000 for a 16384 processor system this gives \pounds450,000 per GFLOP (using figures from PPP).
User base: The machine is marketed as a Grand Challenge machine due to its high reliability. The DAP 610c has a lower FLOPS rating by a factor of two for a machine with four times fewer processors. The installed base is small. Typical applications are DNA sequence matching and image deblurring.
It is not clear whether the DECmpp is exactly the same product as the MP-1. It will be interesting to see what DEC does if the SIMD market takes off.

The ACU has two tasks
Programs written in normal C and Fortran are executed on the front-end machine. These programs can contain procedures written in MPL (MasPar Programming Language) or MPF. When these procedures are called, they are executed entirely inside the DPU. Executing entirely in the DPU is an advantage in the sense that the code is slightly simpler in terms of design. However, sequential code segments will probably perform poorly due to the limited processing capability of the processor inside the ACU. Depending on the amount of sequential code in the entire program, it may or may not pay off to run it entirely in the DPU.
Parallel operations on parallel (plural) data are executed in the DPU as follows. The ACU broadcasts each instruction to all PEs in the PE Array. Every PE then executes the instruction simultaneously, operating on its own copy of the plural (parallel) data.
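The broadcast model described above can be illustrated with a small simulation. This is only a sketch in Python, not MPL: the names PE, broadcast, set_x_to_index, and double_x are hypothetical, and the loop stands in for what the hardware does on all PEs at once.

```python
# Toy model of SIMD execution: one control unit, many PEs.
# All names here are illustrative and are not MasPar APIs.

class PE:
    """A processing element holding its own copy of each plural variable."""
    def __init__(self, index):
        self.index = index
        self.memory = {}  # plural variable name -> this PE's private value


def broadcast(pes, instruction):
    """ACU side: issue the same instruction to every PE in the array."""
    for pe in pes:  # sequential here; simultaneous on the real machine
        instruction(pe)


def set_x_to_index(pe):
    pe.memory["x"] = pe.index


def double_x(pe):
    pe.memory["x"] *= 2


pes = [PE(i) for i in range(16)]  # a small array for illustration

# Two plural instructions, each issued once by the control unit.
broadcast(pes, set_x_to_index)
broadcast(pes, double_x)

print([pe.memory["x"] for pe in pes][:4])  # prints [0, 2, 4, 6]
```

Each PE ends up with a different value of x even though every PE received exactly the same two instructions, which is the essence of plural data.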
Programs written using MPL are executed as above except for the following. The ACU fetches and decodes all program instructions during its execution. When an instruction that operates on singular data is encountered, the ACU simply executes the instruction locally on its own processor. When an instruction operating on plural data is decoded, it is processed as described above. The front-end processor does not execute any code in this case.
The PE Array is a 2D mesh of relatively simple processors (PEs). Our MasPar at UO consists of a 64 x 64 array of PEs, for a total of 4096 processors. Each processor is connected to all eight of its neighbors as shown in the diagram below. The connections at the edges of the mesh wrap around to form a torus-shaped network.

Diagram from MasPar System Overview and MPPE Manual, MasPar Computer Corporation
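To make the torus wraparound concrete, the following Python sketch computes the eight neighbors of a PE on the 64 x 64 array. The function name neighbors and the (row, col) coordinate convention are assumptions made for illustration only.

```python
ROWS, COLS = 64, 64  # the 64 x 64 array described in the text


def neighbors(row, col, rows=ROWS, cols=COLS):
    """Return the (row, col) coordinates of the eight neighbors of a PE.

    The modulo arithmetic wraps indices at the edges, so the mesh
    behaves as a torus.
    """
    result = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) != (0, 0):  # skip the PE itself
                result.append(((row + dr) % rows, (col + dc) % cols))
    return result


# A corner PE still has eight neighbors because of the wraparound.
print(len(neighbors(0, 0)))         # prints 8
print((63, 63) in neighbors(0, 0))  # prints True
```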
Each PE is capable of reading and writing memory and performing arithmetic operations. The PEs cannot fetch or decode instructions; they can only execute them. Each PE has 16K of RAM and forty 32-bit registers.
The 2D mesh of PEs is divided up into 4 x 4 clusters of processors. Since our MasPar here at UO has 4096 processors, we have a 16 x 16 mesh of clusters. The diagram below illustrates how clusters and PEs are related.

Diagram from MasPar System Overview and MPPE Manual, MasPar Computer Corporation
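The relationship between PEs and clusters can be sketched in a few lines of Python. This assumes that clusters are aligned 4 x 4 tiles starting at PE (0, 0); the text does not state the alignment, and the name cluster_of is hypothetical.

```python
PE_ROWS = PE_COLS = 64  # PE array size from the text
CLUSTER_SIDE = 4        # each cluster is a 4 x 4 block of PEs


def cluster_of(row, col):
    """Map a PE's (row, col) to the (row, col) of its cluster.

    Assumes clusters tile the array starting at PE (0, 0).
    """
    return (row // CLUSTER_SIDE, col // CLUSTER_SIDE)


clusters_per_side = PE_ROWS // CLUSTER_SIDE

print(clusters_per_side)   # prints 16
print(cluster_of(5, 3))    # prints (1, 0)
print(cluster_of(63, 63))  # prints (15, 15)
```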
An important point to remember is that all PEs in a cluster share a common global communications channel (a crossbar switch). This matters because bottlenecks can arise when large amounts of inter-cluster global routing are used for communication.
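A rough way to see where such a bottleneck comes from is to count how many messages each cluster has to send. The Python sketch below is a simplified model, not a description of the actual router: it assumes clusters are aligned 4 x 4 tiles, and the names cluster_of and busiest_source_cluster are hypothetical.

```python
from collections import Counter

CLUSTER_SIDE = 4  # each cluster is a 4 x 4 block of PEs


def cluster_of(row, col):
    """Map a PE's (row, col) to its cluster, assuming aligned 4 x 4 tiles."""
    return (row // CLUSTER_SIDE, col // CLUSTER_SIDE)


def busiest_source_cluster(messages):
    """Return (cluster, count) for the cluster sending the most messages.

    messages is a list of ((src_row, src_col), (dst_row, dst_col)) pairs.
    """
    counts = Counter(cluster_of(*src) for src, _dst in messages)
    return counts.most_common(1)[0]


# Every PE in cluster (0, 0) sends one message to a distant PE.
messages = [((r, c), (r + 32, c + 32)) for r in range(4) for c in range(4)]

print(busiest_source_cluster(messages))  # prints ((0, 0), 16)
```

If the shared channel can carry only one message at a time (an assumption of this model), the 16 messages in this example would have to leave the cluster one after another.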
There are three constructs for communicating between PEs in the PE Array: proc, xnet, and router.
and
PE_index_expression (the absolute index) ranges from 0 to 4095
PE_row_index (the y-coordinate of the PE) ranges from 0 to 63
PE_column_index (the x-coordinate of the PE) ranges from 0 to 63
The expressions PE_index_expression, PE_row_index, and PE_column_index must be singular expressions. This means that they may not reference parallel (plural) variables. These expressions are used to index the PE array in order to uniquely specify a single PE.
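The ranges above (0 to 4095 for the absolute index, 0 to 63 for each coordinate) are consistent with a row-major numbering of the 64 x 64 array. The Python sketch below assumes that numbering; the text does not state it explicitly, and the function names are hypothetical.

```python
ROWS, COLS = 64, 64  # array dimensions from the text


def to_absolute(row, col):
    """Convert (row, col) to an absolute PE index, assuming row-major order."""
    return row * COLS + col


def to_row_col(index):
    """Convert an absolute PE index back to (row, col), assuming row-major order."""
    return divmod(index, COLS)


print(to_absolute(0, 0))    # prints 0
print(to_absolute(63, 63))  # prints 4095
print(to_row_col(130))      # prints (2, 2)
```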
The expression, expression, is used to specify one of two things, depending on which side of the '=' the proc expression is on.
or
The xnet construct has eight forms
The expression, distance_expr, must be a singular expression. This means that it cannot reference plural (parallel) variables. This expression is used to compute the distance, measured in number of PEs, between the communicating PEs.
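The effect of a distance-based neighbor fetch can be sketched as a shift of the whole array. The Python below is an illustration, not MPL. It assumes that the eight forms correspond to the eight compass directions, that row 0 is the northern edge, and that the edges wrap around as described for the torus. The names DIRECTIONS and fetch_from are hypothetical.

```python
# Assumed (row, col) offsets for the eight directions; row 0 is taken as north.
DIRECTIONS = {
    "N": (-1, 0), "NE": (-1, 1), "E": (0, 1), "SE": (1, 1),
    "S": (1, 0), "SW": (1, -1), "W": (0, -1), "NW": (-1, -1),
}


def fetch_from(grid, direction, distance):
    """Every PE reads the value held by the PE `distance` steps away.

    `distance` is a single value shared by all PEs, mirroring the
    requirement that the distance be a singular expression.
    """
    rows, cols = len(grid), len(grid[0])
    dr, dc = DIRECTIONS[direction]
    return [
        [grid[(r + dr * distance) % rows][(c + dc * distance) % cols]
         for c in range(cols)]
        for r in range(rows)
    ]


# A 4 x 4 array where each PE holds its own absolute index.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]

print(fetch_from(grid, "W", 2)[0])  # prints [2, 3, 0, 1]
```

Because the distance is the same for every PE, the communication pattern is regular: all PEs move data the same number of steps in the same direction.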
The expression, expression, is used to specify one of two things, depending on which side of the '=' the xnet expression is on.
or
PE_index_expression (the absolute index) ranges from 0 to 4095
The expression, expression, is used to specify one of two things, depending on which side of the '=' the router expression is on.
or