In contrast to the 1970s, MIMD (multiple instruction, multiple data) computers dominated the activity in parallel computing in the early 1980s. The first of these was the Denelcor Heterogeneous Element Processor (HEP). Despite its poor cost-performance, the HEP attracted widespread attention because of its many interesting hardware features that facilitated programming. It was acquired by several institutions, including Los Alamos, Argonne National Laboratory, Ballistic Research Laboratory, and Messerschmitt in Germany. Messerschmitt was the only installation that used it for real applications; the others, however, used it extensively for research on parallel algorithms.
The HEP hardware supported both fine-grain and large-grain parallelism. Each processor had an instruction pipeline that provided parallelism at the single-instruction level. Instructions from separate processes (associated with separate user programs or tasks) were placed in hardware queues and scheduled for execution once the required operands had been fetched from memory into registers, again under hardware control. Instructions from up to 128 processes could share the instruction execution pipeline, which had eight stages; all instructions except floating-point divide took eight machine cycles to execute. Up to 16 processors could be linked to perform large-grain MIMD computations.
The HEP had an extremely efficient synchronization mechanism based on a full-empty bit associated with every word of memory. The bit was set automatically to indicate whether the word had been written since it was last read (full), and could be reset to indicate that the memory location had been read (empty). The value of the full-empty bit could be checked in one machine cycle. The HEP could be programmed in Fortran, C, and assembler. It had a UNIX environment and was front-ended by a minicomputer.
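The full-empty bit described above lends itself to producer-consumer synchronization, and a small software emulation may make the idea concrete. The following is only a sketch under stated assumptions: it assumes the usual convention that a synchronized write waits for an empty word and marks it full, while a synchronized read waits for a full word and marks it empty. The names `FullEmptyCell`, `write_when_empty`, `read_when_full`, and `run_producer_consumer` are hypothetical, and a condition variable stands in for what the HEP did in hardware in a single machine cycle.

```python
import threading


class FullEmptyCell:
    """Software emulation of one memory word with a full/empty bit.

    Hypothetical illustration only; the HEP implemented this in hardware.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False   # the emulated full/empty bit
        self._value = None   # the emulated memory word

    def write_when_empty(self, value):
        """Block until the word is empty, store the value, mark it full."""
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_when_full(self):
        """Block until the word is full, fetch the value, mark it empty."""
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            value = self._value
            self._full = False
            self._cond.notify_all()
            return value


def run_producer_consumer(n_items=5):
    """Pass n_items squares from a producer thread to a consumer thread."""
    cell = FullEmptyCell()
    received = []

    def producer():
        for i in range(n_items):
            cell.write_when_empty(i * i)

    def consumer():
        for _ in range(n_items):
            received.append(cell.read_when_full())

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Because every transfer goes through the single cell, the consumer sees each value exactly once and in the order it was produced, with no separate lock or flag variable in the user program.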
Because Los Alamos and Argonne made their HEPs available for research purposes to people who were interested in learning how to program parallel machines or who were involved in parallel algorithm research, hundreds of people became familiar with parallel computing through the Denelcor HEP [Laksh:85a].
A second computer that was important in the early 1980s, primarily because it exposed a large number of computational scientists to parallelism, was the CRAY X-MP/22, which was introduced in 1982. Its parallelism was limited to only two processors; still, it was a parallel computer. Since it was at the very high end of performance, it exposed hardcore scientific users to parallelism, although initially mostly in a negative way. There was not enough payoff in speed or cost to compensate for the effort required to parallelize a program so that it would use both processors: the maximum speedup would, of course, only be two. Typically, it was less than two, and the charging algorithms of most computer centers generated higher charges for a program when it used both processors than when it used only one. In a way, though, the CRAY X-MP multiprocessor legitimized parallel processing, albeit restricted to very large-grain parallelism on very small numbers of processors. A few years later, the IBM 3090 series had the same effect; the 3090 can have up to six vector and scalar processors in one system, with memory shared among all processors.
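The payoff argument for the two-processor X-MP can be made concrete with a small calculation. The text does not name a model; the sketch below uses Amdahl's law as one standard way to see why the speedup typically fell below two. The charging rule is purely an assumption made for illustration (a charge proportional to the number of processors times the elapsed time), not a description of any actual computer center's algorithm, and the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def relative_charge(parallel_fraction, processors):
    """Charge relative to a one-processor run.

    ASSUMPTION: the center bills processors * elapsed time, so every
    processor is charged for the whole run, including the serial part.
    """
    elapsed = 1.0 / amdahl_speedup(parallel_fraction, processors)
    return processors * elapsed
```

Under these assumptions, a program that is 90 percent parallelizable runs only about 1.82 times faster on two processors, yet costs about 1.1 times as much as the one-processor run. Only a perfectly parallel program reaches a speedup of two at the same charge.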
Another MIMD system that was influential during the early 1980s was the New York University Ultracomputer [Gottlieb:86a], together with a related system, the IBM RP3 [Brochard:92a], [Brochard:92b], [Darema:87a], [Pfister:85a]. These systems were serious attempts to design and demonstrate a shared-memory architecture that was scalable to very large numbers of processors. They featured an interconnection network between processors and memories that was intended to avoid hot spots and congestion. The fetch-and-add instruction, invented by Jacob Schwartz [Schwartz:80a], was meant to avoid some of the congestion problems in omega networks. Unfortunately, these systems took a great deal of time to construct, and it was the late 1980s before the IBM RP3 was usable. At that time, it had 64 processors, but each was so slow that it attracted comparatively little attention. The architecture is certainly still considered an interesting one, but far fewer users were exposed to these systems than to other designs that were constructed more quickly and installed where a large number of users could have at least limited access to them for experimentation. Thus, the importance of the Ultracomputer and RP3 projects lay mainly in the concepts.
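One of those concepts, fetch-and-add, is easy to illustrate in software. The operation returns the old value of a shared word and adds an increment to it as a single indivisible step, so that many processors can claim distinct values from one counter without a separate critical section. The following is a minimal sketch, not the Ultracomputer or RP3 implementation: the class `FetchAndAddCell` and the function `claim_indices` are hypothetical names, and a lock stands in for the atomicity that those machines provided in hardware.

```python
import threading


class FetchAndAddCell:
    """A shared integer supporting an emulated atomic fetch-and-add."""

    def __init__(self, value=0):
        self._lock = threading.Lock()  # stand-in for hardware atomicity
        self._value = value

    def fetch_and_add(self, increment):
        """Return the old value and add increment, as one indivisible step."""
        with self._lock:
            old = self._value
            self._value = old + increment
            return old

    @property
    def value(self):
        with self._lock:
            return self._value


def claim_indices(n_threads=8, claims_per_thread=100):
    """Let several threads claim work indices from one shared counter."""
    counter = FetchAndAddCell(0)
    claimed = [[] for _ in range(n_threads)]  # one private list per thread

    def worker(k):
        for _ in range(claims_per_thread):
            claimed[k].append(counter.fetch_and_add(1))

    threads = [threading.Thread(target=worker, args=(k,))
               for k in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    all_indices = sorted(i for per_thread in claimed for i in per_thread)
    return counter.value, all_indices
```

Every index from zero up to the final counter value is claimed by exactly one thread, whatever the interleaving, which is the property that makes the operation useful for handing out loop iterations or queue slots.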