
State of the Art

The state of the art in parallel architecture is represented by multiprocessors composed of on the order of a thousand high-end microprocessors (developed for the workstation market), each running at a clock rate of 100 MHz or more, issuing one to four instructions per cycle, and equipped with 32 megabytes of main memory. Together, these resources form systems with peak performance exceeding 100 GigaFLOPS and main memory in the tens of gigabytes. Vector architectures with up to 16 very high-speed, large, highly pipelined processors reach peak performance above 10 GigaFLOPS using the fastest available semiconductor technology. SIMD architectures employing more than 10,000 fine-grain processing elements have delivered a few GigaFLOPS of performance at modest cost.
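
The aggregate figures quoted above follow from simple multiplication of the per-processor numbers. The short Python sketch below reproduces that arithmetic. The function names are hypothetical, and it assumes (this is not stated in the text) that every issued instruction counts as one floating-point operation and that decimal units are used (1 GigaFLOPS = 1000 MFLOPS, 1 gigabyte = 1000 megabytes).

```python
def peak_gflops(processors, clock_mhz, flops_per_cycle):
    """Peak rate in GigaFLOPS, assuming every issued instruction
    is a floating-point operation (an illustrative assumption)."""
    return processors * clock_mhz * flops_per_cycle / 1000.0


def total_memory_gb(processors, mb_per_processor):
    """Aggregate main memory in gigabytes (decimal units assumed)."""
    return processors * mb_per_processor / 1000.0


if __name__ == "__main__":
    # Figures taken from the text: about 1000 processors, 100 MHz,
    # one to four instructions per cycle, 32 megabytes per processor.
    low = peak_gflops(1000, 100, 1)
    high = peak_gflops(1000, 100, 4)
    memory = total_memory_gb(1000, 32)
    print(f"peak: {low:.0f} to {high:.0f} GigaFLOPS")
    print(f"main memory: {memory:.0f} gigabytes")
```

Under these assumptions a thousand-processor machine lands between 100 and 400 GigaFLOPS of peak performance with 32 gigabytes of main memory, consistent with the ranges given above. Sustained performance on real applications would be lower than this peak.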

The most powerful machines today cost in the range of $50M, including some mass storage and peripherals. Latency management techniques applied to parallel architectures include (1) caches, cache hierarchies, and cache coherence mechanisms; (2) low-latency computing structures; (3) hardware and software prefetching methods; and (4) rapid context switching and multithreading techniques. Resource management tasks such as data partitioning and task allocation/scheduling are done almost entirely in software, much of it by the application program itself. Fine-grain parallelism is usually exposed by compile-time analysis and is used for instruction scheduling within individual processors, in execution pipelines and superscalar ALUs. Some hardware support for reducing the overhead of synchronization, data migration, and message passing has been incorporated. Generally speaking, these systems are difficult to program and optimize.
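
As an illustration of the kind of data partitioning that is typically left to the application program, the Python sketch below divides a one-dimensional index range into contiguous blocks, one per processor. The function name `block_partition` and the block-distribution scheme are illustrative assumptions, not something prescribed by the text; real applications may instead use cyclic or irregular distributions.

```python
def block_partition(n_items, n_procs):
    """Split indices 0..n_items-1 into n_procs contiguous blocks.

    Returns a list of (start, stop) half-open ranges, one per
    processor. Block sizes differ by at most one item, so the
    load is as even as a contiguous split allows.
    """
    if n_procs <= 0:
        raise ValueError("n_procs must be positive")
    base, extra = divmod(n_items, n_procs)
    ranges = []
    start = 0
    for rank in range(n_procs):
        # The first `extra` processors each take one additional item.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # Hypothetical example: 10 data items spread over 4 processors.
    for rank, (lo, hi) in enumerate(block_partition(10, 4)):
        print(f"processor {rank}: items {lo} to {hi - 1}")
```

Even this simple scheme has to be written, tuned, and kept consistent with the communication pattern by the programmer, which is one reason such systems are hard to program and optimize.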





gcf@npac.syr.edu