Can CMOS technology meet the needs of PetaFLOPS computing machines? This question was examined by projecting the capabilities of a typical 1994 processor onto devices with very small feature sizes. This exploratory exercise led to the conclusion that a year 2015 device based on 0.05 micrometer technology could be a reasonable building block for a PetaFLOPS system. Table 5.2 summarizes the results of the working group's "extrapolations" from 2007 to 2015.
The tabulated technology parameters are a simple extension of the published technology roadmap, together with the assumption that all three device generations will use a 0.9 volt supply voltage. The present heavy development focus on portable equipment ensures that low-voltage processes and low-power circuit techniques will emerge even for high-performance circuits.
Power density (watts per unit area) in the memory regions of each chip was assumed to be lower than in the processing logic regions. It was also assumed that memory power increases by a factor of only 1.4 each time the memory capacity doubles.
The most difficult aspect of device scaling is expected to be management of the on-chip interconnect. If device dimensions are to scale in accord with the minimum feature size, the number of interconnect layers must grow and the pitch between interconnect lines must shrink to achieve the improved density. It is unlikely that a simple continuation of present-day interconnect practices will accomplish the necessary improvement. At the very least, hierarchical levels of interconnect are anticipated, and it is likely that some portion of the interconnect function will be performed in an off-chip substrate.
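The memory power assumption above (a factor of 1.4 per doubling of capacity) can be turned into a small calculation. The following Python sketch is illustrative only: the function name `memory_power_ratio` is hypothetical, and extending the rule to capacity ratios that are not powers of two, by treating it as a power law, is an assumption not made in the text.

```python
import math

# From the text: memory power grows by 1.4x each time capacity doubles.
POWER_FACTOR_PER_DOUBLING = 1.4


def memory_power_ratio(capacity_ratio, factor_per_doubling=POWER_FACTOR_PER_DOUBLING):
    """Return the relative memory power after scaling capacity by capacity_ratio.

    Assumption of this sketch: the 1.4x-per-doubling rule is applied as a
    power law, so fractional numbers of doublings are allowed.
    """
    if capacity_ratio <= 0:
        raise ValueError("capacity_ratio must be positive")
    doublings = math.log2(capacity_ratio)
    return factor_per_doubling ** doublings


if __name__ == "__main__":
    for ratio in (2, 4, 8):
        print(f"capacity x{ratio}: power x{memory_power_ratio(ratio):.3f}")
```

Under this rule, an eightfold increase in capacity (three doublings) raises memory power by a factor of about 2.7 rather than 8.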
No sophisticated device architectural innovations are assumed in predicting device throughput. The assumption is that two instructions will be executed per clock period. (This, however, is a very pessimistic assumption. At these small feature sizes, the area penalty for using very elaborate node architectures is nil, and the trend will most probably be to greatly increase the number of instructions per clock period.) The only architectural matters touched on are the amount of memory per node and the number of nodes per chip. The arbitrary memory sizes of one megabyte per node and eight megabytes per node were chosen to illustrate the general nature of the memory sizing trade-off. The two issues of concern are the bandwidth of the connection between the processing node and its memory, and the device-level and system-level power dissipation.
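To see how the two-instructions-per-clock assumption translates into a node count, the sketch below computes the number of nodes needed to reach a target aggregate rate. It is a rough illustration, not the working group's method: the clock rate is a hypothetical placeholder (the projected clock rates are not reproduced in this passage), each instruction is assumed to count as one floating-point operation, and the function names are invented for the example.

```python
import math

PETAFLOPS = 10**15           # target aggregate rate, operations per second
INSTRUCTIONS_PER_CLOCK = 2   # from the text: two instructions per clock period

# Hypothetical placeholder clock rate, NOT a figure from the source.
HYPOTHETICAL_CLOCK_HZ = 10**9


def node_throughput(clock_hz, ipc=INSTRUCTIONS_PER_CLOCK):
    """Peak operations per second of one node (assumes 1 instruction = 1 operation)."""
    return clock_hz * ipc


def nodes_required(target_ops, clock_hz, ipc=INSTRUCTIONS_PER_CLOCK):
    """Smallest whole number of nodes whose combined peak rate reaches target_ops."""
    return math.ceil(target_ops / node_throughput(clock_hz, ipc))


def total_memory_megabytes(nodes, megabytes_per_node):
    """Aggregate memory for a given per-node memory size."""
    return nodes * megabytes_per_node


if __name__ == "__main__":
    nodes = nodes_required(PETAFLOPS, HYPOTHETICAL_CLOCK_HZ)
    print(f"nodes required: {nodes}")
    # The two per-node memory sizes discussed in the text.
    for mb in (1, 8):
        print(f"{mb} MB per node: {total_memory_megabytes(nodes, mb)} MB total")
```

With the placeholder clock of 1 GHz, this gives 500,000 nodes. Raising the number of instructions per clock period, as the text expects, lowers the node count in direct proportion.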
In addition to the processor chips discussed above, a PetaFLOPS machine
will require memory devices. Given the historical record, there is no
reason to assume that the density progression expressed in the 1992
roadmap will not continue through the
0.05 micrometer generation. At 0.1 micrometers, the roadmap predicted
DRAMs. Using the rule of thumb of a four times
density increase per generation leads to a prediction of
devices at 0.05 micrometers. Achieving this density
seems impossible from the perspective of the current state of the art,
but such has been the case for all past projections of electronic
device capabilities two decades into the future.
Another speculation is that it will be necessary to build SRAM parts
rather than DRAM because of subthreshold leakage concerns. If that is
the case, the density should be predicted to be one-fourth of the DRAM
value, or .
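The two rules of thumb above (a fourfold density increase per generation, and SRAM at one-fourth of the DRAM density) can be sketched as follows. Because the roadmap's density figure is not reproduced here, the calculation works in units relative to the 0.1 micrometer DRAM density. The count of two generation steps between 0.1 and 0.05 micrometers is an assumption of this sketch, and the function names are hypothetical.

```python
from fractions import Fraction

DENSITY_GAIN_PER_GENERATION = 4          # from the text: 4x density per generation
SRAM_TO_DRAM_DENSITY = Fraction(1, 4)    # from the text: SRAM at one-fourth of DRAM

# Assumption of this sketch: two generation steps separate 0.1 and 0.05 micrometers.
ASSUMED_GENERATION_STEPS = 2


def dram_density(base_density, generations, gain=DENSITY_GAIN_PER_GENERATION):
    """Scale a base DRAM density forward by a whole number of generations."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return base_density * gain ** generations


def sram_density(dram):
    """SRAM density implied by the one-fourth rule of thumb."""
    return dram * SRAM_TO_DRAM_DENSITY


if __name__ == "__main__":
    base = 1  # relative units: DRAM density at 0.1 micrometers
    dram = dram_density(base, ASSUMED_GENERATION_STEPS)
    print(f"DRAM at 0.05 micrometers: {dram}x the 0.1 micrometer density")
    print(f"SRAM at 0.05 micrometers: {sram_density(dram)}x the 0.1 micrometer density")
```

In these relative units, and under the two-step assumption, the 0.05 micrometer DRAM comes out at 16 times the 0.1 micrometer density and the SRAM alternative at 4 times.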
To be useful for PetaFLOPS applications, the memory device architectures must be revised to provide very high I/O bandwidth. Wide words will be the norm. The chips will have area array connections to support both a large number of signal lines and the distribution of power and ground to local regions on the chip.