Lattice QCD is an ideal problem for simple parallel computer architectures, and high efficiency is easy to reach. Crossing the PetaFLOPS threshold will dramatically widen the scope of numerical simulations of lattice QCD, which will become an effective phenomenological tool and a real support to experiments. Weak interaction physics will be understood in a seriously quantitative way, and it will be possible to compute scattering amplitudes with high precision. Experiments like those planned in this period (beauty and phi factories) will be able to exploit such powerful help: quantitative predictions from the microscopic theory, without approximations.
As already noted, numerical simulations of lattice QCD can reach very
high efficiency even on very simple architectures. The problem is
computationally intensive, since one always operates on complex
matrices, so a low CPU-memory bandwidth is acceptable. Since
one is simulating a virtual world and only needs to write to disk a
few averages (apart from backups and checkpoints), a powerful
I/O channel is not needed. The problem is local and homogeneous, and the
mapping to the processor architecture is straightforward. The cost-effective
mesh architecture of Class III in Table 4.3 seems
satisfactory. Further, a reasonable lattice requires around
10 terabytes of memory to match PetaFLOPS performance. Larger
problems than this would require major new algorithms such as the
multiscale renormalization group.
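
The bandwidth and locality arguments above can be made concrete with a
rough count. The sketch below is illustrative only and rests on several
assumptions that are not in the text: the complex matrices are taken to
be 3 x 3 (as for SU(3) link variables), each operand is assumed to move
between memory and CPU exactly once, reals are 8 bytes, and each
processor of the mesh is assumed to hold a hypercubic block of a
four-dimensional lattice. All function names are hypothetical.

```python
# Rough flop and memory-traffic counts; every name here is illustrative.

def complex_matmul_flops(n=3):
    """Real flops for one n x n complex matrix-matrix product."""
    cmul = 6  # one complex multiply: 4 real multiplies + 2 real adds
    cadd = 2  # one complex add: 2 real adds
    per_element = n * cmul + (n - 1) * cadd
    return n * n * per_element


def complex_matmul_bytes(n=3, bytes_per_real=8):
    """Bytes moved, assuming two inputs are read and one output written once."""
    matrices = 3               # two operands plus the result
    reals_per_matrix = n * n * 2
    return matrices * reals_per_matrix * bytes_per_real


def arithmetic_intensity(n=3, bytes_per_real=8):
    """Flops per byte of memory traffic for the product above."""
    return complex_matmul_flops(n) / complex_matmul_bytes(n, bytes_per_real)


def halo_sites_per_local_site(side, dims=4):
    """Boundary sites sent to mesh neighbours per locally owned site.

    Assumes a hypercubic local block of side**dims sites with
    nearest-neighbour coupling, so each of the 2*dims faces is exchanged.
    """
    return 2 * dims * side ** (dims - 1) / side ** dims


if __name__ == "__main__":
    print("flops per 3x3 complex product:", complex_matmul_flops())
    print("bytes per 3x3 complex product:", complex_matmul_bytes())
    print("flops per byte:", round(arithmetic_intensity(), 3))
    for side in (4, 8, 16):
        print("side", side, "halo/local:", halo_sites_per_local_site(side))
```

Under these assumptions a single matrix product costs 198 flops against
432 bytes of traffic, roughly ten times more arithmetic per byte than a
plain vector addition, and the communication per local site falls as
2*dims/side as the local block grows. Real kernels reuse operands in
registers and mix in matrix-vector products, so the figures are only a
guide to why modest bandwidth and a simple mesh can suffice.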