High-performance computing (HPC) has penetrated all aspects of science and engineering. More and more engineering and scientific problems can be modeled well enough to make simulation a viable alternative to an expensive real-world experiment. The insight gained into our world and into our bodies naturally leads to more questions, and to more complex questions, which in turn need more analysis. Scientific exploration using numerical methods thus becomes a self-amplifying process: the more we know, the more questions we find that need answers. Some questions have no hard deadline, and consequently do not push the limits of computational performance. Others, however, need an answer fast. What is tomorrow's weather forecast worth when it comes out the day after tomorrow? Questions like that are what push the HPC supplier community to deliver ever-increasing floating-point performance.
The very nature of HPC is that the solution to one set of questions or problems generates yet another set of questions. These next-step problems are generally more complex than their predecessors. Consequently, more accurate and more powerful tools are required to move ahead, which generates an insatiable thirst for compute cycles and a compulsion to achieve constantly increasing levels of performance and precision.
Recent advances in technology by the leading processor companies have made HPC capabilities available to many more individuals and organizations by combining excellent performance with the cost benefits of volume markets. The availability and affordability of these new systems have created a new class of challenges that now need to be answered. However, it has become clear that the HPC community cannot simply extrapolate the path it has been banking on for years. Just cranking up the clock frequencies of processors does not work anymore. The resulting designs not only dissipated too much energy, they also began to run into the limits of air cooling. Consequently, the semiconductor manufacturers had to find other ways to increase total available processing performance.
One of today's dreams for HPC system architects is building the world's first system capable of achieving a PetaFLOPS on LINPACK (one quadrillion floating-point operations per second solving dense systems of linear equations). This level of performance would take several hundred nodes or racks, each with a sustained speed of a few TeraFLOPS, and very good scaling.
Electricity costs range from about 5 cents per kilowatt-hour in places like the Department of Energy's Pacific Northwest National Laboratory (which is next to a surplus of nuclear power as well as plenty of hydroelectric power) to about 23 cents per kilowatt-hour at the Maui High-Performance Computing Center. As the price of hardware falls with Moore's law, the price of the energy to flip all those ever-denser bits keeps rising with inflation and the price of a barrel of oil.
If a two-socket server consumes just 300 watts and you keep it on for a year, how much does the electricity cost? A year has 8,760 hours, so the server uses about 2,628 kilowatt-hours. At 12 cents per kilowatt-hour that costs about $315 per year, just over $1 per watt per year.
Let's assume that such a server could achieve 50 GigaFLOPS on the LINPACK benchmark. It would require 20,000 such systems, with a combined draw of 6 megawatts, to deliver a sustained PetaFLOPS of performance with perfect scaling. For that "dream machine" the annual electricity costs would be roughly $6,000,000.
Nobody has yet succeeded in building a system with the precision, performance, scaling, and power consumption characteristics that meet that lofty goal. A more realistic yet still conservative estimate is an annual energy budget of around $10,000,000 and a 10-megawatt power supply to keep the system operating.
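The arithmetic behind these estimates can be reproduced with a short Python sketch. It uses the figures quoted above (a 300-watt server, 12 cents per kilowatt-hour, 50 GigaFLOPS per server, a one-PetaFLOPS target) and rests on simplifying assumptions that are not part of the original estimate: constant power draw all year, a 365-day year, a flat tariff, and no cooling or power-distribution overhead. The function and variable names are illustrative only.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored (assumption)


def annual_energy_cost(watts, dollars_per_kwh):
    """Yearly electricity cost in dollars for a load that runs continuously."""
    kwh_per_year = watts / 1000 * HOURS_PER_YEAR
    return kwh_per_year * dollars_per_kwh


def servers_needed(target_flops, flops_per_server):
    """Number of servers required under perfect scaling, rounded up."""
    return -(-target_flops // flops_per_server)  # integer ceiling division


SERVER_WATTS = 300             # two-socket server from the text
PRICE_PER_KWH = 0.12           # dollars per kilowatt-hour
SERVER_FLOPS = 50 * 10**9      # 50 GigaFLOPS on LINPACK
TARGET_FLOPS = 10**15          # one PetaFLOPS

per_server = annual_energy_cost(SERVER_WATTS, PRICE_PER_KWH)
servers = servers_needed(TARGET_FLOPS, SERVER_FLOPS)
cluster_cost = servers * per_server
cluster_mw = servers * SERVER_WATTS / 1e6

print(f"Cost per server per year: ${per_server:,.2f}")
print(f"Cost per watt per year:   ${per_server / SERVER_WATTS:.2f}")
print(f"Servers for 1 PetaFLOPS:  {servers:,}")
print(f"Cluster power draw:       {cluster_mw:.1f} MW")
print(f"Cluster cost per year:    ${cluster_cost:,.0f}")

# The same tariff applied to a 10-megawatt supply.
print(f"10 MW for one year:       ${annual_energy_cost(10e6, PRICE_PER_KWH):,.0f}")
```

Under these assumptions the per-server cost works out to a little over $1 per watt per year, and the idealized 20,000-server machine to a little over $6,000,000 per year. A 10-megawatt supply at the same tariff lands slightly above $10,000,000 per year. Real installations would add cooling and distribution losses on top of these numbers.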
Every facility has some maximum ability to supply power to the computer room, and increasing it by several megawatts is a major engineering project. The same is true – sometimes to an even larger degree – for the removal of the generated but unwanted heat.
Today it is generally accepted that supercomputing entails filling a building with racks. The issue is how many can be accommodated.
The limit of floor space is like the limit of power dissipation in that it does not simply translate into cost. The floor space may not be available at any price. Even if the floor area is available, it may not be able to bear the weight of several thousand systems. In financial centers such as Manhattan and London, which use HPC for financial modeling, suitable facilities are not only expensive but also unlikely to be available exactly where they are needed. As a result, it is important to pack as much computational performance as possible into each unit of volume.