Failure Rates
Mean Time Between Failures
There is more to reliability than just ECC protection. Mean Time Between Failures (MTBF) is a standard measure of reliability: the average operating time between one failure and the next.
Key factors affecting MTBF are:
- Hotter chips fail more often
- Bigger chips fail more often
- Higher clock rate chips fail more often
- Moving parts reduce MTBF (e.g. cooling fans)
Consider how the IBM BlueGene design addresses these: it uses smaller, slower, lower-power processors with full ECC.
Similarly, ClearSpeed's processors directly address all of these reliability issues: smaller die sizes, lower clock rates, much lower power consumption, full ECC protection and no cooling fans.
Very large, very hot, high-clock-rate GPGPUs that rely on fans have much lower MTBFs than ClearSpeed accelerators and their host systems.
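The effect of adding a moving part can be illustrated with the usual series-system approximation: if any single component failure takes the system down and each component has a constant failure rate, the failure rates (1/MTBF) add. The sketch below assumes that model; the function name and the MTBF figures are hypothetical and chosen only to show the arithmetic, not taken from any vendor's data.

```python
def system_mtbf(component_mtbfs_hours):
    """Return the MTBF of a series system, in hours.

    Assumes constant failure rates and that one component failure
    fails the whole system, so the failure rates (1 / MTBF) add.
    """
    total_failure_rate = sum(1.0 / mtbf for mtbf in component_mtbfs_hours)
    return 1.0 / total_failure_rate


# Hypothetical figures for illustration only (not measured data).
BOARD_MTBF_HOURS = 500_000
FAN_MTBF_HOURS = 50_000

fanless = system_mtbf([BOARD_MTBF_HOURS])
with_fan = system_mtbf([BOARD_MTBF_HOURS, FAN_MTBF_HOURS])

print(f"Fanless board: {fanless:,.0f} hours")
print(f"Board plus fan: {with_fan:,.0f} hours")
```

Under this model the combined MTBF is always lower than that of the least reliable component, which is why a single fan can dominate the reliability of an otherwise robust board.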
Temperature and power control
To ensure that they operate within the specified power and temperature limits, ClearSpeed's Advance e710 and Advance e720 accelerators include dynamic management of clock speed.
If either the power consumption or the temperature exceeds its specified limit, the processor clock speed is reduced to ensure reliable operation under all circumstances.
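As a rough illustration of this kind of dynamic clock management, the sketch below lowers the clock by one step whenever power or temperature is over its limit. It is not ClearSpeed's actual control algorithm: the function name, the limits, the step size and the choice to step the clock back up when both readings are within limits are all assumptions made for the example.

```python
def next_clock_mhz(clock_mhz, power_w, temp_c, *,
                   power_limit_w, temp_limit_c,
                   min_clock_mhz, max_clock_mhz, step_mhz):
    """Return the clock speed to use for the next control interval."""
    over_limit = power_w > power_limit_w or temp_c > temp_limit_c
    if over_limit:
        # Either limit exceeded: slow down, but never below the floor.
        return max(min_clock_mhz, clock_mhz - step_mhz)
    # Assumption: recover one step at a time once back within limits.
    return min(max_clock_mhz, clock_mhz + step_mhz)


# Hypothetical limits for illustration only (not product specifications).
LIMITS = dict(power_limit_w=25, temp_limit_c=70,
              min_clock_mhz=100, max_clock_mhz=250, step_mhz=25)

clock = 250
for power_w, temp_c in [(24, 65), (27, 68), (26, 72), (22, 66)]:
    clock = next_clock_mhz(clock, power_w, temp_c, **LIMITS)
    print(f"power={power_w} W, temp={temp_c} C -> clock={clock} MHz")
```

Stepping down gradually rather than jumping straight to the minimum keeps as much performance as the limits allow; a real controller would also need to account for sensor noise and hysteresis.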







