Thoughts about the Turbo Boost feature in Intel Nehalem

While sitting in a workshop meeting, my thoughts drifted away from the meeting and circled around the Turbo Boost feature of the Nehalem processor. To make it clear at the beginning: Turbo Boost is a good feature. It helps to use the actual potential of the CPU instead of the expected potential. Overclockers have shown in the past that most CPUs have more performance potential than the print on the heat spreader says. But while I think this is a good idea, it will open the door to some interesting benchmarketing tricks.
There is an interesting whitepaper about Turbo Boost on the Intel website: Intel Turbo Boost Technology in Intel Core Microarchitecture (Nehalem) Based Processors. I hope that I didn't overlook something that gives all processors the same overclocking potential, but I take my own overclocking experience into consideration here. Even within the same type there are production series that don't run stable even with a modestly increased frequency, whereas there are units that accept a vastly higher frequency without needing a pipeline to your preferred vendor of liquid nitrogen.
There are several problems with this. The first problem is the TDP definition. I wrote about this problem before in regard to another article: Just because a processor is labeled as, for example, 65 W TDP, it doesn't mean that it will take 65 W. Just to repeat: Let's assume you have three defined TDPs: 50, 65 and 80 watts. When the 50 watt TDP is assigned to a processor, it means that it dissipates 50 watts at maximum. 65 means: it wasn't good enough for the 50 watt TDP sticker, but it took under 66 W, so it's a 65 W TDP proc.
Now you have to take into consideration that the activation of Turbo Boost is based on the actual temperature, the power consumption and the current consumption. As long as the processor is within its specification limits, it overclocks itself, and it clocks down again as soon as it is outside this defined operational envelope. Let's assume that the 65 W TDP range goes from 51 W to 65 W. Now let's assume we run the benchmark with a unit that was barely in this range and not in a higher one. The factory-installed limits for temperature are reached fast, so the system clocks down to normal speed. The performance gain is small. Now let's assume that you get a 51 W unit. The system can stay much longer in a higher speed bin.
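To make the binning argument concrete, here is a minimal sketch in Python. It uses the three example TDP classes from the text (50, 65 and 80 watts). The function names and the idea of expressing "headroom" as the gap between label and measured dissipation are my own simplification for illustration, not something taken from the Intel whitepaper. The real mechanism also looks at temperature and current, which this toy model ignores.

```python
# Toy model of TDP binning, using the example bins from the text.
# Assumption: a part gets the smallest label that covers its measured
# dissipation. Real binning and Turbo Boost logic are more involved.
TDP_BINS_WATTS = [50, 65, 80]


def tdp_label(measured_watts):
    """Return the smallest TDP label (in watts) that covers the part."""
    for limit in TDP_BINS_WATTS:
        if measured_watts <= limit:
            return limit
    raise ValueError("part exceeds the largest TDP bin")


def boost_headroom(measured_watts):
    """Watts between the measured dissipation and the TDP label.

    Hypothetical metric: the larger this gap, the longer the part can
    plausibly stay in a higher speed bin before hitting its limits.
    """
    return tdp_label(measured_watts) - measured_watts


# Two parts that carry the same "65 W" sticker:
hand_selected = boost_headroom(51)  # barely missed the 50 W bin
unlucky = boost_headroom(65)        # barely made the 65 W bin
print(tdp_label(51), hand_selected)
print(tdp_label(65), unlucky)
```

Both parts are sold under the same label, but in this simplified view the first one has 14 watts of room to boost into while the second one has none.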
My expectation: The vendors with a tendency to benchmarketing will take a few hundred procs, test them for units that failed the criteria for the lower TDP by a watt or so, and use these hand-selected CPUs for their benchmarks. But that twist makes benchmarks absolutely absurd: You can't repeat them with your home system unless you are lucky enough to have such a CPU as well. As nonsensical as some benchmark loads like TPC-C are, they are repeatable (okay … there are some side conditions like having several thousand disks), and at least they do not depend on the temperature.
Even the intake air temperature of the system is important to the performance: With such mechanisms it makes a difference whether you cool down the intake air to 16 degrees Celsius or to 23 degrees Celsius. I would think it even makes a difference in performance whether you have a rack full of those systems or just a single one in a rack, as servers tend to dissipate heat not only by air, but by radiation from the server chassis, too.
Perhaps it's a good practice to do two benchmarks with Nehalem-based systems: one benchmark without Turbo Boost to get a repeatable result, and one with Turbo Boost activated. For the boosted mode it should be mandatory to publish exact environmental data and a list of the state changes of the processor over time.
To find the upside in this development: Turbo Boost has a nice side effect. Good chassis and cooling design directly translates into performance advantages. The better your system design is, the cooler your system runs, and the longer the CPU is kept in the boosted modes. There are some people who told us in the past that our Galaxy-class systems are overdesigned for mere x86 servers. Well, maybe it was the right decision with Nehalem in sight.
Furthermore, the Turbo Boost technology will introduce some interesting challenges to the design of schedulers in operating systems.
Under certain circumstances it would be a sensible choice to give up locality and to migrate processes from a core to other CPUs to allow Turbo Boost to increase the frequency while the other cores are idling. But you can't do that by default, as this performance gain comes at a price: you would increase the power usage of the system, as you can't power down the otherwise idle CPU. To think in benchmarking terms: A performance-biased scheduler would hurt in power consumption benchmarks, and a power-consumption-biased scheduler would hurt in performance benchmarks. And by the way: The schedulers of an operating system must become much more intelligent in the future. A scheduler unaware of the nature of Turbo Boost may yield different results as well, depending on its scheduling decisions. If it schedules all processes on the same CPU, the execution time of the benchmark will be longer than when it spreads the load across separate CPUs.
As you see: For the benchmark community the Turbo Boost feature will be a can of worms. I think I will do some more research on this topic in the next few days.
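The scheduler trade-off described above can be sketched with a small toy model in Python. All numbers here are assumptions for illustration: the base clock, the size of a speed bin and especially the table that maps the number of busy cores on a socket to extra bins are made up and do not describe any real Nehalem part. The sum of clock rates is only a crude stand-in for performance.

```python
# Toy model: two quad-core sockets, two runnable processes.
# All values below are hypothetical and only illustrate the trade-off.
BASE_MHZ = 2660   # assumed base clock
BIN_MHZ = 133     # assumed size of one speed bin
# Assumed boost table: busy cores on a socket -> extra speed bins.
BOOST_BINS = {1: 2, 2: 1, 3: 0, 4: 0}


def core_mhz(busy_cores):
    """Clock of each busy core on a socket with `busy_cores` busy cores."""
    if busy_cores == 0:
        return 0
    return BASE_MHZ + BOOST_BINS[busy_cores] * BIN_MHZ


def evaluate(placement):
    """Rate a placement given as a list of busy cores per socket.

    Returns (sum of clock rates over busy cores, sockets kept awake).
    """
    total_mhz = sum(n * core_mhz(n) for n in placement)
    sockets_awake = sum(1 for n in placement if n > 0)
    return total_mhz, sockets_awake


packed = evaluate([2, 0])  # keep locality, second socket can sleep
spread = evaluate([1, 1])  # give up locality, both sockets boost higher
print("packed:", packed)
print("spread:", spread)
```

In this model the spread placement wins on aggregate clock rate but keeps both sockets awake, while the packed placement is slower but lets one socket power down. Which one is "right" depends on whether the benchmark measures performance or power consumption.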