Why is Victoria Falls more important than Rock (in the short term)?

I wrote a few days ago that, in my humble opinion, the Victoria Falls processors are more important than Rock. There are several reasons for this, and I want to explain them as far as I can go without telling secrets. First of all: Rock is an important processor … it contains many technologies to keep us competitive against Itanium III or Power7 to Power10 (however those iterations of their respective CPU franchises may look at a given point in the future). Many people think we need Rock desperately to get more single-thread performance into the SPARC line. But we already have an answer there: it's APL. And to preemptively answer some comments along the lines of "Sun APL is relabeled Fujitsu gear": there is much less Fujitsu in APL than you think, there is much more Sun in APL than you think, and there is no FSC in APL (besides the label on the front bezel). SPARC64 has a good roadmap and enough performance to keep Power6 under control. The distance between top-of-the-line Power6 gear at 4.7 GHz and APL in real-life applications isn't as big as some synthetic benchmarks may suggest, and the supply of the 4.7 GHz parts used in those benchmarks is really sparse. And never forget: Sun has two powerful competitive weapons … Solaris and its people.

So: why is Victoria Falls more important than Rock? Victoria Falls adds an important capability to the Niagara 2 line: building multi-processor systems. A four-socket VF system contains 32 cores, 64 pipelines and 256 threads. Victoria Falls solves the remaining weakness of the UltraSPARC T line: the single-socketness of the Niagara-based systems.

So: what's the idea behind VF and N2? It's quite simple … the design target of this processor line is "Threads … lots of them." I know such a system isn't suitable for every task, but we ride on a wave together with others: Intel does multicore, AMD does multicore (only IBM is the last bastion of a few big cores). And we ride in front of the pack. With such market heavyweights on the same wave, it seems a safe bet to forecast more software that is better optimized for systems with many threads. We need such software to leverage all the advantages of VF, but Intel and AMD need such software, too.

By the way: I don't think we will see x86 octo- or hexa-cores anytime soon. I don't think the main market for high-end x86 will need such processors, because the applications can't use them. The applications? Games! Modern game engines have problems saturating even quad-cores because of their programming model (and most of the work in games is done on the GPU). This leads to an interesting thought experiment: in my opinion the massive R&D costs of the x86 line are subsidized by the desktop market. Is it economically reasonable to develop an x86 variant that has no big market on the desktop? I don't really think that an x86 with a server-only market would be cheaper than any other server processor based on a non-x86 architecture.

But I have strayed from the main topic of this article. Many people think that the UltraSPARC T line isn't capable of certain tasks. And they are correct and incorrect at the same time. HPC, for example: obviously you wouldn't use a T2 to simulate the crash of a car (or the in-flight disintegration of the maiden-flight 787 when they forgot some of the temporary fasteners ;) SCNR). But the UltraSPARC T2 is a very capable chip for financial HPC, which uses Monte Carlo methods quite often (and the cryptographic unit has some nice functions to speed up these tasks).
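Why does Monte Carlo fit so well? Because the simulated paths are independent, the work splits into as many tasks as you have hardware threads, and none of them ever waits on another. Here is a minimal sketch in Java of what I mean; the class name, the thread count of 64 and the toy payoff function are mine and purely illustrative, not a real pricing model and not a benchmark of any system:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy Monte-Carlo workload: many independent tasks, one per hardware thread.
public class MonteCarloThreads {
    public static void main(String[] args) throws Exception {
        final int threads = 64;           // e.g. one task per T2 hardware thread (illustrative)
        final int pathsPerTask = 1000000;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Double>> results = new ArrayList<Future<Double>>();

        for (int t = 0; t < threads; t++) {
            final long seed = t;
            results.add(pool.submit(new Callable<Double>() {
                public Double call() {
                    Random rng = new Random(seed);
                    double sum = 0.0;
                    // every task walks its own independent random paths --
                    // no locks, no shared state, nothing to wait for
                    for (int i = 0; i < pathsPerTask; i++) {
                        double x = rng.nextGaussian();
                        sum += Math.max(0.0, x);   // toy payoff, stands in for the real model
                    }
                    return sum / pathsPerTask;
                }
            }));
        }

        double mean = 0.0;
        for (Future<Double> f : results) {
            mean += f.get();               // collect the partial results
        }
        pool.shutdown();
        System.out.println("estimated mean payoff: " + mean / threads);
    }
}

The point is not the payoff math, it's the shape of the workload: many independent streams of work that keep all the pipelines busy.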
You wouldn't use it for one-at-a-time batch runs, but what happens when you have 64 of them at the same time? And then there is an additional class of applications: systems that are performance-sensitive, but by a different metric. There are applications where the time the application needs is secondary to when its execution starts and when it has finished.

Just a small thought experiment. Let's assume a virtual processor A: it has 4 cores, it's really fast, each core can execute any instruction in 1 second, but it has to wait 2 seconds for every access to memory. Let's assume a second processor B with 8 cores, where every instruction needs 3 seconds, but it never has to wait for memory, because every core can switch to a non-stalled one of its four threads. Now take a program of 100 instructions, and assume you have 10 executions of it with data reaching you at the same time. Virtual processor A needs 3000 core-seconds for this workload (each instruction effectively costs 3 seconds: 1 for execution, 2 waiting for memory). Virtual processor B needs 3000 core-seconds as well. Although A is three times as fast per core, it needs the same time. Now factor in the number of cores: processor A needs 750 seconds to execute the workload, processor B needs only 375 seconds, despite having only a third of the per-core speed (and this thought experiment is even skewed in favor of the 4-core processor). With millions of clock cycles per second this may sound like an unimportant factor. But programs are longer than just 100 instructions today ;) (the small sketch at the end of this post simply reruns this arithmetic). And these latencies are relevant: the obvious example is the web server. A not-so-obvious application is the software on trading systems for the stock market. First come, first served. And first come may be important when the next sub-prime ad-hoc reports hit the markets ;)

The clock speed of a processor is a misleading benchmark. Sometimes even the execution speed of a single thread isn't an indicator of sufficient performance. It hides the fact that performance is at least a two-dimensional game of latency and execution speed for a given workload. Okay, there are pathological workloads with a single thread doing all the work. But those workloads are exactly that: pathological. A quickly hacked database loader. Some reporting stuff (and again, one report may be slower, but what about 16, 32 or 256 reports in parallel?). And besides that: from my experience, a competitor's claim that "almost all applications are single-threaded" is simply not true.

For all these reasons, VF is more important than Rock. Rock will give us amazing capabilities and a competitive edge for the future (Rock is just the beginning, you ain't seen nothing yet), but for the near future Victoria Falls is much more interesting: we can address the single-thread performance market with current and future APL systems, while Victoria Falls opens up a completely new and huge volume server market for us.
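A small appendix to the thought experiment above: this tiny sketch just reruns its arithmetic. The class and variable names are mine, and the per-instruction and stall times are the made-up numbers from the text, not measurements of any real processor.

// Reruns the processor A vs. processor B arithmetic from the thought experiment.
public class ThroughputToyModel {
    public static void main(String[] args) {
        int instructions = 100;   // length of the toy program
        int jobs = 10;            // ten copies of it arriving at the same time

        // processor A: 4 cores, 1 s execution + 2 s memory stall = 3 s per instruction
        double coreSecondsA = jobs * instructions * (1.0 + 2.0);
        double wallClockA   = coreSecondsA / 4;

        // processor B: 8 cores, 3 s per instruction, stalls hidden by switching threads
        double coreSecondsB = jobs * instructions * 3.0;
        double wallClockB   = coreSecondsB / 8;

        System.out.println("A: " + coreSecondsA + " core-seconds, " + wallClockA + " s wall clock"); // 3000.0, 750.0
        System.out.println("B: " + coreSecondsB + " core-seconds, " + wallClockB + " s wall clock"); // 3000.0, 375.0
    }
}

Run it and you get the same figures as above: both processors burn 3000 core-seconds, but the slower, wider one finishes the whole batch in half the wall-clock time.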