Poor man's Fault Tolerance
I should make tea all day long. While waiting for the water to boil, I had an idea about what we could do with all these threads in a 4-socket Victoria Falls system: let's build a poor man's fault-tolerant system out of it. It's an unrefined idea …

Let's assume you have to control something, for example a valve in a chemical plant. It's an important valve, so you can't afford a fault (for example, a cosmic ray flipping a bit). A common method is therefore to compute the result multiple times and compare the outcomes. Two computations aren't enough: when the results differ, you can't tell which one is right, or better, which one has the highest probability of being correct. Common practice is to compute it three times and compare the results afterwards. The result with at least two of the three votes in this quorum wins.

You could implement each decision-making instance in its own LDOM. That way you could even run a different operating system patch level in each of them. And this fits the four procs of a fully configured Victoria Falls system: three procs for the computations and one proc for the comparator. To ensure that every process runs on the same core it was tested on, you could bind it to a processor. This would reduce the thread count to 64 effective processors, but you would have implemented a poor man's fault-tolerant system (poor man's because there is no hardware lockstepping, just application-level lockstepping). And, well, you have more than enough hardware threads in a Victoria Falls system …

I will think a little bit more about it in the next few days …
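To make the two-out-of-three vote a bit more concrete, here is a minimal sketch of what the comparator could look like. It is only an illustration: the function name `vote`, the `quorum` parameter and the valve commands are made up for this example, and how the three results actually travel from their LDOMs to the comparator is left out entirely.

```python
from collections import Counter

def vote(results, quorum=2):
    """Return the result that at least `quorum` replicas agree on.

    `results` holds one value per replica (hypothetically, one per LDOM).
    Returns None when no value reaches the quorum, so the caller can
    fall back to a safe state instead of trusting a single replica.
    """
    if not results:
        return None
    # most_common(1) yields the value with the highest number of votes
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None

# Two replicas agree, one was hit by a (simulated) bit flip
print(vote(["open", "open", "closed"]))  # open

# All three disagree: no quorum, no decision
print(vote([1, 2, 3]))  # None
```

Returning `None` when there is no majority is a deliberate choice in this sketch: for something like a valve, "no decision" should probably lead to a defined safe position rather than to a guess.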