The Storage Anarchist is dancing ballet

A while ago, the Storage Anarchist wrote a text about the usage of flash in the strategies of various vendors. At first I didn't know what I should write about it, and well … Adam Leventhal's blog post was a good, well-tempered first reply. But I want to add some additional perspective to this article. First of all: you have to take all the statements in the Storage Anarchist's blog with a grain of salt. A vastly larger grain than the one you need for this blog. I'm a technologist, so the amount of salt is much smaller. I have some self-respect, thus I don't do marketing ;) I try to put a technological perspective on the stuff Sun does, and I try to explain it, because some developments at Sun have components and ideas that aren't obvious and need some explanation for people who don't have the time to think about the newest ideas of Sun. I'm glad that my boss leaves me the time to dig down into this technology. But I wander from the topic.

The Storage Anarchist is an EMC marketing employee. So it's no real wonder that he likes the strategy of EMC the most. I'm fine with that. Perhaps I would write the same in his position: EMC doesn't really have a good strategy to embed flash in their arrays besides disks-out/SSDs-in, leaving the rest to the application (or rather the middleware), but they were the first to offer them. Or to be exact … they were the first to offer STEC SSDs. Well … not really an achievement … plugging a 3.5" disk into a normal storage device looks pretty obvious. And by the way: it isn't even a good integration. A December 2008 manual of the EMC DMX-4 states that a loop with a flash drive has to be configured at 2 GBit/s instead of 4 GBit/s. Furthermore, EMC suggests putting no more than 2 SSDs on a loop and dedicating a whole quadrant to them. This really looks like the consequence of putting extremely fast flash drives into an architecture that was designed to house a large number of high-rpm disk drives. You can do it, but it comes at a high price. In EMC's position I wouldn't brag that loudly about the availability of flash.

But to keep the dancing metaphor: the Storage Anarchist's article is one of the best ballet performances of this young century … he would beat even the best international ensembles with his text. He can write about Sun's flash strategy without even touching the real advantages of flash (and obviously without touching any of the disadvantages of EMC's approach to flash). Additionally, I think he is too intelligent not to be aware of the advantages of ZFS, Hybrid Storage Pools and the other resulting technologies. Thus his article is one of the finest pieces of marketing dance I have seen in recent times.

First I should explain something about this "flash doesn't belong in the storage array" point. Let's assume you have an SSD capable of 40,000 IOPS. A second is 1,000,000 microseconds, thus a single I/O operation takes 25 microseconds. The reality of storage area networks is: you have 5 hops on average from server to storage. Assume 2 microseconds per hop. That gives you 10 microseconds in total, so an I/O operation takes 35 microseconds instead of 25. 1,000,000 microseconds divided by 35 microseconds means you can execute roughly 28,571 IOPS. You just sacrificed more than 11,000 IOPS to latency alone. Over a quarter of the performance just went downhill. That's the very simple reason why we tend to say that the SSD has to be as near as possible to the CPU to get optimal performance. Of course, 28,571 IOPS is still a large amount of performance.
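If you want to play with the numbers, here is a minimal back-of-the-envelope sketch in Python. The 40,000 IOPS, the 5 hops and the 2 microseconds per hop are exactly the assumptions from the paragraph above; the 100,000 IOPS entry is a made-up faster device to show that the effect gets worse, not better:

```python
# Back-of-the-envelope: how much of an SSD's performance a SAN path eats.
# The 40,000 IOPS device, 5 hops and 2 microseconds per hop are the
# assumptions from the text; the 100,000 IOPS device is a hypothetical
# faster successor, added only to show how the gap widens.

def effective_iops(device_iops: float, hops: int, hop_latency_us: float) -> float:
    """Effective IOPS once per-hop SAN latency is added to every operation."""
    service_time_us = 1_000_000 / device_iops          # 40,000 IOPS -> 25 us per op
    total_time_us = service_time_us + hops * hop_latency_us
    return 1_000_000 / total_time_us

for native in (40_000, 100_000):
    behind_san = effective_iops(native, hops=5, hop_latency_us=2.0)
    lost = native - behind_san
    print(f"{native:>7} IOPS native -> {behind_san:7.0f} IOPS behind the SAN "
          f"({lost / native:.0%} lost to latency)")
```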
But is this really an excuse for not using a quarter of the performance, and thus not using a quarter of your money? And to look into the future: this gap gets wider as SSDs get faster. Leaving performance on the table is sometimes an unsolvable problem, because the technology simply can't do better. But in this case that isn't true. The technology to harvest this idle performance exists. It's just not in EMC's realm. It's in the realm of servers and server vendors.

And there lies the problem for companies like EMC. A problem bigger than the 11,000 missing IOPS. One of the reasons for these large storage boxes is their performance. When you increase the performance of storage in the server by intelligent augmentation, you don't need those ultra-high-performance arrays with giant caches and hundreds of PowerPC CPUs on their line cards anymore. You just need some trays to hold your rotating rust drives. JBODs … and even those don't have to be extremely speedy. Commodity technology. No big margins. A different business from the one EMC liked to do with their large storage systems. The same goes for the operating systems of such large storage boxes: with the advent of commodity CPUs with ever-increasing core counts and high clock rates, in conjunction with high-speed I/O buses like PCI Express Gen2 and the like, running a standard operating system as the storage operating system is a feasible way to go. So no real differentiator in the future. But it's in the interest of such storage vendors to tell customers that it's a good idea to use their expensive boxes with flash … they want to keep those boxes alive as long as possible. But who needs them when it's cheaper and faster to solve the problem in the server?

In the end it's the same problem we had with x86. Of course there are many good reasons for an M9000. But for the same workloads we offered an E10000 in the past, we can offer an M5000 or sometimes a T5120 today. The same is happening to storage vendors now, and from the perspective of a Sun employee it's good to see that Sun is one of the vendors bringing this challenge to them. I don't really think there are prosperous times ahead for the manufacturers of those large storage boxes (and you might speculate about the reasons why a storage vendor purchased a virtualisation company ;) ).

The problem is even worse: the large servers got faster and faster. A full-blown M9000 provides an extreme amount of computing power. There is an increase in performance. But rotating rust disk drives, and the large arrays optimized for them, won't solve performance problems anymore (besides putting more and more RAM into the boxes, but even this RAM is in the wrong place). It's important to know: bandwidth is often not the problem … it's the latency of I/O operations that ruins your day. And we can't expect any technology to solve this problem for hard drives. Rotating rust technology made some big advances in bandwidth by increasing the data density. The problem with this technology is that there are physical limits: the sound barrier, the speed of the head carrier movement, the rotational speed of the platters. Those parameters set the ceiling for the maximum IOPS of a rotating rust drive. A PC is loud enough … you don't want to add sonic booms from the head carrier ;) The next problem is the heating of the platters due to friction with the air inside the drive (not to mention the bearings). And you can't just suck the air out of the drive, as the distance between head and platter is maintained by an aerodynamic process. No air, no aerodynamics ;)
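To put a rough number on these physical limits, here is a small sketch. The seek times are assumed, typical ballpark values for drives of that era, not the datasheet of any particular product:

```python
# Rough ceiling on the random IOPS of a rotating rust drive, derived only
# from rotational latency and average seek time. The seek times below are
# assumed ballpark figures, not measurements of a specific product.

def disk_random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Approximate random IOPS from average seek plus rotational latency."""
    rotation_ms = 60_000 / rpm                 # one full rotation, 4 ms at 15,000 rpm
    avg_rotational_latency_ms = rotation_ms / 2
    service_time_ms = avg_seek_ms + avg_rotational_latency_ms
    return 1_000 / service_time_ms

print(f"15,000 rpm, ~3.5 ms seek: ~{disk_random_iops(15_000, 3.5):.0f} IOPS")
print(f" 7,200 rpm, ~8.5 ms seek: ~{disk_random_iops(7_200, 8.5):.0f} IOPS")
```

Even with optimistic assumptions you end up with a few hundred random IOPS per spindle, which is exactly why the big arrays compensate with giant caches and lots of spindles, and why even a SAN-throttled SSD plays in a completely different league.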
But I have wandered from the topic. So: I just explained why centralized SSD in the SAN isn't a good idea. It keeps a large part of your performance away from you. For SSDs I accept just one exception from this "SSD in the server" rule: in a cluster environment the write-biased sZIL has to be shared between the cluster nodes, as it's important that the surviving cluster node can replay the changes in the log after a cluster failover. But the distance between the cluster nodes and the SSD has to be as short as possible to get the best possible performance out of it.

Of course I understand there are many reasons for using SAN storage. There may even be reasons to use SSDs in SAN storage (for example using a storage array on a ship ;) ). The main reason for it is the centralization of stored data. And this is where the Hybrid Storage Pool of ZFS kicks in. It enables you to have both: centralized storage and optimal usage of your SSDs. L2ARC and sZIL let you place your SSDs as near as possible to the CPU while leaving the primary storage, in the form of rotating rust, as centralized as you want. Best of both worlds, one of the rare cases where you can keep the hen as well as the egg.

Perhaps this is the point where Sun's approach is different. As the Storage Anarchist correctly notes, there are several ways to integrate the upcoming SSDs. Most of these approaches are just the EMC disks-out/SSDs-in way: IBM just built a large storage system containing special PCI Express cards with flash on them, EMC lets you order some ultra-expensive SSDs instead of disks. The enterprise SSD market isn't in a state where competition has already ironed out most ideas. ZFS gives us a competitive advantage here. Two relatively simple features integrated into Solaris enable us to use SSDs in a sensible way as long as SSDs aren't as cheap and as big as rotating rust disk drives. That day will come, but it isn't today. Perhaps we won't need L2ARC and sZIL then, as we can fall back to our old schemes once the complete storage is flash. But I'm sure technology will give us a lower-capacity, higher-cost, higher-performance storage technology then, just as flash is today, so both will be important technologies in the future as well.
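Because I keep talking about L2ARC and sZIL: here is a deliberately oversimplified toy model in Python of what a Hybrid Storage Pool does conceptually. This is not ZFS code, the class and method names are mine, and real ZFS handles caching, eviction, transaction groups and log replay in a far more sophisticated way; it is only meant to show where the SSDs sit in the data path:

```python
# A toy model of a ZFS-style Hybrid Storage Pool: RAM cache (ARC), a read-cache
# SSD (L2ARC), a write-optimized log SSD (sZIL) and slow, centralized pool disks.
# Purely conceptual: this is not ZFS code, and the real implementation differs.

class HybridStoragePool:
    def __init__(self):
        self.arc = {}       # hot blocks in RAM, next to the CPU
        self.l2arc = {}     # warm blocks on a local read-cache SSD
        self.szil = []      # intent log on a local write-optimized SSD
        self.dirty = {}     # writes waiting for the next transaction group
        self.pool = {}      # bulk data on centralized rotating rust

    def read(self, block):
        """Serve reads from the fastest tier that holds the block."""
        if block in self.arc:
            return self.arc[block]              # RAM hit: fastest path
        if block in self.l2arc:
            data = self.l2arc[block]            # local SSD hit: still fast
        else:
            data = self.pool.get(block)         # miss: pay the full SAN/disk latency
        if data is not None:
            self.arc[block] = data              # promote into RAM for next time
        return data

    def evict_from_ram(self, block):
        """When RAM gets tight, warm blocks spill over to the read-cache SSD."""
        if block in self.arc:
            self.l2arc[block] = self.arc.pop(block)

    def sync_write(self, block, data):
        """Acknowledge synchronous writes as soon as they hit the local log SSD."""
        self.szil.append((block, data))         # durable on the SSD log, replayable
        self.arc[block] = data
        self.dirty[block] = data                # will reach the pool later

    def commit_txg(self):
        """Periodically push the accumulated writes out to the slow, central pool."""
        self.pool.update(self.dirty)
        self.dirty.clear()
        self.szil.clear()                       # log only matters until data is on disk


hsp = HybridStoragePool()
hsp.sync_write("db-block-17", b"payload")       # fast ack, the log SSD is next to the CPU
hsp.commit_txg()                                # the rotating rust catches up later
hsp.evict_from_ram("db-block-17")               # RAM pressure pushes it to the L2ARC SSD
print(hsp.read("db-block-17"))                  # served from the local SSD, not the SAN
```

The point of the sketch: reads and synchronous writes are served by SSDs sitting right next to the CPU, while the centralized rotating rust only has to absorb the periodic transaction group commits.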
And by the way: the sZIL and L2ARC of ZFS solve another problem as well, the problem of the universal speed limit: the speed of light in fibre gives you even higher latencies on longer-distance cables (roughly 5 microseconds per kilometre, each way), not to mention the additional latency of WAN protocols and WAN hardware.

Regarding upcoming products: I can't talk about upcoming products. But one thing I want to write in this blog: there are standards. 2.5" SATA/SAS disk drives look like a standard form factor for servers to me. We will use it. And at least for some servers in our portfolio there is already the intro mail introducing SSDs, thus they will be available really soon now. And for other technologies: well … don't wait for long … there will be interesting announcements soon. I really would like to write about them … but there is the official leak problem again. Well. That's a different story.

To close the article: SSD isn't as easy as the Storage Anarchist wants to explain and perhaps believes. It isn't as easy as just putting some flash drives into an otherwise unmodified storage box. You need some time to do it right. You need some new technologies and some new thinking to do it right. Sun spent this time, and now Sun has an advantage here. We may be later than EMC with our flash integration (not perfectly true, as you have been able to use Hybrid Storage Pools in OpenSolaris for a while now, and Solaris 10 10/08 already contains the sZIL feature; you were just not able to purchase the SSDs for these features at Sun) … but it's "flash done right" … for example in the Sun Storage 7000 series.