Forrester on the future of storage

A few days ago there was a raging discussion on my blog with a reader about the future of storage. It was ignited by my text about the waning importance of storage array controllers. I argued that we will see different storage concepts in the future, whereas the reader was pretty sure that storage controllers will play the same role in the future as they do today, and that it's just Sun who thinks differently (at least that was my understanding of his comments).

Yesterday I found an article on the Storage Practice blog of Sun Taiwan: in 你還需要儲存區域網路嗎 ("Do you still need a storage area network?") they cite a Forrester study about the future of storage. My colleagues in Taiwan posted a single slide of this document, and that slide is really interesting, as it supports some of my arguments regarding the future development of storage. It looks like Forrester predicts the same development towards application-specific storage mechanisms: storage will move from a SAN-centric infrastructure to an application-centric one, from a networked to a clustered, direct-attached interconnect. You can find the Forrester research document on their website: Do You Really Need A SAN Anymore?. But I have to warn you … they want $499 for this document … a hefty price for something that is pretty obvious.

That matches my prediction for the future, and some of these developments are already visible. For example: when you use Hadoop (or a follow-on development) for storing and processing your data, there is no need for high-end storage. The same goes for pNFS, which follows the same "clustered, direct-attached" approach to storage as Hadoop (the sketch below shows the basic placement idea). The Windows faction is developing pCIFS concepts, and from my perspective the Local Continuous Replication feature of MS Exchange opens up this notorious IOPS eater for similar concepts.

Just to make it clear: I'm pretty sure that large storage arrays will be part of IT for a long time. They have their advantages. But easier management isn't one of them when the storage is managed by the application. In the end you don't want to manage your storage for its own sake, you want to run an application. When your application is able to manage its own storage, centralized management loses some of its appeal.
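To make the "clustered, direct-attached" idea a bit more concrete, here is a minimal Python sketch of the placement logic such an application implements itself. This is not Hadoop's actual code; the chunk size, replica count, node names and hash-based placement are my own assumptions for illustration:

```python
import hashlib

# Toy model of application-managed storage placement: a file is split
# into fixed-size chunks, and every chunk is written to REPLICAS of the
# storage nodes. The application (like an HDFS namenode or a pNFS
# metadata server) keeps the placement map; the storage nodes never
# have to talk to each other.

CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB, like a classic HDFS block
REPLICAS = 3
NODES = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # made-up names

def place_chunk(filename: str, chunk_index: int) -> list[str]:
    """Deterministically pick REPLICAS nodes for one chunk."""
    key = f"{filename}:{chunk_index}".encode()
    start = int.from_bytes(hashlib.sha1(key).digest()[:4], "big")
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICAS)]

def placement_map(filename: str, filesize: int) -> dict[int, list[str]]:
    """Map every chunk of a file to its replica nodes."""
    chunks = (filesize + CHUNK_SIZE - 1) // CHUNK_SIZE
    return {i: place_chunk(filename, i) for i in range(chunks)}

if __name__ == "__main__":
    # A 200 MB file becomes four chunks, each on three independent nodes.
    for idx, nodes in placement_map("logs/webserver.log", 200 * 1024 * 1024).items():
        print(f"chunk {idx} -> {nodes}")
```

The important point is that the placement map lives in the application; the nodes just store bytes on their direct-attached disks.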
Furthermore, I'm pretty sure that the vendors of large storage arrays will be hit harder than the vendors of large server systems. Computing knows a vast class of problems that can't be parallelized. That's the reason why Sun, HP and IBM still sell a large number of high-end SMP or NUMA servers with hundreds of cores and extremely fast interconnects. Those servers are used for the class of problems with such a large need for communication between the compute nodes that sharding isn't feasible because of Amdahl's law.

But storage itself isn't such a problem. Delivering data to a server is a problem of the "embarrassingly parallel" class: the participating nodes don't have to communicate with each other to deliver blocks, files or objects, and it's easy to spread the work over several servers. A simple form of such a separation would be to provide storage via a block-level protocol on the network and to use RAID 10 for the separation over several servers (the first sketch below shows that the mapping is pure arithmetic). Block A on server A isn't a different block just because you read block B on server B. There is no technological need that mandates the use of large storage arrays because of performance. It's just the easier management of the storage that has some appeal. But when we get rid of the management of storage because applications manage their storage optimally for their own needs, what justifies the expense of large storage arrays?

And you have to keep in mind: the management of large storage arrays isn't really simple when you have an application that makes the nice features problematic. Take snapshotting a database: you can't just tell your storage "snapshot now". You would get a nice snapshot of your storage, but you wouldn't have a snapshot of your database, at least not necessarily a consistent one. You have to orchestrate your database with the snapshotting function of your storage by scripts (the second sketch below shows the typical pattern). We are just accustomed to this step, as we have had to do it for quite a while now. Microsoft introduced the VSS hack to control arrays from the application because people just want to press a button in the application. So in the end the management isn't done by the storage, it's done by the application.

Centralized storage isn't easy to manage. You have to configure it optimally for your application, and sometimes you even have conflicting optimization targets between your applications. It takes specialist know-how to tune storage to the needs of an application: Is RAID 1 better than RAID 5 for this application? Do I need short-stroking or not? Can I use wide stripes, or do I have to use several smaller stripes in conjunction with RAID 0? When you take all this into consideration, wouldn't it be easier to just provide storage as a bunch of blocks, files or objects and leave the management and optimization to the application (pNFS, CIFS, database, file server et al.), instead of to an artificial point of management in between that is neither really aware of the inner workings of the hard disks nor of the inner workings of the application?

Okay, this text has already gotten longer than I intended, but when I look at the single slide and at the table of contents of the document offered on the Forrester website, it looks like I'm not the only one who thinks that the storage world will look pretty different in the future (the difference: you get my opinions for free ;) ). We have to think about it, and perhaps we have to throw some perceptions into the bin.
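Here is the first sketch: a toy RAID 10 layout over several storage servers, in Python for readability. The server names and the stripe width are made up for the example; the point is that locating a block is simple arithmetic, and no server has to know what the others are doing:

```python
# Toy sketch of the "embarrassingly parallel" argument: striping with
# mirroring (RAID 10) across storage servers. Mapping a logical block
# to its servers is pure arithmetic; serving block A never requires
# talking to the server that holds block B.

STRIPE_SERVERS = [("srv1", "srv1-mirror"),
                  ("srv2", "srv2-mirror"),
                  ("srv3", "srv3-mirror")]

def locate_block(logical_block: int) -> tuple[tuple[str, str], int]:
    """Return the (primary, mirror) server pair and the local block number."""
    pair = STRIPE_SERVERS[logical_block % len(STRIPE_SERVERS)]
    local = logical_block // len(STRIPE_SERVERS)
    return pair, local

if __name__ == "__main__":
    # Consecutive logical blocks land on independent server pairs.
    for lb in range(7):
        (primary, mirror), local = locate_block(lb)
        print(f"logical block {lb}: block {local} on {primary} (mirror: {mirror})")
```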
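And the second sketch, the snapshot orchestration: quiesce the database, take the array snapshot, resume. The hot-backup statements are Oracle's; array-cli, the volume and the snapshot name are placeholders for whatever your storage vendor actually provides, not a concrete product's interface:

```python
import subprocess

# Sketch of the orchestration script the paragraph above talks about.
# The database is put into backup mode so the array snapshot is
# consistent, then released again whatever happens.

def run_sql(statement: str) -> None:
    # Hypothetical helper: hand a statement to the database, here via
    # sqlplus; a DBAPI driver would do just as well.
    subprocess.run(["sqlplus", "-S", "/ as sysdba"],
                   input=f"{statement};\n", text=True, check=True)

def snap_create(volume: str, name: str) -> None:
    # Hypothetical array CLI call -- replace with your vendor's command.
    subprocess.run(["array-cli", "snapshot", "create", volume, name],
                   check=True)

def consistent_snapshot(volume: str, name: str) -> None:
    run_sql("ALTER DATABASE BEGIN BACKUP")     # quiesce: datafiles become snapshot-safe
    try:
        snap_create(volume, name)              # the actual array snapshot
    finally:
        run_sql("ALTER DATABASE END BACKUP")   # resume normal operation

if __name__ == "__main__":
    consistent_snapshot("db_volume_01", "nightly-snapshot")
```

Notice who is in charge here: the array only executes; the knowledge of when a snapshot is safe sits in the application layer.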
PS: No … I can't speak Chinese; I'm just able to use Google Translate.