Raisins, Storage, Solaris and the X4275
Some new systems were announced a few days ago. All are Nehalem-based, and all of them have special features separating them from the competition. But that isn't the point of this article.
I find one system the most interesting: quite a while ago I speculated about using the internal storage of servers to provide a SAN without conventional storage arrays. At that time we just talked about iSCSI and about systems with 2.5" disks. Now look at the X4275: a system with twelve 3.5" disks in two rack units.
This is exactly the storage density you can get with a J4200 (and, to be exact, the maximum density you can reach in one rack when you use only front-accessible disks). So you have a rack full of servers and a rack full of storage in a single rack. You just have to make the storage in one server usable to the other servers.

And here Solaris comes into the game. We have several ways to do so: the most obvious is iSCSI. With the FC target in COMSTAR, Fibre Channel of course. But we just want to span a rack-wide network (or a few racks), so obviously iSER on a foundation of QDR InfiniBand comes to mind. Or pNFS in the near future - you would have to dedicate one or two servers as metadata servers, but that's just one rack unit at the top and one at the bottom of the rack. Or, for special applications, Hadoop and HDFS.

While sitting in my train to Fulda, another idea came to my mind. I created it a year (or so) ago in a response to an RfP. At that time iSCSI was planned for the customer, but with the integration of COMSTAR we have even more possibilities today. I'm sure this isn't unique or original, as it's too obvious:
- Take the storage of one system and create a single huge pool or several small pools out of it.
- Divide this pool into multiple emulated volumes.
- Share these volumes with your preferred block-level protocol like iSCSI, iSER or an FC target.
- The servers doing this task are called "Storage Nodes".
- Add a number of small head nodes without hard disks but with fast I/O (like PCIe 2.0).
- Access the emulated volumes on the storage nodes via an iSCSI, iSER or FC initiator.
- Create a volume on the head nodes out of these remote subvolumes.
- Share this volume with your preferred block- or file-level protocol (the latter after formatting it with a filesystem).
- These servers are called "Storage Head Nodes".
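Sketched in commands, the recipe above could look roughly like this - a hedged sketch assuming Solaris with ZFS and COMSTAR; pool layout, disk and volume names, the LU GUIDs and the IP address are invented for illustration:

```shell
# --- on a storage node: pool, emulated volume, iSCSI target (COMSTAR) ---
zpool create tank raidz2 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0
zfs create -V 64g tank/vol01                  # a 64 GB emulated volume (zvol)
svcadm enable stmf                            # the COMSTAR framework
sbdadm create-lu /dev/zvol/rdsk/tank/vol01    # register the zvol as a SCSI LU
stmfadm add-view <LU-GUID>                    # GUID as printed by sbdadm create-lu
svcadm enable -r svc:/network/iscsi/target:default
itadm create-target                           # iSCSI target port provider

# --- on a storage head node: import the remote LUs, build a pool on top ---
iscsiadm add discovery-address 192.168.1.11   # one entry per storage node
iscsiadm modify discovery --sendtargets enable
zpool create frontpool raidz c2t600144F0AAAAd0 c2t600144F0BBBBd0 \
    c2t600144F0CCCCd0                         # device names depend on the LUs
zfs set sharenfs=on frontpool                 # or share it block-level again
```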
Sounds strange for a storage system. But with the distribution of a volume over several subvolumes on different servers you get some interesting advantages: a full server can fail without problems (RAISIN = Redundant Array of Inexpensive Storage Integrating Nodes ;) ), and load-balancing technologies can kick in (for example channel bundling on Ethernet, which doesn't work on a single stream). Well … what do you get for it? Given that you use two X4170s as storage head nodes and X4275s as storage nodes, you would get:
- 20*12*1 TB = 240 TB raw in a rack - and with the coming 2 TB disks, 480 TB per rack.
- more than enough horsepower to encrypt, compress, deduplicate and scrub in the storage system. Hey ... there is even a cool marketing buzzword in it: "multilevel deduplication", as you could deduplicate on the storage nodes as well as on the storage head nodes. ;)
- 20*144 GB = 2880 GB of second-level data cache (the memory in the storage nodes)
- 2*144 GB = 288 GB of first-level data cache (the memory in the storage head nodes)
- When you need more CPU or I/O horsepower in the front end, just add a head node; when you need more storage, add more storage nodes; and when you need non-volatile caches, or really fscking huge non-volatile caches ... well ... there are some veeeery interesting products in the pipe for announcement real soon now.
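A quick sanity check of those numbers, assuming a 42U rack with two 1U head nodes, twenty 2U storage nodes and 144 GB of RAM per node:

```shell
# back-of-the-envelope math for one rack:
# 2 x X4170 head node (1U each) + 20 x X4275 storage node (2U each) = 42U
STORAGE_NODES=20
DISKS_PER_NODE=12
RAM_GB=144                                    # assumed RAM per node

echo "raw TB (1 TB disks): $((STORAGE_NODES * DISKS_PER_NODE * 1))"   # 240
echo "raw TB (2 TB disks): $((STORAGE_NODES * DISKS_PER_NODE * 2))"   # 480
echo "2nd-level cache GB:  $((STORAGE_NODES * RAM_GB))"               # 2880
echo "1st-level cache GB:  $((2 * RAM_GB))"                           # 288
```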
As you may have noticed, we didn't use any special storage hardware or software so far - just fairly standard x86 servers and Solaris x86. You can (and must) build such a system on your own. But how hard is it to set up a standard JumpStart server based on JET and write a script that puts all disks of a new server into a pool, divides it into - let's say - 64 GByte chunks via ZFS emulated volumes and shares them via your preferred protocol?

Of course Nehalems are vastly oversized for such a task, but remember that I talked about Hadoop at the beginning. You could keep a certain area of your ZFS pool for block-level storage (let's say half a terabyte), run Hadoop on your storage nodes (limited with a processor set or with resource management to a certain amount of compute power) and use it for tasks you can do in parallel without moving data around (like analysing logfiles for pageviews or certain patterns, or doing conversion jobs). So you have a distributed storage/compute system in your storage area as well.

When you are too focused on storage appliances like the Sun Storage 7000 series, it's easy to forget that general-purpose hardware and a general-purpose OS in a storage system mean you can run general-purpose software for providing data services on that very system as well. With the Sun Storage 7000 the OpenStorage idea isn't at an end … it's just the beginning … and the more you think about the possibilities, the more you will find.
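The provisioning script mentioned above really is just a few lines. A hedged sketch - the pool name, the chunk count, the boot-disk name and the disk-selection pattern are all assumptions, and the GUID parsing would need testing against the real sbdadm output on your release:

```shell
#!/bin/sh
# Sketch of a JET finish script for a fresh storage node:
# put every non-boot disk into one pool, carve it into 64 GB zvols,
# register each zvol with COMSTAR and export it.
POOL=tank
BOOTDISK=c0t0d0
DISKS=$(echo | format 2>/dev/null \
        | awk '/^ *[0-9]+\./ {print $2}' | grep -v "^$BOOTDISK$")
zpool create -f $POOL raidz2 $DISKS

i=1
while [ $i -le 64 ]; do                       # 64 chunks of 64 GB each
    VOL=$POOL/vol$(printf '%02d' $i)
    zfs create -V 64g $VOL
    # sbdadm prints the GUID that stmfadm add-view needs
    GUID=$(sbdadm create-lu /dev/zvol/rdsk/$VOL | awk '/^600/ {print $1}')
    stmfadm add-view $GUID
    i=$((i + 1))
done
```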