Over the years I did many, many presentations. Whenever I talked with customers afterwards about what they would like to see in ZFS, one feature was always mentioned: removing devices. While it was no problem, for example, to remove a member disk of a mirror, you couldn’t remove a top-level vdev; you weren’t able to remove a mirror from a stripe of mirrors. With Solaris 11.4 we finally have a feature that allows you to do exactly this. It’s really easy to use, so if I only wanted to show this feature, this would be a rather short entry. However, I would like to shed some light on the mechanism behind it.
Preparing an Example
Let’s assume we have three devices and we have created a striped pool out of them:
root@batou:/# zpool create testpool c1t2d0 c1t3d0 c1t4d0
We create some files in it:
root@batou:/# cd testpool
root@batou:/testpool# mkfile 1g test1 test2 test3 test4 test5 test6
Let’s now check the structure of the pool. For this I’m using the zdb -L command. The output is much longer than shown here.
root@batou:/testpool# zdb -L testpool
[...]
name: 'testpool'
[...]
hostname: 'batou'
vdev_children: 3
[...]
    children[0]:
        guid: 1209395020087258815
        id: 0
        type: 'disk'
        path: '/dev/dsk/c1t2d0s0'
        devid: 'id1,sd@SATA_____VBOX_HARDDISK____VB96a218f1-27200143/a'
        phys_path: '/pci@0,0/pci8086,2829@d/disk@2,0:a'
        [...]
    children[1]:
        guid: 5622741003370822611
        id: 1
        type: 'disk'
        path: '/dev/dsk/c1t3d0s0'
        devid: 'id1,sd@SATA_____VBOX_HARDDISK____VB9cc00131-8b8a0295/a'
        phys_path: '/pci@0,0/pci8086,2829@d/disk@3,0:a'
        [...]
    children[2]:
        guid: 12149574521403767327
        id: 2
        type: 'disk'
        path: '/dev/dsk/c1t4d0s0'
        devid: 'id1,sd@SATA_____VBOX_HARDDISK____VB5b29f40e-f9bc48b9/a'
        phys_path: '/pci@0,0/pci8086,2829@d/disk@4,0:a'
[...]
                           capacity     operations     bandwidth     ---- errors ----
description                used  avail   read  write   read  write   read  write  cksum
testpool                  5.85G  41.8G    745      0  80.9M      0      0      0      0
/dev/dsk/c1t2d0s0         1.95G  13.9G    244      0  26.6M      0      0      0      0
/dev/dsk/c1t3d0s0         1.95G  13.9G    255      0  27.1M      0      0      0      0
/dev/dsk/c1t4d0s0         1.95G  13.9G    245      0  27.2M      0      0      0      0
[...]
We have 6 gigabytes worth of data on three devices, thus roughly 2 gigabytes per device (1.95G each in the output above). Before you ask: I honestly don’t know why zdb -L shows no writes. I will check this. Now let’s remove one of the top-level vdevs.
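Before we do so, here is a quick look at why each disk holds about a third of the data. ZFS stripes dynamically across all top-level vdevs. The following Python sketch models this under a strong simplification: it assumes each 128 KiB block simply goes to the vdev with the most free space. The real ZFS allocator is considerably more involved, and the function name stripe is hypothetical.

```python
GIB = 1024 ** 3
BLOCK = 128 * 1024  # 128 KiB, the default ZFS recordsize


def stripe(total_bytes, vdev_sizes, block=BLOCK):
    """Return the bytes allocated on each top-level vdev.

    Simplification (assumption): every block goes to the vdev with the
    most free space; ties go to the lowest vdev index.
    """
    used = [0] * len(vdev_sizes)
    for _ in range(total_bytes // block):
        # pick the vdev with the largest remaining free space
        i = max(range(len(vdev_sizes)), key=lambda k: vdev_sizes[k] - used[k])
        used[i] += block
    return used


if __name__ == "__main__":
    disks = [16 * GIB] * 3          # three equal disks, roughly as in the example
    used = stripe(6 * GIB, disks)   # six 1 GiB files created with mkfile
    print([u / GIB for u in used])  # [2.0, 2.0, 2.0]
```

With equal disks this degenerates into plain round-robin. With unequal free space, the emptier vdev receives more blocks, which is the intuition behind dynamic striping.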
Removing the Device
Triggering the removal is really simple; you just use the remove subcommand of zpool:
root@batou:/# zpool remove testpool c1t4d0
The device you want to remove then enters the REMOVING state, as zpool status shows:
NAME          STATE     READ WRITE CKSUM
testpool      ONLINE       0     0     0
  c1t2d0      ONLINE       0     0     0
  c1t3d0      ONLINE       0     0     0
  c1t4d0      REMOVING     0     0     0
After a while, the device disappears from the pool:
NAME          STATE     READ WRITE CKSUM
testpool      ONLINE       0     0     0
  c1t2d0      ONLINE       0     0     0
  c1t3d0      ONLINE       0     0     0
If you want to remove a mirror, that is, a top-level vdev consisting of several disks, you have to use the name of the top-level vdev. Let’s assume a pool consisting of two mirrors:
NAME          STATE     READ WRITE CKSUM
testpool      ONLINE       0     0     0
  mirror-0    ONLINE       0     0     0
    c1t2d0    ONLINE       0     0     0
    c1t3d0    ONLINE       0     0     0
  mirror-1    ONLINE       0     0     0
    c1t4d0    ONLINE       0     0     0
    c1t5d0    ONLINE       0     0     0
To remove the first mirror, address it by its name, in this case mirror-0:
root@sol114s1:~# zpool remove testpool mirror-0
Behind the Curtain
So how does Oracle Solaris do this? Well, quite simply: it doesn’t really reorganize the data. The pool still has three top-level vdevs after the removal; you just don’t see the third one in the zpool status output anymore. When you check with zdb -L testpool, you will see that the third device has changed to:
    children[2]:
        guid: 14641473971126587410
        id: 2
        type: 'pseudo'
        path: '$VDEV-9DA81B2EED2E2E37'
        phys_path: 'testpool/$VDEV-9DA81B2EED2E2E37'
        removing: 1
The third device has been replaced by a virtual device of type 'pseudo'. This virtual device resides on the disks remaining in the pool. You can see this quite nicely in the capacity section of the zdb output:
                           capacity     operations     bandwidth     ---- errors ----
description                used  avail   read  write   read  write   read  write  cksum
testpool                  6.03G  25.7G  3.55K      0  3.87M      0      0      0      0
/dev/dsk/c1t2d0s0         3.02G  12.9G  1.77K      0  1.92M      0      0      0      0
/dev/dsk/c1t3d0s0         3.02G  12.9G  1.76K      0  1.91M      0      0      0      0
$VDEV-9DA81B2EED2E2E37    2.00G  13.9G     20      0  28.9K      0      0      0      0
There is still a third device with 2 GB worth of data. More interestingly, the remaining devices have taken over this data: the used column of both grew from 1.95G to 3.02G, about 1.07G each, or roughly the 2 GB shown for the virtual device. As long as the data isn’t changed, it is still addressed through this virtual device. Please note that the system isn’t simply blocking the full size of the removed vdev on the remaining disks; it only uses space for the actual data.
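A virtual device that "resides on" other disks can be made more concrete with a small model. The sketch below uses a sorted table of segments that translates an offset on the removed vdev into a (vdev, offset) pair on one of the remaining disks. This is an assumption about how such a mechanism can work in principle. The class and field names are hypothetical, and it is not the actual Solaris implementation or on-disk format.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    old_offset: int  # start offset on the removed vdev
    length: int      # length of the copied range
    new_vdev: int    # id of the remaining vdev that now holds the data
    new_offset: int  # start offset on that vdev


class PseudoVdev:
    """Hypothetical remapping table for a removed top-level vdev."""

    def __init__(self, segments):
        self.segments = sorted(segments, key=lambda s: s.old_offset)
        self.starts = [s.old_offset for s in self.segments]

    def resolve(self, offset):
        """Translate an offset on the removed vdev to (vdev, offset)."""
        i = bisect_right(self.starts, offset) - 1  # last segment starting <= offset
        if i >= 0:
            seg = self.segments[i]
            if offset < seg.old_offset + seg.length:
                return seg.new_vdev, seg.new_offset + (offset - seg.old_offset)
        raise KeyError(f"offset {offset} is not mapped")


if __name__ == "__main__":
    vdev2 = PseudoVdev([
        Segment(0, 1000, new_vdev=0, new_offset=5000),
        Segment(1000, 1000, new_vdev=1, new_offset=8000),
    ])
    print(vdev2.resolve(1500))  # (1, 8500)
```

The lookup is a binary search over segment start offsets. Only ranges that actually held data need an entry, which matches the observation that the full size of the removed disk isn't blocked.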
Let’s now delete everything in the pool with rm /testpool/* and check again:
                           capacity     operations     bandwidth     ---- errors ----
description                used  avail   read  write   read  write   read  write  cksum
testpool                   499K  31.7G    460      0  2.85M      0      0      0      0
/dev/dsk/c1t2d0s0          316K  15.9G    203      0  1.51M      0      0      0      0
/dev/dsk/c1t3d0s0          184K  15.9G    248      0  1.22M      0      0      0      0
$VDEV-9DA81B2EED2E2E37    6.50K  15.9G      9      0   119K      0      0      0      0
Space consumption dropped sharply, both on the remaining disks and on the virtual device. Let’s now recreate our data files:
root@batou:/testpool# mkfile 1g test1 test2 test3 test4 test5 test6
After this, zdb -L shows the following:
                           capacity     operations     bandwidth     ---- errors ----
description                used  avail   read  write   read  write   read  write  cksum
testpool                  6.00G  25.7G  2.54K      0   194M      0      0      0      0
/dev/dsk/c1t2d0s0         3.00G  12.9G  1.31K      0  96.7M      0      0      0      0
/dev/dsk/c1t3d0s0         3.00G  12.9G  1.22K      0  97.7M      0      0      0      0
$VDEV-9DA81B2EED2E2E37    6.50K  15.9G      9      0   119K      0      0      0      0
The virtual device isn’t used for new writes. After recreating the files, all 6 GB landed on the two remaining disks (3.00G each), while the virtual device stayed at 6.50K. However, reads of data that was stored on the removed disk are still serviced by the virtual device, and thus by proxy through the remaining disks. So over time, as you change the data in your pool, the virtual device is used less and less, and eventually not at all. Of course, when the data is static and you never change it, it won’t be migrated off the virtual device.
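This behavior is what you would expect from copy-on-write. ZFS never overwrites a block in place, so every rewrite or delete releases the old location, and new locations are only allocated on the remaining disks. The following Python toy model sketches this under heavy simplification. The class, its round-robin allocator and the remap table are hypothetical illustrations, not how Solaris actually tracks the removed device.

```python
import itertools


class ToyPool:
    """Toy copy-on-write pool with one removed (pseudo) vdev. Hypothetical."""

    def __init__(self, active_vdevs, removed_vdev):
        self.removed = removed_vdev
        # Simplification: round-robin over the remaining vdevs only,
        # so the removed vdev never receives new writes.
        self._next_vdev = itertools.cycle(active_vdevs)
        self._next_off = {v: 0 for v in active_vdevs}
        self.blocks = {}  # block name -> (vdev id, offset)
        self.remap = {}   # offset on removed vdev -> (vdev id, offset)

    def _alloc(self, size):
        vdev = next(self._next_vdev)
        off = self._next_off[vdev]
        self._next_off[vdev] += size
        return vdev, off

    def free(self, name):
        """Delete a block; if it was on the removed vdev, drop its mapping."""
        vdev, off = self.blocks.pop(name)
        if vdev == self.removed:
            self.remap.pop(off, None)

    def write(self, name, size):
        """Write or rewrite a block; copy-on-write always picks a new place."""
        if name in self.blocks:
            self.free(name)
        self.blocks[name] = self._alloc(size)

    def read(self, name):
        """Return the physical location, going through the remap if needed."""
        vdev, off = self.blocks[name]
        if vdev == self.removed:
            return self.remap[off]  # served by proxy via a remaining disk
        return vdev, off


if __name__ == "__main__":
    pool = ToyPool(active_vdevs=[0, 1], removed_vdev=2)
    # Two blocks that lived on vdev 2 before the removal; their data was
    # copied to vdevs 0 and 1 (offsets chosen so they don't collide).
    pool.blocks = {"a": (2, 0), "b": (2, 4096)}
    pool.remap = {0: (0, 1 << 30), 4096: (1, 1 << 30)}

    print(pool.read("a"))   # (0, 1073741824): read via the pseudo vdev
    pool.write("a", 4096)   # rewrite lands on a remaining disk
    print(pool.read("a"))   # (0, 0)
    pool.free("b")          # like rm: the mapping entry disappears
    print(len(pool.remap))  # 0
```

Every rewrite or delete shrinks the remap table, while untouched blocks keep their entries. This mirrors the point above: static data stays behind the virtual device.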
If you add a new device, it doesn’t replace the virtual device acting as the third device:
root@batou:~# zpool add testpool c1t4d0
You will see a pool with four devices instead:
                           capacity     operations     bandwidth     ---- errors ----
description                used  avail   read  write   read  write   read  write  cksum
testpool                  6.00G  41.6G  1.55K      0  4.07M      0      0      0      0
/dev/dsk/c1t2d0s0         3.00G  12.9G    771      0  1.64M      0      0      0      0
/dev/dsk/c1t3d0s0         3.00G  12.9G    774      0  1.62M      0      0      0      0
$VDEV-9DA81B2EED2E2E37    6.50K  15.9G      9      0   119K      0      0      0      0
/dev/dsk/c1t4d0s0         21.0K  15.9G     38      0   708K      0      0      0      0
Conclusion
After quite some time, ZFS finally has the ability to remove top-level vdevs. I think that will spare me a lot of questions in future presentations.