Creating a zpool configuration out of a bunch of F40/80 cards

In some customer situations I’m using a number of Oracle Sun Flash Accelerator F40 PCIe cards or F80 PCIe cards to create flash storage areas inside a server. For example, I had 8 F40 cards in a server by using a SPARC M10 and a PCIe Expansion Box, which enables you to connect up to 11 F40/F80 cards per expansion box. The configuration with 8 F80 cards, for example, is one I’m using only on very special occasions and for special purposes; in this case it was a self-written application of a customer needing a lot of flash storage inside the server. I won’t disclose more. On the other hand, I quite frequently size systems with two F80 cards for “separated ZIL” purposes.

Whether you use the SSDs as data storage or as a separated ZIL: when you mirror, you have to ensure that no mirror has both halves on the same card. From the system’s perspective you see four disk devices per F40/F80 card, with 100 or 200 GB capacity per disk respectively, and thus you can just add them to your zpool configuration. However, configuring the system was a little bit impractical. The problem: it’s not that easy to create a configuration that ensures that no mirror has its two halves on a single F40/F80 card. Perhaps there is an easier way, however I haven’t found it so far. It’s a little bit hard to find acceptable disk pairs when you are looking at PCI trees like /devices/pci@8000/pci@4/pci@0/pci@8/pci@0/pci@0/pci@0/pci@1/pci@0/pci@1/pciex1000,7e@0/iport@80:scsi. Well, with two cards it’s not that hard, but still not a nice job.

After doing this manually a few times, I thought that with 8 or 22 cards doing this manually is a job for people who killed baby seals, listen to Justin Bieber or did equivalently horrible things. But I haven’t committed such crimes, and this problem is nothing that a little bit of shell-fu can’t solve. You can do it in a single line of shell. Well … a kind of a single line of shell. You see the F40/F80 cards in the output of cfgadm -alv like this:

<small>unavailable  scsi-sas     n        /devices/pci@8000/pci@4/pci@0/pci@8/pci@0/pci@0/pci@0/pci@1/pci@0/pci@1/pciex1000,7e@0/iport@80:scsi<br />
c13::w9999999999993b58,0       connected    configured   unknown    Client Device: /dev/dsk/c0t9999999999993B58d0s0(sd27)</small>

In most of my example outputs I will just show a single line. The commands obviously produce a lot more output lines; the single line is just used to show what the command is doing. First we have to solve a problem: the output for each device is divided into two lines. We have to concatenate them. Piece of cake: cfgadm -alv | awk '!(NR%2){print p" "$0}{p=$0}'. This results in

<small>c13::w9999999999993b58,0       connected    configured   unknown    Client Device: /dev/dsk/c0t9999999999993B58d0s0(sd27) unavailable  disk-path    n        /devices/pci@8000/pci@4/pci@0/pci@8/pci@0/pci@0/pci@0/pci@1/pci@0/pci@1/pciex1000,7e@0/iport@80:scsi::w9999999999993b58,0</small>
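To see what the awk program does without a server full of flash cards, here is a minimal sketch that runs the same program against a few hard-coded lines. The wrapper name join_pairs and the sample strings are made up for illustration; they are not cfgadm output.

```shell
# join_pairs: glue every two consecutive input lines into one line.
# The awk program is the one from the text; only the wrapper is new.
# On odd lines it just remembers the line in p, on even lines it prints
# the remembered line followed by the current one.
join_pairs() {
    awk '!(NR%2){print p" "$0}{p=$0}'
}

# Hypothetical sample input standing in for the two-line cfgadm records.
printf '%s\n' 'first-a' 'second-a' 'first-b' 'second-b' | join_pairs
```

The sketch prints two lines, each holding one former pair. Note that this only works when the records really start on an odd line; a stray extra line in front would shift every pair by one.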

Believe me … every two lines are now joined into a single one. I’m just interested in the lines containing /dev/dsk, so I just keep them with cfgadm -alv | awk '!(NR%2){print p" "$0}{p=$0}' | grep "dev/dsk". When I count the lines I see 32 devices. Looks like I have found my 32 solid state disks:

<small>cfgadm -alv | awk '!(NR%2){print p" "$0}{p=$0}' | grep "dev/dsk" | wc -l<br />
      32<br />
</small>

Next I have to get rid of the unnecessary spaces in the output lines.

<small>$ cfgadm -alv | awk '!(NR%2){print p" "$0}{p=$0}' | grep "dev/dsk" | tr -s " "<br />
c13::w9999999999993b58,0 connected configured unknown Client Device: /dev/dsk/c0t9999999999993B58d0s0(sd27) unavailable disk-path n /devices/pci@8000/pci@4/pci@0/pci@8/pci@0/pci@0/pci@0/pci@1/pci@0/pci@1/pciex1000,7e@0/iport@80:scsi::w9999999999993b58,0<br />
</small>

There are a lot of unnecessary fields in this output. I will just keep the ones I need: the eleventh and the seventh column, containing the position in the /devices tree and the disk device:

<small>$ cfgadm -alv | awk '!(NR%2){print p" "$0}{p=$0}' | grep "dev/dsk" | tr -s " " | awk '{print $11,$7;}'<br />
/devices/pci@8000/pci@4/pci@0/pci@8/pci@0/pci@0/pci@0/pci@1/pci@0/pci@1/pciex1000,7e@0/iport@80:scsi::w9999999999993b58,0 /dev/dsk/c0t9999999999993B58d0s0(sd27)<br />
</small>
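Counting columns is a bit fragile if the layout of the line ever differs. As a hedged alternative, the following sketch finds the two fields by their prefix (/devices/ and /dev/dsk/) instead of by position. The function name pick_paths is my own; the sample line is the joined line shown earlier.

```shell
# pick_paths: for each joined line print "<physical path> <disk device>".
# The fields are located by their prefix, not by their column number.
pick_paths() {
    awk '{
        dev = ""; phys = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^\/dev\/dsk\//) dev  = $i
            if ($i ~ /^\/devices\//)  phys = $i
        }
        # skip lines that do not carry both pieces of information
        if (dev != "" && phys != "") print phys, dev
    }'
}

# Sample: the joined line from the example above.
echo 'c13::w9999999999993b58,0       connected    configured   unknown    Client Device: /dev/dsk/c0t9999999999993B58d0s0(sd27) unavailable  disk-path    n        /devices/pci@8000/pci@4/pci@0/pci@8/pci@0/pci@0/pci@0/pci@1/pci@0/pci@1/pciex1000,7e@0/iport@80:scsi::w9999999999993b58,0' | pick_paths
```

Because awk splits on runs of whitespace by default, this variant copes with the repeated spaces on its own and also drops lines without a /dev/dsk entry.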

Now I will sort them. The idea is that when you sort the devices by the string representing the position in the /devices tree, the devices on the same card directly follow each other in packs of 4. Okay, the /devices tree has done its job after sorting. Let’s get rid of it.

<small>$ cfgadm -alv | awk '!(NR%2){print p" "$0}{p=$0}' | grep "dev/dsk" | tr -s " " | awk '{print $11,$7;}' | sort -k 1 | awk '{print $2;}'<br />
/dev/dsk/c0t9999999999993B58d0s0(sd27)<br />
</small>
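Everything that follows relies on the "packs of 4" assumption, so it may be worth checking it before throwing the /devices path away. A small sketch, assuming that the part of the path in front of /iport@ identifies one card (which matches the paths shown in this article, but is an assumption for other hardware). The function name devices_per_card and the shortened sample paths are hypothetical.

```shell
# devices_per_card: read "<physical path> <disk device>" lines and print
# "<count> <path prefix>" for every distinct prefix in front of /iport@.
devices_per_card() {
    awk '{ sub(/\/iport@.*/, "", $1); n[$1]++ }
         END { for (c in n) print n[c], c }' | sort -k 2
}

# Hypothetical, shortened sample: two devices on one card, one on another.
printf '%s\n' \
    '/devices/pci@0/pciex1000,7e@0/iport@10:scsi::w1,0 /dev/dsk/c0t1d0s0' \
    '/devices/pci@0/pciex1000,7e@0/iport@20:scsi::w2,0 /dev/dsk/c0t2d0s0' \
    '/devices/pci@1/pciex1000,7e@0/iport@10:scsi::w3,0 /dev/dsk/c0t3d0s0' \
    | devices_per_card
```

Fed with the real list, every line should start with 4. A different count means a card is missing a device and the later regrouping by line number would pair the wrong disks.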

The s0 and the (sdxx) part must be cut away … I will do this in a single step: I define s0 as the delimiter and keep just the stuff before the delimiter.

<small>$ cfgadm -alv | awk '!(NR%2){print p" "$0}{p=$0}' | grep "dev/dsk" | tr -s " " | awk '{print $11,$7;}' | sort -k 1 | awk '{print $2;}' |  awk -F "s0" '{print $1}'<br />
/dev/dsk/c0t9999999999993B58d0<br />
</small>

The /dev/dsk/ is unnecessary … so just remove it as well.

<small>$ cfgadm -alv | awk '!(NR%2){print p" "$0}{p=$0}' | grep "dev/dsk" | tr -s " " | awk '{print $11,$7;}' | sort -k 1 | awk '{print $2;}' | awk -F "s0" '{print $1}' | sed 's/\/dev\/dsk\///'<br />
c0t9999999999993CA8d0<br />
c0t9999999999991B7Cd0<br />
c0t9999999999994478d0<br />
c0t99999999999934B8d0<br />
c0t9999999999992748d0<br />
c0t999999999999273Cd0<br />
c0t9999999999993B5Cd0<br />
c0t9999999999993B58d0<br />
c0t99999999999980ECd0<br />
c0t9999999999997CE0d0<br />
c0t9999999999998548d0<br />
c0t999999999999860Cd0<br />
c0t9999999999991A6Cd0<br />
c0t9999999999991C7Cd0<br />
c0t9999999999991A84d0<br />
c0t9999999999991AA4d0<br />
c0t9999999999992CE4d0<br />
c0t9999999999992398d0<br />
c0t9999999999992358d0<br />
c0t9999999999993D98d0<br />
c0t999999999999801Cd0<br />
c0t9999999999998030d0<br />
c0t9999999999998038d0<br />
c0t9999999999998034d0<br />
c0t9999999999997DE8d0<br />
c0t9999999999997DFCd0<br />
c0t9999999999993288d0<br />
c0t9999999999994500d0<br />
c0t99999999999940F4d0<br />
c0t9999999999993F38d0<br />
c0t999999999999358Cd0<br />
c0t99999999999935B8d0<br />
</small>
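The two trimming steps can also be folded into one sed call. This is just a sketch of an equivalent, not a correction; the function name strip_to_ctd is made up.

```shell
# strip_to_ctd: turn /dev/dsk/c0t...d0s0(sdNN) into c0t...d0.
# First expression removes the /dev/dsk/ prefix, the second removes
# everything from "s0(" to the end of the line.
strip_to_ctd() {
    sed -e 's|^/dev/dsk/||' -e 's|s0(.*||'
}

echo '/dev/dsk/c0t9999999999993B58d0s0(sd27)' | strip_to_ctd
```

Using | as the sed delimiter avoids the backslash-escaped slashes.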

Now we have to bring these devices into a different sequence to form mirror pairs. As already stated: the assumption is that, due to the sort we did before, each group of four lines represents the devices that are placed on the same card. So when both mirror halves are 4 lines apart in the list above, they are on different cards. The following construct splits the list into four files by line number modulo 4, so each file holds exactly one device per card, and then concatenates the files. In order to process all 32 devices, I’m using it like this:

<small>$ cfgadm -alv | awk '!(NR%2){print p" "$0}{p=$0}' | grep "dev/dsk" | tr -s " " | awk '{print $11,$7;}' | sort -k 1 | awk '{print $2;}' | awk -F "s0" '{print $1}' | sed 's/\/dev\/dsk\///' | tee >(awk 'NR%4==0' > 0.out) >(awk 'NR%4==1' > 1.out) >(awk 'NR%4==2' > 2.out ) >(awk 'NR%4==3 ' > 3.out) >> /dev/null ; cat 0.out 1.out 2.out 3.out<br />
c0t99999999999934B8d0<br />
c0t9999999999993B58d0<br />
c0t999999999999860Cd0<br />
c0t9999999999991AA4d0<br />
c0t9999999999993D98d0<br />
c0t9999999999998034d0<br />
c0t9999999999994500d0<br />
c0t99999999999935B8d0<br />
c0t9999999999993CA8d0<br />
c0t9999999999992748d0<br />
c0t99999999999980ECd0<br />
c0t9999999999991A6Cd0<br />
c0t9999999999992CE4d0<br />
c0t999999999999801Cd0<br />
c0t9999999999997DE8d0<br />
c0t99999999999940F4d0<br />
c0t9999999999991B7Cd0<br />
c0t999999999999273Cd0<br />
c0t9999999999997CE0d0<br />
c0t9999999999991C7Cd0<br />
c0t9999999999992398d0<br />
c0t9999999999998030d0<br />
c0t9999999999997DFCd0<br />
c0t9999999999993F38d0<br />
c0t9999999999994478d0<br />
c0t9999999999993B5Cd0<br />
c0t9999999999998548d0<br />
c0t9999999999991A84d0<br />
c0t9999999999992358d0<br />
c0t9999999999998038d0<br />
c0t9999999999993288d0<br />
c0t999999999999358Cd0<br />
</small>
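The same reordering can be done inside a single awk process, without temporary files. That may be the safer variant: as far as I know, bash does not wait for the >( ) processes to finish before it runs the command after the semicolon, so the cat could in principle read a file that is not completely written yet. A minimal sketch; the function name regroup_by_position and the sample names a1 … b4 are hypothetical.

```shell
# regroup_by_position: input is sorted in packs of N lines (N = $1,
# default 4). Output lists all lines with NR % N == 0 first, then
# NR % N == 1, and so on, i.e. the same order as
# cat 0.out 1.out 2.out 3.out for N = 4.
regroup_by_position() {
    awk -v n="${1:-4}" '
        { line[NR] = $0 }
        END {
            for (r = 0; r < n; r++)
                for (i = 1; i <= NR; i++)
                    if (i % n == r) print line[i]
        }'
}

# Two hypothetical cards "a" and "b" with four devices each.
printf '%s\n' a1 a2 a3 a4 b1 b2 b3 b4 | regroup_by_position 4
```

For the sample this prints a4, b4, a1, b1, a2, b2, a3, b3, one per line, so each consecutive pair of lines comes from two different cards.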

Okay, let’s do a quick check if everything went right:

<small>cfgadm -alv | awk '!(NR%2){print p" "$0}{p=$0}' | grep "dev/dsk" | tr -s " " | awk '{print $11,$7;}' | sort -k 1 | awk '{print $2;}' | awk -F "s0" '{print $1}' | sed 's/\/dev\/dsk\///' | tee >(awk 'NR%4==0' > 0.out) >(awk 'NR%4==1' > 1.out) >(awk 'NR%4==2' > 2.out ) >(awk 'NR%4==3 ' > 3.out) >> /dev/null ; cat 0.out 1.out 2.out 3.out | xargs -I {} grep {} list.devices | awk '{print $1;}' | awk 'ORS=NR%2?RS:RS RS' | sed 's#/devices/pci@8000/pci@4/pci@0/pci@8/pci@0/pci@0/pci@0/pci@1/pci@0#..#'<br />
../pci@0/pciex1000,7e@0/iport@80:scsi::w99999999999934b8,0<br />
../pci@1/pciex1000,7e@0/iport@80:scsi::w9999999999993b58,0<br />
<br />
../pci@10/pci@0/pci@0/pciex1000,7e@0/iport@80:scsi::w999999999999860c,0<br />
../pci@10/pci@0/pci@1/pciex1000,7e@0/iport@80:scsi::w9999999999991aa4,0<br />
<br />
../pci@10/pci@0/pci@10/pciex1000,7e@0/iport@80:scsi::w9999999999993d98,0<br />
../pci@11/pci@0/pci@0/pciex1000,7e@0/iport@80:scsi::w9999999999998034,0<br />
<br />
../pci@11/pci@0/pci@1/pciex1000,7e@0/iport@80:scsi::w9999999999994500,0<br />
../pci@8/pciex1000,7e@0/iport@80:scsi::w99999999999935b8,0<br />
<br />
../pci@0/pciex1000,7e@0/iport@10:scsi::w9999999999993ca8,0<br />
../pci@1/pciex1000,7e@0/iport@10:scsi::w9999999999992748,0<br />
<br />
../pci@10/pci@0/pci@0/pciex1000,7e@0/iport@10:scsi::w99999999999980ec,0<br />
../pci@10/pci@0/pci@1/pciex1000,7e@0/iport@10:scsi::w9999999999991a6c,0<br />
<br />
../pci@10/pci@0/pci@10/pciex1000,7e@0/iport@10:scsi::w9999999999992ce4,0<br />
../pci@11/pci@0/pci@0/pciex1000,7e@0/iport@10:scsi::w999999999999801c,0<br />
<br />
../pci@11/pci@0/pci@1/pciex1000,7e@0/iport@10:scsi::w9999999999997de8,0<br />
../pci@8/pciex1000,7e@0/iport@10:scsi::w99999999999940f4,0<br />
<br />
../pci@0/pciex1000,7e@0/iport@20:scsi::w9999999999991b7c,0<br />
../pci@1/pciex1000,7e@0/iport@20:scsi::w999999999999273c,0<br />
<br />
../pci@10/pci@0/pci@0/pciex1000,7e@0/iport@20:scsi::w9999999999997ce0,0<br />
../pci@10/pci@0/pci@1/pciex1000,7e@0/iport@20:scsi::w9999999999991c7c,0<br />
<br />
../pci@10/pci@0/pci@10/pciex1000,7e@0/iport@20:scsi::w9999999999992398,0<br />
../pci@11/pci@0/pci@0/pciex1000,7e@0/iport@20:scsi::w9999999999998030,0<br />
<br />
../pci@11/pci@0/pci@1/pciex1000,7e@0/iport@20:scsi::w9999999999997dfc,0<br />
../pci@8/pciex1000,7e@0/iport@20:scsi::w9999999999993f38,0<br />
<br />
../pci@0/pciex1000,7e@0/iport@40:scsi::w9999999999994478,0<br />
../pci@1/pciex1000,7e@0/iport@40:scsi::w9999999999993b5c,0<br />
<br />
../pci@10/pci@0/pci@0/pciex1000,7e@0/iport@40:scsi::w9999999999998548,0<br />
../pci@10/pci@0/pci@1/pciex1000,7e@0/iport@40:scsi::w9999999999991a84,0<br />
<br />
../pci@10/pci@0/pci@10/pciex1000,7e@0/iport@40:scsi::w9999999999992358,0<br />
../pci@11/pci@0/pci@0/pciex1000,7e@0/iport@40:scsi::w9999999999998038,0<br />
<br />
../pci@11/pci@0/pci@1/pciex1000,7e@0/iport@40:scsi::w9999999999993288,0<br />
../pci@8/pciex1000,7e@0/iport@40:scsi::w999999999999358c,0<br />
</small>

Looks correct. No mirror pair has its devices on the same card. Before you ask: I’ve generated the file list.devices beforehand with cfgadm -alv | awk '!(NR%2){print p" "$0}{p=$0}' | grep "dev/dsk" | tr -s " " | awk '{print $11,$7;}' > list.devices. Second check: none of the devices has been used twice in the configuration.

<small>cfgadm -alv| awk '!(NR%2){print p" "$0}{p=$0}' | grep "dev/dsk" | tr -s " " | awk '{print $11,$7;}' | sort -k 1 | awk '{print $2;}' | awk -F "s0" '{print $1}' | sed 's/\/dev\/dsk\///' | tee >(awk 'NR%4==0' > 0.out) >(awk 'NR%4==1' > 1.out) >(awk 'NR%4==2' > 2.out ) >(awk 'NR%4==3 ' > 3.out) >> /dev/null ; cat 0.out 1.out 2.out 3.out  | sort | uniq -c | grep -v "1 " | wc -l<br />
       0</small>
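With 32 devices the first check is still doable by eye, but both checks can be left to the shell. The following is a sketch under the same assumption as before, namely that the path in front of /iport@ identifies a card. The function names check_pairs and find_dupes and the shortened sample paths are my own inventions.

```shell
# check_pairs: stdin = physical paths in mirror order, two lines per
# mirror, no blank lines in between. Prints every pair whose halves share
# the prefix in front of /iport@ and returns non-zero if one was found.
check_pairs() {
    awk '{
        card = $1
        sub(/\/iport@.*/, "", card)
        if (NR % 2 == 0 && card == prev) {
            print "same card: " first " " $1
            bad = 1
        }
        prev = card; first = $1
    }
    END { exit bad + 0 }'
}

# find_dupes: print every line that occurs more than once.
find_dupes() {
    sort | uniq -d
}

# Hypothetical sample: one mirror whose halves sit on two different cards.
printf '%s\n' \
    '/devices/pci@0/pciex1000,7e@0/iport@80:scsi::w1,0' \
    '/devices/pci@1/pciex1000,7e@0/iport@80:scsi::w2,0' \
    | check_pairs && echo "pairs ok"
```

An empty output of find_dupes on the device list says the same as the 0 from the uniq -c variant above.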

Okay … now I have to transform this into a zpool configuration, or at least into an easily cut n’ pasteable part of a zpool configuration. The steps before ensure that the two devices in each consecutive pair of lines (lines 1 and 2, lines 3 and 4, and so on) are not on the same card. So transforming it into a zpool configuration is quite simple.

<small>$ cfgadm -alv | awk '!(NR%2){print p" "$0}{p=$0}' | grep "dev/dsk" | tr -s " " | awk '{print $11,$7;}' | sort -k 1 | awk '{print $2;}' | awk -F "s0" '{print $1}' | sed 's/\/dev\/dsk\///' | tee >(awk 'NR%4==0' > 0.out) >(awk 'NR%4==1' > 1.out) >(awk 'NR%4==2' > 2.out ) >(awk 'NR%4==3 ' > 3.out) >> /dev/null ; cat 0.out 1.out 2.out 3.out | awk '{if ((NR % 2) == 1) printf("mirror "); print; }' | tr -s "\n" " "<br />
mirror c0t99999999999934B8d0 c0t9999999999993B58d0 mirror c0t999999999999860Cd0 c0t9999999999991AA4d0 mirror c0t9999999999993D98d0 c0t9999999999998034d0 mirror c0t9999999999994500d0 c0t99999999999935B8d0 mirror c0t9999999999993CA8d0 c0t9999999999992748d0 mirror c0t99999999999980ECd0 c0t9999999999991A6Cd0 mirror c0t9999999999992CE4d0 c0t999999999999801Cd0 mirror c0t9999999999997DE8d0 c0t99999999999940F4d0 mirror c0t9999999999991B7Cd0 c0t999999999999273Cd0 mirror c0t9999999999997CE0d0 c0t9999999999991C7Cd0 mirror c0t9999999999992398d0 c0t9999999999998030d0 mirror c0t9999999999997DFCd0 c0t9999999999993F38d0 mirror c0t9999999999994478d0 c0t9999999999993B5Cd0 mirror c0t9999999999998548d0 c0t9999999999991A84d0 mirror c0t9999999999992358d0 c0t9999999999998038d0 mirror c0t9999999999993288d0 c0t999999999999358Cd0<br />
</small>
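If you prefer to look at the complete command before anything touches the disks, the mirror list can be wrapped into a dry run that only prints the zpool command. A minimal sketch: the function name mirror_spec, the pool name flashpool and the device names in the sample are placeholders, not values from my system.

```shell
# mirror_spec: turn a device list (two consecutive lines = one mirror)
# into "mirror A B mirror C D ..." on a single line.
mirror_spec() {
    awk 'NR % 2 { printf "%smirror %s", sep, $0; sep = " "; next }
         { printf " %s", $0 }
         END { print "" }'
}

# Hypothetical device names; in practice feed in the regrouped list.
spec=$(printf '%s\n' c0tAd0 c0tBd0 c0tCd0 c0tDd0 | mirror_spec)

# Dry run: only print the command instead of executing it.
echo "zpool create flashpool $spec"
```

The sketch prints zpool create flashpool mirror c0tAd0 c0tBd0 mirror c0tCd0 c0tDd0. For a separated log the same list would go behind zpool add with the pool name and the log keyword instead.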

Now you can use it as you want, for creating the zpool itself or for the separated log of a zpool. There is one thing this script can’t do: when you have multiple PCIe Expansion Boxes with F40/F80 cards, the configuration generated by the shell-fu doesn’t ensure that the two halves of a mirror are in different PCIe Expansion Boxes. But hey, there is some stuff I have to keep for myself ;)