df considered problematic
I could summarize this article in one short sentence: Don’t use df.
But I want to explain this in a little more detail. It’s not that df -k delivers wrong data. But df is based on a number of assumptions that are wrong in conjunction with data services like deduplication or compression. So you have to interpret the data differently.
The more interesting question at this point is: “Do your scripts interpret the numbers correctly?” I know there are countless scripts at customer sites that monitor disk space using df and that may need a rethink when used with modern filesystems like ZFS. So: What’s the problem with df, and what are the alternatives?
Magical harddisk enlargement
Let’s just assume you have a ZFS system with deduplication:
I’ve created a ZFS pool that uses files as its backing store, just to demonstrate the effect. At first we have an empty ZFS filesystem.
Now we copy a file into our newly created filesystem. I’ve chosen the wireshark binary. You can use any other file for this demonstration. It just should have a significant size. Showing the problem with small files is a little more tedious, as you have to create more files.
We’ve copied a single file into the pool. Our disk is 219136 KByte in size. The file takes roughly 2201 KByte. We still have 216860 KByte available. Looks consistent. Okay … let’s copy the file wireshark a second time into the filesystem.
Deduplication works as designed. The second file didn’t take roughly 2.2 MBytes, it took just 6 KByte. The data was deduplicated by ZFS. But look at the used column: 4380 Kilobytes used. Obviously the amount of data is counted twice. This is correct too, because from a filesystem perspective there are two files of 2.2 MBytes each and not one. If you moved the files onto separate pools or systems, you would have two times the 2.2 MBytes. But now it’s getting interesting. Look at the second column, the “kbytes” column: This is the total size of the filesystem (free and used space). Normally the number of kbytes in a filesystem doesn’t change. But we now have a bigger filesystem after copying a file into it. Its size increased by 2176 Kilobytes. Interestingly, this is exactly the size of the file deduplicated by ZFS.
But that’s not unreasonable either. Let’s assume we have a disk of three blocks. I write a single block to it. I have two blocks left. Now I write the same block to the disk again. With a normal filesystem you would use another block. Thus you would have one block left and two blocks on disk. A deduplicating filesystem has two blocks on disk as well, but it still has two free blocks. The only way to make sense of this situation is to increase the size of the filesystem, thus showing the filesystem with a total size of four blocks.
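The three-block example can be replayed in a few lines of Python. This is only a toy model of the accounting, not of how ZFS works internally; the function name df_view and the block label "A" are made up for illustration.

```python
def df_view(physical_blocks, writes):
    """Return (total, used, avail) the way a df-style tool would see them.

    physical_blocks: real number of blocks on the disk
    writes: list of block contents written, duplicates allowed
    """
    used = len(writes)                   # logical blocks, duplicates counted
    allocated = len(set(writes))         # physical blocks after deduplication
    avail = physical_blocks - allocated  # blocks still free on the disk
    total = used + avail                 # the only consistent "size" to report
    return total, used, avail


print(df_view(3, ["A"]))       # (3, 1, 2)
print(df_view(3, ["A", "A"]))  # (4, 2, 2)
```

After the second write the reported total grows from three to four blocks, although the disk itself never changed.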
That has a certain impact on the monitoring of the system. Using the kbytes column isn’t reasonable, as it’s a moving target, and used doesn’t factor in deduplication. Nevertheless there is a way to get more reasonable data about the space consumption in your pool. It’s zpool list:
This tool correctly shows the size, the allocated space and the free space.
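For monitoring scripts, the same numbers can be read programmatically. The following is a minimal sketch, assuming a zpool implementation that supports the -H (no headers, tab-separated), -p (exact numbers) and -o (column selection) options of zpool list, as OpenZFS does; older releases may lack -p. The function names and the sample line are hypothetical.

```python
import subprocess


def parse_zpool_line(line):
    """Parse one tab-separated line: name, size, allocated, free (in bytes)."""
    name, size, allocated, free = line.rstrip("\n").split("\t")
    size, allocated, free = int(size), int(allocated), int(free)
    # Fill level from the pool perspective: physically allocated vs. real size.
    percent_used = 100.0 * allocated / size if size else 0.0
    return {"name": name, "size": size, "allocated": allocated,
            "free": free, "percent_used": percent_used}


def pool_usage():
    """Query all pools. Assumes zpool list accepts -H, -p and -o."""
    result = subprocess.run(
        ["zpool", "list", "-H", "-p", "-o", "name,size,allocated,free"],
        check=True, capture_output=True, text=True)
    return [parse_zpool_line(l) for l in result.stdout.splitlines() if l.strip()]


# Made-up sample line, not taken from a real system:
sample = parse_zpool_line("tank\t1000\t250\t750")
print(sample["percent_used"])  # 25.0
```

The percentage is computed from size and allocated rather than taken from a capacity column, so it does not depend on how a particular zpool version formats that column.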
Misguiding percentages
I haven’t talked about the capacity column so far. I will now demonstrate how looking at the capacity column gives you a totally wrong impression. I wrote a small script that copies a file again and again with a number appended to the filename. Of course this is the classic use case for deduplication. Let’s look at the pool after a hundred copies:
When you look at the capacity column you would assume that your pool is 51% filled. But we just copied the same file again and again. Was there a problem with the deduplication?
No, of course not. When you look at the avail column you will see that despite copying a 2.2 MByte file a hundred times, we used just 319 Kilobytes instead of roughly 220 MBytes. But from the perspective of the filesystem it’s filled with 222297 KBytes worth of data.
But: Instead of a 219136 KByte pool we now have a 438912 KByte pool. So we have neither an almost empty disk, as the block allocation by ZFS would suggest, nor a full disk, as a short calculation of the original size minus the size of the files copied into the filesystem would suggest (or, to be exact, a disk with a negative number of available blocks, as 219136 KBytes minus 222297 KBytes is -3161 KBytes). Empty would be totally wrong from the filesystem perspective. So adding used to avail to get the size of the disk (and calculating the capacity column from there) is a reasonable design choice. BTW: I’m sure you are already aware of the explanation for the 51% used capacity.
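For readers who want to check the percentages, here is a small sketch using only the numbers quoted above. The variable names are made up, and this is one way to read the 51%, not a reproduction of what df does internally.

```python
used_kb = 222297            # "used" column after 100 copies
df_total_kb = 438912        # "kbytes" column after 100 copies
original_total_kb = 219136  # "kbytes" column of the fresh pool

# Share of the (grown) total that df reports as used.
share_of_df_total = 100.0 * used_kb / df_total_kb
# The same data measured against the size the pool started with.
share_of_original = 100.0 * used_kb / original_total_kb

print(f"{share_of_df_total:.1f}")  # 50.6
print(f"{share_of_original:.1f}")  # 101.4
```

The logical data is roughly as large as the original pool, while almost all of the physical space is still free. Used and available are therefore about equal, and used divided by their sum lands near one half.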
Again, zpool list is a much better tool to get some insight into your system.
The 100 files took just 2.62 M after deduplication. The size of the pool is still the real one and the capacity calculation is more reasonable. Now let’s push this situation to the extreme. I’ve restarted the script to generate 1000 copies of the wireshark file.
Well … 2401486 Kilobytes in a pool that was 219136 KBytes initially. Due to the deduplication we’ve used just 11807 Kilobytes for this amount of data. An additional 11 MByte in a pool of 246 MBytes in size gives you a capacity usage of 93% instead of 51% due to the way these numbers are calculated. Again, zpool list gives you much more reasonable numbers.
Digging in the source
But why does the system behave this way? The reason is in the source code of df.c: There is a function called adjust_total_blocks starting at line 1224. The comment describes the basic problem:
When you look at line 1271 you will see how this total value is calculated:
If you want to look into those properties, you can query them manually:
So the total size is 209080832 plus 2459780608 = 2668861440 bytes, or 2606310 KBytes. Now remember the last df -k output:
Now look at line 1441 of df.c:
Let’s compute this manually: 2459780608/2668861440*100+0.5 results in 92.6%. XCU and POSIX.2 mandate rounding to the next integer: 93%. So you see, the output of df is perfectly correct. As I stated in the beginning, there are some assumptions in df, and one of them is that the value used refers to the physically used capacity on the media. But with deduplication this is a wrong assumption, as the physically used capacity may be less than the capacity shown by the filesystem (or let’s say: the logically used capacity).
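The arithmetic of this section can be re-checked with a few lines of Python. The variable names are assumptions: the text only gives the two byte values, and the first one is taken here to be the available space. The df.c code itself is not reproduced, so the sketch stops at the raw value of the expression.

```python
avail_bytes = 209080832   # first value quoted above (assumed: available space)
used_bytes = 2459780608   # second value quoted above (logically used space)

total_bytes = avail_bytes + used_bytes
total_kb = total_bytes // 1024

# The expression computed manually in the text.
expr = used_bytes / total_bytes * 100 + 0.5

print(total_bytes)  # 2668861440
print(total_kb)     # 2606310
print(expr)
```

The last line prints a value between 92.6 and 92.7, which is the 92.6 mentioned in the text before it is turned into the integer shown in the capacity column.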
Conclusion
I hope I was able to give you an example of why you shouldn’t use df with ZFS. Or, when you use it with ZFS: Know the informative value of these numbers. But in most cases, zpool list gives you the numbers that you really want.
I know - some people would say “Modify df!”. But can you really do this? df looks at the data from the filesystem perspective. But where should you count a block that is referenced by multiple filesystems in a single pool? Difficult question. The most reasonable answer is: In every one of the filesystems. A flattened (reduplicated) version of any deduplicated filesystem would really use the amount of data designated by ZFS_PROP_USED. The same is valid for deduplication in a single filesystem. The latent size of a filesystem is the flattened, reduplicated size, not the deduplicated one. From my point of view the numbers presented by df are perfectly correct within the design goals of df, but they are incorrect in the way many people use df.
zpool list looks from the pool perspective and doesn’t have all the problems posed by the filesystem perspective. It just looks at allocated blocks when it tells you the allocated space.
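The difference between the two perspectives can be sketched with a toy model in which each filesystem is just a list of block identifiers. All names here (fs_a, fs_b, b1 and so on) are made up; real ZFS accounting is far more involved.

```python
def logical_used(fs_blocks):
    """Blocks a filesystem would need if it were flattened (reduplicated)."""
    return len(fs_blocks)


def pool_allocated(filesystems):
    """Unique blocks actually stored in the pool, shared across filesystems."""
    unique = set()
    for blocks in filesystems.values():
        unique.update(blocks)
    return len(unique)


filesystems = {
    "fs_a": ["b1", "b2", "b3"],
    "fs_b": ["b1", "b2", "b3"],  # same content as fs_a
}

per_fs = {name: logical_used(blocks) for name, blocks in filesystems.items()}
print(per_fs)                       # {'fs_a': 3, 'fs_b': 3}
print(pool_allocated(filesystems))  # 3
```

Each filesystem reports three blocks, because that is what it would occupy on its own, while the pool has only three blocks allocated in total.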
By the way, you could say this is an absolutely unrealistic use case. Really? Let’s say you store Windows desktop images on your fileserver for use with your favourite virtualisation tool. You have a thousand desktops and all desktops are relatively similar (all use Windows 7, for example). Then you have a vast amount of duplicates, which would be deduplicated by ZFS. It’s pretty much the same as with the wireshark file, just with much larger files :)