ZFS Deduplication features in PSARC
There are two really interesting features for Solaris in the PSARC process. The first is PSARC case “2009/571 - ZFS Deduplication Properties”. The description of the properties gives some interesting insight into the feature:
- the mechanism is based on checksums to find duplicate blocks
- selectable checksum mechanism
- when using a strong checksum mechanism like sha256, you can switch off the verification step of deduplication, which checks whether a duplicate really is a duplicate and not just a hash collision
- there is a configurable number of block references that may share a single physical block before ZFS keeps an additional copy. This protects you from losing, say, 1000 files because the one block that serves as the "master block" for the deduplication is lost. The default setting keeps a second copy of the data once more than 100 blocks have been deduplicated onto a single block.

Another interesting PSARC case is "2009/557 - ZFS send dedup". It describes the changes to the stream generated by
zfs send
to support deduplication within the stream itself: each unique block has to be transmitted only once. Interesting point: you can use stream deduplication to reduce transfer time without enabling deduplication on disk, and when the chosen checksum mechanism is cryptographically strong enough, the feature reuses the checksums already computed for the data on disk.
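To make the property descriptions above more concrete, here is a minimal Python sketch of the ideas involved: a checksum-keyed dedup table, an optional byte-level verify step against hash collisions, and a second physical copy once a block is referenced more than 100 times. The class and names are my own illustration, not the actual ZFS dedup table (DDT) implementation:

```python
import hashlib

class DedupTable:
    """Toy in-memory dedup table keyed by block checksum (not the real ZFS DDT)."""

    DITTO_THRESHOLD = 100  # references before a second physical copy is kept

    def __init__(self, verify=False):
        self.verify = verify   # byte-compare on checksum match, like a "verify" property
        self.blocks = {}       # checksum -> stored block data
        self.refcounts = {}    # checksum -> number of references to that block
        self.ditto = {}        # checksum -> extra safety copy of heavily shared blocks

    def write(self, data):
        key = hashlib.sha256(data).hexdigest()
        if key in self.blocks:
            # With a weak checksum, two different blocks could collide;
            # verification guards against that by comparing the actual bytes.
            if self.verify and self.blocks[key] != data:
                raise ValueError("hash collision: blocks differ despite equal checksum")
            self.refcounts[key] += 1   # deduplicated: no new copy stored
            # Past the threshold, keep a second copy so one lost block
            # cannot take hundreds of files with it.
            if self.refcounts[key] > self.DITTO_THRESHOLD and key not in self.ditto:
                self.ditto[key] = data
        else:
            self.blocks[key] = data
            self.refcounts[key] = 1
        return key

table = DedupTable(verify=True)
a = table.write(b"hello world")
b = table.write(b"hello world")   # same content -> same entry, refcount bumped
assert a == b and table.refcounts[a] == 2 and len(table.blocks) == 1
```

Writing the same 4 KB of data a thousand times would store it physically only twice here: once up front, and once more as the safety copy after the 100th reference.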
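The stream deduplication can be sketched the same way: transmit each block in full the first time its checksum appears, and send only a short reference for every later occurrence. This is a toy model of what a dedup-aware send stream could look like, not the actual stream format:

```python
import hashlib

def dedup_stream(blocks):
    """Emit a deduplicated send stream: full data the first time a block's
    checksum is seen, a short back-reference thereafter."""
    seen = set()
    stream = []
    for block in blocks:
        key = hashlib.sha256(block).hexdigest()
        if key in seen:
            stream.append(("ref", key))          # block already sent once
        else:
            seen.add(key)
            stream.append(("data", key, block))  # first occurrence: send the bytes
    return stream

blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096]
stream = dedup_stream(blocks)
# only two full data records are transmitted; the repeat becomes a reference
assert [record[0] for record in stream] == ["data", "data", "ref"]
```

Note that the receiver only needs to remember which checksums it has already seen to reassemble the original sequence, which is why reusing the checksums already stored on disk saves work on the sending side.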