Less known Solaris Features: SamFS - Part 3: The jargon of SamFS

As usual, this technology has its own jargon, so I will start by defining the most important terms.

The circle of life

Before defining the jargon, it's important to understand that every file under the control of SamFS follows a certain lifecycle. You create or modify it, and the system archives it. After a certain time without access, the system removes it from expensive storage, provided it has copies on cheaper storage. When you access it again, it is fetched from the cheaper storage and delivered to you. When you delete it, it has to be removed from all your media. This cycle repeats until the file is deleted.

Policies

Although every file is subject to the described cycle, the exact life doesn't have to be the same for every file. SamFS uses the concept of policies to describe how it should handle a file: how many copies SamFS should make of a file, and on which media. The most difficult task in configuring SamFS is finding the most adequate policy. You need experience for that, but it's something you can easily learn on the job.

Archiving

Okay, the first step is archiving. Let's assume you've created a file. The data gets stored in the SamFS filesystem. But you've defined a policy that you want two copies on tape media. The process that does this job is called the archiver, and the procedure itself is called archiving. Archiving copies your files to the desired media. The metadata of each file is augmented with the positions of its copies. SamFS can create up to 4 copies of a file. Important to know: SamFS doesn't wait to archive until it needs space on the cache media. It archives files with the next run of the archiver (for example, every 5 minutes).
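To make the archiving step a bit more tangible, here is a minimal toy model in Python. It is not SamFS code or configuration: the names `FileEntry`, `archiver_run`, and the rule that copy n goes to the n-th medium are assumptions made only for this sketch. The only facts taken from the text are the limit of 4 copies and the idea that copy positions are recorded in the file's metadata.

```python
from dataclasses import dataclass, field

MAX_COPIES = 4  # the text states that SamFS can create up to 4 copies


@dataclass
class FileEntry:
    name: str
    copies_wanted: int  # what the (hypothetical) policy asks for
    # (medium label, position) of every archive copy, kept as metadata
    copies: list = field(default_factory=list)


def archiver_run(files, media):
    """One archiver pass: write missing copies and record their positions."""
    for f in files:
        wanted = min(f.copies_wanted, MAX_COPIES)
        while len(f.copies) < wanted:
            # Assumption for this sketch: copy n goes to the n-th medium.
            medium = media[len(f.copies) % len(media)]
            position = len(medium["blocks"])
            medium["blocks"].append(f.name)
            f.copies.append((medium["label"], position))


tapes = [{"label": "T1", "blocks": []}, {"label": "T2", "blocks": []}]
report = FileEntry("report.dat", copies_wanted=2)
archiver_run([report], tapes)
```

After the pass, `report.copies` holds one position per tape, while the file itself is still untouched in the filesystem.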

Releasing

Okay, let's assume your filesystem is 90% full. You need some space to work. Without SamFS you would move the data around manually. SamFS works similarly and differently at the same time. The archiver has already copied your data to other places, so releasing is the process of deleting the data from your filesystem. But it doesn't delete all of it: it keeps a stub in the filesystem. The metadata (filename, ACL, ownership, permissions, and the start of the file) stays on disk. Thus you won't see a difference. You can walk around in your directories and you will see all your files. The difference: the data itself isn't in the filesystem anymore, so it doesn't consume space there.
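The stub idea can be sketched in a few lines of Python. This is only an illustration: the dictionary layout, the `release` function, and the stub size of 16 bytes are assumptions made for this sketch, not SamFS internals.

```python
STUB_BYTES = 16  # hypothetical stub size, chosen only for this sketch


def release(entry):
    """Drop the file data from the cache but keep metadata and a stub."""
    if not entry["copies"]:
        # Without an archive copy the data would be lost for good.
        raise ValueError("refusing to release a file without an archive copy")
    entry["data"] = entry["data"][:STUB_BYTES]  # keep only the start of the file
    entry["online"] = False
    return entry


entry = {
    "name": "report.dat",
    "size": 1000,                # metadata: still reported after the release
    "data": b"x" * 1000,
    "copies": [("T1", 0)],       # position of the archive copy
    "online": True,
}
release(entry)
```

A directory listing built from `name` and `size` looks the same before and after the call; only the space consumed by `data` shrinks.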

Staging

Okay, after a long time (the file was already released) you want to access the data. You go into the filesystem and open the file. SamFS intercepts this call and automatically fetches the data from the archive media. This process is called staging. In the meantime, reads from this file are blocked, so the process accessing the data blocks, too. SamFS uses information from the metadata to find the media.
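As a rough sketch of what happens on such an access, the following Python function stages a released file before returning its data. The function name `read_file` and the data layout are assumptions for illustration; in the real system the interception happens inside the filesystem, not in application code.

```python
def read_file(entry, media):
    """Return the file's data, staging it from archive media first if needed."""
    if not entry["online"]:
        # The metadata tells us where the archive copy lives.
        label, position = entry["copies"][0]
        # The caller simply waits here until the data is back in the cache.
        entry["data"] = media[label][position]
        entry["online"] = True
    return entry["data"]


media = {"T1": [b"hello world"]}
released = {"copies": [("T1", 0)], "online": False, "data": b"he"}
content = read_file(released, media)
```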

Recycling

Okay, the end of the lifetime of a file is its deletion. That's easy for disks. But you can't delete a single file from tape in an efficient manner. Thus SamFS uses a different method: the data on the tape is just marked as invalid, and the stub gets deleted. But the data stays on tape. After a while, more and more data may get deleted from a tape. This may end in a Swiss cheese where only a small fraction of the tape holds current data. This would be a waste of tape, and access gets slower and slower. Recycling solves this with a single trick. The residual active data gets a special marker. When the archiver runs the next time, this data gets archived again. Now there is no active data left on the tape. You can erase it by writing a new label to it, and you can use it for new data again. This process is called recycling.
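The recycling trick can be sketched as follows. This is a toy model, not the SamFS recycler: the function name, the block layout, and the 50% threshold are assumptions made for this example, and the re-archiving is done immediately instead of waiting for the next archiver run.

```python
def recycle(tape, target, threshold=0.5):
    """Recycle `tape` if its share of still-valid blocks is below `threshold`.

    Live blocks are re-archived onto `target`, then the tape is emptied,
    which stands in for writing a new label to it.
    """
    if not tape["blocks"]:
        return False  # nothing on the tape, nothing to recycle
    valid = [b for b in tape["blocks"] if b["valid"]]
    if len(valid) / len(tape["blocks"]) >= threshold:
        return False  # still mostly live data, leave the tape alone
    for block in valid:
        target["blocks"].append({"name": block["name"], "valid": True})
    tape["blocks"] = []  # "relabel": the tape is ready for new data
    return True


old_tape = {"blocks": [
    {"name": "a", "valid": False},
    {"name": "b", "valid": False},
    {"name": "c", "valid": True},
    {"name": "d", "valid": False},
]}
new_tape = {"blocks": []}
recycled = recycle(old_tape, new_tape)
```

In this run only `c` is still valid, so it moves to `new_tape` and `old_tape` ends up empty.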

The circle of life

Okay, with this jargon we can draw a picture of these processes.


Once a file is newly written or updated, it gets archived. Depending on the combination of policies, usage, and the caching strategy, it may get released and staged again and again. And at the end, the tape with the data will be recycled.
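The cycle can also be written down as a small state machine. The state and event names below are invented for this sketch and do not correspond to SamFS commands or flags.

```python
# Toy lifecycle model; state and event names are invented for this sketch.
TRANSITIONS = {
    ("new", "archive"): "archived",       # the archiver wrote the copies
    ("archived", "release"): "released",  # data dropped from the disk cache
    ("released", "stage"): "archived",    # data fetched back from archive media
    ("archived", "modify"): "new",        # changed content must be archived again
}


def run(events, state="new"):
    """Apply a sequence of events and return the final state of the file."""
    for event in events:
        if event == "delete":
            return "deleted"  # deletion ends the cycle from any state
        state = TRANSITIONS[(state, event)]
    return state


final_state = run(["archive", "release", "stage", "release", "stage"])
```

An event that makes no sense in the current state, such as releasing a file that was never archived, raises a `KeyError` in this model.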

Watermarks

Watermarks are an additional but very important concept in SamFS. The cache is much smaller than the filesystem. Nevertheless you have to provide space for new and updated data. So SamFS implements two important watermarks: when the cache fills to the high watermark, the system automatically starts to release the least recently used files that have a minimum number of copies on archive media. This process stops when the low watermark is reached. Thus you can ensure that you have at least a certain amount of free capacity to store new or updated data in the filesystem.
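A minimal sketch of the watermark logic might look like this in Python. The function name, the data layout, and the default values of 80% and 60% are assumptions chosen for the example; the text only states that releasing starts at the high watermark, prefers the least recently used files with enough archive copies, and stops at the low watermark.

```python
def release_to_low_watermark(files, capacity, high_pct=80, low_pct=60, min_copies=1):
    """Release least recently used files once the high watermark is reached.

    Returns the names of the released files, oldest access first.
    """
    used = sum(f["size"] for f in files if f["online"])
    if used * 100 < high_pct * capacity:
        return []  # below the high watermark, nothing to do
    candidates = sorted(
        (f for f in files if f["online"] and f["copies"] >= min_copies),
        key=lambda f: f["last_access"],
    )
    released = []
    for f in candidates:
        if used * 100 <= low_pct * capacity:
            break  # low watermark reached
        f["online"] = False
        used -= f["size"]
        released.append(f["name"])
    return released


cache = [
    {"name": "a", "size": 30, "last_access": 1, "copies": 2, "online": True},
    {"name": "b", "size": 30, "last_access": 2, "copies": 2, "online": True},
    {"name": "c", "size": 30, "last_access": 3, "copies": 0, "online": True},
]
released_names = release_to_low_watermark(cache, capacity=100)
```

Here the cache is 90% full, so the oldest archived file `a` is released, which brings usage down to the low watermark. File `c` would never be picked, because it has no archive copy yet.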

The SamFS filesystem: Archive media

When used in conjunction with the archiver/stager/releaser construct, the SamFS filesystem itself isn't much more than a cache for all the data you store in it. It's not the size of the SamFS filesystem that determines the size of your file system; the amount of archive media is the limit. For example: with a 1 GB disk cache and 10 petabyte of T10000 tapes, you can store up to 5 petabyte of data. Why 5 petabyte? Well, it's a best practice to store two copies of every file on your system, just in case a tape gets lost or damaged. Archive media can be of different nature:

The media doesn't even have to be within reach of an autoloader. SamFS knows the concept of offline archive media, for example tapes in a safe. When you try to access data on offline media, the accessing process blocks and the admin is notified to move their a.. and put the tape into a drive.
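The capacity example from this section (10 petabyte of tape, two copies of every file) boils down to a single division. As a tiny sketch, with a function name made up for illustration and ignoring any overhead on the media:

```python
def usable_capacity(archive_media, copies_per_file):
    """Amount of distinct data that fits when every file is stored N times."""
    return archive_media / copies_per_file


# 10 PB of tape with two copies per file leaves room for 5 PB of data.
usable_pb = usable_capacity(10, 2)
```

The size of the disk cache does not appear in this calculation at all, which is exactly the point of the example.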