Playing with ZFS on Linux

ZFS is one of the most advanced local filesystems you could store your data on. For more details feel free to check the Wikipedia page: https://en.wikipedia.org/wiki/ZFS

An online search for “install ZFS on centos 7” got me a few results … I selected this one: https://linuxhint.com/install-zfs-centos7/ which seems to be good enough

For Gentoo things are a bit easier … simply run emerge -uavDN zfs and magic will happen

Once installed (it usually builds a kernel module), you’ll probably want to load that module, so remember to modprobe zfs
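
If you want to be sure the module actually made it into the running kernel, a quick check along these lines should do it:

modprobe zfs
lsmod | grep zfs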

Now that we’re good to go, here are a few commands to “play” with 🙂

1. Create a mirrored storage pool (equivalent to raid1)

zpool create mypool1 mirror /dev/vg/disk1 /dev/vg/disk2
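
To double-check that the pool came up with the layout you expect, zpool status and zpool list will tell you:

zpool status mypool1
zpool list mypool1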

2. Create a striped mirrored pool (equivalent to raid10)

zpool create mypool1 mirror /dev/vg/disk1 /dev/vg/disk2 mirror /dev/vg/disk3 /dev/vg/disk4

3. Create a raidz1 pool (equivalent to raid5)

zpool create mypool1 raidz1 /dev/vg/disk1 /dev/vg/disk2 /dev/vg/disk3 /dev/vg/disk4

4. Create a striped raidz1 pool (equivalent to raid50)

zpool create mypool1 raidz1 /dev/vg/disk1 /dev/vg/disk2 /dev/vg/disk3 /dev/vg/disk4 raidz1 /dev/vg/disk5 /dev/vg/disk6 /dev/vg/disk7 /dev/vg/disk8

To create a raidz2 or raidz3 pool (equivalent to raid6, and to raid6 with one more parity drive, respectively) simply replace raidz1 above with raidz2 or raidz3
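
For example (sticking with the same /dev/vg/diskN naming used above), a raidz2 pool over six drives would look something like this:

zpool create mypool1 raidz2 /dev/vg/disk1 /dev/vg/disk2 /dev/vg/disk3 /dev/vg/disk4 /dev/vg/disk5 /dev/vg/disk6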

5. Destroy a pool

zpool destroy mypool1

6. Grow (extend) an existing pool

zpool add mypool1 mirror /dev/vg/disk5 /dev/vg/disk6
zpool add mypool1 raidz1 /dev/vg/disk7 /dev/vg/disk8 /dev/vg/disk9

Extending a pool means adding extra drives, mirrors or raidzX spans to the pool (you can even mix them up), BUT remember this: you can add but you cannot remove, and adding does not rebalance data between spans (in other words, data that was already present will stay on the original disks unless you recopy it onto the storage pool)
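
If you are curious how full each span is after an add (and want to see the no-rebalance effect for yourself), the per-vdev view is handy:

zpool list -v mypool1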

7. Convert an existing disk to a mirror

zpool attach mypool1 /dev/vg/disk5 /dev/vg/disk10

8. Remove a drive from a mirror

zpool detach mypool1 /dev/vg/disk10

9. Replace a (usually failed) drive in a mirror and/or raidz

zpool replace mypool1 /dev/vg/disk9 /dev/vg/disk10

At the pool level you can replace a drive regardless of whether it is online or offline, BUT this works only if the data on it can be reconstructed from the other drives (e.g. if the drive is part of a mirror or raidz it will work; if not (the equivalent of raid0), your pool will be corrupt and replacing the drive is impossible)
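
While the replacement drive is resilvering you can keep an eye on progress with:

zpool status mypool1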

10. Adding cache / log devices to an existing pool

zpool add mypool1 log mirror /dev/vg/disk3 /dev/vg/disk4
zpool add mypool1 cache /dev/vg/disk5

Cache devices cannot be mirrored and do not need to be mirrored. Data on them can be lost without affecting the integrity of the data stored on the pool, BUT performance may take a hit when they are no longer present.

Log devices can be mirrored (if you ask me, I’d say they should be mirrored by default) and can be removed without losing data, BUT if they fail there may be data loss
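
For completeness, both cache and log devices can be taken out again with zpool remove; the mirror-1 name below is just a placeholder for whatever zpool status reports for the mirrored log added above:

zpool remove mypool1 /dev/vg/disk5
zpool remove mypool1 mirror-1    # use the log vdev name shown by zpool status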

11. Disks inside the pool can be upgraded with bigger disks (as is the case with RAID, you can replace a drive with a bigger drive), AND if you configure the pool to autoexpand, it will automatically expand itself and expose the extra free space to the filesystem once you have replaced all drives that are part of the same span (mirror or raidz)
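
The property controlling this is autoexpand; roughly, once every drive in the span has been swapped for a bigger one, something like this should do it (zpool online -e is only needed if the drives were replaced while autoexpand was off):

zpool set autoexpand=on mypool1
zpool online -e mypool1 /dev/vg/disk1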

12. Snapshots are incremental and copy-on-write. To create a snapshot, one simply has to run a command looking like this:

zfs snapshot mypool1@snapshot1

Restoring a snapshot is a matter of running

zfs rollback mypool1@snapshot1

but note this: rollback is irreversible, and if you revert to an older snapshot, all snapshots created after the one you roll back to will be deleted
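
One extra detail worth knowing: by default zfs rollback only goes back to the most recent snapshot; rolling back past newer snapshots requires -r, which is exactly what destroys them:

zfs list -t snapshot
zfs rollback -r mypool1@snapshot1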

13. You may want to look at the output of following commands:

zfs get all
zpool get all

and decide if you want to make changes (maybe you want to enable deduplication or compression …)
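
For instance, turning on compression and/or deduplication is just a property change (keep in mind dedup is memory hungry, so do your homework before enabling it):

zfs set compression=lz4 mypool1
zfs set dedup=on mypool1
zfs get compression,dedup mypool1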

14. In time I discovered that it is a good idea to work with device paths rather than device names, because device names can change whereas wwn paths stay the same (you might want to look inside /dev/disk/by-id/)
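
To see what stable identifiers you have available, and as a rough sketch of a pool created from wwn paths (the wwn-0x... values below are made-up placeholders, yours will differ):

ls -l /dev/disk/by-id/
zpool create mypool1 mirror /dev/disk/by-id/wwn-0x5000c500a1111111 /dev/disk/by-id/wwn-0x5000c500a2222222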

15. Upgrading zfs on Linux is a crappy process from what I’ve seen so far and usually means downtime, because the pool will be exported (unmounted) and re-imported (re-mounted). If you upgrade the kernel to a newer version, the zfs module will be rebuilt and, after rebooting the node, it will be there in your shiny new kernel, BUT the pool might not be automatically imported, in which case:

zpool import

will show what pools it finds, and once you identify or remember the name of the one you want to import, simply run

zpool import $poolname (mypool1 in my case)

16. ZFS’s equivalent of mount / umount is zpool import / zpool export
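
In other words, to cleanly “unmount” and “mount” a whole pool:

zpool export mypool1
zpool import mypool1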

17. By default, when you use zpool create you get the whole shebang created, BUT zfs is the filesystem, the zpool is the VG (think LVM) and the zvol would be the LV (think LVM), meaning one can create a storage pool without creating a mountpoint (thus no filesystem), then create a zvol on top of it and use that as the block device for, say, some other filesystem

(nothing will stop you from doing what I’m describing below)

zpool create -f -m none mypool1 mirror /dev/vg/disk1 /dev/vg/disk2
zfs create -s -o compression=lz4 -V 1T mypool1/zvol1
mkfs.xfs /dev/mypool1/zvol1

This makes sense when/if you want/need a fancier form of software raid (that is, if mdadm will not do it for you)
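
And if you do go that route, the zvol mounts like any other block device once mkfs is done (the /mnt/zvol1 mountpoint is just an example):

mkdir -p /mnt/zvol1
mount /dev/mypool1/zvol1 /mnt/zvol1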

18. ZFS works best when you expose the actual drives and let it work with them directly. If you decide to use some fancy raid card to create your array and then create the zpool and filesystem on top of that, you’ll not benefit from the features that come with ZFS.

If you have the option of using ZFS then I do recommend you use it. If not, tough luck. I’ve also seen it used as the filesystem for the root partition of a CentOS machine. I do not recommend doing this because of the complexity it brings into the equation (upgrading the kernel or the zfs package will require extra steps to get the zpool cache file added to the initramfs so that the kernel can import the pool and run the OS off it)

