Converting the Root Filesystem of Fedora Linux to ZFS

What This Document Covers

This document is a step-by-step guide on how to convert an existing installation of Fedora that is not currently using ZFS to using ZFS for all primary filesystems (/, /boot, /var, etc). At the end of the document, the zpool will be expanded across a second storage device providing a mirror setup. This is done without any data loss and with minimal downtime.

While there "should" be no data loss, the operations this guide will recommend are inherently risky and a typo or disk failure while following this guide might destroy all existing data. Make a backup before proceeding.

This guide is written for Fedora 27, and has been tested and confirmed to work on Fedora 26 and Fedora 25. Users of other distributions can use this guide as a general outline.

Why Use ZFS-on-Linux

The primary feature I wanted from switching to ZFS was its ability to detect silent data corruption and to self-heal when corruption is detected (in mirror setups). When ZFS is used in a two-disk mirror, it also has the benefit of nearly doubling read speeds, which is generally unheard of in other RAID-1 setups. The transparent compression was also compelling; comparing before-and-after results on my system, compression saves about 15GiB of storage space.

After using ZFS as my root filesystem for six months, I have come to love the automatic snapshots (set up at the end of this guide). Being able to recover accidentally deleted or modified files is a major advantage which I never considered before I switched.

ZFS has many other great features which are documented on the ZFS Wikipedia page.

Prerequisite Reading

Before attempting to follow this guide, an understanding of ZFS and the ZFS-on-Linux Project is required. Please see these resources:

Overview of Converting to ZFS-on-Linux

The rest of this document will be a step-by-step guide to converting an existing installation of Fedora to use ZFS as the / and /boot filesystems. The high-level overview of the plan is to install ZFS-on-Linux, create a ZFS filesystem on a second storage device, use rsync to copy over all the data from the current operating system install, install GRUB2 on the second storage device, boot off the second storage device, and add the first storage device to the zpool as a mirror device. Easy.

Table of Contents

  1. Assumptions
  2. Initial Disk and ZFS Setup
  3. Zpool Creation & Properties
  4. Dataset Creation & Properties
  5. ZFS Mountpoints
  6. Data Propagation
  7. Boot Loader Installation
  8. Default Mountpoints and the Boot Filesystem
  9. Mirrored Storage Pool
  10. Compression Performance
  11. Scrubbing
  12. Automatic Snapshots
  13. Adding ZFS to the Rescue Disk
  14. Troubleshooting
    1. Setting Up a chroot
    2. Rebuilding the Initial RAM Filesystem
    3. ZFS DKMS Module Issues
    4. Upgrading the zpool
    5. GRUB2 Boot Errors
  15. Kernel Upgrades
  16. Copyright
  17. Questions or Feedback?

Assumptions

/dev/sda is the disk with the non-ZFS system install; /dev/sdb is a blank, unused storage device. Both drives are roughly the same size (~250GB SSDs are used throughout this guide), which is important if you plan to create the mirrored storage pool as instructed at the end of this guide.
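
If you are unsure which device name belongs to which physical drive, lsblk can help identify drives by size, model, and serial number (the output will of course differ on your system):

lsblk -o NAME,SIZE,MODEL,SERIAL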

In this guide, I am using an MBR-style partition table since my system doesn't use EFI. However, this guide will also work for an EFI-enabled system, although the path to the grub.cfg file will be different and an EFI System Partition would need to be created.
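
For reference, here is a rough, untested sketch of what the partitioning step could look like on an EFI system: a GPT layout with an EFI System Partition followed by a ZFS partition. The 512MiB ESP size is an arbitrary example, and on Fedora the EFI copy of the GRUB2 configuration lives under /boot/efi/EFI/fedora/ rather than /boot/grub2/:

sgdisk -Z /dev/sdb
sgdisk -n1:0:+512M -t1:EF00 -c1:"EFI System" /dev/sdb   # EFI System Partition
sgdisk -n2:0:0 -t2:BF01 -c2:"ZFS" /dev/sdb              # rest of the disk for ZFS
mkfs.fat -F32 /dev/sdb1                                 # the ESP must be FAT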

Initial Disk and ZFS Setup

Ideally, we'd just give the entire block device /dev/sdb to ZFS, but since we need somewhere to install GRUB2, we'll create an MBR style partition table with one partition that starts 1MiB into the space and takes up all free space.

On /dev/sdb, create a partition table with one partition filling the entire drive. Ensure the first partition starts 1MiB into the drive, and set the partition ID to bf/Solaris; otherwise GRUB2 will throw errors at boot:

sgdisk -Z /dev/sdb
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
echo -e 'o\nn\np\n\n\n\nY\nt\nbf\nw\n' | fdisk /dev/sdb
Syncing disks.
fdisk -l /dev/sdb
Disk /dev/sdb: 232.9 GiB, 250059350016 bytes, 488397168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xe68f3500

Device     Boot Start       End   Sectors   Size Id Type
/dev/sdb1        2048 488397167 488395120 232.9G bf Solaris

Make a note of your current kernel version, and ensure that when the kernel-headers and kernel-devel packages get installed they match your running kernel; otherwise the ZFS DKMS module will fail to build:

uname -r
4.11.12-200.fc26.x86_64

Install the packages for ZFS-on-Linux. Pay special attention to ensure the kernel-headers and kernel-devel package versions match that of the installed kernel:

dnf install -y http://download.zfsonlinux.org/fedora/zfs-release.fc26.noarch.rpm

dnf install -y --allowerasing kernel-headers kernel-devel zfs zfs-dkms zfs-dracut
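
If the Fedora repositories have already moved on to a newer kernel than the one currently running, the headers and devel packages can be pinned to the running version explicitly (the same approach is used in the rescue-disk section later in this guide):

dnf install -y --allowerasing kernel-headers-$(uname -r) kernel-devel-$(uname -r) zfs zfs-dkms zfs-dracut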

Double check that all kernel package versions are the same:

rpm -qa | grep kernel
kernel-core-4.11.12-200.fc26.x86_64
kernel-headers-4.11.12-200.fc26.x86_64
kernel-modules-4.11.12-200.fc26.x86_64
kernel-devel-4.11.12-200.fc26.x86_64
kernel-4.11.12-200.fc26.x86_64

Verify the ZFS DKMS module built and installed for the current kernel version:

dkms status
spl, 0.7.3, 4.11.12-200.fc26.x86_64, x86_64: installed
zfs, 0.7.3, 4.11.12-200.fc26.x86_64, x86_64: installed

If the zfs modules failed to install, see the Troubleshooting/ZFS DKMS Module Issues section.

Load the ZFS modules and confirm they loaded:

modprobe zfs
lsmod | grep zfs
zfs                  3395584  6
zunicode              331776  1 zfs
zavl                   16384  1 zfs
icp                   253952  1 zfs
zcommon                69632  1 zfs
znvpair                77824  2 zcommon,zfs
spl                   106496  4 znvpair,zcommon,zfs,icp

Zpool Creation & Properties

In Linux, base device names such as /dev/sdb are not guaranteed to be the same across reboots. Having the device name change would prevent the pool from importing at boot. Thus, one of the persistent naming methods found under /dev/disk/ must be used instead. There are several choices, all of which have pros and cons. In this guide, since we are using a small two-disk pool, we will use the /dev/disk/by-id/ method. See the ZoL FAQ for more details.
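
To see what persistent names exist for your drives, list the contents of the directories under /dev/disk/; each entry is a symlink back to the underlying sdX device:

ls -l /dev/disk/by-id/
ls -l /dev/disk/by-path/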

The file /etc/hostid is used on Solaris systems as a unique system identifier. Linux systems do not normally have an /etc/hostid file, but ZFS expects one to exist; otherwise it will use a random hostid, which will change seemingly at random and prevent the system from booting.

Generate a new /etc/hostid file:

zgenhostid
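
To confirm the hostid is now pinned, check that the file exists and see the value the system reports (the hostid command reads /etc/hostid when it is present):

ls -l /etc/hostid
hostid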

Determine the device ID of the /dev/sdb1 device:

find -L /dev/disk/by-id/ -samefile /dev/sdb1
/dev/disk/by-id/wwn-0x5002538d405cc6aa
/dev/disk/by-id/ata-Samsung_SSD_850_EVO_250GB_S21NNXAG927110B-part1

In the following steps, replace ata-Samsung_SSD_850_EVO_250GB_S21NNXAG927110B-part1 with the device ID of the second drive from your system.

Since the drives being used in this guide are both SSDs, which use 4KiB sectors internally but lie to the operating system and claim to use 512-byte sectors, use the option ashift=12 to force ZFS to align the filesystem to 4KiB. This will result in slightly better performance at the expense of slightly less efficient storage of small files.
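
You can see what the drive reports with lsblk; an SSD that lies about its sector size will show 512 in both columns, which is exactly why ashift=12 is forced rather than trusting the reported value:

lsblk -o NAME,PHY-SEC,LOG-SEC /dev/sdb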

The cachefile=none option prevents the file /etc/zfs/zpool.cache from being created or updated for this pool. When using ZFS as the root filesystem, any change to the pool layout would invalidate the cache file, prevent the pool from being imported, and thus prevent the system from booting.

Create a ZFS storage pool named "tank" on the secondary storage device:

zpool create -o ashift=12 -o cachefile=none tank /dev/disk/by-id/ata-Samsung_SSD_850_EVO_250GB_S21NNXAG927110B-part1
zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank   232G   468K   232G         -     0%     0%  1.00x  ONLINE  -

Dataset Creation & Properties

When deciding which datasets to create and what ZFS features to enable, there are many choices to consider. In this example, only a basic group of datasets will be created. For real-world use on a multiuser system, setting a quota on tank/home would be strongly recommended to prevent one user from taking all available space in the pool. Other recommendations include the creation of separate datasets with quotas for tank/var, tank/var/log, and tank/var/log/audit, to prevent application logs from taking all available space in the pool.
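
As a rough sketch, once the datasets below have been created, quotas could be applied like this (the sizes are arbitrary placeholders, not recommendations):

zfs set quota=100G tank/home
zfs set quota=20G tank/root/var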

Many applications and users expect File Access Control Lists (FACLs) to "just work". When creating ZFS datasets, the functionality of FACLs must be manually enabled per-dataset with the option acltype=posixacl.

For datasets such as tank/tmp, the ability to run executables can be disabled with the option exec=off.

Two other choices are whether or not compression and deduplication should be enabled. In testing with my data, deduplication didn't provide any notable space savings while consuming a lot of system memory (8GiB in testing). Compression, on the other hand, is essentially free. On my datasets, compression does not negatively affect performance, and as of the writing of this guide, it is saving 15GiB of space. On a 250GB SSD, that's a nice savings.

From previous testing, compression makes no difference on my /boot/ and /home/ directories.

My /home/ directory is using ecryptfs, which effectively makes /home/ incompressible.

Create the ZFS datasets with desired options:

zfs create -o compression=lz4 -o dedup=off -o acltype=posixacl tank/root
zfs create -o compression=off -o dedup=off -o acltype=posixacl -o quota=500M tank/root/boot
zfs create -o compression=lz4 -o dedup=off -o acltype=posixacl -o exec=off -o setuid=off tank/root/tmp
zfs create -o compression=lz4 -o dedup=off -o acltype=posixacl tank/root/var
zfs create -o compression=off -o dedup=off -o acltype=posixacl tank/home

Next, create a ZFS volume (ZVOL) for swap. For the best performance, set the volume block size to match the system's page size; for most x86_64 systems this will be 4KiB (the command getconf PAGESIZE can be used to verify). The sync=always option ensures that writes are not cached and are immediately written out to disk. The primarycache=metadata option prevents read swap data from being cached in memory, which would defeat the purpose of swap. ZVOLs will appear under the /dev/zvol/<poolname>/ directory.

Create a 4GiB ZFS volume and configure it to be used as swap space:

zfs create -V 4G -b $(getconf PAGESIZE) -o logbias=throughput -o sync=always -o primarycache=metadata -o com.sun:auto-snapshot=false tank/swap
mkswap /dev/zvol/tank/swap
Setting up swapspace version 1, size = 4 GiB (4294963200 bytes)
no label, UUID=e444defd-5710-4ace-a3d8-7ae7dfdb8302

Confirm the datasets were created successfully:

zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
tank             876K   225G   112K  /tank
tank/home         96K   225G    96K  /tank/home
tank/root        392K   225G   104K  /tank/root
tank/root/boot    96K   500M    96K  /tank/root/boot
tank/root/tmp     96K   225G    96K  /tank/root/tmp
tank/root/var     96K   225G    96K  /tank/root/var
tank/swap       4.25G   118G    60K  -

At this point, the ZFS storage pool and datasets have been created. ZFS is working. The next few steps will take us through copying over the installed operating system to the ZFS datasets.

ZFS Mountpoints

Examine the default mountpoints of the ZFS datasets:

zfs get mountpoint
NAME            PROPERTY    VALUE            SOURCE
tank            mountpoint  /tank            default
tank/home       mountpoint  /tank/home       default
tank/root       mountpoint  /tank/root       default
tank/root/boot  mountpoint  /tank/root/boot  default
tank/root/tmp   mountpoint  /tank/root/tmp   default
tank/root/var   mountpoint  /tank/root/var   default

Update the mountpoint for the dataset tank/home to mount inside the directory /tank/root/:

zfs set mountpoint=/tank/root/home tank/home

Examine the new default mountpoints:

zfs get mountpoint
NAME            PROPERTY    VALUE            SOURCE
tank            mountpoint  /tank            default
tank/home       mountpoint  /tank/root/home  local
tank/root       mountpoint  /tank/root       default
tank/root/boot  mountpoint  /tank/root/boot  default
tank/root/tmp   mountpoint  /tank/root/tmp   default
tank/root/var   mountpoint  /tank/root/var   default

Confirm the datasets are mounted:

zfs mount
tank                            /tank
tank/root                       /tank/root
tank/root/boot                  /tank/root/boot
tank/root/tmp                   /tank/root/tmp
tank/root/var                   /tank/root/var
tank/home                       /tank/root/home

Data Propagation

The next few steps will copy all data from the / filesystem to the ZFS datasets that were created earlier. Ideally this would be done with the root filesystem mounted read-only to prevent data from changing while being copied.

If LVM is in use on /dev/sda, one option would be to create snapshots of the logical volumes, then mount and rsync the LVM snapshots. This is left as an exercise to the reader, since the writer didn't think to do that until well after the fact.
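
For the curious, a minimal sketch of that approach, assuming the volume group and logical volume names shown in the GRUB configuration later in this guide (fedora_host-live/root), that the volume group has free extents for the snapshot, and adjusting mount options to suit your filesystem:

lvcreate --snapshot --size 5G --name rootsnap fedora_host-live/root   # frozen copy of the root LV
mount -o ro /dev/fedora_host-live/rootsnap /mnt
rsync -WSHAXhavx --progress --stats /mnt/ /tank/root/
umount /mnt
lvremove -f fedora_host-live/rootsnap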

Turn off all unnecessary services, like database servers and user programs that are likely to write to the disk while the copy is in progress. Cross your fingers, and hope for the best.

The rsync commands used below have a boat-load of options being passed. Broken down, they do the following:

-W, --whole-file
Copy files whole (w/o delta-xfer algorithm).
-S, --sparse
Handle sparse files efficiently.
-H, --hard-links
Preserve hard links.
-A, --acls
Preserve ACLs (implies -p).
-X, --xattrs
Preserve extended attributes.
-h, --human-readable
Output numbers in a human-readable format.
-a, --archive
Archive mode; equals -rlptgoD (no -H,-A,-X).
-v, --verbose
Increase verbosity.
-x, --one-file-system
Don't cross filesystem boundaries.
--progress
Show progress during transfer.
--stats
Give some file-transfer stats.

All datasets except for tank/root need to be unmounted for the initial copy of data:

echo /tank/root/{var,tmp,boot,home} | xargs -n1 zfs unmount
zfs mount
tank                            /tank
tank/root                       /tank/root

Copy all data from the root filesystem to the ZFS dataset tank/root:

rsync -WSHAXhavx --progress --stats / /tank/root/

Remount all ZFS datasets and confirm they mounted:

zfs mount -a
zfs mount
tank                            /tank
tank/root                       /tank/root
tank/root/boot                  /tank/root/boot
tank/home                       /tank/root/home
tank/root/tmp                   /tank/root/tmp
tank/root/var                   /tank/root/var

Use rsync to copy the data from all other filesystems to the ZFS datasets. (Note the trailing slash on the source directories; it is very important.):

rsync -WSHAXhav --progress --stats /boot/ /tank/root/boot/

rsync -WSHAXhav --progress --stats /tmp/ /tank/root/tmp/

rsync -WSHAXhav --progress --stats /var/ /tank/root/var/

rsync -WSHAXhav --progress --stats /home/ /tank/root/home/

There should now be an exact copy of the data from the old filesystems on the new ZFS filesystem.

Boot Loader Installation

Prepare a chroot environment to interactively enter the ZFS filesystem so that the boot loader can be installed on /dev/sdb.

Bind mount the filesystems /dev/, /sys/, /run/ and /proc/ from the running system into the chroot environment:

mount --rbind /dev/ /tank/root/dev/
mount --rbind /proc/ /tank/root/proc/
mount --rbind /sys/ /tank/root/sys/
mount --rbind /run/ /tank/root/run/

Enter the chroot:

chroot /tank/root/ /bin/bash

Generate a new /etc/fstab to point to the ZFS datasets:

awk '$3 == "zfs" {$4="defaults"; print}' /proc/mounts | column -t > /etc/fstab

Edit /etc/fstab and add mount definitions for /proc/ (optional), /var/tmp/ (optional), and swap:

$EDITOR /etc/fstab
tank/root            /         zfs   defaults                          0  0
tank/root/boot       /boot     zfs   defaults                          0  0
tank/home            /home     zfs   defaults                          0  0
tank/root/tmp        /tmp      zfs   defaults                          0  0
tank/root/var        /var      zfs   defaults                          0  0
proc                 /proc     proc  rw,nosuid,nodev,noexec,hidepid=2  0  0
/tmp                 /var/tmp  none  bind                              0  0
/dev/zvol/tank/swap  swap      swap  defaults                          0  0

Over the next few steps, various GRUB2 commands will be used to detect ZFS and install the GRUB2 boot loader. While GRUB2 does support ZFS, there are bugs. A major issue is that GRUB2 uses the command zpool status to determine the underlying block device. However, zpool status doesn't return the entire path to the block device. See http://list.zfsonlinux.org/pipermail/zfs-discuss/2016-June/025765.html and https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1632694 for more information.

Set the environment variable ZPOOL_VDEV_NAME_PATH=YES, which changes the default behavior of zpool status:

zpool status

        NAME                                                   STATE     READ WRITE CKSUM
        tank                                                   ONLINE       0     0     0
          ata-Samsung_SSD_850_EVO_250GB_S21NNXAG927110B-part1  ONLINE       0     0     0

export ZPOOL_VDEV_NAME_PATH=YES
zpool status

        NAME                                                                   STATE     READ WRITE CKSUM
        tank                                                                   ONLINE       0     0     0
          /dev/disk/by-id/ata-Samsung_SSD_850_EVO_250GB_S21NNXAG927110B-part1  ONLINE       0     0     0

Confirm that GRUB2 can recognize the ZFS filesystem, and install the GRUB2 boot loader:

grub2-probe /
zfs
grub2-install --modules=zfs /dev/sdb
Installing for i386-pc platform.
Installation finished. No error reported.

Update the GRUB2 configuration file /etc/default/grub for the new ZFS filesystem. The old rd.lvm.lv= kernel parameters are no longer needed, and GRUB_PRELOAD_MODULES="zfs" must be added so that GRUB2 can detect the zpool at boot:

$EDITOR /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.lvm.lv=fedora_host-live/root rd.lvm.lv=fedora_host-live/boot rd.lvm.lv=fedora_host-live/swap rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
GRUB_CMDLINE_LINUX="quiet"
GRUB_PRELOAD_MODULES="zfs"

Regenerate the GRUB2 configuration file /boot/grub2/grub.cfg:

grub2-mkconfig > /boot/grub2/grub.cfg
Generating grub configuration file ...

done

The above command may have generated some errors such as "Command failed", which should be safe to ignore.

Enable the ZFS target and related services:

systemctl enable zfs.target zfs-import-scan zfs-share zfs-mount

Earlier when the zpool was created, the option cachefile=none was used. This makes importing pools slow on systems that have many disks, but also allows the system to boot if the layout of the zpool changes.

If the file /etc/zfs/zpool.cache exists, Dracut will copy it into the initramfs, and then the system will try to use it at boot. To prevent this from becoming a problem, delete the zpool.cache file and ensure the zpool has the cache file disabled:

rm -f /etc/zfs/zpool.cache
zpool get cachefile
NAME  PROPERTY   VALUE      SOURCE
tank  cachefile  none       local

Rebuild the Initial RAM Filesystem using Dracut:

dracut --force /boot/initramfs-$(uname -r).img

Ensure the etc/zfs/zpool.cache file does not exist inside the initramfs:

lsinitrd /boot/initramfs-$(uname -r).img | grep zpool.cache || echo OKAY
OKAY

Exit the chroot environment and unmount all non-ZFS filesystems from the chroot:

exit
umount -l /tank/root/{dev,proc,sys,run}

Default Mountpoints and the Boot Filesystem

Set the datasets' mountpoints to legacy in order to be compatible with the systemd way of mounting filesystems:

zfs set mountpoint=legacy tank/root/boot
zfs set mountpoint=legacy tank/home
zfs set mountpoint=legacy tank/root/tmp
zfs set mountpoint=legacy tank/root/var

The previous step removes the /boot/, /tmp/, and /var/ directories, which are required at boot for legacy mounting. Re-create them to keep the system happy:

mkdir /tank/root/{boot,tmp,var}/

The /tank/ directory is no longer needed on the new ZFS filesystem. Remove it:

rmdir /tank/root/tank/

SELinux has been left in Enforcing mode throughout this guide. A previous step has created a few directories that now have incorrect labels. Set the system to perform a relabel on the entire filesystem at boot:

touch /tank/root/.autorelabel

Set the mountpoint for the tank/root dataset to /:

zfs set mountpoint=/ tank/root
cannot mount '/': directory is not empty
property may be set but unable to remount filesystem

Set the boot filesystem property on the tank/root/boot dataset:

zpool set bootfs=tank/root/boot tank

ZFS best practice is to not store any data in the base of the pool, and instead store data in datasets. Prevent the base dataset tank from automatically mounting during pool import:

zfs set canmount=off tank

Export the zpool so that it will be importable on reboot:

zpool export tank

Confirm SELinux is still on and that people who turn it off are total n00bs:

getenforce
Enforcing

Reboot the system, and boot off of the /dev/sdb device:

reboot

If any errors occurred, it is likely that an above step was either skipped or typo'd. Boot the system off /dev/sda, remount the datasets, enter the chroot environment, and double check all steps were completed properly.

Congratulations! If you have followed the steps in this guide this far, you now have a bootable ZFS filesystem and ZFS datasets for all primary filesystems. The next few steps in this guide will demonstrate how to attach the /dev/sda disk to be a member of the pool as a mirror device and show the performance of a mirror setup.

Mirrored Storage Pool

Now that ZFS is working correctly and the system is bootable, a few final steps can be taken to attain the full benefits of using ZFS.

In my own setup, I chose to go with adding the /dev/sda disk into the zpool as a mirror device. I chose this because using a mirror disk in ZFS gives me a few huge benefits including, but not limited to: faster read (2x) speed, self-healing, and bragging rights.

A new partition table must be created on /dev/sda. The partition created must be the same size or larger than the /dev/sdb1 partition created earlier in this guide.

Before creating the mirror, we'll test the read speed of a single disk setup.

Create a test file called readme (I crack me up). Use the /dev/urandom device instead of /dev/zero so that ZFS compression won't mess up the test results:

dd if=/dev/urandom of=readme count=128k bs=4096

Drop filesystem caches to ensure we are not measuring RAM speed. Test disk speed before mirror creation:

echo 3 > /proc/sys/vm/drop_caches
pv -rb readme > /dev/null
 512MiB [502MiB/s]

Deactivate the old LVM volume group, clear out the existing partition table on /dev/sda, then copy the partition table from /dev/sdb to /dev/sda:

vgchange -an
  0 logical volume(s) in volume group "fedora_host-live" now active
sgdisk -Z /dev/sda
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
sfdisk -d /dev/sdb | sfdisk /dev/sda

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

Examine the new partition table:

fdisk -l /dev/sda
Disk /dev/sda: 223.6 GiB, 240057409536 bytes, 468862128 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x84b667c6

Device     Boot Start       End   Sectors   Size Id Type
/dev/sda1        2048 468862127 468860080 223.6G bf Solaris

Earlier in this guide, we installed GRUB2 on /dev/sdb. Installing GRUB2 on /dev/sda as well allows the system to boot should either storage device fail. Install the GRUB2 boot loader onto /dev/sda:

ZPOOL_VDEV_NAME_PATH=YES grub2-install --modules=zfs /dev/sda
Installing for i386-pc platform.
Installation finished. No error reported.

Identify the device ID of the newly created partition:

find -L /dev/disk/by-id/ -samefile /dev/sda1
/dev/disk/by-id/wwn-0x5001517bb2a26677-part1
/dev/disk/by-id/ata-INTEL_SSDSC2CT240A3_CVMP234400PF240DGN-part1

Add /dev/sda1 into the tank by attaching the disk to the pool:

zpool attach tank /dev/disk/by-id/ata-Samsung_SSD_850_EVO_250GB_S21NNXAG927110B-part1 \
  /dev/disk/by-id/ata-INTEL_SSDSC2CT240A3_CVMP234400PF240DGN-part1
Make sure to wait until resilver is done before rebooting.

Verify the disk was attached and that there is now a mirror:

zpool status tank
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 0h4m with 0 errors on Sun Aug  6 13:10:06 2017
config:

        NAME                                                     STATE     READ WRITE CKSUM
        tank                                                     ONLINE       0     0     0
          mirror-0                                               ONLINE       0     0     0
            ata-Samsung_SSD_850_EVO_250GB_S21NNXAG927110B-part1  ONLINE       0     0     0
            ata-INTEL_SSDSC2CT240A3_CVMP234400PF240DGN-part1     ONLINE       0     0     0

errors: No known data errors

Drop filesystem caches to ensure we are not measuring RAM speed. Test disk speed after mirror creation:

echo 3 > /proc/sys/vm/drop_caches
pv -rb readme > /dev/null
 512MiB [1015MiB/s]

As shown in the results of the previous commands, adding a second device doubled the read speeds. Write speeds remain unaffected.

Compression Performance

As noted at the top of this guide, ZFS compression is currently saving me about 15GiB of space.

The command zfs get compressratio can be used to display the current compression ratios:

zfs get compressratio
NAME             PROPERTY       VALUE  SOURCE
tank             compressratio  1.16x  -
tank/home        compressratio  1.00x  -
tank/root        compressratio  1.56x  -
tank/root/boot   compressratio  1.00x  -
tank/root/tmp    compressratio  2.14x  -
tank/root/var    compressratio  1.63x  -
tank/swap        compressratio  1.00x  -

Using the command du with the --apparent-size option shows the size a file claims to be:

du -hsxc --apparent-size /*
146M    /boot
24M     /etc
68G     /home
4.1G    /root
11M     /tmp
6.6G    /usr
33G     /var

Without the --apparent-size option, du shows the space on disk the file is actually taking:

du -hsxc /*
150M    /boot
16M     /etc
70G     /home
4.1G    /root
291K    /tmp
4.7G    /usr
20G     /var

Combining the output of the above two commands makes it easy to see where the differences in size are:

paste <(du -hsxc --apparent-size $(find / -maxdepth 1 -type d -fstype zfs)) <(du -hsxc $(find / -maxdepth 1 -type d -fstype zfs)) | \
awk 'BEGIN {print "Apparent Actual Mount"} {$2=""; print}' | column -t
Apparent  Actual  Mount
29G       27G     /       ## 2GiB savings here
172M      161M    /boot
10        1.5K    /tank
12M       915K    /tmp
68G       70G     /home   ## 4KiB sector size costs me 2GiB here.
33G       20G     /var    ## ZFS's lz4 compression saves me 13GiB here
130G      117G    total

Comparing the output of the two du commands above, we can see that the directories /home/ and /boot/, on which compression is disabled, are actually using more disk space than the files' apparent sizes. This is due to the use of the 4KiB sector size. On the directories /etc/, /tmp/, /usr/, and /var/, however, compression is saving just over 15GiB of space.

Scrubbing

Scrubbing is the method by which ZFS detects and corrects data errors. A ZFS scrub checks every block in the pool and compares it to the block's checksum value. The ZFS storage pool is not scrubbed automatically, so it must be done manually or scripted. To that end, I have written a small shell script which can be dropped into /etc/cron.weekly/. This script will automatically scrub the root zpool once a week and will complain about any errors over wall and syslog.
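
If you prefer to run a scrub by hand rather than install the script, the two relevant commands are below; zpool status reports scrub progress and, once finished, whether anything needed to be repaired:

zpool scrub tank
zpool status tank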

Download and install the zfs-scrub-and-report.sh script:

wget -O /etc/cron.weekly/zfs-scrub-and-report.sh \
    http://matthewheadlee.com/projects/zfs-as-root-filesystem-on-fedora/zfs-scrub-and-report.sh
chmod 700 /etc/cron.weekly/zfs-scrub-and-report.sh
/etc/cron.weekly/zfs-scrub-and-report.sh
Scrubbing has started.
Scrub complete, no errors detected.

Automatic Snapshots

One of the greatest superpowers of ZFS is its ability to create snapshots. A snapshot is an exact read-only copy of the dataset from the point in time in which the snapshot was taken. Snapshots use the available free space in the pool, but their storage requirements are generally pretty small as they only have to store the difference between the snapshot and the current state of the filesystem.
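
Snapshots can of course also be taken and destroyed by hand; a quick example before doing something risky (the snapshot name here is arbitrary):

zfs snapshot tank/home@before-cleanup
zfs list -t snapshot -r tank/home
zfs destroy tank/home@before-cleanup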

Using a small script, the creation and deletion of snapshots can be automated, giving the ability to rollback any dataset to any previous point in time or to recover deleted files. The ZFS-on-Linux Project provides a great shell script for this purpose. The zfs-auto-snapshot script installs itself as a series of cron jobs which run every 15 minutes, hourly, daily, weekly and monthly. See: https://github.com/zfsonlinux/zfs-auto-snapshot for more details.

Download and install the zfs-auto-snapshot package:

git clone 'https://github.com/zfsonlinux/zfs-auto-snapshot.git' /tmp/zfs-auto-snapshot/
Cloning into '/tmp/zfs-auto-snapshot'...

Checking connectivity... done.
make -C /tmp/zfs-auto-snapshot/ install
install -d /etc/cron.d

install src/zfs-auto-snapshot.sh /usr/local/sbin/zfs-auto-snapshot

By default, neither cron nor anacron includes the directory /usr/local/sbin/ in the PATH environment variable, meaning the hourly, daily, weekly, and monthly zfs-auto-snapshot cron jobs will not execute.

Modify the cron configuration files /etc/crontab, /etc/cron.d/0hourly, and /etc/anacrontab by adding the directory /usr/local/sbin/ to the PATH environment variable:

$EDITOR /etc/crontab
PATH=/sbin:/bin:/usr/sbin:/usr/bin
PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin
$EDITOR /etc/cron.d/0hourly
PATH=/sbin:/bin:/usr/sbin:/usr/bin
PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin
$EDITOR /etc/anacrontab
PATH=/sbin:/bin:/usr/sbin:/usr/bin
PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin

Once snapshots have been taken, they are accessible under the root of each dataset through the hidden .zfs/ directory. An individual file from a specific snapshot can be restored by simply copying it out of the snapshot:

ls -l /.zfs/snapshot/
total 9
drwxrwxrwx.  1 root root  0 Nov 16 19:51 monday
dr-xr-xr-x. 20 root root 25 Nov 16 09:01 zfs-auto-snap_frequent-2017-11-17-0030
drwxrwxrwx.  1 root root  0 Nov 16 19:51 zfs-auto-snap_frequent-2017-11-17-0045
ls -l /.zfs/snapshot/zfs-auto-snap_frequent-2017-11-17-0030/etc/passwd
-rw-r--r--. 1 root root 2696 Jul 25 15:16 /.zfs/snapshot/zfs-auto-snap_frequent-2017-11-17-0030/etc/passwd
ls -l /var/.zfs/snapshot/zfs-auto-snap_frequent-2017-11-17-0045/log/messages
-rw-------. 1 root root 11093703 Nov 16 19:44 /var/.zfs/snapshot/zfs-auto-snap_frequent-2017-11-17-0045/log/messages
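
To actually restore one of these files, copy it out of the snapshot; for example, restoring the /etc/passwd shown above (adjust the destination path to taste):

cp -a /.zfs/snapshot/zfs-auto-snap_frequent-2017-11-17-0030/etc/passwd /etc/passwd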

Snapshots can be listed through the zfs list command:

zfs list -t snapshot
NAME                                                                         USED  AVAIL  REFER  MOUNTPOINT
tank@zfs-auto-snap_frequent-2017-11-17-0030                                    0B      -   124K  -
tank@zfs-auto-snap_frequent-2017-11-17-0045                                    0B      -   124K  -
tank@zfs-auto-snap_frequent-2017-11-17-0100                                    0B      -   124K  -
tank/home@zfs-auto-snap_frequent-2017-11-17-0030                             124M      -  66.7G  -
tank/home@zfs-auto-snap_frequent-2017-11-17-0045                             114M      -  66.7G  -
tank/home@zfs-auto-snap_frequent-2017-11-17-0100                            81.2M      -  66.8G  -
tank/root@zfs-auto-snap_frequent-2017-11-17-0030                             292K      -  24.7G  -
tank/root@zfs-auto-snap_frequent-2017-11-17-0045                             184K      -  24.7G  -
tank/root@zfs-auto-snap_frequent-2017-11-17-0100                             152K      -  24.7G  -
tank/root/boot@zfs-auto-snap_frequent-2017-11-17-0030                          0B      -   152M  -
tank/root/boot@zfs-auto-snap_frequent-2017-11-17-0045                          0B      -   152M  -
tank/root/boot@zfs-auto-snap_frequent-2017-11-17-0100                          0B      -   152M  -

An entire dataset can be rolled back to a previous snapshot using the zfs rollback command. If rolling back any system datasets such as tank/root or tank/root/var, it is advisable to first reboot the system into rescue mode.

 zfs rollback tank/root/tmp@zfs-auto-snap_frequent-2017-11-17-0045

Adding ZFS to the Rescue Disk

By default, the Fedora install/rescue image lacks the ability to read, write, or mount newer versions of the ZFS filesystem. This lack of support for ZFS can cause headaches when trying to fix a non-bootable system which uses ZFS as root.

The next few steps will describe how to add support for the latest version of ZFS to the Fedora Live DVD.

Download the Fedora Live ISO, write it to a disc or thumb drive, and boot from it.

Install the official ZFS-on-Linux release rpm:

dnf install -y http://download.zfsonlinux.org/fedora/zfs-release.fc27.noarch.rpm

Install the kernel-headers and kernel-devel packages for the current running version of the kernel. The Live image kernel will likely be a few releases behind the latest available version:

dnf install --allowerasing -y kernel-headers-$(uname -r) kernel-devel-$(uname -r) zfs zfs-dracut

Verify the ZFS DKMS modules built and installed for the current kernel version, then load them:

dkms status
spl, 0.7.5, 4.13.9-300.fc27.x86_64, x86_64: installed
zfs, 0.7.5, 4.13.9-300.fc27.x86_64, x86_64: installed
modprobe zfs

If the zfs modules failed to install, see the Troubleshooting/ZFS DKMS Module Issues section.

Import the tank and set up a chroot as shown in the Troubleshooting section.

Troubleshooting

Sometimes during boot, the system will get dropped to the Dracut Emergency Shell. Generally, this seems to happen if there has been a physical change to the zpool, such as the failure of a drive. These problems can usually be fixed by importing the zpool, setting up a chroot, and correcting the issue.

Other times the system may completely fail to boot, in which case a Fedora Live image can be used to boot the machine and get the zpool back online. See the Adding ZFS to the Rescue Disk section for details on how to recover a system that will not boot.

Setting Up a chroot

Forcefully import the ZFS zpool:

zpool import -f tank

Set the ZFS mountpoints to /sysroot/ and mount everything:

zfs set mountpoint=/sysroot tank/root
zfs set mountpoint=/sysroot/boot tank/root/boot
zfs set mountpoint=/sysroot/tmp tank/root/tmp
zfs set mountpoint=/sysroot/var tank/root/var
zfs mount -a

Bind mount the filesystems /dev/, /sys/, /run/, and /proc/ from the Initial RAM Filesystem into the chroot environment:

mount --rbind /dev/ /sysroot/dev/
mount --rbind /proc/ /sysroot/proc/
mount --rbind /sys/ /sysroot/sys/
mount --rbind /run/ /sysroot/run/

Enter the chroot environment:

chroot /sysroot/ /bin/bash

See the other troubleshooting sections for how to solve common problems.

After the system has been fixed, exit the chroot environment, and unmount all non-ZFS filesystems from the chroot environment:

exit
umount -l /sysroot/{dev,proc,sys,run}/

Set the ZFS tank/root mountpoint back to /:

zfs set mountpoint=/ tank/root

Set the ZFS datasets' mountpoints back to legacy:

zfs set mountpoint=legacy tank/root/boot
zfs set mountpoint=legacy tank/root/tmp
zfs set mountpoint=legacy tank/root/var

Export the zpool so that it will be importable on reboot, and then reboot:

zpool export tank
reboot -f

Rebuilding the Initial RAM Filesystem

At times the Initial RAM Filesystem will need to be rebuilt by hand, for example after the ZFS DKMS modules have been rebuilt or after removing a stale /etc/zfs/zpool.cache file.

Rebuild the Initial RAM Filesystem using Dracut:

dracut --force /boot/initramfs-$(uname -r).img

Ensure the etc/zfs/zpool.cache file does not exist in the initramfs:

lsinitrd /boot/initramfs-$(uname -r).img | grep zpool.cache || echo OKAY
OKAY

ZFS DKMS Module Issues

If the spl or zfs module failed to build and/or install, the most common reason is that the kernel-headers or kernel-devel packages are not installed, or that their installed versions do not match the kernel you are building for. Install the correct versions of the kernel-headers and kernel-devel packages and rebuild the modules.

The spl & zfs modules can be rebuilt using the dkms command. The example below will rebuild the spl & zfs modules and initramfs for all installed kernel versions:

zfsver=$(rpm -q --qf "%{VERSION}\n" zfs)
while read -r kernelver; do
  for module in spl zfs; do
    dkms add -k "${kernelver}" "${module}/${zfsver}"
    dkms build -k "${kernelver}" "${module}/${zfsver}"
    dkms install -k "${kernelver}" "${module}/${zfsver}"
  done
  initfilename="/boot/initramfs-${kernelver}.img"
  dracut --force "${initfilename}" "${kernelver}"
  if [[ "$(lsinitrd "${initfilename}" | grep -c -e spl.ko -e zfs.ko)" -ne 2 ]]; then
    echo "ERROR: failed to find spl or zfs modules in the "${initfilename}" initramfs."
    exit 1
  fi
done < <(rpm -q --qf "%{VERSION}-%{RELEASE}.%{ARCH}\n" kernel)

Upgrading the zpool

When a new version of ZFS is installed, new ZFS features commonly become available. To boot the system, GRUB2, the initial ramfs, and the root filesystem all need to use a version of ZFS which supports the enabled features.
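
To check whether the pool actually has new features available before going through these steps, run zpool upgrade with no arguments; it lists any pools that are not yet using all features supported by the installed ZFS version:

zpool upgrade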

Upgrade GRUB2:

export ZPOOL_VDEV_NAME_PATH=YES
grub2-probe /
zfs
grub2-install --modules=zfs /dev/sdX
Installing for i386-pc platform.
Installation finished. No error reported.

Rebuild the initial ramfs:

dracut --force /boot/initramfs-$(uname -r).img

Finally upgrade the zpool:

zpool upgrade tank

GRUB2 Boot Errors

When booting, if errors like those seen below appear, ensure the directive GRUB_PRELOAD_MODULES="zfs" exists in the file /etc/default/grub and rerun grub2-mkconfig.

error: compression algorithm 17 not supported
error: compression algorithm 37 not supported
error: compression algorithm 79 not supported
error: unknown device 66326293.
error: compression algorithm inherit not supported
error: unsupported embedded BP (type=47)

Kernel Upgrades

Updating the kernel tends to be problematic; the issues seem to mostly stem from the DKMS system rebuilding the ZFS kernel modules after the updated initramfs has already been built. Below is the order of steps I've found works best when doing a kernel update.

Perform a dnf update:

dnf update -y
Last metadata expiration check: 1:29:05 ago on Tue Nov 21 19:44:54 2017.
Dependencies resolved.
========================================================================
 Package                Arch     Version           Repository    Size
========================================================================
Installing:
 kernel                 x86_64   4.13.13-100.fc26  updates      106 k
 kernel-core            x86_64   4.13.13-100.fc26  updates       21 M
 kernel-devel           x86_64   4.13.13-100.fc26  updates       12 M
 kernel-modules         x86_64   4.13.13-100.fc26  updates       24 M
 kernel-modules-extra   x86_64   4.13.13-100.fc26  updates      2.2 M
Removing:
 kernel                 x86_64   4.12.11-200.fc26  @updates       0
 kernel-core            x86_64   4.12.11-200.fc26  @updates      54 M
 kernel-devel           x86_64   4.12.11-200.fc26  @updates      43 M
 kernel-modules         x86_64   4.12.11-200.fc26  @updates      23 M
 kernel-modules-extra   x86_64   4.12.11-200.fc26  @updates     2.0 M

Transaction Summary
========================================================================
Install  5 Packages
Upgrade  5 Packages
Remove   5 Packages

Total download size: 74 M

Confirm the spl and zfs kernel modules built for the newly installed kernel:

find /lib/modules/4.13.13-100.fc26.x86_64/ -name zfs.ko -o -name spl.ko
/lib/modules/4.13.13-100.fc26.x86_64/extra/zfs.ko
/lib/modules/4.13.13-100.fc26.x86_64/extra/spl.ko

If the zfs modules failed to install, see the Troubleshooting/ZFS DKMS Module Issues section.

Notice that the spl and zfs kernel modules do not exist in the newly generated initramfs:

lsinitrd /boot/initramfs-4.13.13-100.fc26.x86_64.img | grep -e spl.ko -e zfs.ko

Ensure the zpool.cache file does not exist, and rebuild the initramfs for the new kernel version:

rm -f /etc/zfs/zpool.cache
dracut --force /boot/initramfs-4.13.13-100.fc26.x86_64.img 4.13.13-100.fc26.x86_64

Rebuild the GRUB2 configuration file:

grub2-mkconfig > /boot/grub2/grub.cfg
Generating grub configuration file ...

Found linux image: /boot/vmlinuz-4.13.13-100.fc26.x86_64
Found initrd image: /boot/initramfs-4.13.13-100.fc26.x86_64.img

done

Confirm the initramfs for the newly installed kernel now contains the spl and zfs kernel modules:

lsinitrd /boot/initramfs-4.13.13-100.fc26.x86_64.img | grep -e spl.ko -e zfs.ko
-rw-r--r--   1 root     root       157352 Aug 11 06:23 usr/lib/modules/4.13.13-100.fc26.x86_64/extra/spl.ko
-rw-r--r--   1 root     root      3000240 Aug 11 06:23 usr/lib/modules/4.13.13-100.fc26.x86_64/extra/zfs.ko

Reboot to the newly installed kernel:

reboot

Questions or Feedback?

Contact me