This document is a step-by-step guide to converting an existing Fedora installation that does not currently use ZFS so that all primary filesystems (/, /boot, /var, etc.) live on ZFS. At the end of the document, the zpool will be expanded across a second storage device, providing a mirror setup. This is done without any data loss and with minimal downtime.
While there "should" be no data loss, the operations this guide will recommend are inherently risky and a typo or disk failure while following this guide might destroy all existing data. Make a backup before proceeding.
This guide is written for Fedora 27, and has been tested and confirmed to work on Fedora 26 and Fedora 25. Users of other distributions can use this guide as a general outline.
The primary feature I wanted from switching to ZFS was its ability to detect silent data corruption and self-heal when corruption is detected (in mirror setups). When ZFS is used in a two-disk mirror, it also has the benefit of nearly doubling read speeds, which is generally unheard of in other RAID-1 setups. The transparent compression was also compelling; comparing before and after results on my system, compression saves about 15GiB of storage space.
After using ZFS as my root filesystem for six months, I have come to love the automatic snapshots (set up at the end of this guide). Being able to recover accidentally deleted or modified files is a major advantage which I had never considered before I switched.
ZFS has many other great features which are documented on the ZFS Wikipedia page.
Before attempting to follow this guide, an understanding of ZFS and the ZFS-on-Linux Project is required. Please see these resources:
The rest of this document will be a step-by-step guide to converting an existing installation of Fedora to use ZFS as the / and /boot filesystems. The high-level overview of the plan is to install ZFS-on-Linux, create a ZFS filesystem on a second storage device, use rsync to copy over all the data from the current operating system install, install GRUB2 on the second storage device, boot off the second storage device, and add the first storage device to the zpool as a mirror device. Easy.
/dev/sda is the disk with the non-ZFS system install; /dev/sdb is a blank, unused storage device. Both drives are the same size (250GiB SSDs are used throughout this guide), which is important if you plan on creating the mirrored storage pool as instructed at the end of this guide.
In this guide, I am using an MBR-style partition table since my system doesn't use EFI. However, this guide will work for an EFI-enabled system, although the path to the grub.cfg file will be different and an EFI System Partition would need to be created.
Ideally, we'd just give the entire block device /dev/sdb to ZFS, but since we need somewhere to install GRUB2, we'll create an MBR-style partition table with one partition that starts 1MiB into the space and takes up all free space.
On /dev/sdb, create a partition table with one partition filling the entire drive. Ensure the first partition starts 1MiB into the drive, and set the partition ID to bf/Solaris, otherwise GRUB2 will throw errors at boot:
sgdisk -Z /dev/sdb
GPT data structures destroyed! You may now partition the disk using fdisk or other utilities.
echo -e 'o\nn\np\n\n\n\nY\nt\nbf\nw\n' | fdisk /dev/sdb
Syncing disks.
fdisk -l /dev/sdb
Disk /dev/sdb: 232.9 GiB, 250059350016 bytes, 488397168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xe68f3500
Device     Boot Start       End   Sectors   Size Id Type
/dev/sdb1        2048 488397167 488395120 232.9G bf Solaris
Make a note of your current kernel version. Ensure when the kernel-headers and kernel-devel packages get installed they match your running kernel. Otherwise the ZFS DKMS module will fail to build:
uname -r 4.11.12-200.fc26.x86_64
Install the packages for ZFS-on-Linux. Pay special attention to ensure the kernel-headers and kernel-devel package versions match that of the installed kernel:
dnf install -y http://download.zfsonlinux.org/fedora/zfs-release.fc26.noarch.rpm
dnf install -y --allowerasing kernel-headers kernel-devel zfs zfs-dkms zfs-dracut
Double check that all kernel package versions are the same:
rpm -qa | grep kernel
kernel-core-4.11.12-200.fc26.x86_64
kernel-headers-4.11.12-200.fc26.x86_64
kernel-modules-4.11.12-200.fc26.x86_64
kernel-devel-4.11.12-200.fc26.x86_64
kernel-4.11.12-200.fc26.x86_64
Verify the ZFS DKMS module built and installed for the current kernel version:
dkms status
spl, 0.7.3, 4.11.12-200.fc26.x86_64, x86_64: installed
zfs, 0.7.3, 4.11.12-200.fc26.x86_64, x86_64: installed
If the zfs modules failed to install, see the Troubleshooting/ZFS DKMS Module Issues section.
Load the ZFS modules and confirm the ZFS modules loaded:
modprobe zfs
lsmod | grep zfs
zfs                  3395584  6
zunicode              331776  1 zfs
zavl                   16384  1 zfs
icp                   253952  1 zfs
zcommon                69632  1 zfs
znvpair                77824  2 zcommon,zfs
spl                   106496  4 znvpair,zcommon,zfs,icp
In Linux, base device names such as /dev/sdb are not guaranteed to be the same across reboots. Having the device name change would prevent the pool from importing at boot. Thus, one of the persistent naming methods found under /dev/disk/ must be used instead. There are several choices, all of which have pros and cons. In this guide, since we are using a small two-disk pool, we will use the /dev/disk/by-id/ method. See the ZoL FAQ for more details.
The file /etc/hostid is used on Solaris systems as a unique system identifier, and while Linux does not have an /etc/hostid file, ZFS expects one to exist. Otherwise, it will use a random hostid which will, seemingly randomly, change and prevent the system from booting.
Generate a new /etc/hostid file:
zgenhostid
Determine the device ID of the /dev/sdb1 device:
find -L /dev/disk/by-id/ -samefile /dev/sdb1
/dev/disk/by-id/wwn-0x5002538d405cc6aa
/dev/disk/by-id/ata-Samsung_SSD_850_EVO_250GB_S21NNXAG927110B-part1
In the following steps, replace ata-Samsung_SSD_850_EVO_250GB_S21NNXAG927110B-part1 with the device ID of the second drive from your system.
Since the drives being used in this guide are both SSDs, which use 4KiB sectors but lie to the operating system and claim to use 512-byte sectors, use the option ashift=12 to force ZFS to align the filesystem to 4KiB. This will result in a little better performance at the expense of slightly less efficient storage of small files.
The cachefile=none option prevents the file /etc/zfs/zpool.cache from being created/updated for this pool. When using ZFS as the root filesystem, any change to the pool layout will invalidate the cache file and prevent the pool from being importable, and thus prevent the system from booting.
Create a ZFS storage pool named "tank" on the secondary storage device:
zpool create -o ashift=12 -o cachefile=none tank /dev/disk/by-id/ata-Samsung_SSD_850_EVO_250GB_S21NNXAG927110B-part1
zpool list
NAME  SIZE  ALLOC  FREE  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
tank  232G   468K  232G         -    0%   0%  1.00x  ONLINE  -
When deciding which datasets to create and what ZFS features to enable, there are many choices to consider. In this example, only a basic group of datasets will be created. For real-world use on a multiuser system, setting a quota on tank/home would be strongly recommended to prevent one user from taking all available space in the pool. Other recommendations include the creation of separate datasets with quotas for tank/var, tank/var/log, and tank/var/log/audit, to prevent application logs from taking all available space in the pool.
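For example, a quota can be passed at dataset creation time with -o quota= (as is done for tank/root/boot below) or applied to an existing dataset with zfs set. The following is only a sketch using the tank/root/var naming from the layout created below; the sizes are placeholder assumptions, not values from this system, and the commands would be run after the datasets exist:

# Placeholder sizes -- tune to your own pool, and run after the datasets below are created.
zfs set quota=100G tank/home
zfs create -o compression=lz4 -o dedup=off -o acltype=posixacl -o quota=4G tank/root/var/log
zfs create -o compression=lz4 -o dedup=off -o acltype=posixacl -o quota=1G tank/root/var/log/audit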
Many applications and users expect File Access Control Lists (FACLs) to "just work". When creating ZFS datasets, the functionality of FACLs must be manually enabled per-dataset with the option acltype=posixacl. For datasets such as tank/tmp, the ability to run executables can be disabled with the option exec=off.
Two other choices are whether or not compression and deduplication should be enabled. While testing with my data, deduplication didn't provide any notable space savings while consuming a lot of system memory (8GiB in testing). Compression, on the other hand, is essentially free. On my datasets, compression does not negatively affect performance and, as of the writing of this guide, is saving 15GiB of space. On a 250GB SSD, that's a nice savings.
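If you want to estimate what deduplication would buy you on your own data before enabling it, zdb can simulate a deduplication table for an existing pool. This is an aside rather than part of the original workflow; the command only reports statistics and changes nothing, but it can take a long time and a fair amount of memory on large pools:

# Simulate a dedup table for the pool "tank" and print the estimated dedup ratio.
zdb -S tank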
From previous testing, compression makes no difference on my /boot/ and /home/ directories. My /home/ directory is using ecryptfs, which effectively makes /home/ incompressible.
Create the ZFS datasets with desired options:
zfs create -o compression=lz4 -o dedup=off -o acltype=posixacl tank/root
zfs create -o compression=off -o dedup=off -o acltype=posixacl -o quota=500M tank/root/boot
zfs create -o compression=lz4 -o dedup=off -o acltype=posixacl -o exec=off -o setuid=off tank/root/tmp
zfs create -o compression=lz4 -o dedup=off -o acltype=posixacl tank/root/var
zfs create -o compression=off -o dedup=off -o acltype=posixacl tank/home
Next, create a ZFS ZVolume for swap. For the best performance, set the volume block size to the system's page size; for most x86_64 systems, this will be 4KiB. The command getconf PAGESIZE can be used to verify. The sync=always option ensures that writes are not cached and are immediately written out to disk. The primarycache=metadata option prevents read swap data from being cached in memory, which would defeat the purpose of swap. ZVOLs appear under the /dev/zvol/<poolname>/ directory.
Create a 4GiB ZFS volume and configure it to be used as swap space:
zfs create -V 4G -b $(getconf PAGESIZE) -o logbias=throughput -o sync=always -o primarycache=metadata -o com.sun:auto-snapshot=false tank/swap
mkswap /dev/zvol/tank/swap
Setting up swapspace version 1, size = 4 GiB (4294963200 bytes)
no label, UUID=e444defd-5710-4ace-a3d8-7ae7dfdb8302
Confirm the datasets were created successfully:
zfs list
NAME            USED   AVAIL  REFER  MOUNTPOINT
tank            876K   225G   112K   /tank
tank/home       96K    225G   96K    /tank/home
tank/root       392K   225G   104K   /tank/root
tank/root/boot  96K    500M   96K    /tank/root/boot
tank/root/tmp   96K    225G   96K    /tank/root/tmp
tank/root/var   96K    225G   96K    /tank/root/var
tank/swap       4.25G  118G   60K    -
At this point, the ZFS storage pool and datasets have been created. ZFS is working. The next few steps will take us through copying over the installed operating system to the ZFS datasets.
Examine the default mountpoints of the ZFS datasets:
zfs get mountpoint
NAME            PROPERTY    VALUE            SOURCE
tank            mountpoint  /tank            default
tank/home       mountpoint  /tank/home       default
tank/root       mountpoint  /tank/root       default
tank/root/boot  mountpoint  /tank/root/boot  default
tank/root/tmp   mountpoint  /tank/root/tmp   default
tank/root/var   mountpoint  /tank/root/var   default
Update the mountpoint for the dataset tank/home to mount inside the directory /tank/root/:
zfs set mountpoint=/tank/root/home tank/home
Examine the new default mountpoints:
zfs get mountpoint
NAME            PROPERTY    VALUE            SOURCE
tank            mountpoint  /tank            default
tank/home       mountpoint  /tank/root/home  local
tank/root       mountpoint  /tank/root       default
tank/root/boot  mountpoint  /tank/root/boot  default
tank/root/tmp   mountpoint  /tank/root/tmp   default
tank/root/var   mountpoint  /tank/root/var   default
Confirm the datasets are mounted:
zfs mount
tank            /tank
tank/root       /tank/root
tank/root/boot  /tank/root/boot
tank/root/tmp   /tank/root/tmp
tank/root/var   /tank/root/var
tank/home       /tank/root/home
The next few steps will copy all data from the / filesystem to the ZFS datasets that were created earlier. Ideally this would be done with the root filesystem mounted read-only to prevent data from changing while being copied.
If LVM is in use on /dev/sda, one option would be to create snapshots of the logical volumes, then mount and rsync the LVM snapshots. This is left as an exercise to the reader, since the writer didn't think to do that until well after the fact.
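For readers who do want to try the snapshot approach, a minimal sketch might look like the following. The volume group and logical volume names match the fedora_host-live/root volume seen later in this guide, but the snapshot size and mountpoint are assumptions; adjust them to your system (and add the nouuid mount option if the root filesystem is XFS):

# Sketch only: snapshot the root LV, mount it read-only, and copy from the snapshot.
lvcreate --snapshot --size 5G --name root-snap /dev/fedora_host-live/root
mkdir -p /mnt/root-snap
mount -o ro /dev/fedora_host-live/root-snap /mnt/root-snap
rsync -WSHAXhavx --progress --stats /mnt/root-snap/ /tank/root/
umount /mnt/root-snap
lvremove -y /dev/fedora_host-live/root-snap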
Turn off all unnecessary services, like database servers and user programs that are likely to write to the disk while the copy is in progress. Cross your fingers, and hope for the best.
The rsync commands used below have a boat-load of options being passed. Broken down, they do the following:
-W: copy files whole, skipping rsync's delta-transfer algorithm
-S: handle sparse files efficiently
-H: preserve hard links
-A: preserve ACLs (implies preserving permissions)
-X: preserve extended attributes
-h: print numbers in a human-readable format
-a: archive mode (recursive; preserves permissions, ownership, timestamps, symlinks, and device files)
-v: verbose output
-x: don't cross filesystem boundaries (used only for the copy of /)
--progress: show progress during the transfer
--stats: print transfer statistics when finished
All datasets except for tank/root need to be unmounted for the initial copy of data:
echo /tank/root/{var,tmp,boot,home} | xargs -n1 zfs unmount
zfs mount
tank       /tank
tank/root  /tank/root
Copy all data from the root filesystem to the ZFS dataset tank/root:
rsync -WSHAXhavx --progress --stats / /tank/root/
Remount all ZFS datasets and confirm they mounted:
zfs mount -a
zfs mount
tank            /tank
tank/root       /tank/root
tank/root/boot  /tank/root/boot
tank/home       /tank/root/home
tank/root/tmp   /tank/root/tmp
tank/root/var   /tank/root/var
Use rsync to copy the data from all other filesystems to the ZFS datasets. (Note the trailing slash on the source directories; it is very important.):
rsync -WSHAXhav --progress --stats /boot/ /tank/root/boot/
rsync -WSHAXhav --progress --stats /tmp/ /tank/root/tmp/
rsync -WSHAXhav --progress --stats /var/ /tank/root/var/
rsync -WSHAXhav --progress --stats /home/ /tank/root/home/
There should now be an exact copy of the data from the old filesystems on the new ZFS filesystem.
Prepare a chroot environment to interactively enter the ZFS filesystem so that the boot loader can be installed on /dev/sdb.
Bind mount the filesystems /dev/, /sys/, /run/, and /proc/ from the running system into the chroot environment:
mount --rbind /dev/ /tank/root/dev/
mount --rbind /proc/ /tank/root/proc/
mount --rbind /sys/ /tank/root/sys/
mount --rbind /run/ /tank/root/run/
Enter the chroot:
chroot /tank/root/ /bin/bash
Generate a new /etc/fstab to point to the ZFS datasets:
awk '$3 == "zfs" {$4="defaults"; print}' /proc/mounts | column -t > /etc/fstab
Edit /etc/fstab, add in mount definitions for /proc/ (optional), /var/tmp/ (optional), and swap:
$EDITOR /etc/fstab
tank/root            /         zfs   defaults                          0 0
tank/root/boot       /boot     zfs   defaults                          0 0
tank/home            /home     zfs   defaults                          0 0
tank/root/tmp        /tmp      zfs   defaults                          0 0
tank/root/var        /var      zfs   defaults                          0 0
proc                 /proc     proc  rw,nosuid,nodev,noexec,hidepid=2  0 0
/tmp                 /var/tmp  none  bind                              0 0
/dev/zvol/tank/swap  swap      swap  defaults                          0 0
Over the next few steps, various GRUB2 commands will be used to detect ZFS and install the GRUB2 boot loader. While GRUB2 does support ZFS, there are bugs. A major issue is that GRUB2 uses the command zpool status to determine the underlying block device. However, zpool status doesn't return the entire path to the block device. See http://list.zfsonlinux.org/pipermail/zfs-discuss/2016-June/025765.html and https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1632694 for more information.
Set the environment variable ZPOOL_VDEV_NAME_PATH=YES, which changes zpool status's default behavior:
zpool status
        NAME                                                  STATE   READ WRITE CKSUM
        tank                                                  ONLINE     0     0     0
          ata-Samsung_SSD_850_EVO_250GB_S21NNXAG927110B-part1 ONLINE     0     0     0
export ZPOOL_VDEV_NAME_PATH=YES
zpool status
        NAME                                                                  STATE   READ WRITE CKSUM
        tank                                                                  ONLINE     0     0     0
          /dev/disk/by-id/ata-Samsung_SSD_850_EVO_250GB_S21NNXAG927110B-part1 ONLINE     0     0     0
Confirm that GRUB2 can recognize the ZFS filesystem, and install the GRUB2 boot loader:
grub2-probe /
zfs
grub2-install --modules=zfs /dev/sdb
Installing for i386-pc platform.
Installation finished. No error reported.
Update the GRUB2 configuration file /etc/default/grub for the new ZFS filesystem. GRUB_PRELOAD_MODULES="zfs" must be added so that GRUB2 can detect the zpool at boot:
$EDITOR /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
# old: GRUB_CMDLINE_LINUX="rd.lvm.lv=fedora_host-live/root rd.lvm.lv=fedora_host-live/boot rd.lvm.lv=fedora_host-live/swap rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
GRUB_CMDLINE_LINUX="quiet"
GRUB_PRELOAD_MODULES="zfs"
Regenerate the GRUB2 configuration file /boot/grub2/grub.cfg:
grub2-mkconfig > /boot/grub2/grub.cfg
Generating grub configuration file ...
done
The above command may have generated some errors such as Command failed which should be safe to ignore.
Enable the ZFS target and related services:
systemctl enable zfs.target zfs-import-scan zfs-share zfs-mount
Earlier when the zpool was created, the option cachefile=none was used. This makes importing pools slow on systems that have many disks, but also allows the system to boot if the layout of the zpool changes.
If the file /etc/zfs/zpool.cache exists, Dracut will copy it into the initramfs, and then the system will try to use it at boot. To prevent this from becoming a problem, delete the zpool.cache file and ensure the zpool has the cache file disabled:
rm -f /etc/zfs/zpool.cache
zpool get cachefile
NAME  PROPERTY   VALUE  SOURCE
tank  cachefile  none   local
Rebuild the Initial RAM Filesystem using Dracut:
dracut --force /boot/initramfs-$(uname -r).img
Ensure the etc/zfs/zpool.cache file does not exist in the initramfs:
lsinitrd /boot/initramfs-$(uname -r).img | grep zpool.cache || echo OKAY OKAY
Exit the chroot environment and unmount all non-ZFS filesystems from the chroot:
exit
umount -l /tank/root/{dev,proc,sys,run}
Set the datasets' mountpoints to legacy in order to be compatible with the systemd way of mounting filesystems:
zfs set mountpoint=legacy tank/root/boot
zfs set mountpoint=legacy tank/home
zfs set mountpoint=legacy tank/root/tmp
zfs set mountpoint=legacy tank/root/var
The previous step deletes the /boot/, /tmp/, and /var/ directories, which are required at boot for legacy mounting. Re-create them to keep the system happy:
mkdir /tank/root/{boot,tmp,var}/
The /tank/ directory is no longer needed on the new ZFS filesystem. Remove it:
rmdir /tank/root/tank/
SELinux has been left in Enforcing mode throughout this guide. A previous step has created a few directories that now have incorrect labels. Set the system to perform a relabel on the entire filesystem at boot:
touch /tank/root/.autorelabel
Set the mountpoint for the tank/root dataset to /:
zfs set mountpoint=/ tank/root
cannot mount '/': directory is not empty
property may be set but unable to remount filesystem
Set the pool's bootfs (boot filesystem) property to the tank/root/boot dataset:
zpool set bootfs=tank/root/boot tank
ZFS best practice is to not store any data in the base of the pool, and instead store data in datasets. Prevent the base dataset tank from automatically mounting during pool import:
zfs set canmount=off tank
Export the zpool so that it will be importable on reboot:
zpool export tank
Confirm SELinux is still on and that people who turn it off are total n00bs:
getenforce Enforcing
Reboot the system, and boot off of the /dev/sdb device:
reboot
If any errors occurred, it is likely that an above step was either skipped or typo'd. Boot the system off /dev/sda, remount the datasets, enter the chroot environment, and double check all steps were completed properly.
Congratulations! If you have followed the steps in this guide this far, you now have a bootable ZFS filesystem and ZFS datasets for all primary filesystems. The next few steps will demonstrate how to attach the /dev/sda disk to the pool as a mirror device and show the performance of a mirror setup.
Now that ZFS is working correctly and the system is bootable, a few final steps can be taken to attain the full benefits of using ZFS.
In my own setup, I chose to add the /dev/sda disk into the zpool as a mirror device. I chose this because using a mirror disk in ZFS gives me a few huge benefits including, but not limited to: faster (2x) read speed, self-healing, and bragging rights.
A new partition table must be created on /dev/sda. The partition created must be the same size or larger than the /dev/sdb1 partition created earlier in this guide.
Before creating the mirror, we'll test the read speed of a single disk setup.
Create a test file called readme (I crack me up). Use the /dev/urandom device instead of /dev/zero so that ZFS compression won't mess up the test results:
dd if=/dev/urandom of=readme count=128k bs=4096
Drop filesystem caches to ensure we are not measuring RAM speed. Test disk speed before mirror creation:
echo 3 > /proc/sys/vm/drop_caches
pv -rb readme > /dev/null
512MiB [502MiB/s]
Clear out the existing partition table on /dev/sda, then copy the partition table from /dev/sdb to /dev/sda:
vgchange -an
0 logical volume(s) in volume group "fedora_host-live" now active
sgdisk -Z /dev/sda
GPT data structures destroyed! You may now partition the disk using fdisk or other utilities.
sfdisk -d /dev/sdb | sfdisk /dev/sda
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
Examine the new partition table:
fdisk -l /dev/sda
Disk /dev/sda: 223.6 GiB, 240057409536 bytes, 468862128 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x84b667c6
Device     Boot Start       End   Sectors   Size Id Type
/dev/sda1        2048 468862127 468860080 223.6G bf Solaris
Earlier in this guide, we installed GRUB2 on /dev/sdb. Also installing GRUB2 on /dev/sda will allow the system to boot should either storage device fail. Install the GRUB2 boot loader onto /dev/sda:
ZPOOL_VDEV_NAME_PATH=YES grub2-install --modules=zfs /dev/sda
Installing for i386-pc platform.
Installation finished. No error reported.
Identify the device ID of the newly created partition:
find -L /dev/disk/by-id/ -samefile /dev/sda1
/dev/disk/by-id/wwn-0x5001517bb2a26677-part1
/dev/disk/by-id/ata-INTEL_SSDSC2CT240A3_CVMP234400PF240DGN-part1
Add /dev/sda1 into the tank by attaching the disk to the pool:
zpool attach tank /dev/disk/by-id/ata-Samsung_SSD_850_EVO_250GB_S21NNXAG927110B-part1 \
    /dev/disk/by-id/ata-INTEL_SSDSC2CT240A3_CVMP234400PF240DGN-part1
Make sure to wait until resilver is done before rebooting.
Verify the disk was attached and that there is now a mirror:
zpool status tank
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 0h4m with 0 errors on Sun Aug 6 13:10:06 2017
config:

        NAME                                                    STATE   READ WRITE CKSUM
        tank                                                    ONLINE     0     0     0
          mirror-0                                              ONLINE     0     0     0
            ata-Samsung_SSD_850_EVO_250GB_S21NNXAG927110B-part1 ONLINE     0     0     0
            ata-INTEL_SSDSC2CT240A3_CVMP234400PF240DGN-part1    ONLINE     0     0     0

errors: No known data errors
Drop filesystem caches to ensure we are not measuring RAM speed. Test disk speed after mirror creation:
echo 3 > /proc/sys/vm/drop_caches
pv -rb readme > /dev/null
512MiB [1015MiB/s]
As shown in the results of the previous commands, adding a second device doubled the read speeds. Write speeds remain unaffected.
As noted in the summary at the top of this guide, ZFS compression is currently saving 15GiB of space. The command zfs get compressratio can be used to display the current compression ratios:
zfs get compressratio
NAME            PROPERTY       VALUE  SOURCE
tank            compressratio  1.16x  -
tank/home       compressratio  1.00x  -
tank/root       compressratio  1.56x  -
tank/root/boot  compressratio  1.00x  -
tank/root/tmp   compressratio  2.14x  -
tank/root/var   compressratio  1.63x  -
tank/swap       compressratio  1.00x  -
Using the command du with the --apparent-size option shows the size a file claims to be:
du -hsxc --apparent-size /*
146M  /boot
24M   /etc
68G   /home
4.1G  /root
11M   /tmp
6.6G  /usr
33G   /var
Without the --apparent-size option, du shows the space on disk the file is actually taking:
du -hsxc /*
150M  /boot
16M   /etc
70G   /home
4.1G  /root
291K  /tmp
4.7G  /usr
20G   /var
Combining the output of the above two commands makes it easy to see where the differences in size are:
paste <(du -hsxc --apparent-size $(find / -maxdepth 1 -type d -fstype zfs)) <(du -hsxc $(find / -maxdepth 1 -type d -fstype zfs)) | \
    awk 'BEGIN {print "Apparent Actual Mount"} {$2=""; print}' | column -t
Apparent  Actual  Mount
29G       27G     /        ## 2GiB savings here
172M      161M    /boot
10        1.5K    /tank
12M       915K    /tmp
68G       70G     /home    ## 4KiB sector size costs me 2GiB here.
33G       20G     /var     ## ZFS's lz4 compression saves me 13GiB here
130G      117G    total
Comparing the output of the two du commands above, we can see that on the directories /home/ and /boot/, where compression is disabled, I'm actually using more disk space than the files alone take up. This is due to the use of the 4KiB sector size. But as we can see on the directories /etc/, /tmp/, /usr/, and /var/, compression is saving just over 15GiB of space.
Scrubbing is the method by which ZFS detects and corrects data errors. A ZFS scrub checks every block in the pool and compares it to the block's checksum value. The ZFS storage pool is not scrubbed automatically, and thus it must be done manually or be scripted. To that end, I have written a small shell script which can be dropped into /etc/cron.weekly/. This script will automatically scrub the root zpool once a week and will complain about any errors over wall and syslog.
Download and install the zfs-scrub-and-report.sh script:
wget -O /etc/cron.weekly/zfs-scrub-and-report.sh \
    http://matthewheadlee.com/projects/zfs-as-root-filesystem-on-fedora/zfs-scrub-and-report.sh
chmod 700 /etc/cron.weekly/zfs-scrub-and-report.sh
/etc/cron.weekly/zfs-scrub-and-report.sh
Scrubbing has started.
Scrub complete, no errors detected.
One of the greatest superpowers of ZFS is its ability to create snapshots. A snapshot is an exact read-only copy of the dataset from the point in time in which the snapshot was taken. Snapshots use the available free space in the pool, but their storage requirements are generally pretty small as they only have to store the difference between the snapshot and the current state of the filesystem.
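As a quick illustration (separate from the automated setup described next), a snapshot can also be taken and destroyed by hand; the snapshot name before-edit here is just a placeholder:

# Take a manual snapshot of the home dataset, then destroy it when no longer needed.
zfs snapshot tank/home@before-edit
zfs destroy tank/home@before-edit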
Using a small script, the creation and deletion of snapshots can be automated, giving the ability to rollback any dataset to any previous point in time or to recover deleted files. The ZFS-on-Linux Project provides a great shell script for this purpose. The zfs-auto-snapshot script installs itself as a series of cron jobs which run every 15 minutes, hourly, daily, weekly and monthly. See: https://github.com/zfsonlinux/zfs-auto-snapshot for more details.
Download and install the zfs-auto-snapshot package:
git clone 'https://github.com/zfsonlinux/zfs-auto-snapshot.git' /tmp/zfs-auto-snapshot/
Cloning into '/tmp/zfs-auto-snapshot'...
Checking connectivity... done.
make -C /tmp/zfs-auto-snapshot/ install
install -d /etc/cron.d
install src/zfs-auto-snapshot.sh /usr/local/sbin/zfs-auto-snapshot
By default, neither cron nor anacron includes the directory /usr/local/sbin/ in the PATH environment variable, meaning the hourly, daily, weekly, and monthly zfs-auto-snapshot cron jobs will not execute.
Modify the cron configuration files /etc/crontab, /etc/cron.d/0hourly, and /etc/anacrontab by adding the directory /usr/local/sbin/ to the PATH environment variable:
$EDITOR /etc/crontab
# old: PATH=/sbin:/bin:/usr/sbin:/usr/bin
PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin

$EDITOR /etc/cron.d/0hourly
# old: PATH=/sbin:/bin:/usr/sbin:/usr/bin
PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin

$EDITOR /etc/anacrontab
# old: PATH=/sbin:/bin:/usr/sbin:/usr/bin
PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin
Once snapshots have been taken, they are accessible under the root of each dataset through the directory .zfs/. An individual file from a specific snapshot could be restored by simply copying it from the snapshot:
ls -l /.zfs/snapshot/
total 9
drwxrwxrwx.  1 root root  0 Nov 16 19:51 monday
dr-xr-xr-x. 20 root root 25 Nov 16 09:01 zfs-auto-snap_frequent-2017-11-17-0030
drwxrwxrwx.  1 root root  0 Nov 16 19:51 zfs-auto-snap_frequent-2017-11-17-0045
ls -l /.zfs/snapshot/zfs-auto-snap_frequent-2017-11-17-0030/etc/passwd
-rw-r--r--. 1 root root 2696 Jul 25 15:16 /.zfs/snapshot/zfs-auto-snap_frequent-2017-11-17-0030/etc/passwd
ls -l /var/.zfs/snapshot/zfs-auto-snap_frequent-2017-11-17-0045/log/messages
-rw-------. 1 root root 11093703 Nov 16 19:44 /var/.zfs/snapshot/zfs-auto-snap_frequent-2017-11-17-0045/log/messages
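For instance, a single file could be copied back into place like this (a hypothetical restore using one of the snapshot names from the listing above):

# Restore /etc/passwd from the 00:30 frequent snapshot.
cp -a /.zfs/snapshot/zfs-auto-snap_frequent-2017-11-17-0030/etc/passwd /etc/passwd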
Snapshots can be listed through the zfs list command:
zfs list -t snapshot
NAME                                                    USED  AVAIL  REFER  MOUNTPOINT
tank@zfs-auto-snap_frequent-2017-11-17-0030               0B      -   124K  -
tank@zfs-auto-snap_frequent-2017-11-17-0045               0B      -   124K  -
tank@zfs-auto-snap_frequent-2017-11-17-0100               0B      -   124K  -
tank/home@zfs-auto-snap_frequent-2017-11-17-0030         124M      -  66.7G  -
tank/home@zfs-auto-snap_frequent-2017-11-17-0045         114M      -  66.7G  -
tank/home@zfs-auto-snap_frequent-2017-11-17-0100        81.2M      -  66.8G  -
tank/root@zfs-auto-snap_frequent-2017-11-17-0030         292K      -  24.7G  -
tank/root@zfs-auto-snap_frequent-2017-11-17-0045         184K      -  24.7G  -
tank/root@zfs-auto-snap_frequent-2017-11-17-0100         152K      -  24.7G  -
tank/root/boot@zfs-auto-snap_frequent-2017-11-17-0030      0B      -   152M  -
tank/root/boot@zfs-auto-snap_frequent-2017-11-17-0045      0B      -   152M  -
tank/root/boot@zfs-auto-snap_frequent-2017-11-17-0100      0B      -   152M  -
An entire dataset can be rolled back to a previous snapshot using the zfs rollback command. If rolling back any system datasets such as tank/root, tank/var, etc., it is advisable to first reboot the system to rescue mode.
zfs rollback tank/root/tmp@zfs-auto-snap_frequent-2017-11-17-0045
By default, the Fedora install/rescue image lacks the ability to read, write, or mount newer versions of the ZFS filesystem. This lack of support for ZFS can cause headaches when trying to fix a non-bootable system which uses ZFS as root.
The next few steps will describe how to add support for the latest version of ZFS to the Fedora Live DVD.
Download the Fedora Live ISO, write it to a disc or thumb-drive, and boot off it.
Install the official ZFS-on-Linux release rpm:
dnf install -y http://download.zfsonlinux.org/fedora/zfs-release.fc27.noarch.rpm
Install the kernel-headers and kernel-devel packages for the current running version of the kernel. The Live image kernel will likely be a few releases behind the latest available version:
dnf install --allowerasing -y kernel-headers-$(uname -r) kernel-devel-$(uname -r) zfs zfs-dracut
Verify the ZFS DKMS module built and installed for the current kernel version and load them:
dkms status
spl, 0.7.5, 4.13.9-300.fc27.x86_64, x86_64: installed
zfs, 0.7.5, 4.13.9-300.fc27.x86_64, x86_64: installed
modprobe zfs
If the zfs modules failed to install, see the Troubleshooting/ZFS DKMS Module Issues section.
Import the tank and setup a chroot as shown in the Troubleshooting section.
Sometimes during boot, the system will get dropped to the Dracut Emergency Shell. Generally, this seems to happen if there has been a physical change to the zpool, such as the failure of a drive. These problems can usually be fixed by importing the zpool, setting up a chroot, and correcting the issue.
Other times the system may completely fail to boot, in which case a Fedora Live Image can be used to boot and get the zpool back online. See the Adding ZFS to the Rescue Disk section for details on how to recover a system that will not boot.
Forcefully import the ZFS zpool:
zpool import -f tank
Set the ZFS mountpoints to /sysroot/ and mount everything:
zfs set mountpoint=/sysroot tank/root
zfs set mountpoint=/sysroot/boot tank/root/boot
zfs set mountpoint=/sysroot/tmp tank/root/tmp
zfs set mountpoint=/sysroot/var tank/root/var
zfs mount -a
Bind mount the filesystems /dev/, /sys/, /run/, and /proc/ from the Initial RAM Filesystem into the chroot environment:
mount --rbind /dev/ /sysroot/dev/
mount --rbind /proc/ /sysroot/proc/
mount --rbind /sys/ /sysroot/sys/
mount --rbind /run/ /sysroot/run/
Enter the chroot environment:
chroot /sysroot/ /bin/bash
See the other troubleshooting sections for how to solve common problems.
After the system has been fixed, exit the chroot environment, and unmount all non-ZFS filesystems from the chroot environment:
exit
umount -l /sysroot/{dev,proc,sys,run}/
Set the ZFS tank/root mountpoint back to /:
zfs set mountpoint=/ tank/root
Set the ZFS datasets' mountpoints back to legacy:
zfs set mountpoint=legacy tank/root/boot
zfs set mountpoint=legacy tank/root/tmp
zfs set mountpoint=legacy tank/root/var
Export the zpool so that it will be importable on reboot, and then reboot:
zpool export tank
reboot -f
At times the Initial RAM Filesystem will need to be rebuilt by hand. One of the reasons I've encountered: the /etc/hostid file changes.
Rebuild the Initial RAM Filesystem using Dracut:
dracut --force /boot/initramfs-$(uname -r).img
Ensure the etc/zfs/zpool.cache file does not exist in the initramfs:
lsinitrd /boot/initramfs-$(uname -r).img | grep zpool.cache || echo OKAY OKAY
If the spl or zfs module failed to build and/or install, the most common reason is that the kernel-headers or kernel-devel packages are not installed, or that the installed versions do not match the version of the kernel you are building for. Install the correct version of the kernel-headers or kernel-devel packages and rebuild the modules.
The spl & zfs modules can be rebuilt using the dkms command. The example below will rebuild the spl & zfs modules and initramfs for all installed kernel versions:
zfsver=$(rpm -q --qf "%{VERSION}\n" zfs)
while read -r kernelver; do
    for module in spl zfs; do
        dkms add -k "${kernelver}" "${module}/${zfsver}"
        dkms build -k "${kernelver}" "${module}/${zfsver}"
        dkms install -k "${kernelver}" "${module}/${zfsver}"
    done
    initfilename="/boot/initramfs-${kernelver}.img"
    dracut --force "${initfilename}" "${kernelver}"
    if [[ "$(lsinitrd "${initfilename}" | grep -c -e spl.ko -e zfs.ko)" -ne 2 ]]; then
        echo "ERROR: failed to find spl or zfs modules in the ${initfilename} initramfs."
        exit 1
    fi
done < <(rpm -q --qf "%{VERSION}-%{RELEASE}.%{ARCH}\n" kernel)
When a new version of ZFS is installed, new ZFS features commonly become available. To boot the system, GRUB2, the initial ramfs, and the root filesystem all need to use a version of ZFS which supports the enabled features.
Upgrade GRUB2:
export ZPOOL_VDEV_NAME_PATH=YES
grub2-probe /
zfs
grub2-install --modules=zfs /dev/sdX
Installing for i386-pc platform.
Installation finished. No error reported.
Rebuild the initial ramfs:
dracut --force /boot/initramfs-$(uname -r).img
Finally upgrade the zpool:
zpool upgrade tank
When booting, if errors like those seen below appear, ensure the directive GRUB_PRELOAD_MODULES="zfs" exists in the file /etc/default/grub and rerun grub2-mkconfig.
error: compression algorithm 17 not supported
error: compression algorithm 37 not supported
error: compression algorithm 79 not supported
error: unknown device 66326293.
error: compression algorithm inherit not supported
error: unsupported embedded BP (type=47)
Updating the kernel tends to be problematic; the issues seem to mostly stem from the DKMS system rebuilding the ZFS kernel modules after the updated initramfs has been built. Below is the order of steps I've found works best when doing a kernel update.
Perform a dnf update:
dnf update -y
Last metadata expiration check: 1:29:05 ago on Tue Nov 21 19:44:54 2017.
Dependencies resolved.
========================================================================
 Package               Arch    Version           Repository        Size
========================================================================
Installing:
 kernel                x86_64  4.13.13-100.fc26  updates          106 k
 kernel-core           x86_64  4.13.13-100.fc26  updates           21 M
 kernel-devel          x86_64  4.13.13-100.fc26  updates           12 M
 kernel-modules        x86_64  4.13.13-100.fc26  updates           24 M
 kernel-modules-extra  x86_64  4.13.13-100.fc26  updates          2.2 M
Removing:
 kernel                x86_64  4.12.11-200.fc26  @updates             0
 kernel-core           x86_64  4.12.11-200.fc26  @updates          54 M
 kernel-devel          x86_64  4.12.11-200.fc26  @updates          43 M
 kernel-modules        x86_64  4.12.11-200.fc26  @updates          23 M
 kernel-modules-extra  x86_64  4.12.11-200.fc26  @updates         2.0 M

Transaction Summary
========================================================================
Install  5 Packages
Upgrade  5 Packages
Remove   5 Packages

Total download size: 74 M
Confirm the spl and zfs kernel modules built for the newly installed kernel:
find /lib/modules/4.13.13-100.fc26.x86_64/ -name zfs.ko -o -name spl.ko
/lib/modules/4.13.13-100.fc26.x86_64/extra/zfs.ko
/lib/modules/4.13.13-100.fc26.x86_64/extra/spl.ko
If the zfs modules failed to install, see the Troubleshooting/ZFS DKMS Module Issues section.
Notice the spl and zfs kernel modules do not exist in the initramfs:
lsinitrd /boot/initramfs-4.13.13-100.fc26.x86_64.img | grep -e spl.ko -e zfs.ko
Ensure the zpool.cache file does not exist, and rebuild the initramfs for the new kernel version:
rm -f /etc/zfs/zpool.cache
dracut --force /boot/initramfs-4.13.13-100.fc26.x86_64.img 4.13.13-100.fc26.x86_64
Rebuild the GRUB2 configuration file:
grub2-mkconfig > /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.13.13-100.fc26.x86_64
Found initrd image: /boot/initramfs-4.13.13-100.fc26.x86_64.img
done
Confirm the initramfs for the newly installed kernel now contains the spl and zfs kernel modules:
lsinitrd /boot/initramfs-4.13.13-100.fc26.x86_64.img | grep -e spl.ko -e zfs.ko
-rw-r--r--   1 root     root       157352 Aug 11 06:23 usr/lib/modules/4.13.13-100.fc26.x86_64/extra/spl.ko
-rw-r--r--   1 root     root      3000240 Aug 11 06:23 usr/lib/modules/4.13.13-100.fc26.x86_64/extra/zfs.ko
Reboot to the newly installed kernel:
reboot
Copyright © 2017 Matthew Headlee. This work is licensed under an Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.