Redundancy with ZFS

05 Nov, 2024 (updated: 05 Nov, 2024)
2061 words | 10 min to read | 6 hr, 11 min to write

In the last part of our self-hosting journey, we set up the basics of our home server with Ansible-NAS. Now, it’s time to dig deeper into redundancy and reliability, the core pillars of any storage solution that aims to protect data long-term. This article will focus on using ZFS for creating a redundant and resilient storage setup.

What is ZFS?

ZFS (Zettabyte File System) is popular in home labs and enterprise solutions alike, and for good reason. ZFS brings a unique combination of redundancy, performance, and data integrity checks, making it ideal for home server environments where data protection is crucial. With ZFS, you can set up various RAID configurations (mirroring and RAID-Z) and use snapshots to back up and restore your data efficiently. Before we dive into setting up ZFS, let’s understand what RAID is.

What is RAID?

RAID (Redundant Array of Independent Disks) is a technology that combines multiple hard drives into a single logical unit to improve redundancy, performance, or both. There are several RAID levels, each with its own advantages and disadvantages. The most common RAID levels are:

  • RAID 0: Striping without Redundancy
  • RAID 1: Mirroring
  • RAID 5: Striping with Parity
  • RAID 6: Striping with Double Parity
  • RAID 10: Mirroring and Striping

ZFS actually adds one more level, RAID-Z3, which is sometimes described as RAID 7. It’s similar to RAID 6 but with triple parity.

I will not go into the details of each RAID level, especially because there is a perfectly good wiki article about it.

But I’ll list some properties of each RAID level for you to understand the differences:

RAID Level | ZFS name       | Minimum Disks | Fault Tolerance | Read Performance | Write Performance | Restore Performance
RAID 0     | Striped        | 2             | 0               | High             | High              | n/a
RAID 1     | Mirror         | 2             | 1               | High             | Low               | High
RAID 5     | RAID-Z1        | 3             | 1               | High             | Medium            | Low
RAID 6     | RAID-Z2        | 4             | 2               | High             | Low               | Low
RAID 7     | RAID-Z3        | 5             | 3               | High             | Low               | Low
RAID 10    | Striped mirror | 4             | 1               | High             | High              | High

Fault tolerance means how many disks can fail without losing data. Read, write, and restore performance are self-explanatory.

The obvious choices for a NAS are RAID 1 and RAID 6. RAID 1 is simple and provides good read performance, while RAID 6 provides more fault tolerance (RAID 5 seems to be simply inferior to it) and, once you have enough disks, better storage efficiency. But RAID 6 is more complex and requires more disks, and more disks consume more energy and generate more heat and noise. On top of that, if a disk fails in a mirror, you can just replace it and rebuild the mirror by copying all the blocks from the healthy disk. In RAID 6 the parity has to be recalculated and written to the new disk, which can take a long time, and you’d better keep your fingers crossed that no other disk fails during that window.

Yes, RAID 6 does give you better fault tolerance, but I’ll be doing backups anyway, so I don’t need that much of it. I prefer the simplicity of RAID 1 and the convenience of buying fewer, larger disks, which are usually cheaper per terabyte.
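To make the comparison concrete, here is a rough sketch of what creating each layout looks like (the pool name tank and the sdX device names are placeholders, not my actual setup):

# two-disk mirror: either disk can die, rebuild is a straight copy
zpool create tank mirror /dev/sdb /dev/sdc

# four-disk RAID-Z2: any two disks can die, but rebuilds involve parity calculation
zpool create tank raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde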

How to ZFS

What I like about ZFS is its simplicity for the end user. ZFS provides two utilities, zpool and zfs: one for managing pools, the other for managing datasets.

zpool

Creating a ZFS pool is super easy; all it takes is the bare minimum of information: the name of the pool (which by default also acts as the mount point), an optional RAID level, and a list of devices. I created my mirror (RAID 1) with the following command:

zpool create rust mirror \
/dev/disk/by-id/wwn-0x5000c500db125622 \
/dev/disk/by-id/wwn-0x5000c500db13fa95

The reason I listed my devices by ID is that these names stay stable no matter which SATA port or cable each disk is plugged into. But I could totally do zpool create rust mirror sda sdb instead.
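If you’re not sure which by-id name corresponds to which physical disk, something like this helps (output omitted; the wwn-* and ata-* symlinks point at the same disks as the sdX names):

ls -l /dev/disk/by-id/
lsblk -o NAME,SIZE,SERIAL,WWN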

ZFS mounted my pool at /rust. I could have provided a different mount point using the -m flag. To get more information on how to work with zpool, just run man zpool; the man page documentation is really good.
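For example, a hypothetical variant of the command above with a custom mount point (not what I actually ran) would look like this:

zpool create -m /mnt/storage rust mirror \
/dev/disk/by-id/wwn-0x5000c500db125622 \
/dev/disk/by-id/wwn-0x5000c500db13fa95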

Now it’s time to create some datasets.

zfs datasets

As soon as the pool is created and (auto)mounted, it’s perfectly usable by itself. I could create all the directories under /rust and be done with it. But not all directories should have the same properties: some should be encrypted, some will store large files and benefit from larger block sizes, some will hold loads of frequently changed files and might benefit from disabled atime, and so on. And this is what datasets are for.
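Creating a dataset is a single command, and it shows up as a regular directory under the pool’s mount point (rust/exchange here is just one of the datasets from my layout below):

zfs create rust/exchange
ls /rust   # exchange is now there, as a separate dataset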

Some of the most useful properties you can set on the zfs dataset are:

  • compression
  • recordsize
  • quota and reservation
  • atime
  • sync
  • copies
  • encryption

The zfsprops(7) man page will tell you much more about all the different options and how they work than I can in a single article. If you are curious how I configured my datasets, here it is, all my datasets and options:

# zfs get all -s local
NAME            PROPERTY     VALUE   SOURCE
rust            compression  lz4     local
rust            xattr        sa      local
rust            dedup        off     local
rust            acltype      posix   local
rust            relatime     on      local
rust/db/data    recordsize   16K     local
rust/documents  copies       2       local
rust/documents  keylocation  prompt  local
rust/downloads  recordsize   4K      local
rust/encrypted  keylocation  prompt  local
rust/media      recordsize   1M      local

Keep in mind that nested datasets inherit from their parent, e.g. I have rust/docker, which simply inherits all of rust’s properties:

# zfs get all rust/docker -s local,inherited
NAME         PROPERTY     VALUE  SOURCE
rust/docker  compression  lz4    inherited from rust
rust/docker  xattr        sa     inherited from rust
rust/docker  dedup        off    inherited from rust
rust/docker  acltype      posix  inherited from rust
rust/docker  relatime     on     inherited from rust
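The flip side of inheritance: if you’ve set a property locally and later want to fall back to the inherited (or default) value, zfs inherit clears the local setting. Purely as an illustration (I’m keeping my 4K setting on rust/downloads):

zfs inherit recordsize rust/downloads   # would drop the local 4K value and fall back
zfs inherit -r atime rust               # -r applies the reset recursively down the tree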

Typically, you want encryption=on on all datasets that store sensitive data, atime=off on all datasets unless you actually need access times recorded, and recordsize=1M (or larger) on datasets that mostly store large files (e.g. movies). And you definitely want compression: it comes with so little overhead that there is virtually no reason not to enable it at the top level (in my case, rust).
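One caveat worth knowing: encryption can only be enabled when a dataset is created, not later with zfs set. A rough sketch of how datasets like my rust/encrypted and rust/documents could be created (the exact flags are my reconstruction, not a copy of what I ran; with keyformat=passphrase the keylocation defaults to prompt):

zfs create -o encryption=on -o keyformat=passphrase rust/encrypted
# copies=2 keeps two copies of every block, on top of the mirror redundancy
zfs create -o encryption=on -o keyformat=passphrase -o copies=2 rust/documents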

If you came across dedup and are thinking it might be a great option to turn on, think twice. Deduplication comes with tremendous overhead and is generally not advised.

Definitely read “OpenZFS deduplication is good now and you shouldn’t use it” first!

Working with datasets’ options

You can specify options using the -o flag when creating a dataset, e.g.:

zfs create -o recordsize=1M rust/media

or using zfs set:

zfs set recordsize=16K rust/db

To read the options of a specific dataset (or all datasets), use zfs get <property> <dataset>. The command supports a few other flags; feel free to explore them with man zfs-get. My personal most-used commands are:

zfs get all
zfs get all -s local
zfs get all <dataset>
zfs get all -s local <dataset>

And finally another useful command you should know about is listing all datasets. It’s as easy as:

zfs list

Once again, man zfs-list has all the information you need to know about using the command.
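A couple of variations I find handy (the column names go to zfs list’s -o flag, and -r lists children recursively):

zfs list -r rust
zfs list -o name,used,avail,mountpoint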

Home server file structure

I ended up using the following file structure:

# tree /rust -L 1
/rust
├── db
├── docker
├── documents
├── downloads
├── encrypted
├── exchange
└── media

This should be fairly self-explanatory. All the Docker-mounted volumes are in rust/docker; the only exception is the databases’ data and log volumes.

# zfs get all rust/db/data -s local
NAME          PROPERTY    VALUE  SOURCE
rust/db/data  recordsize  16K    local

The 16K recordsize exactly matches MySQL’s page size, and I mainly use MySQL. I also have a couple of PostgreSQL instances storing data in the same dataset; PostgreSQL’s page size is 8K, so a 16K recordsize is sub-optimal for it, but I was too lazy to create a dataset for Postgres alone. Another option was to set recordsize to 8K and make MySQL use 8K pages via the innodb_page_size option, as per the MySQL documentation:

Each tablespace consists of database pages. Every tablespace in a MySQL instance has the same page size. By default, all tablespaces have a page size of 16KB; you can reduce the page size to 8KB or 4KB by specifying the innodb_page_size option when you create the MySQL instance. You can also increase the page size to 32KB or 64KB. For more information, refer to the innodb_page_size documentation.

But I was too lazy even for that approach xD
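For completeness, the approach I skipped would have looked roughly like this: an 8K recordsize on the ZFS side, and a matching page size chosen when the MySQL instance is first initialised (it can’t be changed afterwards):

zfs set recordsize=8K rust/db/data
# and in my.cnf, before initialising the data directory:
#   [mysqld]
#   innodb_page_size=8k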

scrub

One of the coolest features of ZFS is its ability to find and fix data corruption automagically, thanks to the scrub function. A scrub is a thorough check of all data in the ZFS pool, verifying that each block is consistent with its checksum. If corruption is detected, ZFS can attempt to repair it using redundant copies in the mirror or parity data in RAID-Z configurations.

To start manual scrub on a zpool, use the following command:

zpool scrub rust # or whatever your pool name is

This initiates a scrub of the entire rust pool, scanning every block for errors. For larger pools, this process may take some time, but ZFS is designed to run scrubs in the background with minimal performance impact on other tasks.

Automating scrubs

It’s a good idea to automate scrubs so you don’t forget to run them regularly. Depending on your server’s setup, you can add a cron job or systemd timer to automatically initiate a scrub, for example, every month. Here’s a simple cron entry to run a monthly scrub on the first day of each month at 3:00 AM:

0 3 1 * * /sbin/zpool scrub rust

This cron job helps keep your data safe with regular, automated checks, so any data issues are quickly caught and corrected.
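If you prefer systemd timers over cron, a minimal sketch looks like this (the zfs-scrub unit names are my own, not something ZFS ships; some distros also provide their own scrub timer or cron entry, so check before adding another):

cat <<'EOF' | sudo tee /etc/systemd/system/zfs-scrub.service
[Unit]
Description=Scrub the rust zpool

[Service]
Type=oneshot
ExecStart=/sbin/zpool scrub rust
EOF

cat <<'EOF' | sudo tee /etc/systemd/system/zfs-scrub.timer
[Unit]
Description=Monthly scrub of the rust zpool

[Timer]
OnCalendar=monthly
Persistent=true

[Install]
WantedBy=timers.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now zfs-scrub.timer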

Monitoring scrub status

To check the status of a scrub, use:

zpool status rust

The output is plain English, so it’s easy to read:

pool: rust
state: ONLINE
scan: scrub in progress since Tue Nov 5 00:51:58 2024
511G scanned at 10.7G/s, 6.28G issued at 134M/s, 2.24T total
0B repaired, 0.27% done, 04:51:54 to go

Once the scrub is complete, the output will show if any data errors were detected and whether they were successfully repaired.
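For a quick health check across all pools, the -x flag prints just a single line when everything is fine, which makes it handy in scripts or login messages:

zpool status -x   # prints "all pools are healthy" when there is nothing to worry about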

Wrapping up

With your ZFS pool set up, datasets configured, and a regular scrub schedule in place, you’ve created a robust foundation for your home server. ZFS’s combination of redundancy, compression, and data integrity checks makes it a powerful tool for protecting your data long-term.

There’s a lot I didn’t cover about ZFS in this article, but what I did cover should give you a solid starting point for setting up a reliable storage solution. The essentials (pool creation, dataset configuration, and regular scrubbing) are the backbone of any ZFS setup, providing data protection and consistency with minimal maintenance.

Just to give you a hint on what’s not covered (and a good refresher for myself), here is a list:

  • Performance optimisation. We briefly touched on fine-tuning recordsize and compression, but one thing I haven’t mentioned is using an NVMe SSD for caching (which I haven’t implemented myself yet, but am looking forward to).
  • Snapshots and Clones. These let you create point-in-time copies of datasets. Snapshots are invaluable for rolling back accidental changes and for creating space-efficient clones for testing.
  • Replication. ZFS replication sends datasets to a remote server for an extra layer of backup and disaster recovery. I’ll write a separate article about this once I get around to backing up my valuable datasets off-site.
  • Encryption. I briefly touched on encryption in this article, but there’s more to explore.

To explore ZFS further, I’d recommend this article on the Arch Linux wiki.

That’s it for ZFS. In the next article, I’ll show what software I run on my home server.

This article is part of the Let's self-host! series:
  1. Building a NAS / Home Server
  2. Choosing Home Server OS and setting up ansible-nas
  3. Redundancy with ZFS