In the last part of our self-hosting journey, we set up the basics of our home server with Ansible-NAS. Now, it’s time to dig deeper into redundancy and reliability, the core pillars of any storage solution that aims to protect data long-term. This article will focus on using ZFS for creating a redundant and resilient storage setup.
What is ZFS?
ZFS (Zettabyte File System) is popular in home labs and enterprise solutions alike, and for good reason. ZFS brings a unique combination of redundancy, performance, and data integrity checks, making it ideal for home server environments where data protection is crucial. With ZFS, you can set up various RAID configurations (mirroring and RAID-Z) and use snapshots to back up and restore your data efficiently. Before we dive into setting up ZFS, let’s understand what RAID is.
What is RAID?
RAID (Redundant Array of Independent Disks) is a technology that combines multiple hard drives into a single logical unit to improve redundancy, performance, or both. There are several RAID levels, each with its own advantages and disadvantages. The most common RAID levels are:
- RAID 0: Striping without Redundancy
- RAID 1: Mirroring
- RAID 5: Striping with Parity
- RAID 6: Striping with Double Parity
- RAID 10: Mirroring and Striping
And ZFS actually adds another level, sometimes called RAID 7, which is named RAID-Z3 in ZFS terms. It’s similar to RAID 6 but with triple parity.
I will not go into details explaining each RAID level, especially because there is a perfect wiki article about it.
But I’ll list some properties of each RAID level for you to understand the differences:
RAID Level | ZFS name | Minimum Disks | Fault Tolerance | Read Performance | Write Performance | Restore Performance |
---|---|---|---|---|---|---|
RAID 0 | Striped | 2 | 0 | High | High | n/a |
RAID 1 | Mirror | 2 | 1 | High | Low | High |
RAID 5 | RAID-Z1 | 3 | 1 | High | Medium | Low |
RAID 6 | RAID-Z2 | 4 | 2 | High | Low | Low |
RAID 7 | RAID-Z3 | 5 | 3 | High | Low | Low |
RAID 10 | Striped mirrors | 4 | 1 per mirror | High | High | High |
Fault tolerance means how many disks can fail without losing data. Read, write, and restore performance are self-explanatory.
The obvious choices for a NAS are RAID 1 and RAID 6 (I say RAID 6 rather than RAID 5 because RAID 5 seems simply inferior: a single disk of fault tolerance isn’t much for an array of that size). RAID 1 is simple and provides good read performance, while RAID 6 provides more fault tolerance and better storage efficiency as you add disks. BUT RAID 6 is more complex and requires more disks. More disks consume more energy and generate more heat and noise. On top of that, if a disk fails in a mirror, you can just replace it and rebuild by copying all the blocks from the healthy disk. In RAID 6 the parity has to be recalculated and written to the new disk, which can take a long time, and you’d better keep your fingers crossed that no other disk fails during the rebuild.
Yes, RAID 6 does give you better fault tolerance, but I’ll do backups anyway, so I don’t need that much of it. I prefer the simplicity of RAID 1 and the convenience of buying fewer disks, each with more capacity, which is usually cheaper ($ per TB).
How to ZFS
What I like about ZFS is its simplicity for the end user. ZFS provides two utilities: `zpool` and `zfs`. One is for managing pools, the other for managing datasets.
zpool
Creating a ZFS pool is super easy; all it takes is the bare minimum of information: the name of the pool (which by default also acts as the mount point name), the RAID level (optional), and a list of devices. I created my mirror (RAID 1) with the following command:
```
zpool create rust mirror \
  /dev/disk/by-id/wwn-0x5000c500db125622 \
  /dev/disk/by-id/wwn-0x5000c500db13fa95
```
The reason I listed my devices by ID is that those names stay stable no matter which SATA port or cable the disks end up on. But I could totally do `zpool create rust mirror sda sdb` instead.
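If you’re not sure which stable name belongs to which disk, they all live under `/dev/disk/by-id` as symlinks to the current kernel device names, so a quick listing shows the mapping (the `grep` below just narrows it down to the WWN entries I used):

```
# Symlinks map stable IDs to the current kernel names (sda, sdb, ...)
ls -l /dev/disk/by-id/ | grep wwn
```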
ZFS mounted my pool at `/rust`. I could provide a different mount point using the `-m` flag. To get more information on how to work with `zpool`, just `man zpool`; the manpage documentation is really good.
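For example, if I had wanted the pool mounted somewhere other than `/rust`, the same command with `-m` would do it (the `/mnt/storage` path below is just an illustration, not what I actually use):

```
zpool create -m /mnt/storage rust mirror \
  /dev/disk/by-id/wwn-0x5000c500db125622 \
  /dev/disk/by-id/wwn-0x5000c500db13fa95
```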
Now it’s time to create some datasets.
zfs datasets
As soon as the pool is created and (auto)mounted, it’s perfectly usable by itself. I could create all the directories under `/rust` and be done with it. But not all directories should have the same properties. Some should be encrypted, some will store large files and would benefit from larger block sizes, some will hold loads of frequently changed files and might benefit from disabled `atime`, etc. And this is what datasets are for.
Some of the most useful properties you can set on a ZFS dataset are:
- `compression`
- `recordsize`
- `quota` and `reservation`
- `atime`
- `sync`
- `copies`
- `encryption`
The zfsprops(7) manpage explains all the different options and how they work much better than I can in a single article. If you are curious how I configured my datasets, here they are, all my datasets and their options:
```
# zfs get all -s local
NAME            PROPERTY     VALUE   SOURCE
rust            compression  lz4     local
rust            xattr        sa      local
rust            dedup        off     local
rust            acltype      posix   local
rust            relatime     on      local
rust/db/data    recordsize   16K     local
rust/documents  copies       2       local
rust/documents  keylocation  prompt  local
rust/downloads  recordsize   4K      local
rust/encrypted  keylocation  prompt  local
rust/media      recordsize   1M      local
```
Keep in mind that all nested datasets inherit from their parent, e.g. I have `rust/docker`, which simply inherits all the `rust` properties:
```
# zfs get all rust/docker -s local,inherited
NAME         PROPERTY     VALUE  SOURCE
rust/docker  compression  lz4    inherited from rust
rust/docker  xattr        sa     inherited from rust
rust/docker  dedup        off    inherited from rust
rust/docker  acltype      posix  inherited from rust
rust/docker  relatime     on     inherited from rust
```
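Inheritance also works the other way around: if a dataset has a local override you no longer want, `zfs inherit` drops it and the parent’s value takes over again. A hypothetical example (not something I’ve actually needed yet):

```
# Drop the local recordsize override; rust/downloads falls back to rust's value
zfs inherit recordsize rust/downloads
```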
Typically, you want `encryption=on` on all datasets which store your sensitive data, `atime=off` on all datasets unless you actually want to record access times, and `recordsize=1M` (or larger) on all datasets which mostly store large files (e.g. movies). And you definitely want compression: it comes with such minimal overhead that there is virtually no reason not to enable it at the top level (in my case, `rust`).
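To make that concrete, here is a rough sketch of how those recommendations translate into commands. The dataset names match my layout from above, but treat the commands as illustrative rather than the exact ones I ran; note that encryption in particular can only be chosen at creation time:

```
# Compression is cheap: enable it once at the top and let children inherit it
zfs set compression=lz4 rust

# Skip access-time updates unless you actually need them
zfs set atime=off rust

# Datasets holding mostly large files benefit from a bigger recordsize
zfs set recordsize=1M rust/media

# Encryption cannot be turned on for an existing dataset; set it at creation
zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt rust/encrypted
```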
If you came across `dedup` and are thinking it might be a great option to turn on, think twice. Deduplication comes with tremendous overhead and is generally not advised.
Definitely read “OpenZFS deduplication is good now and you shouldn’t use it” first!
Working with datasets’ options
You can specify options using the `-o` flag when creating a dataset, e.g.
zfs create -o recordsize=1M rust/media
or using `zfs set`:
zfs set recordsize=16K rust/db
To get the value of a property on a specific dataset (or on all of them), use `zfs get <property> <dataset>`. The command supports some other flags; feel free to explore them using `man zfs-get`. My personal most-used commands are:
```
zfs get all
zfs get all -s local
zfs get all <dataset>
zfs get all -s local <dataset>
```
And finally another useful command you should know about is listing all datasets. It’s as easy as:
zfs list
Once again, `man zfs-list` has all the information you need to know about using the command.
Home server file structure
I ended up using the following file structure:
```
# tree /rust -L 1
/rust
├── db
├── docker
├── documents
├── downloads
├── encrypted
├── exchange
└── media
```
This should be fairly self-explanatory. All the Docker-mounted volumes are in `rust/docker`; the only exception is the databases’ volumes for data and logs:
```
# zfs get all rust/db/data -s local
NAME          PROPERTY    VALUE  SOURCE
rust/db/data  recordsize  16K    local
```
The recordsize of `16K` exactly matches MySQL’s page size, and I mainly use MySQL. I also have a couple of PostgreSQL instances storing data in the same dataset, and PostgreSQL’s page size is `8K`, so `16K` blocks are sub-optimal for Postgres, but I was too lazy to create a dataset for Postgres alone. Another option was to set the recordsize to `8K` and make MySQL use `8K` blocks via the `innodb_page_size` option, as per the MySQL documentation:
> Each tablespace consists of database pages. Every tablespace in a MySQL instance has the same page size. By default, all tablespaces have a page size of 16KB; you can reduce the page size to 8KB or 4KB by specifying the `innodb_page_size` option when you create the MySQL instance. You can also increase the page size to 32KB or 64KB. For more information, refer to the `innodb_page_size` documentation.
But I was too lazy even for that approach xD
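For completeness, here is roughly what that alternative would have looked like; I didn’t actually do this, so treat it as a sketch:

```
# Match the dataset recordsize to an 8K InnoDB page size
# (recordsize changes only apply to newly written blocks)
zfs set recordsize=8K rust/db/data

# my.cnf would then need the following *before* the data directory
# is initialized (innodb_page_size cannot be changed afterwards):
#   [mysqld]
#   innodb_page_size=8k
```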
scrub
One of the coolest features of ZFS is its ability to find and fix data corruption automagically, thanks to the scrub function. A scrub is a thorough check of all data in the ZFS pool, ensuring that each block of data is consistent with its checksum. If corruption is detected, ZFS can attempt to repair it using the redundant copies in a mirror or the parity data in RAID-Z configurations.
To start a manual scrub on a zpool, use the following command:
zpool scrub rust # or whatever your pool name is
This initiates a scrub of the entire rust pool, scanning every block for errors. For larger pools, this process may take some time, but ZFS is designed to run scrubs in the background with minimal performance impact on other tasks.
Automating scrubs
It’s a good idea to automate scrubs so you don’t forget to run them regularly. Depending on your server’s setup, you can add a cron job or systemd timer to automatically initiate a scrub, for example, every month. Here’s a simple cron entry to run a monthly scrub on the first day of each month at 3:00 AM:
0 3 1 * * /sbin/zpool scrub rust
This cron job helps keep your data safe with regular, automated checks, so any data issues are quickly caught and corrected.
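If you prefer a systemd timer over cron, a minimal sketch could look like this (the unit names are my own invention; run it as root and adjust the pool name to yours):

```
cat > /etc/systemd/system/zfs-scrub-rust.service <<'EOF'
[Unit]
Description=Scrub ZFS pool rust

[Service]
Type=oneshot
ExecStart=/sbin/zpool scrub rust
EOF

cat > /etc/systemd/system/zfs-scrub-rust.timer <<'EOF'
[Unit]
Description=Monthly scrub of ZFS pool rust

[Timer]
OnCalendar=monthly
Persistent=true

[Install]
WantedBy=timers.target
EOF

systemctl daemon-reload
systemctl enable --now zfs-scrub-rust.timer
```

`Persistent=true` makes sure a missed run (say, the server was off at the scheduled time) is executed the next time the machine is up.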
Monitoring scrub status
To check the status of a scrub, use:
zpool status rust
The output is plain English, so it’s easy to read:
```
  pool: rust
 state: ONLINE
  scan: scrub in progress since Tue Nov  5 00:51:58 2024
        511G scanned at 10.7G/s, 6.28G issued at 134M/s, 2.24T total
        0B repaired, 0.27% done, 04:51:54 to go
```
Once the scrub is complete, the output will show if any data errors were detected and whether they were successfully repaired.
Wrapping up
With your ZFS pool set up, datasets configured, and a regular scrub schedule in place, you’ve created a robust foundation for your home server. ZFS’s combination of redundancy, compression, and data integrity checks makes it a powerful tool for protecting your data long-term.
There’s a lot I didn’t cover about ZFS in this article, but what I did cover should give you a solid starting point for setting up a reliable storage solution. The essentials—pool creation, dataset configuration, and regular scrubbing—are the backbone of any ZFS setup, providing data protection and consistency with minimal maintenance.
Just to give you a hint on what’s not covered (and a good refresher for myself), here is a list:
- Performance optimisation. We briefly touched on fine-tuning `recordsize` and `compression`, but what I haven’t mentioned is adding an NVMe SSD for caching (which I haven’t implemented myself yet, but am looking forward to).
- Snapshots and clones. These let you create point-in-time backups of datasets. Snapshots are invaluable for rolling back accidental changes and for creating space-efficient clones for testing.
- Replication. ZFS replication sends datasets to a remote server for an extra layer of backup and disaster recovery. I’ll write a separate article about this once I get around to backing up my valuable datasets off-site.
- Encryption. I briefly touched on encryption in this article, but there’s more to explore.
To explore ZFS further, I’d recommend this article on the Arch Linux wiki.
That’s it for ZFS. In the next article, I’ll show what software I run on my home server.