ZFS Notes
========================

These are some notes about working with OpenZFS (mainly on MacOSX,
but they should apply generally to any system using OpenZFS).

These notes use the standard Bourne shell convention of '$' for the
shell prompt for a non-root / unprivileged user and assume that this
user uses the 'sudo' command to execute commands with the necessary
privileges.  When working as a privileged user (e.g. root), sudo may
be omitted.  The prompt is not part of the commands listed and may
be different (for example, csh/tcsh historically used '%' as the
prompt).

These notes refer to a conventional spinning disk as a hard drive
and a drive based on non-volatile storage (such as flash) as a
solid state drive or ssd.

-------------------
High Level Concepts
-------------------

There are two important high level concepts when working with ZFS.
The first is the pool (or zpool), which is a group of one or more
disks.  The second is the dataset, which is like a conventional
filesystem.

----------------
Creating a zpool
----------------

To work with ZFS, a pool of one or more disks (called a 'zpool')
needs to first be created.

1. For a single hard disk:

    $ sudo zpool create -f -o ashift=12 \
                           -o feature@async_destroy=enabled \
                           -o feature@empty_bpobj=enabled \
                           -o feature@lz4_compress=enabled \
                           -o feature@spacemap_v2=enabled \
                           -O casesensitivity=insensitive \
                           -O normalization=formD \
                           -O compression=lz4 \
                           [name] [device]

   where [name] is the name of the pool, and [device] is the operating
   system device identifier for the hard disk.

   For example, if the hard drive is /dev/disk5 (as shown by '$ diskutil
   list' on MacOSX), then the following command will create a pool named
   'Mercury' on that hard drive:

    $ sudo zpool create -f -o ashift=12 \
                           -o feature@async_destroy=enabled \
                           -o feature@empty_bpobj=enabled \
                           -o feature@lz4_compress=enabled \
                           -o feature@spacemap_v2=enabled \
                           -O casesensitivity=insensitive \
                           -O normalization=formD \
                           -O compression=lz4 \
                           Mercury disk5

   The '-O casesensitivity=insensitive' and '-O normalization=formD' are
   optional, but helpful on MacOSX because they make the pool act more
   like a standard Mac disk, see:

   https://openzfsonosx.org/wiki/Zpool#Feature_flags

   Note, this command also enables compression for the zpool, which I
   prefer.  For more information about ZFS compression, see:

   https://klarasystems.com/articles/openzfs1-understanding-transparent-compression/

   To enable encryption for the entire pool, add '-O encryption=aes-256-ccm
   -O keyformat=passphrase' to the list of options, see:

   https://openzfsonosx.org/forum/viewtopic.php?f=11&t=3713#p11803

2. To combine multiple hard drives and make them appear as one
   larger drive:

    $ sudo zpool create -f -o ashift=12 \
                           -o feature@async_destroy=enabled \
                           -o feature@empty_bpobj=enabled \
                           -o feature@lz4_compress=enabled \
                           -o feature@spacemap_v2=enabled \
                           -O casesensitivity=insensitive \
                           -O normalization=formD \
                           -O compression=lz4 \
                           [name] [device1] ... [deviceN]

   where [name] is the name of the pool, and [device1] through [deviceN]
   are the operating system device identifiers for the hard disks being
   combined.

   This is the equivalent of a RAID 0 volume and offers no protection
   against drive failure, see:

   https://en.m.wikipedia.org/wiki/Standard_RAID_levels

   I do not generally use this.

3. To mirror the contents of one drive to multiple other drives:

    $ sudo zpool create -f -o ashift=12 \
                           -o feature@async_destroy=enabled \
                           -o feature@empty_bpobj=enabled \
                           -o feature@lz4_compress=enabled \
                           -o feature@spacemap_v2=enabled \
                           -O casesensitivity=insensitive \
                           -O normalization=formD \
                           -O compression=lz4 \
                           [name] mirror [device1] ... [deviceN]

   where [name] is the name of the pool, and [device1] through [deviceN]
   are the operating system device identifiers for the hard disks being
   combined.  The 'mirror' keyword between [name] and [device1] through
   [deviceN] tells zfs to keep an identical copy of the data on every
   disk.

   This is the equivalent of a RAID 1 volume and preserves data as long
   as at least one drive is operational, see:

   https://en.m.wikipedia.org/wiki/Standard_RAID_levels

4. When the disks in the zpool are ssds, use '-o ashift=13' instead of
   the '-o ashift=12' used above.  An ashift value of 13 may be better
   for ssds to avoid performance degradation over time, see, e.g.:

   https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Workload%20Tuning.html#alignment-shift-ashift
   https://old.reddit.com/r/zfs/comments/6duz3j/zfs_on_a_purely_ssd_setup/

   In addition, for ssds, TRIM support should be enabled:

    $ sudo zpool set autotrim=on [name]

   where [name] is the name of the pool.  TRIM is an ssd-specific feature
   that helps limit performance degradation of ssds, see:

   https://en.m.wikipedia.org/wiki/Trim_(computing)

   For SMR hard drives, TRIM support should be disabled:

    $ sudo zpool set autotrim=off [name]

   See: https://vermaden.wordpress.com/2022/05/08/zfs-on-smr-drives/
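
The ashift value is the base-2 logarithm of the sector size that ZFS
assumes for the disks in the pool, so ashift=12 means 4096-byte sectors
and ashift=13 means 8192-byte sectors.  A minimal sketch of that
arithmetic (the function name 'ashift_to_bytes' is made up for this
example and is not part of ZFS):

```shell
# Convert an ashift value to the sector size in bytes (2^ashift).
# 'ashift_to_bytes' is a hypothetical helper, not a ZFS command.
ashift_to_bytes() {
    echo $((1 << $1))
}

ashift_to_bytes 12    # prints 4096
ashift_to_bytes 13    # prints 8192
```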

----------------------------
Mounting / Importing a zpool
----------------------------

Once created, a zpool can be mounted as follows:

    $ sudo zpool import [name]

For example, the following command imports the zpool named 'Mercury'
created above:

    $ sudo zpool import Mercury

By default, a zpool is mounted in read-write mode, so that data
can be read from and written to the disk(s) in the zpool.  To mount
a zpool as read-only so that data can only be read from but no data
can be written to it (for example to avoid data corruption), the
following command can be used:

    $ sudo zpool import -o readonly=on [name]

For example, the following command imports the zpool named 'Mercury'
created above in read-only mode:

    $ sudo zpool import -o readonly=on Mercury
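
After importing, it can be useful to confirm which mode the pool is
actually in.  The following is a small sketch, assuming the pool is
already imported; the function name 'pool_readonly' is made up for
this example:

```shell
# Print "on" if the pool was imported read-only, "off" otherwise.
# 'pool_readonly' is a hypothetical helper; it only wraps 'zpool get'.
pool_readonly() {
    zpool get -H -o value readonly "$1"
}

# Example:
#   pool_readonly Mercury
```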

-------------------------------
Disabling Spotlight for a zpool
-------------------------------

On MacOSX, Spotlight is a built-in desktop search feature that
I do not like enabled on zpools.  It can be disabled for zpools
as follows:

    $ sudo mdutil -i off /Volumes/[name]
    $ sudo mkdir -p /Volumes/[name]/.fseventsd
    $ sudo touch /Volumes/[name]/.fseventsd/no_log
    $ sudo touch /Volumes/[name]/.metadata_never_index

where [name] is the zpool's name.

Sources: https://openzfsonosx.org/wiki/Stopping_Spotlight_etc._from_changing_ZFS_without_permission
         https://apple.stackexchange.com/questions/6707/

----------------------------------
Listing Available / Mounted zpools
----------------------------------

Currently available / mounted zpools can be listed as follows:

    $ zpool list

-----------------------------------------
Unmounting / Ejecting / Exporting a zpool
-----------------------------------------

If a zpool is mounted, it can be unmounted / ejected as follows:

    $ sudo zpool export [name]

All mounted zpools can be ejected as follows:

    $ sudo zpool export -a

----------------
Renaming a zpool
----------------

Once a pool has been created, it can be renamed by first exporting
the pool (if it is currently mounted), and then reimporting it with
its new name:

    $ sudo zpool export [old name]
    $ sudo zpool import [old name] [new name]

where [old name] is the zpool's current name, and [new name] is the
zpool's new desired name.

For example, the following commands export the zpool Mercury and
rename it to Mercury7:

    $ sudo zpool export Mercury
    $ sudo zpool import Mercury Mercury7

Source: https://forums.freebsd.org/threads/renaming-zfs-pool-via-zpool-import.65498/

-----------------
Scrubbing a zpool
-----------------

The data integrity on a zpool can be confirmed by 'scrubbing' it.

1. To start a scrub:

    $ sudo zpool scrub [name]

2. To monitor a scrub that is in progress:

    $ zpool status [secs]

   where [secs] is the refresh interval in seconds

3. To stop / cancel a scrub that is in progress:

    $ sudo zpool scrub -s [name]

4. To pause a scrub that is in progress:

    $ sudo zpool scrub -p [name]

5. To resume a paused scrub (this is the same command used to start a
   new scrub):

    $ sudo zpool scrub [name]

In each of the above commands, [name] is the zpool's name.

Source: https://openzfs.github.io/openzfs-docs/man/8/zpool-scrub.8.html
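
When a scrub is part of a larger script, it may help to wait for it to
finish before doing anything else.  The following is a rough sketch
that polls 'zpool status' and relies on the "scrub in progress" text
in its output; the function names and the 60 second default are
assumptions made for this example.  A paused scrub is not reported as
"in progress", so the loop also ends when a scrub is paused.

```shell
# Return success (0) while a scrub is running on the given pool.
# Relies on the "scrub in progress" text printed by 'zpool status'.
scrub_in_progress() {
    zpool status "$1" | grep -q "scrub in progress"
}

# Poll until the scrub is no longer running.
#   $1 is the pool name, $2 is the polling interval (default: 60 secs)
wait_for_scrub() {
    while scrub_in_progress "$1"; do
        sleep "${2:-60}"
    done
}

# Example:
#   sudo zpool scrub Mercury && wait_for_scrub Mercury 300
```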

-------------------------------------------------
Creating an Encrypted Dataset / Volume on a zpool
-------------------------------------------------

An encrypted volume may be created on a zpool as follows:

    $ sudo zfs create -o com.apple.browse=off \
                      -o com.apple.mimic_hfs=on \
                      -o encryption=on \
                      -o keylocation=prompt \
                      -o keyformat=passphrase \
                      [pool]/[vol]

where [pool] is the zpool's name and [vol] is the name of the
encrypted volume.  This command will prompt for the password
for the encrypted volume.

For example, the following command creates an encrypted volume
named 'crypto' on the zpool Mercury, which was created above:

    $ sudo zfs create -o com.apple.browse=off \
                      -o com.apple.mimic_hfs=on \
                      -o encryption=on \
                      -o keylocation=prompt \
                      -o keyformat=passphrase \
                      Mercury/crypto

The '-o com.apple.browse=off' and '-o com.apple.mimic_hfs=on'
options are useful for making the encrypted volume act like a
traditional MacOS / MacOSX drive but may be omitted, if that
compatibility is not needed.

Source: https://blog.heckel.io/2017/01/08/zfs-encryption-openzfs-zfs-on-linux/
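
After the pool has been exported and imported again, the key for an
encrypted volume has to be loaded before the volume can be mounted.
The following is a minimal sketch of that step; the function names are
made up for this example, and it assumes the volume is not already
mounted:

```shell
# Print the key status of a dataset: "available" when the key is
# loaded, "unavailable" when it is not.
key_status() {
    zfs get -H -o value keystatus "$1"
}

# Load the key (this prompts for the passphrase when
# keylocation=prompt) if it is not loaded yet, then mount the dataset.
unlock_and_mount() {
    if [ "$(key_status "$1")" != "available" ]; then
        sudo zfs load-key "$1" || return 1
    fi
    sudo zfs mount "$1"
}

# Example:
#   unlock_and_mount Mercury/crypto
```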

------------------------------------
Enable sha512 Checksums for a Volume
------------------------------------

To enable SHA512 checksums for data on a volume / filesystem:

    $ sudo zfs set checksum=sha512 [pool]/[vol]

where [pool] is the zpool's name and [vol] is the name of the
volume / filesystem on that zpool.

SHA512 checksums are about 50% faster than sha256 on 64-bit systems
(the comparison in the man page is against sha256, not against the
default fletcher4 checksum), see:

https://www.freebsd.org/cgi/man.cgi?query=zpool-features&sektion=7

Alternatively, 'checksum=skein', which is about 80% faster than sha256
on 64-bit systems, or 'checksum=edonr', which is about 350% faster than
sha256, could be used.  I do not use 'checksum=edonr' because FreeBSD
does not currently support it.  I may transition to 'checksum=skein'
in the future since it may be slightly more secure than sha512, see:

https://www.freebsd.org/cgi/man.cgi?query=zpool-features&sektion=7

------------------------------------------------
Increasing / Decreasing Record Size for a Volume
------------------------------------------------

For volumes where large files are going to be stored, increasing the
record size from the default of 128K may provide some performance
benefits.

The record size can be set as follows:

    $ sudo zfs set recordsize=[size] [pool]/[vol]

where:

    [size] is the desired maximum record size
    [pool] is the zpool's name
    [vol]  is the name of the volume / filesystem

For example, the following command sets the record size for the
'crypto' encrypted volume on the Mercury zpool created above to
1 megabyte (1M):

    $ sudo zfs set recordsize=1M Mercury/crypto

If very small files are going to be stored on a volume, decreasing
the record size may provide some performance benefits.  For example,
the following command would reduce the record size to 64 kilobytes
(64K):

    $ sudo zfs set recordsize=64K Mercury/crypto

A few rules of thumb for setting the record size are:

    1024K for general-purpose file sharing/storage
      64K for KVM virtual machines using Qcow2 file-based storage
      16K for MySQL InnoDB
       8K for PostgreSQL

Source: https://blog.programster.org/zfs-record-size
        https://klarasystems.com/articles/tuning-recordsize-in-openzfs/
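
To get a feel for what the record size means for a large file, the
number of records is roughly the file size divided by the record size,
rounded up.  The following sketch does that arithmetic; it ignores
compression and metadata, and the function name 'records_for' is made
up for this example:

```shell
# Number of records needed for a file: ceil(file_bytes / record_bytes).
# 'records_for' is a hypothetical helper, not a ZFS command.
records_for() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

records_for 1073741824 131072     # 1 GiB file, 128K records: prints 8192
records_for 1073741824 1048576    # 1 GiB file, 1M records:   prints 1024
```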

-------------------------------
Mounting All Volumes on a zpool
-------------------------------

The following command can be used to mount all the volumes on a zpool:

    $ zfs list -rH -o name [pool] | xargs -L 1 sudo zfs mount

where [pool] is the name of the zpool.

Source: https://serverfault.com/questions/450818/recursively-mounting-zfs-filesystems

-----------------------------------------
Destroying a Volume/Filesystem on a zpool
-----------------------------------------

To destroy a volume / filesystem on a zpool:

    $ sudo zfs destroy [pool]/[vol]

where:

    [pool] is the zpool's name
    [vol]  is the name of the volume / filesystem that should be destroyed

WARNING: BE CAREFUL - THERE IS NO PROMPT FOR CONFIRMATION
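
Since zfs itself does not ask for confirmation, a small wrapper can
add that step.  The following sketch first does a dry run with the
'-n' and '-v' options of 'zfs destroy' to show what would be removed,
and only destroys the target after a 'y' answer.  The function name
'destroy_with_confirm' is made up for this example:

```shell
# Show what would be destroyed (dry run), then ask before destroying.
# 'destroy_with_confirm' is a hypothetical helper, not a ZFS command.
destroy_with_confirm() {
    sudo zfs destroy -n -v "$1" || return 1
    printf 'Really destroy %s? [y/N] ' "$1"
    read -r answer
    case "$answer" in
        y|Y) sudo zfs destroy "$1" ;;
        *)   echo "Aborted."; return 1 ;;
    esac
}

# Example:
#   destroy_with_confirm Mercury/crypto
```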

------------------
Creating Snapshots
------------------

To create a snapshot:

    $ zfs snapshot [pool]/[vol]@[snap]

where:

    [pool] is the zpool's name
    [vol]  is the name of the volume / filesystem for which a snapshot
           should be created
    [snap] is the name for the snapshot (ex. 2021-11-05-01)
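
The example snapshot name above follows a date plus sequence number
pattern.  The following sketch generates names in that form from
today's date; the function name 'snapshot_name' is made up for this
example:

```shell
# Build a snapshot name of the form YYYY-MM-DD-NN, where NN is a
# two-digit sequence number for the day (default: 1).
snapshot_name() {
    printf '%s-%02d\n' "$(date +%Y-%m-%d)" "${1:-1}"
}

# Example:
#   zfs snapshot "Mercury/crypto@$(snapshot_name)"
```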

------------------
Renaming Snapshots
------------------

To rename a snapshot:

    $ zfs rename [pool]/[vol]@[old_name] [new_name]

where:

    [pool]     is the zpool's name
    [vol]      is the name of the volume / filesystem on which the snapshot
               that is being renamed is located
    [old_name] is the current name for the snapshot (ex. 2021-11-05-01)
    [new_name] is the new name for the snapshot

--------------------
Destroying Snapshots
--------------------

To destroy a snapshot:

    $ zfs destroy [pool]/[vol]@[snap]

where:

    [pool] is the zpool's name
    [vol]  is the name of the volume / filesystem whose snapshot
           should be destroyed
    [snap] is the name for the snapshot (ex. 2021-11-05-01)

WARNING: BE CAREFUL - THERE IS NO PROMPT FOR CONFIRMATION

Source: https://www.freebsd.org/doc/handbook/zfs-zfs.html
        https://serverfault.com/questions/192927/

-----------------
Listing Snapshots
-----------------

1. To list all snapshots:

    $ zfs list -t snapshot

    or

    $ zfs list -t all

2. To list the latest snapshot on each volume whose name contains
   'enc/' (adjust the awk pattern to match other volumes):

    $ zfs list | awk '/enc\// { print $1; }' | \
      while read VOL ; do \
          zfs list -H -t snapshot "$VOL" | \
          sort -rn | head -1 | awk '{ print $1; }'; \
      done
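
For a single volume, an alternative sketch is to let zfs sort the
snapshots by creation time, so the result does not depend on how the
snapshots are named.  The function name 'latest_snapshot' is made up
for this example:

```shell
# Print the most recently created snapshot of one dataset.
#   -t snapshot  list snapshots only
#   -s creation  sort by creation time, oldest first
#   -d 1         do not descend into child datasets
latest_snapshot() {
    zfs list -H -t snapshot -o name -s creation -d 1 "$1" | tail -n 1
}

# Example:
#   latest_snapshot Mercury/crypto
```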

---------------------
Replicating Snapshots
---------------------

1. The first time a snapshot is being replicated, the entire snapshot must
   be sent, as follows:

    $ zfs send -ec -v [src_pool]/[src_vol]@[snap] | \
      zfs receive -s -v [dest_pool]/[dest_vol]

   where:

    [src_pool]  is the name of the zpool where the snapshot is located
    [src_vol]   is the name of the filesystem where the snapshot is located
    [snap]      is the snapshot's name (ex. 2021-11-05-01)
    [dest_pool] is the name of the zpool to which the snapshot will be
                replicated
    [dest_vol]  is the name of the filesystem to which the snapshot will
                be replicated

2. Once a snapshot has been replicated, incremental differences can be sent
   as follows:

    $ zfs send -ec -v -i [src_pool]/[src_vol]@[snap_old] \
                         [src_pool]/[src_vol]@[snap_new] | \
      zfs receive -s -v -F [dest_pool]/[dest_vol]

   where:

    [src_pool]  is the name of the zpool where the snapshot is located
    [src_vol]   is the name of the filesystem where the snapshot is located
    [snap_old]  is the starting snapshot's name (ex. 2021-11-05-01); this
                snapshot must exist on [dest_pool]/[dest_vol]
    [snap_new]  is the ending snapshot's name (ex. 2021-12-05-01)
    [dest_pool] is the name of the zpool to which the snapshot will be
                replicated
    [dest_vol]  is the name of the filesystem to which the snapshot will
                be replicated

3. If replication fails before completing, it can be resumed as follows:

    $ TOKEN="`zfs get -H receive_resume_token [dest_pool]/[dest_vol] | \
      awk '{ print $3;}'`"
    $ zfs send -ec -v -t "$TOKEN" | zfs receive -s -v [dest_pool]/[dest_vol]

   where:

    [dest_pool] is the name of the zpool to which the snapshot will be
                replicated
    [dest_vol]  is the name of the filesystem to which the snapshot will
                be replicated

   In some cases, it may be necessary to abort the incomplete replicated
   snapshot, which can be done as follows:

    $ zfs receive -A [dest_pool]/[dest_vol]

Source: https://www.freebsd.org/doc/handbook/zfs-zfs.html
        https://forums.freebsd.org/threads/incremental-zfs-backup-errors.44530/
        https://unix.stackexchange.com/questions/343675/
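
The resume step can be wrapped so that it only runs when there is
actually something to resume.  The following is a sketch under the
assumption that the 'receive_resume_token' property reads '-' when no
partial receive exists; the function names are made up for this
example:

```shell
# Print the resume token for a partially received dataset, or return
# a non-zero status if there is none (the property is "-" then).
resume_token() {
    token="$(zfs get -H -o value receive_resume_token "$1")"
    [ -n "$token" ] && [ "$token" != "-" ] && echo "$token"
}

# Resume an interrupted replication into the given destination
# dataset, if a resume token exists for it.
resume_replication() {
    token="$(resume_token "$1")" || {
        echo "Nothing to resume for $1"
        return 1
    }
    zfs send -ec -v -t "$token" | zfs receive -s -v "$1"
}

# Example:
#   resume_replication Backup/crypto
```

'Backup/crypto' in the example is only a placeholder for
[dest_pool]/[dest_vol].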

-------------
Helpful Links
-------------

ZFS For Dummies - https://ikrima.dev/dev-notes/homelab/zfs-for-dummies/

-------
History
-------

09/05/2023 - Add link to ZFS For Dummies
09/04/2023 - Convert to html; add notes about enabling encryption when
             creating a zpool
05/09/2022 - Add tips for SMR drives
04/22/2022 - Initial Version