EROFS Snapshotter

The EROFS snapshotter is a native containerd snapshotter to enable the EROFS filesystem, specifically to keep EROFS‑formatted blobs for each committed snapshot and to prepare an OverlayFS mount for each active snapshot.

In order to convert OCI container images directly into EROFS-formatted blobs, the EROFS differ must be specified with the EROFS snapshotter. Otherwise, if the walking differ is used, the EROFS snapshotter will behave much like the existing OverlayFS snapshotter: the applier of the walking differ will unpack the current layer into the active EROFS snapshot of the mounted OverlayFS, and the EROFS snapshotter will commit it into an EROFS-formatted blob, which is slower than using the EROFS differ. Also see the Configuration section.

Although the EROFS snapshotter sounds somewhat similar to an enhanced OverlayFS snapshotter, several kernel features are highly tied to the EROFS internals, so it would be better to leave it as an independent snapshotter. This way, existing OverlayFS users will not be impacted by the new EROFS‑specific behaviors, and interested users will also have a chance to use the EROFS filesystem and even develop the related ecosystems (such as ComposeFS, confidential containers, gVisor, Kata, nerdbox and more) together.

Use Cases

The EROFS snapshotter can benefit to several use cases:

For runC containers, instead of unpacking individual files into a directory on the backing filesystem, it applies OCI layers into EROFS blobs, therefore:

Improved image unpacking performance by live-converting tar archives into EROFS-formatted layers during unpacking, compared with directly unpacking to the host filesystem: By converting to EROFS-formatted layers, there is no extra filesystem metadata journal traffic during handling individual files and no need to remove a large number of files when GCing unused snapshots;
Here are the unpacking benchmark results against the OverlayFS snapshotter using containerd 2.2.1 (pulling from a local registry, without parallel unpacking):
Parallel unpacking is now supported natively, similar to the OverlayFS snapshotter. This capability is difficult to implement in disk‑snapshot‑style snapshotters such as blockfile, devmapper and ZFS snapshotters. It also uses an efficient method to persist layer data (via fsync) compared to the OverlayFS snapshotter, which can only use syncfs;
Better data persistence guarantee: compared to directly unpacking to the host filesystem, it provides better semantics by fsyncing the individual EROFS-formatted layer blobs instead of syncfsing the whole disk each time.
Full data protection for each snapshot using the FS_IMMUTABLE_FL file attribute and fsverity. EROFS uses FS_IMMUTABLE_FL and fsverity to protect each EROFS layer blob, ensuring the mounted tree remains immutable. However, since FS_IMMUTABLE_FL and fsverity protect individual files rather than a sub-filesystem tree, other snapshotter implementations like the overlayfs snapshotter are not quite applicable due to less efficiency at least;
Support given‑size block devices as the upper layer for OverlayFS to limit the disk quota for writable layers (usually ephemeral storage);
A dedicated EROFS default mount handler enables EROFS file‑backed mounts to avoid loop devices on runC. Note that specific runtime shims can handle EROFS mounts without this built‑in handler; for more details, see containerd Mounts and Mount Management ;
Native EROFS layers can be pulled from registries without conversion.

For VM containers, the EROFS snapshotter can efficiently pass through and share image layers, offering several advantages (e.g. better performance and smaller memory footprints) over virtiofs or 9p . Besides, the popular application kernel gVisor also supports EROFS for efficient image pass-through.

Why consider EROFS over other kernel filesystems?

EROFS is specifically designed as an immutable filesystem with the following highlights:

Lightweight, flexible on-disk format: Designed for archival use to avoid any serious filesystem consistency issues and minimize attack vectors. There is no need to estimate filesystem size or total inode counts in advance, unlike generic filesystems like EXT4;
Multi-device support: Enables native layering or content-addressable storage;
Bdev- and file-backed mounts: File-backed mounts supported since Linux 6.12, eliminating the need for loopback devices. This covers the latest mainstream distributions such as RHEL 10, Fedora 40, Debian 13, Ubuntu 26.04 LTS (or 24.04 LTS with HWE kernels), and more;
Memory sharing: Supports FSDAX using virtio-pmem and per-inode page cache sharing.

Usage

Ensure that EROFS filesystem is available

On newer Ubuntu/Debian systems, erofs-utils can be installed directly using the apt command, and on Fedora it can be installed directly using the dnf command.

# Debian/Ubuntu
$ apt install erofs-utils
# Fedora
$ dnf install erofs-utils

Make sure that erofs-utils version is 1.7 or higher.

When using the EROFS snapshotter, before starting containerd, also make sure the EROFS kernel module is loaded (Linux 5.4 or later is required): it can be loaded with modprobe erofs.

Checking if the EROFS snapshotter and differ are available

To check if the EROFS snapshotter is available, run the following command:

$ ctr plugins ls | grep erofs

The following message will be shown like below:

io.containerd.snapshotter.v1           erofs                    linux/amd64    ok
io.containerd.differ.v1                erofs                    linux/amd64    ok

Configuration

The following configuration can be used in your containerd config.toml. Don’t forget to restart containerd after changing the configuration.

  [plugins."io.containerd.snapshotter.v1.erofs"]
      # Enable fsverity support for EROFS layers, default is false
      enable_fsverity = true

      # Optional: Additional mount options for overlayfs
      ovl_mount_options = []

  [plugins."io.containerd.service.v1.diff-service"]
    default = ["erofs","walking"]

Note that if erofs-utils is version 1.8 or higher, you can add -T0 --mkfs-time to the differ’s mkfs_options to enable reproducible builds, as shown below:

  [plugins."io.containerd.differ.v1.erofs"]
    mkfs_options = ["-T0", "--mkfs-time"]

If erofs-utils is 1.8.2 or higher, it’s preferred to append --sort=none to the differ’s mkfs_options to avoid unnecessary tar data reordering for improved performance, as shown below:

  [plugins."io.containerd.differ.v1.erofs"]
    mkfs_options = ["-T0", "--mkfs-time", "--sort=none"]

Running a container

To run a container using the EROFS snapshotter, it needs to be explicitly specified:

$ # ensure that the image we are using exists; it is a regular OCI image
$ ctr image pull docker.io/library/busybox:latest
$ # run the container with the provides snapshotter
$ ctr run -rm -t --snapshotter erofs docker.io/library/busybox:latest hello sh

Quota Support

EROFS supports block mode to generate fixed‑size virtual blocks as the upper layers for overlayfs with a given filesystem formatted in order enable the disk quota. The default_size option can be used in the containerd configuration:

  [plugins."io.containerd.snapshotter.v1.erofs"]
    default_size = "20GiB"

Data Integrity

The EROFS snapshotter provides three methods to consolidate data integrity:

Data Integrity with Immutable File Attribute

By setting set_immutable = true, the EROFS snapshotter marks each layer blob with IMMUTABLE_FL. This ensures that dirty data is flushed immediately and the EROFS layer blob cannot be deleted, renamed, or modified.

The immutable file attribute is mainly used to ensure data persistence and prevent artificial data loss, but it cannot detect data corruption caused by hardware failures. Since it can flush in-memory dirty data, it may significantly increase the unpacking time it takes to launch a container: for example, the unpacking time for tensorflow:2.19.0 increases by 108.86% (from 10.090s to 21.074s) on EXT4. However, it has no impact on runtime performance.

Data Integrity with fs-verity

By setting enable_fsverity = true, the EROFS snapshotter will:

Enable fs-verity on EROFS layers during commit;
Verify the fs-verity status before mounting layers;
Skip fs-verity if the filesystem or kernel does not support it.

The fs-verity method guarantees that EROFS blob layers never change, but it introduces additional runtime overhead since all container image reads from the container will be slower because it needs to verify the Merkle hash tree first.

Data Integrity with dm-verity

The EROFS snapshotter supports device-mapper verity to provide block-level integrity verification for each EROFS layer. This method creates a dm-verity device for each layer and mounts it read-only. The dm-verity implementation uses the go-dmverity Go library, eliminating the need for external veritysetup command-line tools. This requires a Linux kernel with dm-verity support (CONFIG_DM_VERITY) and the device-mapper kernel module loaded.

The differ must be configured to generate dm-verity metadata:

[plugins."io.containerd.differ.v1.erofs"]
  enable_dmverity = true

When dm-verity is enabled, the EROFS differ formats each layer with dm-verity by appending a Merkle hash tree to the EROFS blob and generating a root hash. The hash tree is stored inline within the layer blob itself. The root hash and hash offset are saved in a .dmverity metadata file alongside the layer blob in JSON format. All other dm-verity parameters (block sizes, salt, etc.) are stored in a superblock within the layer blob and are auto-detected when mounting. Regular mode uses 4096-byte blocks (standard page size), while tar-index mode uses 512-byte blocks (dm-verity logical_block_size constraint).

The snapshotter can be configured to control dm-verity behavior using dmverity_mode:

[plugins."io.containerd.snapshotter.v1.erofs"]
  dmverity_mode = "auto"  # Options: "auto" (default), "on", "off"

The available modes are:

"auto" (default): Uses dm-verity if .dmverity metadata exists for a layer, otherwise mounts as regular EROFS. This allows mixing dm-verity and non-dm-verity layers in the same system.
"on": Requires dm-verity for all layers. If a layer lacks .dmverity metadata, mounting will fail with an error. Use this mode when you want to enforce integrity verification for all layers.
Important: If you enable dmverity_mode = "on" after layers have already been unpacked without dm-verity enabled in the differ, those existing layers will not have .dmverity metadata files. In this case, you must clean up the existing snapshots and re-pull the images with both enable_dmverity = true in the differ and dmverity_mode = "on" in the snapshotter configured. Alternatively, use dmverity_mode = "auto" to allow mixing dm-verity and non-dm-verity layers.
"off": Disables dm-verity completely, even if .dmverity metadata exists. Layers are mounted as regular EROFS without integrity verification. Use this for compatibility or when dm-verity overhead is unacceptable.

When mounting a layer with dm-verity enabled, the snapshotter reads the metadata from the .dmverity file and creates a dm-verity device. The dm-verity library automatically reads all parameters from the superblock, ensuring that any corruption or tampering will be detected at read time. The dm-verity device is then mounted as the backing layer in the OverlayFS stack

How It Works

For each layer, the EROFS snapshotter prepares a directory containing the following items:

  .erofslayer
  fs
  work

.erofslayer file is used to indicate that the layer is prepared by the EROFS snapshotter.

If the EROFS differ is also enabled, the differ will check for the existence of .erofslayer and convert the image content blob (e.g., an OCI layer) into an EROFS layer blob.

In this case, the snapshot layer directory will look like this:

  .erofslayer
  fs
  layer.erofs
  work

If dm-verity is enabled, a .dmverity metadata file will also be present:

  .erofslayer
  fs
  layer.erofs
  layer.erofs.dmverity
  work

Then the EROFS snapshotter will check for the existence of layer.erofs: it will mount the EROFS layer blob to fs/ and return a valid overlayfs mount with all parent layers. If dm-verity is enabled and the .dmverity file exists, the snapshotter will create a dm-verity device and mount that instead.

If other differs (not the EROFS differ) are used, the EROFS snapshotter will convert the flat directory into an EROFS layer blob on Commit instead.

In other words, the EROFS differ can only be used with the EROFS snapshotter; otherwise, it will skip to the next differ. The EROFS snapshotter can work with or without the EROFS differ.

Tar Index Mode

The EROFS differ also supports a “tar index” mode that offers a unique approach to handling OCI image layers:

Instead of extracting the entire tar archive to create an EROFS filesystem, the tar index mode:

Generates a tar index for the tar content
Appends the original tar content to the index
Creates a combined file: [Tar index][Original tar content]

The tar index can be stored in a registry alongside image layers, allowing nodes to fetch it directly when needed. Typically, the tar index is much smaller than a full EROFS blob, making it more efficient to store and transfer. If the tar index is not available in the registry, it can be generated on the node as a fallback. When integrating with dm-verity, the registry can also store the dm-verity Merkle tree and root hash signature together with the tar index, enabling nodes to retrieve all necessary artifacts without redundant computation.

In addition, we have a tar diffID for each layer according to the OCI image spec, so we don’t need to reinvent a new way to verify the image layer content for confidential containers but just calculate the sha256 of the original tar data (because erofs could just reuse the tar data with 512-byte fs block size and build a minimal index for direct mounting of tar) out of the tar index mode in the guest and compare it with each diffID.

Configuration

For the EROFS differ:

[plugins."io.containerd.differ.v1.erofs"]
  enable_tar_index = true