OverlayFS: Copy-on-Write Layers Like Docker

Written by:

Igor Gorovyy
DevOps Engineer Lead & Senior Solutions Architect

Imagine you run 10 containers from the same nginx image. Each one needs its own filesystem where it can write files and change configs, but copying a 100MB rootfs for every container is wasteful. OverlayFS solves this with copy-on-write.

This is part 5 of the Sheep & Shepherd series. Previous parts: namespaces gave the container its own view of the system, re-exec worked around Go's threading model, pivot_root gave it its own filesystem root, and cgroups v2 capped its resources. Now we make that filesystem cheap to spin up.

The idea is simple

OverlayFS stacks several directories into one. There's a lower layer (read-only, from the image) and an upper layer (read-write, for the container). The container sees both as a single filesystem.

graph TB
    subgraph "What the container sees"
        MERGED["merged/<br/>/bin, /etc, /var, /tmp<br/>(all together)"]
    end

    subgraph "Reality"
        LOWER["lower/ (image)<br/>read-only<br/>/bin/sh, /etc/hosts..."]
        UPPER["upper/ (container)<br/>read-write<br/>new and modified files"]
        WORK["work/<br/>OverlayFS temp files"]
    end

    LOWER --> MERGED
    UPPER --> MERGED
    WORK -.-> MERGED

How reads work

When the container reads a file:

1. OverlayFS looks in upper (the container layer).
2. If the file is not there, it looks in lower (the image layer).
3. The container doesn't know which layer the file came from.

How writes work (copy-up)

When the container modifies a file from the lower layer:

1. OverlayFS copies the file from lower to upper (copy-up).
2. Changes are written to the copy in upper.
3. Lower stays unchanged.
4. Subsequent reads of this file come from upper.

New files are created directly in upper.

The code in Sheep

func (m *Manager) setupOverlay(id, lowerDir string) (string, error) {
    overlayBase := filepath.Join(m.baseDir, "overlay", id)
    upper := filepath.Join(overlayBase, "upper")
    work := filepath.Join(overlayBase, "work")
    merged := filepath.Join(overlayBase, "merged")

    for _, d := range []string{upper, work, merged} {
        if err := os.MkdirAll(d, 0755); err != nil {
            return "", err
        }
    }

    if err := mountOverlay(lowerDir, upper, work, merged); err != nil {
        // Fallback: copy the rootfs if the overlay mount fails
        // (for example, the kernel has no OverlayFS support).
        return copyRootFS(lowerDir, merged)
    }

    return merged, nil
}

And the mount itself:

func mountOverlay(lower, upper, work, merged string) error {
    opts := fmt.Sprintf(
        "lowerdir=%s,upperdir=%s,workdir=%s",
        lower, upper, work)
    return syscall.Mount("overlay", merged, "overlay", 0, opts)
}

A single mount() system call with type "overlay" and options -- and the filesystem is ready. Same syscall.Mount machinery we used in pivot_root, just with a different filesystem type.
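
One detail worth guarding against: the options string is comma-separated, and `lowerdir` uses a colon to separate multiple layers, so a path containing either character would be misparsed. Below is a conservative sketch that simply rejects such paths before building the string. `overlayOptions` is a hypothetical helper, not part of Sheep.

```go
package main

import (
	"fmt"
	"strings"
)

// overlayOptions builds the data string for an overlay mount and refuses
// paths that would be misread as option or layer separators.
// Hypothetical helper for illustration.
func overlayOptions(lower, upper, work string) (string, error) {
	if strings.Contains(lower, ":") {
		return "", fmt.Errorf("lowerdir %q contains ':', which separates lower layers", lower)
	}
	for _, p := range []string{lower, upper, work} {
		if strings.Contains(p, ",") {
			return "", fmt.Errorf("path %q contains ',', which separates mount options", p)
		}
	}
	return fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s", lower, upper, work), nil
}

func main() {
	opts, err := overlayOptions(
		"/var/lib/sheep/images/abc123/rootfs",
		"/var/lib/sheep/overlay/c1/upper",
		"/var/lib/sheep/overlay/c1/work",
	)
	fmt.Println(opts, err)

	_, err = overlayOptions("/images/a,b/rootfs", "/o/upper", "/o/work")
	fmt.Println(err)
}
```

The kernel has its own requirements as well: workdir must be an empty directory on the same filesystem as upperdir, which the layout used here satisfies because both live under the same container directory.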

Directory structure

/var/lib/sheep/
  images/
    abc123/
      rootfs/          <- lower layer (image)
        bin/
        etc/
        usr/
  overlay/
    container_id/
      upper/           <- container changes
      work/            <- OverlayFS work directory
      merged/          <- what the container sees

Fallback for systems without OverlayFS

OverlayFS is a Linux kernel feature, so it is missing on some older kernels and on macOS, where you might build and test during development. So there's a fallback -- plain copying:

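func copyRootFS(src, dst string) (string, error) {
    entries, err := os.ReadDir(src)
    if err != nil {
        return "", err
    }
    for _, e := range entries {
        srcPath := filepath.Join(src, e.Name())
        dstPath := filepath.Join(dst, e.Name())

        info, err := e.Info()
        if err != nil {
            return "", err
        }

        switch {
        case info.IsDir():
            if err := os.MkdirAll(dstPath, info.Mode()); err != nil {
                return "", err
            }
            // Propagate errors from subdirectories instead of dropping them.
            if _, err := copyRootFS(srcPath, dstPath); err != nil {
                return "", err
            }
        case info.Mode().IsRegular():
            copyFile(srcPath, dstPath)
        case info.Mode()&os.ModeSymlink != 0:
            // Recreate symlinks instead of silently skipping them;
            // a rootfs often relies on them (for example, shell applets).
            target, err := os.Readlink(srcPath)
            if err != nil {
                return "", err
            }
            if err := os.Symlink(target, dstPath); err != nil {
                return "", err
            }
        }
    }
    return dst, nil
}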

It works, but it's slow and eats disk. OverlayFS is a much better solution.

Cleanup

When a container is removed:

func (m *Manager) cleanupOverlay(id string) {
    overlayBase := filepath.Join(m.baseDir, "overlay", id)
    merged := filepath.Join(overlayBase, "merged")
    unmountOverlay(merged)
    os.RemoveAll(overlayBase)
}

The order matters: first unmount the overlay, then delete upper, work, and merged.

Why this is efficient

10 containers from nginx:

- Without overlay: 10 x 100MB = 1GB
- With overlay: 100MB (one shared lower) + a small upper for each container
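
The same arithmetic as a tiny function, in case you want to plug in your own numbers. The 1MB upper size per container is purely an assumption for illustration; real upper layers grow with whatever the container writes.

```go
package main

import "fmt"

// diskUsageMB compares full copies against one shared lower layer.
// upperMB is an assumed average size of each container's upper layer.
func diskUsageMB(containers, imageMB, upperMB int) (copied, overlay int) {
	copied = containers * imageMB
	overlay = imageMB + containers*upperMB
	return copied, overlay
}

func main() {
	copied, overlay := diskUsageMB(10, 100, 1)
	fmt.Printf("copy: %dMB, overlay: %dMB\n", copied, overlay)
	// prints "copy: 1000MB, overlay: 110MB"
}
```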

Docker goes further -- its layers can be shared between different images. If ubuntu:22.04 and nginx use the same base layers, they're stored only once. The official Docker docs on the overlay2 storage driver cover the layer-sharing model in detail.

In Sheep there's one lower layer per image. Simpler, but the principle is the same.

What we simplified

Copy-up copies the entire file even if you change a single byte. For large files (say, database files) the first write can be slow, because the whole file has to land in upper before the change is applied. That's why Docker recommends using volumes for frequently changing data.

Try it yourself

# Start two containers from the same image:
sudo ./sheep run --name ov1 -d minimal /bin/sleep 3600
sudo ./sheep run --name ov2 -d minimal /bin/sleep 3600
# Compare overlay directories:
ls /var/lib/sheep/overlay/
# upper/ is unique per container, lower (rootfs) is shared

The filesystem is in place. Next up -- bridge networking: how to give a container an IP and connect it to the network.
