Linux Namespaces: Isolating a Process in 50 Lines of Go¶
Written by:
Igor Gorovyy
DevOps Engineer Lead & Senior Solutions Architect
A container is a process with a restricted view of the system. That's it. No magic, no virtual machines. The Linux kernel gives a process its own "namespace," and the process thinks it's alone on the machine.
I built the Sheep container runtime from scratch in Go, and in this series I'll show how it works from the inside. Let's start with the most important piece -- namespaces.
Five namespaces that make a container a container¶
Linux provides several types of namespaces. Each one isolates a different aspect:
| Namespace | Flag | What it isolates |
|---|---|---|
| UTS | CLONE_NEWUTS |
hostname -- the container gets its own name |
| PID | CLONE_NEWPID |
processes -- the container only sees its own PIDs |
| Mount | CLONE_NEWNS |
filesystem -- its own root, its own mounts |
| IPC | CLONE_NEWIPC |
inter-process communication -- shared memory, semaphores |
| Network | CLONE_NEWNET |
network -- its own network stack, IP, ports |
What it looks like in code¶
Here's the key function from Sheep that starts an isolated process:
func startContainer(c *Container) (int, error) {
cmd := reexecCommand(c)
cmd.SysProcAttr = &syscall.SysProcAttr{
Cloneflags: syscall.CLONE_NEWUTS |
syscall.CLONE_NEWPID |
syscall.CLONE_NEWNS |
syscall.CLONE_NEWIPC |
syscall.CLONE_NEWNET,
Unshareflags: syscall.CLONE_NEWNS,
}
cmd.Stdin = os.Stdin
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
if err := cmd.Start(); err != nil {
return 0, fmt.Errorf("start namespaced process: %w", err)
}
pid := cmd.Process.Pid
if err := setupCgroups(c, pid); err != nil {
cmd.Process.Kill()
return 0, fmt.Errorf("setup cgroups: %w", err)
}
if err := setupNetworkForContainer(c, pid); err != nil {
fmt.Fprintf(os.Stderr, "warning: network setup failed: %v\n", err)
}
go cmd.Wait()
return pid, nil
}
Everything hinges on one line -- SysProcAttr.Cloneflags. We pass the kernel a set of flags, and it creates a child process in separate namespaces.
What each flag does¶
CLONE_NEWUTS -- hostname isolation¶
A process in a new UTS namespace gets its own hostname. Without this, sethostname() would change the hostname of the entire machine.
if hostname != "" {
if err := syscall.Sethostname([]byte(hostname)); err != nil {
return fmt.Errorf("sethostname: %w", err)
}
}
CLONE_NEWPID -- process isolation¶
Inside a PID namespace, the process thinks its PID is 1. It can't see any host processes. It's like a separate operating system, just sharing one kernel.
CLONE_NEWNS -- filesystem isolation¶
A mount namespace gives the process its own mount table. The container can mount /proc, /sys, /tmp without affecting the host.
CLONE_NEWIPC -- IPC isolation¶
Shared memory segments, semaphores, message queues -- all separate. The container can't eavesdrop on the host's IPC.
CLONE_NEWNET -- network isolation¶
Its own network stack: its own interfaces, IP addresses, routing table, iptables. Until we set up a veth pair and bridge, the container has no network at all.
How it all fits together¶
graph TB
subgraph "Host"
HOST_PROC["Host processes<br/>PID 1, 2, 3..."]
HOST_NET["Host network<br/>eth0, 192.168.1.x"]
HOST_FS["Host filesystem<br/>/, /home, /var"]
end
subgraph "Container (new namespaces)"
C_PROC["PID 1 (init)"]
C_NET["Container network<br/>eth0, 10.20.0.x"]
C_FS["Own filesystem<br/>/, /proc, /tmp"]
C_HOST["hostname: sheep-abc123"]
end
HOST_PROC -.->|"CLONE_NEWPID"| C_PROC
HOST_NET -.->|"CLONE_NEWNET"| C_NET
HOST_FS -.->|"CLONE_NEWNS"| C_FS
The host sees the container process as a regular process with a high PID. But from the inside, the container only sees itself.
What can go wrong¶
Namespaces provide isolation but not resource limits. A process in a new PID namespace can still eat all the machine's memory. For limits, you need cgroups -- that's coming in part 4.
And one more thing. Go spawns several threads before main() even runs. This means that clone() with namespace flags doesn't work quite right if you call it directly. You need the re-exec pattern, which we'll cover in the next part.
Try it yourself¶
git clone https://github.com/igorovh/sheep && cd sheep
go build ./cmd/sheep
sudo ./sheep bootstrap minimal
sudo ./sheep run --name test minimal /bin/sh
# Inside the container:
hostname # you'll see the shortened container ID
ps aux # only one process -- /bin/sh
ip addr # its own network interface
Next up, we'll look at why Go and clone() clash and how the re-exec pattern fixes it.
Next: Re-Exec Pattern
