Reconciliation Loop: The Heart of Shepherd

Written by:

Igor Gorovyy
DevOps Engineer Lead & Senior Solutions Architect

The reconciliation loop is the heart of any orchestrator. The idea is simple: you describe the desired state, and controllers continuously compare it with the actual state and fix the difference. The entire Kubernetes architecture is built on exactly this, and Shepherd reproduces the same model, using Kubernetes as the reference for its implementation.

The Pattern

graph TD
    A["Observe<br/>read current state"] --> B["Compare<br/>compare with desired"]
    B --> C{"Difference?"}
    C -->|Yes| D["Act<br/>fix it"]
    C -->|No| E["Sleep<br/>wait"]
    D --> E
    E --> A

In Shepherd, every controller follows this pattern. A ticker wakes up every few seconds, the controller looks at the state in the Store, and fixes whatever doesn't match.

ReplicationController

The clearest example. A Deployment says "I want 3 replicas"; the controller counts how many matching pods exist and creates or deletes pods to close the gap:

func (rc *ReplicationController) Run(stopCh <-chan struct{}) {
    ticker := time.NewTicker(5 * time.Second)
    defer ticker.Stop()

    for {
        select {
        case <-stopCh:
            return
        case <-ticker.C:
            rc.reconcile()
        }
    }
}

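func (rc *ReplicationController) reconcile() {
    deployments, err := rc.store.ListDeployments("")
    if err != nil {
        // Skip this pass; the next tick retries against fresh state.
        log.Printf("replication controller: list deployments: %v", err)
        return
    }
    for _, dep := range deployments {
        rc.reconcileDeployment(dep)
    }
}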

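func (rc *ReplicationController) reconcileDeployment(
    dep *Deployment) {
    allPods, err := rc.store.ListPods(dep.Metadata.Namespace)
    if err != nil {
        // Acting on an empty list after a failed read would create
        // duplicate pods, so leave this deployment to the next tick.
        log.Printf("replication controller: list pods: %v", err)
        return
    }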

    var matchingPods []*Pod
    for _, pod := range allPods {
        if matchLabels(pod.Metadata.Labels, dep.Spec.Selector) {
            matchingPods = append(matchingPods, pod)
        }
    }

    current := len(matchingPods)
    desired := dep.Spec.Replicas

    if current < desired {
        // Scale up
        for i := 0; i < desired-current; i++ {
            rc.createPodForDeployment(dep, current+i)
        }
    } else if current > desired {
        // Scale down - remove from the end
        for i := 0; i < current-desired; i++ {
            pod := matchingPods[len(matchingPods)-1-i]
            rc.store.DeletePod(
                pod.Metadata.Namespace, pod.Metadata.Name)
        }
    }

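    // Update deployment status. The counts come from the pods observed at the
    // start of this pass; pods created or deleted above show up on the next tick.
    ready := 0
    for _, pod := range matchingPods {
        if pod.Status.Phase == PodRunning {
            ready++
        }
    }
    dep.Status.Replicas = len(matchingPods)
    dep.Status.ReadyReplicas = ready
    rc.store.UpdateDeployment(dep)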
}
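
The listing calls matchLabels without showing it. Here is a minimal sketch, assuming both the pod labels and the deployment selector are plain map[string]string values (Shepherd's real types and helper may differ): a pod matches when every key/value pair in the selector is present in its labels. Treating an empty selector as "matches nothing" is a choice made for this sketch, so that a deployment without a selector cannot adopt every pod in the namespace.

```go
package main

import "fmt"

// matchLabels reports whether labels contains every key/value pair in selector.
// Assumption: both are plain string maps; Shepherd's real helper may differ.
func matchLabels(labels, selector map[string]string) bool {
	// An empty selector would otherwise match every pod, so reject it.
	if len(selector) == 0 {
		return false
	}
	for key, want := range selector {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	selector := map[string]string{"app": "web"}
	fmt.Println(matchLabels(map[string]string{"app": "web", "tier": "front"}, selector)) // true
	fmt.Println(matchLabels(map[string]string{"app": "db"}, selector))                   // false
	fmt.Println(matchLabels(map[string]string{"app": "web"}, nil))                       // false
}
```

Extra labels on the pod do not prevent a match; only the pairs named in the selector are checked.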

Why This Works Reliably

The reconciliation loop is self-healing. If:

- A pod crashes -- the controller sees fewer replicas than needed and creates a new one
- A node is removed -- pods become Failed, the controller creates new ones, the scheduler assigns them to another node
- Someone manually deletes a pod -- same thing, the controller restores it

Nobody cares why the state changed. The controller just sees the difference and fixes it.

Three Controllers in Shepherd

Controller	Interval	What It Does
ReplicationController	5s	Keeps the pod count equal to spec.replicas
ServiceController	5s	Keeps Service endpoints in sync with Running pods
NodeController	10s	Marks a node NotReady when its heartbeat times out

All three run in parallel as goroutines with a single stopCh channel for shutdown.

Idempotency

Every reconcile must be idempotent: calling it twice in a row produces the same result as calling it once. This matters because the controller wakes up regularly and always processes the entire state, not just changes.
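
To make the property concrete, here is a small sketch on plain pod counts. The helpers plan, apply, and addOne are hypothetical and not part of Shepherd. The idempotent pass acts on the gap between current and desired; the counterexample acts on the fact that a tick happened.

```go
package main

import "fmt"

// plan returns how many pods to create and how many to remove to reach desired.
func plan(current, desired int) (create, remove int) {
	switch {
	case current < desired:
		return desired - current, 0
	case current > desired:
		return 0, current - desired
	}
	return 0, 0
}

// apply runs one idempotent pass: it closes the gap and nothing more.
func apply(current, desired int) int {
	create, remove := plan(current, desired)
	return current + create - remove
}

// addOne is the non-idempotent counterexample: it adds a pod on every call,
// regardless of how many already exist.
func addOne(current int) int {
	return current + 1
}

func main() {
	pods := 1
	for tick := 1; tick <= 3; tick++ {
		pods = apply(pods, 3)
		fmt.Printf("tick %d: idempotent pass -> %d pods\n", tick, pods)
	}

	pods = 1
	for tick := 1; tick <= 3; tick++ {
		pods = addOne(pods)
		fmt.Printf("tick %d: non-idempotent pass -> %d pods\n", tick, pods)
	}
}
```

The first loop settles at 3 pods and stays there; the second reaches 4 by the third tick and keeps growing.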

Where You Can Trip Up

Reconciliation every 5 seconds means up to 5 seconds of delay. For fast scale-up, that's slow. Kubernetes combines event-driven (watch) and periodic (resync) approaches to balance speed and reliability.

If one reconcile panics inside a goroutine and nothing recovers it, Go terminates the whole process, taking every other controller down with it. Real controllers wrap each iteration in a recover and record an error metric.
Two controllers acting on the same pods can fight each other: one creates, the other deletes. Kubernetes solves this with owner references and a single owning controller per resource.
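
A minimal sketch of wrapping each iteration in recover, so that one panicking pass turns into an error the loop can count and log before moving on to the next tick. safeReconcile is a hypothetical helper, and the failures counter stands in for a real error metric.

```go
package main

import "fmt"

// safeReconcile runs one reconcile pass and converts a panic into an error,
// so the caller's loop keeps running.
func safeReconcile(reconcile func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("reconcile panicked: %v", r)
		}
	}()
	reconcile()
	return nil
}

func main() {
	failures := 0 // stand-in for an error metric
	passes := []func(){
		func() { fmt.Println("pass 1 ok") },
		func() { panic("bad state in pass 2") },
		func() { fmt.Println("pass 3 ok") },
	}
	for _, pass := range passes {
		if err := safeReconcile(pass); err != nil {
			failures++
			fmt.Println("recovered:", err)
		}
	}
	fmt.Println("failures:", failures) // failures: 1
}
```

In the Run loop shown earlier, this would mean calling safeReconcile(rc.reconcile) in the ticker case. Note that recover only handles panics; fatal runtime errors such as concurrent map writes still end the process.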

💡 Fun facts

The terms level-triggered and edge-triggered come from digital electronics and interrupt handling: a level-triggered input responds to the current state of a signal, an edge-triggered one responds to a change. Kubernetes controllers follow the level-triggered model: reconcile against the desired state, don't react to a single event. That's why even if a controller misses an event, the next cycle still fixes everything.
In Kubernetes, almost no controller hits the API directly on every iteration -- between them sits an informer with a local cache and a deduplicating queue. Our direct ListPods on every tick is exactly what an informer removes.
controller-runtime (the framework most operators are built on) boils an entire controller down to a single Reconcile(ctx, req) (Result, error) function -- return an error, or a Result with RequeueAfter set, and the framework re-enqueues the request on its own. The same observe-compare-act, except the framework runs the queue for you.
Idempotency here isn't a nicety, it's a survival requirement: the controller processes the entire state on every tick, so a non-idempotent action would compound its effect every 5 seconds.

What I figured out while digging into this

For a long time it didn't sit right with me that the controller doesn't react to events directly -- it felt wasteful: why re-read the whole state every time if just one pod changed? Then I hit a bug myself where a missed event left the system in an inconsistent state forever, and it clicked: a level-triggered loop is robust precisely because it doesn't care how many events it missed -- it always looks at the full picture. Edge-triggered is faster, but lose one event and you're out of sync for good.
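
The difference fits in a few lines. In this sketch, edgeCounter and levelCounter are hypothetical types: the first tracks the pod count purely from the events it receives, the second recounts the actual state on every pass.

```go
package main

import "fmt"

// edgeCounter tracks the pod count only through delivered events.
type edgeCounter struct{ count int }

func (e *edgeCounter) onEvent(delta int) { e.count += delta }

// levelCounter ignores events and recounts the actual state on every pass.
type levelCounter struct{ count int }

func (l *levelCounter) resync(actual []string) { l.count = len(actual) }

func main() {
	pods := []string{"web-0", "web-1", "web-2"}
	edge := &edgeCounter{count: len(pods)}
	level := &levelCounter{count: len(pods)}

	// web-2 is deleted and the event is delivered: both views agree.
	pods = pods[:2]
	edge.onEvent(-1)
	level.resync(pods)
	fmt.Println("actual:", len(pods), "edge:", edge.count, "level:", level.count)

	// web-1 is deleted, but this time the event is lost on the way.
	pods = pods[:1]
	level.resync(pods)
	fmt.Println("actual:", len(pods), "edge:", edge.count, "level:", level.count)
}
```

After the lost event the edge-triggered view still believes there are 2 pods and has no way to find out otherwise, while the level-triggered view is back in sync after a single pass.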

What could be improved

Add a recover to every reconcile iteration so a panic in one controller doesn't take down the whole process.
Replace the fixed ticker with a backoff queue: on error, reschedule with exponential delay instead of waiting for the next tick.
Take a step toward event-driven: a watch on the Store that wakes the controller immediately on a change, with periodic resync left as a safety net.
Add metrics for reconcile duration and error count -- without them, you can't tell the loop is falling behind.

Try It Yourself

# Create a deployment with 3 replicas:
sheepctl apply -f - <<'EOF'
{"kind":"Deployment","metadata":{"name":"web"},"spec":{"replicas":3,"selector":{"app":"web"},"template":{"metadata":{"labels":{"app":"web"}},"spec":{"containers":[{"name":"web","image":"minimal"}]}}}}
EOF
sheepctl get pods     # 3 pods will appear
# Delete one pod manually:
sheepctl delete pod web-0
sleep 10 && sheepctl get pods  # the controller will restore it!

Reconciliation works. Next up -- ReplicationController in detail: scale up and scale down.

Resources

Kubernetes controllers -- official documentation of the controller pattern
Borg, Omega, and Kubernetes -- lessons on declarative systems
Patterns of Distributed Systems -- Unmesh Joshi's catalog on martinfowler.com

Source code for the series: github.com/igorgorovoy/sheep-shepherd-meadow
