Reconciliation Loop: The Heart of Shepherd
Written by:
Igor Gorovyy
DevOps Engineer Lead & Senior Solutions Architect
The reconciliation loop is the heart of any orchestrator. The idea is simple: you describe the desired state, and controllers constantly compare it with the actual state and fix the difference. The entire Kubernetes architecture is built on exactly this — and in Shepherd we reproduce the same model, taking Kubernetes as the reference for our implementation.
The Pattern
graph TD
A["Observe<br/>read current state"] --> B["Compare<br/>compare with desired"]
B --> C{"Difference?"}
C -->|Yes| D["Act<br/>fix it"]
C -->|No| E["Sleep<br/>wait"]
D --> E
E --> A
In Shepherd, every controller follows this pattern. A ticker wakes up every few seconds, the controller looks at the state in the Store, and fixes whatever doesn't match.
ReplicationController
This is the most vivid example. A Deployment says "I want 3 replicas"; the controller counts how many matching pods exist and creates or deletes pods to close the gap:
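import (
	"log"
	"time"
)

// Run reconciles every 5 seconds until stopCh is closed.
func (rc *ReplicationController) Run(stopCh <-chan struct{}) {
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-stopCh:
			return
		case <-ticker.C:
			rc.reconcile()
		}
	}
}

// reconcile lists the Deployments and reconciles each one.
func (rc *ReplicationController) reconcile() {
	deployments, err := rc.store.ListDeployments("")
	if err != nil {
		log.Printf("replication controller: list deployments: %v", err)
		return // try again on the next tick
	}
	for _, dep := range deployments {
		rc.reconcileDeployment(dep)
	}
}

// reconcileDeployment brings the number of matching pods in line with
// dep.Spec.Replicas and then refreshes the Deployment status.
func (rc *ReplicationController) reconcileDeployment(dep *Deployment) {
	allPods, err := rc.store.ListPods(dep.Metadata.Namespace)
	if err != nil {
		// Without this check a failed list looks like "zero pods" and
		// the controller would create a full set of duplicates.
		log.Printf("replication controller: list pods: %v", err)
		return
	}

	// Observe: collect the pods selected by this Deployment.
	var matchingPods []*Pod
	for _, pod := range allPods {
		if matchLabels(pod.Metadata.Labels, dep.Spec.Selector) {
			matchingPods = append(matchingPods, pod)
		}
	}

	// Compare and act.
	current := len(matchingPods)
	desired := dep.Spec.Replicas
	if current < desired {
		// Scale up
		for i := 0; i < desired-current; i++ {
			rc.createPodForDeployment(dep, current+i)
		}
	} else if current > desired {
		// Scale down - remove from the end
		for i := 0; i < current-desired; i++ {
			pod := matchingPods[len(matchingPods)-1-i]
			rc.store.DeletePod(pod.Metadata.Namespace, pod.Metadata.Name)
		}
	}

	// Update deployment status. The counts reflect the pods observed at
	// the start of this pass; the next tick picks up what was just changed.
	ready := 0
	for _, pod := range matchingPods {
		if pod.Status.Phase == PodRunning {
			ready++
		}
	}
	dep.Status.Replicas = len(matchingPods)
	dep.Status.ReadyReplicas = ready
	rc.store.UpdateDeployment(dep)
}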
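The controller code calls matchLabels, which isn't shown in this post. Here is a minimal sketch of what such a helper could look like, assuming both labels and the selector are plain map[string]string values. Treating an empty selector as "matches nothing" is my assumption, not necessarily what Shepherd does.

```go
package main

import "fmt"

// matchLabels reports whether every key/value pair in selector is also
// present in labels (selector is a subset of labels).
// Assumption: an empty selector matches nothing, so a Deployment without
// a selector cannot adopt every pod in the namespace by accident.
func matchLabels(labels, selector map[string]string) bool {
	if len(selector) == 0 {
		return false
	}
	for key, want := range selector {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	selector := map[string]string{"app": "web"}

	webPod := map[string]string{"app": "web", "tier": "frontend"}
	dbPod := map[string]string{"app": "db"}

	fmt.Println("web pod selected:", matchLabels(webPod, selector))
	fmt.Println("db pod selected: ", matchLabels(dbPod, selector))
}
```

Extra labels on a pod don't matter; only the keys named in the selector are checked.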
Why This Works Reliably
The reconciliation loop is self-healing. If:
- A pod crashes -- the controller sees fewer replicas than needed and creates a new one
- A node is removed -- pods become Failed, the controller creates new ones, the scheduler assigns them to another node
- Someone manually deletes a pod -- same thing, the controller restores it
Nobody cares why the state changed. The controller just sees the difference and fixes it.
Three Controllers in Shepherd
| Controller | Interval | What It Does |
|---|---|---|
| ReplicationController | 5s | Keeps the pod count equal to spec.replicas |
| ServiceController | 5s | Keeps Service endpoints in sync with Running pods |
| NodeController | 10s | Marks a node NotReady when its heartbeat times out |
All three run in parallel as goroutines with a single stopCh channel for shutdown.
Idempotency
Every reconcile must be idempotent: calling it twice in a row must leave the system in the same state as calling it once. This matters because the controller wakes up regularly and always processes the entire state, not just changes.
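A toy illustration of the property, not Shepherd's code: the cluster type and its reconcile method are hypothetical, and the sketch assumes every pod in the map belongs to the one Deployment being reconciled. Because reconcile computes a diff against the desired set instead of "adding N pods", a second call finds nothing to do.

```go
package main

import "fmt"

// cluster is a toy in-memory state: the set of pod names that exist.
type cluster struct {
	pods map[string]bool
}

// reconcile drives the pod set toward `desired` replicas named
// <prefix>-0 .. <prefix>-(desired-1) and returns how many changes it made.
func (c *cluster) reconcile(prefix string, desired int) int {
	want := make(map[string]bool, desired)
	for i := 0; i < desired; i++ {
		want[fmt.Sprintf("%s-%d", prefix, i)] = true
	}

	changes := 0
	// Create whatever is wanted but missing.
	for name := range want {
		if !c.pods[name] {
			c.pods[name] = true
			changes++
		}
	}
	// Delete whatever exists but is not wanted.
	for name := range c.pods {
		if !want[name] {
			delete(c.pods, name)
			changes++
		}
	}
	return changes
}

func main() {
	c := &cluster{pods: map[string]bool{}}

	fmt.Println("first pass changes: ", c.reconcile("web", 3)) // creates the missing pods
	fmt.Println("second pass changes:", c.reconcile("web", 3)) // nothing left to do

	delete(c.pods, "web-1") // someone deletes a pod by hand
	fmt.Println("after manual delete:", c.reconcile("web", 3)) // restores only that pod
}
```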
Where You Can Trip Up
- Reconciliation every 5 seconds means up to 5 seconds of delay. For fast scale-up, that's slow. Kubernetes combines event-driven (watch) and periodic (resync) approaches to balance speed and reliability.
- If one reconcile panics inside a goroutine, the loop just stops -- and the rest of the system doesn't notice. Real controllers wrap each iteration in a recover and an error metric.
- Two controllers acting on the same pods can fight each other: one creates, the other deletes. Kubernetes solves this with owner references and a single owning controller per resource.
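For the panic case, here is a minimal sketch of wrapping one iteration in a recover. The safeReconcile helper and the errorCount variable are hypothetical; a real controller would feed a proper metrics counter instead of a local integer.

```go
package main

import (
	"fmt"
	"log"
)

// safeReconcile runs one reconcile iteration and converts a panic into
// an error, so the surrounding loop survives and can count the failure.
func safeReconcile(name string, reconcile func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("controller %s: reconcile panicked: %v", name, r)
		}
	}()
	reconcile()
	return nil
}

func main() {
	errorCount := 0 // stand-in for a real error metric

	iterations := []func(){
		func() {},                            // healthy tick
		func() { panic("nil pod in store") }, // buggy tick
		func() {},                            // the loop is still alive
	}
	for i, iteration := range iterations {
		if err := safeReconcile("replication", iteration); err != nil {
			errorCount++
			log.Printf("tick %d: %v", i, err)
		}
	}
	fmt.Println("iterations run:", len(iterations), "errors:", errorCount)
}
```

In a ticker loop the call would replace the bare rc.reconcile() inside the ticker case, so a single bad pass costs one tick instead of the whole controller.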
💡 Fun facts
- The very term "reconcile" and the level-triggered (rather than edge-triggered) model were deliberately borrowed by Kubernetes from networking gear: routers have long worked on the "reconcile against the desired state, don't react to a single event" principle. That's why even if a controller misses an event, the next cycle still fixes everything.
- In Kubernetes, almost no controller hits the API directly on every iteration -- between them sits an informer with a local cache and a deduplicating queue. Our direct `ListPods` on every tick is exactly what an informer removes.
- `controller-runtime` (the framework most operators are built on) boils an entire controller down to a single `Reconcile(req) (Result, error)` function -- return `requeue` and it re-enqueues you on its own. The same observe-compare-act, except the framework spins the queue for you.
- Idempotency here isn't a nicety, it's a survival requirement: the controller processes the entire state on every tick, so a non-idempotent action would compound its effect every 5 seconds.
What I figured out while digging into this
For a long time it didn't sit right with me that the controller doesn't react to events directly -- it felt wasteful: why re-read the whole state every time if just one pod changed? Then I hit a bug myself where a missed event left the system in an inconsistent state forever, and it clicked: a level-triggered loop is robust precisely because it doesn't care how many events it missed -- it always looks at the full picture. Edge-triggered is faster, but one lost event and you're out of sync for good.
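The difference fits in a few lines. This is a toy model with hypothetical names (event, edgeTriggered, levelTriggered), not code from Shepherd: the edge-triggered view keeps its own counter and trusts every event to arrive, while the level-triggered view just reads what actually exists.

```go
package main

import "fmt"

// event is a hypothetical change notification: +1 pod created, -1 pod deleted.
type event int

// edgeTriggered keeps its own counter and trusts every event to arrive.
func edgeTriggered(start int, delivered []event) int {
	count := start
	for _, e := range delivered {
		count += int(e)
	}
	return count
}

// levelTriggered ignores events entirely and reads the actual state.
func levelTriggered(actualPods []string) int {
	return len(actualPods)
}

func main() {
	// Real history: three pods existed, then web-1 was deleted.
	actual := []string{"web-0", "web-2"}
	// The delete event was lost in transit, so nothing was delivered.
	delivered := []event{}

	fmt.Println("edge-triggered view: ", edgeTriggered(3, delivered))
	fmt.Println("level-triggered view:", levelTriggered(actual))
}
```

The edge-triggered counter still believes in three pods and will never correct itself; the level-triggered read sees two and triggers a repair on the next pass.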
What could be improved
- Add a recover to every reconcile iteration so a panic in one controller doesn't kill the whole loop.
- Replace the fixed ticker with a backoff queue: on error, reschedule with exponential delay instead of waiting for the next tick.
- Take a step toward event-driven: a watch on the Store that wakes the controller immediately on a change, with periodic resync left as a safety net.
- Add metrics for reconcile duration and error count -- without them, you can't tell the loop is falling behind.
Try It Yourself
# Create a deployment with 3 replicas:
sheepctl apply -f - <<'EOF'
{"kind":"Deployment","metadata":{"name":"web"},"spec":{"replicas":3,"selector":{"app":"web"},"template":{"metadata":{"labels":{"app":"web"}},"spec":{"containers":[{"name":"web","image":"minimal"}]}}}}
EOF
sheepctl get pods # 3 pods will appear
# Delete one pod manually:
sheepctl delete pod web-0
sleep 10 && sheepctl get pods # the controller will restore it!
Reconciliation works. Next up -- ReplicationController in detail: scale up and scale down.
Resources
- Kubernetes controllers -- official controller pattern
- Borg, Omega and Kubernetes -- lessons on declarative systems
- Patterns of Distributed Systems -- Unmesh Joshi's catalog, published on martinfowler.com
Source code for the series: github.com/igorgorovoy/sheep-shepherd-meadow
Previous: Scheduler
