Blog

Welcome to my blog! Here you'll find articles about DevOps, AWS, cloud architecture, and more.

Recent Posts

Reconciliation Loop: The Heart of Shepherd

June 24, 2026 - The reconciliation loop is the heart of any orchestrator: describe the desired state, and controllers constantly compare it with reality and fix the difference. The observe-compare-act pattern, why it makes the system self-healing, the three controllers running in parallel in Shepherd, idempotency as a survival requirement, and where level-triggered beats edge-triggered. Modeled on Kubernetes controllers. Part fourteen of the Sheep & Shepherd series.

Scheduler: How to Pick a Node for a Pod

June 20, 2026 - A freshly created pod is Pending with no node assigned. How Shepherd's scheduler picks the best node in two phases — filter (drop infeasible nodes) and score (prefer the least-loaded). Resource checks, label matching, least-loaded scoring, and why a pod stays Pending even after it's scheduled. Modeled on the Kubernetes scheduler. Part thirteen of the Sheep & Shepherd series.

BoltDB Instead of etcd: Embedded State Store

June 17, 2026 - Kubernetes runs on etcd, but for a learning project that's overkill. How Shepherd stores all cluster state in BoltDB — an embedded key-value store in a single file. Buckets as tables, namespaced keys, read/write transactions, watch channels for change notifications, and an event log. Why it's perfect for learning and where it falls short of etcd. Part twelve of the Sheep & Shepherd series.

Kubernetes API Server in 300 Lines

June 13, 2026 - The Kubernetes API Server is the center of the entire cluster. How Shepherd implements a REST API with CRUD for pods, services, deployments, and nodes in ~300 lines of Go using only net/http. Asynchronous scheduling, namespaced resources, and logging middleware. Part eleven of the Sheep & Shepherd series.

A Docker CLI in 500 Lines of Go

June 10, 2026 - Subcommand routing, flag parsing, and formatted output — all without CLI frameworks. How Sheep implements run, ps, stop, rm and 11 more Docker-like commands in a single file using only the Go standard library. Part ten of the Sheep & Shepherd series.

Container Lifecycle: State Machine from Created to Removed

June 7, 2026 - A container moves through three states — created, running, stopped. How the state machine drives Create/Start/Stop/Remove, why every transition is persisted to state.json, and how signal 0 checks if a container survived a daemon restart. Part nine of the Sheep & Shepherd series.

Image Management: tar Archive to rootfs to Container

June 2, 2026 - A container image is just an archive with a filesystem. How a tar archive becomes a container rootfs through import and bootstrap, how OCI whiteout files delete files across layers, and why Sheep keeps a full rootfs instead of layers. Part eight of the Sheep & Shepherd series.

NAT and iptables: How a Container Sees the Internet

May 30, 2026 - A container's 10.20.0.x address is private — no router will route it. How ip_forward and a single MASQUERADE rule let packets reach the internet and find their way back via conntrack. Part seven of the Sheep & Shepherd series.

Bridge Networking: Giving a Container an IP Address

May 25, 2026 - A container in a fresh network namespace has no network at all — not even loopback. How a Linux bridge, veth pairs, and a touch of NAT give it an IP and an internet route. Part six of the Sheep & Shepherd series.

OverlayFS: Copy-on-Write Layers Like Docker

May 20, 2026 - How OverlayFS stacks a read-only image layer and a per-container read-write layer into one filesystem, and why copy-up lets 10 nginx containers share 100 MB instead of each carrying its own copy. Part five of the Sheep & Shepherd series.

Cgroups v2: Limiting Memory, CPU, and PIDs

May 15, 2026 - Namespaces isolate but don't limit. How memory.max, cpu.max, and pids.max cap container resources through the cgroups v2 virtual filesystem. Part four of the Sheep & Shepherd series.

pivot_root: How a Container Gets Its Own Filesystem

May 9, 2026 - How pivot_root(2) swaps a process's root directory at the mount namespace level — and why it's the proper isolation primitive instead of chroot. Part three of the Sheep & Shepherd series.

Re-Exec Pattern: Why Go and clone() Don't Get Along

May 2, 2026 - Go's threading model conflicts with clone(). The self re-exec pattern fixes it. Part two of the Sheep & Shepherd series.

Linux Namespaces: Isolating a Process in 50 Lines of Go

April 28, 2026 - A container is a process with a restricted view of the system. How to isolate a process using Linux namespaces in 50 lines of Go. Part one of the Sheep & Shepherd series.

AI sovereignty: your own model on DGX Spark instead of an API

April 18, 2026 - How I stopped paying OpenAI and moved inference onto my own DGX Spark box with vLLM. About the hardware, the CUDA/PyTorch pain, an honest comparison with Ollama, and a small web UI to run it all.

EMM: LangGraph traces in Phoenix

April 11, 2026 - One init at startup covers 15 LangGraph agents. Manual spans extend coverage to voice (Gemini Live tools), avatar (Runway sessions), and Izabella chat (OpenAI/Ollama/Google + MCP tool loop).

EMM: A2A Inspector in the app and MCP for it

April 8, 2026 - Built-in UI plus an MCP server: inspect the Agent Card, run tasks/submit and tasks/status from the IDE without leaving the monorepo.

EMM A2A Phase 4: Auth, Rate Limiting, Observability

April 4, 2026 - X-API-Key, rate limiting, structured logging. A2A endpoints are now protected like the other APIs.

AI Reliability Engineering — certification from fwdays

March 31, 2026 - Completed the AI Reliability Engineering course from fwdays. Why AI system reliability belongs in the same conversation as classic SRE.

EMM A2A Phase 3: Stream task status

March 28, 2026 - SSE instead of polling. GET /api/a2a/tasks/{id}/stream. Theory, diagrams, capabilities.streaming.

EMM A2A Phase 2+: TaskStore and tasks/status

March 21, 2026 - A2A task lifecycle: submit → taskId → poll status. InMemoryTaskStore, 1h TTL. Diagrams.

Production Technology Risks: Planning for When Dependencies Fail

March 14, 2026 - PostgreSQL, MinIO, lakeFS: when choosing production technologies, think beyond features — what happens in 5 years?

EMM A2A Phase 2: Task Manager (list_board)

March 14, 2026 - Second skill — list_board. Routing by skillId, interaction diagrams, what changed.

EMM A2A Phase 1: Process Manager as A2A Server

March 7, 2026 - Process Manager is the first agent with an A2A interface. Protocol theory, interaction diagrams, what's implemented.

How I Became an AWS Community Builder

March 5, 2026 - A few years ago it was regular DevOps — deploys, scripts. Then I started thinking in clusters instead of servers. Here's how that led to AWS Community Builders.

Developing and Testing AI Agents: From LangGraph to Production

February 12, 2026 - How do you write, test, and debug LangGraph agents? Which patterns work for StateGraph? Why are pytest fixtures critical? A development workflow from first code to production deployment.

Data Versioning for AI Agents: Real-World Experience with lakeFS

February 8, 2026 - When AI agents start moving your files around, version control stops being theoretical. Here's how integrating lakeFS changed my approach to data management in a LangGraph-based agent platform.

Kubernetes Deployment for AI Agents: Real-World Experience with LangGraph

February 3, 2026 - When AI agents move from local Docker Compose to Kubernetes, questions emerge about service discovery, caching, and secrets management. How I deployed 7 microservices with minimal downtime.

Building an MCP Server for Self-Hosted Jira and Confluence

November 19, 2025 - Building an MCP server for self-hosted Jira and Confluence.

Izabella. Creating Agentic Tools. Converter from PDF to FB2 Format

October 30, 2025 - Izabella. Creating agentic tools. A converter from PDF to FB2 format.

Building a multi-site apartment searcher: Design patterns and architecture

October 16, 2025 - Building a multi-site apartment searcher: Design patterns and architecture.

Tarot AI Agent: Innovative Approach to Risk Assessment Through Artificial Intelligence

September 11, 2025 - Tarot AI Agent: Innovative Approach to Risk Assessment Through Artificial Intelligence.

Using AWS ECR as a Universal OCI Repository

July 10, 2025 - Using AWS ECR as a universal OCI repository for storing various types of artifacts.

Cert Manager in Kubernetes

June 3, 2025 - Setting up and using Cert Manager for automatic management of SSL certificates in Kubernetes.

New Architecture for Isabella - C4 Diagrams

May 14, 2025 - Development of a new system architecture using C4 diagrams.

New Architecture for Isabella - Structure

May 14, 2025 - Detailed structure analysis for the new Isabella architecture.

New Architecture for Isabella

May 10, 2025 - Overview of the new architecture design for the Isabella project.

Waste Resources - Financial Optimization in the Cloud

April 14, 2025 - Analysis and optimization of cloud resource costs.

Redis Backup on AWS S3

March 19, 2025 - Automation of Redis backup to AWS S3.

RDS Import with Terraform

February 17, 2025 - Importing existing RDS instances into Terraform.

RDS Import with Terraform (EN)

February 17, 2025 - Importing existing RDS instances into Terraform (English version).

AWS Lambda Cost Optimization

February 26, 2025 - Strategies for optimizing AWS Lambda costs.

Kubernetes Onboarding with Flux

March 10, 2025 - Automating Kubernetes onboarding using Flux.

RDS Migration Cases

February 9, 2025 - Various scenarios for migrating databases to AWS RDS.

SPA Deployment on S3 with CloudFront

February 12, 2025 - Deploying a single-page application on AWS S3 with CloudFront.

Karpenter Properties

December 8, 2024 - Properties and configuration of Karpenter for Kubernetes.

AI Stable Diffusion

February 8, 2025 - Using Stable Diffusion for image generation.

ARM vs AMD

December 5, 2024 - Comparison of ARM and AMD architectures for cloud solutions.