Developing and testing AI agents: from LangGraph to production¶
The first LangGraph agent I wrote was simple. Input state, one node, output. I ran it with graph.invoke() - it worked. Deployed it to production - it crashed with a cryptic error about missing state keys.
It turned out that TypedDict validation in LangGraph is stricter than a plain dict. My tests used partial state; production received the full state with unexpected keys. Tests passed, production failed.
That was a learning moment: testing AI agents is not the same as testing regular code. State management is more complex. LLM calls are not deterministic. Integration between agents over MCP adds yet another layer of complexity.
Over 6 months of developing EMM I wrote 8 agents, 200+ tests, and a viz frontend for monitoring. I learned which patterns work and which mistakes to avoid.
This post covers the development workflow, testing strategies, and debugging techniques that work for LangGraph agents.
Development workflow - how to write an agent¶
A typical agent in EMM has this structure:
agents/content-classifier/
├── graph.py # State definition + nodes + build_graph()
├── agent.yaml # Metadata для LangGraph Studio
├── __init__.py
└── README.md
Step 1: define the state¶
StateGraph works with typed state. A TypedDict is required:
from typing import TypedDict, Optional
from langgraph.graph import StateGraph

class ContentClassifierState(TypedDict):
    # Input
    input: dict          # from the caller
    title: str
    content: str
    jd_file: str
    # Processing
    jd_categories: dict
    classification: Optional[dict]
    cached_result: Optional[dict]
    # Output
    confidence: float
    reasoning: str
    result: dict
    error: Optional[str]
Why so many keys? Because LangGraph doesn't allow partial updates for keys that aren't defined. If a node returns {"classification": {...}} but classification is not in the TypedDict - runtime error.
Lesson learned: define all possible state keys up front. Better to have unused keys than runtime failures.
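One way to make "every key initialized" hard to forget is a small factory that fills each key with a safe default. This is a sketch; make_initial_state is a hypothetical helper, not part of the EMM codebase:

```python
from typing import Optional, TypedDict

class ContentClassifierState(TypedDict):
    input: dict
    title: str
    content: str
    jd_file: str
    jd_categories: dict
    classification: Optional[dict]
    cached_result: Optional[dict]
    confidence: float
    reasoning: str
    result: dict
    error: Optional[str]

def make_initial_state(title: str, content: str, jd_file: str) -> ContentClassifierState:
    """Return a state dict with EVERY key initialized, so no node
    ever sees a missing key at runtime (hypothetical helper)."""
    return ContentClassifierState(
        input={"title": title, "content": content, "jd_file": jd_file},
        title=title,
        content=content,
        jd_file=jd_file,
        jd_categories={},
        classification=None,
        cached_result=None,
        confidence=0.0,
        reasoning="",
        result={},
        error=None,
    )

state = make_initial_state("Docker Best Practices", "...", "jd.yaml")
# Every annotated key is present before the graph ever runs
assert set(state) == set(ContentClassifierState.__annotations__)
```

Tests and production then both start from the same fully populated state, which removes the "partial state in tests" failure mode described above.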
Step 2: write the nodes¶
Each node is a function that takes the state and returns a partial state:
def load_categories(state: ContentClassifierState) -> dict:
    """Load JD categories from MCP."""
    jd_file = state["jd_file"]

    # MCP call
    response = mcp_client.call(
        "mcp://jd-classifier/get_categories",
        {"jd_file": jd_file}
    )

    if response["success"]:
        return {"jd_categories": response["result"]}
    else:
        return {"error": f"Failed to load categories: {response['error']}"}
Pattern: always return a dict of state updates. Don't modify the state in place.
Step 3: check the cache¶
Agents often repeat the same requests. A cache is critical for performance:
def check_cache(state: ContentClassifierState) -> dict:
    """Check if classification already cached."""
    title = state["title"]
    content = state["content"]

    # Generate cache key
    cache_key = f"classification:{hash(title + content)}"

    # MCP cache call
    response = mcp_client.call(
        "mcp://cache/get",
        {"key": cache_key}
    )

    if response["success"] and response["result"]:
        return {"cached_result": response["result"]}
    return {"cached_result": None}
The cache goes through MCP because in Kubernetes each pod has its own memory. A Redis-backed cache is shared between pods.
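One caveat with the snippet above: Python's built-in hash() is randomized per process for strings (PYTHONHASHSEED), so the same title+content would produce a different key in every pod, and the shared Redis cache would never hit. A stable digest avoids that. This is my suggested variation, not the article's code:

```python
import hashlib

def cache_key(title: str, content: str) -> str:
    """Stable cache key: unlike built-in hash(), sha256 yields the same
    digest in every process/pod, so Redis entries are actually shared."""
    digest = hashlib.sha256(f"{title}\x00{content}".encode("utf-8")).hexdigest()
    return f"classification:{digest[:16]}"

# Same inputs always map to the same key, across restarts and pods
assert cache_key("Docker", "article") == cache_key("Docker", "article")
```

The `\x00` separator prevents ("ab", "c") and ("a", "bc") from colliding on the same concatenation.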
Step 4: conditional edges¶
Routing logic goes through conditional edges:
def should_classify(state: ContentClassifierState) -> str:
    """Route: use cache or classify?"""
    if state["cached_result"]:
        return "use_cache"
    else:
        return "classify"
The returned string is the edge name in the graph definition.
Step 5: build the graph¶
Wire the nodes together:
def build_graph() -> StateGraph:
    graph = StateGraph(ContentClassifierState)

    # Add nodes
    graph.add_node("load_categories", load_categories)
    graph.add_node("check_cache", check_cache)
    graph.add_node("classify", classify_content)
    graph.add_node("use_cache", use_cached_result)
    graph.add_node("save_cache", save_to_cache)
    graph.add_node("format_result", format_output)

    # Set entry point
    graph.set_entry_point("load_categories")

    # Add edges
    graph.add_edge("load_categories", "check_cache")
    graph.add_conditional_edges(
        "check_cache",
        should_classify,
        {
            "use_cache": "use_cache",
            "classify": "classify"
        }
    )
    graph.add_edge("classify", "save_cache")
    graph.add_edge("use_cache", "format_result")
    graph.add_edge("save_cache", "format_result")

    graph.set_finish_point("format_result")

    return graph.compile()
Pattern: compile() returns a runnable graph. Without compile() it's just a definition.
Testing strategy - what and how to test¶
EMM has 3 layers of testing:
Layer 1: unit tests - individual nodes¶
Test each node in isolation:
# agents/tests/test_content_classifier.py
def test_load_categories(mock_mcp_client, temp_jd_file):
    """Test category loading."""
    # Mock MCP response
    mock_mcp_client.call.return_value = {
        "success": True,
        "result": {"46": "DevOps", "46.01": "Docker"},
        "error": None
    }

    state = {
        "jd_file": temp_jd_file,
        "jd_categories": {},
        # ... other required keys
    }

    result = load_categories(state)

    assert "jd_categories" in result
    assert result["jd_categories"]["46"] == "DevOps"
    mock_mcp_client.call.assert_called_once()
Fixtures are critical here. mock_mcp_client mocks the MCP calls, temp_jd_file creates a test jd.yaml.
Layer 2: integration tests - full graph¶
Test the full execution flow:
def test_classify_content(mock_mcp_client, temp_jd_file, monkeypatch):
    """Test full classification flow."""
    monkeypatch.setenv("MODEL_NAME", "llama3.2")

    # Mock the sequence of MCP calls
    mock_mcp_client.call.side_effect = [
        {"success": True, "result": {"46.01": "Docker"}, "error": None},  # get categories
        {"success": True, "result": None, "error": None},                 # cache miss
        {"success": True, "result": {
            "jd_code": "46.01",
            "confidence": 0.85
        }, "error": None},                                                # classify
        {"success": True, "result": True, "error": None}                  # save cache
    ]

    graph = build_graph()
    state: ContentClassifierState = {
        "input": {"title": "Docker Best Practices", "content": "...", "jd_file": temp_jd_file},
        "title": "Docker Best Practices",
        "content": "...",
        # ... initialize all state keys
    }

    result = graph.invoke(state)

    assert result["error"] is None
    assert result["classification"]["jd_code"] == "46.01"
    assert result["confidence"] == 0.85
The side_effect list mocks a sequence of calls: the first call returns the first element, the second call the second element, and so on.
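The side_effect mechanism can be demonstrated in isolation with nothing but the standard library:

```python
from unittest.mock import MagicMock

client = MagicMock()
client.call.side_effect = [
    {"success": True, "result": {"46.01": "Docker"}},  # consumed by the 1st call
    {"success": True, "result": None},                 # consumed by the 2nd call (cache miss)
]

first = client.call("mcp://jd-classifier/get_categories", {})
second = client.call("mcp://cache/get", {})

assert first["result"] == {"46.01": "Docker"}
assert second["result"] is None
# A third call would raise StopIteration - the sequence is exhausted,
# which conveniently fails the test if the agent makes an extra MCP call.
```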
Pattern: always initialize the entire state. Partial state works in tests and fails in production.
Layer 3: E2E tests - agent interactions¶
Test the interaction between agents:
def test_bookmark_to_vault_flow(mock_mcp_client, temp_vault):
    """Test: bookmark → classify → save to vault."""
    # Mock bookmark scraper calling the content classifier
    mock_mcp_client.call.side_effect = [
        {"success": True, "result": {"url": "...", "content": "Docker article"}},  # scrape
        {"success": True, "result": {"jd_code": "46.01", "confidence": 0.9}},      # classify
        {"success": True, "result": {"path": temp_vault / "46.01-docker.md"}},     # save
    ]

    bookmark_graph = build_bookmark_scraper_graph()
    state = {"input": {"url": "https://example.com/docker"}, ...}

    result = bookmark_graph.invoke(state)

    # Verify the file was created
    assert (temp_vault / "46.01-docker.md").exists()
    assert result["error"] is None
E2E tests matter for catching integration bugs.
Fixtures - test infrastructure¶
Fixtures in conftest.py are reusable across tests:
# agents/tests/conftest.py
import pytest
from unittest.mock import MagicMock
from pathlib import Path
import tempfile

@pytest.fixture
def mock_mcp_client(monkeypatch):
    """Mock MCPClient for all tests."""
    mock = MagicMock()
    monkeypatch.setattr("mcp_servers.client.MCPClient.get_instance", lambda: mock)
    return mock

@pytest.fixture
def temp_jd_file():
    """Create a temporary jd.yaml for tests."""
    with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f:
        f.write("""
categories:
  "46": DevOps
  "46.01": Docker
  "47": Programming
""")
        f.flush()
    yield f.name
    Path(f.name).unlink()

@pytest.fixture
def temp_vault(tmp_path):
    """Create a temporary Obsidian vault."""
    vault = tmp_path / "vault"
    vault.mkdir()
    (vault / "UNSORTED").mkdir()
    return vault
monkeypatch is a pytest built-in for patching. tmp_path is cleaned up automatically after the test.
Debugging techniques¶
Problem 1: LangGraph state errors¶
Error: KeyError: 'classification' during graph.invoke().
Cause: a node returned a dict without a required key, or the TypedDict is missing a key definition.
Debug:
# Add logging to nodes
def classify_content(state: ContentClassifierState) -> dict:
    print(f"[DEBUG] State keys: {state.keys()}")
    print(f"[DEBUG] Input: {state['input']}")

    # ... node logic
    result = {"classification": {...}}

    print(f"[DEBUG] Returning: {result.keys()}")
    return result
LangGraph Studio shows the state after each node, but logging gives more control.
Problem 2: MCP call failures¶
Error: mcp://content-classifier/classify returns 500.
Cause: the MCP service is down, or the request params are invalid.
Debug:
response = mcp_client.call("mcp://service/method", params)
print(f"[DEBUG] MCP response: {response}")

if not response["success"]:
    print(f"[ERROR] MCP failed: {response['error']}")
    print(f"[ERROR] Params were: {params}")
In production I use structured logging (JSON) for aggregation in Grafana.
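Structured JSON logging can be sketched with the stdlib logging module. JsonFormatter and its field names are my illustration, not the EMM codebase:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line, which log
    aggregators (Loki/Grafana, etc.) can parse. Field names are illustrative."""
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Extra context attached via logging's `extra={...}` kwarg
            "mcp_params": getattr(record, "mcp_params", None),
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("agent.mcp")
log.addHandler(handler)
log.setLevel(logging.INFO)

# Each line is machine-parseable instead of free-form text
log.error("MCP call failed", extra={"mcp_params": {"key": "classification:abc"}})
```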
Problem 3: non-deterministic LLM¶
LLM classifications are not consistent between runs, so tests fail intermittently.
Solution: mock the LLM calls in tests:
@patch("agents.content_classifier.graph.call_llm")
def test_classification_logic(mock_llm, mock_mcp_client):
    """Test with deterministic LLM response."""
    mock_llm.return_value = {
        "jd_code": "46.01",
        "confidence": 0.85,
        "reasoning": "Docker content detected"
    }

    # Now the test is deterministic
    result = classify_content(state)
    assert result["classification"]["jd_code"] == "46.01"
Production uses the real LLM; tests use a mock. That's OK - we're testing the logic, not LLM accuracy.
Frontend - viz dashboard¶
The EMM frontend is minimal: the LangGraph Studio UI for development, plus a custom viz dashboard for monitoring.
Viz architecture¶
viz/
├── frontend/ # React + TypeScript
│ ├── src/
│ │ ├── components/
│ │ │ ├── GraphView.tsx
│ │ │ ├── ThreadList.tsx
│ │ │ └── RunDetails.tsx
│ │ ├── api/
│ │ │ └── client.ts
│ │ └── App.tsx
│ └── package.json
└── backend/ # FastAPI
├── main.py
├── loaders.py # Load graphs from langgraph.json
├── store.py # In-memory thread storage
└── tests/
The backend exposes these endpoints:
# viz/backend/main.py
from fastapi import FastAPI

app = FastAPI()

@app.get("/api/graphs")
def list_graphs() -> dict:
    """List all available graphs."""
    graphs = load_graphs_from_registry()
    return {"graphs": [g["id"] for g in graphs]}

@app.get("/api/graphs/{graph_id}")
def get_graph(graph_id: str) -> dict:
    """Get graph structure (nodes, edges)."""
    graph = load_graph(graph_id)
    return {
        "graph_id": graph_id,
        "nodes": extract_nodes(graph),
        "edges": extract_edges(graph)
    }

@app.post("/api/threads")
def create_thread(request: dict) -> dict:
    """Create execution thread."""
    graph_id = request["graph_id"]
    input_data = request["input"]

    thread_id = generate_thread_id()
    thread_store[thread_id] = {
        "graph_id": graph_id,
        "status": "idle",
        "input": input_data
    }

    return {"thread_id": thread_id, "graph_id": graph_id, "status": "idle"}
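generate_thread_id isn't shown in the snippet above; a minimal stdlib version (my assumption about its shape, not the actual EMM code) could use uuid4 for collision-safe ids:

```python
import uuid

def generate_thread_id() -> str:
    """Hypothetical implementation: random, collision-safe thread ids
    with a readable prefix for log grepping."""
    return f"thread-{uuid.uuid4().hex[:12]}"

tid = generate_thread_id()
assert tid.startswith("thread-")
```

Random ids (rather than a counter) also stay unique across multiple backend replicas in Kubernetes, since there is no shared counter to coordinate.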
The frontend calls the API:
// viz/frontend/src/api/client.ts
export async function listGraphs(): Promise<string[]> {
  const res = await fetch('/api/graphs');
  const data = await res.json();
  return data.graphs;
}

export async function getGraph(graphId: string): Promise<GraphData> {
  const res = await fetch(`/api/graphs/${graphId}`);
  return await res.json();
}
React component:
// viz/frontend/src/components/GraphView.tsx
import { useEffect, useState } from 'react';
import { getGraph } from '../api/client';

export function GraphView({ graphId }: { graphId: string }) {
  const [graph, setGraph] = useState<GraphData | null>(null);

  useEffect(() => {
    getGraph(graphId).then(setGraph);
  }, [graphId]);

  if (!graph) return <div>Loading...</div>;

  return (
    <div className="graph-view">
      <h2>{graph.graph_id}</h2>
      <svg width="800" height="600">
        {graph.nodes.map(node => (
          <g key={node.id}>
            <rect x={node.x} y={node.y} width={100} height={50} />
            <text x={node.x + 10} y={node.y + 25}>{node.label}</text>
          </g>
        ))}
        {graph.edges.map((edge, i) => (
          <line key={i} x1={edge.from.x} y1={edge.from.y}
                x2={edge.to.x} y2={edge.to.y} stroke="black" />
        ))}
      </svg>
    </div>
  );
}
This is a simplified version. Production uses D3.js for an interactive graph with zoom/pan and click handlers.
Viz Dashboard screenshots¶
Graph visualization:
Thread execution monitoring:
Node execution details:
Interactive graph view:
Agent execution flow:
Debugging interface:
Error handling:
Frontend testing¶
FastAPI tests with TestClient:
# viz/backend/tests/test_api.py
from fastapi.testclient import TestClient
from viz.backend.main import app

def test_list_graphs():
    """GET /api/graphs returns list."""
    client = TestClient(app)
    response = client.get("/api/graphs")

    assert response.status_code == 200
    data = response.json()
    assert "graphs" in data
    assert isinstance(data["graphs"], list)

def test_create_thread():
    """POST /api/threads creates thread."""
    client = TestClient(app)
    response = client.post("/api/threads", json={
        "graph_id": "content-classifier",
        "input": {"title": "Test", "content": "Test content"}
    })

    assert response.status_code == 200
    data = response.json()
    assert "thread_id" in data
    assert data["status"] == "idle"
React tests with Jest + React Testing Library:
// viz/frontend/src/components/GraphView.test.tsx
import { render, screen, waitFor } from '@testing-library/react';
import { GraphView } from './GraphView';
import { getGraph } from '../api/client';

jest.mock('../api/client');

test('renders graph nodes', async () => {
  const mockGraph = {
    graph_id: 'test-graph',
    nodes: [
      { id: 'node1', label: 'Start', x: 100, y: 100 }
    ],
    edges: []
  };
  (getGraph as jest.Mock).mockResolvedValue(mockGraph);

  render(<GraphView graphId="test-graph" />);

  await waitFor(() => {
    expect(screen.getByText('Start')).toBeInTheDocument();
  });
});
Mock the API calls for deterministic tests.
CI/CD - automated testing¶
GitHub Actions for testing:
# .github/workflows/test.yml
name: Test

on: [push, pull_request]

jobs:
  test-backend:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.12'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov
      - name: Run tests
        run: |
          pytest agents/tests/ --cov=agents --cov-report=xml
      - name: Upload coverage
        uses: codecov/codecov-action@v3

  test-frontend:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Node
        uses: actions/setup-node@v3
        with:
          node-version: '20'
      - name: Install dependencies
        run: cd viz/frontend && npm install
      - name: Run tests
        run: cd viz/frontend && npm test
Tests run on every push. The coverage report is uploaded to Codecov.
Performance optimization¶
Problem: slow agent execution¶
Content Classifier took 5 seconds per classification. Unacceptable for batch processing of 100 files.
Profiling:
import time

def classify_content(state):
    start = time.time()

    # Load categories
    t1 = time.time()
    categories = load_categories(state)
    print(f"Load categories: {time.time() - t1:.2f}s")

    # Check cache
    t2 = time.time()
    cached = check_cache(state)
    print(f"Check cache: {time.time() - t2:.2f}s")

    # LLM call
    t3 = time.time()
    result = call_llm(state, categories)
    print(f"LLM call: {time.time() - t3:.2f}s")

    print(f"Total: {time.time() - start:.2f}s")
    return result
Results:
- Load categories: 0.1s
- Check cache: 0.05s
- LLM call: 4.8s ← the bottleneck
Solution: batch the LLM calls:
def classify_batch(states: List[ContentClassifierState]) -> List[dict]:
    """Classify multiple items in one LLM call."""
    # Prepare batch prompt
    batch_prompt = "\n\n".join([
        f"Document {i}: {state['title']}\n{state['content'][:500]}"
        for i, state in enumerate(states)
    ])

    # Single LLM call
    result = llm.invoke(batch_prompt)

    # Parse results
    return parse_batch_results(result, len(states))
A batch of 10 items: 6 seconds total = 0.6s per item. An 8x speedup.
Problem: memory leaks¶
The viz backend's memory usage grew from 200MB to 2GB over 24 hours.
Cause: thread_store accumulated threads without cleanup.
Fix:
from datetime import datetime, timedelta

def cleanup_old_threads():
    """Remove threads older than 1 hour."""
    now = datetime.now()
    to_remove = []

    for thread_id, thread in thread_store.items():
        if now - thread["created_at"] > timedelta(hours=1):
            to_remove.append(thread_id)

    for thread_id in to_remove:
        del thread_store[thread_id]

# Run cleanup every 10 minutes
import threading

def schedule_cleanup():
    cleanup_old_threads()
    threading.Timer(600, schedule_cleanup).start()

schedule_cleanup()
Memory stayed stable at 300MB after the fix.
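The cleanup logic itself becomes unit-testable if the store and the clock are passed in instead of read from globals. A sketch of that variation (the injectable signature is mine, not the production code):

```python
from datetime import datetime, timedelta

def cleanup_old_threads(store: dict, now: datetime,
                        max_age: timedelta = timedelta(hours=1)) -> int:
    """Same logic as the production fix, but with the store and the clock
    injected so behavior can be asserted without sleeping for an hour."""
    stale = [tid for tid, t in store.items() if now - t["created_at"] > max_age]
    for tid in stale:
        del store[tid]
    return len(stale)

store = {
    "old": {"created_at": datetime(2025, 1, 1, 10, 0)},   # 2h old -> removed
    "new": {"created_at": datetime(2025, 1, 1, 11, 30)},  # 30min -> kept
}
removed = cleanup_old_threads(store, now=datetime(2025, 1, 1, 12, 0))
assert removed == 1 and "old" not in store and "new" in store
```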
Lessons learned¶
1. TypedDict definitions are critical
Partial state works in tests and fails in production. Always define all state keys.
2. Mock MCP calls in unit tests
Real MCP calls make tests slow and flaky. Mock them for deterministic tests.
3. E2E tests catch integration bugs
Unit tests don't surface agent interaction issues. E2E tests are critical.
4. Fixture reusability > copy-paste
A conftest.py with shared fixtures saves hundreds of lines of duplicated test setup.
5. Batch LLM calls whenever possible
Single-item LLM calls are the bottleneck. Batching gives a 5-10x speedup.
6. Frontend testing != backend testing
FastAPI TestClient for API tests. Jest + React Testing Library for component tests. Different tools, different patterns.
7. Automated CI/CD testing is mandatory
Manual testing doesn't scale. GitHub Actions runs the tests on every push.
8. Profile before optimizing
Don't guess where the bottleneck is. Profile, measure, then optimize the top bottleneck.
Conclusions¶
Developing AI agents on LangGraph is more than just "write a graph definition". You need:
- Proper state management (a TypedDict with all keys)
- A testing strategy (unit, integration, E2E)
- Fixture infrastructure for reusable test setup
- Debugging techniques for a non-deterministic LLM
- A frontend for monitoring (FastAPI backend + React)
- Performance optimization (batching, caching)
- CI/CD for automated testing
For EMM this means:
- 8 agents, each with full test coverage
- 200+ tests (unit + integration + E2E)
- A viz dashboard for real-time monitoring
- A CI/CD pipeline with automated testing
- Performance optimization (batched LLM calls, Redis cache)
Development workflow: write a node → unit test → integrate into the graph → integration test → E2E test → deploy → monitor.
If you're building LangGraph agents - invest in testing infrastructure early. Mock MCP calls. Use fixtures. Write E2E tests. Profile before optimizing.
Tests today = fewer bugs tomorrow.
Related: Kubernetes deployment for AI agents, Data versioning in AI agents with lakeFS
Author: Igor Gorovyy
Role: DevOps Engineer Lead & Senior Solutions Architect
LinkedIn: linkedin.com/in/gorovyyigor
Development summary¶
Backend: Python 3.12, LangGraph, FastAPI, pytest
Frontend: React 18, TypeScript, Jest, React Testing Library
Agents: 8 StateGraph agents, 40+ nodes total
Tests: 200+ tests (unit, integration, E2E)
Coverage: 85% (agents), 92% (viz backend)
CI/CD: GitHub Actions, automated testing on push