Building a simple converter from PDF to FB2

Written by:

Igor Gorovyy
DevOps Engineer Lead & Senior Solutions Architect

Converting PDF files into FB2 format is a crucial step in building intelligent document-processing systems, where agents not only analyze files but also prepare them for human-friendly use. Within the Izabella project, this capability becomes part of a broader suite of agentic tools designed to handle files from ingestion to readable output.

PDF, while well suited to fixed-page visual representation, is not ideal for flexible reading or structured analysis, because it typically stores positioned text and graphics rather than logical chapters and paragraphs. FB2 (FictionBook 2.0), on the other hand, is an open XML-based format designed for books and documents, preserving chapters, annotations, metadata, and structure in a lightweight, device-friendly form. Converting from PDF to FB2, therefore, is not just a file transformation - it is a process of semantic reconstruction, making the document more adaptive and “alive.”
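To make the "XML-based format" point concrete, here is a minimal sketch of how an FB2 tree could be assembled with Python's standard library. It is not taken from the pdf2fb2-convertor repository: the function names `build_fb2` and `to_xml`, the parameters, and the `nonfiction` genre value are assumptions made for illustration. Only the element names (`FictionBook`, `description`, `title-info`, `body`, `section`, `title`, `p`) and the namespace come from the FB2 format itself, and the result is not validated against the official schema, which expects more metadata than shown here (for example a `document-info` block).

```python
import xml.etree.ElementTree as ET

# Namespace used by FictionBook 2.0 documents.
FB2_NS = "http://www.gribuser.ru/xml/fictionbook/2.0"


def build_fb2(title, author_last_name, chapters, lang="en"):
    """Build a minimal FB2 tree from (chapter_title, paragraphs) pairs.

    Hypothetical helper for illustration, not the repository's API.
    """
    # Serialize FB2 elements with a default (unprefixed) namespace.
    ET.register_namespace("", FB2_NS)
    root = ET.Element(f"{{{FB2_NS}}}FictionBook")

    def add(parent, tag, text=None):
        element = ET.SubElement(parent, f"{{{FB2_NS}}}{tag}")
        if text is not None:
            element.text = text
        return element

    # Book metadata lives under <description>/<title-info>.
    title_info = add(add(root, "description"), "title-info")
    add(title_info, "genre", "nonfiction")  # placeholder genre (assumption)
    add(add(title_info, "author"), "last-name", author_last_name)
    add(title_info, "book-title", title)
    add(title_info, "lang", lang)

    # Each chapter becomes a <section> with a <title> and <p> paragraphs.
    body = add(root, "body")
    for chapter_title, paragraphs in chapters:
        section = add(body, "section")
        add(add(section, "title"), "p", chapter_title)
        for text in paragraphs:
            add(section, "p", text)
    return root


def to_xml(root):
    """Serialize the tree to a string; special characters are escaped."""
    return ET.tostring(root, encoding="unicode")
```

A call such as `to_xml(build_fb2("Demo", "Doe", [("Chapter 1", ["First paragraph."])]))` returns a single XML string. The hard part of a real converter is the step before this one: deciding which runs of PDF text are chapter titles and which are paragraphs.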

In the Izabella framework, document-processing agents are enhanced with this conversion capability. Each file passes through a three-stage pipeline:

Classification - determining the document type and content domain.

Vectorization - transforming the extracted text into semantic embeddings for intelligent search and knowledge retrieval.

Conversion - reformatting the file into FB2 for comfortable reading while preserving structure and meaning.
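The three stages above can be pictured as functions applied in order to a shared document record. The sketch below only shows that shape. Every name in it (`ProcessedDocument`, `classify`, `vectorize`, `convert`, `run_pipeline`) is hypothetical, and each stage body is a deliberately trivial stand-in: a keyword rule instead of a real classifier, word counts instead of semantic embeddings, and a bare `<body>` fragment instead of a full FB2 file. It does not describe how Izabella's agents are implemented.

```python
from collections import Counter
from dataclasses import dataclass, field
from xml.sax.saxutils import escape


@dataclass
class ProcessedDocument:
    """Hypothetical record passed from stage to stage."""
    text: str
    doc_type: str = "unknown"
    embedding: dict = field(default_factory=dict)
    fb2_xml: str = ""


def classify(doc):
    # Placeholder rule; a real agent would use a trained model here.
    doc.doc_type = "book" if "chapter" in doc.text.lower() else "document"
    return doc


def vectorize(doc):
    # Toy term-frequency vector standing in for a semantic embedding.
    doc.embedding = dict(Counter(doc.text.lower().split()))
    return doc


def convert(doc):
    # Stand-in for FB2 conversion: one escaped <p> per non-empty line.
    paragraphs = "".join(
        f"<p>{escape(line.strip())}</p>"
        for line in doc.text.splitlines()
        if line.strip()
    )
    doc.fb2_xml = f"<body><section>{paragraphs}</section></body>"
    return doc


# Stages run in the order listed in the article.
PIPELINE = (classify, vectorize, convert)


def run_pipeline(text):
    doc = ProcessedDocument(text=text)
    for stage in PIPELINE:
        doc = stage(doc)
    return doc
```

Keeping each stage as a plain function over one record makes it easy to add, reorder, or replace a stage without touching the others, which is one reasonable way to structure this kind of agent pipeline.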

The new converter acts as a bridge between machine understanding and human experience. It ensures that every processed file retains its informational value while becoming accessible in a user-friendly, readable format across devices - from e-readers to mobile apps.

This integration of conversion into the Izabella agent ecosystem represents a step toward a more complete, human-centered AI knowledge system - one that doesn't just process and classify information but also delivers it in a form designed for real-world use and engagement.

Repository

The source code for the PDF to FB2 converter is available on GitHub:

pdf2fb2-convertor