Building a simple converter from PDF to FB2
Written by:
Igor Gorovyy
DevOps Engineer Lead & Senior Solutions Architect
Converting PDF files into FB2 format is a crucial step in building intelligent document-processing systems, where agents not only analyze files but also prepare them for human-friendly use. Within the Izabella project, this capability becomes part of a broader suite of agentic tools designed to handle files from ingestion to readable output.
PDF, while perfect for fixed-page visual representation, is not ideal for flexible reading or structured analysis. FB2 (FictionBook 2.0), on the other hand, is an open XML-based format designed for books and documents, preserving chapters, annotations, metadata, and structure in a lightweight, device-friendly form. Converting from PDF to FB2, therefore, is not just a file transformation - it is a process of semantic reconstruction, making the document more adaptive and “alive.”
In the Izabella framework, document-processing agents are enhanced with this conversion capability. Each file passes through a three-stage pipeline:
Classification - determining the document type and content domain.
Vectorization - transforming the extracted text into semantic embeddings for intelligent search and knowledge retrieval.
Conversion - reformatting the file into FB2 for comfortable reading while preserving structure and meaning.
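The three stages above can be sketched as a small pipeline. Everything here is an assumption made for illustration: the function names, the keyword-based classifier, and the toy hashed bag-of-words vector stand in for whatever models and agents Izabella actually uses, and the input is text that has already been extracted from the PDF.

```python
import hashlib
import math
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class ProcessedDocument:
    doc_type: str     # result of the classification stage
    embedding: list   # result of the vectorization stage
    fb2: str          # result of the conversion stage


def classify(text):
    """Toy classifier: a hypothetical keyword rule, not a real model."""
    lowered = text.lower()
    if "invoice" in lowered:
        return "invoice"
    if "chapter" in lowered:
        return "book"
    return "other"


def vectorize(text, dims=8):
    """Toy embedding: hashed bag-of-words, L2-normalized.

    A real system would call an embedding model here.
    """
    vec = [0.0] * dims
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def convert_to_fb2(title, text):
    """Wrap blank-line-separated paragraphs in a reduced FB2 skeleton."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0">'
        "<description><title-info>"
        f"<book-title>{escape(title)}</book-title>"
        "</title-info></description>"
        f"<body><section>{body}</section></body>"
        "</FictionBook>"
    )


def process(title, text):
    """Run classification, vectorization, and conversion in order."""
    return ProcessedDocument(
        doc_type=classify(text),
        embedding=vectorize(text),
        fb2=convert_to_fb2(title, text),
    )


if __name__ == "__main__":
    doc = process("Demo", "Chapter 1\n\nHello & welcome.")
    print(doc.doc_type, len(doc.embedding))
    print(doc.fb2)
```

Keeping each stage as a separate function means an agent can swap in a better classifier or embedding model without touching the conversion step.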
The new converter acts as a bridge between machine understanding and human experience. It ensures that every processed file retains its informational value while becoming accessible in a user-friendly, readable format across devices - from e-readers to mobile apps.
This integration of conversion into the Izabella agent ecosystem represents a step toward a more complete, human-centered AI knowledge system - one that doesn't just process and classify information but also delivers it in a form designed for real-world use and engagement.
Repository
The source code for the PDF to FB2 converter is available on GitHub:
