
Navigating PDF Standards with Decodocs and MCP Services

How discovering the PDF Association's standards inspired a new approach to machine-to-machine document processing.

Explore how Vasilkoff Ltd built Decodocs to navigate complex PDF standards like ISO 32000, enabling advanced search, clause finding, and MCP services.


Time to read: 4 min

The Portable Document Format (PDF) is ubiquitous. But recently, while working on a complex document processing project at Vasilkoff Ltd, I stumbled upon a treasure trove that made me surprisingly happy: the official PDF Standards from the PDF Association.

You might wonder why a software engineer would get excited about ISO standards. The answer lies in the sheer complexity of what PDFs actually are. They aren't just flat images of text; they are intricate containers that can hold everything from structured text and vector graphics to ECMAScript (JavaScript), 3D content, and extensive metadata. When you are tasked with building a system that reliably extracts and processes this data, having a definitive guide to ISO 32000 (the core PDF standard) and its subsets (like PDF/A for archiving and PDF/UA for accessibility) is like finding a map in a dark forest.

This discovery was directly tied to a major project we were developing: Decodocs.

The Challenge: Building Decodocs

Decodocs started with a straightforward premise but quickly evolved into a sophisticated technical challenge. Organizations deal with thousands of complex documents daily. They needed a way to not just read these documents, but to understand them.

Our mandate for Decodocs involved several critical features:

  1. Search Documents: Not just by file name, but by deep semantic meaning.
  2. Search Extracted Text: Accurately pulling text out of PDFs, a task that is notoriously difficult because an untagged PDF describes where glyphs are drawn on the page rather than the logical order and structure of the text.
  3. Find Clauses: Identifying specific legal or operational clauses within massive documents.
  4. Index Audit Logs: Creating a searchable, immutable record of document interactions and modifications.
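
To make the text-extraction difficulty in item 2 concrete, here is a minimal sketch of one common heuristic: group positioned text fragments into lines by baseline, then read each line left to right. The `Fragment` type, the `reading_order` function, and the tolerance value are hypothetical names chosen for illustration, not part of Decodocs, and real documents (multi-column layouts, rotated text, tables) need considerably more than this.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A run of text plus the page position where it starts (hypothetical type)."""
    text: str
    x: float
    y: float  # default PDF user space: y grows upward from the bottom of the page


def reading_order(fragments, line_tolerance=2.0):
    """Group fragments into lines by baseline, then read each line left to right."""
    lines = []  # each entry: [baseline_y, [fragments on that line]]
    # Visit fragments from the top of the page downward.
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0] - frag.y) <= line_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append([frag.y, [frag]])
    return [
        " ".join(f.text for f in sorted(members, key=lambda f: f.x))
        for _, members in lines
    ]


# Fragments in the order a content stream might emit them, which is not reading order.
fragments = [
    Fragment("shall indemnify", x=150.0, y=700.0),
    Fragment("9.1", x=72.0, y=720.0),
    Fragment("The Supplier", x=72.0, y=700.0),
    Fragment("Indemnification", x=100.0, y=720.0),
]
print(reading_order(fragments))
# ['9.1 Indemnification', 'The Supplier shall indemnify']
```

The point of the sketch is that the order in which text appears in the file carries no guarantee; any search or clause-finding feature depends on a reconstruction step like this one.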

Understanding the depth of the PDF standards allowed our team to build Decodocs with a robust architecture. By adhering to specifications like PDF/A for long-term preservation and understanding how XMP metadata operates within the standard, we built a tool that didn't just "scrape" PDFs, but actually parsed them according to the rules the specifications define.
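
As one illustration of how XMP metadata and PDF/A connect: a PDF/A file declares its level in XMP through the PDF/A identification schema (`pdfaid:part` and `pdfaid:conformance`). The sketch below reads that declaration using only the Python standard library. It assumes the XMP packet sits in the file as plain, uncompressed XML and uses the customary `x:` prefix; the function name `claimed_pdfa_level` is hypothetical. It reports what a file claims, which is not the same as validating conformance.

```python
import re
import xml.etree.ElementTree as ET

PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"  # PDF/A identification schema
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def claimed_pdfa_level(pdf_bytes):
    """Return (part, conformance) declared in XMP, or None if no claim is found."""
    # Assumption: the packet is uncompressed and uses the usual "x:" prefix.
    match = re.search(rb"<x:xmpmeta.*?</x:xmpmeta>", pdf_bytes, re.DOTALL)
    if match is None:
        return None
    root = ET.fromstring(match.group(0))
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # XMP allows simple properties as attributes or as child elements.
        part = desc.get(f"{{{PDFAID_NS}}}part") or desc.findtext(f"{{{PDFAID_NS}}}part")
        level = (desc.get(f"{{{PDFAID_NS}}}conformance")
                 or desc.findtext(f"{{{PDFAID_NS}}}conformance"))
        if part:
            return part.strip(), (level.strip() if level else None)
    return None


# A stand-in for real file bytes: only the XMP packet matters to this sketch.
sample = b"""%PDF-1.7
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
   <pdfaid:part>2</pdfaid:part>
   <pdfaid:conformance>B</pdfaid:conformance>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
%%EOF"""
print(claimed_pdfa_level(sample))  # ('2', 'B')
```

A production check would go further and test the file against the requirements of the claimed part, which is the job of a dedicated validator rather than a metadata lookup.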

This is exactly the kind of complex, high-stakes engineering where Vasilkoff Ltd excels. We don't just use off-the-shelf libraries and hope for the best; we dive into the core specifications to build reliable, scalable solutions. AI writes the code, but our senior engineers take full responsibility for the results.

The Next Evolution: Decodocs as an MCP Provider

While Decodocs is a powerful tool for human users, the market is rapidly changing. We are entering an era where humans are no longer the only users, or even the primary users, who need to interact with PDF documents.

Machines, specifically Large Language Models (LLMs) and autonomous agents, need to read, analyze, and manipulate PDFs.

This is where the next big idea for Decodocs comes in: becoming a Model Context Protocol (MCP) PDF services provider.

The Model Context Protocol is an open standard that allows AI systems to connect with external tools and data sources. Imagine an AI agent tasked with reviewing a contract. Instead of relying on a human to upload the PDF, extract the text, and paste it into a chat window, the agent could call an MCP server that exposes those steps as tools.

As an MCP provider, Decodocs could expose tools designed for machine consumption, answering requests such as:

  • "Extract all indemnification clauses from this document."
  • "Verify if this PDF meets ISO 19005-4 (PDF/A-4) archival standards."
  • "Index the audit logs of this file and return the metadata."
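
To show what such a request could look like on the wire, here is a rough sketch of the message shapes involved. MCP is built on JSON-RPC 2.0, and clients discover and invoke tools through the `tools/list` and `tools/call` methods. Everything specific to this example is an assumption made for illustration: the `find_clauses` tool, its arguments, and the naive keyword matching are hypothetical and do not describe the Decodocs API. A real server would normally be built on an MCP SDK, which also handles initialization and transport.

```python
import json


def find_clauses(text, keyword):
    """Hypothetical tool: return the paragraphs that mention the keyword."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [p for p in paragraphs if keyword.lower() in p.lower()]


TOOLS = {
    "find_clauses": {
        "description": "Return paragraphs of a document that mention a keyword.",
        "inputSchema": {
            "type": "object",
            "properties": {"text": {"type": "string"}, "keyword": {"type": "string"}},
            "required": ["text", "keyword"],
        },
        "handler": find_clauses,
    }
}


def handle(request):
    """Answer one JSON-RPC 2.0 request (a dict) with a response dict."""
    req_id, method = request.get("id"), request.get("method")
    if method == "tools/list":
        tools = [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]
        return {"jsonrpc": "2.0", "id": req_id, "result": {"tools": tools}}
    if method == "tools/call":
        params = request.get("params", {})
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            error = {"code": -32602, "message": "Unknown tool"}
            return {"jsonrpc": "2.0", "id": req_id, "error": error}
        clauses = tool["handler"](**params.get("arguments", {}))
        content = [{"type": "text", "text": clause} for clause in clauses]
        return {"jsonrpc": "2.0", "id": req_id, "result": {"content": content, "isError": False}}
    error = {"code": -32601, "message": "Method not found"}
    return {"jsonrpc": "2.0", "id": req_id, "error": error}


document = (
    "1.1 Definitions apply.\n\n"
    "9.1 The Supplier shall indemnify the Customer.\n\n"
    "12.3 Notices must be in writing."
)
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "find_clauses", "arguments": {"text": document, "keyword": "indemnif"}},
}
response = handle(request)
print(json.dumps(response, indent=2))
```

The agent never sees a user interface here: it reads the tool's input schema, sends structured arguments, and receives structured content back, which is the shift from a human-centric UI to a machine-centric protocol.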

By shifting the interface from a human-centric UI to a machine-centric protocol, Decodocs positions itself at the forefront of AI-accelerated workflows. It bridges the gap between the highly structured world of PDF standards and the dynamic, intelligent capabilities of modern AI.

Bringing Vasilkoff Ltd to Your Next Project

The journey of Decodocs, from navigating dense ISO specifications to envisioning a future as an MCP service provider, is a good example of how Vasilkoff Ltd operates.

If your organization is struggling with complex document processing, legacy system integration, or if you're looking to build the next generation of AI-powered tools, this is where we step in. We combine the speed of AI with the trust and accountability of senior engineers.

We deliver outcomes, not just code. And yes, we guarantee free bug fixes for life.
