Model Context Protocol (MCP) Clearly Explained

.. PLUS: Turn PDFs and Images into Clean Markdown

In today’s newsletter:

  • OCRFlux - Open-Source Toolkit to Turn PDFs and Images into Clean Markdown

  • Model Context Protocol (MCP) Clearly Explained

Reading time: 3 minutes.

OCRFlux is a multimodal LLM-based toolkit for extracting clean, readable Markdown text from PDFs and images.

It’s powered by the OCRFlux-3B model, fine-tuned from Qwen2.5-VL-3B-Instruct on private document datasets along with data from the olmOCR-mix-0225 dataset.

Key Features:

  • High-accuracy parsing that preserves natural reading order, even with multi-column layouts and figures, and handles multilingual content (English and Chinese)

  • Handles complex tables, equations, and insets

  • Automatically removes headers and footers

  • Cross-page merging for both tables and paragraphs

Performance highlights:

  • Achieves up to 0.109 higher Edit Distance Similarity (EDS) than baselines like olmOCR-7B-0225-preview and Nanonets-OCR-s on the OCRFlux-bench-single benchmark

  • First parser to support cross-page table and paragraph merging

  • Runs on a lightweight 3B-parameter VLM; supports inference even on a single RTX 3090

It’s 100% Open Source
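
Want to try it? OCRFlux ships its own inference pipeline (see the repo for the exact API), but since OCRFlux-3B is fine-tuned from Qwen2.5-VL-3B-Instruct, here is a minimal sketch of single-page inference using the standard Qwen2.5-VL classes in Hugging Face transformers. The model ID, prompt wording, and file name are assumptions; the official toolkit adds PDF rendering and cross-page merging on top of this.

```python
# Minimal single-page OCR sketch. Assumes the model is published on the
# Hugging Face Hub under an ID like "ChatDOC/OCRFlux-3B" (check the repo)
# and that the standard Qwen2.5-VL loading path works for the fine-tune.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

MODEL_ID = "ChatDOC/OCRFlux-3B"  # assumption: verify against the repo

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("page_1.png")  # one rendered PDF page or scanned image
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Convert this page to clean Markdown."},
    ],
}]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=2048)
markdown = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(markdown)
```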

MCP (Model Context Protocol): Clearly Explained

Everyone is talking about MCP. Let me explain it clearly, step by step:

Model Context Protocol (MCP) is an open standard that enables AI models to interact with external tools and applications through a consistent, universal interface.

Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools.

Before and After MCP

Before MCP, integrations followed an “M × N” model:

  • Every AI app (M) needed custom code to connect with every tool (N), resulting in M × N unique integrations

  • There was no shared protocol across tools and models, so developers had to reinvent the wheel for each new connection

After MCP, integration simplifies into an “M + N” model:

  • You can define or expose multiple tools within a single MCP server

  • Any AI app that supports MCP can use those tools directly

  • Integration complexity drops to M + N, since tools and models speak a shared protocol

With 10 AI apps and 20 tools, that’s 200 bespoke integrations before MCP but only 30 (10 clients + 20 servers) after. This drastically reduces complexity, avoids duplication, and makes it easier to scale AI systems.

MCP follows a client-server architecture where a host application connects to one or more MCP servers to access external tools (a minimal server sketch follows the component list below).

Key Components of MCP Architecture:

  • MCP Hosts: Programs like Claude Desktop, IDEs, or AI tools that want to access data through MCP

  • MCP Clients: Protocol clients that maintain 1:1 connections with servers

  • MCP Servers: Lightweight programs that each expose specific capabilities through the standardized Model Context Protocol

  • Local Data Sources: Your computer’s files, databases, and services that MCP servers can securely access

  • Remote Services: External systems available over the internet (e.g., through APIs) that MCP servers can connect to
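
Here is what the “MCP Servers” box looks like in code: a minimal sketch using the official MCP Python SDK (`pip install mcp`). The `create_issue` tool is a hypothetical stub for illustration; a real server would call the GitHub API inside it.

```python
# Minimal MCP server sketch using the official Python SDK.
# The tool below is a hypothetical stub for illustration.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def create_issue(repo: str, title: str, body: str) -> str:
    """Create a GitHub issue (stub: wire this to the GitHub API)."""
    # A real implementation would call the GitHub REST API here.
    return f"Created issue '{title}' in {repo}"

if __name__ == "__main__":
    # Serves over stdio by default, which is how local hosts like
    # Claude Desktop launch and talk to MCP servers.
    mcp.run()
```

Any MCP-compatible host can discover this tool (with a JSON Schema auto-generated from the type hints) and call it, with no app-specific glue code required.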

Now let’s walk through how MCP works in practice, step by step:


  1. Query: The user provides a prompt to an MCP-compatible host application (such as Claude Desktop or Cursor). This prompt could be something like “Create a GitHub issue and notify my team on Slack.” The host passes this to the MCP client.

  2. MCP Client: The MCP client receives the prompt and manages the flow between the LLM, the available tools, and the final output. It handles communication with MCP servers and can be embedded within the host application.

  3. LLM (Language Model): The MCP client sends the query to a language model like GPT-4, Claude, or DeepSeek. The LLM interprets the user’s intent, determines the necessary steps, and identifies which tool(s) to invoke based on the tool schemas exposed by connected MCP servers.

  4. Tool Selection: Based on its understanding of the query and the available tools, the LLM tells the MCP client which MCP server to call and what function (tool) to invoke. This is often a JSON-formatted function call that the client can execute.

  5. Send Request to MCP Server: The MCP client sends a request to the selected MCP server. The request includes the tool name and any parameters the tool requires, formatted as a JSON-RPC 2.0 message that conforms to the tool’s declared JSON Schema (see the client sketch after this walkthrough).

  6. Tool Execution via MCP Server: The MCP server receives the request and routes it to the appropriate tool or service, like Slack, GitHub, or Google Drive. The tool validates the request using a JSON Schema to ensure all required inputs are present and well-formed.

  7. Response and Result: Once the tool completes the task, the MCP server sends the response back to the MCP client. The client then passes the result to the LLM, which may summarize or format it before returning it to the user.

Finally, context from the tool call is preserved, allowing follow-up interactions to build on it.
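
To make steps 4 through 7 concrete, here is a condensed sketch of the client side using the official MCP Python SDK. The server command, tool name, and arguments are assumptions matching the server sketch above; in a real host, the LLM (not hard-coded logic) chooses the tool and fills in the arguments.

```python
# Client-side sketch: connect to an MCP server over stdio, list its
# tools, and invoke one, mirroring steps 4-7 above.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the server sketch from earlier as a subprocess
    # (assumed to be saved as server.py).
    server = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()          # protocol handshake
            tools = await session.list_tools()  # the schemas the LLM sees
            print([t.name for t in tools.tools])
            # Here the LLM would pick the tool and arguments; we
            # hard-code the call it would have produced.
            result = await session.call_tool(
                "create_issue",
                arguments={"repo": "org/repo", "title": "Bug report",
                           "body": "Steps to reproduce..."},
            )
            print(result.content)

asyncio.run(main())
```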

MCP helps you build agents and complex workflows on top of LLMs.

We'll be sharing more examples of MCP-powered agents and how to build your own custom MCP servers in future issues. Stay tuned!

That’s a Wrap

That’s all for today. Thank you for reading today’s edition. See you in the next issue with more AI Engineering insights.

PS: We curate this AI Engineering content for free, and your support means everything. If you find value in what you read, consider sharing it with a friend or two.

Your feedback is valuable: If there’s a topic you’re stuck on or curious about, reply to this email. We’re building this for you, and your feedback helps shape what we send.

WORK WITH US

Looking to promote your company, product, or service to 120K+ AI developers? Get in touch today by replying to this email.