Turn Research Papers into AI Agents with MCP

.. PLUS: Turn complex and messy documents into LLM-ready data

In today’s newsletter:

  • Agentic Document Extraction - Turn complex documents into LLM-ready data

  • Paper2Agent - Turn Research Papers into Interactive AI Agents with MCP

  • DeepEval - Structured Evaluation for Multi-Turn Conversations

Reading time: 3 minutes.

ade-python is a Python library for Agentic Document Extraction (ADE) that outputs layout-aware structured JSON from visually complex documents.

With the new Document Pre-Trained Transformer (DPT-2) model, ADE can now handle large, complex tables with merged cells, multi-level headers, and irregular grid layouts.

The output also provides spatial grounding with bounding boxes for each extracted element, along with region descriptions, ensuring every result can be fully traced and audited.

Key Features:

  • Works directly with PDFs, images, and URLs (auto format detection)

  • Supports multi-thousand-page documents with automatic pagination

  • Generates structured JSON and Markdown with explicit hierarchy and layout retention

  • Provides visual grounding with bounding boxes, coordinates, and optional previews

  • DPT-2 improves parsing accuracy for complex tables and scanned layouts

  • Includes native batching, streaming, and parallel extraction for scale
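To make the workflow concrete, here is a minimal sketch of parsing a PDF and reading back the grounded results. The client class, method, and field names (`LandingAIADE`, `parse`, `chunks`, `grounding`) are assumptions based on the feature list above rather than a verified excerpt of the library's API, so check the ade-python docs for the exact interface.

```python
# Hypothetical sketch of ade-python usage. Class, method, and field names are
# assumptions based on the description above; consult the official docs.
from landingai_ade import LandingAIADE  # assumed client class

client = LandingAIADE(apikey="YOUR_API_KEY")  # assumed constructor

# Parse a visually complex PDF with the DPT-2 model (model identifier assumed).
response = client.parse(document="quarterly_report.pdf", model="dpt-2-latest")

# Walk the layout-aware output: each chunk carries its text/Markdown plus
# spatial grounding (page index and bounding box) for traceability.
for chunk in response.chunks:        # assumed field
    print(chunk.type, chunk.markdown[:80])
    for g in chunk.grounding:        # assumed field
        print("  page", g.page, "box", g.box)
```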

Stanford researchers released Paper2Agent, a multi-agent system that automatically transforms research papers into interactive AI agents with minimal human input.

It builds on the Model Context Protocol (MCP) and operates in two layers.

Paper2MCP Layer

  • Analyzes the paper and its code using multiple helper agents

  • Extracts the key methods and wraps them as tools in an MCP server

  • Tests and refines them until they reliably reproduce the original results

Agent Layer

  • Connects the MCP server to a chat agent like Claude Code or Gemini CLI

  • Each paper becomes a conversational assistant that researchers can query or command in plain language

This system turns static research into living, testable agents, making reproducibility faster and more reliable.
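To give a feel for the Paper2MCP output, here is a minimal, hand-written stand-in for the kind of MCP server it generates, using the FastMCP helper from the official MCP Python SDK. The paper method (`fit_trajectory`) and its signature are hypothetical; Paper2Agent derives the real wrappers automatically from the paper's actual codebase.

```python
# Illustrative only: a hand-written stand-in for the kind of MCP server that
# Paper2MCP would generate from a paper's code. The tool below is hypothetical.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("example-paper-agent")

@mcp.tool()
def fit_trajectory(csv_path: str, n_components: int = 10) -> dict:
    """Run the paper's (hypothetical) trajectory-fitting method on a dataset."""
    # In a real Paper2Agent server, this body would call the paper's own code
    # and return results that reproduce the published figures and tables.
    return {"status": "ok", "csv_path": csv_path, "n_components": n_components}

if __name__ == "__main__":
    mcp.run()  # exposes the tool to Claude Code, Gemini CLI, or another MCP client
```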

It’s 100% Open Source.

🔗 Check out the GitHub repo and Paper

DeepEval lets you build decision-tree-based LLM-as-a-judge evals that break down complex chats step by step.

Most LLM evaluations look only at the final response, giving a single score with little context. That is not enough when real conversations span multiple turns.

Conversational DAGs (Directed Acyclic Graphs) let you create fully deterministic, multi-turn evaluations.

You can combine task, judgement, and verdict nodes, with verdicts returning hardcoded scores, to build evaluation flows that are precise, transparent, and auditable.

Here’s what you can do:

  • Summarize long conversations before scoring

  • Add binary checks like “Did the assistant answer the question?”

  • Add multi-class checks like “Was the tone Rude, Neutral, or Playful?”

  • Combine these into a deterministic flow that produces clear, auditable scores

This gives you transparency and precision in evaluating entire conversations, something black-box metrics can’t provide.

Perfect for testing agents, chatbots, or any system where both accuracy and behavior matter.
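Here is a sketch of what such a decision-tree evaluation can look like. The node and metric names (`TaskNode`, `BinaryJudgementNode`, `VerdictNode`, `DAGMetric`) follow DeepEval's DAG metric, but treat the exact imports and the conversational (multi-turn) variant as assumptions and verify them against the current DeepEval docs.

```python
# Sketch of a DeepEval decision-tree (DAG) evaluation. Class names follow
# DeepEval's DAG metric docs; treat exact imports and the multi-turn
# (conversational) variant as assumptions and verify against current docs.
from deepeval.metrics.dag import (
    DeepAcyclicGraph,
    TaskNode,
    BinaryJudgementNode,
    VerdictNode,
)
from deepeval.metrics import DAGMetric
from deepeval.test_case import LLMTestCase

# Leaf verdicts return hardcoded scores, keeping the flow deterministic.
answered = BinaryJudgementNode(
    criteria="Did the assistant actually answer the user's question?",
    children=[
        VerdictNode(verdict=False, score=0),
        VerdictNode(verdict=True, score=10),
    ],
)

# First summarize the (long) conversation, then run the binary check on it.
root = TaskNode(
    instructions="Summarize the conversation, keeping the user's request "
                 "and the assistant's final answer.",
    output_label="Conversation summary",
    children=[answered],
)

metric = DAGMetric(name="Answered the question", dag=DeepAcyclicGraph(root_nodes=[root]))

# Single test case shown for brevity; for real multi-turn chats you would use
# DeepEval's conversational test case / conversational DAG variant.
test_case = LLMTestCase(
    input="What's your refund policy?",
    actual_output="You can request a refund within 30 days of purchase.",
)
metric.measure(test_case)
print(metric.score, metric.reason)
```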

It’s 100% Open Source.

That’s a Wrap

That’s all for today. Thank you for reading this edition. See you in the next issue with more AI Engineering insights.

PS: We curate this AI Engineering content for free, and your support means everything. If you find value in what you read, consider sharing it with a friend or two.

Your feedback is valuable: If there’s a topic you’re stuck on or curious about, reply to this email. We’re building this for you, and your feedback helps shape what we send.

WORK WITH US

Looking to promote your company, product, or service to 150K+ AI developers? Get in touch today by replying to this email.