Turn complex documents into RAG-ready data

PLUS: Fine-Tune 100+ LLMs Without a Single Line of Code

In today’s newsletter:

  • Turn PDFs into RAG-Ready Data

  • Fine-Tune 100+ LLMs Without a Single Line of Code

  • MCP Containers: Containerized versions of 450+ MCP servers

Reading time: 3 minutes.

ADE lets you convert visually complex documents into structured, grounded data, returning hierarchical JSON with the exact location of every element.

Traditional OCR pipelines extract only plain text, missing layout, structure, and visual context. LLM-based systems can interpret documents semantically but often struggle with large tables and complex multi-column layouts.

ADE bridges that gap. It combines visual understanding and structured parsing to extract not just text but also relationships, context, and layout.
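To make that concrete, here is roughly the shape of a grounded, hierarchical parse result. The field names below are illustrative assumptions for this sketch, not ADE's exact schema:

```python
# Illustrative sketch of a grounded, hierarchical parse result.
# Field names are assumptions for this sketch, not ADE's exact schema.
parsed = {
    "doc_type": "pdf",
    "chunks": [
        {
            "type": "table",
            "markdown": "| Region | Q1 | Q2 |\n|---|---|---|\n| EMEA | 1.2 | 1.4 |",
            "page": 3,
            # Visual grounding: the element's location on the page,
            # as normalized [x0, y0, x1, y1] coordinates.
            "bbox": [0.08, 0.21, 0.92, 0.55],
            "children": [],  # nested elements preserve document hierarchy
        }
    ],
}
```

Because every chunk carries its page number and bounding box, a RAG pipeline can cite the exact region of the document an answer came from.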

ADE now comes with the new Parse Jobs API, an asynchronous API built for large-scale document processing.

Large files can slow everything down. With the Parse Jobs API, you can submit a document, get a job ID instantly, and continue your workflow while it processes in the background.

It supports files up to 1GB or 1,000 pages, making high-volume batch ingestion fast, reliable, and scalable.
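In practice, the flow is submit, poll, fetch. Here is a minimal sketch of that pattern; the endpoint paths, field names, and auth header are placeholder assumptions, so check ADE's API reference for the real ones:

```python
# A minimal sketch of the async submit-then-poll pattern behind the
# Parse Jobs API. URLs, field names, and auth are placeholders.
import time
import requests

API = "https://api.example.com/v1/ade"          # placeholder base URL
HEADERS = {"Authorization": "Bearer <API_KEY>"}  # placeholder auth

# 1) Submit the document and get a job ID back immediately.
with open("quarterly-report.pdf", "rb") as f:
    job = requests.post(f"{API}/parse-jobs", headers=HEADERS,
                        files={"document": f}).json()
job_id = job["job_id"]

# 2) Carry on with other work; poll until the background job finishes.
while True:
    status = requests.get(f"{API}/parse-jobs/{job_id}", headers=HEADERS).json()
    if status["state"] in ("completed", "failed"):
        break
    time.sleep(5)  # back off between polls

# 3) Fetch the structured, grounded result once the job completes.
result = requests.get(f"{API}/parse-jobs/{job_id}/result",
                      headers=HEADERS).json()
```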

Key Features:

  • Handles complex, large tables where typical VLMs and OCR pipelines fail

  • Processes PDFs, images, DOC, and PPT files at scale

  • Generates structured JSON and Markdown with hierarchy and layout retention

  • Provides visual grounding with bounding boxes

  • Built for async workflows

LLaMA-Factory lets you train and fine-tune open-source LLMs and VLMs without writing a single line of code.

It supports over 100 models (LLaMA, Gemma, Qwen, Mistral, DeepSeek, and more) with built-in templates for fine-tuning, merging, and evaluation through a simple CLI and Web UI.
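For example, a LoRA fine-tune is just a config file handed to the CLI. The sketch below drives llamafactory-cli from Python; the config keys follow LLaMA-Factory's YAML examples, but treat the exact values as illustrative rather than a tuned recipe:

```python
# A minimal sketch: a LoRA fine-tune driven from Python by shelling out
# to llamafactory-cli. Values are illustrative, not a tuned recipe.
import subprocess
import yaml  # pip install pyyaml

config = {
    "model_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct",
    "stage": "sft",                   # supervised fine-tuning
    "do_train": True,
    "finetuning_type": "lora",        # swap for "full" or "freeze"
    "dataset": "alpaca_en_demo",      # one of the bundled demo datasets
    "template": "llama3",
    "output_dir": "saves/llama3-8b-lora",
    "per_device_train_batch_size": 1,
    "num_train_epochs": 3.0,
}

with open("train_lora.yaml", "w") as f:
    yaml.safe_dump(config, f)

# The CLI ships with LLaMA-Factory itself (pip install llamafactory).
subprocess.run(["llamafactory-cli", "train", "train_lora.yaml"], check=True)
```

The same config style drives `llamafactory-cli chat` for inference and `llamafactory-cli export` for merging LoRA weights back into the base model.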

Why It Matters:

  • Zero-code CLI & Web UI for training, inference, merging, and evaluation.

  • Supports full-tuning, LoRA, QLoRA, freeze-tuning, PPO/DPO, OFT, reward modeling, and multi-modal fine-tuning.

  • Speeds up training/inference with FlashAttention-2, RoPE scaling, Liger Kernel, and vLLM backend.

  • Integrates experiment tracking via LlamaBoard, TensorBoard, Weights & Biases, MLflow, and SwanLab.

It’s 100% open-source.

Setting up MCP servers manually often leads to dependency mismatches, unclear setup steps, and security risks.

MCP Containers is a collection of pre-built container images for hundreds of MCP servers, making them effortless and secure to spin up and maintain.

Here is what this repo brings:

  • 450+ MCP servers pre-containerized

  • Auto-updated images with the latest features

  • Secure by default: each server runs in an isolated container

  • Coverage across GitHub, Stripe, Cloudflare, databases, and more

  • Fully open source and customizable with Nixpacks
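As a quick sketch of how you would wire one up: an MCP client starts the container and speaks JSON-RPC over stdio. The image name below is a placeholder assumption; take the real one from the repo's catalog:

```python
# A minimal sketch: launch a containerized MCP server over stdio and send
# the JSON-RPC "initialize" handshake. The image name is a placeholder;
# look up the real one in the MCP Containers catalog.
import json
import subprocess

IMAGE = "ghcr.io/example/mcp-github"  # placeholder image name

proc = subprocess.Popen(
    ["docker", "run", "-i", "--rm", IMAGE],  # -i keeps stdin open for stdio transport
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

# MCP speaks JSON-RPC 2.0 over stdio; initialize is the first request.
request = {
    "jsonrpc": "2.0", "id": 1, "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "demo-client", "version": "0.1"},
    },
}
proc.stdin.write(json.dumps(request) + "\n")
proc.stdin.flush()
print(proc.stdout.readline())  # the server's initialize response
```

Because the server is confined to the container, it only sees the credentials and mounts you explicitly pass in.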

That’s a Wrap

That’s all for today. Thank you for reading, and see you in the next issue with more AI Engineering insights.

PS: We curate this AI Engineering content for free, and your support means everything. If you find value in what you read, consider sharing it with a friend or two.

Your feedback is valuable: If there’s a topic you’re stuck on or curious about, reply to this email. We’re building this for you, and your feedback helps shape what we send.

WORK WITH US

Looking to promote your company, product, or service to 150K+ AI developers? Get in touch today by replying to this email.