Turn PDFs into Clean, LLM-Ready Data

…PLUS: A Unified Backend Framework for AI Applications

In today’s newsletter:

  • Turn PDFs into Clean, LLM-Ready Data

  • Motia - A Unified Backend Framework for AI Agents

  • Trace & Evaluate Any LLM App with a Single Decorator

Reading time: 3 minutes.

PDFs lock content into complex layouts, making it difficult for LLMs to process text, tables, and images effectively.

Dolphin is an open source parsing framework that converts PDFs into structured formats such as Markdown, HTML, LaTeX, and JSON.

How It Works

  1. Layout analysis - Detects and sequences elements according to the document’s natural reading order.

  2. Parallel parsing - Processes each element with specialized prompts tailored to different content types (text blocks, tables, figures, etc.), as shown in the sketch after this list.
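To make that two-stage flow concrete, here is a minimal sketch using Hugging Face transformers. It assumes Dolphin ships as a Donut-style encoder-decoder checkpoint; the model id, prompt strings, and decoding details below are assumptions, and the repo's demo scripts document the exact interface.

    # Hedged sketch of Dolphin's two-stage "analyze-then-parse" flow via transformers.
    # Model id and prompt strings are assumptions; see the Dolphin repo for the real interface.
    from PIL import Image
    from transformers import AutoProcessor, VisionEncoderDecoderModel

    MODEL_ID = "ByteDance/Dolphin"  # assumed Hugging Face model id
    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID)

    def run(prompt: str, image: Image.Image) -> str:
        """Run one prompt against one image (full page or element crop) and decode the output."""
        pixel_values = processor(images=image, return_tensors="pt").pixel_values
        prompt_ids = processor.tokenizer(prompt, add_special_tokens=False, return_tensors="pt").input_ids
        output_ids = model.generate(pixel_values, decoder_input_ids=prompt_ids, max_new_tokens=1024)
        return processor.tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]

    page = Image.open("page_1.png").convert("RGB")

    # Stage 1: layout analysis returns the elements and their reading order
    # (the prompt wording here is illustrative).
    layout = run("Parse the reading order of this document.", page)

    # Stage 2: parse each element with a content-type-specific prompt, e.g.
    # "Parse the table in the image." for tables. In practice each element crop
    # from stage 1 is parsed in parallel; here we read the full page as a placeholder.
    text = run("Read text in the image.", page)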

Key Features

  • Two-stage “analyze-then-parse” pipeline powered by a single vision-language model (VLM)

  • Strong performance on complex document parsing tasks

  • Reading-order-aware element sequencing

  • Specialized prompts for different document elements

  • Efficient parallel parsing for faster results

It’s 100% Open Source.

Modern AI applications often rely on multiple backend components: APIs, agents, background jobs, data streams, and workflows, each adding its own integration complexity.

Motia consolidates them into a single backend runtime.

It brings AI agents, APIs, background jobs, streams, and workflows into one unified system, eliminating the need for separate services or complex integrations.

Key Capabilities

  • Unified runtime - Build and manage APIs, event-driven jobs, workflows, and AI agents in a single application.

  • Polyglot execution - Write Steps in JavaScript, TypeScript, or Python within the same codebase (see the Step sketch after this list).

  • Visual workbench - Local GUI for building, observing, and debugging flows in real time with tracing, logs, and state inspection.

  • Built-in observability - Structured logs and complete workflow visibility without extra configuration.

  • Event-driven logic - Seamlessly connect APIs, background tasks, and AI workflows using an integrated event system.

  • Unified state management - Share and track state across Steps without additional tooling.
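Here is a hedged sketch of what an event-triggered Step can look like in Python. The specific config keys and context methods (ctx.logger, ctx.state, ctx.emit) are assumptions based on Motia's config-plus-handler Step pattern, so treat the Motia docs as the source of truth.

    # Hedged sketch of a Motia Step (Python). Field and parameter names are assumptions.
    config = {
        "type": "event",                      # triggered by an event rather than an HTTP route
        "name": "SummarizeDocument",          # hypothetical step name
        "subscribes": ["document.uploaded"],  # topics this step listens to
        "emits": ["document.summarized"],     # topics this step may emit
        "flows": ["doc-pipeline"],            # hypothetical flow this step belongs to
    }

    async def handler(payload, ctx):
        # Structured logging comes from the runtime context, with no extra setup.
        ctx.logger.info("Summarizing document", {"doc_id": payload["doc_id"]})

        summary = f"Summary of {payload['doc_id']}"  # placeholder for an LLM call

        # Shared state lets later Steps in the flow read this result (assumed signature).
        await ctx.state.set("doc-pipeline", payload["doc_id"], summary)

        # Hand off to the next Step through the built-in event system.
        await ctx.emit({"topic": "document.summarized", "data": {"doc_id": payload["doc_id"]}})

The same config-plus-handler shape, with a different config type, is how the capabilities above describe exposing APIs and background jobs from the one runtime.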

Why it matters:

Motia gives AI developers a backend foundation where agents, APIs, and workflows coexist naturally, reducing architecture sprawl and speeding up development cycles.

It’s 100% Open Source.

Most LLM evaluations compare a single input to the final output, treating the application as a black box. That’s fine for simple apps, but real-world pipelines are more complex.

Bugs often hide inside the system:

  • A retriever surfaces irrelevant documents

  • A tool call fails silently

  • An agent makes the wrong decision

DeepEval brings component-level evaluation to the table. Instead of testing only the end result, you can trace and measure each stage of your pipeline.

How it works:

  • Wrap pipeline components (retrievers, generators, tools) with @observe, as in the sketch after this list

  • Attach metrics to individual components

  • Identify exactly which part of the system is failing and why
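As a rough sketch, assuming a simple retriever-plus-generator pipeline: the two component functions and their placeholder bodies are illustrative, and the decorator arguments reflect recent DeepEval versions.

    # Hedged sketch of component-level evaluation with DeepEval's @observe decorator.
    # The pipeline functions are illustrative; check DeepEval's docs for current import paths.
    from deepeval.tracing import observe, update_current_span
    from deepeval.metrics import AnswerRelevancyMetric
    from deepeval.test_case import LLMTestCase

    @observe()  # traced, no metric attached to this component
    def retrieve(query: str) -> list[str]:
        return ["doc snippet 1", "doc snippet 2"]  # placeholder for a vector-store lookup

    @observe(metrics=[AnswerRelevancyMetric()])  # metric scored on this component's span
    def generate(query: str, docs: list[str]) -> str:
        answer = "..."  # placeholder for an LLM call
        update_current_span(
            test_case=LLMTestCase(input=query, actual_output=answer, retrieval_context=docs)
        )
        return answer

    @observe()
    def rag_app(query: str) -> str:
        # Each decorated component gets its own span, so a failing metric points at
        # the retriever or the generator specifically instead of the whole app.
        return generate(query, retrieve(query))

Running an evaluation over the traced app then reports scores per component rather than a single end-to-end number.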

This approach provides actionable insights, such as detecting irrelevant retrievals, failed tool calls, or incorrect agent decisions, allowing faster refinement of complex pipelines.

It’s 100% Open Source.

That’s a Wrap

That’s all for today. Thank you for reading today’s edition. See you in the next issue with more AI Engineering insights.

PS: We curate this AI Engineering content for free, and your support means everything. If you find value in what you read, consider sharing it with a friend or two.

Your feedback is valuable: If there’s a topic you’re stuck on or curious about, reply to this email. We’re building this for you, and your feedback helps shape what we send.

WORK WITH US

Looking to promote your company, product, or service to 120K+ AI developers? Get in touch today by replying to this email.