Fine-tune LLM agents without fine-tuning LLMs

PLUS: Lightweight Model to Parse Digital-Native Documents

In today’s newsletter:

  • DPT-2 Mini: Lightweight Model to Parse Digital-Native Documents

  • Memento: Fine-tune LLM agents without fine-tuning LLMs

  • Agentic Reviewer from Stanford: Accelerating Research Iteration via arXiv-Grounded Feedback

Reading time: 3 minutes.

Turn High-Volume PDFs into LLM-Ready Data with Vision-First Agentic Document AI.

LandingAI has released Agentic Document Extraction (ADE) DPT-2 Mini, a lightweight variant of the Document Pretrained Transformer 2 (DPT-2) designed for high-volume document workflows.

It targets predictable, digitally generated documents: clean PDFs that still require visual context for the most accurate extraction.

Think invoices, contracts, memos, letters, and other digital PDFs.

Key Features:

  • Structured extraction for clean digital documents

  • Accurate layout detection across simple PDF formats

  • Full chunk-type support: paragraphs, figures, logos, cards, and more

  • Reliable English text transcription

  • Optimized for scale with fast, consistent, cost-efficient processing

DPT-2 Mini focuses on speed, reliability, and cost efficiency: perfect when your documents are simple and you need clean, structured output at scale.

You can test the model directly in the ADE Playground, or use it via the ADE Python library to integrate it into your document processing pipelines.
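
As a rough sketch, a call through the agentic-doc Python package might look like the following; the model-selection argument is a hypothetical placeholder (check the ADE docs for the actual option), as is the exact shape of the results:

    # Rough sketch: parsing a digital-native PDF with LandingAI's agentic-doc
    # package. Selecting DPT-2 Mini is shown as a hypothetical keyword
    # argument; consult the ADE docs for the real option.
    from agentic_doc.parse import parse

    # Assumes your LandingAI API key is already configured in the environment.
    results = parse("invoice.pdf")  # hypothetical: parse("invoice.pdf", model="dpt-2-mini")

    doc = results[0]
    print(doc.markdown)           # whole document rendered as markdown
    for chunk in doc.chunks:      # typed chunks: paragraphs, figures, tables, ...
        print(chunk.chunk_type, chunk.text[:80])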

Memento is a memory-based continual-learning framework for LLM agents that lets them learn from experience over time without touching model weights.

How It Works

Memento maintains a Case Bank of historical agent trajectories, including:

  • Task descriptions

  • Subtasks and execution steps

  • Tool usage patterns

  • Intermediate reasoning

  • Outcomes and corrections

When a new request arrives, the agent retrieves similar examples and uses them to guide the solution instead of reasoning from scratch.
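
A minimal sketch of that retrieval step, assuming an embedding function and cosine similarity; the Case and CaseBank names and fields here are illustrative, not Memento's actual API:

    # Illustrative Case Bank: store past trajectories, retrieve the most
    # similar ones for a new task via embedding similarity. Names and
    # structure are assumptions, not Memento's actual implementation.
    from dataclasses import dataclass

    import numpy as np

    @dataclass
    class Case:
        task: str
        plan: list[str]              # subtasks and execution steps
        outcome: str                 # result, including any corrections
        embedding: np.ndarray | None = None

    class CaseBank:
        def __init__(self, embed):
            self.embed = embed       # embed: str -> np.ndarray
            self.cases: list[Case] = []

        def add(self, case: Case) -> None:
            case.embedding = self.embed(case.task)
            self.cases.append(case)

        def retrieve(self, task: str, k: int = 3) -> list[Case]:
            q = self.embed(task)
            sims = [  # cosine similarity against every stored case
                float(q @ c.embedding / (np.linalg.norm(q) * np.linalg.norm(c.embedding)))
                for c in self.cases
            ]
            top = np.argsort(sims)[::-1][:k]
            return [self.cases[i] for i in top]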

This process follows a two-component architecture:

1. Planner (LLM): Breaks a task into subtasks, retrieves relevant cases from memory, and selects an execution plan.

2. Executor: Runs the selected plan using tools such as code execution, search, or document processing through the Model Context Protocol (MCP). Results are stored back into the Case Bank, creating a feedback loop.
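
Put together, the loop might look like the sketch below; plan() and execute_with_tools() are stand-ins for the LLM planner and the MCP-backed executor, and CaseBank is the illustrative memory from the previous sketch:

    # Hypothetical planner-executor loop over the Case Bank sketched above.
    def plan(task: str, context: str) -> list[str]:
        # Stand-in: a real planner would prompt an LLM with `context` prepended.
        return [f"solve: {task}"]

    def execute_with_tools(steps: list[str]) -> str:
        # Stand-in: a real executor would dispatch each step to MCP tools.
        return "; ".join(steps)

    def solve(task: str, bank: CaseBank) -> str:
        examples = bank.retrieve(task, k=3)      # 1. recall similar cases
        context = "\n\n".join(
            f"Task: {c.task}\nPlan: {c.plan}\nOutcome: {c.outcome}"
            for c in examples
        )
        steps = plan(task, context)              # 2. plan conditioned on memory
        outcome = execute_with_tools(steps)      # 3. act through tools
        bank.add(Case(task=task, plan=steps, outcome=outcome))  # 4. write back
        return outcome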

Key Features

  • Memory-Driven Learning: Improves performance by reusing stored trajectories rather than modifying model weights.

  • Planner–Executor Structure: Separates task decomposition and action execution using case-based reasoning.

  • Unified Tool Interface: Supports search, code execution, document processing, media analysis, and other capabilities through a common execution layer.

  • Retrieval Instead of Fine-Tuning: Selects and applies relevant past cases to guide reasoning without parameter updates.

  • Robustness on Long-Horizon Tasks: Demonstrates improved stability, reuse of reasoning, and better handling of out-of-distribution scenarios.

Andrew Ng introduced an automated reviewer intended to shrink the painfully slow feedback loops in academic research.

The system was trained on ICLR 2025 reviews and evaluated using Spearman correlation, which measures rank-order agreement. In peer review, this matters because you care less about identical scores and more about whether two reviewers rank papers similarly.

Here’s the comparison:

  • Human vs Human: 0.41

  • AI Reviewer vs Human: 0.42

The takeaway is not that the model “matches humans,” but that its ranking consistency is already within the natural variance of human reviewers. In other words, the disagreement between two humans is roughly the same as the disagreement between the AI reviewer and a human.
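
To make the metric concrete, here is a tiny, made-up example of computing Spearman correlation between two reviewers with SciPy; the scores are invented purely for illustration:

    # Spearman correlation compares rank orderings, not raw scores.
    from scipy.stats import spearmanr

    reviewer_a = [6, 3, 8, 5, 7]   # one reviewer's scores for five papers
    reviewer_b = [7, 2, 9, 4, 6]   # different scale, nearly the same ranking

    rho, _ = spearmanr(reviewer_a, reviewer_b)
    print(round(rho, 2))  # 0.9: rankings agree closely even though scores differ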

The workflow is built around retrieval. It queries arXiv, pulls relevant prior work, and uses that context to evaluate novelty, clarity, methodology, and empirical grounding.

This avoids the usual unreferenced, surface-level critiques and pushes the output closer to how actual reviewers justify assessments.
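
The retrieval half is easy to picture; here is a sketch using the community arxiv Python package. It mirrors the described workflow, not the system's actual code, and the query is just an example:

    # Sketch: pull related prior work from arXiv so a reviewer prompt can be
    # grounded in real references. Uses the community `arxiv` package.
    import arxiv

    def related_work(query: str, k: int = 5) -> list[str]:
        search = arxiv.Search(
            query=query,
            max_results=k,
            sort_by=arxiv.SortCriterion.Relevance,
        )
        return [f"{r.title} ({r.entry_id})" for r in arxiv.Client().results(search)]

    # Retrieved titles would then be injected into the review prompt so that
    # novelty claims are judged against actual prior work.
    print(related_work("memory-based continual learning for LLM agents"))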

It’s still experimental, but useful for researchers who want rapid, grounded critique while drafting or refining a paper, especially in domains with strong open-access literature.

That’s a Wrap

That’s all for today. Thank you for reading. See you in the next issue with more AI Engineering insights.

PS: We curate this AI Engineering content for free, and your support means everything. If you find value in what you read, consider sharing it with a friend or two.

Your feedback is valuable: If there’s a topic you’re stuck on or curious about, reply to this email. We’re building this for you, and your feedback helps shape what we send.

WORK WITH US

Looking to promote your company, product, or service to 160K+ AI developers? Get in touch today by replying to this email.