
Free Course on Document AI: From OCR to Agentic Doc Extraction

... PLUS: Tokenizer-free, open-source text-to-speech and zero-shot voice cloning

In today’s newsletter:

  • Document AI Course: From OCR to Agentic Doc Extraction

  • VoxCPM: Tokenizer-free, open-source text-to-speech and zero-shot voice cloning

  • Claude Code Templates: Reusable agents, commands, and integrations to accelerate Claude Code

Reading time: 3 minutes.

LandingAI just released a free course on Document AI that teaches you how to build document processing pipelines that extract text, tables, charts, and forms without losing layout context.

Traditional OCR extracts text but loses critical information. Table structures with merged cells disappear. Relationships between charts and captions break. Multi-column reading order gets scrambled.

This course shows you how to build agentic workflows that process documents the way humans do, using Agentic Document Extraction (ADE).

Here’s what it covers:

  • Why traditional OCR breaks on complex documents

  • How layout detection and reading order preserve structure

  • Using ADE to parse PDFs into Markdown and JSON while keeping layout intact

  • Building RAG pipelines with ADE and vector databases

  • Deploying event-driven document workflows on AWS

3 hours, 6 hands-on code examples.
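To make the reading-order problem concrete, here is a minimal sketch of why a naive top-to-bottom sort scrambles a two-column page while a column-aware pass keeps each column together. It is not taken from the course or from ADE: the `Block` type, the fixed midpoint split, and the coordinates are all hypothetical, and a real layout detector would infer columns rather than assume them.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    x: float  # left edge of the block
    y: float  # top edge of the block (smaller means higher on the page)


def naive_order(blocks):
    """Sort strictly top-to-bottom, then left-to-right, ignoring columns."""
    return [b.text for b in sorted(blocks, key=lambda b: (b.y, b.x))]


def column_aware_order(blocks, page_width):
    """Read the left column top-to-bottom, then the right column.

    Assumption: exactly two columns, split at the page midpoint.
    """
    mid = page_width / 2
    left = sorted((b for b in blocks if b.x < mid), key=lambda b: b.y)
    right = sorted((b for b in blocks if b.x >= mid), key=lambda b: b.y)
    return [b.text for b in left + right]


# Hypothetical two-column page: L1/L2 form the left column, R1/R2 the right.
blocks = [
    Block("L1", 50, 100),
    Block("R1", 350, 100),
    Block("L2", 50, 200),
    Block("R2", 350, 200),
]

naive = naive_order(blocks)                     # interleaves the two columns
ordered = column_aware_order(blocks, page_width=600)  # keeps each column intact
```

The naive sort alternates between columns line by line, which is the scrambling described above; the column-aware version reads one column fully before moving to the next.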

VoxCPM is an open-source text-to-speech system that models speech in continuous space instead of discrete tokens.

Most TTS systems convert speech to discrete tokens before generation. This quantization creates a fundamental trade-off: tokens provide stability but lose acoustic details like breath, vocal texture, and subtle articulation.

VoxCPM skips tokenization entirely.

It models speech directly in continuous space using an end-to-end diffusion autoregressive architecture built on MiniCPM-4.

The system uses hierarchical language modeling with two specialized components: a Text-Semantic Language Model that captures high-level prosody and structure, and a Residual Acoustic Model that recovers fine-grained acoustic details.

This separation eliminates dependency on external speech tokenizers and prevents error accumulation from multi-stage pipelines.
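The token trade-off described above can be illustrated with a toy example. This is only an analogy, not VoxCPM's architecture: it snaps a sine wave to a handful of discrete levels (standing in for tokenization), measures the detail that is lost, and shows that a continuous residual added back to the coarse signal restores it. The function name, the 8 levels, and the 64-sample wave are arbitrary choices for illustration; in VoxCPM both components are learned models.

```python
import math


def quantize(value, levels):
    """Snap a value in [-1, 1] to the nearest of `levels` evenly spaced points."""
    step = 2.0 / (levels - 1)
    return round((value + 1.0) / step) * step - 1.0


# A toy "waveform": one cycle of a sine wave sampled at 64 points.
signal = [math.sin(2 * math.pi * n / 64) for n in range(64)]

# Discrete view: every sample is forced onto one of 8 levels.
coarse = [quantize(s, levels=8) for s in signal]

# Continuous residual: the fine detail the discrete view threw away.
residual = [s - c for s, c in zip(signal, coarse)]

# Coarse structure plus residual recovers the original signal.
reconstructed = [c + r for c, r in zip(coarse, residual)]

max_quantization_error = max(abs(r) for r in residual)
max_reconstruction_error = max(
    abs(s - r) for s, r in zip(signal, reconstructed)
)
```

The quantization error is bounded by half the step size but never vanishes, while the coarse-plus-residual reconstruction matches the original up to floating-point rounding.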

Two flagship capabilities:

  1. Context-aware speech generation: The model comprehends text to infer appropriate prosody and speaking style. Explanations slow down naturally, emphasis appears in the right places, and questions sound like questions.

  2. Zero-shot voice cloning: With just 3-10 seconds of reference audio, it replicates speaker timbre, accent, emotional tone, rhythm, and pacing.

Key features:

  • Tokenizer-free architecture with continuous speech modeling

  • Context-aware prosody generation without manual tuning

  • Zero-shot voice cloning from short reference audio

  • Streaming synthesis support for real-time applications

  • SFT and LoRA fine-tuning support

It's 100% open source.

Claude Code Templates is an open-source collection of AI agents, custom commands, and integrations that you can install instantly.

Most developers configure Claude Code from scratch for every project: writing agent instructions, setting up commands, and configuring integrations.

Claude Code Templates removes that repetition.

It provides 400+ pre-built components: specialized agents for security auditing, code review, and API design; custom commands for testing and deployment; and external integrations for GitHub, PostgreSQL, Stripe, and AWS.

What you get:

  • AI agents for specific domains (security auditor, performance optimizer, database architect)

  • Custom slash commands for common workflows

  • External service integrations (MCPs)

  • Optimized settings and automation hooks

The library also includes developer tools, such as an analytics dashboard to monitor your sessions and a conversation monitor to view Claude responses in real time.

It aggregates components from multiple sources: official Anthropic skills, community contributions, and scientific workflows.
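For a sense of what one of these components looks like on disk, here is a minimal sketch that writes a custom slash command by hand. It assumes Claude Code's convention of project-level commands stored as Markdown files under `.claude/commands/`, with an optional `description` in the frontmatter and `$ARGUMENTS` as the placeholder for user input. The helper function, the command name, and the prompt text are hypothetical and are not taken from the Claude Code Templates collection.

```python
from pathlib import Path


def write_slash_command(project_root, name, description, prompt):
    """Create <project_root>/.claude/commands/<name>.md and return its path.

    Hypothetical helper: it only writes a Markdown file with a small
    frontmatter header followed by the prompt body.
    """
    commands_dir = Path(project_root) / ".claude" / "commands"
    commands_dir.mkdir(parents=True, exist_ok=True)

    path = commands_dir / f"{name}.md"
    content = f"---\ndescription: {description}\n---\n\n{prompt}\n"
    path.write_text(content, encoding="utf-8")
    return path
```

Calling `write_slash_command(".", "run-tests", "Run the test suite", "Run the tests for $ARGUMENTS and summarize any failures.")` would create a file that Claude Code could expose as `/run-tests`, under the assumptions above. A template library saves you from writing and maintaining files like this one by one.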

That’s all for today. Thank you for reading. See you in the next issue with more AI Engineering insights.

PS: We curate this AI Engineering content for free, and your support means everything. If you find value in what you read, consider sharing it with a friend or two.

Your feedback is valuable: If there’s a topic you’re stuck on or curious about, reply to this email. We’re building this for you, and your feedback helps shape what we send.
