Free Course on Document AI: From OCR to Agentic Doc Extraction
PLUS: Tokenizer-free, open-source text-to-speech and zero-shot voice cloning
In today’s newsletter:
Document AI Course: From OCR to Agentic Doc Extraction
VoxCPM: Tokenizer-free, open-source text-to-speech and zero-shot voice cloning
Claude Code Templates: Reusable agents, commands, and integrations to accelerate Claude Code
Reading time: 3 minutes.
LandingAI just released a free course on Document AI that teaches you how to build document processing pipelines that extract text, tables, charts, and forms without losing layout context.
Traditional OCR extracts text but loses critical information. Table structures with merged cells disappear. Relationships between charts and captions break. Multi-column reading order gets scrambled.
This course shows you how to build agentic workflows that process documents the way humans do, using Agentic Document Extraction (ADE).
Here’s what it covers:
Why traditional OCR breaks on complex documents
How layout detection and reading order preserve structure
Using ADE to parse PDFs into Markdown and JSON while keeping layout intact
Building RAG pipelines with ADE and vector databases
Deploying event-driven document workflows on AWS
The course runs 3 hours and includes 6 hands-on code examples.
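The scrambled reading order mentioned above is easy to reproduce with a toy example. The sketch below is not ADE's algorithm. It assumes hypothetical text blocks with top-left coordinates on a two-column page and compares a naive top-to-bottom sort with a column-aware sort. The `Block` type, the coordinates, and the fixed `column_boundary` are all made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Block:
    text: str
    x: float  # left edge of the block, in page units
    y: float  # top edge of the block, in page units

# Hypothetical OCR output for a two-column page (coordinates are made up).
blocks = [
    Block("A1", x=50, y=100),
    Block("B1", x=350, y=100),
    Block("A2", x=50, y=200),
    Block("B2", x=350, y=200),
]

def naive_order(blocks):
    """Sort top-to-bottom, then left-to-right, ignoring columns."""
    return [b.text for b in sorted(blocks, key=lambda b: (b.y, b.x))]

def column_aware_order(blocks, column_boundary=300):
    """Read the whole left column first, then the right column.

    column_boundary is an assumed, fixed x position separating the columns;
    a real layout detector would infer it from the page.
    """
    return [
        b.text
        for b in sorted(blocks, key=lambda b: (b.x >= column_boundary, b.y))
    ]

print(naive_order(blocks))         # interleaves the two columns
print(column_aware_order(blocks))  # keeps each column contiguous
```

The naive sort jumps between columns on every row, which is the kind of scrambling layout detection is meant to prevent.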
VoxCPM is an open-source text-to-speech system that models speech in continuous space instead of discrete tokens.
Most TTS systems convert speech to discrete tokens before generation. This quantization creates a fundamental trade-off: tokens provide stability but lose acoustic details like breath, vocal texture, and subtle articulation.
VoxCPM skips tokenization entirely.
It models speech directly in continuous space using an end-to-end diffusion autoregressive architecture built on MiniCPM-4.
The system uses hierarchical language modeling with two specialized components: a Text-Semantic Language Model that captures high-level prosody and structure, and a Residual Acoustic Model that recovers fine-grained acoustic details.
This separation eliminates dependency on external speech tokenizers and prevents error accumulation from multi-stage pipelines.
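The quantization trade-off can be seen in a toy example. The sketch below is not VoxCPM's architecture or any real speech tokenizer. It assumes a uniform scalar quantizer (a hypothetical `quantize` helper) applied to a few made-up sample values, and shows that fewer discrete levels mean a larger reconstruction error.

```python
def quantize(value, levels, lo=-1.0, hi=1.0):
    """Snap value to the nearest of `levels` evenly spaced points in [lo, hi]."""
    step = (hi - lo) / (levels - 1)
    index = round((value - lo) / step)   # the discrete "token" id
    index = max(0, min(levels - 1, index))
    return lo + index * step             # the value the token decodes back to

def max_error(samples, levels):
    """Largest absolute gap between a sample and its quantized version."""
    return max(abs(s - quantize(s, levels)) for s in samples)

# Made-up waveform samples in [-1, 1].
samples = [-0.83, -0.41, -0.07, 0.12, 0.38, 0.66, 0.91]

coarse = max_error(samples, levels=4)    # small vocabulary, large error
fine = max_error(samples, levels=256)    # large vocabulary, small error
print(f"4 levels:   max error {coarse:.4f}")
print(f"256 levels: max error {fine:.4f}")
```

Real speech tokenizers use learned codebooks over feature vectors rather than a uniform grid, but the same loss of detail applies whenever a continuous value is mapped to a finite set.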
Two flagship capabilities:
Context-aware speech generation: The model comprehends text to infer appropriate prosody and speaking style. Explanations slow down naturally, emphasis appears in the right places, questions sound like questions.
Zero-shot voice cloning: With just 3-10 seconds of reference audio, it replicates speaker timbre, accent, emotional tone, rhythm, and pacing.
Key features:
Tokenizer-free architecture with continuous speech modeling
Context-aware prosody generation without manual tuning
Zero-shot voice cloning from short reference audio
Streaming synthesis support for real-time applications
SFT and LoRA fine-tuning support
It's 100% open source.
Claude Code Templates is an open-source collection of AI agents, custom commands, and integrations that you can install instantly.
Most developers configure Claude Code from scratch for every project: writing agent instructions, setting up commands, and configuring integrations.
Claude Code Templates removes that repetition.
It provides 400+ pre-built components: specialized agents for security auditing, code review, and API design; custom commands for testing and deployment; and external integrations for GitHub, PostgreSQL, Stripe, and AWS.
What you get:
AI agents for specific domains (security auditor, performance optimizer, database architect)
Custom slash commands for common workflows
External service integrations (MCPs)
Optimized settings and automation hooks
The library also includes developer tools, such as an analytics dashboard to monitor your sessions and a conversation monitor to view Claude responses in real time.
It aggregates components from multiple sources: official Anthropic skills, community contributions, and scientific workflows.
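To make "pre-built agent" concrete, the sketch below renders the kind of file such a component might boil down to. It assumes the subagent format described in the Claude Code documentation (a Markdown file with YAML frontmatter, stored under `.claude/agents/`). The `render_agent` helper, the agent name, and the prompt text are hypothetical and not taken from the Claude Code Templates repository.

```python
from pathlib import Path
import tempfile

def render_agent(name, description, tools, prompt):
    """Build the text of a subagent file: YAML frontmatter plus a system prompt."""
    frontmatter = [
        "---",
        f"name: {name}",
        f"description: {description}",
        f"tools: {', '.join(tools)}",
        "---",
    ]
    return "\n".join(frontmatter) + "\n\n" + prompt.strip() + "\n"

# Hypothetical agent; the wording is illustrative only.
agent_text = render_agent(
    name="security-auditor",
    description="Reviews code changes for common security issues.",
    tools=["Read", "Grep", "Glob"],
    prompt="You are a security reviewer. Flag injection risks and leaked secrets.",
)

# Write into a throwaway directory so the sketch does not touch a real project.
project = Path(tempfile.mkdtemp())
agent_path = project / ".claude" / "agents" / "security-auditor.md"
agent_path.parent.mkdir(parents=True)
agent_path.write_text(agent_text, encoding="utf-8")
print(agent_path.read_text(encoding="utf-8"))
```

A template library saves you from writing files like this by hand for every project.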
That’s all for today. Thank you for reading today’s edition. See you in the next issue with more AI Engineering insights.
PS: We curate this AI Engineering content for free, and your support means everything. If you find value in what you read, consider sharing it with a friend or two.
Your feedback is valuable: If there’s a topic you’re stuck on or curious about, reply to this email. We’re building this for you, and your feedback helps shape what we send.