Fine-tune DeepSeek-OCR locally

PLUS: Stanford CME 295 Transformers & LLMs from scratch

In today’s newsletter:

  • Build and Launch Your Own AI Agents

  • Fine-tune DeepSeek-OCR Locally

  • Stanford CME 295 Transformers & Large Language Models from scratch

Reading time: 3 minutes.

$1M Challenge — TOGETHER WITH APIFY

Apify lets you turn any Python or JavaScript project into a runnable micro-app called an Actor.

Each Actor runs as a self-contained unit with its own input schema, output schema, and runtime environment.

You can build agents, MCP servers, crawlers, document analyzers, or AI tools that parse data, summarize content, or automate workflows. Each one can be deployed as an Actor on Apify.
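To make that concrete, here is a minimal sketch of an Actor entry point using Apify's Python SDK. The input field name and the work inside are placeholders; your Actor's input schema defines the real fields.

import asyncio

from apify import Actor


async def main():
    async with Actor:
        # Read the input defined by the Actor's input schema
        actor_input = await Actor.get_input() or {}
        url = actor_input.get("url", "https://example.com")  # hypothetical field

        # ... do the actual work here: crawl, parse, summarize ...
        result = {"url": url, "status": "processed"}

        # Push results to the default dataset (the Actor's output)
        await Actor.push_data(result)


if __name__ == "__main__":
    asyncio.run(main())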

Apify has also launched the $1M Challenge, where you can build and publish your own Actors for real-world use cases.

Top projects can win up to $30K in cash prizes, weekly rewards, and visibility across the community.

It is a great opportunity to turn your automations or AI workflows into production-ready tools.

Now, let’s get back into the newsletter!

Fine-tune DeepSeek-OCR Locally

DeepSeek released a new OCR model built for document understanding and long-context reasoning.

It’s a 3B parameter vision model that uses context optical compression to convert 2D document layouts into compact vision tokens instead of thousands of text tokens.

This lets it handle tables, forms, and handwriting while using up to 10x fewer tokens than text-based models. Despite the compression, it still reaches around 97% precision on OCR benchmarks.

The architecture combines a vision encoder that compresses the layout and a language decoder that reconstructs text from it. This makes it faster and more memory-efficient for long documents.
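If you just want to try it first, the model ships with remote code on Hugging Face. Here is a minimal inference sketch, assuming the infer helper and argument names shown on the model card (verify them there before running):

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-OCR"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
model = model.eval().cuda().to(torch.bfloat16)

# The vision encoder compresses the page into vision tokens;
# the language decoder reconstructs text from them.
result = model.infer(
    tokenizer,
    prompt="<image>\nConvert the document to markdown.",
    image_file="invoice.jpg",  # hypothetical input document
    output_path="./ocr_out",   # where results are saved
)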

If you want to adapt it to your own data, Unsloth AI released a guide and notebook to fine-tune DeepSeek-OCR locally.

You can train it on your domain documents or improve its language performance with your own dataset.
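As a rough sketch of what the setup looks like with Unsloth's vision API (the checkpoint name and hyperparameters here are assumptions; the official notebook is the source of truth):

from unsloth import FastVisionModel

# Load the model in 4-bit so it fits on a consumer GPU
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/DeepSeek-OCR",  # assumed checkpoint name; check Unsloth's hub page
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of weights is trained
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,    # adapt the vision encoder
    finetune_language_layers=True,  # and the language decoder
    r=16,
    lora_alpha=16,
)

# From here, pair the model with trl's SFTTrainer on your (image, text) examples.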

Stanford CME 295 Transformers & Large Language Models from scratch

Stanford released new course lectures that take you from the basics of how Transformers actually work all the way to building agentic workflows.

Here's what it covers in detail:

  • Transformers (tokenization, embeddings, attention mechanism, architecture)

  • LLM foundations (definition, MoEs, types of decoding)

  • LLM training and tuning (supervised/reinforcement finetuning, LoRA)

  • LLM evaluation (LLM/VLM-as-a-judge and best practices)

  • Common tricks (RoPE, attention approximation, quantization)

  • Reasoning (train/test-time scaling, context awareness)

  • Agentic workflows (RAG, tool calling)

Check out the full series:

That’s a Wrap

That’s all for today. Thank you for reading today’s edition. See you in the next issue with more AI Engineering insights.

PS: We curate this AI Engineering content for free, and your support means everything. If you find value in what you read, consider sharing it with a friend or two.

Your feedback is valuable: If there’s a topic you’re stuck on or curious about, reply to this email. We’re building this for you, and your feedback helps shape what we send.

WORK WITH US

Looking to promote your company, product, or service to 160K+ AI developers? Get in touch today by replying to this email.