
Build and Train Diffusion Language Models from Scratch

PLUS: SOTA omni-modal model

In today’s newsletter:

  • ERNIE 5.0 - SOTA omni-modal model that handles text, images, audio, and video

  • dllm - Open-source library for training diffusion language models

  • DeepTeam - Test and detect security issues in LLM Apps

Reading time: 3 minutes.

ERNIE 5.0: SOTA Omni-Modal Model

Baidu introduced ERNIE 5.0, a natively omni-modal model designed to see, hear, think, and speak across text, images, audio, and video.

Key Highlights:

  • Uses a 2.4 trillion parameter MoE architecture with fewer than 3 percent of parameters active per inference, giving it both scale and efficiency.

  • Supports unified understanding and generation across all modalities through a single autoregressive architecture.

  • Introduces improvements in omni-modal modeling, MoE efficiency, unified generation, and agentic planning.

Performance:

  • Competitive with models such as Gemini 2.5 Pro and GPT 5 High across more than 40 evaluations covering language, vision, and multimodal reasoning.

  • Produces image and video outputs that match the quality of leading domain-specific generators.

  • Strong scores across text understanding, generation, multimodal alignment, and long-context tasks in the preview evaluation results.

dllm: Build and Train Diffusion Language Models

dllm is an open-source library for building, training, and evaluating diffusion-based language models without custom pipelines or handwritten training loops.

Why this matters:
Most LLMs today are autoregressive. They generate text token by token, which is fast but prone to exposure bias and can struggle with global coherence.

Diffusion Language Models work differently:
They rebuild text by denoising corrupted sequences over multiple steps. This gives them stronger long-range reasoning, fewer cascading errors, and more stable long-form outputs.
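The denoise-over-multiple-steps idea can be illustrated with a toy sketch. This is not dllm's API; the denoiser below is a random stand-in for a trained model, and all names are illustrative. The key point is the decoding loop: start from a fully masked sequence and unmask a few positions per step, refining the whole sequence globally rather than left to right.

```python
import random

# Toy masked-diffusion text generation. Illustrative only; a real
# diffusion LM replaces toy_denoiser with a trained network.
MASK = "<mask>"
VOCAB = ["the", "cat", "sat", "on", "a", "mat"]

def toy_denoiser(tokens):
    # Stand-in for the model: propose a token for every masked slot.
    return [random.choice(VOCAB) if t == MASK else t for t in tokens]

def diffusion_generate(length=6, steps=3):
    tokens = [MASK] * length            # start from fully corrupted text
    per_step = length // steps          # positions to commit each step
    for _ in range(steps):
        proposal = toy_denoiser(tokens)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Commit a subset of proposed tokens; the rest stay masked and
        # get re-denoised next step with more context available.
        for i in random.sample(masked, min(per_step, len(masked))):
            tokens[i] = proposal[i]
    return tokens
```

Because every step sees the whole (partially denoised) sequence, errors made early can be revised later, which is where the claimed robustness on long-form output comes from.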

The real problem has been tooling:
Diffusion models need custom loops, noise schedules, and evaluation setups. None of this is plug-and-play.

dllm fixes that:
It gives you a structured, reproducible pipeline for training and evaluating diffusion LMs without writing any scaffolding.

Key Features:

  • Full training workflow for diffusion LMs using clean configs

  • Support for LoRA, DeepSpeed, and FSDP for scaling and efficiency

  • Modular model components so you can test new diffusion architectures

  • Simple dataset loading and experiment management

  • Built-in evaluation utilities for comparing runs and ablations

DeepTeam: Red Teaming for LLM Systems

DeepTeam is an open-source LLM red teaming framework for safety-testing LLM systems.

It lets you simulate adversarial attacks using state-of-the-art techniques like jailbreaking and prompt injection, so you can fix vulnerabilities such as PII leakage before user data is exposed.

It works with any LLM system, including RAG pipelines, chatbots, and AI agents.

Key Features:

  • Detects 50+ vulnerabilities, including bias, PII leakage, and misinformation.

  • Supports single-turn and multi-turn attacks such as prompt injection and jailbreaking.

  • Works with any LLM setup through a simple callback interface.

  • Easily extendable to add new vulnerabilities or attacks.

  • Generates detailed risk reports for CI/CD or monitoring workflows.

  • Follows industry security guidelines such as OWASP Top 10 for LLMs.

It’s 100% Open Source

That’s a Wrap

That’s all for today. Thank you for reading today’s edition. See you in the next issue with more AI Engineering insights.

PS: We curate this AI Engineering content for free, and your support means everything. If you find value in what you read, consider sharing it with a friend or two.

Your feedback is valuable: If there’s a topic you’re stuck on or curious about, reply to this email. We’re building this for you, and your feedback helps shape what we send.

WORK WITH US

Looking to promote your company, product, or service to 160K+ AI developers? Get in touch today by replying to this email.