5 Open-Source Libraries for Fine-Tuning LLMs

... PLUS: Self-Learning Skills for OpenClaw and Claude Code

In today’s newsletter:

  • Self-Learning Skills for OpenClaw and Claude Code

  • 5 Open-Source Libraries for Fine-Tuning LLMs

Reading time: 5 minutes.

Coding agents don't remember what they learn. You fix a bug in Claude Code today. Two days later, you're explaining the same thing to OpenClaw.

AContext turns agent runs into portable skills that sync across tools.

When your agent completes a task, AContext watches what happened and distills it into a skill automatically. Successful debugging sessions become reusable playbooks. Failed attempts with your corrections become guardrails.

What it captures:

  • Successful patterns and how you solved problems

  • Failed attempts with corrections

  • Your code style and project conventions

  • Team standards and naming rules

Everything gets stored as markdown files. You can read them, edit them, and version control them. Skills learned in Claude Code work in OpenClaw. Skills from OpenClaw work in Claude Code.

Installation is one line. Tell your agent to read the setup instructions at acontext.io/SKILL.md, and it handles the rest.

Every fixed bug and clarified requirement becomes a durable skill instead of another lost chat log. Your agents stop rediscovering the same solutions.

AContext works with both OpenClaw and Claude Code, so if you want to try auto-learning skills across your agents, that one-line install is all it takes.

5 Open-Source Libraries for Fine-Tuning LLMs

Fine-tuning a 70B model requires 280GB of VRAM before you even count gradients and activations: the model weights alone are 140GB in FP16, and optimizer states add roughly another 140GB. Factor in gradients and activations on top of that, and you're looking at hardware most teams can't access.

The standard approach doesn't scale. By the same math, full fine-tuning Llama 4 Maverick (400B parameters) or Qwen 3.5 397B would require multi-node GPU clusters costing hundreds of thousands of dollars.
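The arithmetic above is easy to reproduce. Here's a back-of-envelope sketch that mirrors the accounting in this section (2 bytes per weight in FP16, and the same again for optimizer states); real optimizers like full-precision Adam can be heavier, and gradients and activations come on top:

```python
def finetune_vram_gb(params_billions: float,
                     bytes_per_weight: int = 2,
                     bytes_per_opt_state: int = 2) -> dict:
    """Rough VRAM estimate for full fine-tuning.

    Uses the simple accounting from the text: FP16 weights plus an
    equal-sized block of optimizer state. Gradients and activations
    depend on batch size and sequence length, so they are excluded.
    """
    # 1e9 params * bytes / 1e9 bytes-per-GB cancels out, so GB = params_billions * bytes
    weights_gb = params_billions * bytes_per_weight
    optimizer_gb = params_billions * bytes_per_opt_state
    return {
        "weights_gb": weights_gb,
        "optimizer_gb": optimizer_gb,
        "total_gb": weights_gb + optimizer_gb,
    }

est = finetune_vram_gb(70)  # 70B model: 140 + 140 = 280 GB before gradients/activations
```

Running the same function for a 400B model gives 1,600GB under this accounting, which is why multi-node clusters enter the picture.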

Five open-source libraries changed this by rewriting how training happens. Custom kernels, smarter memory management, and efficient algorithms make it possible to fine-tune frontier models on consumer GPUs.

Here's what each library does and when to use it:

Unsloth

Unsloth cuts VRAM usage by 70% and doubles training speed through hand-optimized GPU kernels written in Triton.

Standard PyTorch attention runs the query, key, and value projections as separate operations. Each one launches its own kernel, allocates intermediate tensors, and stores them in VRAM. Unsloth fuses them into a single kernel that never materializes those intermediates.

Unsloth's gradient checkpointing is also selective. During backpropagation, you need activations from the forward pass. Standard checkpointing throws everything away and recomputes it all; Unsloth recomputes only attention and layer normalization (the memory bottlenecks) and caches everything else.

What you can train:

  • Qwen 3.5 27B on a single 24GB RTX 4090 using QLoRA

  • Llama 4 Scout (109B total, 17B active per token) on an 80GB GPU

  • Gemma 3 27B with full fine-tuning on consumer hardware

  • MoE models like Qwen 3.5 35B-A3B (12x faster than standard frameworks)

  • Vision-language models with multimodal inputs

  • 500K context length training on 80GB GPUs

Training methods:

  • LoRA and QLoRA (4-bit and 8-bit quantization)

  • Full parameter fine-tuning

  • GRPO for reinforcement learning (80% less VRAM than PPO)

  • Pretraining from scratch

For reinforcement learning, GRPO removes the critic model that PPO requires. This is what DeepSeek R1 used for its reasoning training. You get the same training quality with a fraction of the memory.

The library integrates directly with Hugging Face Transformers. Your existing training scripts work with minimal changes. Unsloth also offers Unsloth Studio, a desktop app with a WebUI if you prefer no-code training.
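To give a sense of the workflow, here's a minimal QLoRA sketch using Unsloth's `FastLanguageModel` API. The model name and hyperparameters are illustrative only, and the resulting model would then be passed to a standard Hugging Face or TRL trainer:

```python
from unsloth import FastLanguageModel

# Load a 4-bit quantized base model (QLoRA). Model name is an example.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; only these small matrices are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                # LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```

From here, `model` and `tokenizer` drop into your existing Transformers training script, which is the "minimal changes" promise in practice.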

LLaMA-Factory

LLaMA-Factory provides a Gradio interface where non-technical team members can fine-tune models without writing code.

Launch the WebUI and you get a browser-based dashboard. Select your base model from a dropdown (supports Llama 4, Qwen 3.5, Gemma 3, Phi-4, DeepSeek R1, and 100+ others). Upload your dataset or choose from built-in ones. Pick your training method and configure hyperparameters using form fields. Click start.

What it handles:

  • Supervised fine-tuning (SFT)

  • Preference optimization (DPO, KTO, ORPO)

  • Reinforcement learning (PPO, GRPO)

  • Reward modeling

  • Real-time loss curve monitoring

  • In-browser chat interface for testing outputs mid-training

  • Export to Hugging Face or local saves

Memory efficiency:

  • LoRA and QLoRA with 2-bit through 8-bit quantization

  • Freeze-tuning (train only a subset of layers)

  • GaLore, DoRA, and LoRA+ for improved efficiency

This matters for teams where domain experts need to run experiments independently. Your legal team can test whether a different contract dataset improves clause extraction. Your support team can fine-tune on recent tickets without waiting for ML engineers to write training code.

Built-in integrations with LlamaBoard, Weights & Biases, MLflow, and SwanLab handle experiment tracking. If you prefer command-line work, it also supports YAML configuration files.
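A command-line run looks like `llamafactory-cli train config.yaml` with a config along these lines (model, dataset, and paths here are placeholders, and available keys may differ between versions):

```yaml
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft                 # supervised fine-tuning
do_train: true
finetuning_type: lora

### dataset
dataset: my_dataset        # a dataset registered in dataset_info.json
template: llama3
cutoff_len: 1024

### training
output_dir: saves/llama3-8b/lora/sft
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
```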

Axolotl

Axolotl uses YAML configuration files for reproducible training pipelines. Your entire setup lives in version control.

Write one config file that specifies your base model (Qwen 3.5 397B, Llama 4 Maverick, Gemma 3 27B), dataset path and format, training method, and hyperparameters. Run it on your laptop for testing. Run the exact same file on an 8-GPU cluster for production.
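Such a config might look roughly like this (model name, dataset path, and hyperparameters are illustrative; check the Axolotl docs for the current key names):

```yaml
base_model: NousResearch/Meta-Llama-3-8B   # example base model
load_in_4bit: true
adapter: qlora                             # QLoRA training

lora_r: 16
lora_alpha: 32
lora_target_linear: true

datasets:
  - path: data/my_train.jsonl
    type: alpaca                           # instruction/input/output format

sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 0.0002
output_dir: ./outputs/qlora-llama3
```

The same file runs locally for a smoke test and on a cluster for the real job, which is the reproducibility argument in concrete form.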

Training methods:

  • LoRA and QLoRA with 4-bit and 8-bit quantization

  • Full parameter fine-tuning

  • DPO, KTO, ORPO for preference optimization

  • GRPO for reinforcement learning

The library scales from single GPU to multi-node clusters with built-in FSDP2 and DeepSpeed support. Multimodal support covers vision-language models like Qwen 3.5's vision variants and Llama 4's multimodal capabilities.

Six months after training, you have an exact record of what hyperparameters and datasets produced your checkpoint. Share configs across teams. A researcher's laptop experiments use identical settings to production runs.

The tradeoff is a steeper learning curve than WebUI tools. You're writing YAML, not clicking through forms.

Torchtune

Torchtune gives you the raw PyTorch training loop with no abstraction layers.

When you need to modify gradient accumulation, implement a custom loss function, add specific logging, or change how batches are constructed, you edit PyTorch code directly. You're working with the actual training loop, not configuring a framework that wraps it.

Built and maintained by Meta's PyTorch team. The codebase provides modular components (attention mechanisms, normalization layers, optimizers) that you mix and match as needed.

This matters when you're implementing research that requires training loop modifications. Testing a new optimization algorithm. Debugging unexpected loss curves. Building custom distributed training strategies that existing frameworks don't support.

The tradeoff is control versus convenience. You write more code than using a high-level framework, but you control exactly what happens at every step.

TRL

TRL handles alignment after fine-tuning. You've trained your model on domain data; now you need it to follow instructions reliably.

The library takes preference pairs (output A is better than output B for this input) or reward signals and optimizes the model's policy.
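Concretely, a preference pair is just a record with a prompt, a preferred response, and a rejected one. A minimal sketch of the shape DPO-style trainers commonly expect (field names follow the common `prompt`/`chosen`/`rejected` convention):

```python
# One preference pair: the model should learn to prefer "chosen" over "rejected".
pair = {
    "prompt": "Summarize the refund policy in one sentence.",
    "chosen": "Refunds are available within 30 days of purchase with a receipt.",
    "rejected": "Our refund policy is long and covers many different situations.",
}

def is_valid_pair(example: dict) -> bool:
    """Check that an example has all three fields and distinct responses."""
    required = {"prompt", "chosen", "rejected"}
    return required <= example.keys() and example["chosen"] != example["rejected"]
```

A preference dataset is simply a list (or JSONL file) of such records, which the trainer turns into a policy-optimization signal.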

Methods supported:

  • RLHF (Reinforcement Learning from Human Feedback)

  • DPO (Direct Preference Optimization)

  • PPO (Proximal Policy Optimization)

  • GRPO (Group Relative Policy Optimization)

GRPO drops the critic model that PPO requires, cutting VRAM by 80% while maintaining training quality. This is what DeepSeek R1 used for reasoning training.

Full integration with Hugging Face Transformers, Datasets, and Accelerate means you can take any Hugging Face model, load preference data, and run alignment training with a few function calls.
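Those "few function calls" look roughly like the sketch below. Model name and file path are placeholders, and exact argument names (for example, `processing_class` vs. `tokenizer`) vary between TRL versions:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")

# JSONL of {"prompt": ..., "chosen": ..., "rejected": ...} records
dataset = load_dataset("json", data_files="preference_pairs.jsonl")["train"]

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dpo-out", beta=0.1),  # beta scales the KL penalty
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```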

This matters when supervised fine-tuning isn't enough. Your model generates factually correct outputs but in the wrong tone. It refuses valid requests inconsistently. It follows instructions unreliably. Alignment training fixes these by directly optimizing for human preferences rather than just predicting next tokens.

That’s all for today. Thank you for reading today’s edition. See you in the next issue with more AI Engineering insights.

PS: We curate this AI Engineering content for free, and your support means everything. If you find value in what you read, consider sharing it with a friend or two.

Your feedback is valuable: If there’s a topic you’re stuck on or curious about, reply to this email. We’re building this for you, and your feedback helps shape what we send.

WORK WITH US

Looking to promote your company, product, or service to 160K+ AI developers? Get in touch today by replying to this email.