5 GitHub Repositories for AI Engineers

... PLUS: Transform any document into LLM-ready data!

In today’s newsletter:

  • Docling - Transform any document into LLM-ready data

  • 5 GitHub Repositories for AI Engineers

Reading time: 3 minutes.

Transform any document into LLM-ready data!

Docling is an open-source toolkit that parses unstructured files into clean, structured formats: Markdown, JSON, and more.

Key Features:

  • Parses PDFs, DOCX, HTML, PPTX, XLSX, images, and audio

  • Handles complex layouts: tables, code, formulas, and multi-column flows

  • Exports to Markdown, DocTags, HTML, or JSON

  • Fully offline support for secure environments

  • Integrates with LangChain, LlamaIndex, Haystack, and CrewAI

  • OCR support for scanned documents and images

  • ASR (Automatic Speech Recognition) for MP3/WAV files

  • SmolDocling: lightweight visual model support

It’s 100% open source and runs on macOS, Linux, and Windows.
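Here is a minimal sketch of what a conversion looks like in practice. It assumes Docling's Python quick-start API (`DocumentConverter().convert(...)`, which returns a result whose `.document` can be exported with `export_to_markdown()` and `export_to_dict()`); option names may differ between Docling versions. The function name `convert_document`, the `docling_out` folder, and the `convert.py` script name are hypothetical choices made for illustration.

```python
import json
import sys
from pathlib import Path


def convert_document(source: str, out_dir: str = "docling_out") -> Path:
    """Convert one file (or URL) with Docling and save Markdown and JSON exports."""
    # Imported inside the function so this script still loads where Docling
    # is not installed yet (pip install docling).
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()      # default converter settings
    result = converter.convert(source)   # local path or URL
    doc = result.document                # the parsed document object

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = Path(source).stem or "document"

    # Markdown is handy for chunking and prompting; JSON keeps the full structure.
    (out / f"{stem}.md").write_text(doc.export_to_markdown(), encoding="utf-8")
    (out / f"{stem}.json").write_text(
        json.dumps(doc.export_to_dict(), indent=2), encoding="utf-8"
    )
    return out


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: python convert.py <file-or-url>")
    print(f"Wrote exports to {convert_document(sys.argv[1])}/")
```

Running `python convert.py report.pdf` would write `report.md` and `report.json` side by side. The Markdown file can go straight into a RAG chunker, while the JSON export is useful when you need table cells or layout details later.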

5 GitHub Repositories you should definitely check out as an AI Engineer

These open-source repositories aren't just helpful. They are foundational for building real AI systems.

From understanding LLM basics to building agents, fine-tuning models, and deploying full-stack ML applications, these repos will help you go from idea to production.

This repository contains the complete code examples from the book Hands-On Large Language Models.

Why it matters:

  • Run and edit real code to understand how transformers work

  • Includes practical examples for fine-tuning and deployment

  • Use the notebooks as templates for your own projects

This repository provides tutorials and implementations of generative AI agent techniques, from basic to advanced.

It serves as a comprehensive guide for building intelligent, interactive AI Agents.

Why it matters:

  • Includes real implementations of planning, memory, tool-use, and multi-agent workflows.

  • Ideal if you're experimenting with ReAct, AutoGPT-style loops, or custom toolchains.
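To make "tool-use" and "ReAct-style loops" concrete, here is a minimal sketch of the pattern in plain Python. It is not code from the repository: the toy tools, the `Action: tool[input]` / `Final Answer:` text format, and the `scripted_llm` stand-in are all hypothetical, and a real agent would replace `scripted_llm` with a call to an actual model.

```python
import re
from typing import Callable, Dict

# Hypothetical toy tools standing in for real ones (search, APIs, code execution).
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_population": lambda city: {"paris": "2.1 million", "tokyo": "14 million"}.get(
        city.strip().lower(), "unknown"
    ),
    "word_count": lambda text: str(len(text.split())),
}

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")
FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)


def scripted_llm(transcript: str) -> str:
    """Stand-in for a model call: returns canned ReAct-style steps."""
    if "Observation:" not in transcript:
        return "Thought: I need the population of Tokyo.\nAction: lookup_population[Tokyo]"
    return "Thought: I have what I need.\nFinal Answer: Tokyo has about 14 million people."


def run_agent(question: str, llm: Callable[[str], str], max_steps: int = 5) -> str:
    # The growing transcript doubles as the agent's short-term memory.
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"

        final = FINAL_RE.search(step)
        if final:
            return final.group(1).strip()

        action = ACTION_RE.search(step)
        if action is None:
            transcript += "Observation: could not parse an action.\n"
            continue

        # Dispatch to the named tool and feed the result back to the model.
        name, arg = action.group(1), action.group(2)
        tool = TOOLS.get(name)
        observation = tool(arg) if tool else f"unknown tool '{name}'"
        transcript += f"Observation: {observation}\n"
    return "Stopped: step limit reached."


print(run_agent("How many people live in Tokyo?", scripted_llm))
# → Tokyo has about 14 million people.
```

The `max_steps` cap matters in practice: without it, a model that never emits a final answer would loop forever.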

Learn how to design, develop, deploy, and iterate on production-grade ML applications.

Why it matters:

  • Covers practical ML topics like versioning, monitoring, data pipelines, and CI/CD

  • Shows how to take ML projects from notebooks to production

  • Useful for learning real-world MLOps practices

This repository is a one-stop resource for learning and mastering prompt engineering.

Why it matters:

  • Offers a comprehensive collection of guides, papers, lectures, notebooks, and resources on prompt engineering

  • Explains core techniques such as zero-shot, few-shot, and chain-of-thought prompting with clear, practical examples

  • Great for anyone building LLM apps, tuning system prompts, or trying to squeeze more reliability and performance from large language models
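The difference between these prompting styles is easiest to see in the prompt text itself. Below is a minimal sketch, not taken from the guide: the helper names and the `Input:`/`Output:` layout are hypothetical, one common convention among many. The chain-of-thought helper uses the widely used "Let's think step by step" cue.

```python
from typing import List, Tuple


def zero_shot(task: str, text: str) -> str:
    # Instruction only: the model gets no worked examples.
    return f"{task}\n\nInput: {text}\nOutput:"


def few_shot(task: str, examples: List[Tuple[str, str]], text: str) -> str:
    # Same instruction, preceded by labeled examples the model can imitate.
    shots = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{task}\n\n{shots}\n\nInput: {text}\nOutput:"


def chain_of_thought(question: str) -> str:
    # Nudges the model to write out intermediate reasoning before answering.
    return f"Q: {question}\nA: Let's think step by step."


examples = [("I loved this movie!", "positive"), ("Worst purchase ever.", "negative")]
print(few_shot("Classify the sentiment as positive or negative.", examples,
               "The battery lasts all day."))
```

Keeping prompts in small functions like these makes it easy to A/B test styles against the same inputs.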

A beginner-friendly course on AI agents.

This free 11-lesson course will teach you everything you need to get started with building AI agents.

Why it matters:

  • Great starting point if you're new to building agents.

  • Teaches foundational concepts like tool use, planning, and memory using simple Python scripts.

That’s a Wrap

That’s all for today. Thanks for reading this edition. See you in the next issue with more AI Engineering insights.

PS: We curate this AI Engineering content for free, and your support means everything. If you find value in what you read, consider sharing it with a friend or two.

Your feedback is valuable: If there’s a topic you’re stuck on or curious about, reply to this email. We’re building this for you, and your feedback helps shape what we send.

WORK WITH US

Looking to promote your company, product, or service to 140K+ AI developers? Get in touch today by replying to this email.