5 GitHub Repositories for AI Engineers

PLUS: Transform any document into LLM-ready data!

Sumanth P
July 14, 2025

In today’s newsletter:

Docling - Transform any document into LLM-ready data
5 GitHub Repositories for AI Engineers

Reading time: 3 minutes.

Docling – Get your documents ready for gen AI

Transform any document into LLM-ready data!

Docling is an open-source toolkit that parses unstructured files into clean, structured formats such as Markdown and JSON.

Key Features:

Parses PDFs, DOCX, HTML, PPTX, XLSX, images, and audio
Handles complex layouts: tables, code, formulas, and multi-column flows
Exports to Markdown, DocTags, HTML, or JSON
Fully offline support for secure environments
Integrates with LangChain, LlamaIndex, Haystack, and CrewAI
OCR support for scanned documents and images
ASR (Automatic Speech Recognition) for MP3/WAV files
SmolDocling: lightweight visual model support

It’s 100% open source and runs on macOS, Linux, and Windows.
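
To make the workflow concrete, here is a minimal sketch of converting one file to Markdown. It assumes Docling is installed (`pip install docling`) and uses the `DocumentConverter` entry point and `export_to_markdown()` method from Docling's published quickstart. The helper names, the `converted` output folder, and the file-naming scheme are hypothetical choices made for this illustration.

```python
from pathlib import Path


def output_path_for(source: str, out_dir: str = "converted") -> Path:
    """Decide where the Markdown goes (hypothetical layout: <out_dir>/<name>.md)."""
    return Path(out_dir) / (Path(source).stem + ".md")


def convert_to_markdown(source: str, out_dir: str = "converted") -> Path:
    """Parse one document with Docling and write the result as Markdown."""
    # Imported here so this module still loads where Docling is not installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)  # accepts a local path or a URL
    markdown = result.document.export_to_markdown()

    target = output_path_for(source, out_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(markdown, encoding="utf-8")
    return target
```

Calling `convert_to_markdown("report.pdf")` (a placeholder file name) would write `converted/report.md`. The first run may download model weights, so expect it to be slower than later runs.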

👉 Check out the GitHub Repo

5 GitHub Repositories you should definitely check out as an AI Engineer

These open-source repositories aren't just helpful. They are foundational for building real AI systems.

From understanding LLM basics to building agents, fine-tuning models, and deploying full-stack ML applications, these repos will help you go from idea to production.

1. Hands-On Large Language Models

This repository contains the complete code examples from the book Hands-On Large Language Models.

Why it matters:

Run and edit real code to understand how transformers work
Includes practical examples for fine-tuning and deployment
Use the notebooks as templates for your own projects

👉 Check out the GitHub Repo → here

2. GenAI Agents

This repository provides tutorials and implementations for various Generative AI Agent techniques, from basic to advanced.

It serves as a comprehensive guide for building intelligent, interactive AI Agents.

Why it matters:

Includes real implementations of planning, memory, tool-use, and multi-agent workflows.
Ideal if you're experimenting with ReAct, AutoGPT-style loops, or custom toolchains.
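
For readers new to the pattern, a ReAct-style loop can be sketched in a few lines. This is not code from the repository. It is a minimal illustration in which the tool registry, the `Action: name[argument]` text format, and the scripted stand-in for an LLM are all assumptions made for the example.

```python
import re
from typing import Callable, Dict

# Hypothetical tool registry: tool name -> function from string to string.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(float(x) for x in arg.split(","))),
    "upper": lambda arg: arg.upper(),
}

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")
FINAL_RE = re.compile(r"Final Answer:\s*(.*)")


def react_loop(model: Callable[[str], str], question: str, max_steps: int = 5) -> str:
    """Alternate model replies (thought + action) with tool observations."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = model(transcript)
        transcript += reply + "\n"

        final = FINAL_RE.search(reply)
        if final:
            return final.group(1).strip()

        action = ACTION_RE.search(reply)
        if action is None:
            transcript += "Observation: no valid action found\n"
            continue

        name, arg = action.group(1), action.group(2)
        tool = TOOLS.get(name)
        observation = tool(arg) if tool else f"unknown tool {name!r}"
        transcript += f"Observation: {observation}\n"
    return "no answer within step limit"


def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM call; replies are hard-coded for this demo."""
    if "Observation:" not in transcript:
        return "Thought: I should add the numbers.\nAction: add[2,3]"
    return "Thought: I have the sum.\nFinal Answer: 5.0"


print(react_loop(scripted_model, "What is 2 + 3?"))
```

In a real agent, `scripted_model` would be replaced by a call to an LLM that sees the growing transcript. The step limit matters because a model that never emits a final answer would otherwise loop forever.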

👉 Check out the GitHub Repo → here

3. Made with ML

Learn how to design, develop, deploy and iterate on production-grade ML applications.

Why it matters:

Covers practical ML topics like versioning, monitoring, data pipelines, and CI/CD
Shows how to take ML projects from notebooks to production
Useful for learning real-world MLOps practices

👉 Check out the GitHub Repo → here

4. Prompt Engineering Guide

This repo collects guides, papers, lectures, notebooks, and other resources for learning and mastering prompt engineering.

Why it matters:

Offers a comprehensive collection of guides, papers, lectures, notebooks, and resources on prompt engineering
Explains core techniques like few-shot, zero-shot, and chain‑of‑thought prompting clearly and practically
Great for anyone building LLM apps, tuning system prompts, or trying to squeeze more reliability and performance from large language models
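
The three techniques named above differ mainly in what goes into the prompt. The sketch below builds each kind of prompt as a plain string. The `Input:`/`Output:` layout and the function names are illustrative assumptions, not a format taken from the guide.

```python
from typing import List, Tuple


def zero_shot(task: str, text: str) -> str:
    """Instruction only: the model gets no worked examples."""
    return f"{task}\n\nInput: {text}\nOutput:"


def few_shot(task: str, examples: List[Tuple[str, str]], text: str) -> str:
    """Instruction plus a handful of solved examples before the real input."""
    shots = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{task}\n\n{shots}\n\nInput: {text}\nOutput:"


def chain_of_thought(task: str, text: str) -> str:
    """Ask for intermediate reasoning before the final answer."""
    return (
        f"{task}\n\nInput: {text}\n"
        "Think step by step, then give the final answer on a line "
        "starting with 'Answer:'."
    )


task = "Classify the sentiment of the input as positive or negative."
examples = [("great movie", "positive"), ("boring plot", "negative")]
print(few_shot(task, examples, "loved it"))
```

Each function returns a string that can be sent to whichever model API you use. Keeping prompt construction in small functions like these makes it easier to compare the techniques on the same inputs.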

👉 Check out the GitHub Repo → here

5. AI Agents for Beginners

Beginner-friendly course on AI Agents.

This free 11-lesson course will teach you everything you need to get started with building AI agents.

Why it matters:

Great starting point if you're new to building agents.
Teaches foundational concepts like tool use, planning, and memory using simple Python scripts.
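
As a rough picture of how tool use, planning, and memory fit together, here is a toy plan-then-execute agent. It is not taken from the course. The `Agent` class, the fixed two-step plan, and both tools are hypothetical, and a real agent would ask an LLM to produce the plan.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Agent:
    tools: Dict[str, Callable[[str], str]]
    memory: List[str] = field(default_factory=list)  # running log of each step

    def plan(self, goal: str) -> List[Tuple[str, str]]:
        """Toy planner with a fixed plan; an LLM would normally generate this."""
        return [("search", goal), ("summarize", "")]

    def run(self, goal: str) -> str:
        self.memory.append(f"goal: {goal}")
        result = ""
        for tool_name, arg in self.plan(goal):
            # An empty argument means "reuse the previous step's result".
            result = self.tools[tool_name](arg or result)
            self.memory.append(f"{tool_name} -> {result}")
        return result


def fake_search(query: str) -> str:
    """Pretend search tool; returns a canned description."""
    return f"three documents about {query}"


def summarize(text: str) -> str:
    """Pretend summarizer; keeps only the count of documents."""
    return text.split(" about ")[0] + " found"


agent = Agent(tools={"search": fake_search, "summarize": summarize})
answer = agent.run("vector databases")
print(answer)
print(agent.memory)
```

The memory list is the simplest possible form of agent memory: an append-only log. It is enough to show how later steps, or a later run, could consult what happened earlier.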

👉 Check out the GitHub Repo → here

That’s a Wrap

That’s all for today. Thank you for reading today’s edition. See you in the next issue with more AI Engineering insights.

PS: We curate this AI Engineering content for free, and your support means everything. If you find value in what you read, consider sharing it with a friend or two.

Your feedback is valuable: If there’s a topic you’re stuck on or curious about, reply to this email. We’re building this for you, and your feedback helps shape what we send.
