# LLM Foundations: From Theory to Production 🚀

![Repository Size](https://img.shields.io/github/repo-size/JawherKl/llm-foundations)
![Last Commit](https://img.shields.io/github/last-commit/JawherKl/llm-foundations)
![Issues](https://img.shields.io/github/issues-raw/JawherKl/llm-foundations)
![Forks](https://img.shields.io/github/forks/JawherKl/llm-foundations)
![Stars](https://img.shields.io/github/stars/JawherKl/llm-foundations)
![License](https://img.shields.io/github/license/JawherKl/llm-foundations)

![llm-foundations](https://github.com/JawherKl/llm-foundations/blob/main/resources/llm-foundations.jpg)

A structured learning path and project portfolio for software engineers to master **Large Language Models**. This repository moves beyond theory, focusing on the practical application of LLMs through prompt engineering, API integration, and building scalable applications.

> **💡 For Developers, By a Developer:** This isn't just a list of concepts. It's a hands-on curriculum designed to take you from foundational understanding to building production-ready LLM applications.

## 📖 Overview

The field of Large Language Models is moving fast. This repository provides a structured path to not just keep up, but to become proficient. It's organized into a **28-day curriculum** that balances deep theoretical understanding with immediate, practical application.

Whether you're building AI-powered features into your product, automating workflows, or launching a new AI-based service, this guide will help you develop the necessary skills.

## 🧭 The 28-Day Learning Path

The curriculum is divided into six logical parts:

| Part | Focus Area | What You'll Achieve |
|:---|:---|:---|
| **1. Theory Foundations** | Core Concepts | Understand how LLMs work under the hood |
| **2. Prompt Engineering** | Communication | Master the art of guiding LLMs to desired outputs |
| **3. Practical Applications** | API Integration | Build functional applications using various LLM APIs |
| **4. Advanced Topics** | Production Systems | Implement RAG, work with vector DBs, and build agents |
| **5. Build Projects** | Portfolio Development | Create showcase projects for your portfolio |
| **6. Next Steps** | Career Planning | Define your specialization and next learning goals |

## 🏗️ Repository Structure

```bash
llm-foundations/
├── 01-theory-foundations/ # Days 1-8: How LLMs work
├── 02-prompt-engineering/ # Days 9-16: Effective prompting
├── 03-practical-applications/ # Days 17-18: API integration & simple apps
├── 04-advanced-topics/ # Days 19-27: RAG, vector DBs, agents
├── 05-build-projects/ # Portfolio project development
├── 06-reflection-next-steps/ # Day 28: Planning your path forward
├── resources/ # Cheatsheets, tools, reading lists
└── quizzes/ # Self-assessment tools
```

## 📚 Learning Resources

To make the most of this curriculum, you'll want to be familiar with these core technologies and have these tools ready.

### 🐍 Prerequisite Knowledge
* **Python Programming**: Intermediate proficiency (functions, classes, decorators, async/await)
* **API Concepts**: REST APIs, HTTP requests, authentication (API keys)
* **Basic Command Line**: Navigating directories, running scripts, managing environments
* **Git & GitHub**: Cloning repositories, making commits, creating pull requests

### 🔧 Essential Tools & Accounts
| Category | Tools & Services | Description |
|:---|:---|:---|
| **Development** | Python 3.10+, VS Code, Jupyter Notebook | Core coding environment |
| **API Access** | [OpenAI](https://platform.openai.com/), [Anthropic](https://console.anthropic.com/), [Cohere](https://dashboard.cohere.com/) | Accounts for LLM API access (some offer free credits) |
| **Open Source LLMs** | [Ollama](https://ollama.ai/), [LM Studio](https://lmstudio.ai/) | Run models locally on your machine |
| **Vector Databases** | [Pinecone](https://www.pinecone.io/), [Chroma](https://www.trychroma.com/), [Weaviate](https://weaviate.io/) | For RAG implementations (free tiers available) |
| **UI Frameworks** | [Streamlit](https://streamlit.io/), [Gradio](https://www.gradio.app/) | For building web interfaces for your LLM apps |
| **Prompt Tools** | [PromptHero](https://prompthero.com/), [FlowGPT](https://flowgpt.com/) | For prompt inspiration and testing |

### 📖 Recommended Learning Path
1. **Set Up Your Environment**: Install Python, create a virtual environment, and install key packages (`openai`, `langchain`, `streamlit`)
2. **Get API Access**: Sign up for OpenAI/Anthropic and get your API keys
3. **Install Ollama**: Follow the [Ollama installation guide](https://github.com/ollama/ollama) to run models locally
4. **Clone This Repo**: `git clone https://github.com/JawherKl/llm-foundations.git`
5. **Explore the Structure**: Review the repository organization and learning path
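
Once the steps above are done, a quick sanity check can confirm the environment is ready. The sketch below is one possible way to do that using only the standard library. The package names come from step 1 and the Python version from the tools table; the `OPENAI_API_KEY` variable name and the `check_environment` helper are assumptions made for illustration, so adjust them to the providers you actually use.

```python
import importlib.util
import os
import sys

# Package names from step 1; the API key variable name is an assumption.
REQUIRED_PACKAGES = ["openai", "langchain", "streamlit"]
API_KEY_VARS = ["OPENAI_API_KEY"]  # assumed name; adjust to your provider


def check_environment(packages=REQUIRED_PACKAGES, key_vars=API_KEY_VARS, environ=None):
    """Return a dict describing what is ready and what is still missing."""
    environ = os.environ if environ is None else environ
    return {
        # The tools table lists Python 3.10+ as the baseline.
        "python_ok": sys.version_info >= (3, 10),
        # find_spec returns None when a top-level package is not installed.
        "missing_packages": [p for p in packages if importlib.util.find_spec(p) is None],
        # A key counts as missing when the variable is unset or empty.
        "missing_keys": [k for k in key_vars if not environ.get(k)],
    }


report = check_environment()
print(f"Python 3.10+: {report['python_ok']}")
print(f"Missing packages: {report['missing_packages'] or 'none'}")
print(f"Missing API keys: {report['missing_keys'] or 'none'}")
```

Run it inside the virtual environment you created, so the check reflects the packages installed there rather than the system Python.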

### 💡 Pro Tips
* **Start Small**: Begin with simple API calls before tackling complex frameworks
* **Use Free Tiers**: Some LLM API providers offer free credits or free tiers to get started; check each provider's current pricing
* **Experiment Locally**: Use Ollama with smaller models (like Llama 3) for experimentation without API costs
* **Document Your Learning**: Keep notes on what works and what doesn't - this becomes valuable reference material
* **Join Communities**: Participate in Discord servers and subreddits like r/LocalLLaMA, r/LangChain, and AI developer communities
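
To illustrate the "start small" and "experiment locally" tips, here is a minimal sketch of a single API call to a local Ollama server using only the standard library. It assumes Ollama is running on its default port (11434) and that a model tagged `llama3` has already been pulled. The helper names `build_request` and `generate` are hypothetical, not part of this repository.

```python
import json
import urllib.request

# Default local endpoint of an Ollama server (assumes a default install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3"):
    """Build a non-streaming generate request for a local Ollama server."""
    # "stream": False asks for one JSON object instead of a stream of chunks.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3", timeout=60):
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, `print(generate("Explain tokenization in one sentence."))` should print the model's reply. Keeping request construction separate from the network call makes the payload easy to inspect before anything is sent.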

### 🆓 Free Resources to Supplement Learning
* [DeepLearning.AI Short Courses](https://www.deeplearning.ai/short-courses/) - Free courses on LLMs, ChatGPT, and LangChain
* [Andrew Ng's YouTube Channel](https://www.youtube.com/channel/UCep6Rpvw3PtOMJWAFpKl8Yw) - Excellent explanations of AI concepts
* [Full Stack LLM Bootcamp](https://fullstackdeeplearning.com/llm-bootcamp/) - Comprehensive video series on building LLM applications
* [Hugging Face Course](https://huggingface.co/course/chapter1) - Great for understanding transformers and open-source models

---

## 🚀 Getting Started

### For the Structured Learner (Recommended)
1. **Start with Theory**: Begin with `01-theory-foundations/README.md`
2. **Follow the Path**: Progress through each section in order
3. **Build as You Learn**: Implement projects in `05-build-projects/` as you acquire relevant skills
4. **Assess Your Knowledge**: Use the quizzes to validate your understanding

### For the Project-Focused Learner
1. **Skim the Theory**: Review `01-theory-foundations/04-key-terminologies.md`
2. **Master Prompting**: Study `02-prompt-engineering/` thoroughly
3. **Pick a Project**: Choose a project from `05-build-projects/project-ideas.md`
4. **Learn as You Build**: Reference specific sections as needed for your project

### For the Experienced Developer
1. **Assessment First**: Take the `quizzes/final-assessment.md` to identify knowledge gaps
2. **Targeted Learning**: Focus on sections where you need reinforcement
3. **Contribute**: Share your expertise by improving content or adding new examples

## 🛠️ Tech Stack & Tools

This curriculum prepares you to work with:

- **LLM APIs**: OpenAI GPT, Anthropic Claude, Cohere, OpenRouter
- **Frameworks**: LangChain, LlamaIndex, Haystack
- **Vector Databases**: Pinecone, Chroma, Weaviate, Qdrant
- **UI Tools**: Streamlit, Gradio, Chainlit
- **Open Source Models**: Llama 2/3, Mistral, Phi via Ollama
- **Development**: Python, Jupyter Notebooks, Docker
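
As a rough intuition for what the vector databases above do in a RAG pipeline, the sketch below ranks a few documents against a query and places the best match into a prompt. It is a toy under stated assumptions: word counts stand in for a real embedding model, an in-memory list stands in for a vector database, and the `embed`, `cosine`, and `retrieve` helpers are hypothetical names, not APIs from any of the listed tools.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


docs = [
    "Chroma stores embeddings for retrieval",
    "Streamlit builds web interfaces",
    "Ollama runs models locally",
]
question = "which tool runs models locally"
context = retrieve(question, docs)[0]

# The retrieved text is placed in the prompt so the model can ground its answer.
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

A real setup would replace `embed` with an embedding model and `retrieve` with a query against a vector database, but the retrieve-then-prompt shape stays the same.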

## 🤝 How to Contribute

We welcome contributions! Here's how you can help:

1. **Fix Errors**: Found a mistake? Submit a PR with corrections
2. **Add Examples**: Share your prompt engineering examples or code samples
3. **Improve Explanations**: Help make complex concepts more accessible
4. **Share Projects**: Add your LLM projects to the build-projects section
5. **Suggest Resources**: Recommend great learning materials

Please read our [Contributing Guidelines](CONTRIBUTING.md) before submitting a pull request.

## 📚 Bibliography & Further Reading

This repository synthesizes knowledge from a wide array of exceptional resources. The following books, articles, papers, and documentation were instrumental in its creation and serve as recommended reading for those who wish to dive deeper.

### Foundational Papers
* **[[1706.03762] Attention Is All You Need](https://arxiv.org/abs/1706.03762)** - Vaswani et al. (2017) - The seminal paper introducing the Transformer architecture.
* **[[1810.04805] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)** - Devlin et al. (2018) - Introduced the encoder-only Transformer and masked language modeling.
* **[[2005.14165] Language Models are Few-Shot Learners (GPT-3 Paper)](https://arxiv.org/abs/2005.14165)** - Brown et al. (2020) - Demonstrated the remarkable scaling and few-shot abilities of large autoregressive models.
* **[[1910.10683] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5 Paper)](https://arxiv.org/abs/1910.10683)** - Raffel et al. (2019) - Reframed all NLP tasks into a text-to-text format.

### Essential Books & Online Books
* **["Natural Language Processing with Transformers"](https://www.oreilly.com/library/view/natural-language-processing/9781098136789/)** by Tunstall, von Werra, & Wolf - The definitive practical guide to using the Hugging Face ecosystem.
* **["Transformers for Natural Language Processing"](https://www.packtpub.com/product/transformers-for-natural-language-processing-second-edition/9781803247335)** by Denis Rothman - A comprehensive guide to Transformer models.
* **["Hands-On Large Language Models"](https://www.oreilly.com/library/view/hands-on-large-language/9781098150952/)** by Jay Alammar & Maarten Grootendorst - A very practical, project-based approach.
* **["The OpenAI API Book"](https://www.linkedin.com/pulse/openai-api-book-build-ai-products-ship-faster-smarper-michael-king/)** by Michael King - A great resource focused on practical API usage.

### Influential Blogs & Articles
* **Jay Alammar's Blog ([The Illustrated Transformer](https://jalammar.github.io/illustrated-transformer/))** - Legendary visual explanations of complex ML concepts.
* **Lil'Log ([Prompt Engineering](https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/))** by Lilian Weng - In-depth and technical overview of prompt engineering techniques.
* **Andrej Karpathy's Blog ([karpathy.github.io](https://karpathy.github.io/))** - His writing on neural network training and Software 2.0 is foundational.
* **Simon Willison's Blog ([LLM tag](https://simonwillison.net/tags/llm/))** - A prolific writer on practical LLM applications and emerging patterns.
* **EMAXX.IO ([RTF and CRISPE Frameworks](https://emaxx.io/blog/posts/rtf_crispe_frameworks_for_prompt_engineering.html))** - Excellent breakdown of prompt engineering frameworks.

### Official Documentation
* **[OpenAI API Documentation](https://platform.openai.com/docs/introduction)** - The source for all things GPT, embeddings, and fine-tuning on OpenAI's platform.
* **[Anthropic API Documentation](https://docs.anthropic.com/claude/docs)** - Comprehensive guide to using Claude models.
* **[LangChain Documentation](https://python.langchain.com/docs/get_started/introduction)** - Essential for building complex, multi-step LLM applications.
* **[LlamaIndex Documentation](https://docs.llamaindex.ai/en/stable/)** - The best resource for learning about Retrieval-Augmented Generation (RAG).
* **[Hugging Face Transformers Documentation](https://huggingface.co/docs/transformers/index)** - The go-to resource for working with open-source models.

### Courses & Video Series
* **[Full Stack LLM Bootcamp](https://fullstackdeeplearning.com/llm-bootcamp/)** - A free, excellent video series on building production LLM apps.
* **[DeepLearning.AI Short Courses](https://www.deeplearning.ai/short-courses/)** - Specifically "ChatGPT Prompt Engineering for Developers" and "LangChain for LLM Application Development".
* **[CS324 - Large Language Models](https://stanford-cs324.github.io/winter2022/)** - Stanford's course on LLMs, covering fundamentals and advanced topics.

### Community & Inspiration
* **r/LocalLLaMA** - The central Reddit community for open-source LLMs.
* **Hugging Face Discord** - A vibrant community for discussion and help with open-source models.
* **LangChain Discord** - Great for getting help with the LangChain framework.
* **AI Engineer Summit Talks ([YouTube](https://www.youtube.com/@aiDotEngineer))** - Talks from practitioners building the cutting edge of LLM applications.

---

*This bibliography represents a living list. If you have a resource that was foundational to your understanding, please consider contributing to this section.*

## 📜 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- Inspired by various learning paths and roadmaps
- Built upon the work of researchers and developers in the LLM space
- Thanks to all contributors who help improve this resource

---

**⭐ If you find this repository helpful, please give it a star!** This helps others discover it and encourages further development.

## 🗺️ What's Next?

Ready to begin your LLM journey? Start here: **[Theory Foundations](./01-theory-foundations/README.md)**

---

*This repository is maintained by [JawherKl](https://github.com/JawherKl). For questions or suggestions, please open an issue or discussion.*