awesome-generative-ai-data-scientist
  
  
    A curated list of 100+ resources for building and deploying generative AI, focused on helping you become a Generative AI Data Scientist with LLMs.
    https://github.com/business-science/awesome-generative-ai-data-scientist
  
Free Training
- Register for the next free workshop here.
 
NVIDIA
- Generative AI Data Scientist Workshops - science.io/ai-register)
 
 
Data Science And AI Agents
- Microsoft Data Formulator - formulator) |
 - Jupyter Agent - agents/jupyter-agent) |
 - Jupyter AI - ai.readthedocs.io/en/latest/) \| [GitHub](https://github.com/jupyterlab/jupyter-ai) |
 - PandasAI - ai.com/) \| [GitHub](https://github.com/sinaptik-ai/pandas-ai) |
 - WrenAI - Open-source GenBI AI Agent. Text2SQL made Easy! | [Documentation](https://docs.getwren.ai/oss/overview/introduction) \| [GitHub](https://github.com/Canner/WrenAI) |
 - Google GenAI Toolbox for Databases - Open-source server that makes it easier to build Gen AI tools for interacting with databases. | [Blog](https://cloud.google.com/blog/products/ai-machine-learning/announcing-gen-ai-toolbox-for-databases-get-started-today) \| [Documentation](https://googleapis.github.io/genai-toolbox/getting-started/introduction/) \| [GitHub](https://github.com/googleapis/genai-toolbox) |
 - Vanna AI - ai/vanna) |
 
Web Parsing (HTML) and Web Crawling
- Firecrawl - ready markdown or structured data. Scrape, crawl, and extract with a single API. | [Documentation](https://docs.firecrawl.dev/) \| [GitHub](https://github.com/mendableai/firecrawl) |
 - Scrapling - Fast, and Adaptive Web Scraping for Python. | [GitHub](https://github.com/D4Vinci/Scrapling) |
 - GPT Crawler - gpt) \| [GitHub](https://github.com/BuilderIO/gpt-crawler) |
 - Gitingest
 - Crawl4AI - Open-source, blazing-fast, AI-ready web crawling tailored for LLMs, AI agents, and data pipelines. | [Documentation](https://crawl4ai.com/mkdocs/) \| [GitHub](https://github.com/unclecode/crawl4ai) |
 - ScrapeGraphAI - ai) |
 
AI Frameworks (Build Your Own)
- Pocket Flow - line minimalist LLM framework for Agents, Task Decomposition, RAG, etc. | [Documentation](https://the-pocket.github.io/PocketFlow/) \| [GitHub](https://github.com/The-Pocket/PocketFlow) |
 - Google GenAI - genai/) \| [GitHub](https://github.com/googleapis/python-genai) |
 - Agency Swarm - Open-source agent orchestration framework built on top of the latest OpenAI Assistants API. | [Documentation](https://vrsen.github.io/agency-swarm/) \| [GitHub](https://github.com/VRSEN/agency-swarm) |
 - LlamaIndex Workflows - workflows-beta-a-new-way-to-create-complex-ai-applications-with-llamaindex) |
 - LlamaIndex - augmented generative AI applications with LLMs. | [Documentation](https://docs.llamaindex.ai/) \| [GitHub](https://github.com/run-llama/llama_index) |
 - Pydantic AI - grade applications with Generative AI less painful. | [GitHub](https://github.com/pydantic/pydantic-ai) |
 - CrewAI
 - AutoGen
 - FlatAI - ai) |
 - Llama Stack - stack.readthedocs.io/en/latest/index.html) \| [GitHub](https://github.com/meta-llama/llama-stack) |
 - Haystack - Open-source AI orchestration framework for building customizable, production-ready LLM applications. | [Documentation](https://docs.haystack.deepset.ai/docs) \| [GitHub](https://github.com/deepset-ai/haystack) |
 - AutoAgent - automated and highly self-developing framework that enables users to create and deploy LLM agents through natural language alone. | [GitHub](https://github.com/HKUDS/AutoAgent) |
 - Legion - agnostic framework designed to simplify the creation of sophisticated multi-agent systems. | [Documentation](https://legion.llmp.io/docs) \| [GitHub](https://github.com/LLMP-io/Legion) |
 
Huggingface Ecosystem
- Huggingface - Open-source platform for machine learning (ML) and artificial intelligence (AI) tools and models. | [Documentation](https://huggingface.co/docs) |
 - Sentence Transformers - to Python module for accessing, using, and training state-of-the-art text and image embedding models. | [Documentation](https://sbert.net/) |
 
Prompt Improvement
- Microsoft PromptWizard - Aware Prompt Optimization Framework. | [GitHub](https://github.com/microsoft/PromptWizard) |
 - Promptify
 - AutoPrompt - based Prompt Calibration. | [GitHub](https://github.com/Eladlev/AutoPrompt) |
 
Table of Contents
- Nir Diamant GenAI Agents Hub
 - AI Engineering Hub - world AI agent applications, LLM and RAG tutorials, with examples to implement. | [GitHub](https://github.com/patchy631/ai-engineering-hub/tree/main) |
 - AI Hedge Fund - powered hedge fund. | [GitHub](https://github.com/virattt/ai-hedge-fund) |
 - AI Financial Agent - financial-agent) |
 - Awesome LLM Apps - By-Step Tutorials. | [GitHub](https://github.com/Shubhamsaboo/awesome-llm-apps) |
 - Structured Report Generation (LangGraph) - to-end process of report planning, web research, and writing. Produces reports of varying and easily configurable formats. | [Video](https://www.youtube.com/watch?v=E04rFNtwFcA) \| [Blog](https://blog.langchain.dev/structured-report-generation-blueprint/) \| [Code](https://github.com/langchain-ai/langchain-nvidia/blob/main/cookbook/structured_report_generation.ipynb) |
 - Uber QueryGPT - TW/blog/query-gpt/) |
 - StockChat - Open-source alternative to Perplexity Finance. | [GitHub](https://github.com/clchinkc/stockchat) |
 
Testing and Monitoring (Observability)
- MLflow Tracing and Evaluation - evaluation/index.html) \| [GitHub](https://github.com/mlflow/mlflow) |
 - Opik - Open-source platform for evaluating, testing, and monitoring LLM applications. | [GitHub](https://github.com/comet-ml/opik) |
 - LangSmith - grade LLM applications. It allows you to closely monitor and evaluate your application, so you can quickly and confidently ship. | [Documentation](https://docs.smith.langchain.com/) \| [GitHub](https://github.com/langchain-ai/langsmith-sdk) |
 - Langfuse
 
Other
- AI Agent Service Toolkit - service-toolkit.streamlit.app/) \| [GitHub](https://github.com/JoshuaC215/agent-service-toolkit) |
 - AI Suite
 - AdalFlow - optimize LLM applications, from Chatbot, RAG, to Agent by SylphAI. | [GitHub](https://github.com/SylphAI-Inc/AdalFlow) |
 - dspy
 - LiteLLM
 - Microsoft Tiny Troupe - powered multiagent persona simulation for imagination enhancement and business insights. | [GitHub](https://github.com/microsoft/TinyTroupe) |
 - Distributed Llama - llama) |
 
LLMOps
- MLflow
 - LLMOps - python-package) |
 - Helicone - Open-source LLM observability platform for developers to monitor, debug, and improve production-ready applications. | [Documentation](https://docs.helicone.ai/) \| [GitHub](https://github.com/Helicone/helicone) |
 - Agenta - Open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place. | [Documentation](https://docs.agenta.ai/) |
 - LangWatch - click. Drag and drop interface for LLMOps platform. | [Documentation](https://docs.langwatch.ai/) \| [GitHub](https://github.com/langwatch/langwatch) |
 
Agents and Tools (Build Your Own)
- Google Agent Development Kit (ADK) - Open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control. | [Documentation](https://google.github.io/adk-docs/) \| [GitHub](https://github.com/google/adk-python) |
 - AutoGen AgentChat - guide/agentchat-user-guide/quickstart.html) |
 - LangChain Agents
 - LangChain Tools
 - smolagents
 - Agentarium - Open-source framework for creating and managing simulations populated with AI-powered agents. It provides an intuitive platform for designing complex, interactive environments where agents can act, learn, and evolve. | [GitHub](https://github.com/Thytu/Agentarium) |
 
RAG in R
- Microsoft Azure AI Services
 - Google Vertex AI
 - AWS Bedrock
 - Ragnar - Augmented Generation (RAG) workflows. | [Website](https://tidyverse.github.io/ragnar/) |
 
Building AI
            
Microsoft Azure
- Azure Generative AI Examples - examples/tree/main/sdk/python/generative-ai) |
 - Microsoft Generative AI for Beginners - ai-for-beginners) |
 
LLM Providers
- Ollama
 - Anthropic Claude - sdk-python) |
 - Google Gemini - gemini/generative-ai-python) |
 - Grok - python) |
 - Hugging Face Models
 - OpenAI - python) |
 - OpenAI Agents - agent workflows. | [GitHub](https://github.com/openai/openai-agents-python) |
 
Vector Databases (RAG)
- FAISS
 - NVIDIA NIM - host GPU-accelerated inferencing microservices for pretrained and customized AI models across clouds, data centers, and workstations.
 - ChromaDB - core/chroma) |
 - Pinecone - io/pinecone-python-client) |
 - Milvus - Open-source vector database built to power embedding similarity search and AI applications. | [GitHub](https://github.com/milvus-io/milvus) |
 - Qdrant - Performance Vector Search at Scale. | [Website](https://qdrant.tech/) |
 - SQLite Vec - vec) |
 
LangChain Ecosystem
- LangGraph - actor applications with LLMs, used to create agent and multi-agent workflows. | [Documentation](https://langchain-ai.github.io/langgraph/) \| [Tutorials](https://github.com/langchain-ai/langgraph/tree/main/docs/docs/tutorials) |
 - LangChain - ai/langchain) \| [Cookbook](https://github.com/langchain-ai/langchain/tree/master/cookbook) |
 
LLM Models
            
AI LLM Frameworks
- LangChain
 - LlamaIndex - augmented generative AI applications with LLMs.
 - LlamaIndex Workflows - complex AI application we see our users building.
 - LangGraph - actor applications with LLMs, used to create agent and multi-agent workflows.
 
Cookbooks and Examples:
- LangChain Cookbook - to-end examples.
 
Amazon Web Services (AWS)
            
Google Cloud Platform (GCP)
            
Cloud Examples:
- Amazon Bedrock Workshop
 - Google Vertex AI Examples
 - NVIDIA NIM Anywhere - sized labs and up to production environments.
 - NVIDIA NIM Deploy
 
8-Week AI Bootcamp by Business Science
            
NVIDIA
- NVIDIA NIM Anywhere - sized labs and up to production environments. | [GitHub](https://github.com/NVIDIA/nim-anywhere) |
 - NVIDIA NIM Deploy - deploy) |
 - Python AI/ML Tips - science/free-ai-tips) |
 - unwind ai
 
LLM Models and Providers
            
Pretraining
- tinygrad
 - micrograd
 - PyTorch - Open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing. | [Website](https://pytorch.org/) |
 - TensorFlow - Open-source machine learning library developed by Google. | [Website](https://www.tensorflow.org/) |
 - JAX - performance computing and automatic differentiation. | [GitHub](https://github.com/jax-ml/jax) |
 
Fine-tuning
- Transformers - Hugging Face Transformers is a popular library for Natural Language Processing (NLP) tasks, including fine-tuning large language models.
 - Unsloth - 3.5 & Gemma 2-5x faster with 80% less memory! | [GitHub](https://github.com/unslothai/unsloth) |
 - LitGPT - performance LLMs with recipes to pretrain, finetune, and deploy at scale. | [GitHub](https://github.com/Lightning-AI/litgpt) |
 - AutoTrain - tuning of LLMs and other machine learning tasks. | [GitHub](https://github.com/huggingface/autotrain-advanced) |
 
Document Parsing
- Docling by IBM
 - Markitdown by Microsoft
 - Embedchain - started/quickstart) \| [GitHub](https://github.com/mem0ai/mem0/tree/main/embedchain) |
 - DocETL - powered data processing and ETL. | [Documentation](https://ucbepic.github.io/docetl/) \| [GitHub](https://github.com/ucbepic/docetl) |
 - LangChain Document Loaders
 - Unstructured.io - tuning. | [Documentation](https://docs.unstructured.io/welcome) \| [GitHub](https://github.com/Unstructured-IO/unstructured) \| [Paper](https://www.iarpa.gov/images/PropsersDayPDFs/BENGAL/Unstructured.io%20Federal%20Capabilities%20Statement%20for%20IARPA.pdf) |
 
LLM Memory
- Memary
 - Memobase - Based Memory for GenAI Apps. | [Documentation](https://docs.memobase.io/introduction) \| [GitHub](https://github.com/memodb-io/memobase) |
 - Mem0 - improving memory layer for LLM applications, enabling personalized AI experiences that save costs and delight users. | [Documentation](https://docs.mem0.ai/) \| [GitHub](https://github.com/mem0ai/mem0) |
 
Miscellaneous
- Pyspur - Based Editor for LLM Workflows
 - Browser-Use
 - AWS Bedrock - performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon
 
AI Frameworks (Drag and Drop)
- AutoGen Studio - code interface to rapidly prototype AI agents, enhance them with tools, compose them into teams, and interact with them to accomplish tasks. Built on AutoGen AgentChat. | [Documentation](https://microsoft.github.io/autogen/stable/user-guide/autogenstudio-user-guide/index.html) |
 - LangGraph Studio - ai/langgraph-studio) |
 - Pyspur - Based Editor for LLM Workflows. | [Documentation](https://docs.pyspur.dev/introduction) \| [GitHub](https://github.com/PySpur-Dev/PySpur) |
 - n8n - code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations. | [Documentation](https://docs.n8n.io/) \| [GitHub](https://github.com/n8n-io/n8n) |
 - Langflow - code tool that makes building powerful AI agents and workflows that can use any API, model, or database easier. | [Documentation](https://docs.langflow.org/) \| [GitHub](https://github.com/langflow-ai/langflow) |
 
Open Source LLM Models
- DeepSeek-R1
 - Qwen
 - Llama - llama/llama) |
 - DeepSeek-R1 - ai/DeepSeek-R1) |
 
Code Sandbox (Security)
- AutoGen Docker Code Executor
 - E2B - Open-source runtime for executing AI-generated code in secure cloud sandboxes. Made for agentic & AI use cases. | [Documentation](https://e2b.dev/docs) \| [GitHub](https://github.com/e2b-dev/e2b) |
 
Browser Control Agents
- Browser-Use - use.com/) \| [GitHub](https://github.com/browser-use/browser-use) |
 - WebUI - use` functionalities. This UI is designed to be user-friendly and enables easy interaction with the browser agent. | [GitHub](https://github.com/browser-use/web-ui) |
 - WebRover - powered web agent that combines autonomous browsing with advanced research capabilities. | [GitHub](https://github.com/hrithikkoduri/WebRover) |
 
Curated Python AI, Data Science, and ML Compilations
- Best of ML Python - tooling/best-of-ml-python) |
 - Awesome Python Data Science - python-data-science) |
 - LLM Engineer Toolkit - NLP/llm-engineer-toolkit) |
 - Awesome Production Machine Learning - production-machine-learning) |
 - Awesome AI Agents - dev/awesome-ai-agents) |
 
LangGraph Extensions
- LangGraph Prebuilt Agents - ai.github.io/langgraph/prebuilt/) |
 - LangMem - term memory. | [GitHub](https://github.com/langchain-ai/langmem) |
 - LangGraph Supervisor - agent systems using LangGraph. | [GitHub](https://github.com/langchain-ai/langgraph-supervisor) |
 - Open Deep Research - Open-source assistant that automates research and produces customizable reports on any topic. | [GitHub](https://github.com/langchain-ai/open_deep_research) |
 - LangGraph Reflection - style architecture to check and improve an initial agent's output. | [GitHub](https://github.com/langchain-ai/langgraph-reflection) |
 - LangGraph Big Tool - ai/langgraph-bigtool) |
 - LangGraph CodeAct - ai/langgraph-codeact) |
 - LangGraph Swarm - style multi-agent systems using LangGraph. Agents dynamically hand off control to one another based on their specializations. | [GitHub](https://github.com/langchain-ai/langgraph-swarm-py) |
 - LangChain MCP Adapters - ai/langchain-mcp-adapters) |
 - AI Data Science Team - powered data science team of agents to help you perform common data science tasks 10X faster. | [GitHub](https://github.com/business-science/ai-data-science-team) |
 
Paid Courses
            
Huggingface Platform
            
Agents and Tools (Prebuilt)
- Composio
 - Agno (Formerly Phidata) - Open-source platform to build, ship and monitor agentic systems. | [Documentation](https://docs.agno.com/) \| [GitHub](https://github.com/agno-agi/agno) |
 
Coding Agents
- Qwen-Agent - Agent/tree/main/docs) \| [Examples](https://github.com/QwenLM/Qwen-Agent/tree/main/examples) \| [GitHub](https://github.com/QwenLM/Qwen-Agent) |
 
Deep Research Agents
- HuggingFace OpenDeepResearch - deep-research) \| [Example](https://github.com/huggingface/smolagents/blob/gaia-submission-r1/examples/open_deep_research/visual_vs_text_browser.ipynb) \| [GitHub](https://github.com/huggingface/smolagents/tree/gaia-submission-r1/examples/open_deep_research) |
 - OpenDeepResearcher
 
Other Popular Interfaces to LLM Models in R
- tidychatmodels - rapp.de/) |
 - tidyllm - compatible APIs. | [Website](https://edubruell.github.io/tidyllm/) |
 - gemini.R
 - ollama-r - r/) |
 - rollama
 - chatgpt
 - groqR - fast LPU (Language Processing Unit) technology directly to your R workflow. | [Website](https://gabrielkaiserqfin.github.io/groqR) |
 - gptstudio
 - llmR
 
Curated AI, ML, Data Science Lists
- LLM tools for R - book/r-pkgs.html) |
 
Ellmer-Verse
- ellmer
 - hellmer
 - chores - to-automate tasks quickly. | [Documentation](https://simonpcouch.github.io/chores/) |
 - ggpal
 - gander - performance and low-friction chat experience for data scientists in RStudio and Positron–sort of like completions with Copilot, but it knows how to talk to the objects in your R environment. | [Documentation](https://simonpcouch.github.io/gander/) |
 
mlverse
 