awesome-generative-ai-data-scientist
A curated list of 100+ resources for building and deploying generative AI, focused on helping you become a Generative AI Data Scientist with LLMs.
https://github.com/business-science/awesome-generative-ai-data-scientist
-
Data Science And AI Agents
- Microsoft Data Formulator - formulator) |
- Jupyter Agent - agents/jupyter-agent) |
- Jupyter AI - ai.readthedocs.io/en/latest/) \| [GitHub](https://github.com/jupyterlab/jupyter-ai) |
- PandasAI - ai.com/) \| [GitHub](https://github.com/sinaptik-ai/pandas-ai) |
- WrenAI - Open-source GenBI AI Agent. Text2SQL made easy! | [Documentation](https://docs.getwren.ai/oss/overview/introduction) \| [GitHub](https://github.com/Canner/WrenAI) |
- Google GenAI Toolbox for Databases - Open-source server that makes it easier to build Gen AI tools for interacting with databases. | [Blog](https://cloud.google.com/blog/products/ai-machine-learning/announcing-gen-ai-toolbox-for-databases-get-started-today) \| [Documentation](https://googleapis.github.io/genai-toolbox/getting-started/introduction/) \| [GitHub](https://github.com/googleapis/genai-toolbox) |
- Vanna AI - ai/vanna) |
-
Web Parsing (HTML) and Web Crawling
- Scrapling - Fast, and Adaptive Web Scraping for Python. | [GitHub](https://github.com/D4Vinci/Scrapling) |
- Firecrawl - ready markdown or structured data. Scrape, crawl, and extract with a single API. | [Documentation](https://docs.firecrawl.dev/) \| [GitHub](https://github.com/mendableai/firecrawl) |
- GPT Crawler - gpt) \| [GitHub](https://github.com/BuilderIO/gpt-crawler) |
- Gitingest
- Crawl4AI - Open-source, blazing-fast, AI-ready web crawling tailored for LLMs, AI agents, and data pipelines. | [Documentation](https://crawl4ai.com/mkdocs/) \| [GitHub](https://github.com/unclecode/crawl4ai) |
- ScrapeGraphAI - ai) |
-
LLM Memory
- Memobase - Based Memory for GenAI Apps. [Documentation](https://docs.memobase.io/introduction) | [GitHub](https://github.com/memodb-io/memobase)
- Memary
- Mem0 - improving memory layer for LLM applications, enabling personalized AI experiences that save costs and delight users. | [Documentation](https://docs.mem0.ai/) \| [GitHub](https://github.com/mem0ai/mem0) |
-
AI Frameworks (Build Your Own)
- Pocket Flow - line minimalist LLM framework for Agents, Task Decomposition, RAG, etc. | [Documentation](https://the-pocket.github.io/PocketFlow/) \| [GitHub](https://github.com/The-Pocket/PocketFlow) |
- Google GenAI - genai/) \| [GitHub](https://github.com/googleapis/python-genai) |
- LlamaIndex Workflows - workflows-beta-a-new-way-to-create-complex-ai-applications-with-llamaindex) |
- LlamaIndex - augmented generative AI applications with LLMs. | [Documentation](https://docs.llamaindex.ai/) \| [GitHub](https://github.com/run-llama/llama_index) |
- CrewAI
- AutoGen
- Pydantic AI - grade applications with Generative AI less painful. | [GitHub](https://github.com/pydantic/pydantic-ai) |
- FlatAI - ai) |
- Llama Stack - stack.readthedocs.io/en/latest/index.html) \| [GitHub](https://github.com/meta-llama/llama-stack) |
- Haystack - Open-source AI orchestration framework for building customizable, production-ready LLM applications. | [Documentation](https://docs.haystack.deepset.ai/docs) \| [GitHub](https://github.com/deepset-ai/haystack) |
- Agency Swarm - Open-source agent orchestration framework built on top of the latest OpenAI Assistants API. | [Documentation](https://vrsen.github.io/agency-swarm/) \| [GitHub](https://github.com/VRSEN/agency-swarm) |
- AutoAgent - automated and highly self-developing framework that enables users to create and deploy LLM agents through natural language alone. | [GitHub](https://github.com/HKUDS/AutoAgent) |
- Legion - agnostic framework designed to simplify the creation of sophisticated multi-agent systems. | [Documentation](https://legion.llmp.io/docs) \| [GitHub](https://github.com/LLMP-io/Legion) |
-
Huggingface Ecosystem
- Huggingface - Open-source platform for machine learning (ML) and artificial intelligence (AI) tools and models. | [Documentation](https://huggingface.co/docs) |
- Sentence Transformers - to Python module for accessing, using, and training state-of-the-art text and image embedding models. | [Documentation](https://sbert.net/) |
-
Prompt Improvement
- Microsoft PromptWizard - Aware Prompt Optimization Framework. | [GitHub](https://github.com/microsoft/PromptWizard) |
- Promptify
- AutoPrompt - based Prompt Calibration. | [GitHub](https://github.com/Eladlev/AutoPrompt) |
-
Free Training
-
NVIDIA
- Generative AI Data Scientist Workshops - science.io/ai-register)
-
-
Table of Contents
- Nir Diamant GenAI Agents Hub
- AI Engineering Hub - world AI agent applications, LLM and RAG tutorials, with examples to implement. | [GitHub](https://github.com/patchy631/ai-engineering-hub/tree/main) |
- AI Hedge Fund - powered hedge fund. | [GitHub](https://github.com/virattt/ai-hedge-fund) |
- AI Financial Agent - financial-agent) |
- Awesome LLM Apps - By-Step Tutorials. | [GitHub](https://github.com/Shubhamsaboo/awesome-llm-apps) |
- Structured Report Generation (LangGraph) - to-end process of report planning, web research, and writing. Produces reports of varying and easily configurable formats. | [Video](https://www.youtube.com/watch?v=E04rFNtwFcA) \| [Blog](https://blog.langchain.dev/structured-report-generation-blueprint/) \| [Code](https://github.com/langchain-ai/langchain-nvidia/blob/main/cookbook/structured_report_generation.ipynb) |
- Uber QueryGPT - TW/blog/query-gpt/) |
- StockChat - Open-source alternative to Perplexity Finance. | [GitHub](https://github.com/clchinkc/stockchat) |
-
LLMOps
- LangWatch - click. Drag and drop interface for LLMOps platform. | [Documentation](https://docs.langwatch.ai/) \| [GitHub](https://github.com/langwatch/langwatch) |
- MLflow
- LLMOps - python-package) |
- Helicone - Open-source LLM observability platform for developers to monitor, debug, and improve production-ready applications. | [Documentation](https://docs.helicone.ai/) \| [GitHub](https://github.com/Helicone/helicone) |
- Agenta - Open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place. | [Documentation](https://docs.agenta.ai/) |
-
Testing and Monitoring (Observability)
- MLflow Tracing and Evaluation - evaluation/index.html) \| [GitHub](https://github.com/mlflow/mlflow) |
- Opik - Open-source platform for evaluating, testing, and monitoring LLM applications. | [GitHub](https://github.com/comet-ml/opik) |
- LangSmith - grade LLM applications. It allows you to closely monitor and evaluate your application, so you can quickly and confidently ship. | [Documentation](https://docs.smith.langchain.com/) \| [GitHub](https://github.com/langchain-ai/langsmith-sdk) |
- Langfuse
-
Other
- AI Agent Service Toolkit - service-toolkit.streamlit.app/) \| [GitHub](https://github.com/JoshuaC215/agent-service-toolkit) |
- AI Suite
- AdalFlow - optimize LLM applications, from Chatbot, RAG, to Agent by SylphAI. | [GitHub](https://github.com/SylphAI-Inc/AdalFlow) |
- dspy
- LiteLLM
- Microsoft Tiny Troupe - powered multiagent persona simulation for imagination enhancement and business insights. | [GitHub](https://github.com/microsoft/TinyTroupe) |
- Distributed Llama - llama) |
-
Agents and Tools (Build Your Own)
- Google Agent Development Kit (ADK) - Open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control. | [Documentation](https://google.github.io/adk-docs/) \| [GitHub](https://github.com/google/adk-python) |
- AutoGen AgentChat - guide/agentchat-user-guide/quickstart.html) |
- smolagents
- LangChain Agents
- LangChain Tools
- Agentarium - Open-source framework for creating and managing simulations populated with AI-powered agents. It provides an intuitive platform for designing complex, interactive environments where agents can act, learn, and evolve. | [GitHub](https://github.com/Thytu/Agentarium) |
-
RAG in R
- Microsoft Azure AI Services
- Google Vertex AI
- AWS Bedrock
- Ragnar - Augmented Generation (RAG) workflows. | [Website](https://tidyverse.github.io/ragnar/) |
-
Building AI
-
Microsoft Azure
- Azure Generative AI Examples - examples/tree/main/sdk/python/generative-ai) |
- Microsoft Generative AI for Beginners - ai-for-beginners) |
- Microsoft Intro to Generative AI Course - us/training/paths/introduction-generative-ai/) |
-
LLM Providers
- Meta Llama Models - Open-source AI model you can fine-tune, distill, and deploy anywhere. | [Meta](https://llama.meta.com/) |
- Ollama
- Anthropic Claude - sdk-python) |
- Google Gemini - gemini/generative-ai-python) |
- Grok - python) |
- OpenAI - python) |
- Hugging Face Models
- OpenAI Agents - agent workflows. | [GitHub](https://github.com/openai/openai-agents-python) |
-
Vector Databases (RAG)
- FAISS
- NVIDIA NIM - host GPU-accelerated inferencing microservices for pretrained and customized AI models across clouds, data centers, and workstations.
- ChromaDB - core/chroma) |
- Pinecone - io/pinecone-python-client) |
- Milvus - Open-source vector database built to power embedding similarity search and AI applications. | [GitHub](https://github.com/milvus-io/milvus) |
- Qdrant - Performance Vector Search at Scale. | [Website](https://qdrant.tech/) |
- SQLite Vec - vec) |
-
LLM Models
-
AI LLM Frameworks
- LangChain
- LlamaIndex - augmented generative AI applications with LLMs.
- LlamaIndex Workflows - complex AI application we see our users building.
- LangGraph - actor applications with LLMs, used to create agent and multi-agent workflows.
-
LangChain Ecosystem
- LangGraph - actor applications with LLMs, used to create agent and multi-agent workflows. | [Documentation](https://langchain-ai.github.io/langgraph/) \| [Tutorials](https://github.com/langchain-ai/langgraph/tree/main/docs/docs/tutorials) |
- LangChain - ai/langchain) \| [Cookbook](https://github.com/langchain-ai/langchain/tree/master/cookbook) |
-
Cookbooks and Examples:
- LangChain Cookbook - to-end examples.
-
Amazon Web Services (AWS)
-
Cloud Examples:
- Amazon Bedrock Workshop
- Google Vertex AI Examples
- NVIDIA NIM Anywhere - sized labs and up to production environments.
- NVIDIA NIM Deploy
-
Google Cloud Platform (GCP)
-
NVIDIA
- NVIDIA NIM Anywhere - sized labs and up to production environments. | [GitHub](https://github.com/NVIDIA/nim-anywhere) |
- NVIDIA NIM Deploy - deploy) |
- Python AI/ML Tips - science/free-ai-tips) |
- unwind ai
-
8-Week AI Bootcamp by Business Science
-
LLM Models and Providers
-
Pretraining
- tinygrad
- micrograd
- PyTorch - Open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing. | [Website](https://pytorch.org/) |
- TensorFlow - Open-source machine learning library developed by Google. | [Website](https://www.tensorflow.org/) |
- JAX - performance computing and automatic differentiation. | [GitHub](https://github.com/jax-ml/jax) |
-
Fine-tuning
- Transformers - Hugging Face Transformers is a popular library for Natural Language Processing (NLP) tasks, including fine-tuning large language models.
- Unsloth - 3.5 & Gemma 2-5x faster with 80% less memory! | [GitHub](https://github.com/unslothai/unsloth) |
- LitGPT - performance LLMs with recipes to pretrain, finetune, and deploy at scale. | [GitHub](https://github.com/Lightning-AI/litgpt) |
- AutoTrain - tuning of LLMs and other machine learning tasks. | [GitHub](https://github.com/huggingface/autotrain-advanced) |
-
Document Parsing
- Embedchain - started/quickstart) \| [GitHub](https://github.com/mem0ai/mem0/tree/main/embedchain) |
- Docling by IBM
- Markitdown by Microsoft
- DocETL - powered data processing and ETL. | [Documentation](https://ucbepic.github.io/docetl/) \| [GitHub](https://github.com/ucbepic/docetl) |
- LangChain Document Loaders
- Unstructured.io - tuning. | [Documentation](https://docs.unstructured.io/welcome) \| [GitHub](https://github.com/Unstructured-IO/unstructured) \| [Paper](https://www.iarpa.gov/images/PropsersDayPDFs/BENGAL/Unstructured.io%20Federal%20Capabilities%20Statement%20for%20IARPA.pdf) |
-
Miscellaneous
- Pyspur - Based Editor for LLM Workflows
- Browser-Use
- AWS Bedrock - performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon
-
AI Frameworks (Drag and Drop)
- AutoGen Studio - code interface to rapidly prototype AI agents, enhance them with tools, compose them into teams, and interact with them to accomplish tasks. Built on AutoGen AgentChat. | [Documentation](https://microsoft.github.io/autogen/stable/user-guide/autogenstudio-user-guide/index.html) |
- LangGraph Studio - ai/langgraph-studio) |
- Pyspur - Based Editor for LLM Workflows. | [Documentation](https://docs.pyspur.dev/introduction) \| [GitHub](https://github.com/PySpur-Dev/PySpur) |
- n8n - code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations. | [Documentation](https://docs.n8n.io/) \| [GitHub](https://github.com/n8n-io/n8n) |
- Langflow - code tool that makes building powerful AI agents and workflows that can use any API, model, or database easier. | [Documentation](https://docs.langflow.org/) \| [GitHub](https://github.com/langflow-ai/langflow) |
-
Open Source LLM Models
- DeepSeek-R1
- Qwen
- Llama - llama/llama) |
- DeepSeek-R1 - ai/DeepSeek-R1) |
-
Code Sandbox (Security)
- AutoGen Docker Code Executor
- E2B - Open-source runtime for executing AI-generated code in secure cloud sandboxes. Made for agentic & AI use cases. | [Documentation](https://e2b.dev/docs) \| [GitHub](https://github.com/e2b-dev/e2b) |
-
Browser Control Agents
- Browser-Use - use.com/) \| [GitHub](https://github.com/browser-use/browser-use) |
- WebUI - use` functionalities. This UI is designed to be user-friendly and enables easy interaction with the browser agent. | [GitHub](https://github.com/browser-use/web-ui) |
- WebRover - powered web agent that combines autonomous browsing with advanced research capabilities. | [GitHub](https://github.com/hrithikkoduri/WebRover) |
-
Curated Python AI, Data Science, and ML Compilations
- Best of ML Python - tooling/best-of-ml-python) |
- Awesome Python Data Science - python-data-science) |
- LLM Engineer Toolkit - NLP/llm-engineer-toolkit) |
- Awesome Production Machine Learning - production-machine-learning) |
- Awesome AI Agents - dev/awesome-ai-agents) |
-
LangGraph Extensions
- LangGraph Prebuilt Agents - ai.github.io/langgraph/prebuilt/) |
- LangMem - term memory. | [GitHub](https://github.com/langchain-ai/langmem) |
- LangGraph Supervisor - agent systems using LangGraph. | [GitHub](https://github.com/langchain-ai/langgraph-supervisor) |
- AI Data Science Team - powered data science team of agents to help you perform common data science tasks 10X faster. | [GitHub](https://github.com/business-science/ai-data-science-team) |
- Open Deep Research - Open-source assistant that automates research and produces customizable reports on any topic. | [GitHub](https://github.com/langchain-ai/open_deep_research) |
- LangGraph Reflection - style architecture to check and improve an initial agent's output. | [GitHub](https://github.com/langchain-ai/langgraph-reflection) |
- LangGraph Big Tool - ai/langgraph-bigtool) |
- LangGraph CodeAct - ai/langgraph-codeact) |
- LangGraph Swarm - style multi-agent systems using LangGraph. Agents dynamically hand off control to one another based on their specializations. | [GitHub](https://github.com/langchain-ai/langgraph-swarm-py) |
- LangChain MCP Adapters - ai/langchain-mcp-adapters) |
-
Paid Courses
-
Huggingface Platform
-
Agents and Tools (Prebuilt)
- Phidata - Open-source platform to build, ship and monitor agentic systems. | [Documentation](https://docs.phidata.com/) \| [GitHub](https://github.com/phidatahq/phidata) |
- Composio
- Agno (Formerly Phidata) - Open-source platform to build, ship and monitor agentic systems. | [Documentation](https://docs.agno.com/) \| [GitHub](https://github.com/agno-agi/agno) |
-
Coding Agents
- Qwen-Agent - Agent/tree/main/docs) \| [Examples](https://github.com/QwenLM/Qwen-Agent/tree/main/examples) \| [GitHub](https://github.com/QwenLM/Qwen-Agent) |
-
Deep Research Agents
- HuggingFace OpenDeepResearch - deep-research) \| [Example](https://github.com/huggingface/smolagents/blob/gaia-submission-r1/examples/open_deep_research/visual_vs_text_browser.ipynb) \| [GitHub](https://github.com/huggingface/smolagents/tree/gaia-submission-r1/examples/open_deep_research) |
- OpenDeepResearcher
-
Other Popular Interfaces to LLM Models in R
- tidychatmodels - rapp.de/) |
- tidyllm - compatible APIs. | [Website](https://edubruell.github.io/tidyllm/) |
- gemini.R
- ollama-r - r/) |
- rollama
- chatgpt
- groqR - fast LPU (Language Processing Unit) technology directly to your R workflow. | [Website](https://gabrielkaiserqfin.github.io/groqR) |
- gptstudio
- llmR
-
Curated AI, ML, Data Science Lists
- LLM tools for R - book/r-pkgs.html) |
-
Ellmer-Verse
- ellmer
- hellmer
- chores - to-automate tasks quickly. | [Documentation](https://simonpcouch.github.io/chores/) |
- ggpal
- gander - performance and low-friction chat experience for data scientists in RStudio and Positron–sort of like completions with Copilot, but it knows how to talk to the objects in your R environment. | [Documentation](https://simonpcouch.github.io/gander/) |
-
mlverse