An open API service indexing awesome lists of open source software.

awesome-generative-ai-data-scientist

A curated list of 100+ resources for building and deploying generative AI, focused on helping you become a Generative AI Data Scientist with LLMs.
https://github.com/business-science/awesome-generative-ai-data-scientist


  • Data Science And AI Agents

    • Microsoft Data Formulator - formulator) |
    • Jupyter Agent - agents/jupyter-agent) |
    • Jupyter AI - ai.readthedocs.io/en/latest/) \| [GitHub](https://github.com/jupyterlab/jupyter-ai) |
    • PandasAI - ai.com/) \| [GitHub](https://github.com/sinaptik-ai/pandas-ai) |
    • WrenAI - Open-source GenBI AI Agent. Text2SQL made Easy! | [Documentation](https://docs.getwren.ai/oss/overview/introduction) | [GitHub](https://github.com/Canner/WrenAI)
    • Google GenAI Toolbox for Databases - Open-source server that makes it easier to build Gen AI tools for interacting with databases. | [Blog](https://cloud.google.com/blog/products/ai-machine-learning/announcing-gen-ai-toolbox-for-databases-get-started-today) | [Documentation](https://googleapis.github.io/genai-toolbox/getting-started/introduction/) | [GitHub](https://github.com/googleapis/genai-toolbox)
    • Vanna AI - ai/vanna) |
  • Web Parsing (HTML) and Web Crawling

    • Scrapling - Fast, and Adaptive Web Scraping for Python. | [GitHub](https://github.com/D4Vinci/Scrapling) |
    • Firecrawl - Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl, and extract with a single API. | [Documentation](https://docs.firecrawl.dev/) | [GitHub](https://github.com/mendableai/firecrawl)
    • GPT Crawler - gpt) \| [GitHub](https://github.com/BuilderIO/gpt-crawler) |
    • Gitingest
    • Crawl4AI - Open-source, blazing-fast, AI-ready web crawling tailored for LLMs, AI agents, and data pipelines. | [Documentation](https://crawl4ai.com/mkdocs/) | [GitHub](https://github.com/unclecode/crawl4ai)
    • ScrapeGraphAI - ai) |
  • LLM Memory

    • Memobase - User Profile-Based Memory for GenAI Apps. | [Documentation](https://docs.memobase.io/introduction) | [GitHub](https://github.com/memodb-io/memobase)
    • Memary
    • Mem0 - Self-improving memory layer for LLM applications, enabling personalized AI experiences that save costs and delight users. | [Documentation](https://docs.mem0.ai/) | [GitHub](https://github.com/mem0ai/mem0)
  • AI Frameworks (Build Your Own)

    • Pocket Flow - 100-line minimalist LLM framework for Agents, Task Decomposition, RAG, etc. | [Documentation](https://the-pocket.github.io/PocketFlow/) | [GitHub](https://github.com/The-Pocket/PocketFlow)
    • Google GenAI - genai/) \| [GitHub](https://github.com/googleapis/python-genai) |
    • LlamaIndex Workflows - workflows-beta-a-new-way-to-create-complex-ai-applications-with-llamaindex) |
    • LlamaIndex - A framework for building context-augmented generative AI applications with LLMs. | [Documentation](https://docs.llamaindex.ai/) | [GitHub](https://github.com/run-llama/llama_index)
    • CrewAI
    • AutoGen
    • Pydantic AI - grade applications with Generative AI less painful. | [GitHub](https://github.com/pydantic/pydantic-ai) |
    • FlatAI - ai) |
    • Llama Stack - stack.readthedocs.io/en/latest/index.html) \| [GitHub](https://github.com/meta-llama/llama-stack) |
    • Haystack - Open-source AI orchestration framework for building customizable, production-ready LLM applications. | [Documentation](https://docs.haystack.deepset.ai/docs) | [GitHub](https://github.com/deepset-ai/haystack)
    • Agency Swarm - Open-source agent orchestration framework built on top of the latest OpenAI Assistants API. | [Documentation](https://vrsen.github.io/agency-swarm/) | [GitHub](https://github.com/VRSEN/agency-swarm)
    • AutoAgent - Fully-automated and highly self-developing framework that enables users to create and deploy LLM agents through natural language alone. | [GitHub](https://github.com/HKUDS/AutoAgent)
    • Legion - Provider-agnostic framework designed to simplify the creation of sophisticated multi-agent systems. | [Documentation](https://legion.llmp.io/docs) | [GitHub](https://github.com/LLMP-io/Legion)
  • Huggingface Ecosystem

    • Huggingface - Open-source platform for machine learning (ML) and artificial intelligence (AI) tools and models. | [Documentation](https://huggingface.co/docs)
    • Sentence Transformers - The go-to Python module for accessing, using, and training state-of-the-art text and image embedding models. | [Documentation](https://sbert.net/)
  • Prompt Improvement

    • Microsoft PromptWizard - Task-Aware Prompt Optimization Framework. | [GitHub](https://github.com/microsoft/PromptWizard)
    • Promptify
    • AutoPrompt - Intent-based Prompt Calibration. | [GitHub](https://github.com/Eladlev/AutoPrompt)
  • Free Training

  • Table of Contents

    • Nir Diamant GenAI Agents Hub
    • AI Engineering Hub - Real-world AI agent applications, LLM and RAG tutorials, with examples to implement. | [GitHub](https://github.com/patchy631/ai-engineering-hub/tree/main)
    • AI Hedge Fund - AI-powered hedge fund. | [GitHub](https://github.com/virattt/ai-hedge-fund)
    • AI Financial Agent - financial-agent) |
    • Awesome LLM Apps - Step-By-Step Tutorials. | [GitHub](https://github.com/Shubhamsaboo/awesome-llm-apps)
    • Structured Report Generation (LangGraph) - End-to-end process of report planning, web research, and writing. Produces reports of varying and easily configurable formats. | [Video](https://www.youtube.com/watch?v=E04rFNtwFcA) | [Blog](https://blog.langchain.dev/structured-report-generation-blueprint/) | [Code](https://github.com/langchain-ai/langchain-nvidia/blob/main/cookbook/structured_report_generation.ipynb)
    • Uber QueryGPT - TW/blog/query-gpt/) |
    • StockChat - Open-source alternative to Perplexity Finance. | [GitHub](https://github.com/clchinkc/stockchat)
  • LLMOps

    • LangWatch - click. Drag and drop interface for LLMOps platform. | [Documentation](https://docs.langwatch.ai/) \| [GitHub](https://github.com/langwatch/langwatch) |
    • MLflow
    • LLMOps - python-package) |
    • Helicone - Open-source LLM observability platform for developers to monitor, debug, and improve production-ready applications. | [Documentation](https://docs.helicone.ai/) | [GitHub](https://github.com/Helicone/helicone)
    • Agenta - Open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place. | [Documentation](https://docs.agenta.ai/)
  • Testing and Monitoring (Observability)

    • MLflow Tracing and Evaluation - evaluation/index.html) \| [GitHub](https://github.com/mlflow/mlflow) |
    • Opik - Open-source platform for evaluating, testing, and monitoring LLM applications. | [GitHub](https://github.com/comet-ml/opik)
    • LangSmith - A platform for building production-grade LLM applications. It allows you to closely monitor and evaluate your application, so you can quickly and confidently ship. | [Documentation](https://docs.smith.langchain.com/) | [GitHub](https://github.com/langchain-ai/langsmith-sdk)
    • Langfuse
  • Other

    • AI Agent Service Toolkit - service-toolkit.streamlit.app/) \| [GitHub](https://github.com/JoshuaC215/agent-service-toolkit) |
    • AI Suite
    • AdalFlow - The library to build and auto-optimize LLM applications, from Chatbot, RAG, to Agent by SylphAI. | [GitHub](https://github.com/SylphAI-Inc/AdalFlow)
    • dspy
    • LiteLLM
    • Microsoft Tiny Troupe - LLM-powered multiagent persona simulation for imagination enhancement and business insights. | [GitHub](https://github.com/microsoft/TinyTroupe)
    • Distributed Llama - llama) |
  • Agents and Tools (Build Your Own)

    • Google Agent Development Kit (ADK) - Open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control. | [Documentation](https://google.github.io/adk-docs/) | [GitHub](https://github.com/google/adk-python)
    • AutoGen AgentChat - guide/agentchat-user-guide/quickstart.html) |
    • smolagents
    • LangChain Agents
    • LangChain Tools
    • Agentarium - Open-source framework for creating and managing simulations populated with AI-powered agents. It provides an intuitive platform for designing complex, interactive environments where agents can act, learn, and evolve. | [GitHub](https://github.com/Thytu/Agentarium)
  • RAG in R

  • Building AI

  • Microsoft Azure

  • LLM Providers

  • Vector Databases (RAG)

    • FAISS
    • NVIDIA NIM - Self-host GPU-accelerated inferencing microservices for pretrained and customized AI models across clouds, data centers, and workstations.
    • ChromaDB - core/chroma) |
    • Pinecone - io/pinecone-python-client) |
    • Milvus - Open-source vector database built to power embedding similarity search and AI applications. | [GitHub](https://github.com/milvus-io/milvus)
    • Qdrant - High-Performance Vector Search at Scale. | [Website](https://qdrant.tech/)
    • SQLite Vec - vec) |
  • LLM Models

  • AI LLM Frameworks

    • LangChain
    • LlamaIndex - A framework for building context-augmented generative AI applications with LLMs.
    • LlamaIndex Workflows - complex AI application we see our users building.
    • LangGraph - A library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows.
  • LangChain Ecosystem

    • LangGraph - A library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. | [Documentation](https://langchain-ai.github.io/langgraph/) | [Tutorials](https://github.com/langchain-ai/langgraph/tree/main/docs/docs/tutorials)
    • LangChain - ai/langchain) \| [Cookbook](https://github.com/langchain-ai/langchain/tree/master/cookbook) |
  • Cookbooks and Examples:

  • Amazon Web Services (AWS)

  • Cloud Examples:

  • Google Cloud Platform (GCP)

  • NVIDIA

  • 8-Week AI Bootcamp by Business Science

  • LLM Models and Providers

  • Pretraining

    • tinygrad
    • micrograd
    • PyTorch - Open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing. | [Website](https://pytorch.org/)
    • TensorFlow - Open-source machine learning library developed by Google. | [Website](https://www.tensorflow.org/)
    • JAX - High-performance computing and automatic differentiation. | [GitHub](https://github.com/jax-ml/jax)
  • Fine-tuning

    • Transformers - Hugging Face Transformers is a popular library for Natural Language Processing (NLP) tasks, including fine-tuning large language models.
    • Unsloth - 3.5 & Gemma 2-5x faster with 80% less memory! | [GitHub](https://github.com/unslothai/unsloth) |
    • LitGPT - High-performance LLMs with recipes to pretrain, finetune, and deploy at scale. | [GitHub](https://github.com/Lightning-AI/litgpt)
    • AutoTrain - Fine-tuning of LLMs and other machine learning tasks. | [GitHub](https://github.com/huggingface/autotrain-advanced)
  • Document Parsing

    • Embedchain - started/quickstart) \| [GitHub](https://github.com/mem0ai/mem0/tree/main/embedchain) |
    • Docling by IBM
    • Markitdown by Microsoft
    • DocETL - LLM-powered data processing and ETL. | [Documentation](https://ucbepic.github.io/docetl/) | [GitHub](https://github.com/ucbepic/docetl)
    • LangChain Document Loaders
    • Unstructured.io - tuning. | [Documentation](https://docs.unstructured.io/welcome) \| [GitHub](https://github.com/Unstructured-IO/unstructured) \| [Paper](https://www.iarpa.gov/images/PropsersDayPDFs/BENGAL/Unstructured.io%20Federal%20Capabilities%20Statement%20for%20IARPA.pdf) |
  • Miscellaneous

    • Pyspur - Graph-Based Editor for LLM Workflows
    • Browser-Use
    • AWS Bedrock - High-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon.
  • AI Frameworks (Drag and Drop)

    • AutoGen Studio - A low-code interface to rapidly prototype AI agents, enhance them with tools, compose them into teams, and interact with them to accomplish tasks. Built on AutoGen AgentChat. | [Documentation](https://microsoft.github.io/autogen/stable/user-guide/autogenstudio-user-guide/index.html)
    • LangGraph Studio - ai/langgraph-studio) |
    • Pyspur - Graph-Based Editor for LLM Workflows. | [Documentation](https://docs.pyspur.dev/introduction) | [GitHub](https://github.com/PySpur-Dev/PySpur)
    • n8n - Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations. | [Documentation](https://docs.n8n.io/) | [GitHub](https://github.com/n8n-io/n8n)
    • Langflow - A low-code tool that makes it easier to build powerful AI agents and workflows that can use any API, model, or database. | [Documentation](https://docs.langflow.org/) | [GitHub](https://github.com/langflow-ai/langflow)
  • Open Source LLM Models

  • Code Sandbox (Security)

    • AutoGen Docker Code Executor
    • E2B - Open-source runtime for executing AI-generated code in secure cloud sandboxes. Made for agentic & AI use cases. | [Documentation](https://e2b.dev/docs) | [GitHub](https://github.com/e2b-dev/e2b)
  • Browser Control Agents

    • Browser-Use - use.com/) \| [GitHub](https://github.com/browser-use/browser-use) |
    • WebUI - Built on Gradio and supports most `browser-use` functionalities. This UI is designed to be user-friendly and enables easy interaction with the browser agent. | [GitHub](https://github.com/browser-use/web-ui)
    • WebRover - AI-powered web agent that combines autonomous browsing with advanced research capabilities. | [GitHub](https://github.com/hrithikkoduri/WebRover)
  • Curated Python AI, Data Science, and ML Compilations

  • LangGraph Extensions

    • LangGraph Prebuilt Agents - ai.github.io/langgraph/prebuilt/) |
    • LangMem - Long-term memory. | [GitHub](https://github.com/langchain-ai/langmem)
    • LangGraph Supervisor - Multi-agent systems using LangGraph. | [GitHub](https://github.com/langchain-ai/langgraph-supervisor)
    • AI Data Science Team - AI-powered data science team of agents to help you perform common data science tasks 10X faster. | [GitHub](https://github.com/business-science/ai-data-science-team)
    • Open Deep Research - Open-source assistant that automates research and produces customizable reports on any topic. | [GitHub](https://github.com/langchain-ai/open_deep_research)
    • LangGraph Reflection - Reflection-style architecture to check and improve an initial agent's output. | [GitHub](https://github.com/langchain-ai/langgraph-reflection)
    • LangGraph Big Tool - ai/langgraph-bigtool) |
    • LangGraph CodeAct - ai/langgraph-codeact) |
    • LangGraph Swarm - Swarm-style multi-agent systems using LangGraph. Agents dynamically hand off control to one another based on their specializations. | [GitHub](https://github.com/langchain-ai/langgraph-swarm-py)
    • LangChain MCP Adapters - ai/langchain-mcp-adapters) |
  • Paid Courses

  • Huggingface Platform

  • Agents and Tools (Prebuilt)

    • Phidata - Open-source platform to build, ship and monitor agentic systems. | [Documentation](https://docs.phidata.com/) | [GitHub](https://github.com/phidatahq/phidata)
    • Composio
    • Agno (Formerly Phidata) - Open-source platform to build, ship and monitor agentic systems. | [Documentation](https://docs.agno.com/) | [GitHub](https://github.com/agno-agi/agno)
  • Coding Agents

    • Qwen-Agent - Agent/tree/main/docs) \| [Examples](https://github.com/QwenLM/Qwen-Agent/tree/main/examples) \| [GitHub](https://github.com/QwenLM/Qwen-Agent) |
  • Deep Research Agents

    • HuggingFace OpenDeepResearch - deep-research) \| [Example](https://github.com/huggingface/smolagents/blob/gaia-submission-r1/examples/open_deep_research/visual_vs_text_browser.ipynb) \| [GitHub](https://github.com/huggingface/smolagents/tree/gaia-submission-r1/examples/open_deep_research) |
    • OpenDeepResearcher
  • Curated AI, ML, Data Science Lists

  • Ellmer-Verse

    • ellmer
    • hellmer
    • chores - Complete repetitive, hard-to-automate tasks quickly. | [Documentation](https://simonpcouch.github.io/chores/)
    • ggpal
    • gander - High-performance and low-friction chat experience for data scientists in RStudio and Positron, sort of like completions with Copilot, but it knows how to talk to the objects in your R environment. | [Documentation](https://simonpcouch.github.io/gander/)
  • mlverse

    • mall - Row-wise over a specified column. | [Website](https://mlverse.github.io/mall/)
    • lang - On-the-fly. | [Website](https://mlverse.github.io/lang/)
    • chattr