Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

awesome-generative-ai-data-scientist

A curated list of 100+ resources for building and deploying generative AI, focused on helping you become a Generative AI Data Scientist with LLMs.
https://github.com/business-science/awesome-generative-ai-data-scientist

  • Contents:

  • LLMOps

    • LangWatch - Drag-and-drop interface for an LLMOps platform. [Documentation](https://docs.langwatch.ai/) | [GitHub](https://github.com/langwatch/langwatch)
    • MLflow
    • LLMOps
    • Agenta - Open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place. [Documentation](https://docs.agenta.ai/)
    • Helicone - Open-source LLM observability platform for developers to monitor, debug, and improve production-ready applications. [Documentation](https://docs.helicone.ai/) | [GitHub](https://github.com/Helicone/helicone)
  • Testing and Monitoring (Observability)

    • MLflow Tracing and Evaluation - [GitHub](https://github.com/mlflow/mlflow)
    • Opik - Open-source platform for evaluating, testing, and monitoring LLM applications.
    • LangSmith - Platform for building production-grade LLM applications. It allows you to closely monitor and evaluate your application, so you can ship quickly and confidently. [Documentation](https://docs.smith.langchain.com/) | [GitHub](https://github.com/langchain-ai/langsmith-sdk)
  • Web Parsing (HTML) and Crawlers

    • GPT Crawler - [GitHub](https://github.com/BuilderIO/gpt-crawler)
    • Gitingest
    • ScrapeGraphAI
    • Crawl4AI - Open-source, blazing-fast, AI-ready web crawling tailored for LLMs, AI agents, and data pipelines. [Documentation](https://crawl4ai.com/mkdocs/) | [GitHub](https://github.com/unclecode/crawl4ai)
  • Miscellaneous

    • AI Agent Service Toolkit - [GitHub](https://github.com/JoshuaC215/agent-service-toolkit)
    • Microsoft Azure AI Services - Cutting-edge, market-ready, and responsible applications with out-of-the-box, prebuilt, and customizable APIs and models.
    • Google Vertex AI - Fully managed, unified AI development platform for building and using generative AI.
    • AWS Bedrock - High-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon.
    • AdalFlow - Auto-optimize LLM applications, from chatbot and RAG to agent, by SylphAI.
    • dspy
    • AutoPrompt - Intent-based Prompt Calibration.
    • Promptify
    • LiteLLM
    • Jupyter Agent
    • Jupyter AI
    • Pyspur - Graph-Based Editor for LLM Workflows
    • Browser-Use
    • AI Suite
  • Free Training

  • LLM Providers

  • AI Frameworks (Build Your Own)

    • LlamaIndex Workflows - complex AI application we see our users building.
    • LlamaIndex - Context-augmented generative AI applications with LLMs. [Documentation](https://docs.llamaindex.ai/) | [GitHub](https://github.com/run-llama/llama_index)
    • CrewAI
    • AutoGen
    • Pydantic AI - Designed to make building production-grade applications with Generative AI less painful. [GitHub](https://github.com/pydantic/pydantic-ai)
    • FlatAI
    • Llama Stack - [GitHub](https://github.com/meta-llama/llama-stack)
    • Haystack - Production-ready LLM applications. [Documentation](https://docs.haystack.deepset.ai/docs) | [GitHub](https://github.com/deepset-ai/haystack)
  • Vector Databases (RAG)

    • ChromaDB
    • FAISS
    • Pinecone
    • Milvus - Open-source vector database built to power embedding similarity search and AI applications.
    • NVIDIA NIM - Self-host GPU-accelerated inferencing microservices for pretrained and customized AI models across clouds, data centers, and workstations.
    • Microsoft Azure AI Services - Cutting-edge, market-ready, and responsible applications with out-of-the-box, prebuilt, and customizable APIs and models.
    • Google Vertex AI - Fully managed, unified AI development platform for building and using generative AI.
    • Qdrant - High-Performance Vector Search at Scale
  • Building AI

  • Deploying AI

  • LLM Models

  • AI LLM Frameworks

    • LangChain
    • LlamaIndex - Context-augmented generative AI applications with LLMs.
    • LlamaIndex Workflows - complex AI application we see our users building.
    • LangGraph - Multi-actor applications with LLMs, used to create agent and multi-agent workflows.
  • LangChain Platform

    • LangGraph - Multi-actor applications with LLMs, used to create agent and multi-agent workflows. [Documentation](https://langchain-ai.github.io/langgraph/) | [Tutorials](https://github.com/langchain-ai/langgraph/tree/main/docs/docs/tutorials)
    • LangChain - [Cookbook](https://github.com/langchain-ai/langchain/tree/master/cookbook)
  • Cookbooks and Examples:

  • Cloud Examples:

  • 8-Week AI Bootcamp by Business Science

  • Data Science And AI Agents

    • AI Data Science Team In Python - [GitHub](https://github.com/business-science/ai-data-science-team)
    • PandasAI - [GitHub](https://github.com/sinaptik-ai/pandas-ai)
  • AI Frameworks (Drag and Drop)

    • Langflow - Low-code tool that makes it easier to build powerful AI agents and workflows that can use any API, model, or database. [Documentation](https://docs.langflow.org/) | [GitHub](https://github.com/langflow-ai/langflow)
    • AutoGen Studio - Low-code interface built to help you rapidly prototype AI agents, enhance them with tools, compose them into teams, and interact with them to accomplish tasks. It is built on AutoGen AgentChat, a high-level API for building multi-agent applications.
    • LangGraph Studio
    • Pyspur - Graph-Based Editor for LLM Workflows. [GitHub](https://github.com/PySpur-Dev/PySpur)
    • n8n - Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations. [Documentation](https://docs.n8n.io/) | [GitHub](https://github.com/n8n-io/n8n)
  • LLM Models and Providers

  • Huggingface Platform

    • Huggingface - Open-source platform for machine learning (ML) and artificial intelligence (AI) tools and models. [Documentation](https://huggingface.co/docs)
    • Tokenizers
    • Sentence Transformers - Go-to Python module for accessing, using, and training state-of-the-art text and image embedding models.
  • Pretraining

    • PyTorch - Open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing.
    • TensorFlow - Open-source machine learning library developed by Google.
    • JAX - High-performance computing and automatic differentiation.
    • tinygrad
    • micrograd
  • Fine-tuning

    • Transformers - Hugging Face Transformers is a popular library for Natural Language Processing (NLP) tasks, including fine-tuning large language models.
    • Unsloth - 3.5 & Gemma 2-5x faster with 80% less memory!
    • LitGPT - High-performance LLMs with recipes to pretrain, finetune, and deploy at scale.
    • AutoTrain - Fine-tuning of LLMs and other machine learning tasks.
  • Document Parsing

  • LLM Memory

    • Mem0 - Self-improving memory layer for LLM applications, enabling personalized AI experiences that save costs and delight users. [Documentation](https://docs.mem0.ai/) | [GitHub](https://github.com/mem0ai/mem0)
    • Memary
  • Paid Courses

  • Open Source LLM Models

  • Agents and Tools (Build Your Own)

  • Code Sandbox (Security)

    • AutoGen Docker Code Executor
    • E2B - Open-source runtime for executing AI-generated code in secure cloud sandboxes. Made for agentic and AI use cases. [Documentation](https://e2b.dev/docs) | [GitHub](https://github.com/e2b-dev/e2b)
  • Browser Control Agents

    • Browser-Use - [GitHub](https://github.com/browser-use/browser-use)
    • WebUI - A UI for `browser-use` functionalities, designed to be user-friendly and to enable easy interaction with the browser agent. [GitHub](https://github.com/browser-use/web-ui)
  • Data Science and Machine Learning

  • Agents and Tools (Prebuilt)

    • Phidata - Open-source platform to build, ship, and monitor agentic systems. [Documentation](https://docs.phidata.com/) | [GitHub](https://github.com/phidatahq/phidata)
    • Composio
    • Agno (formerly Phidata) - Open-source platform to build, ship, and monitor agentic systems. [Documentation](https://docs.agno.com/) | [GitHub](https://github.com/agno-agi/agno)
  • Coding Agents

    • Qwen-Agent - [Examples](https://github.com/QwenLM/Qwen-Agent/tree/main/examples) | [GitHub](https://github.com/QwenLM/Qwen-Agent)
  • Deep Research Agents

    • HuggingFace OpenDeepResearch - [Example](https://github.com/huggingface/smolagents/blob/gaia-submission-r1/examples/open_deep_research/visual_vs_text_browser.ipynb) | [GitHub](https://github.com/huggingface/smolagents/tree/gaia-submission-r1/examples/open_deep_research)
    • OpenDeepResearcher