Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-generative-ai-data-scientist
A curated list of 100+ resources for building and deploying generative AI, focused on helping you become a Generative AI Data Scientist with LLMs.
https://github.com/business-science/awesome-generative-ai-data-scientist
Last synced: 5 days ago
-
Contents:
- Nir Diamant GenAI Agents
- AI Engineering Hub - Real-world AI agent applications, LLM and RAG tutorials, with examples to implement. [GitHub](https://github.com/patchy631/ai-engineering-hub/tree/main)
- AI Hedge Fund - AI-powered hedge fund
- AI Financial Agent
- Awesome LLM Apps - Step-by-Step Tutorials
- Structured Report Generation (LangGraph) - An agent for the end-to-end process of report planning, web research, and writing. It can produce reports of varying and easily configurable format. [Video](https://www.youtube.com/watch?v=E04rFNtwFcA) | [Blog](https://blog.langchain.dev/structured-report-generation-blueprint/) | [Code](https://github.com/langchain-ai/langchain-nvidia/blob/main/cookbook/structured_report_generation.ipynb)
- Uber QueryGPT
-
LLMOps
- LangWatch - Drag-and-drop interface for an LLMOps platform. [Documentation](https://docs.langwatch.ai/) | [GitHub](https://github.com/langwatch/langwatch)
- MLflow
- LLMOps
- Agenta - Open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place. [Documentation](https://docs.agenta.ai/)
- Helicone - Open-source LLM observability platform for developers to monitor, debug, and improve production-ready applications. [Documentation](https://docs.helicone.ai/) | [Github](https://github.com/Helicone/helicone)
-
Testing and Monitoring (Observability)
- MLflow Tracing and Evaluation - [GitHub](https://github.com/mlflow/mlflow)
- Opik - Open-source platform for evaluating, testing and monitoring LLM applications
- LangSmith - Platform for building production-grade LLM applications. It allows you to closely monitor and evaluate your application, so you can quickly and confidently ship. [Documentation](https://docs.smith.langchain.com/) | [Github](https://github.com/langchain-ai/langsmith-sdk)
-
Web Parsing (HTML) and Crawlers
- GPT Crawler - [Github](https://github.com/BuilderIO/gpt-crawler)
- Gitingest
- ScrapeGraphAI
- Crawl4AI - Open-source, blazing-fast, AI-ready web crawling tailored for LLMs, AI agents, and data pipelines. [Documentation](https://crawl4ai.com/mkdocs/) | [Github](https://github.com/unclecode/crawl4ai)
-
Miscellaneous
- AI Agent Service Toolkit - [GitHub](https://github.com/JoshuaC215/agent-service-toolkit)
- Microsoft Azure AI Services - Cutting-edge, market-ready, and responsible applications with out-of-the-box, prebuilt, and customizable APIs and models.
- Google Vertex AI - Fully managed, unified AI development platform for building and using generative AI.
- AWS Bedrock - High-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon
- AdalFlow - Auto-optimize LLM applications, from chatbot and RAG to agent, by SylphAI.
- dspy
- AutoPrompt - Intent-based Prompt Calibration.
- Promptify
- LiteLLM
- Jupyter Agent
- Jupyter AI
- Pyspur - Graph-Based Editor for LLM Workflows
- Browser-Use
- AI Suite
-
Free Training
-
NVIDIA
- Generative AI Data Scientist Workshops
-
-
LLM Providers
- Meta Llama Models - Fine-tune, distill and deploy anywhere.
- Google Gemini
- Ollama
- Grok
- Anthropic Claude
- OpenAI
- Hugging Face Models
-
AI Frameworks (Build Your Own)
- LlamaIndex Workflows - A mechanism for orchestrating actions in increasingly complex AI applications.
- LlamaIndex - Framework for building context-augmented generative AI applications with LLMs. [Documentation](https://docs.llamaindex.ai/) | [Github](https://github.com/run-llama/llama_index)
- CrewAI
- AutoGen
- Pydantic AI - Agent framework designed to make building production-grade applications with Generative AI less painful. [Github](https://github.com/pydantic/pydantic-ai)
- FlatAI
- Llama Stack - [GitHub](https://github.com/meta-llama/llama-stack)
-
Vector Databases (RAG)
- ChromaDB
- FAISS
- Pinecone
- Milvus - Open-source vector database built to power embedding similarity search and AI applications.
- NVIDIA NIM - Self-host GPU-accelerated inferencing microservices for pretrained and customized AI models across clouds, data centers, and workstations.
- Qdrant - High-Performance Vector Search at Scale
-
Building AI
- LangChain Cookbook - End-to-end examples.
- LangGraph Examples
- Llama Index Examples
- Streamlit LLM Examples
-
Deploying AI
-
Amazon Web Services (AWS)
-
Google Cloud Platform (GCP)
-
NVIDIA
- NVIDIA NIM Anywhere - Scales out to full-sized labs and up to production environments.
- NVIDIA NIM Deploy
- Python AI/ML Tips
- unwind ai
-
Microsoft Azure
- Microsoft Generative AI for Beginners
- Microsoft Intro to Generative AI Course
-
-
LLM Models
-
AI LLM Frameworks
- LangChain
- LlamaIndex - Framework for building context-augmented generative AI applications with LLMs.
- LlamaIndex Workflows - A mechanism for orchestrating actions in increasingly complex AI applications.
- LangGraph - Library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows.
-
LangChain Platform
- LangGraph - Library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. [Documentation](https://langchain-ai.github.io/langgraph/) | [Tutorials](https://github.com/langchain-ai/langgraph/tree/main/docs/docs/tutorials)
- LangChain - [Cookbook](https://github.com/langchain-ai/langchain/tree/master/cookbook)
-
Cookbooks and Examples:
- LangChain Cookbook - End-to-end examples.
- LangGraph Examples
- Llama Index Examples
- Streamlit LLM Examples
-
Cloud Examples:
- Azure Generative AI Examples
- Amazon Bedrock Workshop
- Google Vertex AI Examples
- NVIDIA NIM Anywhere - Scales out to full-sized labs and up to production environments.
- NVIDIA NIM Deploy
-
8-Week AI Bootcamp by Business Science
-
Data Science And AI Agents
- AI Data Science Team In Python - [Github](https://github.com/business-science/ai-data-science-team)
- PandasAI - [Github](https://github.com/sinaptik-ai/pandas-ai)
-
AI Frameworks (Drag and Drop)
- Langflow - Low-code tool that makes it easier to build powerful AI agents and workflows that can use any API, model, or database. [Documentation](https://docs.langflow.org/) | [Github](https://github.com/langflow-ai/langflow)
- AutoGen Studio - Low-code interface built to help you rapidly prototype AI agents, enhance them with tools, compose them into teams and interact with them to accomplish tasks. It is built on AutoGen AgentChat, a high-level API for building multi-agent applications.
- LangGraph Studio
- Pyspur - Graph-Based Editor for LLM Workflows. [Github](https://github.com/PySpur-Dev/PySpur)
-
LLM Models and Providers
-
Huggingface Platform
- Huggingface - Open-source platform for machine learning (ML) and artificial intelligence (AI) tools and models. [Documentation](https://huggingface.co/docs)
- Tokenizers
- Sentence Transformers - Go-to Python module for accessing, using, and training state-of-the-art text and image embedding models.
-
Pretraining
- PyTorch - Open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing.
- TensorFlow - Open-source machine learning library developed by Google.
- JAX - High-performance computing and automatic differentiation.
- tinygrad
- micrograd
-
Fine-tuning
- Transformers - Hugging Face Transformers is a popular library for Natural Language Processing (NLP) tasks, including fine-tuning large language models.
- Unsloth - Fine-tune LLMs such as Phi-3.5 and Gemma 2-5x faster with 80% less memory.
- LitGPT - High-performance LLMs with recipes to pretrain, finetune, and deploy at scale.
- AutoTrain - Fine-tuning of LLMs and other machine learning tasks.
-
Document Parsing
- Embedchain - [Github Repo](https://github.com/mem0ai/mem0/tree/main/embedchain)
- Docling by IBM
- Markitdown by Microsoft
- DocETL - LLM-powered data processing and ETL. [Documentation](https://ucbepic.github.io/docetl/) | [GitHub](https://github.com/ucbepic/docetl)
- LangChain Document Loaders
-
LLM Memory
-
Paid Courses
-
NVIDIA
- 8-Week AI Bootcamp To Become A Generative AI-Data Scientist - Build AI-powered data science solutions using LangChain, LangGraph, Pandas, Scikit Learn, Streamlit, AWS, Bedrock, and EC2.
-
-
Open Source LLM Models
- DeepSeek-R1
- Qwen
- Llama
-
Agents and Tools (Build Your Own)
- AutoGen AgentChat
- smolagents
- LangChain Agents
- LangChain Tools
- Agentarium - Open-source framework for creating and managing simulations populated with AI-powered agents. It provides an intuitive platform for designing complex, interactive environments where agents can act, learn, and evolve. [GitHub](https://github.com/Thytu/Agentarium)
-
Code Sandbox (Security)
- AutoGen Docker Code Executor
- E2B - Open-source runtime for executing AI-generated code in secure cloud sandboxes. Made for agentic & AI use cases. [Documentation](https://e2b.dev/docs) | [Github](https://github.com/e2b-dev/e2b)
-
Browser Control Agents
- Browser-Use - [GitHub](https://github.com/browser-use/browser-use)
- WebUI - UI for `browser-use` functionalities, designed to be user-friendly and to enable easy interaction with the browser agent. [GitHub](https://github.com/browser-use/web-ui)
-
Data Science and Machine Learning
- Best of ML Python
- Awesome Python Data Science
-
Agents and Tools (Prebuilt)
-
Coding Agents
- Qwen-Agent - [Examples](https://github.com/QwenLM/Qwen-Agent/tree/main/examples) | [Github](https://github.com/QwenLM/Qwen-Agent)