# Your AI Data Science Team (🪖 An Army Of Agents)

**An AI-powered data science team of agents to help you perform common data science tasks 10X faster**.

[**Please ⭐ us on GitHub (it takes 2 seconds and means a lot).**](https://github.com/business-science/ai-data-science-team)

*Beta - This Python library is under active development. Breaking changes may occur until the 0.1.0 release.*

---

The AI Data Science Team of Copilots includes Agents that specialize in data cleaning, preparation, feature engineering, modeling (machine learning), and interpretation of various business problems like:

- Churn Modeling
- Employee Attrition
- Lead Scoring
- Insurance Risk
- Credit Card Risk
- And more

## Table of Contents

- [Your AI Data Science Team (🪖 An Army Of Agents)](#your-ai-data-science-team--an-army-of-agents)
- [Table of Contents](#table-of-contents)
- [Companies That Want A Custom AI Data Science Team (And AI Apps)](#companies-that-want-a-custom-ai-data-science-team-and-ai-apps)
- [Generative AI for Data Scientists Workshop](#generative-ai-for-data-scientists-workshop)
- [Data Science Agents](#data-science-agents)
- [🔥 NEW: Data Science Apps](#-new-data-science-apps)
- [NEW: Multi-Agents](#new-multi-agents)
- [🔥 Agentic Applications](#-agentic-applications)
- [Agents Available Now](#agents-available-now)
- [Standard Agents](#standard-agents)
- [🔥🔥 NEW! Machine Learning Agents](#-new-machine-learning-agents)
- [🔥 NEW! Data Science Agents](#-new-data-science-agents)
- [Multi-Agents](#multi-agents)
- [Agents Coming Soon](#agents-coming-soon)
- [Disclaimer](#disclaimer)
- [Installation](#installation)
- [Usage](#usage)
- [Example: H2O Machine Learning Agent](#example-h2o-machine-learning-agent)
- [Contributing](#contributing)
- [License](#license)
- [Want To Become A Full-Stack Generative AI Data Scientist?](#want-to-become-a-full-stack-generative-ai-data-scientist)
- [⭐️ Star History](#️-star-history)

## Companies That Want A Custom AI Data Science Team (And AI Apps)

Want to have your own _customized_ enterprise-grade AI Data Science Team and *domain-specific* AI-powered Apps?

**Send inquiries here:** [https://www.business-science.io/contact.html](https://www.business-science.io/contact.html)

## Generative AI for Data Scientists Workshop

If you're an aspiring data scientist who wants to learn how to build AI Agents and AI Apps for your company that perform Data Science, Business Intelligence, Churn Modeling, Time Series Forecasting, and more, then I'd love to help you.

[**Register for my next Generative AI for Data Scientists workshop here.**](https://learn.business-science.io/ai-register)

## Data Science Agents

This project is a work in progress. New data science agents will be released soon.

![AI Data Science Team](/img/ai_data_science_team.jpg)

### 🔥 NEW: Data Science Apps

**🔥 Open Pandas AI Data Analyst:** Load an Excel or CSV file and ask it questions. Get data and charts back.

![Pandas Data Analyst App](/img/apps/ai_pandas_data_analyst_app.jpg)

**🔥 SQL Database Agent:** Connects to any SQL Database, generates SQL queries from natural language, and returns data as a downloadable table.

**🔥 Exploratory Data Copilot:** An AI-powered data science app that performs automated exploratory data analysis (EDA) with EDA Reporting, Missing Data Analysis, Correlation Analysis, and more.

[See all available apps here](/apps)

### NEW: Multi-Agents

**🔥 Pandas Data Analyst Agent:** Combines the ability to wrangle, transform, and analyze data with an optional data visualization agent that can create interactive plots.

![Pandas Data Analyst Agent](/img/multi_agent_pandas_data_analyst.jpg)

#### 🔥 Agentic Applications

1. **NEW Exploratory Data Copilot**: An AI-powered data science app that performs automated exploratory data analysis (EDA) with EDA Reporting, Missing Data Analysis, Correlation Analysis, and more. [See Application](/apps/exploratory-copilot-app/)

![Exploratory Data Copilot](/img/apps/ai_exploratory_copilot.jpg)

2. **SQL Database Agent App:** Connects to any SQL Database, generates SQL queries from natural language, and returns data as a downloadable table. [See Application](/apps/sql-database-agent-app/)

### Agents Available Now

#### Standard Agents

1. **Data Wrangling Agent:** Merges, Joins, Preps and Wrangles data into a format that is ready for data analysis. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_wrangling_agent.ipynb)
2. **Data Visualization Agent:** Creates visualizations to help you understand your data. Returns JSON serializable plotly visualizations. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_visualization_agent.ipynb)
3. **🔥 Data Cleaning Agent:** Performs Data Preparation steps including handling missing values, outliers, and data type conversions. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_cleaning_agent.ipynb)
4. **Feature Engineering Agent:** Converts the prepared data into ML-ready data. Adds features to increase predictive accuracy of ML models. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/feature_engineering_agent.ipynb)
5. **🔥 SQL Database Agent:** Connects to SQL databases to pull data into the data science environment. Creates pipelines to automate data extraction. Performs Joins, Aggregations, and other SQL Query operations. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/sql_database_agent.ipynb)
6. **🔥 Data Loader Tools Agent:** Loads data from various sources including CSV, Excel, Parquet, and Pickle files. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/data_loader_tools_agent.ipynb)

#### 🔥🔥 NEW! Machine Learning Agents

1. **🔥 H2O Machine Learning Agent:** Builds and logs hundreds of high-performance machine learning models. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/ml_agents/h2o_machine_learning_agent.ipynb)
2. **🔥 MLflow Tools Agent (MLOps):** This agent has 11+ tools for managing models, ML projects, and making production ML predictions with MLflow. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/ml_agents/mlflow_tools_agent.ipynb)

#### 🔥 NEW! Data Science Agents

1. **🔥🔥 EDA Tools Agent:** Performs automated exploratory data analysis (EDA) with EDA Reporting, Missing Data Analysis, Correlation Analysis, and more. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/ds_agents/eda_tools_agent.ipynb)
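
As a rough illustration of two of the analyses named above (missing data and correlation), here is a dependency-free sketch in plain Python. The sample rows and column names are invented for illustration, and the agent itself may work quite differently; in practice you would likely reach for pandas instead.

```python
import math

# Hypothetical sample records (illustration only); None marks a missing value.
rows = [
    {"tenure": 1, "monthly_charges": 30.0},
    {"tenure": 2, "monthly_charges": 50.0},
    {"tenure": 3, "monthly_charges": None},
    {"tenure": 4, "monthly_charges": 90.0},
]


def missing_counts(records):
    """Count missing (None) values per column."""
    counts = {}
    for record in records:
        for column, value in record.items():
            counts[column] = counts.get(column, 0) + (value is None)
    return counts


def pearson(records, x, y):
    """Pearson correlation of columns x and y over rows where both are present.

    Assumes at least two complete rows and non-constant columns.
    """
    pairs = [(r[x], r[y]) for r in records if r[x] is not None and r[y] is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_x * var_y)


print(missing_counts(rows))
print(round(pearson(rows, "tenure", "monthly_charges"), 3))
```

Rows with a missing value are dropped pairwise before the correlation is computed, which is one common choice among several.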

#### Multi-Agents

1. **🔥🔥 Pandas Data Analyst Agent:** Combines the ability to wrangle, transform, and analyze data with an optional data visualization agent that can create interactive plots. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/multiagents/pandas_data_analyst.ipynb)
2. **🔥🔥 SQL Data Analyst Agent:** Connects to SQL databases to pull data into the data science environment. Creates pipelines to automate data extraction. Performs Joins, Aggregations, and other SQL Query operations. Includes a Data Visualization Agent that creates visualizations to help you understand your data. [See Example](https://github.com/business-science/ai-data-science-team/blob/master/examples/multiagents/sql_data_analyst.ipynb)
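
To make the SQL operations named above concrete, here is a minimal sketch of the kind of join-and-aggregate query such an agent might generate from a natural-language request. It uses only Python's built-in `sqlite3` module. The `customers` and `orders` tables and their rows are made up for illustration, and this is not the library's own code or API.

```python
import sqlite3

# In-memory database with two hypothetical tables (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 30.0), (3, 2, 15.0);
    """
)

# Join orders to customers, then aggregate per customer.
query = """
    SELECT c.name, COUNT(o.id) AS n_orders, SUM(o.amount) AS total_amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total_amount DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for name, n_orders, total_amount in rows:
    print(name, n_orders, total_amount)
```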

### Agents Coming Soon

1. **Data Analyst:** Analyzes data structure, creates exploratory visualizations, and performs correlation analysis to identify relationships.
2. **Interpretability Agent:** Performs Interpretable ML to explain why the model returned its predictions, including which features were most important to the model.
3. **Supervisor:** Forms task list. Moderates sub-agents. Returns completed assignment.

## Disclaimer

**This project is for educational purposes only.**

- It is not intended to replace your company's data science team
- No warranties or guarantees provided
- Creator assumes no liability for financial loss
- Consult an experienced Generative AI Data Scientist for building your own custom AI Data Science Team
- If you want a custom enterprise-grade AI Data Science Team, [send inquiries here](https://www.business-science.io/contact.html).

By using this software, you agree to use it solely for learning purposes.

## Installation

You can install via PyPI (note that this is a beta version and breaking changes may occur until 0.1.0):

``` bash
pip install ai-data-science-team
```

Or, if you want the latest version from GitHub:

``` bash
pip install git+https://github.com/business-science/ai-data-science-team.git --upgrade
```
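
After installing, you may want to confirm which version ended up in your environment, since breaking changes are possible between beta releases. The sketch below uses the standard library's `importlib.metadata`; the `installed_version` helper is a hypothetical name for illustration and is not part of this library.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "ai-data-science-team") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if the package is not installed.
print(installed_version())
```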

## Usage

[See all examples here.](/examples)

### Example: H2O Machine Learning Agent

[See the full example here.](https://github.com/business-science/ai-data-science-team/blob/master/examples/ml_agents/h2o_machine_learning_agent.ipynb)

``` python
# Import libraries
from langchain_openai import ChatOpenAI
import pandas as pd
import h2o  # The H2O ML Agent requires the h2o package to be installed
import os
from ai_data_science_team.ml_agents import H2OMLAgent

# Load the data
df = pd.read_csv("data/churn_data.csv")
df

# Initialize the language model
os.environ['OPENAI_API_KEY'] = "YOUR_OPENAI_API_KEY"
MODEL = "gpt-4o-mini"  # Placeholder, not from the original example: use any OpenAI chat model name
llm = ChatOpenAI(model=MODEL)
llm

# Initialize the H2O ML Agent
ml_agent = H2OMLAgent(
    model=llm,
    log=True,
    log_path="logs/",
    model_directory="h2o_models/",
    enable_mlflow=True,  # Use this if you wish to log models to MLflow
)
ml_agent

# Run the agent
ml_agent.invoke_agent(
    data_raw=df.drop(columns=["customerID"]),
    user_instructions="Please do classification on 'Churn'. Use a max runtime of 30 seconds.",
    target_variable="Churn"
)

# Retrieve and display the leaderboard of models
ml_agent.get_leaderboard()
```
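
The example above writes the API key directly into the script. A common alternative, sketched here, is to export `OPENAI_API_KEY` in your shell and have the script fail early when it is missing. The `require_env` helper is a hypothetical name for illustration and is not part of this library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return value


# Usage (assumes the key was exported in the shell beforehand):
# api_key = require_env("OPENAI_API_KEY")
```

This keeps the key out of notebooks and version control.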

## Contributing

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request

## License

This project is licensed under the MIT License. See the LICENSE file for details.

# Want To Become A Full-Stack Generative AI Data Scientist?

![Generative AI Data Scientist](/img/become_a_generative_ai_data_scientist.jpg)

I teach Generative AI Data Science to help you build AI-powered data science apps. [**Register for my next Generative AI for Data Scientists workshop here.**](https://learn.business-science.io/ai-register)

# ⭐️ Star History

[![Star History Chart](https://api.star-history.com/svg?repos=business-science/ai-data-science-team&type=Date)](https://star-history.com/#)

[**Please ⭐ us on GitHub (it takes 2 seconds and means a lot).**](https://github.com/business-science/ai-data-science-team)