https://github.com/rvanasa/pandas-gpt

Power up your data science workflow with ChatGPT.


Host: GitHub
URL: https://github.com/rvanasa/pandas-gpt
Owner: rvanasa
License: mit
Created: 2023-04-28T03:33:51.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2025-04-10T23:28:09.000Z (3 months ago)
Last Synced: 2025-05-09T00:05:59.546Z (about 2 months ago)
Topics: chatgpt, claude-ai, data-cleaning, data-engineering, data-science, data-visualization, gemini, generative-ai, gpt4, jupyter-notebook, litellm, low-code, matplotlib, numpy, o1, openai, pandas, productivity, scipy, seaborn
Language: Jupyter Notebook
Homepage: https://pypi.org/project/pandas-gpt
Size: 494 KB
Stars: 58
Watchers: 4
Forks: 8
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

# `pandas-gpt` [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rvanasa/pandas-gpt/blob/main/notebooks/pandas_gpt_demo.ipynb)

> ### Power up your data science workflow with LLMs.

---

`pandas-gpt` is a Python library for doing almost anything with a [pandas](https://pandas.pydata.org/) DataFrame using ChatGPT or any other [Large Language Model](https://www.cloudflare.com/learning/ai/what-is-large-language-model/) (LLM).

## Installation

```bash
pip install pandas-gpt[openai]
```

Both [`openai`](https://pypi.org/project/openai/) and [`litellm`](https://pypi.org/project/litellm/) are optional dependencies, installed through the `[openai]` and `[litellm]` extras respectively.

Next, set the `OPENAI_API_KEY` environment variable to your [OpenAI API key](https://platform.openai.com/account/api-keys), or use the following code snippet:

```python
import openai

# Replace the empty string with your OpenAI API key
openai.api_key = ''
```

If you're looking for a free alternative to the OpenAI API, we encourage using [Google Gemini](https://ai.google.dev/gemini-api/docs/api-key) for code completion:

```bash
pip install pandas-gpt[litellm]
```

```python
import pandas_gpt

pandas_gpt.completer = pandas_gpt.LiteLLM('gemini/gemini-1.5-pro', api_key='...')
```

## Examples

Setup and usage examples are available in this **[Google Colab notebook](https://colab.research.google.com/github/rvanasa/pandas-gpt/blob/main/notebooks/pandas_gpt_demo.ipynb)**.

```python
import pandas as pd
import pandas_gpt

# Load the demo CSV from its URL
df = pd.read_csv('https://gist.githubusercontent.com/bluecoconut/9ce2135aafb5c6ab2dc1d60ac595646e/raw/c93c3500a1f7fae469cba716f09358cfddea6343/sales_demo_with_pii_and_all_states.csv')

# Data transformation
df = df.ask('drop purchases from Laurenchester, NY')
df = df.ask('add a new Category column with values "cheap", "regular", or "expensive"')

# Queries
weekday = df.ask('which day of the week had the largest number of orders?')
top_10 = df.ask('what are the top 10 most popular products, as a table')

# Plotting
df.ask('plot monthly and hourly sales')
top_10.ask('horizontal bar plot with pastel colors')

# Allow changes to original dataset
df.ask('do something interesting', mutable=True)

# Show source code before running
df.ask('convert prices from USD to GBP', verbose=True)
```

## Custom Language Models

It's possible to use a different language model with the `completer` config option:

```python
import pandas_gpt

# Global default
pandas_gpt.completer = pandas_gpt.OpenAI('gpt-3.5-turbo')

# Custom completer for a specific request
df.ask('Do something interesting with the data', completer=pandas_gpt.LiteLLM('gemini/gemini-1.5-pro'))
```

By default, API keys are picked up from environment variables such as `OPENAI_API_KEY`.

It's also possible to specify an API key for a particular call:

```python
df.ask('Do something important with the data', completer=pandas_gpt.OpenAI('gpt-4o', api_key='...'))
```
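Since keys are read from environment variables by default, it can help to fail early with a clear message when one is missing. The following is a minimal sketch that uses only the standard library; `require_env` is a hypothetical helper for illustration and is not part of `pandas-gpt`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, '')
    if not value:
        raise RuntimeError(f'Set the {name} environment variable before calling df.ask()')
    return value


# Example: check for the OpenAI key without printing its value.
try:
    require_env('OPENAI_API_KEY')
    print('OPENAI_API_KEY is set')
except RuntimeError as error:
    print(error)
```

Running a check like this at the top of a notebook surfaces a missing key before any request is made.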

### OpenAI

```python
pandas_gpt.completer = pandas_gpt.OpenAI('gpt-4o')
```

### LiteLLM

```python
pandas_gpt.completer = pandas_gpt.LiteLLM('gemini/gemini-1.5-pro')
```

### Local (Huggingface)

```python
pandas_gpt.completer = pandas_gpt.LiteLLM('huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct')
```

### OpenRouter

```python
pandas_gpt.completer = pandas_gpt.OpenRouter('anthropic/claude-3.5-sonnet')
```

### Anything

```python
import pandas_gpt

def my_custom_completer(prompt: str) -> str:
  # Use an LLM or any other method to create a `process()` function that
  # takes a pandas DataFrame as a single argument, does some operations on it,
  # and returns a DataFrame.
  return 'def process(df): ...'

pandas_gpt.completer = my_custom_completer
```
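Because a completer is a plain function from a prompt string to source code, it can be wrapped like any other callable. The sketch below is one illustration under stated assumptions: `make_cached_completer` and `fake_llm` are hypothetical names, not part of `pandas-gpt`, and the wrapper reuses an earlier response only when exactly the same prompt string is seen again.

```python
from typing import Callable, Dict, List


def make_cached_completer(base: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a completer so an identical prompt reuses the earlier response."""
    cache: Dict[str, str] = {}

    def completer(prompt: str) -> str:
        if prompt not in cache:
            cache[prompt] = base(prompt)
        return cache[prompt]

    return completer


# Hypothetical stand-in for a real LLM call; it records each prompt it receives.
llm_calls: List[str] = []


def fake_llm(prompt: str) -> str:
    llm_calls.append(prompt)
    return 'def process(df):\n    return df'


cached_completer = make_cached_completer(fake_llm)
first = cached_completer('keep the data unchanged')
second = cached_completer('keep the data unchanged')  # served from the cache

# To plug it in, assign it the same way as `my_custom_completer`:
#   pandas_gpt.completer = cached_completer
```

Whether caching is appropriate depends on how the prompt is built for each request, so treat this as a starting point rather than a recommended setup.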

If you want to use a fully customized API host such as [Azure OpenAI Service](https://azure.microsoft.com/en-us/products/cognitive-services/openai-service), you can globally configure the `openai` and `pandas-gpt` packages:

```python
import openai
import pandas_gpt

# Fill in the empty strings with the settings for your own API host
openai.api_type = 'azure'
openai.api_base = ''
openai.api_version = ''
openai.api_key = ''

pandas_gpt.completer = pandas_gpt.OpenAI(
  model='gpt-3.5-turbo',
  engine='',
  deployment_id='',
)
```

## Alternatives

- [GitHub Copilot](https://github.com/features/copilot): General-purpose code completion (paid subscription)

- [Sketch](https://github.com/approximatelabs/sketch): AI-powered data summarization and code suggestions (works without an API key)

## Disclaimer

Please note that the [limitations](https://github.com/openai/gpt-3/blob/master/model-card.md#limitations) of ChatGPT also apply to this library. Because `pandas-gpt` runs code generated by a language model, I recommend using it in a sandboxed environment such as [Google Colab](https://colab.research.google.com), [Kaggle](https://www.kaggle.com/docs/notebooks), or [GitPod](https://www.gitpod.io/).
