https://github.com/ccurme/yolopandas
- Host: GitHub
- URL: https://github.com/ccurme/yolopandas
- Owner: ccurme
- License: mit
- Created: 2023-01-12T14:47:23.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-05-11T13:08:53.000Z (almost 2 years ago)
- Last Synced: 2025-03-30T09:08:36.464Z (16 days ago)
- Language: Python
- Size: 404 KB
- Stars: 197
- Watchers: 6
- Forks: 15
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-ChatGPT-repositories - yolopandas - Operating pandas via chat is really handy. We ask our sales operations team things like "please graph the sales data for such-and-such company's such-and-such product for FY23 Q2," so let's use this to make it easy to ask ChatGPT to pull the data instead. (Chatbots)
README
# YOLOPandas
Interact with Pandas objects via LLMs and [LangChain](https://github.com/hwchase17/langchain).
YOLOPandas lets you specify commands with natural language and execute them directly on Pandas objects.
You can preview the code before executing, or set `yolo=True` to execute the code straight from the LLM.

**Warning**: YOLOPandas will execute arbitrary Python code on the machine it runs on. This is a dangerous thing to do.
https://user-images.githubusercontent.com/26529506/214591990-c295a283-b9e6-4775-81e4-28917183ebb1.mp4
## Quick Install
`pip install yolopandas`
## Basic usage
YOLOPandas adds an `llm` accessor to Pandas dataframes.
```python
from yolopandas import pd

df = pd.DataFrame(
[
{"name": "The Da Vinci Code", "type": "book", "price": 15, "quantity": 300, "rating": 4},
{"name": "Jurassic Park", "type": "book", "price": 12, "quantity": 400, "rating": 4.5},
{"name": "Jurassic Park", "type": "film", "price": 8, "quantity": 6, "rating": 5},
{"name": "Matilda", "type": "book", "price": 5, "quantity": 80, "rating": 4},
{"name": "Clockwork Orange", "type": None, "price": None, "quantity": 20, "rating": 4},
{"name": "Walden", "type": None, "price": None, "quantity": 100, "rating": 4.5},
],
)

df.llm.query("What item is the least expensive?")
```
The above will generate Pandas code to answer the question, and prompt the user to accept or reject the proposed code.
Accepting it in this case will return a Pandas dataframe containing the result.

Alternatively, you can execute the LLM output without first previewing it:
```python
df.llm.query("What item is the least expensive?", yolo=True)
```

`.query` can return the result of the computation, which we do not constrain. For instance, while `"Show me products under $10"` will return a dataframe, the query `"Split the dataframe into two, 1/3 in one, 2/3 in the other. Return (df1, df2)"` can return a tuple of two dataframes.
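As a sketch, the tuple-returning query above might be used like this (the unpacking is an assumption: it only works if the generated code really returns `(df1, df2)` as requested):

```python
# Hypothetical unpacking of a query that is asked to return two dataframes.
df1, df2 = df.llm.query(
    "Split the dataframe into two, 1/3 in one, 2/3 in the other. Return (df1, df2)",
    yolo=True,
)
```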
You can also chain queries together, for instance:

```python
df.llm.query("Group by type and take the mean of all numeric columns.", yolo=True).llm.query("Make a bar plot of the result and use a log scale.", yolo=True)
```

Also, if you want a better idea of how much each query costs, you can use the `run_query_with_cost` function from the utils module to compute the cost in USD, broken down by prompt and completion tokens:
```python
from yolopandas.utils.query_helpers import run_query_with_cost
run_query_with_cost(df, "What item is the least expensive?", yolo=True)
```
After running the above code, the output looks like the following:

```
Total Tokens: 267
Prompt Tokens: 252
Completion Tokens: 15
Total Cost (USD): $0.00534
```

See the [example notebook](docs/example_notebooks/example.ipynb) for more ideas.
## LangChain Components
This package uses several LangChain components, making it easy to work with if you are familiar with LangChain. In particular, it utilizes the LLM, Chain, and Memory abstractions.
### LLM Abstraction
Because YOLOPandas builds on LangChain's LLM abstraction, it is easy to swap different LLM providers in and out. You can do this in a few different ways:
1. You can change the default LLM by specifying a config path using the `LLPANDAS_LLM_CONFIGURATION` environment variable. The file at this path should be in [one of the accepted formats](https://langchain.readthedocs.io/en/latest/modules/llms/examples/llm_serialization.html).
2. If you have a LangChain LLM wrapper in memory, you can set it as the default LLM to use by doing:
```python
import yolopandas
yolopandas.set_llm(llm)
```

3. You can set the LLM wrapper to use for a specific dataframe by doing `df.reset_chain(llm=llm)` (see the sketch below).
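For example, here is a minimal sketch of options 2 and 3, assuming the `langchain` package is installed and an OpenAI API key is configured (the provider, model, and parameters are illustrative, not requirements):

```python
import yolopandas
from langchain.llms import OpenAI  # any LangChain LLM wrapper works here

# Illustrative LLM wrapper; swap in whichever provider and settings you prefer.
llm = OpenAI(temperature=0)

# Option 2: make this LLM the default for all dataframes.
yolopandas.set_llm(llm)

# Option 3: use this LLM only for a specific dataframe
# (`df` is assumed to be a dataframe created as in the Basic usage example).
df.reset_chain(llm=llm)
```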
### Chain Abstraction
Because YOLOPandas builds on LangChain's Chain abstraction, it is easy to swap different chains into YOLOPandas. This is useful if you want to customize the prompt, customize the chain itself, or make similar changes.
To use a custom chain for a particular dataframe, you can do:
```python
df.set_chain(chain)
```
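For example, here is a sketch of building a custom chain with LangChain's `PromptTemplate` and `LLMChain`. The prompt text and input variables below are illustrative assumptions, not the package's actual default prompt, so the variables YOLOPandas passes to its chain may differ:

```python
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

# Illustrative prompt; adjust the template and variables to your needs.
prompt = PromptTemplate(
    input_variables=["query"],
    template="Write Pandas code that answers the following question:\n{query}",
)
chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)

# Attach the custom chain to a specific dataframe.
df.set_chain(chain)
```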
If you ever want to reset the chain to the base chain, you can do:

```python
df.reset_chain()
```

### Memory Abstraction
The default chain used by YOLOPandas utilizes the LangChain concept of [memory](https://langchain.readthedocs.io/en/latest/modules/memory.html). This allows for "remembering" of previous commands, making it possible to ask follow up questions or ask for execution of commands that stem from previous interactions.
For example, the query `"Make a seaborn plot of price grouped by type"` can be followed with `"Can you use a dark theme, and pastel colors?"` upon viewing the initial result.
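In code, that follow-up might look like this (memory is on by default, so the second query can refer to the first):

```python
df.llm.query("Make a seaborn plot of price grouped by type", yolo=True)

# Because the previous command is remembered, this follow-up builds on its result.
df.llm.query("Can you use a dark theme, and pastel colors?", yolo=True)
```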
Memory is turned on by default. To turn it off by default, set the environment variable `LLPANDAS_USE_MEMORY=False`.
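One way to set this from Python, as a sketch (assuming the variable is read when YOLOPandas builds its chain, so it should be set before importing or using the library):

```python
import os

# Assumption: checked at import/chain-creation time, so set it first.
os.environ["LLPANDAS_USE_MEMORY"] = "False"

import yolopandas  # noqa: E402
```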
If you are resetting the chain, you can also specify whether to use memory there:
```python
df.reset_chain(use_memory=False)
```